Stochastic System Notes 2011
Abstract
A note to ORIE 3510/ORIE 5510/STSCI 3510 students: these lecture notes were
taken and compiled in LATEX by Georgia Tech Ph.D. student Hyunwoo Park in Spring
2011. Jim Dai has not edited this document. Some of the terms and examples have
not been updated yet, even though they are out of context for ORIE 3510. When in
conflict, the class notes of the current ORIE 3510 offering (Spring 2022) take precedence
over this document.
1 Lecture 1: Jan 13
1.1 Newsvendor Problems
This class of problems used to be called the "Newsboy Problem." It helps you determine an
optimal ordering policy in the face of uncertain demand when the selling price is fixed
(deterministic). Imagine you are starting a new business selling the New York Times on
campus. First of all, each day, you need to decide how many copies to get from the
publisher. This is a decision variable. A variable cost is associated with your decision.
Say cv = $0.25. You also need to decide how much to sell this newspaper for. The publisher
may place a constraint on the minimum price of papers. Say cp = $1.00. How about left-over
copies? Most likely, the value of left-overs is zero. However, it could be positive or
negative depending on the situation. Say cs = $0.00. Also, cs < cv ; otherwise you could
make a risk-free profit by selling left-overs back to the publisher. You also need to know
the characteristics of demand. Denote daily demand by D. Say E(D) = 50. It can be
impractical to obtain the exact distribution of D, but you can at least try to approximate it.
d          10    15    20    25    30
P(D = d)   1/4   1/8   1/8   1/4   1/4
Even if you figure out the distribution, you do not know how much demand there will
be on any given day. Your company could try to change the distribution by, for example,
lowering the price. However, the uncertainty here is intrinsic, so let us assume that the
price is fixed.
Finally, the question comes down to “How many copies should you order each evening?”.
However many you decide to order, your decision will probably turn out to be wrong.
A few years ago, I went to Intel. Its manufacturing process is one of the most complex
manufacturing systems that human beings have ever constructed. The entire process takes six
weeks. Every day, at each step, you have intermediate products. In this setting, you cannot
manage this kind of operation without having a model. In this case, what would be the
right question to answer? Profit? Maximizing the profit? More precisely, it should be
maximizing the long-run average profit per day.
Not in this lecture but in the next, we will discuss how to obtain the optimal order
quantity for each day. For now, let us explore the option of ordering a flat 25 copies every day.
Day 1 2 3
D 10 25 15
Profit 10 − 6.25 = 3.75 25 − 6.25 = 18.75 15 − 6.25 = 8.75
Although you are always making a profit in this case, you can conceive of an ordering
policy that generates a loss instead of a profit on a given day. The average profit
over 100 days will be

(3.75 + 18.75 + 8.75 + · · · ) / 100.
In symbolic notation,

Pn = (p1 + p2 + p3 + · · · + pn ) / n,

and assuming that each day's demand is iid and our order is constant, the long-run average
profit will be

lim_{n→∞} Pn = lim_{n→∞} (p1 + p2 + p3 + · · · + pn ) / n.

By the law of large numbers, we know that the limit exists and equals the expectation of
p1 , E[p1 ].
A computer simulation can generate large quantities of random numbers from the demand
distribution, instead of you generating them yourself. You might ask whether a random
number generated by a computer is truly random. It is not, in the sense that you can
replicate the sequence by using the same random seed. However, it appears to be random
and passes various statistical tests.
Anyway, if two people simulate the situation with the same demand distribution, the
long-run average profit should be the same even though the actual realizations of the two
simulations differ. This is the power of the strong law of large numbers.

P( lim_{n→∞} (p1 + p2 + p3 + · · · + pn ) / n = E[p1 ] ) = 1
I am stressing this here because this is the underlying assumption in the following
lectures. Textbooks usually tell you to maximize the expectation, but do not tell you why.
This is why.
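The convergence above is easy to check numerically. The course demos use MATLAB; the sketch below is in Python instead, using this lecture's demand distribution and costs with a fixed order of y = 25 copies (the helper name `daily_profit` is mine, not from the course).

```python
import random

# Demand distribution from this lecture: values and probabilities.
values = [10, 15, 20, 25, 30]
probs = [1/4, 1/8, 1/8, 1/4, 1/4]

cp, cv, cs = 1.00, 0.25, 0.00  # selling price, variable cost, salvage value
y = 25                         # fixed daily order quantity

def daily_profit(d, y):
    """Profit for one day: revenue on sold copies plus salvage, minus cost."""
    return cp * min(y, d) + cs * max(y - d, 0) - cv * y

# Exact expected profit E[p1] for comparison.
expected = sum(p * daily_profit(d, y) for d, p in zip(values, probs))

# Long-run average over many simulated days (strong law of large numbers).
random.seed(0)
n = 100_000
demands = random.choices(values, weights=probs, k=n)
average = sum(daily_profit(d, y) for d in demands) / n

print(expected)           # 13.125
print(round(average, 2))  # settles near 13.125
```

The simulated long-run average lands close to E[p1], which is exactly the point of maximizing the expected profit.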
So in the end, we need to maximize the expected profit E(P ). The formula for the profit is

P = cp (y ∧ D) + cs (y − D)+ − cv y,

where y is the order quantity, a ∧ b = min(a, b), and a+ = max(a, 0). You will be asked to
obtain the y which maximizes the expected profit.
I will be away in the Netherlands next week, so the TAs will cover for me.
2 Lecture 2: Jan 18
The instructor for this week will be Mr. Shuangchi He. Students can write an email to
[email protected] to ask questions about these two lectures.
Usually, Cs < Cv < Cp . The basic metrics we are interested in would be:
We can also think of an alternative expression. For each paper sold, I earn Cp − Cv . On the
other hand, I lose Cv − Cs per left-over paper at the end of the day.
In the long run, I am interested in the average profit I make. If the demands of all days
are independent and identically distributed (iid), then by the law of large numbers,
the long run average profit per day ≈ E[P (D, y)].
You should understand that the important part of the approximation above is that the LHS is
an average over "many" periods while the RHS is the expectation for "one" period.
Now let g(y) = E[P (D, y)], which is the expected profit when I have y papers at the
beginning of the day. In the expression E[P (D, y)], D is the random factor and we take the
expectation with respect to D. We want to maximize the profit, so our optimization problem
would be formulated as

max_{y≥0} g(y).
Remark 2.2. (i) Expected profit is relevant only when the system is managed repeatedly
over many periods.
(ii) If you manage the system for only one or a few periods, maximizing the expected profit
may not make sense.
(iii) In this setting, the optimal order quantity y ∗ that maximizes g(y) should be used for
every period.
Then, how do we compute E[y ∧ D] and E[(y − D)+ ] in order to compute g(y) for a given y?
Example 2.1. Assume that D follows the following distribution and y = 30, 24 for example.
d 20 25 30 35
P[D = d] 0.1 0.2 0.4 0.3
30 ∧ d 20 25 30 30
(30 − d)+ 10 5 0 0
24 ∧ d 20 24 24 24
(24 − d)+ 4 0 0 0
Then,

E[30 ∧ D] = 20(0.1) + 25(0.2) + 30(0.4) + 30(0.3) = 28,
E[(30 − D)+ ] = 10(0.1) + 5(0.2) = 2.

Also,

E[24 ∧ D] = 20(0.1) + 24(0.2) + 24(0.4) + 24(0.3) = 23.6,
E[(24 − D)+ ] = 4(0.1) = 0.4.
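Sums like these are mechanical, so they are easy to check by machine. A small sketch in Python (rather than the MATLAB used in class) for the distribution of Example 2.1; the helper names are mine:

```python
# Demand pmf from Example 2.1.
pmf = {20: 0.1, 25: 0.2, 30: 0.4, 35: 0.3}

def expected_sales(y):
    """E[y ∧ D]: expected number of papers sold with y on hand."""
    return sum(p * min(y, d) for d, p in pmf.items())

def expected_leftover(y):
    """E[(y − D)+]: expected number of unsold papers."""
    return sum(p * max(y - d, 0) for d, p in pmf.items())

print(round(expected_sales(30), 4), round(expected_leftover(30), 4))  # 28.0 2.0
print(round(expected_sales(24), 4), round(expected_leftover(24), 4))  # 23.6 0.4
```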
Example 2.2. Let D ∼ Uniform(20, 40). What would E[25 ∧ D] and E[(25 − D)+ ] be? The
uniform distribution has density

f (x) = 1/20 if 20 ≤ x ≤ 40, and f (x) = 0 otherwise.

Then,

E[25 ∧ D] = ∫_{20}^{40} (25 ∧ x)f (x) dx = (1/20) ∫_{20}^{40} (25 ∧ x) dx
          = (1/20) ∫_{20}^{25} x dx + (1/20) ∫_{25}^{40} 25 dx
          = (1/20) · (1/2)(25² − 20²) + (25/20) · 15 = 24.375,

E[(25 − D)+ ] = ∫_{20}^{40} (25 − x)+ f (x) dx = (1/20) ∫_{20}^{25} (25 − x) dx
          = (1/20) ∫_{20}^{25} 25 dx − (1/20) ∫_{20}^{25} x dx
          = (1/20) [ 25 · 5 − (1/2)(25² − 20²) ] = 0.625.
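As a sanity check, both integrals can also be approximated numerically. A Python sketch using a plain midpoint Riemann sum (the helper `riemann` is mine, not a course function):

```python
def riemann(f, a, b, n=200_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

density = 1 / 20  # Uniform(20, 40) pdf on [20, 40]

e_min = riemann(lambda x: min(25, x) * density, 20, 40)       # E[25 ∧ D]
e_plus = riemann(lambda x: max(25 - x, 0) * density, 20, 40)  # E[(25 − D)+]

print(round(e_min, 3), round(e_plus, 3))   # 24.375 0.625
```

The grid boundaries land exactly on the kink at 25, so the midpoint rule is exact on each linear piece here.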
3 Lecture 3: Jan 20
You learned E[y ∧ D] and E[(y − D)+ ] in the last class. In Homework 2, you will have to
compute E[(8 − D)− ], which is related to the shortage cost incurred when you run out of
stock. Let us make x+ and x− clear here.
x+ = max{x, 0} = x if x ≥ 0, and 0 if x < 0;
x− = max{−x, 0} = −x if x ≤ 0, and 0 if x > 0.
• If you order too much, you have unused inventory that will perish after one period.
• If you order too little, you lose sales even though additional customers are willing to buy.
Let us now find out the optimal order quantity y ∗ . Let F (x) be the cumulative distribu-
tion function of D.
F (x) = P[D ≤ x]
First, suppose D is a continuous random variable with pdf f (x). Then

F (x) = ∫_0^x f (t) dt,
E[y ∧ D] = ∫_0^∞ (y ∧ x)f (x) dx = ∫_0^y x f (x) dx + ∫_y^∞ y f (x) dx,
E[(y − D)+ ] = ∫_0^∞ (y − x)+ f (x) dx = ∫_0^y (y − x)f (x) dx,
∫_y^∞ y f (x) dx = y ∫_y^∞ f (x) dx = y ( 1 − ∫_0^y f (x) dx ).

Writing a = Cp − Cv and b = Cv − Cs , the expected profit is g(y) = a E[y ∧ D] − b E[(y − D)+ ].
Then,

g′(y) = (a + b)yf (y) − (a + b)yf (y) − (a + b) ∫_0^y f (x) dx + a = −(a + b)F (y) + a.
Setting g′(y∗ ) = 0, we get

F (y∗ ) = a/(a + b) = (Cp − Cv )/(Cp − Cv + Cv − Cs ) = (Cp − Cv )/(Cp − Cs ).

To verify that g(y∗ ) is a maximum, take the second derivative: g′′(y) = −(a + b)f (y) =
−(Cp − Cs )f (y) ≤ 0, implying that g(y) is concave. So g(y∗ ) must be a maximum!
[Figure: the concave curve g(y), attaining its maximum value g(y∗ ) at y∗ .]
Let us turn our attention to the discrete case. D is a discrete random variable taking
values in {d0 , d1 , d2 , · · · } with pmf P[D = di ] = pi .
d d0 d1 d2 ···
P[D = d] p0 p1 p2 ···
Example 3.2. Discrete demand, D, is
d 20 25 30 35
P[D = d] 0.1 0.2 0.4 0.3
Also, Cp = 1, Cv = 0.25, Cs = 0. Then, y ∗ is the smallest y such that F (y) ≥ 0.75. Since
F (20) = 0.1, F (25) = 0.3, F (30) = 0.7, F (35) = 1, y ∗ = 35.
In summary:
(i) When D is a continuous random variable, y∗ solves
F (y∗ ) = (Cp − Cv )/(Cp − Cs ).
(ii) When D is a discrete random variable, choose y∗ to be the smallest y such that
F (y) ≥ (Cp − Cv )/(Cp − Cs ).
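The discrete rule amounts to scanning the support in increasing order and stopping at the first y whose cdf reaches the critical ratio. A Python sketch (the function name is mine); the usage line reproduces Example 3.2:

```python
def optimal_order(pmf, cp, cv, cs):
    """Smallest y in the support with F(y) >= (cp - cv) / (cp - cs)."""
    ratio = (cp - cv) / (cp - cs)
    F = 0.0
    for d in sorted(pmf):           # accumulate the cdf over the support
        F += pmf[d]
        if F >= ratio:
            return d
    return max(pmf)                 # F at the top of the support is 1

# Example 3.2: the ratio is 0.75 and F(30) = 0.7 < 0.75, so y* = 35.
pmf = {20: 0.1, 25: 0.2, 30: 0.4, 35: 0.3}
print(optimal_order(pmf, cp=1.0, cv=0.25, cs=0.0))   # 35
```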
Consider demand

d          0     10
P[D = d]   0.5   0.5

with Cp = 2, Cv = 1, Cs = 0. Then y∗ is the smallest y such that

F (y) ≥ (2 − 1)/2 = 0.5 ⇒ y∗ = 0.

For an arbitrary 0 ≤ y ≤ 10,

g(y) = (Cp − Cv )E[y ∧ D] − (Cv − Cs )E[(y − D)+ ] = y/2 − y/2 = 0.

Every y is optimal.
4 Lecture 6: Feb 1
Before beginning today’s lecture, I inform you to register on the Littlefield game this week.
(i) # of servers
(iv) expected time spent in the buffer (i.e. waiting time in queue excluding service time)
The general notation of a queueing system is G/G/1. The first G means that inter-
arrival time is in general distribution. The second G means that processing time is in
general distribution. The last 1 means single server system. A server is assumed to work on
one customer at a time.
Note that there are two views of this system. One is the manager's perspective and the
other is the customers' standpoint. The performance measures above are important from the
manager's view. Each customer cares more about how much time he/she has to wait.
In this queueing system, there are two inputs: inter-arrival time and processing time.
Let vi be the processing time of the i-th customer and ui be the inter-arrival time between
the (i − 1)-th customer and the i-th customer. Given
{ui : i = 1, 2, · · · },
{vi : i = 1, 2, · · · },
we can say that the dynamics of this queue is known. By dynamics, we mean
Example 4.1. Assume that the system is empty at t = 0. Assume that
u1 = 1, u2 = 3, u3 = 2, u4 = 3, u5 = 4
v1 = 4, v2 = 2, v3 = 1.
[Figure: sample paths of the queue length Q(t) and the number in system Z(t) over 0 ≤ t ≤ 10.]
When we say Q(1), it is ambiguous. Let us make it clear here. At t = 1− , the first
customer has not arrived yet, so Q(1− ) = 0. Also, right after t = 1, Q(1+ ) = 1. Let
Q(t) = Q(t+ ) so that this graph becomes right-continuous.
You need to keep your eyes on Q and Z over time, and your manager may want to know
the average queue size over the time window [0, T ]. Let T = 8.5.

Average Queue Size = (1/T ) ∫_0^T Q(t) dt = (1/T ) × (area under Q from 0 to T )
                   = (1 + 1)/8.5 = 2/8.5
The last number, the average queue size, fluctuates over time. However, we can imagine that
if we observe enough customers, this number may settle down. Let us generalize this idea.
Let wi be the waiting time of the i-th customer and w̄n be the average waiting time for the
first n customers:

w̄n = (w1 + w2 + · · · + wn ) / n.

Assuming we look at enough customers, we can send n to ∞:

wq = lim_{n→∞} w̄n
4.1.1 Lindley Equation
In fact, you do not have to compute the wi one by one. The waiting times satisfy the recursion

wn+1 = (wn + vn − un+1 )+ , w1 = 0.

If wn + vn ≤ un+1 , then wn+1 = 0: the (n + 1)-th customer does not have to wait and goes
into service as soon as he/she arrives. This recursion is called the Lindley equation. You can
verify it using spreadsheet software with three columns: ui , vi , wi .
Intel’s manufacturing process usually takes about six weeks to finish a product. Each
process is composed of many subprocesses. Since each machine is very expensive, running
those machines as most of times as possible is desirable. So, managers used to generate much
more parts than a machine can process so that the machine always has something to work
on. These decisions can be modeled and optimized using queueing theory as well.
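The Lindley recursion is a one-liner per customer. Below is a Python version (the in-class demo uses MATLAB; the function name is mine), checked against the inputs of Example 4.1:

```python
def waiting_times(u, v):
    """Lindley recursion: w[0] = 0, w[n+1] = max(w[n] + v[n] - u[n+1], 0).

    u[i] is the inter-arrival time before customer i and v[i] is the
    service time of customer i (0-indexed).
    """
    w = [0.0]
    for n in range(len(v) - 1):
        w.append(max(w[n] + v[n] - u[n + 1], 0.0))
    return w

# Inputs from Example 4.1: customers arrive at t = 1, 4, 6, ...
u = [1, 3, 2]
v = [4, 2, 1]
print(waiting_times(u, v))   # [0.0, 1.0, 1.0]
```

Customer 1 starts at t = 1 and finishes at t = 5, so customers 2 and 3 (arriving at t = 4 and t = 6) each wait 1, matching the recursion's output.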
>> u = unidrnd(6,1,10)
>> mean(u)
>> v = unidrnd(5,1,10)
>> mean(v)
We threw a six-sided die for u and a five-sided die for v. You can use other distributions,
such as the Gamma distribution, which will be taught in the next class.
>> n = 10000;
>> u = gamrnd(5,10,1,n);   % inter-arrival times
>> v = gamrnd(5,10,1,n);   % service times
>> w = zeros(1,n);
>> for i=1:n
>>     if i==1, w(i)=0;
>>     else w(i)=max(w(i-1)+v(i-1)-u(i),0);   % Lindley equation
>>     end
>> end
>> sum(w)/n
>> mean(w)    % same as sum(w)/n
5 Lecture 7: Feb 3
5.1 G/G/1 Queue Cont’d
In the last lecture, we introduced the G/G/1 queue. For this system, we needed two sequences
of random variables,

{ui : i = 1, 2, · · · } and {vi : i = 1, 2, · · · }.

These two sequences are the input to the queueing system. The performance measures for this
system were as follows.
Imagine a service queue at a Bank of America branch. Each day, the ui , vi may differ. How
would you deal with changing input which seems totally random? We need something
invariant to make the model useful. Although each day's actual sequences of ui , vi may be
different, they may be statistically the same. That is why we look at the distribution of
the randomness.
Assume that {ui : i = 1, 2, · · · } is a sequence of iid (independent and identically dis-
tributed) random variables having distribution

Fa (x) = P(u1 ≤ x) = P(u2 ≤ x) = · · · .

(The subscript a signifies "arrival".) The later equalities in the equation above hold because
of the iid condition. iid means
(i) how long it took for the previous customers to arrive does not affect the time it takes
for the next customer to arrive, i.e., inter-arrival times are independent;
(ii) how long it takes for each customer to arrive follows exactly the same distribution.
Also, assume that {vi : i = 1, 2, · · · } is a sequence of iid r.v.’s having cdf Fs (x) where the
subscript s signifies “service”.
For example, if you observe vi on different days,
Monday: v1 , v2 , · · · = 1, 2, 3, 3, 5, 2, 2, 2, 1, · · ·
Tuesday: v1 , v2 , · · · = 5, 1, 4, 2, 1, 1, 2, 3, · · ·
Without some commonality in the input, our model is quite useless. That is why we need the
iid assumption, so that we can assume that the different realizations actually come from the
same distribution.
Now we have distributions for the r.v.'s ui , vi . Where do we start? Maybe we can
start by looking at the expectation of each r.v.
Example 5.1. For example,
In this case, we are short of capacity and the queue will grow over time. How about the
arrival rate, i.e., how many customers arrive per unit time?

arrival rate λ = 1/E(u1 )
mean processing time m = E(v1 )

Then, we can define the "traffic intensity" from the two quantities defined above:

traffic intensity ρ = λm

For example: u1 = u2 = · · · = un = 3 ⇒ E[ui ] = 3, ∴ λ = 1/3;
v1 = v2 = · · · = vn = 2 ⇒ E[vi ] = 2, ∴ m = 2 ⇒ ρ = 2/3.
Theorem 5.1 (Kingman heavy-traffic approximation formula for average waiting time).
Assume ρ < 1 and ρ is close to 1. Then

wq ≈ m · (ρ/(1 − ρ)) · (c²a + c²s )/2

where

c²s = Var(v1 )/[E(v1 )]² , the squared coefficient of variation (SCV) of the service times,
c²a = Var(u1 )/[E(u1 )]² .
Let us look at c²s first. Variance is already one way to measure the variability of a random
variable. Then why don't we just use the variance of v1 instead of this newly introduced metric?
The answer is normalization. If we use the variance as it is, its numeric value depends
on the time unit of measurement. If Var(v1 ) = 100 sec², the value will shrink a lot if we
use hours as the time unit. The variability of a r.v. is intrinsic, and our intuition tells us that
a measure of variability should likewise be intrinsic, independent of the time unit we use. That
is why we divide the variance by the squared expectation.
Now look at the approximation formula itself. If you look at the second factor, involving ρ,
you can see that ρ/(1 − ρ) → ∞ as ρ ↑ 1. The third factor captures the variabilities of
inter-arrival times and service times.
Another point is that this formula is highly practical because it involves only the first
two moments of Fa , Fs : E[u1 ], E[u²1 ], E[v1 ], E[v²1 ]. From a data-collection standpoint, it is
much, much easier to obtain just the first few moments of a r.v. than the whole distribution.
How much can we trust this formula, given that it is an approximation? In practice, if ρ >
85%, it is quite a good approximation. That is why it is called a "heavy-traffic"
approximation. Then what if ρ ≪ 1? In that case, the waiting time is usually very low, so
either you don't care about it or it is not the part that needs your attention.
Are you reading the assigned book, The Goal? It is about identifying bottlenecks in a
plant. When you become a manager of a company and are running an expensive machine,
you usually want to run it all the time at full utilization. However, the implication of
Kingman's formula is that as your utilization approaches 100%, the waiting time
skyrockets. It means that if there is any uncertainty or random fluctuation in the input
to your system, your system will greatly suffer. In the low-ρ region, increasing ρ is not that
bad. If ρ is near 1, increasing utilization a little bit can lead to a disaster.
Atlanta, 10 years ago, did not suffer much from traffic problems. As its traffic infras-
tructure capacity gets closer to the demand, it becomes more and more fragile to
uncertainty.
A lot of the strategies presented in The Goal are in fact ways to decrease ρ. You can do
various things to reduce the ρ of your system, such as outsourcing some process.
You can also strategically manage or balance the load on different parts of your system.
You may want to utilize your customer service organization 95% of the time, while the
utilization of sales people is 10%.
5.3 Long-run Average Queue Size
Queue size is another important performance metric for managers. The long-run average
queue size per unit time can be computed by the formula

lq = lim_{T→∞} (1/T ) ∫_0^T Q(s) ds.

Recall there was another metric, wq . This wq averages over n customers, while lq averages
over time. Hence,

lq : time average,
wq : headcount (per-customer) average.

Little's Law connects the two:

lq = λwq .
The Little’s Law is much more general than G/G/1 queue. It can be applied to any black
box with definite boundary. The Georgia Tech campus can be one black box. ISyE building
itself can be another. In G/G/1 queue, we can easily get average size of queue or service
time or time in system as we differently draw box onto the queueing system.
Average number in system = λ × [Average time in system]
Kingman’s formula already tells us a lot and I will talk about other important perfor-
mance measures next time.
6 Lecture 8: Feb 8
Today we will connect what we have learned with each other.
E(w) = wq ≈ m · (ρ/(1 − ρ)) · (c²a + c²s )/2 , ρ < 1,

where m is the mean service time and ρ = λm is the traffic intensity. When ρ < 1, ρ is the
utilization of the system.
Let vi ∼ exponential with mean 2 minutes; then it is a continuous r.v. with pdf

f (x) = (1/2) e^{−x/2} = λe^{−λx} , x ≥ 0, with λ = 1/2.

E(vi ) = 2, Var(vi ) = (1/λ)² = 4, c²(vi ) = 4/2² = 1. If c²(vi ) = 0, vi is deterministic because
it means the variance is zero. Let us learn about the gamma distribution.
vi ∼ Gamma(α, β)

Here, α is the shape parameter and β is the scale parameter. A gamma distribution with
integer α can be expressed as a sum of exponential random variables: if X ∼ Gamma(α, β)
with α an integer, then X = X1 + X2 + · · · + Xα where the Xi are iid and exponentially
distributed with mean β.
Example 6.1. Let us experiment our service system with 100,000 customers.
n=100000
>> u=gamrnd(1.0, 6, 1, n)
>> mean(u)
6.0030
>> var(u)
35.7406
>> var(u)/mean(u)^2
0.9915
In u, α = 1.0, β = 6. Since the mean of a gamma r.v. is αβ, the mean should be around 6:
every six minutes, on average, there is an arrival.
Then, some customers do not have to wait, but some do. The average waiting time is
around 37 minutes per customer. Let us see what Kingman's formula tells us.
>> rho=5*(1/6.0)
>> rho
0.8333
>> m=5
>> m*rho/(1-rho)*(1+2)/2
37.5000
It is quite accurate, as you can see. Here ρ = λm = 5/6. Then how do we get the parameters
for your model? You need to observe your system yourself, collect data, and derive the
parameters, perhaps using some statistics. The beauty of Kingman's formula is that you do
not have to obtain the whole distribution of the randomness in your system. Estimating
the entire distribution is much harder than estimating the mean and variance of random
variables.
More than 1 customer is in the system sometimes, but we have an average of 1. Now think
about the time spent in the system by each customer. S1 = 4, S2 = 3, S3 = 2, so

S̄ = (1/3)(S1 + S2 + S3 ) = (1/3)(4 + 3 + 2) = 3 minutes.

How about the arrival rate?

λ = 3/9 = 1/3
What does the Little’s Law tell us?
average number of customers in the queue =arrival rate × average waiting time
L =λw
? 1
1= 3
3
It does work. The beauty of Little's Law is that, as long as the measurements for L, λ, w
are consistent, it can be applied to any black-box system. Also note that L and w are
averages computed over a long period of time.
6.4 Throughput
Throughput is the rate of output flow from a system.
• If ρ ≤ 1, throughput= λ.
• If ρ > 1, throughput= µ.
Example 6.4. Suppose a system with two queues linked in series as in the previous
example.
(i) If λ = 15 units per minute, µA = 20 units per minute, µB = 25 units per minute,
compute the throughput of the whole system.
ρA = λ/µA = 15/20 = .75, ρB = 15/25 = .6

Since both traffic intensities are less than 1, the throughput of A is λ = 15 and that of B
is also λ = 15.
(ii) Suppose that λ = 30 units per minute while everything else remains the same.

ρA = 30/20 = 1.5, ρB = 20/25 = .8
The throughput of A is µA = 20, so the arrival rate at station B is 20, not 30. Thus, B's
traffic intensity is .8 ≤ 1. The throughput of B is 20, and it is also the throughput of the
system because B is the terminal station of the system.
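The cascading argument generalizes to any number of stations in series: each station's throughput is the minimum of the rate it is fed and its own service rate. A Python sketch (the function name is mine); the two usage lines reproduce cases (i) and (ii):

```python
def line_throughput(arrival_rate, service_rates):
    """Throughput of stations in series: each station passes on
    min(what it is fed, its own service rate)."""
    flow = arrival_rate
    for mu in service_rates:
        flow = min(flow, mu)   # an overloaded station caps the flow at mu
    return flow

# Example 6.4(i): lambda = 15, muA = 20, muB = 25  ->  throughput 15.
print(line_throughput(15, [20, 25]))   # 15
# Example 6.4(ii): lambda = 30 overloads station A  ->  throughput 20.
print(line_throughput(30, [20, 25]))   # 20
```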
7 Lecture 9: Feb 10
7.1 The Wisdom of Jonah
The most important part of this book is how to manage bottlenecks: first identifying
bottlenecks, then managing them. Three quantities are mentioned in the book.
(i) Throughput: The rate at which the system generates money through sales. If you
produce something but do not sell it, it is not throughput.
(ii) Inventory: All the money that the system has invested in purchasing things that it
intends to sell.
(iii) Operational Expense
The central practice in the book is capacity management.
(i) Identify the system bottlenecks (constraints).
(ii) Increase effective capacity of the system bottlenecks.
(iii) Adjust product mix, if appropriate.
If you are selling vehicles (cars and trucks), you could sell half and half, or 70:30. The
point is that if you change your product mix, the bottleneck may shift, because each
product requires different resources.
(iv) Subordinate everything else to the bottlenecks (for example, drum & ropes).
Team has to march together.
(v) If, in a previous step, a bottleneck constraint has been broken, return to step (i).
There's always a bottleneck. If you don't know where it is, you may be the one.
Then, how to increase effective capacity of bottlenecks?
(i) Never let the bottleneck be idle, because the throughput of the system equals the
throughput of the bottleneck. Do not overload the bottleneck with too much work in
progress; let the bottleneck "pull" work from a buffer.
(a) Never lack an operator, change lunch and break schedule, etc.
(b) Maintain a buffer inventory in front of the bottleneck so that it is never starved.
(ii) Squeeze as much output as you can from the bottleneck.
(a) Use your best workers at the bottleneck.
(b) Reduce bottleneck setup time.
(c) Split and overlap batches to increase throughput
I will post the slides to t-square.
7.2 Discrete Time Markov Chain
We are now moving on to another part of the course. We have three more modules. After
DTMC, we will be talking about Poisson process and then continuous-time Markov chain.
So far, we had only one decision per period, because the item was perishable: you had to
get rid of the leftovers. Now, we will be talking about non-perishable items.
Example 7.1 (Inventory Model for a Non-Perishable Item). Dn is the demand in the nth
period (days or weeks). Note that inventory left at the end of a week can be used to
satisfy demand in the following week. Assume {Dn } is an iid sequence, for example:
d           0     1     2     3
P(Dn = d)   1/8   1/4   1/2   1/8
Let Cp = $100, Cv = $50, Cf = $100, h = $10. What is the optimal inventory policy to maximize
the long-term average profit per week? Do not confuse this with the newsvendor problem:
there, you had to get rid of the items at the end of each period.
How do you analyze the performance of an inventory policy? Every Friday at 5pm, say,
we decide how much to order for the following week, so that the ordered items arrive at
8am the following Monday.
How do we express an example inventory policy symbolically? Let Xn be the inventory level
at the end of period n. Say if Xn ≥ 2, do not order; if Xn < 2, order up to S = 4 items. This
type of policy is called an (s, S) policy. In general, we can define it as: if Xn ≤ s, order
up to S; if Xn > s, do not order.
This is a very popular policy. Virtually every company has some version of it. To
decide what values s, S should take, you need some analysis. Once you've finished this class,
you will know how to decide that.
We begin the analysis by building a table. Let
8 Lecture 10: Feb 15
8.1 Discrete Time Markov Chains (DTMCs)
Example 8.1. Assume iid demand
d 0 1 2 3
P(D = d) 1/8 1/4 1/2 1/8
In the real world, demand is usually not iid. There could be some seasonality. But in a
business like WalMart, selling items as cheaply as possible, demand can be modeled as iid.
Assume our inventory policy is (s, S) = (1, 4). Let Xn be the inventory level at the end
of week n. Note that the values Xn can take are in {0, 1, 2, 3, 4}. Is {Xn : n = 1, 2, · · · }
an iid sequence? No; it is a time series with dependence across weeks.
Consider the following probability. What would the value be?
P(Xn+1 = 3|Xn = 2) = 0
Xn = 2 means that nth week ends with 2 items and Xn+1 = 3 means that (n + 1)th week
ends with 3 items. How about the following probability?
P(Xn+1 = 3|Xn = 1) = P(Dn+1 = 1) = 1/4
Note that we order according to our inventory policy over the nth weekend, so for the
(n + 1)th week to end at level 3, the demand in the (n + 1)th week must be 1. A matrix is a
good way to write these probabilities in a neat form.
      0     1     2     3     4
0 [   0    1/8   1/2   1/4   1/8 ]
1 [   0    1/8   1/2   1/4   1/8 ]
2 [  5/8   1/4   1/8    0     0  ]
3 [  1/8   1/2   1/4   1/8    0  ]
4 [   0    1/8   1/2   1/4   1/8 ]
(i) State space S, e.g. S = {0, 1, 2, 3, · · · }: You will see that S does not have to be finite.
(ii) Transition probability matrix P = (Pij ) such that Pij ≥ 0 and Σ_{j∈S} Pij = 1: This is
the matrix you just saw above. (Each row should sum to 1, but the columns do not
have to.)
(iii) Initial state (distribution): This is how much inventory you are given at the starting
point. It is the information about X0 .
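The transition matrix of the inventory chain can be generated mechanically from the (s, S) policy and the demand pmf, which avoids arithmetic slips. A Python sketch (the function name is mine), using this lecture's demand distribution and the (s, S) = (1, 4) policy:

```python
def transition_matrix(s, S, demand_pmf):
    """P[i][j] = P(X_{n+1} = j | X_n = i) for end-of-week inventory under
    an (s, S) policy: if X_n <= s, order up to S; otherwise do not order."""
    P = []
    for i in range(S + 1):
        start = S if i <= s else i          # stock at the start of next week
        row = [0.0] * (S + 1)
        for d, p in demand_pmf.items():
            row[max(start - d, 0)] += p     # next end-of-week level
        P.append(row)
    return P

# Demand pmf from Example 8.1 with the (s, S) = (1, 4) policy.
P = transition_matrix(1, 4, {0: 1/8, 1: 1/4, 2: 1/2, 3: 1/8})
print(P[2])   # [0.625, 0.25, 0.125, 0.0, 0.0]
```

Row 2 shows that starting a week with 2 items (no order placed), the week ends empty with probability 5/8.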
Definition 8.1 (Discrete Time Markov Chain). A discrete time stochastic process X =
{Xn : n = 0, 1, 2, · · · } is said to be a DTMC on state space S with transition matrix P if for
each n ≥ 1 and all i0 , i1 , i2 , · · · , in−1 , i, j ∈ S,

P(Xn+1 = j|X0 = i0 , X1 = i1 , X2 = i2 , · · · , Xn−1 = in−1 , Xn = i) = Pij . (8.1)
The most important part of this definition is (8.1). At this point, let us recall the
definition of conditional probability:

P(A|B) = P(A ∩ B)/P(B) = P(A, B)/P(B).
Note that comma and ∩ are interchangeable in this context. This (8.1) is called the Markov
property. In plain English, it says that once you know today’s state, tomorrow’s state has
nothing to do with past information. No matter how you reached the current state, your
tomorrow will only depend on the current state. In mathematical notation,
P(Xn+1 = j|X0 = i0 , X1 = i1 , X2 = i2 , · · · , Xn−1 = in−1 , Xn = i) = P(Xn+1 = j|Xn = i).
(i) Past states: X0 = i0 , X1 = i1 , X2 = i2 , · · · , Xn−1 = in−1
(ii) Current state: Xn = i
(iii) Future state: Xn+1 = j
Definition 8.2 (Markov Property). Given the current information (state), future and past
are independent.
From information gathering perspective, it is very appealing because you just need to
remember the current. For an opposite example, Wikipedia keeps track of all the histories
of each article. It requires tremendous effort. This is the beauty of the Markov property.
What if I think my situation depends not only on the current week but also on one week
ago? Then you can define the state space so that each state contains two weeks of
information instead of one. I have to stress that you are the one who decides what your
model should be: what the state space is, etc. You can add a few more assumptions to fit
your situation to a Markov model.
This type of DTMC is called a time-homogeneous DTMC. It means that the transition law
from this week to next week is the same as from next week to the following week.
Confirming question: is our inventory model a DTMC? Yes, because we do not have to
know the past stock levels to decide whether to order or not.
(i) Suppose we are modeling the weather each day. State 0 means hot and state 1 means
cold. Then, the probability of a hot day after a hot day is 3/4.
(ii) Another example would be a machine repair process. State 0 means the machine is up
and running and state 1 means the machine is under repair.
Example 8.3 (A Simple Random Walk). Suppose you toss a coin at each time n and
you go up if you get a head, down if you get a tail. Then, the state space S = Z =
{· · · , −2, −1, 0, 1, 2, · · · } and Xn is the position after nth toss of the coin.
Pij = p if j = i + 1, q if j = i − 1, and 0 otherwise.
You can see that P(Head) = p, P(Tail) = q, p + q = 1 and p can be bigger or smaller than q
in which case you are tossing a biased coin. Note that if Xn = i then Xn+1 is either i + 1 or
i − 1 as shown in the figure below.
Theorem. Suppose there is a function

f : S × (−∞, ∞) → S, f (i, u) ∈ S,

such that Xn+1 = f (Xn , Un+1 ) and {Ui : i = 1, 2, 3, · · · } is an iid sequence. Then {Xn : n =
1, 2, · · · } is a DTMC.
I will not prove this here, but it is a very useful theorem for checking whether something
is a DTMC or not.
Example 8.4. Let Xn+1 = Xn + Un+1 and Ui is a coin toss at time i. Suppose
then Xn is a DTMC.
Now see if our inventory model fits in this theorem. Let us express the inventory model
in a different way.
Xn+1 = 4 − Dn+1 if Xn ≤ 1, and Xn+1 = (Xn − Dn+1 )+ if Xn ≥ 2.
Then,

f (0, d) = f (1, d) = 4 − d, d ∈ {0, 1, 2, 3},
f (2, d) = (2 − d)+ ,
f (3, d) = (3 − d)+ ,
f (4, d) = (4 − d)+ .
9 Lecture 11: Feb 17
d 0 1 2 3
P(Di = d) 1/8 1/2 1/4 1/8
Let Xn denote the number of items at the end of week n. Our inventory policy is
(s, S) = (1, 4), which means that if the inventory level drops to 1 or below, you order up to
4. Then,
P(Xn+1 = 1|Xn = 3) = P(D = 2) = 1/4.
Likewise, the transition matrix of this DTMC is

        0     1     2     3     4
    0 [  0    1/8   1/4   1/2   1/8 ]
    1 [  0    1/8   1/4   1/2   1/8 ]
P = 2 [ 3/8   1/2   1/8    0     0  ]
    3 [ 1/8   1/4   1/2   1/8    0  ]
    4 [  0    1/8   1/4   1/2   1/8 ]
Now, let Yn denote the number of items at the beginning of week n. Is Y = {Yn : n =
0, 1, 2, · · · } a DTMC? Yes. The state space is S = {2, 3, 4} and the transition matrix is

        2     3     4
    2 [ 1/8    0    7/8 ]
P = 3 [ 1/2   1/8   3/8 ]
    4 [ 1/4   1/2   1/4 ]
For example, let us look at P(Yn+1 = 4|Yn = 2). It corresponds to the cases where the demand
is 1 or 2 or 3: then the leftover on Friday night is less than or equal to 1, so you order up
to 4 by the beginning of the next week. Mathematically,

Yn+1 = 4 if Xn ≤ 1, and Yn+1 = Xn if Xn ≥ 2,

or,

Yn+1 = 4 if Yn − Dn ≤ 1, and Yn+1 = Yn − Dn if Yn − Dn ≥ 2.
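The Y-chain's matrix can likewise be derived from the dynamics instead of by hand. A Python sketch (the function name is mine) using this lecture's demand pmf; the printed row matches the Y = 2 row of the matrix above:

```python
def begin_week_matrix(s, S, demand_pmf):
    """Transition matrix for Y_n, the stock at the *beginning* of a week,
    under an (s, S) policy. States are s+1, ..., S."""
    states = list(range(s + 1, S + 1))
    P = {y: {z: 0.0 for z in states} for y in states}
    for y in states:
        for d, p in demand_pmf.items():
            x = max(y - d, 0)               # end-of-week stock
            z = S if x <= s else x          # next week's starting stock
            P[y][z] += p
    return P

# Lecture 11 demand pmf and the (s, S) = (1, 4) policy.
P = begin_week_matrix(1, 4, {0: 1/8, 1: 1/2, 2: 1/4, 3: 1/8})
print([P[2][z] for z in (2, 3, 4)])   # [0.125, 0.0, 0.875]
```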
To simulate this DTMC, you need in general 3 dice, one for each state's row of P . A sample
path of the DTMC may then look as follows.

[Figure: a sample path of the DTMC for n = 0, 1, . . . , 5.]
Note that different runs of the DTMC may give different sample paths. Now, suppose we are
computing P(X4 = 0|X2 = 0). How would you compute it?
Likewise, we can also compute P(X4 = 1|X2 = 0). But how can we calculate such probabil-
ities more easily? In fact,

P(X4 = j|X2 = i) = P²(i,j) = the (i, j)th entry of P² .
            [  0   1/4  3/4 ] [  0   1/4  3/4 ]   [ 7/8   0   1/8 ]
P² = P · P = [ 1/2   0   1/2 ] [ 1/2   0   1/2 ] = [ 1/2  1/8  3/8 ]
            [  1    0    0  ] [  1    0    0  ]   [  0   1/4  3/4 ]
In general,

P(Xn+k = j|Xk = i) = P^{(n)}_{ij} = P^n_{i,j} ,
P(Xn = j|X0 = i) = P^n_{i,j} .
The following conditional probability formula can help you understand these types of
calculations:

P(A, B|C) = P(A, B, C)/P(C) = [P(A, B, C)/P(B, C)] · [P(B, C)/P(C)] = P(A|B, C)P(B|C).

If µ is the distribution of X0 , written as a row vector, then the distribution of Xn is

µn = µ · P^n .
10 Lecture 12: Feb 22
10.1 Review for Midterm
Traffic intensity is mathematically defined as

ρ1 = λm1 = λ/µ1 , ρ2 = λm2 = λ/µ2 ,

where λ is the arrival rate and µ is the service rate. You should be aware that traffic
intensity and utilization may be different. Depending on the situation, these two may differ.
Also,

λ = 1/E(ui ), m1 = E(vi1 ), m2 = E(vi2 ).
We also covered Gamma distribution. It has two parameters.
X ∼ Gamma(α, β)
α is shape parameter and β is scale parameter. Erlang distribution is a special case of
Gamma distribution with integer α value. For example,
X ∼ Gamma(10, .2)
X = X1 + X2 + · · · + X10

where the Xi are iid exponential with mean 0.2. Therefore,

E[X] = 10 E[X1] = 2,    Var(X) = 10 Var(X1),

c²(X) = Var(X)/(E[X])² = 10 Var(X1)/(10 E[X1])² = (1/10) · Var(X1)/(E[X1])² = 1/10.
For the general Gamma distribution, the result generalizes to c²(X) = 1/α. Hence, as α decreases, variability grows; for α below 1 the variability exceeds that of an exponential.
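The claim c²(X) = 1/α can be checked by simulation. A minimal Python sketch (the sample size and seed are arbitrary choices) builds Gamma(10, .2) as a sum of ten iid exponentials with mean 0.2:

```python
import random

random.seed(0)

# Monte Carlo check of c^2(X) = 1/alpha for X ~ Gamma(alpha=10, scale=0.2),
# built as a sum of 10 iid exponentials with mean 0.2 (an Erlang).
alpha, mean_xi, n_samples = 10, 0.2, 50_000
samples = [sum(random.expovariate(1 / mean_xi) for _ in range(alpha))
           for _ in range(n_samples)]

m = sum(samples) / n_samples
var = sum((x - m) ** 2 for x in samples) / n_samples
c2 = var / m ** 2
print(m, c2)   # close to E[X] = 2 and c^2(X) = 1/10
```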
Suppose we have multiple stations. Say two cases of three stations.
First case: ρ1 = 1.2 ρ2 = .8 ρ3 = .5
Second case: ρ1 = .8 ρ2 = 1.2 ρ3 = .5
The throughput of the two systems should be the same. The bottleneck of the first system is station 1, while that of the second is station 2. In both cases the utilization of station 3 is .5/1.2, because the rate of items fed from station 2 to station 3 is 1/1.2. In the first system, station 2 is not overwhelmed, so it passes items along at the rate at which it is fed.
Let us think about another case.
ρ1 = .9 ρ2 = .8 ρ3 = .5
In this case, utilization of station 3 is .5/1 = .5 because no stations were overwhelmed.
Throughput is 1.
Can Kingman’s approximation be generalized to multiple stations? Generally, yes. There is a network version of the approximation, but I will not emphasize it because it is more of a heuristic.
10.2 DTMC generated by a Recursive Function
You need to be able to interpret transition probabilities Pi,j by looking at this matrix. For
example,
If you need to compute for multiple steps, you need P² or P³. Examples are

P(X4 = 3, X2 = 1|X1 = 2) = P_{2,1} · (P²)_{1,3},

P(X4 = 3|X2 = 1) = Σ_{i=1}^{3} P(X4 = 3, X3 = i|X2 = 1) = Σ_{i=1}^{3} P_{1,i} P_{i,3} = (P²)_{1,3}.

I can of course give you a bit more complicated version of this problem:

P(X10 = 1, X4 = 3, X2 = 1|X1 = 2) = P_{2,1} · (P²)_{1,3} · (P⁶)_{3,1}.
Then, does steady-state always exist? Sometimes not. Suppose following transition
matrix.
P = [ 0  1 ]
    [ 1  0 ]
Here w_{n+1} plays the role of X_{n+1}, and U_{n+1} := v_n − u_{n+1}. We need the U_i to be iid for the chain to be Markov. If the v_i and u_i are iid, then the U_i are iid as well. Hence, this chain is a DTMC.
11 Lecture 13: Mar 1
11.1 Stationary Distribution
Let us start with the inventory model we did last time. Assume the demand distribution as
follows.
d 0 1 2 3
P(D = d) .1 .4 .3 .2
If our inventory policy is (s, S) = (1, 3) and Xn is the inventory at the end of each week
(i.e. S = {0, 1, 2, 3}),
    0 [ .2  .3  .4  .1 ]
P = 1 [ .2  .3  .4  .1 ]
    2 [ .5  .4  .1   0 ] .
    3 [ .2  .3  .4  .1 ]
Let us define µ = (0, 0, 0, 1) as the initial distribution of our DTMC. It means that X0 = 3
with probability 1, i.e. deterministically.
Try the following Matlab codes.
>> mu=[0 0 0 1]
>> mu*P^2
>> mu*P^10
>> mu*P^100
>> mu*P^101
You will find that the probability distributions µP^100 and µP^101 are the same. How fast does the distribution converge to this limit? Even after just 10 weeks, the chain seems to have reached its stable stage. This is related to the relaxation time, which in this case seems to be quite short.
What should we call lim_{n→∞} µ_n? We can call it the “limit distribution”, which in this case is (.2923, .3308, .3077, .0692). In mathematical notation, π = (π0, π1, π2, π3). What conditions should this π satisfy?
Σ_{i=0}^{3} πi = 1,
πP = π.
The first condition holds because π is a probability distribution, but the second condition may not be directly intuitive. Think of it this way: since we know µ100 ≈ µ101, call that common value π. We also know that µ101 = µ100 P. Hence, π = πP.
Definition 11.1 (Stationary Distribution). Suppose π = (πi, i ∈ S) satisfies

(i) πi ≥ 0 and Σ_{i∈S} πi = 1,
(ii) π = πP.

Then, π is said to be a stationary distribution of the DTMC.
There are two interpretations of stationary distribution. Suppose π0 = .29 = 29%.
• Long run fraction of time that this DTMC stays in state 0 is 0.29.
• The probability that this DTMC is in state 0 after running the system for a long time is 0.29.
Example 11.1. Let us apply this to our transition matrix. Using π = πP , we have the
following system of linear equations.
To solve this a bit more easily, you can first solve with respect to one variable, say, π3 .
Solving this, we obtain
π0 = 38/130 ,    π1 = 43/130 ,    π2 = 40/130 ,    π3 = 9/130 ≈ .0692.
In the test, you will not have the luxury of using Matlab. Even so, you could be asked to give the stationary distribution. Hence, you should practice computing the stationary distribution from a given transition matrix.
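As practice, the stationary distribution of the inventory chain above can also be computed by hand-style linear algebra. A minimal Python sketch (exact fractions, plain Gauss-Jordan elimination) solves π = πP together with the normalization:

```python
from fractions import Fraction as F

# (s,S) = (1,3) inventory chain from above; solve pi = pi P together with
# sum(pi) = 1 by Gaussian elimination on (P^T - I) with one row replaced.
P = [[F(2, 10), F(3, 10), F(4, 10), F(1, 10)],
     [F(2, 10), F(3, 10), F(4, 10), F(1, 10)],
     [F(5, 10), F(4, 10), F(1, 10), F(0)],
     [F(2, 10), F(3, 10), F(4, 10), F(1, 10)]]

n = 4
# Build A x = b where rows are (P^T - I) pi = 0, last row replaced by sum = 1.
A = [[P[j][i] - (F(1) if i == j else F(0)) for j in range(n)] for i in range(n)]
A[n - 1] = [F(1)] * n
b = [F(0)] * (n - 1) + [F(1)]

# Plain Gauss-Jordan elimination with exact fractions.
for col in range(n):
    piv = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
    for r in range(n):
        if r != col and A[r][col] != 0:
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
pi = [b[r] / A[r][r] for r in range(n)]
print(pi)   # equals (38/130, 43/130, 40/130, 9/130)
```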
(v) Unfulfilled demand is lost.
If you are a manager of a company, you will be interested in something like this: What
is the long-run average profit per week? You can lose one week, earn another week. But,
you should be aware of profitability of your business in general.
Let C(i) be the expected profit of the following week, given that this week’s inventory ends with i items. Consider the case i = 0 first. You need to order three items, and the cost of doing so includes both the variable cost and the fixed cost. We should also count the revenue you will earn next week. Fortunately, we do not pay any holding cost, because we have no inventory at the end of this week.
C(0) = −Cost + Revenue
     = [−3($1000) − $1500] + [3($2000)(.2) + 2($2000)(.3) + 1($2000)(.4) + 0(.1)]
     = −$1300.
This is not the best week you could have. How about the case where you are left with 2 items at the end of this week? First of all, you pay the holding cost of $100 per item. When calculating the expected revenue, you should add the probabilities of D = 2 and D = 3, because even if the demand is 3, you can only sell 2 items. Since you do not order, there is no ordering cost.
C(2) = −Cost + Revenue
     = −2($100) + $2000 E[D ∧ 2]
     = [−2($100)] + [($0)(.1) + ($2000)(.4) + ($4000)(.3 + .2)]
     = $2600.
This seems to be quite a good week. I computed C(1) = −$400 and C(3) = $2900 for you:

C(1) = [−2($1000) − $1500 − 1($100)] + $2000 E[D ∧ 3] = −3600 + 3200 = −$400,
C(3) = −3($100) + $2000 E[D ∧ 3] = −300 + 3200 = $2900.
Based on this, how would you compute the long-run average profit? This is where the
stationary distribution comes into play.
Long-run avg profit = Σ_{i=0}^{3} C(i) πi
                    = C(0)π0 + C(1)π1 + C(2)π2 + C(3)π3
                    = $488.39.
It means that under the (s, S) = (1, 3) policy, we earn about 500 dollars per week on average, so you can now decide whether or not to keep running the business. It does not guarantee that you earn exactly $488.39 each week; it is the long-run average profit. Finding the optimal policy that maximizes the long-run average profit requires more advanced knowledge of stochastic processes. In particular, this kind of problem cannot be solved with the simplex method because it is highly nonlinear. You are better off solving it numerically with Matlab or other computer aids.
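The long-run average profit computation can be sketched in a few lines of Python (in class we used Matlab). With exact fractions the sum evaluates to about $488.46; the $488.39 above reflects the rounded stationary probabilities.

```python
from fractions import Fraction as F

# Long-run average profit under the (s,S) = (1,3) policy: combine the weekly
# profits C(i) computed above with the stationary distribution pi.
C = [-1300, -400, 2600, 2900]                      # dollars, i = 0,1,2,3
pi = [F(38, 130), F(43, 130), F(40, 130), F(9, 130)]

avg = sum(c * p for c, p in zip(C, pi))
print(float(avg))   # about 488 dollars per week
```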
12 Lecture 14: Mar 3
12.1 Interpretation of Stationary Distribution
>> P1 = [0 .1 .9; .8 0 .2; .4 .6 0]
>> P1^100
>> P1^101
Not surprisingly, P1^100 ≈ P1^101. You will see that each row is the same vector (.3607, .2623, .3770). Do you see that this common row is the stationary distribution of the DTMC? Its first element is about 36%. What does that mean? It means that the DTMC spends about 36% of the time in state 1.
However, this is not always the case.
>> P3 = [0 1; 1 0]
>> P3^100
>> P3^101
>> (P3^100+P3^101)/2
In this simple example, the chain moves from one state to the other with probability 1 at every step. Does this chain have a stationary distribution? The chain obviously never stabilizes. Do we then have two stationary distributions? What do you think about taking the average of P3^100 and P3^101?
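Averaging two consecutive powers can be demonstrated directly. A minimal Python sketch for the flip-flop chain P3:

```python
# The two-state flip-flop chain P3 = [[0,1],[1,0]] never settles: P3^n
# alternates between the identity and P3. Averaging two consecutive powers
# recovers the stationary distribution (1/2, 1/2).
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, n):
    R = [[1.0, 0.0], [0.0, 1.0]]   # start from the identity
    for _ in range(n):
        R = mat_mul(R, A)
    return R

P3 = [[0.0, 1.0], [1.0, 0.0]]
P100, P101 = mat_pow(P3, 100), mat_pow(P3, 101)
print(P100)  # identity matrix
print(P101)  # P3 itself
avg = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(P100, P101)]
print(avg)   # every row is (0.5, 0.5)
```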
Here, we again have P4^100 = P4^101. However, not all rows are identical. What does it mean when every row is identical? It means that your initial state is irrelevant once you run the chain for a long time. For P4, the odd rows are identical and the even rows are identical, but the two differ. Here we have two stationary distributions. If you start at an odd state, you end up in the equilibrium given by the odd rows; otherwise, you are stuck in the even-row equilibrium. I hope our socioeconomic ladder is not similar to this.
12.2 Transition Diagram
Example 12.1. Suppose we have the following transition matrix.
     1 [  0  .2  .8 ]
P1 = 2 [ .5   0  .5 ]
     3 [ .6  .4   0 ]
State space S = {1, 2, 3}. Is the following diagram equivalent to the matrix?
(Diagram: states 1, 2, 3 with arrows labeled by the transition probabilities above.)
The diagram contains exactly the same amount of information as the transition matrix has.
Example 12.2. Now, think of P2 different from P1 . Suppose P2 can be represented as the
following diagram with state space S = {1, 2, 3, 4, 5, 6, 7}.
(Diagram: states 1, 2, 3 with the same arrows as before, together with a second group of states 4, 5, 6, 7 connected only among themselves.)
In this example, the picture consists of two separate groups of states, so the chain can be decomposed into two separate chains. Such a chain is called “reducible”.
Definition 12.1. (i) State i can reach state j if there exists an n such that P(Xn = j|X0 = i) > 0, i.e. P^{(n)}_{ij} > 0. This is denoted by i → j.

(ii) States i and j are said to communicate if i → j and j → i.
Why is irreducibility important? In the earlier Matlab example, we saw that a reducible chain can have more than one stationary distribution. In the previous example, if we add an arrow from state 4 to state 3, does the chain become irreducible? No, because then 7 → 3 but 3 ↛ 7.
Theorem 12.1. (i) If X is irreducible, there exists at most one stationary distribution.
(ii) If X has a finite state space, it has at least one stationary distribution.
Corollary 12.1. For a finite state, irreducible DTMC, there exists a unique stationary
distribution.
(Diagram: a periodic chain on the four states 1, 2, 3, 4.)
In this case, we have P^100 ≠ P^101. However, according to our corollary, there should be a unique stationary distribution for this chain, too. How do we obtain it given that P^100 ≠ P^101? How about taking the average of the two:

(P^100 + P^101)/2 ?
Even in this case, we cannot obtain the limiting distribution, because this chain oscillates
as n → ∞. How do we formally classify such cases?
For example, in the first example,

d(1) = gcd{2, 4, 3, · · · } = 1,    d(2) = 1.

In fact, if i ↔ j, then d(i) = d(j). This is called the solidarity property. Since all states in an irreducible DTMC communicate, the periods of all states are the same.

From the third example, d(4) = gcd{2, 4, 6, · · · } = 2, so that DTMC is periodic with period d = 2.
Theorem 12.2. (i) If the DTMC is aperiodic, then lim_{n→∞} P^n exists. (Each row of lim_{n→∞} P^n is a stationary distribution.)

(ii) If the DTMC is periodic with period d ≥ 2, then

lim_{n→∞} (P^n + P^{n+1} + P^{n+2} + · · · + P^{n+d−1}) / d

exists.
Since we are getting too abstract, let us look at another example.
Example 12.4.
P = [ .2  .8 ]
    [ .5  .5 ]
Is this DTMC irreducible? Yes. Now note that P_{ii} > 0, so d(i) = 1. Checking for a positive diagonal entry is an easy way to verify that a DTMC is aperiodic. Therefore, the limiting distribution exists, and each row of the limit equals the stationary distribution. What is the stationary distribution, then?
πP = π :

(π1, π2) [ .2  .8 ] = (π1, π2)
         [ .5  .5 ]

∴ π = (5/13, 8/13).
Hence, even without the help from Matlab, we can say that we know
lim_{n→∞} P^n = [ 5/13  8/13 ]
                [ 5/13  8/13 ] .
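We can verify π = (5/13, 8/13) numerically, both by checking πP = π and by power iteration. A minimal Python sketch:

```python
# Verify pi = (5/13, 8/13) for P = [[.2,.8],[.5,.5]]: it satisfies pi P = pi,
# and repeated multiplication of any start vector by P converges to it.
P = [[0.2, 0.8], [0.5, 0.5]]
pi = (5 / 13, 8 / 13)

# Check the balance equation pi P = pi.
step = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print(step)          # (5/13, 8/13) again

# Power iteration from a point mass on the first state.
mu = [1.0, 0.0]
for _ in range(50):
    mu = [sum(mu[i] * P[i][j] for i in range(2)) for j in range(2)]
print(mu)            # converged to about (0.3846, 0.6154)
```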
Example 12.5.
    [  0   .5   0   .5 ]
P = [ .5    0  .5    0 ]
    [  0   .5   0   .5 ]
    [ .5    0  .5    0 ]
According to our theorems, how many stationary distributions do we have? Just one. Can you give the stationary distribution? It is π = (1/4, 1/4, 1/4, 1/4), meaning that the long-run fraction of time spent in each state is a quarter. Then, is it true that

       [ 1/4  1/4  1/4  1/4 ]
P^n =  [ 1/4  1/4  1/4  1/4 ]  ?
       [ 1/4  1/4  1/4  1/4 ]
       [ 1/4  1/4  1/4  1/4 ]
Example 12.6. Suppose the following DTMC and that you are asked to compute limn→∞ P n .
(Diagram: a seven-state chain. States 1 and 2 form a closed set with the transition probabilities of Example 12.4, states 6 and 7 form a second such closed set, and states 3, 4, 5 are transient states feeding into the two closed sets.)
                [ 5/13                 8/13                 0  0  0  0               0              ]
                [ 5/13                 8/13                 0  0  0  0               0              ]
                [ (1/2)(5/13)          (1/2)(8/13)          0  0  0  (1/2)(5/13)     (1/2)(8/13)    ]
lim_{n→∞} P^n = [ ((.8)(.5)+.2)(5/13)  ((.8)(.5)+.2)(8/13)  0  0  0  (.8)(.5)(5/13)  (.8)(.5)(8/13) ]
                [ 5/13                 8/13                 0  0  0  0               0              ]
                [ 0                    0                    0  0  0  5/13            8/13           ]
                [ 0                    0                    0  0  0  5/13            8/13           ]
When computing rows 1 and 2, you can ignore every state except 1 and 2, because no arrows leave that set. The same goes for rows 6 and 7.
13 Lecture 15: Mar 8
13.1 Recurrence
Let X be a DTMC on state space S with transition matrix P. For each state i ∈ S, let τi denote the first time n ≥ 1 such that Xn = i.
Example 13.1.
(Diagram: state 1 has a self-loop with probability .5 and moves to state 2 with probability .5; state 2 returns to state 1 with probability 1.)
Given that X0 = 1, is it possible τ1 = 1, meaning that the chain returns to state 1 at time
1? Yes.
P(τ1 = 1|X0 = 1) = 1/2,
P(τ1 = 2|X0 = 1) = P(X1 ≠ 1, X2 = 1|X0 = 1) = (0.5)(1) = 0.5,
P(τ1 = 3|X0 = 1) = 0,
E(τ1|X0 = 1) = 1(1/2) + 2(0.5) = 3/2 < ∞.
Note that the second probability is not 0.25 + 0.5, because X1 must not equal 1 for τ1 = 2. Since the expectation above is finite, this chain is positive recurrent.
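Positive recurrence here can be checked by simulation: the average return time to state 1 should be close to 3/2, and the long-run fraction of time in state 1 should be close to 1/E(τ1) = 2/3. A minimal Python sketch (the seed and run lengths are arbitrary):

```python
import random

random.seed(1)

# Example 13.1: from state 1, stay with prob 1/2 or go to state 2;
# state 2 returns to state 1 with prob 1. Estimate E(tau_1 | X0 = 1)
# and the long-run fraction of time spent in state 1.
def step(state):
    if state == 1:
        return 1 if random.random() < 0.5 else 2
    return 1

n_trials = 100_000
total = 0
for _ in range(n_trials):
    state, t = step(1), 1
    while state != 1:
        state, t = step(state), t + 1
    total += t
mean_tau = total / n_trials
print(mean_tau)   # close to 3/2

# Long-run fraction of time in state 1 is 1 / E(tau_1) = 2/3.
state, visits, steps = 1, 0, 200_000
for _ in range(steps):
    state = step(state)
    visits += (state == 1)
print(visits / steps)   # close to 2/3
```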
Example 13.2.
(Diagram: a three-state chain on states 1, 2, 3, with transition probabilities .5 and 1 on the arrows.)
Is state 1 positive recurrent? Yes. State 2 is transient and how about state 3?
P(τ3 = 1|X0 = 3) =0
P(τ3 = 2|X0 = 3) =0.5
P(τ3 = 3|X0 = 3) =(0.5)2
P(τ3 = 4|X0 = 3) =(0.5)3
and so on. Thus τ3 − 1 is a geometric random variable with p = 1/2, and E(τ3|X0 = 3) = 1 + 1/p = 3 < ∞.
In this case, we know that state 1 is positive recurrent. We also know that 1 ↔ 2 and 1 ↔ 3. Thus, we can conclude that all states in this chain are positive recurrent.
(i) p = 1/3, q = 2/3: State i is transient. Say you start from 100; there is positive probability that you never return to state 100. Hence, every state is transient. Indeed, by the strong law of large numbers, P(Sn/n → −1/3) = 1, because

Sn = Σ_{i=1}^{n} ξi ,    Sn/n = (Σ_{i=1}^{n} ξi)/n → E(ξ1) = (−1)q + (1)p = −2/3 + 1/3 = −1/3.

(ii) p = 2/3 > q: by the same argument, Sn/n → E(ξ1) = 1/3 > 0, so the walk drifts to +∞ and again every state is transient.
Note from the example above that in an irreducible chain, either every state is recurrent or every state is transient.
Theorem 13.2. Assume that X is irreducible in a finite state space. Then, X is recurrent.
Furthermore, X is positive recurrent.
In a finite-state-space DTMC, there is no difference between positive recurrence and recurrence. These theorems are important because they give us a simple tool to check whether a stationary distribution exists. How about the limiting distribution? Depending on the transition matrix, a limiting distribution may or may not exist. When do we have one? When the chain is aperiodic; if it is periodic, we use the average instead. Let us summarize.
(iii) lim_{n→∞} (P^n)_{ij} = πj, independent of i, if the DTMC is aperiodic.
(iv)
1 ↔ 2 ← 3 ↔ {4, 5, 6, 7}
Let us take more time for the rest of this part; it is important.
14 Lecture 16: Mar 10
14.1 Absorption Probability
Let us begin with the DTMC we were dealing with last time.
                        1 [ 2/3  1/3  0  0  0   0    0  ]
                        2 [ 2/3  1/3  0  0  0   0    0  ]
                        3 [  ?    ?   0  0  0   ?    ?  ]
P^100 ≈ lim_{n→∞} P^n = 4 [  ?    ?   0  0  0   ?    ?  ]
                        5 [  ?    ?   0  0  0   ?    ?  ]
                        6 [  0    0   0  0  0  2/3  1/3 ]
                        7 [  0    0   0  0  0  2/3  1/3 ]

(the entries in rows 3, 4, 5 are still to be determined).
What would the limit of P31 be? Before answering, note that {1, 2} and {6, 7} are closed sets, meaning no arrows go out of them. By contrast, {3, 4, 5} are transient states. If the DTMC starts from state 3, it may be “absorbed” into either {1, 2} or {6, 7}.
Let us define a new notation: let f3,{1,2} denote the probability that, starting from state 3, the DTMC ends up in {1, 2}. Thus, f3,{1,2} + f3,{6,7} = 1. Let us compute these numbers.
We have three unknowns and three equations, so we can solve this system of linear equations.
x = .25 + .5y          x = f3,{1,2} = 5/8
y = .5 + .5z      ⇒    y = f4,{1,2} = 3/4
z = .5x + .25y         z = f5,{1,2} = 1/2
We also now know that f3,{6,7} = 1 − 5/8 = 3/8, f4,{6,7} = 1 − 3/4 = 1/4, f5,{6,7} = 1 − 1/2 =
1/2.
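The absorption system above can be solved exactly by substitution. A minimal Python sketch using exact fractions:

```python
from fractions import Fraction as F

# Solve the three absorption equations from above exactly:
#   x = 1/4 + (1/2) y,  y = 1/2 + (1/2) z,  z = (1/2) x + (1/4) y.
# Substituting the first two into the third gives z = 3/8 + z/4.
z = F(3, 8) / (1 - F(1, 4))
y = F(1, 2) + F(1, 2) * z
x = F(1, 4) + F(1, 2) * y
print(x, y, z)   # 5/8, 3/4, 1/2
```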
However, to compute the limit of P31, we need not only f3,{1,2} but also the probability that the DTMC settles in state 1 rather than state 2. Therefore,

P31 = f3,{1,2} π1 = (5/8)(2/3) = 5/12,   P41 = f4,{1,2} π1 = (3/4)(2/3) = 1/2,    P51 = f5,{1,2} π1 = (1/2)(2/3) = 1/3,
P32 = f3,{1,2} π2 = (5/8)(1/3) = 5/24,   P42 = f4,{1,2} π2 = (3/4)(1/3) = 1/4,    P52 = f5,{1,2} π2 = (1/2)(1/3) = 1/6,
P36 = f3,{6,7} π6 = (3/8)(2/3) = 1/4,    P46 = f4,{6,7} π6 = (1/4)(2/3) = 1/6,    P56 = f5,{6,7} π6 = (1/2)(2/3) = 1/3,
P37 = f3,{6,7} π7 = (3/8)(1/3) = 1/8,    P47 = f4,{6,7} π7 = (1/4)(1/3) = 1/12,   P57 = f5,{6,7} π7 = (1/2)(1/3) = 1/6,

where (π1, π2) = (π6, π7) = (2/3, 1/3) is the stationary distribution within each closed set, and the Pij here denote limits of (P^n)_{ij}.
Finally, we have the following complete limiting transition matrix.
                        1 [ 2/3   1/3   0  0  0   0    0   ]
                        2 [ 2/3   1/3   0  0  0   0    0   ]
                        3 [ 5/12  5/24  0  0  0  1/4  1/8  ]
P^100 ≈ lim_{n→∞} P^n = 4 [ 1/2   1/4   0  0  0  1/6  1/12 ]
                        5 [ 1/3   1/6   0  0  0  1/3  1/6  ]
                        6 [  0     0    0  0  0  2/3  1/3  ]
                        7 [  0     0    0  0  0  2/3  1/3  ]
(Diagram: a three-state chain on states 1, 2, 3, with transition probabilities .5, .25 and 1 on its arrows.)
How do we find the stationary distribution? As far as we learned, we can solve the
following system of linear equations.
π = πP,    Σ_i πi = 1.
This is sometimes doable, but it quickly becomes tedious as the number of states grows. We can instead use the “flow balance equations”. The idea is that, for each state, the rate into the state must equal the rate out of the state; when counting flow in and out, we ignore self-loops. Utilizing this idea, we obtain one balance equation per state. Equating each pair of rates in and out gives three equations, and these are equivalent to the equations obtained from π = πP.
Example 14.2 (Reflected Random Walk). Suppose X has the following state diagram.
(Diagram: states 0, 1, 2, 3, … with probability p of moving right, probability q of moving left, and a self-loop at state 0.)
If the probability of the self-loop at state 0 were 1, you would be stuck there once you entered: in the business or gambling analogy, the game would be over. Here, instead, somebody forcefully pushes the DTMC back out to keep the game going. Wall Street in 2008 resembles this model: firms expected to be bailed out. That is an incentive problem, but we will not cover that issue now.
Suppose p = 1/3, q = 2/3. Then, the chain is irreducible. I can boldly say that there
exists a unique stationary distribution. However, solving π = πP gives us infinite number of
equations. Here the flow balance equations come into play. Let us generalize the approach
we used in the previous example.
For any subset of states A ⊂ S,
rate into A = rate out of A.
If A = {0, 1, 2}, we essentially look at the flow between state 2 and 3 because this state
diagram is linked like a simple thin chain. We have the following equations.
π1 = (p/q) π0,
π2 = (p/q) π1 = (p/q)² π0,
π3 = (p/q) π2 = (p/q)³ π0,
π4 = (p/q) π3 = (p/q)⁴ π0.
Every element of the stationary distribution boils down to the value of π0. Recall we always have one more condition: Σ_i πi = 1.

Σ_{i=0}^{∞} πi = π0 (1 + (p/q) + (p/q)² + (p/q)³ + · · ·) = 1,

π0 = 1 / (1 + (p/q) + (p/q)² + (p/q)³ + · · ·) = 1 − p/q = 1/2,

πn = (p/q)^n π0.
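The geometric stationary distribution can be sanity-checked numerically: πn = (1 − p/q)(p/q)^n satisfies both the normalization and the flow balance across every cut. A minimal Python sketch with p = 1/3, q = 2/3:

```python
# Reflected random walk with p = 1/3, q = 2/3: pi_n = (1 - p/q)(p/q)^n.
# Check the normalization and the cut (flow balance) equations.
p, q = 1/3, 2/3
r = p / q                       # = 1/2
pi = [(1 - r) * r**n for n in range(60)]

print(pi[0])                    # 1/2
print(sum(pi))                  # essentially 1 (geometric tail is tiny)

# Flow balance across the cut between n and n+1: pi_n * p == pi_{n+1} * q.
balanced = all(abs(pi[n] * p - pi[n + 1] * q) < 1e-15 for n in range(59))
print(balanced)                 # True
```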
So far, we assumed that p < q. What if p = q? What would π0 be? Then π0 = π1 = π2 = · · · = 0, and π is not a probability distribution. Since no stationary distribution exists, this chain is not positive recurrent. What if p > q? It gets worse: the chain cannot be positive recurrent because there is a nonzero probability of never coming back. In fact, in this case every state is transient.
I call this method the “cut method”. You have probably realized that it is very useful. Remember that positive recurrence means the chain keeps coming back to each state, so the chain has cycles of some kind. You will see this second example often as you study further; it is usually called a “birth-death process”.
15 Lecture 17: Mar 15
15.1 A DTMC model of stock price
Suppose you invest in stocks. Let Xn denote the stock price at the end of period n. This
period can be month, quarter, or year as you want. Investing in stocks is risky. How would
you define the return?
Rn = (Xn − Xn−1) / Xn−1 ,    n ≥ 1.
Rn is the return in period n. We can think that X0 is the initial money you invest in. Say
X0 = 100, X1 = 110.
R1 = (X1 − X0)/X0 = (110 − 100)/100 = 1/10 = 10%.
Suppose you are working in a financial firm. You should have a model for stock prices. No
model is perfect, but each model has its own strength. One way to model the stock prices is
using iid random variables. Assume that {Rn } is a series of iid random variables. Then, Xn
can be represented as follows.
Xn = X0 (1 + R1)(1 + R2)(1 + R3) · · · (1 + Rn) = X0 Π_{i=1}^{n} (1 + Ri).
The product symbol Π is the analogue of Σ for multiplication instead of summation. Assuming the Rn are iid, is X = {Xn : n = 0, 1, · · · } a DTMC? Let us use a more concrete distribution of Rn.
Xn = Xn−1 (1 + Rn ) = f (Xn−1 , Rn )
so you know that Xn is a DTMC because Xn can be expressed as a function of Xn−1 and
iid Rn .
This type of model is called “binomial model”. In this model, only two possibilities for
Rn exist: 0.1 or 0.05. Hence, we can rewrite Xn as follows.
Xn = X0 (1 + .1)Zn (1 + .05)n−Zn
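The binomial model is easy to simulate. A minimal Python sketch; the up-move probability of 1/2 is an assumption for illustration (the text above fixes only the two possible returns, 10% and 5%):

```python
import random

random.seed(2)

# Binomial stock-price model sketch: each period the return is 10% with
# probability 1/2 and 5% otherwise (the 1/2 is an assumed parameter).
def simulate_path(x0, n, p_up=0.5):
    x = x0
    for _ in range(n):
        r = 0.10 if random.random() < p_up else 0.05
        x *= 1 + r
    return x

x0, n = 100.0, 10
path_value = simulate_path(x0, n)
print(path_value)   # somewhere between 100*1.05^10 and 100*1.1^10

# Equivalently, X_n = X0 * 1.1^Z_n * 1.05^(n - Z_n), Z_n = number of up moves.
z = 4  # e.g. 4 up moves out of 10
print(x0 * 1.1**z * 1.05**(n - z))
```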
area. Our school has a master program called Quantitative Computational Finance. It
models stock prices not only in a discrete manner but in a continuous manner. They have
a concept called geometric Brownian motion. Brownian motion has the term like eB(t) . In
binomial model, we also had the term with powers. If you look at the stock price from news-
papers, it looks continuous depending on the resolution. However, it is in fact discrete. There
are two mainstreams in academia and practice regarding stock price modeling: discrete and
continuous. Famous Black-Scholes formula for options pricing is a continuous model. This
is a huge area, and I could easily get into this subject and talk about it forever. Let us keep it at this level.
F(x) = 1 − e^{−λx}, x ≥ 0,

and p.d.f.

f(x) = λe^{−λx} if x ≥ 0, and f(x) = 0 if x < 0.
Be careful that rate λ implies that the mean of the exponential distribution is 1/λ. Often,
inter-arrival times are modeled by a sequence of iid exponential random variables. What do
we know about the exponential distribution?
(ii) c2 (X) = 1
Let me paraphrase this concept. Look at the light bulb on the ceiling. Let X denote the lifetime of the bulb and assume that its lifetime follows the exponential distribution. If the bulb is on now, its remaining lifetime is distributed like that of a new bulb. Here s is the time it has been on until now and t is the additional lifetime. Given that the light has been on for s units of time,
the probability that it will be on for t more units of time is same as that of a new light bulb.
How is it so?
P(X > t + s|X > s) = P(X > t + s)/P(X > s) = e^{−λ(t+s)}/e^{−λs} = e^{−λt} = P(X > t).
In fact, this is the defining property of the exponential distribution: if a non-negative continuous random variable has the memoryless property, it must be exponentially distributed.
Corollary 16.1.
P(X > t + S|X > S) = P(X > t)
for any positive random variable S that is independent of X.
Mathematically, we just replaced s with S. Let me explain the change in plain English.
If we are saying, “if the light is on at noon today, its remaining lifetime distribution is as if it
is new,” we are referring to s. If we are saying, “if the light is on when IBM stock price hits
100 dollars, its remaining lifetime is as if it is new,” we are referring to S which is random.
The important thing to note is that S should be independent of X. Suppose S = X/2, i.e.
S depends on X.
P(X > t + X/2 | X > X/2) = P(X > t + X/2)/P(X > X/2) = P(X > t + X/2) = P(X > 2t)
                         ≠ P(X > t).
The memoryless property does not hold if X and S are not independent.
Let us come back to other facts derived from the exponential distribution. Let X1 and X2 be independent exponential random variables with rates λ1 and λ2 respectively, i.e. X1 ∼ Exp(λ1), X2 ∼ Exp(λ2). Let X = min(X1, X2); then X is the time at which the first of the two bulbs fails.

P(X > t) = P(X1 > t, X2 > t) = P(X1 > t) P(X2 > t)    (∵ X1 ⊥⊥ X2)
         = e^{−λ1 t} e^{−λ2 t} = e^{−(λ1+λ2)t}.

Therefore, we can say that X = min(X1, X2) ∼ Exp(λ1 + λ2).
Example 16.1. Let X1 and X2 follow exponential distribution with means 2 hours and 6
hours respectively. Then, what would be the mean of min(X1 , X2 )?
E(min(X1, X2)) = 1/((1/2) + (1/6)) = 1/(4/6) = 1.5 hours.
How about the expectation of max(X1, X2)? We can use the fact that X1 + X2 = min(X1, X2) + max(X1, X2):

E(max(X1, X2)) = E(X1 + X2) − E(min(X1, X2)) = 8 − 1.5 = 6.5 hours.
We do not have a similarly convenient way to compute the expectation of the maximum of more than two exponential random variables.
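Example 16.1 can be checked by simulation. A minimal Python sketch estimating E[min(X1, X2)] and E[max(X1, X2)]:

```python
import random

random.seed(3)

# Example 16.1: X1 ~ Exp(mean 2 hours), X2 ~ Exp(mean 6 hours).
# E[min] = 1/(1/2 + 1/6) = 1.5 and E[max] = 8 - 1.5 = 6.5.
n = 200_000
mins = maxs = 0.0
for _ in range(n):
    x1 = random.expovariate(1 / 2)   # rate 1/2, mean 2 hours
    x2 = random.expovariate(1 / 6)   # rate 1/6, mean 6 hours
    mins += min(x1, x2)
    maxs += max(x1, x2)
print(mins / n, maxs / n)   # close to 1.5 and 6.5
```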
Corollary 16.2.
P(X1 < X2) = λ1/(λ1 + λ2),    P(X1 > X2) = λ2/(λ1 + λ2),    P(X1 = X2) = 0.
How do we compute this? One way is to sanity-check the answer: as λ1 becomes large, X1 gets shorter, so P(X1 < X2) should approach 1. Another way is to compute a double integral.
17 Lecture 18: Mar 17
17.1 Comparing Two Exponentials
Let X1 and X2 be two independent r.v.’s having exponential distribution with rates λ1 and
λ2 .
Proposition 17.1.

P(X1 < X2) = λ1/(λ1 + λ2),    with joint density f(x1, x2) = λ1 e^{−λ1 x1} · λ2 e^{−λ2 x2}.
How would you remember that the correct formula is λ1/(λ1 + λ2) and not λ2/(λ1 + λ2)? As λ1 increases, the expected lifetime of the first light bulb, X1, becomes shorter, hence the probability that X1 breaks down before X2 gets close to 1. We can also compute the probability using the joint pdf.
P(X1 < X2) = ∬_{x1 < x2} f(x1, x2) dx1 dx2 = ∫_0^∞ ∫_0^{x2} f(x1, x2) dx1 dx2.
When computing double integral, it is helpful to draw the region where integration applies.
The shaded region in the following figure is the integration range in 2 dimensions.
(Figure: the integration region {x1 < x2} shaded in the (x1, x2)-plane.)
Example 17.1. (i) A bank has two tellers, John and Mary. John’s processing times are iid exponential X1 with mean 6 minutes. Mary’s processing times are iid exponential X2 with mean 4 minutes. A car with three customers A, B, C shows up at 12:00 noon when both tellers are free. What is the expected time at which the car leaves the bank?
Using intuition, we can see that it would be between 8 and 12 minutes. Suppose A, B
first start to get service. Once one server completes, C will occupy either A or B’s
position depending on which server finishes first. If C occupies A’s server after A is
completed, B has already been in service while A was getting served. Let Y1 , Y2 denote
the remaining processing time for John and Mary respectively. The expected time
when the car leaves is
P(B before A) = (1/6) / ((1/4) + (1/6)) = 4/(4 + 6) = 4/10,

∴ P(C before A) = (4/10)².
This is because, by memorylessness, we can think of A as just starting service at the moment C starts service.
(ii) N has independent increments, i.e., N (t) − N (s) is independent of N (u) for 0 ≤ u ≤ s
(iii) N (0) = 0
N is said to be a counting process; e.g., N(t) is the number of arrivals in [0, t]. First, whichever time interval you look at, say (s, t], the number of arrivals in that interval follows a Poisson distribution. Second, the number of arrivals during an interval in the past does not affect the number of arrivals in a future interval. Where do we see this in the real world? When there is a very large population base, say 100 thousand or 1 million people, and each person makes an independent decision (e.g., placing a call) with small probability, you will see a Poisson process.
(i) Find the probability that there are exactly 4 arrivals in first 3 minutes.
(ii) What is the probability that exactly two arrivals in [0, 2] and at least 3 arrivals in
[1, 3]?
P(N(3) − N(2) ≥ 1) = 1 − P(N(3) − N(2) < 1) = 1 − P(N(3) − N(2) = 0)
                   = 1 − (2⁰/0!) e^{−2} = 1 − e^{−2}.
(iv) What is the probability that the first arrival will take at least 4 minutes? Let T1 be
the arrival time of the first customer. Is T1 a continuous or discrete random variable?
Continuous.
Can you understand the equality above? In plain English, “the first arrival takes at least 4 minutes” is equivalent to “there is no arrival in the first 4 minutes.” It is a very important duality. What if we change 4 minutes to t minutes?
P(Y = k) = (µ^k / k!) e^{−µ},    k = 0, 1, 2, · · · .

What is the meaning of µ? For a Poisson random variable there is no concept of rate; µ = E[Y].
18 Lecture 19: Mar 29
18.1 Time-nonhomogeneous Poisson Processes
Example 18.1. Let N (t) be the number of arrivals in [0, t]. Assume that N = {N (t), t ≥ 0}
is a Poisson process with rate λ = 2/min.
(i)

P(N(3, 7] = 5) = ((2(7 − 3))⁵ / 5!) e^{−2(7−3)} = (8⁵/5!) e^{−8}.
P(T1 > t) = P(N(0, t] = 0) = ((2t)⁰/0!) e^{−2t} = e^{−2t},    t ≥ 0.

What is this distribution? It is the exponential distribution, meaning that T1 is exponential with mean 0.5 minutes. Therefore,

P(T1 ≤ t) = 1 − e^{−2t}.
(iii) Let T3 be the arrival time of the 3rd customer. Which of the following is correct?
P(T3 > t) = P(N(0, t] ≤ 2) = P(N(0, t] = 0) + P(N(0, t] = 1) + P(N(0, t] = 2)
          = e^{−2t} + (2t/1!) e^{−2t} + ((2t)²/2!) e^{−2t},

or

P(T3 > t) = P(N(0, t] = 2)?
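The first candidate can be checked numerically: T3 is the sum of three iid Exp(2) inter-arrival times, so P(T3 > 1) should match e^{−2}(1 + 2 + 2). A minimal Python sketch:

```python
import math
import random

random.seed(4)

# P(T3 > t) = P(N(0,t] <= 2) for a rate-2 Poisson process: check the formula
# e^{-2t}(1 + 2t + (2t)^2/2) against simulating T3 as a sum of 3 Exp(2)'s.
t = 1.0
exact = math.exp(-2 * t) * (1 + 2 * t + (2 * t) ** 2 / 2)
print(exact)                      # 5 e^{-2}, about 0.677

n = 100_000
hits = sum(1 for _ in range(n)
           if sum(random.expovariate(2) for _ in range(3)) > t)
print(hits / n)                   # close to the exact value
```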
distribution is still defined for non-integer α. However, it is then tricky to compute something like (2.3)!; that is where the gamma function Γ(α) comes into play. You can look up a table to get its value for a given α.

Now think about why we get the gamma distribution for T3. Each inter-arrival time is an exponential random variable, and they are iid. T3 is the sum of three of them; that is why we get the gamma distribution.

By the way, the Erlang distribution, if you have heard of it, is the gamma distribution with integer α.
Why do we need non-homogeneity? Suppose you are observing a hospital’s hourly arrival
rate. Arrival rate will not be the same over time during a day. See the following figure.
(Figure: hourly arrival rate over a 24-hour day; a constant homogeneous rate versus a time-varying non-homogeneous rate.)
Therefore, to model this type of real-world phenomenon, we need a more sophisticated model.
Definition 18.1 (Non-homogeneous Poisson Process). N = {N (t), t ≥ 0} is said to be a
time-inhomogeneous Poisson process with rate function λ = {λ(t), t ≥ 0} if
(i) N has independent increments,
(ii)

P(N(s, t] = k) = ((∫_s^t λ(u) du)^k / k!) · e^{−∫_s^t λ(u) du},

where N(s, t] = N(t) − N(s) is the number of arrivals in (s, t].
Non-homogeneous Poisson process allows us to model a rush-hour phenomenon.
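A non-homogeneous Poisson process can be simulated by thinning (the Lewis–Shedler method): generate candidate arrivals at a constant rate λmax ≥ λ(t), and keep a candidate at time t with probability λ(t)/λmax. A minimal Python sketch; the particular rate function here is an assumption chosen to mimic a daytime rush:

```python
import random

random.seed(5)

# Simulating a non-homogeneous Poisson process by thinning: generate a
# homogeneous process at rate lam_max and keep an arrival at time t with
# probability lam(t)/lam_max.
def lam(t):
    return 8.0 if 8 <= t % 24 < 18 else 2.0   # busy daytime, quiet night

def simulate_nhpp(T, lam_max=8.0):
    arrivals, t = [], 0.0
    while True:
        t += random.expovariate(lam_max)        # candidate inter-arrival
        if t > T:
            return arrivals
        if random.random() < lam(t) / lam_max:  # accept with prob lam(t)/lam_max
            arrivals.append(t)

# Expected count over one day: integral of lam = 10h * 8 + 14h * 2 = 108.
counts = [len(simulate_nhpp(24.0)) for _ in range(2000)]
print(sum(counts) / len(counts))   # close to 108
```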
Example 18.2. Suppose we modeled the arrival rate of a store as follows. One month after
launching, the arrival rate settles down and it jumps up at the end of the second month due
to discount coupon.
(Figure: the arrival rate over the first five months, with a jump at the end of the second month.)

(i)

(ii)
(Figure: a sample path of states over time, with holding times exp(λ1), exp(λ2), exp(λ3) marked on the segments.)
19 Lecture 20: Mar 31
19.1 Thinning Poisson Process
19.1.1 Merging Poisson Process
We covered most of the Poisson-process-related topics. The remaining ones are merging and splitting of Poisson arrival processes. At Georgia Tech, there are two entrances; denote the numbers of arrivals through gates A and B by NA and NB.
The independence assumption may or may not hold. In the Georgia Tech case, more people coming through gate A may mean fewer people coming through gate B. Think about Apple’s products: would sales of the iPad be correlated with sales of the iPhone, or would they be independent? We need to be careful about the independence condition when modeling the real world.
Theorem 19.2. Suppose N is Poisson with rate λ. Then, NA is a Poisson process with rate
pλ and NB is a Poisson process with rate (1 − p)λ.
It may seem silly to assume that each shopper chooses by flipping a coin. From a company’s perspective, however, statistical inference effectively treats people as choosing by coin flips: a company with two products will model people’s choices from a lot of tracking data and draw conclusions based on statistical inference.
Let me give you a sketch of the proof of the theorem above. (Since it is only a sketch, it will not be on the test.)
Proof.
NA(t) ∼ Poisson(λpt),
NB(t) ∼ Poisson(λ(1 − p)t).
Define

NA(t) = Σ_{i=1}^{N(t)} Yi,    where Yi = 1 if the ith toss is a head, and Yi = 0 if the ith toss is a tail.
The Taylor expansion is

e^x = 1 + x + x²/2! + x³/3! + · · · .
Although we obtained what we wanted at first, we still need to prove the independence of NA and NB. To do that, we need to show

P(NA(t) = k, NB(t) = l) = ((λpt)^k e^{−λpt} / k!) · ((λ(1 − p)t)^l e^{−λ(1−p)t} / l!),

which is the product form. We will not go over the computation, but it is very similar.
For Poisson processes, independent increments are another important concept, but we will not go over that in detail. FYI, as another example: if customers do not toss coins, and instead odd-numbered customers go to store A and even-numbered customers go to store B, the split processes will not be Poisson.
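The coin-flip splitting of Theorem 19.2 is easy to simulate: the two split streams should have rates pλ and (1 − p)λ. A minimal Python sketch with λ = 2 and p = 0.3 (the numbers are arbitrary):

```python
import random

random.seed(6)

# Splitting a rate-2 Poisson process by independent coin flips with p = 0.3:
# N_A should be Poisson with rate p*lambda = 0.6 and N_B with rate 1.4.
lam, p, T = 2.0, 0.3, 10_000.0
t, n_a, n_b = 0.0, 0, 0
while True:
    t += random.expovariate(lam)   # next arrival of the merged process
    if t > T:
        break
    if random.random() < p:        # coin flip routes the arrival
        n_a += 1
    else:
        n_b += 1
print(n_a / T, n_b / T)   # close to 0.6 and 1.4
```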
20 Lecture 21: Apr 5
20.1 Review of CTMC
We have two machines and one repairman. Up times and down times follow the following
distributions.
Up times ∼ Exp(1/10)
Down times ∼ Exp(1)
Let X(t) be the number of machines that are up at time t. Since X = {X(t)} is a stochastic process, let us model it as a CTMC. A CTMC needs three ingredients. First, the state space S = {0, 1, 2}. Second, the holding-time rates λ0, λ1, λ2. When computing λi, i ∈ S, think about each case separately.
• When X(t) = 0, the repairman is working on one of two machines both of which are
down at the moment, so λ0 = 1.
• When X(t) = 1, the holding time follows min(up time, down time) = Exp((1/10) + 1),
so λ1 = 11/10.
• When X(t) = 2, the holding time in this case follows min(up time, up time) = Exp(0.1+
0.1), so λ2 = 1/5.
Now we know when the chain will jump, but we don't know to which state it will jump.
The roadmap matrix, the last ingredient of a CTMC, gives these probabilities (rows and
columns indexed by the states 0, 1, 2):
R =
[ 0               1    0         ]
[ (1/10)/(11/10)  0    1/(11/10) ]
[ 0               1    0         ]
We can draw a rate diagram showing these data graphically: 0 → 1 at rate 1, 1 → 2 at
rate 1, 1 → 0 at rate 1/10, and 2 → 1 at rate 1/5. If you are given such a diagram, you
should be able to construct G from it.
If you look at the diagram, maybe you can interpret the problem more intuitively.
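The construction of G from a rate diagram is mechanical, as the following sketch shows (Python here rather than Matlab, purely for illustration): place each arrow's rate off-diagonal and let each diagonal entry be minus its row sum.

```python
# Generator for the two-machine, one-repairman chain; states 0, 1, 2
# count the machines that are up.  Off-diagonal G[i][j] is the rate on
# the arrow i -> j; each diagonal entry is minus the total rate out of
# the state, so every row of G sums to zero.
rates = {(0, 1): 1.0,           # repair completes while both machines are down
         (1, 0): 1 / 10,        # the one working machine fails
         (1, 2): 1.0,           # repair completes
         (2, 1): 2 * (1 / 10)}  # either of the two working machines fails

n = 3
G = [[0.0] * n for _ in range(n)]
for (i, j), r in rates.items():
    G[i][j] = r
for i in range(n):
    G[i][i] = -sum(G[i][j] for j in range(n) if j != i)

lam = [-G[i][i] for i in range(n)]   # holding-time rates: 1, 11/10, 1/5
```

The holding-time rates λ0 = 1, λ1 = 11/10, λ2 = 1/5 drop out as the negated diagonal.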
Example 20.1. We have two operators and three phone lines. Call arrivals follow a Poisson
process with rate λ = 2 calls/minute. Each call's processing time is exponentially distributed
with mean 5 minutes. Let X(t) be the number of calls in the system at time t. In this
example, we assume that calls arriving when no phone line is available are lost. What would
the state space be? S = {0, 1, 2, 3}. The rate diagram is the birth-death chain
0 ⇄ 1 ⇄ 2 ⇄ 3 with arrival rate 2 on every up arrow and service rates 1/5, 2/5, 2/5 on
the down arrows (one busy operator serves at rate 1/5; two busy operators serve at combined
rate 2/5).
How do we compute the roadmap matrix from this diagram? Say you are computing R23
and R21: add up all rates leaving state 2 and divide the rate toward the destination state
by that sum.
R23 = 2/(2 + 2/5),  R21 = (2/5)/(2 + 2/5)
Now, let us add another condition: each customer has a patience time that is exponentially
distributed with mean 10 minutes, and when a customer's waiting time in queue exceeds his
patience time, he abandons the system without service. Then the only change we need is the
rate on the arrow from state 3 to state 2, which becomes 2/5 + 1/10 (the waiting customer
abandons at rate 1/10).
Example 20.2. John and Mary are the two operators. John's processing times are
exponentially distributed with mean 6 minutes; Mary's processing times are exponentially
distributed with mean 4 minutes. Model this system by a CTMC. What would the state
space be? S = {0, 1J, 1M, 2, 3}. Why can't we use S = {0, 1, 2, 3}? Let us leave the
question open for now and first draw the diagram assuming S = {0, 1, 2, 3}: the up arrows
all carry rate 2, the arrows 3 → 2 and 2 → 1 carry rates 1/6 + 1/4 + 1/10 and 1/6 + 1/4
respectively, but the rate on the arrow 1 → 0 is an unresolved "?".
We cannot determine the rate from state 1 to state 0 because we don't know who is processing
the call, John or Mary. So we cannot live with S = {0, 1, 2, 3}. For a Markov chain, it
is a really important concept that we don't have to memorize all the history up to the
present; it's like "Just tell me the state, and I will tell you how it will evolve." Then let's
see if S = {0, 1J, 1M, 2, 3} works.
(Rate diagram on S = {0, 1J, 1M, 2, 3}: arrivals at rate 2 take state 0 to 1J or 1M with
an unspecified split "?"; 1J → 0 at rate 1/6 and 1M → 0 at rate 1/4; arrivals take 1J and
1M to 2 at rate 2; from state 2, John finishing (rate 1/6) leads to 1M and Mary finishing
(rate 1/4) leads to 1J; 2 → 3 at rate 2 and 3 → 2 at rate 1/6 + 1/4 + 1/10.)
Even in this case, we cannot determine who takes a call that arrives when both operators
are free. This means we do not have enough of a specification to model the system completely.
On tests and exams you will see a more complete description; the rule is part of the manager's
policy. You may want John to take the call when both are free, or you may toss a coin
whenever this happens. What would the optimal policy be? It depends on your objective:
John and Mary are probably not paid the same, and you may want to reduce total labor
costs or the average waiting time of customers.
Anyway, suppose now that we direct every call to John when both are free. The only
change to the diagram is that the two "?" arrows are resolved: an arrival in state 0 goes
to 1J at rate 2 (and never directly to 1M), while everything else is as before.
We now have complete information for modeling a CTMC. These are the inputs. What
outputs are we interested in? We want to know the fraction of time the chain stays in
each state in the long run, and how many customers are lost. We may plan to install one
more phone line and want to evaluate the improvement from the new installation. We will
cover these topics next time.
This Thursday will be Test 2. The topics covered on the test are
• DTMC
• exponential distribution
• Poisson process: time homogeneous, time non-homogeneous.
21 Lecture 22: Apr 12
21.1 CTMC Example Revisited
Example 21.1. Consider a small call center with 3 phone lines and two agents, Mary and
John. Call arrivals follow a Poisson process with rate λ = 2 calls per minute. Mary's
processing times are iid exponential with mean 4 minutes; John's processing times are iid
exponential with mean 6 minutes. Customers' patience times are iid exponential with mean
10 minutes. An incoming call to an empty system always goes to John.
The rate diagram is as before: an arrival in state 0 goes to 1J at rate 2; 1J → 0 at rate
1/6 and 1M → 0 at rate 1/4; 1J → 2 and 1M → 2 at rate 2; from state 2, rate 1/6 leads
to 1M and rate 1/4 leads to 1J; 2 → 3 at rate 2 and 3 → 2 at rate 1/6 + 1/4 + 1/10.
It is very tempting to model this problem as a birth-death chain on {0, 1, 2, 3} with
arrival rate 2 on the up arrows and down rates "?", 1/6 + 1/4, and 1/6 + 1/4 + 1/10.
This is wrong, and you won't get any credit for it on a test. The model is not detailed
enough to capture all the situations described in the question: you cannot determine the
rate from state 1 to state 0 because you have not kept track of who is handling the call
when there is only one.
21.2 Simulating a CTMC
To simulate a CTMC we need the sequence of states visited and the holding times. For
example, take S = {1, 2, 3}. Each φi(j) takes one of the vectors (1, 0, 0), (0, 1, 0), (0, 0, 1):
think of throwing a three-sided die to choose one of the three possible vectors for the next
value of φi.
(Figure: a simulated sample path, with the state on the Y axis and time on the X axis;
the labels u1(1), u2(1), u2(2), u3(1) mark the successive holding times ui(j), the jth
sojourn in state i.)
Think of φi(j) as the outcome of the jth toss of the ith coin. If we have the roadmap
matrix
R =
[ 0    1/2  1/2 ]
[ 1/4  0    3/4 ]
[ 1/8  7/8  0   ],
then each row i represents the ith coin. In this case the first coin is fair, but the other
two are biased. Now we can generate a sequence of vectors φi(j).
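Throwing the "three-sided die" of row i amounts to one uniform draw compared against the cumulative row probabilities. A Python sketch (the long-run frequency check at the end is mine, not from the lecture):

```python
import random

# Roadmap (jump) matrix from above; row i is the distribution of the
# next state when the chain leaves state i (indices 0-2 stand for the
# states 1-3).
R = [[0.0, 0.5, 0.5],
     [0.25, 0.0, 0.75],
     [0.125, 0.875, 0.0]]

def next_state(i, rng):
    """Throw the three-sided die of row i: return j with probability R[i][j]."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(R[i]):
        acc += p
        if u < acc:
            return j
    return len(R[i]) - 1          # guard against floating-point round-off

rng = random.Random(1)
hits = sum(next_state(0, rng) == 1 for _ in range(100000))
freq = hits / 100000              # long-run frequency of the jump 1 -> 2,
                                  # should be close to R[0][1] = 1/2
```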
Then, how can we generate iid exponentially distributed random variables ui(j)? A
computer can only drop a needle uniformly within an interval, say [0, 1]. How can we
generate an exponentially distributed random variable from this basic building block?
Say we are trying to generate exponential random variables with rate 4. Define
X = −(1/4) ln(1 − U).
Then,
P(X > t) = P(−(1/4) ln(1 − U) > t)
         = P(ln(1 − U) < −4t)
         = P(1 − U < e^{−4t})
         = P(U > 1 − e^{−4t})
         = 1 − (1 − e^{−4t})
         = e^{−4t}.
We got the exponential distribution we initially wanted using only a uniform random
variable. You will learn how to generate other types of random variables from a uniform
distribution in a simulation course; that will be the main topic there, and simulation is
in fact an integral part of the IE curriculum.
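The inverse-transform trick above is a two-liner in code. A Python sketch with the rate-4 example (the sanity checks on the sample mean and tail are mine):

```python
import math
import random

def exp_sample(rate, rng):
    """Inverse transform: X = -(1/rate) * ln(1 - U), U ~ Uniform(0, 1)."""
    return -math.log(1.0 - rng.random()) / rate

rng = random.Random(2)
samples = [exp_sample(4.0, rng) for _ in range(200000)]
mean = sum(samples) / len(samples)                     # should be near 1/4
tail = sum(x > 0.5 for x in samples) / len(samples)    # P(X > 1/2) = e^{-2}
```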
21.3 Markov Property
The Markov property states that
P(X(t) = j | X(tn−1) = i, X(tn−2) = in−2, · · · , X(t0) = i0) = P(X(t) = j | X(tn−1) = i)
for any t0 < t1 < · · · < tn−1 < t, any i0, i1, · · · , in−2 ∈ S, and any i, j ∈ S. This is the
defining property of a CTMC. As in a DTMC, the Kolmogorov-Chapman equation holds:
Pij(t + s) = Σ_{k∈S} Pik(t) Pkj(s).
Example 21.2. Let X = {X(t), t ≥ 0} be a CTMC on state space S = {1, 2, 3} with rate
diagram 1 → 2 at rate 2, 2 → 1 at rate 5, 2 → 3 at rate 1, and 3 → 2 at rate 4. The
generator is
G =
[ −2   2   0 ]
[  5  −6   1 ]
[  0   4  −4 ].
Suppose you are asked to compute P1,3(10), the probability of going from state 1 to
state 3 in 10 minutes. Using the Kolmogorov-Chapman equation,
P1,3(10) = Σ_k P1,k(5) Pk,3(5) = [P(5)²]1,3.
So you can compute P1,3(10) given P(5). How about being given P(1) or P(1/10)? We
can still compute P(10) by raising those matrices to the 10th or 100th power. Pushing
this to the limit gives the formula
P(t) = e^{tG}.
What are we talking about, exponentiating a matrix? There are two commands in Matlab
relevant to exponentiating a matrix:
>> exp(A)     % elementwise exponential [e^a11 e^a12; e^a21 e^a22]; not what we want
>> expm(A)    % the matrix exponential; this is what we want
expm(A) = Σ_{n=0}^{∞} Aⁿ/n!
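The series definition translates directly into a rough stand-in for expm. The Python sketch below (a truncated sum, adequate for small well-scaled matrices, not how expm is really implemented) applies it to the generator G of Example 21.2; each row of P(t) = e^{tG} must be a probability distribution.

```python
def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(A, terms=60):
    """Truncated series sum_{n=0}^{terms} A^n / n!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]     # A^0 = I
    fact = 1.0
    for k in range(1, terms + 1):
        power = mat_mult(power, A)
        fact *= k
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / fact
    return result

G = [[-2.0, 2.0, 0.0], [5.0, -6.0, 1.0], [0.0, 4.0, -4.0]]
t = 0.5
P = expm_series([[t * g for g in row] for row in G])
# Each row of P(t) is a probability distribution over {1, 2, 3}.
```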
22 Lecture 23: Apr 14
22.1 Generator Matrix
Let X be a CTMC on state space S = {1, 2, 3} with generator
G =
[ −2   1   1 ]
[  2  −5   3 ]
[  2   2  −4 ].
In reality, we can compute expm by hand only in a few special cases. In such a case, you
can first obtain the eigenvalues of G from the characteristic equation det(G − aI) = 0.
If all eigenvalues are distinct, we can exponentiate the matrix rather easily.
GV = V diag(a1, a2, a3) ⇒ G = V diag(a1, a2, a3) V⁻¹
G² = V diag(a1², a2², a3²) V⁻¹
...
Gⁿ = V diag(a1ⁿ, a2ⁿ, a3ⁿ) V⁻¹
Hence,
expm(G) = Σ_{n=0}^{∞} Gⁿ/n! = V diag( Σ_n a1ⁿ/n!, Σ_n a2ⁿ/n!, Σ_n a3ⁿ/n! ) V⁻¹
        = V diag(e^{a1}, e^{a2}, e^{a3}) V⁻¹.
Since you are not allowed to use Matlab on the test, you will be given a matrix such as
expm(G). For the stationary distribution π we need πP(t) = π for all t ≥ 0; differentiating
at t = 0 gives
0 = πP′(0) ⇒ πG = 0.
In our case,
(π1, π2, π3) [ −2   1   1 ]
             [  2  −5   3 ] = 0.
             [  2   2  −4 ]
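Solving πG = 0 together with the normalization Σ πi = 1 is plain linear algebra. A Python sketch with exact fractions (the elimination routine is mine, not from the lecture):

```python
from fractions import Fraction

def stationary(G):
    """Solve pi G = 0 with sum(pi) = 1: replace one balance equation
    by the normalization and do Gaussian elimination, exactly."""
    n = len(G)
    A = [[Fraction(G[j][i]) for j in range(n)] for i in range(n)]  # G transposed
    b = [Fraction(0)] * n
    A[n - 1] = [Fraction(1)] * n    # normalization: sum of pi_j equals 1
    b[n - 1] = Fraction(1)
    for col in range(n):            # forward elimination with pivoting
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    pi = [Fraction(0)] * n          # back substitution
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = s / A[r][r]
    return pi

G = [[-2, 1, 1], [2, -5, 3], [2, 2, -4]]
pi = stationary(G)   # (1/2, 3/14, 2/7)
```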
(ii) Because S is finite, it has at least one stationary distribution.
Example 22.1. Consider a call center with two identical agents and 3 phone lines. The
arrival process is Poisson with rate λ = 2 calls per minute, and processing times are iid
exponentially distributed with mean 4 minutes.
(i) What is the long-run fraction of time that there are no customers in the system? π0
(ii) What is the long-run fraction of time that both agents are busy? π2 + π3
(iii) What is the long-run fraction of time that all three lines are used? π3
X(t) is the number of calls in the system at time t, with S = {0, 1, 2, 3}. We can draw the
rate diagram from this information; in fact, having the rate diagram is equivalent to having
the generator matrix. Solving πG = 0 is really just solving the flow balance equations,
flow in = flow out at each state:
2π0 = (1/4)π1,  2π1 = (1/2)π2,  2π2 = (1/2)π3,  π0 + π1 + π2 + π3 = 1.
Solving this by setting π0 = 1 and normalizing the result, we obtain
π = (1/169)(1, 8, 32, 128) = (1/169, 8/169, 32/169, 128/169).
Also, the number of calls lost per minute is λπ3 = 2(128/169) ≈ 1.5, which seems quite high.
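The cut computation for this birth-death chain is mechanical enough to script. A Python sketch with exact fractions:

```python
from fractions import Fraction

# Cut method for the two-agent, three-line call center: arrival rate 2
# on every up arrow; service rates 1/4, 1/2, 1/2 on the down arrows.
up = [Fraction(2)] * 3
down = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 2)]

w = [Fraction(1)]                      # unnormalized weights, w[0] = 1
for i in range(3):
    w.append(w[-1] * up[i] / down[i])  # cut between states i and i+1
pi = [x / sum(w) for x in w]           # (1, 8, 32, 128) / 169
lost_per_min = 2 * pi[3]               # arrivals blocked while in state 3
```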
23 Lecture 24: Apr 19
23.1 M/M/1 Queue as a CTMC
Suppose we have an M/M/1 queue, meaning that we have Poisson arrival process with rate λ
arrivals/minute and service times are iid exponentially distributed with rate µ. To illustrate
the point, set λ = 1/2, µ = 1. Assume that the buffer size is infinite. Let X(t) be the
number of customers in the system at time t. Then, X = {X(t), t ≥ 0} is a CTMC with
state space S = {0, 1, 2, · · · }. What is the easiest way to model this as a CTMC? Draw the
rate diagram.
The rate diagram is the birth-death chain 0 ⇄ 1 ⇄ 2 ⇄ 3 ⇄ · · · with rate λ on every up
arrow and rate µ on every down arrow.
Is this CTMC irreducible? Yes. Does it have a stationary distribution? It depends on the
relationship between λ and µ. If λ > µ, the queue eventually grows without bound, and
in that case we have no stationary distribution. If λ < µ, we have a unique stationary
distribution. Even if λ = µ, we have no stationary distribution. We will look into this later.
How can we determine the stationary distribution? We could use πG = 0, but let us first
try the cut method: if we cut the states into two groups, then in steady state the flows
across the cut in the two directions must be equal. Therefore,
π0 λ = π1 µ ⇒ π1 = ρπ0
π1 λ = π2 µ ⇒ π2 = ρπ1
π2 λ = π3 µ ⇒ π3 = ρπ2
...
where ρ = λ/µ = 0.5 in this case. Solving this system of equations gives
π1 = ρπ0, π2 = ρ²π0, π3 = ρ³π0, · · · .
The problem is that we don't know what π0 is. Let us determine π0 intuitively first: if the
server utilization is ρ, the server is idle a fraction 1 − ρ of the time in the long run, so
π0 = 1 − ρ, and we get each πi from ρ^i π0. We can also solve for π0 analytically; remember
that the stationary probabilities must sum to 1:
π0 + π1 + π2 + · · · = 1
π0(1 + ρ + ρ² + · · ·) = 1
π0 · 1/(1 − ρ) = 1 ⇒ π0 = 1 − ρ
We expected this. More concretely,
π2 = (1 − 1/2)(1/2)² = 1/8 = 0.125.
What does this π2 mean? It means that 12.5% of the time the system has exactly two
customers. In general, we can conclude that the CTMC has a stationary distribution if
and only if ρ < 1; let us assume ρ < 1 for the further examples. As a manager, you will
want to know more than the long-run fraction of time spent at each queue length; you
may want the long-run average number of customers in the system.
Σ_{n=0}^{∞} n πn = 0π0 + 1π1 + 2π2 + · · ·
= 1ρ(1 − ρ) + 2ρ²(1 − ρ) + 3ρ³(1 − ρ) + · · ·
= ρ(1 − ρ)(1 + 2ρ + 3ρ² + · · ·)
= ρ(1 − ρ)(1 + ρ + ρ² + ρ³ + · · ·)′
= ρ(1 − ρ)(1/(1 − ρ))′
= ρ(1 − ρ) · 1/(1 − ρ)²
= ρ/(1 − ρ)
The next question you may wonder about is the average time in the system. Can we use
Little's Law, L = λW? We know two of these three quantities, so
L = ρ/(1 − ρ) = λW
W = (1/λ) · ρ/(1 − ρ) = (1/λ) · (λ/µ)/(1 − ρ) = (1/µ) · 1/(1 − ρ) = m/(1 − ρ)
where m is the mean processing time. A slightly tweaked question is the average waiting
time in the queue. We can again use Little's Law, as long as we define the boundary of
our system correctly: here it is Lq = λWq. How do we compute Lq and Wq?
Lq = 0π0 + 0π1 + 1π2 + 2π3 + · · ·
Wq = W − m = m/(1 − ρ) − m = mρ/(1 − ρ)
Does this Wq formula look familiar? Recall Kingman's formula:
Wq ≈ m · ρ/(1 − ρ) · (ca² + cs²)/2.
In our case, since both the interarrival and processing times are iid exponential,
ca² = cs² = 1; that is why the ca, cs terms did not appear in our formula.
You should be able to see the connections among the topics covered this semester.
If you look at Wq = mρ/(1 − ρ), you will notice that Wq → ∞ as ρ ↑ 1. If ρ = 1/2,
Wq = m: the time you wait equals the time you are in service. If ρ = 0.95, Wq = 19m:
you wait 19 times longer than your service takes, which is extremely poor service. As a
manager, you should achieve both high utilization and short waiting times; using a pool
of servers, you can achieve both. We will talk about this later.
Now consider the finite-buffer version on S = {0, 1, 2, 3}: the same birth-death diagram
0 ⇄ 1 ⇄ 2 ⇄ 3, truncated at 3, with rate λ on the up arrows and rate µ on the down arrows.
We can still use the detailed balance (cut) equations. Suppose λ = 1/2, µ = 1 as in the
previous example. Then πn = ρⁿπ0 for n = 0, 1, 2, 3 and
π0 = 1/(1 + ρ + ρ² + ρ³) = 1/(1 + 1/2 + (1/2)² + (1/2)³) = 8/15
π1 = 4/15, π2 = 2/15, π3 = 1/15.
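The truncated-geometric computation checks out with exact fractions (a Python sketch):

```python
from fractions import Fraction

# M/M/1/3 queue with lam = 1/2, mu = 1: pi_n is proportional to rho^n,
# truncated at the buffer size 3.
rho = Fraction(1, 2)
weights = [rho ** n for n in range(4)]
pi = [w / sum(weights) for w in weights]    # (8, 4, 2, 1) / 15
L = sum(n * p for n, p in enumerate(pi))    # average number in system
```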
The questions you will be interested in are:
(i) What is the average number of customers in the system?
0(8/15) + 1(4/15) + 2(2/15) + 3(1/15) = (4 + 4 + 3)/15 = 11/15
(iii) What is the average waiting time in queue? Again, we use Little's Law. This is the
formula you will remember 10 years from now: "There was something called Little's
Law in the IE curriculum."
Lq = λWq ⇒ 4/15 = (1/2)Wq ⇒ Wq = 8/15
Is this correct? What is suspicious? We just used λ = 1/2, but in the finite-buffer case
we lose some customers. We should use the effective arrival rate, which excludes blocked
customers.
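A sketch of the corrected computation with the effective arrival rate, in Python with exact fractions (arrivals that find the buffer full never join, so they are excluded from λ):

```python
from fractions import Fraction

lam = Fraction(1, 2)
pi = [Fraction(8, 15), Fraction(4, 15), Fraction(2, 15), Fraction(1, 15)]

Lq = 1 * pi[2] + 2 * pi[3]        # one waiting in state 2, two in state 3
lam_eff = lam * (1 - pi[3])       # blocked arrivals (state 3) excluded
Wq = Lq / lam_eff                 # 4/7 minutes, rather than 8/15
```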
Ask questions now if you do not understand; do not let them accumulate. If you feel fuzzy
about something, it will likely be on the test.
24 Lecture 25: Apr 21
24.1 M/M/∞ Queue
The arrival process is Poisson with rate λ. Service times are iid exponentially distributed
with rate µ. Let X(t) be the number of customers in the system at time t; X = {X(t), t ≥ 0}
is a CTMC on S = {0, 1, 2, · · · }. Google's servers might be modeled this way: in reality
no one has an infinite number of servers, but Google has so many that we may as well
assume infinitely many. A model is just for answering a certain question. The rate
diagram is the birth-death chain 0 ⇄ 1 ⇄ 2 ⇄ 3 ⇄ · · · with rate λ on every up arrow
and rates 1µ, 2µ, 3µ, . . . on the down arrows.
How can we compute the stationary distribution? Use balance equations and the cut
method.
λπ0 = µπ1 ⇒ π1 = (λ/µ)π0
λπ1 = 2µπ2 ⇒ π2 = (λ²/(2µ²))π0
λπ2 = 3µπ3 ⇒ π3 = (λ³/(3!µ³))π0
...
πn = (λⁿ/(n!µⁿ))π0
Since we have the additional condition Σ_{i=0}^{∞} πi = 1,
Σ_{i=0}^{∞} πi = π0(1 + λ/µ + λ²/(2µ²) + λ³/(3!µ³) + · · ·) = 1
π0 = 1/(1 + λ/µ + λ²/(2µ²) + λ³/(3!µ³) + · · ·) = 1/e^{λ/µ} = e^{−λ/µ}
πn = (1/n!)(λ/µ)ⁿ e^{−λ/µ}, n = 0, 1, 2, · · · .
Looking at π = (π0, π1, π2, · · ·), what distribution is this? It is clearly not exponential
because it is discrete: π is a Poisson distribution with mean λ/µ. Compare this with the
M/M/1 queue, whose stationary distribution is geometric.
You may still be fuzzy about the difference between a Poisson process and a Poisson
random variable. A Poisson process tracks a series of incidents over time; for any given
time interval, the number of incidents falling in that interval is a Poisson random variable.
Suppose λ = 10, µ = 1. Then, for example,
π2 = 10²e^{−10}/2! = 50e^{−10},
which is quite small. The long-run average number of customers in the system is
0π0 + 1π1 + 2π2 + · · · = λ/µ.
Capacity provisioning: you may want to achieve a certain level of service capacity.
Suppose you actually have 100 servers. Then you need to compute P(X(∞) > 100). If
P(X(∞) > 100) = 10%, that may not be acceptable, because you would be losing 10% of
customers. How can we actually compute this number?
P(X(∞) > 100) = 1 − P(X(∞) ≤ 100) = 1 − [P(X(∞) = 0) + P(X(∞) = 1) + · · · + P(X(∞) = 100)]
You can also compute how many servers you need to limit the loss probability to a certain
level. Say you want to limit the loss probability to 2%: solve the following equation for c.
P(X(∞) > c) = 0.02
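Solving for c is a one-loop Poisson-tail computation. A Python sketch using the offered load λ/µ = 10 and the 2% target from the discussion (both numbers are assumptions of this example):

```python
import math

def min_servers(offered_load, eps):
    """Smallest c with P(X(infinity) > c) <= eps, where
    X(infinity) ~ Poisson(offered_load) as in the M/M/infinity model."""
    pmf = math.exp(-offered_load)     # P(X = 0)
    cdf = pmf
    c = 0
    while 1.0 - cdf > eps:
        c += 1
        pmf *= offered_load / c       # Poisson pmf recursion
        cdf += pmf
    return c

c = min_servers(10.0, 0.02)           # 17 servers suffice for a 2% target
```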
This is an approximate way to model a many-server situation. We will go further into the
model with a finite number k of servers, the M/M/k queue, whose rate diagram has down
rates µ, 2µ, . . . , (k − 1)µ, kµ, kµ, . . . .
Let ρ = λ/(kµ). Then
π0 = 1 / [1 + λ/µ + (1/2!)(λ/µ)² + (1/3!)(λ/µ)³ + · · · + (1/(k−1)!)(λ/µ)^{k−1} + (1/k!)(λ/µ)^k · 1/(1 − ρ)].
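The normalization above codes up directly. A Python sketch with two sanity checks (for k = 1 it must reduce to the M/M/1 answer π0 = 1 − ρ):

```python
import math

def mmk_pi0(lam, mu, k):
    """pi_0 for the M/M/k queue; requires rho = lam/(k*mu) < 1."""
    a = lam / mu                      # offered load lambda/mu
    rho = lam / (k * mu)
    s = sum(a ** n / math.factorial(n) for n in range(k))
    s += a ** k / math.factorial(k) / (1.0 - rho)
    return 1.0 / s

p1 = mmk_pi0(0.5, 1.0, 1)   # M/M/1 with rho = 1/2: pi_0 = 1/2
p2 = mmk_pi0(1.0, 1.0, 2)   # M/M/2 with rho = 1/2: pi_0 = (1-rho)/(1+rho) = 1/3
```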
As a manager of this call center, you want to keep the waiting time reasonably low.
Remember that the average waiting time in queue for the M/M/1 queue is
E[Wq] = m · ρ/(1 − ρ) = mw.
Let us call w the waiting time factor. This factor is highly nonlinear: do not try to push
utilization up further when the system is already heavily loaded; conversely, a little more
carpooling can dramatically improve traffic conditions. What is the waiting time factor
in the M/M/k case? Let us first compute the average queue size.
If you use many servers in parallel, you can achieve both quality and efficient service. If
the world were so deterministic that arrival times and service times were both deterministic,
you could achieve quality service at high utilization even with a single server: for example,
if a customer arrives every two minutes and service takes slightly less than 2 minutes,
there is no waiting even with only one server. In reality, the world is full of uncertainty,
and in this case a larger system makes you more robust to it.
Sometimes, when you call a call center, you hear an automated message saying that there
are 10 customers ahead of you. This information is very misleading: your actual waiting
time will heavily depend on how many servers the call center has hired.
You can analyze the M/M/k/b queue in a similar way; the only difference is to use
1 + ρ + ρ² + · · · + ρ^b instead of 1 + ρ + ρ² + · · · .
25 Lecture 26: Apr 26
25.1 Open Jackson Network
25.1.1 M/M/1 Queue Review
Before going into Jackson Network, let us review M/M/1 queue first. In an M/M/1 queue,
suppose the arrival rate α = 2 jobs per minute. The mean processing time m = 0.45 minutes.
The traffic intensity is computed as follows.
ρ = αm = 2(0.45) = 0.9
Let X(t) be the number of jobs in system at time t. The stationary distribution is
πn = P(X(∞) = n) = (1 − ρ)ρⁿ, n = 0, 1, 2, 3, · · ·
π2 = (1 − ρ)ρ² = (1 − 0.9)(0.9)² = 0.081 = 8.1%.
Now consider a tandem queue: jobs arrive at station 1 at rate α, are served there (mean
processing time m1), and then move to station 2 (mean m2). Let X(t) = (X1(t), X2(t))
record the numbers of jobs at the two stations. From state (2, 3), for instance, an arrival
(rate α) moves the chain to (3, 3), a service completion at station 1 (rate µ1) moves it to
(1, 4), and a service completion at station 2 (rate µ2) moves it to (2, 2).
This CTMC may have a stationary distribution. How can you compute the following?
P(X(∞) = (2, 3)) = P(X1(∞) = 2, X2(∞) = 3)
= P(X1(∞) = 2)P(X2(∞) = 3)  (for now, just assume independence, since the two queues do not seem to affect each other)
= (1 − ρ1)ρ1² · (1 − ρ2)ρ2³
where ρ1 = αm1, ρ2 = αm2. When determining ρ2, you may be tempted to use station 1's
service rate 1/m1 as its output rate. However, if you think about it, the long-run output
rate of station 1 is α, not 1/m1, because ρ1 < 1. If ρ1 ≥ 1, we don't have to care about
the steady state of the system, since station 1 will explode in the long run.
Let us summarize. Let ρi = αmi for i = 1, 2 be the traffic intensity at station i, and
assume ρ1 < 1, ρ2 < 1. Then the stationary distribution has the product form
π(n1, n2) = (1 − ρ1)ρ1^{n1} (1 − ρ2)ρ2^{n2}.
Since this chain is irreducible, if one distribution satisfies the balance equations, it is the
only one satisfying them, and hence it is the unique stationary distribution. You may
think this looks like cheating, but there are two ways to find the stationary distribution:
solve the equation πG = 0 directly, or guess something first and verify that it is the one
we are looking for. When possible, we take the easy way: if you remember the M/M/1
queue's stationary distribution, you can write down the stationary distribution of this
tandem queue.
The question is how to set ρ1, ρ2. Let us define a new set of variables: denote the
throughput of station i by λi. It is reasonable to assume that λ1 = λ2 for a stable system.
Suppose 50% of station 2's output fails inspection and is fed back to station 1. Then,
because of the feedback,
λ1 = α + 0.5λ2 ⇒ λ1 = α + 0.5λ1 ⇒ λ1 = 2α = 2.
Therefore, ρ1 = λ1m1 = 2(0.45) = 0.9 and ρ2 = λ2m2 = 2(0.4) = 0.8.
This is called the traffic equation. How can we compute the average number of jobs in the
system? Recall that the average number of jobs in an M/M/1 queue is ρ/(1 − ρ). Then
average number of jobs in the system
= average # of jobs in station 1 + average # of jobs in station 2
= ρ1/(1 − ρ1) + ρ2/(1 − ρ2) = 0.9/(1 − 0.9) + 0.8/(1 − 0.8) = 9 + 4 = 13 jobs.
Still, we did not prove that (1 − ρ1)ρ1^{n1}(1 − ρ2)ρ2^{n2} is indeed the stationary
distribution; the only thing we need to prove is that it satisfies the balance equations. At
first glance the two stations seem quite interrelated, but in steady state the two are in
fact independent. This was found by Jackson in the late 1950s and published in Operations
Research. In theory, the independence holds for any number of stations, and each station
can have multiple servers; it is a very general result.
What is the average time in system per job? Use Little's Law, L = λW:
L = λW ⇒ 13 = 1 · W ⇒ W = 13 minutes.
In other units it could be 13 hours, days, or weeks; this is the lead time you can quote to a
potential customer. How can we reduce the lead time? We can certainly hire more workers
or buy more machines. Another thing you can do is lower the arrival rate, which in real
terms means rejecting some orders; that may be painful or even impossible. Instead,
suppose we reduce the failure rate from 50% to 40%. Then
ρ1 = (5/3)(0.45) = 0.75 = 3/4,  ρ2 = (5/3)(0.4) = 2/3
W = ρ1/(1 − ρ1) + ρ2/(1 − ρ2) = (3/4)/(1/4) + (2/3)/(1/3) = 3 + 2 = 5 minutes.
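The before/after comparison fits in a few lines. A Python sketch assuming, as in the lecture, that each station behaves like an M/M/1 queue with the throughput from the traffic equation (and α = 1 job per minute):

```python
def lead_time(alpha, m1, m2, f):
    """Average time in system for the two-station rework loop in which a
    fraction f of station-2 output is fed back to station 1."""
    lam = alpha / (1.0 - f)          # traffic equation: lam = alpha + f * lam
    rho1, rho2 = lam * m1, lam * m2
    assert rho1 < 1 and rho2 < 1, "unstable: some station is overloaded"
    L = rho1 / (1 - rho1) + rho2 / (1 - rho2)   # M/M/1 queue lengths
    return L / alpha                 # Little's Law: W = L / alpha

w50 = lead_time(1.0, 0.45, 0.40, 0.50)   # 13 minutes
w40 = lead_time(1.0, 0.45, 0.40, 0.40)   # 5 minutes
```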
With just a 10-percentage-point decrease in the failure rate, the lead time dropped by
more than half; this is the nonlinearity of ρ/(1 − ρ) at work. What if we had a 55%
failure rate rather than 50%? The system becomes much worse: W = 30 minutes. You
must now be convinced that you have to manage the bottleneck; you cannot keep loading
a bottleneck that is already full.
Before finishing up, let us try a more complicated model with three stations. There are
inspections at the end of stations 2 and 3, with failure rates 30% and 20% respectively:
jobs failing at station 2 are sent back to station 1, and jobs failing at station 3 are sent
back to station 2. The traffic equations are
λ2 = λ1 + 0.2λ3
λ1 = α + 0.3λ2
λ3 = 0.7λ2.
Now we can compute ρ1 = λ1m1, ρ2 = λ2m2, ρ3 = λ3m3. The stationary distribution
also has the same product form:
π(n1, n2, n3) = (1 − ρ1)ρ1^{n1} (1 − ρ2)ρ2^{n2} (1 − ρ3)ρ3^{n3}
Remember that all of this assumes infinite queue sizes.
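The three traffic equations are linear and solve by substitution. A Python sketch with exact fractions (α = 1 is my choice for concreteness; the lecture leaves α unspecified):

```python
from fractions import Fraction

# Traffic equations:
#   lam1 = alpha + 0.3*lam2,  lam2 = lam1 + 0.2*lam3,  lam3 = 0.7*lam2
alpha = Fraction(1)

# Substitute lam3 = 0.7*lam2 into the lam2 equation: lam2 = lam1 + 0.14*lam2.
lam2_per_lam1 = 1 / (1 - Fraction(2, 10) * Fraction(7, 10))    # 50/43
# Substitute into the lam1 equation: lam1 = alpha + 0.3*(50/43)*lam1.
lam1 = alpha / (1 - Fraction(3, 10) * lam2_per_lam1)           # 43/28
lam2 = lam2_per_lam1 * lam1                                    # 25/14
lam3 = Fraction(7, 10) * lam2                                  # 5/4
```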
26 Lecture 27: Apr 28
26.1 Problem Solving Session
26.1.1 Test 2: Question 3
Test 2, Question 3. The first call is in service and the second call is waiting. Part (b)
asks for the probability that the 1st call leaves before the 3rd call arrives. Two exponential
clocks are competing: one measures the service time of the 1st call, the other measures
the remaining time until the 3rd call arrives. Therefore,
P(S1 < A3) = service rate/(service rate + arrival rate) = (1/4)/((1/4) + 2) = 1/9.
Part (c) asks for the probability that even the 2nd call leaves the system before the 3rd
call arrives. Once the 2nd call is in service, by the memoryless property, the probability
is again 1/9, and these two events must both happen:
P(S1 < A3)P(S2 < A3) = (1/9)² = 1/81.
Part (d) asks for the expected time until the 3rd call leaves. It is the sum of the time
until the 2nd call leaves and the remaining time for the 3rd call. There are two scenarios:
either the 3rd call has not yet arrived when the 2nd call leaves, or it has already arrived
and is waiting for service.
E[time until 3rd call leaves] = (4 + 4) + E[A3 + S3 | S1 < A3, S2 < A3] P(S1 < A3, S2 < A3) + E[S3 | {S1 < A3, S2 < A3}^C] (1 − P(S1 < A3, S2 < A3))
= 8 + (9/2)(1/81) + 4(80/81) ≈ 12.01 minutes
Part (e) is somewhat more difficult.
26.1.2 Test 2: Question 2
If you draw the state diagram of this DTMC, you will notice three groups of states within
which all states communicate: {1, 2}, {3, 4}, {5}. The states {3, 4} are transient, meaning
the chain stays there only temporarily: once you leave {3, 4}, there is no way back, so if
you run this chain for a long time, there is no chance it is still in state 3 or 4.
The question asks about P^100, where
P =
[ .2   .8   0    0    0  ]
[ .5   .5   0    0    0  ]
[ 0    .25  0    .75  0  ]
[ 0    0    .5   0    .5 ]
[ 0    0    0    0    1  ].
We can build the approximation of P^100 row group by row group.
Starting from {1, 2}, you stay in {1, 2} forever, so rows 1 and 2 converge to the stationary
distribution of the restricted chain on {1, 2}, which is (5/13, 8/13): both rows are
(5/13, 8/13, 0, 0, 0).
Starting from 5, you stick there with probability 1: row 5 is (0, 0, 0, 0, 1).
Starting from {3, 4}, you have virtually no chance of still being there after 100 steps;
writing f3 and f4 for the probabilities of absorption into {1, 2},
row 3 ≈ ((2/5)(5/13), (2/5)(8/13), 0, 0, 3/5),
row 4 ≈ ((1/5)(5/13), (1/5)(8/13), 0, 0, 4/5).
Altogether,
P^100 ≈
[ 5/13          8/13          0  0  0   ]
[ 5/13          8/13          0  0  0   ]
[ (2/5)(5/13)   (2/5)(8/13)   0  0  3/5 ]
[ (1/5)(5/13)   (1/5)(8/13)   0  0  4/5 ]
[ 0             0             0  0  1   ].
Numbers are based on the following system of linear equations.
Define f3,{1,2} to be the probability that, starting from state 3, the chain is absorbed
into {1, 2}, and similarly f4,{1,2}. First-step analysis (using rows 3 and 4 of P) gives
f3,{1,2} = .25 + (.75)f4,{1,2}
f4,{1,2} = (.5)(0) + (.5)f3,{1,2}
whose solution is f3,{1,2} = 2/5 and f4,{1,2} = 1/5.
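The two first-step equations solve by substitution; a quick check with exact fractions (Python):

```python
from fractions import Fraction

# From P: row 3 goes to state 2 w.p. 1/4 (absorbed into {1,2}) and to
# state 4 w.p. 3/4; row 4 goes to state 3 w.p. 1/2 and to state 5 w.p. 1/2.
#   f3 = 1/4 + (3/4) f4,   f4 = (1/2) f3
f3 = Fraction(1, 4) / (1 - Fraction(3, 4) * Fraction(1, 2))   # 2/5
f4 = Fraction(1, 2) * f3                                      # 1/5
```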