Continuous Time Markov Chains

CONTENTS
1 Memoryless Distributions
1.1 Overview
1.2 The Geometric Distribution
1.3 The Exponential Distribution
1.4 Sums of Exponentials
1.5 Exercises
1.6 Solutions

2 Poisson Processes
2.1 Overview
2.2 Counting Processes
2.3 Stationary Independent Increments
2.4 Uniqueness
2.5 Exercises
2.6 Solutions
5.4 Jump Chains
5.5 Summary
5.6 Exercises
5.7 Solutions
7 UC Markov Semigroups
7.1 Overview
7.2 Notation and Terminology
7.3 UC Markov Semigroups and their Generators
7.4 From Intensity Matrix to Jump Chain
7.5 Beyond Bounded Intensity Matrices
7.6 Exercises
7.7 Solutions
9 Bibliography
9.1 References

Bibliography

Proof Index
CHAPTER ONE

MEMORYLESS DISTRIBUTIONS
1.1 Overview
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.special import factorial, binom
Consider betting on a roulette wheel and suppose that red has come up four times in a row.
Since five reds in a row is an unlikely event, many people instinctively feel that black is more likely on the fifth spin —
“Surely black will come up this time!”
But rational thought tells us such instincts are wrong: the four previous reds make no difference to the outcome of the
next spin.
(Many casinos offer an unlimited supply of free alcoholic beverages in order to discourage this kind of rational analysis.)
A mathematical restatement of this phenomenon is: the geometric distribution is memoryless.
1.2 The Geometric Distribution

Let 𝜃 ∈ (0, 1) and consider the distribution on Z+ that puts probability

$$(1 - \theta)^k \theta \qquad (k = 0, 1, \ldots) \tag{1.1}$$

on each nonnegative integer 𝑘. This is called the geometric distribution with parameter 𝜃.

1.2.1 Memorylessness
An example can be constructed from the discussion of the roulette wheel above.
Suppose that,
• the outcome of each spin is either red or black,
• spins are labeled by 0, 1, 2, …,
• on each spin, black occurs with probability 𝜃 and
• outcomes across spins are independent.
Then (1.1) is the probability that the first occurrence of black is at spin 𝑘.
(The outcome “black” fails 𝑘 times and then succeeds.)
Consistent with our discussion in the introduction, the geometric distribution is memoryless.
In particular, if 𝑋 is the spin at which black first occurs, then, given any nonnegative integer 𝑚, we have

$$\mathbb{P}\{X = m \mid X \geq m\} = \mathbb{P}\{X = 0\} = \theta \tag{1.2}$$
In other words, regardless of how long we have seen only red outcomes, the probability of black on the next spin is the
same as the unconditional probability of getting black on the very first spin.
To establish (1.2), we use basic properties of the geometric distribution to obtain

$$\mathbb{P}\{X = m \mid X \geq m\} = \frac{\mathbb{P}\{X = m\}}{\mathbb{P}\{X \geq m\}} = \frac{(1 - \theta)^m \theta}{(1 - \theta)^m} = \theta$$
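We can also confirm (1.2) numerically. Here is a quick sketch using the imports above (the value of 𝜃 and the sample size are arbitrary):

θ = 0.5     # any θ in (0, 1) works; this value is arbitrary
draws = np.random.geometric(θ, size=100_000) - 1   # failures before first success
for m in (0, 1, 5):
    conditional = draws[draws >= m]
    print(m, np.mean(conditional == m))   # each printed value is ≈ θ

Each conditional frequency is close to 𝜃, regardless of 𝑚, just as (1.2) predicts.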
Later, when we construct continuous time Markov chains, we will need to specify the distribution of the holding times,
which are the time intervals between jumps.
As discussed above (and again below), the holding time distribution must be memoryless, so that the chain satisfies the
Markov property.
While the geometric distribution is memoryless, its discrete support makes it a poor fit for the continuous time case.
Hence we turn to the exponential distribution, which is supported on R+ .
1.3 The Exponential Distribution

A random variable 𝑌 on R+ is called exponential with rate 𝜆, denoted by 𝑌 ∼ Exp(𝜆), if

$$\mathbb{P}\{Y > t\} = e^{-\lambda t} \qquad (t \geq 0)$$
The exponential distribution can be regarded as the “limit” of the geometric distribution.
To illustrate, let us suppose that
• customers enter a shop at discrete times 𝑡0 , 𝑡1 , …
• these times are evenly spaced, so that ℎ = 𝑡𝑖+1 − 𝑡𝑖 for some ℎ > 0 and all 𝑖 ∈ Z+
• at each 𝑡𝑖 , either zero or one customers enter (no more because ℎ is small)
• entry at each 𝑡𝑖 occurs with probability 𝜆ℎ and is independent over 𝑖.
The fact that the entry probability is proportional to ℎ is important in what follows.
You can imagine many customers passing by the shop, each entering independently.
If we halve the time interval, then we also halve the probability that a customer enters.
Let
• 𝑌 be the time of the first arrival at the shop,
• 𝑡 be a given positive number and
• 𝑖(ℎ) be the largest integer such that 𝑡𝑖(ℎ) ≤ 𝑡.
Note that, as ℎ → 0, the grid becomes finer and 𝑡𝑖(ℎ) = 𝑖(ℎ)ℎ → 𝑡.
Writing 𝑖(ℎ) as 𝑖 and using the geometric distribution, the probability that the first arrival occurs after 𝑡𝑖 is (1 − 𝜆ℎ)𝑖 .
Hence
$$\mathbb{P}\{Y > t_i\} = (1 - \lambda h)^i = \left(1 - \frac{\lambda i h}{i}\right)^i$$
Using the fact that $e^x = \lim_{i \to \infty}(1 + x/i)^i$ for all 𝑥 and 𝑖ℎ = 𝑡𝑖 → 𝑡, we obtain, for large 𝑖,

$$\mathbb{P}\{Y > t_i\} \approx e^{-\lambda t_i} \approx e^{-\lambda t}$$
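This limit is easy to check numerically. Here is a sketch (the values of 𝜆 and 𝑡 are arbitrary):

λ, t = 0.5, 2.0
for h in (0.1, 0.01, 0.001):
    i = int(t / h)        # i(h), the largest integer i with ih <= t
    print(h, (1 - λ*h)**i, np.exp(-λ*t))

As ℎ shrinks, the geometric probability (1 − 𝜆ℎ)^𝑖 approaches 𝑒^{−𝜆𝑡}.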
The exponential distribution is the only memoryless distribution supported on R+ , as the next theorem attests.
Theorem 1.1
Let 𝑋 be a random variable supported on all of R+. Then 𝑋 is memoryless, in the sense that

$$\mathbb{P}\{X > s + t \mid X > s\} = \mathbb{P}\{X > t\} \quad \text{for all } s, t \geq 0 \tag{1.3}$$

if and only if 𝑋 ∼ Exp(𝜆) for some 𝜆 > 0.

Proof. To see that (1.3) holds when 𝑋 is exponential with rate 𝜆, fix 𝑠, 𝑡 > 0 and observe that
$$\frac{\mathbb{P}\{X > s + t \text{ and } X > s\}}{\mathbb{P}\{X > s\}} = \frac{\mathbb{P}\{X > s + t\}}{\mathbb{P}\{X > s\}} = \frac{e^{-\lambda s - \lambda t}}{e^{-\lambda s}} = e^{-\lambda t}$$
To see that the converse holds, let 𝑋 be a random variable supported on R+ such that (1.3) holds.
The “exceedance” function 𝑓(𝑠) ∶= P{𝑋 > 𝑠} then has three properties:
1. 𝑓 is decreasing on R+ ,
2. 0 < 𝑓(𝑡) < 1 for all 𝑡 > 0,
3. 𝑓(𝑠 + 𝑡) = 𝑓(𝑠)𝑓(𝑡) for all 𝑠, 𝑡 > 0.
The first property is common to all exceedance functions, the second is due to the fact that 𝑋 is supported on all of R+ ,
and the third is (1.3).
From these three properties we will show that

$$f(t) = f(1)^t \quad \text{for all } t \geq 0 \tag{1.4}$$

This is sufficient to prove the claim because then 𝜆 ∶= − ln 𝑓(1) is a positive real number (by property 2) and, moreover,

$$f(t) = f(1)^t = e^{t \ln f(1)} = e^{-\lambda t} \quad \text{for all } t \geq 0$$
By property 3, $f(1) = f(1/n)^n$ and $f(m/n) = f(1/n)^m$ for any positive integers 𝑚 and 𝑛. It follows that $f(m/n)^n = f(1/n)^{mn} = f(1)^m$ and, raising to the power of 1/𝑛, we get (1.4) when 𝑡 = 𝑚/𝑛.
The discussion so far confirms that (1.4) holds when 𝑡 is rational.
So now take any 𝑡 ≥ 0 and rational sequences (𝑎𝑛 ) and (𝑏𝑛 ) converging to 𝑡 with 𝑎𝑛 ≤ 𝑡 ≤ 𝑏𝑛 for all 𝑛.
By property 1 we have $f(b_n) \leq f(t) \leq f(a_n)$ for all 𝑛, so $f(1)^{b_n} \leq f(t) \leq f(1)^{a_n}$ for all 𝑛. Taking the limit in 𝑛 gives $f(t) = f(1)^t$, which completes the proof of (1.4).
We know from the preceding section that any distribution on R+ other than the exponential distribution fails to be memoryless.
Here’s an example that helps to clarify (although the support of the distribution is a proper subset of R+ ).
A random variable 𝑌 has the Pareto distribution with positive parameters 𝑡0, 𝛼 if

$$\mathbb{P}\{Y > t\} = \begin{cases} 1 & \text{if } t \leq t_0 \\ (t_0 / t)^\alpha & \text{if } t > t_0 \end{cases}$$
1.4 Sums of Exponentials

A random variable 𝑊 on R+ is said to have the Erlang distribution with parameters 𝑛 ∈ ℕ and 𝜆 > 0 if its density has the form

$$f(t) = \frac{\lambda^n t^{n-1}}{(n - 1)!} e^{-\lambda t} \qquad (t \geq 0)$$

As the section title suggests, this is exactly the distribution of 𝑊1 + ⋯ + 𝑊𝑛 when 𝑊1, …, 𝑊𝑛 are independent and each 𝑊𝑖 ∼ Exp(𝜆).
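We can confirm this connection by simulation. The sketch below compares a histogram of sums of 𝑛 independent exponential draws with the Erlang density (the parameter values are arbitrary):

n, λ = 5, 0.5
W = np.random.exponential(scale=1/λ, size=(100_000, n)).sum(axis=1)
t_grid = np.linspace(0, 30, 100)
density = λ**n * t_grid**(n-1) * np.exp(-λ * t_grid) / factorial(n-1)
fig, ax = plt.subplots()
ax.hist(W, bins=60, density=True, alpha=0.5, label='sums of exponentials')
ax.plot(t_grid, density, label='Erlang density')
ax.legend()
plt.show()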
1.5 Exercises
Exercise 1.1
Due to its memoryless property, we can “stop” and “restart” an exponential draw without changing its distribution.
To illustrate this, we can think of fixing 𝜆 > 0, drawing from Exp(𝜆), and stopping and restarting whenever a threshold
𝑠 is crossed.
In particular, consider the random variable 𝑋 defined as follows:
• Draw 𝑌 from Exp(𝜆).
• If 𝑌 ≤ 𝑠, set 𝑋 = 𝑌 .
• If not, draw 𝑍 independently from Exp(𝜆) and set 𝑋 = 𝑠 + 𝑍.
Show that 𝑋 ∼ Exp(𝜆).
Exercise 1.2
Fix 𝜆 = 0.5 and 𝑠 = 1.0.
Simulate 1,000 draws of 𝑋 using the algorithm above.
Plot the fraction of the sample exceeding 𝑡 for each 𝑡 ≥ 0 (on a grid) and compare to 𝑡 ↦ 𝑒−𝜆𝑡 .
Is the fit good? How about if the number of draws is increased?
Are the results in line with those of the previous exercise?
1.6 Solutions
Note: code is currently not supported in sphinx-exercise, so code-cell solutions appear immediately after this solution block.
λ = 0.5
np.random.seed(1234)
t_grid = np.linspace(0, 10, 200)

@njit
def draw_X(s=1.0, n=1_000):
    draws = np.empty(n)
    for i in range(n):
        Y = np.random.exponential(scale=1/λ)
        if Y <= s:
            X = Y
        else:
            # Restart: draw a fresh exponential and add the threshold s
            Z = np.random.exponential(scale=1/λ)
            X = s + Z
        draws[i] = X
    return draws
fig, ax = plt.subplots()
draws = draw_X()
empirical_exceedance = [np.mean(draws > t) for t in t_grid]
ax.plot(t_grid, np.exp(- λ * t_grid), label='exponential exceedance')
ax.plot(t_grid, empirical_exceedance, label='empirical exceedance')
ax.legend()
plt.show()
The fit is already very close, which matches the theory in Exercise 1.1.
The two lines become indistinguishable as 𝑛 is increased further.
fig, ax = plt.subplots()
draws = draw_X(n=10_000)
empirical_exceedance = [np.mean(draws > t) for t in t_grid]
ax.plot(t_grid, np.exp(- λ * t_grid), label='exponential exceedance')
ax.plot(t_grid, empirical_exceedance, label='empirical exceedance')
ax.legend()
plt.show()
CHAPTER TWO

POISSON PROCESSES
2.1 Overview
Counting processes count the number of “arrivals” occurring by a given time (e.g., the number of visitors to a website,
the number of customers arriving at a restaurant, etc.)
Counting processes become Poisson processes when the time intervals between arrivals are IID and exponentially distributed.
Exponential distributions and Poisson processes have deep connections to continuous time Markov chains.
For example, Poisson processes are one of the simplest nontrivial examples of a continuous time Markov chain.
In addition, when continuous time Markov chains jump between states, the time between jumps is necessarily exponen-
tially distributed.
In discussing Poisson processes, we will use the following imports:
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.special import factorial, binom
Let (𝐽𝑘 ) be an increasing sequence of nonnegative random variables satisfying 𝐽𝑘 → ∞ with probability one.
For example, 𝐽𝑘 might be the time the 𝑘-th customer arrives at a shop.
Then

$$N_t := \max\{k \geq 0 \,|\, J_k \leq t\}$$

counts the number of arrivals by time 𝑡. The process (𝑁𝑡) is called a Poisson process with rate 𝜆 when the wait times between arrivals are IID and exponential with rate 𝜆, so that

$$J_k := W_1 + \cdots + W_k \quad \text{with } (W_k) \text{ IID and } W_k \sim \text{Exp}(\lambda)$$
This sets up a proof by induction, which is time consuming but not difficult — the details can be found in §29 of [Howard,
2017].
Another way to show that 𝑁𝑡 is Poisson with rate 𝜆 is to appeal to Lemma 1.1.
We observe that, for each 𝑘 ≥ 0,

$$\mathbb{P}\{N_t \leq k\} = \mathbb{P}\{J_{k+1} > t\} = \sum_{i=0}^{k} e^{-t\lambda} \frac{(t\lambda)^i}{i!}$$

This is the (integer valued) CDF for the Poisson distribution with parameter 𝑡𝜆.
An exercise at the end of the lecture asks you to verify that 𝑁𝑡 is Poisson-(𝑡𝜆) informally via simulation.
The next figure shows one realization of a Poisson process (𝑁𝑡 ), with jumps at each new arrival.
2.3 Stationary Independent Increments

One of the defining features of a Poisson process is that it has stationary and independent increments.
This is due to the memoryless property of exponentials.
It means that
1. the variables {𝑁𝑡𝑖+1 − 𝑁𝑡𝑖 }𝑖∈𝐼 are independent for any strictly increasing finite sequence (𝑡𝑖 )𝑖∈𝐼 and
2. the distribution of 𝑁𝑡+ℎ − 𝑁𝑡 depends on ℎ but not 𝑡.
A detailed proof can be found in Theorem 2.4.3 of [Norris, 1998].
Instead of repeating this, we provide some intuition from a discrete approximation.
In the discussion below, we use the following well known fact: If (𝜃𝑛) is a sequence such that 𝑛𝜃𝑛 converges, then

$$\text{Binomial}(n, \theta_n) \approx \text{Poisson}(n\theta_n) \quad \text{for large } n \tag{2.3}$$

Now fix a small step size ℎ > 0, return to the evenly spaced grid 𝑡𝑖 = 𝑖ℎ, and let 𝑉𝑖 be an independent Bernoulli(ℎ𝜆) indicator of an arrival at time 𝑡𝑖, with 𝑁̂𝑡 counting the arrivals up to time 𝑡. We expect from the discussion above that (𝑁̂𝑡) approximates a Poisson process.
This intuition is correct because, fixing 𝑡, letting 𝑘 ∶= max{𝑖 ∈ Z+ ∶ 𝑡𝑖 ≤ 𝑡} and applying (2.3), we have
$$\hat{N}_t = \sum_{i=1}^{k} V_i \sim \text{Binomial}(k, h\lambda) \approx \text{Poisson}(kh\lambda)$$
Using the fact that 𝑘ℎ = 𝑡𝑘 ≈ 𝑡 as ℎ → 0, we see that 𝑁𝑡̂ is approximately Poisson with rate 𝑡𝜆, just as we expected.
This approximate construction of a Poisson process helps illustrate the property of stationary independent increments.
This illustrates the idea that, for a Poisson process (𝑁𝑡 ), we have
𝑁𝑠+𝑡 − 𝑁𝑠 ∼ Poisson(𝑡𝜆)
In particular, increments are stationary (the distribution depends on 𝑡 but not 𝑠).
The approximation also illustrates independence of increments, since, in the approximation, increments depend on sep-
arate subsets of (𝑉𝑖 ).
2.4 Uniqueness
It turns out that the Poisson process is the only such process: if (𝑀𝑡) starts at zero, takes values in Z+ and has stationary independent increments, then there exists a 𝜆 > 0 such that

$$M_{s+t} - M_s \sim \text{Poisson}(t\lambda)$$

for any 𝑠, 𝑡.
The proof is similar to our earlier proof that the exponential distribution is the only memoryless distribution.
Details can be found in Section 6.2 of [Pardoux, 2008] or Theorem 2.4.3 of [Norris, 1998].
An important consequence of stationary independent increments is the restarting property, which means that, when sim-
ulating, we can freely stop and restart a Poisson process at any time:
Proof. Independence of (𝑀𝑡) and (𝑁𝑟)𝑟≤𝑠 follows from independence of the increments of (𝑁𝑡).
In view of the uniqueness statement above, we can verify that (𝑀𝑡 ) is a Poisson process by showing that (𝑀𝑡 ) starts at
zero, takes values in Z+ and has stationary independent increments.
It is clear that (𝑀𝑡 ) starts at zero and takes values in Z+ .
In addition, if we take any 𝑡 < 𝑡′, then

$$M_{t'} - M_t = N_{s+t'} - N_{s+t} \sim \text{Poisson}((t' - t)\lambda)$$
Hence (𝑀𝑡 ) has stationary increments and, using the relation 𝑀𝑡′ − 𝑀𝑡 = 𝑁𝑠+𝑡′ − 𝑁𝑠+𝑡 again, the increments are
independent as well.
We conclude that (𝑁𝑠+𝑡 − 𝑁𝑠 )𝑡≥0 is indeed a Poisson process independent of (𝑁𝑟 )𝑟≤𝑠 .
2.5 Exercises
Exercise 2.1
Fix 𝜆 > 0 and draw {𝑊𝑖 } as IID exponentials with rate 𝜆.
Set 𝐽𝑛 ∶= 𝑊1 + ⋯ + 𝑊𝑛 with 𝐽0 = 0 and 𝑁𝑡 ∶= ∑𝑛≥0 𝑛 𝟙{𝐽𝑛 ≤ 𝑡 < 𝐽𝑛+1}.
Provide a visual test of the claim that 𝑁𝑡 is Poisson with parameter 𝑡𝜆.
Do this by fixing 𝑡 = 𝑇 , generating many independent draws of 𝑁𝑇 and comparing the empirical distribution of the
sample with a Poisson distribution with rate 𝑇 𝜆.
Try first with 𝜆 = 0.5 and 𝑇 = 10.
Exercise 2.2
In the lecture we used the fact that Binomial(𝑛, 𝜃) ≈ Poisson(𝑛𝜃) when 𝑛 is large and 𝜃 is small.
Investigate this relationship by plotting the distributions side by side.
Experiment with different values of 𝑛 and 𝜃.
2.6 Solutions
Note: code is currently not supported in sphinx-exercise, so code-cell solutions appear immediately after this solution block.
λ = 0.5
T = 10

@njit
def draw_Nt(max_iter=1e5):
    J = 0
    n = 0
    while n < max_iter:
        W = np.random.exponential(scale=1/λ)
        J += W
        if J > T:
            return n
        n += 1

@njit
def draw_Nt_sample(num_draws):
    draws = np.empty(num_draws)
    for i in range(num_draws):
        draws[i] = draw_Nt()
    return draws
sample_size = 10_000
sample = draw_Nt_sample(sample_size)
max_val = sample.max()
vals = np.arange(0, max_val+1)

# Compare the empirical pmf of N_T with the Poisson(Tλ) pmf
# (the comparison lines below are assumed; they are not in the source)
emp_pmf = [np.mean(sample == v) for v in vals]
poisson_pmf = np.exp(-T * λ) * (T * λ)**vals / factorial(vals)

fig, ax = plt.subplots()
ax.bar(vals, emp_pmf, alpha=0.5, label='empirical')
ax.plot(vals, poisson_pmf, 'o-', alpha=0.8, label='Poisson$(T\\lambda)$')
ax.legend(fontsize=12)
plt.show()
# Parameters and pmf helpers for the comparison (these definitions are
# assumed; the corresponding lines are not shown in the source)
n, θ = 100, 0.05

def binomial(k, n, θ):
    return binom(n, k) * θ**k * (1 - θ)**(n - k)

def poisson(k, r):
    return np.exp(-r) * r**k / factorial(k)

fig, ax = plt.subplots()
k_grid = np.arange(n)
binom_vals = [binomial(k, n, θ) for k in k_grid]
poisson_vals = [poisson(k, n * θ) for k in k_grid]
ax.plot(k_grid, binom_vals, 'o-', alpha=0.5, label='binomial')
ax.plot(k_grid, poisson_vals, 'o-', alpha=0.5, label='Poisson')
ax.set_title(f'$n={n}$ and $\\theta = {θ}$')
ax.legend(fontsize=12)
fig.tight_layout()
plt.show()
CHAPTER THREE
3.1 Overview
A continuous time stochastic process is said to have the Markov property if its past and future are independent given the
current state.
(A more formal definition is provided below.)
As we will see, the Markov property imposes a large amount of structure on continuous time processes.
This structure leads to elegant and powerful results on evolution and dynamics.
At the same time, the Markov property is general enough to cover many applied problems, as described in the introduction.
3.1.1 Setting
In this lecture, the state space where dynamics evolve will be a countable set, denoted henceforth by 𝑆, with typical
elements 𝑥, 𝑦.
(Note that “countable” is understood to include finite.)
Regarding notation, in what follows, ∑𝑥∈𝑆 is abbreviated to ∑𝑥 , the supremum sup𝑥∈𝑆 is abbreviated to sup𝑥 and so
on.
A distribution on 𝑆 is a function 𝜙 from 𝑆 to R+ with ∑𝑥 𝜙(𝑥) = 1.
Let 𝒟 denote the set of all distributions on 𝑆.
To economize on terminology, we define a matrix 𝐴 on 𝑆 to be a map from 𝑆 × 𝑆 to R.
When 𝑆 is finite, this reduces to the usual notion of a matrix, and, whenever you see expressions such as 𝐴(𝑥, 𝑦) below,
you can mentally identify them with more familiar matrix notation, such as 𝐴𝑖𝑗 , if you wish.
The product of two matrices 𝐴 and 𝐵 is defined by

$$(AB)(x, y) = \sum_z A(x, z) B(z, y) \tag{3.1}$$

Similarly, the product of a distribution 𝜙 and a matrix 𝐴 is the row vector 𝜙𝐴 defined by

$$(\phi A)(y) = \sum_x \phi(x) A(x, y) \tag{3.3}$$
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
We now introduce the definition of Markov processes, first reviewing the discrete case and then shifting to continuous
time.
The simplest Markov processes are those with a discrete time parameter and finite state space.
Assume for now that 𝑆 has 𝑛 elements and let 𝑃 be a Markov matrix, which means that 𝑃 (𝑥, 𝑦) ≥ 0 and ∑𝑦 𝑃 (𝑥, 𝑦) =
1 for all 𝑥.
In applications, 𝑃 (𝑥, 𝑦) represents the probability of transitioning from 𝑥 to 𝑦 in one step.
A Markov chain (𝑋𝑡)𝑡∈Z+ on 𝑆 with Markov matrix 𝑃 is a sequence of random variables satisfying

$$\mathbb{P}\{X_{t+1} = y \mid X_0, \ldots, X_t\} = P(X_t, y) \quad \text{for all } t \in \mathbb{Z}_+ \text{ and } y \in S \tag{3.2}$$
In general, for given Markov matrix 𝑃 , there can be many Markov chains (𝑋𝑡 ) that satisfy (3.2).
This is due to the more general observation that, for a given distribution 𝜙, we can construct many random variables
having distribution 𝜙.
(The exercises below ask for one example.)
Hence 𝑃 is, in a sense, a more primitive object than (𝑋𝑡 ).
There is another way to see the fundamental importance of 𝑃 , which is by constructing the joint distribution of (𝑋𝑡 )
from 𝑃 .
$$\mathbb{P}\{X_{t_1} = y_1, \ldots, X_{t_m} = y_m\} = \mathbb{P}_\psi\{(x_t) \in S^\infty : x_{t_i} = y_i \text{ for } i = 1, \ldots, m\} \tag{3.4}$$

Here 𝜓 is the distribution of 𝑋0 and, for each 𝑛 ∈ ℕ, the distribution $\mathbb{P}^n_\psi$ over $S^{n+1}$ is defined by

$$\mathbb{P}^n_\psi(x_0, x_1, \ldots, x_n) = \psi(x_0) P(x_0, x_1) \times \cdots \times P(x_{n-1}, x_n) \tag{3.5}$$
For any Markov chain (𝑋𝑡 ) satisfying (3.2) and 𝑋0 ∼ 𝜓, the restriction (𝑋0 , … , 𝑋𝑛 ) has joint distribution P𝑛𝜓 .
This is a solved exercise below.
The last step is to show that the family (P𝑛𝜓 ) defined at each 𝑛 ∈ N extends uniquely to a distribution P𝜓 over the infinite
sequences in 𝑆 ∞ .
That this is true follows from a well known theorem of Kolmogorov.
Hence 𝑃 defines the joint distribution P𝜓 when paired with any initial condition 𝜓.
The definition of a Markov chain (𝑋𝑡 )𝑡∈Z+ on 𝑆 with Markov matrix 𝑃 is exactly as in (3.2).
Given Markov matrix 𝑃 and 𝜙 ∈ 𝒟, we define 𝜙𝑃 by (3.3).
Then, as before, 𝜙𝑃 can be understood as the distribution of 𝑋𝑡+1 when 𝑋𝑡 has distribution 𝜙.
The function 𝜙𝑃 is in 𝒟, since, by (3.3), it is nonnegative and

$$\sum_y (\phi P)(y) = \sum_y \sum_x \phi(x) P(x, y) = \sum_x \phi(x) \sum_y P(x, y) = 1$$
(Swapping the order of infinite sums is justified here by the fact that all elements are nonnegative — a version of Tonelli’s
theorem).
If 𝑃 and 𝑄 are Markov matrices on 𝑆, then, using the definition in (3.1), the product 𝑃𝑄 is also a Markov matrix, since its entries are nonnegative and

$$\sum_y (PQ)(x, y) = \sum_y \sum_z P(x, z) Q(z, y) = \sum_z P(x, z) \sum_y Q(z, y) = 1$$
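For instance, here is a quick numerical check with two arbitrary Markov matrices:

P = np.array(((0.9, 0.1), (0.4, 0.6)))
Q = np.array(((0.5, 0.5), (0.2, 0.8)))
print((P @ Q).sum(axis=1))   # row sums of the product: [1. 1.]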
All of the preceding discussion on the connection between 𝑃 and the joint distribution of (𝑋𝑡 ) when 𝑆 is finite carries
over to the current setting.
A continuous time stochastic process on 𝑆 is a collection (𝑋𝑡 ) of 𝑆-valued random variables 𝑋𝑡 defined on a common
probability space and indexed by 𝑡 ∈ R+ .
Let 𝐼 be the Markov matrix on 𝑆 defined by 𝐼(𝑥, 𝑦) = 𝟙{𝑥 = 𝑦}.
A Markov semigroup is a family (𝑃𝑡 ) of Markov matrices on 𝑆 satisfying
1. 𝑃0 = 𝐼,
2. lim𝑡→0 𝑃𝑡 (𝑥, 𝑦) = 𝐼(𝑥, 𝑦) for all 𝑥, 𝑦 in 𝑆, and
3. the semigroup property 𝑃𝑠+𝑡 = 𝑃𝑠 𝑃𝑡 for all 𝑠, 𝑡 ≥ 0.
The interpretation of 𝑃𝑡 (𝑥, 𝑦) is the probability of moving from state 𝑥 to state 𝑦 in 𝑡 units of time.
As such it is natural that 𝑃0 (𝑥, 𝑦) = 1 if 𝑥 = 𝑦 and zero otherwise, which is condition 1.
Condition 2 is continuity with respect to 𝑡, which might seem restrictive but it is in fact very mild.
For all practical applications, probabilities do not jump — although the chain (𝑋𝑡) itself can of course jump from state to state as time goes by.
The semigroup property in condition 3 is nothing more than a continuous time version of the Chapman-Kolmogorov
equation.
This becomes clearer if we write it more explicitly as

$$P_{s+t}(x, y) = \sum_z P_s(x, z) P_t(z, y) \quad \text{for all } x, y \in S \text{ and } s, t \geq 0 \tag{3.7}$$
A stochastic process (𝑋𝑡) is called a (time homogeneous) continuous time Markov chain on 𝑆 with Markov semigroup (𝑃𝑡) if

$$\mathbb{P}\{X_{s+t} = y \mid (X_r)_{r \leq s}\} = P_t(X_s, y) \quad \text{for all } s, t \geq 0 \text{ and } y \in S \tag{3.8}$$
Next one builds finite dimensional distributions over 𝑟𝑐𝑆, the set of right continuous functions from R+ to 𝑆, using expressions similar to (3.5).
Finally, the Kolmogorov extension theorem is applied, similar to the discrete time case.
Corollary 6.4 of [Le Gall, 2016] provides full details.
Given a Markov semigroup (𝑃𝑡 ) on 𝑆, does there always exist a continuous time Markov chain (𝑋𝑡 ) such that (3.8) holds?
The answer is affirmative.
To illustrate, pick any Markov semigroup (𝑃𝑡 ) on 𝑆 and fix initial condition 𝜓.
Next, create the corresponding joint distribution P𝜓 over 𝑟𝑐𝑆, as described above.
Now, for each 𝑡 ≥ 0, let 𝜋𝑡 be the time 𝑡 projection on 𝑟𝑐𝑆, which maps any right continuous function (𝑥𝜏 ) into its time
𝑡 value 𝑥𝑡 .
Finally, let 𝑋𝑡 be an 𝑆-valued function on 𝑟𝑐𝑆 defined at (𝑥𝜏 ) ∈ 𝑟𝑐𝑆 by 𝜋𝑡 ((𝑥𝜏 )).
In other words, after P𝜓 picks out some time path (𝑥𝜏 ) ∈ 𝑟𝑐𝑆, the Markov chain (𝑋𝑡 ) simply reports this time path.
Hence (𝑋𝑡 ) automatically has the correct distribution.
The chain (𝑋𝑡 ) constructed in this way is called the canonical chain for the semigroup (𝑃𝑡 ) and initial condition 𝜓.
While we have answered the existence question in the affirmative, the canonical construction is quite abstract.
Moreover, there is little information about how we might simulate such a chain.
Fortunately, it turns out that there are more concrete ways to build continuous time Markov chains from the objects that
describe their distributions.
We will learn about these in a later lecture.
The Markov property carries some strong implications that are not immediately obvious.
Let’s take some time to explore them.
Let’s look at how the Markov property can fail, via an intuitive rather than formal discussion.
Let (𝑋𝑡 ) be a continuous time stochastic process with state space 𝑆 = {0, 1}.
The process starts at 0 and updates as follows:
1. Draw 𝑊 independently from a fixed Pareto distribution.
2. Hold (𝑋𝑡 ) in its current state for 𝑊 units of time and then switch to the other state.
3. Go to step 1.
What is the probability that 𝑋𝑠+ℎ = 𝑖 given both the history (𝑋𝑟 )𝑟≤𝑠 and current information 𝑋𝑠 = 𝑖?
If ℎ is small, then this is close to the probability that there are zero switches over the time interval (𝑠, 𝑠 + ℎ].
To calculate this probability, it would be helpful to know how long the state has been at current state 𝑖.
This is because the Pareto distribution is not memoryless.
(With a Pareto distribution, if we know that 𝑋𝑡 has been at 𝑖 for a long time, then a switch in the near future becomes
more likely.)
As a result, the history prior to 𝑋𝑠 is useful for predicting 𝑋𝑠+ℎ , even when we know 𝑋𝑠 .
Thus, the Markov property fails.
From the discussion above, we see that, for continuous time Markov chains, the length of time between jumps must be
memoryless.
Recall that, by Theorem 1.1, the only memoryless distribution supported on R+ is the exponential distribution.
Hence, a continuous time Markov chain waits at states for an exponential amount of time and then jumps.
The way that the new state is chosen must also satisfy the Markov property, which adds another restriction.
In summary, we already understand the following about continuous time Markov chains:
1. Holding times are independent exponential draws.
2. New states are chosen in a “Markovian” way, independent of the past given the current state.
We just need to clarify the details in these steps to have a complete description.
Let’s look at some examples of processes that possess the Markov property.
The Poisson process discussed in our previous lecture is a Markov process on state space Z+ .
To obtain the Markov semigroup, we observe that, for 𝑘 ≥ 𝑗,
$$P_t(j, k) = e^{-\lambda t} \frac{(\lambda t)^{k-j}}{(k - j)!} \tag{3.9}$$
This chain of equalities was obtained with 𝑁𝑠 = 𝑗 for arbitrary 𝑗, so we can replace 𝑗 with 𝑁𝑠 in (3.9) to verify the
Markov property (3.8) for the Poisson process.
Under (3.9), each 𝑃𝑡 is a Markov matrix and (𝑃𝑡 ) is a Markov semigroup.
The proof of the semigroup property is a solved exercise below.2
3.5 An Inventory Model

Consider a firm whose inventory 𝑋𝑡 takes values in {0, 1, …, 𝑏}. While inventory is positive, customers arrive at rate 𝜆 and each purchases min{𝑋𝑡, 𝑈} units, where 𝑈 is an independent draw from the geometric distribution on {1, 2, …} with parameter 𝛼. If 𝑋𝑡 = 0, then no customers arrive and the firm places an order for 𝑏 units.
The order arrives after a delay of 𝐷 units of time, where 𝐷 ∼ Exp(𝜆).
(We use the same 𝜆 here just for convenience, to simplify the exposition.)
3.5.1 Representation
The inventory process jumps to a new value either when a new customer arrives or when new stock arrives.
Between these arrival times it is constant.
Hence, to track 𝑋𝑡 , it is enough to track the jump times and the new values taken at the jumps.
In what follows, we denote the jump times by {𝐽𝑘 } and the values at jumps by {𝑌𝑘 }.
Then we construct the state process via

$$X_t = \sum_{k \geq 0} Y_k \mathbb{1}\{J_k \leq t < J_{k+1}\} \tag{3.10}$$
3.5.2 Simulation
² In the definition of 𝑃𝑡 in (3.9), we use the convention that 0⁰ = 1, which leads to 𝑃0 = 𝐼 and lim_{𝑡→0} 𝑃𝑡(𝑗, 𝑘) = 𝐼(𝑗, 𝑘) for all 𝑗, 𝑘. These facts, along with the semigroup property, imply that (𝑃𝑡) is a valid Markov semigroup.
# NB: the signature and default values below are assumptions
def sim_path(T=10, seed=1234, λ=0.5, α=0.6, b=10):
    """
    Simulate the inventory path (X_t) on [0, T].

    Return the path as a function X(t) constructed from (J_k) and (Y_k).
    """
    J, Y = 0, b
    J_vals, Y_vals = [J], [Y]
    np.random.seed(seed)

    while True:
        W = np.random.exponential(scale=1/λ)  # W ~ Exp(λ)
        J += W
        J_vals.append(J)
        if J >= T:
            break
        # Update Y
        if Y == 0:
            Y = b
        else:
            U = np.random.geometric(α)
            Y = Y - min(Y, U)
        Y_vals.append(Y)

    Y_vals = np.array(Y_vals)
    J_vals = np.array(J_vals)

    def X(t):
        if t == 0.0:
            return Y_vals[0]
        else:
            k = np.searchsorted(J_vals, t)
            return Y_vals[k-1]

    return X
T = 20
X = sim_path(T=T)
grid = np.linspace(0, T, 200)   # plotting grid (assumed; not in the source)

fig, ax = plt.subplots()
ax.step(grid, [X(t) for t in grid], label="$X_t$")
ax.set(xlabel="time", ylabel="inventory")
ax.legend()
plt.show()
In models such as the one described above, the embedded discrete time process (𝑌𝑘 ) is called the “embedded jump chain”.
It is easy to see that (𝑌𝑘) is a discrete time, finite state Markov chain.
Its Markov matrix 𝐾 is given by 𝐾(𝑥, 𝑦) = 𝟙{𝑦 = 𝑏} when 𝑥 = 0 and, when 0 < 𝑥 ≤ 𝑏,
$$K(x, y) = \begin{cases} 0 & \text{if } y \geq x \\ \mathbb{P}\{x - U = y\} = (1 - \alpha)^{x-y-1} \alpha & \text{if } 0 < y < x \\ \mathbb{P}\{U \geq x\} = (1 - \alpha)^{x-1} & \text{if } y = 0 \end{cases} \tag{3.11}$$
The inventory model just described has the Markov property precisely because
1. the jump chain (𝑌𝑘 ) is Markov in discrete time and
2. the holding times are independent exponential draws.
Rather than providing more details on these points here, let us first describe a more general setting where the arguments
will be clearer and more useful.
The examples we have focused on so far are special cases of Markov processes with constant jump intensities.
These processes turn out to be very representative (although the constant jump intensity will later be relaxed).
Let’s now summarize the model and its properties.
3.6.1 Construction
The data for a Markov process on 𝑆 with constant jump rates are
• a parameter 𝜆 > 0 called the jump rate, which governs the jump intensities and
• a Markov matrix 𝐾 on 𝑆, called the jump matrix.
To run the process we also need an initial condition 𝜓 ∈ 𝒟.
The process (𝑋𝑡 ) is constructed by holding at each state for an exponential amount of time, with rate 𝜆, and then updating
to a new state via 𝐾.
In more detail, the construction is as follows:

1. Draw (𝑌𝑘) as a discrete time Markov chain with Markov matrix 𝐾 and 𝑌0 ∼ 𝜓.
2. Independently, draw IID holding times (𝑊𝑘) with each 𝑊𝑘 ∼ Exp(𝜆), and let (𝑁𝑡) be the associated Poisson process.
3. Set 𝑋𝑡 ∶= 𝑌𝑁𝑡 for all 𝑡 ≥ 0.
As before, the discrete time process (𝑌𝑘 ) is called the embedded jump chain.
(Not to be confused with (𝑋𝑡 ), which is often called a “jump process” or “jump chain” due to the fact that it changes
states with jumps.)
The draws (𝑊𝑘 ) are called the wait times or holding times.
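Here is a minimal sketch of this construction in code (the function name, the two state jump matrix and all parameter values are illustrative, not taken from the lecture):

def simulate_X(K, λ, x0, T, seed=1):
    "Hold at each state for an Exp(λ) time, then update via K."
    rng = np.random.default_rng(seed)
    J_vals, Y_vals = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        t += rng.exponential(1/λ)           # holding time W ~ Exp(λ)
        if t >= T:
            break
        x = rng.choice(len(K), p=K[x])      # next state drawn from row x of K
        J_vals.append(t)
        Y_vals.append(x)
    return np.array(J_vals), np.array(Y_vals)

K = np.array(((0.0, 1.0), (1.0, 0.0)))      # an arbitrary jump matrix
J_vals, Y_vals = simulate_X(K, λ=0.5, x0=0, T=20)

The path (𝑋𝑡) is then constant at Y_vals[k] on each interval [J_vals[k], J_vals[k+1]).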
3.6.2 Examples
Let’s show that the jump process (𝑋𝑡 ) constructed above satisfies the Markov property, and obtain the Markov semigroup
at the same time.
We will use two facts:
• the jump chain (𝑌𝑘 ) has the Markov property in discrete time and
• the Poisson process has stationary independent increments.
From these facts it is intuitive that the distribution of 𝑋𝑡+𝑠 given the whole history ℱ𝑠 = {(𝑁𝑟 )𝑟≤𝑠 , (𝑌𝑘 )𝑘≤𝑁𝑠 } depends
only on 𝑋𝑠 .
Indeed, if we know 𝑋𝑠 , then we can simply
• restart the Poisson process from 𝑁𝑠 and then
• starting from 𝑋𝑠 = 𝑌𝑁𝑠 , update the embedded jump chain (𝑌𝑘 ) using 𝐾 each time a new jump occurs.
Let’s write this more mathematically.
Fixing 𝑦 ∈ 𝑆 and 𝑠, 𝑡 ≥ 0, we have

$$\mathbb{P}\{X_{s+t} = y \mid \mathcal{F}_s\} = \sum_{k \geq 0} \mathbb{P}\{Y_{N_s + k} = y, \; N_{s+t} - N_s = k \mid \mathcal{F}_s\}$$
Recalling that 𝑁𝑠+𝑡 − 𝑁𝑠 is Poisson distributed with rate 𝑡𝜆, independent of the history ℱ𝑠 , we can write the display
above as
$$\mathbb{P}\{X_{s+t} = y \mid \mathcal{F}_s\} = \sum_{k \geq 0} \mathbb{P}\{Y_{N_s + k} = y \mid \mathcal{F}_s\} \, \frac{(t\lambda)^k}{k!} e^{-t\lambda}$$
Because the embedded jump chain is Markov with Markov matrix 𝐾, we can simplify further to
$$\mathbb{P}\{X_{s+t} = y \mid \mathcal{F}_s\} = \sum_{k \geq 0} K^k(Y_{N_s}, y) \frac{(t\lambda)^k}{k!} e^{-t\lambda} = \sum_{k \geq 0} K^k(X_s, y) \frac{(t\lambda)^k}{k!} e^{-t\lambda}$$
Since the expression above depends only on 𝑋𝑠 , we have proved that (𝑋𝑡 ) has the Markov property.
The Markov semigroup can be obtained from our final result, conditioning on 𝑋𝑠 = 𝑥 to get
$$P_t(x, y) = \mathbb{P}\{X_{s+t} = y \mid X_s = x\} = e^{-t\lambda} \sum_{k \geq 0} K^k(x, y) \frac{(t\lambda)^k}{k!}$$
If 𝑆 is finite, we can write this in matrix form and use the definition of the matrix exponential to get
$$P_t = e^{-t\lambda} \sum_{k \geq 0} \frac{(t\lambda K)^k}{k!} = e^{-t\lambda} e^{t\lambda K} = e^{t\lambda(K - I)}$$
This is a simple and elegant representation of the Markov semigroup that makes it easy to understand and analyze distri-
bution dynamics.
For example, if 𝑋0 has distribution 𝜓, then 𝑋𝑡 has distribution

$$\psi_t = \psi e^{t\lambda(K - I)} \tag{3.12}$$
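As a sketch of how (3.12) can be evaluated (the jump matrix and parameter values here are arbitrary; scipy.linalg.expm is one way to compute the matrix exponential):

from scipy.linalg import expm

λ = 0.5
K = np.array(((0.0, 1.0, 0.0),
              (0.5, 0.0, 0.5),
              (0.0, 1.0, 0.0)))     # an arbitrary jump matrix
I = np.identity(3)
ψ = np.array((1.0, 0.0, 0.0))       # distribution of X_0
ψ_t = ψ @ expm(2.0 * λ * (K - I))   # distribution of X_t at t = 2
print(ψ_t, ψ_t.sum())               # nonnegative and sums to 1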
(Here 𝑚 indicates simulation number 𝑚, which you might think of as the outcome for firm 𝑚.)
Next, for any given 𝑡, we define 𝜓̂𝑡 ∈ 𝒟 as the histogram of observations at time 𝑡 or, equivalently, the cross-sectional distribution at 𝑡:
$$\hat{\psi}_t(x) := \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\{X_t^m = x\} \qquad (x \in S)$$
When 𝑀 is large, 𝜓𝑡̂ (𝑥) will be close to P{𝑋𝑡 = 𝑥} by the law of large numbers.
In other words, in the limit we recover 𝜓𝑡 .
This Monte Carlo approach is option 1. Option 2 is to insert the parameters into the right hand side of (3.12) and compute 𝜓𝑡 as 𝜓0𝑃𝑡.
The figure below is created using option 2, with 𝛼 = 0.6, 𝜆 = 0.5 and 𝑏 = 10.
For the initial distribution we pick a binomial distribution.
Since we cannot compute the entire uncountable flow 𝑡 ↦ 𝜓𝑡 , we iterate forward 200 steps at time increments ℎ = 0.1.
In the figure, hot colors indicate initial conditions and early dates (so that the distribution “cools” over time).
In the (solved) exercises you will be asked to try to reproduce this figure.
3.8 Exercises
Exercise 3.1
Consider the binary (Bernoulli) distribution where outcomes 0 and 1 each have probability 0.5.
Construct two different random variables with this distribution.
Exercise 3.2
Show by direct calculation that the Poisson matrices (𝑃𝑡 ) defined in (3.9) satisfy the semigroup property (3.7).
Hints
• Recall that 𝑃𝑡 (𝑗, 𝑘) = 0 whenever 𝑗 > 𝑘.
• Consider using the binomial formula.
Exercise 3.3
Consider the distribution over $S^{n+1}$ previously shown in (3.5), which is

$$\mathbb{P}^n_\psi(x_0, x_1, \ldots, x_n) = \psi(x_0) P(x_0, x_1) \times \cdots \times P(x_{n-1}, x_n)$$

Show that, for any Markov chain (𝑋𝑡) satisfying (3.2) and 𝑋0 ∼ 𝜓, the restriction (𝑋0, …, 𝑋𝑛) has joint distribution $\mathbb{P}^n_\psi$.
Exercise 3.4
Try to produce your own version of the figure Probability flows for the inventory model.
The initial condition is ψ_0 = binom.pmf(states, n, 0.25) where n = b + 1.
3.9 Solutions
Note: code is currently not supported in sphinx-exercise, so code-cell solutions appear immediately after this solution block.
For 𝑗 ≤ 𝑘 we have

$$\sum_{i \geq 0} P_s(j, i) P_t(i, k) = e^{-\lambda(s+t)} \sum_{j \leq i \leq k} \frac{(\lambda s)^{i-j}}{(i - j)!} \cdot \frac{(\lambda t)^{k-i}}{(k - i)!}$$

Setting ℓ ∶= 𝑖 − 𝑗, this becomes

$$e^{-\lambda(s+t)} \lambda^{k-j} \sum_{0 \leq \ell \leq k-j} \frac{s^\ell}{\ell!} \cdot \frac{t^{k-j-\ell}}{(k - j - \ell)!} = e^{-\lambda(s+t)} \lambda^{k-j} \sum_{0 \leq \ell \leq k-j} \binom{k-j}{\ell} \frac{s^\ell t^{k-j-\ell}}{(k - j)!}$$

Applying the binomial formula to the last sum gives

$$\sum_{i \geq 0} P_s(j, i) P_t(i, k) = e^{-\lambda(s+t)} \frac{(\lambda(s + t))^{k-j}}{(k - j)!} = P_{s+t}(j, k)$$
From the Markov property and the induction hypothesis, the right hand side is
$$P(x_{n-1}, x_n) \, \mathbb{P}^{n-1}_\psi(x_0, x_1, \ldots, x_{n-1}) = P(x_{n-1}, x_n) \, \psi(x_0) P(x_0, x_1) \times \cdots \times P(x_{n-2}, x_{n-1})$$
from scipy.linalg import expm
from scipy.stats import binom

α = 0.6
λ = 0.5
b = 10
n = b + 1
states = np.arange(n)
I = np.identity(n)

# The jump matrix K from (3.11) (this construction is assumed; the
# corresponding lines are not shown in the source)
K = np.zeros((n, n))
K[0, b] = 1
for x in range(1, n):
    K[x, 0] = (1 - α)**(x - 1)
    for y in range(1, x):
        K[x, y] = α * (1 - α)**(x - y - 1)

def P_t(ψ, t):
    "Push ψ forward t units of time using (3.12)."
    return ψ @ expm(t * λ * (K - I))

ψ = binom.pmf(states, n, 0.25)      # initial condition
steps, step_size, t = 200, 0.1, 0.0
colors = [plt.cm.jet_r(i / steps) for i in range(steps)]

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
for i in range(steps):
    ax.bar(states, ψ, zs=t, zdir='y',
           color=colors[i], alpha=0.8, width=0.4)
    ψ = P_t(ψ, t=step_size)
    t += step_size
ax.set_xlabel('inventory')
ax.set_ylabel('$t$')
plt.show()
CHAPTER FOUR
4.1 Overview
As models become more complex, deriving analytical representations of the Markov semigroup (𝑃𝑡 ) becomes harder.
This is analogous to the way that deterministic continuous time models themselves often lack analytical solutions.
For example, when studying deterministic paths in continuous time, infinitesimal descriptions (ODEs and PDEs) are often
more intuitive and easier to write down than the associated solutions.
(This is one of the shining insights of mathematics, beginning with the work of great scientists such as Isaac Newton.)
We will see in this lecture that the same is true for continuous time Markov chains.
To help us focus on intuition in this lecture, rather than technicalities, the state space is assumed to be finite, with |𝑆| = 𝑛.
Later we will investigate the case where |𝑆| = ∞.
We will use the following imports
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
As we have seen, continuous time Markov chains jump between states, and hence can have the form

$$X_t = \sum_{k \geq 0} Y_k \mathbb{1}\{J_k \leq t < J_{k+1}\}$$
where (𝐽𝑘 ) are jump times and (𝑌𝑘 ) are the states at each jump.
(We are assuming that 𝐽𝑘 → ∞ with probability one, so that 𝑋𝑡 is well defined for all 𝑡 ≥ 0, but this is always true when holding times are exponential and the state space is finite.)
In the previous lecture,
• the sequence (𝑌𝑘 ) was drawn from a Markov matrix 𝐾 and called the embedded jump chain, while
• the holding times 𝑊𝑘 ∶= 𝐽𝑘 − 𝐽𝑘−1 were IID and Exp(𝜆) for some constant jump intensity 𝜆.
In this lecture, we will generalize by allowing the jump intensity to vary with the state.
This difference sounds minor but in fact it will allow us to reach full generality in our description of continuous time
Markov chains, as clarified below.
4.2.1 Motivation
As a motivating example, recall the inventory model, where we assumed that the wait time for the next customer had the same exponential distribution as the wait time for new inventory.
This assumption was made purely for convenience and seems unlikely to hold true.
When we relax it, the jump intensities depend on the state.
The sequence (𝑊𝑘 ) is drawn as an IID sequence and (𝑊𝑘 ) and (𝑌𝑘 ) are drawn independently.
The restriction 𝐾(𝑥, 𝑥) = 0 for all 𝑥 implies that (𝑋𝑡 ) actually jumps at each jump time.
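As a preview, here is a minimal sketch of a jump process simulator with a state-dependent rate function 𝑥 ↦ 𝜆(𝑥) (the names and structure are illustrative assumptions, not the lecture's own code):

def simulate_jump_chain(λ, K, x0, T, seed=1):
    "Hold at x for an Exp(λ(x)) time, then update via row x of K."
    rng = np.random.default_rng(seed)
    J_vals, Y_vals = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        t += rng.exponential(1/λ(x))     # the jump intensity depends on x
        if t >= T:
            break
        x = rng.choice(len(K), p=K[x])   # K(x, x) = 0, so the state changes
        J_vals.append(t)
        Y_vals.append(x)
    return np.array(J_vals), np.array(Y_vals)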
For the jump process (𝑋𝑡 ) with time varying intensities described in the jump chain algorithm, calculating the Markov
semigroup is not a trivial exercise.
The approach we adopt is
1. Use probabilistic reasoning to obtain an integral equation that the semigroup must satisfy.
2. Convert the integral equation into a differential equation that is easier to work with.
3. Solve this differential equation to obtain the Markov semigroup (𝑃𝑡 ).
The differential equation in question has a special name: the Kolmogorov backward equation.
Here (𝑃𝑡 ) is the Markov semigroup of (𝑋𝑡 ), the process constructed via Algorithm 4.1, while 𝐾𝑃𝑡−𝜏 is the matrix product
of 𝐾 and 𝑃𝑡−𝜏 .
Regarding the first term on the right hand side of (4.2), we have

$$\mathbb{P}\{X_t = y, \, J_1 \leq t\} = \mathbb{E}\left[ \mathbb{1}\{J_1 \leq t\} \, P_{t - J_1}(Y_1, y) \right]$$
Evaluating the expectation and using the independence of 𝐽1 and 𝑌1 , this becomes
$$\mathbb{P}\{X_t = y, \, J_1 \leq t\} = \int_0^\infty \mathbb{1}\{\tau \leq t\} \sum_z K(x, z) P_{t-\tau}(z, y) \, \lambda(x) e^{-\tau \lambda(x)} \, d\tau = \lambda(x) \int_0^t \sum_z K(x, z) P_{t-\tau}(z, y) \, e^{-\tau \lambda(x)} \, d\tau$$
We have now confirmed that the semigroup (𝑃𝑡 ) associated with the jump chain process (𝑋𝑡 ) satisfies (4.1).
Equation (4.1) is important but we can simplify it further without losing information by taking the time derivative.
This leads to our main result for the lecture
The derivative on the left hand side of (4.4) is taken element by element, with respect to 𝑡, so that
$$P_t'(x, y) = \frac{d}{dt} P_t(x, y) \qquad ((x, y) \in S \times S)$$
The proof that differentiating (4.1) yields (4.4) is an important exercise (see below).
The unique solution to (4.4) with initial condition 𝑃0 = 𝐼 is

$$P_t = e^{tQ} \tag{4.5}$$

where the right hand side is the matrix exponential, with definition

$$e^{tQ} = \sum_{k \geq 0} \frac{1}{k!}(tQ)^k = I + tQ + \frac{t^2}{2!} Q^2 + \cdots \tag{4.6}$$
Working element by element, it is not difficult to confirm that the derivative of the exponential function 𝑡 ↦ 𝑒𝑡𝑄 is
$$\frac{d}{dt} e^{tQ} = Q e^{tQ} = e^{tQ} Q \tag{4.7}$$
Hence, differentiating (4.5) gives 𝑃𝑡′ = 𝑄𝑒𝑡𝑄 = 𝑄𝑃𝑡 , which convinces us that the exponential solution satisfies (4.4).
Notice that our solution

$$P_t = e^{tQ} \quad \text{where} \quad Q(x, y) := \lambda(x)(K(x, y) - I(x, y)) \tag{4.8}$$

for the semigroup of the jump process (𝑋𝑡) associated with the jump matrix 𝐾 and the jump intensity function 𝜆 ∶ 𝑆 → (0, ∞) is consistent with our earlier result.
In particular, we showed that, for the model with constant jump intensity 𝜆, we have 𝑃𝑡 = 𝑒𝑡𝜆(𝐾−𝐼) .
This is obviously a special case of (4.8).
While we have confirmed that 𝑃𝑡 = 𝑒𝑡𝑄 solves the Kolmogorov backward equation, we still need to check that this
solution is a Markov semigroup.
As a small exercise, you can check that, with 1 representing a column vector of ones, the following is true
$$P_t \mathbb{1} = e^{tQ} \mathbb{1} = I\mathbb{1} + tQ\mathbb{1} + \frac{t^2}{2!} Q^2 \mathbb{1} + \cdots = I\mathbb{1} = \mathbb{1}$$

(The key step is that 𝑄 has zero row sums, so 𝑄𝟙 = 0.)
In other words, each 𝑃𝑡 has unit row sums.
Next we check nonnegativity of all elements of 𝑃𝑡 (which can easily fail for matrix exponentials).
To this end, adopting an argument from [Stroock, 2013], we set 𝑚 ∶= max𝑥 𝜆(𝑥) and 𝑃 ̂ ∶= 𝐼 + 𝑄/𝑚.
It is not difficult to check that 𝑃 ̂ is a Markov matrix and 𝑄 = 𝑚(𝑃 ̂ − 𝐼).
Recalling that, for matrix exponentials, 𝑒𝐴+𝐵 = 𝑒𝐴 𝑒𝐵 whenever 𝐴𝐵 = 𝐵𝐴, we have
$$e^{tQ} = e^{tm(\hat{P} - I)} = e^{-tmI} e^{tm\hat{P}} = e^{-tm}\left(I + tm\hat{P} + \frac{(tm)^2}{2!} \hat{P}^2 + \cdots\right)$$
It is clear from this representation that all entries of 𝑒𝑡𝑄 are nonnegative.
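A quick numerical illustration of this argument, with an arbitrary intensity matrix:

from scipy.linalg import expm

Q = np.array(((-2.0, 2.0), (1.0, -1.0)))
m = 2.0                                  # m = max over x of λ(x)
P_hat = np.identity(2) + Q / m
print(P_hat)          # a Markov matrix: nonnegative, unit row sums
print(expm(1.5 * Q))  # e^{tQ} is likewise nonnegative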
Finally, we need to check the continuity condition 𝑃𝑡 (𝑥, 𝑦) → 𝐼(𝑥, 𝑦) as 𝑡 → 0, which is also part of the definition of
a Markov semigroup. This is immediate, in the present case, because the exponential function is continuous, and hence
𝑃𝑡 = 𝑒𝑡𝑄 → 𝑒0 = 𝐼.
We can now be reassured that our solution to the Kolmogorov backward equation is indeed a Markov semigroup.
4.4.2 Uniqueness
Might there be another, entirely different Markov semigroup that also satisfies the Kolmogorov backward equation?
The answer is no: linear ODEs in finite dimensional vector spaces with constant coefficients and fixed initial conditions (in this case 𝑃0 = 𝐼) have unique solutions.
In fact it’s not hard to supply a proof — see the exercises.
Let us look at a modified version of the inventory model where jump intensities depend on the state.
In particular, the wait time for new inventory will now be exponential at rate 𝛾.
The arrival rate for customers will still be denoted by 𝜆 and allowed to differ from 𝛾.
For parameters we take
α = 0.6
λ = 0.5
γ = 0.1
b = 10
A function along the following lines generates individual draws of 𝑋𝑇 (a sketch: apart from the initial assignments, the body is an assumption based on the model description above):

@njit
def draw_X(T, X_0, max_iter=500_000):
    "Generate one draw of X_T from X_0 via the jump chain algorithm."
    J, Y = 0, X_0
    m = 0
    while m < max_iter:
        s = γ if Y == 0 else λ           # the jump rate depends on the state
        W = np.random.exponential(scale=1/s)
        J += W
        if J >= T:
            return Y
        if Y == 0:
            Y = b                        # new inventory arrives
        else:
            U = np.random.geometric(α)   # customer demand
            Y = Y - min(Y, U)
        m += 1
    return Y
@njit
def independent_draws(T=10, num_draws=100):
    "Generate a vector of independent draws of X_T."
    draws = np.empty(num_draws)
    for i in range(num_draws):
        X_0 = np.random.binomial(b+1, 0.25)
        draws[i] = draw_X(T, X_0)
    return draws
T = 30
n = b + 1
draws = independent_draws(T, num_draws=100_000)

# Plot the empirical distribution of X_T (these plotting lines are assumed)
vals = np.arange(n)
emp_pmf = [np.mean(draws == v) for v in vals]
fig, ax = plt.subplots()
ax.bar(vals, emp_pmf, alpha=0.6, label='empirical distribution of $X_T$')
ax.legend()
plt.show()
If you experiment with the code above, you will see that the large amount of mass on zero is due to the low arrival rate 𝛾
for inventory.
4.6 Exercises
Exercise 4.1
In the discussion above, we generated an approximation of 𝜓𝑇 when 𝑇 = 30, the initial condition is Binomial(𝑛, 0.25) and parameters are set to

α = 0.6
λ = 0.5
γ = 0.1
b = 10

Compute 𝜓𝑇 directly, using the matrix exponential representation of the semigroup, and compare it with the simulated approximation.
Exercise 4.2
Prove that differentiating (4.1) at each (𝑥, 𝑦) yields (4.4).
Exercise 4.3
We claimed above that the solution 𝑃𝑡 = 𝑒𝑡𝑄 is the unique Markov semigroup satisfying the backward equation 𝑃𝑡′ =
𝑄𝑃𝑡 .
Try to supply a proof.
(This is not an easy exercise but worth thinking about in any case.)
4.7 Solutions
Note: code is currently not supported in sphinx-exercise, so code-cell solutions appear immediately after this solution block.
from scipy.linalg import expm
from scipy.stats import binom

α = 0.6
λ = 0.5
γ = 0.1
b = 10
n = b + 1
states = np.arange(n)
I = np.identity(n)

# Jump matrix K and rate vector r (assumed; these definitions are not
# shown in the source). Here r[x] = γ when x = 0 and r[x] = λ otherwise.
K = np.zeros((n, n))
K[0, b] = 1
for x in range(1, n):
    K[x, 0] = (1 - α)**(x - 1)
    for y in range(1, x):
        K[x, y] = α * (1 - α)**(x - y - 1)
r = np.full(n, λ)
r[0] = γ

# Q matrix
Q = np.empty_like(K)
for i in range(n):
    for j in range(n):
        Q[i, j] = r[i] * (K[i, j] - I[i, j])

# ψ_T = ψ_0 exp(T Q) with T = 30 from above (plotting lines are assumed)
ψ_0 = binom.pmf(states, n, 0.25)
ψ_T = ψ_0 @ expm(T * Q)
fig, ax = plt.subplots()
ax.bar(states, ψ_T, alpha=0.6, label='$\\psi_T$')
ax.legend()
plt.show()
Note also that, with the change of variable 𝑠 = 𝑡 − 𝜏 , we can rewrite (4.1) as
$$P_t(x, y) = e^{-t\lambda(x)} \left\{ I(x, y) + \lambda(x) \int_0^t (K P_s)(x, y) \, e^{s\lambda(x)} \, ds \right\} \tag{4.11}$$
For uniqueness, suppose that (𝑃̂𝑡) also satisfies the backward equation with 𝑃̂0 = 𝐼, fix 𝑡 > 0 and set 𝑉𝑠 ∶= 𝑃𝑠𝑃̂𝑡−𝑠 for 0 ≤ 𝑠 ≤ 𝑡. Then

$$V_s' = P_s' \hat{P}_{t-s} - P_s \hat{P}_{t-s}' = P_s Q \hat{P}_{t-s} - P_s Q \hat{P}_{t-s} = 0$$

Hence 𝑉𝑠 is constant in 𝑠, so 𝑃𝑡 = 𝑉𝑡 = 𝑉0 = 𝑃̂𝑡.
CHAPTER FIVE
5.1 Overview
In this lecture we approach continuous time Markov chains from a more analytical perspective.
The emphasis will be on describing distribution flows through vector-valued differential equations and their solutions.
These distribution flows show how the time 𝑡 distribution associated with a given Markov chain (𝑋𝑡 ) changes over time.
Distribution flows will be identified with initial value problems generated by autonomous linear ordinary differential
equations (ODEs) in vector space.
We will see that the solutions of these flows are described by Markov semigroups.
This leads us back to the theory we have already constructed – some care will be taken to clarify all the connections.
In order to avoid being distracted by technicalities, we continue to defer our treatment of infinite state spaces, assuming
throughout this lecture that |𝑆| = 𝑛.
As before, 𝒟 is the set of all distributions on 𝑆.
We will use the following imports
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.linalg import expm
Previously we generated this figure, which shows how distributions evolve over time for the inventory model under a
certain parameterization:
(Hot colors indicate early dates and cool colors denote later dates.)
We also learned how this flow is related to the Kolmogorov backward equation, which is an ODE.
In this section we examine distribution flows and their connection to ODEs and continuous time Markov chains more
systematically.
There’s a sense in which a discrete time Markov chain “is” a homogeneous linear difference equation in distribution space.
To clarify this, suppose we take 𝐺 to be a linear map from 𝒟 to itself and write down the difference equation

$$\psi_{t+1} = \psi_t G \quad \text{with } \psi_0 \in \mathcal{D} \text{ given} \tag{5.1}$$
Because 𝐺 is a linear map from a finite dimensional space to itself, it can be represented by a matrix.
Moreover, a matrix 𝑃 is a Markov matrix if and only if 𝜓 ↦ 𝜓𝑃 sends 𝒟 into itself (check it if you haven’t already).
So, under the stated conditions, our difference equation (5.1) uniquely identifies a Markov matrix, along with an initial
condition 𝜓0 .
Together, these objects identify the joint distribution of a discrete time Markov chain, as previously described.
We have just argued that a discrete time Markov chain can be identified with a linear difference equation evolving in 𝒟.
This strongly suggests that a continuous time Markov chain can be identified with a linear ODE evolving in 𝒟.
This intuition is correct and important.
The rest of the lecture maps out the main ideas.
Now consider the continuous time initial value problem

$$\psi_t' = \psi_t Q, \qquad \psi_0 \in \mathcal{D} \text{ given} \tag{5.2}$$

where

• 𝑄 is an 𝑛 × 𝑛 matrix,
• distributions are again understood as row vectors, and
• derivatives are taken element by element, so that

$$\psi_t' = \left( \tfrac{d}{dt}\psi_t(x_1) \;\; \cdots \;\; \tfrac{d}{dt}\psi_t(x_n) \right)$$
Using the matrix exponential, the unique solution to the initial value problem (5.2) is

$$\psi_t = \psi_0 e^{tQ} \qquad (t \geq 0)$$

Setting 𝑃𝑡 ∶= 𝑒^{𝑡𝑄} and differentiating yields the Kolmogorov forward equation

$$P_t' = P_t Q$$
Q = ((-3, 2, 1),
     (3, -5, 2),
     (4, 6, -10))
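Presumably this 𝑄 was then used to trace out a distribution flow. Here is a minimal sketch of such a computation (the initial condition and time grid are arbitrary):

from scipy.linalg import expm

Q = np.array(Q)                  # convert the tuple of tuples above
ψ_0 = np.array((0.8, 0.1, 0.1))  # an arbitrary initial distribution
for t in (0.0, 0.2, 1.0, 5.0):
    ψ_t = ψ_0 @ expm(t * Q)
    print(t, ψ_t, ψ_t.sum())     # ψ_t remains in 𝒟 at every t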
As the above discussion shows, we can take the Kolmogorov forward equation 𝑃𝑡′ = 𝑃𝑡 𝑄 and premultiply by any distri-
bution 𝜓0 to get the distribution ODE 𝜓𝑡′ = 𝜓𝑡 𝑄.
In this sense, we can understand the Kolmogorov forward equation as pushing distributions forward in time.
Analogously, we can take the Kolmogorov backward equation 𝑃𝑡′ = 𝑄𝑃𝑡 and postmultiply by any vector ℎ to get

$$(P_t h)' = Q (P_t h)$$
Recalling that (𝑃𝑡 ℎ)(𝑥) = E [ℎ(𝑋𝑡 ) | 𝑋0 = 𝑥], this vector ODE tells us how expectations evolve, conditioning backward
to time zero.
Both the forward and the backward equations uniquely pin down the same solution 𝑃𝑡 = 𝑒𝑡𝑄 when combined with the
initial condition 𝑃0 = 𝐼.
The ODE 𝜓𝑡′ = 𝜓𝑡 𝑄 is sometimes called the Fokker–Planck equation (although this terminology is most commonly
used in the context of diffusions).
It is a vector-valued ODE that describes the evolution of a particular distribution path.
By comparison, the Kolmogorov forward equation is (like the backward equation) a differential equation in matrices.
(And matrices are really maps, which send vectors into vectors.)
Operating at this level is less intuitive and more abstract than working with the Fokker–Planck equation.
But, in the end, the object that we want to describe is a Markov semigroup.
The Kolmogorov forward and backward equations are the ODEs that define this fundamental object.
In the simulation above, 𝑄 was chosen with some care, so that the flow remains in 𝒟.
What are the exact properties we require on 𝑄 such that 𝜓𝑡 is always in 𝒟?
This is an important question, because we are setting up an exact correspondence between linear ODEs that evolve in 𝒟
and continuous time Markov chains.
Recall that the linear update rule 𝜓 ↦ 𝜓𝑃 is invariant on 𝒟 if and only if 𝑃 is a Markov matrix.
So now we can rephrase our key question regarding invariance on 𝒟:
What properties do we need to impose on 𝑄 so that 𝑃𝑡 = 𝑒𝑡𝑄 is a Markov matrix for all 𝑡?
A square matrix 𝑄 is called an intensity matrix if 𝑄 has zero row sums and 𝑄(𝑥, 𝑦) ≥ 0 whenever 𝑥 ≠ 𝑦.
Theorem 5.1
If 𝑄 is a matrix on 𝑆 and 𝑃𝑡 ∶= 𝑒𝑡𝑄 , then the following statements are equivalent:
1. 𝑃𝑡 is a Markov matrix for all 𝑡.
2. 𝑄 is an intensity matrix.
The proof is related to that of Lemma 4.2 and is found as a solved exercise below.
Corollary 5.1
If 𝑄 is an intensity matrix on finite 𝑆 and 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡 ≥ 0, then (𝑃𝑡 ) is a Markov semigroup.
Let’s return to the chain (𝑋𝑡 ) created from jump chain pair (𝜆, 𝐾) in Algorithm 4.1.
We found that the semigroup is given by

$$P_t = e^{tQ} \quad \text{where} \quad Q(x, y) := \lambda(x)(K(x, y) - I(x, y))$$
Using the fact that 𝐾 is a Markov matrix and the jump rate function 𝜆 is nonnegative, you can easily check that 𝑄 satisfies
the definition of an intensity matrix.
Hence (𝑃𝑡 ), the Markov semigroup for the jump chain (𝑋𝑡 ), is the semigroup generated by the intensity matrix 𝑄(𝑥, 𝑦) =
𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)).
We can differentiate 𝑃𝑡 = 𝑒𝑡𝑄 to obtain the Kolmogorov forward equation 𝑃𝑡′ = 𝑃𝑡 𝑄.
We can then premultiply by 𝜓0 ∈ 𝒟 to get 𝜓𝑡′ = 𝜓𝑡 𝑄, which is the Fokker–Planck equation.
More explicitly, for given 𝑦 ∈ 𝑆,

$$\psi_t'(y) = \sum_{x \neq y} \psi_t(x) Q(x, y) - \psi_t(y) \sum_{x \neq y} Q(y, x)$$
The rate of probability flow into 𝑦 is equal to the inflow from other states minus the outflow.
5.5 Summary
We have seen that any intensity matrix 𝑄 on 𝑆 defines a Markov semigroup via 𝑃𝑡 = 𝑒𝑡𝑄 .
Henceforth, we will say that (𝑋𝑡 ) is a Markov chain with intensity matrix 𝑄 if (𝑋𝑡 ) is a Markov chain with Markov
semigroup (𝑒𝑡𝑄 ).
While our discussion has been in the context of a finite state space, later we will see that these ideas carry over to an
infinite state setting under mild restrictions.
We have also hinted at the fact that every continuous time Markov chain is a Markov chain with intensity matrix 𝑄 for
some suitably chosen 𝑄.
Later we will prove this to be universally true when 𝑆 is finite and true under mild conditions when 𝑆 is countably infinite.
Intensity matrices are important because
1. they are the natural infinitesimal descriptions of Markov semigroups,
2. they are often easy to write down in applications and
3. they provide an intuitive description of dynamics.
Later, we will see that, for a given intensity matrix 𝑄, the elements are understood as follows:
• when 𝑥 ≠ 𝑦, the value 𝑄(𝑥, 𝑦) is the “rate of leaving 𝑥 for 𝑦” and
• −𝑄(𝑥, 𝑥) ≥ 0 is the “rate of leaving 𝑥”.
5.6 Exercises
Exercise 5.1
Let (𝑃𝑡 ) be a Markov semigroup such that 𝑡 ↦ 𝑃𝑡 (𝑥, 𝑦) is differentiable at all 𝑡 ≥ 0 and (𝑥, 𝑦) ∈ 𝑆 × 𝑆.
(The derivative at 𝑡 = 0 is the usual right derivative.)
Define (pointwise, at each (𝑥, 𝑦)),
$$Q := P_0' = \lim_{h \downarrow 0} \frac{P_h - I}{h} \tag{5.4}$$
Assuming that this limit exists, and hence 𝑄 is well-defined, show that

$$P_t' = P_t Q \quad \text{and} \quad P_t' = Q P_t$$

both hold. (These are the Kolmogorov forward and backward equations.)
Exercise 5.2
Recall our model of jump chains with state-dependent jump intensities given by rate function 𝑥 ↦ 𝜆(𝑥).
After a wait time with exponential rate 𝜆(𝑥) ∈ (0, ∞), the state transitions from 𝑥 to 𝑦 with probability 𝐾(𝑥, 𝑦).
We found that the associated semigroup (𝑃𝑡) satisfies the Kolmogorov backward equation 𝑃𝑡′ = 𝑄𝑃𝑡 with

$$Q(x, y) = \lambda(x)(K(x, y) - I(x, y))$$

Show that 𝑄 is an intensity matrix.
Exercise 5.3
Prove Theorem 5.1 by adapting the arguments in Lemma 4.2. (This is nontrivial but worth at least trying.)
Hint: The constant 𝑚 in the proof can be set to max𝑥 |𝑄(𝑥, 𝑥)|.
5.7 Solutions
Regarding Exercise 5.1, fix 𝑡 ≥ 0 and ℎ > 0. By the semigroup property,

$$\frac{P_{t+h} - P_t}{h} = \frac{P_t P_h - P_t}{h} = \frac{P_t (P_h - I)}{h}$$
Taking ℎ ↓ 0 and using the definition of 𝑄 give 𝑃𝑡′ = 𝑃𝑡 𝑄, which is the Kolmogorov forward equation.
Similarly,

$$\frac{P_{t+h} - P_t}{h} = \frac{P_h P_t - P_t}{h} = \frac{(P_h - I) P_t}{h}$$
also holds. Taking ℎ ↓ 0 gives the Kolmogorov backward equation.
Turning to Exercise 5.2, the row sums of 𝑄 satisfy

$$Q\mathbb{1} = \lambda(K\mathbb{1} - \mathbb{1}) = \lambda(\mathbb{1} - \mathbb{1}) = 0$$

(with 𝜆(𝑥) acting pointwise), and the off diagonal entries 𝑄(𝑥, 𝑦) = 𝜆(𝑥)𝐾(𝑥, 𝑦) are nonnegative, so 𝑄 is an intensity matrix.
For Exercise 5.3, suppose first that 𝑃𝑡 ∶= 𝑒^{𝑡𝑄} is a Markov matrix for all 𝑡, so that 𝑃𝑡𝟙 = 𝟙. Then

$$Q\mathbb{1} = Q P_t \mathbb{1} = \left(\frac{d}{dt} P_t\right)\mathbb{1} = \frac{d}{dt}(P_t \mathbb{1}) = \frac{d}{dt}\mathbb{1} = 0$$
Hence 𝑄 has zero row sums.
We can use the definition of the matrix exponential to obtain, for any 𝑥, 𝑦 and small 𝑡 ≥ 0,

$$P_t(x, y) = I(x, y) + t Q(x, y) + o(t)$$
From this equality and the assumption that 𝑃𝑡 is a Markov matrix for all 𝑡, we see that the off diagonal elements of 𝑄
must be nonnegative.
Hence 𝑄 is an intensity matrix.
CHAPTER SIX
6.1 Overview
We have seen in previous lectures that every intensity matrix generates a Markov semigroup.
We have also hinted that the pairing is one-to-one, in a sense to be made precise.
To clarify these ideas, we start in an abstract setting, with an arbitrary initial value problem.
In this setting we introduce general operator semigroups and their generators.
Once this is done, we will be able to return to the Markov case and fully clarify the connection between intensity matrices
and Markov semigroups.
The material below is relatively technical, with most of the complications driven by the fact that the state space can be
infinite.
Such technicalities are hard to avoid, since so many interesting Markov chains do have infinite state spaces.
• Our very first example – the Poisson process – has an infinite state space.
• Another example is the study of queues, which often have no natural upper bound.1
Readers are assumed to have some basic familiarity with Banach spaces.
6.2 Motivation
The general theory of continuous semigroups of operators is motivated by the problem of solving linear ODEs in infinite
dimensional spaces.2
More specifically, the challenge is to solve initial value problems such as

$$x_t' = A x_t, \qquad x_0 \text{ given} \tag{6.1}$$
where
• 𝑥𝑡 takes value in a Banach space at each time 𝑡,
• 𝐴 is a linear operator and
• the time derivative 𝑥′𝑡 uses a definition appropriate for a Banach space.
1 In fact a major concern with queues is that their length does not explode. This issue cannot be properly explored unless the state space is allowed
to be infinite.
2 An excellent introduction to operator semigroups, combined with applications to PDEs and Markov processes, can be found in [Applebaum, 2019].
6.3 Preliminaries
You will recall that a linear operator on 𝔹 is a map 𝐴 from 𝔹 to itself satisfying

$$A(\alpha g + \beta h) = \alpha A g + \beta A h \quad \text{for all scalars } \alpha, \beta \text{ and } g, h \in \mathbb{B}$$

and that 𝐴 is called bounded if

$$\|A\| := \sup_{\|g\| \leq 1} \|A g\| < \infty \tag{6.2}$$

This is the usual definition of a bounded linear operator on a normed linear space.
The set of all bounded linear operators on 𝔹 is denoted by ℒ(𝔹) and is itself a Banach space.
Sums and scalar products of elements of ℒ(𝔹) are defined in the usual way, so that, for 𝛼 ∈ R, 𝐴, 𝐵 ∈ ℒ(𝔹) and 𝑔 ∈ 𝔹, we have

$$(A + B)g = Ag + Bg, \qquad (\alpha A)g = \alpha (Ag)$$

and so on.
We write 𝐴𝐵 to indicate composition of the operators 𝐴, 𝐵 ∈ ℒ(𝔹).
The value defined in (6.2) is called the operator norm of 𝐴 and, as suggested by the notation, is a norm on ℒ(𝔹).
In addition to being a norm, it enjoys the submultiplicative property ‖𝐴𝐵‖ ≤ ‖𝐴‖‖𝐵‖ for all 𝐴, 𝐵 ∈ ℒ(𝔹).
Let 𝐼 be the identity in ℒ(𝔹), satisfying 𝐼𝑔 = 𝑔 for all 𝑔 ∈ 𝔹.
(In fact ℒ(𝔹) is a unital Banach algebra when multiplication is identified with operator composition and 𝐼 is adopted as
the unit.)
For 𝐴 ∈ ℒ(𝔹), the exponential of 𝐴 is defined by

$$e^A = \sum_{k \geq 0} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots \tag{6.3}$$
This is the same as the definition for the matrix exponential. The exponential function arises naturally as the solution
to ODEs in Banach space, one example of which (as we shall see) is distribution flows associated with continuous time
Markov chains.
The exponential map has the following properties:
• For each 𝐴 ∈ ℒ(𝔹), the operator 𝑒𝐴 is a well defined element of ℒ(𝔹) with ‖𝑒𝐴 ‖ ≤ 𝑒‖𝐴‖ .3
• 𝑒0 = 𝐼, where 0 is the zero element of ℒ(𝔹).
• If 𝐴, 𝐵 ∈ ℒ(𝔹) and 𝐴𝐵 = 𝐵𝐴, then 𝑒𝐴+𝐵 = 𝑒𝐴 𝑒𝐵
• If 𝐴 ∈ ℒ(𝔹), then 𝑒𝐴 is invertible and (𝑒𝐴 )−1 = 𝑒−𝐴 .
The last fact is easily checked from the previous ones.
Consider a function
R+ ∋ 𝑡 ↦ 𝑈𝑡 ∈ ℒ(𝔹)
which we can think of as a time path in ℒ(𝔹), such as a flow of Markov operators.
We say that this function is differentiable at 𝜏 ∈ R+ if there exists an element 𝑇 of ℒ(𝔹) such that
$$\frac{U_{\tau+h} - U_\tau}{h} \to T \quad \text{as } h \to 0 \tag{6.4}$$
In this case, 𝑇 is called the derivative of the function 𝑡 ↦ 𝑈𝑡 at 𝜏 and we write
$$T = U_\tau' \quad \text{or} \quad T = \frac{d}{dt} U_t \,\Big|_{t=\tau}$$
(Convergence of operators is in operator norm. If 𝜏 = 0, then the limit ℎ → 0 in (6.4) is the right limit.)
Example
If 𝑈𝑡 = 𝑡𝑉 for some fixed 𝑉 ∈ ℒ(𝔹), then it is easy to see that 𝑉 is the derivative of 𝑡 ↦ 𝑈𝑡 at every 𝑡 ∈ R+ .
Example
In our discussion of the Kolmogorov forward equation when 𝑆 is finite, we introduced the derivative of a map 𝑡 ↦ 𝑃𝑡 ,
where each 𝑃𝑡 is a matrix on 𝑆.
The derivative was defined by differentiating 𝑃𝑡 element-by-element.
This coincides with the operator-theoretic definition in (6.4) when 𝑆 is finite, because then the space ℒ(ℓ1 ), which consists
of all bounded linear operators on ℓ1 , is finite dimensional, and hence pointwise and norm convergence coincide.
3 Convergence of the sum in (6.3) follows from boundedness of 𝐴 and the fact that ℒ(𝔹) is a Banach space.
Analogous to the matrix and scalar cases, we have the following result:
$$\frac{d}{dt} e^{tA} = e^{tA} A = A e^{tA} \tag{6.5}$$
For continuous time Markov chains where the state space 𝑆 is finite, we saw that Markov semigroups often take the form
𝑃𝑡 = 𝑒𝑡𝑄 for some intensity matrix 𝑄.
This is ideal because the entire semigroup is characterized in a simple way by its infinitesimal description 𝑄.
It turns out that, when 𝑆 is finite, this is always true: if (𝑃𝑡 ) is a Markov semigroup, then there exists an intensity matrix
𝑄 satisfying 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡.
Moreover, this statement is again true when 𝑆 is infinite, provided that some restrictions are placed on the semigroup.
Our aim is to make these statements precise, starting in an abstract setting and then specializing.
The claim that (𝑈𝑡 ) is an evolution semigroup follows directly from the properties of the exponential function given above.
Uniform continuity can be established using arguments similar to those in the proof of differentiability in Lemma 6.1.
Since norm convergence on ℒ(𝔹) implies pointwise convergence, every uniformly continuous semigroup is a 𝐶0 semi-
group.
The reverse is certainly not true — there are many important 𝐶0 semigroups that fail to be uniformly continuous.
⁴ Be careful: the definition of a UC semigroup requires that 𝑡 ↦ 𝑈𝑡 is continuous as a map into ℒ(𝔹), rather than uniformly continuous. The UC terminology comes about because, for a UC semigroup, we have, by definition of the operator norm, sup_{‖𝑔‖≤1} ‖𝑈𝑠𝑔 − 𝑈𝑡𝑔‖ → 0 when 𝑠 → 𝑡.
In fact semigroups associated with PDEs, diffusions and other Markov processes on continuous state spaces are typically
𝐶0 but not uniformly continuous.
There are also important examples of Markov semigroups on infinite discrete state spaces that fail to be uniformly con-
tinuous.
However, we will soon see that, for most continuous time Markov chains used in applications, the semigroups are uniformly
continuous.
6.4.2 Generators
Consider a continuous time Markov chain on a finite state space with intensity matrix 𝑄.
The Markov semigroup (𝑃𝑡 ) is fully specified by this infinitesimal description 𝑄, in the sense that
• 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡 ≥ 0 and (equivalently)
• the forward and backward equations hold: 𝑃𝑡′ = 𝑃𝑡 𝑄 = 𝑄𝑃𝑡 .
Since 𝑃0 = 𝐼, the matrix 𝑄 can be recovered from the semigroup via
$$Q = P_0' = \lim_{h \downarrow 0} \frac{P_h - I}{h}$$
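This recovery is easy to check numerically for a small intensity matrix (a sketch; 𝑄 and the values of ℎ are arbitrary):

import numpy as np
from scipy.linalg import expm

Q = np.array(((-2.0, 2.0), (1.0, -1.0)))
for h in (0.1, 0.01, 0.001):
    print(h)
    print((expm(h * Q) - np.identity(2)) / h)   # converges to Q as h ↓ 0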
In the more abstract setting of 𝐶0 semigroups, we say that 𝑄 is the “generator” of the semigroup (𝑃𝑡 ).
More generally, given a 𝐶0 semigroup (𝑈𝑡 ), we say that a linear operator 𝐴 from 𝔹 to itself is the generator of (𝑈𝑡 ) if
$$A g = \lim_{h \downarrow 0} \frac{U_h g - g}{h} \quad \text{for all } g \in \mathbb{B} \tag{6.6}$$
The last three claims in Theorem 6.1 follow directly from the first claim.
The statement 𝑈𝑡′ = 𝐴𝑈𝑡 = 𝑈𝑡 𝐴 is a generalization of the Kolmogorov forward and backward equations.
While slightly more complicated in the Banach setting, the proof of the first claim (existence of an exponential represen-
tation) is a direct extension of the fact that any continuous function 𝑓 from R to itself satisfying
• 𝑓(𝑠)𝑓(𝑡) = 𝑓(𝑠 + 𝑡) for all 𝑠, 𝑡 ≥ 0 and
• 𝑓(0) = 1
also satisfies 𝑓(𝑡) = 𝑒𝑡𝑎 for some 𝑎 ∈ R.
We proved something quite similar in Theorem 1.1, on the memoryless property of the exponential function.
For more discussion of the scalar case, see, for example, [Sahoo and Kannappan, 2011].
For a full proof of the first claim in Theorem 6.1, in the setting of a Banach algebra, see, for example, Chapter 7 of
[Bobrowski, 2005].
6.5 Exercises
Exercise 6.1
Prove that (6.5) holds for all 𝐴 ∈ ℒ(𝔹).
Exercise 6.2
In many texts, a 𝐶0 semigroup is defined as an evolution semigroup (𝑈𝑡) such that

$$\|U_t g - g\| \to 0 \text{ as } t \downarrow 0, \quad \text{for all } g \in \mathbb{B} \tag{6.7}$$

Our aim is to show that (6.7) implies continuity at every point 𝑡, as in the definition we used above.
The Banach–Steinhaus Theorem can be used to show that, for an evolution semigroup (𝑈𝑡 ) satisfying (6.7), there exist
finite constants 𝜔 and 𝑀 such that
Using this and (6.7), show that, for any 𝑔 ∈ 𝔹, the map 𝑡 ↦ 𝑈𝑡 𝑔 is continuous at all 𝑡.
Exercise
Following on from the previous exercise, a UC semigroup is often defined as an evolution semigroup (𝑈𝑡 ) such that
‖𝑈𝑡 − 𝐼‖ → 0 as 𝑡 → 0 (6.9)
Show that (6.9) implies norm continuity at every point 𝑡, as in the definition we used above.
In particular, show that, for any 𝑡𝑛 → 𝑡, we have ‖𝑈𝑡𝑛 − 𝑈𝑡 ‖ → 0 as 𝑛 → ∞.
6.6 Solutions
Since the norm on ℒ(𝔹) is submultiplicative, it suffices to show that $\|e^{hA} - I - hA\| = o(h)$ as $h \to 0$.
Using the definition of the exponential, this is easily verified, completing the proof of the first equality in (6.5).
The proof of the second equality is similar.
CHAPTER
SEVEN
UC MARKOV SEMIGROUPS
7.1 Overview
In our previous lecture we covered some of the general theory of operator semigroups.
Next we translate these results into the setting of Markov semigroups.
The Markov semigroups we consider are defined on a countable state space 𝑆.
The main aim is to give an exact one-to-one correspondence between
• UC Markov semigroups
• “conservative” intensity matrices and
• jump chains with state dependent jump intensities
Conservativeness is defined below and relates to “nonexplosiveness” of the associated Markov chain.
We will also give a brief discussion of intensity matrices that do not have this property, along with the processes they
generate.
For a Markov matrix 𝑃 on 𝑆, we define an operator 𝑓 ↦ 𝑓𝑃 on ℓ1 via

$(fP)(y) = \sum_{x \in S} f(x) P(x, y)$  for all $y \in S$    (7.1)

To be consistent with earlier notation, we are writing the argument of 𝑃 to the left and applying 𝑃 to it as if premultiplying 𝑃 by a row vector.

In the exercises you are asked to verify that (7.1) defines a bounded linear operator on ℓ1 such that

$\|fP\| \le \|f\|$ for all $f \in \ell_1$, and $\psi P \in \mathcal{D}$ whenever $\psi \in \mathcal{D}$    (7.2)

Note that composing the operator 𝑃 with itself corresponds to taking powers of the matrix 𝑃 under matrix multiplication.
For an intensity matrix 𝑄 on 𝑆 we can try to introduce the associated operator analogously, via

$(fQ)(y) = \sum_{x \in S} f(x) Q(x, y)$  for all $y \in S$    (7.3)

When (7.3) defines a bounded linear operator on ℓ1, so that 𝑄 ∈ ℒ(ℓ1), we call the intensity matrix 𝑄 conservative.
Theorem 7.1
If (𝑃𝑡 ) is a UC Markov semigroup on ℓ1 , then there exists a conservative intensity matrix 𝑄 such that 𝑃𝑡 = 𝑒𝑡𝑄 for all
𝑡 ≥ 0.
(Recall that 𝑄 ∶ 𝑆 × 𝑆 → R is called an intensity matrix if 𝑄 has zero row sums and 𝑄(𝑥, 𝑦) ≥ 0 whenever 𝑥 ≠ 𝑦.)
In fact these results are just a special case of the claims in Theorem 6.1.
The second last of these results is the Kolmogorov forward and backward equations.
The last of these results shows that we can obtain the intensity matrix 𝑄 by differentiating 𝑃𝑡 at 𝑡 = 0.
Example 7.1
Let us consider again the Poisson process (𝑁𝑡 ) with rate 𝜆 > 0 in light of the discussion above.
The corresponding semigroup (𝑃𝑡 ) is UC and hence there exists a conservative intensity matrix 𝑄 with 𝑃𝑡 = 𝑒𝑡𝑄 for all
𝑡 ≥ 0.
This fact can be established by proving the UC property directly and then appealing to Theorem 7.1.

An alternative, which is easier in this case, is to supply the intensity matrix 𝑄 directly and then verify that 𝑃𝑡 = 𝑒𝑡𝑄 holds.
The semigroup for a Poisson process with rate 𝜆 was given in (3.9) and is repeated here:
$P_t(j, k) = \begin{cases} e^{-\lambda t} \frac{(\lambda t)^{k-j}}{(k-j)!} & \text{if } j \le k \\ 0 & \text{otherwise} \end{cases}$    (7.4)
$Q := \begin{pmatrix} -\lambda & \lambda & 0 & 0 & 0 & \cdots \\ 0 & -\lambda & \lambda & 0 & 0 & \cdots \\ 0 & 0 & -\lambda & \lambda & 0 & \cdots \\ 0 & 0 & 0 & -\lambda & \lambda & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$    (7.5)
The form of 𝑄 is intuitive: probability flows out of state 𝑖 and into state 𝑖 + 1 at the rate 𝜆.
It is immediate that 𝑄 is an intensity matrix, as claimed.
The exercises ask you to confirm that 𝑄 is in ℒ(ℓ1 ).
To prove that 𝑃𝑡 = 𝑒𝑡𝑄 for any 𝑡 ≥ 0, we first decompose 𝑄 as 𝑄 = 𝜆(𝐾 − 𝐼), where 𝐾 is defined by
𝐾(𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 1}
Since 𝐼 and 𝐾 commute, the usual rules for exponentials give

$e^{tQ} = e^{\lambda t (K - I)} = e^{-\lambda t} e^{\lambda t K} = e^{-\lambda t} \sum_{m \ge 0} \frac{(\lambda t K)^m}{m!}$
The exercises ask you to verify that, for the powers of 𝐾, we have 𝐾 𝑚 (𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 𝑚}.
Inserting this expression for 𝐾 𝑚 leads to
$e^{tQ}(i, j) = e^{-\lambda t} \sum_{m \ge 0} \frac{(\lambda t)^m}{m!} \mathbb{1}\{j = i + m\} = e^{-\lambda t} \frac{(\lambda t)^{j-i}}{(j-i)!} \mathbb{1}\{i \le j\}$

This agrees with (7.4), so 𝑃𝑡 = 𝑒𝑡𝑄 holds, as was to be shown.
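The same conclusion can be corroborated numerically on a truncated state space. In the sketch below, the truncation level n and the parameter values are arbitrary choices; we compare the first row of $e^{tQ}$ with the Poisson probabilities in (7.4).

import numpy as np
from scipy.linalg import expm
from scipy.special import factorial

lam, t, n = 0.5, 2.0, 60
K = np.eye(n, k=1)                            # K(i, j) = 1{j = i + 1}
Q = lam * (K - np.identity(n))                # truncated version of (7.5)
P_t = expm(t * Q)

k = np.arange(n)
poisson_row = np.exp(-lam * t) * (lam * t)**k / factorial(k)
print(np.max(np.abs(P_t[0] - poisson_row)))   # ~ 0, up to truncation error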
Our definition of a conservative intensity matrix works for the theory above but can be hard to check in applications and lacks probabilistic intuition.
Fortunately, we have the following simple characterization.
Lemma 7.1
An intensity matrix 𝑄 on 𝑆 is conservative if and only if sup𝑥 |𝑄(𝑥, 𝑥)| is finite.
Example 7.2
Recall the jump chain setting where, repeating (4.4), we defined 𝑄 via

$Q(x, y) = \lambda(x) (K(x, y) - I(x, y))$    (7.6)
The function 𝜆 ∶ 𝑆 → R+ gives the jump rate at each state, while 𝐾 is the Markov matrix for the embedded discrete
time jump chain.
Previously we discussed this in the case where 𝑆 is finite but there is no need to restrict attention to that case.
For general countable 𝑆, the matrix 𝑄 defined in (7.6) is still an intensity matrix.
If we continue to assume that 𝐾(𝑥, 𝑥) = 0 for all 𝑥, then 𝑄(𝑥, 𝑥) = −𝜆(𝑥).
Hence, 𝑄 is conservative if and only if sup𝑥 𝜆(𝑥) is finite.
In other words, 𝑄 is conservative if the set of jump rates is bounded.
It is immediate from Lemma 7.1 that every intensity matrix is conservative when the state space 𝑆 is finite.
Hence, in this setting, every intensity matrix 𝑄 on 𝑆 defines a UC Markov semigroup (𝑃𝑡 ) via 𝑃𝑡 = 𝑒𝑡𝑄 .
Conversely, if 𝑆 is finite, then any Markov semigroup (𝑃𝑡 ) is a UC Markov semigroup.
To see this, recall that, as a Markov semigroup, (𝑃𝑡 ) satisfies lim𝑡→0 𝑃𝑡 (𝑥, 𝑦) = 𝐼(𝑥, 𝑦) for all 𝑥, 𝑦 in 𝑆.
In any finite dimensional space, pointwise convergence implies norm convergence, so 𝑃𝑡 → 𝐼 in operator norm as 𝑡 → 0
from above.
As we saw previously, this is enough to ensure that 𝑡 ↦ 𝑃𝑡 is norm continuous everywhere on R+ .
Hence (𝑃𝑡 ) is a UC Markov semigroup, as claimed.
Combining these results with Theorem 7.1, we conclude that, when 𝑆 is finite, there is a one-to-one correspondence
between Markov semigroups and intensity matrices.
We now understand that there is a one-to-one pairing between conservative intensity matrices and UC Markov semigroups.
These ideas are important from an analytical perspective.
Now we provide another point of view, more connected to probability.
This point of view is important for both theory and computation.
Let us agree to call (𝜆, 𝐾) a jump chain pair if 𝜆 is a map from 𝑆 to R+ and 𝐾 is a Markov matrix on 𝑆.
It is easy to verify that the matrix 𝑄 on 𝑆 defined by

$Q(x, y) = \lambda(x) (K(x, y) - I(x, y))$    (7.7)

is an intensity matrix.
(We saw in an earlier lecture that 𝑄 is the intensity matrix for the jump chain (𝑋𝑡 ) built via Algorithm 4.1 from jump
chain pair (𝜆, 𝐾).)
As we now show, every intensity matrix admits the decomposition in (7.7) for some jump chain pair.

To construct the pair from a given intensity matrix 𝑄, set

$\lambda(x) := -Q(x, x)$  for all $x \in S$    (7.8)

On the diagonal, we set

$K(x, x) = \begin{cases} 0 & \text{if } \lambda(x) > 0 \\ 1 & \text{otherwise} \end{cases}$    (7.9)
Thus, if the rate of leaving 𝑥 is positive, we set 𝐾(𝑥, 𝑥) = 0, so that the embedded jump chain moves away from 𝑥 with
probability one when the next jump occurs.
Otherwise, when 𝑄(𝑥, 𝑥) = 0, we stay at 𝑥 forever, so 𝑥 is an absorbing state.
Off the principal diagonal, where 𝑥 ≠ 𝑦, we set

$K(x, y) = \begin{cases} Q(x, y) / \lambda(x) & \text{if } \lambda(x) > 0 \\ 0 & \text{otherwise} \end{cases}$    (7.10)
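In code, the construction in (7.8)–(7.10) is mechanical. Here is a minimal sketch for the finite state case; the function name jump_chain_pair is our own, and the test matrix is an arbitrary example.

import numpy as np

def jump_chain_pair(Q):
    "Jump chain decomposition (lambda, K) of an intensity matrix Q."
    lam = -np.diag(Q).astype(float)       # (7.8): lambda(x) = -Q(x, x)
    K = np.zeros_like(Q, dtype=float)
    for x in range(Q.shape[0]):
        if lam[x] > 0:
            K[x] = Q[x] / lam[x]          # (7.10) off the diagonal
            K[x, x] = 0.0                 # (7.9): leave x at the next jump
        else:
            K[x, x] = 1.0                 # (7.9): x is absorbing
    return lam, K

Q = np.array([[-3.0,  2.0,   1.0],
              [ 3.0, -5.0,   2.0],
              [ 4.0,  6.0, -10.0]])
lam, K = jump_chain_pair(Q)
assert np.allclose(Q, lam[:, None] * (K - np.identity(3)))   # (7.7) holds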
The exercises below ask you to confirm that, for 𝜆 and 𝐾 just defined,
1. (𝜆, 𝐾) is a jump chain pair and
2. the intensity matrix 𝑄 satisfies (7.7).
We call (𝜆, 𝐾) the jump chain decomposition of 𝑄.
We summarize in a lemma.
Lemma 7.2
A matrix 𝑄 on 𝑆 is an intensity matrix if and only if there exists a jump chain pair (𝜆, 𝐾) such that (7.7) holds.
We know from Example 7.2 that an intensity matrix 𝑄 is conservative if and only if 𝜆 is bounded.
Moreover, we saw in Theorem 7.1 that the pairing between conservative intensity matrices and UC Markov semigroups
is one-to-one.
This leads to the following result.
Theorem 7.2
On 𝑆, there exists a one-to-one correspondence between the following sets of objects:
1. The set of all jump chain pairs (𝜆, 𝐾) such that 𝜆 is bounded.
2. The set of all conservative intensity matrices.
3. The set of all UC Markov semigroups.
7.4.4 Simulation
In view of the preceding discussion, we have a simple way to simulate a Markov chain given any conservative intensity
matrix 𝑄.
The steps are
1. Decompose 𝑄 into a jump chain pair (𝜆, 𝐾).
2. Simulate via Algorithm 4.1.
Recalling our discussion of the Kolmogorov backward equation, we know that this produces a Markov chain with Markov
semigroup (𝑃𝑡 ) where 𝑃𝑡 = 𝑒𝑡𝑄 for 𝑄 satisfying (7.7).
(Although our argument assumed finite 𝑆, the proof goes through when 𝑆 is countably infinite and 𝑄 is conservative with
very minor changes.)
In particular, (𝑋𝑡 ) is a continuous time Markov chain with intensity matrix 𝑄.
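The following is a minimal sketch of these two steps for a finite state space, assuming a conservative 𝑄 supplied as a dense NumPy array; the function name and parameter choices are ours, and the routine follows the jump chain construction of Algorithm 4.1 in outline only.

import numpy as np

def simulate_ctmc(Q, x0, T, seed=0):
    "Simulate a path of the chain with intensity matrix Q over [0, T]."
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    lam = -np.diag(Q)                     # jump rates lambda(x) = -Q(x, x)
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        if lam[x] == 0:                   # absorbing state: stay forever
            break
        t += rng.exponential(1 / lam[x])  # holding time ~ Exp(lambda(x))
        if t > T:
            break
        p = Q[x] / lam[x]                 # row of K off the diagonal
        p[x] = 0.0
        x = rng.choice(n, p=p / p.sum())  # normalize to guard against rounding
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

Q = np.array([[-3.0,  2.0,   1.0],
              [ 3.0, -5.0,   2.0],
              [ 4.0,  6.0, -10.0]])
times, states = simulate_ctmc(Q, x0=0, T=10.0)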
7.5 Beyond Bounded Intensity Matrices

If we do run into an application where an intensity matrix 𝑄 is not conservative, what might we expect?
In this scenario, we can at least hope that 𝑄 is the generator of a 𝐶0 semigroup.
Since 𝑄 is an intensity matrix, we can be sure that this semigroup will be a Markov semigroup.
To know when 𝑄 will be the generator of a 𝐶0 semigroup, we need to look to the Hille–Yosida Theorem and sufficient conditions derived from it.
While we omit a detailed treatment, it is worth noting that the issue is linked to explosions.
To see the connection, recall that some initial value problems do not have solutions defined for all 𝑡 ∈ R+.

An example is the scalar problem $x_t' = 1 + x_t^2$, which has solution $x_t = \tan(t - c)$ for some constant 𝑐.

This solution diverges to +∞ as 𝑡 ↑ 𝑐 + 𝜋/2, so there is no way to extend it beyond that point.

The problem is that the time path explodes to infinity in finite time.
The same issue can occur for Markov processes, if jump rates grow sufficiently quickly.
For more discussion, see, for example, Section 2.7 of [Norris, 1998].
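A quick simulation makes the explosion phenomenon concrete. Consider a pure birth process with jump rates 𝜆𝑛 = 𝑛² (our choice, purely for illustration). The holding time at state 𝑛 has mean 1/𝑛², and since ∑𝑛 1/𝑛² < ∞ the total time needed to pass through every state is finite with probability one.

import numpy as np

rng = np.random.default_rng(1)
n_states = 200_000
rates = np.arange(1, n_states + 1) ** 2      # lambda_n = n^2
holding_times = rng.exponential(1 / rates)   # independent Exp(lambda_n) draws
print(holding_times.sum())                   # close to sum of 1/n^2 = pi^2/6 ~ 1.64

Sending the truncation level to infinity, infinitely many jumps occur in finite time, which is precisely an explosion.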
7.6 Exercises
Exercise 7.1
Let 𝑃 be a Markov matrix on 𝑆 and identify it with the linear operator in (7.1). Verify the claims in (7.2).
Exercise 7.2
Prove the claim in Lemma 7.1.
Exercise 7.3
Confirm that 𝑄 defined in (7.5) induces a bounded linear operator on ℓ1 via (7.3).
Exercise 7.4
Let 𝐾 be defined on Z+ × Z+ by 𝐾(𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 1}.
Show that, with 𝐾 𝑚 representing the 𝑚-th matrix product of 𝐾 with itself, we have 𝐾 𝑚 (𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 𝑚} for any
𝑖, 𝑗 ∈ Z+ .
Exercise 7.5
Let 𝑄 be any intensity matrix on 𝑆.
Prove that the jump chain decomposition of 𝑄 is in fact a jump chain pair.
Prove that, in addition, this decomposition (𝜆, 𝐾) satisfies (7.7).
7.7 Solutions
Hence ‖𝑃 ‖ ≤ 1.
To see that equality holds we can repeat this argument with 𝑓 ≥ 0, obtaining ‖𝑓𝑃 ‖ = ‖𝑓‖.
Now pick any 𝜙 ∈ 𝒟.
Clearly 𝜙𝑃 ≥ 0, and
Hence 𝜙𝑃 ∈ 𝒟 as claimed.
First suppose that 𝑚 ∶= sup𝑥 |𝑄(𝑥, 𝑥)| is finite. If 𝑚 = 0, then 𝑄 is the zero matrix and there is nothing to prove, so assume 𝑚 > 0 and set 𝑃 ̂ ∶= 𝐼 + 𝑄/𝑚.
It is not hard to check that 𝑃 ̂ is a Markov matrix and that 𝑄 = 𝑚(𝑃 ̂ − 𝐼).
Since 𝑃 ̂ is a Markov matrix, it induces a bounded linear operator on ℓ1 via (7.1).
As ℒ(ℓ1 ) is a linear space, we see that 𝑄 is likewise in ℒ(ℓ1 ).
In particular, 𝑄 is a bounded operator, and hence conservative.
Next, suppose that 𝑄 is conservative and yet sup𝑥 |𝑄(𝑥, 𝑥)| is infinite.
Choose 𝑥 ∈ 𝑆 such that |𝑄(𝑥, 𝑥)| > ‖𝑄‖.

Let 𝑓 ∈ ℓ1 be defined by 𝑓(𝑧) = 𝟙{𝑧 = 𝑥}.

Since ‖𝑓‖ = 1, we have

$\|Q\| \ge \|fQ\| = \sum_y |Q(x, y)| \ge |Q(x, x)| > \|Q\|$

Contradiction.
Regarding Exercise 7.3: with 𝑄 = 𝜆(𝐾 − 𝐼), we have 𝑓𝑄 = 𝜆(𝑓𝐾 − 𝑓) for any 𝑓 ∈ ℓ1.

Applying the triangle inequality and the fact that ‖𝑓𝐾‖ ≤ ‖𝑓‖, we see that the right hand side is dominated by 2𝜆‖𝑓‖.

Hence ‖𝑓𝑄‖ ≤ 2𝜆‖𝑓‖, which implies that 𝑄 ∈ ℒ(ℓ1 ) as required.
If, on the other hand, 𝜆(𝑥) = 0, then ∑𝑦 𝐾(𝑥, 𝑦) = 1 follows immediately from the definition.
As 𝐾 is nonnegative, we see that 𝐾 is a Markov matrix.
Thus, (𝜆, 𝐾) is a valid jump chain pair.
The proof that 𝑄 and (𝜆, 𝐾) satisfy (7.7) is mechanical and the details are omitted.
(Try working case-by-case, with 𝜆(𝑥) = 0, 𝑥 = 𝑦, 𝜆(𝑥) > 0, 𝑥 = 𝑦, etc.)
CHAPTER
EIGHT
8.1 Overview
In this lecture we discuss stability and equilibrium behavior for continuous time Markov chains.
To give one example of why this theory matters, consider queues, which are often modeled as continuous time Markov
chains.
Queueing theory is used in applications such as
• treatment of patients arriving at a hospital
• optimal design of manufacturing processes
• requests to a file server
• air traffic
• customers waiting on a helpline
A key topic in queueing theory is average behavior over the long run.
• Will the length of the queue grow without bounds?
• If not, is there some kind of long run equilibrium?
• If so, what is the average waiting time in this equilibrium?
• What is the average length of the queue over a week, or a month?
We will use the following imports
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.linalg import expm
8.2.1 Definition
Let 𝑆 be countable.
Recall that, for a discrete time Markov chain with Markov matrix 𝑃 on 𝑆, a distribution 𝜓 is called stationary for 𝑃 if
𝜓𝑃 = 𝜓.
This means that if 𝑋𝑡 has distribution 𝜓, then so does 𝑋𝑡+1 .
For continuous time Markov chains, the definition is analogous.
Given a Markov semigroup (𝑃𝑡 ) on 𝑆, a distribution 𝜓∗ ∈ 𝒟 is called stationary for (𝑃𝑡 ) if
𝜓∗ 𝑃𝑡 = 𝜓∗ for all 𝑡 ≥ 0
As one example, we look again at the chain on 𝑆 = {0, 1, 2} with intensity matrix
Q = ((-3, 2, 1),
(3, -5, 2),
(4, 6, -10))
The following figure was shown before, except that now there is a black dot that the three trajectories seem to be converging
to.
(Recall that, in the color scheme, trajectories cool as time evolves.)
This black dot is the stationary distribution 𝜓∗ of the Markov semigroup (𝑃𝑡 ) generated by 𝑄.
It was calculated using the stationary_distributions attribute of QuantEcon's MarkovChain class, by arbitrarily setting 𝑡 = 1 and solving 𝜓𝑃1 = 𝜓.
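In code, that calculation looks as follows; this is a sketch of the procedure just described, reusing the imports from the start of the lecture.

import numpy as np
from scipy.linalg import expm
import quantecon as qe

Q = np.array([[-3.0,  2.0,   1.0],
              [ 3.0, -5.0,   2.0],
              [ 4.0,  6.0, -10.0]])
P_1 = expm(Q)                               # P_t at t = 1
mc = qe.MarkovChain(P_1)
psi_star = mc.stationary_distributions[0]   # solves psi P_1 = psi
print(psi_star)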
Below we show that, for this choice of 𝑄, the stationary distribution 𝜓∗ is unique in 𝒟, due to irreducibility.
Moreover, 𝜓𝑃𝑡 → 𝜓∗ as 𝑡 → ∞ for any 𝜓 ∈ 𝒟, as suggested by the figure.
In many cases, it is easier to use the generator of the semigroup to identify stationary distributions rather than the semigroup
itself.
This is analogous to the idea that a point 𝑥̄ in R𝑑 is stationary for a vector ODE 𝑥′𝑡 = 𝐹 (𝑥𝑡 ) when 𝐹 (𝑥)̄ = 0.
(Here 𝐹 is the infinitesimal description, and hence analogous to the generator.)
The next result holds true under weaker conditions, but the version stated here is easy to prove and sufficient for the applications we consider.
Theorem 8.1
Let (𝑃𝑡 ) be a UC Markov semigroup with generator 𝑄. A distribution 𝜓 on 𝑆 is stationary for (𝑃𝑡 ) if and only if 𝜓𝑄 = 0.
Proof. Suppose first that 𝜓𝑄 = 0. Since 𝑃𝑡 = 𝑒𝑡𝑄 , we have

$\psi e^{tQ} = \psi + t \psi Q + \frac{t^2}{2!} \psi Q^2 + \cdots$
From 𝜓𝑄 = 0 we get 𝜓𝑄𝑘 = 0 for all 𝑘 ∈ N, so the last display yields 𝜓𝑃𝑡 = 𝜓.
Hence 𝜓 is stationary for (𝑃𝑡 ).
Now suppose that 𝜓 is stationary for (𝑃𝑡 ) and set 𝐷𝑡 ∶= (1/𝑡)(𝑃𝑡 − 𝐼).

Stationarity implies 𝜓𝐷𝑡 = 0 for all 𝑡 > 0. Hence, from the triangle inequality and the definition of the operator norm, for any given 𝑡,

$\|\psi Q\| \le \|\psi Q - \psi D_t\| + \|\psi D_t\| \le \|Q - D_t\| \, \|\psi\|$

Since (𝑃𝑡 ) is a UC semigroup with generator 𝑄, we have ‖𝑄 − 𝐷𝑡 ‖ → 0 as 𝑡 ↓ 0, and hence 𝜓𝑄 = 0.
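Theorem 8.1 also gives a direct computational route to 𝜓∗: solve the linear system 𝜓𝑄 = 0 subject to ∑𝑥 𝜓(𝑥) = 1. A minimal sketch, using the same illustrative 𝑄 as above:

import numpy as np
from scipy.linalg import expm

Q = np.array([[-3.0,  2.0,   1.0],
              [ 3.0, -5.0,   2.0],
              [ 4.0,  6.0, -10.0]])

A = np.vstack([Q.T, np.ones(3)])    # stack psi Q = 0 (transposed) with sum(psi) = 1
b = np.zeros(4)
b[-1] = 1.0
psi_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(psi_star)
print(np.allclose(psi_star @ expm(0.7 * Q), psi_star))   # psi* P_t = psi* for any t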
Theorem 8.2
Let (𝑃𝑡 ) be a UC Markov semigroup with generator 𝑄. For distinct states 𝑥 and 𝑦, the following statements are equivalent:
1. The state 𝑦 is accessible from 𝑥 under (𝑃𝑡 ).
2. There is a 𝑄-positive probability flow from 𝑥 to 𝑦.
3. 𝑃𝑡 (𝑥, 𝑦) > 0 for all 𝑡 > 0.
It follows that at least one element of the sum must be strictly positive.
Therefore, a 𝑄-positive probability flow from 𝑥 to 𝑦 exists.
Turning to (2 ⟹ 3), first note that, for arbitrary states 𝑢 and 𝑣, if 𝑄(𝑢, 𝑣) > 0 then 𝑃𝑡 (𝑢, 𝑣) > 0 for all 𝑡 > 0.
To see this, let (𝜆, 𝐾) be the jump chain pair constructed from 𝑄 via (7.8), (7.9) and (7.10).
Observe that, since 𝑄(𝑢, 𝑣) > 0, we must have 𝜆(𝑢) > 0.
As a consequence, applying (7.10), we have
$K(u, v) = \frac{Q(u, v)}{\lambda(u)} > 0$
Let (𝑌𝑘 ) and (𝐽𝑘 ) be the embedded jump chain and jump sequence generated by Algorithm 4.1, with 𝑌0 = 𝑢.
With 𝐸1 ∼ Exp(1) and 𝐸2 ∼ Exp(1), we have, for any 𝑡 > 0,
$P_t(u, v) \ge \mathbb{P}\{J_1 \le t, \, Y_1 = v, \, J_2 > t\}$
$\ge \mathbb{P}\{E_1 \le t\lambda(u), \, E_2 > t\lambda(v)\} \, \mathbb{P}\{Y_1 = v\}$
$= \mathbb{P}\{E_1 \le t\lambda(u)\} \, \mathbb{P}\{E_2 > t\lambda(v)\} \, K(u, v)$
$> 0$
If we fix 𝑡 > 0 and repeatedly apply (3.7) along with the last result, we obtain

$P_t(x, y) \ge \prod_{i=0}^{m-1} P_{t/m}(z_i, z_{i+1}) > 0$

where $(z_0, \ldots, z_m)$ is a path with $z_0 = x$, $z_m = y$ and $Q(z_i, z_{i+1}) > 0$ for every $i$.
Corollary 8.1
For a UC Markov semigroup (𝑃𝑡 ), the following statements are equivalent:
1. (𝑃𝑡 ) is irreducible.
2. 𝑃𝑡 (𝑥, 𝑦) > 0 for all 𝑡 > 0 and all (𝑥, 𝑦) ∈ 𝑆 × 𝑆.
Note: To obtain stable long run behavior in discrete time Markov chains, it is common to assume that the chain is
aperiodic.
This needs to be assumed on top of irreducibility if one wishes to rule out all dependence on initial conditions.
Corollary 8.1 shows that periodicity is not a concern for irreducible continuous time Markov chains.
Positive probability flow from 𝑥 to 𝑦 at some 𝑡 > 0 immediately implies positive flow for all 𝑡 > 0.
We call a Markov semigroup (𝑃𝑡 ) asymptotically stable if (𝑃𝑡 ) has a unique stationary distribution 𝜓∗ in 𝒟 and

$\|\psi P_t - \psi^*\| \to 0$ as $t \to \infty$, for all $\psi \in \mathcal{D}$
8.4.1 Contractivity
Let’s recall some useful facts about the discrete time case.
First, if 𝑃 is any Markov matrix, we have, in the ℓ1 norm,

$\|\psi P - \phi P\| \le \|\psi - \phi\|$ for all $\psi, \phi \in \mathcal{D}$    (8.4)

Second, the inequality is strict when 𝑃 is everywhere positive:

Lemma 8.1

If 𝑃 is a Markov matrix with 𝑃(𝑥, 𝑦) > 0 for all (𝑥, 𝑦) ∈ 𝑆 × 𝑆, then

$\|\psi P - \phi P\| < \|\psi - \phi\|$ whenever $\psi, \phi \in \mathcal{D}$ and $\psi \ne \phi$

The proof follows from the strict triangle inequality, as opposed to the weak triangle inequality we used to obtain (8.4).
See, for example, Proposition 3.1.2 of [Lasota and Mackey, 1994] or Lemma 8.2.3 of [Stachurski, 2009].
8.4.2 Uniqueness
Irreducibility of a given Markov chain implies that there are no disjoint absorbing sets.
This in turn leads to uniqueness of stationary distributions:
Theorem 8.3
Let (𝑃𝑡 ) be a UC Markov semigroup on 𝑆. If (𝑃𝑡 ) is irreducible, then (𝑃𝑡 ) has at most one stationary distribution.
Proof. Suppose to the contrary that 𝜓 and 𝜙 are both stationary for (𝑃𝑡 ).
Since (𝑃𝑡 ) is irreducible, we know that 𝑃1 (𝑥, 𝑦) > 0 for all 𝑥, 𝑦 ∈ 𝑆.
If 𝜓 ≠ 𝜙, then, due to positivity of 𝑃1 , the strict inequality in Lemma 8.1 holds.
At the same time, by stationarity, ‖𝜓𝑃1 − 𝜙𝑃1 ‖ = ‖𝜓 − 𝜙‖. Contradiction.
Example 8.1
An M/M/1 queue with parameters 𝜇, 𝜆 is a continuous time Markov chain (𝑋𝑡 ) on 𝑆 = Z+ with intensity matrix
$Q = \begin{pmatrix} -\lambda & \lambda & 0 & 0 & \cdots \\ \mu & -(\mu + \lambda) & \lambda & 0 & \cdots \\ 0 & \mu & -(\mu + \lambda) & \lambda & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$    (8.5)
The chain (𝑋𝑡 ) records the length of the queue at each moment in time.
The intensity matrix captures the idea that customers flow into the queue at rate 𝜆 and are served (and hence leave the
queue) at rate 𝜇.
If 𝜆 and 𝜇 are both positive, then there is a 𝑄-positive probability flow between any two states, in both directions, so the
corresponding semigroup (𝑃𝑡 ) is irreducible.
Theorem 8.3 now tells us that (𝑃𝑡 ) has at most one stationary distribution.
Lemma 8.2
Let (𝑃𝑡 ) be a Markov semigroup. If there exists an 𝑠 > 0 such that the Markov matrix 𝑃𝑠 is asymptotically stable, then
(𝑃𝑡 ) is asymptotically stable with the same stationary distribution.
In this section we address drift conditions, a powerful method for obtaining asymptotic stability when the state space can be infinite.
The idea is to show that the state tends to drift back to a finite set over time.
Such drift, when combined with the contractivity in Lemma 8.1, is enough to give global stability.
The next theorem gives a useful version of this class of results.
Theorem 8.4
Let (𝑃𝑡 ) be a UC Markov semigroup with intensity matrix 𝑄. If (𝑃𝑡 ) is irreducible and there exist a function 𝑣 ∶ 𝑆 → R+ , a finite set 𝐹 ⊂ 𝑆 and positive constants 𝜖 and 𝑀 such that

$\sum_y Q(x, y) v(y) \le \begin{cases} M & \text{if } x \in F \\ -\epsilon & \text{otherwise} \end{cases}$

then (𝑃𝑡 ) is asymptotically stable.
Example 8.2
Consider again the M/M/1 queue from Example 8.1, now with 𝜇 > 𝜆, and take 𝑣(𝑥) = 𝑥. For 𝑥 ≥ 1 we have ∑𝑦 𝑄(𝑥, 𝑦)𝑣(𝑦) = 𝜆 − 𝜇 < 0, while at 𝑥 = 0 the sum equals 𝜆. Setting 𝐹 = {0}, 𝑀 = 𝜆 and 𝜖 = 𝜇 − 𝜆, we see that the conditions of Theorem 8.4 hold.
Hence the associated semigroup (𝑃𝑡 ) is asymptotically stable.
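As a numerical illustration (parameter values and truncation level are our own choices), we can build a truncated version of the intensity matrix in (8.5), check the drift condition with 𝑣(𝑥) = 𝑥, and confirm that the geometric distribution 𝜓∗(𝑥) = (1 − 𝜌)𝜌^𝑥 with 𝜌 = 𝜆/𝜇 (the classic M/M/1 stationary distribution, not derived in this lecture) satisfies 𝜓∗𝑄 ≈ 0 away from the truncation boundary.

import numpy as np

lam, mu = 1.0, 2.0                    # arrival and service rates, mu > lam
n = 200                               # truncation level
Q = np.zeros((n, n))
for x in range(n - 1):
    Q[x, x + 1] = lam                             # arrival
    if x > 0:
        Q[x, x - 1] = mu                          # service completion
    Q[x, x] = -(lam + (mu if x > 0 else 0.0))
Q[n - 1, n - 2], Q[n - 1, n - 1] = mu, -mu        # crude boundary row

v = np.arange(n)
print((Q @ v)[:5])                    # lam at x = 0, then lam - mu < 0

rho = lam / mu
psi = (1 - rho) * rho ** np.arange(n)
print(np.max(np.abs((psi @ Q)[:-2])))             # ~ 0 in the interior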
Corollary 8.2
If (𝑃𝑡 ) is an irreducible UC Markov semigroup and 𝑆 is finite, then (𝑃𝑡 ) is asymptotically stable.
8.5 Exercises
Exercise 8.1
Let (𝑃𝑡 ) be a Markov semigroup. True or false: for this semigroup, every state 𝑥 is accessible from itself.
Exercise 8.2
Let (𝜆𝑘 ) be a bounded non-increasing sequence in (0, ∞).
A pure birth process starting at zero is a continuous time Markov process (𝑋𝑡 ) on state space Z+ with intensity matrix
$Q = \begin{pmatrix} -\lambda_0 & \lambda_0 & 0 & 0 & \cdots \\ 0 & -\lambda_1 & \lambda_1 & 0 & \cdots \\ 0 & 0 & -\lambda_2 & \lambda_2 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$
Show that (𝑃𝑡 ), the corresponding Markov semigroup, has no stationary distribution.
Exercise 8.3
Confirm that Theorem 8.4 implies Corollary 8.2.
8.6 Solutions
Regarding Exercise 8.2: suppose that 𝜙 ∈ 𝒟 is stationary for (𝑃𝑡 ), so that 𝜙𝑄 = 0 by Theorem 8.1. Rearranging 𝜙𝑄 = 0 gives, for all 𝑗 ≥ 1,

$\frac{\phi(j)}{\phi(j - 1)} = \frac{\lambda_{j-1}}{\lambda_j} \ge 1$

where the inequality holds because (𝜆𝑘 ) is non-increasing. Hence

$\phi(j) \ge \phi(j - 1)$

for all 𝑗, so 𝜙 is non-decreasing and cannot satisfy ∑𝑗 𝜙(𝑗) = 1. Contradiction.
Regarding Exercise 8.3: if 𝑆 is finite, we can take 𝐹 = 𝑆 and 𝑣 ≡ 1, in which case ∑𝑦 𝑄(𝑥, 𝑦)𝑣(𝑦) = 0 ≤ 𝑀 for every 𝑥 ∈ 𝐹, due to the zero row sums of 𝑄. Hence the drift condition in Theorem 8.4 holds and (𝑃𝑡 ) is asymptotically stable.
CHAPTER
NINE
BIBLIOGRAPHY
9.1 References
BIBLIOGRAPHY
[App19] David Applebaum. Semigroups of Linear Operators. Volume 93. Cambridge University Press, 2019.
[Bob05] Adam Bobrowski. Functional Analysis for Probability and Stochastic Processes: An Introduction. Cambridge University Press, 2005.
[How17] Douglas C. Howard. Elements of Stochastic Processes: A Computational Approach. FE Press, 2017.
[LM94] Andrzej Lasota and Michael C. Mackey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Volume 97. Springer Science & Business Media, 1994.
[LG16] Jean-François Le Gall. Brownian Motion, Martingales, and Stochastic Calculus. Volume 274. Springer, 2016.
[Lig10] Thomas Milton Liggett. Continuous Time Markov Processes: An Introduction. Volume 113. American Mathematical Society, 2010.
[Nor98] James R. Norris. Markov Chains. Number 2. Cambridge University Press, 1998.
[Par08] Etienne Pardoux. Markov Processes and Applications: Algorithms, Networks, Genome and Finance. Volume 796. John Wiley & Sons, 2008.
[PichorRTKaminska12] Katarzyna Pichór, Ryszard Rudnicki, and Marta Tyran-Kamińska. Stochastic semigroups and their applications to biological models. Demonstratio Mathematica, 45(2):463–494, 2012.
[SK11] Prasanna K. Sahoo and Palaniappan Kannappan. Introduction to Functional Equations. CRC Press, 2011.
[Sta09] John Stachurski. Economic Dynamics: Theory and Computation. MIT Press, 2009.
[Str13] Daniel W. Stroock. An Introduction to Markov Processes. Volume 230. Springer Science & Business Media, 2013.
[Wal12] John B. Walsh. Knowing the Odds: An Introduction to Probability. Volume 139. American Mathematical Society, 2012.
PROOF INDEX

algorithm-0 (markov_prop), 30
diffexpmap (generators), ??
ecuc (generators), ??
ejc_algo (kolmogorov_bwd), 40
equivirr (ergodicity), 80
erlexp (memoryless), 7
example-5 (ergodicity), 82
example-8 (ergodicity), 83
exp_unique (memoryless), 5
generators-prf-1 (generators), ??
generators-prf-2 (generators), ??
imatjc (uc_mc_semigroups), 71
intvsmk (kolmogorov_fwd), 54
intvsmk_c (kolmogorov_fwd), 54
jccs (uc_mc_semigroups), 70
jctosg (kolmogorov_bwd), 43
lemma-1 (kolmogorov_bwd), 41
perimposs (ergodicity), 81
scintcon (uc_mc_semigroups), 70
sdrift (ergodicity), 83
sfinite (ergodicity), 84
stabskel (ergodicity), 83
statfromq (ergodicity), 79
strictcontract (ergodicity), 82
theorem-0 (poisson), 15
theorem-1 (poisson), 15
theorem-2 (kolmogorov_bwd), 42
theorem-5 (uc_mc_semigroups), 72
uc-mc-semigroups-prf-1 (uc_mc_semigroups), 69
ucsgec (generators), ??
uniirr (ergodicity), 82
usmg (uc_mc_semigroups), 68