
Continuous Time Markov Chains

Thomas J. Sargent & John Stachurski

Oct 20, 2021


CONTENTS

1 Memoryless Distributions
  1.1 Overview
  1.2 The Geometric Distribution
  1.3 The Exponential Distribution
  1.4 Sums of Exponentials
  1.5 Exercises
  1.6 Solutions

2 Poisson Processes
  2.1 Overview
  2.2 Counting Processes
  2.3 Stationary Independent Increments
  2.4 Uniqueness
  2.5 Exercises
  2.6 Solutions

3 The Markov Property
  3.1 Overview
  3.2 Markov Processes
  3.3 Implications of the Markov Property
  3.4 Examples of Markov Processes
  3.5 A Model of Inventory Dynamics
  3.6 Jump Processes with Constant Rates
  3.7 Distribution Flows for the Inventory Model
  3.8 Exercises
  3.9 Solutions

4 The Kolmogorov Backward Equation
  4.1 Overview
  4.2 State Dependent Jump Intensities
  4.3 Computing the Semigroup
  4.4 Properties of the Solution
  4.5 Application: The Inventory Model
  4.6 Exercises
  4.7 Solutions

5 The Kolmogorov Forward Equation
  5.1 Overview
  5.2 From Difference Equations to ODEs
  5.3 ODEs in Distribution Space
  5.4 Jump Chains
  5.5 Summary
  5.6 Exercises
  5.7 Solutions

6 Semigroups and Generators
  6.1 Overview
  6.2 Motivation
  6.3 Preliminaries
  6.4 Semigroups and Generators
  6.5 Exercises
  6.6 Solutions

7 UC Markov Semigroups
  7.1 Overview
  7.2 Notation and Terminology
  7.3 UC Markov Semigroups and their Generators
  7.4 From Intensity Matrix to Jump Chain
  7.5 Beyond Bounded Intensity Matrices
  7.6 Exercises
  7.7 Solutions

8 Stationarity and Ergodicity
  8.1 Overview
  8.2 Stationary Distributions
  8.3 Irreducibility and Uniqueness
  8.4 Asymptotic Stability
  8.5 Exercises
  8.6 Solutions

9 Bibliography
  9.1 References

Bibliography

Proof Index
Continuous Time Markov Chains

Authors: Thomas J. Sargent and John Stachurski


These lectures provide a short introduction to continuous time Markov chains. Mathematical ideas are combined with
computer code to build intuition and bridge the gap between theory and applications. There are many solved exercises.
The presentation is rigorous but aims toward applications rather than mathematical curiosities (which are plentiful, if one
starts to look). Applications are drawn from economics, finance and operations research. I assume readers have some
knowledge of discrete time Markov chains. Later lectures use a small amount of analysis in Banach space.
Code is written in Python and accelerated using JIT compilation via Numba. QuantEcon provides an introduction to these
topics.
• Memoryless Distributions
• Poisson Processes
• The Markov Property
• The Kolmogorov Backward Equation
• The Kolmogorov Forward Equation
• Semigroups and Generators
• UC Markov Semigroups
• Stationarity and Ergodicity
• Bibliography

CHAPTER

ONE

MEMORYLESS DISTRIBUTIONS

1.1 Overview

Markov processes are, by definition, forgetful.


In particular, for any Markov process, the distribution over future outcomes depends only on the current state, rather
than the entire history.
In the case of continuous time Markov chains, which jump between discrete states, this requires that the amount of elapsed
time since the last jump is not helpful in predicting the timing of the next jump.
In other words, the jump times are “memoryless”.
It is remarkable that the only distribution on R+ with this property is the exponential distribution.
Similarly, the only memoryless distribution on Z+ is the geometric distribution.
This lecture tries to clarify these ideas.
We will use the following imports:

import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.special import factorial, binom

1.2 The Geometric Distribution

Consider betting on a roulette wheel and suppose that red has come up four times in a row.
Since five reds in a row is an unlikely event, many people instinctively feel that black is more likely on the fifth spin —
“Surely black will come up this time!”
But rational thought tells us such instincts are wrong: the four previous reds make no difference to the outcome of the
next spin.
(Many casinos offer an unlimited supply of free alcoholic beverages in order to discourage this kind of rational analysis.)
A mathematical restatement of this phenomenon is: the geometric distribution is memoryless.


1.2.1 Memorylessness

Let 𝑋 be a random variable supported on the nonnegative integers Z+ .


We say that 𝑋 is geometrically distributed if, for some 𝜃 satisfying 0 ≤ 𝜃 ≤ 1,

P{𝑋 = 𝑘} = (1 − 𝜃)^𝑘 𝜃    (𝑘 = 0, 1, …)    (1.1)

An example can be constructed from the discussion of the roulette wheel above.
Suppose that,
• the outcome of each spin is either red or black,
• spins are labeled by 0, 1, 2, …,
• on each spin, black occurs with probability 𝜃 and
• outcomes across spins are independent.
Then (1.1) is the probability that the first occurrence of black is at spin 𝑘.
(The outcome “black” fails 𝑘 times and then succeeds.)
Consistent with our discussion in the introduction, the geometric distribution is memoryless.
In particular, given any nonnegative integer 𝑚, we have

P{𝑋 = 𝑚 + 1 | 𝑋 > 𝑚} = 𝜃 (1.2)

In other words, regardless of how long we have seen only red outcomes, the probability of black on the next spin is the
same as the unconditional probability of getting black on the very first spin.
To establish (1.2), we use basic properties of the geometric distribution to obtain

P{𝑋 = 𝑚 + 1 | 𝑋 > 𝑚} = P{𝑋 = 𝑚 + 1 and 𝑋 > 𝑚} / P{𝑋 > 𝑚} = P{𝑋 = 𝑚 + 1} / P{𝑋 > 𝑚} = (1 − 𝜃)^{𝑚+1} 𝜃 / (1 − 𝜃)^{𝑚+1} = 𝜃
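Here is a quick numerical check of (1.2) by simulation (a small illustration added for concreteness; it relies only on the imports listed above, and the parameter values are arbitrary):

θ, m = 0.3, 4
np.random.seed(42)
# np.random.geometric is supported on {1, 2, ...}, so subtract 1
# to match the support {0, 1, ...} used in (1.1)
X = np.random.geometric(θ, size=1_000_000) - 1
print(np.mean(X[X > m] == m + 1))   # ≈ θ = 0.3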

1.3 The Exponential Distribution

Later, when we construct continuous time Markov chains, we will need to specify the distribution of the holding times,
which are the time intervals between jumps.
As discussed above (and again below), the holding time distribution must be memoryless, so that the chain satisfies the
Markov property.
While the geometric distribution is memoryless, its discrete support makes it a poor fit for the continuous time case.
Hence we turn to the exponential distribution, which is supported on R+ .
A random variable 𝑌 on R+ is called exponential with rate 𝜆, denoted by 𝑌 ∼ Exp(𝜆), if

P{𝑌 > 𝑦} = 𝑒^{−𝜆𝑦}    (𝑦 ≥ 0)


1.3.1 From Geometric to Exponential

The exponential distribution can be regarded as the “limit” of the geometric distribution.
To illustrate, let us suppose that
• customers enter a shop at discrete times 𝑡0 , 𝑡1 , …
• these times are evenly spaced, so that ℎ = 𝑡𝑖+1 − 𝑡𝑖 for some ℎ > 0 and all 𝑖 ∈ Z+
• at each 𝑡𝑖, either zero or one customer enters (no more because ℎ is small)
• entry at each 𝑡𝑖 occurs with probability 𝜆ℎ and is independent over 𝑖.
The fact that the entry probability is proportional to ℎ is important in what follows.
You can imagine many customers passing by the shop, each entering independently.
If we halve the time interval, then we also halve the probability that a customer enters.
Let
• 𝑌 be the time of the first arrival at the shop,
• 𝑡 be a given positive number and
• 𝑖(ℎ) be the largest integer such that 𝑡𝑖(ℎ) ≤ 𝑡.
Note that, as ℎ → 0, the grid becomes finer and 𝑡𝑖(ℎ) = 𝑖(ℎ)ℎ → 𝑡.
Writing 𝑖(ℎ) as 𝑖 and using the geometric distribution, the probability that the first arrival occurs after 𝑡𝑖 is (1 − 𝜆ℎ)^𝑖.
Hence

P{𝑌 > 𝑡𝑖} = (1 − 𝜆ℎ)^𝑖 = (1 − 𝜆𝑖ℎ/𝑖)^𝑖

Using the fact that 𝑒^𝑥 = lim_{𝑖→∞} (1 + 𝑥/𝑖)^𝑖 for all 𝑥 and 𝑖ℎ = 𝑡𝑖 → 𝑡, we obtain, for large 𝑖,

P{𝑌 > 𝑡} ≈ 𝑒^{−𝜆𝑡}

In this sense, the exponential is the limit of the geometric distribution.
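To make the limit concrete, here is a small numerical sketch (added for illustration; the values of 𝜆 and 𝑡 are arbitrary) comparing the grid probability with its exponential limit as ℎ shrinks:

λ, t = 0.5, 2.0
for h in 0.5, 0.1, 0.01, 0.001:
    i = int(t / h)     # the largest i with t_i = i*h <= t
    print(f"h = {h:<6}  (1 - λh)^i = {(1 - λ*h)**i:.5f}  "
          f"e^(-λt) = {np.exp(-λ*t):.5f}")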

1.3.2 Memoryless Property of the Exponential Distribution

The exponential distribution is the only memoryless distribution supported on R+ , as the next theorem attests.

Theorem 1.1 (Characterization of the Exponential Distribution)


If 𝑋 is a random variable supported on R+ , then there exists a 𝜆 > 0 such that 𝑋 ∼ Exp(𝜆) if and only if, for all positive
𝑠, 𝑡,
P{𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑠} = P{𝑋 > 𝑡} (1.3)

Proof. To see that (1.3) holds when 𝑋 is exponential with rate 𝜆, fix 𝑠, 𝑡 > 0 and observe that

P{𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑠} = P{𝑋 > 𝑠 + 𝑡 and 𝑋 > 𝑠} / P{𝑋 > 𝑠} = P{𝑋 > 𝑠 + 𝑡} / P{𝑋 > 𝑠} = 𝑒^{−𝜆(𝑠+𝑡)} / 𝑒^{−𝜆𝑠} = 𝑒^{−𝜆𝑡}
To see that the converse holds, let 𝑋 be a random variable supported on R+ such that (1.3) holds.
The “exceedance” function 𝑓(𝑠) ∶= P{𝑋 > 𝑠} then has three properties:


1. 𝑓 is decreasing on R+ ,
2. 0 < 𝑓(𝑡) < 1 for all 𝑡 > 0,
3. 𝑓(𝑠 + 𝑡) = 𝑓(𝑠)𝑓(𝑡) for all 𝑠, 𝑡 > 0.
The first property is common to all exceedance functions, the second is due to the fact that 𝑋 is supported on all of R+ ,
and the third is (1.3).
From these three properties we will show that

𝑓(𝑡) = 𝑓(1)^𝑡    ∀ 𝑡 ≥ 0    (1.4)

This is sufficient to prove the claim because then 𝜆 ∶= − ln 𝑓(1) is a positive real number (by property 2) and, moreover,

𝑓(𝑡) = exp(ln(𝑓(1))𝑡) = exp(−𝜆𝑡)

To see that (1.4) holds, fix positive integers 𝑚, 𝑛.


We can use property 3 to obtain both

𝑓(𝑚/𝑛) = 𝑓(1/𝑛)^𝑚 and 𝑓(1) = 𝑓(1/𝑛)^𝑛

It follows that 𝑓(𝑚/𝑛)^𝑛 = 𝑓(1/𝑛)^{𝑚𝑛} = 𝑓(1)^𝑚 and, raising to the power of 1/𝑛, we get (1.4) when 𝑡 = 𝑚/𝑛.
The discussion so far confirms that (1.4) holds when 𝑡 is rational.
So now take any 𝑡 ≥ 0 and rational sequences (𝑎𝑛 ) and (𝑏𝑛 ) converging to 𝑡 with 𝑎𝑛 ≤ 𝑡 ≤ 𝑏𝑛 for all 𝑛.
By property 1 we have 𝑓(𝑏𝑛 ) ≤ 𝑓(𝑡) ≤ 𝑓(𝑎𝑛 ) for all 𝑛, so

𝑓(1)^{𝑏𝑛} ≤ 𝑓(𝑡) ≤ 𝑓(1)^{𝑎𝑛}    ∀ 𝑛 ∈ N

Taking the limit in 𝑛 completes the proof.

1.3.3 Failure of Memorylessness

We know from the preceding section that any distribution on R+ other than the exponential distribution fails to be
memoryless.
Here’s an example that helps to clarify (although the support of the distribution is a proper subset of R+ ).
A random variable 𝑌 has the Pareto distribution with positive parameters 𝑡0, 𝛼 if

𝑓(𝑡) ∶= P{𝑌 > 𝑡} = 1 if 𝑡 ≤ 𝑡0 and (𝑡0/𝑡)^𝛼 if 𝑡 > 𝑡0

As a result, with 𝑠 > 𝑡0,

P{𝑌 > 𝑠 + 𝑡 | 𝑌 > 𝑠} = P{𝑌 > 𝑠 + 𝑡} / P{𝑌 > 𝑠} = (𝑠 / (𝑠 + 𝑡))^𝛼

Since this probability rises with 𝑠, the distribution is not memoryless.

If we have waited many hours for an event (i.e., 𝑠 is large), then the probability of waiting at least one more hour is relatively large.
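We can tabulate this conditional probability at a few values of 𝑠 (a small illustrative check; the parameter values are arbitrary):

t0, α, t = 1.0, 2.0, 1.0
for s in 2.0, 5.0, 20.0:
    # P{Y > s + t | Y > s} = (s / (s + t))**α, valid for s > t0
    print(f"s = {s:5.1f}: {(s / (s + t))**α:.4f}")

The output rises toward 1 as 𝑠 grows, in contrast with the constant conditional probability of the exponential case.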


1.4 Sums of Exponentials

A random variable 𝑊 on R+ is said to have the Erlang distribution if its density has the form

𝑓(𝑡) = 𝜆^𝑛 𝑡^{𝑛−1} 𝑒^{−𝜆𝑡} / (𝑛 − 1)!    (𝑡 ≥ 0)

for some 𝑛 ∈ N and 𝜆 > 0.


The parameters 𝑛 and 𝜆 are called the shape and rate parameters respectively.
The next figure shows the shape for two parameterizations.

The CDF of the Erlang distribution is

𝐹(𝑡) = P{𝑊 ≤ 𝑡} = 1 − ∑_{𝑘=0}^{𝑛−1} (𝜆𝑡)^𝑘 𝑒^{−𝜆𝑡} / 𝑘!    (1.5)

The Erlang distribution is of interest to us because of the following fact.

Lemma 1.1 (Distribution of Sum of Exponentials)


If, for some 𝜆 > 0, the sequence (𝑊𝑖) is IID and exponentially distributed with rate 𝜆, then 𝐽𝑛 ∶= ∑_{𝑖=1}^{𝑛} 𝑊𝑖 has the
Erlang distribution with shape 𝑛 and rate 𝜆.

This connects to Poisson process theory, as we shall soon see.
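Lemma 1.1 is easy to check by simulation. Here is one sketch (the shape, rate and sample size are arbitrary choices) comparing the empirical CDF of 𝐽𝑛 with (1.5):

n, λ = 5, 0.5
np.random.seed(0)
J_draws = np.random.exponential(scale=1/λ, size=(100_000, n)).sum(axis=1)

def erlang_cdf(t):
    "Erlang CDF with shape n and rate λ, via (1.5)."
    k = np.arange(n)
    return 1 - np.sum((λ * t)**k / factorial(k) * np.exp(-λ * t))

for t in 5.0, 10.0, 20.0:
    print(f"t = {t:5.1f}: empirical = {np.mean(J_draws <= t):.4f}, "
          f"Erlang = {erlang_cdf(t):.4f}")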


1.5 Exercises

Exercise 1.1
Due to its memoryless property, we can “stop” and “restart” an exponential draw without changing its distribution.
To illustrate this, we can think of fixing 𝜆 > 0, drawing from Exp(𝜆), and stopping and restarting whenever a threshold
𝑠 is crossed.
In particular, consider the random variable 𝑋 defined as follows:
• Draw 𝑌 from Exp(𝜆).
• If 𝑌 ≤ 𝑠, set 𝑋 = 𝑌 .
• If not, draw 𝑍 independently from Exp(𝜆) and set 𝑋 = 𝑠 + 𝑍.
Show that 𝑋 ∼ Exp(𝜆).

Exercise 1.2
Fix 𝜆 = 0.5 and 𝑠 = 1.0.
Simulate 1,000 draws of 𝑋 using the algorithm above.
Plot the fraction of the sample exceeding 𝑡 for each 𝑡 ≥ 0 (on a grid) and compare to 𝑡 ↦ 𝑒^{−𝜆𝑡}.
Is the fit good? How about if the number of draws is increased?
Are the results in line with those of the previous exercise?

1.6 Solutions

Note: code is currently not supported in sphinx-exercise so code-cell solutions are immediately after this solution
block.

Solution to Exercise 1.1


Let 𝑋 be constructed as in the statement of the exercise and fix 𝑡 > 0.
Notice that 𝑋 > 𝑠 + 𝑡 if and only if 𝑌 > 𝑠 and 𝑍 > 𝑡.
As a result of this fact and independence,

P{𝑋 > 𝑠 + 𝑡} = P{𝑌 > 𝑠}P{𝑍 > 𝑡} = 𝑒^{−𝜆(𝑠+𝑡)}

At the same time, for 0 < 𝑡 ≤ 𝑠, we have 𝑋 > 𝑠 − 𝑡 if and only if 𝑌 > 𝑠 − 𝑡, so

P{𝑋 > 𝑠 − 𝑡} = P{𝑌 > 𝑠 − 𝑡} = 𝑒^{−𝜆(𝑠−𝑡)}

Either way, we have 𝑋 ∼ Exp(𝜆), as was to be shown.


Solution to Exercise 1.2


Here’s one solution, starting with 1,000 draws.

λ = 0.5
np.random.seed(1234)
t_grid = np.linspace(0, 10, 200)

@njit
def draw_X(s=1.0, n=1_000):
    draws = np.empty(n)
    for i in range(n):
        Y = np.random.exponential(scale=1/λ)
        if Y <= s:
            X = Y
        else:
            Z = np.random.exponential(scale=1/λ)
            X = s + Z
        draws[i] = X
    return draws

fig, ax = plt.subplots()
draws = draw_X()
empirical_exceedance = [np.mean(draws > t) for t in t_grid]
ax.plot(t_grid, np.exp(- λ * t_grid), label='exponential exceedance')
ax.plot(t_grid, empirical_exceedance, label='empirical exceedance')
ax.legend()

plt.show()

Solution to Exercise 1.2 (continued)


The fit is already very close, which matches the theory in Exercise 1.1.
The two lines become indistinguishable as 𝑛 is increased further.

fig, ax = plt.subplots()
draws = draw_X(n=10_000)
empirical_exceedance = [np.mean(draws > t) for t in t_grid]
ax.plot(t_grid, np.exp(- λ * t_grid), label='exponential exceedance')
ax.plot(t_grid, empirical_exceedance, label='empirical exceedance')
ax.legend()

plt.show()


CHAPTER

TWO

POISSON PROCESSES

2.1 Overview

Counting processes count the number of “arrivals” occurring by a given time (e.g., the number of visitors to a website,
the number of customers arriving at a restaurant, etc.)
Counting processes become Poisson processes when the time intervals between arrivals are IID and exponentially distributed.
Exponential distributions and Poisson processes have deep connections to continuous time Markov chains.
For example, Poisson processes are one of the simplest nontrivial examples of a continuous time Markov chain.
In addition, when continuous time Markov chains jump between states, the time between jumps is necessarily exponen-
tially distributed.
In discussing Poisson processes, we will use the following imports:

import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.special import factorial, binom

2.2 Counting Processes

Let’s start with the general case of an arbitrary counting process.

2.2.1 Jumps and Counts

Let (𝐽𝑘 ) be an increasing sequence of nonnegative random variables satisfying 𝐽𝑘 → ∞ with probability one.
For example, 𝐽𝑘 might be the time the 𝑘-th customer arrives at a shop.
Then

𝑁𝑡 ∶= ∑_{𝑘≥0} 𝑘 𝟙{𝐽𝑘 ≤ 𝑡 < 𝐽𝑘+1}    (2.1)

is the number of customers that have visited by time 𝑡.


The next figure illustrates the definition of 𝑁𝑡 for a given jump sequence {𝐽𝑘}.


An alternative but equivalent definition is

𝑁𝑡 ∶= max{𝑘 ≥ 0 | 𝐽𝑘 ≤ 𝑡}

As a function of 𝑡, the process 𝑁𝑡 is called a counting process.


The jump times (𝐽𝑘 ) are sometimes called arrival times and the intervals 𝐽𝑘 − 𝐽𝑘−1 are called wait times or holding
times.
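In code, the alternative definition above is convenient: 𝑁𝑡 can be computed from an array of jump times with a binary search. Here is a minimal sketch (the jump times below are arbitrary example values):

J = np.array([0.0, 0.8, 1.5, 3.2, 4.9])   # example jump times with J_0 = 0

def N(t):
    "The largest k such that J_k <= t."
    return np.searchsorted(J, t, side='right') - 1

print([N(t) for t in (0.5, 1.0, 4.0)])    # -> [0, 1, 3]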

2.2.2 Exponential Holding Times

A Poisson process is a counting process with independent exponential holding times.


In particular, suppose that the arrival times are given by 𝐽0 = 0 and

𝐽𝑘 ∶= 𝑊1 + ⋯ + 𝑊𝑘

where (𝑊𝑖 ) are IID exponential with some fixed rate 𝜆.


Then the associated counting process (𝑁𝑡 ) is called a Poisson process with rate 𝜆.
The rationale behind the name is that, for each 𝑡 > 0, the random variable 𝑁𝑡 has the Poisson distribution with parameter
𝑡𝜆.
In other words,

P{𝑁𝑡 = 𝑘} = 𝑒^{−𝑡𝜆} (𝑡𝜆)^𝑘 / 𝑘!    (𝑘 = 0, 1, …)    (2.2)

For example, since 𝑁𝑡 = 0 if and only if 𝑊1 > 𝑡, we have

P{𝑁𝑡 = 0} = P{𝑊1 > 𝑡} = 𝑒^{−𝑡𝜆}


and the right hand side agrees with (2.2) when 𝑘 = 0.


This sets up a proof by induction, which is time consuming but not difficult — the details can be found in §29 of [Howard,
2017].
Another way to show that 𝑁𝑡 is Poisson with rate 𝜆 is to appeal to Lemma 1.1.
We observe that

P{𝑁𝑡 ≤ 𝑛} = P{𝐽𝑛+1 > 𝑡} = 1 − P{𝐽𝑛+1 ≤ 𝑡}


Inserting the expression for the Erlang CDF in (1.5) with shape 𝑛 + 1 and rate 𝜆, we obtain

P{𝑁𝑡 ≤ 𝑛} = ∑_{𝑘=0}^{𝑛} (𝑡𝜆)^𝑘 𝑒^{−𝑡𝜆} / 𝑘!
An exercise at the end of the lecture asks you to verify that 𝑁𝑡 is Poisson-(𝑡𝜆) informally via simulation.
The next figure shows one realization of a Poisson process (𝑁𝑡 ), with jumps at each new arrival.

2.3 Stationary Independent Increments

One of the defining features of a Poisson process is that it has stationary and independent increments.
This is due to the memoryless property of exponentials.
It means that
1. the variables {𝑁_{𝑡_{𝑖+1}} − 𝑁_{𝑡_𝑖}}_{𝑖∈𝐼} are independent for any strictly increasing finite sequence (𝑡𝑖)_{𝑖∈𝐼} and
2. the distribution of 𝑁_{𝑡+ℎ} − 𝑁𝑡 depends on ℎ but not 𝑡.
A detailed proof can be found in Theorem 2.4.3 of [Norris, 1998].
Instead of repeating this, we provide some intuition from a discrete approximation.


In the discussion below, we use the following well known fact: If (𝜃𝑛 ) is a sequence such that 𝑛𝜃𝑛 converges, then

Binomial(𝑛, 𝜃𝑛 ) ≈ Poisson(𝑛𝜃𝑛 ) for large 𝑛 (2.3)

(The exercises ask you to examine this claim visually.)


We now return to the environment where we linked the geometric distribution to the exponential.
That is, we fix small ℎ > 0 and let 𝑡𝑖 ∶= 𝑖ℎ for all 𝑖 ∈ Z+ .
Let (𝑉𝑖 ) be IID binary random variables with P{𝑉𝑖 = 1} = ℎ𝜆 for some 𝜆 > 0.
Linking to our previous discussion,
• either one or zero customers visits a shop at each 𝑡𝑖 .
• 𝑉𝑖 = 1 means that a customer visits at time 𝑡𝑖 .
• Visits occur with probability ℎ𝜆, which is proportional to the length of the interval between grid points.
We learned that the wait time until the first visit is approximately exponential with rate 𝜆.
Since (𝑉𝑖 ) is IID, the same is true for the second wait time and so on.
Moreover, these wait times are independent, since they depend on separate subsets of (𝑉𝑖 ).
Let 𝑁𝑡̂ count the number of visits by time 𝑡, as shown in the next figure.
(𝑉𝑖 = 1 is indicated by a vertical line at 𝑡𝑖 = 𝑖ℎ.)

We expect from the discussion above that (𝑁𝑡̂ ) approximates a Poisson process.
This intuition is correct because, fixing 𝑡, letting 𝑘 ∶= max{𝑖 ∈ Z+ ∶ 𝑡𝑖 ≤ 𝑡} and applying (2.3), we have

𝑁̂𝑡 = ∑_{𝑖=1}^{𝑘} 𝑉𝑖 ∼ Binomial(𝑘, ℎ𝜆) ≈ Poisson(𝑘ℎ𝜆)

Using the fact that 𝑘ℎ = 𝑡𝑘 ≈ 𝑡 as ℎ → 0, we see that 𝑁𝑡̂ is approximately Poisson with rate 𝑡𝜆, just as we expected.
This approximate construction of a Poisson process helps illustrate the property of stationary independent increments.


For example, if we fix 𝑠, 𝑡, then 𝑁̂_{𝑠+𝑡} − 𝑁̂𝑠 is the number of visits between 𝑠 and 𝑠 + 𝑡, so that

𝑁̂_{𝑠+𝑡} − 𝑁̂𝑠 = ∑_𝑖 𝑉𝑖 𝟙{𝑠 ≤ 𝑡𝑖 < 𝑠 + 𝑡}

Suppose there are 𝑘 grid points between 𝑠 and 𝑠 + 𝑡, so that 𝑡 ≈ 𝑘ℎ.

Then

𝑁̂_{𝑠+𝑡} − 𝑁̂𝑠 ∼ Binomial(𝑘, ℎ𝜆) ≈ Poisson(𝑘ℎ𝜆) ≈ Poisson(𝑡𝜆)

This illustrates the idea that, for a Poisson process (𝑁𝑡 ), we have

𝑁𝑠+𝑡 − 𝑁𝑠 ∼ Poisson(𝑡𝜆)

In particular, increments are stationary (the distribution depends on 𝑡 but not 𝑠).
The approximation also illustrates independence of increments, since, in the approximation, increments depend on sep-
arate subsets of (𝑉𝑖 ).

2.4 Uniqueness

What other counting processes have stationary independent increments?


Remarkably, the answer is none:

Theorem 2.1 (Characterization of Poisson Processes)


If (𝑀𝑡 ) is a stochastic process supported on Z+ and starting at 0 with the property that its increments are stationary and
independent, then (𝑀𝑡 ) is a Poisson process.

In particular, there exists a 𝜆 > 0 such that

𝑀𝑠+𝑡 − 𝑀𝑠 ∼ Poisson(𝑡𝜆)

for any 𝑠, 𝑡.
The proof is similar to our earlier proof that the exponential distribution is the only memoryless distribution.
Details can be found in Section 6.2 of [Pardoux, 2008] or Theorem 2.4.3 of [Norris, 1998].

2.4.1 The Restarting Property

An important consequence of stationary independent increments is the restarting property, which means that, when sim-
ulating, we can freely stop and restart a Poisson process at any time:

Theorem 2.2 (Poisson Processes can be Paused and Restarted)


If (𝑁𝑡 ) is a Poisson process, 𝑠 > 0 and (𝑀𝑡 ) is defined by 𝑀𝑡 = 𝑁𝑠+𝑡 − 𝑁𝑠 for 𝑡 ≥ 0, then (𝑀𝑡 ) is a Poisson process
independent of (𝑁𝑟 )𝑟≤𝑠 .

Proof. Independence of (𝑀𝑡) and (𝑁𝑟)_{𝑟≤𝑠} follows from independence of the increments of (𝑁𝑡).


In view of the uniqueness statement above, we can verify that (𝑀𝑡 ) is a Poisson process by showing that (𝑀𝑡 ) starts at
zero, takes values in Z+ and has stationary independent increments.
It is clear that (𝑀𝑡 ) starts at zero and takes values in Z+ .
In addition, if we take any 𝑡 < 𝑡′ , then

𝑀𝑡′ − 𝑀𝑡 = 𝑁𝑠+𝑡′ − 𝑁𝑠+𝑡 ∼ Poisson((𝑡′ − 𝑡)𝜆)

Hence (𝑀𝑡 ) has stationary increments and, using the relation 𝑀𝑡′ − 𝑀𝑡 = 𝑁𝑠+𝑡′ − 𝑁𝑠+𝑡 again, the increments are
independent as well.
We conclude that (𝑁𝑠+𝑡 − 𝑁𝑠 )𝑡≥0 is indeed a Poisson process independent of (𝑁𝑟 )𝑟≤𝑠 .
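As an informal complement to the proof, the following simulation sketch (with arbitrary parameter values) draws 𝑁_{𝑠+𝑡} − 𝑁𝑠 repeatedly along simulated paths and compares the sample mean and variance with those of Poisson(𝑡𝜆):

λ, s, t = 0.5, 3.0, 10.0
np.random.seed(42)

def increment_draw():
    "One draw of N_{s+t} - N_s along a simulated path."
    J, count = 0.0, 0
    while J <= s + t:
        J += np.random.exponential(scale=1/λ)
        if s < J <= s + t:
            count += 1
    return count

draws = np.array([increment_draw() for _ in range(10_000)])
print(draws.mean(), draws.var(), t * λ)   # both moments ≈ tλ = 5.0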

2.5 Exercises

Exercise 2.1
Fix 𝜆 > 0 and draw {𝑊𝑖 } as IID exponentials with rate 𝜆.
Set 𝐽𝑛 ∶= 𝑊1 + ⋯ + 𝑊𝑛 with 𝐽0 = 0 and 𝑁𝑡 ∶= ∑_{𝑛≥0} 𝑛 𝟙{𝐽𝑛 ≤ 𝑡 < 𝐽𝑛+1}.
Provide a visual test of the claim that 𝑁𝑡 is Poisson with parameter 𝑡𝜆.
Do this by fixing 𝑡 = 𝑇 , generating many independent draws of 𝑁𝑇 and comparing the empirical distribution of the
sample with a Poisson distribution with rate 𝑇 𝜆.
Try first with 𝜆 = 0.5 and 𝑇 = 10.

Exercise 2.2
In the lecture we used the fact that Binomial(𝑛, 𝜃) ≈ Poisson(𝑛𝜃) when 𝑛 is large and 𝜃 is small.
Investigate this relationship by plotting the distributions side by side.
Experiment with different values of 𝑛 and 𝜃.

2.6 Solutions

Note: code is currently not supported in sphinx-exercise so code-cell solutions are immediately after this solution
block.

Solution to Exercise 2.1


Here is one solution.
The figure shows that the fit is already good with a modest sample size.
Increasing the sample size will further improve the fit.


λ = 0.5
T = 10

def poisson(k, r):
    "Poisson pmf with rate r."
    return np.exp(-r) * (r**k) / factorial(k)

@njit
def draw_Nt(max_iter=1e5):
    J = 0
    n = 0
    while n < max_iter:
        W = np.random.exponential(scale=1/λ)
        J += W
        if J > T:
            return n
        n += 1

@njit
def draw_Nt_sample(num_draws):
    draws = np.empty(num_draws)
    for i in range(num_draws):
        draws[i] = draw_Nt()
    return draws

sample_size = 10_000
sample = draw_Nt_sample(sample_size)
max_val = sample.max()
vals = np.arange(0, max_val+1)

fig, ax = plt.subplots()

ax.plot(vals, [poisson(v, T * λ) for v in vals],
        marker='o', label='poisson')
ax.plot(vals, [np.mean(sample==v) for v in vals],
        marker='o', label='empirical')

ax.legend(fontsize=12)
plt.show()


Solution to Exercise 2.2


Here is one solution. It shows that the approximation is good when 𝑛 is large and 𝜃 is small.

def binomial(k, n, p):
    "Binomial(n, p) pmf evaluated at k."
    return binom(n, k) * p**k * (1-p)**(n-k)

θ_vals = 0.5, 0.2, 0.1
n_vals = 50, 75, 100

fig, axes = plt.subplots(len(n_vals), 1, figsize=(6, 12))

for n, θ, ax in zip(n_vals, θ_vals, axes.flatten()):
    k_grid = np.arange(n)
    binom_vals = [binomial(k, n, θ) for k in k_grid]
    poisson_vals = [poisson(k, n * θ) for k in k_grid]
    ax.plot(k_grid, binom_vals, 'o-', alpha=0.5, label='binomial')
    ax.plot(k_grid, poisson_vals, 'o-', alpha=0.5, label='Poisson')
    ax.set_title(f'$n={n}$ and $\\theta = {θ}$')
    ax.legend(fontsize=12)

fig.tight_layout()
plt.show()



CHAPTER

THREE

THE MARKOV PROPERTY

3.1 Overview

A continuous time stochastic process is said to have the Markov property if its past and future are independent given the
current state.
(A more formal definition is provided below.)
As we will see, the Markov property imposes a large amount of structure on continuous time processes.
This structure leads to elegant and powerful results on evolution and dynamics.
At the same time, the Markov property is general enough to cover many applied problems, as described in the introduction.

3.1.1 Setting

In this lecture, the state space where dynamics evolve will be a countable set, denoted henceforth by 𝑆, with typical
elements 𝑥, 𝑦.
(Note that “countable” is understood to include finite.)
Regarding notation, in what follows, ∑𝑥∈𝑆 is abbreviated to ∑𝑥 , the supremum sup𝑥∈𝑆 is abbreviated to sup𝑥 and so
on.
A distribution on 𝑆 is a function 𝜙 from 𝑆 to R+ with ∑𝑥 𝜙(𝑥) = 1.
Let 𝒟 denote the set of all distributions on 𝑆.
To economize on terminology, we define a matrix 𝐴 on 𝑆 to be a map from 𝑆 × 𝑆 to R.
When 𝑆 is finite, this reduces to the usual notion of a matrix, and, whenever you see expressions such as 𝐴(𝑥, 𝑦) below,
you can mentally identify them with more familiar matrix notation, such as 𝐴𝑖𝑗 , if you wish.
The product of two matrices 𝐴 and 𝐵 is defined by

(𝐴𝐵)(𝑥, 𝑦) = ∑_𝑧 𝐴(𝑥, 𝑧)𝐵(𝑧, 𝑦)    ((𝑥, 𝑦) ∈ 𝑆 × 𝑆)    (3.1)

If 𝑆 is finite, then this is just ordinary matrix multiplication.


In statements involving matrix algebra, we always treat distributions as row vectors, so that, for 𝜙 ∈ 𝒟 and given matrix 𝐴,

(𝜙𝐴)(𝑦) = ∑_𝑥 𝜙(𝑥)𝐴(𝑥, 𝑦)

We will use the following imports


import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.linalg import expm
from scipy.stats import binom
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D

3.2 Markov Processes

We now introduce the definition of Markov processes, first reviewing the discrete case and then shifting to continuous
time.

3.2.1 Discrete Time, Finite State

The simplest Markov processes are those with a discrete time parameter and finite state space.
Assume for now that 𝑆 has 𝑛 elements and let 𝑃 be a Markov matrix, which means that 𝑃(𝑥, 𝑦) ≥ 0 and ∑_𝑦 𝑃(𝑥, 𝑦) = 1 for all 𝑥.
In applications, 𝑃 (𝑥, 𝑦) represents the probability of transitioning from 𝑥 to 𝑦 in one step.
A Markov chain (𝑋𝑡 )𝑡∈Z+ on 𝑆 with Markov matrix 𝑃 is a sequence of random variables satisfying

P{𝑋𝑡+1 = 𝑦 | 𝑋0 , 𝑋1 , … , 𝑋𝑡 } = 𝑃 (𝑋𝑡 , 𝑦) (3.2)

with probability one for all 𝑦 ∈ 𝑆 and any 𝑡 ∈ Z+ .


In addition to connecting probabilities to the Markov matrix, (3.2) says that the process depends on its history only through
the current state.
We recall that, if 𝑋𝑡 has distribution 𝜙, then 𝑋𝑡+1 has distribution 𝜙𝑃 .
Since 𝜙 is understood as a row vector, the meaning is

(𝜙𝑃)(𝑦) = ∑_𝑥 𝜙(𝑥)𝑃(𝑥, 𝑦)    (𝑦 ∈ 𝑆)    (3.3)
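In NumPy, this row-vector convention is just premultiplication by a one-dimensional array. A trivial sketch with an arbitrary two-state Markov matrix:

ϕ = np.array([0.2, 0.8])        # a distribution, treated as a row vector
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])      # an arbitrary Markov matrix
print(ϕ @ P)                    # the distribution of X_{t+1}, as in (3.3)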

The Joint Distribution

In general, for given Markov matrix 𝑃 , there can be many Markov chains (𝑋𝑡 ) that satisfy (3.2).
This is due to the more general observation that, for a given distribution 𝜙, we can construct many random variables
having distribution 𝜙.
(The exercises below ask for one example.)
Hence 𝑃 is, in a sense, a more primitive object than (𝑋𝑡 ).
There is another way to see the fundamental importance of 𝑃 , which is by constructing the joint distribution of (𝑋𝑡 )
from 𝑃 .


Let 𝑆^∞ represent the space of 𝑆-valued sequences (𝑥0, 𝑥1, 𝑥2, …).


Fix an initial condition 𝜓 ∈ 𝒟 and a Markov matrix 𝑃 on 𝑆.
The joint distribution of a Markov chain (𝑋𝑡) satisfying (3.2) and 𝑋0 ∼ 𝜓 is the distribution P𝜓 over 𝑆^∞ such that

P{𝑋_{𝑡1} = 𝑦1, … , 𝑋_{𝑡𝑚} = 𝑦𝑚} = P𝜓{(𝑥𝑡) ∈ 𝑆^∞ ∶ 𝑥_{𝑡𝑖} = 𝑦𝑖 for 𝑖 = 1, … , 𝑚}    (3.4)

for any 𝑚 positive integers 𝑡𝑖 and 𝑚 elements 𝑦𝑖 of the state space 𝑆.


(Joint distributions of discrete time processes are uniquely defined by their values at finite collections of times — see, for
example, Theorem 7.2 of [Walsh, 2012].)
We can construct P𝜓 by first defining P^𝑛_𝜓 over the finite Cartesian product 𝑆^{𝑛+1} via

P^𝑛_𝜓(𝑥0, 𝑥1, … , 𝑥𝑛) = 𝜓(𝑥0)𝑃(𝑥0, 𝑥1) × ⋯ × 𝑃(𝑥𝑛−1, 𝑥𝑛)    (3.5)

For any Markov chain (𝑋𝑡) satisfying (3.2) and 𝑋0 ∼ 𝜓, the restriction (𝑋0, … , 𝑋𝑛) has joint distribution P^𝑛_𝜓.
This is a solved exercise below.
The last step is to show that the family (P^𝑛_𝜓) defined at each 𝑛 ∈ N extends uniquely to a distribution P𝜓 over the infinite sequences in 𝑆^∞.
That this is true follows from a well known theorem of Kolmogorov.
Hence 𝑃 defines the joint distribution P𝜓 when paired with any initial condition 𝜓.

3.2.2 Extending to Infinite (Countable) State Spaces

When 𝑆 is infinite, the same idea carries through.


Consistent with the finite case, a Markov matrix is a map 𝑃 from 𝑆 × 𝑆 to R+ satisfying

∑_𝑦 𝑃(𝑥, 𝑦) = 1 for all 𝑥 ∈ 𝑆

The definition of a Markov chain (𝑋𝑡 )𝑡∈Z+ on 𝑆 with Markov matrix 𝑃 is exactly as in (3.2).
Given Markov matrix 𝑃 and 𝜙 ∈ 𝒟, we define 𝜙𝑃 by (3.3).
Then, as before, 𝜙𝑃 can be understood as the distribution of 𝑋𝑡+1 when 𝑋𝑡 has distribution 𝜙.
The function 𝜙𝑃 is in 𝒟, since, by (3.3), it is nonnegative and

∑_𝑦 (𝜙𝑃)(𝑦) = ∑_𝑦 ∑_𝑥 𝑃(𝑥, 𝑦)𝜙(𝑥) = ∑_𝑥 ∑_𝑦 𝑃(𝑥, 𝑦)𝜙(𝑥) = ∑_𝑥 𝜙(𝑥) = 1

(Swapping the order of infinite sums is justified here by the fact that all elements are nonnegative — a version of Tonelli’s
theorem).
If 𝑃 and 𝑄 are Markov matrices on 𝑆, then, using the definition in (3.1),

(𝑃𝑄)(𝑥, 𝑦) ∶= ∑_𝑧 𝑃(𝑥, 𝑧)𝑄(𝑧, 𝑦)

It is not difficult to check that 𝑃 𝑄 is again a Markov matrix on 𝑆.


The elements of 𝑃^𝑘, the 𝑘-th product of 𝑃 with itself, give 𝑘-step transition probabilities.
For example, we have

𝑃^𝑘(𝑥, 𝑦) = (𝑃^{𝑘−𝑗} 𝑃^𝑗)(𝑥, 𝑦) = ∑_𝑧 𝑃^{𝑘−𝑗}(𝑥, 𝑧)𝑃^𝑗(𝑧, 𝑦)    (3.6)


which is a version of the (discrete time) Chapman-Kolmogorov equation.


Equation (3.6) can be obtained from the law of total probability: if (𝑋𝑡) is a Markov chain with Markov matrix 𝑃 and
initial condition 𝑋0 = 𝑥, then

P{𝑋𝑘 = 𝑦} = ∑_𝑧 P{𝑋𝑘 = 𝑦 | 𝑋𝑗 = 𝑧} P{𝑋𝑗 = 𝑧}
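Here is a quick numerical confirmation of (3.6), using a randomly generated Markov matrix (the matrix size, powers and seed are arbitrary choices):

rng = np.random.default_rng(0)
P = rng.random((4, 4))
P = P / P.sum(axis=1, keepdims=True)    # normalize rows to get a Markov matrix

k, j = 5, 2
lhs = np.linalg.matrix_power(P, k)
rhs = np.linalg.matrix_power(P, k - j) @ np.linalg.matrix_power(P, j)
print(np.allclose(lhs, rhs))            # True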

All of the preceding discussion on the connection between 𝑃 and the joint distribution of (𝑋𝑡 ) when 𝑆 is finite carries
over to the current setting.

3.2.3 The Continuous Time Case

A continuous time stochastic process on 𝑆 is a collection (𝑋𝑡 ) of 𝑆-valued random variables 𝑋𝑡 defined on a common
probability space and indexed by 𝑡 ∈ R+ .
Let 𝐼 be the Markov matrix on 𝑆 defined by 𝐼(𝑥, 𝑦) = 𝟙{𝑥 = 𝑦}.
A Markov semigroup is a family (𝑃𝑡 ) of Markov matrices on 𝑆 satisfying
1. 𝑃0 = 𝐼,
2. lim𝑡→0 𝑃𝑡 (𝑥, 𝑦) = 𝐼(𝑥, 𝑦) for all 𝑥, 𝑦 in 𝑆, and
3. the semigroup property 𝑃𝑠+𝑡 = 𝑃𝑠 𝑃𝑡 for all 𝑠, 𝑡 ≥ 0.
The interpretation of 𝑃𝑡 (𝑥, 𝑦) is the probability of moving from state 𝑥 to state 𝑦 in 𝑡 units of time.
As such it is natural that 𝑃0 (𝑥, 𝑦) = 1 if 𝑥 = 𝑦 and zero otherwise, which is condition 1.
Condition 2 is continuity with respect to 𝑡, which might seem restrictive but it is in fact very mild.
For all practical applications, probabilities do not jump — although the chain (𝑋𝑡 ) itself can of course jump from state
to state as time goes by.1
The semigroup property in condition 3 is nothing more than a continuous time version of the Chapman-Kolmogorov
equation.
This becomes clearer if we write it more explicitly as

𝑃_{𝑠+𝑡}(𝑥, 𝑦) = ∑_𝑧 𝑃𝑠(𝑥, 𝑧)𝑃𝑡(𝑧, 𝑦)    (3.7)

A stochastic process (𝑋𝑡 ) is called a (time homogeneous) continuous time Markov chain on 𝑆 with Markov semigroup
(𝑃𝑡 ) if

P{𝑋𝑠+𝑡 = 𝑦 | ℱ𝑠 } = 𝑃𝑡 (𝑋𝑠 , 𝑦) (3.8)

with probability one for all 𝑦 ∈ 𝑆 and 𝑠, 𝑡 ≥ 0.


Here ℱ𝑠 is the history (𝑋𝑟 )𝑟≤𝑠 of the process up until time 𝑠.
If you are an economist you might call ℱ𝑠 the “information set” at time 𝑠.
If you are familiar with measure theory, you can understand ℱ𝑠 as the 𝜎-algebra generated by (𝑋𝑟 )𝑟≤𝑠 .
Analogous to the discrete time case, the joint distribution of (𝑋𝑡 ) is determined by its Markov semigroup plus an initial
condition.
This distribution is defined over the set of all right continuous functions R+ ∋ 𝑡 ↦ 𝑥𝑡 ∈ 𝑆, which we call 𝑟𝑐𝑆.
1 On a technical level, right continuity of paths for (𝑋𝑡) implies condition 2, as proved in Theorem 2.12 of [Liggett, 2010]. Right continuity of paths allows for jumps, but insists on only finitely many jumps in any bounded interval.


Next one builds finite dimensional distributions over 𝑟𝑐𝑆 using expressions similar to (3.5).
Finally, the Kolmogorov extension theorem is applied, similar to the discrete time case.
Corollary 6.4 of [Le Gall, 2016] provides full details.

3.2.4 The Canonical Chain

Given a Markov semigroup (𝑃𝑡 ) on 𝑆, does there always exist a continuous time Markov chain (𝑋𝑡 ) such that (3.8) holds?
The answer is affirmative.
To illustrate, pick any Markov semigroup (𝑃𝑡 ) on 𝑆 and fix initial condition 𝜓.
Next, create the corresponding joint distribution P𝜓 over 𝑟𝑐𝑆, as described above.
Now, for each 𝑡 ≥ 0, let 𝜋𝑡 be the time 𝑡 projection on 𝑟𝑐𝑆, which maps any right continuous function (𝑥𝜏 ) into its time
𝑡 value 𝑥𝑡 .
Finally, let 𝑋𝑡 be an 𝑆-valued function on 𝑟𝑐𝑆 defined at (𝑥𝜏 ) ∈ 𝑟𝑐𝑆 by 𝜋𝑡 ((𝑥𝜏 )).
In other words, after P𝜓 picks out some time path (𝑥𝜏 ) ∈ 𝑟𝑐𝑆, the Markov chain (𝑋𝑡 ) simply reports this time path.
Hence (𝑋𝑡 ) automatically has the correct distribution.
The chain (𝑋𝑡 ) constructed in this way is called the canonical chain for the semigroup (𝑃𝑡 ) and initial condition 𝜓.

3.2.5 Simulation and Probabilistic Constructions

While we have answered the existence question in the affirmative, the canonical construction is quite abstract.
Moreover, there is little information about how we might simulate such a chain.
Fortunately, it turns out that there are more concrete ways to build continuous time Markov chains from the objects that
describe their distributions.
We will learn about these in a later lecture.

3.3 Implications of the Markov Property

The Markov property carries some strong implications that are not immediately obvious.
Let’s take some time to explore them.

3.3.1 Example: Failure of the Markov Property

Let’s look at how the Markov property can fail, via an intuitive rather than formal discussion.
Let (𝑋𝑡 ) be a continuous time stochastic process with state space 𝑆 = {0, 1}.
The process starts at 0 and updates as follows:
1. Draw 𝑊 independently from a fixed Pareto distribution.
2. Hold (𝑋𝑡 ) in its current state for 𝑊 units of time and then switch to the other state.
3. Go to step 1.


What is the probability that 𝑋𝑠+ℎ = 𝑖 given both the history (𝑋𝑟 )𝑟≤𝑠 and current information 𝑋𝑠 = 𝑖?
If ℎ is small, then this is close to the probability that there are zero switches over the time interval (𝑠, 𝑠 + ℎ].
To calculate this probability, it would be helpful to know how long the state has been at current state 𝑖.
This is because the Pareto distribution is not memoryless.
(With a Pareto distribution, if we know that 𝑋𝑡 has been at 𝑖 for a long time, then a switch in the near future becomes
less likely.)
As a result, the history prior to 𝑋𝑠 is useful for predicting 𝑋𝑠+ℎ , even when we know 𝑋𝑠 .
Thus, the Markov property fails.

3.3.2 Restrictions Imposed by the Markov Property

From the discussion above, we see that, for continuous time Markov chains, the length of time between jumps must be
memoryless.
Recall that, by Theorem 1.1, the only memoryless distribution supported on R+ is the exponential distribution.
Hence, a continuous time Markov chain waits at states for an exponential amount of time and then jumps.
The way that the new state is chosen must also satisfy the Markov property, which adds another restriction.
In summary, we already understand the following about continuous time Markov chains:
1. Holding times are independent exponential draws.
2. New states are chosen in a “Markovian” way, independent of the past given the current state.
We just need to clarify the details in these steps to have a complete description.

3.4 Examples of Markov Processes

Let’s look at some examples of processes that possess the Markov property.

3.4.1 Example: Poisson Processes

The Poisson process discussed in our previous lecture is a Markov process on state space Z+ .
To obtain the Markov semigroup, we observe that, for 𝑘 ≥ 𝑗,

P{𝑁_{𝑠+𝑡} = 𝑘 | 𝑁𝑠 = 𝑗} = P{𝑁_{𝑠+𝑡} − 𝑁𝑠 = 𝑘 − 𝑗 | 𝑁𝑠 = 𝑗} = P{𝑁_{𝑠+𝑡} − 𝑁𝑠 = 𝑘 − 𝑗}

where the last step is due to independence of increments.
From stationarity of increments we have

P{𝑁_{𝑠+𝑡} − 𝑁𝑠 = 𝑘 − 𝑗} = P{𝑁𝑡 = 𝑘 − 𝑗} = 𝑒^{−𝜆𝑡} (𝜆𝑡)^{𝑘−𝑗} / (𝑘 − 𝑗)!

In summary, the Markov semigroup is

𝑃𝑡(𝑗, 𝑘) = 𝑒^{−𝜆𝑡} (𝜆𝑡)^{𝑘−𝑗} / (𝑘 − 𝑗)!    (3.9)

whenever 𝑗 ≤ 𝑘 and 𝑃𝑡(𝑗, 𝑘) = 0 otherwise.


This chain of equalities was obtained with 𝑁𝑠 = 𝑗 for arbitrary 𝑗, so we can replace 𝑗 with 𝑁𝑠 in (3.9) to verify the
Markov property (3.8) for the Poisson process.
Under (3.9), each 𝑃𝑡 is a Markov matrix and (𝑃𝑡 ) is a Markov semigroup.
The proof of the semigroup property is a solved exercise below.2

3.5 A Model of Inventory Dynamics

Let 𝑋𝑡 be the inventory of a firm at time 𝑡, taking values in the integers 0, 1, … , 𝑏.


If 𝑋𝑡 > 0, then a customer arrives after 𝑊 units of time, where 𝑊 ∼ Exp(𝜆) for some fixed 𝜆 > 0.
Upon arrival, each customer purchases min{𝑈, 𝑋𝑡} units, where 𝑈 is an IID draw from the geometric distribution started
at 1 rather than 0:

P{𝑈 = 𝑘} = (1 − 𝛼)^{𝑘−1} 𝛼    (𝑘 = 1, 2, …)

for some 𝛼 ∈ (0, 1).

If 𝑋𝑡 = 0, then no customers arrive and the firm places an order for 𝑏 units.
The order arrives after a delay of 𝐷 units of time, where 𝐷 ∼ Exp(𝜆).
(We use the same 𝜆 here just for convenience, to simplify the exposition.)

3.5.1 Representation

The inventory process jumps to a new value either when a new customer arrives or when new stock arrives.
Between these arrival times it is constant.
Hence, to track 𝑋𝑡 , it is enough to track the jump times and the new values taken at the jumps.
In what follows, we denote the jump times by {𝐽𝑘 } and the values at jumps by {𝑌𝑘 }.
Then we construct the state process via

𝑋𝑡 = ∑_{𝑘≥0} 𝑌𝑘 𝟙{𝐽𝑘 ≤ 𝑡 < 𝐽𝑘+1}    (𝑡 ≥ 0)    (3.10)

3.5.2 Simulation

Let’s simulate this process, starting at 𝑋0 = 0.


As above,
• 𝐽𝑘 is the time of the 𝑘-th jump (up or down) in inventory.
• 𝑌𝑘 is the size of the inventory after the 𝑘-th jump.
• (𝑋𝑡 ) is defined from these objects via (3.10).
Here’s a function that generates and returns one path 𝑡 ↦ 𝑋𝑡 .
(We are not aiming for computational efficiency at this stage.)

2 In the definition of 𝑃𝑡 in (3.9), we use the convention that 0^0 = 1, which leads to 𝑃0 = 𝐼 and lim_{𝑡→0} 𝑃𝑡(𝑗, 𝑘) = 𝐼(𝑗, 𝑘) for all 𝑗, 𝑘. These facts, along with the semigroup property, imply that (𝑃𝑡) is a valid Markov semigroup.


def sim_path(T=10, seed=123, λ=0.5, α=0.7, b=10):
    """
    Generate a path for inventory starting at b, up to time T.

    Return the path as a function X(t) constructed from (J_k) and (Y_k).
    """
    J, Y = 0, b
    J_vals, Y_vals = [J], [Y]
    np.random.seed(seed)

    while True:
        W = np.random.exponential(scale=1/λ)  # W ~ Exp(λ)
        J += W
        J_vals.append(J)
        if J >= T:
            break
        # Update Y
        if Y == 0:
            Y = b
        else:
            U = np.random.geometric(α)
            Y = Y - min(Y, U)
        Y_vals.append(Y)

    Y_vals = np.array(Y_vals)
    J_vals = np.array(J_vals)

    def X(t):
        if t == 0.0:
            return Y_vals[0]
        else:
            k = np.searchsorted(J_vals, t)
            return Y_vals[k-1]

    return X

Let’s plot the process (𝑋𝑡 ).

T = 20
X = sim_path(T=T)

grid = np.linspace(0, T, 100)

fig, ax = plt.subplots()
ax.step(grid, [X(t) for t in grid], label="$X_t$")

ax.set(xlabel="time", ylabel="inventory")
ax.legend()
plt.show()


As expected, inventory falls and then jumps back up to 𝑏.

3.5.3 The Embedded Jump Chain

In models such as the one described above, the embedded discrete time process (𝑌𝑘 ) is called the “embedded jump chain”.
It is easy to see that (𝑌𝑘) is a discrete time, finite state Markov chain.
Its Markov matrix 𝐾 is given by 𝐾(𝑥, 𝑦) = 𝟙{𝑦 = 𝑏} when 𝑥 = 0 and, when 0 < 𝑥 ≤ 𝑏,

𝐾(𝑥, 𝑦) = 0 if 𝑦 ≥ 𝑥
𝐾(𝑥, 𝑦) = P{𝑥 − 𝑈 = 𝑦} = (1 − 𝛼)^{𝑥−𝑦−1} 𝛼 if 0 < 𝑦 < 𝑥    (3.11)
𝐾(𝑥, 𝑦) = P{𝑈 ≥ 𝑥} = (1 − 𝛼)^{𝑥−1} if 𝑦 = 0

3.5.4 Markov Property

The inventory model just described has the Markov property precisely because
1. the jump chain (𝑌𝑘 ) is Markov in discrete time and
2. the holding times are independent exponential draws.
Rather than providing more details on these points here, let us first describe a more general setting where the arguments
will be clearer and more useful.


3.6 Jump Processes with Constant Rates

The examples we have focused on so far are special cases of Markov processes with constant jump intensities.
These processes turn out to be very representative (although the constant jump intensity will later be relaxed).
Let’s now summarize the model and its properties.

3.6.1 Construction

The data for a Markov process on 𝑆 with constant jump rates are
• a parameter 𝜆 > 0 called the jump rate, which governs the jump intensities and
• a Markov matrix 𝐾 on 𝑆, called the jump matrix.
To run the process we also need an initial condition 𝜓 ∈ 𝒟.
The process (𝑋𝑡 ) is constructed by holding at each state for an exponential amount of time, with rate 𝜆, and then updating
to a new state via 𝐾.
In more detail, the construction is

Algorithm 3.1 (Constant Rate Jump Chain)


Inputs 𝜓 ∈ 𝒟, positive constant 𝜆, Markov matrix 𝐾
Outputs Markov chain (𝑋𝑡 )
1. draw 𝑌0 from 𝜓
2. set 𝑘 = 1 and 𝐽0 = 0
3. draw 𝑊𝑘 from Exp(𝜆) and set 𝐽𝑘 = 𝐽𝑘−1 + 𝑊𝑘
4. set 𝑋𝑡 = 𝑌𝑘−1 for all 𝑡 such that 𝐽𝑘−1 ≤ 𝑡 < 𝐽𝑘 .
5. draw 𝑌𝑘 from 𝐾(𝑌𝑘−1 , ⋅)
6. set 𝑘 = 𝑘 + 1 and go to step 3.

An alternative, more parsimonious way to express the same process is to take


• (𝑁𝑡 ) to be a Poisson process with rate 𝜆 and
• (𝑌𝑘 ) to be a discrete time Markov chain with Markov matrix 𝐾
and then set

𝑋𝑡 ∶= 𝑌𝑁𝑡 for all 𝑡 ≥ 0

As before, the discrete time process (𝑌𝑘 ) is called the embedded jump chain.
(Not to be confused with (𝑋𝑡 ), which is often called a “jump process” or “jump chain” due to the fact that it changes
states with jumps.)
The draws (𝑊𝑘 ) are called the wait times or holding times.
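The parsimonious representation 𝑋𝑡 = 𝑌_{𝑁𝑡} translates directly into code. The sketch below simulates the jump times and states of a constant rate jump process (the two-state jump matrix here is an arbitrary example, not from the lecture):

λ = 0.5
K = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # an arbitrary jump matrix on S = {0, 1}

def sim_jump_chain(T=20, x0=0, seed=0):
    "Return jump times (J_k) and states (Y_k) up to time T."
    np.random.seed(seed)
    J, Y = 0.0, x0
    J_vals, Y_vals = [J], [Y]
    while J < T:
        J += np.random.exponential(scale=1/λ)     # W_k ~ Exp(λ)
        Y = np.random.choice(2, p=K[Y])           # Y_k ~ K(Y_{k-1}, ·)
        J_vals.append(J)
        Y_vals.append(Y)
    return np.array(J_vals), np.array(Y_vals)

J_vals, Y_vals = sim_jump_chain()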


3.6.2 Examples

The Poisson process with rate 𝜆 is a jump process on 𝑆 = Z+ .


The holding times are obviously exponential with constant rate 𝜆.
The jump matrix is just 𝐾(𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 1}, so that the state jumps up by one at every 𝐽𝑘 .
The inventory model is also a jump process with constant rate 𝜆, this time on 𝑆 = {0, 1, … , 𝑏}.
The jump matrix was given in (3.11).

3.6.3 Markov Property

Let’s show that the jump process (𝑋𝑡 ) constructed above satisfies the Markov property, and obtain the Markov semigroup
at the same time.
We will use two facts:
• the jump chain (𝑌𝑘 ) has the Markov property in discrete time and
• the Poisson process has stationary independent increments.
From these facts it is intuitive that the distribution of 𝑋𝑡+𝑠 given the whole history ℱ𝑠 = {(𝑁𝑟 )𝑟≤𝑠 , (𝑌𝑘 )𝑘≤𝑁𝑠 } depends
only on 𝑋𝑠 .
Indeed, if we know 𝑋𝑠 , then we can simply
• restart the Poisson process from 𝑁𝑠 and then
• starting from 𝑋𝑠 = 𝑌𝑁𝑠 , update the embedded jump chain (𝑌𝑘 ) using 𝐾 each time a new jump occurs.
Let’s write this more mathematically.
Fixing 𝑦 ∈ 𝑆 and 𝑠, 𝑡 ≥ 0, we have

P{𝑋_{𝑠+𝑡} = 𝑦 | ℱ𝑠} = P{𝑌_{𝑁_{𝑠+𝑡}} = 𝑦 | ℱ𝑠} = P{𝑌_{𝑁𝑠 + (𝑁_{𝑠+𝑡} − 𝑁𝑠)} = 𝑦 | ℱ𝑠}

Recalling that 𝑁_{𝑠+𝑡} − 𝑁𝑠 is Poisson distributed with rate 𝑡𝜆, independent of the history ℱ𝑠, we can write the display
above as

P{𝑋_{𝑠+𝑡} = 𝑦 | ℱ𝑠} = ∑_{𝑘≥0} P{𝑌_{𝑁𝑠+𝑘} = 𝑦 | ℱ𝑠} (𝑡𝜆)^𝑘 𝑒^{−𝑡𝜆} / 𝑘!

Because the embedded jump chain is Markov with Markov matrix 𝐾, we can simplify further to

P{𝑋_{𝑠+𝑡} = 𝑦 | ℱ𝑠} = ∑_{𝑘≥0} 𝐾^𝑘(𝑌_{𝑁𝑠}, 𝑦) (𝑡𝜆)^𝑘 𝑒^{−𝑡𝜆} / 𝑘! = ∑_{𝑘≥0} 𝐾^𝑘(𝑋𝑠, 𝑦) (𝑡𝜆)^𝑘 𝑒^{−𝑡𝜆} / 𝑘!

Since the expression above depends only on 𝑋𝑠 , we have proved that (𝑋𝑡 ) has the Markov property.

3.6.4 Transition Semigroup

The Markov semigroup can be obtained from our final result, conditioning on 𝑋𝑠 = 𝑥 to get

𝑃𝑡(𝑥, 𝑦) = P{𝑋_{𝑠+𝑡} = 𝑦 | 𝑋𝑠 = 𝑥} = 𝑒^{−𝑡𝜆} ∑_{𝑘≥0} 𝐾^𝑘(𝑥, 𝑦) (𝑡𝜆)^𝑘 / 𝑘!


If 𝑆 is finite, we can write this in matrix form and use the definition of the matrix exponential to get

𝑃𝑡 = 𝑒^{−𝑡𝜆} ∑_{𝑘≥0} (𝑡𝜆𝐾)^𝑘 / 𝑘! = 𝑒^{−𝑡𝜆} 𝑒^{𝑡𝜆𝐾} = 𝑒^{𝑡𝜆(𝐾−𝐼)}

This is a simple and elegant representation of the Markov semigroup that makes it easy to understand and analyze distri-
bution dynamics.
For example, if 𝑋0 has distribution 𝜓, then 𝑋𝑡 has distribution

𝜓𝑃𝑡 = 𝜓𝑒^{𝑡𝜆(𝐾−𝐼)}    (3.12)

We just need to plug in 𝜆 and 𝐾 to obtain the entire flow 𝑡 ↦ 𝜓𝑃𝑡 .


We will soon extend this representation to the case where 𝑆 is infinite.
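For instance, using expm from the imports above, we can compute 𝑃𝑡, check the semigroup property numerically and evaluate the flow in (3.12). A sketch (the two-state 𝐾 below is an arbitrary example):

λ = 0.5
K = np.array([[0.0, 1.0],
              [1.0, 0.0]])
I = np.identity(2)

def P(t):
    "P_t = exp(t λ (K - I))"
    return expm(t * λ * (K - I))

s, t = 0.6, 1.7
print(np.allclose(P(s) @ P(t), P(s + t)))   # semigroup property holds

ψ = np.array([0.3, 0.7])
print(ψ @ P(t))                             # distribution of X_t when X_0 ~ ψ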

3.7 Distribution Flows for the Inventory Model

Let’s apply these ideas to the inventory model described above.


We fix
• the parameters 𝛼, 𝑏 and 𝜆 in the inventory model and
• an initial condition 𝑋0 ∼ 𝜓0 , where 𝜓0 is an arbitrary distribution on 𝑆.
The state space 𝑆 is set to {0, … , 𝑏} and the matrix 𝐾 is defined by (3.11).
Now we run time forward.
We are interested in computing the flow of distributions 𝑡 ↦ 𝜓𝑡 , where 𝜓𝑡 is the distribution of 𝑋𝑡 .
According to the theory developed above, we have two options:
Option 1 is to use simulation.
The first step is to simulate many independent observations of the process, (𝑋^𝑚_𝑡)_{𝑚=1}^{𝑀}.

(Here 𝑚 indicates simulation number 𝑚, which you might think of as the outcome for firm 𝑚.)
Next, for any given 𝑡, we define 𝜓̂𝑡 ∈ 𝒟 as the histogram of observations at time 𝑡, or, equivalently, the cross-sectional
distribution at 𝑡:

𝜓̂𝑡(𝑥) ∶= (1/𝑀) ∑_{𝑚=1}^{𝑀} 𝟙{𝑋^𝑚_𝑡 = 𝑥}    (𝑥 ∈ 𝑆)

When 𝑀 is large, 𝜓𝑡̂ (𝑥) will be close to P{𝑋𝑡 = 𝑥} by the law of large numbers.
In other words, in the limit we recover 𝜓𝑡 .
Option 2 is to insert the parameters into the right hand side of (3.12) and compute 𝜓𝑡 as 𝜓0 𝑃𝑡 .
The figure below is created using option 2, with 𝛼 = 0.6, 𝜆 = 0.5 and 𝑏 = 10.
For the initial distribution we pick a binomial distribution.
Since we cannot compute the entire uncountable flow 𝑡 ↦ 𝜓𝑡 , we iterate forward 200 steps at time increments ℎ = 0.1.
In the figure, hot colors indicate initial conditions and early dates (so that the distribution “cools” over time).
In the (solved) exercises you will be asked to try to reproduce this figure.


Fig. 3.1: Probability flows for the inventory model.


3.8 Exercises

Exercise 3.1
Consider the binary (Bernoulli) distribution where outcomes 0 and 1 each have probability 0.5.
Construct two different random variables with this distribution.

Exercise 3.2
Show by direct calculation that the Poisson matrices (𝑃𝑡 ) defined in (3.9) satisfy the semigroup property (3.7).
Hints
• Recall that 𝑃𝑡 (𝑗, 𝑘) = 0 whenever 𝑗 > 𝑘.
• Consider using the binomial formula.

Exercise 3.3
Consider the distribution over 𝑆^{𝑛+1} previously shown in (3.5), which is

P^𝑛_𝜓(𝑥0, 𝑥1, … , 𝑥𝑛) = 𝜓(𝑥0)𝑃(𝑥0, 𝑥1) × ⋯ × 𝑃(𝑥𝑛−1, 𝑥𝑛)

Show that, for any Markov chain (𝑋𝑡) satisfying (3.2) and 𝑋0 ∼ 𝜓, the restriction (𝑋0, … , 𝑋𝑛) has joint distribution P^𝑛_𝜓.

Exercise 3.4
Try to produce your own version of the figure Probability flows for the inventory model.
The initial condition is ψ_0 = binom.pmf(states, n, 0.25) where n = b + 1.

3.9 Solutions

Note: code is currently not supported in sphinx-exercise so code-cell solutions are immediately after this solution
block.

Solution to Exercise 3.1


This is easy.
One example is to take 𝑈 to be uniform on (0, 1) and set 𝑋 = 0 if 𝑈 < 0.5 and 1 otherwise.
Then 𝑋 has the desired distribution.
Alternatively, we could take 𝑍 to be standard normal and set 𝑋 = 0 if 𝑍 < 0 and 1 otherwise.


Solution to Exercise 3.2


Fixing 𝑠, 𝑡 ∈ R+ and 𝑗 ≤ 𝑘, we have

∑_{𝑖≥0} 𝑃𝑠(𝑗, 𝑖)𝑃𝑡(𝑖, 𝑘)
= 𝑒^{−𝜆(𝑠+𝑡)} ∑_{𝑗≤𝑖≤𝑘} [(𝜆𝑠)^{𝑖−𝑗} / (𝑖 − 𝑗)!] [(𝜆𝑡)^{𝑘−𝑖} / (𝑘 − 𝑖)!]
= 𝑒^{−𝜆(𝑠+𝑡)} 𝜆^{𝑘−𝑗} ∑_{0≤ℓ≤𝑘−𝑗} 𝑠^ℓ 𝑡^{𝑘−𝑗−ℓ} / [ℓ! (𝑘 − 𝑗 − ℓ)!]
= 𝑒^{−𝜆(𝑠+𝑡)} [𝜆^{𝑘−𝑗} / (𝑘 − 𝑗)!] ∑_{0≤ℓ≤𝑘−𝑗} binom(𝑘 − 𝑗, ℓ) 𝑠^ℓ 𝑡^{𝑘−𝑗−ℓ}

Applying the binomial formula, we can write this as

∑_{𝑖≥0} 𝑃𝑠(𝑗, 𝑖)𝑃𝑡(𝑖, 𝑘) = 𝑒^{−𝜆(𝑠+𝑡)} (𝜆(𝑠 + 𝑡))^{𝑘−𝑗} / (𝑘 − 𝑗)! = 𝑃_{𝑠+𝑡}(𝑗, 𝑘)

Hence (3.7) holds, and the semigroup property is satisfied.

Solution to Exercise 3.3


Let (𝑋𝑡) be a Markov chain satisfying (3.2) and 𝑋0 ∼ 𝜓.
When 𝑛 = 0, we have P^0_𝜓 = 𝜓, and this agrees with the distribution of the restriction (𝑋0, … , 𝑋𝑛) = (𝑋0).
Now suppose the same is true at arbitrary 𝑛 − 1, in the sense that the distribution of (𝑋0, … , 𝑋𝑛−1) is equal to P^{𝑛−1}_𝜓 as defined above.
Then

P{𝑋0 = 𝑥0, … , 𝑋𝑛 = 𝑥𝑛} = P{𝑋𝑛 = 𝑥𝑛 | 𝑋0 = 𝑥0, … , 𝑋𝑛−1 = 𝑥𝑛−1} × P{𝑋0 = 𝑥0, … , 𝑋𝑛−1 = 𝑥𝑛−1}

From the Markov property and the induction hypothesis, the right hand side is

𝑃(𝑥𝑛−1, 𝑥𝑛) P^{𝑛−1}_𝜓(𝑥0, 𝑥1, … , 𝑥𝑛−1) = 𝑃(𝑥𝑛−1, 𝑥𝑛) 𝜓(𝑥0)𝑃(𝑥0, 𝑥1) × ⋯ × 𝑃(𝑥𝑛−2, 𝑥𝑛−1)

The last expression equals P^𝑛_𝜓(𝑥0, … , 𝑥𝑛), which concludes the proof.

Solution to Exercise 3.4


Here is one approach.
(The statements involving glue are specific to this book and can be deleted by most readers. They store the output so it
can be displayed elsewhere.)

α = 0.6
λ = 0.5
b = 10
n = b + 1
states = np.arange(n)
I = np.identity(n)

K = np.zeros((n, n))
K[0, -1] = 1
for i in range(1, n):
    for j in range(0, i):
        if j == 0:
            K[i, j] = (1 - α)**(i-1)
        else:
            K[i, j] = α * (1 - α)**(i-j-1)

def P_t(ψ, t):
    return ψ @ expm(t * λ * (K - I))

def plot_distribution_dynamics(ax, ψ_0, steps=200, step_size=0.1):
    ψ = ψ_0
    t = 0.0
    colors = cm.jet_r(np.linspace(0.0, 1, steps))

    for i in range(steps):
        ax.bar(states, ψ, zs=t, zdir='y',
               color=colors[i], alpha=0.8, width=0.4)
        ψ = P_t(ψ, t=step_size)
        t += step_size

    ax.set_xlabel('inventory')
    ax.set_ylabel('$t$')

ψ_0 = binom.pmf(states, n, 0.25)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
plot_distribution_dynamics(ax, ψ_0)

from myst_nb import glue

glue("flow_fig", fig, display=False)

plt.show()



CHAPTER

FOUR

THE KOLMOGOROV BACKWARD EQUATION

4.1 Overview

As models become more complex, deriving analytical representations of the Markov semigroup (𝑃𝑡 ) becomes harder.
This is analogous to the situation for continuous time models more generally, where analytical solutions are often unavailable.
For example, when studying deterministic paths in continuous time, infinitesimal descriptions (ODEs and PDEs) are often
more intuitive and easier to write down than the associated solutions.
(This is one of the shining insights of mathematics, beginning with the work of great scientists such as Isaac Newton.)
We will see in this lecture that the same is true for continuous time Markov chains.
To help us focus on intuition in this lecture, rather than technicalities, the state space is assumed to be finite, with |𝑆| = 𝑛.
Later we will investigate the case where |𝑆| = ∞.
We will use the following imports

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.linalg import expm
from scipy.stats import binom

4.2 State Dependent Jump Intensities

As we have seen, continuous time Markov chains jump between states, and hence can have the form

$$
X_t = \sum_{k \geq 0} Y_k \, \mathbb{1}\{J_k \leq t < J_{k+1}\} \qquad (t \geq 0)
$$

where (𝐽𝑘 ) are jump times and (𝑌𝑘 ) are the states at each jump.
(We are assuming that $J_k \to \infty$ with probability one, so that $X_t$ is well defined for all $t \geq 0$; this is always true when holding times are exponential and the state space is finite.)
In the previous lecture,
• the sequence (𝑌𝑘 ) was drawn from a Markov matrix 𝐾 and called the embedded jump chain, while
• the holding times 𝑊𝑘 ∶= 𝐽𝑘 − 𝐽𝑘−1 were IID and Exp(𝜆) for some constant jump intensity 𝜆.


In this lecture, we will generalize by allowing the jump intensity to vary with the state.
This difference sounds minor but in fact it will allow us to reach full generality in our description of continuous time
Markov chains, as clarified below.

4.2.1 Motivation

As a motivating example, recall the inventory model, where we assumed that the wait time for the next customer was equal
to the wait time for new inventory.
This assumption was made purely for convenience and seems unlikely to hold true.
When we relax it, the jump intensities depend on the state.

4.2.2 Jump Chain Algorithm

We start with three primitives


1. An initial condition 𝜓,
2. a Markov matrix 𝐾 on 𝑆 satisfying 𝐾(𝑥, 𝑥) = 0 for all 𝑥 ∈ 𝑆 and
3. a function 𝜆 mapping 𝑆 to (0, ∞).
The process (𝑋𝑡 )
• starts at state 𝑥, drawn from 𝜓,
• waits there for an exponential time 𝑊 with rate 𝜆(𝑥) and then
• updates to a new state 𝑦 drawn from 𝐾(𝑥, ⋅).
Now we take 𝑦 as the new state for the process and repeat.
Here is the same algorithm written more explicitly:

Algorithm 4.1 (Jump Chain Algorithm)


Inputs 𝜓 ∈ 𝒟, rate function 𝜆, Markov matrix 𝐾
Outputs Markov chain (𝑋𝑡 )
1. Draw 𝑌0 from 𝜓, set 𝐽0 = 0 and 𝑘 = 1.
2. Draw 𝑊𝑘 independently from Exp(𝜆(𝑌𝑘−1 )).
3. Set 𝐽𝑘 = 𝐽𝑘−1 + 𝑊𝑘 .
4. Set 𝑋𝑡 = 𝑌𝑘−1 for 𝑡 in [𝐽𝑘−1 , 𝐽𝑘 ).
5. Draw 𝑌𝑘 from 𝐾(𝑌𝑘−1 , ⋅).
6. Set 𝑘 = 𝑘 + 1 and go to step 2.

The sequence (𝑊𝑘 ) is drawn as an IID sequence and (𝑊𝑘 ) and (𝑌𝑘 ) are drawn independently.
The restriction 𝐾(𝑥, 𝑥) = 0 for all 𝑥 implies that (𝑋𝑡 ) actually jumps at each jump time.
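To make the algorithm concrete, here is a minimal simulation sketch. It is not the code used elsewhere in these lectures; the function name and the representation of 𝜓, 𝜆 and 𝐾 as NumPy arrays over states 0, …, n−1 are illustrative assumptions.

import numpy as np

def simulate_jump_chain(ψ, λ, K, T, seed=None):
    """
    Simulate a path of (X_t) on [0, T] via Algorithm 4.1, given an
    initial distribution ψ, a rate array λ and a jump matrix K.
    Returns the jump times (J_k) and jump states (Y_k).
    """
    rng = np.random.default_rng(seed)
    n = len(ψ)
    J, Y = [0.0], [rng.choice(n, p=ψ)]       # step 1
    while J[-1] < T:
        y = Y[-1]
        W = rng.exponential(scale=1/λ[y])    # step 2: W_k ~ Exp(λ(Y_{k-1}))
        J.append(J[-1] + W)                  # step 3
        Y.append(rng.choice(n, p=K[y]))      # step 5
    return np.array(J), np.array(Y)          # X_t = Y_{k-1} on [J_{k-1}, J_k)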


4.3 Computing the Semigroup

For the jump process (𝑋𝑡 ) with time varying intensities described in the jump chain algorithm, calculating the Markov
semigroup is not a trivial exercise.
The approach we adopt is
1. Use probabilistic reasoning to obtain an integral equation that the semigroup must satisfy.
2. Convert the integral equation into a differential equation that is easier to work with.
3. Solve this differential equation to obtain the Markov semigroup (𝑃𝑡 ).
The differential equation in question has a special name: the Kolmogorov backward equation.

4.3.1 An Integral Equation

Here is the first step in the sequence listed above.

Lemma 4.1 (An Integral Equation)


The semigroup (𝑃𝑡 ) of the jump chain with rate function 𝜆 and Markov matrix 𝐾 obeys the integral equation
$$
P_t(x, y) = e^{-t\lambda(x)} I(x, y) + \lambda(x) \int_0^t (K P_{t-\tau})(x, y) \, e^{-\tau \lambda(x)} \, d\tau \qquad (4.1)
$$

for all 𝑡 ≥ 0 and 𝑥, 𝑦 in 𝑆.

Here (𝑃𝑡 ) is the Markov semigroup of (𝑋𝑡 ), the process constructed via Algorithm 4.1, while 𝐾𝑃𝑡−𝜏 is the matrix product
of 𝐾 and 𝑃𝑡−𝜏 .

Proof. Conditioning implicitly on 𝑋0 = 𝑥, the semigroup (𝑃𝑡 ) must satisfy

𝑃𝑡 (𝑥, 𝑦) = P{𝑋𝑡 = 𝑦} = P{𝑋𝑡 = 𝑦, 𝐽1 > 𝑡} + P{𝑋𝑡 = 𝑦, 𝐽1 ≤ 𝑡} (4.2)

Regarding the first term on the right hand side of (4.2), we have

P{𝑋𝑡 = 𝑦, 𝐽1 > 𝑡} = 𝐼(𝑥, 𝑦)𝑃 {𝐽1 > 𝑡} = 𝐼(𝑥, 𝑦)𝑒−𝑡𝜆(𝑥) (4.3)

where 𝐼(𝑥, 𝑦) = 𝟙{𝑥 = 𝑦}.


For the second term on the right hand side of (4.2), we have

$$
\mathbb{P}\{X_t = y, \, J_1 \leq t\}
= \mathbb{E}\left[ \mathbb{1}\{J_1 \leq t\} \, \mathbb{P}\{X_t = y \mid W_1, Y_1\} \right]
= \mathbb{E}\left[ \mathbb{1}\{J_1 \leq t\} \, P_{t - J_1}(Y_1, y) \right]
$$

Evaluating the expectation and using the independence of $J_1$ and $Y_1$, this becomes

$$
\mathbb{P}\{X_t = y, \, J_1 \leq t\}
= \int_0^\infty \mathbb{1}\{\tau \leq t\} \sum_z K(x, z) P_{t-\tau}(z, y) \, \lambda(x) e^{-\tau \lambda(x)} \, d\tau
= \lambda(x) \int_0^t \sum_z K(x, z) P_{t-\tau}(z, y) \, e^{-\tau \lambda(x)} \, d\tau
$$

Combining this result with (4.2) and (4.3) gives (4.1).


4.3.2 Kolmogorov’s Differential Equation

We have now confirmed that the semigroup (𝑃𝑡 ) associated with the jump chain process (𝑋𝑡 ) satisfies (4.1).
Equation (4.1) is important but we can simplify it further without losing information by taking the time derivative.
This leads to our main result for the lecture

Theorem 4.1 (Kolmogorov Backward Equation)


The semigroup (𝑃𝑡 ) of the jump chain with rate function 𝜆 and Markov matrix 𝐾 satisfies the Kolmogorov backward
equation

𝑃𝑡′ = 𝑄𝑃𝑡 where 𝑄(𝑥, 𝑦) ∶= 𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)) (4.4)

The derivative on the left hand side of (4.4) is taken element by element, with respect to 𝑡, so that

$$
P_t'(x, y) = \left( \frac{d}{dt} P_t(x, y) \right) \qquad ((x, y) \in S \times S)
$$
The proof that differentiating (4.1) yields (4.4) is an important exercise (see below).

4.3.3 Exponential Solution

The Kolmogorov backward equation is a matrix-valued differential equation.


Recall that, for a scalar differential equation 𝑦𝑡′ = 𝑎𝑦𝑡 with constant 𝑎 and initial condition 𝑦0 , the solution is 𝑦𝑡 = 𝑒𝑡𝑎 𝑦0 .
This, along with 𝑃0 = 𝐼, encourages us to guess that the solution to Kolmogorov’s backward equation (4.4) is

𝑃𝑡 = 𝑒𝑡𝑄 (4.5)

where the right hand side is the matrix exponential, with definition

$$
e^{tQ} = \sum_{k \geq 0} \frac{1}{k!} (tQ)^k = I + tQ + \frac{t^2}{2!} Q^2 + \cdots \qquad (4.6)
$$

Working element by element, it is not difficult to confirm that the derivative of the exponential function 𝑡 ↦ 𝑒𝑡𝑄 is

$$
\frac{d}{dt} e^{tQ} = Q e^{tQ} = e^{tQ} Q \qquad (4.7)
$$
Hence, differentiating (4.5) gives 𝑃𝑡′ = 𝑄𝑒𝑡𝑄 = 𝑄𝑃𝑡 , which convinces us that the exponential solution satisfies (4.4).
Notice that our solution

𝑃𝑡 = 𝑒𝑡𝑄 where 𝑄(𝑥, 𝑦) ∶= 𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)) (4.8)

for the semigroup of the jump process (𝑋𝑡 ) associated with the jump matrix 𝐾 and the jump intensity function 𝜆 ∶ 𝑆 →
(0, ∞) is consistent with our earlier result.
In particular, we showed that, for the model with constant jump intensity 𝜆, we have 𝑃𝑡 = 𝑒𝑡𝜆(𝐾−𝐼) .
This is obviously a special case of (4.8).
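As a quick numerical sanity check, a sketch along the following lines confirms that 𝑃𝑡 = 𝑒𝑡𝑄 satisfies the backward equation (4.4); the particular rate function and jump matrix below are arbitrary illustrative choices.

import numpy as np
from scipy.linalg import expm

λ = np.array([0.5, 1.0, 2.0])             # illustrative jump rates
K = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.9, 0.1, 0.0]])           # an arbitrary jump matrix
Q = λ[:, None] * (K - np.identity(3))     # Q(x, y) = λ(x)(K(x, y) - I(x, y))

t, h = 1.0, 1e-6
P_prime = (expm((t + h) * Q) - expm(t * Q)) / h   # finite difference approximation of P'_t
print(np.max(np.abs(P_prime - Q @ expm(t * Q))))  # should be close to zero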


4.4 Properties of the Solution

Let’s investigate further the properties of the exponential solution.

4.4.1 Checking the Transition Semigroup Properties

While we have confirmed that 𝑃𝑡 = 𝑒𝑡𝑄 solves the Kolmogorov backward equation, we still need to check that this
solution is a Markov semigroup.

Lemma 4.2 (From Jump Chain to Semigroup)


Let 𝜆 map 𝑆 to R+ and let 𝐾 be a Markov matrix on 𝑆. If 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡 ≥ 0, where 𝑄(𝑥, 𝑦) = 𝜆(𝑥)(𝐾(𝑥, 𝑦) −
𝐼(𝑥, 𝑦)), then (𝑃𝑡 ) is a Markov semigroup on 𝑆.

Proof. Observe first that $Q$ has zero row sums, since

$$
\sum_y Q(x, y) = \lambda(x) \sum_y (K(x, y) - I(x, y)) = 0
$$

As a small exercise, you can check that, with 1 representing a column vector of ones, the following is true

𝑄 has zero row sums ⟺ 𝑄𝑘 1 = 0 for all 𝑘 ≥ 1 (4.9)

This implies that 𝑄𝑘 1 = 0 for all 𝑘 and, as a result, for any 𝑡 ≥ 0,

$$
P_t \mathbb{1} = e^{tQ} \mathbb{1} = I \mathbb{1} + tQ \mathbb{1} + \frac{t^2}{2!} Q^2 \mathbb{1} + \cdots = I \mathbb{1} = \mathbb{1}
$$
In other words, each 𝑃𝑡 has unit row sums.
Next we check nonnegativity of all elements of 𝑃𝑡 (which can easily fail for matrix exponentials).
To this end, adopting an argument from [Stroock, 2013], we set 𝑚 ∶= max𝑥 𝜆(𝑥) and 𝑃 ̂ ∶= 𝐼 + 𝑄/𝑚.
It is not difficult to check that 𝑃 ̂ is a Markov matrix and 𝑄 = 𝑚(𝑃 ̂ − 𝐼).
Recalling that, for matrix exponentials, $e^{A+B} = e^A e^B$ whenever $AB = BA$, we have

$$
e^{tQ} = e^{tm(\hat{P} - I)} = e^{-tmI} e^{tm\hat{P}} = e^{-tm} \left( I + tm\hat{P} + \frac{(tm)^2}{2!} \hat{P}^2 + \cdots \right)
$$

It is clear from this representation that all entries of 𝑒𝑡𝑄 are nonnegative.
Finally, we need to check the continuity condition 𝑃𝑡 (𝑥, 𝑦) → 𝐼(𝑥, 𝑦) as 𝑡 → 0, which is also part of the definition of
a Markov semigroup. This is immediate, in the present case, because the exponential function is continuous, and hence
𝑃𝑡 = 𝑒𝑡𝑄 → 𝑒0 = 𝐼.

We can now be reassured that our solution to the Kolmogorov backward equation is indeed a Markov semigroup.
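The claims in the lemma are also easy to test numerically. The following sketch, with an arbitrary intensity matrix (every intensity matrix can be written in the form used in the lemma), checks unit row sums and nonnegativity of 𝑒𝑡𝑄 at several values of 𝑡.

import numpy as np
from scipy.linalg import expm

Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 2.0, -2.0]])            # an arbitrary intensity matrix

for t in (0.1, 1.0, 10.0):
    P = expm(t * Q)
    assert np.allclose(P.sum(axis=1), 1.0)  # unit row sums
    assert (P > -1e-12).all()               # nonnegative, up to rounding error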


4.4.2 Uniqueness

Might there be another, entirely different Markov semigroup that also satisfies the Kolmogorov backward equation?
The answer is no: linear ODEs in a finite dimensional vector space with constant coefficients and a fixed initial condition (in this case 𝑃0 = 𝐼) have unique solutions.
In fact it’s not hard to supply a proof — see the exercises.

4.5 Application: The Inventory Model

Let us look at a modified version of the inventory model where jump intensities depend on the state.
In particular, the wait time for new inventory will now be exponential at rate 𝛾.
The arrival rate for customers will still be denoted by 𝜆 and allowed to differ from 𝛾.
For parameters we take
α = 0.6
λ = 0.5
γ = 0.1
b = 10

Our plan is to investigate the distribution 𝜓𝑇 of 𝑋𝑇 at 𝑇 = 30.


We will do this by simulating many independent draws of 𝑋𝑇 and histogramming them.
(In the exercises you are asked to calculate 𝜓𝑇 a different way, via (4.8).)
@njit
def draw_X(T, X_0, max_iter=5000):
    """
    Generate one draw of X_T given X_0.
    """
    J, Y = 0, X_0
    m = 0

    while m < max_iter:
        s = 1/γ if Y == 0 else 1/λ
        W = np.random.exponential(scale=s)  # W ~ Exp(γ) at zero stock, else Exp(λ)
        J += W
        if J >= T:
            return Y
        # Otherwise update Y
        if Y == 0:
            Y = b
        else:
            U = np.random.geometric(α)
            Y = Y - min(Y, U)
        m += 1

@njit
def independent_draws(T=10, num_draws=100):
    "Generate a vector of independent draws of X_T."
    draws = np.empty(num_draws, dtype=np.int64)
    for i in range(num_draws):
        X_0 = np.random.binomial(b+1, 0.25)
        draws[i] = draw_X(T, X_0)
    return draws

T = 30
n = b + 1
draws = independent_draws(T, num_draws=100_000)

fig, ax = plt.subplots()
ax.bar(range(n), [np.mean(draws == i) for i in range(n)], width=0.8, alpha=0.6)
ax.set_xlabel("inventory", fontsize=14)
plt.show()

If you experiment with the code above, you will see that the large amount of mass on zero is due to the low arrival rate 𝛾
for inventory.


4.6 Exercises

Exercise 4.1
In the discussion above, we generated an approximation of 𝜓𝑇 when 𝑇 = 30, the initial condition is Binomial(𝑛, 0.25)
and parameters are set to

α = 0.6
λ = 0.5
γ = 0.1
b = 10

The calculation was done by simulating independent draws and histogramming.


Try to generate the same figure using (4.8) instead, modifying code from our lecture on the Markov property.

Exercise 4.2
Prove that differentiating (4.1) at each (𝑥, 𝑦) yields (4.4).

Exercise 4.3
We claimed above that the solution 𝑃𝑡 = 𝑒𝑡𝑄 is the unique Markov semigroup satisfying the backward equation 𝑃𝑡′ =
𝑄𝑃𝑡 .
Try to supply a proof.
(This is not an easy exercise but worth thinking about in any case.)

4.7 Solutions

Note: code is currently not supported in sphinx-exercise so code-cell solutions are immediately after this solution
block.

Solution to Exercise 4.1


Here is one solution:

α = 0.6
λ = 0.5
γ = 0.1
b = 10

states = np.arange(n)
I = np.identity(n)

# Embedded jump chain matrix
K = np.zeros((n, n))
K[0, -1] = 1
for i in range(1, n):
    for j in range(0, i):
        if j == 0:
            K[i, j] = (1 - α)**(i-1)
        else:
            K[i, j] = α * (1 - α)**(i-j-1)

# Jump intensities as a function of the state
r = np.ones(n) * λ
r[0] = γ

# Q matrix
Q = np.empty_like(K)
for i in range(n):
    for j in range(n):
        Q[i, j] = r[i] * (K[i, j] - I[i, j])

def P_t(ψ, t):
    return ψ @ expm(t * Q)

ψ_0 = binom.pmf(states, n, 0.25)
ψ_T = P_t(ψ_0, T)

fig, ax = plt.subplots()
ax.bar(range(n), ψ_T, width=0.8, alpha=0.6)
ax.set_xlabel("inventory", fontsize=14)
plt.show()


Solution to Exercise 4.2


One can easily verify that, when 𝑓 is a differentiable function and 𝛼 > 0, we have

𝑔(𝑡) = 𝑒−𝑡𝛼 𝑓(𝑡) ⟹ 𝑔′ (𝑡) = 𝑒−𝑡𝛼 𝑓 ′ (𝑡) − 𝛼𝑔(𝑡) (4.10)

Note also that, with the change of variable 𝑠 = 𝑡 − 𝜏 , we can rewrite (4.1) as
$$
P_t(x, y) = e^{-t\lambda(x)} \left\{ I(x, y) + \lambda(x) \int_0^t (K P_s)(x, y) \, e^{s \lambda(x)} \, ds \right\} \qquad (4.11)
$$

Applying (4.10) yields

𝑃𝑡′ (𝑥, 𝑦) = 𝑒−𝑡𝜆(𝑥) {𝜆(𝑥)(𝐾𝑃𝑡 )(𝑥, 𝑦)𝑒𝑡𝜆(𝑥) } − 𝜆(𝑥)𝑃𝑡 (𝑥, 𝑦)

After minor rearrangements this becomes

𝑃𝑡′ (𝑥, 𝑦) = 𝜆(𝑥)[(𝐾 − 𝐼)𝑃𝑡 ](𝑥, 𝑦)

which is identical to (4.4).

Solution to Exercise 4.3


Here is one proof of uniqueness.
Suppose that $(\hat{P}_t)$ is another Markov semigroup satisfying $\hat{P}_t' = Q \hat{P}_t$.

Fix $t > 0$ and let $V_s$ be defined by $V_s = P_s \hat{P}_{t-s}$ for all $0 \leq s \leq t$.

Note that $V_0 = \hat{P}_t$ and $V_t = P_t$.

Note also that $s \mapsto V_s$ is differentiable, with derivative

$$
V_s' = P_s' \hat{P}_{t-s} - P_s \hat{P}_{t-s}' = P_s Q \hat{P}_{t-s} - P_s Q \hat{P}_{t-s} = 0
$$

where, in the second last equality, we used (4.7).


Hence $V_s$ is constant, so our previous observations $V_0 = \hat{P}_t$ and $V_t = P_t$ now yield $\hat{P}_t = P_t$.
Since 𝑡 was arbitrary, the proof is now done.



CHAPTER

FIVE

THE KOLMOGOROV FORWARD EQUATION

5.1 Overview

In this lecture we approach continuous time Markov chains from a more analytical perspective.
The emphasis will be on describing distribution flows through vector-valued differential equations and their solutions.
These distribution flows show how the time 𝑡 distribution associated with a given Markov chain (𝑋𝑡 ) changes over time.
Distribution flows will be identified with initial value problems generated by autonomous linear ordinary differential
equations (ODEs) in vector space.
We will see that the solutions of these flows are described by Markov semigroups.
This leads us back to the theory we have already constructed – some care will be taken to clarify all the connections.
In order to avoid being distracted by technicalities, we continue to defer our treatment of infinite state spaces, assuming
throughout this lecture that |𝑆| = 𝑛.
As before, 𝒟 is the set of all distributions on 𝑆.
We will use the following imports

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.linalg import expm
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

5.2 From Difference Equations to ODEs

Previously we generated this figure, which shows how distributions evolve over time for the inventory model under a
certain parameterization:
(Hot colors indicate early dates and cool colors denote later dates.)
We also learned how this flow is related to the Kolmogorov backward equation, which is an ODE.
In this section we examine distribution flows and their connection to ODEs and continuous time Markov chains more
systematically.

Fig. 5.1: Probability flows for the inventory model.

5.2.1 Review of the Discrete Time Case

Let (𝑋𝑡 ) be a discrete time Markov chain with Markov matrix 𝑃 .


Recall that, in the discrete time case, the distribution 𝜓𝑡 of 𝑋𝑡 updates according to

𝜓𝑡+1 = 𝜓𝑡 𝑃 , 𝜓0 a given element of 𝒟,

where distributions are understood as row vectors.


Here’s a visualization for the case 𝑆 = {0, 1, 2}, so that 𝒟 is the standard simplex in R3 .
The initial condition is (0, 0, 1) and the Markov matrix is

P = ((0.9, 0.1, 0.0),
     (0.4, 0.4, 0.2),
     (0.1, 0.1, 0.8))
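As a minimal sketch of the update rule itself, the distribution sequence behind such a visualization can be generated by repeated postmultiplication:

import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])
ψ = np.array([0.0, 0.0, 1.0])    # the initial condition (0, 0, 1)

for t in range(5):
    print(t, ψ)
    ψ = ψ @ P                    # ψ_{t+1} = ψ_t P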

There’s a sense in which a discrete time Markov chain “is” a homogeneous linear difference equation in distribution space.
To clarify this, suppose we take 𝐺 to be a linear map from 𝒟 to itself and write down the difference equation

𝜓𝑡+1 = 𝐺(𝜓𝑡 ) with 𝜓0 ∈ 𝒟 given. (5.1)

Because 𝐺 is a linear map from a finite dimensional space to itself, it can be represented by a matrix.
Moreover, a matrix 𝑃 is a Markov matrix if and only if 𝜓 ↦ 𝜓𝑃 sends 𝒟 into itself (check it if you haven’t already).


So, under the stated conditions, our difference equation (5.1) uniquely identifies a Markov matrix, along with an initial
condition 𝜓0 .
Together, these objects identify the joint distribution of a discrete time Markov chain, as previously described.

5.2.2 Shifting to Continuous Time

We have just argued that a discrete time Markov chain can be identified with a linear difference equation evolving in 𝒟.
This strongly suggests that a continuous time Markov chain can be identified with a linear ODE evolving in 𝒟.
This intuition is correct and important.
The rest of the lecture maps out the main ideas.

5.3 ODEs in Distribution Space

Consider the linear differential equation given by

𝜓𝑡′ = 𝜓𝑡 𝑄, 𝜓0 a given element of 𝒟, (5.2)

where
• 𝑄 is an 𝑛 × 𝑛 matrix,
• distributions are again understood as row vectors, and
• derivatives are taken element by element, so that
$$
\psi_t' = \left( \frac{d}{dt} \psi_t(x_1) \;\; \cdots \;\; \frac{d}{dt} \psi_t(x_n) \right)
$$

5.3.1 Solutions to Linear Vector ODEs

Using the matrix exponential, the unique solution to the initial value problem (5.2) is

𝜓𝑡 = 𝜓0 𝑃𝑡 where 𝑃𝑡 ∶= 𝑒𝑡𝑄 (5.3)

To check that (5.3) is a solution, we use (4.7) again to get

$$
\frac{d}{dt} P_t = Q e^{tQ} = e^{tQ} Q
$$
The first equality can be written as 𝑃𝑡′ = 𝑄𝑃𝑡 and this is just the Kolmogorov backward equation.
The second equality can be written as

𝑃𝑡′ = 𝑃𝑡 𝑄

and is called the Kolmogorov forward equation.


Applying the Kolmogorov forward equation, we obtain
$$
\frac{d}{dt} \psi_t = \frac{d}{dt} \psi_0 P_t = \psi_0 \frac{d}{dt} P_t = \psi_0 P_t Q = \psi_t Q
$$
This confirms that (5.3) solves (5.2).
Here’s an example of three distribution flows with dynamics generated by (5.2), one starting from each vertex.
The code uses (5.3) with matrix 𝑄 given by


Q = ((-3, 2, 1),
     (3, -5, 2),
     (4, 6, -10))

(Distributions cool over time, so initial conditions are hot colors.)
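A sketch of the computation behind such a flow: each trajectory is obtained by evaluating 𝜓𝑡 = 𝜓0 𝑒𝑡𝑄 on a time grid. The grid and the vertex chosen below are illustrative.

import numpy as np
from scipy.linalg import expm

Q = np.array([[-3, 2, 1],
              [3, -5, 2],
              [4, 6, -10]])
ψ_0 = np.array([0.0, 0.0, 1.0])          # start from one vertex of the simplex

for t in np.linspace(0, 1, 5):
    ψ_t = ψ_0 @ expm(t * Q)              # ψ_t = ψ_0 P_t with P_t = e^{tQ}
    print(f"t = {t:.2f}: {np.round(ψ_t, 3)}")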

5.3.2 Forwards vs Backwards Equations

As the above discussion shows, we can take the Kolmogorov forward equation 𝑃𝑡′ = 𝑃𝑡 𝑄 and premultiply by any distribution 𝜓0 to get the distribution ODE 𝜓𝑡′ = 𝜓𝑡 𝑄.
In this sense, we can understand the Kolmogorov forward equation as pushing distributions forward in time.
Analogously, we can take the Kolmogorov backward equation 𝑃𝑡′ = 𝑄𝑃𝑡 and postmultiply by any vector ℎ to get

(𝑃𝑡 ℎ)′ = 𝑄𝑃𝑡 ℎ

Recalling that (𝑃𝑡 ℎ)(𝑥) = E [ℎ(𝑋𝑡 ) | 𝑋0 = 𝑥], this vector ODE tells us how expectations evolve, conditioning backward
to time zero.
Both the forward and the backward equations uniquely pin down the same solution 𝑃𝑡 = 𝑒𝑡𝑄 when combined with the
initial condition 𝑃0 = 𝐼.


5.3.3 Matrix- vs Vector-Valued ODEs

The ODE 𝜓𝑡′ = 𝜓𝑡 𝑄 is sometimes called the Fokker–Planck equation (although this terminology is most commonly
used in the context of diffusions).
It is a vector-valued ODE that describes the evolution of a particular distribution path.
By comparison, the Kolmogorov forward equation is (like the backward equation) a differential equation in matrices.
(And matrices are really maps, which send vectors into vectors.)
Operating at this level is less intuitive and more abstract than working with the Fokker–Planck equation.
But, in the end, the object that we want to describe is a Markov semigroup.
The Kolmogorov forward and backward equations are the ODEs that define this fundamental object.

5.3.4 Preserving Distributions

In the simulation above, 𝑄 was chosen with some care, so that the flow remains in 𝒟.
What are the exact properties we require on 𝑄 such that 𝜓𝑡 is always in 𝒟?
This is an important question, because we are setting up an exact correspondence between linear ODEs that evolve in 𝒟
and continuous time Markov chains.
Recall that the linear update rule 𝜓 ↦ 𝜓𝑃 is invariant on 𝒟 if and only if 𝑃 is a Markov matrix.
So now we can rephrase our key question regarding invariance on 𝒟:
What properties do we need to impose on 𝑄 so that 𝑃𝑡 = 𝑒𝑡𝑄 is a Markov matrix for all 𝑡?
A square matrix 𝑄 is called an intensity matrix if 𝑄 has zero row sums and 𝑄(𝑥, 𝑦) ≥ 0 whenever 𝑥 ≠ 𝑦.

Theorem 5.1
If 𝑄 is a matrix on 𝑆 and 𝑃𝑡 ∶= 𝑒𝑡𝑄 , then the following statements are equivalent:
1. 𝑃𝑡 is a Markov matrix for all 𝑡.
2. 𝑄 is an intensity matrix.

The proof is related to that of Lemma 4.2 and is found as a solved exercise below.

Corollary 5.1
If 𝑄 is an intensity matrix on finite 𝑆 and 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡 ≥ 0, then (𝑃𝑡 ) is a Markov semigroup.

We call (𝑃𝑡 ) the Markov semigroup generated by 𝑄.


Later we will see that this result extends to the case |𝑆| = ∞ under some mild restrictions on 𝑄.


5.4 Jump Chains

Let’s return to the chain (𝑋𝑡 ) created from jump chain pair (𝜆, 𝐾) in Algorithm 4.1.
We found that the semigroup is given by

𝑃𝑡 = 𝑒𝑡𝑄 where 𝑄(𝑥, 𝑦) ∶= 𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦))

Using the fact that 𝐾 is a Markov matrix and the jump rate function 𝜆 is nonnegative, you can easily check that 𝑄 satisfies
the definition of an intensity matrix.
Hence (𝑃𝑡 ), the Markov semigroup for the jump chain (𝑋𝑡 ), is the semigroup generated by the intensity matrix 𝑄(𝑥, 𝑦) =
𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)).
We can differentiate 𝑃𝑡 = 𝑒𝑡𝑄 to obtain the Kolmogorov forward equation 𝑃𝑡′ = 𝑃𝑡 𝑄.
We can then premultiply by 𝜓0 ∈ 𝒟 to get 𝜓𝑡′ = 𝜓𝑡 𝑄, which is the Fokker–Planck equation.
More explicitly, for given 𝑦 ∈ 𝑆,

$$
\psi_t'(y) = \sum_{x \neq y} \psi_t(x) \lambda(x) K(x, y) - \psi_t(y) \lambda(y)
$$

The rate of probability flow into 𝑦 is equal to the inflow from other states minus the outflow.
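One way to see the Fokker–Planck equation in action is to integrate it directly with an ODE solver and compare with the closed form 𝜓0 𝑒𝑡𝑄. Here is a sketch along those lines, using an arbitrary illustrative jump chain pair; the two outputs should agree up to solver tolerance.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

λ = np.array([0.5, 1.0, 2.0])              # illustrative jump rates
K = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.9, 0.1, 0.0]])
Q = λ[:, None] * (K - np.identity(3))

ψ_0 = np.array([1.0, 0.0, 0.0])
T = 2.0

sol = solve_ivp(lambda t, ψ: ψ @ Q, (0, T), ψ_0)   # integrate ψ'_t = ψ_t Q
print(sol.y[:, -1])                                # numerical solution at T
print(ψ_0 @ expm(T * Q))                           # closed form, approximately equal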

5.5 Summary

We have seen that any intensity matrix 𝑄 on 𝑆 defines a Markov semigroup via 𝑃𝑡 = 𝑒𝑡𝑄 .
Henceforth, we will say that (𝑋𝑡 ) is a Markov chain with intensity matrix 𝑄 if (𝑋𝑡 ) is a Markov chain with Markov
semigroup (𝑒𝑡𝑄 ).
While our discussion has been in the context of a finite state space, later we will see that these ideas carry over to an
infinite state setting under mild restrictions.
We have also hinted at the fact that every continuous time Markov chain is a Markov chain with intensity matrix 𝑄 for
some suitably chosen 𝑄.
Later we will prove this to be universally true when 𝑆 is finite and true under mild conditions when 𝑆 is countably infinite.
Intensity matrices are important because
1. they are the natural infinitesimal descriptions of Markov semigroups,
2. they are often easy to write down in applications and
3. they provide an intuitive description of dynamics.
Later, we will see that, for a given intensity matrix 𝑄, the elements are understood as follows:
• when 𝑥 ≠ 𝑦, the value 𝑄(𝑥, 𝑦) is the “rate of leaving 𝑥 for 𝑦” and
• −𝑄(𝑥, 𝑥) ≥ 0 is the “rate of leaving 𝑥” .


5.6 Exercises

Exercise 5.1
Let (𝑃𝑡 ) be a Markov semigroup such that 𝑡 ↦ 𝑃𝑡 (𝑥, 𝑦) is differentiable at all 𝑡 ≥ 0 and (𝑥, 𝑦) ∈ 𝑆 × 𝑆.
(The derivative at 𝑡 = 0 is the usual right derivative.)
Define (pointwise, at each (𝑥, 𝑦)),

$$
Q := P_0' = \lim_{h \downarrow 0} \frac{P_h - I}{h} \qquad (5.4)
$$

Assuming that this limit exists, and hence 𝑄 is well-defined, show that

𝑃𝑡′ = 𝑃𝑡 𝑄 and 𝑃𝑡′ = 𝑄𝑃𝑡

both hold. (These are the Kolmogorov forward and backward equations.)

Exercise 5.2
Recall our model of jump chains with state-dependent jump intensities given by rate function 𝑥 ↦ 𝜆(𝑥).
After a wait time with exponential rate 𝜆(𝑥) ∈ (0, ∞), the state transitions from 𝑥 to 𝑦 with probability 𝐾(𝑥, 𝑦).
We found that the associated semigroup (𝑃𝑡 ) satisfies the Kolmogorov backward equation 𝑃𝑡′ = 𝑄𝑃𝑡 with

𝑄(𝑥, 𝑦) ∶= 𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)) (5.5)

Show that 𝑄 is an intensity matrix and that (5.4) holds.

Exercise 5.3
Prove Theorem 5.1 by adapting the arguments in Lemma 4.2. (This is nontrivial but worth at least trying.)
Hint: The constant 𝑚 in the proof can be set to max𝑥 |𝑄(𝑥, 𝑥)|.

5.7 Solutions

Solution to Exercise 5.1


Let (𝑃𝑡 ) be a Markov semigroup and let 𝑄 be as defined in the statement of the exercise.
Fix 𝑡 ≥ 0 and ℎ > 0.
Combining the semigroup property and linearity with the restriction 𝑃0 = 𝐼, we get

$$
\frac{P_{t+h} - P_t}{h} = \frac{P_t P_h - P_t}{h} = \frac{P_t (P_h - I)}{h}
$$
Taking ℎ ↓ 0 and using the definition of 𝑄 give 𝑃𝑡′ = 𝑃𝑡 𝑄, which is the Kolmogorov forward equation.


For the backward equation we observe that

$$
\frac{P_{t+h} - P_t}{h} = \frac{P_h P_t - P_t}{h} = \frac{(P_h - I) P_t}{h}
$$
also holds. Taking ℎ ↓ 0 gives the Kolmogorov backward equation.

Solution to Exercise 5.2


Let 𝑄 be as defined in (5.5).
We need to show that 𝑄 is nonnegative off the diagonal and has zero row sums.
The first assertion is immediate from nonnegativity of 𝐾 and 𝜆.
For the second, we use the fact that 𝐾 is a Markov matrix, so that, with 1 as a column vector of ones,

𝑄1 = 𝜆(𝐾1 − 1) = 𝜆(1 − 1) = 0

Solution to Exercise 5.3


Suppose that 𝑄 is an intensity matrix, fix 𝑡 ≥ 0 and set 𝑃𝑡 = 𝑒𝑡𝑄 .
The proof from Lemma 4.2 that 𝑃𝑡 has unit row sums applies directly to the current case.
The proof of nonnegativity of 𝑃𝑡 can be applied after some modifications.
To this end, set 𝑚 ∶= max𝑥 |𝑄(𝑥, 𝑥)| and 𝑃 ̂ ∶= 𝐼 + 𝑄/𝑚.
You can check that 𝑃 ̂ is a Markov matrix and that 𝑄 = 𝑚(𝑃 ̂ − 𝐼).
The rest of the proof of nonnegativity of 𝑃𝑡 is unchanged and we will not repeat it.
We conclude that 𝑃𝑡 is a Markov matrix.
Regarding the converse implication, suppose that 𝑃𝑡 = 𝑒𝑡𝑄 is a Markov matrix for all 𝑡 and let 1 be a column vector of
ones.
Because 𝑃𝑡 has unit row sums and differentiation is linear, we can employ the Kolmogorov backward equation to obtain

$$
Q \mathbb{1} = Q P_t \mathbb{1} = \left( \frac{d}{dt} P_t \right) \mathbb{1} = \frac{d}{dt} (P_t \mathbb{1}) = \frac{d}{dt} \mathbb{1} = 0
$$
Hence 𝑄 has zero row sums.
We can use the definition of the matrix exponential to obtain, for any 𝑥, 𝑦 and 𝑡 ≥ 0,

𝑃𝑡 (𝑥, 𝑦) = 𝟙{𝑥 = 𝑦} + 𝑡𝑄(𝑥, 𝑦) + 𝑜(𝑡) (5.6)

From this equality and the assumption that 𝑃𝑡 is a Markov matrix for all 𝑡, we see that the off diagonal elements of 𝑄
must be nonnegative.
Hence 𝑄 is an intensity matrix.



CHAPTER

SIX

SEMIGROUPS AND GENERATORS

6.1 Overview

We have seen in previous lectures that every intensity matrix generates a Markov semigroup.
We have also hinted that the pairing is one-to-one, in a sense to be made precise.
To clarify these ideas, we start in an abstract setting, with an arbitrary initial value problem.
In this setting we introduce general operator semigroups and their generators.
Once this is done, we will be able to return to the Markov case and fully clarify the connection between intensity matrices
and Markov semigroups.
The material below is relatively technical, with most of the complications driven by the fact that the state space can be
infinite.
Such technicalities are hard to avoid, since so many interesting Markov chains do have infinite state spaces.
• Our very first example – the Poisson process – has an infinite state space.
• Another example is the study of queues, which often have no natural upper bound.1
Readers are assumed to have some basic familiarity with Banach spaces.

6.2 Motivation

The general theory of continuous semigroups of operators is motivated by the problem of solving linear ODEs in infinite
dimensional spaces.2
More specifically, the challenge is to solve initial value problems such as

𝑥′𝑡 = 𝐴𝑥𝑡 , 𝑥0 given (6.1)

where
• 𝑥𝑡 takes value in a Banach space at each time 𝑡,
• 𝐴 is a linear operator and
• the time derivative 𝑥′𝑡 uses a definition appropriate for a Banach space.
1 In fact a major concern with queues is that their length does not explode. This issue cannot be properly explored unless the state space is allowed to be infinite.
2 An excellent introduction to operator semigroups, combined with applications to PDEs and Markov processes, can be found in [Applebaum, 2019].


This problem is also called the “abstract Cauchy problem”.


Why do we need to solve such problems?
One example comes from PDEs.
PDEs tell us how functions change over time, starting from an infinitesimal description.
When 𝑥𝑡 is a point in a function space, this fits into the framework of (6.1).
Another example comes from Markov processes, where, as we have seen, the flow of distributions over time can be
represented as a linear ODE in distribution space.
If the number of states is infinite, then the space of distributions is infinite dimensional.
This is another version of (6.1), and we return to it after a discussion of the general theory.
To give a high level view of the results below, the solution to the Cauchy problem is represented as a trajectory 𝑡 ↦ 𝑈𝑡 𝑥0
from the initial value 𝑥0 under a semigroup of maps (𝑈𝑡 ).
The operator 𝐴 in (6.1) is called the “generator” of (𝑈𝑡 ) and is its infinitesimal description.

6.3 Preliminaries

Throughout this lecture, (𝔹, ‖ ⋅ ‖) is a Banach space.

6.3.1 The Space of Linear Operators

You will recall that a linear operator on 𝔹 is a map 𝐴 from 𝔹 to itself satisfying

𝐴(𝛼𝑔 + 𝛽ℎ) = 𝛼𝐴𝑔 + 𝛽𝐴ℎ, ∀ 𝑔, ℎ ∈ 𝔹, 𝛼, 𝛽 ∈ R

The operator 𝐴 is called bounded if

$$
\|A\| := \sup_{g \in \mathbb{B}, \, \|g\| \leq 1} \|A g\| < \infty \qquad (6.2)
$$

This is the usual definition of a bounded linear operator on a normed linear space.
The set of all bounded linear operators on 𝔹 is denoted by ℒ(𝔹) and is itself a Banach space.
Sums and scalar products of elements of ℒ(𝔹) are defined in the usual way, so that, for 𝛼 ∈ R, 𝐴, 𝐵 ∈ ℒ(𝔹) and 𝑔 ∈ 𝔹,
we have

(𝐴 + 𝐵)𝑔 = 𝐴𝑔 + 𝐵𝑔, (𝛼𝐴)𝑔 = 𝛼(𝐴𝑔)

and so on.
We write 𝐴𝐵 to indicate composition of the operators 𝐴, 𝐵 ∈ ℒ(𝔹).
The value defined in (6.2) is called the operator norm of 𝐴 and, as suggested by the notation, is a norm on ℒ(𝔹).
In addition to being a norm, it enjoys the submultiplicative property ‖𝐴𝐵‖ ≤ ‖𝐴‖‖𝐵‖ for all 𝐴, 𝐵 ∈ ℒ(𝔹).
Let 𝐼 be the identity in ℒ(𝔹), satisfying 𝐼𝑔 = 𝑔 for all 𝑔 ∈ 𝔹.
(In fact ℒ(𝔹) is a unital Banach algebra when multiplication is identified with operator composition and 𝐼 is adopted as
the unit.)


6.3.2 The Exponential Function

Given 𝐴 ∈ ℒ(𝔹), the exponential of 𝐴 is the element of ℒ(𝔹) defined as

$$
e^A = \sum_{k \geq 0} \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots \qquad (6.3)
$$

This is the same as the definition for the matrix exponential. The exponential function arises naturally as the solution
to ODEs in Banach space, one example of which (as we shall see) is distribution flows associated with continuous time
Markov chains.
The exponential map has the following properties:
• For each 𝐴 ∈ ℒ(𝔹), the operator 𝑒𝐴 is a well defined element of ℒ(𝔹) with ‖𝑒𝐴 ‖ ≤ 𝑒‖𝐴‖ .3
• 𝑒0 = 𝐼, where 0 is the zero element of ℒ(𝔹).
• If 𝐴, 𝐵 ∈ ℒ(𝔹) and 𝐴𝐵 = 𝐵𝐴, then 𝑒𝐴+𝐵 = 𝑒𝐴 𝑒𝐵
• If 𝐴 ∈ ℒ(𝔹), then 𝑒𝐴 is invertible and (𝑒𝐴 )−1 = 𝑒−𝐴 .
The last fact is easily checked from the previous ones.
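For instance, since $A$ and $-A$ commute, the third property gives

$$
e^A e^{-A} = e^{A + (-A)} = e^0 = I
$$

and similarly $e^{-A} e^A = I$, which is exactly the invertibility claim.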

6.3.3 Operator Calculus

Consider a function

R+ ∋ 𝑡 ↦ 𝑈𝑡 ∈ ℒ(𝔹)
which we can think of as a time path in ℒ(𝔹), such as a flow of Markov operators.
We say that this function is differentiable at 𝜏 ∈ R+ if there exists an element 𝑇 of ℒ(𝔹) such that

$$
\frac{U_{\tau + h} - U_\tau}{h} \to T \;\text{ as }\; h \to 0 \qquad (6.4)
$$

In this case, 𝑇 is called the derivative of the function 𝑡 ↦ 𝑈𝑡 at 𝜏 and we write

$$
T = U_\tau' \quad \text{or} \quad T = \frac{d}{dt} U_t \Big|_{t = \tau}
$$
(Convergence of operators is in operator norm. If 𝜏 = 0, then the limit ℎ → 0 in (6.4) is the right limit.)

Example 6.1
If 𝑈𝑡 = 𝑡𝑉 for some fixed 𝑉 ∈ ℒ(𝔹), then it is easy to see that 𝑉 is the derivative of 𝑡 ↦ 𝑈𝑡 at every 𝑡 ∈ R+ .

Example 6.2
In our discussion of the Kolmogorov forward equation when 𝑆 is finite, we introduced the derivative of a map 𝑡 ↦ 𝑃𝑡 ,
where each 𝑃𝑡 is a matrix on 𝑆.
The derivative was defined by differentiating 𝑃𝑡 element-by-element.
This coincides with the operator-theoretic definition in (6.4) when 𝑆 is finite, because then the space ℒ(ℓ1 ), which consists
of all bounded linear operators on ℓ1 , is finite dimensional, and hence pointwise and norm convergence coincide.

3 Convergence of the sum in (6.3) follows from boundedness of 𝐴 and the fact that ℒ(𝔹) is a Banach space.


Analogous to the matrix and scalar cases, we have the following result:

Lemma 6.1 (Differentiability of the Exponential Curve)


For all 𝐴 ∈ ℒ(𝔹), the exponential curve 𝑡 ↦ 𝑒𝑡𝐴 is everywhere differentiable and

$$
\frac{d}{dt} e^{tA} = e^{tA} A = A e^{tA} \qquad (6.5)
$$

The proof is a (solved) exercise (see below).

6.4 Semigroups and Generators

For continuous time Markov chains where the state space 𝑆 is finite, we saw that Markov semigroups often take the form
𝑃𝑡 = 𝑒𝑡𝑄 for some intensity matrix 𝑄.
This is ideal because the entire semigroup is characterized in a simple way by its infinitesimal description 𝑄.
It turns out that, when 𝑆 is finite, this is always true: if (𝑃𝑡 ) is a Markov semigroup, then there exists an intensity matrix
𝑄 satisfying 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡.
Moreover, this statement is again true when 𝑆 is infinite, provided that some restrictions are placed on the semigroup.
Our aim is to make these statements precise, starting in an abstract setting and then specializing.

6.4.1 Operator Semigroups

Let 𝑈𝑡 be an element of ℒ(𝔹) for all 𝑡 ∈ R+ .


We say that (𝑈𝑡 ) is an evolution semigroup on ℒ(𝔹) if 𝑈0 = 𝐼 and 𝑈𝑠+𝑡 = 𝑈𝑠 𝑈𝑡 for all 𝑠, 𝑡 ≥ 0.
The idea is that (𝑈𝑡 ) generates a path in 𝔹 from any starting point 𝑔 ∈ 𝔹, so that 𝑈𝑡 𝑔 is interpreted as the location of the
state after 𝑡 units of time.
An evolution semigroup (𝑈𝑡 ) is called
• a 𝐶0 semigroup on 𝔹 if, for each 𝑔 ∈ 𝔹, the map 𝑡 ↦ 𝑈𝑡 𝑔 from R+ to 𝔹 is continuous, and
• a uniformly continuous semigroup on 𝔹 if the map 𝑡 ↦ 𝑈𝑡 from R+ to ℒ(𝔹) is continuous.
In what follows we abbreviate “uniformly continuous” to UC.4

Example 6.3 (Exponential curves are UC semigroups)


If 𝑈𝑡 = 𝑒𝑡𝐴 for 𝑡 ∈ R+ and 𝐴 ∈ ℒ(𝔹), then (𝑈𝑡 ) is a uniformly continuous semigroup on 𝔹.

The claim that (𝑈𝑡 ) is an evolution semigroup follows directly from the properties of the exponential function given above.
Uniform continuity can be established using arguments similar to those in the proof of differentiability in Lemma 6.1.
Since norm convergence on ℒ(𝔹) implies pointwise convergence, every uniformly continuous semigroup is a 𝐶0 semi-
group.
The reverse is certainly not true — there are many important 𝐶0 semigroups that fail to be uniformly continuous.
4 Be careful: the definition of a UC semigroup requires that 𝑡 ↦ 𝑈𝑡 is continuous as a map into ℒ(𝔹), rather than uniformly continuous. The UC terminology comes about because, for a UC semigroup, we have, by definition of the operator norm, sup‖𝑔‖≤1 ‖𝑈𝑠 𝑔 − 𝑈𝑡 𝑔‖ → 0 when 𝑠 → 𝑡.


In fact semigroups associated with PDEs, diffusions and other Markov processes on continuous state spaces are typically
𝐶0 but not uniformly continuous.
There are also important examples of Markov semigroups on infinite discrete state spaces that fail to be uniformly continuous.
However, we will soon see that, for most continuous time Markov chains used in applications, the semigroups are uniformly
continuous.

6.4.2 Generators

Consider a continuous time Markov chain on a finite state space with intensity matrix 𝑄.
The Markov semigroup (𝑃𝑡 ) is fully specified by this infinitesimal description 𝑄, in the sense that
• 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡 ≥ 0 and (equivalently)
• the forward and backward equations hold: 𝑃𝑡′ = 𝑃𝑡 𝑄 = 𝑄𝑃𝑡 .
Since 𝑃0 = 𝐼, the matrix 𝑄 can be recovered from the semigroup via

$$
Q = P_0' = \lim_{h \downarrow 0} \frac{P_h - I}{h}
$$

In the more abstract setting of 𝐶0 semigroups, we say that 𝑄 is the “generator” of the semigroup (𝑃𝑡 ).
More generally, given a 𝐶0 semigroup (𝑈𝑡 ), we say that a linear operator 𝐴 from 𝔹 to itself is the generator of (𝑈𝑡 ) if

$$
A g = \lim_{h \downarrow 0} \frac{U_h g - g}{h} \qquad (6.6)
$$

for all 𝑔 ∈ 𝔹 such that the limit exists.


The set of points where the limit exists (the domain of the generator) is denoted by 𝐷(𝐴).
At this point we would like to write (6.6) as 𝐴 = 𝑈0′ , or express 𝑈𝑡 as 𝑒𝑡𝐴 , analogous to the Markov case.
There are problems, however.
One problem is that the limit in (6.6) can fail to exist for some 𝑔 ∈ 𝔹.
Indeed, why should the limit exist, given that 𝐶0 semigroups are not required to be differentiable?
The other problem is that, even though the limit exists, the linear operator 𝐴 might be unbounded (i.e., not an element of
ℒ(𝔹)), in which case a statement like 𝑈𝑡 = 𝑒𝑡𝐴 is problematic.
It turns out that, despite these issues, the theory of 𝐶0 semigroups is powerful, and, with some work, the technical issues
can be circumvented.5
Even better, for the applications we wish to consider, we can focus on UC semigroups, where these problems do not arise.
The next section gives details.
5 An excellent treatment of the general theory of 𝐶0 semigroups can be found in [Bobrowski, 2005].


6.4.3 A Characterization of Uniformly Continuous Semigroups

We saw in Example 6.3 that exponential curves are an example of a UC semigroup.


The next theorem tells us that there are no other examples.

Theorem 6.1 (UC Semigroups are Exponential Curves)


If (𝑈𝑡 ) is a UC semigroup on 𝔹, then there exists an 𝐴 ∈ ℒ(𝔹) such that 𝑈𝑡 = 𝑒𝑡𝐴 for all 𝑡 ≥ 0. Moreover,
• 𝑈𝑡 is differentiable at every 𝑡 ≥ 0,
• 𝐴 is the generator of (𝑈𝑡 ) and
• 𝑈𝑡′ = 𝐴𝑈𝑡 = 𝑈𝑡 𝐴 for all 𝑡 ≥ 0.

The last three claims in Theorem 6.1 follow directly from the first claim.
The statement 𝑈𝑡′ = 𝐴𝑈𝑡 = 𝑈𝑡 𝐴 is a generalization of the Kolmogorov forward and backward equations.
While slightly more complicated in the Banach setting, the proof of the first claim (existence of an exponential represen-
tation) is a direct extension of the fact that any continuous function 𝑓 from R to itself satisfying
• 𝑓(𝑠)𝑓(𝑡) = 𝑓(𝑠 + 𝑡) for all 𝑠, 𝑡 ≥ 0 and
• 𝑓(0) = 1
also satisfies 𝑓(𝑡) = 𝑒𝑡𝑎 for some 𝑎 ∈ R.
We proved something quite similar in Theorem 1.1, on the memoryless property of the exponential distribution.
For more discussion of the scalar case, see, for example, [Sahoo and Kannappan, 2011].
For a full proof of the first claim in Theorem 6.1, in the setting of a Banach algebra, see, for example, Chapter 7 of
[Bobrowski, 2005].

6.5 Exercises

Exercise 6.1
Prove that (6.5) holds for all 𝐴 ∈ ℒ(𝔹).

Exercise 6.2
In many texts, a 𝐶0 semigroup is defined as an evolution semigroup (𝑈𝑡 ) such that

𝑈𝑡 𝑔 → 𝑔 as 𝑡 → 0 for any 𝑔 ∈ 𝔹 (6.7)

Our aim is to show that (6.7) implies continuity at every point 𝑡, as in the definition we used above.
The Banach–Steinhaus Theorem can be used to show that, for an evolution semigroup (𝑈𝑡 ) satisfying (6.7), there exist
finite constants 𝜔 and 𝑀 such that

‖𝑈𝑡 ‖ ≤ 𝑒𝑡𝜔 𝑀 for all 𝑡 ≥ 0 (6.8)

Using this and (6.7), show that, for any 𝑔 ∈ 𝔹, the map 𝑡 ↦ 𝑈𝑡 𝑔 is continuous at all 𝑡.


Exercise 6.3
Following on from the previous exercise, a UC semigroup is often defined as an evolution semigroup (𝑈𝑡 ) such that

‖𝑈𝑡 − 𝐼‖ → 0 as 𝑡 → 0 (6.9)

Show that (6.9) implies norm continuity at every point 𝑡, as in the definition we used above.
In particular, show that, for any 𝑡𝑛 → 𝑡, we have ‖𝑈𝑡𝑛 − 𝑈𝑡 ‖ → 0 as 𝑛 → ∞.

6.6 Solutions

Solution to Exercise 6.1


To show the first equality, fix $t \in \mathbb{R}_+$, take $h > 0$ and observe that

$$
e^{(t+h)A} - e^{tA} - h \, e^{tA} A = e^{tA} \left( e^{hA} - I - hA \right)
$$

Since the norm on $\mathcal{L}(\mathbb{B})$ is submultiplicative, it suffices to show that $\|e^{hA} - I - hA\| = o(h)$ as $h \to 0$.
Using the definition of the exponential, this is easily verified, completing the proof of the first equality in (6.5).
The proof of the second equality is similar.

Solution to Exercise 6.2


Let (𝑈𝑡 ) be an evolution semigroup satisfying (6.7) and let 𝜔 and 𝑀 be as in (6.8).
Pick any 𝑔 ∈ 𝔹, 𝑡 > 0 and ℎ𝑛 ↓ 0 as 𝑛 → ∞.
On one hand, 𝑈𝑡+ℎ𝑛 𝑔 = 𝑈ℎ𝑛 𝑈𝑡 𝑔 → 𝑈𝑡 𝑔 by (6.7).
On the other hand, from (6.8) and the definition of the operator norm,

‖𝑈𝑡−ℎ𝑛 𝑔 − 𝑈𝑡 𝑔‖ = ‖𝑈𝑡−ℎ𝑛 (𝑔 − 𝑈ℎ𝑛 𝑔)‖ ≤ 𝑒(𝑡−ℎ𝑛 )𝜔 𝑀 ‖𝑔 − 𝑈ℎ𝑛 𝑔‖ → 0

as 𝑛 → ∞. This completes the proof.

Solution to Exercise 6.3


The solution is similar to that of the previous exercise.
Let (𝑈𝑡 ) be an evolution semigroup satisfying (6.9), fix 𝑡 > 0 and take (ℎ𝑛 ) to be a scalar sequence satisfying ℎ𝑛 ↓ 0 as
𝑛 → ∞.
On one hand, 𝑈𝑡+ℎ𝑛 = 𝑈ℎ𝑛 𝑈𝑡 → 𝑈𝑡 by (6.9).
On the other hand, from the submultiplicative property of the operator norm and (6.8),

‖𝑈𝑡−ℎ𝑛 − 𝑈𝑡 ‖ = ‖𝑈𝑡−ℎ𝑛 (𝐼 − 𝑈ℎ𝑛 )‖ ≤ 𝑒(𝑡−ℎ𝑛 )𝜔 𝑀 ‖𝐼 − 𝑈ℎ𝑛 ‖

This converges to 0 as 𝑛 → ∞, completing our proof.



CHAPTER

SEVEN

UC MARKOV SEMIGROUPS

7.1 Overview

In our previous lecture we covered some of the general theory of operator semigroups.
Next we translate these results into the setting of Markov semigroups.
The Markov semigroups are defined on a countable set 𝑆.
The main aim is to give an exact one-to-one correspondence between
• UC Markov semigroups
• “conservative” intensity matrices and
• jump chains with state dependent jump intensities
Conservativeness is defined below and relates to “nonexplosiveness” of the associated Markov chain.
We will also give a brief discussion of intensity matrices that do not have this property, along with the processes they
generate.

7.2 Notation and Terminology

Let 𝑆 be an arbitrary countable set.


Let ℓ1 be the Banach space of summable functions on 𝑆; that is, all 𝑔 ∶ 𝑆 → R with

$$
\|g\| := \sum_x |g(x)| < \infty
$$

Note that 𝒟, the set of all distributions on 𝑆, is contained in ℓ1 .


Each Markov matrix 𝑃 on 𝑆 can and will be identified with a linear operator 𝑓 ↦ 𝑓𝑃 on ℓ1 via

$$
(f P)(y) = \sum_x f(x) P(x, y) \qquad (f \in \ell_1, \; y \in S) \qquad (7.1)
$$

To be consistent with earlier notation, we are writing the argument of 𝑃 to the left and applying 𝑃 to it as if premultiplying
𝑃 by a row vector.
In the exercises you are asked to verify that (7.1) defines a bounded linear operator on ℓ1 such that

‖𝑃 ‖ = 1 and 𝜙𝑃 ∈ 𝒟 whenever 𝜙 ∈ 𝒟 (7.2)

Note that composition of 𝑃 with itself is equivalent to powers of the matrix under matrix multiplication.


For an intensity matrix $Q$ on $S$ we can try to introduce the associated operator analogously, via

$$
(f Q)(y) = \sum_x f(x) Q(x, y) \qquad (f \in \ell_1, \; y \in S) \qquad (7.3)
$$

However, the sum in (7.3) is not always well defined.1


We say that an intensity matrix 𝑄 is conservative if the sum in (7.3) is well defined at all 𝑦 and, in addition, the mapping
𝑓 ↦ 𝑓𝑄 in (7.3) is a bounded linear operator on ℓ1 .
Below we show how this property can be checked in applications.

7.3 UC Markov Semigroups and their Generators

Let 𝑄 be a conservative intensity matrix on 𝑆.


Since 𝑄 is in ℒ(ℓ1 ), the operator exponential 𝑒𝑡𝑄 is well defined as an element of ℒ(ℓ1 ) for all 𝑡 ≥ 0.
Moreover, by Example 6.3, the family (𝑃𝑡 ) in ℒ(ℓ1 ) defined by 𝑃𝑡 = 𝑒𝑡𝑄 defines a UC Markov semigroup on ℓ1 .
(Here, a Markov semigroup (𝑃𝑡 ) is both a collection of Markov matrices and a collection of operators, as in (7.1).)
The next theorem says that this is the only way UC Markov semigroups can arise.

Theorem 7.1
If (𝑃𝑡 ) is a UC Markov semigroup on ℓ1 , then there exists a conservative intensity matrix 𝑄 such that 𝑃𝑡 = 𝑒𝑡𝑄 for all
𝑡 ≥ 0.

Proof. Let (𝑃𝑡 ) be a UC Markov semigroup on ℓ1 .


Since (𝑃𝑡 ) is a UC semigroup on ℓ1 , it follows from Theorem 6.1 that there exists a 𝑄 ∈ ℒ(ℓ1 ) such that 𝑃𝑡 = 𝑒𝑡𝑄 for
all 𝑡 ≥ 0.
We only need to show that 𝑄 is a conservative intensity matrix.
Because (𝑃𝑡 ) is a Markov semigroup, 𝑃𝑡 is a Markov matrix for all 𝑡, and, since 𝑃𝑡 = 𝑒𝑡𝑄 for all 𝑡, it follows that 𝑄 is an
intensity matrix.
We proved this for the case |𝑆| < ∞ in Theorem 5.1 and one can verify that the same arguments go through when
|𝑆| = ∞.
As 𝑄 ∈ ℒ(ℓ1 ), we know that 𝑄 is a bounded operator, so 𝑄 is a conservative intensity matrix.

From Theorem 7.1 we can easily deduce that
• 𝑃𝑡 is differentiable at every 𝑡 ≥ 0,
• 𝑄 is the generator of (𝑃𝑡 ),
• 𝑃𝑡′ = 𝑄𝑃𝑡 = 𝑃𝑡 𝑄 for all 𝑡 ≥ 0, and
• 𝑃0′ = 𝑄.
1 Previously, we introduced the notion of an intensity matrix when 𝑆 is finite and the definition is essentially unchanged in the current setting. In particular, 𝑄 ∶ 𝑆 × 𝑆 → R is called an intensity matrix if 𝑄 has zero row sums and 𝑄(𝑥, 𝑦) ≥ 0 whenever 𝑥 ≠ 𝑦.


In fact these results are just a special case of the claims in Theorem 6.1.
The second last of these results gives the Kolmogorov forward and backward equations.
The last of these results shows that we can obtain the intensity matrix 𝑄 by differentiating 𝑃𝑡 at 𝑡 = 0.

Example 7.1
Let us consider again the Poisson process (𝑁𝑡 ) with rate 𝜆 > 0 in light of the discussion above.
The corresponding semigroup (𝑃𝑡 ) is UC and hence there exists a conservative intensity matrix 𝑄 with 𝑃𝑡 = 𝑒𝑡𝑄 for all
𝑡 ≥ 0.
This fact can be established by proving UC property and then appealing to Theorem 7.1.
Another alternative, easier in this case, is to supply the intensity matrix 𝑄 directly and then verify that 𝑃𝑡 = 𝑒𝑡𝑄 holds.
The semigroup for a Poisson process with rate 𝜆 was given in (3.9) and is repeated here:
$$
P_t(j, k) = \begin{cases} e^{-\lambda t} \dfrac{(\lambda t)^{k-j}}{(k-j)!} & \text{if } j \leq k \\ 0 & \text{otherwise} \end{cases} \qquad (7.4)
$$

For the intensity matrix we take

$$
Q := \begin{pmatrix}
-\lambda & \lambda & 0 & 0 & 0 & \cdots \\
0 & -\lambda & \lambda & 0 & 0 & \cdots \\
0 & 0 & -\lambda & \lambda & 0 & \cdots \\
0 & 0 & 0 & -\lambda & \lambda & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} \qquad (7.5)
$$

The form of 𝑄 is intuitive: probability flows out of state 𝑖 and into state 𝑖 + 1 at the rate 𝜆.
It is immediate that 𝑄 is an intensity matrix, as claimed.
The exercises ask you to confirm that 𝑄 is in ℒ(ℓ1 ).
To prove that 𝑃𝑡 = 𝑒𝑡𝑄 for any 𝑡 ≥ 0, we first decompose 𝑄 as 𝑄 = 𝜆(𝐾 − 𝐼), where 𝐾 is defined by

𝐾(𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 1}

For given 𝑡 ≥ 0, we then have

$$
e^{tQ} = e^{\lambda t (K - I)} = e^{-\lambda t} e^{\lambda t K} = e^{-\lambda t} \sum_{m \geq 0} \frac{(\lambda t K)^m}{m!}
$$

The exercises ask you to verify that, for the powers of 𝐾, we have 𝐾 𝑚 (𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 𝑚}.
Inserting this expression for 𝐾 𝑚 leads to

$$
e^{tQ}(i, j) = e^{-\lambda t} \sum_{m \geq 0} \frac{(\lambda t)^m}{m!} \mathbb{1}\{j = i + m\} = e^{-\lambda t} \sum_{m \geq 0} \frac{(\lambda t)^m}{m!} \mathbb{1}\{m = j - i\}
$$

This is identical to (7.4).


It now follows that 𝑡 ↦ 𝑃𝑡 ∈ ℒ(ℓ1 ) is differentiable at every 𝑡 ≥ 0 and 𝑄 is the generator of (𝑃𝑡 ), with 𝑃0′ = 𝑄.
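Although 𝑆 is infinite here, the claim is easy to check numerically by truncating 𝑄 to a large finite matrix; away from the truncation boundary, the first row of 𝑒𝑡𝑄 should approximate the Poisson probabilities in (7.4). A sketch, with arbitrary parameters and truncation level:

import numpy as np
from scipy.linalg import expm
from scipy.stats import poisson

λ, t, N = 0.5, 2.0, 60                   # N is the truncation level

Q = np.zeros((N, N))                     # truncated version of (7.5)
for i in range(N - 1):
    Q[i, i], Q[i, i + 1] = -λ, λ

print(expm(t * Q)[0, :5])                # first row of e^{tQ}
print(poisson.pmf(np.arange(5), λ * t))  # (7.4) with j = 0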


7.3.1 A Necessary and Sufficient Condition

Our definition of a conservative intensity matrix works for the theory above but can be hard to check in applications and lacks probabilistic intuition.
Fortunately, we have the following simple characterization.

Lemma 7.1
An intensity matrix 𝑄 on 𝑆 is conservative if and only if sup𝑥 |𝑄(𝑥, 𝑥)| is finite.

The proof is a solved exercise.

Example 7.2
Recall the jump chain setting where, repeating (4.4), we defined 𝑄 via

𝑄(𝑥, 𝑦) ∶= 𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)) (7.6)

The function 𝜆 ∶ 𝑆 → R+ gives the jump rate at each state, while 𝐾 is the Markov matrix for the embedded discrete
time jump chain.
Previously we discussed this in the case where 𝑆 is finite but there is no need to restrict attention to that case.
For general countable 𝑆, the matrix 𝑄 defined in (7.6) is still an intensity matrix.
If we continue to assume that 𝐾(𝑥, 𝑥) = 0 for all 𝑥, then 𝑄(𝑥, 𝑥) = −𝜆(𝑥).
Hence, 𝑄 is conservative if and only if sup𝑥 𝜆(𝑥) is finite.
In other words, 𝑄 is conservative if and only if the set of jump rates is bounded.

This example shows that requiring 𝑄 to be conservative is a relatively mild restriction.

7.3.2 The Finite State Case

It is immediate from Lemma 7.1 that every intensity matrix is conservative when the state space 𝑆 is finite.
Hence, in this setting, every intensity matrix 𝑄 on 𝑆 defines a UC Markov semigroup (𝑃𝑡 ) via 𝑃𝑡 = 𝑒𝑡𝑄 .
Conversely, if 𝑆 is finite, then any Markov semigroup (𝑃𝑡 ) is a UC Markov semigroup.
To see this, recall that, as a Markov semigroup, (𝑃𝑡 ) satisfies lim𝑡→0 𝑃𝑡 (𝑥, 𝑦) = 𝐼(𝑥, 𝑦) for all 𝑥, 𝑦 in 𝑆.
In any finite dimensional space, pointwise convergence implies norm convergence, so 𝑃𝑡 → 𝐼 in operator norm as 𝑡 → 0
from above.
As we saw previously, this is enough to ensure that 𝑡 ↦ 𝑃𝑡 is norm continuous everywhere on R+ .
Hence (𝑃𝑡 ) is a UC Markov semigroup, as claimed.
Combining these results with Theorem 7.1, we conclude that, when 𝑆 is finite, there is a one-to-one correspondence
between Markov semigroups and intensity matrices.


7.4 From Intensity Matrix to Jump Chain

We now understand that there is a one-to-one pairing between conservative intensity matrices and UC Markov semigroups.
These ideas are important from an analytical perspective.
Now we provide another point of view, more connected to probability.
This point of view is important for both theory and computation.

7.4.1 Jump Chain Pairs

Let us agree to call (𝜆, 𝐾) a jump chain pair if 𝜆 is a map from 𝑆 to R+ and 𝐾 is a Markov matrix on 𝑆.
It is easy to verify that the matrix 𝑄 on 𝑆 defined by

𝑄(𝑥, 𝑦) ∶= 𝜆(𝑥)(𝐾(𝑥, 𝑦) − 𝐼(𝑥, 𝑦)) (7.7)

is an intensity matrix.
(We saw in an earlier lecture that 𝑄 is the intensity matrix for the jump chain (𝑋𝑡 ) built via Algorithm 4.1 from jump
chain pair (𝜆, 𝐾).)
As we now show, every intensity matrix admits the decomposition in (7.7) for some jump chain pair.

7.4.2 Jump Chain Decomposition

Given an intensity matrix 𝑄, set

𝜆(𝑥) ∶= −𝑄(𝑥, 𝑥) (𝑥 ∈ 𝑆) (7.8)

Next we build 𝐾, first along the principal diagonal via

$$
K(x, x) = \begin{cases} 0 & \text{if } \lambda(x) > 0 \\ 1 & \text{otherwise} \end{cases} \qquad (7.9)
$$

Thus, if the rate of leaving 𝑥 is positive, we set 𝐾(𝑥, 𝑥) = 0, so that the embedded jump chain moves away from 𝑥 with
probability one when the next jump occurs.
Otherwise, when 𝑄(𝑥, 𝑥) = 0, we stay at 𝑥 forever, so 𝑥 is an absorbing state.
Off the principal diagonal, where $x \neq y$, we set

$$
K(x, y) = \begin{cases} \dfrac{Q(x, y)}{\lambda(x)} & \text{if } \lambda(x) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (7.10)
$$
The exercises below ask you to confirm that, for 𝜆 and 𝐾 just defined,
1. (𝜆, 𝐾) is a jump chain pair and
2. the intensity matrix 𝑄 satisfies (7.7).
We call (𝜆, 𝐾) the jump chain decomposition of 𝑄.
We summarize in a lemma.

Lemma 7.2
A matrix 𝑄 on 𝑆 is an intensity matrix if and only if there exists a jump chain pair (𝜆, 𝐾) such that (7.7) holds.
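A direct translation of (7.8)–(7.10) into code, for 𝑄 given as a NumPy array (the function name is illustrative, not part of any library):

import numpy as np

def jump_chain_decomposition(Q):
    "Return the jump chain pair (λ, K) of an intensity matrix Q, as in (7.8)-(7.10)."
    n = Q.shape[0]
    λ = -np.diag(Q)                      # (7.8)
    K = np.zeros_like(Q, dtype=float)
    for x in range(n):
        if λ[x] > 0:
            K[x, :] = Q[x, :] / λ[x]     # (7.10) off the diagonal ...
            K[x, x] = 0.0                # ... and (7.9) on it
        else:
            K[x, x] = 1.0                # x is an absorbing state
    return λ, K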


7.4.3 The Conservative Case

We know from Example 7.2 that an intensity matrix 𝑄 is conservative if and only if 𝜆 is bounded.
Moreover, we saw in Theorem 7.1 that the pairing between conservative intensity matrices and UC Markov semigroups
is one-to-one.
This leads to the following result.

Theorem 7.2
On 𝑆, there exists a one-to-one correspondence between the following sets of objects:
1. The set of all jump chain pairs (𝜆, 𝐾) such that 𝜆 is bounded.
2. The set of all conservative intensity matrices.
3. The set of all UC Markov semigroups.

7.4.4 Simulation

In view of the preceding discussion, we have a simple way to simulate a Markov chain given any conservative intensity
matrix 𝑄.
The steps are
1. Decompose 𝑄 into a jump chain pair (𝜆, 𝐾).
2. Simulate via Algorithm 4.1.
Recalling our discussion of the Kolmogorov backward equation, we know that this produces a Markov chain with Markov
semigroup (𝑃𝑡 ) where 𝑃𝑡 = 𝑒𝑡𝑄 for 𝑄 satisfying (7.7).
(Although our argument assumed finite 𝑆, the proof goes through when 𝑆 is countably infinite and 𝑄 is conservative with
very minor changes.)
In particular, (𝑋𝑡 ) is a continuous time Markov chain with intensity matrix 𝑄.
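Under these assumptions, the two steps can be combined using the helpers sketched earlier (jump_chain_decomposition above and the Algorithm 4.1 simulator simulate_jump_chain from the backward equation lecture, both of which are illustrative sketches rather than library code):

import numpy as np

Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 2.0, -2.0]])         # any conservative intensity matrix

λ, K = jump_chain_decomposition(Q)                        # step 1
J, Y = simulate_jump_chain(np.ones(3)/3, λ, K, T=10.0)    # step 2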

7.5 Beyond Bounded Intensity Matrices

If we do run into an application where an intensity matrix 𝑄 is not conservative, what might we expect?
In this scenario, we can at least hope that 𝑄 is the generator of a 𝐶0 semigroup.
Since 𝑄 is an intensity matrix, we can be sure that this semigroup will be a Markov semigroup.
To know when 𝑄 will be the generator of a 𝐶0 semigroup, we need to look to the Hille–Yosida Theorem and sufficient conditions derived from it.
While we omit a detailed treatment, it is worth noting that the issue is linked to explosions.
To see the connection, recall that some initial value problems do not lead to a valid solution defined for all 𝑡 ∈ R+ .
An example is the scalar problem $x_t' = 1 + x_t^2$, which has solution $x_t = \tan(t - c)$ for some constant $c$.
This solution diverges to $+\infty$ as $t \uparrow c + \pi/2$, so it is not defined beyond that point.
The problem is that the time path explodes to infinity in finite time.
The same issue can occur for Markov processes, if jump rates grow sufficiently quickly.


For more discussion, see, for example, Section 2.7 of [Norris, 1998].

7.6 Exercises

Exercise 7.1
Let 𝑃 be a Markov matrix on 𝑆 and identify it with the linear operator in (7.1). Verify the claims in (7.2).

Exercise 7.2
Prove the claim in Lemma 7.1.

Exercise 7.3
Confirm that 𝑄 defined in (7.5) induces a bounded linear operator on ℓ1 via (7.3).

Exercise 7.4
Let 𝐾 be defined on Z+ × Z+ by 𝐾(𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 1}.
Show that, with 𝐾 𝑚 representing the 𝑚-th matrix product of 𝐾 with itself, we have 𝐾 𝑚 (𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 𝑚} for any
𝑖, 𝑗 ∈ Z+ .

Exercise 7.5
Let 𝑄 be any intensity matrix on 𝑆.
Prove that the jump chain decomposition of 𝑄 is in fact a jump chain pair.
Prove that, in addition, this decomposition (𝜆, 𝐾) satisfies (7.7).

7.7 Solutions

Solution to Exercise 7.1


To determine the norm of 𝑃 , we use the definition in (6.2).
If 𝑓 ∈ ℓ1 and ‖𝑓‖ ≤ 1, then

$$
\|f P\| \leq \sum_y \sum_x |f(x)| P(x, y) = \sum_x |f(x)| \sum_y P(x, y) = \sum_x |f(x)| = \|f\|
$$

Hence ‖𝑃 ‖ ≤ 1.
To see that equality holds we can repeat this argument with 𝑓 ≥ 0, obtaining ‖𝑓𝑃 ‖ = ‖𝑓‖.
Now pick any 𝜙 ∈ 𝒟.


Clearly 𝜙𝑃 ≥ 0, and

$$
\sum_y (\phi P)(y) = \sum_y \sum_x \phi(x) P(x, y) = \sum_x \phi(x) \sum_y P(x, y) = 1
$$

Hence 𝜙𝑃 ∈ 𝒟 as claimed.

Solution to Exercise 7.2


Here is one solution.
Let 𝑄 be an intensity matrix on 𝑆.
Suppose first that 𝑚 ∶= sup𝑥 |𝑄(𝑥, 𝑥)| is finite.

Set 𝑃 ̂ ∶= 𝐼 + 𝑄/𝑚.
It is not hard to check that 𝑃 ̂ is a Markov matrix and that 𝑄 = 𝑚(𝑃 ̂ − 𝐼).
Since 𝑃 ̂ is a Markov matrix, it induces a bounded linear operator on ℓ1 via (7.1).
As ℒ(ℓ1 ) is a linear space, we see that 𝑄 is likewise in ℒ(ℓ1 ).
In particular, 𝑄 is a bounded operator, and hence conservative.
Next, suppose that 𝑄 is conservative and yet sup𝑥 |𝑄(𝑥, 𝑥)| is infinite.
Choose 𝑥 ∈ 𝑆 such that |𝑄(𝑥, 𝑥)| > ‖𝑄‖.
Let 𝑓 ∈ ℓ1 be defined by 𝑓(𝑧) = 𝟙{𝑧 = 𝑥}.
Since ‖𝑓‖ = 1, we have

$$
\|Q\| \geq \|f Q\| = \sum_y \left| \sum_z f(z) Q(z, y) \right| = \sum_y |Q(x, y)| \geq |Q(x, x)|
$$

Contradiction.

Solution to Exercise 7.3


Linearity is obvious so we focus on boundedness.
For any $f \in \ell_1$ and this choice of $Q$, the triangle inequality gives

$$
\sum_y |(f Q)(y)| \leq \sum_y \sum_x |f(x)| \, |Q(x, y)| = \sum_x |f(x)| \sum_y |Q(x, y)| = 2\lambda \|f\|
$$
Hence ‖𝑓𝑄‖ ≤ 2𝜆‖𝑓‖, which implies that 𝑄 ∈ ℒ(ℓ1 ) as required.

Solution to Exercise 7.4


The statement 𝐾 𝑚 (𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 𝑚} holds by definition when 𝑚 = 1.
Now suppose it holds at arbitrary 𝑚.
We then have, by definition of composition (matrix multiplication),

$$
K^{m+1}(i, j) = \sum_n K(i, n) K^m(n, j) = \sum_n K(i, n) \mathbb{1}\{j = n + m\} = K(i, j - m)
$$


Applying the definition 𝐾(𝑖, 𝑗) = 𝟙{𝑗 = 𝑖 + 1} completes verification of the claim.

Solution to Exercise 7.5


Let 𝑄 be an intensity matrix and let (𝜆, 𝐾) be the jump chain decomposition of 𝑄.
Nonnegativity of 𝜆 is immediate from the definition of an intensity matrix.
To see that 𝐾 is a Markov matrix we fix 𝑥 ∈ 𝑆 and suppose first that 𝜆(𝑥) > 0.
Then

$$
\sum_y K(x, y) = \sum_{y \neq x} K(x, y) = \sum_{y \neq x} \frac{Q(x, y)}{\lambda(x)} = 1
$$

If, on the other hand, 𝜆(𝑥) = 0, then ∑𝑦 𝐾(𝑥, 𝑦) = 1 is immediate from the definition.
As 𝐾 is nonnegative, we see that 𝐾 is a Markov matrix.
Thus, (𝜆, 𝐾) is a valid jump chain pair.
The proof that 𝑄 and (𝜆, 𝐾) satisfy (7.7) is mechanical and the details are omitted.
(Try working case-by-case, with 𝜆(𝑥) = 0, 𝑥 = 𝑦, 𝜆(𝑥) > 0, 𝑥 = 𝑦, etc.)



CHAPTER

EIGHT

STATIONARITY AND ERGODICITY

8.1 Overview

In this lecture we discuss stability and equilibrium behavior for continuous time Markov chains.
To give one example of why this theory matters, consider queues, which are often modeled as continuous time Markov
chains.
Queueing theory is used in applications such as
• treatment of patients arriving at a hospital
• optimal design of manufacturing processes
• requests to a file server
• air traffic
• customers waiting on a helpline
A key topic in queueing theory is average behavior over the long run.
• Will the length of the queue grow without bounds?
• If not, is there some kind of long run equilibrium?
• If so, what is the average waiting time in this equilibrium?
• What is the average length of the queue over a week, or a month?
We will use the following imports

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import quantecon as qe
from numba import njit
from scipy.linalg import expm
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection


8.2 Stationary Distributions

8.2.1 Definition

Let 𝑆 be countable.
Recall that, for a discrete time Markov chain with Markov matrix 𝑃 on 𝑆, a distribution 𝜓 is called stationary for 𝑃 if
𝜓𝑃 = 𝜓.
This means that if 𝑋𝑡 has distribution 𝜓, then so does 𝑋𝑡+1 .
For continuous time Markov chains, the definition is analogous.
Given a Markov semigroup (𝑃𝑡 ) on 𝑆, a distribution 𝜓∗ ∈ 𝒟 is called stationary for (𝑃𝑡 ) if

𝜓∗ 𝑃𝑡 = 𝜓∗ for all 𝑡 ≥ 0

As one example, we look again at the chain on 𝑆 = {0, 1, 2} with intensity matrix
Q = ((-3, 2, 1),
     (3, -5, 2),
     (4, 6, -10))

The following figure was shown before, except that now there is a black dot that the three trajectories seem to be converging
to.
(Recall that, in the color scheme, trajectories cool as time evolves.)


This black dot is the stationary distribution 𝜓∗ of the Markov semigroup (𝑃𝑡) generated by 𝑄.
It was calculated using the stationary_distributions attribute of QuantEcon’s MarkovChain class, by arbitrarily setting 𝑡 = 1 and solving 𝜓𝑃1 = 𝜓.
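
To make this concrete, here is a minimal sketch of that computation (the array form of 𝑄 and the variable names are our own choices):

import numpy as np
from scipy.linalg import expm
import quantecon as qe

Q = np.array([[-3., 2, 1],
              [3, -5, 2],
              [4, 6, -10]])
P_1 = expm(Q)                               # P_t = e^{tQ} evaluated at t = 1
mc = qe.MarkovChain(P_1)
psi_star = mc.stationary_distributions[0]   # solves psi P_1 = psi
print(psi_star)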
Below we show that, for this choice of 𝑄, the stationary distribution 𝜓∗ is unique in 𝒟, due to irreducibility.
Moreover, 𝜓𝑃𝑡 → 𝜓∗ as 𝑡 → ∞ for any 𝜓 ∈ 𝒟, as suggested by the figure.

8.2.2 Stationarity via the Generator

In many cases, it is easier to use the generator of the semigroup to identify stationary distributions rather than the semigroup
itself.
This is analogous to the idea that a point 𝑥̄ in R𝑑 is stationary for a vector ODE 𝑥′𝑡 = 𝐹(𝑥𝑡) when 𝐹(𝑥̄) = 0.
(Here 𝐹 is the infinitesimal description, and hence analogous to the generator.)
The next result holds true under weaker conditions but the version stated here is easy to prove and sufficient for applications
we consider.

Theorem 8.1
Let (𝑃𝑡 ) be a UC Markov semigroup with generator 𝑄. A distribution 𝜓 on 𝑆 is stationary for (𝑃𝑡 ) if and only if 𝜓𝑄 = 0.

Proof. Fix 𝜓 ∈ 𝒟 and suppose first that 𝜓𝑄 = 0.
Since (𝑃𝑡) is a UC Markov semigroup, we have 𝑃𝑡 = 𝑒^{𝑡𝑄} for all 𝑡, and hence, for any 𝑡 ≥ 0,

𝜓𝑒^{𝑡𝑄} = 𝜓 + 𝑡𝜓𝑄 + (𝑡²/2!)𝜓𝑄² + ⋯

From 𝜓𝑄 = 0 we get 𝜓𝑄^𝑘 = 0 for all 𝑘 ∈ N, so the last display yields 𝜓𝑃𝑡 = 𝜓.
Hence 𝜓 is stationary for (𝑃𝑡).
Now suppose that 𝜓 is stationary for (𝑃𝑡 ) and set 𝐷𝑡 ∶= (1/𝑡)(𝑃𝑡 − 𝐼).
From the triangle inequality and the definition of the operator norm, for any given 𝑡,

‖𝜓𝑄‖ ≤ ‖𝜓(𝑄 − 𝐷𝑡 )‖ + ‖𝜓𝐷𝑡 ‖ ≤ ‖𝑄 − 𝐷𝑡 ‖ + ‖𝜓𝐷𝑡 ‖

Since (𝑃𝑡 ) is UC and 𝑄 is its generator, we have ‖𝐷𝑡 − 𝑄‖ → 0 in ℒ(ℓ1 ) as 𝑡 → 0+ .


Hence ‖𝜓𝑄‖ ≤ lim inf𝑡↓0 ‖𝜓𝐷𝑡 ‖.
As 𝜓 is stationary for (𝑃𝑡 ), we have 𝜓𝐷𝑡 = 0 for all 𝑡.
Hence 𝜓𝑄 = 0, as was to be shown.
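
Theorem 8.1 also suggests a direct computational route to 𝜓∗: solve 𝜓𝑄 = 0 and normalize. Here is a minimal sketch of this idea for the three state chain above, using SciPy’s null_space routine:

import numpy as np
from scipy.linalg import null_space

Q = np.array([[-3., 2, 1],
              [3, -5, 2],
              [4, 6, -10]])
psi = null_space(Q.T).flatten()   # psi Q = 0 is equivalent to Q' psi' = 0
psi_star = psi / psi.sum()        # normalizing also fixes the sign of the null vector
print(psi_star)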


8.3 Irreducibility and Uniqueness

Let (𝑃𝑡 ) be a Markov semigroup on 𝑆 and consider arbitrary states 𝑥, 𝑦 ∈ 𝑆.


We say that state 𝑦 is accessible from state 𝑥 if there exists a 𝑡 ≥ 0 such that 𝑃𝑡 (𝑥, 𝑦) > 0.
We say that 𝑥 and 𝑦 communicate if 𝑥 is accessible from 𝑦 and 𝑦 is accessible from 𝑥.
A Markov semigroup (𝑃𝑡 ) on 𝑆 is called irreducible if every pair 𝑥, 𝑦 in 𝑆 communicates.
We seek a characterization of irreducibility of (𝑃𝑡 ) in terms of its generator.
As a first step, we will say there is a 𝑄-positive probability flow from 𝑥 to 𝑦 if there exists a finite sequence (𝑧0, 𝑧1, …, 𝑧𝑚) in 𝑆 starting at 𝑥 = 𝑧0 and ending at 𝑦 = 𝑧𝑚 such that 𝑄(𝑧𝑖, 𝑧𝑖+1) > 0 for all 𝑖 < 𝑚.

Theorem 8.2
Let (𝑃𝑡 ) be a UC Markov semigroup with generator 𝑄. For distinct states 𝑥 and 𝑦, the following statements are equivalent:
1. The state 𝑦 is accessible from 𝑥 under (𝑃𝑡 ).
2. There is a 𝑄-positive probability flow from 𝑥 to 𝑦.
3. 𝑃𝑡 (𝑥, 𝑦) > 0 for all 𝑡 > 0.

Proof. Pick any two distinct states 𝑥 and 𝑦.
It is obvious that statement 3 implies statement 1, so we need only prove (1 ⟹ 2) and (2 ⟹ 3).
Starting with (1 ⟹ 2), recall that, for distinct 𝑥 and 𝑦,

𝑃𝑡(𝑥, 𝑦) = 𝑡𝑄(𝑥, 𝑦) + (𝑡²/2!)𝑄²(𝑥, 𝑦) + ⋯    (8.1)

If 𝑦 is accessible from 𝑥, then 𝑃𝑡(𝑥, 𝑦) > 0 for some 𝑡 > 0, so 𝑄^𝑘(𝑥, 𝑦) > 0 for at least one 𝑘 ∈ N.
Writing out the matrix product as a sum, we now have

𝑄^𝑘(𝑥, 𝑦) = ∑𝑧1 ∑𝑧2 ⋯ ∑𝑧𝑘−1 𝑄(𝑥, 𝑧1)𝑄(𝑧1, 𝑧2) ⋯ 𝑄(𝑧𝑘−1, 𝑦) > 0    (8.2)

It follows that at least one term of this sum must be strictly positive.
Deleting the diagonal factors from such a term (the only factors that can fail to be positive) leaves a chain of strictly positive off-diagonal entries of 𝑄 connecting 𝑥 to 𝑦.
Therefore, a 𝑄-positive probability flow from 𝑥 to 𝑦 exists.
Turning to (2 ⟹ 3), first note that, for arbitrary states 𝑢 and 𝑣, if 𝑄(𝑢, 𝑣) > 0 then 𝑃𝑡 (𝑢, 𝑣) > 0 for all 𝑡 > 0.
To see this, let (𝜆, 𝐾) be the jump chain pair constructed from 𝑄 via (7.8), (7.9) and (7.10).
Observe that, since 𝑄(𝑢, 𝑣) > 0, we must have 𝜆(𝑢) > 0.
As a consequence, applying (7.10), we have
𝐾(𝑢, 𝑣) = 𝑄(𝑢, 𝑣)/𝜆(𝑢) > 0
Let (𝑌𝑘 ) and (𝐽𝑘 ) be the embedded jump chain and jump sequence generated by Algorithm 4.1, with 𝑌0 = 𝑢.
With 𝐸1 ∼ Exp(1) and 𝐸2 ∼ Exp(1), we have, for any 𝑡 > 0,
𝑃𝑡 (𝑢, 𝑣) ≥ P{𝐽1 ≤ 𝑡, 𝑌1 = 𝑣, 𝐽2 > 𝑡}
≥ P{𝐸1 ≤ 𝑡𝜆(𝑢), 𝐸2 > 𝑡𝜆(𝑣)}P{𝑌1 = 𝑣}
= P{𝐸1 ≤ 𝑡𝜆(𝑢)}P{𝐸2 > 𝑡𝜆(𝑣)}𝐾(𝑢, 𝑣)
>0


Now suppose there is a 𝑄-positive probability flow (𝑧0, 𝑧1, …, 𝑧𝑚) from 𝑥 to 𝑦.
If we fix 𝑡 > 0 and repeatedly apply (3.7) along with the last result, we obtain

𝑃𝑡(𝑥, 𝑦) ≥ 𝑃𝑡/𝑚(𝑧0, 𝑧1) 𝑃𝑡/𝑚(𝑧1, 𝑧2) ⋯ 𝑃𝑡/𝑚(𝑧𝑚−1, 𝑧𝑚) > 0

Theorem 8.2 leads directly to the following strong result.

Corollary 8.1
For a UC Markov semigroup (𝑃𝑡 ), the following statements are equivalent:
1. (𝑃𝑡 ) is irreducible.
2. 𝑃𝑡 (𝑥, 𝑦) > 0 for all 𝑡 > 0 and all (𝑥, 𝑦) ∈ 𝑆 × 𝑆.

Note: To obtain stable long run behavior in discrete time Markov chains, it is common to assume that the chain is
aperiodic.
This needs to be assumed on top of irreducibility if one wishes to rule out all dependence on initial conditions.
Corollary 8.1 shows that periodicity is not a concern for irreducible continuous time Markov chains.
Positive probability flow from 𝑥 to 𝑦 at some 𝑡 > 0 immediately implies positive flow for all 𝑡 > 0.
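
As a numerical illustration of this point (a sketch using the three state 𝑄 from the previous section; the time grid is an arbitrary choice), 𝑃𝑡 = 𝑒^{𝑡𝑄} is strictly positive even at tiny 𝑡:

import numpy as np
from scipy.linalg import expm

Q = np.array([[-3., 2, 1],
              [3, -5, 2],
              [4, 6, -10]])
for t in (1e-6, 0.01, 1.0):
    print(t, np.all(expm(t * Q) > 0))   # strictly positive at every t > 0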

8.4 Asymptotic Stability

We call a Markov semigroup (𝑃𝑡) asymptotically stable if (𝑃𝑡) has a unique stationary distribution 𝜓∗ in 𝒟 and

‖𝜓𝑃𝑡 − 𝜓∗ ‖ → 0 as 𝑡 → ∞ for all 𝜓 ∈ 𝒟 (8.3)

Our aim is to establish conditions for asymptotic stability of Markov semigroups.
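
Before turning to the theory, here is a numerical glimpse of (8.3) for the three state chain introduced above (the initial distribution and time grid below are arbitrary illustrative choices):

import numpy as np
from scipy.linalg import expm
import quantecon as qe

Q = np.array([[-3., 2, 1],
              [3, -5, 2],
              [4, 6, -10]])
psi_star = qe.MarkovChain(expm(Q)).stationary_distributions[0]
psi = np.array([1.0, 0.0, 0.0])           # point mass at state 0
for t in (0.1, 0.5, 1.0, 2.0):
    dist = np.abs(psi @ expm(t * Q) - psi_star).sum()
    print(f"t = {t}: ||psi P_t - psi*|| = {dist:.2e}")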

8.4.1 Contractivity

Let’s recall some useful facts about the discrete time case.
First, if 𝑃 is any Markov matrix, we have, in the ℓ1 norm,

‖𝜓𝑃‖ ≤ ‖𝜓‖ for all 𝜓 ∈ ℓ1    (8.4)

This is because, given 𝜓 ∈ ℓ1,

‖𝜓𝑃‖ = ∑𝑦 ∣∑𝑥 𝜓(𝑥)𝑃(𝑥, 𝑦)∣ ≤ ∑𝑦 ∑𝑥 |𝜓(𝑥)|𝑃(𝑥, 𝑦) = ‖𝜓‖

By linearity, for 𝜓, 𝜙 ∈ 𝒟, we then have

‖𝜓𝑃 − 𝜙𝑃 ‖ ≤ ‖𝜓 − 𝜙‖

Hence every Markov operator is contracting on 𝒟.


Moreover, if 𝑃 is everywhere positive, then this inequality is strict:

Lemma 8.1 (Strict Contractivity)


If 𝑃 is a Markov matrix and 𝑃 (𝑥, 𝑦) > 0 for all 𝑥, 𝑦, then

‖𝜓𝑃 − 𝜙𝑃 ‖ < ‖𝜓 − 𝜙‖ for all 𝜓, 𝜙 ∈ 𝒟 with 𝜓 ≠ 𝜙

The proof follows from the strict triangle inequality, as opposed to the weak triangle inequality we used to obtain (8.4).
See, for example, Proposition 3.1.2 of [Lasota and Mackey, 1994] or Lemma 8.2.3 of [Stachurski, 2009].
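
Here is a quick numerical illustration of Lemma 8.1 (the everywhere positive Markov matrix 𝑃 below is an arbitrary choice made for this sketch):

import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
psi = np.array([1.0, 0.0, 0.0])
phi = np.array([0.0, 0.0, 1.0])
print(np.abs(psi - phi).sum())           # ||psi - phi|| = 2.0
print(np.abs(psi @ P - phi @ P).sum())   # strictly smaller after one step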

8.4.2 Uniqueness

Irreducibility of a given Markov chain implies that there are no disjoint absorbing sets.
This in turn leads to uniqueness of stationary distributions:

Theorem 8.3
Let (𝑃𝑡 ) be a UC Markov semigroup on 𝑆. If (𝑃𝑡 ) is irreducible, then (𝑃𝑡 ) has at most one stationary distribution.

Proof. Suppose to the contrary that 𝜓 and 𝜙 are both stationary for (𝑃𝑡) and 𝜓 ≠ 𝜙.
Since (𝑃𝑡) is irreducible, we know from Corollary 8.1 that 𝑃1(𝑥, 𝑦) > 0 for all 𝑥, 𝑦 ∈ 𝑆.
Due to this positivity of 𝑃1 and 𝜓 ≠ 𝜙, the strict inequality in Lemma 8.1 gives ‖𝜓𝑃1 − 𝜙𝑃1‖ < ‖𝜓 − 𝜙‖.
At the same time, by stationarity, ‖𝜓𝑃1 − 𝜙𝑃1‖ = ‖𝜓 − 𝜙‖. Contradiction.

Example 8.1
An M/M/1 queue with parameters 𝜇, 𝜆 is a continuous time Markov chain (𝑋𝑡) on 𝑆 = Z+ with intensity matrix

     ⎛ −𝜆        𝜆          0          0     ⋯ ⎞
 𝑄 = ⎜  𝜇     −(𝜇 + 𝜆)      𝜆          0     ⋯ ⎟        (8.5)
     ⎜  0        𝜇       −(𝜇 + 𝜆)      𝜆     ⋯ ⎟
     ⎝  ⋮        ⋮          ⋮          ⋮     ⋱ ⎠

The chain (𝑋𝑡 ) records the length of the queue at each moment in time.
The intensity matrix captures the idea that customers flow into the queue at rate 𝜆 and are served (and hence leave the
queue) at rate 𝜇.
If 𝜆 and 𝜇 are both positive, then there is a 𝑄-positive probability flow between any two states, in both directions, so the
corresponding semigroup (𝑃𝑡 ) is irreducible.
Theorem 8.3 now tells us that (𝑃𝑡 ) has at most one stationary distribution.
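
As a numerical aside (our own sketch, with illustrative rates and an arbitrary truncation level N), we can truncate (8.5) to finitely many states and confirm irreducibility of the skeleton 𝑃1 = 𝑒^𝑄 via QuantEcon’s is_irreducible attribute:

import numpy as np
from scipy.linalg import expm
import quantecon as qe

lam, mu, N = 1.0, 2.0, 25
Q = (np.diag(lam * np.ones(N - 1), k=1)      # arrivals at rate lam
     + np.diag(mu * np.ones(N - 1), k=-1))   # departures at rate mu
Q -= np.diag(Q.sum(axis=1))                  # rows of an intensity matrix sum to zero
print(qe.MarkovChain(expm(Q)).is_irreducible)   # True whenever lam, mu > 0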


8.4.3 Stability from the Skeleton

Recall the definition of asymptotic stability given in (8.3).


Analogously, we call an individual Markov operator 𝑃 asymptotically stable if 𝑃 has a unique stationary distribution 𝜓∗ in 𝒟 and 𝜓𝑃^𝑛 → 𝜓∗ as 𝑛 → ∞ for all 𝜓 ∈ 𝒟.
The next result gives a connection between discrete and continuous stability.
The critical ingredient linking these two concepts is the contractivity in (8.4).

Lemma 8.2
Let (𝑃𝑡 ) be a Markov semigroup. If there exists an 𝑠 > 0 such that the Markov matrix 𝑃𝑠 is asymptotically stable, then
(𝑃𝑡 ) is asymptotically stable with the same stationary distribution.

Proof. Let (𝑃𝑡) and 𝑠 be as in the statement of Lemma 8.2.
Let 𝜓∗ be the stationary distribution of 𝑃𝑠. Fix 𝜓 ∈ 𝒟 and 𝜖 > 0.
By stability of 𝑃𝑠, we can take an 𝑛 ∈ N such that ‖𝜓(𝑃𝑠)^𝑛 − 𝜓∗‖ < 𝜖.
Pick any 𝑡 > 𝑠𝑛 and set ℎ ∶= 𝑡 − 𝑠𝑛.
Note that 𝜓∗𝑃ℎ = 𝜓∗: since 𝑃ℎ commutes with 𝑃𝑠, the distribution 𝜓∗𝑃ℎ is stationary for 𝑃𝑠, and hence equals 𝜓∗ by uniqueness.
Combining this with the contractivity in (8.4) and the semigroup property 𝑃_{𝑠𝑛} = (𝑃𝑠)^𝑛, we have

‖𝜓𝑃𝑡 − 𝜓∗‖ = ‖𝜓(𝑃𝑠)^𝑛 𝑃ℎ − 𝜓∗𝑃ℎ‖ ≤ ‖𝜓(𝑃𝑠)^𝑛 − 𝜓∗‖ < 𝜖

Hence asymptotic stability holds for (𝑃𝑡 ).

8.4.4 Stability via Drift

In this section we address drift conditions, which are a powerful method for obtaining asymptotic stability when the state
space can be infinite.
The idea is to show that the state tends to drift back to a finite set over time.
Such drift, when combined with the contractivity in Lemma 8.1, is enough to give global stability.
The next theorem gives a useful version of this class of results.

Theorem 8.4
Let (𝑃𝑡 ) be a UC Markov semigroup with intensity matrix 𝑄. If (𝑃𝑡 ) is irreducible and there exists a function 𝑣 ∶ 𝑆 → R+ ,
a finite set 𝐹 ⊂ 𝑆 and positive constants 𝜖 and 𝑀 such that

∑𝑦 𝑄(𝑥, 𝑦)𝑣(𝑦) ≤ 𝑀 if 𝑥 ∈ 𝐹 and ∑𝑦 𝑄(𝑥, 𝑦)𝑣(𝑦) ≤ −𝜖 otherwise,

then (𝑃𝑡 ) is asymptotically stable.

The proof of Theorem 8.4 can be found in [Pichór et al., 2012].

Example 8.2


Consider again the M/M/1 queue on Z+ with intensity matrix (8.5), and suppose that 𝜆 < 𝜇.
It is intuitive that, in this case, the queue length will not tend to infinity (since the service rate exceeds the arrival rate).
This intuition can be confirmed via Theorem 8.4, after setting 𝑣(𝑗) = 𝑗.
Indeed, we have, for any 𝑖 ≥ 1,

∑𝑗≥0 𝑄(𝑖, 𝑗)𝑣(𝑗) = (𝑖 − 1)𝜇 − 𝑖(𝜇 + 𝜆) + (𝑖 + 1)𝜆 = 𝜆 − 𝜇

while ∑𝑗≥0 𝑄(0, 𝑗)𝑣(𝑗) = 𝜆.
Setting 𝐹 = {0}, 𝑀 = 𝜆 and 𝜖 = 𝜇 − 𝜆 > 0, we see that the conditions of Theorem 8.4 hold.
Hence the associated semigroup (𝑃𝑡) is asymptotically stable.
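
The drift computation can be checked numerically on a finite truncation of (8.5). The sketch below uses the illustrative values 𝜆 = 1 and 𝜇 = 2; only the boundary rows are distorted by the truncation:

import numpy as np

lam, mu, N = 1.0, 2.0, 50
Q = (np.diag(lam * np.ones(N - 1), k=1)
     + np.diag(mu * np.ones(N - 1), k=-1))
Q -= np.diag(Q.sum(axis=1))
v = np.arange(N)                 # the drift function v(j) = j
print((Q @ v)[1:6])              # each interior entry equals lam - mu = -1.0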

Corollary 8.2
If (𝑃𝑡 ) is an irreducible UC Markov semigroup and 𝑆 is finite, then (𝑃𝑡 ) is asymptotically stable.

A solved exercise below asks you to confirm this.

8.5 Exercises

Exercise 8.1
Let (𝑃𝑡 ) be a Markov semigroup. True or false: for this semigroup, every state 𝑥 is accessible from itself.

Exercise 8.2
Let (𝜆𝑘 ) be a bounded non-increasing sequence in (0, ∞).
A pure birth process starting at zero is a continuous time Markov process (𝑋𝑡 ) on state space Z+ with intensity matrix

     ⎛ −𝜆0     𝜆0      0       0     ⋯ ⎞
 𝑄 = ⎜   0    −𝜆1     𝜆1       0     ⋯ ⎟
     ⎜   0      0    −𝜆2      𝜆2     ⋯ ⎟
     ⎝   ⋮      ⋮      ⋮       ⋮     ⋱ ⎠

Show that (𝑃𝑡 ), the corresponding Markov semigroup, has no stationary distribution.

Exercise 8.3
Confirm that Theorem 8.4 implies Corollary 8.2.


8.6 Solutions

Solution to Exercise 8.1


The statement is true. With 𝑡 = 0 we have 𝑃𝑡 (𝑥, 𝑥) = 𝐼(𝑥, 𝑥) = 1 > 0.

Solution to Exercise 8.2


Suppose to the contrary that 𝜙 ∈ 𝒟 and 𝜙𝑄 = 0.
Then, for any 𝑗 ≥ 1,

(𝜙𝑄)(𝑗) = ∑𝑖≥0 𝜙(𝑖)𝑄(𝑖, 𝑗) = −𝜆𝑗 𝜙(𝑗) + 𝜆𝑗−1 𝜙(𝑗 − 1) = 0

Since (𝜆𝑘) is non-increasing, it follows that

𝜙(𝑗) = (𝜆𝑗−1 / 𝜆𝑗) 𝜙(𝑗 − 1) ≥ 𝜙(𝑗 − 1) for any 𝑗 ≥ 1

It follows that 𝜙 is non-decreasing on Z+ .


But 𝒟 contains no non-decreasing functions when the state space is infinite. (Why?)
Contradiction.

Solution to Exercise 8.3


Let (𝑃𝑡 ) be an irreducible UC Markov semigroup and let 𝑆 be finite.
Pick any positive constants 𝑀, 𝜖 and set 𝑣 ≡ 𝑀 (the constant function) and 𝐹 = 𝑆.
Since the rows of an intensity matrix sum to zero, we then have, for all 𝑥 ∈ 𝑆,

∑𝑦 𝑄(𝑥, 𝑦)𝑣(𝑦) = 𝑀 ∑𝑦 𝑄(𝑥, 𝑦) = 0 ≤ 𝑀

As 𝐹 = 𝑆, the second case of the drift condition is vacuous, so the conditions of Theorem 8.4 hold and (𝑃𝑡) is asymptotically stable.



CHAPTER

NINE

BIBLIOGRAPHY

9.1 References


[App19] David Applebaum. Semigroups of Linear Operators. Volume 93. Cambridge University Press, 2019.
[Bob05] Adam Bobrowski. Functional Analysis for Probability and Stochastic Processes: An Introduction. Cambridge University Press, 2005.
[How17] Douglas C. Howard. Elements of Stochastic Processes: A Computational Approach. FE Press, 2017.
[LM94] Andrzej Lasota and Michael C. Mackey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Volume 97. Springer Science & Business Media, 1994.
[LG16] Jean-François Le Gall. Brownian Motion, Martingales, and Stochastic Calculus. Volume 274. Springer, 2016.
[Lig10] Thomas Milton Liggett. Continuous Time Markov Processes: An Introduction. Volume 113. American Mathematical Society, 2010.
[Nor98] James R. Norris. Markov Chains. Number 2. Cambridge University Press, 1998.
[Par08] Etienne Pardoux. Markov Processes and Applications: Algorithms, Networks, Genome and Finance. Volume 796. John Wiley & Sons, 2008.
[PichorRTKaminska12] Katarzyna Pichór, Ryszard Rudnicki, and Marta Tyran-Kamińska. Stochastic semigroups and their applications to biological models. Demonstratio Mathematica, 45(2):463–494, 2012.
[SK11] Prasanna K. Sahoo and Palaniappan Kannappan. Introduction to Functional Equations. CRC Press, 2011.
[Sta09] John Stachurski. Economic Dynamics: Theory and Computation. MIT Press, 2009.
[Str13] Daniel W. Stroock. An Introduction to Markov Processes. Volume 230. Springer Science & Business Media, 2013.
[Wal12] John B. Walsh. Knowing the Odds: An Introduction to Probability. Volume 139. American Mathematical Society, 2012.
