
Stochastic Modelling and Applied Probability 63
(Formerly: Applications of Mathematics)

Stochastic Mechanics
Random Media
Signal Processing and Image Synthesis
Mathematical Economics and Finance
Stochastic Optimization
Stochastic Control
Stochastic Models in Life Sciences

Edited by B. Rozovskiĭ and G. Grimmett

Advisory Board: D. Dawson, D. Geman, I. Karatzas, F. Kelly, Y. Le Jan,
B. Øksendal, G. Papanicolaou, E. Pardoux
For other titles in this series, go to
http://www.springer.com/series/602
G. George Yin • Chao Zhu

Hybrid Switching Diffusions


Properties and Applications
G. George Yin
Department of Mathematics
Wayne State University
Detroit, MI 48202
USA
[email protected]

Chao Zhu
Department of Mathematical Sciences
University of Wisconsin-Milwaukee
Milwaukee, WI 53201
USA
[email protected]

Managing Editors

Boris Rozovskiĭ
Division of Applied Mathematics
Brown University
182 George St
Providence, RI 02912
USA
[email protected]

Geoffrey Grimmett
Centre for Mathematical Sciences
University of Cambridge
Wilberforce Road
Cambridge CB3 0WB
UK
[email protected]

ISSN 0172-4568
ISBN 978-1-4419-1104-9 e-ISBN 978-1-4419-1105-6
DOI 10.1007/978-1-4419-1105-6
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2009934520

Mathematics Subject Classification (2000): 60J27, 60J60, 93E03, 93E15

© Springer Science+Business Media, LLC 2010


All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY
10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


In memory of my sister Kewen Yin, who taught me algebra, calculus,
physics, and chemistry during the Cultural Revolution, when I was working
in a factory in Beijing and she was over 1000 miles away at a factory in
Lanzhou
George Yin

To my parents Yulan Zhong and Changming Zhu and my wife Lijing Sun,
with love
Chao Zhu
Contents

Preface xi

Conventions xv

Glossary of Symbols xvii

1 Introduction and Motivation 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 What Is a Switching Diffusion . . . . . . . . . . . . . . . . . 4
1.4 Examples of Switching Diffusions . . . . . . . . . . . . . . . 5
1.5 Outline of the Book . . . . . . . . . . . . . . . . . . . . . . 21

Part I: Basic Properties, Recurrence, Ergodicity 25

2 Switching Diffusion 27
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Switching Diffusions . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Weak Continuity . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Feller Property . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.6 Strong Feller Property . . . . . . . . . . . . . . . . . . . . . 52
2.7 Continuous and Smooth Dependence on the Initial Data x . 56
2.8 A Remark Regarding Nonhomogeneous Markov Processes . 65
2.9 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67


3 Recurrence 69
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 Formulation and Preliminaries . . . . . . . . . . . . . . . . 70
3.2.1 Switching Diffusion . . . . . . . . . . . . . . . . . . . 70
3.2.2 Definitions of Recurrence and Positive Recurrence . 72
3.2.3 Preparatory Results . . . . . . . . . . . . . . . . . . 72
3.3 Recurrence and Transience . . . . . . . . . . . . . . . . . . 78
3.3.1 Recurrence . . . . . . . . . . . . . . . . . . . . . . . 78
3.3.2 Transience . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4 Positive and Null Recurrence . . . . . . . . . . . . . . . . . 85
3.4.1 General Criteria for Positive Recurrence . . . . . . . 85
3.4.2 Path Excursions . . . . . . . . . . . . . . . . . . . . 89
3.4.3 Positive Recurrence under Linearization . . . . . . . 89
3.4.4 Null Recurrence . . . . . . . . . . . . . . . . . . . . 93
3.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.6 Proofs of Several Results . . . . . . . . . . . . . . . . . . . . 100
3.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

4 Ergodicity 111
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.2 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.3 Feedback Controls for Weak Stabilization . . . . . . . . . . 119
4.4 Ramifications . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.5 Asymptotic Distribution . . . . . . . . . . . . . . . . . . . . 129
4.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Part II: Numerical Solutions and Approximation 135

5 Numerical Approximation 137


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.3 Numerical Algorithms . . . . . . . . . . . . . . . . . . . . . 139
5.4 Convergence of the Algorithm . . . . . . . . . . . . . . . . . 140
5.4.1 Moment Estimates . . . . . . . . . . . . . . . . . . . 140
5.4.2 Weak Convergence . . . . . . . . . . . . . . . . . . . 144
5.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.6 Discussions and Remarks . . . . . . . . . . . . . . . . . . . 152
5.6.1 Remarks on Rates of Convergence . . . . . . . . . . 153
5.6.2 Remarks on Decreasing Stepsize Algorithms . . . . . 155
5.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

6 Numerical Approximation to Invariant Measures 159


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.2 Tightness of Approximation Sequences . . . . . . . . . . . . 161
6.3 Convergence to Invariant Measures . . . . . . . . . . . . . . 165

6.4 Proof: Convergence of Algorithm . . . . . . . . . . . . . . . 169


6.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

Part III: Stability 181

7 Stability 183
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.2 Formulation and Auxiliary Results . . . . . . . . . . . . . . 184
7.3 p-Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.3.1 Stability . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.3.2 Auxiliary Results . . . . . . . . . . . . . . . . . . . . 193
7.3.3 Necessary and Sufficient Conditions for p-Stability . 201
7.4 Stability and Instability of Linearized Systems . . . . . . . 203
7.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
7.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

8 Stability of Switching ODEs 217


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.2 Formulation and Preliminary Results . . . . . . . . . . . . . 219
8.2.1 Problem Setup . . . . . . . . . . . . . . . . . . . . . 219
8.2.2 Preliminary Results . . . . . . . . . . . . . . . . . . 220
8.3 Stability and Instability: Sufficient Conditions . . . . . . . . 227
8.4 A Sharper Result . . . . . . . . . . . . . . . . . . . . . . . . 231
8.5 Remarks on Liapunov Exponent . . . . . . . . . . . . . . . 236
8.5.1 Stability under General Setup . . . . . . . . . . . . . 236
8.5.2 Invariant Density . . . . . . . . . . . . . . . . . . . . 238
8.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

9 Invariance Principles 251


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.3 Invariance (I): A Sample Path Approach . . . . . . . . . . . 253
9.3.1 Invariant Sets . . . . . . . . . . . . . . . . . . . . . . 254
9.3.2 Linear Systems . . . . . . . . . . . . . . . . . . . . . 263
9.4 Invariance (II): A Measure-Theoretic Approach . . . . . . . 265
9.4.1 ω-Limit Sets and Invariant Sets . . . . . . . . . . . . 269
9.4.2 Switching Diffusions . . . . . . . . . . . . . . . . . . 275
9.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

Part IV: Two-time-scale Modeling and Applications 283

10 Positive Recurrence: Weakly Connected Ergodic Classes 285


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
10.2 Problem Setup and Notation . . . . . . . . . . . . . . . . . 285
10.3 Weakly Connected, Multiergodic-Class Switching Processes 286

10.3.1 Preliminary . . . . . . . . . . . . . . . . . . . . . . . 287


10.3.2 Weakly Connected, Multiple Ergodic Classes . . . . 288
10.3.3 Inclusion of Transient Discrete Events . . . . . . . . 297
10.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

11 Stochastic Volatility Using Regime-Switching Diffusions 301


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
11.2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
11.3 Asymptotic Expansions . . . . . . . . . . . . . . . . . . . . 306
11.3.1 Construction of ϕ0 (S, t, i) and ψ0 (S, τ, i) . . . . . . . 308
11.3.2 Construction of ϕ1 (S, t, i) and ψ1 (S, τ, i) . . . . . . . 309
11.3.3 Construction of ϕk (S, t) and ψk (S, τ ) . . . . . . . . . 313
11.4 Asymptotic Error Bounds . . . . . . . . . . . . . . . . . . . 317
11.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

12 Two-Time-Scale Switching Jump Diffusions 323


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
12.2 Fast-Varying Switching . . . . . . . . . . . . . . . . . . . . . 326
12.2.1 Fast-Varying Markov Chain Model . . . . . . . . . . 326
12.2.2 Limit System . . . . . . . . . . . . . . . . . . . . . . 329
12.3 Fast-Varying Diffusion . . . . . . . . . . . . . . . . . . . . . 339
12.4 Discussion and Remarks . . . . . . . . . . . . . . . . . . . . 348
12.5 Remarks on Numerical Solutions for Switching Jump Diffu-
sions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
12.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

A Appendix 355
A.1 Discrete-Time Markov Chains . . . . . . . . . . . . . . . . . 355
A.2 Continuous-Time Markov Chains . . . . . . . . . . . . . . . 358
A.3 Fredholm Alternative and Ramification . . . . . . . . . . . 362
A.4 Martingales, Gaussian Processes, and Diffusions . . . . . . . 366
A.4.1 Martingales . . . . . . . . . . . . . . . . . . . . . . . 366
A.4.2 Gaussian Processes and Diffusion Processes . . . . . 369
A.5 Weak Convergence . . . . . . . . . . . . . . . . . . . . . . . 371
A.6 Hybrid Jump Diffusion . . . . . . . . . . . . . . . . . . . . . 376
A.7 Miscellany . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
A.8 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

References 379

Index 392
Preface

This book encompasses the study of hybrid switching diffusion processes
and their applications. The word “hybrid” signifies the coexistence of con-
tinuous dynamics and discrete events, which is one of the distinct features
of the processes under consideration. Much of the book is concerned with
the interactions of the continuous dynamics and the discrete events. Our
motivations for studying such processes originate from emerging and ex-
isting applications in wireless communications, signal processing, queueing
networks, production planning, biological systems, ecosystems, financial
engineering, and modeling, analysis, and control and optimization of large-
scale systems, under the influence of random environments.
Displaying mixture distributions, switching diffusions may be described
by the associated operators or by systems of stochastic differential equa-
tions together with the probability transition laws of the switching actions.
We either have Markov-modulated switching diffusions or processes with
continuous state-dependent switching. The latter turns out to be much
more challenging to deal with. Viewed as a number of diffusions joined
together by the switching process, hybrid diffusions may seem not much
different from their diffusion counterparts. Nevertheless, the
underlying problems become more difficult to handle, especially when the
switching processes depend on continuous states. The difficulty is due to
the interaction of the discrete and continuous processes and the tangled
and hybrid information pattern.
A salient feature of the book is that the discrete event process is al-
lowed to depend on the continuous dynamics. Apart from the existence
and uniqueness of solutions, we treat a number of basic properties such as


regularity, the Feller property, the strong Feller property, and continuous
and smooth dependence on initial data of the solutions of the associated
stochastic differential equations with switching.
A large part of this work is concerned with the stability of switching
diffusion processes. Here stability is meant in the broad sense including
both weak and strong stability. That is, we focus on both “neighborhoods
of infinity” and neighborhoods of equilibrium points; the stability
corresponding to the former is referred to as weak stability, whereas that
of the latter is stability in the usual sense. In studying deterministic dy-
namic systems, researchers used Lagrange stability to depict systems that
are ultimately uniformly bounded. Treating stochastic systems, one would
still hope to adopt such a notion. Unfortunately, the boundedness excludes
many important cases. Thus, we replace this boundedness by a weaker
notion known as recurrence (returning to a prescribed compact region in
finite time). When the expected returning time is finite, we have so-called
positive recurrence. A crucial question is: Under what conditions will the
systems be recurrent (resp. positive recurrent)? We demonstrate that pos-
itive recurrence implies ergodicity and provide criteria for the existence
of invariant distributions together with their representation. In addition
to studying asymptotic properties of the systems in the neighborhood of
infinity, we examine the behavior of the systems at the equilibria. Also con-
sidered are invariance principles and stability of differential equations with
random switching but without diffusions.
Because the systems are rarely solvable in closed form, numerical meth-
ods become a viable alternative. We construct algorithms to approximate
solutions of such systems with state-dependent switching, and provide suf-
ficient conditions for convergence for numerical approximations to the in-
variant measures.
In real-world applications, the hybrid systems encountered are often large-
scale and complex, leading to intensive computational requirements. Reduc-
tion of computational complexity is thus an important issue. To take this
into consideration, we consider time-scale separation in hybrid switching-
diffusion and hybrid-jump-diffusion models. Several issues including recur-
rence of processes with switching having multiple-weak-connected ergodic
classes, two-time-scale modeling of stochastic volatility, and weak conver-
gence analysis of systems with fast and slow motions involving additional
Poisson jumps are studied in detail.
This book is written for applied mathematicians, probabilists, systems
engineers, control scientists, operations researchers, and financial analysts
among others. The results presented in the book are useful to researchers
working in stochastic modeling, systems theory, and applications in which
continuous dynamics and discrete events are intertwined. The book can
serve as a reference for researchers and practitioners in the aforementioned
areas. Selected materials from the book may also be used in a graduate-level
course on stochastic processes and applications.

This book project could not have been completed without the help and
encouragement of many people. We are deeply indebted to Wendell Flem-
ing and Harold Kushner, who introduced the wonderful world of stochastic
systems to us. We have been privileged to have the opportunity to work
with Rafail Khasminskii on a number of research projects, from whom we
have learned a great deal about Markov processes, diffusion processes, and
stochastic stability. We express our special appreciation to Vikram Krish-
namurthy, Ruihua Liu, Yuanjin Liu, Xuerong Mao, John Moore, Qingshuo
Song, Chenggui Yuan, Hanqin Zhang, Qing Zhang, and Xunyu Zhou, who
have worked with us on various projects related to switching diffusions. We
also thank Eric Key, Jose Luis Menaldi, Richard Stockbridge, and Fubao Xi
for many useful discussions. This book, its contents, its presentation, and
exposition have benefited a great deal from the comments by Ruihua Liu,
Chenggui Yuan, Hanqin Zhang, and by several anonymous reviewers, who
read early versions of the drafts and offered many insightful comments. We
thank the series editor, Boris Rozovsky, for his encouragement and con-
sideration. Our thanks also go to Springer senior editor, Achi Dosanjh, for
her assistance and help, and to the production manager, and the Springer
professionals for their work in finalizing the book. During the years of our
study, the research was supported in part by the National Science Founda-
tion, and the National Security Agency. Their continuing support is greatly
appreciated.

Detroit, Michigan George Yin


Milwaukee, Wisconsin Chao Zhu
Conventions

We clarify the numbering system and cross-reference conventions used
throughout the book. Equations are numbered consecutively within a chap-
ter. For example, (2.10) indicates the tenth equation in Chapter 2. Corol-
laries, definitions, examples, lemmas, propositions, remarks, and theorems
are numbered sequentially throughout each chapter. For example, Defini-
tion 3.1, Theorem 3.2, Corollary 3.3 and so on. Assumptions are marked
consecutively within a chapter. For cross reference, an equation is identified
by the chapter number and the equation number; similar conventions are
used for theorems, remarks, assumptions, and so on.
Throughout the book, we assume that all deterministic processes are
Borel measurable and all stochastic processes are measurable with respect
to a given filtration. A subscript generally denotes either a finite or an
infinite sequence. However, the ε-dependence of a sequence is designated
in the superscript. To facilitate reading, we provide a glossary of symbols
used in the subsequent chapters.

Glossary of Symbols

A′ transpose of A (either a matrix or a vector)


Cov(ξ) covariance of a random variable ξ
C space of complex numbers
C(D) space of real-valued continuous functions defined on D
Cbk (D) space of real-valued functions with bounded and
continuous derivatives up to the order k
Cbk Cbk (D) with D = Rr
C0k C k -functions with compact support
Dc complement of a set D
D̄ closure of a set D, D̄ = D ∪ ∂D
∂D boundary of a set D
D([0, T ]; S) space of S-valued functions
being right continuous and having left-hand limits
D ⊂⊂ E D ⊂ D̄ ⊂ E and D̄ is compact
Ex,i expectation with X(0) = x and α(0) = i
Eξ expectation of a random variable ξ
F σ-algebra
{Ft } filtration {Ft , t ≥ 0}
I identity matrix of suitable dimension
IA indicator function of a set A
K generic positive constant with
convention K + K = K and KK = K
M state space of switching process α(t)
N (x) neighborhood of x centered at the origin
O(y) function of y such that sup_y |O(y)|/|y| < ∞


Px,i probability with X(0) = x and α(0) = i


P probability
P(ξ ∈ ·) distribution of a random variable ξ
Qf(·)(i) = Σ_{j≠i} q_ij (f(j) − f(i)), where Q = (q_ij)
Q(x) x-dependent generator of the switching process
R space of real numbers
Rr r-dimensional real Euclidean space
S(r) or Sr ball centered at the origin with radius r
X x,i (t) X(t) with initial data X(0) = x and α(0) = i

a+ = max{a, 0} for a real number a


a− = max{−a, 0} for a real number a
a.s. almost surely
ha, bi inner product of vectors a and b
a1 ∧ · · · ∧ al = min{a1, . . . , al} for ai ∈ R, i = 1, . . . , l
a1 ∨ · · · ∨ al = max{a1, . . . , al} for ai ∈ R, i = 1, . . . , l
diag(A1 , . . . , Al ) diagonal matrix of blocks A1 , . . . , Al

exp(Q) e^Q for a matrix Q


fx or ∇x f gradient of f with respect to x
fxx or ∇2x f Hessian of f with respect to x
i.i.d. independent and identically distributed
ln x or log x natural logarithm of x
m(·) Lebesgue measure on R
o(y) a function of y such that lim_{y→0} o(y)/|y| = 0
p(dt, dz) Poisson random measure with intensity dt × m(dz)
tr(A) trace of matrix A
w.p.1 with probability one
⌊x⌋ integer part of x
(Ω, F, P) probability space
(Ω, F, {Ft }, P) filtered probability space
α(t) continuous-time process with right-continuous
sample paths
δ ij = 1 if i = j; = 0 otherwise
ε positive small parameter
λmax (A) maximal eigenvalue of a symmetric matrix A
λmin (A) minimal eigenvalue of a symmetric matrix A
1l a column vector with all entries being 1
:= or =def defined to be equal to
2 end of a proof
|·| norm of an Euclidean space or a function space
|| · || essential sup-norm
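As a concrete illustration of the operator Q listed above, the following sketch evaluates Qf(·)(i) = Σ_{j≠i} q_ij (f(j) − f(i)) in Python. The three-state generator matrix and the test function are made up for illustration; they do not come from the book.

```python
import numpy as np

# Hypothetical generator of a 3-state switching process (rows sum to zero).
Q = np.array([
    [-2.0,  1.0,  1.0],
    [ 0.5, -1.5,  1.0],
    [ 1.0,  2.0, -3.0],
])

def Qf(f, i):
    """Evaluate Qf(.)(i) = sum over j != i of q_ij * (f(j) - f(i))."""
    return sum(Q[i, j] * (f(j) - f(i)) for j in range(Q.shape[0]) if j != i)

f = lambda j: float(j ** 2)   # a test function on M = {0, 1, 2}
print(Qf(f, 0))               # 1*(1-0) + 1*(4-0) = 5.0
```

Because each row of a generator sums to zero, Qf(·)(i) also equals Σ_j q_ij f(j), which gives a quick consistency check on such an implementation.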
1 Introduction and Motivation

1.1 Introduction
This book focuses on switching diffusion processes involving both con-
tinuous dynamics and discrete events. Before proceeding to the detailed
study, we address the following questions. Why should we study such hy-
brid systems? What are typical examples arising from applications? What
are the main properties we wish to study? This introductory chapter pro-
vides the motivation for our study, delineates switching diffusions in a simple
way, presents a number of application examples, and gives an outline of the
entire book.

1.2 Motivation
Owing to their wide range of applications, hybrid switching diffusions, also
known as switching diffusion systems, have drawn growing attention in
recent years, especially in the fields of control and
optimization. Because of the presence of both continuous dynamics and
discrete events, such systems are capable of describing complex systems
and their inherent uncertainty and randomness in the environment. The
formulation provides more opportunity for realistic models, but adds more
difficulties in analyzing the underlying systems. Resurgent efforts have been
devoted to learning more about the processes and their properties. Much
of the study originated from applications arising in control engineering,
manufacturing systems, estimation and filtering, two-time-scale systems,
and financial engineering; see [74, 78, 123, 146, 147, 154, 167, 170, 184],
among others. In these applications, random-switching processes are used
to model demand rate or machine capacity in production planning, to de-
scribe volatility changes over time and to capture discrete shifts such as
market trends and interest rates in finance and insurance, and
to model time-varying parameters for network problems. In reference to
recent developments, this book emphasizes the development of basic prop-
erties of switching diffusion processes.
The coexistence of continuous dynamics and discrete events and their
interactions reflect the salient features of the systems under consideration.
Such systems have become increasingly important for formulation, analysis,
and optimization in many applications. Perhaps, one of the reasons is that
many real-world applications in the new era require sophisticated models,
in which the traditional dynamic system setup using continuous dynam-
ics given by differential equations alone is inadequate. In addition, there
have been increasing demands for modeling large-scale and complex sys-
tems, designing optimal controls, and conducting optimization tasks. In the
traditional setting, the design of a feedback controller is based on a plant
having fixed parameters, which is inadequate when the actual system differs
from the assumed nominal model. As a consequence, the controller is not
able to serve its purposes and cannot attenuate disturbances or perturba-
tions. Much effort has been directed to the design of more “robust” controls
in recent years. Studies of hybrid systems with Markov regime switching
contribute significantly to this end. Various regime-switching models have
been proposed and examined. The so-called jump linear systems, widely
used in engineering, have been studied in Mariton [123]; controllability and
stabilizability of such systems are treated in Ji and Chizeck [78]. Estima-
tion problems are considered in Sworder and Boyd [154]. Manufacturing
and production planning under the framework of hierarchical structure are
covered in Sethi and Zhang [146], and Sethi, Zhang, and Zhang [147]. Fi-
nancial engineering applications and the use of hybrid geometric Brownian
motion models can be found in [170, 184], among others. To reduce the
complexity, effort has been made to deal with hybrid systems with regime
switching by means of time-scale separation in Yin and Zhang [176]; see
also [167, 185]. Hybrid systems have received increasing attention in recent
years. Other than the switching diffusion systems mentioned above, one
may find the formulation of a somewhat different setup in Bensoussan and
Menaldi [12]. An introductory text on stochastic control for jump diffusions
is in Hanson [64], whereas a treatment of stochastic hybrid systems with
applications to communication networks is in Hespanha [67]. For references
on stochastic control and controlled Markov processes, we refer the reader
to Fleming and Rishel [44], Fleming and Soner [45], Krylov [96], and Yong
and Zhou [181].
A special feature of switching diffusion processes is: In these systems,
continuous dynamics and discrete events are intertwined. For example, one
of the early efforts of using such hybrid models for financial applications
can be traced back to [5], in which both the appreciation rate and the
volatility rate of a stock depend on a continuous-time Markov chain. In the
simplest case, a stock market may be considered to have two “modes” or
“regimes,” up and down, resulting from the state of the underlying econ-
omy, the general mood of investors in the market, and so on. The rationale
is that in the different modes or regimes, the volatility and return rates
are very different. The introduction of hybrid models makes it possible to
describe stochastic volatility in a relatively simple manner (simpler than
the so-called stochastic volatility models). Another example is concerned
with a wireless communication network. Consider the performance analysis
of an adaptive linear multiuser detector in a cellular direct-sequence code-
division multiple-access wireless network with changing user activity due
to an admission or access controller at the base station. Under certain con-
ditions, an associated optimization problem leads to a switching diffusion
limit; see [168]. To summarize, a centerpiece in the applications mentioned
above is a two-component Markov process: a continuous component and a
discrete-event component.
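The two-regime stock model mentioned above can be sketched in a few lines of code. The rates, the generator, and the Euler discretization below are illustrative assumptions, not parameters from the text; the switching here is Markov modulated (independent of the stock price).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-regime ("up"/"down") parameters -- not from the book.
mu    = {0: 0.15, 1: -0.05}        # appreciation rate per regime
sigma = {0: 0.20, 1:  0.40}        # volatility per regime
Q     = np.array([[-0.5,  0.5],
                  [ 1.0, -1.0]])   # generator of the regime process

def simulate(S0=100.0, alpha0=0, T=1.0, n=1000):
    """Euler scheme for dS = mu(alpha) S dt + sigma(alpha) S dW."""
    dt = T / n
    S, alpha = S0, alpha0
    for _ in range(n):
        # Diffusion step using the current regime's coefficients.
        S += mu[alpha] * S * dt + sigma[alpha] * S * np.sqrt(dt) * rng.standard_normal()
        # Regime switch with probability approximately -q_{alpha,alpha} * dt.
        if rng.random() < -Q[alpha, alpha] * dt:
            alpha = 1 - alpha
    return S, alpha

S_T, alpha_T = simulate()
```

Averaging `S_T` over many runs would show the mixture behavior described above: the terminal distribution blends the two regimes' dynamics.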
Long-time behavior of such hybrid systems is one of the major concerns
in stochastic processes, systems theory, control, and optimization. Nowa-
days, there is a fairly well-known theory for stability of diffusion processes.
The basic setup of the stability study and the foundational work
originated in the work of Khasminskii [83] and Kushner [97]. The origin
of the study may be traced back to Kac and Krasovskii [79] for differen-
tial equations perturbed by Markov chains. Comprehensive treatment of
Markov processes is in Dynkin [38], whereas that of diffusion processes
may be found in Gihman and Skorohod [55], Liptser and Shiryayev [110],
and Stroock and Varadhan [153] and references therein. Concerning jump
processes and Markov chains, we mention the work of Chung [28], Cox
and Miller [29], and Doob [33]. Davis’s piecewise deterministic viewpoint
adds a new twist to Markov models. Ethier and Kurtz [43] examine weak
convergence of Markov processes and provide an in-depth study on char-
acterization of the limit process. Chen’s book [23] presents an approach
from the angle of coupling methods. Recently, stability of diffusion pro-
cesses with Markovian switching has received much attention (see, e.g.,
[6, 7, 92, 116, 119, 183, 187, 190] and the references therein). Some of
the recent developments in Markov modulated switching diffusions (where
the Markov chain and the Brownian motion are independent) are found
in Mao and Yuan [120]. Although these systems more realistically address
the demands of the many applications, the nontraditional setup makes the
analysis of switching diffusions more difficult. To be
able to effectively treat these systems, it is of foremost importance to have
a thorough understanding of the underpinning of the systems. This is the
main theme of the current work.

1.3 What Is a Switching Diffusion


Let us begin by answering the question: What is a switching diffusion?
In this book, we consider a switching diffusion model, in which the discrete
events are modeled by a finite-state process. The following diagram provides
a visualization of a typical example. Consider a switching diffusion as given
in Figure 1.1 that consists of three diffusions sitting on three parallel planes.
The discrete event is a three-state jump process. We denote the pair of
processes by

(continuous process, discrete event) = (X(t), α(t)).

Suppose that initially, the process is at (X(0), α(0)) = (x, 1). The discrete
event process sojourns in discrete state 1 for a random duration; during this
period, the continuous component evolves according to the diffusion process
specified by the drift and diffusion coefficients associated with discrete state
1 until a jump of the discrete component takes place. At a random moment τ1, a jump to discrete state 3 occurs. Then the continuous component evolves according to the diffusion process whose drift and diffusion coefficients are determined by discrete event 3. The process wanders around in the third plane until another random jump time τ2. At τ2, the system switches to the second parallel plane and follows another diffusion with different drift and diffusion coefficients, and so on.

FIGURE 1.1. A “sample path” of the regime-switching diffusion (X(t), α(t)): starting from X(0) = x with α(0) = 1, the continuous component moves among the planes for discrete-event states 1, 2, and 3 at the jump times τ1, τ2, τ3, τ4.

At first glance, one may feel that the process is not much different from
a diffusion because the switching is a finite-state process. Nevertheless,
even if the switching is a finite-state Markov chain independent of the
Brownian motion (subsequently referred to as Markov-modulated switching
diffusions), the switching diffusion is still much harder to handle. The main
reason is that the coupling and interactions due to the switching process make
the analysis much more difficult. For example, when we study recurrence
or stability, we have to deal with systems of coupled partial differential
equations.
One of the main features of this book is that a large part of it deals with
the switching component depending on the continuous component. This is
often referred to as a state-dependent switching process in what follows;
more precise statements about which will be made later. In this case, the
analysis is much more difficult than that of the case in which the switching
is independent of the continuous states. As shown in Chapter 2, some of the
time-honored characteristics such as continuous dependence of initial data
become highly nontrivial; even Feller properties are not straightforward to
obtain.
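The sample-path description above translates directly into a simulation scheme: discretize time, advance the continuous component by an Euler–Maruyama step using the coefficients of the current regime, and switch regimes during a step of length dt with probability q(x)dt + o(dt). The following sketch illustrates the idea for a state-dependent switching process; the two-regime drift, diffusion, and switching-intensity choices are illustrative and are not taken from the text.

```python
import numpy as np

def simulate_switching_diffusion(T=1.0, dt=1e-3, x0=1.0, a0=0, seed=0):
    """Euler-Maruyama sketch of a two-regime switching diffusion whose
    switching intensity depends on the continuous state x.  All
    coefficients below are illustrative choices."""
    rng = np.random.default_rng(seed)
    drift = {0: lambda x: -0.5 * x, 1: lambda x: 0.3 * x}
    diffusion = {0: lambda x: 0.2, 1: lambda x: 0.6}
    rate = lambda x: 2.0 + np.cos(x) ** 2   # q(x): rate of leaving the regime
    x, a = x0, a0
    xs, regimes = [x], [a]
    for _ in range(int(round(T / dt))):
        dw = np.sqrt(dt) * rng.standard_normal()
        x = x + drift[a](x) * dt + diffusion[a](x) * dw
        if rng.random() < rate(x) * dt:     # switch with prob q(x)dt + o(dt)
            a = 1 - a
        xs.append(x)
        regimes.append(a)
    return np.array(xs), np.array(regimes)

xs, regimes = simulate_switching_diffusion()
```

Because the switching intensity is evaluated at the current value of x, the discrete and continuous components interact in both directions, which is precisely what makes the state-dependent case harder than the Markov-modulated one.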

1.4 Examples of Switching Diffusions
To demonstrate the utility of the switching diffusion models, this section
provides a number of examples. Some of these examples are revisited in
later chapters.
Example 1.1. (Lotka–Volterra model) In ecological population modeling, the well-known Lotka–Volterra models, proposed by Lotka and Volterra, have been investigated extensively in the literature. When two or more species live in proximity and share the same basic requirements, they usually compete for resources, food, habitat, or territory. Both deterministic and stochastic versions of the Lotka–Volterra systems have been studied in depth.
It has been noted that the growth rates and the carrying capacities are
often subject to environmental noise. Moreover, the qualitative changes of
the growth rates and the carrying capacities form an essential aspect of
the dynamics of the ecosystem. These changes usually cannot be described
by the traditional (deterministic or stochastic) Lotka–Volterra models. For
instance, the growth rates of some species in the rainy season are much different from those in the dry season. Moreover, the carrying capacities often vary according to changes in nutrition and food resources. Similarly, the interspecific or intraspecific interactions differ across environments.
It is natural to consider a stochastic Lotka–Volterra ecosystem in a ran-
dom environment that can be formulated by use of an additional factor
process. Consider the following stochastic differential equation with regime
switching:

    dx(t) = diag(x1(t), . . . , xn(t))
              × [(b(α(t)) − A(α(t))x(t))dt + Σ(α(t)) ◦ dw(t)],   (1.1)

or equivalently, in componentwise form,

    dxi(t) = xi(t){[bi(α(t)) − Σ_{j=1}^n aij(α(t))xj(t)]dt
              + σi(α(t)) ◦ dwi(t)},   i = 1, . . . , n,   (1.2)

where w(·) = (w1(·), . . . , wn(·))′ is an n-dimensional standard Brownian motion, and for α ∈ M, b(α) = (b1(α), . . . , bn(α))′, A(α) = (aij(α)), and Σ(α) = diag(σ1(α), . . . , σn(α)) represent the growth rates, community matrices, and noise intensities in the different external environments, respectively, and z′ denotes the transpose of z. Assume that bi(α) > 0 for each α ∈ M and each i = 1, . . . , n, and that the Markov chain α(·) and the Brownian motion w(·) are independent. Without loss of generality, we also assume that the initial conditions x(0) and α(0) are nonrandom. The above formulation is in Stratonovich form. As explained in [84], this form is more suitable for environmental modeling. It is well known that (1.2) is equivalent to the following stochastic differential equation in the Itô sense:

    dxi(t) = xi(t){[ri(α(t)) − Σ_{j=1}^n aij(α(t))xj(t)]dt
              + σi(α(t))dwi(t)},   i = 1, 2, . . . , n,   (1.3)

where ri(α(t)) := bi(α(t)) + (1/2)σi²(α(t)) for each i = 1, 2, . . . , n.
Regime-switching stochastic Lotka–Volterra models have received much
attention lately. For instance, the study of trajectory behavior of Lotka–
Volterra competition bistable systems and systems with telegraph noises
was considered in [37], stochastic population dynamics under regime switch-
ing was treated in [113], the dynamics of a population in a Markovian en-
vironment were studied in [151], and the evolution of a system composed
of two predator-prey deterministic systems described by Lotka–Volterra
equations in a random environment was investigated in [156]. In the ab-
sence of regime switching, the system is completely modeled by stochastic
time evolution in a fixed environment. The results in a fixed environment
correspond to (1.2) or (1.3) in the case when the Markov chain has only one
state or the Markov chain always stays in the fixed state (environment).
When random environments are considered, the system’s qualitative be-
havior can be drastically different. In a recent paper of Zhu and Yin [189],
we considered several asymptotic properties of the models given above.
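A minimal simulation of the Itô form (1.3) can be sketched as follows; the growth rates, community matrices, noise intensities, and switching rates are illustrative choices, and, as assumed above, the Markov chain is generated independently of the Brownian motion.

```python
import numpy as np

def lotka_volterra_switching(T=5.0, dt=1e-3, seed=1):
    """Euler sketch of the Ito form (1.3) with n = 2 species and a
    two-state Markov chain; all parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    r = {1: np.array([1.0, 0.8]), 2: np.array([0.4, 0.3])}      # growth rates
    A = {1: np.array([[1.0, 0.5], [0.4, 1.0]]),
         2: np.array([[0.8, 0.3], [0.2, 0.9]])}                 # community matrices
    sigma = {1: np.array([0.1, 0.1]), 2: np.array([0.3, 0.2])}  # noise intensities
    q = {1: 0.5, 2: 1.0}   # rate of leaving each regime (independent of w)
    x, alpha = np.array([0.5, 0.5]), 1
    n = int(round(T / dt))
    path = np.empty((n + 1, 2))
    path[0] = x
    for k in range(n):
        dw = np.sqrt(dt) * rng.standard_normal(2)
        x = x + x * ((r[alpha] - A[alpha] @ x) * dt + sigma[alpha] * dw)
        x = np.maximum(x, 1e-12)   # guard against Euler overshoot below zero
        if rng.random() < q[alpha] * dt:
            alpha = 3 - alpha      # switch between environments 1 and 2
        path[k + 1] = x
    return path

path = lotka_volterra_switching()
```

Running the same scheme with the chain frozen in one state recovers the fixed-environment model mentioned above, which makes the effect of the regime switching on the trajectories easy to see.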

Example 1.2. Here we consider the problem of balanced realizations, a
concept that started gaining popularity in the fields of systems and con-
trol in the early 1980s, and still attracts much attention. Bal-
anced realizations have been studied for finite-dimensional linear systems
for nearly three decades. The original problem is concerned with linear
deterministic systems. In the 1980s, to bridge the gap between minimal realization theory and the problem of finding lower-order approxi-
mations, a particular realization was first introduced in the seminal paper
by Moore [129], where the problem was studied with principal component
analysis of linear systems. The term “balanced” was used because the real-
izations have a certain symmetry between the input and the output maps
characterized by the controllability and observability Grammians. Owing
to their importance and their wide range of applications, balanced realiza-
tions have attracted much attention. One of the application areas is model
reduction. Asymptotic stability of the reduced-order systems was studied
in [129], and error bounds between the reduced-order model and the origi-
nal system were obtained in [58] in terms of the associated singular values.
There have been substantial extensions of the theory to time-varying linear
systems. Key existence results concerning balanced realizations were con-
tained in [148] and [157]. Subsequent work can be found in [66, 75, 133, 144]
and references therein.
We generalize the ideas to include random switching to represent random
perturbations under a stochastic environment. We suppose that for each in-
stant t, the state consists of a pair (X(t), α(t)) representing the continuous
state component and discrete event component, respectively. Let α(t) be a
continuous-time Markov chain with finite state space M = {1, 2, . . . , m0 }
and generator

    Q = (qij) ∈ R^{m0×m0} such that qij ≥ 0 for i ≠ j, and Σ_{j=1}^{m0} qij = 0.   (1.4)

Our proposed set of equations for balanced realizations includes coupling
terms due to the Markov switching of the model. In the trivial case of no
switching, the equations are decoupled and are standard balanced realiza-
tion equations. We loosely carry the term “balanced realization” over to our
more general setting. As in the classical theory for the case of no Markovian
switching, the condition for balanced realization is represented by a system
of algebraic equations. Our system of equations is a set of Riccati equations
but with additional couplings between them owing to the discrete events
not evident in the classical case. The underlying systems are modulated by
a continuous-time Markov chain with state space M. The matrix coeffi-
cients of the linear systems depend on the states of the Markov chain. At
any given instant t, the Markov chain takes one of the values (e.g., i) in
M. Then the system dynamics are determined by the matrix coefficients
associated with state i. After a random time, the Markov chain switches to
a new state j ≠ i and stays there until the next jump. The system dynam-
ics are then determined by the matrix coefficients associated with j. Such
random jump linear systems arise frequently in applications where regime
changes are utilized to model the random environment. Assume that for
each i ∈ M, A(i, ·), B(i, ·), and C(i, ·) are bounded and continuously dif-
ferentiable matrix-valued functions with suitable dimensions. Consider the
following system:

    (d/dt)X(t) = A(α(t), t)X(t) + B(α(t), t)u(t),   X(t0) = x0,
    y(t) = C(α(t), t)X(t),   (1.5)

where the state X(t) ∈ R^{n×1}, the input u(t) ∈ R^{r×1}, and the output y(t) ∈ R^{m0}. Note that because a finite-state Markov chain is used, (1.5) can effectively be written as

    (d/dt)X(t) = Σ_{i=1}^{m0} A(i, t)X(t)I_{α(t)=i} + Σ_{i=1}^{m0} B(i, t)u(t)I_{α(t)=i},   X(t0) = x0,
    y(t) = Σ_{i=1}^{m0} C(i, t)X(t)I_{α(t)=i},

where I_S is the indicator function of the set S. In what follows, if a square matrix D is positive definite (resp., nonnegative definite), we often write it as D > 0 (resp., D ≥ 0). For D1 ∈ R^{ι×ℓ} for some ι, ℓ ≥ 1, D1′ denotes its transpose. For a suitable function f(t, i), we denote

    Qf(t, ·)(i) = Σ_{j=1}^{m0} qij f(t, j) = Σ_{j≠i} qij(f(t, j) − f(t, i))   for each i ∈ M.

For each t ≥ 0 and i ∈ M, a realization (A(i, t), B(i, t), C(i, t)) is said to
be uniformly completely controllable if and only if there is a δ > 0 such
that for some positive Lc (δ) and Uc (δ),

    ∞ > Uc(δ)I ≥ Gc(t − δ, t, i) ≥ Lc(δ)I > 0,   (1.6)

where Gc(t − δ, t, i) is the controllability Grammian

    Gc(t − δ, t, i) = ∫_{t−δ}^{t} Φ(t, λ, i)B(i, λ)B′(i, λ)Φ′(t, λ, i)dλ,   (1.7)

and Φ(t, λ, i) is the state transition matrix (see [2, p. 349]) of the equation

    dz(t)/dt = A(i, t)z(t).
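For a scalar time-invariant pair the Grammian (1.7) has a closed form, since the state transition matrix is Φ(t, λ) = exp(a(t − λ)); this gives a quick numerical sanity check of the bounds (1.6). The values of a, b, and δ below are illustrative.

```python
import numpy as np

def grammian_scalar(a, b, delta, num=20001):
    """Evaluate the controllability Grammian (1.7) by the trapezoid rule
    for a scalar time-invariant pair (A, B) = (a, b), for which
    Phi(t, lam) = exp(a*(t - lam))."""
    s = np.linspace(0.0, delta, num)                # s = t - lambda
    integrand = np.exp(a * s) * b * b * np.exp(a * s)
    ds = delta / (num - 1)
    return float(np.sum((integrand[1:] + integrand[:-1]) * 0.5) * ds)

# Illustrative values; with b != 0 the two-sided bounds (1.6) hold.
a, b, delta = -1.0, 2.0, 1.5
gc = grammian_scalar(a, b, delta)
exact = b * b * (np.exp(2 * a * delta) - 1.0) / (2 * a)   # closed form
```

In the scalar case uniform complete controllability reduces to the Grammian being bounded above and below away from zero, which the closed form makes explicit.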
For each t ≥ 0 and i ∈ M, a realization (A(i, t), B(i, t), C(i, t)) is said
to be uniformly completely observable if and only if there is a δ > 0 such
that for some positive Lo (δ) and Uo (δ),

    ∞ > Uo(δ)I ≥ Go(t, t + δ, i) ≥ Lo(δ)I > 0,   (1.8)

where Go(t, t + δ, i) is the observability Grammian

    Go(t, t + δ, i) = ∫_{t}^{t+δ} Φ′(λ, t, i)C′(i, λ)C(i, λ)Φ(λ, t, i)dλ.   (1.9)

System (1.5) is said to have a balanced realization if there are nonsingular coordinate transformations T(t, i) with

    P(t, i) := T′(t, i)T(t, i) > 0

such that

    P(t, i)Gc(t, i)P(t, i) = Go(t, i) + QP(t, ·)(i),   i = 1, 2, . . . , m0.   (1.10)

Note that regime switching is one of the main features of such systems compared with the traditional setup (the system is a fully degenerate regime-switching diffusion, with the second-order term missing). To study the
behavior of such systems, it is crucial to have a thorough understanding of
the switching diffusion processes. For modeling and analysis of the balanced
realizations, we refer the reader to [111] for further reading.
Example 1.3. Consider the hybrid linear system

    ẋ(t) = A(α(t))x(t) + B(α(t))u(t),

where α(t) is a continuous-time Markov chain taking values in a finite set M = {1, . . . , m0}, A(i) and B(i) for i ∈ M are matrices with compatible dimensions, and u(·) is the control. In lieu of one linear system, we have
a number of systems coupled through the Markov chain. Such systems
have enjoyed numerous applications in emerging application areas such as
financial engineering and wireless communications, as well as in existing
applications. A class of important problems concerns the asymptotic be-
havior of such systems when they are in operation for a long time. Very
often, in many engineering problems, one is more interested in whether the
system is stable. Much interest lies in finding admissible controls so that
the resulting system will be stabilized. There has been continuing interest
in dealing with hybrid systems under a Markov switching. In [123], stabi-
lization for robust controls of jump linear quadratic (LQ) control problems
was treated. In [78], both controllability and stabilizability of jump linear
LQ systems were considered. In [21], adaptive LQG problems with finite-
state process parameters were treated. Additional difficulties arise when the switching process α(t) cannot be observed directly, but only in white noise. That is, we can observe

dX(t) = [A(α(t))X(t) + B(α(t))u(t)]dt + dw(t).

For such partially observed systems, it is natural to use nonlinear filtering techniques. The associated filter is the well-known Wonham filter [159], which is one of a handful of finite-dimensional filters in existence. Stabilization of linear systems with hidden Markov chains was considered in [22, 36]. In both of these references, averaging criteria were used for the purpose of stabilization: under

    lim sup_{t→∞} E[|X(t)|² + |u(t)|²] < ∞

in [36], whereas stabilization under

    lim sup_{t→∞} (1/t) E ∫₀ᵗ [|X(s)|² + |u(s)|²] ds < ∞

was considered in [22]. A question of considerable practical interest is: Can we design controls so that the resulting system will be stable in the almost sure sense? Using Wonham filters and converting the partially observed system to an equivalent fully observed system, almost sure stabilizing controls were found under the criterion

    lim sup_{t→∞} (1/t) log |X(t)| ≤ 0   almost surely   (1.11)

in [13]. The main idea lies in analyzing the sample path properties using a suitable Liapunov function. We refer the reader to the references given above for further details.
Example 1.4. Let {θn } be a discrete-time Markov chain with finite state
space
    M = {θ1, . . . , θm0}   (1.12)
and transition probability matrix

    P^ε = I + εQ,   (1.13)

where ε > 0 is a small parameter, I is an m0 × m0 identity matrix, and Q = (qij) ∈ R^{m0×m0} is a generator of a continuous-time Markov chain (i.e., Q satisfies qij ≥ 0 for i ≠ j and Σ_{j=1}^{m0} qij = 0 for each i = 1, . . . , m0). For simplicity, suppose that the initial distribution P(θ0 = θi) = p0,i is independent of ε for each i = 1, . . . , m0, where p0,i ≥ 0 and Σ_{i=1}^{m0} p0,i = 1.
Let {Xn } be an S-state conditional Markov chain (conditioned on the
parameter process). The state space of {Xn } is S = {e1 , . . . , eS }, where
ei , for i = 1, . . . , S, denotes the ith standard unit vector with the ith
component being 1 and the rest of the components being 0. For each θ ∈ M,
A(θ) = (aij (θ)) ∈ RS×S , the transition probability matrix of Xn , is defined
by

aij (θ) = P(Xn+1 = ej |Xn = ei , θn = θ) = P(X1 = ej |X0 = ei , θ0 = θ),

where i, j ∈ {1, . . . , S}. Assume that for each θ ∈ M, A(θ) is irreducible and aperiodic.
Note that the underlying Markov chain {θn} is in fact ε-dependent; we suppress the ε-dependence for notational simplicity. The small parameter ε in (1.13) ensures that the entries of the transition probability matrix are nonnegative, because pεij = δij + εqij ≥ 0 for ε > 0 small enough, where δij denotes the Kronecker δ satisfying δij = 1 if i = j and δij = 0 otherwise. The use of the generator Q makes each row of the matrix P^ε sum to one. Although the true parameter is time varying, it is piecewise constant. Moreover, owing to the dominating identity matrix in (1.13), {θn} varies slowly in time. The time-varying parameter takes a constant value θi for a random duration and jumps to another state θj with j ≠ i at a random time.
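The construction (1.13) is easy to verify numerically: for ε small enough, I + εQ has nonnegative entries and unit row sums, so it is a proper transition probability matrix. The generator below is an illustrative choice.

```python
import numpy as np

# Illustrative generator Q of a 3-state continuous-time Markov chain:
# nonnegative off-diagonal entries and zero row sums.
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])
eps = 0.05
P = np.eye(3) + eps * Q          # transition matrix (1.13)

row_sums = P.sum(axis=1)
is_stochastic = bool(np.all(P >= 0) and np.allclose(row_sums, 1.0))
# For eps * max_i |q_ii| <= 1 the diagonal of P dominates, which is the
# "slowly varying" feature of the chain {theta_n} described above.
```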
The assumptions of irreducibility and aperiodicity of A(θ) imply that for each θ ∈ M, there exists a unique stationary distribution π(θ) ∈ R^{S×1} satisfying

    π′(θ) = π′(θ)A(θ)   and   π′(θ)1lS = 1,

where 1lℓ ∈ R^{ℓ×1} has all entries equal to 1. We use a stochastic approximation algorithm to track the time-varying distribution π(θn) that depends on the underlying Markov chain θn.
We use the following adaptive algorithm of least mean squares (LMS) type with constant stepsize to construct a sequence of estimates {π̂n} of the time-varying distribution π(θn):

    π̂_{n+1} = π̂_n + µ(X_{n+1} − π̂_n),   (1.14)

where µ denotes the stepsize. Define π̃n = π̂n − Eπ(θn). Then (1.14) can be rewritten as

    π̃_{n+1} = π̃_n − µπ̃_n + µ(X_{n+1} − Eπ(θn)) + E(π(θn) − π(θ_{n+1})).   (1.15)

Note that π̂n, π(θn), and hence π̃n are column vectors (i.e., they take values in R^{S×1}).
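The recursion (1.14) can be sketched in a few lines. To keep the example short, the parameter θ is held fixed (no slow switching), so the estimates should settle near the single stationary distribution π(θ); the 2-state transition matrix A is an illustrative choice.

```python
import numpy as np

def lms_track(n_steps=50000, mu=0.002, seed=2):
    """LMS tracker (1.14): pi_hat_{n+1} = pi_hat_n + mu*(X_{n+1} - pi_hat_n),
    where X_n is a 2-state Markov chain observed as a standard unit vector.
    The modulating parameter theta is held fixed in this sketch."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.1],
                  [0.2, 0.8]])          # irreducible and aperiodic
    # stationary distribution: pi'A = pi'  =>  pi = (2/3, 1/3)
    state = 0
    pi_hat = np.array([0.5, 0.5])
    for _ in range(n_steps):
        state = 0 if rng.random() < A[state, 0] else 1
        x = np.zeros(2)
        x[state] = 1.0                  # observation X_{n+1} as a unit vector
        pi_hat = pi_hat + mu * (x - pi_hat)
    return pi_hat

pi_hat = lms_track()   # near (2/3, 1/3), up to O(sqrt(mu)) fluctuations
```

The residual fluctuation of order √µ around the target is exactly the scaling studied through the sequence {vn} below.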
For 0 < T < ∞, we construct a piecewise constant interpolation of the stochastic approximation iterates π̂n as

    π̂^µ(t) = π̂_n,   t ∈ [µn, µn + µ).   (1.16)

The process π̂^µ(·) so defined is in D([0, T]; R^S), the space of functions defined on [0, T] taking values in R^S that are right continuous and have left limits, endowed with the Skorohod topology.
In one of our recent works [168], using weak convergence methods to carry out the analysis, we have shown that (π̂^µ(·), θ^µ(·)) converges weakly to (π̂(·), θ(·)), which is a solution of the following switching ordinary differential equation:

    (d/dt)π̂(t) = π(θ(t)) − π̂(t),   π̂(0) = π̂0.   (1.17)
The above switching ODE displays very different behavior from the trajectories of systems derived from the classical ODE approach for stochastic approximation. It involves a random element because θ(t) is a continuous-time Markov chain with generator Q. Furthermore, we can show that {(π̂n − Eπ(θn))/√µ} is tight for n ≥ n0 for some positive integer n0. In an effort to determine the rate of variation of the tracking error sequence, we define a scaled sequence of the tracking errors {vn} and its continuous-time interpolation v^µ(·) by

    vn = (π̂n − Eπ(θn))/√µ,   n ≥ n0,   v^µ(t) = vn for t ∈ [nµ, nµ + µ).   (1.18)

We have shown in [168] that (v^µ(·), θ^µ(·)) converges weakly to (v(·), θ(·)) satisfying the switching diffusion equation

    dv(t) = −v(t)dt + Σ^{1/2}(θ(t))dw(t),   (1.19)

where w(·) is a standard Brownian motion and Σ(θ) is the almost sure limit given by

    lim_{µ→0} (1/n) Σ_{k1=m}^{n+m−1} Σ_{k=m}^{n+m−1} (X_{k+1}(θ) − EX_{k+1}(θ))(X_{k1+1}(θ) − EX_{k1+1}(θ))′ = Σ(θ)   a.s.   (1.20)

Note that for each θ, Σ(θ) is an S × S deterministic matrix and, in fact,

    (1/n) Σ_{k1=m}^{n+m−1} Σ_{k=m}^{n+m−1} E{(X_{k+1}(θ) − EX_{k+1}(θ))(X_{k1+1}(θ) − EX_{k1+1}(θ))′} → Σ(θ)   as µ → 0.   (1.21)
Because of the regime switching, the system is qualitatively different from those treated in the existing literature on stochastic approximation methods; see Kushner and Yin [104].

Example 1.5. Suppose that w(·) is a d-dimensional Brownian motion and α(·) is a continuous-time Markov chain generated by Q taking values in M = {1, . . . , m0}. Define Ft = σ{w(s), α(s) : 0 ≤ s ≤ t}, and denote by L²_F(0, T; R^{m0}) the set of all R^{m0}-valued, measurable stochastic processes f(t) adapted to {Ft}_{t≥0} satisfying E ∫₀ᵀ |f(t)|² dt < ∞.
Consider a financial market in which d + 1 assets are traded continuously. One of the assets is a bank account whose price P0(t) is subject to the following stochastic ordinary differential equation with Markovian switching:

    dP0(t) = r(t, α(t))P0(t)dt,   t ∈ [0, T],
    P0(0) = p0 > 0,   (1.22)
where r(t, i) ≥ 0, i = 1, 2, . . . , m0, are given as the interest rate processes
corresponding to different market modes. The other d assets are stocks
whose price processes Pm (t), m = 1, 2, . . . , d, satisfy the following system
of stochastic differential equations with Markov switching:
 ( )
d
X


 dPm (t) = Pm (t) bm (t, α(t))dt + σmn (t, α(t))dwn (t) , t ∈ [0, T ],
 n=1

 P (0) = p > 0,
m m
(1.23)
where for each i = 1, 2, . . . , m0 , bm (t, i) is the appreciation rate process and
σm (t, i) := (σm1 (t, i), . . . , σmd (t, i)) is the volatility or the dispersion rate
process of the mth stock, corresponding to α(t) = i.
Define the volatility matrix, whose mth row is σm(t, i), as

    σ(t, i) := (σ1(t, i)′, . . . , σd(t, i)′)′,   for each i = 1, . . . , m0.   (1.24)

We assume that the following non-degeneracy condition

    σ(t, i)σ′(t, i) ≥ δI,   ∀t ∈ [0, T] and i = 1, 2, . . . , m0,   (1.25)

is satisfied for some δ > 0, and that all the functions r(t, i), bm (t, i),
σmn (t, i) are measurable and uniformly bounded in t. Inequality (1.25)
is in the sense of positive definiteness for symmetric matrices; that is,
σ(t, i)σ′(t, i) − δI is a positive definite matrix.
Suppose that the initial market mode α(0) = i0 . Consider an agent with
an initial wealth x0 > 0. Denote by x(t) the total wealth of the agent at
time t ≥ 0. Assuming that the trading of shares takes place continuously and that there are no transaction costs or consumption, one has (see, e.g., [181, p. 57])

    dx(t) = [r(t, α(t))x(t) + Σ_{m=1}^d (bm(t, α(t)) − r(t, α(t)))um(t)]dt
              + Σ_{n=1}^d Σ_{m=1}^d σmn(t, α(t))um(t)dwn(t),
    x(0) = x0 > 0,   α(0) = i0,   (1.26)
where um (t) is the total market value of the agent’s wealth in the mth asset,
m = 0, 1, . . . , d, at time t. We call u(·) = (u1 (·), . . . , ud (·))0 a portfolio of
the agent. Note that once u(·) is determined, u0(·), the asset in the bank account, is completely specified inasmuch as

    u0(t) = x(t) − Σ_{i=1}^d ui(t).

Thus, we need only consider u(·) and ignore u0(·). Setting

    B(t, i) := (b1(t, i) − r(t, i), . . . , bd(t, i) − r(t, i)),   i = 1, 2, . . . , m0,   (1.27)

we can rewrite the wealth equation (1.26) as

    dx(t) = [r(t, α(t))x(t) + B(t, α(t))u(t)]dt + u′(t)σ(t, α(t))dw(t),
    x(0) = x0,   α(0) = i0.   (1.28)
A portfolio u(·) is said to be admissible if u(·) ∈ L²_F(0, T; R^d) and the stochastic differential equation (1.28) has a unique solution x(·) corresponding to u(·). In this case, we refer to (x(·), u(·)) as an admissible (wealth, portfolio) pair. The agent's objective is to find, among all the admissible portfolios whose expected terminal wealth is Ex(T) = z for some given z ∈ R¹, an admissible portfolio u(·) so that the risk measured by the variance of the terminal wealth,

    Var x(T) ≡ E[x(T) − Ex(T)]² = E[x(T) − z]²,   (1.29)

is minimized. Finding such a portfolio u(·) is referred to as the mean-variance portfolio selection problem. The interested reader can find further details of this problem in [186].
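A sketch of the wealth dynamics (1.28) with a single stock (d = 1) and a constant portfolio follows; all market coefficients, switching rates, and the portfolio itself are illustrative choices, not calibrated values.

```python
import numpy as np

def simulate_wealth(T=1.0, dt=1e-3, x0=1.0, seed=3):
    """Euler sketch of the wealth equation (1.28) with d = 1 stock and a
    constant dollar amount u held in the stock; all values illustrative."""
    rng = np.random.default_rng(seed)
    r = {1: 0.03, 2: 0.06}       # interest rates r(t, i), constant in t here
    b = {1: 0.10, 2: 0.02}       # appreciation rates
    sig = {1: 0.20, 2: 0.40}     # volatilities; non-degeneracy (1.25) holds
    q = {1: 1.0, 2: 2.0}         # rate of leaving each market mode
    u = 0.5
    x, alpha = x0, 1
    for _ in range(int(round(T / dt))):
        B = b[alpha] - r[alpha]  # excess return, cf. (1.27)
        dw = np.sqrt(dt) * rng.standard_normal()
        x = x + (r[alpha] * x + B * u) * dt + sig[alpha] * u * dw
        if rng.random() < q[alpha] * dt:
            alpha = 3 - alpha    # switch between market modes 1 and 2
    return x

xT = simulate_wealth()           # terminal wealth x(T) along one sample path
```

Averaging x(T) and its square over many sample paths gives Monte Carlo estimates of Ex(T) and Var x(T), the two quantities balanced in the mean-variance problem (1.29).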
Example 1.6. This problem arises in the context of insurance and risk
theory. Suppose that there is a finite set M = {1, . . . , m0 }, representing
the possible regimes (configurations) of the environment. At each i ∈ M,
assume that the premium is payable at rate c(i) continuously. Let U(t, i) be the surplus process given the initial surplus u > 0 and initial state i:

    U(t, i) = u + ∫₀ᵗ c(α(s))ds − S(t),

where S(t), as in the classical risk model, is a compound Poisson process and α(t) is a continuous-time Markov chain with state space M representing the random environment. Under suitable conditions, we obtained Lundberg-type upper bounds and nonexponential upper bounds for the ruin probability, and treated the renewal-type system of equations for the ruin probability when the claim sizes were exponentially distributed. To proceed further, we consider a class of jump-diffusions with regime switching to prepare us for treating applications involving more general risk models.
One of the main features of [163] is that there is an additional Markov
chain, which enables the underlying surplus to vary in accordance with dif-
ferent regimes. Consider jump-diffusions modulated by a continuous-time
Markov chain. Because the dynamic systems are complex, it is of fore-
most importance to reduce the complexity. Taking into consideration the
inherent hierarchy in a complex system [149], and different rates of vari-
ations of subsystems and components, we use the two-time-scale method
leading to systems in which the fast and slow rates of change are in sharp
contrast. Then we proceed to reduce the system complexity by aggrega-
tion/decomposition and averaging methods. We demonstrate that under
broad conditions, associated with the original systems, there are limit or
reduced systems, which are averages with respect to certain invariant mea-
sures. Using weak convergence methods [102, 103], we obtain the limit
system via martingale problem formulation.
Motivated by risk theory applications [35], we consider a switching jump
diffusion model. We formulate the problem in a general way, which allows
the consideration of other problems where switching jump diffusions are
involved.
Let Γ ⊂ Rr − {0} so that Γ is the range space of the impulsive jumps,
w(·) be a real-valued standard Brownian motion, and N (·, ·) be a Poisson
measure such that N (t, H) counts the number of impulses on [0, t] with
values in the set H. Let f (·, ·, ·) : [0, T ] × R × M 7→ R, σ(·, ·, ·) : [0, T ] × R ×
M 7→ R, g(·, ·, ·) : Γ × R × M 7→ R, and α(·) be a continuous-time Markov
chain having state space M and generator Q(t). A brief description of the
jump-diffusion process with a modulating Markov chain can be found in
the appendix. Consider the following jump-diffusion processes with regime
switching:

    X(t) = x0 + ∫₀ᵗ f(s, X(s), α(s))ds + ∫₀ᵗ σ(s, X(s), α(s))dw(s)
              + ∫₀ᵗ ∫_Γ g(γ, X(s−), α(s−))N(ds, dγ).   (1.30)

We assume that w(·), N (·), and α(·) are mutually independent. Note that
compared with the traditional jump-diffusion processes, the coefficients in-
volved in (1.30) all depend on an additional switching process, the Markov chain α(t).
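An Euler-type sketch of (1.30) in the risk-theory reading that follows: premiums flow in at rate f ≥ 0, the diffusion perturbs the surplus, claims g ≤ 0 arrive as Poisson jumps, and a two-state chain modulates all coefficients. Every parameter value below is an illustrative choice.

```python
import numpy as np

def switching_jump_diffusion(T=1.0, dt=1e-3, x0=5.0, seed=4):
    """Euler sketch of (1.30): premium inflow f >= 0, diffusion
    perturbation, claims g <= 0 arriving via a Poisson measure, and a
    two-state modulating Markov chain.  All values are illustrative."""
    rng = np.random.default_rng(seed)
    f = {1: 1.0, 2: 0.5}      # premium rates, f >= 0
    sig = {1: 0.2, 2: 0.4}    # diffusion coefficients
    lam = 2.0                 # claim arrival intensity of N
    q = {1: 0.8, 2: 1.2}      # regime switching rates
    x, alpha = x0, 1
    for _ in range(int(round(T / dt))):
        dw = np.sqrt(dt) * rng.standard_normal()
        x = x + f[alpha] * dt + sig[alpha] * dw
        if rng.random() < lam * dt:          # a claim arrives in [t, t + dt)
            x = x - rng.exponential(0.5)     # claim size, so g <= 0
        if rng.random() < q[alpha] * dt:
            alpha = 3 - alpha
    return x

xT = switching_jump_diffusion()   # surplus at time T along one sample path
```

The fraction of sample paths whose surplus dips below zero before T gives a crude Monte Carlo estimate of the finite-horizon ruin probability discussed above.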
In the context of risk theory, X(t) can be considered as the surplus of
the insurance company at time t, x0 is the initial surplus, f (t, X(t), α(t))
represents the premium rate (assumed to be ≥ 0), g(γ, X(t), α(t)) is the
amount of the claim if there is one (assumed to be ≤ 0), and the diffu-
sion is used to model additional uncertainty of the claims and/or premium
incomes. Similar to the volatility in stock market models, σ(·, ·, i) repre-
sents the amount of oscillations or volatility in an appropriate sense. The
model is sufficiently general to cover the traditional as well as the diffusion-
perturbed ruin models. It may also be used to represent security price in
finance (see [124, Chapter 3]). The process α(t) may be viewed as an envi-
ronment variable dictating the regime. The use of the Markov chain results
from consideration of a general trend of the market environment as well as
other economic factors. The economic and/or political environment changes
lead to the changes of surplus regimes resulting in markedly different be-
havior of the system across regimes.
Defining a centered Poisson measure and applying the generalized Itô rule,
we can obtain the generator of the jump-diffusion process with regime
switching, and formulate a related martingale problem. Instead of a sin-
gle process, we have to deal with a collection of jump-diffusion processes
that are modulated by a continuous-time Markov chain. Suppose that λ
is positive such that λ∆ + o(∆) represents the probability of a jump of the process in the interval [t, t + ∆), and π(·) is the distribution of the jump.
Then the generator of the underlying process can be written as

    GF(t, x, ι) = (∂/∂t + L)F(t, x, ι)
                    + λ ∫_Γ [F(t, x + g(γ, x, ι), ι) − F(t, x, ι)]π(dγ)
                    + Q(t)F(t, x, ·)(ι),   for each ι ∈ M,   (1.31)

where

    LF(t, x, ι) = (1/2)σ²(t, x, ι)(∂²/∂x²)F(t, x, ι) + f(t, x, ι)(∂/∂x)F(t, x, ι),

    Q(t)F(t, x, ·)(ι) = Σ_{ℓ=1}^{m0} qιℓ(t)F(t, x, ℓ) = Σ_{ℓ≠ι} qιℓ(t)[F(t, x, ℓ) − F(t, x, ι)].   (1.32)
Of particular interest is the case that the switching process and the diffu-
sion vary at different rates. To reduce the amount of computational effort,
we propose a two-time-scale approach. By concentrating on time-scale sep-
arations, we treat two cases in Chapter 12. In the first one, the regime
switching is significantly faster than the dynamics of the jump diffusions,
whereas in the second case, the diffusion varies an order of magnitude faster
than the other processes. As shown, averaging plays an essential role in
these problems.

Example 1.7. Consider the process Y (·) = (X(·), α(·)) that has two com-
ponents, the diffusion component X(·) and the pure jump component α(·).
The state space of the process is S × M, where S is the unit circle and
M = {1, . . . , m0 } is a state space with finitely many elements for the jump
process. By identifying the endpoints 0 and 1, let x ∈ [0, 1] be the coordinate in S. We assume that the generator of the process (X(t), α(t))
is of the form (1.31) with the jump part missing (i.e., λ = 0) and with
(x, i) ∈ [0, 1] × M.
The probability density p(x, t) = (p(x, t, 1), . . . , p(x, t, m0)) of the process Y(·), with

    ∫_Γ p(x, t, i)dx = P(X(t) ∈ Γ, α(t) = i),

satisfies the adjoint equation, namely the system of forward equations

    ∂p(x, t, i)/∂t = D*p(x, t, i) + Σ_{j=1}^{m0} p(x, t, j)qji(x),

where for a suitable function f(x, t, i),

    D*f(x, t, i) = D*(x, t, i)f(x, t, i)
                 = (1/2)(∂²/∂x²)(ai(x, t)f(x, t, i)) − (∂/∂x)(bi(x, t)f(x, t, i)),
    p(x, 0, i) = gi(x),

for i = 1, . . . , m0, and g(x) = (g1(x), . . . , gm0(x)) is the initial distribution of Y(t). The existence and properties of such switching diffusion processes can be found in [55, Section 2.2]. Suppose that all conditions of [46, Theorem 16, p. 82] are satisfied. Then the system of forward equations has a unique solution.
Suppose that qij(x) > 0 for each i ≠ j. Then the transition density
p(x, t) converges exponentially fast to

ν(x) = (ν1 (x), . . . , νm0 (x)),

the density of the stationary distribution; that is,

|p(x, t, i) − νi (x)| ≤ K exp(−γt),

for some K > 0 and γ > 0. This estimate is the well-known spectral gap condition, which helps us study related asymptotic properties of the system.
Example 1.8. This example is concerned with a switching diffusion pro-
cess with a state-dependent switching component. We compare the trajec-
tories of a linear stochastic system with drift and diffusion coefficients given
by
f (x) = 0.11x and σ(x) = 0.2x (1.33)
and another regime-switching linear system with α(t) ∈ {1, 2}, and the
drift and diffusion coefficients given by

    f(x, 1) = −0.2x,   f(x, 2) = 0.11x,
    σ(x, 1) = x,       σ(x, 2) = 0.2x,   (1.34)
respectively. For the switching diffusion, the switching process depends on the continuous state, with Q(x) given by

    Q(x) = (  −5 cos² x     5 cos² x
              10 cos² x   −10 cos² x  ).

What happens if the actual model is the one with switching, but we mis-
modeled the system so that instead of the switching model, we used a
simple linear stochastic differential equation model with drift and diffusion
coefficients given by (1.33)? If, in fact, the system parameters are really given by (1.34), so that the random environment must be included in the model, then we will see a significant departure of the mismodeled system from the true one. This can be seen from the plots in Figure 1.2.
To get further insight from these plots, imagine that we are encountering
the following scenarios. Suppose that the above example is for modeling the
stock price of a particular equity. Suppose that we modeled the stock price
by using the simple geometric Brownian motion model with return rate
11% and the volatility 0.2 as given in (1.33). However, the market is really
subject to random environment perturbation. For the bull market, the rates
are as before, but corresponding to the bear market, the return rate and
the volatility are given by −20% and 1, respectively. The deviation of the
plot in Figure 1.2 shows that if one uses a simple linear SDE model, one
cannot capture the market behavior.
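A sample path of the regime-switching system in this example can be generated with a naive Euler–Maruyama scheme: between grid points, X is advanced with the coefficients of the current regime, and the regime switches with probability q_α(X)∆t + o(∆t). The sketch below is illustrative only; the step size, horizon, initial data, and seed are hypothetical choices, not values from the text.

```python
import math
import random

# Euler-Maruyama for the regime-switching system (1.34) with the
# state-dependent Q(x) above.  Step size, horizon, initial data, and
# seed are hypothetical illustrative choices.

def simulate_switching(x0=10.0, T=5.0, dt=1e-3, seed=0):
    rng = random.Random(seed)
    f = {1: lambda x: -0.2 * x, 2: lambda x: 0.11 * x}   # drifts, (1.34)
    s = {1: lambda x: x,        2: lambda x: 0.2 * x}    # diffusions, (1.34)
    q = {1: lambda x: 5.0 * math.cos(x) ** 2,            # rate out of regime 1
         2: lambda x: 10.0 * math.cos(x) ** 2}           # rate out of regime 2
    x, a = x0, 2
    for _ in range(int(T / dt)):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += f[a](x) * dt + s[a](x) * dw
        if rng.random() < q[a](x) * dt:   # switch w.p. q_a(x) dt + o(dt)
            a = 3 - a                     # toggle between regimes 1 and 2
    return x, a

x_T, a_T = simulate_switching()
```

Running the same routine with the constant coefficients (1.33) in place of f and s produces a mismodeled path in the spirit of panel (a) of Figure 1.2.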
Example 1.9. This example is concerned with a controlled switching dif-
fusion model. We first give a motivation of the study and begin the discus-
sion on Markov decision processes. Consider a real-valued process X(·) =
{X(t) : t ≥ 0} and a feedback control u(·) = {u(t) = u(X(t)) : t ≥ 0} such
that u(t) ∈ Γ for t ≥ 0 with Γ being a compact subset of a Euclidean
space. Γ denotes the control space. A vast literature is concerned with a
certain optimal control problem. That is, one aims to find a feedback u(·)
so that an appropriate objective function is minimized.
Extending the idea to switching diffusion processes, we consider the fol-
lowing pair of processes (X(t), α(t)) ∈ Rr × M, where M = {1, . . . , m0 }.
Let U be the control space or action space, which is a compact subset of
a Euclidean space. Suppose that b(·, ·, ·) : Rr × M × U 7→ Rr and
σ(·, ·, ·) : Rr × M × U 7→ Rr×r are appropriate functions satisfying cer-
tain regularity conditions, and that Q(x) = (qij(x)) ∈ Rm0×m0 satisfies, for
each x, qij(x) ≥ 0 for i ≠ j and Σ_{j=1}^{m0} qij(x) = 0 for each i ∈ M. For each
i ∈ M and suitable smooth function h(·, i), define an operator

Lh(x, i) = ∇h′(x, i)b(x, i, u) + (1/2)tr[∇²h(x, i)σ(x, i, u)σ′(x, i, u)]
           + Σ_{j=1}^{m0} qij(x)h(x, j).        (1.35)

[Figure 1.2 about here: two panels plotting x(t) against t.]

(a) A sample path of a mismodeled system: A linear stochastic differential equation with constant coefficients.

(b) A sample path of the true system: A regime-switching stochastic system of differential equations.

FIGURE 1.2. Comparisons of sample paths for a linear diffusion with that of a
regime-switching diffusion.

Then we call (X(t), α(t)) a controlled switching diffusion process. Following the notation of Markov decision processes, we may write the process
(X(t), α(t)) as (X(t), α(t)) ∼ L(u). Note that owing to the feedback con-
trols used, it is natural to require the matrix-valued function Q(·) to be
x-dependent. This model is a generalization of the usual Markov decision
processes. When the functions b(x, i, u) ≡ 0 and σ(x, i, u) ≡ 0, the model
reduces to a Markov decision process. When the matrix-valued function
Q(x) ≡ 0, the model reduces to controlled diffusions. Typically, our ob-
jective is to select the control u(·) so that a control objective function is
achieved.

Example 1.10. Originating from statistical mechanics, mean-field models are concerned with many-body systems with interactions. To overcome
the difficulty of interactions due to the many bodies, one of the main ideas
is to replace all interactions to any one body with an average or effective
interaction. This reduces any multibody problem to an effective one-body
problem. Although its main motivation and development are in statisti-
cal mechanics, such models have also enjoyed recent applications in, for
example, graphical models in artificial intelligence.
If the field or particle exhibits many interactions in the original system,
the mean field will be more accurate for such a system. The usefulness,
the potential impact on many practical scenarios, and the challenges from
both physics and mathematics have attracted much attention in recent
years. The work of Dawson [31] presents a detailed study on the cooperative
behavior of mean fields.
Owing to the rapid progress in technology, more complicated systems
are encountered in applications. In response to such challenges, much effort
has been devoted to modeling and analysis for more sophisticated systems.
Frequently, there are factors that cannot be described by the traditional
models. One of the ideas is to bring regime switching into the formulation,
so as to deal with the coexistence of continuous dynamics and discrete
events. Although one may use a pure jump process to represent the discrete
events, due to the interactions of the many bodies, the discrete process,
in fact, is correlated with the diffusive dynamics.
In Xi and Yin [162], the following system is considered. Suppose that α(t)
is a right-continuous random process taking values in a finite state space
M = {1, 2, . . . , m0 }. Consider an `-body mean-field model with switching
described by the following system of Itô stochastic differential equations.
For i = 1, 2, . . . , `,

 
dXi(t) = [γ(α(t))Xi(t) − Xi³(t) − β(α(t))(Xi(t) − X̄(t))]dt
         + σii(X(t), α(t))dwi(t),                                (1.36)

where
X̄(t) = (1/ℓ) Σ_{j=1}^{ℓ} Xj(t),    X(t) = (X1(t), X2(t), . . . , Xℓ(t))′,    (1.37)

and γ(i) > 0 and β(i) > 0 for i ∈ M. Moreover, the transition rules of α(t)
are specified by

P{α(t + ∆) = j | α(t) = i, X(t) = x}
    = qij(x)∆ + o(∆)        if i ≠ j,
    = 1 + qii(x)∆ + o(∆)    if i = j,                            (1.38)

which hold uniformly in Rℓ as ∆ ↓ 0, for x = (x1, x2, . . . , xℓ)′ ∈ Rℓ and
σ = (σij) ∈ Rℓ×ℓ. A number of asymptotic properties were obtained
in [162] including regularity, Feller properties, and exponential ergodicity.
We refer the interested reader to the aforementioned reference for further
details.
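A minimal Euler–Maruyama sketch of the ℓ-body system (1.36) with state-dependent switching in the spirit of (1.38) follows. The switching rate, the constant noise level (the text allows σii to depend on (X, α)), the parameter values, and the step size are all hypothetical simplifications for illustration.

```python
import math
import random

# One Euler-Maruyama step of the l-body system (1.36); the switching
# rate q_out, the constant noise level sigma, and all parameter values
# are hypothetical simplifications.

def mean_field_step(X, a, dt, rng, gamma, beta, sigma, q_out):
    xbar = sum(X) / len(X)                    # the empirical mean
    X_new = [xi + (gamma[a] * xi - xi ** 3 - beta[a] * (xi - xbar)) * dt
             + sigma * rng.gauss(0.0, math.sqrt(dt))
             for xi in X]
    if rng.random() < q_out(X_new) * dt:      # state-dependent switch, cf. (1.38)
        a = 3 - a                             # toggle between two regimes
    return X_new, a

rng = random.Random(1)
gamma = {1: 1.0, 2: 0.5}                      # gamma(i) > 0
beta = {1: 2.0, 2: 1.0}                       # beta(i) > 0
q_out = lambda X: 1.0 / (1.0 + sum(x * x for x in X))   # a bounded rate
X, a = [0.5, -0.3, 0.1, 0.8], 1
for _ in range(2000):
    X, a = mean_field_step(X, a, 5e-4, rng, gamma, beta, 0.2, q_out)
```

Note that the cubic term −Xi³ damps large excursions, while the β term pulls each body toward the empirical mean, the defining feature of the mean-field interaction.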

As shown in the examples above, all the systems considered involve hy-
brid switching diffusions. To have a better understanding of each of the
problems and the properties of the corresponding processes, it is important
that we have a thorough understanding of the switching diffusion process.
This is our objective in this book.

1.5 Outline of the Book


This chapter serves as a prelude to the book. It gives motivation and
presents examples for the switching-diffusion processes. After this short
introduction, the book is divided into four parts.
The first part, including three chapters, presents basic properties such as
Feller and strong Feller, recurrence, and ergodicity. Chapter 2 begins with
the precise definition of the switching diffusion processes. With a brief
review of the existence and uniqueness of solutions of switching diffusions,
we deal with basic properties such as regularity, weak continuity, Feller and
strong Feller properties and so on. Then continuous and smooth dependence
of initial data are presented. The proofs of these results are different from
the traditional setup and much more involved due to the coupling resulting
from the switching component.
Chapter 3 is concerned with recurrence and positive recurrence. Neces-
sary and sufficient conditions for positive recurrence are given. We show
that recurrence and positive recurrence are independent of the open set cho-
sen. Furthermore, we demonstrate that we can work with a fixed discrete

component. The approach we are using is based on treatment of elliptic


systems of partial differential equations. First, criteria of positive recur-
rence based on Liapunov functions are given. Then we translate these into
conditions on the coefficients of the switching diffusions.
Chapter 4 focuses on ergodicity. It is shown that for a positive recurrent
switching diffusion, there exists a unique invariant measure. Not only are
the existence and uniqueness proved, but the form of the invariant measure
is given. It reveals the salient feature of the underlying process owing to the
switching. The ergodic measures obtained enable one to carry out control
and optimization tasks with replacement of the instantaneous measures by
the ergodic measures.
Part II of the book is devoted to numerical solutions of switching dif-
fusions. As their diffusion counterpart, very often closed-form solutions of
nonlinear switching diffusions are hard to come by. Thus, numerical ap-
proximation becomes a viable and often the only alternative. Developing
numerical results is important. Therefore, Chapters 5 and 6 are devoted to
numerical approximations. Chapter 5 presents numerical algorithms of the
Euler-Maruyama type. Although decreasing stepsize algorithms are often
considered in the literature, we illustrate that constant stepsize algorithms
work just as well. Such a fact has been well recognized by researchers work-
ing in the fields of systems theory, control, and signal processing among
others. Using weak convergence methods, we establish the convergence of
the numerical approximations.
Chapter 6 switches gear and concentrates on numerical approximation
of the ergodic measures. Sometimes, people refer to convergence to the
stationary distribution as stability in distribution. In this language, the
subject matter of this chapter is: Under what conditions will the numerical
algorithm be stable in distribution? Again, weak convergence methods are
the key for us to reach the conclusion.
Containing three chapters, Part III focuses on stability. Chapters 7 and
8 proceed with the stability analysis. The approach is based on Liapunov
function methods. Chapter 7 studies stability of switching diffusions. The
notion is in the sense of stability in probability. First, definitions of stability
and instability are given. Then criteria are presented based on Liapunov
functions. In addition, more verifiable conditions on the coefficients of the
systems are presented. Necessary and sufficient conditions for pth-moment
stability of linear (in the continuous component) systems and linear ap-
proximations are provided. It is noted that contrary to common practice,
very often nonquadratic Liapunov functions are easier to use.
Chapter 8 takes up the stability study for fully degenerate systems.
We treat the case that the diffusion matrix becomes 0, or equivalently, we
have switching ordinary differential equations. Based on Liapunov meth-
ods, we obtain sufficient conditions for stability. Similar to the nondegen-
erate switching diffusions, the results on stability and instability have an
“eigenvalue” gap. That is, the stability criteria are in terms of the largest

eigenvalue of a certain matrix and the instability is in terms of the small-


est eigenvalue of the same matrix. To close up the gap, we introduce a
logarithm transformation leading to a necessary and sufficient condition in
terms of a one-dimensional function.
Chapter 9 proceeds with the investigation of the invariance principle. In
studying deterministic systems represented by ordinary differential equa-
tions, the concept of the invariance principle in the sense of LaSalle has
come into being (see [106] and [107]). It enables one to obtain far-reaching
results compared to the purely equilibrium point analysis alone. Building
on this idea, the invariance principle for stochastic counterpart for diffusion
processes was considered by Kushner in [98], whereas that for stochastic
differential delay equations was treated in [117]. In this chapter, focusing
on switching diffusions, we obtain their invariance properties using both a
sample path approach and a measure theoretic approach.
The last part of the book concentrates on two-time-scale modeling and
applications; it contains three chapters. Up to this point, it has been assumed that the switching component is irreducible. Roughly, that means all
of the discrete states belong to the same class. One natural question is:
What happens if not all states belong to the same recurrent class? Chapter 10 is concerned with such an issue. Using the methods of perturbed
Liapunov functions, this problem is answered. We divide the state space
of the switching component into several irreducible groups. Within each
irreducible group the switching component moves frequently, whereas from
one group to another, it moves relatively infrequently. We distinguish the
fast and slow motions by introducing a small parameter ε > 0. Aggregating
the states in each recurrent group into one superstate, as ε → 0, we obtain
a limit system, whose drift and diffusion coefficients are averaged out with
respect to the invariant measure of the switching part. We show that if
the limit process is recurrent so is the original process for sufficiently small
ε > 0.
Chapter 11 is concerned with an application to financial engineering.
The equity asset is assumed to follow a Markov regime-switching diffusion,
in which both the return rates and the volatility depend on a continuous-
time Markov chain. This model is an alternative to the so-called stochastic
volatility model of Hull and White [70] which is well known nowadays. The
motivation of our study is similar to the fast mean reverting models treated
in [49], in which Fouque, Papanicolaou, and Sircar assumed the volatility
to be a fast-varying diffusion process. It was shown that the Black–Scholes
pricing formula is a first approximation to the stochastic volatility model
when fast mean reversion is observed. We treat an alternative model with
regime switching, and reach a similar conclusion.
Chapter 12 considers a slightly more general model with an additional
jump component. The key point here is the utilization of two-time scales.
The motivation stems from reduction of complexity. Two different situa-
tions are considered. In the first case, the switching component changes

an order of magnitude faster than the continuous component. In the sec-


ond case, the diffusion is fast-varying. In both cases, we obtain appropriate
limits using a weak convergence approach.
Finally, for convenience, an appendix including a number of mathemati-
cal preliminaries is placed at the end of the book. It serves as a quick refer-
ence. Topics discussed here include Markov chains, martingales, Gaussian
processes, diffusions, jump diffusions, and weak convergence methods. Al-
though detailed developments are often omitted, appropriate references are
provided for the reader to facilitate further reading.
2
Switching Diffusion

2.1 Introduction
This chapter provides an introduction to switching diffusions. First the
definition of switching diffusion is given. Then with a short review of the
existence and uniqueness of the solution of associated stochastic differential
equations, weak continuity, Feller, and strong Feller properties are estab-
lished. Also given here are the definition of regularity and criteria ensuring
such regularity. Moreover, smooth dependence on initial data is presented.
The rest of the chapter is arranged as follows. After this short introduc-
tory section, Section 2.2 presents the general setup for switching processes.
Section 2.3 is concerned with regularity. Section 2.4 deals with weak con-
tinuity of the pair of process (X(t), α(t)). Section 2.5 proceeds with Feller
properties. Section 2.6 goes one step further to obtain strong Feller prop-
erties. Section 2.7 presents smooth dependence properties of solutions of
the switching diffusions. Section 2.8 gives remarks on how nonhomogeneous
cases in which both the drift and diffusion coefficients depend explicitly on
time t can be handled. Finally, Section 2.9 provides additional notes and
remarks.

2.2 Switching Diffusions


We work with a probability space (Ω, F, P) throughout this book. A family
of σ-algebras {Ft }, for t ≥ 0 or t = 1, 2, . . ., or simply Ft , is termed a

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 27
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_2,
© Springer Science + Business Media, LLC 2010

filtration if Fs ⊂ Ft for s ≤ t. We say that Ft is complete if it contains all


null sets and that the filtration {Ft } satisfies the usual condition if F0 is
complete. A probability space (Ω, F, P) together with a filtration {Ft } is
said to be a filtered probability space, denoted by (Ω, F, {Ft }, P).
Suppose that α(·) is a stochastic process with right-continuous sample
paths (or a pure jump process), finite-state space M = {1, . . . , m0 }, and
x-dependent generator Q(x) so that for a suitable function f (·, ·),
Q(x)f(x, ·)(ı) = Σ_{∈M} qı(x)(f(x, ) − f(x, ı)),   for each ı ∈ M.    (2.1)

Let w(·) be an Rd-valued standard Brownian motion defined in the filtered
probability space (Ω, F, {Ft }, P). Suppose that b(·, ·) : Rr × M 7→ Rr and
that σ(·, ·) : Rr × M 7→ Rr×d . Then the two-component process (X(·), α(·)),
satisfying

dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
(X(0), α(0)) = (x, α),                                           (2.2)

and for i ≠ j,

P{α(t + ∆) = j|α(t) = i, X(s), α(s), s ≤ t} = qij (X(t))∆ + o(∆), (2.3)

is termed a switching diffusion or a regime-switching diffusion. Naturally,


for the two-component process (X(t), α(t)), we call X(t) the continuous
component and α(t) the discrete component, in accordance with their sam-
ple path properties.
There is an associated operator defined as follows. For each ι ∈ M and
each f (·, ι) ∈ C 2 , where C 2 denotes the class of functions whose partial
derivatives with respect to the variable x up to the second-order are con-
tinuous, we have

Lf(x, ι) = ∇f′(x, ι)b(x, ι) + (1/2)tr(∇²f(x, ι)A(x, ι)) + Q(x)f(x, ·)(ι)
         = Σ_{i=1}^{r} bi(x, ι) ∂f(x, ι)/∂xi
           + (1/2) Σ_{i,j=1}^{r} aij(x, ι) ∂²f(x, ι)/∂xi∂xj      (2.4)
           + Q(x)f(x, ·)(ι),

where ∇f(x, ι) and ∇²f(x, ι) denote the gradient and Hessian of f(x, ι)
with respect to x, respectively,

Q(x)f(x, ·)(ι) = Σ_{j=1}^{m0} qιj(x)f(x, j),   and

A(x, ı) = (aij(x, ı)) = σ(x, ı)σ′(x, ı) ∈ Rr×r.



Note that the evolution of the discrete component α(·) can be represented
by a stochastic integral with respect to a Poisson random measure (see,
e.g., [52, 150]). Indeed, for x ∈ Rr and i, j ∈ M with j ≠ i, let ∆ij (x) be
the consecutive (with respect to the lexicographic ordering on M × M),
left-closed, right-open intervals of the real line, each having length qij (x).
Define a function h : Rr × M × R 7→ R by
h(x, i, z) = Σ_{j=1}^{m0} (j − i) I_{z∈∆ij(x)}.                  (2.5)

That is, with the partition {∆ij (x) : i, j ∈ M} used and for each i ∈ M,
if z ∈ ∆ij (x), h(x, i, z) = j − i; otherwise h(x, i, z) = 0. Then (2.3) is
equivalent to

dα(t) = ∫_R h(X(t), α(t−), z) p(dt, dz),                         (2.6)

where p(dt, dz) is a Poisson random measure with intensity dt×m(dz), and
m is the Lebesgue measure on R. The Poisson random measure p(·, ·) is
independent of the Brownian motion w(·).
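The interval construction behind (2.5) is easy to implement. The sketch below uses 0-based states and, as a simplification, lays the intervals ∆ij(x) from the origin for each fixed i rather than consecutively over all pairs (i, j) in lexicographic order as in the text; for a fixed current state i the induced jump rule is the same. The 3-state generator is hypothetical.

```python
# A sketch of the function h in (2.5).  Simplification (hypothetical
# layout): for each fixed current state i, the intervals Delta_ij(x),
# j != i, are laid from the origin in increasing j, each of length
# q_ij(x).  States are 0-based here.

def make_h(Q):
    """Build h(x, i, z) from a rate-matrix-valued function Q(x)."""
    def h(x, i, z):
        q = Q(x)
        left = 0.0
        for j in range(len(q)):
            if j == i:
                continue
            right = left + q[i][j]       # Delta_ij(x) = [left, right)
            if left <= z < right:
                return j - i             # the chain jumps from i to j
            left = right
        return 0                         # z lands outside every Delta_ij(x)
    return h

# A hypothetical 3-state generator (constant in x, for simplicity).
Q = lambda x: [[-3.0, 1.0, 2.0],
               [0.5, -1.5, 1.0],
               [2.0, 2.0, -4.0]]
h = make_h(Q)
```

Feeding h the atoms of a Poisson random measure with Lebesgue intensity in z then reproduces the jump dynamics (2.6): marks falling in ∆ij(x) occur at rate qij(x), exactly the transition rates of α(·).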
Similar to the case of diffusions, with the L defined in (2.4), for each
f (·, ı) ∈ C 2 , ı ∈ M, a result known as the generalized Itô lemma (see
[17, 120] or [150]) reads
Z t
f (X(t), α(t)) − f (X(0), α(0)) = Lf (X(s), α(s))ds + M1 (t) + M2 (t),
0
(2.7)
where

M1(t) = ∫_0^t ⟨∇f(X(s), α(s)), σ(X(s), α(s))dw(s)⟩,

M2(t) = ∫_0^t ∫_R [f(X(s), α(s) + h(X(s), α(s), z)) − f(X(s), α(s))] µ(ds, dz),

and
µ(ds, dz) = p(ds, dz) − ds × m(dz)
is a martingale measure.
In view of the generalized Itô formula,
Mf(t) = f(X(t), α(t)) − f(X(0), α(0)) − ∫_0^t Lf(X(s), α(s))ds   (2.8)

is a local martingale. If for each ι ∈ M, f (x, ι) ∈ Cb2 (class of functions


possessing bounded and continuous partial derivatives with respect to x
of order up to two) or f (x, ι) ∈ C02 (C 2 functions with compact support),
then Mf (t) defined in (2.8) becomes a martingale. Similar to the case of

diffusion processes, we can define the corresponding notion of the solution


of the martingale problem accordingly.
Another consequence of the generalized Itô formula (2.7) is that the
Dynkin formula follows. Indeed, let f (·, ı) ∈ C 2 for ı ∈ M, and τ1 , τ2 be
bounded stopping times such that 0 ≤ τ1 ≤ τ2 a.s. If f (X(t), α(t)) and
Lf (X(t), α(t)) and so on are bounded on t ∈ [τ1 , τ2 ] with probability 1,
then
Ef(X(τ2), α(τ2)) = Ef(X(τ1), α(τ1)) + E ∫_{τ1}^{τ2} Lf(X(s), α(s))ds;  (2.9)

see [120] or [150] for details. In what follows, for convenience (with multi-
index notation used, for instance) and emphasis on the x-dependence, we
often use Dx f (x, α) and ∇f (x, α) interchangeably to represent the gradi-
ent. In addition, for two vectors x and y with appropriate dimensions, we
use x′y and ⟨x, y⟩ interchangeably to represent their inner product. To pro-
ceed, we present the existence and uniqueness of solutions to a system of
stochastic differential equations associated with switching diffusions first.
In what follows, for Q(x) : Rr → Rm0×m0 , we say that it satisfies the q-
property if Q(x) = (qij(x)) is such that: qij(x) is Borel measurable for
all i, j ∈ M and x ∈ Rr ; qij(x) is uniformly bounded; qij(x) ≥ 0 for j ≠ i;
and qii(x) = − Σ_{j≠i} qij(x) for all x ∈ Rr .
Note that an alternative definition of the q-property can be devised. The
boundedness assumption can be relaxed. However, for our purpose, the
current setup seems to be sufficient. The interested reader could find the
desired information through [28].

Theorem 2.1. Let x ∈ Rr , M = {1, . . . , m0 }, and Q(x) = (qij (x)) be


an m0 × m0 matrix depending on x satisfying the q-property. Consider the
two-component process Y (t) = (X(t), α(t)) given by (2.2) with initial data
(x, α). Suppose that Q(·) : Rr 7→ Rm0 ×m0 is a bounded and continuous
function, that the functions b(·, ·) and σ(·, ·) satisfy

|b(x, α)| + |σ(x, α)| ≤ K(1 + |x|), α ∈ M, (2.10)

and that for every integer N ≥ 1, there exists a positive constant MN such
that for all t ∈ [0, T ], i ∈ M, and all x, y ∈ Rr with |x| ∨ |y| ≤ N,

|b(x, i) − b(y, i)| ∨ |σ(x, i) − σ(y, i)| ≤ MN |x − y|.          (2.11)

Then there exists a unique solution (X(t), α(t)) to the equation (2.2) with
given initial data in which the evolution of the jump process is specified by
(2.3).

Remark 2.2. For brevity, the detailed proof is omitted. Instead, we make
the following remarks. There are a number of possible proofs. For example,

the existence can be obtained as in [150, pp. 103-104]. Viewing the switch-
ing diffusion as a special case of a jump-diffusion process (see the stochastic
integral representation of α(t) in (2.6)), one may prove the existence and
uniqueness using [77, Section III.2]. Another possibility is to use a mar-
tingale problem formulation together with utilization of truncations and
stopping times as in [72, Chapter IV]. In Chapter 5, we present numerical
approximation algorithms for solutions of switching diffusions, and show
the approximation algorithms converge weakly to the switching diffusion
of interest by means of a martingale problem formulation. Then using Lip-
schitz continuity and the weak convergence, we further obtain the strong
convergence of the approximations. As a byproduct, we can obtain the ex-
istence and uniqueness of the solution. The verbatim proof can be found in
[172]; see Remark 5.10 for further explanations. While most proofs of the
uniqueness take two different solutions with the same initial data and show
their difference should be 0 by using Lipschitz continuity and Gronwall’s
inequality, it is possible to consider the difference of the two solutions with
different initial data whose difference is arbitrarily small. In this regard,
the uniqueness can be derived from Proposition 2.30. Earlier work using
such an approach may be found in [130].
Note that (2.10) is the linear growth condition and (2.11) is the local Lip-
schitz condition. These conditions for the usual diffusion processes (without
switching) are used extensively in the literature. Theorem 2.1 provides ex-
istence and uniqueness of solutions of (2.2) with (2.3) specified. To proceed,
we obtain a moment estimate on X(t). This estimate is used frequently in
subsequent development. As with its diffusion counterpart, the main ingredient
of the proof is the use of Gronwall’s inequality.

Proposition 2.3. Assume the conditions of Theorem 2.1. Let T > 0 be


fixed. Then for any positive constant γ, we have

Ex,i [ sup_{t∈[0,T]} |X(t)|^γ ] ≤ C < ∞,   (x, i) ∈ Rr × M,      (2.12)

where the constant C satisfies C = C(x, T, γ).

Proof. Step 1. Because

X(t) = x + ∫_0^t b(X(s), α(s))ds + ∫_0^t σ(X(s), α(s))dw(s),

we have for any p ≥ 2 that

|X(t)|^p ≤ 3^{p−1} [ |x|^p + | ∫_0^t b(X(s), α(s))ds |^p
                     + | ∫_0^t σ(X(s), α(s))dw(s) |^p ].

Using the Hölder inequality and the linear growth condition, detailed com-
putations lead to

Ex,i sup_{0≤t≤T1} | ∫_0^t b(X(s), α(s))ds |^p
    ≤ T^{p−1} Ex,i ∫_0^{T1} |b(X(s), α(s))|^p ds
    ≤ KT^{p−1} Ex,i ∫_0^{T1} (1 + |X(s)|^p) ds
    ≤ c1(T, p) ∫_0^{T1} Ex,i [ 1 + sup_{0≤u≤s} |X(u)|^p ] ds,

where 0 ≤ T1 ≤ T, and c1(T, p) is a positive constant depending
only on T, p, and the linear growth constant in (2.10). Similarly, by the
Burkholder–Davis–Gundy inequality (see Lemma A.32; see also [120, p.
70]), the Hölder inequality, and the linear growth condition, we compute
Ex,i sup_{0≤t≤T1} | ∫_0^t σ(X(s), α(s))dw(s) |^p
    ≤ K Ex,i ( ∫_0^{T1} |σ(X(s), α(s))|² ds )^{p/2}
    ≤ KT^{p/2−1} Ex,i ∫_0^{T1} |σ(X(s), α(s))|^p ds
    ≤ c2(T, p) ∫_0^{T1} Ex,i [ 1 + sup_{0≤u≤s} |X(u)|^p ] ds,

where 0 ≤ T1 ≤ T, and c2(T, p) is a positive constant depending only on
T, p, and the linear growth constant in (2.10). Thus we have

Ex,i [ 1 + sup_{0≤t≤T1} |X(t)|^p ]
    ≤ c3(x, p) + c4(T, p) ∫_0^{T1} Ex,i [ 1 + sup_{0≤u≤s} |X(u)|^p ] ds,

where c3(x, p) = 1 + 3^{p−1}|x|^p, and c4(T, p) = 3^{p−1}[c1(T, p) + c2(T, p)].
Hence the Gronwall inequality (see, e.g., [44]) implies that

Ex,i [ 1 + sup_{0≤t≤T1} |X(t)|^p ] ≤ c3(x, p) exp(c4(T, p)T1)
                                   ≤ c3(x, p) exp(c4(T, p)T)
                                   := C(x, T, p).

Thus we have Ex,i sup_{0≤t≤T1} |X(t)|^p ≤ C(x, T, p). Because T1 ≤ T is
arbitrary, it follows that

Ex,i [ sup_{0≤t≤T} |X(t)|^p ] ≤ C(x, T, p),   p ≥ 2.

Step 2. Note that sup_{0≤t≤T} |X(t)|^γ = ( sup_{0≤t≤T} |X(t)| )^γ for any γ > 0.
Thus we obtain from Hölder's inequality and Step 1 that for any 1 ≤ p < 2,

Ex,i [ sup_{0≤t≤T} |X(t)|^p ] ≤ ( Ex,i [ sup_{0≤t≤T} |X(t)|^{2p} ] )^{1/2} ≤ C(T, x, p) < ∞.

Step 3. Finally, if 0 < p < 1, note that

|X(t)|^p = |X(t)|^p I_{|X(t)|≥1} + |X(t)|^p I_{|X(t)|<1} ≤ 1 + |X(t)|^{1+p}.

Hence it follows from Step 2 that

Ex,i [ sup_{0≤t≤T} |X(t)|^p ] ≤ Ex,i [ 1 + sup_{0≤t≤T} |X(t)|^{1+p} ] ≤ C(x, T, p).

This completes the proof of the proposition. □


Proposition 2.4. The process (X(t), α(t)) is càdlàg. That is, the sample
paths of (X(t), α(t)) are right continuous and have left limits.

Proof. It is well known that the sample paths of the discrete component
α(·) are right continuous with left limits. Hence it remains to show that
the same is true for the continuous component X(·). To this end, let 0 ≤
s < t ≤ T with T being any fixed positive number, and consider
X(t) − X(s) = ∫_s^t b(X(u), α(u))du + ∫_s^t σ(X(u), α(u))dw(u).

Using Proposition 2.3 and [47, Theorem 4.6.3], detailed computations lead to

Ex,i |X(t) − X(s)|⁴ ≤ C|t − s|²,

where C is a constant dependent on T , the initial condition x, and the


linear growth and Lipschitz constant of the coefficients b and σ. Then the
desired result follows from Kolmogorov's continuity criterion. □
Remark 2.5. By considering x-dependent generator Q(x), our model pro-
vides a more realistic formulation allowing the switching component de-
pending on the continuous states. This, in turn, allows the coupling and
correlation between X(t) and α(t).

2.3 Regularity
It follows from Theorem 2.1 that if the coefficients satisfy the linear growth
and the local Lipschitz condition, then the solution (X(t), α(t)) of (2.2)–
(2.3) is defined for all t > 0. Nevertheless, the linear growth condition puts

some restrictions on the applicability. For instance, consider the real-valued


regime-switching diffusion
dX(t) = −X 3 (t)dt − σ(α(t))dw(t), (2.13)
where σ(i), i ∈ M are constants. Even though the drift coefficient is only
local Lipschitz continuous and does not satisfy the linear growth condition,
the regime-switching diffusion will not blow up in finite time because the
drift coefficient forces the process to move towards the origin. Hence it is
expected that the solution to (2.13) exists for all t ≥ 0.
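The non-explosion suggested for (2.13) can be probed numerically. The rough Euler–Maruyama sketch below uses hypothetical values of σ(i) and, for simplicity, constant switching rates; the cubic drift keeps the simulated path within a bounded band.

```python
import math
import random

# Rough Euler-Maruyama sketch for (2.13): dX = -X^3 dt - sigma(a) dw.
# The values of sigma(i) and the (constant) switching rates are
# hypothetical illustrative choices.

def simulate_cubic(x0=5.0, T=2.0, dt=1e-4, seed=7):
    rng = random.Random(seed)
    sigma = {1: 0.5, 2: 1.5}
    rates = {1: 2.0, 2: 3.0}
    x, a = x0, 1
    peak = abs(x0)
    for _ in range(int(T / dt)):
        x += -x ** 3 * dt - sigma[a] * rng.gauss(0.0, math.sqrt(dt))
        if rng.random() < rates[a] * dt:
            a = 3 - a                    # toggle between the two regimes
        peak = max(peak, abs(x))
    return x, peak

x_T, peak = simulate_cubic()
```

The running maximum stays close to the initial value: the −X³ term pulls the path toward the origin faster than the noise can push it outward, which is the informal content of the regularity criteria that follow.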
Let (X(t), α(t)) be a regime-switching diffusion given by (2.2) and (2.3)
whose drift and diffusion coefficients satisfy the local Lipschitz condition
(2.11). For any n = 1, 2, . . . and (x, α) ∈ Rr × M, we define βn to be the
first exit time of the process (X(t), α(t)) from the ball centered at the origin
with radius n, B(0, n) := {ξ ∈ Rr : |ξ| < n}. That is,
βn = βn^{x,α} := inf{t ≥ 0 : |X^{x,α}(t)| ≥ n}.                  (2.14)
Note that the sequence {βn } is monotonically increasing and hence has a
(finite or infinite) limit. Denote the limit by β∞ . To proceed, we first give
a definition of regularity.
Definition 2.6. Regularity. A Markov process (X(t), α(t)) with initial
data (X(0), α(0)) = (x, α) is said to be regular, if
β∞ = lim βn = ∞ a.s. (2.15)
n→∞

It is easy to see that the process (X^{x,α}(t), α^{x,α}(t)) is regular if and
only if for any 0 < T < ∞,

P{ sup_{0≤t≤T} |X^{x,α}(t)| = ∞ } = 0.                           (2.16)

That is, a process is regular if and only if it does not blow up in finite
time. Thus, (2.16) can be used as an alternate definition. Nevertheless, it
is handy to use equation (2.15) to delineate the regularity.
In what follows, we take up the regularity issue of a regime-switching
diffusion when its coefficients do not satisfy the linear growth condition.
The following theorem gives sufficient conditions for regularity, which is
based on “local” linear growth and Lipschitz continuity.
Theorem 2.7. Suppose that for each i ∈ M, both the drift b(·, i) and the
diffusion coefficient σ(·, i) satisfy the linear growth and Lipschitz condition
in every bounded open set in Rr , and that there is a nonnegative function
V (·, ·) : Rr × M 7→ R+ that is twice continuously differentiable with respect
to x ∈ Rr for each i ∈ M such that there is a γ0 > 0 satisfying

LV(x, i) ≤ γ0 V(x, i),   for all (x, i) ∈ Rr × M,
VR := inf_{|x|≥R, i∈M} V(x, i) → ∞   as R → ∞.                   (2.17)

Then the process (X(t), α(t)) is regular.

Proof. As argued earlier, it suffices to prove (2.15). Suppose, on the contrary, that (2.15) were false. Then there would exist some T > 0 and ε > 0
such that Px,i {β∞ ≤ T } > ε. Therefore we could find some n1 ∈ N such
that

Px,i {βn ≤ T } > ε,   for all n ≥ n1.                            (2.18)
Define

S(x, i, t) = V (x, i) exp(−γ0 t), (x, i) ∈ Rr × M, and t ≥ 0.

Then it satisfies [(∂/∂t) + L]S(x, i, t) ≤ 0. By virtue of Dynkin's formula,
we have

Ex,i [V(X(βn ∧ T), α(βn ∧ T)) exp(−γ0(βn ∧ T))] − V(x, i)
    = Ex,i ∫_0^{βn∧T} [(∂/∂t) + L] S(X(u), α(u), u) du ≤ 0.

Hence we have

V (x, i) ≥ Ex,i [V (X(βn ∧ T ), α(βn ∧ T )) exp(−γ0 (βn ∧ T ))] .

Note that βn ∧ T ≤ T and V is nonnegative. Thus we have

V(x, i) exp{γ0 T} ≥ Ex,i [V(X(βn ∧ T), α(βn ∧ T))]
                  ≥ Ex,i [ V(X(βn), α(βn)) I_{βn≤T} ].

Furthermore, by the definition of βn and (2.17), we have

V(x, i) exp{γ0 T} ≥ Vn Px,i {βn ≤ T} > εVn → ∞,   as n → ∞.

This is a contradiction. Thus we must have limn→∞ βn = ∞ a.s. This


completes the proof. □
Similar to the proof of [83, Theorem 3.4.2], we can also prove the following
theorem.

Theorem 2.8. Suppose that for each i ∈ M, both the drift b(·, i) and
the diffusion coefficient σ(·, i) satisfy the linear growth and Lipschitz con-
dition in every bounded open set in Rr , and that there is a nonnegative and
bounded function V (·, ·) : Rr × M 7→ R+ that is not identically zero, that
for each i ∈ M, V (x, i) is twice continuously differentiable with respect to
x ∈ Rr such that there is a γ1 > 0 satisfying

LV(x, i) ≥ γ1 V(x, i),   for all (x, i) ∈ Rr × M.

Then the process (X(t), α(t)) is not regular. In particular, for any ε > 0,
we have
Px0,ℓ {β∞ < κ + ε} > 0,

where β∞ := limn→∞ βn, (x0, ℓ) ∈ Rr × M satisfies V(x0, ℓ) > 0, and

κ = (1/γ1) log[ sup_{(x,i)∈Rr×M} V(x, i) / V(x0, ℓ) ].

Remark 2.9. If the coefficients of (2.2)–(2.3) satisfy both the local Lip-
schitz condition (2.11) and linear growth condition (2.10), then by The-
orem 2.1, there is a unique solution (X x,α (t), αx,α (t)) to (2.2)–(2.3) for
all t ≥ 0. The regime-switching diffusion (X x,α (t), αx,α (t)) is thus regu-
lar. Alternatively, we can use Theorem 2.7 to verify this. In fact, detailed
computations show that the function
V (x, i) = (|x|² + 1)^{r/2} , (x, i) ∈ Rr × M
satisfies all conditions in Theorem 2.7. Thus the desired assertion follows.
In particular, we can verify that V (·, i) is twice continuously differentiable
for each i ∈ M and that LV (x, i) ≤ γ0 V (x, i) for some positive constant
γ0 and all (x, i) ∈ Rr × M. Thus for any t > 0, a similar argument as in
the proof of Theorem 2.7 leads to
Px,i {βn < t} ≤ V (x, i) e^{γ0 t} / Vn ,
where Vn = (1 + n²)^{r/2} . Therefore for any t > 0 and ε > 0, there exists an
N ∈ N such that
Px,i {βn < t} < ε, for all n ≥ N, (2.19)
uniformly for x in any compact set F ⊂ Rr and i ∈ M.
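The bound preceding (2.19) is easy to evaluate numerically. The sketch below is a hypothetical helper, not from the text (in particular γ0 = 1 is a placeholder constant): it computes V (x, i) e^{γ0 t}/Vn for V (x, i) = (|x|² + 1)^{r/2} and shows that the bound, and hence Px,i {βn < t}, vanishes as n → ∞.

```python
import math

def exit_prob_bound(x_norm, n, t, r=2, gamma0=1.0):
    """Upper bound V(x,i) * exp(gamma0 * t) / Vn on P_{x,i}{beta_n < t},
    with V(x,i) = (|x|^2 + 1)^{r/2} and Vn = (1 + n^2)^{r/2}."""
    V = (x_norm**2 + 1) ** (r / 2)
    Vn = (1 + n**2) ** (r / 2)
    return V * math.exp(gamma0 * t) / Vn

# the bound decays like n^{-r}, uniformly for |x| in a compact set
bounds = [exit_prob_bound(x_norm=2.5, n=n, t=1.0) for n in (10, 100, 1000)]
```

The n^{−r} decay is what makes the uniform choice of N in (2.19) possible.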
Example 2.10. Consider a real-valued regime-switching diffusion

dX(t) = X(t)[(b(α(t)) − a(α(t))X²(t))dt + σ(α(t))dw(t)], (2.20)
where w(·) is a one-dimensional standard Brownian motion, and α(·) ∈
M = {1, 2, . . . , m0 } is a jump process with appropriate generator Q(x).
Clearly, the coefficients of (2.20) do not satisfy the linear growth condition
if not all a(i) = 0 for i ∈ M. We claim that if a(i) > 0 for each i ∈ M,
then (2.20) is regular. To see this, we apply Theorem 2.7 and consider
V (x, i) := |x|², (x, i) ∈ R × M.
Clearly, V is nonnegative and satisfies
lim_{|x|→∞} V (x, i) = ∞, for each i ∈ M.

Thus by Theorem 2.7, it remains to verify that for all (x, i) ∈ R × M, we have LV (x, i) ≤ KV (x, i) for some positive constant K. To this end, we
compute
LV (x, i) = 2x · x[b(i) − a(i)x²] + (1/2) · 2 · x²σ²(i)
          = (2b(i) + σ²(i))x² − 2a(i)x⁴
          ≤ KV (x, i),

where K = max{2b(i) + σ²(i) : i = 1, . . . , m0 }. Note in the above, we used
the assumption that a(i) > 0 for each i ∈ M. Hence it follows from Theo-
rem 2.7 that (2.20) is regular.
For demonstration, we plot a sample path of (2.20) with its coefficients
specified as follows. The jump process α(·) ∈ M = {1, 2} is generated by
 
−3 − sin x cos x 3 + sin x cos x
Q(x) =  ,
2 −2

and
b(1) = 3, a(1) = 2, σ(1) = 1,
b(2) = 10, a(2) = 1, σ(2) = −1.

Figure 2.1 plots a sample path of (2.20) with initial condition (x, α) =
(2.5, 1).

Example 2.11. (Lotka–Volterra model cont.) We continue the discussion


initiated in Example 1.1. Regarding the concept of regularity, we can show
that if the ecosystem is self-regulating or competitive, then the population
will not explode in finite time almost surely. Recall that the system is
competitive if all the values in the community matrix A(α) are nonnegative,
that is, aij (α) ≥ 0 for all α ∈ M = {1, . . . , m} and i, j = 1, 2, . . . , n. The
competition among the same species is assumed to be strictly positive so
that each species grows “logistically” in any environment. To interpret it in
another way, members of the same species compete with one another. Take,
for instance, bee colonies in a field. They will compete for food strongly
with the colonies located near them. We assume that the self-regulating
competition within the same species is strictly positive, that is, for each
α ∈ M = {1, 2, . . . , m} and i, j = 1, 2, . . . , n with j ≠ i, aii (α) > 0 and aij (α) ≥ 0. It was proved in [189] that for any initial condition (x(0), α(0)) =
(x0 , α) ∈ Rn+ × M, where

Rn+ = {(x1 , . . . , xn ) : xi > 0, i = 1, . . . , n} ,

there is a unique solution x(t) to (1.1) on t ≥ 0, and the solution will remain
in Rn+ almost surely; that is, x(t) ∈ Rn+ for any t ≥ 0 with probability 1.

FIGURE 2.1. A sample path of switching diffusion (2.20) with initial condition (x, α) = (2.5, 1).
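A path like that in Figure 2.1 can be generated with a simple Euler–Maruyama scheme: on each step the discrete component switches with probability q(x)·dt read off the current row of Q(x), and the continuous component takes one Euler step. This is a sketch only; the step size, horizon, and seed are arbitrary choices, not values from the text.

```python
import math
import random

def simulate(x0=2.5, alpha0=1, T=10.0, dt=1e-3, seed=7):
    """Euler-Maruyama for dX = X[(b(alpha) - a(alpha)X^2)dt + sigma(alpha)dw]
    with x-dependent switching driven by Q(x) from Example 2.10."""
    random.seed(seed)
    b = {1: 3.0, 2: 10.0}
    a = {1: 2.0, 2: 1.0}
    sigma = {1: 1.0, 2: -1.0}
    x, alpha = x0, alpha0
    path = [(0.0, x, alpha)]
    t = 0.0
    while t < T:
        # total jump rate out of the current state: -q_{ii}(x)
        q_out = 3.0 + math.sin(x) * math.cos(x) if alpha == 1 else 2.0
        if random.random() < q_out * dt:   # switch with probability q*dt + o(dt)
            alpha = 3 - alpha              # toggle between states 1 and 2
        dw = random.gauss(0.0, math.sqrt(dt))
        x += x * (b[alpha] - a[alpha] * x * x) * dt + x * sigma[alpha] * dw
        t += dt
        path.append((t, x, alpha))
    return path

path = simulate()
```

In line with the regularity just established, the simulated path should remain bounded on [0, T] even though the coefficients grow faster than linearly.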

2.4 Weak Continuity


Recall that a stochastic process Y (t) with right continuous sample paths
is said to be continuous in probability at t if for any η > 0,

lim_{∆→0} P(|Y (t + ∆) − Y (t)| ≥ η) = 0. (2.21)

It is continuous in mean square at t if

lim_{∆→0} E|Y (t + ∆) − Y (t)|² = 0. (2.22)

The process Y (t) is said to be continuous in probability on the interval [0, T ] (or, in short, continuous in probability if [0, T ] is clearly understood) if it
is continuous in probability at every t ∈ [0, T ]. Likewise it is continuous
in mean square if it is continuous in mean square at every t ∈ [0, T ]. We
proceed to obtain a continuity result for the two-component switching dif-
fusion processes Y (t) = (X(t), α(t)). The results are presented in two parts.
The first part concentrates on Markovian switching diffusions, whereas the
second one is concerned with state-dependent diffusions.

Theorem 2.12. Suppose that the conditions of Theorem 2.1 are satisfied
with the modification Q(x) = Q that generates a Markov chain independent

of the Brownian motion. Then the process Y (t) = (X(t), α(t)) is continuous
in probability and also continuous in mean square.

Proof. We show that for any η > 0,

P(|Y (t + ∆) − Y (t)| ≥ η) → 0 as ∆ → 0, and
E|Y (t + ∆) − Y (t)|² → 0 as ∆ → 0. (2.23)

We proceed first to establish the mean square convergence above. Note that

Y (t + ∆) − Y (t) = (X(t + ∆), α(t + ∆)) − (X(t + ∆), α(t))
                  + (X(t + ∆), α(t)) − (X(t), α(t)). (2.24)

We divide the rest of the proof into several steps.


Step 1: First we recognize that in view of (2.24),

E|Y (t + ∆) − Y (t)|² ≤ 2 (E|α(t + ∆) − α(t)|² + E|X(t + ∆) − X(t)|²). (2.25)

Thus to estimate the difference of the second moment, it suffices to consider


the two marginal estimates separately. We do this in the next two steps.
Step 2: We claim that for any t ≥ 0 and ∆ ≥ 0,

E|X(t + ∆) − X(t)|² ≤ K∆. (2.26)

This estimate, in fact, is a modification of the standard estimates for


stochastic differential equations. It mainly uses the linear growth and Lip-
schitz conditions of the drift and diffusion coefficients and Proposition 2.3.
We thus omit the details.
Step 3: Note that for any t ≥ 0,
α(t) = Σ_{i=1}^{m0} i I_{α(t)=i} = χ(t)(1, . . . , m0 )′,

where

χ(t) = (χ1 (t), . . . , χm0 (t)) = (I{α(t)=1} , . . . , I{α(t)=m0 } ) ∈ R1×m0 , (2.27)

and (1, . . . , m0 )0 ∈ Rm0 is a column vector. Because the Markov chain α(t)
is independent of the Brownian motion w(·) (Q is a constant matrix), it is
well known that
χ(t + ∆) − χ(t) − ∫_t^{t+∆} χ(s)Q ds

is a martingale; see Lemma A.5 in this book or [176, Lemma 2.4]. It follows that

Et [χ(t + ∆) − χ(t) − ∫_t^{t+∆} χ(s)Q ds] = 0,
where Et denotes the conditional expectation on the σ-algebra
Ft = {(X(u), α(u)) : u ≤ t}.
It then follows that
∫_t^{t+∆} χ(s)Q ds = O(∆) a.s. (2.28)

Thus, we obtain
Et χ(t + ∆) = χ(t) + O(∆) a.s. (2.29)
In view of this structure, we see that
χ(t + ∆) − χ(t) = (χ1 (t + ∆) − χ1 (t), . . . , χm0 (t + ∆) − χm0 (t))
with χi (·) given by (2.27). This together with (2.29) implies

Et [χi (t + ∆) − χi (t)]²
  = Et [I_{α(t+∆)=i} − I_{α(t)=i} ]²
  = Et I_{α(t+∆)=i} − 2 I_{α(t)=i} Et I_{α(t+∆)=i} + I_{α(t)=i}          (2.30)
  = O(∆) a.s.

Step 4: Next we consider

E[α(t + ∆) − α(t)]²
  = E|[χ(t + ∆) − χ(t)](1, . . . , m0 )′|²
  ≤ K E|χ(t + ∆) − χ(t)|²                                                (2.31)
  ≤ K Σ_{i=1}^{m0} E Et [χi (t + ∆) − χi (t)]²
  ≤ K∆ → 0 as ∆ → 0.

From the next to the last line to the last line above, we have used (2.30).
By combining (2.26) and (2.31), we obtain that (2.25) leads to
E|Y (t + ∆) − Y (t)|² → 0 as ∆ → 0.

The mean square continuity has been established. Then the desired continuity in probability follows from Tchebyshev's inequality. □
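The last step, written out, is Tchebyshev's inequality applied to the increment:

```latex
P\big(|Y(t+\Delta)-Y(t)|\ge \eta\big)
  \;\le\; \frac{E\,|Y(t+\Delta)-Y(t)|^{2}}{\eta^{2}}
  \;\longrightarrow\; 0 \qquad \text{as } \Delta \to 0.
```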
We next generalize the above result and allow the switching process to
be x-dependent. The result is presented next.

Theorem 2.13. Suppose that the conditions of Theorem 2.1 are satis-
fied. Then the process Y (t) = (X(t), α(t)) is continuous in probability and
continuous in mean square.

Proof. Step 1 and Step 2 are the same as before. We only point out the
main difference as compared to Theorem 2.12.
Consider the function h(x, α) = I{α=i} for each i ∈ M. Because h is
independent of x, it is readily seen that Lh(x, α) = Q(x)h(x, ·)(α). Conse-
quently,
Et [h(X(t + ∆), α(t + ∆)) − h(X(t), α(t)) − ∫_t^{t+∆} Lh(X(s), α(s)) ds] = 0.

However,
Et [h(X(t + ∆), α(t + ∆)) − h(X(t), α(t)) − ∫_t^{t+∆} Lh(X(s), α(s)) ds]
  = Et [h(X(t + ∆), α(t + ∆)) − h(X(t), α(t)) − ∫_t^{t+∆} Q(X(s))h(X(s), ·)(α(s)) ds]
  = Et [I_{α(t+∆)=i} − I_{α(t)=i} − ∫_t^{t+∆} Σ_{j=1}^{m0} q_{ji}(X(s)) I_{α(s)=j} ds].

Because Q(x) is bounded, similar to (2.28), we obtain


∫_t^{t+∆} χ(s)Q(X(s)) ds = O(∆) a.s. (2.32)

With (2.32) at hand, we proceed with the rest of Step 3 and Step 4 in the proof of Theorem 2.12. The desired result follows. □
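The O(∆) estimates of Steps 3–4 are easy to check by Monte Carlo for a constant generator. The snippet below simulates a two-state chain exactly via exponential holding times; the rates q12 = 3, q21 = 2 and sample size are illustrative choices, not from the text.

```python
import random

def msq_increment(delta, q=((-3.0, 3.0), (2.0, -2.0)), n=40000, seed=3):
    """Monte Carlo estimate of E|alpha(t+delta) - alpha(t)|^2 for a 2-state
    Markov chain with constant generator q, started in its first state."""
    random.seed(seed)
    total = 0.0
    for _ in range(n):
        state, t = 0, 0.0
        while True:
            rate = -q[state][state]
            t += random.expovariate(rate)   # exponential holding time
            if t > delta:
                break
            state = 1 - state               # two states: jump to the other one
        total += (state - 0) ** 2           # |alpha(delta) - alpha(0)|^2
    return total / n

est = msq_increment(0.01)   # should be O(delta), roughly q12 * delta here
```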

2.5 Feller Property


In the theory of Markov processes and their applications, dealing with
a Markov process ξ(t) with ξ(0) = x, for a suitable function f (·), often
one must consider the function u(t, x) = Ex f (ξ(t)). Following [38], the
process ξ(t) is said to be Feller if u(t, ·) is continuous for any t ≥ 0 and
limt↓0 u(t, x) = f (x) for any bounded and continuous function f , and it is
said to be strong Feller if u(t, ·) is continuous for any t > 0 and any bounded
and measurable function f . This is a natural condition in physical or social
modeling. It indicates that a slight perturbation of the initial data should
result in a small perturbation in the subsequent movement. For example,
let X x (t) be a diffusion process of appropriate dimension with X(0) = x. We say that X x (·) satisfies the Feller property, or that X x (·) is a Feller process, if for any t ≥ 0 and any bounded and continuous function g(·), u(t, x) =
Ex g(X(t)) is continuous with respect to x and limt↓0 u(t, x) = g(x); see [38,
57] and many references therein for the pioneering work on Feller properties
for the diffusions. For switching-diffusion processes, do the Feller and the
strong Feller properties carry over? This section is concerned with the Feller property, whereas the next section deals with the strong Feller property.
We show that such properties carry over to the switching diffusions, but
the proofs are in fact nontrivial. The difficulty stems from the consideration
of x-dependent generator Q(x) for the discrete component α(t). The clas-
sical arguments for Feller and strong Feller properties of diffusions will not
work here. We need to prove that the function u(t, x, i) = Ex,i f (X(t), α(t))
is continuous with respect to the initial data (x, i) for any t ≥ 0 and
limt↓0 u(t, x, i) = f (x, i) for any bounded and continuous function f . By
virtue of Proposition 2.4 and the boundedness and continuity of f , we have
limt↓0 u(t, x, i) = Ex,i f (X(0), α(0)) = f (x, i). Note also that u(0, x, i) is au-
tomatically continuous by the continuity of f . Because M = {1, . . . , m0 } is
a finite set, it is enough to show that u(t, x, i) is continuous with respect to
x for any t > 0. In this section, we establish the Feller property for switch-
ing diffusions under Lipschitz continuity and linear growth conditions. This
section is motivated by the recent work [161]. The discussion of the strong
Feller property is deferred to Section 2.6.
We first present the following lemma. In lieu of the local Lipschitz condition, we assume henceforth, without further mention, that a global Lipschitz condition holds. Also, we assume for the moment that the discrete component α(·) is generated by a constant Q.
Lemma 2.14. Assume the conditions of Theorem 2.1 hold with the mod-
ification of the local Lipschitz condition replaced by a global Lipschitz con-
dition. Moreover, suppose that α(·) is generated by Q(x) = Q and that α(·)
is independent of the Brownian motion w(·). Then for any fixed T > 0, we
have

E [ sup_{0≤t≤T} |X x̃,α (t) − X x,α (t)|² ] ≤ C |x̃ − x|², (2.33)

where C is a constant depending only on T and the global Lipschitz and linear growth constant K.

Proof. Assume that x̃ = x + ∆x and let 0 < t ≤ T . Moreover, for notational simplicity, we denote X̃(t) = X x̃,α (t) and X(t) = X x,α (t). By the assumption of the theorem, Q(x) = Q is independent of x, so we have αx̃,α (t) = αx,α (t) = α(t). Then we have
X̃(t) − X(t) = ∆x + ∫_0^t [b(X̃(s), α(s)) − b(X(s), α(s))] ds
             + ∫_0^t [σ(X̃(s), α(s)) − σ(X(s), α(s))] dw(s).

Thus it follows that

|X̃(t) − X(t)|² ≤ 3 |∆x|² + 3 |∫_0^t [b(X̃(s), α(s)) − b(X(s), α(s))] ds|²
              + 3 |∫_0^t [σ(X̃(s), α(s)) − σ(X(s), α(s))] dw(s)|².        (2.34)
Using the assumption that b(·, j) is Lipschitz continuous for each j ∈ M,
we have

|b(X̃(s), α(s)) − b(X(s), α(s))| = |Σ_{j=1}^{m0} [b(X̃(s), j) − b(X(s), j)] I_{α(s)=j}|
                                 ≤ K |X̃(s) − X(s)|,

where in the above and hereafter, K is a generic positive constant not depending on x, x̃, or t, whose exact value may change in different appearances.
Consequently, we have from the Hölder inequality that

E [ sup_{0≤t≤T1} |∫_0^t [b(X̃(s), α(s)) − b(X(s), α(s))] ds|² ]
  ≤ T E ∫_0^{T1} |b(X̃(s), α(s)) − b(X(s), α(s))|² ds                     (2.35)
  ≤ KT ∫_0^{T1} E |X̃(s) − X(s)|² ds,

where 0 ≤ T1 ≤ T . Similarly, the Lipschitz continuity yields that

|σ(X̃(s), α(s)) − σ(X(s), α(s))| ≤ K |X̃(s) − X(s)|,

and hence the Burkholder–Davis–Gundy inequality (see Lemma A.32) leads to

E [ sup_{0≤t≤T1} |∫_0^t [σ(X̃(s), α(s)) − σ(X(s), α(s))] dw(s)|² ]
  ≤ K E ∫_0^{T1} |σ(X̃(s), α(s)) − σ(X(s), α(s))|² ds                     (2.36)
  ≤ K ∫_0^{T1} E |X̃(s) − X(s)|² ds.

Therefore it follows from (2.34)–(2.36) that

E [ sup_{0≤t≤T1} |X̃(t) − X(t)|² ]
  ≤ 3 |∆x|² + K(T + 1) ∫_0^{T1} E |X̃(s) − X(s)|² ds
  ≤ 3 |∆x|² + K(T + 1) ∫_0^{T1} E [ sup_{0≤u≤s} |X̃(u) − X(u)|² ] ds.

Then Gronwall's inequality implies that

E [ sup_{0≤t≤T1} |X̃(t) − X(t)|² ] ≤ 3 |∆x|² exp{K(T + 1)T1}
                                   ≤ 3 |∆x|² exp{K(T + 1)T }.

The above inequality is true for any 0 ≤ T1 ≤ T ; hence we have

E [ sup_{0≤t≤T} |X̃(t) − X(t)|² ] ≤ 3 |∆x|² exp{K(T + 1)T } = K |x̃ − x|².

This completes the proof. □

Remark 2.15. An immediate consequence of (2.33) is that we obtain the


Feller property for the case when the discrete component α(·) is generated
by a constant Q following the argument in [130, Lemma 8.1.4]. But for the
general case when the generator Q(x) of α(·) is x-dependent, the proof in
[130] is not applicable since (2.33) is not proven yet. New methodology is
needed to treat (2.33); see Section 2.7 for more details.

To proceed, in addition to the assumptions of Theorem 2.1, we assume


that the coefficients of (2.2)–(2.3) are bounded, continuously differentiable,
and satisfy the global Lipschitz condition (all with respect to x). That is,
we assume:

(A2.1) For α, ℓ ∈ M and i, j = 1, . . . , r, aij (·, α) ∈ C 2 , b(·, α) ∈ C 1 , and qαℓ (·) ∈ C 1 , and for some positive constant K, we have

|b(x, i)| + |σ(x, i)| + |qij (x)| ≤ K, x ∈ Rr , i, j ∈ M, (2.37)

and for α, ℓ ∈ M and x̃, x ∈ Rr , we have

|σ(x̃, α) − σ(x, α)| + |b(x̃, α) − b(x, α)| + |qαℓ (x̃) − qαℓ (x)| ≤ K |x̃ − x|. (2.38)

Consider the auxiliary process (Z x,α (t), r α (t)) ∈ Rr × M defined by

dZ(t) = b(Z(t), r(t))dt + σ(Z(t), r(t))dw(t),
P{r(t + ∆) = j | r(t) = i, r(s), 0 ≤ s ≤ t} = ∆ + o(∆), j ≠ i, (2.39)

with initial conditions Z(0) = x and r(0) = α, where w(t) is a d-dimensional standard Brownian motion independent of the Markov chain r(t). The Markov chain r(t) thus can be viewed as one with the generator

[ −(m0 − 1)      1       ···       1      ]
[      1      −(m0 − 1)  ···       1      ]
[     ···        ···     ···      ···     ]   =  −m0 Im0 + (1lm0 , 1lm0 , . . . , 1lm0 ),
[      1          1      ···   −(m0 − 1)  ]

where Im0 is the m0 × m0 identity matrix and 1lm0 is a column vector of dimension m0 with all components being 1.
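In code, this generator is just the all-ones matrix shifted by −m0 on the diagonal. The pure-Python sketch below (with m0 = 4 as an arbitrary illustrative size) exhibits the two properties that matter: diagonal entries −(m0 − 1) and zero row sums.

```python
m0 = 4  # number of discrete states; an arbitrary illustrative size

# Q = -m0 * I + (all-ones matrix): entry 1 - m0 on the diagonal, 1 elsewhere
Q_aux = [[1.0 - m0 * (i == j) for j in range(m0)] for i in range(m0)]

diag = [Q_aux[i][i] for i in range(m0)]
row_sums = [sum(row) for row in Q_aux]
```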
For any T > 0, denote by µT,1 (·) the measure induced by the process
(X x,α (t), αx,α (t)) and by µT,2 (·) the measure induced by the auxiliary process (Z x,α (t), r α (t)). Then by virtue of [41, 42], µT,1 (·) is absolutely continuous with respect to µT,2 (·). Moreover, the corresponding Radon–Nikodym derivative has the form

pT (Z x,α (·), r α (·)) = (dµT,1 /dµT,2 )(Z x,α (·), r α (·))
  = exp{(m0 − 1)T } exp{ −∫_{τ_{n(T)}}^{T} q_{r(τ_{n(T)})} (Z(s)) ds }
    × ∏_{i=0}^{n(T)−1} q_{r(τi ) r(τ_{i+1} )} (Z(τ_{i+1} )) exp{ −∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z(s)) ds }, (2.40)

where for each i ∈ M, qi (x) := −qii (x) = Σ_{j≠i} qij (x), and {τi } is a sequence of stopping times defined by τ0 = 0 and, for i = 0, 1, . . .,

τ_{i+1} := inf{t > τi : r(t) ≠ r(τi )},

and n(T ) = max{n ∈ N : τn ≤ T }. Note that if n(T ) = 0, then

pT (Z x,α (·), r α (·)) = exp{ −∫_0^T [qα (Z(s)) − m0 + 1] ds }.
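For a concrete feel for (2.40), the hypothetical helper below evaluates pT along a piecewise-constant realization of r(·), under the simplifying assumption that the rates are constant in x (so q is a plain matrix); for a path with no jumps it reduces to the n(T ) = 0 formula above.

```python
import math

def radon_nikodym_pT(jump_times, states, T, q, m0):
    """Evaluate (2.40) for a path of r(.) with the given jump times (all <= T)
    and visited states (one more entry than jump_times), assuming the rates
    q[i][j] do not depend on x; q_i = -q[i][i] is the total rate out of i."""
    taus = [0.0] + list(jump_times)
    val = math.exp((m0 - 1) * T)
    for k, (i, j) in enumerate(zip(states[:-1], states[1:])):
        dt = taus[k + 1] - taus[k]
        val *= q[i][j] * math.exp(q[i][i] * dt)      # q[i][i] = -q_i
    val *= math.exp(q[states[-1]][states[-1]] * (T - taus[-1]))  # tail factor
    return val

# with no jumps, (2.40) reduces to exp(-(q_alpha - m0 + 1) * T)
Q = [[-2.0, 2.0], [3.0, -3.0]]
p0 = radon_nikodym_pT([], [0], T=1.0, q=Q, m0=2)
```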

Concerning the Radon–Nikodym derivative pT (Z x,α (·), r α (·)), we have the following results.

Lemma 2.16. Assume the conditions of Theorem 2.1 and (A2.1). Then
for any T > 0 and (x, α) ∈ Rr × M, we have

E |pT (Z x,α (·), r α (·))| ≤ K < ∞.

Proof. By virtue of (2.37) and (2.40), we have

E |pT (Z x,α (·), r α (·))| ≤ exp{(m0 − 1)T } E K^{n(T )}.

Note that n(T ) is a Poisson process with rate m0 − 1. Thus it follows that E[K^{n(T )}] < ∞, and the desired conclusion follows. □
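The moment used in the last step can be made explicit: since n(T ) is Poisson with rate λ = m0 − 1,

```latex
E\,K^{n(T)} \;=\; \sum_{n=0}^{\infty} K^{n}\, e^{-\lambda T}\,\frac{(\lambda T)^{n}}{n!}
\;=\; e^{\lambda T\,(K-1)} \;<\; \infty, \qquad \lambda = m_0 - 1.
```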

Lemma 2.17. Assume the conditions of Lemma 2.16. Then for any T > 0, x̃, x ∈ Rr , and α ∈ M, we have

E |pT (Z x̃,α (·), r α (·)) − pT (Z x,α (·), r α (·))| ≤ K |x̃ − x|.

Proof. Similar to the notation in the proof of Lemma 2.14, we denote Z̃(t) = Z x̃,α (t), Z(t) = Z x,α (t), and r(t) = r α (t). Note that for any positive sequences {ck }_{k=1}^{n} and {dk }_{k=1}^{n} , we have by induction that (see also [41, 42])

| ∏_{k=1}^{n} ck − ∏_{k=1}^{n} dk | ≤ n ( max_{k=1,...,n} {ck , dk } )^{n−1} max_{k=1,...,n} |ck − dk |. (2.41)
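Inequality (2.41) is elementary to check numerically; the snippet below compares both sides for a pair of positive sequences (the sample values are illustrative only).

```python
def prod_diff_and_bound(c, d):
    """Return (|prod c - prod d|, right-hand side of (2.41))."""
    n = len(c)
    pc = pd = 1.0
    for ck in c:
        pc *= ck
    for dk in d:
        pd *= dk
    big = max(max(c), max(d))
    bound = n * big ** (n - 1) * max(abs(ck - dk) for ck, dk in zip(c, d))
    return abs(pc - pd), bound

lhs, rhs = prod_diff_and_bound([1.2, 0.5, 2.0], [1.0, 0.7, 1.9])
```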

Applying (2.41) to pT (Z̃(·), r(·)) and pT (Z(·), r(·)), we have

E |pT (Z̃(·), r(·)) − pT (Z(·), r(·))|
  ≤ exp{(m0 − 1)T } E [ Σ_{n=0}^{∞} (n + 1) K^n I_{n(T )=n}
    × max_{i=0,1,...,n−1} { | e^{−∫_{τn}^{T} q_{r(τn )} (Z̃(s)) ds} − e^{−∫_{τn}^{T} q_{r(τn )} (Z(s)) ds} | ,
      | q_{r(τi ) r(τ_{i+1} )} (Z̃(τ_{i+1} )) e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z̃(s)) ds}
        − q_{r(τi ) r(τ_{i+1} )} (Z(τ_{i+1} )) e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z(s)) ds} | } ].

Note that |e^{−c} − e^{−d}| ≤ |c − d| for any c, d ≥ 0. Meanwhile, (2.38) implies that

|qi (x̃) − qi (x)| ≤ K |x̃ − x| , for x̃, x ∈ Rr , i ∈ M.

Thus it follows that

| exp( −∫_{τn}^{T} q_{r(τn )} (Z̃(s)) ds ) − exp( −∫_{τn}^{T} q_{r(τn )} (Z(s)) ds ) |
  ≤ | ∫_{τn}^{T} q_{r(τn )} (Z̃(s)) ds − ∫_{τn}^{T} q_{r(τn )} (Z(s)) ds |
  ≤ ∫_{τn}^{T} | q_{r(τn )} (Z̃(s)) − q_{r(τn )} (Z(s)) | ds
  ≤ K ∫_{τn}^{T} | Z̃(s) − Z(s) | ds
  ≤ KT sup{ | Z̃(s) − Z(s) | : 0 ≤ s ≤ T }.

Similarly, for any i = 0, 1, . . . , n − 1, we have

| q_{r(τi ) r(τ_{i+1} )} (Z̃(τ_{i+1} )) e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z̃(s)) ds}
    − q_{r(τi ) r(τ_{i+1} )} (Z(τ_{i+1} )) e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z(s)) ds} |
  ≤ q_{r(τi ) r(τ_{i+1} )} (Z̃(τ_{i+1} )) | e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z̃(s)) ds} − e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z(s)) ds} |
    + | q_{r(τi ) r(τ_{i+1} )} (Z̃(τ_{i+1} )) − q_{r(τi ) r(τ_{i+1} )} (Z(τ_{i+1} )) | e^{−∫_{τi}^{τ_{i+1}} q_{r(τi )} (Z(s)) ds}
  ≤ K ∫_{τi}^{τ_{i+1}} | q_{r(τi )} (Z̃(s)) − q_{r(τi )} (Z(s)) | ds + K | Z̃(τ_{i+1} ) − Z(τ_{i+1} ) |
  ≤ K(T + 1) sup{ | Z̃(s) − Z(s) | : 0 ≤ s ≤ T }.

Hence it follows that

E | pT (Z̃(·), r(·)) − pT (Z(·), r(·)) |
  ≤ E [ exp{(m0 − 1)T } Σ_{n=0}^{∞} (n + 1) K^n I_{n(T )=n} × K(T + 1) sup{ | Z̃(s) − Z(s) | : 0 ≤ s ≤ T } ]
  ≤ exp{(m0 − 1)T } (T + 1) Σ_{n=0}^{∞} (n + 1) K^{n+1}
      × E^{1/2} [ I_{n(T )=n} ] E^{1/2} [ ( sup{ | Z̃(s) − Z(s) | : 0 ≤ s ≤ T } )² ].

As noted in the proof of Lemma 2.16, n(T ) is a Poisson process with rate m0 − 1. Hence

E [ I_{n(T )=n} ] = P{n(T ) = n} = exp{−(m0 − 1)T } [(m0 − 1)T ]^n / n!.

Also, because the generator of r(t) of the auxiliary process (Z(t), r(t)) is a constant matrix, by virtue of Lemma 2.14, we have

E [ ( sup{ | Z̃(s) − Z(s) | : 0 ≤ s ≤ T } )² ] = E [ sup_{0≤s≤T} | Z̃(s) − Z(s) |² ] ≤ K | x̃ − x |².

Therefore, it follows that

E | pT (Z̃(·), r(·)) − pT (Z(·), r(·)) | ≤ K | x̃ − x | Σ_{n=0}^{∞} (n + 1) K^{n/2} / √(n!) ≤ K | x̃ − x | ,

where in the above, we used the fact that

Σ_{n=0}^{∞} (n + 1) K^{n/2} / √(n!) < ∞,

and, as before, K is a generic constant independent of x, x̃, x̃ − x, or t. This completes the proof of the lemma. □
With the preparations above, we prove the Feller property for the switch-
ing diffusion process.

Theorem 2.18. Let (X x,α (t), αx,α (t)) be the solution to the system given
by (2.2)–(2.3) with (X(0), α(0)) = (x, α). Assume the conditions of Theo-
rem 2.1 hold. Then for any bounded and continuous function g(·, ·) : Rr ×
M → R, the function u(x, α) = Ex,α g(X(t), α(t)) = Eg(X x,α (t), αx,α (t))
is continuous with respect to x.

Proof. We prove the theorem in two steps. The first step deals with the
special case when Assumption (A2.1) is true. The general case is treated
in Step 2.
Step 1. Assume (A2.1). Then for any fixed t > 0, we have

E [g(X x,α (t), αx,α (t))] = E [g(Z x,α (t), r α (t))pt (Z x,α (·), r α (·))] .

Let {xn } be a sequence of points converging to x. Then Lemma 2.14 implies


that
E |Z xn ,α (t) − Z x,α (t)|² → 0 as n → ∞.
Thus there is a subsequence {yn } of {xn } such that

Z yn ,α (t) → Z x,α (t) a.s. as n → ∞.



By virtue of Lemma 2.17,

E |pt (Z yn ,α (·), r α (·)) − pt (Z x,α (·), r α (·))| → 0 as n → ∞.

Hence there is a subsequence {zn } of {yn } such that

pt (Z zn ,α (·), r α (·)) → pt (Z x,α (·), r α (·)) a.s. as n → ∞.

Now g is bounded and continuous, it follows from Lemma 2.16 and the
dominated convergence theorem that

u(x, α) = E [g(X x,α (t), αx,α (t))]
        = E [g(Z x,α (t), r α (t)) pt (Z x,α (·), r α (·))]
        = E [ lim_{n→∞} g(Z zn ,α (t), r α (t)) pt (Z zn ,α (·), r α (·)) ]
        = lim_{n→∞} E [g(Z zn ,α (t), r α (t)) pt (Z zn ,α (·), r α (·))]
        = lim_{n→∞} E [g(X zn ,α (t), αzn ,α (t))]
        = lim_{n→∞} u(zn , α).

Therefore, every sequence {xn } converging to x has a subsequence {zn }


such that u(x, α) ≤ lim inf n→∞ u(zn , α). It can be shown that the set
{x ∈ Rr : u(x, α) > β} is open for any real number β and each α ∈ M.
That is, u(·, α) is lower semi-continuous for each α ∈ M (see [143, p. 37]).
Applying the above argument to −u(x, α) = E[−g(X x,α (t), αx,α (t))], we
obtain that

u(x, α) ≥ lim sup_{n→∞} u(zn , α),

and hence u(·, α) is upper semi-continuous for each α ∈ M. Therefore,


u(x, α) is continuous with respect to x.
Step 2. Fix any t ≥ 0. We construct an N -truncated process in light
of the approach in [102]. Let N be any positive integer, and define an N -
truncation process X N (t) so that X N (t) = X(t) up until the first exit
from the N -ball B(0, N ) = {x ∈ Rr : |x| < N }. Associated with the N -
truncated process X N (t), we construct an auxiliary operator as follows.
For N = 1, 2, . . ., let φN (x) be a C ∞ function with range [0, 1] satisfying

φN (x) = 1 if |x| ≤ N ,   φN (x) = 0 if |x| ≥ N + 1.

Now for j, k = 1, 2, . . . , n and (x, i) ∈ Rr × M, define

aN_jk (x, i) := ajk (x, i) φN (x),
bN_j (x, i) := bj (x, i) φN (x).
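A standard explicit construction of such a cutoff (one common choice; the text does not fix a particular φN ) is:

```python
import math

def phi_N(x_norm, N):
    """A C-infinity cutoff of the radius: 1 on |x| <= N, 0 on |x| >= N + 1,
    built from the classical smooth step f(u) = exp(-1/u) for u > 0."""
    if x_norm <= N:
        return 1.0
    if x_norm >= N + 1:
        return 0.0
    f = lambda u: math.exp(-1.0 / u) if u > 0 else 0.0
    s = x_norm - N          # s in (0, 1) on the transition shell
    return f(1.0 - s) / (f(1.0 - s) + f(s))
```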

For any ϕ(·, i) ∈ C 2 , i ∈ M, define the operator LN as

LN ϕ(x, i) = (1/2) Σ_{j,k=1}^{n} aN_jk (x, i) ∂²ϕ(x, i)/∂xj ∂xk
           + Σ_{j=1}^{n} bN_j (x, i) ∂ϕ(x, i)/∂xj + Σ_{j=1}^{m0} qij (x) ϕ(x, j). (2.42)

Denote by PN x,i the probability measure for which the associated martingale
problem [153] has operator LN with coefficients aN_jk (x, i), bN_j (x, i), and
qij (x), and denote by EN x,i the corresponding expectation. Then by Step 1,
(X N (t), α(t)) is Feller.
As in (2.14), let βN be the first exit time from the ball B(0, N ). Then
by virtue of the strong uniqueness result in [172], the probabilities Px,i
and PN x,i agree until the moment when the continuous component reaches
the boundary |x| = N . Hence it follows that for any bounded and Borel
measurable function f (·, ·) : Rr × M 7→ R, we have

   
Ex,i [f (X(t), α(t)) I{βN >t} ] = EN x,i [f (X(t), α(t)) I{βN >t} ]. (2.43)

(Alternatively, one can obtain (2.43) by showing that the solution to the
martingale problem with operator L is unique in the weak sense. The weak
uniqueness can be established by using the characteristic function as in the
proof of [176, Lemma 7.18].)
Fix (x0 , α) ∈ Rr × M. By virtue of (2.19) and (2.43), it follows that for
any t > 0 and ε > 0, there exists an N ∈ N sufficiently large and a bounded
neighborhood N (x0 ) of x0 (N (x0 ) = {x : |x − x0 | ≤ N }) such that for any
(x, i) ∈ N (x0 ) × M, we have

|Ex,i g(X(t), α(t)) − EN x,i g(X(t), α(t))|
  = |Ex,i [g(X(t), α(t)) I{βN ≤t} ] − EN x,i [g(X(t), α(t)) I{βN ≤t} ]|
  ≤ ‖g‖ [Px,i {βN ≤ t} + PN x,i {βN ≤ t}]                               (2.44)
  = 2 ‖g‖ [1 − PN x,i {βN > t}]
  = 2 ‖g‖ [1 − Px,i {βN > t}] < ε/2,

where ‖·‖ is the essential sup-norm.
Now let {xn } ⊂ N (x0 ) be any sequence converging to x0 . Then it follows

from (2.44) and Step 1 that for N sufficiently large,

|E [g(X xn ,α (t), αxn ,α (t))] − E [g(X x0 ,α (t), αx0 ,α (t))]|
  ≤ |E [g(X xn ,α (t), αxn ,α (t))] − EN [g(X xn ,α (t), αxn ,α (t))]|
    + |EN [g(X xn ,α (t), αxn ,α (t))] − EN [g(X x0 ,α (t), αx0 ,α (t))]|
    + |EN [g(X x0 ,α (t), αx0 ,α (t))] − E [g(X x0 ,α (t), αx0 ,α (t))]|
  ≤ ε + |EN [g(X xn ,α (t), αxn ,α (t))] − EN [g(X x0 ,α (t), αx0 ,α (t))]|
  → ε + 0 = ε, as n → ∞.

Because ε > 0 is arbitrary, this proves that, as n → ∞,

u(xn , α) = E [g(X xn ,α (t), αxn ,α (t))] → E [g(X x0 ,α (t), αx0 ,α (t))] = u(x0 , α),

as desired. □
Corollary 2.19. Assume that the conditions of Theorem 2.18 hold. Then
the process (X(t), α(t)) is strong Markov.

Proof. This claim follows from [47, Theorem 2.2.4], Lemma 2.4, and Theorem 2.18. □
In Step 2 of the proof of Theorem 2.18, we used a truncation device and
defined X N (t). As a digression, using such a truncation process, we obtain
the following result as a by-product. It enables us to extend the existence
and uniqueness to more general functions satisfying only local linear growth
and Lipschitz conditions.
Proposition 2.20. Under the conditions of Theorem 2.7, equation (2.2)
together with (2.3) has a unique solution a.s.

Proof. By virtue of Theorem 2.7, (X(t), α(t)) is regular and


P(β = ∞) = 1. (2.45)
Define the N -truncation process X N (t) as in Step 2 of the proof of Theo-
rem 2.18. By virtue of Theorem 2.1, and the local linear growth and Lip-
schitz conditions, X N (t) is the unique solution of (2.2)–(2.3) for t < βN .
The regularity implies that X N (t) is a solution of (2.2)–(2.3) for all t ≥ 0.
Since X N (t) coincides with X(t) for t < βN ,
P ( sup_{0≤t<βN} |X N (t) − X(t)| > 0 ) = 0.

Using (2.45) and letting N → ∞, the desired result follows. □



2.6 Strong Feller Property


Assume the following conditions hold throughout the section.

(A2.2) For α, i ∈ M and j, k = 1, . . . , r, the coefficients bj (x, i), σjk (x, i), and qiα (x) are Hölder continuous with exponent γ (0 < γ ≤ 1).

(A2.3) The matrix-valued Q(x) is irreducible for each x ∈ Rr .

(A2.4) For each i ∈ M, a(x, i) = (ajk (x, i)) is symmetric and satisfies

⟨a(x, i)ξ, ξ⟩ ≥ κ|ξ|², for all ξ ∈ Rr , (2.46)

with some positive constant κ ∈ R for all x ∈ Rr .

Lemma 2.21. Assume that f (·, i) ∈ Cb (Rr ) for i ∈ M. If for each i ∈ M, the function u(·, ·, i) ∈ C 1,2 (R × Rr ) (the class of functions whose derivatives with respect to t are continuous and whose partial derivatives with respect to x up to the second order are continuous) is bounded and satisfies

∂u/∂t = Lu, t > 0, (x, i) ∈ Rr × M,
u(0, x, i) = f (x, i), (x, i) ∈ Rr × M, (2.47)

then u(t, x, i) = Ex,i f (X(t), α(t)).

Proof. For fixed t > 0 and (x, i) ∈ Rr × M, we apply the generalized Itô lemma (2.7) (see also [150]) to the function u(t − r, X(r), α(r)), r ≤ t, to obtain

u(t − r ∧ βN , X(r ∧ βN ), α(r ∧ βN ))
  = u(t, x, i) + ∫_0^{r∧βN} [(∂/∂s) + L] u(t − s, X(s), α(s)) ds (2.48)
    + M1 (r ∧ βN ) + M2 (r ∧ βN ),

where (x, i) = (X(0), α(0)), N ∈ N satisfies N > |x|, βN is the first exit time from the bounded ball B(0, N ) as defined in (2.14), and

M1 (r ∧ βN ) = ∫_0^{r∧βN} ⟨∇u(t − s, X(s), α(s)), σ(X(s), α(s)) dw(s)⟩,
M2 (r ∧ βN ) = ∫_0^{r∧βN} ∫_R [u(t − s, X(s), α(s) + h(X(s), α(s), z)) − u(t − s, X(s), α(s))] µ(ds, dz),

with µ(ds, dz) = p(ds, dz) − ds × m(dz) being a martingale measure. By virtue of Proposition 2.3 and the linear growth condition of σ, using the standard argument, we can show that M1 (r ∧ βN ) is a mean zero martingale.

The boundedness of u immediately implies that M2 (r ∧ βN ) is also a martingale with zero mean. Therefore, by taking expectations on both sides of (2.48) and noting (2.47), we obtain

u(t, x, i) = Ex,i u(t − r ∧ βN , X(r ∧ βN ), α(r ∧ βN )).

Recall that βN → ∞ a.s. as N → ∞. Thus it follows from the bounded convergence theorem that u(t, x, i) = Ex,i u(t − r, X(r), α(r)). Finally, by taking r = t and noting that, by virtue of (2.47), u(0, X(t), α(t)) = f (X(t), α(t)), we have u(t, x, i) = Ex,i f (X(t), α(t)). □

Lemma 2.22. Assume in addition to (A2.2)–(A2.4) that for i, ℓ ∈ M and j, k = 1, 2, . . . , r, the coefficients ajk (x, i), bj (x, i), and qiℓ (x) are bounded.
Then the process (X(t), α(t)) is strong Feller.

Proof. By virtue of [39, Theorem 2.1], the backward equation

∂u/∂t = Lu, t > 0

has a unique fundamental solution p(x, i, t, y, j) (as a function of t, x, i), which is positive, jointly continuous in t, x, and y, and satisfies

|D^θ_x p(x, i, t, y, j)| ≤ C t^{−(r+|θ|)/2} exp{ −c |y − x|² / t }, (2.49)

where C and c are positive constants not depending on x, y, or t, and θ = (θ1 , θ2 , . . . , θr ) is a multi-index with |θ| := θ1 + θ2 + · · · + θr ≤ 2, and

D^θ_x = ∂^θ /∂x^θ := ∂^{|θ|} / (∂x1^{θ1} ∂x2^{θ2} · · · ∂xr^{θr}).

To see that p is the transition probability density of the process (X(t), α(t)),
consider an arbitrary bounded and continuous function φ(x, i), (x, i) ∈
Rr × M and define
Φ(t, x, i) := Σ_{j=1}^{m0} ∫_{Rr} p(x, i, t, y, j) φ(y, j) dy.

Then Φ satisfies (2.47). Moreover, by virtue of (2.49) and the boundedness


of φ, Φ is also bounded. Thus it follows from Lemma 2.21 that Φ(t, x, i) =
Ex,i φ(X(t), α(t)) or
Ex,i φ(X(t), α(t)) = Σ_{j=1}^{m0} ∫_{Rr} p(x, i, t, y, j) φ(y, j) dy.

Hence the fundamental solution p is the transition probability density of


(X x,i (t), αx,i (t)). Then for each i ∈ M and for any bounded and measurable function f (x, i), the function

x 7→ Ex,i f (X(t), α(t)) = Σ_{j=1}^{m0} ∫_{Rr} f (y, j) p(x, i, t, y, j) dy

is continuous by the dominated convergence theorem. □


Using the argument in [38, Theorem 13.1], we obtain the following lemma.

Lemma 2.23. Assume the process (X(t), α(t)) is strong Feller. Denote
U := D × J ⊂ Rr × M, where D is a nonempty bounded open subset of
Rr . Then for any t > 0 and every bounded real Borel measurable function
f (·, ·) on U , the functions
 
F (x, i) := Ex,i I{τU >t} f (X(t), α(t)) ,
G(x, i) := Ex,i [f (X(t ∧ τU ), α(t ∧ τU ))] ,

are continuous in U , where τU is the first exit time from U .

Now we are ready to present the main result of this section.

Theorem 2.24. Assume (A2.2)–(A2.4) hold. Then the process (X(t), α(t))
possesses the strong Feller property.

Proof. As in the proof of Theorem 2.18, we denote by PN x,i the probability measure for which the associated martingale problem [153] has operator LN defined in (2.42) and denote by EN x,i the corresponding expectation. Then by Lemma 2.22, PN x,i is strong Feller. As in (2.14), let βN be the first exit time from the ball B(0, N ) = {x ∈ Rr : |x| < N }. Then, as argued in the proof of Theorem 2.18, for any bounded and Borel measurable function f (·, ·) on Rr × M and (x, i) ∈ Rr × M, we have

Ex,i [f (X(t), α(t)) I{βN >t} ] = EN x,i [f (X(t), α(t)) I{βN >t} ].
Ex,i f (X(t), α(t))I{βN >t} = EN x,i f (X(t), α(t))I{βN >t} .

It follows that

|Ex,i f (X(t), α(t)) − EN x,i f (X(t), α(t))|
  = |Ex,i [f (X(t), α(t)) I{βN ≤t} ] − EN x,i [f (X(t), α(t)) I{βN ≤t} ]|
  = | ∫ f (X(t), α(t)) I{βN ≤t} [Px,i (dω) − PN x,i (dω)] |              (2.50)
  ≤ ‖f ‖ [Px,i {βN ≤ t} + PN x,i {βN ≤ t}]
  = 2 ‖f ‖ [1 − PN x,i {βN > t}],

where ‖·‖ is the essential sup norm. Fix some (x0 , i) ∈ Rr × M. Because the process (X(t), α(t)) is regular, by (2.15), βN → ∞ a.s. Px0 ,i as N → ∞. Therefore, for any positive number ε > 0, we can choose some N sufficiently large so that

1 − PN x0 ,i {βN > t} = 1 − Px0 ,i {βN > t} < ε/(12 ‖f ‖). (2.51)

Also, by Lemma 2.23, the function x 7→ PN x,i {βN > t} is continuous. Hence there exists some δ1 > 0 such that whenever |x − x0 | < δ1 we have

1 − PN x,i {βN > t} ≤ 1 − PN x0 ,i {βN > t} + |PN x0 ,i {βN > t} − PN x,i {βN > t}|   (2.52)
                   < ε/(6 ‖f ‖).

Note also that by virtue of Lemma 2.22, there exists some δ2 > 0 such that

|EN x,i f (X(t), α(t)) − EN x0 ,i f (X(t), α(t))| < ε/3, if |x − x0 | < δ2 . (2.53)

Finally, it follows from (2.50)–(2.53) that whenever |x − x0 | < δ, where


δ = min {δ1 , δ2 }, we have

|Ex,i f (X(t), α(t)) − Ex0 ,i f (X(t), α(t))|


≤ Ex,i f (X(t), α(t)) − EN
x,i f (X(t), α(t))

+ Ex0 ,i f (X(t), α(t)) − EN


x0 ,i f (X(t), α(t))

+ EN N
x,i f (X(t), α(t)) − Ex0 ,i f (X(t), α(t))

< ε.

This concludes the proof of the theorem. 2
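The strong Feller property asserts that x ↦ E_{x,i} f(X(t), α(t)) is continuous even when f is merely bounded and measurable. A hedged numerical illustration of this continuity, for a hypothetical two-regime scalar model (the coefficients, switching rate, and test function below are invented for the sketch, not taken from the text): simulating with common random numbers, the Monte Carlo estimate of the expectation barely moves when the initial point is perturbed, despite f being discontinuous.

```python
import numpy as np

def mc_expectation(x0, rng, n_paths=20_000, T=1.0, n_steps=100):
    """Monte Carlo estimate of E_{x,1}[f(X(T), alpha(T))] with the bounded,
    merely measurable test function f = 1_{X > 0} (hypothetical model)."""
    h = T / n_steps
    X = np.full(n_paths, x0, dtype=float)
    a = np.ones(n_paths, dtype=int)                  # all paths start in regime 1
    drift = {1: lambda x: -x, 2: lambda x: 1.0 - x}  # invented b(x, i)
    sig = {1: 0.5, 2: 1.0}                           # invented sigma(i)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(h), n_paths)
        for i in (1, 2):
            m = a == i
            X[m] += drift[i](X[m]) * h + sig[i] * dw[m]
        flip = rng.random(n_paths) < 1.0 * h         # switching rate q = 1
        a[flip] = 3 - a[flip]
    return float(np.mean(X > 0.0))

# Common random numbers: same seed, nearby initial points.
p1 = mc_expectation(1.00, np.random.default_rng(0))
p2 = mc_expectation(1.01, np.random.default_rng(0))
print(abs(p1 - p2) < 0.05)  # True: the expectation moves continuously in x
```

The common-noise coupling here plays the same role as the auxiliary measures $P^N_{x,i}$ in the proof: it isolates the dependence on the initial point from the randomness of the driving noise.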


A slight modification of the standard argument in [38, Theorem 13.5]
leads to the following theorem. We omit the detailed proof here.

Theorem 2.25. Assume (A2.2)–(A2.4). Let U := D × J ⊂ R^r × M, where D ⊂ R^r is nonempty, open, and bounded with sufficiently smooth boundary ∂D. Then the process (X(· ∧ τ_U), α(· ∧ τ_U)) possesses the Feller property, where τ_U is the first exit time from U.

We end this section with a brief discussion of strong Feller processes and L-harmonic functions. More properties of L-harmonic functions are treated in Chapter 3. As in [52], for any U = D × J, where D ⊂ R^r is a nonempty domain and J ⊂ M, a Borel measurable function u : U ↦ R is said to be L-harmonic in U if u is bounded on compact subsets of U and if, for all (x, i) ∈ U and any V = D̃ × J̃ with D̃ ⊂⊂ D being a neighborhood of x and i ∈ J̃ ⊂ J, we have

$$u(x, i) = E_{x,i}\, u(X(\tau_V), \alpha(\tau_V)),$$

where τ_V denotes the first exit time of the process (X(t), α(t)) from V, and D̃ ⊂⊂ D means that the closure $\bar{\widetilde D} = \widetilde D \cup \partial \widetilde D$ is compact and satisfies $\bar{\widetilde D} \subset D$. It is now obvious that if the process (X(t), α(t)) is strong Feller, then any L-harmonic function is continuous. Therefore we have

Proposition 2.26. Assume (A2.2)–(A2.4). Then any L-harmonic function is continuous in its domain.

2.7 Continuous and Smooth Dependence on the Initial Data x
When one deals with a continuous-time dynamic system modeled by an ordinary differential equation together with appropriate initial data, well-posedness is crucial. Well-posedness questions arise for ordinary and partial differential equations with initial and/or boundary data; they are time-honored concerns that carry over naturally to stochastic differential equations, as well as to stochastic differential equations with random switching. A problem for the associated switching diffusion is well posed if the initial value problem has a unique solution and the solution depends continuously on the initial data.
Continuous dependence on the initial data can be obtained using the
Feller property by choosing the function u(·) appropriately. In the subse-
quent development, we also need to use smooth dependence on the initial
data, which is even more difficult to obtain. In this section, we devote our
attention to this smoothness property. Again we need to use the notion of
multi-index. Recall that a vector β = (β1 , . . . , βr ) with nonnegative integer
components is referred to as a multi-index. Put
$$|\beta| = \beta_1 + \cdots + \beta_r,$$

and define $D_x^\beta$ as

$$D_x^\beta = \frac{\partial^{|\beta|}}{\partial x_1^{\beta_1} \cdots \partial x_r^{\beta_r}}.$$
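For instance (a worked example, not in the original text), in dimension r = 2 with β = (2, 1):

```latex
|\beta| = 2 + 1 = 3, \qquad
D_x^{\beta} h = \frac{\partial^{3} h}{\partial x_1^{2}\,\partial x_2}.
```

The growth condition (2.54) below is imposed on all such derivatives with |β| ≤ 2.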

The main result of this section is the following theorem.

Theorem 2.27. Denote by (X^{x,α}(t), α^{x,α}(t)) the solution to the system given by (2.2) and (2.3). Assume that the conditions of Theorem 2.1 are satisfied, that for each i ∈ M, b(·, i) and σ(·, i) have continuous partial derivatives with respect to the variable x up to the second order, and that

$$\big|D_x^\beta b(x, i)\big| + \big|D_x^\beta \sigma(x, i)\big| \le K(1 + |x|^{\gamma}), \qquad (2.54)$$

where K and γ are positive constants and β is a multi-index with |β| ≤ 2. Then X^{x,α}(t) is twice continuously differentiable in mean square with respect to x.
For ease of presentation, we prove Theorem 2.27 for the case when X(t) is one-dimensional. The multidimensional case can be handled similarly. To proceed, we need to introduce a few more notations. Let ∆ ≠ 0 be small and denote x̃ = x + ∆. As in Lemma 2.14, let (X(t), α(t)) be the switching diffusion process satisfying (2.2) and (2.3) with initial condition (x, α), and let (X̃(t), α̃(t)) be the process starting from (x̃, α) (i.e., (X(0), α(0)) = (x, α) and (X̃(0), α̃(0)) = (x̃, α), respectively).

Fix any T > 0 and let 0 < t < T. Put

$$Z^{\Delta}(t) = Z^{x,\Delta,\alpha}(t) := \frac{\widetilde X(t) - X(t)}{\Delta}. \qquad (2.55)$$

Then we have

$$\begin{aligned}
Z^{\Delta}(t) &= 1 + \frac{1}{\Delta}\int_0^t [b(\widetilde X(s), \widetilde\alpha(s)) - b(X(s), \alpha(s))]\,ds
+ \frac{1}{\Delta}\int_0^t [\sigma(\widetilde X(s), \widetilde\alpha(s)) - \sigma(X(s), \alpha(s))]\,dw(s) \\
&= 1 + \phi_{\Delta}(t) + \frac{1}{\Delta}\int_0^t [b(\widetilde X(s), \alpha(s)) - b(X(s), \alpha(s))]\,ds \qquad (2.56)\\
&\quad + \frac{1}{\Delta}\int_0^t [\sigma(\widetilde X(s), \alpha(s)) - \sigma(X(s), \alpha(s))]\,dw(s),
\end{aligned}$$

where

$$\phi_{\Delta}(t) = \frac{1}{\Delta}\int_0^t [b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(s), \alpha(s))]\,ds
+ \frac{1}{\Delta}\int_0^t [\sigma(\widetilde X(s), \widetilde\alpha(s)) - \sigma(\widetilde X(s), \alpha(s))]\,dw(s).$$

To proceed, we first prove a lemma.

Lemma 2.28. Under the conditions of Theorem 2.27,

$$\lim_{\Delta \to 0} E \sup_{0 \le t \le T} |\phi_{\Delta}(t)|^2 = 0.$$

Proof. By virtue of Hölder's inequality and Doob's martingale inequality (A.19),

$$E \sup_{0 \le t \le T} |\phi_{\Delta}(t)|^2 \le \frac{2T}{\Delta^2} E \int_0^T \big|b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(s), \alpha(s))\big|^2 ds
+ \frac{8}{\Delta^2} E \Big|\int_0^T [\sigma(\widetilde X(s), \widetilde\alpha(s)) - \sigma(\widetilde X(s), \alpha(s))]\,dw(s)\Big|^2.$$

We treat each of the terms above separately. Choose η = ∆^{γ₀} with γ₀ > 2 and partition the interval [0, T] by η. We obtain

$$\begin{aligned}
&E \int_0^T \big|b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(s), \alpha(s))\big|^2 ds \\
&\quad = E \sum_{k=0}^{\lfloor T/\eta\rfloor - 1} \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(s), \alpha(s))\big|^2 ds \\
&\quad \le K E \sum_{k=0}^{\lfloor T/\eta\rfloor - 1} \Big[ \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \widetilde\alpha(s))\big|^2 ds \qquad (2.57)\\
&\qquad\qquad + \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \alpha(s))\big|^2 ds \\
&\qquad\qquad + \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \alpha(s)) - b(\widetilde X(s), \alpha(s))\big|^2 ds \Big].
\end{aligned}$$

Note that the constant K in (2.57) does not depend on k = 0, 1, . . . , ⌊T/η⌋ or on η; its exact value may differ at each occurrence. We use this convention throughout the rest of the section.

By Lipschitz continuity and the tightness-type estimate (2.26), we obtain

$$\begin{aligned}
E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \widetilde\alpha(s))\big|^2 ds
&\le K \int_{k\eta}^{k\eta+\eta} E\big|\widetilde X(s) - \widetilde X(\eta k)\big|^2 ds \qquad (2.58)\\
&\le K \int_{k\eta}^{k\eta+\eta} (s - \eta k)\,ds \le K \eta^2.
\end{aligned}$$

Likewise, for the term on the last line of (2.57), we obtain

$$E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \alpha(s)) - b(\widetilde X(s), \alpha(s))\big|^2 ds \le K\eta^2. \qquad (2.59)$$

To treat the term on the next-to-last line of (2.57), note that for k = 0, 1, . . . , ⌊T/η⌋ − 1,

$$\begin{aligned}
&E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \alpha(s))\big|^2 ds \\
&\quad \le K E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \widetilde\alpha(\eta k))\big|^2 ds \qquad (2.60)\\
&\qquad + K E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(\eta k)) - b(\widetilde X(\eta k), \alpha(s))\big|^2 ds.
\end{aligned}$$

For the term on the second line of (2.60) and k = 0, 1, . . . , ⌊T/η⌋ − 1,

$$\begin{aligned}
&E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \widetilde\alpha(\eta k))\big|^2 ds \\
&\ = E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \widetilde\alpha(\eta k))\big|^2 I_{\{\widetilde\alpha(s) \ne \widetilde\alpha(\eta k)\}}\,ds \\
&\ = E \sum_{i \in \mathcal M}\sum_{j \ne i} \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), i) - b(\widetilde X(\eta k), j)\big|^2 I_{\{\widetilde\alpha(s)=j\}} I_{\{\widetilde\alpha(\eta k)=i\}}\,ds \\
&\ \le K E \sum_{i \in \mathcal M}\sum_{j \ne i} \int_{k\eta}^{k\eta+\eta} \big[1 + |\widetilde X(\eta k)|^2\big] I_{\{\widetilde\alpha(\eta k)=i\}}\,
E\big[I_{\{\widetilde\alpha(s)=j\}} \,\big|\, \widetilde X(\eta k), \widetilde\alpha(\eta k)=i\big]\,ds \\
&\ \le K E \sum_{i \in \mathcal M} \int_{k\eta}^{k\eta+\eta} \big[1 + |\widetilde X(\eta k)|^2\big] I_{\{\widetilde\alpha(\eta k)=i\}}
\Big[\sum_{j \ne i} q_{ij}(\widetilde X(\eta k))(s-\eta k) + o(s-\eta k)\Big] ds \\
&\ \le K \int_{k\eta}^{k\eta+\eta} O(\eta)\,ds \le K\eta^2.
\end{aligned}$$

In the above, we used Proposition 2.3 and the boundedness of Q(x).


Next, we show that for k = 1, . . . , ⌊T/η⌋ − 1,

$$E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(\eta k)) - b(\widetilde X(\eta k), \alpha(s))\big|^2 ds \le K\eta^2. \qquad (2.61)$$

To do so, we use the technique of basic coupling of Markov processes (see, e.g., Chen [23, p. 11]). For x, x̃ ∈ R^r and i, j ∈ M, consider the measure Λ((x, j), (x̃, i)) = |x − x̃| + d(j, i), where d(j, i) = 0 if j = i and d(j, i) = 1 if j ≠ i. That is, Λ(·, ·) is a measure obtained by piecing together the usual Euclidean distance between two vectors and the discrete metric. Let (α(t), α̃(t)) be a discrete random process with finite state space M × M such that

$$P\big[(\alpha(t+h), \widetilde\alpha(t+h)) = (j, i)\,\big|\,(\alpha(t), \widetilde\alpha(t)) = (k, l),\, (X(t), \widetilde X(t)) = (x, \widetilde x)\big]
= \begin{cases}
\widetilde q_{(k,l)(j,i)}(x, \widetilde x)h + o(h), & \text{if } (k,l) \ne (j,i),\\[2pt]
1 + \widetilde q_{(k,l)(k,l)}(x, \widetilde x)h + o(h), & \text{if } (k,l) = (j,i),
\end{cases} \qquad (2.62)$$

as h → 0, where the matrix $(\widetilde q_{(k,l)(j,i)}(x, \widetilde x))$ is the basic coupling of the matrices Q(x) = (q_{kl}(x)) and Q(x̃) = (q_{kl}(x̃)), satisfying

$$\begin{aligned}
\widetilde Q(x, \widetilde x)\widetilde f(k, l)
&= \sum_{(j,i) \in \mathcal M\times\mathcal M} \widetilde q_{(k,l)(j,i)}(x, \widetilde x)\big(\widetilde f(j, i) - \widetilde f(k, l)\big) \\
&= \sum_j \big(q_{kj}(x) - q_{lj}(\widetilde x)\big)^+\big(\widetilde f(j, l) - \widetilde f(k, l)\big) \\
&\quad + \sum_j \big(q_{lj}(\widetilde x) - q_{kj}(x)\big)^+\big(\widetilde f(k, j) - \widetilde f(k, l)\big) \qquad (2.63)\\
&\quad + \sum_j \big(q_{kj}(x) \wedge q_{lj}(\widetilde x)\big)\big(\widetilde f(j, j) - \widetilde f(k, l)\big),
\end{aligned}$$

for any function f̃(·, ·) defined on M × M. Note that for s ∈ [ηk, ηk + η), α̃(s) can be written as

$$\widetilde\alpha(s) = \sum_{l \in \mathcal M} l\, I_{\{\widetilde\alpha(s) = l\}}.$$

Owing to the coupling defined above and noting the transition probabilities (2.62), for i₁, i, j, l ∈ M with j ≠ i and s ∈ [ηk, ηk + η), we have

$$\begin{aligned}
&E\big[I_{\{\alpha(s)=j\}}\,\big|\,\alpha(\eta k)=i_1, \widetilde\alpha(\eta k)=i, X(\eta k)=x, \widetilde X(\eta k)=\widetilde x\big] \\
&\quad = \sum_{l\in\mathcal M} E\big[I_{\{\alpha(s)=j\}} I_{\{\widetilde\alpha(s)=l\}}\,\big|\,\alpha(\eta k)=i_1, \widetilde\alpha(\eta k)=i, X(\eta k)=x, \widetilde X(\eta k)=\widetilde x\big] \qquad (2.64)\\
&\quad = \sum_{l\in\mathcal M} \widetilde q_{(i_1,i)(j,l)}(x, \widetilde x)(s - \eta k) + o(s - \eta k) = O(\eta).
\end{aligned}$$
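The basic coupling (2.63) can be sketched numerically. The defining property worth checking is marginality: when f̃(k, l) depends only on its first argument, Q̃ acts on it as Q(x), and symmetrically for the second argument as Q(x̃). A minimal sketch (the two generators below are invented for illustration):

```python
import numpy as np

def coupling_apply(Qx, Qxt, f):
    """Apply the basic-coupling generator of (2.63) to a function f on M x M;
    Qx and Qxt are m0 x m0 generator matrices (rows sum to zero)."""
    m0 = Qx.shape[0]
    out = np.zeros((m0, m0))
    for k in range(m0):
        for l in range(m0):
            for j in range(m0):
                out[k, l] += max(Qx[k, j] - Qxt[l, j], 0.0) * (f[j, l] - f[k, l])
                out[k, l] += max(Qxt[l, j] - Qx[k, j], 0.0) * (f[k, j] - f[k, l])
                out[k, l] += min(Qx[k, j], Qxt[l, j]) * (f[j, j] - f[k, l])
    return out

# Two hypothetical generators Q(x), Q(x~) on a three-state M.
Qx  = np.array([[-2.0, 1.5, 0.5], [1.0, -1.0, 0.0], [0.2, 0.8, -1.0]])
Qxt = np.array([[-1.0, 0.5, 0.5], [2.0, -3.0, 1.0], [0.0, 1.0, -1.0]])

g = np.array([3.0, -1.0, 4.0])           # a test function on M
f_first = np.tile(g[:, None], (1, 3))    # f~(k, l) = g(k): first marginal
lhs = coupling_apply(Qx, Qxt, f_first)
rhs = Qx @ g                             # Q(x) g, independent of l
print(np.allclose(lhs, rhs[:, None]))    # True: first marginal evolves under Q(x)
```

The identity behind the check is $(a-b)^+ + a\wedge b = a$, applied termwise to $a = q_{kj}(x)$, $b = q_{lj}(\widetilde x)$; this is precisely why (2.64) yields rates of the original chains.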
By virtue of (2.64), we obtain

$$\begin{aligned}
&E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(\eta k)) - b(\widetilde X(\eta k), \alpha(s))\big|^2 ds \\
&\ = E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \alpha(s)) - b(\widetilde X(\eta k), \widetilde\alpha(\eta k))\big|^2 I_{\{\alpha(s) \ne \widetilde\alpha(\eta k)\}}\,ds \\
&\ = E \sum_{i\in\mathcal M}\sum_{j\ne i} \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), i) - b(\widetilde X(\eta k), j)\big|^2 I_{\{\alpha(s)=j\}} I_{\{\widetilde\alpha(\eta k)=i\}}\,ds \\
&\ \le K E \sum_{i,i_1\in\mathcal M}\sum_{j\ne i} \int_{k\eta}^{k\eta+\eta} \big[1 + |\widetilde X(\eta k)|^2\big] I_{\{\widetilde\alpha(\eta k)=i,\,\alpha(\eta k)=i_1\}} \\
&\qquad\qquad \times P\big[\alpha(s)=j \,\big|\, \alpha(\eta k)=i_1, \widetilde\alpha(\eta k)=i, X(\eta k)=x, \widetilde X(\eta k)=\widetilde x\big]\,ds \\
&\ = O(\eta^2).
\end{aligned}$$

Using the assumption α̃(0) = α(0) = α and noting X̃(0) = x̃, we obtain

$$\begin{aligned}
E\int_0^{\eta} \big|b(\widetilde X(0), \widetilde\alpha(0)) - b(\widetilde X(0), \alpha(s))\big|^2 ds
&= E \int_0^{\eta} |b(\widetilde x, \alpha(0)) - b(\widetilde x, \alpha(s))|^2 ds \\
&= E \int_0^{\eta} \sum_{j \ne \alpha} |b(\widetilde x, \alpha) - b(\widetilde x, j)|^2 I_{\{\alpha(s)=j\}}\,ds \qquad (2.65)\\
&= \int_0^{\eta} \sum_{j\ne\alpha} |b(\widetilde x, \alpha) - b(\widetilde x, j)|^2 \big[q_{\alpha j}(\widetilde x)s + o(s)\big]ds \le K\eta^2.
\end{aligned}$$

Thus, it follows that for k = 0, 1, . . . , ⌊T/η⌋ − 1,

$$E \int_{k\eta}^{k\eta+\eta} \big|b(\widetilde X(\eta k), \widetilde\alpha(s)) - b(\widetilde X(\eta k), \alpha(s))\big|^2 ds \le K\eta^2. \qquad (2.66)$$

Using the estimates (2.58), (2.59), and (2.66) in (2.57), we obtain

$$E\int_0^T \big|b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(s), \alpha(s))\big|^2 ds
\le \sum_{k=0}^{\lfloor T/\eta\rfloor - 1} K\eta^2 \le K\eta. \qquad (2.67)$$

Likewise, we obtain

$$E\Big|\int_0^T [\sigma(\widetilde X(s), \widetilde\alpha(s)) - \sigma(\widetilde X(s), \alpha(s))]\,dw(s)\Big|^2 \le K\eta. \qquad (2.68)$$

Putting (2.67) and (2.68) into φ_∆(t), and noting γ₀ > 2, we obtain

$$E\sup_{0\le t\le T}|\phi_{\Delta}(t)|^2 \le K\frac{\eta}{\Delta^2} = K\Delta^{\gamma_0 - 2} \to 0 \quad \text{as } \Delta \to 0. \qquad (2.69)$$

The lemma is proved. □

Remark 2.29. In deriving (2.65), the condition α(0) = α̃(0) = α is used crucially. If the initial regimes are not the same, a nonzero term will contribute, resulting in difficulties in obtaining the differentiability.

Proposition 2.30. Assume the conditions of Lemma 2.14, except that Q(x) is now allowed to be x-dependent. Then the conclusion of Lemma 2.14 continues to hold.

Proof. As before, let (X(t), α(t)) denote the switching diffusion process satisfying (2.2) and (2.3) with initial condition (x, α), and let (X̃(t), α̃(t)) be the process starting from (x̃, α) (i.e., (X(0), α(0)) = (x, α) and (X̃(0), α̃(0)) = (x̃, α), respectively). Let T > 0 be fixed and denote ∆ = x̃ − x. Then we have X̃(t) − X(t) = ∆ + A(t) + B(t), and hence

$$\sup_{t\in[0,T]} \big|\widetilde X(t) - X(t)\big|^2 \le 3\Delta^2 + 3\sup_{t\in[0,T]}|A(t)|^2 + 3\sup_{t\in[0,T]}|B(t)|^2,$$

where

$$A(t) := \int_0^t [b(\widetilde X(s), \widetilde\alpha(s)) - b(\widetilde X(s), \alpha(s))]\,ds
+ \int_0^t [\sigma(\widetilde X(s), \widetilde\alpha(s)) - \sigma(\widetilde X(s), \alpha(s))]\,dw(s) = \Delta\,\phi_{\Delta}(t),$$

and

$$B(t) := \int_0^t [b(\widetilde X(s), \alpha(s)) - b(X(s), \alpha(s))]\,ds
+ \int_0^t [\sigma(\widetilde X(s), \alpha(s)) - \sigma(X(s), \alpha(s))]\,dw(s).$$

It follows from (2.69) that

$$E\sup_{t\in[0,T]}|A(t)|^2 \le K\Delta^2\Delta^{\gamma_0-2} = K\Delta^{\gamma_0} = o(\Delta^2).$$

Meanwhile, we can apply the conclusion of Lemma 2.14 to obtain

$$E\sup_{t\in[0,T]}|B(t)|^2 \le K\Delta^2.$$

Therefore, we have

$$E\sup_{t\in[0,T]}\big|\widetilde X(t) - X(t)\big|^2 \le K\Delta^2 + o(\Delta^2) \le K|\widetilde x - x|^2.$$

This finishes the proof of the proposition. □


A direct consequence of Proposition 2.30 is the mean square continuity
of the solution of the switching diffusion with respect to x; that is, for any
T > 0,
2
lim E |X y,α (t) − X x,α (t)| = 0, for each α ∈ M, and t ∈ [0, T ].
y→x

That is, the continuous dependence on the initial data x is obtained. We


state this fact below.

Corollary 2.31. Assume the conditions of Theorem 2.27. Then X x,α (t)
is continuous in mean square with respect to x.

Proof of Theorem 2.27. With Lemma 2.28 and Proposition 2.30 at hand, we proceed to prove Theorem 2.27. Because b(·, j) is twice continuously differentiable with respect to x, we can write

$$\begin{aligned}
\frac{1}{\Delta}\int_0^t [b(\widetilde X(s), \alpha(s)) - b(X(s), \alpha(s))]\,ds
&= \frac{1}{\Delta}\int_0^t \int_0^1 \frac{d}{dv}\, b\big(X(s) + v(\widetilde X(s) - X(s)), \alpha(s)\big)\,dv\,ds \\
&= \int_0^t \Big[\int_0^1 b_x\big(X(s) + v(\widetilde X(s) - X(s)), \alpha(s)\big)\,dv\Big] Z^{\Delta}(s)\,ds,
\end{aligned}$$

where Z^∆(t) is defined in (2.55) and b_x(·) denotes the partial derivative of b(·) with respect to x (i.e., b_x = (∂/∂x)b). It follows from Proposition 2.30 that for any s ∈ [0, T],

$$\widetilde X(s) - X(s) \to 0$$

in probability as ∆ → 0. This implies that

$$\int_0^1 b_x\big(X(s) + v(\widetilde X(s) - X(s)), \alpha(s)\big)\,dv \to b_x(X(s), \alpha(s)) \qquad (2.70)$$

in probability as ∆ → 0. Similarly, we have

$$\frac{1}{\Delta}\int_0^t [\sigma(\widetilde X(s), \alpha(s)) - \sigma(X(s), \alpha(s))]\,dw(s)
= \int_0^t \Big[\int_0^1 \sigma_x\big(X(s) + v(\widetilde X(s) - X(s)), \alpha(s)\big)\,dv\Big] Z^{\Delta}(s)\,dw(s)$$

and

$$\int_0^1 \sigma_x\big(X(s) + v(\widetilde X(s) - X(s)), \alpha(s)\big)\,dv \to \sigma_x(X(s), \alpha(s)) \qquad (2.71)$$

in probability as ∆ → 0. Let ζ(t) := ζ^{x,α}(t) be the solution of

$$\zeta(t) = 1 + \int_0^t b_x(X(s), \alpha(s))\zeta(s)\,ds + \int_0^t \sigma_x(X(s), \alpha(s))\zeta(s)\,dw(s), \qquad (2.72)$$

where b_x and σ_x denote the partial derivatives of b and σ with respect to x, respectively. Then (2.56), (2.69)–(2.71), and [47, Theorem 5.5.2] imply that

$$E\big|Z^{\Delta}(t) - \zeta(t)\big|^2 \to 0 \quad \text{as } \Delta \to 0, \qquad (2.73)$$

and ζ(t) = ζ^{x,α}(t) is mean square continuous with respect to x. Therefore, (∂/∂x)X(t) exists in the mean square sense and (∂/∂x)X(t) = ζ(t).

Likewise, we can show that (∂²/∂x²)X^{x,α}(t) exists in the mean square sense and is mean square continuous with respect to x. The proof of the theorem is thus concluded. □
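Equation (2.72) identifies the mean square derivative ζ(t) = (∂/∂x)X(t) as the solution of a linear SDE driven along the trajectory of (X(t), α(t)). A hedged sanity check (the linear two-regime model and all parameters below are invented for the sketch, not taken from the text): for b(x, i) = θᵢx and σ(x, i) = sᵢx, an Euler scheme for (2.72) sharing the same Brownian increments and switching path as X reproduces the difference quotient Z^∆(T) = (X^{x+∆}(T) − X^x(T))/∆ almost exactly, because the model is linear in x.

```python
import numpy as np

# Hypothetical linear two-regime model: b(x,i) = th[i]*x, sigma(x,i) = sg[i]*x,
# with constant switching rates, so the regime path can be shared by all three.
th, sg = {1: -0.5, 2: 0.3}, {1: 0.2, 2: 0.4}
T, n, x0, dlt = 1.0, 1000, 2.0, 1e-4
h = T / n
rng = np.random.default_rng(7)
dw = rng.normal(0.0, np.sqrt(h), n)
flips = rng.random(n) < 2.0 * h              # switching rate q = 2 between regimes

X, Xd, zeta, a = x0, x0 + dlt, 1.0, 1
for k in range(n):
    bx, sx = th[a], sg[a]                    # b_x and sigma_x for the linear model
    X    += th[a] * X    * h + sg[a] * X    * dw[k]
    Xd   += th[a] * Xd   * h + sg[a] * Xd   * dw[k]
    zeta += bx    * zeta * h + sx    * zeta * dw[k]   # Euler scheme for (2.72)
    if flips[k]:
        a = 3 - a

fd = (Xd - X) / dlt                          # difference quotient Z^Delta(T)
print(abs(fd - zeta))                        # ~ 0: linearity makes them coincide
```

For nonlinear coefficients the two quantities differ by o(1) as ∆ → 0, which is exactly the content of (2.73).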

Corollary 2.32. Under the assumptions of Theorem 2.27, the mean square derivatives (∂/∂x_j)X^{x,α}(t) and (∂²/∂x_j∂x_k)X^{x,α}(t), j, k = 1, . . . , r, are mean square continuous with respect to t.

Proof. As in the proof of Theorem 2.27, we consider only the case when X(t) is real-valued, and we use the same notation as in that proof. To see that ζ(t) is continuous in the mean square sense, we first observe that for any t ∈ [0, T],

$$E|\zeta(t)|^2 \le 2E\big|\zeta(t) - Z^{\Delta}(t)\big|^2 + 2E\big|Z^{\Delta}(t)\big|^2.$$

It follows from (2.69), the Lipschitz condition, and Proposition 2.30 that

$$\begin{aligned}
E\big|Z^{\Delta}(t)\big|^2 &\le 3E\big|\phi_{\Delta}(t)\big|^2
+ 3E\Big|\frac{1}{\Delta}\int_0^t [b(\widetilde X(u), \alpha(u)) - b(X(u), \alpha(u))]\,du\Big|^2 \\
&\quad + 3E\Big|\frac{1}{\Delta}\int_0^t [\sigma(\widetilde X(u), \alpha(u)) - \sigma(X(u), \alpha(u))]\,dw(u)\Big|^2 \\
&\le K + \frac{3t}{|\Delta|^2}E\int_0^t \big|b(\widetilde X(u), \alpha(u)) - b(X(u), \alpha(u))\big|^2 du \\
&\quad + \frac{3}{|\Delta|^2}E\int_0^t \big|\sigma(\widetilde X(u), \alpha(u)) - \sigma(X(u), \alpha(u))\big|^2 du \\
&\le K + \frac{3K(T+1)}{|\Delta|^2}E\int_0^t \big|\widetilde X(u) - X(u)\big|^2 du \\
&\le C = C(x, T, K).
\end{aligned}$$

Hence we have from (2.73) that

$$\sup_{t\in[0,T]} E|\zeta(t)|^2 \le C = C(x, T, K) < \infty. \qquad (2.74)$$

Thus ζ(t) is mean square continuous if we can show that

$$E|\zeta(t) - \zeta(s)|^2 \to 0 \quad \text{as } |s - t| \to 0.$$

To this end, we note that for any s, t ∈ [0, T],

$$E|\zeta(t) - \zeta(s)|^2 \le 3E\Big[\big|\zeta(t) - Z^{\Delta}(t)\big|^2 + \big|\zeta(s) - Z^{\Delta}(s)\big|^2 + \big|Z^{\Delta}(t) - Z^{\Delta}(s)\big|^2\Big].$$

In view of (2.73), we need only prove that

$$E\big|Z^{\Delta}(t) - Z^{\Delta}(s)\big|^2 \to 0 \quad \text{as } |s - t| \to 0.$$

Without loss of generality, assume that s < t. Then by (2.56), we have

$$\begin{aligned}
E\big|Z^{\Delta}(t) - Z^{\Delta}(s)\big|^2
&\le 3E\big|\phi_{\Delta}(t) - \phi_{\Delta}(s)\big|^2 \\
&\quad + 3E\Big|\frac{1}{\Delta}\int_s^t [b(\widetilde X(u), \alpha(u)) - b(X(u), \alpha(u))]\,du\Big|^2 \qquad (2.75)\\
&\quad + 3E\Big|\frac{1}{\Delta}\int_s^t [\sigma(\widetilde X(u), \alpha(u)) - \sigma(X(u), \alpha(u))]\,dw(u)\Big|^2.
\end{aligned}$$

It follows from the Cauchy-Schwarz inequality, the Lipschitz condition, and Proposition 2.30 that

$$\begin{aligned}
E\Big|\frac{1}{\Delta}\int_s^t [b(\widetilde X(u), \alpha(u)) - b(X(u), \alpha(u))]\,du\Big|^2
&\le (t-s)\frac{1}{|\Delta|^2}E\int_s^t \big|b(\widetilde X(u), \alpha(u)) - b(X(u), \alpha(u))\big|^2 du \\
&\le (t-s)\frac{1}{|\Delta|^2}E\int_s^t K\big|\widetilde X(u) - X(u)\big|^2 du \qquad (2.76)\\
&\le KC(t-s)^2,
\end{aligned}$$

where K is the Lipschitz constant and C is a constant independent of t, s, or ∆. Similarly, we can show that

$$E\Big|\frac{1}{\Delta}\int_s^t [\sigma(\widetilde X(u), \alpha(u)) - \sigma(X(u), \alpha(u))]\,dw(u)\Big|^2
= \frac{1}{|\Delta|^2}E\int_s^t \big|\sigma(\widetilde X(u), \alpha(u)) - \sigma(X(u), \alpha(u))\big|^2 du \le KC(t-s). \qquad (2.77)$$

Next, using the same argument as that of Lemma 2.28, we can show that

$$E\big|\phi_{\Delta}(t) - \phi_{\Delta}(s)\big|^2 \le K(t-s). \qquad (2.78)$$

Thus it follows from (2.75)–(2.78) that

$$E\big|Z^{\Delta}(t) - Z^{\Delta}(s)\big|^2 = O(|t-s|) \to 0 \quad \text{as } |t-s|\to 0,$$

and hence ζ(t) is mean square continuous with respect to t.

Likewise, we can show that (∂²/∂x²)X^{x,α}(t) is mean square continuous with respect to t. This concludes the proof. □

2.8 A Remark Regarding Nonhomogeneous Markov Processes
Throughout the book, for notational simplicity, we have decided to concern ourselves mainly with time-homogeneous switching diffusion processes. As a consequence, the drift and diffusion coefficients are all independent of t.

The discussion throughout the book can be extended to nonhomogeneous Markov processes. The setup can be changed as follows. Suppose that b(·, ·, ·) : R × R^r × M ↦ R^r and that σ(·, ·, ·) : R × R^r × M ↦ R^{r×d}. Let w(·) be an R^d-valued standard Brownian motion defined on the filtered probability space (Ω, F, {F_t}, P), and let α(·) be a pure jump process whose generator is given by Q(x) as before, so that for f(·, ·, ·) : R × R^r × M ↦ R,

$$Q(x)f(t, x, \cdot)(\imath) = \sum_{\jmath \in \mathcal M} q_{\imath\jmath}(x)\big(f(t, x, \jmath) - f(t, x, \imath)\big), \quad \text{for each } \imath \in \mathcal M. \qquad (2.79)$$

In lieu of (2.2), we may consider the two-component process (X(·), α(·)) satisfying

$$X(t) = X(0) + \int_0^t b(s, X(s), \alpha(s))\,ds + \int_0^t \sigma(s, X(s), \alpha(s))\,dw(s), \qquad (2.80)$$

and, as ∆ → 0,

$$P\{\alpha(t+\Delta) = \jmath \mid X(t) = x, \alpha(t) = \imath, X(s), \alpha(s), s \le t\}
= q_{\imath\jmath}(x)\Delta + o(\Delta), \quad \text{for } \imath \ne \jmath. \qquad (2.81)$$

It follows that the associated operator can be defined as follows. For each ι ∈ M and f(·, ·, ι) ∈ C^{1,2}, where C^{1,2} denotes the class of functions whose first-order derivative with respect to t and second-order partial derivatives with respect to x are continuous,

$$\begin{aligned}
\mathcal L f(t, x, \iota) &= \sum_{i=1}^r b_i(t, x, \iota)\frac{\partial f(t, x, \iota)}{\partial x_i}
+ \frac{1}{2}\sum_{i,j=1}^r a_{ij}(t, x, \iota)\frac{\partial^2 f(t, x, \iota)}{\partial x_i \partial x_j} \qquad (2.82)\\
&\quad + Q(x)f(t, x, \cdot)(\iota).
\end{aligned}$$

Now, the generalized Itô lemma becomes

$$f(t, X(t), \alpha(t)) - f(0, X(0), \alpha(0))
= \int_0^t \Big(\frac{\partial}{\partial s} + \mathcal L\Big) f(s, X(s), \alpha(s))\,ds + M_1(t) + M_2(t), \qquad (2.83)$$

where

$$\begin{aligned}
M_1(t) &= \int_0^t \big\langle \nabla f(s, X(s), \alpha(s)),\, \sigma(s, X(s), \alpha(s))\,dw(s)\big\rangle, \\
M_2(t) &= \int_0^t \int_{\mathbb R} \big[f\big(s, X(s), \alpha(s-) + h(X(s), \alpha(s-), z)\big) - f(s, X(s), \alpha(s-))\big]\,\mu(ds, dz).
\end{aligned}$$

In view of the generalized Itô formula, for any f(·, ·, ı) ∈ C^{1,2} with ı ∈ M, and bounded stopping times τ₁, τ₂ with 0 ≤ τ₁ ≤ τ₂ a.s., if f(t, X(t), α(t)) and Lf(t, X(t), α(t)), etc., are bounded on t ∈ [τ₁, τ₂] with probability 1, then Dynkin's formula becomes

$$E f(\tau_2, X(\tau_2), \alpha(\tau_2)) = E f(\tau_1, X(\tau_1), \alpha(\tau_1))
+ E\int_{\tau_1}^{\tau_2} \Big(\frac{\partial}{\partial s} + \mathcal L\Big) f(s, X(s), \alpha(s))\,ds. \qquad (2.84)$$

Moreover, for each f(·, ·, ι) ∈ C_b^{1,2} or f(·, ·, ι) ∈ C_0^{1,2},

$$M_f(t) = f(t, X(t), \alpha(t)) - f(0, X(0), \alpha(0))
- \int_0^t \Big(\frac{\partial}{\partial s} + \mathcal L\Big) f(s, X(s), \alpha(s))\,ds$$

is a martingale. With the setup changed to nonhomogeneous Markov processes, most of the subsequent results carry over. Nevertheless, modifications are necessary to take into account the added complexity due to the nonhomogeneity. In order to present the main results and ideas without much notational complication, we confine ourselves to homogeneous switching diffusions throughout the book.

2.9 Notes
The connection between generators of Markov processes and martingales is
explained in Ethier and Kurtz [43]. An account of piecewise-deterministic
processes is in Davis [30]. Results on basic probability theory may be found
in Chow and Teicher [27]; the theory of stochastic processes can be found
in Gihman and Skorohod [53], Khasminskii [83], and Liptser and Shiryayev
[110], among others. More detailed discussions regarding martingales and
diffusions are in Elliott [40]; an in-depth study of stochastic differential
equations and diffusion processes is contained in Ikeda and Watanabe [72].
Concerning the existence of solutions to stochastic differential equations with switching, using Poisson random measures (see Skorohod [150]; see also Basak, Bisi, and Ghosh [6] and Mao and Yuan [120]), it can be shown that there is a unique solution for each initial condition by following the approach of Ikeda and Watanabe [72] with appropriate use of stopping times. However, the Picard iteration method does not work; this point is explained further when we study numerical solutions (see Chapter 5). In Section 2.7, we dealt with smoothness properties of solutions of stochastic differential equations with x-dependent switching. It is interesting to note that even the time-honored well-posedness cannot be easily obtained; once x-dependence is added, the difficulty rises considerably.
3
Recurrence

3.1 Introduction
This chapter is concerned with recurrence of switching diffusion processes.
Because practical systems in applications are often in operation for a rel-
atively long time, it is of foremost importance to understand the systems’
asymptotic behavior. By asymptotic behavior, we mean the properties of
the underlying processes in a neighborhood of “∞” and in a neighborhood
of an equilibrium point. Properties concerning a neighborhood of ∞ are
treated in this chapter, whereas stability of an equilibrium point is dealt
with in Chapter 7.
Dealing with dynamic systems, one often wishes to examine if the un-
derlying system is sensitive to perturbations. In accordance with LaSalle
and Lefschetz [107], a deterministic system ẋ = h(t, x) satisfying appropri-
ate conditions, is Lagrange stable if the solutions are ultimately uniformly
bounded. When diffusions are used, one may wish to add probabilistic qualifiers such as "almost surely" or "in probability" to the aforementioned uniform boundedness. However, if almost sure boundedness is used, many systems will be excluded owing to the presence of the Brownian motion. Thus, as pointed out by Wonham [160], such boundedness is inappropriate; an alternative notion of stability in a certain weak sense should be used. In lieu of requiring the system to be bounded, one aims to find conditions under which the system returns to a prescribed compact region
in finite time. This chapter focuses on weak-sense stability for switching
diffusions. We define recurrence, positive recurrence, and null recurrence;

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 69
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_3,
© Springer Science + Business Media, LLC 2010

we also develop Liapunov-function-based criteria together with more easily


verifiable conditions on the coefficients of the processes for positive recur-
rence as well as nonrecurrence, and null recurrence. Despite the growing
interest in treating regime-switching systems, the results regarding such
issues as recurrence and positive recurrence (or weak stochastic stability as
coined by Wonham [160]) are still scarce. These are not simple extensions of their diffusion counterpart. Due to the coupling and interactions, systems of elliptic equations, instead of a single elliptic equation, must be treated. Moreover, even though the classical approaches such as Liapunov function methods and Dynkin's formula are still applicable to switching diffusions, the analysis is more delicate than in the diffusion counterpart and requires careful handling of the discrete-event component α(·).
The rest of the chapter is arranged as follows. In Section 3.2, in addition
to introducing certain notations, we also provide definitions of recurrence,
transience, positive recurrence, and null recurrence, as well as some prelim-
inary results. Section 3.3 focuses on recurrence and transience. Section 3.4
proceeds with the study of positive and null recurrence. We present results
of necessary and sufficient conditions for recurrence using Liapunov func-
tions. We also consider the case under “linearization” for the continuous
component. Section 3.5 is devoted to a number of examples as applications
of the general results. Section 3.6, containing the proofs of several technical
lemmas, is provided to facilitate reading. Discussions and further remarks
are made in Section 3.7.

3.2 Formulation and Preliminaries

3.2.1 Switching Diffusion

Recall that (Ω, F, {F_t}_{t≥0}, P) is a complete probability space with a filtration {F_t}_{t≥0} satisfying the usual condition (i.e., it is right continuous with F₀ containing all P-null sets). Let x ∈ R^r, M = {1, . . . , m₀}, and let Q(x) = (q_{ij}(x)) be an m₀ × m₀ matrix depending on x and satisfying, for any x ∈ R^r, q_{ij}(x) ≥ 0 for i ≠ j and $\sum_{j=1}^{m_0} q_{ij}(x) = 0$. For any twice continuously differentiable function h(·, i), i ∈ M, define L by

$$\begin{aligned}
\mathcal L h(x, i) &= \frac{1}{2}\sum_{j,k=1}^r a_{jk}(x, i)\frac{\partial^2 h(x, i)}{\partial x_j \partial x_k}
+ \sum_{j=1}^r b_j(x, i)\frac{\partial h(x, i)}{\partial x_j} + Q(x)h(x, \cdot)(i) \qquad (3.1)\\
&= \frac{1}{2}\mathrm{tr}\big(a(x, i)\nabla^2 h(x, i)\big) + b'(x, i)\nabla h(x, i) + Q(x)h(x, \cdot)(i),
\end{aligned}$$
where ∇h(·, i) and ∇²h(·, i) denote the gradient and Hessian of h(·, i), respectively, b′(x, i)∇h(x, i) denotes the usual inner product on R^r, with z′ denoting the transpose of z for z ∈ R^{ι₁×ι₂} with ι₁, ι₂ ≥ 1, and

$$Q(x)h(x, \cdot)(i) = \sum_{j=1}^{m_0} q_{ij}(x)h(x, j)
= \sum_{j \in \mathcal M} q_{ij}(x)\big(h(x, j) - h(x, i)\big), \quad i \in \mathcal M. \qquad (3.2)$$

Consider a Markov process Y(t) = (X(t), α(t)) whose associated operator is given by L. Note that Y(t) has two components: an r-dimensional continuous component X(t) and a discrete component α(t) taking values in M = {1, . . . , m₀}.

Recall that the process Y(t) = (X(t), α(t)) may be described by the following pair of equations:

$$dX(t) = b(X(t), \alpha(t))\,dt + \sigma(X(t), \alpha(t))\,dw(t), \qquad X(0) = x, \ \alpha(0) = \alpha, \qquad (3.3)$$

and

$$P\{\alpha(t+\Delta) = j \mid \alpha(t) = i, X(s), \alpha(s), s \le t\} = q_{ij}(X(t))\Delta + o(\Delta), \quad i \ne j, \qquad (3.4)$$

where w(t) is a d-dimensional standard Brownian motion, b(·, ·) : R^r × M ↦ R^r, and σ(·, ·) : R^r × M ↦ R^{r×d} satisfies σ(x, i)σ′(x, i) = a(x, i). Note that (3.3) depicts the system dynamics, whereas (3.4) delineates the probabilistic structure of the jump process. Note also that if α(·) is a continuous-time Markov chain independent of the Brownian motion w(·) and Q(x) = Q or Q(x) = Q(t) (independent of x), then equation (3.3) together with the generator Q or Q(t) suffices to characterize the underlying process. As long as there is x-dependence, equation (3.4) is needed to delineate the dynamics of the switching diffusion.
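The pair (3.3)–(3.4) translates directly into a simulation scheme: advance X by an Euler–Maruyama step and, over each step of length h, switch from regime i to j with probability q_{ij}(X(t))h + o(h). A minimal sketch (the drift, diffusion coefficient, and x-dependent generator below are invented for illustration, not taken from the text):

```python
import numpy as np

def simulate(x0, a0, T=1.0, n=1000, seed=0):
    """Euler-Maruyama for (3.3) plus per-step regime switching per (3.4),
    for a hypothetical scalar model with M = {0, 1} and x-dependent rates."""
    b = lambda x, i: -x if i == 0 else 1.0 - x                 # drift b(x, i)
    s = lambda x, i: 0.3 if i == 0 else 0.6                    # diffusion sigma(x, i)
    q = lambda x, i: 1.0 / (1.0 + x * x) if i == 0 else 2.0    # rate q_{i,1-i}(x)
    h = T / n
    rng = np.random.default_rng(seed)
    X, a = np.empty(n + 1), np.empty(n + 1, dtype=int)
    X[0], a[0] = x0, a0
    for k in range(n):
        X[k + 1] = X[k] + b(X[k], a[k]) * h + s(X[k], a[k]) * rng.normal(0, np.sqrt(h))
        # switch i -> j with probability q_{ij}(X(t)) h + o(h), cf. (3.4)
        a[k + 1] = 1 - a[k] if rng.random() < q(X[k], a[k]) * h else a[k]
    return X, a

X, a = simulate(x0=1.0, a0=0)
print(X.shape, set(a.tolist()) <= {0, 1})   # (1001,) True
```

Because the switching probability is evaluated at the current state X(t), the discrete component here is not a Markov chain on its own; only the pair (X, α) is Markov, which is exactly the point of (3.4).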
In this chapter, our study is carried out with the use of the operator L
given in (3.1). Throughout the chapter, we assume that both b(·, i) and
σ(·, i) satisfy the usual local Lipschitz and linear growth conditions for
each i ∈ M and that Q(·) is bounded and continuous. As described in
Theorem 2.1, the system (3.3)–(3.4) has a unique strong solution. In what
follows, denote the solution of (3.3)–(3.4) by (X x,α (t), αx,α (t)) when we em-
phasize the dependence on initial data. To study recurrence and ergodicity
of the process Y (t) = (X(t), α(t)), we further assume that the following
condition (A3.1) holds throughout the chapter. For convenience, we also
put the boundedness and continuity of Q(·) in (A3.1).

(A3.1) The operator L satisfies the following conditions. For each i ∈ M, a(x, i) = (a_{jk}(x, i)) is symmetric and satisfies

$$\kappa_1|\xi|^2 \le \xi' a(x, i)\xi \le \kappa_1^{-1}|\xi|^2, \quad \text{for all } \xi \in \mathbb R^r, \qquad (3.5)$$

with some constant κ₁ ∈ (0, 1] for all x ∈ R^r. Moreover, Q(·) : R^r ↦ R^{m₀×m₀} is a bounded and continuous function, and Q(x) is irreducible for each x ∈ R^r.

3.2.2 Definitions of Recurrence and Positive Recurrence

This subsection is devoted to the definitions of recurrence, positive recurrence, and null recurrence. First, we introduce the following notation and conventions. For any D ⊂ R^r, J ⊂ M, and U = D × J ⊂ R^r × M, denote

$$\tau_U := \inf\{t \ge 0 : (X(t), \alpha(t)) \notin U\}, \qquad
\sigma_U := \inf\{t \ge 0 : (X(t), \alpha(t)) \in U\}. \qquad (3.6)$$

In particular, if U = D × M is a "cylinder," we set

$$\tau_D := \inf\{t \ge 0 : X(t) \notin D\}, \qquad
\sigma_D := \inf\{t \ge 0 : X(t) \in D\}. \qquad (3.7)$$

Definition 3.1. Recurrence, positive recurrence, and null recurrence are defined as follows.

• Recurrence and Transience. For U := D × J, where J ⊂ M and D ⊂ R^r is an open set with compact closure, let

$$\sigma_U^{x,\alpha} = \inf\{t : (X^{x,\alpha}(t), \alpha^{x,\alpha}(t)) \in U\}.$$

A regular process (X^{x,α}(·), α^{x,α}(·)) is recurrent with respect to U if P{σ_U^{x,α} < ∞} = 1 for any (x, α) ∈ D^c × M, where D^c denotes the complement of D; otherwise, the process is transient with respect to U.

• Positive Recurrence and Null Recurrence. A recurrent process with finite mean recurrence time for some set U = D × J, where J ⊂ M and D ⊂ R^r is a bounded open set with compact closure, is said to be positive recurrent with respect to U; otherwise, the process is null recurrent with respect to U.
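For a positive recurrent model, the mean recurrence time E_{x,α}σ_U can be estimated by simulation. A hedged sketch (the two-regime mean-reverting model below is invented for illustration; positive recurrence for models of this type is established by the criteria developed later in this chapter): starting from x = 3 outside D = (−1, 1), we average the first time |X(t)| enters D.

```python
import numpy as np

def hitting_time(x0=3.0, a0=0, h=1e-3, t_max=50.0, rng=None):
    """First time |X| < 1 for a hypothetical two-regime mean-reverting model:
    b(x, i) = -th[i]*x, constant sigma, constant switching rate q."""
    th, sg, q = (1.0, 2.0), (0.5, 1.0), 1.0
    x, a, t = x0, a0, 0.0
    while t < t_max:
        if abs(x) < 1.0:                  # entered D = (-1, 1): sigma_D reached
            return t
        x += -th[a] * x * h + sg[a] * np.sqrt(h) * rng.normal()
        if rng.random() < q * h:          # regime switch, as in (3.4)
            a = 1 - a
        t += h
    return t_max  # truncation (rare for this strongly mean-reverting model)

rng = np.random.default_rng(1)
est = np.mean([hitting_time(rng=rng) for _ in range(200)])
print(0.0 < est < 5.0)  # True: finite mean hitting time, i.e., positive recurrence
```

For a null recurrent or transient model (for example, with the mean reversion removed), the same estimator would drift upward with the truncation horizon instead of stabilizing.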

3.2.3 Preparatory Results

We first prove the following theorem, which asserts that under assumption (A3.1), the process Y(t) = (X(t), α(t)) exits every bounded "cylinder" with finite mean exit time.

Theorem 3.2. Let D ⊂ R^r be a nonempty open set with compact closure D̄, and let τ_D := inf{t ≥ 0 : X(t) ∉ D}. Then

$$E_{x,i}\,\tau_D < \infty, \quad \text{for any } (x, i) \in D \times \mathcal M. \qquad (3.8)$$

Proof. First, note that from the uniform ellipticity condition (3.5), we have

κ1 ≤ a11 (x, i) ≤ κ−1


1 , for any (x, i) ∈ D × M. (3.9)

For each i ∈ M, consider

W (x, i) = k − (x1 + β)c ,

where the constants k, c (with c ≥ 2), and β are to be specified, and


x1 = e01 x is the first component of x with e1 = (1, 0, . . . , 0)0 being the
standard unit vector. Direct computation leads to
 
c−1
LW (x, i) = −c(x1 + β)c−2 b1 (x, i)(x1 + β) + a11 (x, i) .
2

Set !
2
c= sup |b1 (x, i)(x1 + β)| + 1 + 1.
κ1 (x,i)∈D×M

Then we have from (3.9) that


c−1
a11 (x, i) + b1 (x, i)(x1 + β)
2
c−1
≥ κ1 − sup |b1 (x, i)(x1 + β)| ≥ 1.
2 (x,i)∈D×M

Meanwhile, since x ∈ D ⊂ D and D is compact, we can choose β such


that 1 ≤ x1 + β ≤ M for all x ∈ D, where M is some positive constant.
Thus we have (x1 + β)c−2 ≥ 1c−2 = 1. Finally, we choose k large enough
so that W (x, i) = k − (x1 + β)c > 0 for all (x, i) ∈ D × M. Therefore,
W (x, i), i ∈ M are Liapunov functions satisfying

LW (x, i) ≤ −c, for all (x, i) ∈ D × M. (3.10)

Now let τD (t) = t∧τD := min{t, τD }. Then we have from Dynkin’s formula
and (3.10) that

Ex,i W (X(τD (t)), α(τD (t))) − W (x, i)


Z τD (t)
= Ex,i LW (X(u), α(u))du ≤ −cEx,i τD (t).
0

Because the function W(·, ·) is nonnegative, we have

$$E_{x,i}\,\tau_D(t) \le \frac{1}{c}W(x, i). \qquad (3.11)$$

Because

$$E_{x,i}\,\tau_D(t) = E_{x,i}\big[\tau_D I_{\{\tau_D \le t\}}\big] + E_{x,i}\big[t\, I_{\{\tau_D > t\}}\big],$$

we have from (3.11) that

$$t\,P_{x,i}[\tau_D > t] \le \frac{1}{c}W(x, i).$$

Letting t → ∞, we obtain P_{x,i}[τ_D = ∞] = 0, that is, P_{x,i}[τ_D < ∞] = 1. This yields that τ_D(t) → τ_D almost surely (a.s.) P_{x,i} as t → ∞. Now, applying Fatou's lemma as t → ∞, we obtain

$$E_{x,i}\,\tau_D \le \frac{1}{c}W(x, i) < \infty,$$

as desired. □
Remark 3.3. A closer examination of the proof shows that the conclusion of Theorem 3.2 remains valid if we replace the uniform ellipticity condition (3.5) by a weaker condition: there exist some ι ∈ {1, 2, . . . , r} and a positive constant κ such that

$$a_{\iota\iota}(x, i) \ge \kappa \quad \text{for any } (x, i) \in \bar D \times \mathcal M. \qquad (3.12)$$

Let us recall the definition of L-harmonic functions. For any U = D × J, where D ⊂ R^r is a nonempty domain and J ⊂ M, a Borel measurable function u : U ↦ R is said to be L-harmonic in U if u is bounded on compact subsets of U and if, for all (x, i) ∈ U and any V = D̃ × J̃ with D̃ ⊂⊂ D being a neighborhood of x and i ∈ J̃ ⊂ J, we have

$$u(x, i) = E_{x,i}\,u(X(\tau_V), \alpha(\tau_V)),$$

where τ_V denotes the first exit time of the process (X(t), α(t)) from V, and D̃ ⊂⊂ D means that the closure $\bar{\widetilde D} = \widetilde D \cup \partial \widetilde D$ is compact and satisfies $\bar{\widetilde D} \subset D$.
Lemma 3.4. For any U = D × J ⊂ R^r × M, where D ⊂ R^r is a nonempty domain, the functions

$$f(x, i) = P_{x,i}\{\tau_U < \infty\} \quad \text{and} \quad g(x, i) = E_{x,i}\,\phi\big(X(\tau_U), \alpha(\tau_U)\big)$$

are L-harmonic in U, where φ is any bounded and Borel measurable function on ∂D × M.

Proof. Fix any (x, i) ∈ U. Consider any V = D̃ × J̃ ⊂ U such that x ∈ D̃ ⊂⊂ D and i ∈ J̃ ⊂ J. Then it follows from the strong Markov property that

$$f(x, i) = E_{x,i}\big[I_{\{\tau_U<\infty\}}\big]
= E_{x,i}\Big[E_{x,i}\big[I_{\{\tau_U<\infty\}} \mid \mathcal F_{\tau_V}\big]\Big]
= E_{x,i}\Big[E_{X(\tau_V),\alpha(\tau_V)}\big[I_{\{\tau_U<\infty\}}\big]\Big]
= E_{x,i}\,f\big(X(\tau_V), \alpha(\tau_V)\big).$$

This shows that f is L-harmonic in U. A very similar argument shows that g is also L-harmonic in U. □

Following the well-known arguments in [38, Vol. II, Chapter 13], we obtain the following two lemmas. (Note that Lemma 3.5 was also proved in [52, Lemma 4.3], and in [24] when the operator L is in divergence form.)

Lemma 3.5. Assume (A3.1). Let U = D × M ⊂ R^r × M and f : U ↦ R, where D ⊂ R^r is a nonempty domain. Then

$$\mathcal L f(x, i) = 0 \quad \text{for any } (x, i) \in U \qquad (3.13)$$

if and only if f is L-harmonic in U. Moreover, assume that ∂D is sufficiently smooth, D̄ is compact, and ϕ(·, i) is an arbitrary continuous function on ∂D for each i ∈ M. Then

$$u(x, i) := E_{x,i}\,\varphi\big(X(\tau_U), \alpha(\tau_U)\big) \qquad (3.14)$$

is the unique solution of the differential equation (3.13) with boundary condition

$$\lim_{x\to x_0,\,x\in D} u(x, i) = \varphi(x_0, i) \quad \text{for any } (x_0, i) \in \partial D \times \mathcal M. \qquad (3.15)$$

Proof. We prove the lemma in several steps.

Step 1. Assume (3.13). Let (x, i) ∈ V ⊂ U , with V and τV as in the proof
of Lemma 3.4, and let t > 0. Then Dynkin's formula and (3.13) lead to

Ex,i f (X(τV ∧ t), α(τV ∧ t)) = f (x, i) + Ex,i ∫_0^{τV ∧ t} Lf (X(s), α(s))ds
= f (x, i).
Note that Theorem 3.2 implies that Px,i {τV < ∞} = 1. Letting t → ∞,
we obtain by virtue of the bounded convergence theorem that

f (x, i) = Ex,i f (X(τV ), α(τV )).

This shows that f is L-harmonic in U .
Step 2. Assume that f is L-harmonic in U . Note that by virtue of Propo-
sition 2.26, f is continuous in U . Consider V = D̃ × M ⊂ U with D̃ ⊂⊂ D
and ∂ D̃ sufficiently smooth. Then by virtue of [41], the boundary value
problem

Lf̃(x, i) = 0, (x, i) ∈ V,
f̃(x, i) = f (x, i), (x, i) ∈ ∂ D̃ × M, (3.16)

has a unique classical solution. We show that f̃ agrees with f in V . In fact,
for any (x, i) ∈ V , the same argument as in Step 1 shows that

f̃(x, i) = Ex,i f̃(X(τV ), α(τV )).
But the boundary condition in (3.16) implies that

Ex,i f̃(X(τV ), α(τV )) = Ex,i f (X(τV ), α(τV )).

Therefore the assumption that f is L-harmonic in U further leads to

f̃(x, i) = Ex,i f (X(τV ), α(τV )) = f (x, i).
This shows that f (·, i) ∈ C 2 (D) for each i ∈ M and that f satisfies the
differential equation (3.13).
Step 3. Now assume that ∂D is sufficiently smooth, D has compact closure,
and ϕ(·, i) is an arbitrary continuous function on ∂D for any i ∈ M. We show
that the function u defined in (3.14) is the unique solution of (3.13) with
boundary condition (3.15). Indeed, it follows from Lemma 3.4 that u is
L-harmonic in U . Then we have from Step 2 that u satisfies the differential
equation (3.13). Finally, the boundary condition (3.15) is satisfied by the
assumptions that ∂D is sufficiently smooth, D has compact closure, and
that ϕ is continuous. This completes the proof of the lemma. 2
Using a similar argument, we can prove the following lemma.
Lemma 3.6. Let U = D × M ⊂ Rr × M, where D ⊂ Rr is a nonempty
open set with compact closure. Suppose that g(·, i) ∈ Cb (D) and f (·, ·) :
D × M ↦ R. Then f solves the boundary value problem

Lf (x, i) = −g(x, i), (x, i) ∈ D × M,
f (x, i) = 0, (x, i) ∈ ∂D × M,

if and only if

f (x, i) = Ex,i ∫_0^{τU} g(X(t), α(t))dt, for all (x, i) ∈ D × M.
Using Lemmas 3.5 and 3.6, we proceed to prove that if the process
Y (t) = (X(t), α(t)) is recurrent (resp., positive recurrent) with respect
to some “cylinder” D × M ⊂ Rr × M, then it is recurrent (resp., positive
recurrent) with respect to any “cylinder” E ×M ⊂ Rr ×M, where D is any
nonempty domain in Rr with compact closure. These results are proved in
the following two lemmas. To preserve the flow of presentation, the proofs
are postponed to Section 3.6.
Lemma 3.7. Let D ⊂ Rr be a nonempty open set with compact closure.
Suppose that

Px,i {σD < ∞} = 1 for any (x, i) ∈ D c × M. (3.17)

Then for any nonempty open set E ⊂ Rr , we have

Px,i {σE < ∞} = 1 for any (x, i) ∈ E c × M.
Lemma 3.8. Let D ⊂ Rr be a nonempty open set with compact closure.
Suppose that

Ex,i σD < ∞ for any (x, i) ∈ D c × M. (3.18)

Then for any nonempty open set E ⊂ Rr , we have

Ex,i σE < ∞ for any (x, i) ∈ E c × M.
The following lemma shows that if the process Y (t) = (X(t), α(t)) reaches
the “cylinder” D×M in finite time a.s. Px,i , then it will visit the set D×{`}
in finite time a.s. Px,i for any ` ∈ M. Its proof together with the proof of
Lemma 3.10 is placed in Section 3.6 as well.
Lemma 3.9. Let D ⊂ Rr be a nonempty open set with compact closure
satisfying

Py,j {σD < ∞} = 1 for any (y, j) ∈ D c × M. (3.19)

Then for any (x, i) ∈ Rr × M,

Px,i {σD×{`} < ∞} = 1 for any ` ∈ M. (3.20)
With Lemma 3.9, we can now prove that if the process Y (t) = (X(t), α(t))
is positive recurrent with respect to some “cylinder” D × M, then it is pos-
itive recurrent with respect to the set D × {`} ⊂ Rr × M.
Lemma 3.10. Let D ⊂ Rr be a nonempty open set with compact closure
satisfying
Ey,j σD < ∞ for any (y, j) ∈ D c × M. (3.21)
Then for any (x, i) ∈ Rr × M,

Ex,i σD×{`} < ∞ for any ` ∈ M. (3.22)

Remark 3.11. By virtue of Lemmas 3.7–3.10, under assumption (A3.1),
the process Y (t) = (X(t), α(t)) is recurrent (resp., positive recurrent) with
respect to some “cylinder” D × M if and only if it is recurrent (resp.,
positive recurrent) with respect to the product set D × {`} ⊂ Rr × M
for any ` ∈ M. Also we have proved that the properties of recurrence
and positive recurrence are independent of the choice of the set D. We
summarize these in the following theorem.
Theorem 3.12. Suppose that (A3.1) holds. Then the following assertions
hold:
• The process (X(t), α(t)) is recurrent (resp., positive recurrent) with
respect to D×M if and only if it is recurrent (resp., positive recurrent)
with respect to D × {`}, where D ⊂ Rr is a nonempty open set with
compact closure and ` ∈ M.
• If the process (X(t), α(t)) is recurrent (resp., positive recurrent) with
respect to some U = D × M, where D ⊂ Rr is a nonempty open set
with compact closure, then it is recurrent (resp., positive recurrent)
with respect to any Ũ = D̃ × M, where D̃ ⊂ Rr is any nonempty
open set.
Remark 3.13. In view of Theorem 3.12, we make the following remarks.
• Recurrence is a property independent of the region chosen; hence-
forth, a process (X(t), α(t)) with the associated generator L satis-
fying (A3.1) is said to be recurrent if it is recurrent with respect to
some U = D × {`}, where D ⊂ Rr is a nonempty bounded open set
and ` ∈ M; otherwise it is said to be transient.
• Henceforth, we call a recurrent process (X(t), α(t)) positive recurrent
if it is positive recurrent with respect to some bounded domain U =
D × {`} ⊂ Rr × M; otherwise, we have a null recurrent process.
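These notions can be explored by simulation. The sketch below is not part of the text: the two-regime drift, noise, and generator values are made-up illustrative choices, and it runs a plain Euler–Maruyama discretization of a switching diffusion, resolving the switching with one-step transition probabilities qij ∆ + o(∆):

```python
import numpy as np

def simulate_switching_diffusion(x0, a0, b, sigma, Q, T=10.0, dt=1e-3, seed=None):
    """Euler-Maruyama sketch of dX = b(X, alpha) dt + sigma(X, alpha) dw,
    where alpha is a continuous-time chain with (state-independent) generator Q.
    Returns the sampled paths of X and alpha on the grid k*dt."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    x = np.empty(n + 1)
    a = np.empty(n + 1, dtype=int)
    x[0], a[0] = x0, a0
    for k in range(n):
        i = a[k]
        # diffusion step in the current regime
        x[k + 1] = x[k] + b(x[k], i) * dt + sigma(x[k], i) * np.sqrt(dt) * rng.standard_normal()
        # switch to regime j with probability q_ij * dt, stay put otherwise
        p = Q[i] * dt
        p[i] = 1.0 + Q[i, i] * dt
        a[k + 1] = rng.choice(len(p), p=p)
    return x, a

# regime 0 mean reverting, regime 1 mildly expanding; the chain alternates
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
x, a = simulate_switching_diffusion(
    0.0, 0,
    b=lambda x, i: -x if i == 0 else 0.2 * x,
    sigma=lambda x, i: 1.0,
    Q=Q, seed=0)
```

Counting how often such a trajectory re-enters a fixed set B(0, ε) × {0} gives a crude empirical view of the recurrence notion just defined.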
3.3 Recurrence and Transience

3.3.1 Recurrence
To study the recurrence of the process (X(t), α(t)), we first present the
following criterion based upon the existence of certain Liapunov functions.
Theorem 3.14. Assume that there exist a nonempty bounded open set
D ⊂ Rr and a function V (·, ·) : D c × M ↦ R+ satisfying

Vn := inf_{|x|≥n, i∈M} V (x, i) → ∞, as n → ∞,
LV (x, i) ≤ 0, for all (x, i) ∈ D c × M. (3.23)

Then the process (X(t), α(t)) is recurrent.
Proof. Fix any (x, α) ∈ D c × M. Define

σD = σ_D^{x,α} := inf {t ≥ 0 : X x,α (t) ∈ D}

and

σ_D^{(n)}(t) := σD ∧ t ∧ βn ,

where βn = inf{t : |X(t)| ≥ n} is as in (2.14). Then it follows from
Dynkin's formula that

EV (X(σ_D^{(n)}(t)), α(σ_D^{(n)}(t))) − V (x, α) = E ∫_0^{σ_D^{(n)}(t)} LV (X(u), α(u))du ≤ 0.
Consequently,

EV (X(σ_D^{(n)}(t)), α(σ_D^{(n)}(t))) ≤ V (x, α).

Note that as t → ∞, σD ∧ t ∧ βn → σD ∧ βn a.s. Hence Fatou's lemma
implies that

EV (X(σD ∧ βn ), α(σD ∧ βn )) ≤ V (x, α).

Then we have

V (x, α) ≥ EV (X(σD ∧ βn ), α(σD ∧ βn ))
≥ EV (X(βn ), α(βn ))I{βn <σD } ≥ Vn P {βn < σD } .

Hence it follows from (3.23) that as n → ∞,

P {βn < σD } ≤ V (x, α)/Vn → 0.

Note that

P {σD = ∞} ≤ P {βn < σD } .

Thus we have P {σD < ∞} = 1, as desired. 2
The above result is based on a Liapunov function argument. However,
constructing Liapunov functions is generally difficult. It would be nice if we
could place certain conditions on the coefficients of processes. The following
theorem is an attempt in this direction.
Theorem 3.15. Either one of the following conditions implies that the
process (X(t), α(t)) is recurrent.
(i) There exist constants γ > 0 and ci ∈ R with i ∈ M such that for
(x, i) ∈ {x ∈ Rr : |x| ≥ 1} × M,

x′b(x, i)/|x|^2 + tr(a(x, i))/(2|x|^2) + (γ − 2) x′a(x, i)x/(2|x|^4)
− (1/(k − γci )) Σ_{j=1}^{m0} qij (x)cj ≤ 0, (3.24)

where k is a positive constant sufficiently large so that k − γci > 0
for each i ∈ M.

(ii) There exist a positive constant γ and symmetric and positive definite
matrices Pi for i ∈ M such that for (x, i) ∈ {x ∈ Rr : |x| ≥ 1} × M,

x′Pi b(x, i)/(x′Pi x) + tr(σ′(x, i)Pi σ(x, i))/(2x′Pi x)
+ (γ − 2) |σ′(x, i)Pi x|^2/(2(x′Pi x)^2)
+ (1/γ) Σ_{j=1}^{m0} qij (x) (x′Pj x)^{γ/2}/(x′Pi x)^{γ/2} ≤ 0. (3.25)
Proof. (i) For each i ∈ M, define a Liapunov function as

V (x, i) = (k − γci )|x|^γ .

Then direct computation shows that for x ≠ 0, we have

∇V (x, i) = (k − γci )γ|x|^{γ−2} x,
∇^2 V (x, i) = (k − γci )γ ( |x|^{γ−2} I + (γ − 2)|x|^{γ−4} xx′ ).

Hence it follows that

LV (x, i) = (k − γci )γ|x|^γ [ x′b(x, i)/|x|^2 + tr(a(x, i))/(2|x|^2)
+ (γ − 2) x′a(x, i)x/(2|x|^4) − (1/(k − γci )) Σ_{j=1}^{m0} qij (x)cj ].

Therefore Theorem 3.14 implies the desired conclusion.
Assertion (ii) can be established using a similar argument as in (i) by
considering

W (x, i) = (x′Pi x)^{γ/2} for (x, i) ∈ {x ∈ Rr : |x| ≥ 1} × M

and verifying that

∇W (x, i) = γ(x′Pi x)^{(γ−2)/2} Pi x,
∇^2 W (x, i) = γ(x′Pi x)^{(γ−2)/2} Pi + γ(γ − 2)(x′Pi x)^{(γ−4)/2} Pi xx′Pi .

The details are omitted for brevity. 2
Lemma 3.16. If there exists some (x0 , `) ∈ Rr × M such that for any
ε > 0,

Px0 ,` {(X(tn ), α(tn )) ∈ B(x0 , ε) × {`} , for a sequence tn ↑ ∞} = 1, (3.26)

then for any U := D × {j} ⊂ Rr × M, where D ⊂ Rr is a nonempty
bounded domain and j ∈ M, we have

Px0 ,` {σU < ∞} = 1.

In particular, if (3.26) is true for any (x, i) ∈ Rr × M, then the process
(X(t), α(t)) is recurrent.
Proof. It is enough to consider two cases, namely, when x0 ∉ D and when
x0 ∈ D with j ≠ `. For the first case, the proof of Lemma 3.7 can be
adapted to show that Px0 ,` {σD < ∞} = 1. Then by virtue of Lemma 3.9,
it follows that Px0 ,` {σD×{j} < ∞} = 1. The second case follows from a
slight modification of the argument in the proof of Lemma 3.9. 2
Lemma 3.17. If the process (X(t), α(t)) is recurrent, then for any (x, i) ∈
Rr × M and ε > 0, we have

Px,i {(X(tn ), α(tn )) ∈ B(x, ε) × {i} , for a sequence tn ↑ ∞} = 1. (3.27)

Proof. Denote B = B(x, ε), B1 = B(x, ε/2), and B2 = B(x, 2ε). Define a
sequence of stopping times by

η1 := inf {t ≥ 0 : X(t) ∉ B2 } ;

and for n = 1, 2, . . .,

η2n := inf {t ≥ η2n−1 : (X(t), α(t)) ∈ B1 × {i}} ,
η2n+1 := inf {t ≥ η2n : X(t) ∉ B2 } .

Note that the process (X(t), α(t)) is recurrent, in particular, with respect
to B1 × {i}. This, together with Theorem 3.2, implies that ηn < ∞ a.s.
Px,i . Thus (3.27) follows. 2
Combining Lemmas 3.16 and 3.17, we obtain the following theorem.

Theorem 3.18. The process (X(t), α(t)) is recurrent if and only if every
point (x, i) ∈ Rr × M is recurrent in the sense that for any ε > 0,

Px,i {(X(tn ), α(tn )) ∈ B(x, ε) × {i} , for a sequence tn ↑ ∞} = 1.

Theorem 3.18 enables us to provide another criterion for recurrence in
terms of mean sojourn time. This is motivated by the results in [109]. To
this end, for any U := D × J ⊂ Rr × M and any λ ≥ 0, define

Rλ (x, i, U ) := Ex,i [ ∫_0^∞ e^{−λt} IU (X(t), α(t))dt ]. (3.28)

In particular, R0 (x, i, U ) denotes the mean sojourn time of the process


(X x,i (t), αx,i (t)) in the domain U . We state a proposition below, whose
proof, being similar to a result in [109], is relegated to Section 3.6.

Proposition 3.19. Assume (A3.1). If for some point (x0 , `) ∈ Rr × M


and any ρ > 0,
R0 (x0 , `, B(x0 , ρ) × {`}) = ∞, (3.29)
then (x0 , `) is a recurrent point; that is, for every ρ > 0,

Px0 ,` {(X(tn ), α(tn )) ∈ B(x0 , ρ) × {`} , for a sequence tn ↑ ∞} = 1.


(3.30)
In particular, if (3.29) is true for any (x, i) ∈ Rr × M, then the process
(X(t), α(t)) is recurrent.
3.3.2 Transience
We first argue that if the process (X(t), α(t)) is transient, then the norm
of the continuous component |X(t)| → ∞ a.s. as t → ∞, and vice versa.
Then we provide two criteria for transience in terms of mean sojourn time
and Liapunov functions, respectively. If the coefficients b(x, i) and σ(x, i)
for i ∈ M are linearizable in x, we also obtain easily verifiable conditions
for transience.
Theorem 3.20. The process (X(t), α(t)) is transient if and only if

lim_{t→∞} |X(t)| = ∞ a.s. Px,α for any (x, α) ∈ Rr × M.

The proof of Theorem 3.20 follows the argument of Bhattacharya [15].
We next obtain a criterion for transience under certain conditions. To pre-
serve the flow of the presentation, the proofs of both Theorem 3.20 and
Proposition 3.21 are placed in Section 3.6.
Proposition 3.21. Assume that the following conditions hold.
• For i = 1, 2, . . . , m0 , the coefficients b(·, i), σ(·, i), and Q(·) are Hölder
continuous with exponent 0 < γ ≤ 1.
• Q(x) is irreducible for each x ∈ Rr .
• For each i ∈ M, a(x, i) = σ(x, i)σ′(x, i) is symmetric and satisfies

⟨a(x, i)ξ, ξ⟩ ≥ κ|ξ|^2 , for all ξ ∈ Rr , (3.31)
with some positive constant κ ∈ R for all x ∈ Rr .
If for some U := D × J ⊂ Rr × M containing the point (x0 , `), where D
is a nonempty open and bounded set, R0 (x0 , `, U ) < ∞, then the process
(X(t), α(t)) is transient.
Remark 3.22. It follows from Propositions 3.19 and 3.21 that the process
(X(t), α(t)) is recurrent if and only if for every (x, i) ∈ Rr × M and every
ρ > 0, we have
R0 (x, i, B(x, ρ) × {i}) = ∞.
Next we obtain a sufficient condition for transience in terms of the exis-
tence of a Liapunov function.
Theorem 3.23. Assume that there exist a nonempty bounded domain D ⊂ Rr
and a function V (·, ·) : D c × M ↦ R satisfying

sup_{(x,i)∈∂D×M} V (x, i) ≤ 0,
LV (x, i) ≥ 0 for any (x, i) ∈ D c × M,
sup_{(x,i)∈D c ×M} V (x, i) ≤ M < ∞, (3.32)
V (y, `) > 0 for some (y, `) ∈ D c × M.

Then the process (X(t), α(t)) is either transient or not regular.
Proof. Assuming the process (X(t), α(t)) is regular, we need to show that
it is transient. Fix (y, `) ∈ D c × M with V (y, `) > 0. Define the stopping
times σD = σ_D^{y,`}, βn = β_n^{y,`}, and βn ∧ σD ∧ t as in the proof of Theorem 3.14,
with n > (n0 ∨ |y|), where n0 is an integer such that D ⊂ {x : |x| < n0 }
and β_n^{y,`} = inf{t : |X y,` (t)| = n}. By virtue of Dynkin's formula and (3.32),

EV (X(βn ∧ σD ∧ t), α(βn ∧ σD ∧ t)) − V (y, `)
= E ∫_0^{βn ∧σD ∧t} LV (X(u), α(u))du ≥ 0.

Therefore, we have from (3.32) that

V (y, `) ≤ EV (X(βn ∧ σD ∧ t), α(βn ∧ σD ∧ t))
= EV (X(σD ), α(σD ))I{σD ≤βn ∧t} + EV (X(βn ∧ t), α(βn ∧ t))I{σD >βn ∧t}
≤ M P {σD > βn ∧ t} .

Let An := {ω ∈ Ω : σD (ω) > βn (ω) ∧ t}. Then

P(An ) ≥ V (y, `)/M, for any n ≥ n0 .

Note that βn ≤ βn+1 implies An ⊃ An+1 . This, together with the regularity,
yields that

∩_{n=n0 }^∞ An = lim_{n→∞} An = {σD > t} .

Therefore,

P {σD > t} ≥ V (y, `)/M.

Finally, by letting t → ∞, we obtain that

P {σD = ∞} > 0.

Thus the process (X(t), α(t)) is transient. This completes the proof. 2
To proceed, we focus on linearizable (in the continuous component) sys-
tems. In addition to condition (A3.1), we also assume that the following
condition holds.

(A3.2) For each i ∈ M, there exist b(i), σj (i) ∈ Rr×r , j = 1, 2, . . . , d,
and Q̂ = (q̂ij ), a generator of a continuous-time Markov chain
α̂(t), such that as |x| → ∞,

b(x, i)/|x| = b(i) x/|x| + o(1),
σ(x, i)/|x| = (1/|x|) (σ1 (i)x, σ2 (i)x, . . . , σd (i)x) + o(1), (3.33)
Q(x) = Q̂ + o(1),

where o(1) → 0 as |x| → ∞. Moreover, α̂(t) is irreducible
with stationary distribution denoted by π = (π1 , π2 , . . . , πm0 ) ∈
R1×m0 .
As an application of Theorem 3.23, we have the following easily verifiable
condition for transience under conditions (A3.1) and (A3.2).
Theorem 3.24. Assume (A3.1) and (A3.2). If for each i ∈ M,

λmin( b(i) + b′(i) + Σ_{j=1}^d σj′(i)σj (i) ) − (1/2) Σ_{j=1}^d ρ^2(σj (i) + σj′(i)) > 0, (3.34)

where ρ(A) denotes the spectral radius of a matrix A, then the process
(X(t), α(t)) is transient.
Proof. Let D = {x ∈ Rr : |x| < k}, where k is a sufficiently large positive
number. Consider W (x, i) = k^β − |x|^β , (x, i) ∈ D c × M, where β < 0
is a sufficiently small constant. Then we have W (x, i) = 0 for all (x, i) ∈
∂D × M and k^β ≥ W (x, i) > 0 for all (x, i) ∈ D c × M. Thus conditions
(3.32) are verified. Detailed computations reveal that for x ≠ 0, we have

∇W (x, i) = −β|x|^{β−2} x,
∇^2 W (x, i) = −β ( |x|^{β−2} I + (β − 2)|x|^{β−4} xx′ ).
Hence it follows from (3.33) that for all (x, i) ∈ D c × M, we have

LW (x, i) = −β|x|^β [ x′b(i)x/|x|^2
+ (1/2) Σ_{j=1}^d ( x′σj′(i)σj (i)x/|x|^2 + (β − 2) (x′σj′(i)x)^2/|x|^4 ) + o(1) ]. (3.35)

Note that

x′b(i)x/|x|^2 + (1/2) Σ_{j=1}^d x′σj′(i)σj (i)x/|x|^2
= x′( b(i) + b′(i) + Σ_{j=1}^d σj′(i)σj (i) )x / (2|x|^2)
≥ (1/2) λmin( b(i) + b′(i) + Σ_{j=1}^d σj′(i)σj (i) ). (3.36)
Meanwhile, for any symmetric matrix A with real eigenvalues λ1 ≥ λ2 ≥
· · · ≥ λn , using the transformation x = U y with U a real orthogonal
matrix satisfying U ′AU = diag(λ1 , λ2 , . . . , λn ) (see [59, Theorem 8.1.1]),
we have

|x′Ax| = |λ1 y_1^2 + λ2 y_2^2 + · · · + λn y_n^2 | ≤ ρA |y|^2 = ρA |x|^2 , (3.37)

where ρA = max_k |λk | is the spectral radius of A. In particular, since
x′σj′(i)x = (1/2) x′(σj (i) + σj′(i))x, we obtain (x′σj′(i)x)^2 ≤
(1/4) ρ^2(σj (i) + σj′(i)) |x|^4 . Therefore it follows from (3.35)–(3.37) that
for all (x, i) ∈ D c × M,

LW (x, i) ≥ −β|x|^β [ (1/2) λmin( b(i) + b′(i) + Σ_{j=1}^d σj′(i)σj (i) )
− (1/4) Σ_{j=1}^d ρ^2(σj (i) + σj′(i)) + o(1) + O(β) ] ≥ 0,

where in the last step, we used the fact that β < 0 and condition (3.34).
Hence the second condition in (3.32) of Theorem 3.23 is valid. Thus Theo-
rem 3.23 implies that the process (X(t), α(t)) is not recurrent. 2
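Condition (3.34) is a matrix inequality that is easy to test numerically. The sketch below is illustrative rather than from the text: the regime data are made-up, the formula follows the form of (3.34) above, and ρ(A) is the spectral radius:

```python
import numpy as np

def spectral_radius(A):
    """rho(A): the largest modulus of the eigenvalues of A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def lhs_of_334(b, sigmas):
    """Left-hand side of (3.34) for a single regime i:
    lambda_min(b + b' + sum_j sigma_j' sigma_j) - (1/2) sum_j rho(sigma_j + sigma_j')**2."""
    S = b + b.T + sum(s.T @ s for s in sigmas)
    penalty = 0.5 * sum(spectral_radius(s + s.T) ** 2 for s in sigmas)
    return float(np.min(np.linalg.eigvalsh(S)) - penalty)

# strongly outward drift, mild noise: the transience condition holds
b = np.array([[2.0, 0.0], [0.0, 3.0]])
sigmas = [0.1 * np.eye(2)]
print(lhs_of_334(b, sigmas) > 0)   # → True
```

Reversing the drift (replacing b by −b) makes the same quantity negative, in which case Theorem 3.24 is silent and recurrence criteria such as Theorem 3.15 apply instead.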
3.4 Positive and Null Recurrence

This section takes up the positive recurrence issue. It entails the use of
appropriate Liapunov functions. Recall that the process Y (t) = (X(t), α(t))
is recurrent (resp., positive recurrent) with respect to some "cylinder" D ×
M if and only if it is recurrent (resp., positive recurrent) with respect to
D × {`}, where D ⊂ Rr is a nonempty open set with compact closure
and ` ∈ M. Thus the properties of recurrence and positive recurrence do
not depend on the choice of the open set D ⊂ Rr or ` ∈ M. The result
in Example 3.35 is quite interesting: it shows that the combination of
a transient diffusion and a positive recurrent diffusion can be a positive
recurrent switching diffusion.
3.4.1 General Criteria for Positive Recurrence

Theorem 3.25. A necessary and sufficient condition for positive recur-
rence with respect to a domain U = D × {`} ⊂ Rr × M is: For each i ∈ M,
there exists a nonnegative function V (·, i) : D c ↦ R such that V (·, i) is
twice continuously differentiable and that

LV (x, i) = −1, (x, i) ∈ D c × M. (3.38)

Let u(x, i) = Ex,i σD . Then u(x, i) is the smallest positive solution of

Lu(x, i) = −1, (x, i) ∈ D c × M,
u(x, i) = 0, (x, i) ∈ ∂D × M, (3.39)
where ∂D denotes the boundary of D.
Proof. The proof is divided into three steps.
Step 1: Show that the process Y (t) = (X(t), α(t)) is positive recurrent
if there exists a nonnegative function V (·, ·) satisfying the conditions of
the theorem. Choose n0 to be a positive integer sufficiently large so that
D ⊂ {|x| < n0 }. Fix any (x, i) ∈ D c × M. For any t > 0 and n ∈ N with
n > n0 , we define
σ_D^{(n)}(t) = σD ∧ t ∧ βn ,

where βn = inf{t : |X(t)| ≥ n} is defined as in (2.14) and σD is the first
entrance time of X(t) to D. That is, σD = inf{t : X(t) ∈ D}. Now Dynkin's
formula and equation (3.38) imply that

Ex,i V (X(σ_D^{(n)}(t)), α(σ_D^{(n)}(t))) − V (x, i)
= Ex,i ∫_0^{σ_D^{(n)}(t)} LV (X(s), α(s))ds = −Ex,i σ_D^{(n)}(t).
Note that the function V is nonnegative; hence we have Ex,i σ_D^{(n)}(t) ≤
V (x, i). Meanwhile, because the process Y (t) = (X(t), α(t)) is regular, it
follows that σ_D^{(n)}(t) → σD (t) a.s. as n → ∞, where σD (t) = σD ∧ t. By
virtue of Fatou's lemma, we obtain

Ex,i σD (t) ≤ V (x, i). (3.40)
Now the argument after equation (3.11) in the proof of Theorem 3.2 yields
that Ex,i σD ≤ V (x, i) < ∞. Then Lemma 3.10 implies that Ex,i σU =
Ex,i σD×{`} < ∞. Since (x, i) ∈ D c × M is arbitrary, we conclude that Y (t)
is positive recurrent with respect to U .
Step 2: Show that u(x, i) := Ex,i σD is the smallest positive solution of
(3.39). To this end, let n0 be defined as before, that is, a positive integer
sufficiently large so that D ⊂ {|x| < n0 }. For n ≥ n0 , set σ_D^{(n)} = σD ∧ βn .
Clearly, we have σ_D^{(n)} ≤ σ_D^{(n+1)} for all n ≥ n0 . Then the regularity of the
process Y (t) implies that σ_D^{(n)} ↑ σD a.s. as n → ∞. Hence the monotone
convergence theorem implies that as n → ∞,

Ex,i σ_D^{(n)} ↑ Ex,i σD . (3.41)
Note that Ex,i σD < ∞ from Step 1. Meanwhile, Lemma 3.6 implies that
the function un (x, i) = Ex,i σ_D^{(n)} solves the boundary value problem

Lun (x, i) = −1,
un (x, i)|x∈∂D = 0, (3.42)
un (x, i)||x|=n = 0, i ∈ M.

Thus the function vn (x, i) := un+1 (x, i) − un (x, i) is L-harmonic in the


(n) (n+1) (n)
domain (D c ∩{|x| < n})×M. Since σD ≤ σD , it follows that Ex,i σD ≤
(n+1)
Ex,i σD and hence vn (x, i) ≥ 0. Now (3.41) implies that

X
u(x, i) = un0 (x, i) + vk (x, i). (3.43)
k=n0

Using Harnack’s inequality for L-elliptic systems of equations (see [3, 25],
and also [158] for general references on elliptic systems), it can be shown by
a slight modification of the well-known arguments (see, e.g., [56, pp. 21–22])
that the sum of a convergent series of positive L-harmonic functions is also
an L-harmonic function. Hence we conclude that u(x, i) is twice continu-
ously differentiable and satisfies equation (3.39). To verify that u(x, i) is
the smallest positive solution of (3.39), let w(x, i) be any positive solution
of (3.39). Note that un (x, i) = Ex,i σ_D^{(n)} satisfies the boundary conditions

un (x, i)|x∈∂D = 0,
un (x, i)||x|=n = 0, i ∈ M.
Then the functions un (x, i) − w(x, i) for i ∈ M are L-harmonic and satisfy
un (x, i) − w(x, i) = 0 for (x, i) ∈ ∂D × M and un (x, i) − w(x, i) < 0 for
(x, i) ∈ {|x| = n} × M. Hence it follows from the maximum principle
for L-elliptic system of equations [138, p. 192] that un (x, i) ≤ w(x, i) in
(Dc ∩ {|x| < n}) × M for all n ≥ n0 . Letting n → ∞, we obtain u(x, i) ≤
w(x, i), as desired.
Step 3: Show that there exists a nonnegative function V satisfying the
conditions of the theorem if the process Y (t) = (X(t), α(t)) is positive
recurrent with respect to the domain U = D×{`}. Then Ex,i σD < ∞ for all
(x, i) ∈ D c ×M and consequently equation (3.43) and Harnack’s inequality
for L-elliptic system of equations [3, 25] imply that the bounded monotone
increasing sequence un (x, i) converges uniformly on every compact subset
of Dc × M. Moreover, its limit u(x, i) satisfies the equation Lu(x, i) = −1
for each i ∈ M. Therefore the function V (x, i) := u(x, i) satisfies equation
(3.38). This completes the proof of the theorem. 2
Theorem 3.26. A necessary and sufficient condition for positive recur-
rence with respect to a domain U = D × {`} ⊂ Rr × M is: For each i ∈ M,
there exists a nonnegative function V (·, i) : D c ↦ R such that V (·, i) is
twice continuously differentiable and that for some γ > 0,

LV (x, i) ≤ −γ, (x, i) ∈ D c × M. (3.44)

Proof. Necessity: This part follows immediately from the necessity of The-
orem 3.25 with γ = 1.
Sufficiency: Suppose that there exists a nonnegative function V satisfying
the conditions of the theorem. Define the stopping time σ_D^{(n)}(t) = σD ∧ t ∧ βn
as in the proof of Theorem 3.25. Now Dynkin's formula and equation (3.44)
imply that for any (x, i) ∈ D c × M,

Ex,i V (X(σ_D^{(n)}(t)), α(σ_D^{(n)}(t))) − V (x, i)
= Ex,i ∫_0^{σ_D^{(n)}(t)} LV (X(s), α(s))ds ≤ −γ Ex,i σ_D^{(n)}(t).

Hence we have by the nonnegativity of the function V that Ex,i σ_D^{(n)}(t) ≤
V (x, i)/γ. Meanwhile, the regularity of the process Y (t) = (X(t), α(t)) im-
plies that σ_D^{(n)}(t) → σD (t) a.s. as n → ∞, where σD (t) = σD ∧ t. Therefore
Fatou's lemma leads to Ex,i σD (t) ≤ V (x, i)/γ. Moreover, from the proof
of Theorem 3.25, σD (t) → σD a.s. as t → ∞. Thus we obtain

Ex,i σD ≤ V (x, i)/γ

by applying Fatou's lemma again. Then Lemma 3.10 implies that Ex,i σU =
Ex,i σD×{`} < ∞. Because (x, i) ∈ D c × M is arbitrary, we conclude that
Y (t) is positive recurrent with respect to U . This completes the proof of
the theorem. 2
Example 3.27. Let us continue our discussion from Example 1.1. Con-
sider (1.1) and assume that for each α ∈ M = {1, 2, . . . , m} and i, j =
1, 2, . . . , n with j ≠ i, aii (α) > 0 and aij (α) ≥ 0. Then for each i = 1, . . . , n
and α ∈ M, we have

−aii (α)x_i^2 + ( ri (α) + Σ_{j=1}^n aji (α) ) xi − bi (α)
≤ ( ri (α) + Σ_{j=1}^n aji (α) )^2 / (4aii (α)) − bi (α) := K̂i (α).

Denote

Ki (α) := K̂i (α) ∨ 0, for each i = 1, . . . , n and α = 1, . . . , m.

We can further find a number ρi (α) > 0 sufficiently large so that

−aii (α)x_i^2 + ( ri (α) + Σ_{j=1}^n aji (α) ) xi − bi (α)
≤ − Σ_{j=1}^n Kj (α) − 1, for any xi > ρi (α).
Denote

ρ := max {ρi (α), i = 1, . . . , n, α = 1, . . . , m} . (3.45)

Then the solution x(t) to (1.1) is positive recurrent with respect to the
domain

Eρ := {x ∈ Rn_+ : 0 < xi < ρ, i = 1, 2, . . . , n}.

We refer the reader to [189] for a detailed proof.
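The constants K̂i(α), Ki(α), and ρ above can be computed mechanically. The numbers below are hypothetical (n = 2 species and a single regime; none of them come from the text), and ρi is taken as the larger root of the quadratic −aii x² + (ri + Σj aji)x − bi = −Σj Kj − 1:

```python
import numpy as np

# hypothetical coefficients r_i, a_ij, b_i for n = 2 and a single regime
r = np.array([1.0, 0.5])
a = np.array([[2.0, 0.3],
              [0.4, 1.5]])             # a_ii > 0, a_ij >= 0
bv = np.array([0.2, 0.1])

s = r + a.sum(axis=0)                  # s_i = r_i + sum_j a_ji
K_hat = s ** 2 / (4 * np.diag(a)) - bv # maximum of the concave quadratic in x_i
K = np.maximum(K_hat, 0.0)             # K_i = K_hat_i v 0

# larger root of -a_ii x^2 + s_i x - b_i + (sum_j K_j + 1) = 0; beyond it the
# quadratic stays below -(sum_j K_j) - 1
C = K.sum() + 1.0 - bv
rho_i = (s + np.sqrt(s ** 2 + 4 * np.diag(a) * C)) / (2 * np.diag(a))
rho = float(rho_i.max())
```

Any x with some coordinate above `rho` then satisfies the drift bound used in the example, which is what the Liapunov argument of Theorem 3.26 exploits.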
3.4.2 Path Excursions

Applications of the positive recurrence criteria enable us to establish path
excursions of the underlying processes. Suppose that Y (t) = (X(t), α(t)) is
positive recurrent, and that the Liapunov functions V (x, i) (with i ∈ M)
are given in Theorem 3.26, as is the set D. Let D0 be a bounded open set
with compact closure satisfying D ⊂ D0 , and τ1 be the first exit time of
(X(t), α(t)) from D0 × M; that is, τ1 = min{t > 0 : X(t) ∉ D0 }. Define
σ1 = min{t > τ1 : X(t) ∈ D0 }. We can obtain

P( sup_{τ1 ≤t≤σ1 } V (X(t), α(t)) ≥ γ ) ≤ EV (X(τ1 ), α(τ1 ))/γ, for γ > 0,
E(σ1 − τ1 ) ≤ EV (X(τ1 ), α(τ1 ))/γ, (3.46)

where γ is as given in Theorem 3.26. The idea can be continued in the
following way for k ≥ 1. Define

τk+1 = min{t > σk : X(t) ∉ D0 }

and

σk+1 = min{t ≥ τk+1 : X(t) ∈ D0 }.

Then we can obtain estimates of E(σk+1 − τk+1 ), which give us the ex-
pected difference between the (k + 1)st return time and exit time.
3.4.3 Positive Recurrence under Linearization

This subsection is devoted to positive recurrence of regime-switching dif-
fusions under linearization with respect to the continuous component. By
linearizable systems, we mean systems that are linearizable with respect
to the continuous component. These systems are important inasmuch as
linearization is widely used in many applications, because it is much easier
to deal with linear systems.

With Theorem 3.26 at hand, we proceed to study positive recurrence of
linearizable (in the continuous component) systems as described in con-
dition (A3.2). We are ready to present an easily verifiable condition for
positive recurrence.

Theorem 3.28. Assume (A3.1) and (A3.2). If


m0
X  d
X 
πi λmax b(i) + b0 (i) + σj (i)σj0 (i) < 0, (3.47)
i=1 j=1

then the process (X(t), α(t)) is positive recurrent.

Proof. For notational simplicity, define the column vector


µ = (µ1 , µ2 , . . . , µm0 )0 ∈ Rm0
with
 Xd 
1
µi = λmax b(i) + b0 (i) + σj (i)σj0 (i) .
2 j=1

Let
m0  d 
1X 0
X
0
β := −πµ = − πi b(i) + b (i) + σj (i)σj (i) .
2 i=1 j=1

Note that β > 0 by (3.47). Because


π(µ + β1l) = πµ + β · π1l = −β + β = 0,
condition (A3.2) and Lemma A.12 yield that the equation
b = µ + β1l
Qc
has a solution c = (c1 , c2 , . . . , cm0 )0 ∈ Rm0 . Thus we have
m0
X
µi − qbij cj = −β, i ∈ M. (3.48)
j=1
For each i ∈ M, consider the Liapunov function

V (x, i) = (1 − γci )|x|^γ ,

where 0 < γ < 1 is sufficiently small so that 1 − γci > 0 for each i ∈ M. It
is readily seen that for each i ∈ M, V (·, i) is continuous, nonnegative, and
has continuous second partial derivatives with respect to x in any deleted
neighborhood of 0. Detailed calculations reveal that for x ≠ 0, we have

∇V (x, i) = (1 − γci )γ|x|^{γ−2} x,
∇^2 V (x, i) = (1 − γci )γ ( |x|^{γ−2} I + (γ − 2)|x|^{γ−4} xx′ ).

Meanwhile, it follows from (3.33) that

a(x, i)/|x|^2 = σ(x, i)σ′(x, i)/|x|^2 = Σ_{j=1}^d σj (i)xx′σj′(i)/|x|^2 + o(1),

where o(1) → 0 as |x| → ∞. Therefore, we have that

Xd 
γ
LV (x, i) = (1 − γci ) x0 σj0 (i)|x|γ−2 Iσj (i)x
2 j=1

+x0 σj0 (i)(γ − 2)|x|γ−4 xx0 σj (i)x
+(1 − γci )γ|x|γ−2 x0 b(i)x + o(|x|γ )
X
− qij (x)|x|γ γ(cj − ci )
j6=i  !
1 X d
x0 σj0 (i)σj (i)x (x0 σj0 (i)x)2
γ
= γ(1 − γci )|x| + (γ − 2)
2
j=1
|x|2 |x|4

x0 b(i)x X cj − c i 
+ − q ij (x) + o(1) ,
|x|2 1 − γci 
j6=i
(3.49)
with o(1) → 0 as |x| → ∞. Note that
x′b(i)x/|x|^2 + (1/2) Σ_{j=1}^d x′σj′(i)σj (i)x/|x|^2
≤ (1/2) λmax( b(i) + b′(i) + Σ_{j=1}^d σj′(i)σj (i) ) = µi . (3.50)

Next, using condition (A3.2),
X cj − c i
qij (x)
1 − γci
j6=i
m0
X X ci (cj − ci )
= qij (x)cj + qij (x) γ (3.51)
j=1
1 − γci
j6=i
m0
X
= qbij cj + O(γ) + o(1),
j=1
where O(γ)/γ is bounded and o(1) → 0 as |x| → ∞ and γ → 0. Hence it
follows from (3.49)–(3.51) that when |x| > R with R sufficiently large and
0 < γ < 1 sufficiently small, we have

LV (x, i) ≤ γ(1 − γci )|x|^γ [ µi − Σ_{j=1}^{m0} q̂ij cj + o(1) + O(γ) ].

Furthermore, by virtue of (3.48), we have

LV (x, i) ≤ γ(1 − γci )|x|^γ (−β + o(1) + O(γ)) ≤ −K < 0,
for any (x, i) ∈ Rr × M with |x| > R, where K is a positive constant.
Therefore we conclude from Theorem 3.26 that the process (X(t), α(t)) is
positive recurrent. 2
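Condition (3.47) weighs the per-regime growth rates by the stationary distribution π of α̂(t), so a regime that is transient on its own is tolerated as long as the chain spends enough time in stabilizing regimes. A numerical sketch (hypothetical data, not from the text; λmax is taken of b(i) + b′(i) + Σj σj′(i)σj(i), following the form of (3.50)):

```python
import numpy as np

def stationary_distribution(Q_hat):
    """pi with pi @ Q_hat = 0 and sum(pi) = 1, for an irreducible generator."""
    m = Q_hat.shape[0]
    A = np.vstack([Q_hat.T, np.ones(m)])
    rhs = np.zeros(m + 1)
    rhs[-1] = 1.0
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def lhs_of_347(bs, sigmas_per_regime, Q_hat):
    """Left-hand side of (3.47):
    sum_i pi_i lambda_max(b(i) + b(i)' + sum_j sigma_j' sigma_j)."""
    pi = stationary_distribution(Q_hat)
    lam = [float(np.max(np.linalg.eigvalsh(b + b.T + sum(s.T @ s for s in sig))))
           for b, sig in zip(bs, sigmas_per_regime)]
    return float(pi @ np.array(lam))

# regime 0 expands (transient alone), regime 1 contracts strongly; the chain
# sits in regime 1 ninety percent of the time, so (3.47) still holds
Q_hat = np.array([[-9.0, 9.0], [1.0, -1.0]])      # pi = (0.1, 0.9)
bs = [0.5 * np.eye(2), -2.0 * np.eye(2)]
sigmas = [[0.1 * np.eye(2)], [0.1 * np.eye(2)]]
print(lhs_of_347(bs, sigmas, Q_hat) < 0)          # → True
```

This is the same averaging effect noted for Example 3.35 at the start of Section 3.4: mixing a transient regime with a positive recurrent one can still yield a positive recurrent switching diffusion.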
The above result further specializes to the following corollary.

Corollary 3.29. Suppose that the continuous component X(t) is one-
dimensional and that as |x| → ∞,

b(x, i)/|x| = bi x/|x| + o(1),
σ(x, i)/|x| = σi x/|x| + o(1), (3.52)

for some constants bi , σi , i ∈ M. If

πb − (1/2)πσ^2 := Σ_{i=1}^{m0} πi ( bi − σ_i^2/2 ) < 0, (3.53)

where π = (π1 , . . . , πm0 ) ∈ R1×m0 is as in condition (A3.2), then the process
(X(t), α(t)) is positive recurrent.
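In the scalar case the check collapses to a single weighted sum. The regime data below are made-up for illustration (not from the text):

```python
import numpy as np

pi = np.array([0.3, 0.7])        # stationary distribution of the limit chain
b = np.array([0.4, -0.6])        # regime 1 expands, regime 2 reverts
sigma = np.array([0.5, 1.0])

# left-hand side of (3.53): sum_i pi_i (b_i - sigma_i^2 / 2)
lhs = float(pi @ (b - sigma ** 2 / 2))
print(lhs)                       # negative, so (3.53) holds
```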
Next we develop a sufficient condition for nonpositive recurrence.

Theorem 3.30. Suppose that there exists a nonempty bounded domain
D such that there exist functions V (x, i) and W (x, i) defined on D c × M
satisfying

(a) V (x, i) ≥ 0 for all (x, i) ∈ D c × M, and for some positive constant k,

0 ≤ LV (x, i) ≤ k for all (x, i) ∈ D c × M;

(b) W (x, i) ≤ 0 for all (x, i) ∈ ∂D × M, and

LW (x, i) ≥ 0 for all (x, i) ∈ D c × M;

(c) for an increasing sequence of bounded domains En ⊃ D with bound-
aries Γn ,

inf_{(x,i)∈Γn ×M} V (x, i) / sup_{(x,i)∈Γn ×M} W (x, i) = Rn → ∞ as n → ∞. (3.54)

If there exists some (x, α) ∈ D c × M satisfying W (x, α) > 0, then the
process (X(t), α(t)) is not positive recurrent. That is, EσD = ∞ for all
(x, α) ∈ D c × M such that W (x, α) > 0, where

σD = σ_D^{x,α} = inf {t ≥ 0 : X x,α (t) ∈ D} .
Proof. Consider the function V − Rn W in (D c ∩ En ) × M. Then from
conditions (a), (b), and (c), we have

L(V − Rn W )(x, i) ≤ k,
(V − Rn W )(x, i)|x∈∂D ≥ 0,
(V − Rn W )(x, i)|x∈Γn ≥ 0.

Fix any (x, α) ∈ D c × M satisfying W (x, α) > 0. Then it follows that
P {σD ∧ βn < ∞} = 1 for any n > (n0 ∨ |x|), where n0 is a sufficiently
large integer such that D ⊂ {x ∈ Rr : |x| < n0 } and βn is as in (2.14).
Hence Dynkin's formula implies that

E(V − Rn W )(X(σD ∧ βn ), α(σD ∧ βn )) − (V − Rn W )(x, α)
= E ∫_0^{σD ∧βn} L(V − Rn W )(X(u), α(u))du ≤ kE[σD ∧ βn ].

That is,

E[σD ∧ βn ] ≥ (1/k) [Rn W (x, α) − V (x, α)].

Therefore, we have from (3.54) that

E[σD ∧ βn ] → ∞, as n → ∞.

Meanwhile, note that σD ≥ σD ∧ βn . We have

EσD = ∞.

Therefore the process (X(t), α(t)) is not positive recurrent by virtue of
Theorem 3.12. 2
3.4.4 Null Recurrence

Null recurrence is more complex. Even for diffusions alone, the available
results are scarce. For simplicity, we assume here that the continuous com-
ponent of the system is a diffusion without drift term.

Theorem 3.31. Consider a real-valued regime-switching diffusion process

dX(t) = σ(X(t), α(t))dw(t),
P{α(t + ∆) = j|α(t) = i, X(s), α(s), s ≤ t} = qij (X(t))∆ + o(∆). (3.55)

If for some constants 0 ≤ β ≤ 1, k1 > 0, and k2 > 0,

σ^2 (x, i) ≤ k1 |x|^{1−β} , for all (x, i) ∈ {x ∈ R : |x| ≥ k2 } × M, (3.56)

then the process (X(t), α(t)) defined by (3.55) is null recurrent.
Proof. In fact, by Theorem 3.12, it suffices to prove that the process is
null recurrent with respect to D × {`}, where D = (−k2 , k2 ) and ` ∈ M.
For each i ∈ M, define the Liapunov function

V (x, i) = |x|^{1+β} , if β > 0,
V (x, i) = |x|(ln |x| − 1) + K, if β = 0,

where K is a sufficiently large positive constant, and

W (x, i) = |x| − k2 for (x, i) ∈ D c × M.
Then detailed computations show that the nonnegative function W satisfies


(3.23) in Theorem 3.14. This implies that the process is recurrent with
respect to D × {`}.
Similarly, by virtue of (3.56), detailed computations show that conditions
(a)–(c) of Theorem 3.30 are satisfied with the functions V and W as chosen
above. It thus follows from Theorem 3.30 that the process (X(t), α(t)) is
not positive recurrent with respect to D × {`}. The details are omitted.
Therefore the process defined by (3.55) is null recurrent. □
Remark 3.32. If for some positive constants ϱ, k1, and k2, we have

    σ²(x, i) ≥ k1|x|^{1+ϱ},  for all (x, i) ∈ {x ∈ R : |x| ≥ k2} × M,   (3.57)

then the process (X(t), α(t)) defined by (3.55) is positive recurrent.

To verify this, we choose 1 > ς > 0 such that ς + ϱ > 1 and define
V(x, i) = |x|^ς for (x, i) ∈ {x ∈ R : |x| ≥ k2} × M. Then we can show

    LV(x, i) = (1/2)ς(ς − 1)|x|^{ς−2}σ²(x, i) ≤ (1/2)k1ς(ς − 1)k2^{ς+ϱ−1} < 0,

where (x, i) ∈ {x ∈ R : |x| ≥ k2} × M. Therefore, it follows from Theo-
rem 3.28 that the process (X(t), α(t)) is positive recurrent.

3.5 Examples
In this section, we provide several examples to illustrate the results obtained
thus far.
Example 3.33. Suppose that for each x ∈ Rr and each i ∈ M, there exist
positive constants c and γ such that for all x with |x| ≥ c,

    b′(x, i) x/|x| < −γ,                                    (3.58)

where |x| denotes the norm of x. That is, the drifts point inward.
Then the process Y(t) = (X(t), α(t)) is positive recurrent.

First note that (3.5) implies that for all x with |x| ≥ r/(γκ1), we have

    tr(a(x, i)) = Σ_{j=1}^{r} e_j′ a(x, i) e_j ≤ Σ_{j=1}^{r} κ1^{−1} ≤ γ|x|.

By Theorem 3.12, it is enough to prove that the process Y(t) = (X(t), α(t))
is positive recurrent with respect to the domain U := {|x| < ϱ} × {ℓ}
for some ℓ ∈ M, where ϱ := max{c, r/(γκ1)}. To this end, consider the
function

    V(x, i) = (1/2)x′x,  for each i ∈ M and for all |x| ≥ ϱ.

For each i ∈ M, ∇V(·, i) = x and ∇²V(·, i) = I, where I is the r × r identity
matrix. Thus by the definition of L, we have for all (x, i) ∈ {|x| ≥ ϱ} × M
that

    LV(x, i) = (1/2)tr(a(x, i)) + b′(x, i)(x/|x|)·|x|
             < (1/2)γ|x| − γ|x|
             = −(1/2)γ|x| ≤ −(1/2)γϱ.
Then the conclusion follows from Theorem 3.26 immediately.
Remark 3.34. Suppose that the diffusion component X(t) of the process
Y(t) = (X(t), α(t)) is one-dimensional and that there exist constants c0 > 0
and c1 > 0 such that for each i ∈ M,

    b(x, i) < −c1,  for x > c0,
    b(x, i) > c1,   for x < −c0.                            (3.59)

Then the process Y(t) = (X(t), α(t)) is positive recurrent. In fact, the
conclusion follows immediately if we observe that (3.59) satisfies (3.58).
Alternatively, we can verify this directly by defining the Liapunov function
V(x, i) = |x| for each i ∈ M.
Example 3.35. To illustrate the utility of Theorem 3.26, consider a real-
valued process

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),           (3.60)

where α(t) is a two-state random jump process, with x-dependent generator

    Q(x) = ⎡ −1/3 − (1/4)cos x      1/3 + (1/4)cos x ⎤
           ⎣  7/3 + (1/2)sin x     −7/3 − (1/2)sin x ⎦,

and
b(x, 1) = −x, σ(x, 1) = 1, b(x, 2) = x, σ(x, 2) = 1.
Thus (3.60) can be regarded as the result of the following two diffusions:
dX(t) = −X(t)dt + dw(t), (3.61)
and
dX(t) = X(t)dt + dw(t), (3.62)
switching back and forth from one to the other according to the movement
of α(t).
Note that (3.61) is positive recurrent whereas (3.62) is a transient diffu-
sion process. But, the switching diffusion (3.60) is positive recurrent. We
verify these as follows. Consider the Liapunov function V (x, 1) = |x|. Let
L1 be the operator associated with (3.61). Then we have for all |x| ≥ 1,
L1 V (x, 1) = −x sign(x) = −|x| ≤ −1 < 0. It follows from [83, Theorem
3.7.3] that (3.61) is positive recurrent. Recall that the real-valued diffusion
process
    dX(t) = b(X(t))dt + σ(X(t))dw(t)

with σ(x) ≠ 0 for all x ∈ R, is recurrent if and only if

    ∫_0^x exp( −2 ∫_0^u b(z)/σ²(z) dz ) du → ±∞            (3.63)

as x → ±∞; see [83, p. 105]. Direct computation shows that (3.62) fails to
satisfy this condition and hence is transient.
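The recurrence test (3.63) is easy to check numerically for the two drifts in (3.61) and (3.62). The Python sketch below is an added illustration (function names and step counts are arbitrary choices, and σ ≡ 1 as in both equations): for b(z) = −z the integrand exp(u²) makes the outer integral blow up, consistent with recurrence of (3.61), while for b(z) = z the integral converges to √π/2, so (3.62) is transient.

```python
import math

def scale_integral(b, x, n=20000):
    """Approximate the outer integral of (3.63) with sigma = 1:
    int_0^x exp(-2 int_0^u b(z) dz) du, by the trapezoidal rule,
    accumulating the inner integral step by step."""
    h = x / n
    total = 0.0
    inner = 0.0                     # running value of int_0^u b(z) dz
    prev = math.exp(-2.0 * inner)   # integrand at u = 0
    for k in range(1, n + 1):
        u = k * h
        inner += 0.5 * h * (b(u - h) + b(u))   # trapezoid over [u-h, u]
        cur = math.exp(-2.0 * inner)
        total += 0.5 * h * (prev + cur)
        prev = cur
    return total

# b(z) = -z, as in (3.61): integrand exp(u^2), the integral diverges
grow = [scale_integral(lambda z: -z, x) for x in (2.0, 4.0, 6.0)]
# b(z) = z, as in (3.62): integrand exp(-u^2), the integral tends to sqrt(pi)/2
flat = [scale_integral(lambda z: z, x) for x in (2.0, 4.0, 6.0)]

print(grow[2] > 1e6, abs(flat[2] - math.sqrt(math.pi) / 2) < 1e-3)
```

The trapezoidal rule is crude, but only the dichotomy divergence versus convergence matters here.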
Next, we use Theorem 3.26 to demonstrate that the switching diffusion
(3.60) is positive recurrent for appropriate Q. Consider Liapunov functions
    V(x, 1) = |x|,   V(x, 2) = (7/3)|x|.

We have

    LV(x, 1) = −x·sign(x) + (1/3 + (1/4)cos x)((7/3) − 1)|x| ≤ −(2/9)|x| ≤ −2/9,
    LV(x, 2) = (7/3)x·sign(x) + (7/3 + (1/2)sin x)(1 − (7/3))|x| ≤ −(1/9)|x| ≤ −1/9,
for all |x| ≥ 1. Thus the switching diffusion (3.60) is positive recurrent by
Theorem 3.26.
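The behavior in Example 3.35 can also be observed by simulating (3.60) with the Euler–Maruyama scheme, toggling the discrete component with probability approximately q_{ij}(x)∆ per step as in (3.4). This is an added sketch, not part of the original example; step size, horizon, bound, and seed are arbitrary choices.

```python
import math
import random

def q12(x): return 1.0 / 3.0 + 0.25 * math.cos(x)   # switching rate 1 -> 2
def q21(x): return 7.0 / 3.0 + 0.5 * math.sin(x)    # switching rate 2 -> 1

def b(x, i): return -x if i == 1 else x             # drifts of (3.61)/(3.62)
def sigma(x, i): return 1.0

def simulate(x0=3.0, i0=1, dt=1e-3, n_steps=10_000, seed=7):
    rng = random.Random(seed)
    x, i = x0, i0
    path = [x]
    for _ in range(n_steps):
        # diffusion step in the current regime
        x += b(x, i) * dt + sigma(x, i) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # regime switch with probability q_{ij}(x) * dt + o(dt)
        rate = q12(x) if i == 1 else q21(x)
        if rng.random() < rate * dt:
            i = 3 - i   # toggle between states 1 and 2
        path.append(x)
    return path

path = simulate()
print(len(path))   # 10001
```

The simulated trajectory keeps returning toward the origin, as positive recurrence suggests, even though it grows during excursions in the unstable regime 2.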
Example 3.36. To illustrate the result of Corollary 3.29, we consider a
real-valued process given by

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    P{α(t + ∆) = j | α(t) = i, X(s), α(s), s ≤ t} = q_{ij}(X(t))∆ + o(∆),
                                                            (3.64)

with the following specifications. The jump component α(t) has three states
and is generated by

    Q(x) = ⎡ −(2+|x|)/(1+x²) − (1+3x²)/(2+x²)    (2+|x|)/(1+x²)                      (1+3x²)/(2+x²) ⎤
           ⎢ 1 − sin x/(1+x²)                    sin x/(1+x²) − 3 − cos x/(2+|x|)    2 + cos x/(2+|x|) ⎥
           ⎣ (1+cos²x)/(2+x²)                    2 − (cos²x + 1)/(2+x²)              −2 ⎦,

and the drift and diffusion coefficients are given by

    b(x, 1) = 3x − 1,   b(x, 2) = (3/2)x + 1,   b(x, 3) = −x + x/(1 + |x|),
    σ(x, 1) = √(3 + x²),   σ(x, 2) = √(2 − sin x + 2x²),   σ(x, 3) = 3 + √(4 + x²).
Hence associated with (3.64), there are three diffusions

    dX(t) = (3X(t) − 1)dt + √(3 + X²(t)) dw(t),                        (3.65)
    dX(t) = ((3/2)X(t) + 1)dt + √(2 − sin(X(t)) + 2X²(t)) dw(t),       (3.66)
    dX(t) = (X(t)/(1 + |X(t)|) − X(t))dt + (3 + √(4 + X²(t))) dw(t),   (3.67)

switching back and forth from one to another according to the movement
of the jump component α(t).
Note that (3.67) is positive recurrent, whereas (3.65) and (3.66) are tran-
sient diffusions. But due to the stabilization effect of α(t), the switching
diffusion (3.64) is positive recurrent. We verify these as follows. Consider
the Liapunov function V(x) = |x|^γ with 0 < γ < 1 sufficiently small; let L1
be the operator associated with the third equation, (3.67). Detailed com-
putation shows that for all |x| ≥ 1, we have L1V(x) ≤ −(1/2)γ < 0. Thus it
follows from [83, Theorem 3.7.3] that (3.67) is positive recurrent. Detailed
computations show that (3.65) and (3.66) fail to satisfy (3.63) and hence
these diffusions are transient.
Next we use Corollary 3.29 to demonstrate that the switching diffusion
(3.64) is positive recurrent. In fact, it is readily seen that as x → ∞, the
constants b(i) and σ 2 (i), i = 1, 2, 3, as in (3.52) are given by
    b(1) = 3,   b(2) = 3/2,   b(3) = −1,
    σ²(1) = 1,   σ²(2) = 2,   σ²(3) = 1.                    (3.68)
In addition, as |x| → ∞, Q(x) tends to
 
    Q̂ = ⎡ −3    0    3 ⎤
         ⎢  1   −3    2 ⎥
         ⎣  0    2   −2 ⎦.

By solving the system of equations

    πQ̂ = 0,   π𝟙 = 1,

we obtain the stationary distribution π associated with Q̂:

    π = (2/17, 6/17, 9/17).                                 (3.69)
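The stationary distribution (3.69) can be reproduced mechanically: replace one balance equation of πQ̂ = 0 by the normalization π𝟙 = 1 and solve the linear system. A sketch in exact rational arithmetic (added for illustration; the elimination routine is generic):

```python
from fractions import Fraction as F

Q_hat = [[-3, 0, 3],
         [1, -3, 2],
         [0, 2, -2]]

# Solve pi * Q_hat = 0 with pi summing to 1: transpose the balance
# equations, replace the last one by the normalization row, and do
# exact Gauss-Jordan elimination.
A = [[F(Q_hat[j][i]) for j in range(3)] for i in range(3)]   # transpose
A[2] = [F(1), F(1), F(1)]             # normalization pi_1+pi_2+pi_3 = 1
rhs = [F(0), F(0), F(1)]

for col in range(3):                  # elimination with partial pivoting
    piv = next(r for r in range(col, 3) if A[r][col] != 0)
    A[col], A[piv] = A[piv], A[col]
    rhs[col], rhs[piv] = rhs[piv], rhs[col]
    for r in range(3):
        if r != col and A[r][col] != 0:
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
            rhs[r] -= factor * rhs[col]

pi = [rhs[r] / A[r][r] for r in range(3)]
print(pi)   # [Fraction(2, 17), Fraction(6, 17), Fraction(9, 17)]
```

Exact rationals avoid any rounding question when comparing with (3.69).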

Thus by virtue of (3.68) and (3.69), observe that

    Σ_{i=1}^{3} π_i (b(i) − σ²(i)/2) = −23/34 < 0.

Thus Corollary 3.29 implies that (3.64) is positive recurrent; see the sample
path demonstrated in Figure 3.1.


FIGURE 3.1. Sample path of switching diffusion (3.64) with initial condition
(x, α) = (3, 1).

Example 3.37. Consider a two-dimensional (in the continuous component)
regime-switching diffusion

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    P{α(t + ∆) = j | α(t) = i, X(s), α(s), s ≤ t} = q_{ij}(X(t))∆ + o(∆),
                                                            (3.70)

with (X(t), α(t)) ∈ R² × {1, 2}. The discrete component α(t) is generated
by

    Q(x1, x2) = ⎡ −3 + 2(1 + cos(x1²))/(3 + x1² + x2²)     3 − 2(1 + cos(x1²))/(3 + x1² + x2²) ⎤
                ⎣ 1 + 1/√(1 + x1² + x2²)                  −1 − 1/√(1 + x1² + x2²)             ⎦,

and

    b(x1, x2, 1) = (−x1 + 2x2, 2x2)′,
    σ(x1, x2, 1) = ⎡ 3 + sin x1 + √(2 + x2²)       0                         ⎤
                   ⎣ 0                             2 − cos x2 + √(1 + x1²)   ⎦,
    b(x1, x2, 2) = (−3x1 − x2, x1 − 2x2)′,
    σ(x1, x2, 2) = ⎡ 1     1                ⎤
                   ⎣ 0     10 + √(3 + x1²)  ⎦.

Associated with the regime-switching diffusion (3.70), there are two diffu-
sions
dX(t) = b(X(t), 1)dt + σ(X(t), 1)dw(t), (3.71)
and
dX(t) = b(X(t), 2)dt + σ(X(t), 2)dw(t), (3.72)
switching back and forth from one to another according to the movement of
the jump component α(t), where w(t) = (w1 (t), w2 (t))0 is a two-dimensional
standard Brownian motion. By selecting appropriate Liapunov functions as
in [83, Section 3.8], or using the criteria in [15], we can verify that (3.71)
is transient and (3.72) is positive recurrent.
Next we use Theorem 3.28 to show that the switching diffusion (3.70)
is positive recurrent owing to the presence of the stabilizing effect of the
discrete component α(t). Note that the matrices b(i) and σj (i), i, j = 1, 2,
and Q̂ as in condition (A3.2) are

    b(1) = ⎡ −1   2 ⎤ ,    b(2) = ⎡ −3  −1 ⎤ ,    σ1(2) = 0,
           ⎣  0   2 ⎦             ⎣  1  −2 ⎦

    σ1(1) = σ2(2) = σ2′(1) = ⎡ 0  1 ⎤ ,    and    Q̂ = ⎡ −3   3 ⎤ .
                             ⎣ 0  0 ⎦                 ⎣  1  −1 ⎦

Thus the stationary distribution associated with Q̂ is π = (0.25, 0.75), and

    λ_max(b(1) + b′(1) + σ1(1)σ1′(1) + σ2(1)σ2′(1)) = 5.6056,
    λ_max(b(2) + b′(2) + σ1(2)σ1′(2) + σ2(2)σ2′(2)) = −4.

This yields that

    Σ_{i=1}^{2} π_i λ_max(b(i) + b′(i) + σ1(i)σ1′(i) + σ2(i)σ2′(i)) = −1.5986 < 0.

Therefore, we conclude from Theorem 3.28 that the switching diffusion


(3.70) is positive recurrent. For comparison, we begin by considering a
sample path of the switching diffusion. Next treating t as a parameter and
eliminating it from the two components x1 and x2 , we plot a curve of x2
versus x1 in the “phase space.” Borrowing the terminology from ordinary
differential equations, and abusing the terminology slightly, we still call such
plots phase portraits henceforth. The phase portrait Figure 3.2 confirms
our findings. For comparison and better visualization, we also present the
componentwise sample paths in Figure 3.3 (a) and (b).
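The eigenvalue computations above can be rechecked with the closed form λ_max = (tr M + √((tr M)² − 4 det M))/2 for a symmetric 2 × 2 matrix M. The sketch below (added for illustration; helper names are ad hoc) reproduces 5.6056, −4, and −1.5986:

```python
import math

def lam_max_2x2(m):
    """Largest eigenvalue of a symmetric 2x2 matrix via trace/determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))

def mat_add(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(2)] for i in range(2)]

def mat_mul_t(a):   # a * a'
    return [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

b1 = [[-1, 2], [0, 2]]
b2 = [[-3, -1], [1, -2]]
s11 = [[0, 1], [0, 0]]              # sigma_1(1)
s21 = transpose([[0, 1], [0, 0]])   # sigma_2(1), since sigma_2'(1) is given
s12 = [[0, 0], [0, 0]]              # sigma_1(2) = 0
s22 = [[0, 1], [0, 0]]              # sigma_2(2)

lam1 = lam_max_2x2(mat_add(b1, transpose(b1), mat_mul_t(s11), mat_mul_t(s21)))
lam2 = lam_max_2x2(mat_add(b2, transpose(b2), mat_mul_t(s12), mat_mul_t(s22)))
total = 0.25 * lam1 + 0.75 * lam2
print(round(lam1, 4), lam2, round(total, 4))   # 5.6056 -4.0 -1.5986
```

The exact value of the first eigenvalue is 2 + √13.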


FIGURE 3.2. Phase portrait of switching diffusion (3.70) with initial condition
(x, α), where x = [2.5, 2.5]0 and α = 1.

3.6 Proofs of Several Results

Proof of Lemma 3.7. It suffices to prove the lemma when E∪∂E ⊂ D and
∂E is sufficiently smooth. Fix any (x, i) ∈ E c × M. Let G ⊂ Rr be an open


(a) Sample path (X1 (t), α(t)) of (3.70).


(b) Sample path (X2 (t), α(t)) of (3.70).

FIGURE 3.3. Componentwise sample path of switching diffusion (3.70) with ini-
tial condition (x, α), where x = [2.5, 2.5]0 and α = 1.

and bounded set with sufficiently smooth boundary such that D ∪ ∂D ⊂ G.


Without loss of generality, we may further assume that (x, i) ∈ G × M.
Define a sequence of stopping times by

ς1 := inf{t ≥ 0 : X(t) ∈ ∂G}, (3.73)

and for n = 1, 2, . . .,

    ς2n := inf{t ≥ ς2n−1 : X(t) ∈ ∂D},
    ς2n+1 := inf{t ≥ ς2n : X(t) ∈ ∂G}.                      (3.74)

It follows from (3.17) and Theorem 3.2 that ςn < ∞ a.s. Px,i for n =
1, 2, . . . Let H := G − E and define u(x, i) := Px,i {X(τH ) ∈ ∂E}. Note
that u(x, j)|x∈∂E = 1 and u(x, j)|x∈∂G = 0 for all j ∈ M. Therefore, it
follows that
    u(x, i) = Σ_{j=1}^{m0} ∫_{∂E} Px,i{(X(τH), α(τH)) ∈ (dy × {j})} u(y, j)
              + Σ_{j=1}^{m0} ∫_{∂G} Px,i{(X(τH), α(τH)) ∈ (dy × {j})} u(y, j)
            = Ex,i u(X(τH), α(τH)).

Thus u(x, i) ≥ 0 is L-harmonic in H × M by Lemma 3.5. Moreover u is


not identically zero since u(x, i) = 1 for (x, i) ∈ ∂E × M. Therefore the
maximum principle for L-harmonic functions [52] implies that

    inf_{(x,i)∈K×M} u(x, i) ≥ δ1 > 0,                       (3.75)

where K is some compact subset of H containing x and ∂D. Define

A0 := {X(t) ∈ ∂E, for some t ∈ [0, ς1 )}, (3.76)

and for n = 1, 2, . . .,

An := {X(t) ∈ ∂E, for some t ∈ [ς2n , ς2n+1 )}. (3.77)

Note that the event Ac0 implies that X(τH ) = X(ς1 ) ∈ ∂G. Hence we have
from (3.75) that

Px,i (Ac0 ) ≤ Px,i (X(τH ) ∈ ∂G) = 1 − u(x, i) ≤ 1 − δ1 .

Then it follows from the strong Markov property and (3.75) that
    Px,i{ ∩_{k=0}^{n} A_k^c } ≤ (1 − δ1)^{n+1}.             (3.78)

Thus, we have

    Px,i{σE = ∞} = Px,i{X(t) ∉ ∂E, for any t ≥ 0}
                 ≤ lim_{n→∞} Px,i{ ∩_{k=0}^{n} A_k^c }
                 ≤ lim_{n→∞} (1 − δ1)^{n+1} = 0.

It follows that Px,i{σE < ∞} = 1 as desired. □


Proof of Lemma 3.8. As in Lemma 3.7, it is enough to prove the lemma
when E ∪ ∂E ⊂ D and ∂E is sufficiently smooth. Fix any (x, i) ∈ E c ×
M. Let G ⊂ Rr be an open and bounded set with sufficiently smooth
boundary such that D ∪ ∂D ⊂ G. As in the proof of Lemma 3.7, we may
further assume that (x, i) ∈ G × M. Define stopping times ς1 , ς2 , . . . and
events A0 , A1 , A2 , . . . as in (3.73), (3.74), (3.76), and (3.77) in the proof of
Lemma 3.7. It follows from (3.18) and Lemma 3.7 that Px,i {σE < ∞} = 1.
Note that if ς2n < σE < ς2n+1, then the event ∩_{k=0}^{n−1} A_k^c happens a.s. Hence,
it follows from (3.78) that

    Px,i{ς2n < σE < ς2n+1} ≤ Px,i{ ∩_{k=0}^{n−1} A_k^c } ≤ (1 − δ1)^n.

Therefore, we have

    Ex,i τ_{E^c} = Ex,i[σE I_{{0<σE<ς1}}] + Σ_{n=1}^{∞} Ex,i[σE I_{{ς2n<σE<ς2n+1}}]
                ≤ Px,i[0 < σE < ς1] Ex,i ς1 + Σ_{n=1}^{∞} Px,i[ς2n < σE < ς2n+1] Ex,i ς2n+1
                ≤ Σ_{n=0}^{∞} (1 − δ1)^n Ex,i ς2n+1,

where I_A is the indicator of the set A. In what follows, denote by M_i
(i = 1, 2, 3) positive real numbers. Because (x, i) ∈ G × M, it follows from
Theorem 3.2 that Ex,i ς1 = Ex,i τG ≤ M1 < ∞. Consequently, using σD and
τG defined in (3.7) (where τG is defined with G replacing D),

    Ex,i ς3 = Ex,i ς1 + Ex,i E_{X(ς1),α(ς1)}(ς3 − ς1)
            ≤ M1 + sup_{(y,j)∈∂G×M} Ey,j σD + sup_{(z,k)∈∂D×M} Ez,k τG
            ≤ M1 + M2 + M3 ≤ 2M,

where M = max{M1 , M2 + M3 } < ∞. Note that in the above deductions,


we used equation (3.18) and Theorem 3.2. Likewise, in general, we have

Ex,i ς2n+1 ≤ (n + 1)M for any n = 1, 2, . . . Therefore, it follows that

    Ex,i σE ≤ Σ_{n=0}^{∞} (1 − δ1)^n (n + 1)M < ∞.

This completes the proof of the lemma. □


Proof of Lemma 3.9. Fix any ` ∈ M. It suffices to prove (3.20) when
(x, i) ∈ D × (M − {`}) because the process Y (t) = (X(t), α(t)), starting
from (y, j) ∈ D c × M, will reach D × M in finite time a.s. Py,j by (3.19).
Choose ε > 0 sufficiently small such that B ⊂ B̄ ⊂ B1 ⊂ B̄1 ⊂ D, where

B = B(x, ε) = {y ∈ Rr : |y − x| < ε}, and B1 = B(x, 2ε). (3.79)

Redefine
ς1 := inf{t ≥ 0 : X(t) ∈ ∂B}, (3.80)
and for n = 1, 2, . . .,

    ς2n := inf{t ≥ ς2n−1 : X(t) ∈ ∂B1},
    ς2n+1 := inf{t ≥ ς2n : X(t) ∈ ∂B}.                      (3.81)

Note that equation (3.19), Theorem 3.2, and Lemma 3.7 imply that ςn < ∞
a.s. Px,i . Set
    u(x, i) := Px,i{ σ_{B×{ℓ}} < τ_{B1} }.

As in the proof of Lemma 3.7, we can verify that u(x, i) is L-harmonic


in B1 × M. Moreover, u is not identically zero, because u(x, `)|x∈∂B = 1.
Therefore, the maximum principle [52] implies that

    inf_{(x,i)∈B×M} u(x, i) ≥ δ2 > 0.                       (3.82)

Redefine
A0 := {α(t) = `, for some t ∈ [0, ς2 )}, (3.83)
and for n = 1, 2, . . .,

An := {α(t) = `, for some t ∈ [ς2n+1 , ς2n+2 )}. (3.84)

Using almost the same argument as in the proof of Lemma 3.7, we obtain
that
    Px,i(A_0^c) ≤ 1 − δ2,  and  Px,i{ ∩_{k=0}^{n} A_k^c } ≤ (1 − δ2)^{n+1}.   (3.85)

Thus, we have

    Px,i{(X(t), α(t)) ∉ D × {ℓ}, for any t ≥ 0}
        ≤ Px,i{(X(t), α(t)) ∉ B × {ℓ}, for any t ≥ 0}
        ≤ lim_{n→∞} Px,i{ ∩_{k=0}^{n} A_k^c }
        ≤ lim_{n→∞} (1 − δ2)^{n+1} = 0.

As a result,

    Px,i{σ_{D×{ℓ}} = ∞} = Px,i{(X(t), α(t)) ∉ D × {ℓ}, for any t ≥ 0} = 0,

or Px,i{σ_{D×{ℓ}} < ∞} = 1. This completes the proof of the lemma. □


Proof of Lemma 3.10. Fix any ` ∈ M. As in Lemma 3.9, it is enough to
prove (3.22) when (x, i) ∈ D × (M − {`}). Let the balls B and B1 , stop-
ping times ς1 , ς2 , . . ., and events A0 , A1 , . . . be as in (3.79)–(3.81), (3.83),
and (3.84) in the proof of Lemma 3.9. It follows from equation (3.21) and
Lemma 3.9 that Px,i{σ_{D×{ℓ}} < ∞} = 1. Observe that if ς2n ≤ σ_{D×{ℓ}} <
ς2n+2, then the event ∩_{k=0}^{n−1} A_k^c happens a.s. Hence we have from (3.85) that

    Px,i{ς2n ≤ σ_{D×{ℓ}} < ς2n+2} ≤ Px,i{ ∩_{k=0}^{n−1} A_k^c } ≤ (1 − δ2)^n.

It follows that

    Ex,i σ_{D×{ℓ}} = Ex,i[σ_{D×{ℓ}} I_{{0≤σ_{D×{ℓ}}<ς2}}]
                     + Σ_{n=1}^{∞} Ex,i[σ_{D×{ℓ}} I_{{ς2n≤σ_{D×{ℓ}}<ς2n+2}}]
                   ≤ Px,i[0 ≤ σ_{D×{ℓ}} < ς2] Ex,i ς2
                     + Σ_{n=1}^{∞} Px,i[ς2n ≤ σ_{D×{ℓ}} < ς2n+2] Ex,i ς2n+2
                   ≤ Σ_{n=0}^{∞} (1 − δ2)^n Ex,i ς2n+2.

Following almost the same argument as that for the proof of Lemma 3.8, we
can show that Ex,i ς2n ≤ nM for some positive constant M. Consequently,

    Ex,i σ_{D×{ℓ}} ≤ Σ_{n=0}^{∞} (1 − δ2)^n (n + 1)M < ∞.

The proof of the lemma is thus completed. □



Proof of Proposition 3.19. In view of Theorem 3.18, it is enough to


prove the first assertion only. Denote B = B(x0 , ρ) and U = B × {`}.
Define for any T > 0 and any 0 < ε < ρ a sequence of stopping times by
ς0 := 0, and for n ≥ 1,

ςn := inf {t > ςn−1 + T : (X(t), α(t)) ∈ B(x0 , ε) × {`}} .

(We use the convention that inf{∅} = ∞.) Then by virtue of the strong
Markov property, we can show

    sup_{x∈B(x0,ε)} Px,ℓ{ςn < ∞} ≤ ( sup_{x∈B(x0,ε)} Px,ℓ{ς1 < ∞} )^n.   (3.86)

Hence we have

    R0(x0, ℓ, U) = Σ_{n=0}^{∞} Ex0,ℓ[ I_{{ςn<∞}} ∫_{ςn}^{ςn+1} I_U(X(t), α(t))dt ]
                 = Σ_{n=0}^{∞} Ex0,ℓ[ I_{{ςn<∞}} E_{X(ςn),α(ςn)} ∫_0^{ς1} I_U(X(t), α(t))dt ].

Note that

    ∫_0^{ς1} I_U(X(t), α(t))dt = ∫_0^{T} I_U(X(t), α(t))dt + ∫_T^{ς1} I_U(X(t), α(t))dt ≤ T.

It follows from (3.86) that

    R0(x0, ℓ, U) ≤ T Σ_{n=0}^{∞} Px0,ℓ{ςn < ∞}
                 ≤ T Σ_{n=0}^{∞} ( sup_{x∈B(x0,ε)} Px,ℓ{ς1 < ∞} )^n.

But R0(x0, ℓ, U) = ∞, thus

    sup_{x∈B(x0,ε)} Px,ℓ{ σ^T_{B(x0,ε)×{ℓ}} < ∞ } = sup_{x∈B(x0,ε)} Px,ℓ{ς1 < ∞} = 1,

where σ^T_{B(x0,ε)×{ℓ}} := ς1 = inf{t > T : (X(t), α(t)) ∈ B(x0, ε) × {ℓ}}. Be-
cause B(x0, ε) ⊂ B,

    Px,ℓ{ σ^T_{B(x0,ε)×{ℓ}} < ∞ } ≤ Px,ℓ{ σ^T_{B×{ℓ}} < ∞ },

and sup_{x∈B(x0,ε)} Px,ℓ{ σ^T_{B×{ℓ}} < ∞ } = 1. Finally, because (X(t), α(t)) is
strong Feller, we obtain by letting ε → 0 that

    Px0,ℓ{ σ^T_{B×{ℓ}} < ∞ } = 1

for any T > 0. Thus (3.30) follows. This completes the proof of the proposition. □
Proof of Theorem 3.20. Sufficiency. This is clear by a contradiction
argument.
Necessity. Assume the process (X(t), α(t)) is transient. Fix any (x, α) ∈
Rr × M. Let ρ ∈ R be sufficiently large such that ρ > 1 ∨ |x|. The process
(X(t), α(t)) is transient, therefore it is transient with respect to the “cylin-
der” B(0, ρ) × M. Thus there exists some (y0 , j0 ) ∈ (Rr − B(0, ρ)) × M
such that

    Py0,j0{ σ_{B(0,ρ)} < ∞ } < 1.                           (3.87)

Assume |y0| = r0 > ρ. Then by virtue of Lemma 3.4, the function (y, j) ↦
Py,j{ σ_{B(0,ρ)} < ∞ } is L-harmonic. Hence it follows from the maximum
principle for L-harmonic functions [52] and (3.87) that

    sup_{|y|=r0, j∈M} Py,j{ σ_{B(0,ρ)} < ∞ } = δ < 1.       (3.88)

Then using the standard argument (see [15, Theorem 3.2]), we can show
that

    Px,α{ lim inf_{t→∞} |X(t)| > ρ − 1 } = 1.

Because this is true for all ρ > 0,

    Px,α{|X(t)| → ∞ as t → ∞} = 1,

as desired. □
Proof of Proposition 3.21. By virtue of Theorem 3.18, it is enough to
show that the point (x0 , `) is transient in the sense that there exists some
ε0 > 0 and a finite time T0 > 0 such that

    Px0,ℓ{(X(t), α(t)) ∉ B(x0, ε0) × {ℓ}, for all t ≥ T0} = 1.   (3.89)


Clearly R0(x0, ℓ, U) = Ex0,ℓ ∫_0^{∞} I_U(X(t), α(t))dt > 0. Thus there exists
some t > 0 such that

    Ex0,ℓ ∫_0^{t} I_U(X(s), α(s))ds > 0.

Because the process (X(t), α(t)) is strong Feller by virtue of Theorem 2.24,
we conclude that there exists a neighborhood E ⊂ D of x0 such that

    inf_{y∈E} Ey,ℓ ∫_0^{t} I_U(X(s), α(s))ds > 0.

Hence we have

    δ := inf_{x∈E} R0(x, ℓ, U) > 0.                         (3.90)

For each T > 0, define

    σ^T_{E×{ℓ}} = inf{t > T : (X(t), α(t)) ∈ E × {ℓ}}.      (3.91)

Then it follows from the strong Markov property that

    R0(x0, ℓ, U) ≥ Ex0,ℓ ∫_T^{∞} I_U(X(t), α(t))dt
                 ≥ Ex0,ℓ[ I_{{σ^T_{E×{ℓ}}<∞}} ∫_{σ^T_{E×{ℓ}}}^{∞} I_U(X(t), α(t))dt ]
                 = Ex0,ℓ[ I_{{σ^T_{E×{ℓ}}<∞}} R0(X(σ^T_{E×{ℓ}}), α(σ^T_{E×{ℓ}}), U) ]
                 ≥ inf_{x∈E} R0(x, ℓ, U) · Px0,ℓ{ σ^T_{E×{ℓ}} < ∞ }.

Hence we have from (3.90) and (3.91) that

    Px0,ℓ{ σ^T_{E×{ℓ}} < ∞ } ≤ (1/δ) Ex0,ℓ ∫_T^{∞} I_U(X(t), α(t))dt.

For any ε > 0, in view of the assumption R0(x0, ℓ, U) < ∞, we can choose
some T̃0 > 0 such that

    Ex0,ℓ ∫_{T̃0}^{∞} I_U(X(t), α(t))dt < δε,

and hence

    Px0,ℓ{ σ^{T̃0}_{E×{ℓ}} < ∞ } < ε.
That is,

    lim_{T→∞} Px0,ℓ{ σ^T_{E×{ℓ}} < ∞ } = Px0,ℓ{ σ^T_{E×{ℓ}} < ∞, for every T > 0 } = 0.

Therefore there exists some T0 > 0 such that

    Px0,ℓ{ σ^{T0}_{E×{ℓ}} = ∞ } > 0.                        (3.92)

Hence (3.89) follows. □

3.7 Notes
Under general conditions, necessary and sufficient conditions for recurrence,
nonrecurrence, and positive recurrence have been studied in this chap-
ter. We refer the reader to Chapter 2; see also Skorohod [150] for related

stochastic differential equations involving Poisson measures describing the


evolution of the switching processes. In our formulation, the finite-state
process depicts a random environment that has right-continuous sample
paths and that cannot be described by a diffusion. Consequently, both
continuous dynamics (diffusions) and discrete events (jumps) coexist yield-
ing hybrid dynamic systems, which provide a more realistic formulation
for many applications. The discrete events are frequently used to provide
more realistic models and to capture random evolutions. For instance, the
switching may be used to describe stochastic volatility resulting from mar-
ket modes and interest rates, as well as other economic factors in modeling
financial markets, to enhance the versatility in risk management practice, to
better understand ruin probability in insurance, and to carry out dividend
optimization tasks.
Regime-switching diffusions have received much attention lately. For in-
stance, optimal controls of switching diffusions were studied in [11] us-
ing a martingale problem formulation; jump-linear systems were treated in
[78]; stability of semi-linear stochastic differential equations with Marko-
vian switching was considered in [6]; ergodic control problems of switching
diffusions were studied in [52]; stability of stochastic differential equations
with Markovian switching was dealt with in [116, 136, 183]; asymptotic
expansions for solutions of integro-differential equations for transition den-
sities of singularly perturbed switching-diffusion processes were developed
in [74]; switching diffusions were used for stock liquidation models in [184].
For some recent applications of hybrid systems in communication networks,
air traffic management, control problems, and so on, we refer the reader to
[67, 68, 123, 137, 155] and references therein.
In [6, 116, 183, 184], Q(x) = Q, a constant matrix. In such cases,
α(·) is a continuous-time Markov chain. Moreover, it is assumed that the
Markov chain α(·) is independent of Brownian motion. In our formulation,
x-dependent Q(x) is considered, and as a result, the transition rates of the
discrete event α(·) depend on the continuous dynamic X(·), as depicted in
(3.4). Although the pair (X(·), α(·)) is a Markov process, for x-dependent
Q(x), only for each fixed x, the discrete-event process α(·) is a Markov
chain. Such formulation enables us to describe complex systems and their
inherent uncertainty and randomness in the environment. However, it adds
much difficulty in analysis. Our formulation is motivated by the fact that
in many applications, the discrete event and continuous dynamic are in-
tertwined. It would be useful to relax the independence assumption of the
discrete-event process and Brownian motion.
As seen in this chapter, the study of switching diffusions is connected
with systems of partial differential equations. The works [1, 39, 46, 56, 60,
105, 108, 158] and references therein provide a systematic treatment of
partial differential equations and systems of partial differential equations.
These tools are handy to use.
4
Ergodicity

4.1 Introduction
Continuing with the study of basic properties of switching-diffusion pro-
cesses, this chapter is concerned with ergodicity. Many applications in con-
trol and optimization require minimizing an expected cost of certain objec-
tive functions. Treating average cost per unit time problems, we often wish
to “replace” the time-dependent instantaneous measure by a steady-state
(or ergodic) measure. Thus we face the following questions: Do the sys-
tems possess an ergodic property? Under what conditions do the systems
have the desired ergodicity? Significant effort has been devoted to approxi-
mating such expected values by replacing the instantaneous measures with
stationary measures when the time horizon is long enough. To justify such
a replacement, ergodicity is needed. For diffusion processes, we refer the
readers to, for example, [10, 103] among others for the study of ergodic
control problems. In what follows, we study ergodicity and reveal the main
features of the ergodic measures. We carry out our study on ergodicity by
constructing cycles and using induced discrete-time Markov chains.
We consider the two-component process Y (t) = (X(t), α(t)) as in Chap-
ter 3. Let w(t) be a d-dimensional standard Brownian motion, b(·, ·) : Rr ×
M 7→ Rr , and σ(·, ·) : Rr × M 7→ Rr×d satisfying σ(x, i)σ 0 (x, i) = a(x, i).
For t ≥ 0, let X(t) ∈ Rr and α(t) ∈ M such that

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    X(0) = x,   α(0) = α,                                   (4.1)

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 111
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_4,
© Springer Science + Business Media, LLC 2010

and
    P{α(t + ∆) = j | α(t) = i, X(s), α(s), s ≤ t} = q_{ij}(X(t))∆ + o(∆),   i ≠ j.   (4.2)
The rest of the chapter is arranged as follows. Section 4.2 begins with
the discussion of ergodicity. The analysis is carried out by using cycles and
induced Markov chains in discrete time. Then the desired result is obtained
together with the representation of the stationary density. Section 4.3 takes
up the issue of making a switching-diffusion process ergodic by means of
feedback controls. Section 4.4 discusses some ramifications, and Section 4.5
obtains asymptotic normality when the continuous component belongs to
a compact set. Section 4.6 presents some further remarks.

4.2 Ergodicity
In this section, we study the ergodic properties of the process Y (t) =
(X(t), α(t)) under the assumption that the process is positive recurrent
with respect to some bounded domain U = E × {`}, where E ⊂ Rr and
` ∈ M are fixed throughout this section. We also assume that the boundary
∂E of E is sufficiently smooth. Let the operator L satisfy (A3.1). Then
it follows from Theorem 3.12 that the process is positive recurrent with
respect to any nonempty open set.
Let D ⊂ Rr be a bounded open set with sufficiently smooth boundary
∂D such that E ∪ ∂E ⊂ D. Let ς0 = 0 and define the stopping times
ς1 , ς2 , . . . inductively as follows: ς2n+1 is the first time after ς2n at which
the process Y (t) = (X(t), α(t)) reaches the set ∂E × {`} and ς2n+2 is the
first time after ς2n+1 at which the path reaches the set ∂D × {`}. Now we
can divide an arbitrary sample path of the process Y (t) = (X(t), α(t)) into
cycles:
[ς0 , ς2 ), [ς2 , ς4 ), . . . , [ς2n , ς2n+2 ), . . . (4.3)
Figure 4.1 presents a demonstration of such cycles when the discrete com-
ponent α(·) has three states.
The process Y (t) = (X(t), α(t)) is positive recurrent with respect to
E × {`} and hence positive recurrent with respect to D × {`} by Theo-
rem 3.12. It follows that all the stopping times ς0 < ς1 < ς2 < ς3 < ς4 < · · ·
are finite almost surely (a.s.). Because the process Y (t) = (X(t), α(t)) is
positive recurrent, we may assume without loss of generality that Y (0) =
(X(0), α(0)) = (x, `) ∈ ∂D × {`}. It follows from the strong Markov prop-
erty of the process Y (t) = (X(t), α(t)) that the sequence {Yn } is a Markov
chain on ∂D × {ℓ}, where Yn = Y(ς2n) = (Xn, ℓ), n = 0, 1, . . . Let P̃(x, A)
denote the one-step transition probabilities of this Markov chain; that is,

    P̃(x, A) = P(Y1 ∈ (A × {ℓ}) | Y0 = (x, ℓ))



FIGURE 4.1. A sample path of the process Y (t) = (X(t), α(t)) when m0 = 3.

for any x ∈ ∂D and A ∈ B(∂D), where B(∂D) denotes the collection of


Borel measurable sets on ∂D. Note that the process Y (t) = (X(t), α(t)),
starting from (x, `), may jump many times before it reaches the set (A, `);
see [150] for more details. Denote by P̃⁽ⁿ⁾(x, A) the n-step transition proba-
bility of the Markov chain for any n ≥ 1. For any Borel measurable function
f : Rr ↦ R, set

    Ex f(X1) := Ex,ℓ f(X1) = ∫_{∂D} f(y) P̃(x, dy).         (4.4)

Throughout this section, we write Ex in lieu of Ex,` for simplicity. We


show that the process Y (t) = (X(t), α(t)) possesses a unique stationary
distribution. To this end, we need the following lemma.
Lemma 4.1. The Markov chain Yn = (Xn, ℓ) has a unique stationary
distribution m(·) such that the n-step transition probability P̃⁽ⁿ⁾(x, A) satisfies

    |P̃⁽ⁿ⁾(x, A) − m(A)| < λⁿ,  for any A ∈ B(∂D),          (4.5)

for some constant 0 < λ < 1.

Proof. Note that

    P̃(x, A) = P{Y1 ∈ (A × {ℓ}) | Y0 = (x, ℓ)}
             = ∫_{∂D} Px,ℓ{(X(ς1), α(ς1)) ∈ (dy × {ℓ})} × Py,ℓ{(X(ς2), α(ς2)) ∈ (A × {ℓ})}.

Using the harmonic measure defined in Lemmas 2.2 and 2.3 of Chen and
Zhao [25], relating the kernel and surface area (similar to the solution for
the diffusion process without switching in the form of the double-layer poten-
tial given in the first displayed equation in [83, p. 97]) and the harmonic
measure, we can finish the proof of this lemma analogously to that of [83,
Lemma 4.4.1]. The details are omitted. □
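The geometric bound (4.5) is a Doeblin-type estimate, and its finite-state analogue can be checked directly: if every entry of a transition matrix P is positive, then sup_x |Pⁿ(x, A) − m(A)| ≤ λⁿ with λ = 1 − Σ_j min_i P(i, j). The Python sketch below (the 3 × 3 matrix is invented purely for this demo) verifies the bound numerically:

```python
# Finite-state illustration of the geometric bound (4.5): for a Markov
# transition matrix P with all entries positive, the n-step probabilities
# approach the stationary row at rate at least lambda^n, where
# lambda = 1 - sum_j min_i P(i, j) is the Doeblin coefficient.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

lam = 1.0 - sum(min(row[j] for row in P) for j in range(3))

# stationary distribution m via a high matrix power
Pn = P
for _ in range(200):
    Pn = mat_mul(Pn, P)
m = Pn[0]

# check sup_{i,j} |P^n(i, j) - m(j)| < lam^n for a range of n
Pk = P
ok = True
for n in range(1, 30):
    gap = max(abs(Pk[i][j] - m[j]) for i in range(3) for j in range(3))
    ok = ok and gap < lam ** n + 1e-12
    Pk = mat_mul(Pk, P)
print(round(lam, 3), ok)
```

Here λ = 0.3; the observed gaps shrink at least geometrically at that rate, mirroring (4.5).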
Remark 4.2. Note that

    (X^{s,X(s),α(s)}(t), α^{s,X(s),α(s)}(t)) = (X^{0,X(0),α(0)}(t + s), α^{0,X(0),α(0)}(t + s)),   (4.6)

where (X 0,X(0),α(0) (u), α0,X(0),α(0) (u)) denotes the sample path of the pro-
cess (X(·), α(·)) with initial point (X(0), α(0)) at time t = 0, and a similar
definition for (X s,X(s),α(s) (t), αs,X(s),α(s) (t)). When no confusion arises, we
simply write

(X(u), α(u)) = (X 0,X(0),α(0) (u), α0,X(0),α(0) (u)).

Let τ be an Ft stopping time with Ex,i τ < ∞ and let f : Rr × M ↦ R
be a Borel measurable function. Then

    Ex,i ∫_0^{τ} f(X(s + t), α(s + t))ds = Ex,i ∫_0^{τ} E_{X(s),α(s)} f(X(s + t), α(s + t))ds.   (4.7)

Now we can construct the stationary distribution of the process Y (t) =


(X(t), α(t)) explicitly.
Theorem 4.3. The positive recurrent process Y (t) = (X(t), α(t)) has a
unique stationary distribution ν(·, ·) = (ν(·, i) : i ∈ M).

Proof. Recall that the cycles were defined in (4.3). Let A ∈ B(Rr ) and
i ∈ M. Denote by τ A×{i} the time spent by the path of Y (t) = (X(t), α(t))
in the set (A × {i}) during the first cycle. Set
    ν̂(A, i) := ∫_{∂D} m(dx) Ex τ_{A×{i}},                   (4.8)

where m(·) is the stationary distribution of Yi = (Xi , `), whose existence


is guaranteed by Lemma 4.1. It is easy to verify that ν̂(·, ·) is a positive
measure defined on B(Rr) × M. Thus for any bounded Borel measurable
function g(·) : Rr ↦ R, it follows from (4.4) and Fubini's theorem that

    ∫_{∂D} Ex g(X1) m(dx) = ∫_{∂D} m(dx) ∫_{∂D} g(y) P̃(x, dy) = ∫_{∂D} g(y) m(dy).   (4.9)

Now we claim that for any bounded and continuous function f(·, ·),

    Σ_{j=1}^{m0} ∫_{Rr} f(y, j) ν̂(dy, j) = ∫_{∂D} m(dx) Ex ∫_0^{ς2} f(X(t), α(t))dt   (4.10)

holds. In fact, if f is an indicator function, that is, f(y, j) = I_{A×{i}}(y, j) for
some A ∈ B(Rr) and i ∈ M, then from (4.8),

    Σ_{j=1}^{m0} ∫_{Rr} I_{A×{i}}(y, j) ν̂(dy, j) = ν̂(A, i) = ∫_{∂D} m(dx) Ex τ_{A×{i}}
        = ∫_{∂D} m(dx) Ex ∫_0^{ς2} I_{A×{i}}(X(t), α(t))dt.

Similarly, (4.10) holds for f being a simple function of the form

    f(y, j) = Σ_{p=1}^{n} c_p I_{U_p}(y, j),

where Up ⊂ Rr × M. Finally, if f is a bounded and continuous function,


equation (4.10) follows by approximating f by simple functions. It follows
from equations (4.10), (4.6), and (4.7) that

    Σ_{i=1}^{m0} ∫_{Rr} Ex,i f(X(t), α(t)) ν̂(dx, i)
        = ∫_{∂D} m(dx) Ex ∫_0^{ς2} E_{X(s),α(s)} f(X(t + s), α(t + s))ds
        = ∫_{∂D} m(dx) Ex ∫_0^{ς2} f(X(t + s), α(t + s))ds
        = ∫_{∂D} m(dx) Ex ∫_0^{ς2} f(X(u), α(u))du
          + ∫_{∂D} m(dx) Ex ∫_{ς2}^{t+ς2} f(X(u), α(u))du
          − ∫_{∂D} m(dx) Ex ∫_0^{t} f(X(u), α(u))du.

Applying (4.9) with

    g(x) = Ex ∫_{ς2}^{ς2+t} f(X(u), α(u))du,

we obtain

    ∫_{∂D} m(dx) Ex ∫_{ς2}^{ς2+t} f(X(u), α(u))du
        = ∫_{∂D} m(dx) Ex E_{X1,ℓ} ∫_0^{t} f(X(u + ς2), α(u + ς2))du
        = ∫_{∂D} m(dx) Ex ∫_0^{t} f(X(u), α(u))du.

Note that in the above deduction, we used equation (4.6) again. Therefore,
the above two equations and (4.10) yield that

    Σ_{i=1}^{m0} ∫_{R^r} E_{x,i} f(X(t), α(t)) ν̂(dx, i) = Σ_{i=1}^{m0} ∫_{R^r} f(x, i) ν̂(dx, i).

Thus, the normalized measure

    ν(A, i) = ν̂(A, i) / Σ_{j=1}^{m0} ν̂(R^r, j),    i ∈ M,                  (4.11)

defines the desired stationary distribution. The theorem thus follows. □


Theorem 4.4. Denote by µ(·, ·) the stationary density associated with the
stationary distribution ν(·, ·) constructed in Theorem 4.3 and let f(·, ·) :
R^r × M → R be a Borel measurable function such that

    Σ_{i=1}^{m0} ∫_{R^r} |f(x, i)| µ(x, i) dx < ∞.                          (4.12)

Then

    P_{x,i}( lim_{T→∞} (1/T) ∫_0^T f(X(t), α(t)) dt = f̄ ) = 1,             (4.13)

for any (x, i) ∈ R^r × M, where

    f̄ = Σ_{i=1}^{m0} ∫_{R^r} f(x, i) µ(x, i) dx.                           (4.14)

Proof. We first prove (4.13) if the initial distribution is the stationary
distribution of the Markov chain Y_n = (X_n, α_n); that is,

    P{(X(0), α(0)) ∈ (A × {ℓ})} = m(A)                                      (4.15)

for any A ∈ B(∂D). Consider the sequence of random variables

    η_n = ∫_{ς_{2n}}^{ς_{2n+2}} f(X(t), α(t)) dt.                           (4.16)

Then it follows from equation (4.15) that {η_n} is a strictly stationary
sequence. Also from equations (4.8) and (4.10), we have

    E η_n = Σ_{i=1}^{m0} ∫_{R^r} f(x, i) ν̂(dx, i),                         (4.17)

for all n = 0, 1, 2, . . . Meanwhile, equation (4.5) implies that the sequence
{η_n} is metrically transitive. Let υ(T) denote the number of cycles completed
up to time T. That is,

    υ(T) := max{ n ∈ N : Σ_{k=1}^{n} (ς_{2k} − ς_{2k−2}) ≤ T }.

Then we can decompose ∫_0^T f(X(t), α(t)) dt into

    ∫_0^T f(X(t), α(t)) dt = Σ_{n=0}^{υ(T)} η_n + ∫_{ς_{2υ(T)}}^{T} f(X(t), α(t)) dt,   (4.18)

with η_n given in equation (4.16). We may assume without loss of generality
that f(x, i) ≥ 0 (for the general case, we can write f(x, i) as a difference
of two nonnegative functions). Then it follows from equation (4.18) that

    Σ_{n=0}^{υ(T)} η_n ≤ ∫_0^T f(X(t), α(t)) dt ≤ Σ_{n=0}^{υ(T)+1} η_n.

Inasmuch as the sequence {η_n} is stationary and metrically transitive, the
law of large numbers for such sequences implies that

    P{ (1/n) Σ_{k=0}^{n} η_k → Σ_{i=1}^{m0} ∫_{R^r} f(x, i) ν̂(dx, i) as n → ∞ } = 1.   (4.19)

In particular, if f(x, i) ≡ 1, then the above equation reduces to

    P{ ς_{2n+2}/n → Σ_{i=1}^{m0} ν̂(R^r, i) as n → ∞ } = 1.                 (4.20)

Note that the positive recurrence of the process Y(t) = (X(t), α(t)) implies
that υ(T) → ∞ as T → ∞. Clearly, υ(T)/(υ(T) + 1) → 1 almost surely as
T → ∞. Thus, it follows from (4.20) that as T → ∞,

    ς_{2υ(T)}/ς_{2υ(T)+2} = [ ς_{2υ(T)}/υ(T) ] / [ ς_{2υ(T)+2}/(υ(T)+1) ] · υ(T)/(υ(T)+1) → 1  a.s.   (4.21)

Meanwhile, since ς_{2υ(T)} ≤ T ≤ ς_{2υ(T)+2}, we have

    ς_{2υ(T)}/ς_{2υ(T)+2} ≤ ς_{2υ(T)}/T ≤ ς_{2υ(T)}/ς_{2υ(T)} = 1.

Therefore, we have from (4.21) that

    ς_{2υ(T)}/T → 1  a.s. as T → ∞.                                         (4.22)

Moreover, (4.20) implies that

    υ(T)/ς_{2υ(T)} → 1 / Σ_{i=1}^{m0} ν̂(R^r, i)  a.s. as T → ∞.            (4.23)

Now using equations (4.19), (4.22), and (4.23), we obtain

    P{ (1/T) ∫_0^T f(X(t), α(t)) dt
         = [ (1/υ(T)) ∫_0^T f(X(t), α(t)) dt ] · [ υ(T)/ς_{2υ(T)} ] · [ ς_{2υ(T)}/T ]
         → Σ_{i=1}^{m0} ∫_{R^r} f(x, i) ν(dx, i)  as T → ∞ } = 1.

Finally, we note that

    ∫_{R^r} f(x, i) ν(dx, i) = ∫_{R^r} f(x, i) µ(x, i) dx

by the definition of µ(·, ·). Thus, equation (4.13) holds. This proves (4.13)
if the initial distribution is (4.15).
Let (x, i) ∈ R^r × M. Because the process Y(t) = (X(t), α(t)) is positive
recurrent with respect to the domain D × {ℓ}, we have

    P_{x,i}[ lim_{T→∞} (1/T) ∫_0^T f(X(t), α(t)) dt = f̄ ]
        = P_{x,i}[ lim_{T→∞} (1/T) ∫_{ς_2}^T f(X(t), α(t)) dt = f̄ ].

We can further write the latter as

    P_{x,i}[ lim_{T→∞} (1/T) ∫_{ς_2}^T f(X(t), α(t)) dt = f̄ ]
        = ∫_{∂D} P_{x,i}[(X(ς_2), α(ς_2)) ∈ (dy, ℓ)]
              × P_{y,ℓ}[ lim_{T→∞} (1/T) ∫_0^T f(X(t), α(t)) dt = f̄ ]
        = P_{y,ℓ}[ lim_{T→∞} (1/T) ∫_0^T f(X(t), α(t)) dt = f̄ ],

where the last line above follows from the use of the invariant distribution.
This illustrates that starting from an arbitrary point (x, i) with an arbitrary
initial distribution is asymptotically equivalent to starting with the
stationary distribution as the initial distribution. Therefore, (4.13)
holds for all (x, i) ∈ R^r × M. This completes the proof of the theorem. □
As a consequence of Theorem 4.4, we obtain the following corollary.

Corollary 4.5. Let the assumptions of Theorem 4.4 be satisfied and let
u(t, x, i) be the solution of the Cauchy problem

    ∂u(t, x, i)/∂t = Lu(t, x, i),   t > 0, (x, i) ∈ R^r × M,
    u(0, x, i) = f(x, i),           (x, i) ∈ R^r × M,                       (4.24)

where f(·, i) ∈ C_b(R^r) for each i ∈ M. Then as T → ∞,

    (1/T) ∫_0^T u(t, x, i) dt → Σ_{i=1}^{m0} ∫_{R^r} f(x, i) µ(x, i) dx.    (4.25)

Proof. By virtue of Lemma 2.21, u(t, x, i) = E_{x,i} f(X(t), α(t)). Thus we
have

    (1/T) ∫_0^T u(t, x, i) dt = E_{x,i}( (1/T) ∫_0^T f(X(t), α(t)) dt ).    (4.26)

Meanwhile, (4.13) implies that

    (1/T) ∫_0^T f(X(t), α(t)) dt → Σ_{i=1}^{m0} ∫_{R^r} f(x, i) µ(x, i) dx  a.s. as T → ∞

with respect to the probability P_{x,i}. Then equation (4.25) follows from the
dominated convergence theorem. □

4.3 Feedback Controls for Weak Stabilization


Many applications in control and optimization require minimizing an
expected cost of a certain objective function. For example, one often wishes
to minimize an average cost per unit time. The computation could be
difficult, complicated, and time consuming. Significant effort has been
devoted to approximating such expected values by replacing the
instantaneous measures with stationary measures when the time horizon is
long enough. To justify such a replacement, ergodicity is needed. As we
have proved in the previous section, a positive recurrent regime-switching
diffusion possesses an ergodic measure. One question of both theoretical
and practical interest is: If a switching diffusion is not ergodic, can we
design suitable controls so that the controlled regime-switching diffusion
becomes positive recurrent and hence ergodic? Because positive recurrence
was termed weak stability by Wonham [160], the problem of interest can
be stated as: Can we find suitable controls to stabilize (in the weak sense)
the regime-switching diffusion system?
In this section, our goal is to design suitable controls so that the resulting
regime-switching diffusion is positive recurrent and hence ergodic. Consider
the regime-switching diffusion (4.1) with the transition of the jump process
specified by (4.2) and initial condition

    X(0) = x,   α(0) = α.

Consider also its controlled system

    dX(t) = b(X(t), α(t))dt + B(α(t))u(X(t), α(t))dt + σ(X(t), α(t))dw(t),   (4.27)

where B(i) ∈ R^{r×r}, i ∈ M, are constant matrices, and u(·, ·) : R^r × M → R^r
denotes the feedback control to be identified.
It is well known that the regime-switching diffusion (4.1) can be regarded
as the following m0 single diffusions

    dX(t) = b(X(t), i)dt + σ(X(t), i)dw(t),   i ∈ M,                        (4.28)

coupled by the discrete-event component α(t) according to the transition
laws specified in (4.2). Often the system is observable only when it operates
in some modes but not all. Accordingly, it is natural to decompose the
discrete state space M into two disjoint subsets M1 and M2, namely,
M = M1 ∪ M2, where for each mode i ∈ M2, the process (4.28) cannot
be stabilized by feedback control, but it can be stabilized for each i ∈ M1.
Thus we consider feedback controls of the form

    u(X(t), α(t)) = −L(α(t))X(t),

where for each i ∈ M, L(i) ∈ R^{r×r} is a constant matrix; moreover,
L(i) = 0 if i ∈ M2. Thus (4.27) can be rewritten as

    dX(t) = [b(X(t), α(t)) − B(α(t))L(α(t))X(t)] dt + σ(X(t), α(t))dw(t).   (4.29)

Assume conditions (A3.1) and (A3.2). In other words, we assume that
the nonlinear system is locally linearizable in a neighborhood of ∞. Then
we have the following theorem by applying Theorem 3.28 to (4.29).

Theorem 4.6. If for each i ∈ M1, there exists a constant matrix L(i) ∈
R^{r×r} such that

    Σ_{i∈M1} π_i λ_max( b(i) − B(i)L(i) + b′(i) − L′(i)B′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) )
    + Σ_{i∈M2} π_i λ_max( b(i) + b′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) ) < 0,   (4.30)

then the resulting controlled regime-switching system (4.27) is weakly stabilizable.
That is, the controlled regime-switching diffusion is positive recurrent.
Theorem 4.6 ensures that under simple conditions, there are many choices
of the matrices L(i) with i ∈ M1 that make the regime-switching
diffusion (2.2) positive recurrent. Take, for example,

    L(i) = θ_i I,   i ∈ M1,

where I is the r × r identity matrix and the θ_i are nonnegative constants to
be determined. Hence it follows that for i ∈ M1, we have

    λ_max( b(i) − B(i)L(i) + b′(i) − L′(i)B′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) )
        ≤ λ_max( b(i) + b′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) ) + λ_max( −B(i)L(i) − L′(i)B′(i) )
        = λ_max( b(i) + b′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) ) − θ_i λ_min( B(i) + B′(i) ).   (4.31)

Assume that for some ι ∈ M1, the symmetric matrix B(ι) + B′(ι) is positive
definite. Hence λ_min(B(ι) + B′(ι)) > 0. Then if θ_ι > 0 is sufficiently large,
and θ_i = 0 for i ≠ ι, we have from (4.31) that the left-hand side of (4.30)
is less than 0. Thus the resulting controlled regime-switching diffusion (4.27)
is positive recurrent. We summarize the discussion in the following theorem.
Theorem 4.7. If for some ι ∈ M1, the symmetric matrix B(ι) + B′(ι)
is positive definite, then there exists a feedback control u(·, ·) such that the
controlled regime-switching diffusion (4.27) is positive recurrent.
Example 4.8. In this example, we apply Theorems 4.6 and 4.7 to stabilize
(in the weak sense) a switching diffusion. Consider a two-dimensional (in
the continuous component) regime-switching diffusion

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    P{α(t + ∆) = j | α(t) = i, X(s), α(s), s ≤ t} = q_{ij}(X(t))∆ + o(∆),   (4.32)

and its controlled system

    dX(t) = [b(X(t), α(t)) + B(α(t))u(X(t), α(t))]dt + σ(X(t), α(t))dw(t),
    P{α(t + ∆) = j | α(t) = i, X(s), α(s), s ≤ t} = q_{ij}(X(t))∆ + o(∆),   (4.33)

respectively. Suppose that
 
    b(x1, x2, 1) = [ x1 + x2
                     2x2      ],

    σ(x1, x2, 1) = [ √(3 + 2x2²)    5
                     0              √(4 + x2² − sin(x1 x2)) ],

    b(x1, x2, 2) = [ 2x1 + x2
                     −x1 + 3x2 ],

    σ(x1, x2, 2) = [ −2 + √(1 + x2²)    0
                     −3                 √(2 + x1²) ],

and

    B(1) = [ −2    3
              1   −1 ],

    B(2) = [  2   −3
              1    3 ].

The generator of α(·) is given by

    Q(x) = [ −2 − (sin x1 cos(x1²) + sin(x2²))/√(1 + x1² + x2²)     2 + (sin x1 cos(x1²) + sin(x2²))/√(1 + x1² + x2²)
              1 − (2x2 − cos x1)/(3 + x1² + x2²)                   −1 + (2x2 − cos x1)/(3 + x1² + x2²) ].

Thus associated with the regime-switching diffusion (4.32), there are two
diffusions
dX(t) = b(X(t), 1)dt + σ(X(t), 1)dw(t), (4.34)
and
dX(t) = b(X(t), 2)dt + σ(X(t), 2)dw(t) (4.35)
switching back and forth from one to another according to the movement
of the discrete component α(t), where w(t) = (w1 (t), w2 (t))0 is a two-
dimensional standard Brownian motion. Assume that the system is observ-
able when the discrete component α(·) is in state 2. Detailed calculations
using the methods in [83, Section 3.8] or [15] allow us to verify that both
(4.34) and (4.35) are transient diffusions; see also the phase portraits in
Figure 4.2 (a) and (b).
FIGURE 4.2. Phase portraits of the transient diffusions (4.34) (panel (a)) and
(4.35) (panel (b)) with initial condition x, where x = [−5, 3]′.

Next we use Theorem 3.24 to verify that the switching diffusion (4.32)
is transient. To this end, we compute the matrices b(i), σj (i) for i, j = 1, 2
and Q b as in condition (A3.2):
   √   
1 1 0 2 0 0
b(1) =  , σ1 (1) =   , σ2 (1) =  ,
0 2 0 0 0 1
     
2 1 0 1 0 0
b(2) =   , σ1 (2) =  , σ2 (2) =  ,
−1 3 0 0 1 0

and  
−2 2
b=
Q .
1 −1
Thus the stationary distribution is π = (1/3, 2/3) and

λmin (b(1) + b0 (1) +σ1 (1)σ10 (1) + σ2 (1)σ20 (1))


2
1 X 2
− ρ(σj (1) + σj0 (1)) = 0.3820,
2 j=1
λmin (b(2) + b0 (2) +σ1 (2)σ10 (2) + σ2 (2)σ20 (2))
2
1 X 2
− ρ(σj (2) + σj0 (2)) = 4.
2 j=1

Hence Theorem 3.24 implies that the switching diffusion (4.32) is transient.
Figure 4.3 (a) confirms our analysis.
By our assumption, the switching diffusion is observable when the discrete
component α(·) is in state 2. Note that

    B(2) + B′(2) = [  4   −2
                     −2    6 ]

is symmetric and positive definite. Therefore it follows from Theorem 4.7
that (4.33) is stabilizable in the weak sense. In fact, as we discussed in
Section 4.3, if we take θ1 = 0 and θ2 = 4, then direct computation leads to

    π2 λ_max( b(2) + b′(2) − 4B(2) − 4B′(2) + σ1(2)σ1′(2) + σ2(2)σ2′(2) )
        + π1 λ_max( b(1) + b′(1) + σ1(1)σ1′(1) + σ2(1)σ2′(1) ) = −1.7647 < 0.

Therefore the controlled switching diffusion (4.33) is positive recurrent
under the feedback control law

    u(x, 1) = 0   and   u(x, 2) = −4x.                                      (4.36)

Figure 4.3(b) demonstrates the phase portrait of (4.33) under the feedback
control (4.36). We also demonstrate the componentwise sample paths of
(4.33) under the feedback control (4.36) in Figure 4.4(a) and (b).
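Condition (4.30) for this example can also be checked numerically. The sketch below (our illustration, not part of the text) hard-codes the symmetrized matrices b(i) + b′(i) + Σ_j σ_j(i)σ_j′(i) obtained from the data above, together with B(2) + B′(2), and evaluates the left-hand side of (4.30) with θ1 = 0 and θ2 = 4; the helper `lam_max_sym2` and all variable names are ours:

```python
import math

def lam_max_sym2(m):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + c + math.sqrt((a - c) ** 2 + 4.0 * b * b))

# b(i) + b'(i) + sigma_1(i)sigma_1'(i) + sigma_2(i)sigma_2'(i), from the data above
M1 = [[4.0, 1.0], [1.0, 5.0]]        # mode 1, uncontrolled
M2 = [[5.0, 0.0], [0.0, 7.0]]        # mode 2, before feedback
BB2 = [[4.0, -2.0], [-2.0, 6.0]]     # B(2) + B'(2)
theta2 = 4.0                         # gain in the feedback u(x, 2) = -theta2 * x
M2c = [[M2[i][j] - theta2 * BB2[i][j] for j in range(2)] for i in range(2)]

pi = (1.0 / 3.0, 2.0 / 3.0)          # stationary distribution of Q-hat
lhs = pi[0] * lam_max_sym2(M1) + pi[1] * lam_max_sym2(M2c)
print(round(lhs, 4))                 # -1.7647
```

The computed value agrees with the −1.7647 reported above, confirming that this choice of feedback gain satisfies (4.30).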

4.4 Ramifications
Remark on a Tightness Result
Under positive recurrence, we may obtain tightness (or boundedness in
probability) of the underlying process. Suppose that (X(t), α(t)) is positive
recurrent. We can use the result in Chapter 3 about path excursions (3.46)
to prove that for any compact set D̄ (the closure of the open set D), the set

    ∪_{x∈D̄} {(X(t), α(t)) : t ≥ 0, X(0) = x, α(0) = α}

is tight (or bounded in probability). The idea is along the line of the
diffusion counterpart; see [102].
Suppose that for each i ∈ M, there is a Liapunov function V(·, i) such
that min_x V(x, i) = 0 and V(x, i) → ∞ as |x| → ∞. Let a1 > a0 > 0,
X(0) = x, and α(0) = α. Using the argument in Chapter 3, because
recurrence is independent of the chosen set, we can work with a fixed ℓ ∈ M
and consider the sets

    B_{a0} = {x ∈ R^r : V(x, ℓ) ≤ a0},
    B_{a1} = {x ∈ R^r : V(x, ℓ) ≤ a1}.

Then B_{a0} will be visited by the switching diffusion infinitely often a.s.
Because the switching diffusion is positive recurrent, Theorem 3.26 implies
that there is a κ > 0 such that LV(x, i) ≤ −κ for all (x, i) ∈ B_{a0}^c.

Define a sequence of stopping times recursively as

    τ_1 = min{t : X(t) ∈ B_{a0}},
    τ_{2n} = min{t > τ_{2n−1} : X(t) ∈ ∂B_{a1}},
    τ_{2n+1} = min{t > τ_{2n} : X(t) ∈ ∂B_{a0}}.

Using the argument as in (3.46), we can obtain

    E_{τ_{2n}}(τ_{2n+1} − τ_{2n}) I_{{τ_{2n}<∞}} ≤ (a1/κ) I_{{τ_{2n}<∞}},

    I_{{τ_n<∞}} P_{τ_n}( sup_{τ_n ≤ t ≤ τ_{n+1}} V(X(t)) ≥ a ) → 0  as a → ∞,
FIGURE 4.3. Phase portraits of the switching diffusion (4.32) (panel (a)) and its
controlled system (4.33) (panel (b)) under feedback control law (4.36) with initial
condition (x, α), where x = [−5, 3]′ and α = 2.

FIGURE 4.4. Componentwise sample paths (X1(t), α(t)) (panel (a)) and
(X2(t), α(t)) (panel (b)) of (4.33) under feedback control law (4.36) with initial
condition (x, α), where x = [−5, 3]′ and α = 2.

uniformly in n ≥ 1. Then there exist ∆_i > 0 for i = 0, 1 satisfying

    I_{{τ_{2n−1}<∞}} P_{τ_{2n−1}}( τ_{2n} − τ_{2n−1} ≥ ∆_0 ) ≥ ∆_1 I_{{τ_{2n−1}<∞}}.

Working with the estimates up to now and using the argument as in [102,
pp. 147–148], we can show that for any ∆ > 0 and compact set B, there is
an a_∆ such that P_x( V(X(t), i) ≥ a_∆ ) ≤ ∆ for any x ∈ B and t ≥ 0. The
condition V(x, i) → ∞ as |x| → ∞ then implies the desired tightness claim.

Occupation Measures
To illustrate another utility of Theorem 4.4, take f(x, i) = I_{B×J}(x, i),
the indicator function of the set B × J, where B ⊂ R^r and J ⊂ M. Then
Theorem 4.4 becomes a result regarding the occupation measure. In fact,
we have

    (1/T) ∫_0^T I_{B×J}(X(t), α(t)) dt → Σ_{i∈J} ∫_B µ(x, i) dx  a.s. as T → ∞.
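As an illustration of the occupation-measure result, the long-run fraction of time spent in a set B × J can be estimated by direct simulation. The sketch below uses a hypothetical one-dimensional model (linear mean-reverting drifts, unit diffusion coefficient, and a constant generator with q_12 = q_21 = 1); the model and all names are our choices, for illustration only:

```python
import math, random

random.seed(7)
eps, T = 0.001, 200.0
n_steps = 200_000                       # T / eps steps

# Hypothetical two-mode example: b(x, 1) = -x, b(x, 2) = -2x, sigma = 1,
# and a constant generator with q_12 = q_21 = 1.
drift = {1: lambda x: -x, 2: lambda x: -2.0 * x}
x, alpha = 0.0, 1
time_in_set = 0.0                       # occupation time of B x J = [-1, 1] x {1}

for _ in range(n_steps):
    if alpha == 1 and -1.0 <= x <= 1.0:
        time_in_set += eps
    if random.random() < eps:           # switch with intensity 1
        alpha = 3 - alpha
    x += eps * drift[alpha](x) + math.sqrt(eps) * random.gauss(0.0, 1.0)

occupation = time_in_set / T
print(round(occupation, 2))
```

For large T, the printed fraction approximates Σ_{i∈J} ∫_B µ(x, i) dx for this model.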

Stochastic Approximation
Consider a parameter optimization problem. We wish to find θ*, a vector-valued
parameter, so that the cost function

    J(θ) = lim_{T→∞} (1/T) E ∫_0^T Ĵ(θ, Y(t)) dt

is minimized, where Y(t) is a positive recurrent switching diffusion as considered
in this chapter and for each θ, Ĵ(θ, ·, ·) satisfies the conditions of
Theorem 4.4. For simplicity, we assume that the gradient of Ĵ(·, x, i) with
respect to θ is available for each x and each i ∈ M. Then we consider a
constant-stepsize recursive algorithm

    θ_{n+1} = θ_n − ε (1/T) ∫_{nT}^{nT+T} ∇Ĵ(θ_n, Y(t)) dt,

or a decreasing-stepsize algorithm

    θ_{n+1} = θ_n − ε_n (1/T) ∫_{nT}^{nT+T} ∇Ĵ(θ_n, Y(t)) dt,

where ε > 0, and ε_n → 0 as n → ∞ with Σ_n ε_n = ∞. Modifications
and variants are possible. For example, additional measurement noise may
be included, and the gradient of Ĵ(·) may be replaced by its gradient
estimates. The motivation for such algorithms stems from optimization of
average cost per unit time problems arising from parameter estimations
in switching systems of SDEs, manufacturing systems, and queueing networks;
see related work in [104, Chapter 9] and [174]. The ergodicity of the
switching diffusion is crucial in the study of the asymptotic behavior of the
algorithms.
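A minimal sketch of the decreasing-stepsize recursion, under simplifying assumptions of our own: θ is scalar, Ĵ(θ, y) = (θ − y)²/2 so that its θ-gradient is θ − y, and the time average over [nT, nT + T] is replaced by an average of independent draws standing in for the ergodic process Y(t) with mean 2 (hence θ* = 2):

```python
import random

random.seed(0)

# Toy instance: J-hat(theta, y) = (theta - y)^2 / 2, so grad J-hat = theta - y,
# and the long-run average cost J(theta) is minimized at theta* = E[Y] = 2.
def averaged_gradient(theta, T=100):
    """Stand-in for (1/T) * integral over [nT, nT+T] of grad J-hat(theta, Y(t)) dt."""
    return sum(theta - (2.0 + random.gauss(0.0, 1.0)) for _ in range(T)) / T

theta = 0.0
for n in range(1, 2001):
    eps_n = 1.0 / n                  # decreasing stepsizes: eps_n -> 0, sum = infinity
    theta -= eps_n * averaged_gradient(theta)

print(abs(theta - 2.0) < 0.1)        # the iterates settle near theta* = 2
```

With ε_n = 1/n this is a recursion of Robbins–Monro type; the ergodicity developed in this chapter is what justifies replacing the time average by a stationary expectation in the analysis.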
4.5 Asymptotic Distribution

Based on ergodicity of hybrid diffusions, this section is concerned with a
centered and scaled sequence. It begins with a lemma on the convergence of
the transition density to the invariant density, followed by the verification
of a couple of lemmas. Then a result concerning the correlation function is
presented. Finally, an asymptotic distribution result is established. Here we
work with a compact state space for the x component. To be more specific,
we assume that the state space is a torus S in R^r. More general compact
manifolds can also be considered with essentially the same argument.
The following lemma can be obtained in the same spirit as that of [74].
We omit the detailed argument and refer the reader to the reference.

Lemma 4.9. Denote by p(y, j; t, x, i) the transition density of the process
(X(t), α(t)) with

    P_{y,j}(X(t) ∈ S, α(t) = i) = P(X(t) ∈ S, α(t) = i | X(0) = y, α(0) = j)
                                = ∫_S p(y, j; t, x, i) dx,

where S ∈ B(S). Then p(y, j; t, x, i) converges exponentially fast to µ(·, ·).
That is, for some C > 0 and κ3 > 0,

    |p(y, j; t, x, i) − µ(x, i)| ≤ C exp(−κ3 t)  for any (y, j), (x, i) ∈ S × M,   (4.37)

where µ(x, i) is the invariant density as in [74], which is the solution of the
system of Kolmogorov–Fokker–Planck equations

    L*g(x, i) = 0,  for each i ∈ M,
    Σ_{i=1}^{m0} ∫_S g(x, i) dx = 1,                                        (4.38)

where L* is the adjoint operator of L defined in (3.1).

Lemma 4.10. Let the assumptions of Lemma 4.9 be satisfied. Suppose
that f(·, ·) : S × M → R is a Borel measurable function. Then for any
0 ≤ s ≤ t, we have

    |E_{X(s),α(s)} f(X(t), α(t)) − E f(X(t), α(t))| ≤ C exp(−κ3 (t − s)),   (4.39)

for some positive constant C and κ3 given by (4.37), where E f(X(t), α(t))
is the expectation with respect to the stationary measure

    E f(X(t), α(t)) := Σ_{i=1}^{m0} ∫_S f(x, i) µ(x, i) dx = f_av.          (4.40)

Proof. Note that

    f(X(t), α(t)) = Σ_{i=1}^{m0} f(X(t), i) I_{{α(t)=i}}.

Inasmuch as we work with S, a torus in R^r,

    Σ_{i=1}^{m0} ∫_S |f(x, i)| dx < ∞.

Moreover,

    E_{X(s),α(s)} f(X(t), α(t)) = E_{X(s),α(s)} Σ_{i=1}^{m0} f(X(t), i) I_{{α(t)=i}}
        = Σ_{i=1}^{m0} ∫_S f(x, i) P_{X(s),α(s)}(X(t) ∈ dx, α(t) = i)
        = Σ_{i=1}^{m0} ∫_S f(x, i) p(X(s), α(s); t − s, x, i) dx.

Thus, in view of (4.40), we have from (4.37) that

    |E_{X(s),α(s)} f(X(t), α(t)) − E f(X(t), α(t))|
        = |Σ_{i=1}^{m0} ∫_S f(x, i)[p(X(s), α(s); t − s, x, i) − µ(x, i)] dx|
        ≤ C exp(−κ3 (t − s)).

Hence the lemma follows. □


Using Lemma 4.10, we obtain another lemma.

Lemma 4.11. Assume the conditions of Lemma 4.10. Then the following
assertion holds:

|E[f (X(t), α(t)) − fav ][f (X(s), α(s)) − fav ]|


(4.41)
≤ C exp(−κ3 |t − s|)

for some positive constant C.

Proof. Assume without loss of generality that s ≤ t. Note that from (4.40),
E f(X(u), α(u)) − f_av = 0 for any u ≥ 0. Hence we have

    |E[f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av]|
      = |E[f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av]
          − E[f(X(t), α(t)) − f_av] E[f(X(s), α(s)) − f_av]|
      = |E{[f(X(s), α(s)) − f_av](E_{X(s),α(s)}[f(X(t), α(t)) − f_av]
          − E[f(X(t), α(t)) − f_av])}|
      ≤ C E^{1/2}|f(X(s), α(s)) − f_av|²
          × E^{1/2}|E_{X(s),α(s)}[f(X(t), α(t)) − f_av] − E[f(X(t), α(t)) − f_av]|²
      ≤ C exp(−κ3 (t − s))

by virtue of Lemma 4.10. In the above, from the next to the last line to
the last line, we used C as a generic positive constant whose values may be
different for different appearances. The proof of the lemma is concluded. □
In what follows, denote

    ρ(t − s) = E[f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av],   0 ≤ s ≤ t ≤ T.   (4.42)

This is the covariance function. Because the process is time homogeneous,
it is a function of the time difference only.
Lemma 4.12. Assume that the conditions of Lemma 4.10 hold. Then

    ∫_0^∞ |ρ(t)| dt < ∞.                                                    (4.43)

Proof. In view of (4.42), we have from Lemma 4.11 that

    |ρ(t)| = |ρ(s + t − s)|
           = |E[f(X(s+t), α(s+t)) − f_av][f(X(s), α(s)) − f_av]|
           ≤ C exp(−κ3 t).

Then equation (4.43) follows immediately. □


Theorem 4.13. Let the conditions of Lemma 4.10 be satisfied. Then
Z T
1 2
√ [f (X(t), α(t)) − fav ]dt converges in distribution to N (0, σav ),
T 0
(4.44)
a normal random variable with mean 0 and variance
Z ∞
2
σav = 2 ρ(t)dt.
0
132 4. Ergodicity

Proof. Define

    ζ(T) = ∫_0^T [f(X(t), α(t)) − f_av] dt,  and  ξ(T) = (1/√T) ζ(T).       (4.45)

Note that

    E ξ(T) = E (1/√T) ∫_0^T [f(X(t), α(t)) − f_av] dt
           = (1/√T) ∫_0^T E[f(X(t), α(t)) − f_av] dt = 0.

To calculate the asymptotic variance, note that

    E[ (1/√T) ∫_0^T [f(X(t), α(t)) − f_av] dt ]²
      = (1/T) E ∫_0^T ∫_0^T [f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av] ds dt
      = (2/T) E ∫_0^T ∫_0^t [f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av] ds dt.   (4.46)

Equation (4.42) and changing variables lead to

    (1/T) E ∫_0^T ∫_0^t [f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av] ds dt
      = (1/T) ∫_0^T ∫_0^t ρ(t − s) ds dt
      = (1/T) ∫_0^T ∫_u^T ρ(u) dt du
      = ∫_0^T (1 − u/T) ρ(u) du.                                            (4.47)

Choose some 0 < ∆ < 1. Then by virtue of Lemma 4.12, it follows that as
T → ∞,

    ∫_0^T (u/T) ρ(u) du = ∫_0^{T^∆} (u/T) ρ(u) du + ∫_{T^∆}^{T} (u/T) ρ(u) du
      ≤ (T^∆/T) ∫_0^{T^∆} ρ(u) du + ∫_{T^∆}^{T} ρ(u) du → 0.

Combining the above, we arrive at

    (1/T) E ∫_0^T ∫_0^t [f(X(t), α(t)) − f_av][f(X(s), α(s)) − f_av] ds dt
      → ∫_0^∞ ρ(t) dt  as T → ∞.

Therefore, we have

    E[ξ(T)]² → σ_av² = 2 ∫_0^∞ ρ(t) dt  as T → ∞.                           (4.48)

The desired result thus follows. □
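The statement (4.44) can be examined by Monte Carlo simulation. The sketch below is our illustration; the model and parameters are not from the text. It uses a two-mode switching Ornstein–Uhlenbeck process dX = −β(α)X dt + dw with β(1) = 1, β(2) = 2, a constant generator with unit switching rates, and f(x, i) = x, so that f_av = 0 by symmetry, and inspects the centered, scaled time averages:

```python
import math, random

random.seed(3)
beta = {1: 1.0, 2: 2.0}                  # mode-dependent mean-reversion rates
eps, T, n_steps = 0.01, 50.0, 5000

def scaled_time_average():
    """One sample of (1/sqrt(T)) * int_0^T [f(X(t), alpha(t)) - f_av] dt."""
    x, alpha, integral = 0.0, 1, 0.0
    for _ in range(n_steps):
        integral += x * eps
        if random.random() < eps:        # unit switching intensity
            alpha = 3 - alpha
        x += -beta[alpha] * x * eps + math.sqrt(eps) * random.gauss(0.0, 1.0)
    return integral / math.sqrt(T)

samples = [scaled_time_average() for _ in range(200)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(abs(mean) < 0.3, 0.0 < var < 3.0)  # centered at 0 with a stable variance
```

Across replications the samples are approximately centered at 0 with a variance that stabilizes as T grows, consistent with a N(0, σ_av²) limit.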

4.6 Notes
Many applications of diffusion processes without switching require using
invariant distributions of the underlying processes. One of them is the
formulation of two-time-scale diffusions. Dealing with diffusions having fast
and slow components, under the framework of diffusion approximation, it
has been shown that in the limit, the slow component is averaged out with
respect to the invariant measure of the fast component; see Khasminskii
[82] for a weak convergence limit result, and Papanicolaou, Stroock, and
Varadhan [131] for a martingale problem formulation. In any event, a
crucial step in these references is the use of the invariant measure. In this
chapter, in contrast to the diffusion counterpart, we treat regime-switching
diffusions, where in addition to the continuous component, there are discrete
events. Our interest has been devoted to obtaining ergodicity.
For regime-switching diffusions, asymptotic stability for the density of
the two-state random process (X(t), α(t)) was established in [136]; asymptotic
stability in distribution (or the convergence to the stationary measures
of the switching diffusions) for the process (X(t), α(t)) was obtained
in [6, 183]. Here, we addressed ergodicity for (X(t), α(t)) under conditions
different from those in [6, 136, 183].
Taking into consideration the many applications in which discrete
events and continuous dynamics are intertwined and the discrete-event
process depends on the continuous state, we allow the discrete component
α(·) to have an x-dependent generator Q(x). Another highlight is that
we obtain an explicit representation of the invariant measure of the process
(X(t), α(t)) by considering certain cylinder sets and by defining cycles
appropriately. As a byproduct, we demonstrate a strong law of large numbers
type theorem for positive recurrent regime-switching diffusions. It reveals
that positive recurrence and ergodicity of switching diffusions are equivalent.
In this chapter, we first developed ergodicity for positive recurrent
regime-switching diffusions. Focusing on a compact space, we then obtained
asymptotic distributions as a consequence of the ergodicity of the process. The
asymptotic distributions are important in treating limit ergodic control
problems as well as in applications of Markov chain Monte Carlo. A crucial
step in the proof is the verification that the centered and scaled process is
a mixing process with exponential mixing rate.
A number of important problems remain open. Obtaining large deviation-type
bounds is a worthwhile undertaking, which will have an important
impact on studying the associated control and optimization problems.
5
Numerical Approximation

5.1 Introduction
As is the case for deterministic dynamic systems or stochastic differential
equations, closed-form solutions for switching diffusions are often difficult
to obtain, and numerical approximation is frequently a viable or possibly
the only alternative. Being extremely important, numerical methods
have drawn much attention. To date, a number of works (e.g., [120, 121,
122, 171]) have focused on numerical approximations where the switching
process is independent of the continuous component and is modeled by
a continuous-time Markov chain. In addition to the numerical methods,
approximation of invariant measures and treatment of non-Lipschitz data
were also dealt with. Nevertheless, it is necessary to be able to handle the
coupling and dependence of the continuous states and discrete events. This
chapter is devoted to numerical approximation methods for switching
diffusions whose switching component is x-dependent. Section 5.2 presents the
setup of the problem. Section 5.3 suggests numerical algorithms. Section 5.4
establishes the convergence of the numerical algorithms. Section 5.5 proceeds
with a couple of examples. Section 5.6 gives a few remarks concerning the
rates of convergence of the algorithms and the study of decreasing stepsize
algorithms. Finally Section 5.7 concludes the chapter.

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 137
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_5,
© Springer Science + Business Media, LLC 2010
5.2 Formulation

Let M = {1, . . . , m0} and consider the hybrid diffusion system

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    X(0) = X0,   α(0) = α0,                                                 (5.1)

and

    P(α(t + ∆) = j | α(t) = i, X(s), α(s), s ≤ t) = q_{ij}(X(t))∆ + o(∆),   i ≠ j,   (5.2)

where w(·) is an r-dimensional standard Brownian motion, X(t) ∈ R^r,
b(·, ·) : R^r × M → R^r and σ(·, ·) : R^r × M → R^{r×r} are appropriate functions
satisfying certain regularity conditions, and Q(x) = (q_{ij}(x)) ∈ R^{m0×m0}
satisfies, for each x, q_{ij}(x) ≥ 0 for i ≠ j and Σ_{j=1}^{m0} q_{ij}(x) = 0 for each
i ∈ M. There is an associated operator for the switching diffusion process
defined as follows. For each i ∈ M and suitable smooth function h(·, i),
define an operator

    Lh(x, i) = ∇h′(x, i)b(x, i) + (1/2) tr[∇²h(x, i)σ(x, i)σ′(x, i)]
                 + Σ_{j=1}^{m0} q_{ij}(x)h(x, j).                           (5.3)

In this chapter, our aim is to construct numerical approximation schemes
for solving (5.1). Note that in our setup, Q(x), the generator of the switching
process α(t) taking values in M = {1, . . . , m0}, is state dependent. It
is the x-dependence of Q(x) that makes the analysis much more difficult.
One of the main difficulties is that, due to the continuous-state dependence,
α(t) and X(t) are dependent; α(t) is a Markov chain only for a fixed x but
is otherwise non-Markovian. The essence of our approach is to treat the
pair of processes (X(t), α(t)) jointly; the two-component process turns out
to be Markovian. Nevertheless, much care needs to be exercised in handling
the mixture distributions. To proceed, we use the following conditions.

(A5.1) The function Q(·) : R^r → R^{m0×m0} is bounded and continuous.

(A5.2) The functions b(·, ·) and σ(·, ·) satisfy
       (a) |b(x, α)| ≤ K(1 + |x|), |σ(x, α)| ≤ K(1 + |x|), and
       (b) |b(x, α) − b(z, α)| ≤ K0|x − z| and |σ(x, α) − σ(z, α)| ≤ K0|x − z|
       for some K > 0 and K0 > 0 and for all x, z ∈ R^r and α ∈ M.

We first construct Euler’s scheme with a constant stepsize for approxi-


mating solutions of switching diffusions. It should be mentioned in partic-
ular, our analysis differs from the usual approach. To obtain convergence
5.3 Numerical Algorithms 139

of the algorithms, we first obtain weak convergence of the algorithm by


means of martingale problem formulation. This is particularly suited for
x-dependent switching processes because the convergence result would be
much more difficult to obtain otherwise. We then obtain the convergence in
the sense of L2 . As a demonstration, we provide numerical experiments to
delineate sample path properties of the approximating solutions. Further
discussion of uniform convergence in the sense of L2 and the associated
rates of convergence are derived. In addition, we present a decreasing step-
size algorithm. Also provided in this chapter is a brief discussion of rates
of convergence.

5.3 Numerical Algorithms

To approximate (5.1), choosing a sequence of independent and identically
distributed random variables {ξ_n} with mean 0 and finite variance, we
propose the following algorithm:

    X_{n+1} = X_n + εb(X_n, α_n) + √ε σ(X_n, α_n)ξ_n.                       (5.4)

We proceed to describe the terms involved above. We would like α_n to
be a discrete-time stochastic process that approximates α(t) in an appropriate
sense. It is natural, when X_{n−1} = x, that α_n has the transition
probability matrix exp(Q(x)ε). It is easily seen that the transition matrix
may be approximated further by I + εQ(x) + O(ε²) by virtue of the
boundedness and the continuity of Q(·). Based on this observation, in what
follows, we discard the O(ε²) term and simply use I + εQ(x) as the transition
matrix of α_n when X_{n−1} = x. To approximate the Brownian motion,
we use {ξ_n}, a sequence of independent and identically distributed random
variables with mean 0 and finite variance. We put what has been said above
into the following assumption.

(A5.3) In (5.4), for each n, when X_{n−1} = x, α_n has the transition
       matrix I + εQ(x), and {ξ_n} is a sequence of independent and
       identically distributed random variables such that ξ_n is independent
       of the σ-algebra G_n generated by {X_k, α_k : k ≤ n}, that Eξ_n = 0,
       E|ξ_n|^p < ∞ for p ≥ 2, and that Eξ_nξ_n′ = I.

Remark 5.1. One of the features of (5.4) is that it is easily implementable.
In lieu of discretizing a Brownian motion, we generate a sequence of
independent and identically distributed normal random variables to
approximate the Brownian motion. This facilitates the computational task. In
addition, instead of using the transition matrix exp(εQ(x)) for a fixed x, we
use another fold of approximation, I + εQ(x), based on a truncated Taylor
series. All of these stem from considerations of numerical computation
and Monte Carlo implementation. For simplicity, we have chosen ξ_n to be
Gaussian. In fact, any sequence with mean 0 and finite second moment can
be used. Moreover, a correlated sequence {ξ_n} can be used as well. However,
from a Monte Carlo perspective, there seems to be no strong reason to
use correlated sequences.
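A one-dimensional sketch of algorithm (5.4), under hypothetical coefficients of our own choosing (the drifts, diffusion coefficients, and state-dependent generator below are illustrative stand-ins, not from the text): the switching step samples from the row of I + εQ(x) corresponding to the current mode, and the diffusion step uses i.i.d. standard normal ξ_n:

```python
import math, random

random.seed(1)

# Hypothetical two-mode model: b(x, 1) = -x, b(x, 2) = 1 - x,
# sigma(x, 1) = 0.5, sigma(x, 2) = 1.0, and q_12(x) = 1/(1+x^2), q_21(x) = 1.
def b(x, a):      return -x if a == 1 else 1.0 - x
def sigma(x, a):  return 0.5 if a == 1 else 1.0
def q_out(x, a):  return 1.0 / (1.0 + x * x) if a == 1 else 1.0

eps = 0.001
n_steps = 10_000                       # horizon T = n_steps * eps = 10
x, alpha = 0.0, 1                      # initial condition (X_0, alpha_0)
path = []
for n in range(n_steps):
    # sample the next mode from the row of I + eps*Q(x) for the current state
    if random.random() < eps * q_out(x, alpha):
        alpha = 3 - alpha
    xi = random.gauss(0.0, 1.0)        # i.i.d. N(0, 1) approximating dw
    x = x + eps * b(x, alpha) + math.sqrt(eps) * sigma(x, alpha) * xi
    path.append((x, alpha))

print(len(path), path[-1][1] in (1, 2))
```

Because the jump intensity depends on the current iterate x, the simulated pair (X_n, α_n) reproduces the x-dependent switching that distinguishes this setting from a Markov-chain-modulated diffusion.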

5.4 Convergence of the Algorithm

5.4.1 Moment Estimates

Throughout this chapter, we assume the stepsize ε < 1. Note that by T/ε,
we mean the integer part of T/ε, that is, ⌊T/ε⌋. However, for simplicity,
we do not use the floor function notation most of the time and retain this
notation only if it is necessary. We first obtain an estimate on the pth
moment of {X_n}. This is stated as follows.

Lemma 5.2. Under (A5.1)–(A5.3), for any fixed p ≥ 2 and T > 0,

sup_{0≤n≤T/ε} E|Xn|^p ≤ (|X0|^p + KT) exp(KT) < ∞.   (5.5)

Proof. Define U (x) = |x|p and use En to denote the conditional expecta-
tion with respect to the σ-algebra Gn , where Gn was given in (A5.3). Note
that

En σ(Xn , αn )ξn = σ(Xn , αn )En ξn = 0 and


En |σ(Xn , αn )|2 |ξn |2 = |σ(Xn , αn )|2 En |ξn |2 ≤ K|σ(Xn , αn )|2 ,

where K is a generic positive constant. Thus

En U(Xn+1) − U(Xn) = En ∇U′(Xn)[Xn+1 − Xn]
  + En (Xn+1 − Xn)′ ∇²U(Xn⁺)(Xn+1 − Xn)
≤ ε ∇U′(Xn) b(Xn, αn) + Kε |Xn|^{p−2}(1 + |Xn|²)
≤ Kε (1 + |Xn|^p),
  (5.6)
where ∇U and ∇²U denote the gradient and Hessian of U with respect
to x, and Xn⁺ denotes a vector on the line segment joining Xn and Xn+1.
Note that in the last line of (5.6), we have used the linear growth in x for
both b(·, ·) and σ(·, ·). Because U (Xn ) = |Xn |p , we obtain

En |Xn+1 |p ≤ |Xn |p + Kε + Kε|Xn |p .



Taking the expectation on both sides and iterating on the resulting recur-
sion, we obtain
E|Xn+1|^p ≤ |X0|^p + Kεn + Kε Σ_{k=0}^{n} E|Xk|^p.

An application of Gronwall’s inequality yields that

E|Xn+1 |p ≤ (|X0 |p + KT ) exp(KT )

as desired. 2
Remark 5.3. In view of the estimate above, {Xn : 0 ≤ n ≤ T/ε} is tight
in R^r by means of the well-known Tchebyshev inequality. That is, for each
η > 0, there is a Kη satisfying Kη > √(1/η) such that

P(|Xn| > Kη) ≤ ( sup_{0≤n≤T/ε} E|Xn|² ) / Kη² ≤ Kη.

This indicates that the sequence of iterates is “mass preserving” or no


probability is lost. To proceed, take continuous-time interpolations defined
by
X ε (t) = Xn , αε (t) = αn , for t ∈ [nε, nε + ε). (5.7)
We show that X ε (·) and αε (·) are tight in suitable function spaces.
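In code, the piecewise-constant interpolation (5.7) is simply an index lookup; a minimal sketch (the clamping of the final iterate is our own convention):

```python
def interp_const(path, eps, t):
    """Piecewise-constant interpolation of (5.7): returns X_n for
    t in [n*eps, n*eps + eps), where path = [X_0, X_1, ...]."""
    n = int(t // eps)
    return path[min(n, len(path) - 1)]  # clamp at the final iterate
```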
Lemma 5.4. Assume (A5.1)–(A5.3). Define

χn = (I{αn=1}, . . . , I{αn=m0}) ∈ R^{1×m0} and
χ^ε(t) = χn, for t ∈ [εn, εn + ε).

Then for any t, s > 0,

E[χ^ε(t + s) − χ^ε(t) | F_t^ε] = O(s),   (5.8)

where Ftε denotes the σ-algebra generated by {X ε (u), αε (u) : u ≤ t}.

Proof. First note that by the boundedness and the continuity of Q(·), for
each i ∈ M,
Σ_{j=1}^{m0} E[I{α_{k+1}=j} − I{α_k=i} | Gk]
= Σ_{j≠i, j∈M} E[I{α_{k+1}=j} | Gk]
= Σ_{j≠i, j∈M} ε q_{ij}(Xk) I{α_k=i} = O(ε) I{α_k=i}.

Note that we have used the ijth entry of I + εQ(Xk ),

(I + εQ(Xk ))ij = δij + εqij (Xk ),

where Gn is given in (A5.3), and



δij = 1 if i = j, and δij = 0 otherwise.

It then follows that there is a random function g̃(·) such that

E[ Σ_{k=t/ε}^{(t+s)/ε−1} [χ_{k+1} − χ_k] | F_t^ε ]
= E[ Σ_{k=t/ε}^{(t+s)/ε−1} E[χ_{k+1} − χ_k | Gk] | F_t^ε ]   (5.9)
= g̃(t + s − t)
= g̃(s),

and that

E g̃(s) = O(s).
In the above, we have used the convention that t/ε and (t + s)/ε denote
the integer parts of t/ε and (t + s)/ε, respectively. Inasmuch as

E[ χ^ε(t + s) − χ^ε(t) − Σ_{k=t/ε}^{(t+s)/ε−1} [χ_{k+1} − χ_k] | F_t^ε ] = 0,

it follows from (5.9) that

E[χ^ε(t + s) | F_t^ε] = χ^ε(t) + g̃(s).

The desired result then follows. 2


Lemma 5.5. Under the conditions of Lemma 5.4, {αε (·)} is tight.

Proof. By virtue of Lemma 5.4,

E[ |χ^ε(t+s) − χ^ε(t)|² | F_t^ε ]
= E[ χ^ε(t+s) χ^{ε′}(t+s) − 2 χ^ε(t+s) χ^{ε′}(t) + χ^ε(t) χ^{ε′}(t) | F_t^ε ]
= Σ_{i=1}^{m0} E[ I{α_{(t+s)/ε}=i} − 2 I{α_{(t+s)/ε}=i} I{α_{t/ε}=i} + I{α_{t/ε}=i} | F_t^ε ].
  (5.10)

The estimates in Lemma 5.4 then imply that

lim_{s→0} limsup_{ε→0} E E[ |χ^ε(t+s) − χ^ε(t)|² | F_t^ε ] = 0.

The tightness criterion in [102, p. 47] yields that {χε (·)} is tight. Conse-
quently, {αε (·)} is tight. 2

Lemma 5.6. Assume that the conditions of Lemma 5.5 are satisfied. Then
{X^ε(·)} is tight in D^r([0, ∞) : R^r), the space of functions that are right
continuous and have left limits, endowed with the Skorohod topology.

Proof. For any η > 0, t ≥ 0, 0 ≤ s ≤ η, we have

E|X^ε(t+s) − X^ε(t)|²
= E| ε Σ_{k=t/ε}^{(t+s)/ε−1} b(Xk, αk) + √ε Σ_{k=t/ε}^{(t+s)/ε−1} σ(Xk, αk) ξk |²
≤ Kε² Σ_{k=t/ε}^{(t+s)/ε−1} (1 + E|Xk|²) + Kε Σ_{k=t/ε}^{(t+s)/ε−1} E|σ(Xk, αk)|² E|ξk|²
≤ Kε² Σ_{k=t/ε}^{(t+s)/ε−1} (1 + sup_{t/ε≤k≤(t+s)/ε−1} E|Xk|²)
  + Kε Σ_{k=t/ε}^{(t+s)/ε−1} (1 + sup_{t/ε≤k≤(t+s)/ε−1} E|Xk|²)
≤ O((t+s)/ε − t/ε) · O(ε) = O(s).
  (5.11)
In the above, we have used Lemma 5.2 to ensure that

sup_{t/ε≤k≤(t+s)/ε−1} E|Xk|² < ∞.

Therefore, (5.11) leads to

lim_{η→0} limsup_{ε→0} E|X^ε(t+s) − X^ε(t)|² = 0.

The tightness of {X ε (·)} then follows from [102, p. 47]. 2


Combining Lemmas 5.4-5.6, we obtain the following tightness result.

Lemma 5.7. Under assumptions (A5.1)–(A5.3), {X ε (·), αε (·)} is tight in


D([0, ∞) : Rr × M).

5.4.2 Weak Convergence


The main result of this section is the following theorem.

Theorem 5.8. Under (A5.1)–(A5.3), (X ε (·), αε (·)) converges weakly to


(X(·), α(·)), which is a process with generator given by (5.3).

Proof. Because (X^ε(·), α^ε(·)) is tight, by Prohorov's theorem (see Theorem A.20 of this book and also [43, 104]), we may select a convergent subsequence. For simplicity, we still denote the subsequence by (X^ε(·), α^ε(·)) with the limit denoted by (X̃(·), α̃(·)). By Skorohod representation (see Theorem A.21 and also [43, 104]), without loss of generality and without changing notation, we may assume that (X^ε(·), α^ε(·)) converges to (X̃(·), α̃(·)) w.p.1, and the convergence is uniform on each bounded interval. We proceed to characterize the limit process.
Step 1: We first work with the marginal of the switching component, and
characterize the limit of α^ε(·). The weak convergence of α^ε(·) to α̃(·) yields
that χ^ε(·) converges to χ(·) weakly. For each t > 0 and s > 0, each positive
integer κ, each 0 ≤ tι ≤ t with ι ≤ κ, and each bounded and continuous
function ρι(·, i) for each i ∈ M,

E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ χ^ε(t+s) − χ^ε(t) − Σ_{k=t/ε}^{(t+s)/ε−1} (χ_{k+1} − χ_k) ] = 0.   (5.12)

The weak convergence of χε (·) to χ(·) and the Skorohod representation


imply that

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [χ^ε(t+s) − χ^ε(t)]
= E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) [χ(t+s) − χ(t)].

Pick out a sequence {nε } of nonnegative real numbers such that

nε → ∞ as ε → 0 but δε = εnε → 0.

Use Ξ_l^ε, the set of indices defined by

Ξ_l^ε = {k : lnε ≤ k ≤ lnε + nε − 1},   (5.13)



as a base for partition. Then the continuity together with the boundedness
of Q(·) implies that

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{k=t/ε}^{(t+s)/ε−1} (χ_{k+1} − χ_k) ]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{k=t/ε}^{(t+s)/ε−1} (E(χ_{k+1} | Gk) − χ_k) ]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{lnε=t/ε}^{(t+s)/ε−1} Σ_{k=lnε}^{lnε+nε−1} χ_k (I + εQ(Xk) − I) ]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{lnε=t/ε}^{(t+s)/ε−1} δε (1/nε) Σ_{k=lnε}^{lnε+nε−1} χ_k Q(X_{lnε}) ].
  (5.14)
Note that

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{lnε=t/ε}^{(t+s)/ε−1} δε (1/nε) Σ_{k=lnε}^{lnε+nε−1} [χ_k − χ_{lnε}] Q(X_{lnε}) ]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι))
  × [ Σ_{lnε=t/ε}^{(t+s)/ε−1} δε (1/nε) Σ_{k=lnε}^{lnε+nε−1} E[χ_k − χ_{lnε} | G_{lnε}] Q(X_{lnε}) ]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι))
  × [ Σ_{lnε=t/ε}^{(t+s)/ε−1} δε (1/nε) Σ_{k=lnε}^{lnε+nε−1} χ_{lnε} [(I + εQ(X_{lnε}))^{k−lnε} − I] Q(X_{lnε}) ]
= 0.
  (5.15)
Therefore,

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{lnε=t/ε}^{(t+s)/ε−1} δε (1/nε) Σ_{k=lnε}^{lnε+nε−1} χ_k Q(X_{lnε}) ]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [ Σ_{lnε=t/ε}^{(t+s)/ε−1} δε χ_{lnε} Q(X_{lnε}) ]
= E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) ∫_t^{t+s} χ(u) Q(X̃(u)) du.
  (5.16)

Moreover, the limit does not depend on the chosen subsequence. Thus,
E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) [ χ(t+s) − χ(t) − ∫_t^{t+s} χ(u) Q(X̃(u)) du ] = 0.   (5.17)

Therefore, the limit process α̃(·) has a generator Q(X̃(·)).
Step 2: For t, s, κ, tι as chosen before, for each bounded and continuous
function ρι (·, i), and for each twice continuously differentiable function with
compact support h(·, i) with i ∈ M, we show that
E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) [ h(X̃(t+s), α̃(t+s))
− h(X̃(t), α̃(t)) − ∫_t^{t+s} Lh(X̃(u), α̃(u)) du ] = 0.   (5.18)

This yields that


h(X̃(t), α̃(t)) − ∫_0^t Lh(X̃(u), α̃(u)) du

is a continuous-time martingale, which in turn implies that (X̃(·), α̃(·)) is
a solution of the martingale problem with operator L defined in (5.3).
To establish the desired result, we work with the sequence (X ε (·), αε (·)).
Again, we use the sequence {nε } as in Step 1. By virtue of the weak con-
vergence and the Skorohod representation, it is readily seen that
E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [h(X^ε(t+s), α^ε(t+s)) − h(X^ε(t), α^ε(t))]
→ E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) [h(X̃(t+s), α̃(t+s)) − h(X̃(t), α̃(t))]
  (5.19)
as ε → 0. On the other hand, direct calculation shows that
E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) [h(X^ε(t+s), α^ε(t+s)) − h(X^ε(t), α^ε(t))]
= E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι))
  × Σ_{lnε=t/ε}^{(t+s)/ε−1} { [h(X_{lnε+nε}, α_{lnε+nε}) − h(X_{lnε+nε}, α_{lnε})]
  + [h(X_{lnε+nε}, α_{lnε}) − h(X_{lnε}, α_{lnε})] }.
  (5.20)

Step 3: Still use the notation Ξ_l^ε defined in (5.13). For the terms on the
last line of (5.20), we have

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} [h(X_{lnε+nε}, α_{lnε}) − h(X_{lnε}, α_{lnε})]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι))
  × Σ_{lnε=t/ε}^{(t+s)/ε−1} { ε ∇h′(X_{lnε}, α_{lnε}) Σ_{k=lnε}^{lnε+nε−1} b(Xk, α_{lnε})
  + (ε/2) Σ_{k=lnε}^{lnε+nε−1} tr[∇²h(X_{lnε}, α_{lnε}) σ(Xk, α_{lnε}) σ′(Xk, α_{lnε})] }.
  (5.21)

By the continuity of b(·, i) for each i ∈ M and the choice of nε ,


lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι))
  × Σ_{lnε=t/ε}^{(t+s)/ε−1} δε ∇h′(X_{lnε}, α_{lnε}) (1/nε) Σ_{k=lnε}^{lnε+nε−1} [b(Xk, α_{lnε}) − b(X_{lnε}, α_{lnε})]
= 0.

Thus, in evaluating the limit, for lnε ≤ k ≤ lnε + nε − 1, b(Xk , αlnε ) can
be replaced by b(Xlnε , αlnε ).
The choice of nε implies that εlnε → u as ε → 0 yielding εk → u for all
lnε ≤ k ≤ lnε + nε . Consequently, by weak convergence and the Skorohod
representation, we obtain
L1 := lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} ε ∇h′(X_{lnε}, α_{lnε}) Σ_{k=lnε}^{lnε+nε−1} b(Xk, α_{lnε})
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} δε ∇h′(X_{lnε}, α_{lnε}) (1/nε) Σ_{k=lnε}^{lnε+nε−1} b(X_{lnε}, α_{lnε}).
  (5.22)

Thus,
L1 = lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} δε ∇h′(X_{lnε}, α_{lnε}) b(X^ε(lδε), α^ε(lδε))
= E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) ∫_t^{t+s} ∇h′(X̃(u), α̃(u)) b(X̃(u), α̃(u)) du.
  (5.23)
In the above, treating such terms as b(X ε (lδε ), αε (lδε )), we can approx-
imate X ε (·) by a process taking finitely many values using a standard
approximation argument (see, e.g., [104, p. 169] for more details).
Similar to (5.23), we also obtain
L2 := lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) (ε/2) Σ_{lnε=t/ε}^{(t+s)/ε−1} Σ_{k=lnε}^{lnε+nε−1} tr[∇²h(X_{lnε}, α_{lnε}) σ(Xk, α_{lnε}) σ′(Xk, α_{lnε})]
= E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) ∫_t^{t+s} (1/2) tr[∇²h(X̃(u), α̃(u)) σ(X̃(u), α̃(u)) σ′(X̃(u), α̃(u))] du.
  (5.24)
Step 4: We next examine the terms on the next to the last line of (5.20).
First, again using the continuity, weak convergence, and the Skorohod rep-
resentation, it can be shown that

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} [h(X_{lnε+nε}, α_{lnε+nε}) − h(X_{lnε+nε}, α_{lnε})]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} [h(X_{lnε}, α_{lnε+nε}) − h(X_{lnε}, α_{lnε})].
  (5.25)

That is, as far as asymptotic analysis is concerned, owing to the choice of


{nε } and the continuity of h(·, i), the term

h(Xlnε +nε , αlnε +nε ) − h(Xlnε +nε , αlnε )



in the next to the last line of (5.20) can be replaced by

h(Xlnε , αlnε +nε ) − h(Xlnε , αlnε )

with an error tending to 0 in probability as ε → 0 uniformly in t. It follows


that

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} [h(X_{lnε}, α_{lnε+nε}) − h(X_{lnε}, α_{lnε})]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} Σ_{k=lnε}^{lnε+nε−1} [h(X_{lnε}, α_{k+1}) − h(X_{lnε}, α_k)]
= lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} Σ_{k=lnε}^{lnε+nε−1} Σ_{i=1}^{m0} Σ_{i1=1}^{m0} E[ h(X_{lnε}, i) I{α_{k+1}=i} − h(X_{lnε}, i1) I{α_k=i1} | Gk ].
  (5.26)
Note that for k ≥ lnε ,

E[ h(X_{lnε}, i) I{α_{k+1}=i} − h(X_{lnε}, i1) I{α_k=i1} | Gk ]
= [h(X_{lnε}, i) P(α_{k+1} = i | Gk, α_k = i1) − h(X_{lnε}, i1)] I{α_k=i1}   (5.27)
= [h(X_{lnε}, i)(δ_{i1 i} + ε q_{i1 i}(Xk)) − h(X_{lnε}, i1)] I{α_k=i1}
= ε h(X_{lnε}, i) q_{i1 i}(Xk) I{α_k=i1}.

Using (5.27) in (5.26) and noting the continuity and boundedness of Q(·),
we can replace qi1 i (Xk ) by qi1 i (Xlnε ) yielding the same limit. Then as in
(5.15) and (5.16), replace I{αk =i1 } by I{αε (εlnε )=i1 } , again yielding the same
limit. Thus, we have

lim_{ε→0} E Π_{ι=1}^{κ} ρι(X^ε(tι), α^ε(tι)) Σ_{lnε=t/ε}^{(t+s)/ε−1} [h(X_{lnε}, α_{lnε+nε}) − h(X_{lnε}, α_{lnε})]
= E Π_{ι=1}^{κ} ρι(X̃(tι), α̃(tι)) ∫_t^{t+s} Q(X̃(u)) h(X̃(u), ·)(α̃(u)) du,
  (5.28)

where for each i1 ∈ M,


Q(x)h(x, ·)(i1) = Σ_{i=1}^{m0} q_{i1 i}(x) h(x, i)
= Σ_{i≠i1} q_{i1 i}(x) (h(x, i) − h(x, i1)).

Step 5: Combining Steps 1–4, we arrive at that (X̃(·), α̃(·)), the weak
limit of (X^ε(·), α^ε(·)), is a solution of the martingale problem with operator L defined in (5.3). Using characteristic functions, we can show, as in
[176, Lemma 7.18], that (X(·), α(·)), the solution of the martingale problem with
operator L, is unique in distribution. Thus (X^ε(·), α^ε(·)) converges to (X(·), α(·)) as desired, which concludes the proof of the theorem.
2
In addition, we can obtain the following convergence result as a corollary.
Corollary 5.9. Under the conditions of Theorem 5.8, the sequence of pro-
cesses (X ε (·), αε (·)) converges to (X(·), α(·)) in the sense

sup_{0≤t≤T} E|X^ε(t) − X(t)|² → 0 as ε → 0.   (5.29)

Remark 5.10. With a little more effort, we can obtain strong convergence
(in the usual sense used in the numerical solutions of stochastic differential
equations). The steps involved can be outlined as follows. (a) We consider
two sequences of approximations (X ε (·), αε (·)) and (X η (·), αη (·)) with the
same initial data. (b) Define for sufficiently small ε > 0 and η > 0,
X̃^ε(t) = X0 + ∫_0^t b(X^ε(s), α^ε(s)) ds + ∫_0^t σ(X^ε(s), α^ε(s)) dw(s),
X̃^η(t) = X0 + ∫_0^t b(X^η(s), α^η(s)) ds + ∫_0^t σ(X^η(s), α^η(s)) dw(s).

That is, they are two approximations of the solution with the use of different
stepsizes. Then we can show
E sup_{t∈[0,T]} |X̃^ε(t) − X̃^η(t)|² → 0 as ε → 0 and η → 0.

The main ingredient is the application of Doob’s martingale inequality.


(c) Let {εn } be a sequence of positive real numbers satisfying εn → 0.
We show that {X εn (t) : t ∈ [0, T ]} is an L2 Cauchy sequence of random
elements. We then conclude E supt∈[0,T ] |X ε (t) − X(t)|2 → 0 as ε → 0.
(d) Moreover, we can use the above results to give an alternative proof of
the existence and uniqueness of the solution of (5.1) together with (5.2)
(or (2.2) together with (2.3)); see Remark 2.2 for different approaches for
proving the existence and uniqueness of the solution.

5.5 Examples
Here we present two examples for demonstration. It would be ideal if we
could compare the numerical solutions using our algorithms with the analytic solutions. Unfortunately, owing to the complexity of the x-dependent
switching process, closed-form solutions are not available. We thus content
ourselves with the numerical solutions. In both examples, we use the state-dependent generator Q(x) given by

Q(x) = [  −5 cos²x     5 cos²x
          10 cos²x   −10 cos²x ].

Example 5.11. Consider a jump linear system. Suppose that

σ(x, 1) = 2x and σ(x, 2) = x.

Let
b(x, i) = A(i)x with A(1) = −3.3 and A(2) = −2.7.
We use the constant stepsize algorithm (5.4). Specify the initial conditions
as X0 = 5 and α0 = 1, and use the constant stepsize 0.001. A sample path
of the computed iterations is depicted in Figure 5.1.

[Plot omitted: x versus iterations, 0–1200.]
FIGURE 5.1. A sample path of a numerical approximation to a jump-linear system.
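A run along the lines of Example 5.11 can be sketched as below (the update order, the regime-sampling via I + εQ(x), and the random seed are our own choices; the text does not give the code behind Figure 5.1):

```python
import numpy as np

def simulate_example_5_11(n_steps=1200, eps=0.001, x0=5.0, alpha0=0, seed=0):
    """Constant-stepsize iterates for the jump-linear system:
    b(x, i) = A(i)x with A = (-3.3, -2.7), sigma(x, 1) = 2x, sigma(x, 2) = x,
    switched by the state-dependent generator Q(x) of Section 5.5."""
    rng = np.random.default_rng(seed)
    A, sig = (-3.3, -2.7), (2.0, 1.0)
    x, alpha = x0, alpha0
    path = [x]
    for _ in range(n_steps):
        x = x + eps * A[alpha] * x + np.sqrt(eps) * sig[alpha] * x * rng.standard_normal()
        c = np.cos(x) ** 2
        P = np.eye(2) + eps * np.array([[-5 * c, 5 * c], [10 * c, -10 * c]])
        alpha = rng.choice(2, p=P[alpha])
        path.append(x)
    return np.array(path)
```

Plotting the returned path against the iteration count reproduces the qualitative behavior shown in Figure 5.1.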

Example 5.12. This example is concerned with switching diffusion with


nonlinear drifts with respect to the continuous-state variable. We use the
following specifications,
σ(x, 1) = 0.5x, σ(x, 2) = 0.2x.
For each i ∈ {1, 2}, consider the nonlinear functions
b(x, 1) = −(2 + sin x) and b(x, 2) = −(1 + sin x cos x).
Using the same initial data and stepsize as in Example 5.11, the calculation
is carried out. The computational result is displayed in Figure 5.2.

[Plot omitted: x versus iterations, 0–1200; x ranges roughly between 3.5 and 5.5.]
FIGURE 5.2. A sample path of switching diffusion with nonlinear drifts.

For these numerical examples, we have tried different stepsizes. They all
produced similar sample path behavior as displayed above. For numeri-
cal experiment purposes, we have also tested different functions b(·, ·) and
σ(·, ·).

5.6 Discussions and Remarks


This section provides some discussions on issues related to numerical ap-
proximation. First, rates of convergence are discussed, and then decreasing
stepsize algorithms are studied.

5.6.1 Remarks on Rates of Convergence


Because we are dealing with a numerical algorithm, it is desirable that
we have some estimation error bounds on the rates of convergence. This
section takes up this issue. In Kloeden and Platen [94, p. 323], the rate of
convergence was defined as follows. For a finite time T > 0, if there exists
a positive constant K that does not depend on ε such that E|X ε (T ) −
X(T)| ≤ Kε^γ for some γ > 0, then the approximation X^ε is said to converge
strongly to X with order γ. Here we adopt the more recent approach in
Mao and Yuan [120], and concentrate on uniform convergence in the sense
of error bounds of the form E sup_{0≤t≤T} |X^ε(t) − X(t)|².
We assume the conditions of Theorem 5.8 are satisfied. Also we make
use of Remark 5.10. In what follows, to simplify the discussion, we take
√ε ξn = w(ε(n+1)) − w(εn). (An independent and identically distributed
"white" noise sequence {ξn} can be used, which makes the notation more
complex.) It is straightforward that the piecewise constant interpolation of
(5.4) leads to

X^ε(t) = X0 + ∫_0^t b(X^ε(s), α^ε(s)) ds + ∫_0^t σ(X^ε(s), α^ε(s)) dw(s).   (5.30)

The representation (5.30) enables us to compare the solution (5.1) with


that of the discrete iterations.
Comparing the interpolation of the iterates and the solution of (5.1), we
obtain

E sup_{0≤t≤T} |X^ε(t) − X(t)|²
≤ 2E sup_{0≤t≤T} | ∫_0^t [b(X(s), α(s)) − b(X^ε(s), α^ε(s))] ds |²
  + 2E sup_{0≤t≤T} | ∫_0^t [σ(X(s), α(s)) − σ(X^ε(s), α^ε(s))] dw |²   (5.31)
≤ 2T E ∫_0^T |b(X(s), α(s)) − b(X^ε(s), α^ε(s))|² ds
  + 8E ∫_0^T |σ(X(s), α(s)) − σ(X^ε(s), α^ε(s))|² ds.

Note that in (5.31), the first inequality is obtained from the familiar in-
equality (a + b)2 ≤ 2(a2 + b2 ) for two real numbers a and b. The first term
on the right-side of the second inequality follows from Hölder’s inequality,
and the second term is a consequence of the well-known Doob martingale
inequality (see (A.19) in the appendix). To proceed, we treat the drift and

diffusion terms separately. Note that


E ∫_0^T |b(X(s), α(s)) − b(X^ε(s), α^ε(s))|² ds
≤ E ∫_0^T |b(X(s), α(s)) − b(X^ε(s), α(s))|² ds
  + E ∫_0^T |b(X^ε(s), α(s)) − b(X^ε(s), α^ε(s))|² ds   (5.32)
≤ K ∫_0^T E|X^ε(s) − X(s)|² ds
  + E ∫_0^T [1 + |X^ε(s)|²] I{α(s)≠α^ε(s)} ds.

The first inequality in (5.32) follows from the familiar triangle inequality,
and the second inequality is a consequence of the Lipschitz continuity, the
Cauchy inequality, and the linear growth condition. We now concentrate
on the last term in (5.32). Using discrete iteration, we have
E ∫_0^T [1 + |X^ε(s)|²] I{α(s)≠α^ε(s)} ds
= Σ_{k=0}^{⌊T/ε⌋−1} E ∫_{εk}^{εk+ε} [1 + |X^ε(s)|²] I{α(s)≠α^ε(s)} ds.

Using nested conditioning, we further obtain


E ∫_{εk}^{εk+ε} [1 + |X^ε(s)|²] I{α(s)≠α^ε(s)} ds
= E ∫_{εk}^{εk+ε} [1 + |Xk|²] E[I{α(s)≠α^ε(s)} | F_{εk}] ds
= E ∫_{εk}^{εk+ε} [1 + |Xk|²] Σ_{i∈M} I{α_{εk}=i} Σ_{j≠i} [q_{ij}(X(εk))(s − εk) + o(s − εk)] ds
≤ Kε ∫_{εk}^{εk+ε} ds ≤ Kε².

Thus the moment estimate of E|X(t)|2 yields that


E ∫_0^T |b(X(s), α(s)) − b(X^ε(s), α^ε(s))|² ds
≤ Kε + K ∫_0^T E|X^ε(s) − X(s)|² ds.   (5.33)

Likewise, for the term involving diffusion, we also obtain


E ∫_0^T |σ(X(s), α(s)) − σ(X^ε(s), α^ε(s))|² ds
≤ Kε + K ∫_0^T E|X^ε(s) − X(s)|² ds.   (5.34)

Using (5.31)–(5.34), we obtain

E sup_{0≤t≤T} |X^ε(t) − X(t)|²
≤ Kε + K ∫_0^T E sup_{0≤u≤s} |X^ε(u) − X(u)|² ds.   (5.35)

An application of Gronwall's inequality leads to

E sup_{0≤t≤T} |X^ε(t) − X(t)|² ≤ Kε.

Thus, we conclude that the discrete iterates converge strongly in the L2


sense with an error bound of the order O(ε). We state it as a result below.
Theorem 5.13. Assume (A5.1)–(A5.3). Then the sequence (X ε (·), αε (·))
converges to (X(·), α(·)) in the sense
E sup_{0≤t≤T} |X^ε(t) − X(t)|² → 0 as ε → 0.   (5.36)

Moreover, we have the following rate of convergence estimate


E sup_{0≤t≤T} |X^ε(t) − X(t)|² = O(ε).   (5.37)
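The O(ε) bound of Theorem 5.13 can be probed numerically. The sketch below is a heuristic diagnostic, not the coupling argument above: a run with stepsize ε/refine serves as a stand-in for X(·), the coarse run reuses the aggregated Brownian increments, and a constant generator Q (our simplification) lets one switching path drive both resolutions.

```python
import numpy as np

def strong_error_probe(eps, refine=16, T=1.0, x0=1.0, seed=0):
    """Heuristic check of the O(eps) rate: compare a fine-stepsize run
    (proxy for the exact solution) with a coarse run driven by the
    summed Brownian increments and the regime held at coarse grid points."""
    rng = np.random.default_rng(seed)
    b = lambda x, a: (-3.3, -2.7)[a] * x
    sig = lambda x, a: (0.5, 0.2)[a] * x
    h = eps / refine
    n_f = int(round(T / h))
    dw = np.sqrt(h) * rng.standard_normal(n_f)
    Q = np.array([[-5.0, 5.0], [10.0, -10.0]])     # constant generator
    P = np.eye(2) + h * Q
    alphas = [0]
    for _ in range(n_f):
        alphas.append(int(rng.choice(2, p=P[alphas[-1]])))
    xf = xc = x0
    for n in range(n_f // refine):
        for j in range(refine):                    # fine steps
            k = n * refine + j
            xf += h * b(xf, alphas[k]) + sig(xf, alphas[k]) * dw[k]
        a = alphas[n * refine]                     # one coarse step
        xc += eps * b(xc, a) + sig(xc, a) * dw[n * refine:(n + 1) * refine].sum()
    return abs(xf - xc)
```

Averaging `strong_error_probe(eps)**2` over many seeds and comparing ε with ε/2 should show the mean-square error roughly halving, consistent with (5.37).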

5.6.2 Remarks on Decreasing Stepsize Algorithms


So far the development is based on using constant stepsize algorithms. To
approximate (5.1), we could also use a decreasing stepsize algorithm of the
form

Xn+1 = Xn + εn b(Xn, αn) + √εn σ(Xn, αn)ξn.   (5.38)

Compared with (5.4), for Xn−1 = x, αn is a finite-state process with transition matrix I + εn Q(x). Instead of (A5.3), we assume the following condition.
(A5.4) In (5.38), {εn} is a sequence of decreasing stepsizes satisfying
εn ≥ 0, εn → 0 as n → ∞, and Σn εn = ∞. The {ξn} is
a sequence of independent and identically distributed normal
random variables such that ξn is independent of the σ-algebra Gn
generated by {Xk, αk : k ≤ n}, and that Eξn = 0, E|ξn|² < ∞,
and Eξn ξn′ = I. Moreover, for Xn−1 = x, αn is a finite-state
process with transition matrix I + εn Q(x).

Define
tn = Σ_{k=0}^{n−1} εk,  m(t) = max{n : tn ≤ t},

and continuous-time interpolations

X^0(t) = Xn, α^0(t) = αn, for t ∈ [tn, tn+1),
X^n(t) = X^0(t + tn), α^n(t) = α^0(t + tn).

Using essentially the same approach as in the development of Theorem 5.8


together with the ideas from stochastic approximation [104], we obtain the
following result.

Theorem 5.14. Under (A5.1), (A5.2), and (A5.4), (X n (·), αn (·)) con-
verges to (X(·), α(·)) weakly, which is a solution of the martingale problem
with operator L defined in (5.3). Moreover,

E sup_{0≤t≤T} |X^n(t) − X(t)|² → 0 as n → ∞.   (5.39)

Furthermore, the rate of convergence is given by

E sup_{0≤t≤T} |X^n(t) − X(t)|² = O(εn) as n → ∞.   (5.40)
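A concrete stepsize schedule satisfying (A5.4), together with the interpolation times tn and the index m(t) just defined, can be sketched as follows (the harmonic choice εn = ε0/(n+1) is our own pick; any decreasing, vanishing sequence with divergent sum would do):

```python
import numpy as np

def schedule(n, eps0=0.1):
    """eps_n = eps0/(n+1): decreasing, vanishing, with divergent sum,
    as (A5.4) requires; t_n accumulates the stepsizes."""
    eps = eps0 / (np.arange(n) + 1.0)
    t = np.concatenate(([0.0], np.cumsum(eps)))    # t_0, ..., t_n
    return eps, t

def m_of_t(t_grid, t):
    """m(t) = max{n : t_n <= t} from the sorted interpolation times."""
    return int(np.searchsorted(t_grid, t, side="right") - 1)
```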

5.7 Notes
Numerical methods for stochastic differential equations have been studied
extensively, for example, in [94, 126] among others. A comprehensive study
of the early results is contained in Kloeden and Platen [94]. Accelerated
rates of convergence are given in Milstein [126]. As a natural extension,
further results are considered in Milstein and Tretyakov [127], among oth-
ers. Numerical solutions for stochastic differential equations modulated by
Markovian switching have also been well studied; recent progress for switching diffusions is contained in [120] and the references therein.
Although numerical methods for Markov modulated switching diffusions
have been considered by many researchers, less is known for processes with
continuous-state-dependent switching. In fact, the study of switching diffu-
sions with state-dependent switching is still in its infancy. There are certain
difficulties. For example, the usual Picard iteration, which relies crucially
on the Lipschitz condition, cannot be used. When Markovian regime-switching
processes are treated, with the given generator of the switching process,
we can pre-generate the switching process throughout the iterations. How-
ever, in the state-dependent switching case, since the generation of the
x-dependent switching processes is different in every step, we can no longer

pre-generate the switching process without interacting with the continuous-


state process and use the Lipschitz condition directly.
Part of the results of this chapter are based on Yin, Mao, Yuan, and Cao
[172]. The approach uses local analysis and weak convergence methods. It
relies on the solutions of associated martingale problems. The approach is
different from the usual techniques developed in the literature for numeri-
cal solutions of stochastic differential equations to date. The idea and the
techniques used are interesting in their own right. Rates of convergence
are then ascertained together with the development of the unusual strong
convergence and decreasing stepsize algorithms.
6 Numerical Approximation to Invariant Measures

6.1 Introduction
Continuing with the development in Chapter 5, this chapter is devoted to
additional properties of numerical approximation algorithms for switching
diffusions, where continuous dynamics are intertwined with discrete events.
In this chapter, we establish that if the invariant measure exists, under suit-
able conditions, the sequence of iterates obtained using Euler–Maruyama
approximation converges to the invariant measure.
Here for simplicity, the discrete events are formulated as a finite-state
continuous-time Markov chain that can accommodate a set of possible
regimes, across which the dynamic behavior of the systems may be markedly
different. For simplicity, we have chosen to present the result for Q(x) = Q.
One of the motivations for the use of a constant matrix Q is that we may view it
as an approximation to x-dependent Q(x) in the sense of Q(x) = Q + o(1)
as |x| → ∞. This is based on the results in Chapters 3 and 4. Because
positive recurrence implies ergodicity, we could concentrate on a neighbor-
hood of ∞. Then effectively, Q, the constant matrix is the one having the
most contributions to the asymptotic properties in which we are interested.
Thus, we can “replace” Q(x) by Q in the first approximation. At any given
instance, in lieu of a fixed regime, the system parameters can take one of
several possible regimes (configurations). As the Markov chain sojourns in
a given state for a random duration, the system dynamics are governed by
a diffusion process in accordance with the associated stochastic differential
equation. Subsequently, the Markov chain jumps into another state, and

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 159
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_6,
© Springer Science + Business Media, LLC 2010

the dynamic system switches to another diffusion process associated with


the new state, and so on. Instead of staying in a fixed configuration follow-
ing one diffusion process, the system jumps back and forth among a set of
possible configurations, resulting in a hybrid system of diffusions. In this
chapter, we consider switching diffusions given by

dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t), (6.1)

where α(t) is a continuous-time Markov chain that is generated by Q and


that has state space M = {1, . . . , m0 }. Associated with (6.1), there is an
operator defined by
Lf(x, ı) = Σ_{i=1}^{r} b_i(x, ı) ∂f(x, ı)/∂x_i
  + (1/2) Σ_{i,j=1}^{r} a_{ij}(x, ı) ∂²f(x, ı)/(∂x_i ∂x_j) + Qf(x, ·)(ı), ı ∈ M,   (6.2)

where

Qf(x, ·)(ı) = Σ_{ȷ=1}^{m0} q_{ıȷ} f(x, ȷ), ı ∈ M,

and

A(x, ı) = (a_{ij}(x, ı)) = σ(x, ı) σ′(x, ı).
As alluded to in the previous chapters, switching diffusions have pro-
vided many opportunities in terms of flexibility. The formulation allows
the mathematical models to have multiple discrete configurations thereby
making them more versatile. However, solving systems of diffusions with
switching is still a challenging task, which often requires using numerical
methods and/or approximation techniques. The Euler–Maruyama scheme
is one such approach. In Section 6.2, we derive the tightness of the
approximating sequence. To proceed, an important problem of both theo-
retical and practical concerns is: Whether the sequence of approximation
converges to the invariant measure of the underlying system, provided that
it exists. To answer this question, we derive the convergence to the invari-
ant measures of the numerical approximation in Section 6.3. To obtain the
results requires the convergence of the algorithm under the weak conver-
gence framework. Rather than working with sample paths or numerically
solving systems of Kolmogorov–Fokker–Planck equations, we focus on the
corresponding measures and use a purely probabilistic argument. Why is
such a consideration important from a practical point of view? Suppose
that one considers an ergodic control problem of a hybrid-diffusion sys-
tem with regime switching. Then it is desirable to “replace” the actual
time-dependent measure by an invariant measure. The control problems
often have to be solved using numerical approximations. Because solving

the corresponding system of Kolmogorov–Fokker–Planck equations is com-


putationally intensive, it is crucially important to be able to approximate
the invariant measures numerically. For previous work on approximating
invariant measures of diffusion processes without regime switching, we re-
fer the reader to [102] and the many references cited there. Section 6.4
provides the proof of convergence for a decreasing stepsize algorithm and
Section 6.5 concludes the chapter.
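To make the target of this chapter concrete, the invariant measure of a stable example can be estimated by the long-run empirical distribution of the constant-stepsize iterates; the sketch below uses a model of our own choosing (two mean-reverting regimes, constant generator Q) purely for illustration:

```python
import numpy as np

def empirical_measure(n_steps=200_000, burn_in=20_000, eps=0.01, seed=0):
    """Long-run samples of constant-stepsize iterates, whose histogram
    estimates the invariant measure of the switching diffusion.
    Model (our choice): b(x, i) = theta_i * x with theta = (-1, -3),
    sigma = (1.0, 0.5), and a constant generator Q."""
    rng = np.random.default_rng(seed)
    Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
    P = np.eye(2) + eps * Q                 # one-step regime transitions
    theta, sig = (-1.0, -3.0), (1.0, 0.5)
    x, alpha = 0.0, 0
    samples = []
    for n in range(n_steps):
        x += eps * theta[alpha] * x + np.sqrt(eps) * sig[alpha] * rng.standard_normal()
        alpha = int(rng.choice(2, p=P[alpha]))
        if n >= burn_in:
            samples.append(x)
    return np.asarray(samples)
```

A histogram of the returned samples approximates the stationary density; for this mean-reverting model the empirical mean should be near zero.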

6.2 Tightness of Approximation Sequences


In this chapter, we concern ourselves with the following algorithms with a
sequence of decreasing stepsizes {εn },


Xn+1 = Xn + εn b(Xn, αn) + √εn σ(Xn, αn)ξn,   (6.3)

as well as algorithms with a constant stepsize ε,



Xn+1 = Xn + ε b(Xn, αn) + √ε σ(Xn, αn)ξn.   (6.4)

Various quantities are as given in the last chapter. Associated with (6.1),
there is a martingale problem formulation. A process (X(t), α(t)) is said to
be a solution of the martingale problem with operator L defined in (6.2) if,
h(X(t), α(t)) − ∫_0^t Lh(X(s), α(s)) ds   (6.5)

is a martingale for any real-valued function h(·) defined on Rr × M such


that for each i ∈ M, h(·, i) ∈ C02 (a collection of functions that are twice
continuously differentiable w.r.t. the first variable with compact support).
To proceed, we state the following conditions first.

(A6.1) There is a unique solution of (6.1) for each initial condition.


(A6.2) For each α ∈ M, there is a Liapunov function V (·, α) such that
(a) V (·, α) is twice continuously differentiable and ∇2 V (·, α) is
bounded uniformly; (b) V (x, α) ≥ 0; |V (x, α)| → ∞ as |x| → ∞;
(c) LV (x, α) ≤ −λV (x, α) for some λ > 0; (d) for each x ∈ Rr ,
the following growth conditions hold:

|∇V′(x, α) b(x, α)| ≤ K(1 + V(x, α)),
|b(x, α)|² ≤ K(1 + V(x, α)),   (6.6)
|σ(x, α)|² ≤ K(1 + V(x, α)).

(A6.3) The {εn} is a sequence of decreasing stepsizes satisfying εn ≥
0, εn → 0 as n → ∞, and Σn εn = ∞. The {ξn} is a sequence of independent and identically distributed random variables such that ξn is independent of the σ-algebra Gn generated
by {Xk, αk : k ≤ n}, and that Eξn = 0, E|ξn|² < ∞, and
Eξn ξn′ = I.

Remark 6.1. Note that sufficient conditions for the existence and unique-
ness of the solution of the switching diffusion were provided in Chapter 2.
Here we simply assume them for convenience.
Condition (A6.2) requires the existence of Liapunov functions V (·, α).
Condition (A6.2)(d) is a growth condition on the functions b(x, α) and
σ(x, α). If b(·, α) and σ(·, α) grow at most linearly, and the Liapunov func-
tion is quadratic, this condition is verified. Condition (c) requires the dif-
fusion with regime switching (6.1) to be stable in the sense of Liapunov.
Conditions listed in (A6.2) cover a large class of functions; see the related
comments in [102, 104].
Remark 6.2. A quick glance at the algorithm reveals that (6.3) has a cer-
tain resemblance to a stochastic approximation algorithm, which has been
the subject of extensive research for over five decades since the pioneering
work of Robbins and Monro [139]. The most recent account on the subject
and a state-of-the-art treatment can be found in Kushner and Yin [104] and
references therein. In what follows, we use certain ideas from stochastic ap-
proximation methods to establish the limit of the discretization through
suitable interpolations. Weak convergence methods are used to study the
convergence of the algorithm and the associated invariant measure.
Remark 6.3. As alluded to in Chapter 5, it can be established that
the sequence {Xn} is tight by using the moment estimates together with
Chebyshev's inequality. Here we use an alternative approach based on
Liapunov function methods. Note that this approach can be modified to a
perturbed Liapunov function approach, which can be used to handle cor-
related random processes under suitable conditions. To illustrate the use
of the Liapunov function method for the tightness, we present the result
below together with a proof.
Theorem 6.4. Assume (A6.2) and (A6.3). Then
(i) the iterates generated by (6.3) using decreasing stepsizes satisfy

EV (Xn , αn ) = O(1);

(ii) the iterates generated by (6.4) using a constant stepsize satisfy

EV (Xn , αn ) = O(1)

for n sufficiently large.



Proof. The proof uses Liapunov functions. We concern ourselves with the
proof for algorithm (6.3) only. The results for (6.4) can be obtained simi-
larly.
Henceforth, denote by F̃n and F̃nα the σ-algebras generated by {ξk, αk :
k < n} and {αn, ξk, αk : k < n}, respectively, and denote by En and Enα
the corresponding conditional expectations w.r.t. F̃n and F̃nα. Similarly,
denote Ft = σ{w(u), α(u) : u ≤ t} and by Et the conditional expectation
w.r.t. Ft, where w(t) is an r-dimensional standard Brownian motion (hav-
ing independent increments). Note that {ξn} is a sequence of independent
and identically distributed random vectors with 0 mean and covariance I.
We have

    Enα V(Xn+1, αn+1) − V(Xn, αn)
        = Enα [V(Xn+1, αn+1) − V(Xn+1, αn)]                         (6.7)
          + Enα V(Xn+1, αn) − V(Xn, αn).

We proceed to estimate each of the terms after the equality sign above.
Using the smoothness of V(·, α), the independence of ξn and αn, and
the independent increment property, detailed estimates (omitted here)
lead to

    Enα V(Xn+1, αn) − V(Xn, αn)
        = εn ∇V′(Xn, αn)b(Xn, αn)
          + (εn/2) tr[σ′(Xn, αn)∇²V(Xn, αn)σ(Xn, αn)]               (6.8)
          + O(εn²)(1 + V(Xn, αn)).

As for the term on the second line of (6.7), using a truncated Taylor
expansion, we obtain

    Enα [V(Xn+1, αn+1) − V(Xn+1, αn)]
        = Enα [V(Xn, αn+1) − V(Xn, αn)]
          + Enα ∫_0^1 ∇V′(Xn + s∆Xn, αn)(Xn+1 − Xn) ds              (6.9)
        = εn QV(Xn, ·)(αn) + O(εn)(1 + V(Xn, αn)).

By virtue of (A6.2),

    ∇V′(Xn, αn)b(Xn, αn) + QV(Xn, ·)(αn)
        + (1/2) tr[σ′(Xn, αn)∇²V(Xn, αn)σ(Xn, αn)]
        = LV(Xn, αn) ≤ −λV(Xn, αn).

Thus (6.7)–(6.9) yield

    Enα V(Xn+1, αn+1) − V(Xn, αn)
        ≤ −λεn V(Xn, αn) + O(εn)V(Xn, αn) + O(εn).

Taking the expectation and iterating on the resulting inequality lead to

    EV(Xn+1, αn+1)
        ≤ A_{n,−1} EV(X0, α0) + K Σ_{k=0}^n εk A_{nk} EV(Xk, αk)
          + K Σ_{k=0}^n εk A_{nk},                                  (6.10)

where

    A_{nk} = ∏_{j=k+1}^n (1 − λεj)  if n > k,   and   A_{nk} = 1  otherwise.

Note that

    Σ_{k=0}^n εk A_{nk} = O(1).

An application of Gronwall's inequality implies

    EV(Xn, αn) ≤ K exp(Σ_{k=0}^n εk A_{nk}) = O(1).

Thus the theorem is proved. □
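Theorem 6.4(i) can be illustrated numerically. The sketch below uses the quadratic Liapunov function V(x, α) = x², decreasing stepsizes εn = 1/(n + 1), and an illustrative stable two-regime model (an assumption for the demonstration, not the book's example); the Monte Carlo estimate of EV(Xn, αn) stays bounded along the iterates.

```python
import numpy as np

# Monte Carlo illustration of Theorem 6.4(i): with decreasing stepsizes
# eps_n = 1/(n+1) and a Liapunov-stable model, E V(X_n, alpha_n) with
# V(x, a) = x**2 remains O(1).  The two-regime model is an assumption.
rng = np.random.default_rng(1)
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])

n_paths, n_steps = 200, 2000
x = np.ones(n_paths)
a = np.zeros(n_paths, dtype=int)
drift = np.array([-1.0, -1.5])        # b(x, a) = drift[a] * x
vol = np.array([0.4, 0.2])            # sigma(x, a) = vol[a]

EV = np.empty(n_steps)
for n in range(n_steps):
    eps = 1.0 / (n + 1)               # decreasing stepsizes, divergent sum
    xi = rng.standard_normal(n_paths)
    x = x + eps * drift[a] * x + np.sqrt(eps) * vol[a] * xi
    P = np.eye(2) + eps * Q           # one-step transition matrix I + eps_n*Q
    a = np.where(rng.random(n_paths) < P[a, 0], 0, 1)
    EV[n] = np.mean(x**2)             # estimate of E V(X_n, alpha_n)
```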


Using the same kind of arguments as those of Chapter 5, we can obtain
the convergence of the algorithm. Define

    tn = Σ_{l=0}^{n−1} εl.

Let m(t) be the unique value of n such that tn ≤ t < tn+1 for t ≥ 0. Define
αn = α(tn). Let X⁰(t) and α⁰(t) be the piecewise constant interpolations
of Xn and αn on [tn, tn+1), and X^n(t) and α^n(t) be their shifts. That is,

    X⁰(t) = Xn,  α⁰(t) = αn   if t ∈ [tn, tn+1),
                                                                    (6.11)
    X^n(t) = X⁰(t + tn),  α^n(t) = α⁰(t + tn).
This shift is used to bring the asymptotic properties of the underlying se-
quence to the foreground. Note that X n (·) ∈ D([0, ∞); Rr ), the space of
Rr -valued functions that are right continuous and have left limits endowed
with the Skorohod topology [43, 102, 104]. In what follows, we show that
X n (·) converges weakly to X(·), the solution of (6.1). In fact, we work with
the pair (X n (·), αn (·)). By virtue of Theorem 5.14, for the constant stepsize
algorithm, {X ε (·), αε (·)} converges weakly to (X(·), α(·)), which is a pro-
cess with generator given by (6.2). To proceed, we establish a convergence
result for the decreasing stepsize algorithm.
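The interpolations and shifts in (6.11) can be sketched as follows; the iterate values below are placeholders, and only the bookkeeping (the times tn, the index map m(t), and the shift) is the point.

```python
import numpy as np

# Sketch of the interpolations (6.11): a piecewise-constant X^0(t) on
# [t_n, t_{n+1}) with t_n = eps_0 + ... + eps_{n-1}, and the shifted
# process X^n(t) = X^0(t + t_n).  Iterate values are placeholders.
eps = 1.0 / np.arange(1, 51)                      # eps_n = 1/(n+1)
t = np.concatenate(([0.0], np.cumsum(eps)))       # t_0 = 0, t_n = sum_{l<n} eps_l
X = np.random.default_rng(2).standard_normal(50)  # placeholder iterates X_n

def m(s):
    """m(t): the unique n with t_n <= t < t_{n+1}."""
    return int(np.searchsorted(t, s, side="right")) - 1

def X0(s):
    """Piecewise-constant interpolation X^0(t) = X_n on [t_n, t_{n+1})."""
    return X[m(s)]

def Xn(n, s):
    """Shifted process X^n(t) = X^0(t + t_n)."""
    return X0(s + t[n])
```

The shift Xn(n, ·) restarts the interpolated path at time tn, which is exactly how the asymptotic behavior of the tail of the sequence is brought to the foreground.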

Theorem 6.5. Consider algorithm (6.3). Assume (A6.1)–(A6.3) with αn
being a Markov chain with one-step transition probability matrix I + εn Q.
Then the interpolated process (X^n(·), α^n(·)) converges weakly to (X(·), α(·)),
which is a solution of (6.1).
We note that because Q is independent of x, the limit process α(·) is a
Markov chain generated by Q. The proof of this theorem is of independent
interest, although the corresponding result for constant stepsize algorithms
was proved in Chapter 5 for x-dependent Q(x). However, the main result of
this chapter concerns convergence to the invariant measures, so we postpone
the proof until Section 6.4.

6.3 Convergence to Invariant Measures


Having the convergence of (X^n(·), α^n(·)) and hence the convergence of
the algorithm, in this section we examine the large-time behavior of the
algorithm and address the issue: When does the sequence of measures of
(X^n(sn + ·), α^n(sn + ·)) converge to the invariant measure of the switching
diffusion (6.1), if it exists, as sn → ∞? We first recall a couple of
definitions.
For our two-component process (X(t), α(t)), let Cb(Rr) be the space of
real-valued bounded and continuous functions defined on Rr. A set CD ⊂
Cb(Rr) is said to be convergence determining if

    Σ_{i=1}^{m0} ∫ f(x, i)νn(dx, i) → Σ_{i=1}^{m0} ∫ f(x, i)ν(dx, i)

for each ι ∈ M and f(·, ι) ∈ CD implies that νn converges to ν weakly.


Remark 6.6. Different from the case of diffusion processes, in addition
to the continuous component, we also have a discrete component. As a
convention, with a slight abuse of notation, we write, for example, ν(t, dx ×
α) and ν(dx × α) instead of ν(t, dx × {α}) and ν(dx × {α}) throughout.
Recall that a measure ν(t, ·) of (X(t), α(t)) is weakly stable or stable in
distribution (see Kushner [102, p. 154]) if, for each δ > 0 and arbitrary
integer n0, there exist an η > 0 and n0^η such that for each α ∈ M and any
ϕ(·, α) ∈ C0(Rr) (continuous functions with compact support), and for all
ν(0, ·, ·),

    |Σ_{α∈M} ∫ ϕj(x, α)ν(0, dx × α) − Σ_{α∈M} ∫ ϕj(x, α)ν(dx × α)| < η,  j ≤ n0^η,

implies that for all t ≥ 0,

    |Σ_{α∈M} ∫ ϕj(x, α)ν(t, dx × α) − Σ_{α∈M} ∫ ϕj(x, α)ν(dx × α)| < δ,  j ≤ n0.

Lemma 6.7. The space of functions Cb(Rr) is convergence determining.
The space of functions C0(K̂), where K̂ is any compact subset of Rr, is
convergence determining.

Proof. See Ethier and Kurtz [43, p. 112]. □


For each α ∈ M, suppose that {ϕk(·, α)} is a sequence of uniformly
continuous functions with compact support defined on K̂ for any compact
K̂. Then {ϕk(·, α)} is convergence determining according to Lemma 6.7.
Moreover, the weak convergence of νn to ν as n → ∞ is equivalent to

    Σ_{α∈M} ∫ ϕk(x, α)νn(dx × α) → Σ_{α∈M} ∫ ϕk(x, α)ν(dx × α)  for each k.

With the preparation above, we proceed with the investigation of
convergence to the invariant measure for the numerical approximation. We
begin by assuming that the solution of (6.1) has a unique invariant measure.
This, together with a couple of other conditions, is given below; see Remark
6.8 for comments on these conditions.

(A6.4) The process (X(·), α(·)) has a unique invariant measure ν(·).
       Denote by ν(x, α; t, ·) the measure of (X(t), α(t)) with initial
       condition (X(0), α(0)) = (x, α). As t → ∞, ν(x, α; t, ·) converges
       weakly to ν(·) for each (x, α). For any compact K̂ ⊂ Rr and for
       any ϕ ∈ C(Rr × M),

           E_{x,α} ϕ(X(t), α(t)) → Eν ϕ
               = Σ_{α∈M} ∫ ϕ(x, α)ν(dx × α)                         (6.12)

       uniformly on K̂ × M, where Eν denotes the expectation with
       respect to the invariant measure ν.

(A6.5) For each α ∈ M, X^{x,α}(·) is a Feller process with continuous
       coefficients on [0, ∞) for each initial condition X(0) = x.

Remark 6.8. In accordance with the results in Chapter 4, the existence
and uniqueness of an invariant measure of a switching diffusion is guaran-
teed by the positive recurrence of the process. Hence, the sufficient condi-
tions for positive recurrence of switching diffusions presented in Chapter 3
ensure the existence and uniqueness of the invariant measure. In addition,
suppose that the continuous component lives in a compact set, that for each
α ∈ M, b(·, α) and σ(·, α) satisfy the usual regularity conditions and
σ(x, α)σ′(x, α) is positive definite, and that the generator Q is irreducible.
Then, with L* denoting the adjoint operator of L given in (6.2), the system
of equations

    L*μ(x, i) = 0,  i ∈ M,
                                                                    (6.13)
    Σ_{j=1}^{m0} ∫ μ(x, j)dx = 1

has a unique solution, known as the invariant density; see Il'in, Khasmin-
skii, and Yin [74] and references therein. Furthermore, in this case, the
convergence to the invariant density takes place at an exponential rate.
In what follows, to highlight the dynamics starting at (X(0), α(0)), by
abusing notation slightly, we often write Eν ϕ on the right-hand side of
(6.12) as Eν ϕ(X(0), α(0)). That is, for ϕ(·, ι) ∈ C(Rr) for each ι ∈ M,

    Eν ϕ(X(0), α(0)) = Σ_{i∈M} ∫ ϕ(y, i)ν(dy, i).

It should be clear from the context.


Note that the Feller property assumed in (A6.5) has been established in
Chapter 2 for switching diffusions with x-dependent Q(x) under suitable
conditions. It was noted that for Markovian switching diffusions, the Feller
property can be obtained with a much simpler proof. For convenience, we
put it here as a condition. As a direct consequence of these conditions, the
following lemma holds.

Lemma 6.9. Let Kc be a set of Rr-valued random variables that is tight,
and let (X(0), α(0)) = (x, α) ∈ Kc × M. Under (A6.4) and (A6.5), for
any positive integer n0, 0 = δ1 < δ2 < · · · < δn0, each ι ∈ M, and any
ϕ(·, ι) ∈ Cb(Rr),

    E_{X(0),α(0)} ϕ(X(t + δj), α(t + δj), j ≤ n0)
        → Eν ϕ(X(δj), α(δj), j ≤ n0),                               (6.14)

uniformly for (X(0), α(0)) ∈ Kc × M as t → ∞.

Remark 6.10. Lemma 6.9 is a consequence of assumptions (A6.4) and
(A6.5), and is handy to use in the subsequent development. When n0 = 1,

    E_{X(0),α(0)} ϕ(X(t), α(t))

is bounded and continuous for each t > 0 because ϕ(·, ι) ∈ Cb(Rr) for
each ι ∈ M. Suppose that X(0) ∈ Kc and the distribution with initial
condition (X(0), α(0)) is denoted by ν(0, ·). Then the tightness of X(0)
and conditions (A6.4) and (A6.5) yield that

    E_{X(0),α(0)} ϕ(X(t), α(t))
        = Σ_{i=1}^{m0} ∫ ν(0, dx × i) E_{x,i} ϕ(X(t), α(t))
        → Σ_{i=1}^{m0} ∫ ν(dx × i) Eν ϕ(X(0), α(0))                 (6.15)
        = Eν ϕ(X(0), α(0)),

inasmuch as

    Σ_{i=1}^{m0} ∫ ν(dx × i) = 1.

Moreover, the convergence is uniform in (X(0), α(0)) by the condition of
uniform convergence on any compact x-set as given in (A6.4). Similar to the
approach in Kushner [102, p. 155] (see also [101]), for general n0, Lemma 6.9
can be proved by induction. The details are omitted.

Theorem 6.11. Assume (A6.1)–(A6.5).

(i) For any positive integer n0, ϕ(·, ι) ∈ C(Rr) for each ι ∈ M, and
    for any δ > 0, there exist t0 < ∞ and a positive integer N0 such that
    for all t ≥ t0 and n ≥ N0,

        |Eϕ(X^n(t + δj), α^n(t + δj), j ≤ n0)
            − Eν ϕ(X(δj), α(δj), j ≤ n0)| < δ.                      (6.16)

(ii) Furthermore, for any sequence sn → ∞,

        ((X^n(sn + δ1), α^n(sn + δ1)), . . . , (X^n(sn + δn0), α^n(sn + δn0)))

     converges weakly to the stationary distribution of

        ((X(δ1), α(δ1)), . . . , (X(δn0), α(δn0))).

Remark 6.12. Note that t0 and N0 above depend on δ and on ϕ(·).

Proof of Theorem 6.11. Suppose that (6.16) were not true. Then there
would exist a subsequence {nk} and a sequence snk → ∞ such that

    |Eϕ(X^{nk}(snk + δj), α^{nk}(snk + δj), j ≤ n0)
        − Eν ϕ(X(δj), α(δj), j ≤ n0)| ≥ δ > 0.                      (6.17)

For a fixed T > 0, choose a further subsequence {kℓ} of {nk}, and the cor-
responding sequence (X^{kℓ}(·), α^{kℓ}(·)), such that (X^{kℓ}(skℓ − T), α^{kℓ}(skℓ − T))
converges weakly to a random variable (X(0), α(0)). Theorem 6.5 implies
that (X^{kℓ}(skℓ − T + ·), α^{kℓ}(skℓ − T + ·)) converges weakly to (X(·), α(·))
with initial condition (X(0), α(0)). Moreover,

    Eϕ(X^{kℓ}(skℓ − T + T + δj), α^{kℓ}(skℓ − T + T + δj), j ≤ n0)
        → E[E_{X(0),α(0)} ϕ(X(T + δj), α(T + δj), j ≤ n0)]

as kℓ → ∞. Owing to (A6.5), the collection of all possible X(0) over all
T > 0 and weakly convergent subsequences is tight. Noting that α(0) ∈ M,
a finite set, by Lemma 6.9 there exists T0 > 0 such that for all T ≥ T0,

    |E[E_{X(0),α(0)} ϕ(X(T + δj), α(T + δj), j ≤ n0)]
        − Eν ϕ(X(δj), α(δj), j ≤ n0)| < δ/2,

which contradicts (6.17).
Using Lemma 6.9 again, part (i) of the theorem implies that (X^n(sn +
·), α^n(sn + ·)) converges weakly to the random variable with the invariant
distribution ν(·) as sn → ∞. Thus part (ii) of the assertion also follows. □
For the constant stepsize algorithm (6.4), we can examine the associated
invariant measures similarly to the development for the decreasing stepsize
algorithms. The following result can be derived; we state it and omit the
proof.

Theorem 6.13. Consider algorithm (6.4). Assume that the conditions of
Theorem 6.11 are fulfilled. For any positive integer n0, ϕ(·, ι) ∈ Cb(Rr)
for each ι ∈ M, and for any δ > 0, there exist t0 < ∞ and ε0 > 0 such
that for all t ≥ t0 and ε ≤ ε0,

    |Eϕ(X^ε(t + δj), α^ε(t + δj), j ≤ n0) − Eν ϕ(X(δj), α(δj), j ≤ n0)| < δ.

Moreover, for any sequence sε → ∞ as ε → 0,

    ((X^ε(sε + δ1), α^ε(sε + δ1)), . . . , (X^ε(sε + δn0), α^ε(sε + δn0)))

converges weakly to the stationary distribution of

    ((X(δ1), α(δ1)), . . . , (X(δn0), α(δn0))).
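The conclusion of Theorem 6.13 can be probed empirically: for small ε and large shifted times, Monte Carlo estimates of E ϕ(X^ε(s), α^ε(s)) for a bounded test function ϕ should agree at two distinct large times, since both approximate the invariant measure. The two-regime linear model below is an illustrative assumption.

```python
import numpy as np

# Empirical illustration of Theorem 6.13: compare Monte Carlo estimates
# of E phi(X, alpha) at two large times for the constant-stepsize
# iterates; near stationarity the estimates nearly coincide.
rng = np.random.default_rng(3)
eps = 5e-3
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
P = np.eye(2) + eps * Q
drift = np.array([-1.0, -2.0])
vol = np.array([0.5, 0.3])

n_paths = 4000
x = np.full(n_paths, 2.0)                 # start far from equilibrium
a = np.zeros(n_paths, dtype=int)

def phi(x, a):                            # bounded continuous test function
    return np.tanh(x) + 0.1 * a

n1, n2 = 2000, 4000                       # two large time indices
est = {}
for n in range(1, n2 + 1):
    xi = rng.standard_normal(n_paths)
    x = x + eps * drift[a] * x + np.sqrt(eps) * vol[a] * xi
    a = np.where(rng.random(n_paths) < P[a, 0], 0, 1)
    if n in (n1, n2):
        est[n] = phi(x, a).mean()
gap = abs(est[n1] - est[n2])              # small once near stationarity
```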

6.4 Proof: Convergence of Algorithm

Proof of Theorem 6.5. We can show that Lemma 5.4 continues to hold
and {αn (·)} is tight with the constant stepsize replaced by the decreasing
stepsizes.

We first show that the sequence of interest is tight in D([0, ∞); Rr × M),
and then characterize the limit by means of the martingale problem formu-
lation. In the process of verifying the tightness, we need to show that for
each 0 < T < ∞,

    lim_{K0→∞} limsup_{n→∞} P(sup_{t≤T} |X^n(t)| ≥ K0) = 0.        (6.18)

Equation (6.18) is usually difficult to verify. Thus, we use a technical device,
known as an N-truncation [104, p. 248], to overcome the difficulty.
We illustrate the use of the N-truncation device. The main idea is that
for each N < ∞, we work with the sequence X^{n,N}(·) that is equal to X^n(·)
up until the first exit from the N-sphere SN = {x : |x| ≤ N} and is zero
outside the (N + 1)-sphere SN+1. We then prove that the truncated
sequence is tight and obtain its limit. Finally, letting N → ∞, a piecing-
together argument together with the uniqueness of the martingale problem
enables us to complete the proof. The proof is divided into a number of
steps.
number of steps.
In lieu of (6.3), consider

    X^N_{n+1} = X^N_n + εn b^N(X^N_n, αn) + √εn σ^N(X^N_n, αn)ξn,   (6.19)

where

    b^N(x, α) = b(x, α)q^N(x),   σ^N(x, α) = σ(x, α)q^N(x),         (6.20)

and q^N(x) is a smooth function satisfying q^N(x) = 1 when x ∈ SN and
q^N(x) = 0 when x ∈ Rr − SN+1. Next, define X^{0,N}(t) = X^N_n on [tn, tn+1)
and X^{n,N}(t) = X^{0,N}(t + tn). Then X^{n,N}(·) is an N-truncation of X^n(·).
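A concrete truncation function q^N of the kind required in (6.20) can be sketched as follows. The polynomial "smoothstep" transition below is one convenient choice (an assumption; the text only requires q^N to be smooth, equal to 1 on S_N, and 0 outside S_{N+1}).

```python
import numpy as np

# A truncation function q^N: q^N(x) = 1 for |x| <= N, q^N(x) = 0 for
# |x| >= N + 1, with a C^1 polynomial transition in between.  Any
# smoother bump would serve equally well.
def qN(x, N):
    r = np.abs(x)
    s = np.clip(r - N, 0.0, 1.0)            # s = 0 inside S_N, s = 1 outside S_{N+1}
    return 1.0 - (3.0 * s**2 - 2.0 * s**3)  # smoothstep transition
```

Multiplying b and σ by qN as in (6.20) leaves the dynamics untouched inside S_N and switches them off outside S_{N+1}.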
We next show that {X^{n,N}(·)} is tight by means of the tightness criterion.
For any 0 < T < ∞, any δ > 0, |t| ≤ T, and 0 < s ≤ δ, we have

    E_{t_{m(tn+t)}} |X^{n,N}(t + s) − X^{n,N}(t)|²
        ≤ 2 E_{t_{m(tn+t)}} |Σ_{k=m(t+tn)}^{m(t+s+tn)−1} b^N(X^N_k, αk)εk|²          (6.21)
          + 2 E_{t_{m(tn+t)}} |Σ_{k=m(t+tn)}^{m(t+s+tn)−1} √εk σ^N(X^N_k, αk)ξk|²,

where E_{t_{m(tn+t)}} denotes the conditional expectation on the σ-algebra gen-
erated by {(Xj, αj) : j ≤ m(tn + t)}. The continuity of b^N(·, i) and σ^N(·, i)
(for each i ∈ M), the smoothness of q^N(·), and the boundedness of X^N_n
yield the boundedness of b^N(·, i) and σ^N(·, i). Thus,

    E_{t_{m(tn+t)}} |Σ_{k=m(t+tn)}^{m(t+s+tn)−1} b^N(X^N_k, αk)εk|²
        ≤ K Σ_{l=m(t+tn)}^{m(t+s+tn)−1} εl Σ_{k=m(t+tn)}^{m(t+s+tn)−1} εk ≤ O(s²).   (6.22)

In the above and hereafter, we use K to denote a generic positive constant;
its values may vary for different appearances. It follows from (6.22) that

    lim_{δ→0} limsup_{n→∞} E |Σ_{k=m(t+tn)}^{m(t+s+tn)−1} b^N(X^N_k, αk)εk|² = 0.    (6.23)

By virtue of (A6.3), {ξn} is an independent sequence with zero mean
and covariance I. Without loss of generality, assume l ≤ k. Then

    Elα ξk ξl′ = E[ξk ξl′ | αl, ξj, αj; j < l]
               = 0 if l ≠ k,   and   = I if l = k.

By the independence of X^N_n and ξn, and the independence of αn and ξn,

    E_{t_{m(tn+t)}} |Σ_{k=m(t+tn)}^{m(t+s+tn)−1} √εk σ^N(X^N_k, αk)ξk|²
        ≤ K Σ_{k=m(t+tn)}^{m(t+s+tn)−1} εk ≤ O(s).                                   (6.24)

Combining (6.22) and (6.24) and recalling that 0 ≤ s < δ,

    E_{t_{m(tn+t)}} |X^{n,N}(t + s) − X^{n,N}(t)|² ≤ E_{t_{m(tn+t)}} γ^{n,N}(s),     (6.25)

where γ^{n,N}(s) is a random variable satisfying

    lim_{δ→0} limsup_{n→∞} E γ^{n,N}(s) = 0.

The criterion in [102, Theorem 3, p. 47] then implies the tightness of
{X^{n,N}(·)}. As a consequence of the tightness of {X^{n,N}(·)} and {α^n(·)},
the sequence of the interpolated pair of processes is tight.
Next, we use the martingale averaging techniques employed in Chapter 5
to show that (X n,N (·), αn (·)) converges weakly to (X N (·), α(·)). This result
is stated as a lemma below.

Lemma 6.14. The pair (X^N(·), α(·)) is the solution of the martingale prob-
lem with operator L^N obtained from L by replacing b(·) and σ(·) with b^N(·)
and σ^N(·) (defined in (6.20)), respectively.

Proof of Lemma 6.14. We need to verify that (6.5) holds. Without loss
of generality, we work with t ≥ 0. It suffices to show that for each i ∈ M,
any real-valued function h(·, i) ∈ C0², any T < ∞, 0 ≤ t ≤ T, s > 0,
arbitrary positive integer n0, bounded and continuous functions ϕj(·, i)
(with j ≤ n0), and any sj satisfying 0 < sj ≤ t ≤ t + s,

    E ∏_{j=1}^{n0} ϕj(X^N(sj), α(sj)) [h(X^N(t + s), α(t + s))
        − h(X^N(t), α(t)) − ∫_t^{t+s} L^N h(X^N(u), α(u))du] = 0.   (6.26)

To obtain (6.26), let us begin with the process (X^{n,N}(·), α^n(·)). By virtue
of the weak convergence of (X^{n,N}(·), α^n(·)) to (X^N(·), α(·)) and the Sko-
rohod representation, as n → ∞,

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) [h(X^{n,N}(t + s), α^n(t + s))
        − h(X^{n,N}(t), α^n(t))]
    → E ∏_{j=1}^{n0} ϕj(X^N(sj), α(sj)) [h(X^N(t + s), α(t + s)) − h(X^N(t), α(t))].
                                                                    (6.27)

Choose δn and ml such that δn → 0 as n → ∞ and

    (1/δn) Σ_{j=m_l}^{m_{l+1}−1} εj → 1  as n → ∞.

Use the notation Ξ given by

    Ξ = {l : m(t + tn) ≤ m_l ≤ m_{l+1} − 1 ≤ m(t + s + tn) − 1}.    (6.28)

Then

    h(X^{n,N}(t + s), α^n(t + s)) − h(X^{n,N}(t), α^n(t))
        = Σ_{l∈Ξ} [h(X^N_{m_{l+1}}, α_{m_{l+1}}) − h(X^N_{m_{l+1}}, α_{m_l})]
          + Σ_{l∈Ξ} [h(X^N_{m_{l+1}}, α_{m_l}) − h(X^N_{m_l}, α_{m_l})].   (6.29)

For the last term in (6.29),

    Σ_{l∈Ξ} [h(X^N_{m_{l+1}}, α_{m_l}) − h(X^N_{m_l}, α_{m_l})]
        = Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l})[X^N_{m_{l+1}} − X^N_{m_l}]
          + (1/2) Σ_{l∈Ξ} [X^N_{m_{l+1}} − X^N_{m_l}]′ ∇²h(X^N_{m_l}, α_{m_l})[X^N_{m_{l+1}} − X^N_{m_l}]
          + ẽn,                                                     (6.30)

where ∇²h denotes the Hessian of h(·, α) and ẽn represents the error in-
curred from the truncated Taylor expansion. By the continuity of ∇²h(·, α)
and the boundedness of {X^N_n}, it is readily seen that

    lim_{n→∞} E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) |ẽn| = 0.     (6.31)

Using (6.19) in (6.30),

    Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l})[X^N_{m_{l+1}} − X^N_{m_l}]
        = Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l}) Σ_{k=m_l}^{m_{l+1}−1} b^N(X^N_k, αk)εk
          + Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l}) Σ_{k=m_l}^{m_{l+1}−1} √εk σ^N(X^N_k, αk)ξk.

The independence of ξk and αk and the measurability of (X^{n,N}(sj), α^n(sj))
for j ≤ n0 with respect to F_{m(tn+t)} imply

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) [Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l})
        × Σ_{k=m_l}^{m_{l+1}−1} √εk σ^N(X^N_k, αk)ξk]
    = E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) [Σ_{l∈Ξ} E_{m(tn+t)} ∇h′(X^N_{m_l}, α_{m_l})
        × Σ_{k=m_l}^{m_{l+1}−1} √εk σ^N(X^N_k, αk) Ekα ξk] = 0.

Note that

    lim_{n→∞} E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) [Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l})
        × Σ_{k=m_l}^{m_{l+1}−1} b^N(X^N_k, αk)εk]
    = lim_{n→∞} E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l})δn
        × (1/δn) Σ_{k=m_l}^{m_{l+1}−1} b^N(X^N_{m_l}, αk)εk.        (6.32)

Owing to the interpolation, for k ∈ [m_l, m_{l+1} − 1], write αk as α^n(u) with
u ∈ [t_{m_l}, t_{m_{l+1}−1}). Then

    (1/δn) Σ_{k=m_l}^{m_{l+1}−1} b^N(X^N_{m_l}, αk)εk
        = Σ_{i=1}^{m0} b^N(X^N_{m_l}, i) (1/δn) Σ_{k=m_l}^{m_{l+1}−1} I_{{αk=i}} εk
        = Σ_{i=1}^{m0} b^N(X^N_{m_l}, i) (1/δn) Σ_{k=m_l}^{m_{l+1}−1} I_{{α^n(u)=i}} εk.

When t_{m_l} → u, t_{m_{l+1}} → u as well. The weak convergence of
(X^{n,N}(·), α^n(·)) to (X^N(·), α(·)) and the Skorohod representation, together
with the continuity of b^N(·, α) for each α ∈ M, then yield

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) [Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l}) Σ_{k=m_l}^{m_{l+1}−1} b^N(X^N_k, αk)εk]
    = Σ_{i=1}^{m0} E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj))
        × Σ_{l∈Ξ} ∇h′(X^{n,N}(t_{m_l}), i) b^N(X^{n,N}(t_{m_l}), i) I_{{α^n(u)=i}} δn
    → Σ_{i=1}^{m0} E ∏_{j=1}^{n0} ϕj(X^N(sj), α(sj)) ∫_t^{t+s} ∇h′(X^N(u), α(u)) b^N(X^N(u), i)
        × I_{{α(u)=i}} du.

Thus

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} ∇h′(X^N_{m_l}, α_{m_l})[X^N_{m_{l+1}} − X^N_{m_l}]
    → E ∏_{j=1}^{n0} ϕj(X^N(sj), α(sj)) [∫_t^{t+s} ∇h′(X^N(u), α(u)) b^N(X^N(u), α(u))du].
                                                                    (6.33)
Next, consider the term involving ∇²h(·, α) in (6.30). We have

    Σ_{l∈Ξ} [X^N_{m_{l+1}} − X^N_{m_l}]′ ∇²h(X^N_{m_l}, α_{m_l})[X^N_{m_{l+1}} − X^N_{m_l}]
        = Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} Σ_{k=m_l}^{m_{l+1}−1} [ εk1 εk b^N′(X^N_{k1}, α_{k1})
              × ∇²h(X^N_{m_l}, α_{m_l}) b^N(X^N_k, αk)              (6.34)
          + εk1 √εk b^N′(X^N_{k1}, α_{k1}) ∇²h(X^N_{m_l}, α_{m_l}) σ^N(X^N_k, αk)ξk
          + √εk1 εk ξ′_{k1} σ^N′(X^N_{k1}, α_{k1}) ∇²h(X^N_{m_l}, α_{m_l}) b^N(X^N_k, αk)
          + tr[∇²h(X^N_{m_l}, α_{m_l}) √(εk εk1) σ^N(X^N_k, αk) ξk ξ′_{k1} σ^N′(X^N_{k1}, α_{k1})] ].

By the boundedness of X^N_n, b^N(·), ∇²h(·), and ϕj(·),

    |E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} Σ_{k=m_l}^{m_{l+1}−1} εk1 εk
        × b^N′(X^N_{k1}, α_{k1}) ∇²h(X^N_{m_l}, α_{m_l}) b^N(X^N_k, αk)|
    ≤ K Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} εk1 Σ_{k=m_l}^{m_{l+1}−1} εk
    ≤ K Σ_{l∈Ξ} δn² (1/δn) Σ_{k1=m_l}^{m_{l+1}−1} εk1 (1/δn) Σ_{k=m_l}^{m_{l+1}−1} εk
    ≤ K δn → 0  as n → ∞,

since

    Σ_{l∈Ξ} δn = δn ((t + s)/δn − t/δn) = O(1)  and  (1/δn) Σ_{k=m_l}^{m_{l+1}−1} εk = O(1).

Likewise, we also have

    |E Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} Σ_{k=m_l}^{m_{l+1}−1} εk1 √εk b^N′(X^N_{k1}, α_{k1}) ∇²h(X^N_{m_l}, α_{m_l})
        × σ^N(X^N_k, αk)ξk|
    ≤ K Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} εk1 E^{1/2} |Σ_{k=m_l}^{m_{l+1}−1} √εk σ^N(X^N_k, αk)ξk|²   (6.35)
    ≤ K Σ_{l∈Ξ} δn^{3/2} (1/δn) Σ_{k1=m_l}^{m_{l+1}−1} εk1
    ≤ K Σ_{l∈Ξ} δn^{3/2} → 0  as n → ∞,

and

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} Σ_{k=m_l}^{m_{l+1}−1} √εk1 εk ξ′_{k1} σ^N′(X^N_{k1}, α_{k1})
        × ∇²h(X^N_{m_l}, α_{m_l}) b^N(X^N_k, αk) → 0  as n → ∞.    (6.36)

The independence of {ξn} and {αn} and Ekα ξk ξk′ = I yield that, as n → ∞,

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} Σ_{k1=m_l}^{m_{l+1}−1} Σ_{k=m_l}^{m_{l+1}−1} √(εk εk1) tr[∇²h(X^N_{m_l}, α_{m_l})
        × σ^N(X^N_k, αk) ξk ξ′_{k1} σ^N′(X^N_{k1}, α_{k1})]
    = E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} Σ_{k=m_l}^{m_{l+1}−1} tr[∇²h(X^N_{m_l}, α_{m_l}) σ^N(X^N_k, αk)
        × εk [Ekα ξk ξk′] σ^N′(X^N_k, αk)]
    → E ∏_{j=1}^{n0} ϕj(X^N(sj), α(sj)) ∫_t^{t+s} tr[∇²h(X^N(u), α(u)) σ^N(X^N(u), α(u))
        × σ^N′(X^N(u), α(u))] du.                                   (6.37)

Combining (6.34)–(6.37), we arrive at, as n → ∞,

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} [X^N_{m_{l+1}} − X^N_{m_l}]′ ∇²h(X^N_{m_l}, α_{m_l})
        × [X^N_{m_{l+1}} − X^N_{m_l}]
    → E ∏_{j=1}^{n0} ϕj(X^N(sj), α(sj))
        × [∫_t^{t+s} tr[∇²h(X^N(u), α(u)) σ^N(X^N(u), α(u)) σ^N′(X^N(u), α(u))] du].
                                                                    (6.38)
By the smoothness of h(·, α) for each α ∈ M, we can replace X^N_{m_{l+1}} in the
second line of (6.29) by X^N_{m_l}. In fact,

    Σ_{l∈Ξ} [h(X^N_{m_{l+1}}, α_{m_{l+1}}) − h(X^N_{m_{l+1}}, α_{m_l})]
        = Σ_{l∈Ξ} [h(X^N_{m_l}, α_{m_{l+1}}) − h(X^N_{m_l}, α_{m_l})] + o(1),

where o(1) → 0 in probability as n → ∞ uniformly in t. It follows that

    E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} [h(X^N_{m_l}, α_{m_{l+1}}) − h(X^N_{m_l}, α_{m_l})]
    = E ∏_{j=1}^{n0} ϕj(X^{n,N}(sj), α^n(sj)) Σ_{l∈Ξ} Σ_{k=m_l}^{m_{l+1}−1} Ekα [h(X^N_{m_l}, α_{k+1}) − h(X^N_{m_l}, αk)],
                                                                    (6.39)
with (writing χk = (I_{{αk=1}}, . . . , I_{{αk=m0}}) for the row vector of indicators
and P^{k,k+1} = exp(εk Q) for the one-step transition matrix of the switching
chain)

    Σ_{l∈Ξ} Σ_{k=m_l}^{m_{l+1}−1} Ekα [h(X^N_{m_l}, α_{k+1}) − h(X^N_{m_l}, αk)]
        = Σ_{l∈Ξ} Σ_{k=m_l}^{m_{l+1}−1} Σ_{j0=1}^{m0} Σ_{i0=1}^{m0} [h(X^N_{m_l}, j0) P(α_{k+1} = j0 | αk = i0)
              − h(X^N_{m_l}, i0)] I_{{αk = i0}}
        = Σ_{l∈Ξ} Σ_{k=m_l}^{m_{l+1}−1} χk (P^{k,k+1} − I) H(X^N_{m_l})
        = Σ_{l∈Ξ} Σ_{k=m_l}^{m_{l+1}−1} εk χk ([exp(εk Q) − I]/εk) H(X^N_{m_l})
        = Σ_{l∈Ξ} δn (1/δn) Σ_{k=m_l}^{m_{l+1}−1} εk χk ([exp(εk Q) − I]/εk) H(X^N_{m_l})
        → ∫_t^{t+s} Qh(X^N(u), ·)(α(u)) du   as n → ∞,

where

    H(x) = (h(x, 1), . . . , h(x, m0))′ ∈ R^{m0×1}.                 (6.40)

Thus Lemma 6.14 is proved. □
Completion of the proof of the theorem. We show the convergence
of the untruncated process. We have demonstrated that the truncated pro-
cess (X^{n,N}(·), α^n(·)) converges to (X^N(·), α(·)). Here we show that the
untruncated sequence (X^n(·), α^n(·)) also converges. The basic premise is
the uniqueness of the martingale problem. By letting N → ∞, we obtain
the desired result. The argument is similar to that of [104, pp. 249–250];
we present only the basic idea here.
Let the measures induced by (X(·), α(·)) and (X^N(·), α(·)) be P(·) and
P^N(·), respectively. The martingale problem with operator L has a unique
solution (in the sense of distribution) for each initial condition; therefore
P(·) is unique. For any 0 < T < ∞ and |t| ≤ T, P(·) and P^N(·) are the
same on all Borel subsets of the set of paths in D((−∞, ∞); Rr × M) with
values in SN × M. By using

    P(sup_{|t|≤T} |X(t)| ≤ N) → 1  as N → ∞

and the weak convergence of X^{n,N}(·) to X^N(·), we conclude that X^n(·)
converges weakly to X(·). This leads to the desired result. The proof of the
theorem is completed. □

6.5 Notes
Chapter 4 provides sufficient conditions for ergodicity of switching diffu-
sions with state-dependent switching. Based on the work of Yin, Mao, and
Yin [171], this chapter addresses the ergodicity of the corresponding numer-
ical algorithms. The main result here is the demonstration of convergence
to the invariant measure of the Euler–Maruyama-type numerical algorithms
when the invariant measure exists. To obtain this result, we first proved
weak convergence of the algorithms. Our approach is inspired by Kushner's
work [101], in which he considered convergence to invariant measures for
systems driven by wideband noise; we have adapted the method in that
reference to treat the numerical approximation problem. Moreover, conver-
gence of the numerical algorithms has been proved using ideas from stochas-
tic approximation (see Kushner and Yin [104]). We have dealt with algo-
rithms with both decreasing stepsizes and a constant stepsize.
Here our approach is based on weak convergence methods and we work
with the associated measure. A different approach concentrating on the
associated differential equations is in Mao, Yuan, and Yin [121]. The rate
of convergence of the algorithms may be studied, for example, by means

of strong invariance principles. Further study may also be directed to the


large deviation analysis related to convergence to invariant measures.
7
Stability

7.1 Introduction
Continuing our effort of studying positive recurrence and ergodicity of
switching diffusion processes in Chapters 3 and 4, this chapter focuses
on stability of the dynamic systems described by switching diffusions. For
some of the recent progress in stability analysis, we refer the reader to
[48, 116, 136, 182, 183] and references therein. For treating dynamic sys-
tems in science and engineering, linearization techniques are used most
often. Nevertheless, the nonlinear systems and their linearizations may or
may not share similar asymptotic behavior. A problem of great interest is:
If a linear system is stable, what can we say about the associated nonlinear
systems? This chapter provides a systematic approach for treating such
problems for switching diffusions. We solve these problems using Liapunov
function methods.
The rest of the chapter is arranged as follows. Section 7.2 begins with
the formulation of the problem together with an auxiliary result, which is
used in our stability analysis. Section 7.3 recalls various notions of stability,
and presents p-stability and exponential p-stability results. Easily verifiable
conditions for stability and instability of linearized systems are provided
in Section 7.4. To demonstrate our results, we provide several examples
in Section 7.5. Further remarks are made in Section 7.6 to conclude this
chapter.

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 183
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_7,
© Springer Science + Business Media, LLC 2010

7.2 Formulation and Auxiliary Results


Recall that we use z′ to denote the transpose of z ∈ R^{ℓ1×ℓ2} with ℓi ≥ 1,
i = 1, 2, whereas R^{ℓ×1} is simply written as R^ℓ; 1l = (1, 1, . . . , 1)′ ∈ R^{m0}
is a column vector with all entries being 1; the Euclidean norm of a row or
column vector x is denoted by |x|. As usual, I denotes the identity matrix.
For a matrix A, its trace norm is denoted by |A| = √(tr(A′A)). If B is a set,
its indicator function is denoted by I_B(·).
Suppose that (X(t), α(t)) is a two-component Markov process such that
X(·) is a continuous component taking values in Rr and α(·) is a jump
process taking values in a finite set M = {1, 2, . . . , m0}. The process
(X(t), α(t)) has a generator L given as follows. For any twice continuously
differentiable function g(·, i), i ∈ M,

    Lg(x, i) = (1/2) Σ_{j,k=1}^r a_{jk}(x, i) ∂²g(x, i)/∂xj∂xk
               + Σ_{j=1}^r b_j(x, i) ∂g(x, i)/∂xj + Q(x)g(x, ·)(i)
             = (1/2) tr(a(x, i)∇²g(x, i)) + b′(x, i)∇g(x, i) + Q(x)g(x, ·)(i),
                                                                    (7.1)

where x ∈ Rr, Q(x) = (q_{ij}(x)) is an m0 × m0 matrix depending on x
satisfying q_{ij}(x) ≥ 0 for i ≠ j and Σ_{j∈M} q_{ij}(x) = 0 for each i ∈ M, and

    Q(x)g(x, ·)(i) = Σ_{j∈M} q_{ij}(x)g(x, j)
                   = Σ_{j∈M} q_{ij}(x)(g(x, j) − g(x, i)),  i ∈ M,

and ∇g(·, i) and ∇²g(·, i) denote the gradient and Hessian of g(·, i), respec-
tively.
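The generator formula (7.1) can be implemented directly when the gradient and Hessian of g are available. The example model below (r = 2, m0 = 2, constant Q, linear drift, g(x, i) = (i + 1)|x|²) is an illustrative assumption chosen so that the three terms of Lg can be checked by hand.

```python
import numpy as np

# Direct evaluation of the generator (7.1):
#   Lg(x,i) = 0.5 tr(a(x,i) hess) + b(x,i).grad + sum_j q_ij(x) g(x,j).
def L(g, grad_g, hess_g, b, a_mat, Q, x, i):
    diffusion = 0.5 * np.trace(a_mat(x, i) @ hess_g(x, i))
    drift = b(x, i) @ grad_g(x, i)
    switching = sum(Q(x)[i, j] * g(x, j) for j in range(Q(x).shape[0]))
    return diffusion + drift + switching

# Illustrative model: g(x, i) = (i+1)|x|^2, grad = 2(i+1)x, hess = 2(i+1)I.
g = lambda x, i: (i + 1) * (x @ x)
grad_g = lambda x, i: 2.0 * (i + 1) * x
hess_g = lambda x, i: 2.0 * (i + 1) * np.eye(len(x))
b = lambda x, i: -(i + 1) * x                       # linear drift
a_mat = lambda x, i: (0.5 ** (i + 1)) * np.eye(len(x))
Q = lambda x: np.array([[-1.0, 1.0], [2.0, -2.0]])  # x-independent here

# At x = (1, 0), i = 0: diffusion = 1, drift = -2, switching = 1, so Lg = 0.
val = L(g, grad_g, hess_g, b, a_mat, Q, np.array([1.0, 0.0]), 0)
```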
The process (X(t), α(t)) can be described by

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    X(0) = x,  α(0) = α,                                            (7.2)

and for i ≠ j,

    P{α(t + ∆t) = j | α(t) = i, (X(s), α(s)), s ≤ t}
        = q_{ij}(X(t))∆t + o(∆t),                                   (7.3)

where w(t) is a d-dimensional standard Brownian motion, b(·, ·) : Rr × M → Rr,
and σ(·, ·) : Rr × M → R^{r×d} satisfies σ(x, i)σ′(x, i) = a(x, i).
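A simulation of (7.2)–(7.3) with x-dependent switching follows the same Euler pattern as in Chapter 6, except that over a small step dt the chain jumps from i to j ≠ i with probability approximately q_{ij}(X(t)) dt. The specific Q(x), drift, and diffusion below are illustrative assumptions (with Q(·) bounded and continuous, as (A7.1) requires).

```python
import numpy as np

# Sketch of simulating (7.2)-(7.3): the switching intensity depends on
# the current continuous state X(t).  Two regimes for simplicity.
rng = np.random.default_rng(4)

def Q(x):
    lam = 1.0 + np.tanh(x) ** 2          # x-dependent, bounded rate
    return np.array([[-lam, lam], [1.0, -1.0]])

drift = lambda x, i: (-1.0, -0.5)[i] * x
vol = lambda x, i: (0.3, 0.6)[i]

dt, n_steps = 1e-3, 2000
x, i = 1.0, 0
for _ in range(n_steps):
    x = x + drift(x, i) * dt + vol(x, i) * np.sqrt(dt) * rng.standard_normal()
    # jump with probability q_ij(x) * dt, the Euler discretization of (7.3)
    if rng.random() < -Q(x)[i, i] * dt:
        i = 1 - i                         # two regimes: jump to the other one
```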
To proceed, we need conditions on the smoothness and growth of the
functions involved, and the condition that 0 is the only equilibrium point of
the random dynamic system. Hence we assume that the following conditions
hold throughout this chapter.

(A7.1) The matrix-valued function Q(·) is bounded and continuous.

(A7.2) b(0, α) = 0 and σ(0, α) = 0 for each α ∈ M. Moreover, assume
       that σ(x, α) vanishes only at x = 0 for each α ∈ M.

(A7.3) There exists a constant K0 > 0 such that for each α ∈ M and
       for any x, y ∈ Rr,

           |b(x, α) − b(y, α)| + |σ(x, α) − σ(y, α)| ≤ K0 |x − y|.  (7.4)

Under these conditions, the system given by (7.2) and (7.3) has a unique
solution; see Chapter 2 for more details. In what follows, a process start-
ing from (x, α) is denoted by (X x,α (t), αx,α (t)) if the emphasis on initial
condition is needed. If the context is clear, we simply write (X(t), α(t)).
To study stability of the equilibrium point x = 0, we first present
the following “nonzero” property, which asserts that almost all the sam-
ple paths of any solution of the system given by (7.2) and (7.3) starting
from a nonzero state will never reach the origin. For diffusion processes,
such a result was established in [83, Section 5.2]; for Markovian regime-
switching processes, similar results were obtained in [116, Lemma 2.1]. In
what follows, we give a proof for switching diffusions with continuous-state-
dependent switching processes. The result is useful, provides us with flexi-
bility for choices of Liapunov functions, and enables us to build Liapunov
functions in a deleted neighborhood of the origin.
Lemma 7.1. Under conditions (A7.1)–(A7.3), we have

    P{X^{x,α}(t) ≠ 0, t ≥ 0} = 1,  for any x ≠ 0, α ∈ M,      (7.5)

and for any β ∈ R and t > 0,

    E[|X^{x,α}(t)|^β] ≤ |x|^β e^{Kt},   x ≠ 0, α ∈ M,      (7.6)

where K is a constant depending only on β, m0, and the Lipschitz constant K0 in (7.4).

Proof. For x ≠ 0 and each i ∈ M, define V(x, i) = |x|^β for any β ∈ R − {0}. For any ∆ > 0 and |x| > ∆,

    ∇V(x, i) = β|x|^{β−2} x,
    ∇²V(x, i) = β|x|^{β−4} ( |x|² I + (β − 2)xx′ ).

Then it follows that

    LV(x, i) = β|x|^{β−2} ⟨x, b(x, i)⟩
               + (1/2) tr( σ(x, i)σ′(x, i) β|x|^{β−4} ( |x|² I + (β − 2)xx′ ) ).

Moreover, using conditions (A7.2) and (A7.3),

    |LV(x, i)| ≤ K|x|^β,  for any (x, i) ∈ R^r × M with x ≠ 0.      (7.7)

Let τ∆ be the first exit time from {x ∈ R^r : |x| > ∆}. Denote (X(t), α(t)) = (X^{x,α}(t), α^{x,α}(t)). Now applying the generalized Itô formula (see (2.7)) to V, we obtain

    |X(τ∆ ∧ t)|^β = |x|^β + ∫_0^{τ∆∧t} L|X(u)|^β du
                    + ∫_0^{τ∆∧t} β|X(u)|^{β−2} X′(u)σ(X(u), α(u)) dw(u).      (7.8)

Note that by virtue of conditions (A7.2) and (A7.3),

    E ∫_0^{τ∆∧t} | β|X(u)|^{β−2} X′(u)σ(X(u), α(u)) |² du ≤ E ∫_0^{τ∆∧t} K|X(u)|^{2β} du.

Thus if β > 0, (2.12) implies that

    E ∫_0^{τ∆∧t} K|X(u)|^{2β} du ≤ K E ∫_0^t |X(u)|^{2β} du ≤ KCt < ∞,

where C = C(x, t, β) > 0. On the other hand, if β < 0, then by the definition of τ∆, we have

    E ∫_0^{τ∆∧t} |X(u)|^{2β} du ≤ E ∫_0^{τ∆∧t} K∆^{2β} du ≤ K∆^{2β} t < ∞.

Therefore we have verified that the stochastic integral in (7.8) is a martingale with mean 0. Hence, by taking expectations on both sides of (7.8), and taking into account (7.7), it follows that

    E|X(τ∆ ∧ t)|^β ≤ |x|^β + K E ∫_0^{τ∆∧t} |X(u)|^β du
                   ≤ |x|^β + K E ∫_0^t |X(u ∧ τ∆)|^β du.

In the above, we have used the fact that τ∆ ∧ u = u for u ≤ τ∆ ∧ t. An application of Gronwall's inequality implies that for any β ≠ 0,

    E|X(τ∆ ∧ t)|^β ≤ |x|^β exp(Kt).      (7.9)

Taking β = −1, we have

    E[|X(τ∆ ∧ t)|^{−1}] ≤ |x|^{−1} exp(Kt).

By Chebyshev's inequality, for any ∆ > 0,

    P(τ∆ < t) = P(|X(τ∆ ∧ t)| ≤ ∆) = P(|X(τ∆ ∧ t)|^{−1} ≥ ∆^{−1})
              ≤ ∆ E[|X(τ∆ ∧ t)|^{−1}] ≤ ∆ [ |x|^{−1} exp(Kt) ].

Hence P(τ∆ < t) → 0 as ∆ → 0. Therefore (7.6) follows by letting ∆ → 0 in (7.9) and applying Fatou's lemma.
Finally, suppose (7.5) were false. Then there would exist some (x0, α0) ∈ R^r × M and T > 0 such that

    P{X^{x0,α0}(T) = 0} > 0.

Then we would have

    E[|X^{x0,α0}(T)|^{−1}] = ∞,

which would contradict (7.6) that we just proved. This completes the proof of the lemma.  □
Remark 7.2. In view of (7.5), in what follows we can work with functions V(·, i), i ∈ M, that are twice continuously differentiable and are defined on a deleted neighborhood of 0.
To proceed, we present an auxiliary result, namely, the solvability of a
system of deterministic equations. Suppose that Q, an m0 × m0 constant
matrix, is the generator of a continuous-time Markov chain r(t) and that
Q is irreducible.
Remark 7.3. In the above, by irreducibility we mean that the system of equations

    νQ = 0,   ν1l = 1,

has a unique solution ν = (ν1, . . . , νm0) satisfying νi > 0 for each i; see Definition A.7 and the discussion there for further details.
Note that if Q is irreducible, the rank of Q is m0 −1. Denote by R(Q) and
N (Q) the range and the null space of Q, respectively. It follows that N (Q)
is one-dimensional spanned by 1l (i.e., N (Q) = span{1l}). As a consequence,
the Markov chain r(t) is ergodic; see, for example, [29]. In what follows,
denote the associated stationary distribution by

ν = (ν1 , ν2 , . . . , νm0 ) ∈ R1×m0 . (7.10)

We are interested in solving a linear system of equations

Qc = η, (7.11)

where Q ∈ Rm0 ×m0 and η ∈ Rm0 are given and c ∈ Rm0 is an unknown
vector. Note that (7.11) is a Poisson equation. The properties of solutions
of (7.11) are provided in Lemma A.12. Basically, it indicates that under
the irreducibility of Q, equation (7.11) has a solution if and only if νη = 0.
Moreover, suppose that c1 and c2 are two solutions of (7.11). Then c1 −c2 =
α0 1l for some α0 ∈ R.
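For a concrete two-state instance of this solvability criterion (a quick illustration of ours, not taken from the text), take Q = [[−λ, λ], [μ, −μ]]. Then ν = (μ, λ)/(λ + μ), the Poisson equation Qc = η is solvable exactly when νη = 0, and any two solutions differ by a multiple of 1l:

```python
def stationary(lam, mu):
    """nu Q = 0, nu 1l = 1 for the 2-state generator [[-lam, lam], [mu, -mu]]."""
    return (mu / (lam + mu), lam / (lam + mu))

def poisson_solve(lam, mu, eta):
    """Solve Qc = eta; solvable iff nu . eta = 0.
    Pinning c2 = 0 selects one solution; c + a*(1, 1) gives all others."""
    nu = stationary(lam, mu)
    if abs(nu[0] * eta[0] + nu[1] * eta[1]) > 1e-12:
        raise ValueError("nu . eta must vanish")
    return (-eta[0] / lam, 0.0)      # first row: -lam*c1 + lam*c2 = eta1

c = poisson_solve(2.0, 3.0, (2.0, -3.0))   # nu = (0.6, 0.4), nu . eta = 0
```

Checking row by row: −λc1 + λc2 = −2·(−1) + 0 = 2 and μc1 − μc2 = 3·(−1) = −3, so Qc = η as required.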

7.3 p-Stability
This section is concerned with stability of the equilibrium point x = 0 for
the system given by (7.2) and (7.3). Adopting the terminologies of [83], we
first present definitions of stability, p-stability, and exponential p-stability.
Then general results in terms of Liapunov functions are provided.

7.3.1 Stability
Definition 7.4. The equilibrium point x = 0 of the system given by (7.2) and (7.3) is said to be

(i) stable in probability, if for any ε > 0 and any α ∈ M,

    lim_{x→0} P{ sup_{t≥0} |X^{x,α}(t)| > ε } = 0,

and x = 0 is said to be unstable in probability if it is not stable in probability;

(ii) asymptotically stable in probability, if it is stable in probability and satisfies

    lim_{x→0} P{ lim_{t→∞} X^{x,α}(t) = 0 } = 1,  for any α ∈ M;

(iii) p-stable (for p > 0), if

    lim_{δ→0} sup_{|x|≤δ, α∈M, t≥0} E|X^{x,α}(t)|^p = 0;

(iv) asymptotically p-stable, if it is p-stable and satisfies E|X^{x,α}(t)|^p → 0 as t → ∞ for any (x, α) ∈ R^r × M;

(v) exponentially p-stable, if for some positive constants K and k,

    E|X^{x,α}(t)|^p ≤ K|x|^p exp(−kt),  for any (x, α) ∈ R^r × M.
Using arguments similar to those of [83, Theorems 5.3.1, 5.4.1, and 5.4.2], we establish the following three lemmas. The statements are given together with their proofs.

Lemma 7.5. Let D ⊂ R^r be a neighborhood of 0. Suppose that for each i ∈ M, there exists a nonnegative function V(·, i) : D → R such that

(i) V(·, i) is continuous in D and vanishes only at x = 0;

(ii) V(·, i) is twice continuously differentiable in D − {0} and satisfies LV(x, i) ≤ 0 for all x ∈ D − {0}.

Then the equilibrium point x = 0 is stable in probability.

Proof. Let ς > 0 be such that the ball Bς = {x ∈ R^r : |x| < ς} and its boundary ∂Bς = {x ∈ R^r : |x| = ς} are contained in D. Set

    Vς := inf{ V(y, j) : y ∈ D \ Bς, j ∈ M }.

Then Vς > 0 by assumption (i). Next, by virtue of assumption (ii) and Dynkin's formula, we have

    E_{x,i} V(X(t ∧ τς), α(t ∧ τς)) = V(x, i) + E_{x,i} ∫_0^{τς∧t} LV(X(s), α(s)) ds ≤ V(x, i),

where (x, i) ∈ Bς × M and τς is the first exit time from Bς, that is, τς := inf{t ≥ 0 : |X(t)| ≥ ς}. Because V is nonnegative, we further have

    Vς P{τς ≤ t} ≤ E_{x,i}[ V(X(τς), α(τς)) I_{{τς ≤ t}} ] ≤ V(x, i).

Note that τς ≤ t if and only if sup_{0≤u≤t} |X(u)| ≥ ς. Therefore it follows that

    P_{x,i}{ sup_{0≤u≤t} |X(u)| ≥ ς } ≤ V(x, i)/Vς.

Letting t → ∞, we obtain

    P_{x,i}{ sup_{t≥0} |X(t)| ≥ ς } ≤ V(x, i)/Vς.

Finally, the desired conclusion follows from the assumptions that V(0, i) = 0 and V(·, i) is continuous for each i ∈ M.  □
Introduce the notation

    τ^{x,α}_{ε,r0} := inf{ t ≥ 0 : |X^{x,α}(t)| = ε or |X^{x,α}(t)| = r0 },      (7.12)

for any 0 < ε < r0 and any (x, α) ∈ R^r × M with ε < |x| < r0.
Lemma 7.6. Assume the conditions of Lemma 7.5. If for any sufficiently small 0 < ε < r0 and any (x, α) ∈ R^r × M with ε < |x| < r0, we have

    P{τ^{x,α}_{ε,r0} < ∞} = 1,      (7.13)

then the equilibrium point x = 0 is asymptotically stable in probability.

Proof. We divide the proof into several steps.

Step 1. Let ς > 0 and (x, i) ∈ Bς × M, and define τς as in the proof of Lemma 7.5. For any t ≥ 0, let g(t) := V(X^{x,i}(τς ∧ t), α^{x,i}(τς ∧ t)). Then, as in the proof of Lemma 7.5, the assumption that LV(y, j) ≤ 0 for all (y, j) ∈ (D − {0}) × M implies that g is a nonnegative supermartingale. Therefore the martingale convergence theorem implies that

    lim_{t→∞} g(t) = g(∞) exists a.s.      (7.14)

Step 2. By virtue of Lemma 7.5, the equilibrium point x = 0 is stable in probability. Therefore for any ε > 0, there exists a δ > 0 (we may further assume that δ < ς) such that

    P_{y,j}{τς < ∞} < ε/2,  for all (y, j) ∈ Bδ × M.      (7.15)

Now choose an arbitrary point (x, α) ∈ Bδ × M. Then both (7.14) and (7.15) hold. Hence it follows from (7.13) and (7.15) that for any ρ > 0 with ρ < |x|, we have

    P_{x,α}{τρ < ∞} ≥ P_{x,α}{τ_{ρ,ς} < ∞} − P_{x,α}{τς < ∞} ≥ 1 − ε/2.

This implies that P_{x,α}{inf_{t≥0} |X(t)| ≤ ρ} ≥ 1 − ε/2. Since ρ > 0 can be arbitrarily small,

    P_{x,α}{ inf_{t≥0} |X(t)| = 0 } ≥ 1 − ε/2.

Now let

    A := { ω ∈ Ω : τς(ω) = ∞, inf_{t≥0} |X(t, ω)| = 0 }.

Then P_{x,α}(A) ≥ 1 − ε/2.


Step 3. We claim that for almost all ω ∈ A, we have

    lim inf_{t→∞} |X^{x,α}(t, ω)| = 0.

If the claim were false, there would exist a B ⊂ A with P_{x,α}(B) > 0 such that for all ω ∈ B, we would have

    lim inf_{t→∞} |X^{x,α}(t, ω)| ≥ θ > 0.

Then for any ω ∈ B, there exists a T = T(ω) > 0 such that |X^{x,α}(t, ω)| ≥ θ for all t ≥ T. Therefore for any ω ∈ B and n ∈ N sufficiently large, τ_{1/n}(ω) ≤ T, where τ_{1/n}(ω) := inf{ t ≥ 0 : |X^{x,α}(t, ω)| ≤ 1/n }. Hence it follows that

    lim_{n→∞} τ_{1/n}(ω) ≤ T(ω) < ∞.

Then we would have

    P_{x,α}{ lim_{n→∞} τ_{1/n} < ∞ } ≥ P_{x,α}(B) > 0.

But this would lead to a contradiction because, by virtue of Lemma 7.1, the equilibrium point 0 is inaccessible with probability 1. Thus it follows that

    P_{x,α}{ lim_{n→∞} τ_{1/n} = ∞ } = 1,   or   P_{x,α}{ lim_{n→∞} τ_{1/n} < ∞ } = 0.

Hence the claim is verified.


Step 4. Since V(·, i) is continuous and V(0, i) = 0 for each i ∈ M, we have from Step 3 that

    lim inf_{t→∞} V(X^{x,α}(t), α^{x,α}(t)) = 0,  for almost all ω ∈ A.      (7.16)

Now by virtue of (7.14) and the definition of A, we have

    lim_{t→∞} g(t) = lim_{t→∞} V(X^{x,α}(τς ∧ t), α^{x,α}(τς ∧ t))
                   = lim_{t→∞} V(X^{x,α}(t), α^{x,α}(t)) = g(∞)  a.s.      (7.17)

Thus it follows from (7.16) and (7.17) that

    lim_{t→∞} V(X^{x,α}(t), α^{x,α}(t)) = 0,  for almost all ω ∈ A.

But V(·, i) vanishes only at x = 0 for each i ∈ M, so lim_{t→∞} X^{x,α}(t) = 0 on A. Thus we have

    P_{x,α}{ lim_{t→∞} X(t) = 0 } ≥ P_{x,α}(A) ≥ 1 − ε/2.

Note that (x, α) is an arbitrary point in Bδ × M. Thus, for any ε > 0, there exists a δ > 0 such that

    P_{x,α}{ lim_{t→∞} X(t) = 0 } ≥ 1 − ε/2,  for all (x, α) ∈ Bδ × M.

That is, the equilibrium point x = 0 is asymptotically stable in probability, as desired.  □
Lemma 7.7. Let D ⊂ R^r be a neighborhood of 0. Assume that the conditions of Lemma 7.6 hold and that for each i ∈ M, there exists a nonnegative function V(·, i) : D → R such that V(·, i) is twice continuously differentiable in every deleted neighborhood of 0, and

    LV(x, i) ≤ 0  for all x ∈ D − {0};      (7.18)

    lim_{|x|→0} V(x, i) = ∞,  for each i ∈ M.      (7.19)

Then the equilibrium point x = 0 is unstable in probability if (7.13) holds.



Proof. Let ς > 0 and (x, α) ∈ Bς × M, and define τς as in the proof of Lemma 7.5. Let also 0 < ε < |x| and define τε := inf{ t ≥ 0 : |X^{x,α}(t)| ≤ ε }. Then for any t > 0, we have

    E_{x,α} V(X(t ∧ τ_{ε,ς}), α(t ∧ τ_{ε,ς})) = V(x, α) + E_{x,α} ∫_0^{t∧τ_{ε,ς}} LV(X(s), α(s)) ds      (7.20)
                                              ≤ V(x, α).

Letting t → ∞ in (7.20), we obtain by virtue of Fatou's lemma and (7.13) that

    E_{x,α} V(X(τε ∧ τς), α(τε ∧ τς)) ≤ V(x, α).

Furthermore, since V is nonnegative, we have

    V(x, α) ≥ E_{x,α}[ V(X(τε), α(τε)) I_{{τε < τς}} ]
            ≥ inf{ V(y, j) : |y| = ε, j ∈ M } P_{x,α}{τε < τς}
            = Vε P_{x,α}{ sup_{0≤t≤τε} |X(t)| < ς },

where Vε = inf{ V(y, j) : |y| = ε, j ∈ M }. By Lemma 7.1, the equilibrium point 0 is inaccessible with probability 1, and hence τε → ∞ a.s. as ε → 0. Also, it follows from (7.19) that Vε → ∞ as ε → 0. Therefore, letting ε → 0, we obtain

    P_{x,α}{ sup_{t≥0} |X(t)| < ς } = 0.

This shows that the equilibrium point x = 0 is unstable in probability.  □


Remark 7.8. Note that (7.13) is an essential assumption in Lemmas 7.6 and 7.7. Here we present two sufficient conditions under which (7.13) holds.

(i) Let N ⊂ R^r be a neighborhood of 0. Assume that for each i ∈ M, there exists a nonnegative function V(·, i) : N → R such that V(·, i) is twice continuously differentiable in every deleted neighborhood of 0, and that for any sufficiently small 0 < ε < r0 there is a positive constant κ = κ(ε) such that

    LV(x, i) ≤ −κ,  for all x ∈ N with ε < |x| < r0.      (7.21)

Then (7.13) holds.

(ii) If for any sufficiently small 0 < ε < r0, there exist some ι = 1, 2, . . . , r and some constant κ = κ(ε) > 0 such that

    a_{ιι}(x, i) ≥ κ,  for all (x, i) ∈ { x : ε < |x| < r0 } × M,      (7.22)

then (7.13) holds.

Assertion (i) can be established using almost the same proof as that for
[83, Theorem 3.7.1]. Also (ii) follows by observing that if (7.22) is satisfied,
then we can construct some Liapunov function V (·, ·) satisfying (7.21); see
also a similar argument in the proof of [83, Corollary 3.7.2]. We omit the
details here for brevity.

7.3.2 Auxiliary Results


Concerning the exponential p-stability of the equilibrium point x = 0 of
the system given by (7.2) and (7.3), sufficient conditions in terms of the
existence of certain Liapunov functions being homogeneous of degree p
were obtained in [116]. In what follows, we first derive a Kolmogorov-type
backward equation and provide a couple of lemmas as a preparation, and
then we present a necessary condition for the exponential p-stability.
Note that in Theorem 7.10, we do not assume that the operator L is uniformly parabolic; in other words, the operator L may be degenerate. Nevertheless, we prove that the function u(t, x, i) defined in (7.24) is a classical solution to the initial value problem (7.30)–(7.31).

Theorem 7.9. Assume that for each i ∈ M, the coefficients of the operator L defined in (7.1) satisfy b(·, i) ∈ C² and σ(·, i) ∈ C², and that |q_{ij}(x)| ≤ K for all x ∈ R^r and some K > 0. Suppose that φ(·, i) ∈ C² and that D_x^θ φ(·, i) is Lipschitz continuous for each i ∈ M and |θ| = 2, and that

    |D_x^β b(x, i)| + |D_x^β σ(x, i)| + |D_x^θ φ(x, i)| ≤ K(1 + |x|^γ),  i ∈ M,      (7.23)

where K and γ are positive constants and β and θ are multi-indices with |β| ≤ 2 and |θ| ≤ 2. Then for any T > 0, the function

    u(t, x, i) := E_{x,i}[φ(X(t), α(t))] = E[φ(X^{x,i}(t), α^{x,i}(t))]      (7.24)

is twice continuously differentiable with respect to the variable x and satisfies

    |D_x^β u(t, x, i)| ≤ K(1 + |x|^γ),

where t ∈ [0, T], x ∈ R^r, and i ∈ M.

Proof. For notational simplicity, we prove the theorem when X(t) is one-dimensional; the multidimensional case can be handled in a similar manner. Fix (t, x, i) ∈ [0, T] × R^r × M. Let x̃ = x + ∆ with 0 < |∆| < 1. As in the proof of Theorem 2.27, we denote (X(t), α(t)) = (X^{x,i}(t), α^{x,i}(t)) and (X̃(t), α̃(t)) = (X^{x̃,i}(t), α^{x̃,i}(t)). By virtue of Theorem 2.27, the mean square derivative ζ(t) = (∂/∂x)X^{x,i}(t) exists and is mean square continuous with respect to x and t.

Write

    [u(t, x̃, i) − u(t, x, i)]/∆ = (1/∆) E[φ(X̃(t), α̃(t)) − φ(X(t), α(t))]
        = (1/∆) E[φ(X̃(t), α̃(t)) − φ(X̃(t), α(t))]
          + (1/∆) E[φ(X̃(t), α(t)) − φ(X(t), α(t))].      (7.25)

Similar to the proof of Lemma 2.28, we can show that

    (1/∆²) E[ sup_{0≤t≤T} |φ(X̃(t), α̃(t)) − φ(X̃(t), α(t))|² ] → 0,      (7.26)

as ∆ → 0. To proceed, for each i ∈ M, we use φ_x(·, i) and φ_{xx}(·, i) to denote the first and second derivatives of φ(·, i) with respect to x, respectively. We obtain

    (1/∆) E[φ(X̃(t), α(t)) − φ(X(t), α(t))]
        = (1/∆) E ∫_0^1 (d/dv) φ(X(t) + v(X̃(t) − X(t)), α(t)) dv
        = E[ Z(t) ∫_0^1 φ_x(X(t) + v(X̃(t) − X(t)), α(t)) dv ],

where

    Z(t) = (X̃(t) − X(t))/∆.      (7.27)

Thus it follows that

    | (1/∆) E[φ(X̃(t), α(t)) − φ(X(t), α(t))] − E[φ_x(X(t), α(t))ζ(t)] |
        ≤ E| ∫_0^1 φ_x(X(t) + v(X̃(t) − X(t)), α(t)) dv · Z(t) − φ_x(X(t), α(t))ζ(t) |
        ≤ E| [ ∫_0^1 φ_x(X(t) + v(X̃(t) − X(t)), α(t)) dv − φ_x(X(t), α(t)) ] Z(t) |
          + E| φ_x(X(t), α(t)) [Z(t) − ζ(t)] |
        := e1 + e2.

It follows from (7.23), Proposition 2.3, and (2.73) that

    e2 = E| φ_x(X(t), α(t)) [Z(t) − ζ(t)] |
       ≤ E^{1/2}|φ_x(X(t), α(t))|² · E^{1/2}|Z(t) − ζ(t)|²
       ≤ K E^{1/2}|Z(t) − ζ(t)|² → 0,

as ∆ → 0. To estimate the term e1, we note that (7.23) and Proposition 2.3 imply that

    E| φ_x(X(t) + v(X̃(t) − X(t)), α(t)) − φ_x(X(t), α(t)) |² ≤ K

for all 0 < |∆| < 1. Recall also from the proof of Theorem 2.27 that X̃(t) → X(t) in probability for any t ∈ [0, T]. Thus it follows that

    E| φ_x(X(t) + v(X̃(t) − X(t)), α(t)) − φ_x(X(t), α(t)) |² → 0,

as ∆ → 0. Note that we proved in Corollary 2.32 that E|Z(t)|² ≤ K, where Z(t) is the "difference quotient" defined in (7.27). Then we have from the Cauchy–Schwarz inequality that

    e1 = E| [ ∫_0^1 φ_x(X(t) + v(X̃(t) − X(t)), α(t)) dv − φ_x(X(t), α(t)) ] Z(t) |
       ≤ E^{1/2}| ∫_0^1 φ_x(X(t) + v(X̃(t) − X(t)), α(t)) dv − φ_x(X(t), α(t)) |² · E^{1/2}|Z(t)|²
       → 0  as ∆ → 0.

Hence we have shown that, as ∆ → 0,

    (1/∆) E[φ(X̃(t), α(t)) − φ(X(t), α(t))] − E[φ_x(X(t), α(t))ζ(t)] → 0.      (7.28)

Therefore it follows from (7.25), (7.26), and (7.28) that

    [u(t, x̃, i) − u(t, x, i)]/∆ − E[φ_x(X(t), α(t))ζ(t)] → 0  as ∆ → 0.

Thus u(t, ·, i) is differentiable with respect to the variable x and

    ∂u(t, x, i)/∂x = E[φ_x(X(t), α(t))ζ(t)] = E[ φ_x(X^{x,i}(t), α^{x,i}(t)) ∂X^{x,i}(t)/∂x ].      (7.29)

Moreover, (7.23), Proposition 2.3, and (2.74) imply that for some K > 0, we have

    |∂u(t, x, i)/∂x| ≤ E|φ_x(X(t), α(t))ζ(t)|
                     ≤ E^{1/2}|φ_x(X(t), α(t))|² · E^{1/2}|ζ(t)|²
                     ≤ K E^{1/2}(1 + |X(t)|^γ)² ≤ K(1 + |x|^{γ0}).

Next, we verify that (∂/∂x)u(t, x, i) is continuous with respect to x. To this purpose, we consider

    | ∂u(t, x, i)/∂x − ∂u(t, x̃, i)/∂x |
        ≤ E| φ_x(X̃(t), α̃(t))ζ̃(t) − φ_x(X(t), α(t))ζ(t) |
        ≤ E| φ_x(X(t), α(t))(ζ̃(t) − ζ(t)) |
          + E| [φ_x(X̃(t), α̃(t)) − φ_x(X(t), α(t))] ζ̃(t) |,

where

    ζ̃(t) = ζ^{x̃,i}(t) = ∂X^{x̃,i}(t)/∂x.

By virtue of Theorem 2.27, ζ(t) = ∂X(t)/∂x is mean square continuous. Hence it follows that

    E| φ_x(X(t), α(t))(ζ̃(t) − ζ(t)) | ≤ E^{1/2}|φ_x(X(t), α(t))|² · E^{1/2}|ζ̃(t) − ζ(t)|² → 0  as x̃ → x.

In addition, detailed calculations similar to those used in deriving (7.26) lead to

    E| [φ_x(X̃(t), α̃(t)) − φ_x(X(t), α(t))] ζ̃(t) |
        ≤ E^{1/2}| φ_x(X̃(t), α̃(t)) − φ_x(X(t), α(t)) |² · E^{1/2}|ζ̃(t)|²
        ≤ K E^{1/2}( | φ_x(X̃(t), α̃(t)) − φ_x(X(t), α(t)) |² / |x̃ − x|² ) · |x̃ − x|
        → 0  as x̃ → x.

Hence it follows that (∂/∂x)u(t, x, i) is continuous with respect to x, and therefore u(t, x, i) is continuously differentiable with respect to the variable x.

In a similar manner, we can show that u(t, x, i) is twice continuously differentiable with respect to the variable x and that

    ∂²u(t, x, i)/∂x² = E_{x,i}[ φ_{xx}(X(t), α(t)) (∂X(t)/∂x)² + φ_x(X(t), α(t)) ∂²X(t)/∂x² ].

Consequently, we can verify that

    |∂²u(t, x, i)/∂x²| ≤ K(1 + |x|^γ).

This completes the proof of the theorem.  □



Theorem 7.10. Assume the conditions of Theorem 7.9. Then the function u defined in (7.24) is continuously differentiable with respect to the variable t. Moreover, u satisfies the system of Kolmogorov backward equations

    ∂u(t, x, i)/∂t = Lu(t, x, i),  (t, x, i) ∈ (0, T] × R^r × M,      (7.30)

with initial condition

    lim_{t↓0} u(t, x, i) = φ(x, i),  (x, i) ∈ R^r × M,      (7.31)

where Lu(t, x, i) in (7.30) is to be interpreted as L applied to the function (x, i) → u(t, x, i).

Proof. First note that by virtue of Proposition 2.4, the process (X(t), α(t)) is càdlàg. Hence the initial condition (7.31) follows from the continuity of φ. We divide the rest of the proof into several steps.

Step 1. For fixed (x, i) ∈ R^r × M, u(t, x, i) is absolutely continuous with respect to t ∈ [0, T]. In fact, for any 0 ≤ s ≤ t ≤ T, we have from Dynkin's formula that

    u(t, x, i) − u(s, x, i) = E_{x,i} φ(X(t), α(t)) − E_{x,i} φ(X(s), α(s))
        = E_{x,i}[ E_{x,i}[ (φ(X(t), α(t)) − φ(X(s), α(s))) | F_s ] ]
        = E_{x,i}[ E_{x,i}[ ∫_s^t Lφ(X(v), α(v)) dv | F_s ] ],

so that

    |u(t, x, i) − u(s, x, i)| ≤ E_{x,i} ∫_s^t E_{x,i}[ |Lφ(X(v), α(v))| | F_s ] dv.

Using (7.23), for some positive constants K and γ0, we have

    |Lφ(x, i)| ≤ K(1 + |x|^{γ0})  for all (x, i) ∈ R^r × M.

Hence it follows from Proposition 2.3 that

    E_{x,i}[ |Lφ(X(v), α(v))| | F_s ] ≤ K E_{x,i}[ (1 + |X(v)|^{γ0}) | F_s ] ≤ C,

where C is independent of t, s, and v. Thus we have

    |u(t, x, i) − u(s, x, i)| ≤ C|t − s|.

Thus u is absolutely continuous with respect to t, (∂/∂t)u(t, x, i) exists a.e. on [0, T], and we have

    u(t, x, i) = u(0, x, i) + ∫_0^t ∂u(v, x, i)/∂v dv.      (7.32)

Step 2. For any h > 0, we have from the strong Markov property that

    u(t + h, x, i) = E_{x,i} φ(X(t + h), α(t + h))
        = E_{x,i}[ E_{x,i}[ φ(X(t + h), α(t + h)) | F_h ] ]
        = E_{x,i}[ E_{X(h),α(h)} φ(X(t), α(t)) ]      (7.33)
        = E_{x,i} u(t, X(h), α(h)).

Now let g(x, i) := u(t, x, i). Then Theorem 7.9 implies that g(·, i) ∈ C² for each i ∈ M and, for some K > 0 and γ0 > 0,

    |D_x^β g(x, i)| ≤ K(1 + |x|^{γ0}),  i ∈ M.

Thus it follows from Dynkin's formula that

    E_{x,i} g(X(h), α(h)) − g(x, i) = E_{x,i} ∫_0^h Lg(X(v), α(v)) dv.

Using the same argument as in the proof of [47, Theorem 5.6.1], we can show that

    (1/h) E_{x,i} ∫_0^h Lg(X(v), α(v)) dv → Lg(x, i)  as h ↓ 0.      (7.34)

Therefore,

    lim_{h↓0} [ E_{x,i} g(X(h), α(h)) − g(x, i) ]/h = Lg(x, i).

But by the definition of g, we have from (7.33) that

    lim_{h↓0} [ u(t + h, x, i) − u(t, x, i) ]/h = Lg(x, i) = Lu(t, x, i).      (7.35)

Thus a combination of (7.32) and (7.35) leads to

    u(t, x, i) = u(0, x, i) + ∫_0^t Lu(v, x, i) dv.      (7.36)

Step 3. We claim that Lu(t, x, i) is continuous with respect to the variable t. Note that

    Lu(t, x, i) = b(x, i) ∂u(t, x, i)/∂x + (1/2) σ²(x, i) ∂²u(t, x, i)/∂x² + Σ_{j=1}^{m0} q_{ij}(x) u(t, x, j).

The claim is verified if we can show that (∂/∂x)u(t, x, i) and (∂²/∂x²)u(t, x, i) are continuous with respect to t, since Step 1 above shows that u(t, x, i) is continuous with respect to t. To this end, let t, s ∈ [0, T]. Then we have

    | ∂u(t, x, i)/∂x − ∂u(s, x, i)/∂x |
        = | E_{x,i}[φ_x(X(t), α(t))ζ(t)] − E_{x,i}[φ_x(X(s), α(s))ζ(s)] |
        ≤ E_{x,i}| φ_x(X(t), α(t))ζ(t) − φ_x(X(s), α(s))ζ(s) |
        ≤ E_{x,i}| [φ_x(X(t), α(t)) − φ_x(X(s), α(s))] ζ(t) | + E_{x,i}| φ_x(X(s), α(s)) [ζ(t) − ζ(s)] |
        ≤ E_{x,i}^{1/2}| φ_x(X(t), α(t)) − φ_x(X(s), α(s)) |² · E_{x,i}^{1/2}|ζ(t)|²
          + E_{x,i}^{1/2}| φ_x(X(s), α(s)) |² · E_{x,i}^{1/2}|ζ(t) − ζ(s)|².

As we demonstrated before,

    E_{x,i}^{1/2}| φ_x(X(s), α(s)) |² ≤ K,

while Corollary 2.32 implies that ζ(t) is mean square continuous with respect to t. Hence it follows that

    E_{x,i}^{1/2}|ζ(t) − ζ(s)|² → 0  as |t − s| → 0.

Meanwhile,

    E_{x,i}| φ_x(X(t), α(t)) − φ_x(X(s), α(s)) |²
        ≤ K E_{x,i}| φ_x(X(t), α(t)) − φ_x(X(s), α(t)) |² + K E_{x,i}| φ_x(X(s), α(t)) − φ_x(X(s), α(s)) |²
        := e1 + e2.

Using Theorem 2.13 or (2.26) and (2.74), detailed computations show that

    e1 ≤ K E_{x,i}| ∫_0^1 φ_{xx}(X(s) + v(X(t) − X(s)), α(t)) dv · (X(t) − X(s)) |² → 0  as |t − s| → 0.

To treat the term e2, we assume without loss of generality that t > s and compute

    e2 = K E_{x,i}| φ_x(X(s), α(t)) − φ_x(X(s), α(s)) |²
       = K Σ_{k=1}^{m0} Σ_{j≠k} E_{x,i}[ |φ_x(X(s), j) − φ_x(X(s), k)|² I_{{α(t)=j}} I_{{α(s)=k}} ]
       = K Σ_{k=1}^{m0} Σ_{j≠k} E_{x,i}[ |φ_x(X(s), j) − φ_x(X(s), k)|² I_{{α(s)=k}} E_{x,i}[ I_{{α(t)=j}} | F_s ] ]
       = K Σ_{k=1}^{m0} Σ_{j≠k} E_{x,i}[ |φ_x(X(s), j) − φ_x(X(s), k)|² I_{{α(s)=k}} ( q_{kj}(X(s))(t − s) + o(t − s) ) ]
       ≤ K(t − s).

Thus it follows that e2 → 0 as |t − s| → 0. Hence we have shown that

    | ∂u(t, x, i)/∂x − ∂u(s, x, i)/∂x | → 0  as |t − s| → 0,

and so (∂/∂x)u(t, x, i) is continuous with respect to the variable t. Similarly, we can show that (∂²/∂x²)u(t, x, i) is also continuous with respect to the variable t. Therefore Lu(t, x, i) is continuous with respect to the variable t.

Step 4. Finally, by virtue of (7.36) and Step 3 above, we conclude that (∂/∂t)u(t, x, i) exists everywhere for t ∈ (0, T] and that

    ∂u(t, x, i)/∂t = Lu(t, x, i).
This finishes the proof of the theorem.  □
Lemma 7.11. Let X^{x,α}(t) be the solution to the system given by (7.2) and (7.3) with initial data X(0) = x, α(0) = α. Assume that for each i ∈ M, b(·, i) and σ(·, i) have continuous partial derivatives with respect to the variable x up to the second order and that b(0, i) = σ(0, i) = 0. Then for any p ∈ R, the function

    u(t, x, i) := E|X^{x,i}(t)|^p      (7.37)

is twice continuously differentiable with respect to x, except possibly at x = 0. Moreover, we have

    | ∂u(t, x, α)/∂x_j | ≤ K|x|^{p−1} e^{κ0 t},  and
    | ∂²u(t, x, α)/(∂x_j ∂x_k) | ≤ K|x|^{p−2} e^{κ0 t},      (7.38)

where j, k = 1, 2, . . . , r, and K and κ0 are positive constants.

Proof. Once again, for notational simplicity, we present the proof for X(t) being a real-valued process. By virtue of Theorem 7.9, u(t, x, i) := E|X^{x,i}(t)|^p is twice continuously differentiable with respect to x, except possibly at x = 0. We need only show that the partial derivatives satisfy (7.38). To this end, similar to the proofs of Theorems 7.9 and 7.10, we assume x to be a scalar without loss of generality. By virtue of (7.29),

    ∂u(t, x, α)/∂x = p E[ |X^{x,α}(t)|^{p−1} sgn(X^{x,α}(t)) ∂X^{x,α}(t)/∂x ].      (7.39)

Then it follows from (7.6) and (2.74) that

    | ∂u(t, x, α)/∂x | ≤ K E[ |X^{x,α}(t)|^{p−1} |∂X^{x,α}(t)/∂x| ]
        ≤ E^{1/2}|X^{x,α}(t)|^{2p−2} · E^{1/2}|∂X^{x,α}(t)/∂x|²
        ≤ K(|x|^{2p−2} e^{Kt})^{1/2} = K|x|^{p−1} e^{κ0 t}.

Similarly, detailed calculations lead to

    | ∂²u(t, x, α)/∂x² | ≤ K|x|^{p−2} e^{κ0 t}.

The lemma is thus proved.  □

7.3.3 Necessary and Sufficient Conditions for p-Stability

Theorem 7.12. Suppose that the equilibrium point 0 is exponentially p-stable. Moreover, assume that the coefficients b and σ have continuous bounded derivatives with respect to the variable x up to the second order. Then for each i ∈ M, there exists a function V(·, i) : R^r → R such that

    k1|x|^p ≤ V(x, i) ≤ k2|x|^p,  x ∈ N,      (7.40)

    LV(x, i) ≤ −k3|x|^p  for all x ∈ N − {0},      (7.41)

    | ∂V(x, i)/∂x_j | < k4|x|^{p−1},
    | ∂²V(x, i)/(∂x_j ∂x_k) | < k4|x|^{p−2},      (7.42)

for all 1 ≤ j, k ≤ r, x ∈ N − {0}, and for some positive constants k1, k2, k3, and k4, where N is a neighborhood of 0.

Proof. For each i ∈ M, consider the function

    V(x, i) = ∫_0^T E|X^{x,i}(u)|^p du.

It follows from Lemma 7.11 that the functions V(x, i), i ∈ M, are twice continuously differentiable with respect to x except possibly at x = 0.

The equilibrium point 0 is exponentially p-stable; therefore, by the definition of exponential p-stability, there is a β > 0 such that

    V(x, i) ≤ K|x|^p ∫_0^T exp(−βu) du ≤ K|x|^p.

Since 0 is an equilibrium point, |a(x, i)| ≤ K|x|² and |b(x, i)| ≤ K|x|. Consequently, |L|x|^p| ≤ K|x|^p. An application of Itô's lemma to g(x) = |x|^p implies that

    E|X^{x,i}(T)|^p − |x|^p = E ∫_0^T L|X^{x,i}(u)|^p du
        ≥ −K ∫_0^T E|X^{x,i}(u)|^p du = −K V(x, i).

Again, by the exponential p-stability, we can choose T so that

    E|X^{x,i}(T)|^p ≤ (1/2)|x|^p,

and as a result V(x, i) ≥ |x|^p/(2K). Thus (7.40) is verified.

We note that

    | ∂V(x, i)/∂x_ℓ | = | ∫_0^T (∂/∂x_ℓ) E|X^{x,i}(u)|^p du |
        ≤ K|x|^{p−1} ∫_0^T exp(Ku) du ≤ K|x|^{p−1}.

Likewise, we can verify the second part of (7.42). Thus the proof is completed.  □
We end this section with the following results on linear systems. Assume that the evolution (7.2) is replaced by

    dX(t) = b(α(t))X(t)dt + Σ_{j=1}^d σ_j(α(t))X(t)dw_j(t),      (7.43)

where b(i) and σ_j(i) are r × r constant matrices, and w_j(t) are independent one-dimensional standard Brownian motions, for i = 1, 2, . . . , m0 and j = 1, 2, . . . , d. Then we have the following two theorems.

Theorem 7.13. The equilibrium point x = 0 of system (7.43) together with (7.3) is exponentially p-stable if and only if for each i ∈ M, there is a function V(·, i) : R^r → R satisfying (7.40)–(7.42) for some constants ki > 0, i = 1, . . . , 4.

Proof. The proof of sufficiency is contained in [116]; the necessity follows from Theorem 7.12 because the coefficients of (7.43) and (7.3) satisfy the conditions of Theorem 7.12. We omit the details here.  □

Theorem 7.14. Let Q(x) ≡ Q be a constant matrix. Assume also that the Markov chain α(t) is independent of the Brownian motion

    w(t) = (w1(t), w2(t), . . . , wd(t))′

(or, equivalently, that α(0) is independent of the Brownian motion w(·)). If the equilibrium point x = 0 of the system given by (7.43) and (7.3) is stable in probability, then it is p-stable for sufficiently small p > 0.

Proof. The proof follows from a crucial observation: because (7.43) is linear in X(t), X^{λx,α}(t) = λX^{x,α}(t). Using a similar argument as that for [83, Lemma 6.4.1], we can conclude the proof; a few details are omitted.  □

7.4 Stability and Instability of Linearized Systems


This section provides criteria for stability and instability. To proceed, we state the following assumption.

(A7.4) For each i ∈ M, there exist b(i), σ_j(i) ∈ R^{r×r}, j = 1, 2, . . . , d, and a generator Q̂ = (q̂_{ij}) of a continuous-time Markov chain such that, as x → 0,

    b(x, i) = b(i)x + o(|x|),
    σ(x, i) = (σ1(i)x, σ2(i)x, . . . , σd(i)x) + o(|x|),      (7.44)
    Q(x) = Q̂ + o(1).

Moreover, Q̂ is irreducible and α̂(t) is a Markov chain with generator Q̂.

Remark 7.15. Note that condition (A7.4) is rather natural. It is equivalent to Q(x) being continuous at x = 0, and b(x, i) and σ(x, i) being continuously differentiable at x = 0. It follows from (A7.4) that α̂(t) is an ergodic Markov chain. Denote the stationary distribution of α̂(t) by π = (π1, π2, . . . , πm0) ∈ R^{1×m0}.

Remark 7.16. Any square matrix A ∈ R^{r×r} can be decomposed into the sum of a symmetric matrix A1 and an antisymmetric matrix A2; in fact, A1 = (A + A′)/2 and A2 = (A − A′)/2. Moreover, the quadratic form satisfies

    x′Ax = x′A1x = x′ ((A + A′)/2) x.      (7.45)

This observation is used in what follows.
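The symmetrization identity (7.45) is easy to sanity-check numerically; the snippet below (a quick illustration of ours, not from the text) compares the quadratic forms of an arbitrary 2 × 2 matrix and its symmetric part.

```python
def quad(A, x):
    """Quadratic form x'Ax for a 2x2 matrix given as nested lists."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

A = [[1.0, 4.0], [-2.0, 3.0]]                      # arbitrary, non-symmetric
S = [[(A[i][j] + A[j][i]) / 2 for j in range(2)]   # symmetric part (A + A')/2
     for i in range(2)]
x = [0.7, -1.3]
diff = quad(A, x) - quad(S, x)                     # (7.45): should vanish
```

The antisymmetric part (A − A′)/2 contributes nothing to the quadratic form, which is why only the symmetric part matters in the eigenvalue bounds that follow.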

Theorem 7.17. Assume condition (A7.4). Then the equilibrium point x = 0 of the system given by (7.2) and (7.3) is asymptotically stable in probability if

    Σ_{i=1}^{m0} πi λmax( b(i) + b′(i) + Σ_{j=1}^d σ_j′(i)σ_j(i) ) < 0,      (7.46)

and is unstable in probability if

    Σ_{i=1}^{m0} πi [ λmin( b(i) + b′(i) + Σ_{j=1}^d σ_j′(i)σ_j(i) ) − (1/2) Σ_{j=1}^d ρ(σ_j(i) + σ_j′(i))² ] > 0.      (7.47)
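Before turning to the proof, note that in the scalar case (r = 1) the matrices reduce to numbers and λmax(b(i) + b′(i) + Σ_j σ_j′(i)σ_j(i)) = 2b(i) + Σ_j σ_j(i)², so the left-hand side of (7.46) is computable by hand. The sketch below (our own illustration with made-up numbers, assuming m0 = 2, d = 1, and a two-state generator Q̂ = [[−λ, λ], [μ, −μ]] with π = (μ, λ)/(λ + μ)) evaluates the criterion directly.

```python
def criterion_746(b, sig, lam, mu):
    """Left-hand side of (7.46) for r = 1, m0 = 2, d = 1:
    sum_i pi_i * (2*b[i] + sig[i]**2), with pi the stationary
    distribution of the generator [[-lam, lam], [mu, -mu]]."""
    pi = (mu / (lam + mu), lam / (lam + mu))
    return sum(p * (2.0 * bi + si * si) for p, bi, si in zip(pi, b, sig))

# Regime 0 is stabilizing (b = -1), regime 1 is destabilizing (b = 0.5);
# with equal switching rates the pi-weighted average is still negative.
value = criterion_746(b=(-1.0, 0.5), sig=(0.2, 0.3), lam=1.0, mu=1.0)
```

Here value ≈ −0.435 < 0, so the criterion (7.46) gives asymptotic stability in probability even though the second regime, taken on its own, is unstable.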

Proof. (a) We first prove that the equilibrium point x = 0 of the system given by (7.2) and (7.3) is asymptotically stable in probability if (7.46) holds. For notational simplicity, define the column vector

    μ = (μ1, μ2, . . . , μm0)′ ∈ R^{m0}

with

    μi = (1/2) λmax( b(i) + b′(i) + Σ_{j=1}^d σ_j′(i)σ_j(i) ).

Also let β := −πμ. Note that β > 0 by (7.46). It follows from assumption (A7.4) and Lemma A.12 that the equation

    Q̂c = μ + β1l

has a solution c = (c1, c2, . . . , cm0)′ ∈ R^{m0}. Thus we have

    μi − Σ_{j=1}^{m0} q̂_{ij} cj = −β,  i ∈ M.      (7.48)
For each i ∈ M, consider the Liapunov function

    V(x, i) = (1 − γci)|x|^γ,

where 0 < γ < 1 is sufficiently small so that 1 − γci > 0 for each i ∈ M. It is readily seen that for each i ∈ M, V(·, i) is continuous, nonnegative, and vanishes only at x = 0. Detailed calculations reveal that for x ≠ 0, we have

    ∇V(x, i) = (1 − γci)γ|x|^{γ−2} x,
    ∇²V(x, i) = (1 − γci)γ( |x|^{γ−2} I + (γ − 2)|x|^{γ−4} xx′ ).

In addition, it follows from (7.44) that

    a(x, i) = σ(x, i)σ′(x, i) = Σ_{j=1}^d σ_j(i)xx′σ_j′(i) + o(|x|²).

Note that for any matrix A ∈ R^{r×r}, we have

    tr(σ_j(i)xx′σ_j′(i)A) = x′σ_j′(i)Aσ_j(i)x.

Therefore, we have that

    LV(x, i) = (1/2) tr( ( Σ_{j=1}^d σ_j(i)xx′σ_j′(i) + o(|x|²) ) ∇²V(x, i) )
               + ∇V′(x, i)(b(i)x + o(|x|)) − Σ_{j≠i} q_{ij}(x)|x|^γ γ(cj − ci)
             = γ(1 − γci)|x|^γ [ (1/2) Σ_{j=1}^d ( x′σ_j′(i)σ_j(i)x/|x|² + (γ − 2)(x′σ_j′(i)x)²/|x|⁴ )
               + x′b(i)x/|x|² − Σ_{j≠i} q_{ij}(x)(cj − ci)/(1 − γci) + o(1) ].      (7.49)

By virtue of Remark 7.16, we obtain

    x′b(i)x/|x|² + (1/2) Σ_{j=1}^d x′σ_j′(i)σ_j(i)x/|x|²
        ≤ (1/2) λmax( b(i) + b′(i) + Σ_{j=1}^d σ_j′(i)σ_j(i) ) = μi.      (7.50)

Next, it follows from condition (A7.4) that when |x| and γ are sufficiently small,

    Σ_{j≠i} q_{ij}(x)(cj − ci)/(1 − γci)
        = Σ_{j=1}^{m0} q_{ij}(x)cj + Σ_{j≠i} q_{ij}(x) γci(cj − ci)/(1 − γci)      (7.51)
        = Σ_{j=1}^{m0} q̂_{ij} cj + O(γ) + o(1),

where o(1) → 0 as |x| → 0. Hence it follows from (7.49) and (7.51) that when |x| < r0 with r0 and 0 < γ < 1 sufficiently small, we have

    LV(x, i) ≤ γ(1 − γci)|x|^γ [ μi − Σ_{j=1}^{m0} q̂_{ij} cj + o(1) + O(γ) ].

Furthermore, by virtue of (7.48), we have

    LV(x, i) ≤ γ(1 − γci)|x|^γ ( −β + o(1) + O(γ) ) ≤ −κ(ε) < 0,

for any (x, i) ∈ N × M with ε < |x| < r0, where N ⊂ R^r is a small neighborhood of 0 and κ(ε) is a positive constant. Therefore we conclude from Lemma 7.6 and Remark 7.8 that the equilibrium point x = 0 is asymptotically stable in probability.
(b) Now we prove that the equilibrium point $x = 0$ is unstable in probability if (7.47) holds. Define the column vector $\theta = (\theta_1, \theta_2, \ldots, \theta_{m_0})' \in \mathbb{R}^{m_0}$ by
$$
\theta_i := \frac{1}{2}\lambda_{\min}\Big( b(i) + b'(i) + \sum_{j=1}^{d}\sigma_j'(i)\sigma_j(i) \Big) - \frac{1}{4}\sum_{j=1}^{d}\big[\rho\big(\sigma_j(i) + \sigma_j'(i)\big)\big]^2,
$$
and set
$$
\delta := -\pi\theta = -\sum_{i=1}^{m_0}\pi_i\theta_i < 0.
$$
As in part (a), assumption (A7.4), the definition of $\delta$, and Lemma A.12 imply that the equation $\widehat{Q}c = \theta + \delta\mathbb{1}$ has a solution $c = (c_1, c_2, \ldots, c_{m_0})' \in \mathbb{R}^{m_0}$, and
$$
\theta_i - \sum_{j=1}^{m_0}\widehat{q}_{ij}c_j = -\delta > 0, \quad i \in \mathcal{M}. \tag{7.52}
$$

For $i \in \mathcal{M}$, consider the Liapunov function
$$
V(x,i) = (1-\gamma c_i)|x|^{\gamma},
$$
where $-1 < \gamma < 0$ is sufficiently small so that $1 - \gamma c_i > 0$ for each $i \in \mathcal{M}$. Obviously the nonnegative function $V(\cdot,i)$, $i \in \mathcal{M}$, satisfies (7.19). Similar to the arguments in part (a), Remark 7.16 implies that
$$
\frac{x'b(i)x}{|x|^2} + \frac{1}{2}\sum_{j=1}^{d}\frac{x'\sigma_j'(i)\sigma_j(i)x}{|x|^2}
\ge \frac{1}{2}\lambda_{\min}\Big( b(i) + b'(i) + \sum_{j=1}^{d}\sigma_j'(i)\sigma_j(i) \Big).
$$

Note that for any symmetric matrix $A$ with real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, using the transformation $x = Uy$, where $U$ is a real orthogonal matrix such that $U'AU = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ (see [59, Theorem 8.1.1]), we have
$$
|x'Ax| = |\lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2| \le \rho_A |y|^2 = \rho_A |x|^2. \tag{7.53}
$$
Thus by applying (7.53) to the matrix $\sigma_j'(i) + \sigma_j(i)$, we obtain that
$$
\frac{(x'\sigma_j'(i)x)^2}{|x|^4} = \frac{\big(x'(\sigma_j'(i)+\sigma_j(i))x\big)^2}{4|x|^4} \le \frac{1}{4}\big[\rho\big(\sigma_j(i)+\sigma_j'(i)\big)\big]^2.
$$
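Inequality (7.53) is easy to sanity-check numerically; the symmetric matrix $A$ and vector $x$ below are arbitrary illustrative choices, with $\rho_A$ computed as the spectral radius of $A$.

```python
import numpy as np

# Check |x'Ax| <= rho_A |x|^2 for a symmetric A, as in (7.53)
A = np.array([[2.0, 1.0],
              [1.0, -3.0]])
x = np.array([1.0, 2.0])

rho_A = np.max(np.abs(np.linalg.eigvalsh(A)))  # spectral radius of A
lhs = abs(x @ A @ x)                            # |x'Ax| = 6 for this data
rhs = rho_A * (x @ x)
print(lhs <= rhs)  # True
```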

Therefore, detailed computations as in part (a) (taking into account the extra term involving $\frac{1}{4}\big[\rho(\sigma_j(i)+\sigma_j'(i))\big]^2$) show that for any sufficiently small $0 < \varepsilon < r_0$, we have
$$
\mathcal{L}V(x,i) \le -\kappa(\varepsilon) < 0, \quad \text{for any } (x,i) \in N \times \mathcal{M} \text{ with } \varepsilon < |x| < r_0,
$$
where $N \subset \mathbb{R}^r$ is a small neighborhood of $0$ and $\kappa(\varepsilon)$ is a positive constant. Therefore Lemma 7.7 and Remark 7.8 imply that the equilibrium point $x = 0$ is unstable in probability. The proof of the theorem is concluded. $\Box$
Remark 7.18. Suppose that for all $i \in \mathcal{M}$ and $j = 1, 2, \ldots, d$, the matrices $\sigma_j'(i) + \sigma_j(i)$ are nonnegative definite. Then we have
$$
\rho\big(\sigma_j(i)+\sigma_j'(i)\big) = \lambda_{\max}\big(\sigma_j(i)+\sigma_j'(i)\big) \ge \lambda_{\min}\big(\sigma_j(i)+\sigma_j'(i)\big) \ge 0.
$$
Consequently, a close examination of the proof of Theorem 7.17 reveals that the conditions (7.46) and (7.47) can be replaced by
$$
\sum_{i=1}^{m_0}\pi_i\bigg[ \lambda_{\max}\Big( b(i)+b'(i)+\sum_{j=1}^{d}\sigma_j(i)\sigma_j'(i) \Big) - \frac{1}{2}\sum_{j=1}^{d}\big[\lambda_{\min}\big(\sigma_j(i)+\sigma_j'(i)\big)\big]^2 \bigg] < 0, \tag{7.54}
$$
and
$$
\sum_{i=1}^{m_0}\pi_i\bigg[ \lambda_{\min}\Big( b(i)+b'(i)+\sum_{j=1}^{d}\sigma_j(i)\sigma_j'(i) \Big) - \frac{1}{2}\sum_{j=1}^{d}\big[\lambda_{\max}\big(\sigma_j(i)+\sigma_j'(i)\big)\big]^2 \bigg] > 0, \tag{7.55}
$$
respectively. In a sense, the above two inequalities, in particular (7.54), strengthen the corresponding results in Theorem 7.17.
Theorem 7.17 gives sufficient conditions in terms of the maximum and minimum eigenvalues of the matrices for stability and instability of the equilibrium point $x = 0$. Because there is a "gap" between the maximum and minimum eigenvalues, a natural question arises: Can we obtain necessary and sufficient conditions for stability? If the component $X(t)$ is one-dimensional, we have the following result. We replace the first and second equations of (7.44) in assumption (A7.4) by
$$
b(x,i) = b_i x + o(x), \qquad \sigma(x,i) = \sigma_i x + o(|x|), \tag{7.56}
$$
where $x \in \mathbb{R}$, and $b_i$ and $\sigma_i^2$ are real constants with $\sigma_i^2 \ge 0$, $i \in \mathcal{M}$. Then we immediately have the following corollary from Theorem 7.17 and Remark 7.18.

Corollary 7.19. Let assumption (A7.4) and (7.56) be valid. Then the equilibrium point $x = 0$ is asymptotically stable in probability if
$$
\sum_{i=1}^{m_0}\pi_i\Big( b_i - \frac{\sigma_i^2}{2} \Big) < 0,
$$
and is unstable in probability if
$$
\sum_{i=1}^{m_0}\pi_i\Big( b_i - \frac{\sigma_i^2}{2} \Big) > 0.
$$

Remark 7.20. As can be seen from Corollary 7.19, if the continuous com-
ponent of the system is one-dimensional, we obtain a necessary and suffi-
cient condition for stability. One question of particular interest is: Will we
be able to obtain a similar condition for a multidimensional counterpart.
For linear systems of stochastic differential equations with constant coef-
ficients without switching, such a condition was obtained in Khasminskii
[83, pp. 220–224]. The main ingredient is the use of the transformations
y = x/|x| and ln |x|. The result is a sharp necessary and sufficient con-
dition. In Mao, Yin, and Yuan [119], inspired by the approach of [83],
Markov-modulated regime-switching diffusions were considered, and neces-
sary and sufficient conditions were obtained for exponential stability. The
main ingredient is the use of a logarithm transformation technique leading
to the derivation of the so-called Liapunov exponent. Such an approach
can be adopted to treat switching diffusions with state-dependent switch-
ing with no essential difficulty. Because in Chapter 8, we will also examine
a related problem for switched ordinary differential equations (a completely
degenerate switching diffusion with the absence of the diffusion terms), we
will not dwell on it here.

7.5 Examples
Example 7.21. To illustrate Theorem 7.17 and Corollary 7.19, we consider a real-valued process given by
$$
\begin{cases}
dX(t) = b(X(t), \alpha(t))\,dt + \sigma(X(t), \alpha(t))\,dw(t), \\
P\{\alpha(t+\Delta) = j \,|\, \alpha(t) = i,\ X(s), \alpha(s), s \le t\} = q_{ij}(X(t))\Delta + o(\Delta),
\end{cases} \tag{7.57}
$$
for $j \neq i$, where the jump process $\alpha(t)$ has three states and is generated by
$$
Q(x) = \begin{pmatrix}
-3 - \sin x \cos x + \sin x^2 & 1 + \sin x \cos x & 2 - \sin x^2 \\[4pt]
2 & -2 - \dfrac{x^2}{2+x^2} & \dfrac{x^2}{2+x^2} \\[4pt]
4 - \sin x & \sin^2 x & -4 + \sin x - \sin^2 x
\end{pmatrix},
$$

and the drift and diffusion coefficients are given by
$$
b(x,1) = x - x\sin x, \qquad b(x,2) = x - x\sin x^2, \qquad b(x,3) = 4x + x\sin x,
$$
$$
\sigma(x,1) = -\frac{3x}{1+x^2}, \qquad \sigma(x,2) = x + \frac{1}{3}x\sin x, \qquad \sigma(x,3) = x - \frac{1}{2}x\sin^2 x.
$$
Associated with (7.57), there are three diffusions
$$
dX(t) = \big(X(t) - X(t)\sin X(t)\big)dt - \frac{3X(t)}{1+X^2(t)}\,dw(t), \tag{7.58}
$$
$$
dX(t) = \big(X(t) - X(t)\sin X^2(t)\big)dt + \Big(X(t) + \frac{1}{3}X(t)\sin X(t)\Big)dw(t), \tag{7.59}
$$
$$
dX(t) = \big(4X(t) + X(t)\sin X(t)\big)dt + \Big(X(t) - \frac{1}{2}X(t)\sin^2 X(t)\Big)dw(t), \tag{7.60}
$$
switching from one to another according to the movement of the jump process $\alpha(t)$. It is readily seen that as $x \to 0$, the constants $b_i$, $\sigma_i^2$, $i = 1, 2, 3$, as in (7.56) are given by
$$
b_1 = 1, \quad b_2 = 1, \quad b_3 = 4, \qquad \sigma_1^2 = 9, \quad \sigma_2^2 = 1, \quad \sigma_3^2 = 1. \tag{7.61}
$$
Also as $x \to 0$, $Q(x)$ tends to
$$
\widehat{Q} = \begin{pmatrix} -3 & 1 & 2 \\ 2 & -2 & 0 \\ 4 & 0 & -4 \end{pmatrix}.
$$
The matrix is irreducible. By solving the system of equations $\pi\widehat{Q} = 0$ and $\pi\mathbb{1} = 1$, we obtain the stationary distribution $\pi$ associated with $\widehat{Q}$,
$$
\pi = (0.5,\ 0.25,\ 0.25). \tag{7.62}
$$

Finally, by virtue of (7.61) and (7.62) we find that
$$
\sum_{i=1}^{3}\pi_i\Big( b_i - \frac{\sigma_i^2}{2} \Big) = -0.75 < 0.
$$

Therefore, we conclude from Corollary 7.19 that the equilibrium point x = 0


of (7.57) is asymptotically stable in probability.
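The stationary distribution (7.62) and the value $-0.75$ above are easy to reproduce numerically; a minimal sketch using NumPy with the data from (7.61) and the limit generator $\widehat{Q}$:

```python
import numpy as np

Q_hat = np.array([[-3.0, 1.0, 2.0],
                  [2.0, -2.0, 0.0],
                  [4.0, 0.0, -4.0]])
b = np.array([1.0, 1.0, 4.0])        # b_1, b_2, b_3 from (7.61)
sigma2 = np.array([9.0, 1.0, 1.0])   # sigma_1^2, sigma_2^2, sigma_3^2

# Stationary distribution: pi Q_hat = 0 with pi summing to 1, solved by
# appending the normalization equation to the (singular) linear system.
A = np.vstack([Q_hat.T, np.ones(3)])
rhs = np.array([0.0, 0.0, 0.0, 1.0])
pi_stat, *_ = np.linalg.lstsq(A, rhs, rcond=None)

criterion = float(pi_stat @ (b - sigma2 / 2))
print(pi_stat)    # [0.5  0.25 0.25], matching (7.62)
print(criterion)  # -0.75 < 0, so Corollary 7.19 gives stability
```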
This example is interesting and provides insight. It was proven in [83,
pp. 171–172] that a one-dimensional nonlinear diffusion is stable if and

only if its linear approximation is stable. Hence we can check that the
equilibrium point x = 0 of (7.58) is stable in probability whereas (7.59) and
(7.60) are unstable in probability. Therefore the jump process α(t) could
be considered as a stabilization factor. Note that similar examples were
demonstrated in [116] under the assumptions that the jump component
α(·) is generated by a constant matrix Q and that the Markov chain α(·) is
independent of the Brownian motion w(·). Examples 4.1 and 4.2
in [48] are also concerned with stability of switching systems. Their result
indicates that if the switching takes place sufficiently fast, the system will
be stable even if the individual mode may be unstable. Essentially, it is
related to singularly perturbed systems. Due to the fast variation, there is
a limit system that is an average with respect to the stationary distribution
of the Markov chain and that is stable. Then if the rate of switching is fast
enough the original system will also be stable. Such an idea was also used
in an earlier paper [18].
To illustrate, we plot a sample path of (7.57) in Figure 7.1. For com-
parison, we also demonstrate the sample paths of (7.58), (7.59), and (7.60)
(without switching) in Figures 7.2–7.4, respectively.

FIGURE 7.1. A sample path of (7.57) with initial condition (x, α) = (1.5, 1).

Example 7.22. (Lotka–Volterra model). This is a continuation of the discussion of the Lotka–Volterra model given in Example 1.1. The notation and problem setup are as in Example 1.1. Motivated by the works [34] and [118], define $V(t,x,\alpha) = e^t \log(|x|)$ for $(t,x,\alpha) \in [0,\infty) \times \mathbb{R}_+^n \times \mathcal{M}$. It

FIGURE 7.2. A sample path of (7.58) with initial condition x = 1.5.

FIGURE 7.3. A sample path of (7.59) with initial condition x = 1.5.



FIGURE 7.4. A sample path of (7.60) with initial condition x = 1.5.

follows from Itô's lemma that
$$
\begin{aligned}
e^t\log(|x(t)|) - \log(|x(0)|)
&= \int_0^t e^s \sum_{i=1}^{n} \frac{x_i^2(s)}{|x(s)|^2}\bigg( r_i(\alpha(s)) - \sum_{j=1}^{n} a_{ij}(\alpha(s))x_j(s) \\
&\qquad\qquad + \frac{1}{2}\Big(1 - \frac{2x_i^2(s)}{|x(s)|^2}\Big)\sigma_i^2(\alpha(s)) \bigg)ds \\
&\quad + \int_0^t e^s\log(|x(s)|)\,ds + \int_0^t e^s\sum_{i=1}^{n}\frac{x_i^2(s)}{|x(s)|^2}\,\sigma_i(\alpha(s))\,dw_i(s).
\end{aligned}
$$

Denote
$$
M_i(t) = \int_0^t e^s \frac{x_i^2(s)}{|x(s)|^2}\,\sigma_i(\alpha(s))\,dw_i(s),
$$
whose quadratic variation is
$$
\langle M_i, M_i\rangle(t) = \int_0^t \frac{e^{2s}}{|x(s)|^4}\, x_i^4(s)\sigma_i^2(\alpha(s))\,ds.
$$
By virtue of the exponential martingale inequality [47], for any positive constants $T$, $\delta$, and $\beta$, we have
$$
P\bigg\{ \sup_{0\le t\le T}\Big[ M_i(t) - \frac{\delta}{2}\langle M_i, M_i\rangle(t) \Big] > \beta \bigg\} \le e^{-\delta\beta}.
$$

Choose $T = k\gamma$, $\delta = n\varepsilon e^{-k\gamma}$, and $\beta = (\theta e^{k\gamma}\log k)/(\varepsilon n)$, where $k \in \mathbb{N}$, $0 < \varepsilon < 1$, $\theta > 1$, and $\gamma > 0$ in the above inequality. Then it follows that
$$
P\bigg\{ \sup_{0\le t\le k\gamma}\Big[ M_i(t) - \frac{n\varepsilon e^{-k\gamma}}{2}\langle M_i, M_i\rangle(t) \Big] > \frac{\theta e^{k\gamma}\log k}{\varepsilon n} \bigg\} \le k^{-\theta}.
$$
Because $\sum_{k=1}^{\infty} k^{-\theta} < \infty$, it follows from the Borel–Cantelli lemma that there exists some $\Omega_i \subset \Omega$ with $P(\Omega_i) = 1$ such that for any $\omega \in \Omega_i$, there exists an integer $k_i = k_i(\omega)$ such that for any $k > k_i$, we have
$$
M_i(t) \le \frac{n\varepsilon e^{-k\gamma}}{2}\langle M_i, M_i\rangle(t) + \frac{\theta e^{k\gamma}\log k}{\varepsilon n} \quad \text{for all } 0 \le t \le k\gamma.
$$
Now let $\Omega_0 := \bigcap_{i=1}^{n}\Omega_i$. Then $P(\Omega_0) = 1$. Moreover, for any $\omega \in \Omega_0$, let
$$
k_0(\omega) := \max\{k_i(\omega),\ i = 1, 2, \ldots, n\}.
$$
Then for any $\omega \in \Omega_0$ and any $k \ge k_0(\omega)$, we have
$$
\sum_{i=1}^{n}\int_0^t e^s\frac{x_i^2(s)}{|x(s)|^2}\,\sigma_i(\alpha(s))\,dw_i(s)
= \sum_{i=1}^{n} M_i(t) \le \frac{n\varepsilon e^{-k\gamma}}{2}\sum_{i=1}^{n}\langle M_i, M_i\rangle(t) + \frac{\theta e^{k\gamma}\log k}{\varepsilon},
$$

where $0 \le t \le k\gamma$. Then it follows that
$$
\begin{aligned}
e^t&\log(|x(t)|) - \log(|x(0)|) \\
&\le \int_0^t e^s\sum_{i=1}^{n}\frac{x_i^2(s)}{|x(s)|^2}\bigg( r_i(\alpha(s)) - \sum_{j=1}^{n}a_{ij}(\alpha(s))x_j(s)
+ \frac{1}{2}\Big(1 - \frac{2x_i^2(s)}{|x(s)|^2}\Big)\sigma_i^2(\alpha(s)) \bigg)ds \\
&\quad + \int_0^t e^s\log(|x(s)|)\,ds + \int_0^t \frac{n\varepsilon e^{-k\gamma}}{2}\,e^{2s}\sum_{i=1}^{n}\frac{x_i^4(s)}{|x(s)|^4}\,\sigma_i^2(\alpha(s))\,ds
+ \frac{\theta e^{k\gamma}\log k}{\varepsilon} \\
&\le \int_0^t e^s\bigg[ \log(|x(s)|) + \sum_{i=1}^{n}\frac{x_i^2(s)}{|x(s)|^2}\Big( b_i(\alpha(s)) + \sigma_i^2(\alpha(s)) - \sum_{j=1}^{n}a_{ij}(\alpha(s))x_j(s) \Big) \\
&\qquad\qquad + \sum_{i=1}^{n}\Big( \frac{\varepsilon n e^{s-k\gamma}}{2} - 1 \Big)\frac{x_i^4(s)\sigma_i^2(\alpha(s))}{|x(s)|^4} \bigg]ds + \frac{\theta e^{k\gamma}\log k}{\varepsilon}.
\end{aligned}
$$

Note that for any $t \in [0, k\gamma]$, $s \in [0, t]$, and $(x,\alpha) \in \mathbb{R}_+^n \times \mathcal{M}$, we have
$$
\begin{aligned}
\log(|x|) &+ \sum_{i=1}^{n}\frac{x_i^2}{|x|^2}\Big( b_i(\alpha) + \sigma_i^2(\alpha) - \sum_{j=1}^{n}a_{ij}(\alpha)x_j \Big)
+ \sum_{i=1}^{n}\Big( \frac{\varepsilon n e^{s-k\gamma}}{2} - 1 \Big)\frac{x_i^4\sigma_i^2(\alpha)}{|x|^4} \\
&\le \log(|x|) + \kappa - \frac{\beta}{|x|^2}\sum_{i=1}^{n}x_i^3 + K
\le \log(|x|) + \kappa - \frac{\beta}{\sqrt{n}}|x| + K \le K.
\end{aligned}
$$
Hence it follows that for all $0 \le t \le k\gamma$ with $k \ge k_0(\omega)$, we have
$$
e^t\log(|x(t)|) - \log(|x(0)|) \le \int_0^t K e^s\,ds + \frac{\theta e^{k\gamma}\log k}{\varepsilon}
= K(e^t - 1) + \frac{\theta e^{k\gamma}\log k}{\varepsilon}.
$$
Thus for $(k-1)\gamma \le t \le k\gamma$, we have
$$
\log(|x(t)|) \le e^{-t}\log(|x(0)|) + K(1 - e^{-t}) + \frac{\theta e^{k\gamma}\log k}{\varepsilon e^{(k-1)\gamma}}
= e^{-t}\log(|x(0)|) + K(1 - e^{-t}) + \frac{\theta e^{\gamma}\log k}{\varepsilon},
$$
and hence it follows that
$$
\frac{\log(|x(t)|)}{\log t} \le \frac{\log(|x(0)|)}{e^t\log t} + \frac{K(1 - e^{-t})}{\log t} + \frac{\theta e^{\gamma}\log k}{\varepsilon\log((k-1)\gamma)}.
$$
Now let $k \to \infty$ (and so $t \to \infty$) and we obtain
$$
\limsup_{t\to\infty}\frac{\log(|x(t)|)}{\log t} \le \frac{\theta e^{\gamma}}{\varepsilon}.
$$
Finally, by sending $\gamma \downarrow 0$, $\varepsilon \uparrow 1$, and $\theta \downarrow 1$, we have
$$
\limsup_{t\to\infty}\frac{\log(|x(t)|)}{\log t} \le 1,
$$
as desired. Thus, the solution $x(t)$ of (1.1) satisfies
$$
\limsup_{T\to\infty}\frac{\log(|x(T)|)}{\log T} \le 1. \tag{7.63}
$$
Furthermore, since
$$
\limsup_{T\to\infty}\frac{\log|x(T)|}{T} \le \limsup_{T\to\infty}\frac{\log|x(T)|}{\log T}\cdot\limsup_{T\to\infty}\frac{\log T}{T} \le \limsup_{T\to\infty}\frac{\log T}{T} = 0,
$$
we conclude that
$$
\limsup_{T\to\infty}\frac{\log|x(T)|}{T} \le 0 \quad \text{a.s.} \tag{7.64}
$$

7.6 Notes
The framework of this chapter is based on the paper of Khasminskii, Zhu,
and Yin [92]. Some new results are included, e.g., Theorems 7.9 and 7.10,
where we derived Kolmogorov-type backward equations without assuming
nondegeneracy of the diffusion part. These results are interesting in their
own right.
For brevity, we are not trying to cover every angle of the stability analysis.
For example, sufficient conditions for exponential p-stability can be derived
similarly to those of [116]; the essentially verbatim proof of the sufficiency
is therefore omitted. Nevertheless, necessary conditions for stability and stability
under linearization are provided. Finally, we note that using linearization to
infer the stability of the associated nonlinear systems should be interesting
owing to its wide range of applications.
8
Stability of Switching ODEs

8.1 Introduction
The main motivational forces for this chapter are the work of Davis [30]
on piecewise deterministic systems, and the work of Kac and Krasovskii
[79] on stability of randomly switched systems. In recent years, growing
attention has been drawn to deterministic dynamic systems formulated as
differential equations modulated by a random switching process. This is
because of the increasing demands for modeling large-scale and complex
systems, designing optimal controls, and carrying out optimization tasks.
In this chapter, we consider stability of such hybrid systems modulated by a
random-switching process, which are “equivalent” to a number of ordinary
differential equations coupled by a switching or jump process.
In this chapter, for random-switching systems, we first obtain sufficient
conditions for stability and instability. Our approach leads to a necessary
and sufficient condition for systems whose continuous component is one-
dimensional. For multidimensional systems, our conditions involve the use
of minimal and maximal eigenvalues of appropriate matrices. The differ-
ence of maximal and minimal eigenvalues results in a gap for stability and
instability. To close this gap, we introduce a logarithm transformation lead-
ing to the continuous component taking values in the unit sphere. This in
turn, enables us to obtain necessary and sufficient conditions for stability.
The essence is the utilization of the so-called Liapunov exponent.
Because the systems we are interested in have continuous components
(representing continuous dynamics) as well as discrete components (repre-

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 217
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_8,
© Springer Science + Business Media, LLC 2010

senting discrete events), their asymptotic behavior can be quite different


from a single system of differential equations. As noted, a random-switching
differential system may be considered as several differential equations cou-
pled by a switching process. We show that even though some of the indi-
vidual equations are not stable, the entire switching system may still be
stable as long as certain conditions are satisfied.
For random switching systems that are linear in their continuous com-
ponent, suppose that corresponding to different discrete states, some of the
associated differential equations are stable and the others are not. If the
jump component is ergodic, we show that as long as the stable part of the
differential equations dominates the rest (in an appropriate sense), the cou-
pled hybrid system will be stable. For nonlinear differential equations, the
well-known Hartman–Grobman theorem (see [134, Section 2.8]) provides
an important result concerning the local qualitative behavior. It says that
near a hyperbolic equilibrium point x0 , the nonlinear system ẋ = f (x) has
the same qualitative structure as that of the linear system ẋ = ∇f(x0)x,
although the topological equivalence may not hold for a non-hyperbolic
equilibrium point (e.g., a center). Treating hybrid systems, consider the
differential equations ẋ(t) = f(x(t), α(t)) and ẋ(t) = ∇f(x0, α(t))x(t) for
α(t) belonging to a finite set. We show that although some of the linear equations
have centers, as long as the spectrum of the coefficients of the differential
equation corresponding to the stable node dominates that of the centers,
the overall system will still be topologically equivalent to the linear (in
continuous component) system. To reveal the salient features, we present
a number of examples, and display the corresponding phase portraits. The
results are quite revealing.
The rest of the chapter is arranged as follows. Section 8.2 begins with
the formulation of the random-switching systems and provides definitions
of stability, instability, and asymptotical stability of the equilibrium point
of the random-switching hybrid systems and gives some preliminary re-
sults. For the purpose of our asymptotic analysis, we also present sufficient
conditions for stability, instability, and asymptotical stability. Easily veri-
fiable conditions for stability and instability of the systems are provided in
Section 8.3. Section 8.4 presents a sharper condition for systems that are
linear in the continuous state variable. Discussions on Liapunov exponent
are given in Section 8.5. To demonstrate our results, we provide several
examples in Section 8.6. Finally, we conclude the chapter with further re-
marks in Section 8.7.

8.2 Formulation and Preliminary Results


8.2.1 Problem Setup
Throughout the chapter, we use $z'$ to denote the transpose of $z \in \mathbb{R}^{\ell_1\times\ell_2}$ with $\ell_i \ge 1$, whereas $\mathbb{R}^{\ell\times 1}$ is simply written as $\mathbb{R}^{\ell}$; $\mathbb{1} = (1, 1, \ldots, 1)' \in \mathbb{R}^{m_0}$ is a column vector with all entries being 1; the Euclidean norm of a row or column vector $x$ is denoted by $|x|$. As usual, $I$ denotes the identity matrix. For a matrix $A$, its trace norm is denoted by $|A| = \sqrt{\operatorname{tr}(A'A)}$. When $B$ is a set, $I_B(\cdot)$ denotes the indicator function of $B$. For $A \in \mathbb{R}^{r\times r}$ being a symmetric matrix, we use $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ to denote the maximum and minimum eigenvalues of $A$, respectively.
Consider the system with random switching
$$
\dot X(t) = f(X(t), \alpha(t)), \qquad X(0) = x \in \mathbb{R}^r, \quad \alpha(0) = \alpha \in \mathcal{M}, \tag{8.1}
$$
where $X(t)$ is the continuous state, $f(\cdot,\cdot): \mathbb{R}^r \times \mathcal{M} \mapsto \mathbb{R}^r$, and $\alpha(\cdot)$ is a jump process taking values in a finite state space $\mathcal{M} = \{1, 2, \ldots, m_0\}$ with generator $Q(x) = (q_{ij}(x))$ satisfying $q_{ij}(x) \ge 0$ for $j \neq i$ and $\sum_{j\in\mathcal{M}} q_{ij}(x) = 0$ for all $x \in \mathbb{R}^r$ and $i \in \mathcal{M}$. The evolution of the jump component is described by
$$
P\{\alpha(t+\Delta) = j \,|\, \alpha(t) = i,\ (X(s), \alpha(s)),\ s \le t\} = q_{ij}(X(t))\Delta + o(\Delta), \quad i \neq j. \tag{8.2}
$$

Note that in our formulation, x-dependent Q(x) is considered, whereas in


[6, 79, 116, 183], the constant generator Q was used.
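A direct way to get intuition for (8.1)–(8.2) is to discretize it: between jumps the state follows the ODE for the current regime, and over a small step $\Delta$ the chain leaves state $i$ with probability about $\sum_{j\neq i} q_{ij}(x)\Delta$. The sketch below is a naive Euler scheme with hypothetical regimes $A_0$ (stable) and $A_1$ (unstable) and a made-up $x$-dependent generator; none of these concrete choices come from the text.

```python
import numpy as np

def simulate(f, Q, x0, a0, T, dt, rng):
    """Naive Euler scheme for the switching ODE (8.1)-(8.2)."""
    x, a = np.asarray(x0, dtype=float), a0
    for _ in range(int(T / dt)):
        rates = Q(x)[a].copy()
        rates[a] = 0.0                       # off-diagonal jump rates q_ij(x)
        if rng.random() < rates.sum() * dt:  # a jump occurs in [t, t + dt)
            a = rng.choice(len(rates), p=rates / rates.sum())
        x = x + dt * f(x, a)                 # deterministic step in regime a
    return x, a

# Two hypothetical linear regimes and a state-dependent generator:
A = [np.array([[-2.0, 0.0], [0.0, -1.0]]),   # regime 0: stable
     np.array([[0.5, 0.0], [0.0, 0.2]])]     # regime 1: unstable
f = lambda x, a: A[a] @ x
Q = lambda x: np.array([[-1.0, 1.0],
                        [5.0 + x @ x, -(5.0 + x @ x)]])

x_T, a_T = simulate(f, Q, [1.0, 1.0], 0, T=20.0, dt=1e-3,
                    rng=np.random.default_rng(7))
print(np.linalg.norm(x_T))  # typically small: the chain spends most time in regime 0
```

Because the jump rate out of the unstable regime (at least 5) dominates the rate into it (1), the process spends most of its time in the stable regime, and trajectories usually decay even though one of the individual ODEs is unstable — the phenomenon studied in this chapter.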
Associated with the process $(X(t), \alpha(t))$ defined by (8.1)–(8.2), there is an operator $\mathcal{L}$ defined as follows. For each $i \in \mathcal{M}$ and any $g(\cdot,i) \in C^1(\mathbb{R}^r)$,
$$
\mathcal{L}g(x,i) = f'(x,i)\nabla g(x,i) + Q(x)g(x,\cdot)(i), \tag{8.3}
$$
where $\nabla g(x,i)$ denotes the gradient (with respect to the variable $x$) of $g(x,i)$, and
$$
Q(x)g(x,\cdot)(i) = \sum_{j\in\mathcal{M}} q_{ij}(x)g(x,j) \quad \text{for each } i \in \mathcal{M}. \tag{8.4}
$$

For further references on the associated operator (or generator) of the hy-
brid system (8.1)–(8.2), we refer the reader to Chapter 2 of this book; see
also [79] and [150].
To proceed, we need conditions regarding the smoothness and growth of
the functions involved, and the condition that 0 is an equilibrium point of
the dynamic system. We assume the following hypotheses throughout the
chapter.

(A8.1) The matrix-valued function Q(·) is bounded and continuous.



(A8.2) For each α ∈ M, f (·, α) is locally Lipschitz continuous and


satisfies f (0, α) = 0.
(A8.3) There exists a constant K0 > 0 such that for each α ∈ M,

|f (x, α)| ≤ K0 (1 + |x|), for all x ∈ Rr . (8.5)

It is well known that under these conditions, system (8.1)–(8.2) has a


unique solution; see [150] for details. In what follows, a process starting
from (x, α) is denoted by Y x,α (t) = (X x,α (t), αx,α (t)) to emphasize the
dependence on the initial condition. If the context is clear, we simply write
Y (t) = (X(t), α(t)).

8.2.2 Preliminary Results


In this subsection, we first recall the definitions of stability, instability,
asymptotic stability, and exponential p-stability. Then we present some
preparatory results of stability and instability in terms of Liapunov func-
tions.
Definition 8.1. ([79]) The equilibrium point x = 0 of system (8.1)–(8.2)
is said to be
(i) stable in probability, if for any $\alpha = 1, \ldots, m_0$ and $r_0 > 0$,
$$
\lim_{x\to 0} P\Big\{ \sup_{t\ge 0} |X^{x,\alpha}(t)| > r_0 \Big\} = 0,
$$
and it is said to be unstable in probability if it is not stable in probability;
(ii) asymptotically stable in probability, if it is stable in probability and
$$
\lim_{x\to 0} P\Big\{ \lim_{t\to\infty} X^{x,\alpha}(t) = 0 \Big\} = 1, \quad \text{for each } \alpha = 1, \ldots, m_0;
$$
(iii) exponentially $p$-stable, if for some positive constants $K$ and $\gamma$,
$$
E|X^{x,\alpha}(t)|^p \le K|x|^p \exp\{-\gamma t\}, \quad \text{for any } (x,\alpha) \in \mathbb{R}^r \times \mathcal{M}.
$$

The definitions above should be compared to Definition 7.4. They are of


the same spirit although now we have a completely degenerate case with the
diffusion matrix being identically 0. To study stability of the equilibrium
point x = 0, we first observe that almost all trajectories of the system
(8.1)–(8.2) starting from a nonzero state will never reach the origin with
probability one.
Proposition 8.2. Let conditions (A8.1)–(A8.3) be satisfied. Then

$$
P\{X^{x,\alpha}(t) \neq 0,\ t \ge 0\} = 1, \quad \text{for any } (x,\alpha) \in \mathbb{R}^r\times\mathcal{M} \text{ with } x \neq 0. \tag{8.6}
$$



Proof. This proposition can be proved using a slight modification of the


argument in Lemma 7.1. The details are omitted for brevity. 2
In view of (8.6), we can work with functions V (·, i), i ∈ M, which are
continuously differentiable in a deleted neighborhood of 0 in what follows.
This turns out to be quite convenient. Another immediate consequence of
(8.6) is the following Lp estimate for the solution of the system (8.1)–(8.2).
The result is interesting in its own right.

Theorem 8.3. Let conditions (A8.1)–(A8.3) be satisfied. Then for any $p \ge 1$ and any $(x,\alpha) \in \mathbb{R}^r\times\mathcal{M}$ with $x \neq 0$, we have
$$
E|X^{x,\alpha}(t)|^p \le \Big( |x|^p + \frac{1}{2} \Big)\exp(2pm_0K_0t) - \frac{1}{2}
\le \Big( |x|^p + \frac{1}{2} \Big)\exp(2pm_0K_0t). \tag{8.7}
$$

Proof. For each $\alpha \in \mathcal{M}$, the function $V(x,\alpha) = |x|^p$ is continuously differentiable in the domain $|x| > \delta$ for any $\delta > 0$. Let $\tau_\delta$ be the first exit time of the process $X^{x,\alpha}(\cdot)$ from the set $\{x \in \mathbb{R}^r : |x| > \delta\}\times\mathcal{M}$; that is,
$$
\tau_\delta := \inf\{t \ge 0 : |X^{x,\alpha}(t)| \le \delta\}.
$$
For any $t > 0$, set $\tau_\delta(t) := \min\{\tau_\delta, t\}$. Because $V(x,i)$ is independent of $i$, $\sum_{j\in\mathcal{M}} q_{ij}(x)V(x,j) = 0$. Then it follows from the Cauchy–Schwarz inequality and (8.5) that
$$
\begin{aligned}
E|X^{x,\alpha}(\tau_\delta(t))|^p
&= |x|^p + E\int_0^{\tau_\delta(t)} p\,|X^{x,\alpha}(s)|^{p-2}\,\big\langle X^{x,\alpha}(s),\ f(X^{x,\alpha}(s), \alpha^{x,\alpha}(s)) \big\rangle\,ds \\
&\le |x|^p + pm_0K_0\,E\int_0^{\tau_\delta(t)} |X^{x,\alpha}(s)|^{p-1}\big(1 + |X^{x,\alpha}(s)|\big)\,ds \\
&\le |x|^p + 2pm_0K_0\,E\int_0^{\tau_\delta(t)} |X^{x,\alpha}(s)|^p\,ds
+ pm_0K_0\,E\int_0^{\tau_\delta(t)} |X^{x,\alpha}(s)|^{p-1}\,I_{\{\delta\le |X^{x,\alpha}(s)| < 1\}}\,ds \\
&\le |x|^p + 2pm_0K_0\,E\int_0^{\tau_\delta(t)} \Big( |X^{x,\alpha}(s)|^p + \frac{1}{2} \Big)ds.
\end{aligned}
$$
Note that for all $s \le \tau_\delta(t)$, we have $s = \tau_\delta(s)$. Hence we have
$$
\begin{aligned}
E|X^{x,\alpha}(\tau_\delta(t))|^p
&\le |x|^p + 2pm_0K_0\,E\int_0^{\tau_\delta(t)} \Big( |X^{x,\alpha}(\tau_\delta(s))|^p + \frac{1}{2} \Big)ds \\
&\le |x|^p + 2pm_0K_0\int_0^{t} \Big( E|X^{x,\alpha}(\tau_\delta(s))|^p + \frac{1}{2} \Big)ds.
\end{aligned}
$$

Applying Gronwall's inequality to $\big[E|X^{x,\alpha}(\tau_\delta(t))|^p + (1/2)\big]$ leads to
$$
E|X^{x,\alpha}(\tau_\delta(t))|^p + \frac{1}{2} \le \Big( |x|^p + \frac{1}{2} \Big)\exp(2pm_0K_0t), \tag{8.8}
$$
or equivalently,
$$
E|X^{x,\alpha}(\tau_\delta(t))|^p \le \Big( |x|^p + \frac{1}{2} \Big)\exp(2pm_0K_0t) - \frac{1}{2}
\le \Big( |x|^p + \frac{1}{2} \Big)\exp(2pm_0K_0t). \tag{8.9}
$$
Note that we have from (8.6) that

τδ (t) → t as δ → 0 with probability 1 for any t > 0.

Finally, letting δ → 0 in (8.9), by Fatou’s lemma, we obtain (8.7). 2


Remark 8.4. If condition (8.5) is replaced by

|f (x, α)| ≤ K|x|, for all (x, α) ∈ Rr × M, (8.10)

where $K$ is some positive constant, then the conclusion of Theorem 8.3 can be strengthened to the following: for any $\beta \in \mathbb{R}$ and any $(x,\alpha) \in \mathbb{R}^r\times\mathcal{M}$ with $x \neq 0$, we have
$$
E|X^{x,\alpha}(t)|^{\beta} \le |x|^{\beta}e^{\rho t}, \tag{8.11}
$$
where $\rho$ is a constant depending only on $\beta$, $m_0$, and the constant $K$ given in (8.10).
In fact, by virtue of (8.10), we obtain by a slight modification of the
argument in the proof of Theorem 8.3 that

E|X x,α (τδ (t))|β ≤ |x|β eρt ,

where ρ is a constant depending only on β, m0 , and the constant K in


(8.10). Then similar to the proof of Theorem 8.3, (8.11) follows from (8.6)
and Fatou’s lemma.
We finally note that if f (x, α) is Lipschitzian with a global Lipschitz
constant L0 , and f (0, α) = 0, then (8.10) is verified.
Next, concerning stability and asymptotical stability of the equilibrium
point x = 0 of the system (8.1)–(8.2), we have the following results.
Proposition 8.5. Let D ⊂ Rr be a neighborhood of 0. Suppose that for
each i ∈ M, there exists a nonnegative function V (·, i) : D 7→ R such that
(i) V (·, i) is continuous in D and vanishes only at x = 0;
(ii) V (·, i) is continuously differentiable in D − {0} and satisfies

LV (x, i) ≤ 0, for all x ∈ D − {0}. (8.12)



Then the equilibrium point x = 0 is stable in probability.
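The hypotheses of Proposition 8.5 can be checked concretely for a linear switching system with the common quadratic function $V(x,i) = |x|^2$: since $V$ does not depend on $i$, the term $Q(x)V(x,\cdot)(i)$ vanishes and $\mathcal{L}V(x,i) = 2x'A_ix$, which is nonpositive whenever every $A_i + A_i'$ is negative semidefinite. The matrices below are hypothetical illustrations.

```python
import numpy as np

A = [np.array([[-1.0, 2.0], [-2.0, -1.0]]),   # A + A' = -2I
     np.array([[-0.5, 0.0], [1.0, -3.0]])]    # A + A' negative definite

# LV(x, i) = 2 x' A_i x for V(x, i) = |x|^2; sample it over random points
rng = np.random.default_rng(0)
xs = rng.standard_normal((1000, 2))
LV = np.array([[2.0 * (x @ Ai @ x) for Ai in A] for x in xs])
print(bool((LV <= 1e-12).all()))  # True: (8.12) holds on these samples
```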

Proof. Choose $r_0 > 0$ such that the ball $B_{r_0} = \{x \in \mathbb{R}^r : |x| < r_0\}$ and its boundary $\partial B_{r_0} = \{x \in \mathbb{R}^r : |x| = r_0\}$ are contained in $D$. Set $V_{r_0} := \inf\{V(x,i) : x \in D\setminus B_{r_0},\ i \in \mathcal{M}\}$. Then $V_{r_0} > 0$ by assumption (i). Next, assumption (ii) leads to
$$
E_{x,i}V\big(X(t\wedge\tau_{r_0}), \alpha(t\wedge\tau_{r_0})\big) = V(x,i) + E_{x,i}\int_0^{\tau_{r_0}\wedge t} \mathcal{L}V(X(s), \alpha(s))\,ds \le V(x,i),
$$
where $(x,i) \in B_{r_0}\times\mathcal{M}$ and $\tau_{r_0}$ is the first exit time from $B_{r_0}$; that is, $\tau_{r_0} := \inf\{t \ge 0 : |X(t)| \ge r_0\}$. Because $V$ is nonnegative, we further have
$$
V_{r_0}\,P\{\tau_{r_0} \le t\} \le E_{x,i}\Big[ V\big(X(\tau_{r_0}), \alpha(\tau_{r_0})\big)I_{\{\tau_{r_0}\le t\}} \Big] \le V(x,i).
$$
Note that $\tau_{r_0} \le t$ if and only if $\sup_{0\le u\le t}|X(u)| > r_0$. Therefore it follows that
$$
P_{x,i}\Big\{ \sup_{0\le u\le t} |X(u)| > r_0 \Big\} \le \frac{V(x,i)}{V_{r_0}}.
$$
Letting $t\to\infty$, we obtain
$$
P_{x,i}\Big\{ \sup_{t\ge 0} |X(t)| > r_0 \Big\} \le \frac{V(x,i)}{V_{r_0}}.
$$

Finally, the desired conclusion follows from the assumptions that V (0, i) =
0 and V (·, i) is continuous for each i ∈ M. 2
Proposition 8.6. Assume the conditions of Proposition 8.5. Suppose also that for each $i \in \mathcal{M}$, the function $V(\cdot,i)$ satisfies
$$
\mathcal{L}V(x,i) \le -\kappa(\varrho) < 0, \quad \text{for all } x \in D\setminus\{x \in \mathbb{R}^r : |x| \le \varrho\}, \tag{8.13}
$$
where $\varrho > 0$ and $\kappa(\varrho)$ is a positive constant. Then the equilibrium point $x = 0$ is asymptotically stable in probability.

Proof. By virtue of Proposition 8.5, the equilibrium point $x = 0$ is stable in probability. It remains to show that
$$
\lim_{x\to 0} P_{x,i}\Big\{ \lim_{t\to\infty} X(t) = 0 \Big\} = 1.
$$

The equilibrium point $x = 0$ is stable in probability; therefore, for any $\varepsilon > 0$ and $r_0 > 0$, there exists some $\delta > 0$ (without loss of generality, we may assume that $\delta < r_0$) such that
$$
P_{x,i}\Big\{ \sup_{t\ge 0}|X(t)| < r_0 \Big\} \ge 1 - \frac{\varepsilon}{2}, \quad \text{for any } (x,i) \in B_\delta\times\mathcal{M}, \tag{8.14}
$$

where $B_\delta := \{x\in\mathbb{R}^r : |x| < \delta\}$. Now fix some $(x,\alpha) \in (B_\delta\setminus\{0\})\times\mathcal{M}$ and let $\varrho_1 > \varrho > 0$ be arbitrary satisfying $\varrho_1 < |x|$. Define
$$
\tau_\varrho := \inf\{t \ge 0 : |X(t)| \le \varrho\}, \qquad
\tau_{r_0} := \inf\{t \ge 0 : |X(t)| \ge r_0\}.
$$
Then it follows that for any $t > 0$,
$$
\begin{aligned}
E_{x,\alpha}V\big(X(t\wedge\tau_\varrho\wedge\tau_{r_0}), \alpha(t\wedge\tau_\varrho\wedge\tau_{r_0})\big) - V(x,\alpha)
&\le E_{x,\alpha}\int_0^{t\wedge\tau_\varrho\wedge\tau_{r_0}} \mathcal{L}V(X(s),\alpha(s))\,ds \\
&\le -\kappa(\varrho)\,E_{x,\alpha}[t\wedge\tau_\varrho\wedge\tau_{r_0}].
\end{aligned}
$$

Because $V$ is nonnegative, we have $E_{x,\alpha}[t\wedge\tau_\varrho\wedge\tau_{r_0}] \le V(x,\alpha)/\kappa(\varrho)$, and hence $t\,P_{x,\alpha}\{\tau_\varrho\wedge\tau_{r_0} > t\} \le V(x,\alpha)/\kappa(\varrho)$. Letting $t\to\infty$, we obtain
$$
P_{x,\alpha}\{\tau_\varrho\wedge\tau_{r_0} = \infty\} = 0, \quad \text{or} \quad P_{x,\alpha}\{\tau_\varrho\wedge\tau_{r_0} < \infty\} = 1. \tag{8.15}
$$
Note that (8.14) implies that $P_{x,\alpha}\{\tau_{r_0} < \infty\} \le \varepsilon/2$. Hence it follows that
$$
P_{x,\alpha}\{\tau_\varrho < \infty\} \ge P_{x,\alpha}\{\tau_\varrho\wedge\tau_{r_0} < \infty\} - P_{x,\alpha}\{\tau_{r_0} < \infty\} \ge 1 - \frac{\varepsilon}{2}. \tag{8.16}
$$

Now let
$$
\tau_{\varrho_1} := \inf\{t \ge \tau_\varrho : |X(t)| \ge \varrho_1\}.
$$
We use the convention that $\inf\emptyset = \infty$. Then for any $t > 0$, we have
$$
\begin{aligned}
E_{x,\alpha}V\big(X(t\wedge\tau_{\varrho_1}), \alpha(t\wedge\tau_{\varrho_1})\big)
&= E_{x,\alpha}V\big(X(t\wedge\tau_\varrho), \alpha(t\wedge\tau_\varrho)\big) + E_{x,\alpha}\int_{t\wedge\tau_\varrho}^{t\wedge\tau_{\varrho_1}} \mathcal{L}V(X(s),\alpha(s))\,ds \\
&\le E_{x,\alpha}V\big(X(t\wedge\tau_\varrho), \alpha(t\wedge\tau_\varrho)\big).
\end{aligned} \tag{8.17}
$$
Note that $\tau_\varrho \le \tau_{\varrho_1}$ by definition, and hence $\tau_\varrho \ge t$ implies that $\tau_{\varrho_1} \ge t$. Therefore it follows that
$$
\begin{aligned}
E_{x,\alpha}\big[ I_{\{\tau_\varrho\ge t\}} V\big(X(\tau_\varrho\wedge t), \alpha(\tau_\varrho\wedge t)\big) \big]
&= E_{x,\alpha}\big[ I_{\{\tau_\varrho\ge t\}} V\big(X(t), \alpha(t)\big) \big] \\
&= E_{x,\alpha}\big[ I_{\{\tau_\varrho\ge t\}} V\big(X(\tau_{\varrho_1}\wedge t), \alpha(\tau_{\varrho_1}\wedge t)\big) \big].
\end{aligned} \tag{8.18}
$$

Then we have by virtue of (8.17) and (8.18) that
$$
\begin{aligned}
E_{x,\alpha}\big[ I_{\{\tau_\varrho < t\}} V\big(X(\tau_{\varrho_1}\wedge t), \alpha(\tau_{\varrho_1}\wedge t)\big) \big]
&\le E_{x,\alpha}\big[ I_{\{\tau_\varrho < t\}} V\big(X(\tau_\varrho\wedge t), \alpha(\tau_\varrho\wedge t)\big) \big] \\
&= E_{x,\alpha}\big[ I_{\{\tau_\varrho < t\}} V\big(X(\tau_\varrho), \alpha(\tau_\varrho)\big) \big] \le \widehat{V}_\varrho,
\end{aligned}
$$
where $\widehat{V}_\varrho := \sup\{V(y,j) : |y| = \varrho,\ j \in \mathcal{M}\}$. Furthermore,
$$
\begin{aligned}
\widehat{V}_\varrho &\ge E_{x,\alpha}\big[ I_{\{\tau_\varrho < t\}}\, I_{\{\tau_{\varrho_1} < t\}} V\big(X(\tau_{\varrho_1}\wedge t), \alpha(\tau_{\varrho_1}\wedge t)\big) \big] \\
&= E_{x,\alpha}\big[ I_{\{\tau_{\varrho_1} < t\}} V\big(X(\tau_{\varrho_1}), \alpha(\tau_{\varrho_1})\big) \big]
\ge V_{\varrho_1}\,P_{x,\alpha}\{\tau_{\varrho_1} < t\},
\end{aligned}
$$
where $V_{\varrho_1} := \inf\{V(y,j) : |y| = \varrho_1,\ j \in \mathcal{M}\}$. Because for each $i \in \mathcal{M}$, $V(\cdot,i)$ vanishes only at $x = 0$, we have $V_{\varrho_1} > 0$. Since $V$ is continuous, we can choose $\varrho$ sufficiently small so that
$$
P_{x,\alpha}\{\tau_{\varrho_1} < t\} \le \frac{\widehat{V}_\varrho}{V_{\varrho_1}} \le \frac{\varepsilon}{2}.
$$

Letting $t\to\infty$, we obtain
$$
P_{x,\alpha}\{\tau_{\varrho_1} < \infty\} \le \frac{\varepsilon}{2}. \tag{8.19}
$$
Finally, it follows from (8.16) and (8.19) that
$$
P_{x,\alpha}\{\tau_\varrho < \infty,\ \tau_{\varrho_1} = \infty\} \ge P_{x,\alpha}\{\tau_\varrho < \infty\} - P_{x,\alpha}\{\tau_{\varrho_1} < \infty\} \ge 1 - \frac{\varepsilon}{2} - \frac{\varepsilon}{2} = 1 - \varepsilon.
$$
This implies that
$$
P_{x,\alpha}\Big\{ \limsup_{t\to\infty} |X(t)| \le \varrho_1 \Big\} \ge 1 - \varepsilon.
$$
Because $\varrho_1 > 0$ can be chosen to be arbitrarily small, we have
$$
P_{x,\alpha}\Big\{ \lim_{t\to\infty} X(t) = 0 \Big\} \ge 1 - \varepsilon.
$$
This finishes the proof of the proposition. $\Box$



Proposition 8.7. Let $D \subset \mathbb{R}^r$ be a neighborhood of $0$. Assume that for each $i \in \mathcal{M}$, there exists a nonnegative function $V(\cdot,i): D \mapsto \mathbb{R}$ such that $V(\cdot,i)$ is continuously differentiable in every deleted neighborhood of $0$,
$$
\mathcal{L}V(x,i) \le -\kappa(\varepsilon) < 0, \quad \text{for all } x \in D\setminus\{x \in \mathbb{R}^r : |x| \le \varepsilon\}, \tag{8.20}
$$
where $\varepsilon > 0$ and $\kappa(\varepsilon)$ is a positive constant, and
$$
\lim_{|x|\to 0} V(x,i) = \infty, \quad \text{for each } i \in \mathcal{M}. \tag{8.21}
$$
Then the equilibrium point $x = 0$ is unstable in probability.

Proof. Let $r_0$ be a positive real number so that $B_{r_0} := \{x\in\mathbb{R}^r : |x| < r_0\}$ and its boundary $\partial B_{r_0} := \{x\in\mathbb{R}^r : |x| = r_0\}$ are contained in $D$. Fix some $(x,\alpha) \in B_{r_0}\times\mathcal{M}$ and let $0 < \varepsilon < |x|$. Then for any $t > 0$, we have
$$
E_{x,\alpha}V\big(X(t\wedge\tau_\varepsilon\wedge\tau_{r_0}), \alpha(t\wedge\tau_\varepsilon\wedge\tau_{r_0})\big)
= V(x,\alpha) + E_{x,\alpha}\int_0^{t\wedge\tau_\varepsilon\wedge\tau_{r_0}} \mathcal{L}V(X(s),\alpha(s))\,ds \le V(x,\alpha). \tag{8.22}
$$
As in the proof of Proposition 8.6, (8.20) implies that
$$
P_{x,\alpha}\{\tau_\varepsilon\wedge\tau_{r_0} < \infty\} = 1.
$$
Hence letting $t\to\infty$ in (8.22), we obtain by virtue of Fatou's lemma that
$$
E_{x,\alpha}V\big(X(\tau_\varepsilon\wedge\tau_{r_0}), \alpha(\tau_\varepsilon\wedge\tau_{r_0})\big) \le V(x,\alpha).
$$

Furthermore, because $V$ is nonnegative, we have
$$
V(x,\alpha) \ge E_{x,\alpha}\big[ V\big(X(\tau_\varepsilon), \alpha(\tau_\varepsilon)\big) I_{\{\tau_\varepsilon < \tau_{r_0}\}} \big]
\ge \inf\{V(y,j) : |y| = \varepsilon,\ j\in\mathcal{M}\}\,P_{x,\alpha}\{\tau_\varepsilon < \tau_{r_0}\}
= V_\varepsilon\,P_{x,\alpha}\Big\{ \sup_{0\le t\le\tau_\varepsilon} |X(t)| < r_0 \Big\},
$$
where $V_\varepsilon = \inf\{V(y,j) : |y| = \varepsilon,\ j\in\mathcal{M}\}$. By virtue of Proposition 8.2, $\tau_\varepsilon\to\infty$ a.s. as $\varepsilon\to 0$. Also, it follows from (8.21) that $V_\varepsilon\to\infty$ as $\varepsilon\to 0$. Therefore it follows that as $\varepsilon\to 0$, we have
$$
P_{x,\alpha}\Big\{ \sup_{t\ge 0} |X(t)| < r_0 \Big\} = 0.
$$

This demonstrates that the equilibrium point $x = 0$ is unstable in probability. $\Box$

8.3 Stability and Instability: Sufficient Conditions


In the previous section, we obtained sufficient conditions for stability, in-
stability, and asymptotic stability, using a Liapunov function argument.
Because the results are based on the existence of Liapunov functions, to
apply them, it is necessary to find appropriate Liapunov functions. Never-
theless, finding suitable Liapunov functions is more often than not a very
challenging task. In many applications, it is often more convenient to be
able to analyze the stability through conditions on the coefficients of the
corresponding stochastic differential equations. Thus in this section we con-
tinue our study by providing easily verifiable conditions on the coefficients
of the system (8.1)–(8.2).
Note that if the generator Q is irreducible, then all the states of the
Markov chain belong to the same ergodic class. For multiple ergodic class
cases, one may use the idea of two-time-scale formulation and singular
perturbation methods as in [176] and [179]. To proceed, we assume the
following condition holds throughout the rest of the section.
(A8.4) For each $i \in \mathcal{M}$, there exist $A_i \in \mathbb{R}^{r\times r}$ and $\widehat{Q} = (\widehat{q}_{ij}) \in \mathbb{R}^{m_0\times m_0}$, a generator of a continuous-time Markov chain $\widehat\alpha(t)$, such that as $x \to 0$,
$$
f(x,i) = A_i x + o(|x|), \qquad Q(x) = \widehat{Q} + o(1). \tag{8.23}
$$
Moreover, $\widehat{Q}$ is irreducible. Denote the unique stationary distribution of the associated Markov chain $\widehat\alpha(t)$ by
$$
\pi = (\pi_1, \pi_2, \ldots, \pi_{m_0}) \in \mathbb{R}^{1\times m_0}.
$$

Theorem 8.8. Assume conditions (A8.1)–(A8.4). Then the following assertions hold.

(i) If there exists a symmetric and positive definite matrix G such that

    ∑_{i=1}^{m0} πi λmax(GAiG⁻¹ + G⁻¹Ai′G) < 0,        (8.24)

then the equilibrium point x = 0 of the system (8.1)–(8.2) is asymptotically stable in probability.

(ii) If there exists a symmetric and positive definite matrix G such that

    ∑_{i=1}^{m0} πi λmin(GAiG⁻¹ + G⁻¹Ai′G) > 0,        (8.25)

then the equilibrium point x = 0 of the system (8.1)–(8.2) is unstable in probability.
228 8. Stability of Switching ODEs

Proof. (a) We first prove that the equilibrium point x = 0 of system (8.1)–(8.2) is asymptotically stable in probability if (8.24) holds for some symmetric and positive definite matrix G. For notational simplicity, define the column vector µ = (µ1, µ2, . . . , µm0)′ ∈ R^{m0} with

    µi = (1/2) λmax(GAiG⁻¹ + G⁻¹Ai′G),

where G is as in (8.24). Also let

    β := −πµ = − ∑_{i=1}^{m0} πi µi.

Note that β > 0 by (8.24). By virtue of condition (A8.4) and Lemma A.12, the equation

    Q̂c = µ + β1l

has a solution c = (c1, c2, . . . , cm0)′ ∈ R^{m0}. Thus we have

    µi − ∑_{j=1}^{m0} q̂ij cj = −β,  i ∈ M.        (8.26)
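As a numerical aside, the equation Q̂c = µ + β1l can be solved by least squares: Q̂ is singular (its rows sum to zero), but since β = −πµ, the right-hand side satisfies π(µ + β1l) = 0 and hence lies in the range of Q̂. The generator and the vector µ below are placeholder values for illustration, not data from the text.

```python
import numpy as np

# Placeholder data (illustrative only): a 2-state generator and a vector mu.
Q_hat = np.array([[-3.0, 3.0],
                  [1.0, -1.0]])
mu = np.array([-0.5, 1.0])

# Stationary distribution: solve pi Q_hat = 0 together with pi 1l = 1.
m0 = Q_hat.shape[0]
M = np.vstack([Q_hat.T, np.ones(m0)])
pi = np.linalg.lstsq(M, np.concatenate([np.zeros(m0), [1.0]]), rcond=None)[0]

beta = -pi @ mu                        # beta = -pi mu
b = mu + beta * np.ones(m0)            # right-hand side mu + beta 1l; pi b = 0
c = np.linalg.lstsq(Q_hat, b, rcond=None)[0]   # a solution of Q_hat c = b

# Verify (8.26): mu_i - sum_j q_hat_ij c_j = -beta for each i.
print(np.allclose(mu - Q_hat @ c, -beta * np.ones(m0)))   # True
```

Solutions of the singular system differ by a multiple of 1l; this is harmless here because only the differences cj − ci enter the drift computation below.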

For each i ∈ M, consider the Liapunov function

    V(x, i) = (1 − γci)(x′G²x)^{γ/2},

where 0 < γ < 1 is sufficiently small so that 1 − γci > 0 for each i ∈ M. It is readily seen that for each i ∈ M, V(·, i) is continuous, nonnegative, and vanishes only at x = 0. In addition, since γ > 0 and 1 − γci > 0, we have

    lim_{|x|→∞} V(x, i) ≥ lim_{|x|→∞} (1 − γci)(λmin(G²))^{γ/2} |x|^γ = ∞.        (8.27)

Detailed calculation reveals that for x ≠ 0, we have

    ∇V(x, i) = (1 − γci)γ(x′G²x)^{γ/2−1} G²x.

It yields that for x ≠ 0,

    LV(x, i) = (1 − γci)γ(x′G²x)^{γ/2−1} x′G²(Aix + o(|x|))
               − ∑_{j≠i} qij(x)(x′G²x)^{γ/2} γ(cj − ci)
             = (1 − γci)γ(x′G²x)^{γ/2} [ x′G²Aix/(x′G²x) + o(1)
               − ∑_{j≠i} qij(x)(cj − ci)/(1 − γci) ].        (8.28)

It follows from condition (A8.4) that for sufficiently small |x|,

    ∑_{j≠i} qij(x)(cj − ci)/(1 − γci)
        = ∑_{j=1}^{m0} qij(x)cj + γ ∑_{j≠i} qij(x) ci(cj − ci)/(1 − γci)        (8.29)
        = ∑_{j=1}^{m0} q̂ij cj + O(γ) + o(1),

where o(1) → 0 as |x| → 0 and O(γ) → 0 as γ → 0. Meanwhile, using the transformation y = Gx, we have

    x′G²Aix / (x′G²x) = x′(G²Ai + Ai′G²)x / (2x′G²x)
                      = y′G⁻¹(G²Ai + Ai′G²)G⁻¹y / (2y′y)
                      ≤ (1/2) λmax(G⁻¹(G²Ai + Ai′G²)G⁻¹)        (8.30)
                      = (1/2) λmax(GAiG⁻¹ + G⁻¹Ai′G) = µi.
Moreover, note that

    (λmin(G²))^{γ/2} |x|^γ ≤ (x′G²x)^{γ/2} ≤ (λmax(G²))^{γ/2} |x|^γ.        (8.31)

When |x| < δ with δ and 0 < γ < 1 sufficiently small, (8.28)–(8.31) lead to

    LV(x, i) ≤ γ(1 − γci)(λmin(G²))^{γ/2} |x|^γ { µi − ∑_{j=1}^{m0} q̂ij cj + o(1) + O(γ) }.

Furthermore, by virtue of (8.26), we have

    LV(x, i) ≤ γ(1 − γci)(λmin(G²))^{γ/2} |x|^γ (−β + o(1) + O(γ))
             ≤ −κ(ε) < 0,

for any (x, i) ∈ N0 × M with |x| > ε, where N0 ⊂ R^r is a small neighborhood of 0 and κ(ε) is a positive constant. Therefore we conclude from Proposition 8.6 that the equilibrium point x = 0 is asymptotically stable in probability.
(b) Now we prove that the equilibrium point x = 0 is unstable in probability if (8.25) holds for some symmetric and positive definite matrix G. Define the column vector θ = (θ1, θ2, . . . , θm0)′ ∈ R^{m0} by

    θi := (1/2) λmin(GAiG⁻¹ + G⁻¹Ai′G),

and set δ = −πθ. Note that

    δ = − ∑_{i=1}^{m0} πi θi < 0.

As in part (a), assumption (A8.4), the definition of δ, and Lemma A.12 imply that the equation Q̂c = θ + δ1l has a solution c = (c1, c2, . . . , cm0)′ ∈ R^{m0}, and

    θi − ∑_{j=1}^{m0} q̂ij cj = −δ > 0,  i ∈ M.        (8.32)

For i ∈ M, consider the Liapunov function

    V(x, i) = (1 − γci)(x′G²x)^{γ/2},

where −1 < γ < 0 is sufficiently small so that 1 − γci > 0 for each i ∈ M. Similar to the argument in (8.27), we can verify that V(·, i) satisfies (8.21) for each i ∈ M. Detailed computations as in part (a) show that for any sufficiently small 0 < ε < r0,

    LV(x, i) ≤ −κ(ε) < 0,  for any (x, i) ∈ N0 × M with |x| > ε,

where N0 ⊂ R^r is a small neighborhood of 0 and κ(ε) is a positive constant. Therefore Proposition 8.7 implies that the equilibrium point x = 0 is unstable in probability. This completes the proof of the theorem. □

Corollary 8.9. Assume conditions (A8.1)–(A8.4). Then the following assertions hold.

(i) The equilibrium point x = 0 is asymptotically stable in probability if

    ∑_{i=1}^{m0} πi λmax(Ai + Ai′) < 0.        (8.33)

(ii) The equilibrium point x = 0 is unstable in probability if

    ∑_{i=1}^{m0} πi λmin(Ai + Ai′) > 0.        (8.34)

Proof. This corollary follows from Theorem 8.8 immediately by choosing the symmetric and positive definite matrix G in (8.24) and (8.25) to be the identity matrix I. □
Theorem 8.8 and Corollary 8.9 give sufficient conditions in terms of the
maximum and minimum eigenvalues of the matrices for stability and in-
stability of the equilibrium point x = 0. Because there is a “gap” between
the maximum and minimum eigenvalues, a natural question arises: Can we
obtain a necessary and sufficient condition for stability? If the component
X(t) is one-dimensional, we have the following result from Theorem 8.8,
which is a necessary and sufficient condition.

Corollary 8.10. Assume conditions (A8.1)–(A8.4). Let the continuous component X(t) of the hybrid process (X(t), α(t)) given by (8.1) and (8.2) be one-dimensional. Then the equilibrium point x = 0 is asymptotically stable in probability if

    ∑_{i=1}^{m0} πi Ai < 0,

and is unstable in probability if

    ∑_{i=1}^{m0} πi Ai > 0.
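In the scalar case the criterion is transparent: ln|X(T)|/T = (1/T)∫_0^T A(α(t)) dt, and the time average of A(α(t)) converges to ∑ πi Ai. A quick simulation sketch; the scalar rates Ai and the generator below are illustrative choices, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative scalar case (not from the text): A_1 = 1, A_2 = -2, and a
# generator chosen so that pi = (1/3, 2/3), hence sum_i pi_i A_i = -1 < 0.
a = np.array([1.0, -2.0])
Q = np.array([[-2.0, 2.0],
              [1.0, -1.0]])

def time_average(T=5000.0):
    """(1/T) * integral_0^T A(alpha(t)) dt along one path of the chain."""
    t, state, acc = 0.0, 0, 0.0
    while t < T:
        hold = rng.exponential(1.0 / -Q[state, state])  # exponential holding time
        hold = min(hold, T - t)                         # truncate at the horizon
        acc += a[state] * hold
        t += hold
        state = 1 - state                               # two states: jump to the other
    return acc / T

avg = time_average()
print(avg)   # near -1, so x = 0 is asymptotically stable by Corollary 8.10
```

With π = (1/3, 2/3) the limit is (1/3)·1 + (2/3)·(−2) = −1, so a long simulated path produces a time average close to −1.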

8.4 A Sharper Result


This section deals with systems that are linear in the continuous state
variable x. The system we are interested in is given by

Ẋ(t) = A(α(t))X(t), (8.35)

where A(i) = Ai ∈ Rr×r , X(t) ∈ Rr , and α(t) is a continuous-time Markov


chain with a generator Q independent of t. Assume moreover that the
Markov chain α(t) is irreducible. Denote the corresponding stationary dis-
tribution by π = (π1 , π2 , . . . , πm0 ). Our main objective is to obtain a nec-
essary and sufficient condition, or, in other words, to close the gap due to
the presence of minimal and maximal eigenvalues as we mentioned in the
previous section. Denote the solution of (8.35) by X(t) with initial condi-
tion (X(0), α(0)). Assume that X(0) 6= 0. Then according to Corollary 8.2,
X(t) 6= 0 for any t ≥ 0 with probability 1. As a result, Y (t) = X(t)/|X(t)|
is well defined and takes values on the unit sphere

Sr = {y ∈ Rr : |y| = 1}.

Define z = ln |x|. It is readily seen that

    ż(t) = X′(t)Ẋ(t) / |X(t)|².        (8.36)

In what follows, we deal with a case in which the Markov chain is fast varying. It acts as a "noise," whereas X(t) is slowly varying. In the end, the noise is averaged out and replaced by its stationary distribution. To put this in a mathematical form, we suppose that there is a small parameter ε > 0 and

    Q = Qε = Q0/ε,

where Q0 is the generator of an irreducible Markov chain. Note that in this


case, α(t) should really be written as αε (t). In what follows, we adopt this
notation.
Using (8.36), for any T > 0, we have

    (z(T) − z(0))/T = (1/T) ∫_0^T X′(t)A(αε(t))X(t)/|X(t)|² dt
        = (1/T) ∑_{i=1}^{m0} ∫_0^T [X′(t)AiX(t)/|X(t)|²] [I{αε(t)=i} − πi] dt        (8.37)
        + (1/T) ∑_{i=1}^{m0} πi ∫_0^T X′(t)AiX(t)/|X(t)|² dt.

For arbitrary T > 0, we partition the interval [0, T] by use of a mesh of size T^{−∆}, where ∆ > 0 is a parameter. Then we choose ε as a function of T^{−∆}. Such a choice will lead to the desired result. To proceed, choose a real number ∆ > 1/2. Denote δ∆ = T^{−∆} and N = ⌊T/δ∆⌋, where ⌊y⌋ is the usual floor function notation for a real number y. Note that N = O(T^{1+∆}) and Nδ∆ = O(T). Next, let 0 = t0 < t1 < · · · < tN = T be a partition of
[0, T] such that tk = kδ∆ for k = 0, 1, . . . , N. For notational simplicity, for i = 1, . . . , m0, denote

    ζi(t) = X′(t)AiX(t) / |X(t)|²,

    ζ̃i(t) = ζi(0),       if 0 ≤ t < t2,
    ζ̃i(t) = ζi(tk−1),    if tk ≤ t < tk+1,  k = 2, . . . , N − 1,
    ζ̃i(t) = ζi(tN−1),    if tN ≤ t ≤ T,

    Iiε(t) = I{αε(t)=i} − πi.

Note that in the above, ζ̃i(t) is a piecewise constant approximation of ζi(t) with the interpolation intervals [tk, tk+1), k = 0, 1, . . . , N.
Lemma 8.11. Suppose that Q0 is irreducible and that

    ε = o(T^{−∆})  as T → ∞.        (8.38)

Then

    (1/T) ∑_{i=1}^{m0} ∫_0^T [X′(t)AiX(t)/|X(t)|²] [I{αε(t)=i} − πi] dt → 0  in probability as T → ∞.

Proof. The assertion of Lemma 8.11 will follow immediately if we can show that for each i ∈ M,

    E | (1/T) ∫_0^T ζi(t)Iiε(t) dt |² → 0  as T → ∞.

Using the triangle inequality and the Cauchy–Schwarz inequality, we can verify that for each i ∈ M,

    E | (1/T) ∫_0^T ζi(t)Iiε(t) dt |²
        ≤ (2/T²) E | ∫_0^T [ζi(t) − ζ̃i(t)]Iiε(t) dt |² + (2/T²) E | ∫_0^T ζ̃i(t)Iiε(t) dt |²        (8.39)
        ≤ (K/T) ∫_0^T E[ζi(t) − ζ̃i(t)]² dt + (2/T²) E | ∫_0^T ζ̃i(t)Iiε(t) dt |².

In the above and hereafter, K is used as a generic positive constant, whose values may change for different appearances. Clearly, for each i ∈ M, ζi(·) is uniformly bounded by |Ai|, the norm of Ai. Thus

    sup_{0≤t≤T} E|ζi(t)|² < ∞.

Because X(t)/|X(t)| takes values on the unit sphere, it is readily verified that

    sup_{0≤t≤T} | (d/dt)( X′(t)AiX(t)/|X(t)|² ) | ≤ K,        (8.40)

where K is independent of T. As a result, ζi(·) is Lipschitz continuous, uniformly on [0, T]. Consequently,
    ∫_0^T E[ζi(t) − ζ̃i(t)]² dt = ∑_{k=0}^{N−1} ∫_{tk}^{tk+1} E[ζi(t) − ζi(tk−1)]² dt
        ≤ K ∑_{k=0}^{N−1} ∫_{tk}^{tk+1} [t − tk−1]² dt        (8.41)
        = O(N δ∆³) = O(T δ∆²) → 0  as T → ∞,

by the choice of δ∆.
By means of (8.39) and (8.41), it remains to show that the last term of (8.39) converges to 0 as T → ∞. For any 0 ≤ t ≤ T, define

    hi(t) = E [ ∫_0^t ζ̃i(s)Iiε(s) ds ]².        (8.42)

Then

    dhi(t)/dt = 2 ∫_0^t E[ζ̃i(s)Iiε(s)ζ̃i(t)Iiε(t)] ds.        (8.43)
For 0 ≤ t ≤ t2,

    ∫_0^t E[ζ̃i(s)Iiε(s)ζ̃i(t)Iiε(t)] ds ≤ ∫_0^{t2} E^{1/2}|ζ̃i(s)|² E^{1/2}|ζ̃i(t)|² ds ≤ Kt2 = O(T^{−∆}).

For tk ≤ t < tk+1 with k = 2, . . . , N, we have

    dhi(t)/dt = 2 ( ∫_0^{tk−1} + ∫_{tk−1}^{t} ) E[ ζ̃i(s)Iiε(s)ζ̃i(t)Iiε(t) ] ds.

Furthermore,

    ∫_{tk−1}^{t} E[ζ̃i(s)Iiε(s)ζ̃i(t)Iiε(t)] ds ≤ ∫_{tk−1}^{t} E^{1/2}|ζ̃i(s)|² E^{1/2}|ζ̃i(t)|² ds
        ≤ K(t − tk−1) = O(T^{−∆}).

It follows that

    dhi(t)/dt = 2 ∫_0^{tk−1} E[ζ̃i(s)Iiε(s)ζ̃i(t)Iiε(t)] ds + O(T^{−∆}).        (8.44)

For s ≤ tk−1 ≤ t < tk+1, by use of (8.38),

    |E[ζ̃i(s)Iiε(s)ζ̃i(t)Iiε(t)]|
        = |E[ζ̃i(s)Iiε(s) E(ζ̃i(t)Iiε(t)|F_{tk−1})]|
        = |E[ζ̃i(s)Iiε(s) ζi(tk−1) E(Iiε(t)|F_{tk−1})]|        (8.45)
        ≤ E^{1/2}|ζ̃i(s)|² E^{1/2}|ζi(tk−1)|² O(ε + exp(−(t − tk−1)/ε))
        = O(ε + exp(−(tk − tk−1)/ε))
        = O(ε + exp(−T^{−∆}/ε)).

The next to the last inequality follows from the asymptotic expansions
obtained in, for example, [176, Lemma 5.1].
Using (8.44) and (8.45), we obtain that

    sup_{0≤t≤T} |dhi(t)/dt| ≤ O(T) O(ε + exp(−T^{−∆}/ε))  for each i ∈ M.

Because hi(0) = 0, we conclude

    |hi(T)| ≤ O(T²) O(ε + exp(−T^{−∆}/ε)).

Thus hi(T)/T² → 0 as T → ∞ as long as ε = o(T^{−∆}). The lemma then follows. □
By virtue of Lemma 8.11, we need only consider the last term in (8.37). To this end, for each i ∈ M, there is a λi ∈ R such that

    (1/T) ∫_0^T X′(t)AiX(t)/|X(t)|² dt = (1/T) ∫_0^T Y′(t)AiY(t) dt
        → λi  in probability as T → ∞,        (8.46)

where the existence of the limit follows from a slight modification of [61, p. 344, Theorem 6]. The number λi above is precisely the average of Y′(t)AiY(t), known as the mean value of Y′(·)AiY(·) and denoted by M[Y′AiY] in the literature of almost periodic functions [61, Appendix].
Define

    λ = ∑_{i=1}^{m0} πi λi.        (8.47)

It follows from Lemma 8.11, equations (8.37), (8.46), and (8.47) that

    (1/T) ln( |X(T)|/|X(0)| ) = (1/T) ∫_0^T X′(t)A(αε(t))X(t)/|X(t)|² dt → λ  as T → ∞.        (8.48)

To proceed, note that (8.48) can be rewritten as

    (1/T) ln( |X(T)|/|X(0)| ) = λ + o(1),        (8.49)

where o(1) → 0 in probability as T → ∞. If λ < 0, denote λ = −λ0 with λ0 > 0. We can make −λ0 + o(1) ≤ −λ1, where 0 < λ1 < λ0. Thus it follows from (8.49) that

|X(T )| ≤ |X(0)| exp(−λ1 T ) → 0 in probability as T → ∞.

Likewise, if λ > 0, we can find 0 < λ2 < λ such that λ + o(1) ≥ λ2 , which
in turn, implies that

|X(T )| ≥ |X(0)| exp(λ2 T ) → ∞ in probability as T → ∞. (8.50)

We summarize the result obtained thus far in the following theorem.


Theorem 8.12. The equilibrium point x = 0 of (8.35) is asymptotically
stable in probability if λ < 0 and asymptotically unstable in probability in
the sense of (8.50) if λ > 0.
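Theorem 8.12 can be illustrated numerically by simulating the fast-switched system and computing ln|X(T)|/T as in (8.48). The sketch below borrows the matrices and generator of Example 8.15 (Section 8.6); the values of ε, the step size, and the horizon are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrices and generator borrowed from Example 8.15; eps, dt, T are
# illustrative choices.
A = [np.array([[-1.0, 2.0], [0.0, 2.0]]),    # regime 1 (unstable by itself)
     np.array([[-3.0, -1.0], [1.0, -2.0]])]  # regime 2 (stable by itself)
Q0 = np.array([[-3.0, 3.0], [1.0, -1.0]])
eps = 0.05                                   # Q = Q0/eps: fast switching

def estimate_lambda(T=100.0, dt=1e-3):
    """Euler scheme for dX/dt = A(alpha(t))X(t); returns ln|X(T)|/T."""
    x = np.array([1.0, 1.0])
    alpha = 0
    log_norm = 0.0
    next_jump = rng.exponential(eps / -Q0[alpha, alpha])
    t = 0.0
    while t < T:
        if t >= next_jump:                   # the chain switches regime
            alpha = 1 - alpha
            next_jump = t + rng.exponential(eps / -Q0[alpha, alpha])
        x = x + dt * (A[alpha] @ x)
        r = np.linalg.norm(x)
        log_norm += np.log(r)                # accumulate ln|X|; keep |x| = 1
        x = x / r
        t += dt
    return log_norm / T

lam = estimate_lambda()
print(lam)   # negative: x = 0 is asymptotically stable in probability
```

Since ½ y′(Ai + Ai′)y lies between ½λmin(Ai + Ai′) and ½λmax(Ai + Ai′) on the unit sphere, for these data λ is confined to a negative interval, and the simulated estimate reflects that.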
Note that the results in Theorem 8.12 present a dichotomy for the prop-
erty of the equilibrium point x = 0 according to λ < 0 or λ > 0. One
naturally asks the question: What can we say about the equilibrium point
x = 0 if λ = 0? To answer this question, we need the following lemma.
Lemma 8.13. If the equilibrium point x = 0 of (8.35) is asymptotically
stable in probability, then it is exponentially p-stable for all sufficiently small
positive p.

Proof. This lemma can be established using an argument similar to that of [83, Theorem 6.4.1]; we omit the details here for brevity. □
With Lemma 8.13 at hand, we are now able to describe the local
qualitative behavior of the equilibrium point x = 0 when λ = 0.

Theorem 8.14. Suppose that λ = 0. Then the equilibrium point x = 0 is


neither asymptotically stable nor asymptotically unstable in probability.

Proof. Assume first that the equilibrium point x = 0 is asymptotically stable in probability. Then Lemma 8.13 implies that it is exponentially p-stable for all sufficiently small positive p. Now applying Jensen's inequality to the convex function ϕ(x) = eˣ, from equation (8.37) and Definition 8.1(iii), we obtain that

    |X(0)|^p exp{ p E ∫_0^T X′(t)A(αε(t))X(t)/|X(t)|² dt }
        = |X(0)|^p exp{ p E( ln |X(T)| − ln |X(0)| ) }
        ≤ E|X(T)|^p ≤ K|X(0)|^p e^{−kT}.

Consequently, with positive probability,

    lim_{T→∞} (1/T) ∫_0^T X′(t)A(αε(t))X(t)/|X(t)|² dt < 0.

According to (8.48), this contradicts the assumption that λ = 0. Thus the equilibrium point x = 0 is not asymptotically stable. Similarly we can prove that the equilibrium point x = 0 is not asymptotically unstable in probability if λ = 0. □

8.5 Remarks on Liapunov Exponent


This section is divided into two parts. In the first part, we consider a more
general case with the process given by (8.35), where α(t) is a continuous-
time Markov chain whose generator Q does not involve a small parameter
ε > 0. Here, the result is presented by the use of the Liapunov exponent
and the stationary distributions. Then, in the second part, we consider the
problem of finding the stationary density of the random process.

8.5.1 Stability under General Setup


To date, a method used frequently for analyzing stability of stochastic
systems is based on the work set forth in Khasminskii’s book [83]. It is par-
ticularly effective in treating linear stochastic differential equations, using
transformation techniques. Such an approach depends on the calculation
of a limit quantity later commonly referred to as the Liapunov exponent.
Assume that the Markov chain α(t) is irreducible. Our main objective is
to obtain a necessary and sufficient condition for stability. It follows that
Y (t) = X(t)/|X(t)| is well-defined and takes values on the unit sphere Sr =
{y ∈ R^r : |y| = 1}. Differentiating Y(t) with respect to t and expressing the result in terms of Y again, we obtain

    Ẏ(t) = −[Y′(t)A(α(t))Y(t)]Y(t) + A(α(t))Y(t).        (8.51)
Defining a process z(t) = ln |X(t)| as before, we obtain

    ż(t) = Y′(t)A(α(t))Y(t).        (8.52)

Thus

    ln |X(t)| = ln |X(0)| + ∫_0^t Y′(u)A(α(u))Y(u) du.        (8.53)
In view of the discussion in Chapter 4 and the well-known results for
Markov processes, denote the transition density of (Y (t), α(t)) by p(y, i, t)
for i ∈ M. (Note that in general, we would use p(y0 , i0 ; y, i, t) for each
i ∈ M to denote the transition probability density function, where (y0 , i0 )
denotes the initial position of the process (Y (t), α(t)). Nevertheless, in view
of (8.35), the process is time homogeneous. Using the convention in Markov
processes, we can simply write the transition density by p(y, i, t).) Then
p(y, i, t) satisfies the system of forward equations

    ∂p(y, i, t)/∂t = L* p(y, i, t)
        = − ∑_{l=1}^{r} (∂/∂yl) ( (A(i)y)l p(y, i, t) ) + Q′p(y, ·, t)(i),  i ∈ M,

where (A(i)y)l denotes the lth component of A(i)y and Q′ is the transpose of Q. Moreover, for any Γ ⊂ R^r and i ∈ M,

    P(Y(t) ∈ Γ, α(t) = i) = ∫_Γ p(y, i, t) dy.

Note that the process (Y(t), α(t)) is on the compact set Sr × M. The existence of the invariant density of (Y(t), α(t)) is thus guaranteed owing to the compactness. We further assume the uniqueness of the invariant density, and denote it by (µ(y, i) : i = 1, . . . , m0). Then µ(y, i) satisfies

    L* µ(y, i) = 0,   ∑_{i=1}^{m0} ∫_{Sr} µ(y, i) dy = 1.

Under this condition, by a proof similar to that of [74, Theorem 1], it can be shown that there is a κ0 > 0 such that

    |p(y, i, t) − µ(y, i)| ≤ K exp(−κ0 t).        (8.54)
Similar to (8.37), we obtain

    (z(T) − z(0))/T = (1/T) ∫_0^T Y′(t)A(α(t))Y(t) dt.

Redefine λ as

    λ = ∑_{i=1}^{m0} ∫_{Sr} y′A(i)y µ(y, i) dy,

where µ(y, i) is the stationary density of (Y(t), α(t)). Then we have

    E | (1/T) ∫_0^T Y′(t)A(α(t))Y(t) dt − λ |²
        = (1/T²) E ∫_0^T ∫_0^T (Y′(t)A(α(t))Y(t) − λ)(Y′(s)A(α(s))Y(s) − λ) dt ds.        (8.55)

By virtue of (8.54), for t ≥ s,

    | E (Y′(t)A(α(t))Y(t) − λ)(Y′(s)A(α(s))Y(s) − λ) | ≤ K exp(−κ0(t − s)).

This together with (8.55) then yields that

    E | (1/T) ∫_0^T Y′(t)A(α(t))Y(t) dt − λ |² ≤ K/T → 0  as T → ∞.

Because z(0)/T → 0 as T → ∞, we conclude that z(T)/T → λ in probability as T → ∞ as desired. The rest of the arguments are the same as in the previous section. In fact, we can also obtain the limit in the sense of convergence w.p.1. That is,

    lim_{t→∞} ln |X(t)| / t = λ = ∑_{i=1}^{m0} ∫_{Sr} y′A(i)y µ(y, i) dy  w.p.1.        (8.56)

The limit λ is precisely the Liapunov exponent.

8.5.2 Invariant Density


The last section was devoted to calculation of the Liapunov exponent. In
obtaining the Liapunov exponent, it is crucial to find the associated sta-
tionary density or invariant density. For ordinary differential equations with
Markovian random switching, when the continuous component is two di-
mensional, using Liapunov exponents and the framework of [83], stability
analysis was carried out in [8, 9, 112] and interesting results were obtained
for special classes of problems such as certain forms of Markov modulated
harmonic oscillators. Nevertheless, general results on sufficient conditions

guaranteeing the existence and uniqueness of invariant density are still


scarce.
In this section, we discuss the problem of finding the associated invari-
ant densities of (X(t), α(t)). In the literature, considerable attention has
been devoted to the case that X(t) ∈ R2 ; see [8, 9, 112]. First, second-
order differential equations with random switching are frequently used in
mechanical systems to describe oscillatory motions, as well as in mathe-
matical physics to study problems involving Schrödinger equations with
random potential. Linearization of the regime-switching Liénard equations
(in particular, Van der Pol equations) also leads to switching systems that
are linear in the x component. In addition, for two-dimensional systems,
polar coordinate transformation can be used to facilitate the study. To give
better insight and for simplicity, we let α(t) ∈ M = {1, 2}. Conditions for
existence and uniqueness of solutions are provided. The equations satis-
fied by the invariant density are nonlinear and complex. It appears that
closed-form solutions are difficult to obtain.
Assume that the Markov chain α(t) is irreducible. To proceed, for simplicity, we consider the case that x ∈ R² and

    A(i) = ( a11(i)  a12(i) )
           ( a21(i)  a22(i) )  ∈ R^{2×2},   M = {1, 2},

and the generator

    Q = ( −q1   q1 )
        (  q2  −q2 )  ∈ R^{2×2}.

For x = (x1, x2)′ ∈ R², it is more convenient to use a coordinate transformation. Introduce the polar coordinate system by defining ϕ = ϕ(x) = tan⁻¹(x2/x1). Conditioning on α(t) = i, with the use of the variable ϕ, we obtain

    γ(ϕ, i) = dϕ/dt = −[sin ϕ cos ϕ a11(i) + sin²ϕ a12(i)]
                      + [cos²ϕ a21(i) + a22(i) cos ϕ sin ϕ].        (8.57)
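Formula (8.57) is simply ϕ̇ = (x1ẋ2 − x2ẋ1)/|x|² evaluated on the unit circle x = (cos ϕ, sin ϕ)′ with ẋ = A(i)x. This can be checked numerically; the coefficients akl below are arbitrary illustrative values, not from the text.

```python
import numpy as np

# Arbitrary illustrative coefficients a_kl(i) for one fixed regime i.
a11, a12, a21, a22 = 0.3, -1.2, 2.0, -0.7
A = np.array([[a11, a12], [a21, a22]])

def gamma(phi):
    """Right-hand side of (8.57)."""
    s, c = np.sin(phi), np.cos(phi)
    return -(s * c * a11 + s**2 * a12) + (c**2 * a21 + a22 * c * s)

def phi_dot(phi):
    """d/dt of phi = atan(x2/x1) along x' = A x, at x on the unit circle."""
    x = np.array([np.cos(phi), np.sin(phi)])
    dx = A @ x
    return (x[0] * dx[1] - x[1] * dx[0]) / (x @ x)

phis = np.linspace(0.0, 2 * np.pi, 25)
print(np.allclose([gamma(p) for p in phis], [phi_dot(p) for p in phis]))  # True
```

Expanding (x1ẋ2 − x2ẋ1) with x1 = cos ϕ, x2 = sin ϕ reproduces the four terms of (8.57) exactly, which is what the check confirms.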

To proceed, for each i ∈ M, consider a Liapunov function V(·, i) : [0, 2π] → R, identified with a function on the unit circle S. It can be verified that

    LV(ϕ, i) = γ(ϕ, i) ∂V/∂ϕ + QV(ϕ, ·)(i),  i ∈ M.        (8.58)

We slightly change the notation and write the vector-valued function µ(ϕ) = (µ1(ϕ), µ2(ϕ))′ ∈ R^{2×1} in what follows. This invariant density then is a solution of the system of forward equations

    − (∂/∂ϕ) ( γ1(ϕ)µ1(ϕ), γ2(ϕ)µ2(ϕ) )′ + Q′ ( µ1(ϕ), µ2(ϕ) )′ = 0,
    ∑_{i=1}^{2} ∫_0^{2π} µi(ϕ) dϕ = 1,        (8.59)
    µi(·) is 2π-periodic for i ∈ M.

Recall that Q′ denotes the transpose of Q. To ensure the existence of the solution of (8.59), we propose the following conditions.

(A8.5) For each i ∈ M,

    [a22(i) − a11(i)]² + 4a12(i)a21(i) < 0.        (8.60)

Next, we proceed with solving the system of equations (8.59). Suppress the argument when there is no confusion. Then the first part of (8.59) leads to

    − (d/dϕ)(γ1µ1) − q1µ1 + q2µ2 = 0,
    − (d/dϕ)(γ2µ2) + q1µ1 − q2µ2 = 0.        (8.61)
Thus, we have

    (d/dϕ)(γ1µ1 + γ2µ2) = 0.        (8.62)

As a result, γ1(ϕ)µ1(ϕ) + γ2(ϕ)µ2(ϕ) = c̄ for all ϕ ∈ [0, 2π], where c̄ is a constant.
Let γ̄i(ϕ) = γi(ϕ)µi(ϕ) for i = 1, 2. Then (8.61) and (8.62) can be rewritten as

    dγ̄1/dϕ + q1 γ̄1/γ1 − q2 γ̄2/γ2 = 0,
    dγ̄2/dϕ − q1 γ̄1/γ1 + q2 γ̄2/γ2 = 0,        (8.63)
    γ̄1 + γ̄2 = c̄.
Denote

    h(ϕ) = exp( ∫_0^ϕ ( q1/γ1(s) + q2/γ2(s) ) ds ).
By solving the above system of linear differential equations, we have

    γ̄1(ϕ) = [ q2 c̄ ∫_0^ϕ h(s)/γ2(s) ds + c ] / h(ϕ),
    γ̄2(ϕ) = [ q1 c̄ ∫_0^ϕ h(s)/γ1(s) ds + c̄ − c ] / h(ϕ),        (8.64)

where c = γ̄1(0). Note that in deriving the second equation in (8.64), we have used γ̄2(0) = c̄ − γ̄1(0). Thus,

    µ1(ϕ) = [ q2 c̄ ∫_0^ϕ h(s)/γ2(s) ds + c ] / ( h(ϕ)γ1(ϕ) ),
    µ2(ϕ) = [ q1 c̄ ∫_0^ϕ h(s)/γ1(s) ds + c̄ − c ] / ( h(ϕ)γ2(ϕ) ).        (8.65)

We also know that µi(ϕ) must be a 2π-periodic function of ϕ for i ∈ M, so µ1(0) = µ1(2π). We thus have a system of linear equations for c̄ and c, namely,

    D (c̄, c)′ = (0, 1)′,        (8.66)
where D = (dij) ∈ R^{2×2}, with

    d11 = ∫_0^{2π} q2 h(s)/γ2(s) ds,
    d12 = 1 − h(2π),
    d21 = ∫_0^{2π} [ γ2(ϕ) q2 ∫_0^ϕ h(s)/γ2(s) ds + γ1(ϕ) q1 ∫_0^ϕ h(s)/γ1(s) ds + γ1(ϕ) ] / ( h(ϕ)γ1(ϕ)γ2(ϕ) ) dϕ,
    d22 = ∫_0^{2π} ( γ2(ϕ) − γ1(ϕ) ) / ( h(ϕ)γ1(ϕ)γ2(ϕ) ) dϕ.
(A8.6) The matrix D is nonsingular, or equivalently, det(D) ≠ 0.

The uniqueness of the solution of (8.66) leads to the uniqueness of µ(ϕ). Thus, we arrive at the following conclusion: Suppose conditions (A8.5) and (A8.6) hold. Then (8.59) has a unique solution.

8.6 Examples
For simplicity, by a slight abuse of notation, we call a system that is linear
with respect to the continuous state a linear system or hybrid linear system.
To illustrate, we provide several examples in this section. In addition, we
demonstrate results that are associated with the well-known Hartman–
Grobman theorem for random switching systems.
Example 8.15. Consider a system (8.35) (linear in the variable x) with
the following specifications. The Markov chain α(t) has two states and is
generated by

    Q = ( −3   3 )
        (  1  −1 ),

and

    A1 = A(1) = ( −1  2 )      A2 = A(2) = ( −3  −1 )
                (  0  2 ),                  (  1  −2 ).

Thus associated with the hybrid system (8.35), there are two ordinary
differential equations
Ẋ(t) = A1 X(t), (8.67)
and
Ẋ(t) = A2 X(t), (8.68)
switching back and forth from one to another according to the movement
of the jump process α(t). It is readily seen that the eigenvalues of A1 are
−1 and 2. Hence the motion of (8.67) is unstable. Similarly, by computing
the eigenvalues of A2 , the motion of (8.68) is asymptotically stable.
Next we use Corollary 8.9 to show that the hybrid system (8.35) is asymp-
totically stable owing to the presence of the Markov chain α(t). That is,
the Markov chain becomes a stabilizing factor. The stationary distribution
of the Markov chain α(t) is π = (0.25, 0.75), which is obtained by solving
the system of equations πQ = 0 and π1l = 1. The maximum eigenvalues of
A1 + A1′ and A2 + A2′ are

    λmax(A1 + A1′) = 4.6056,   λmax(A2 + A2′) = −4,

respectively. This yields that

    π1 λmax(A1 + A1′) + π2 λmax(A2 + A2′) = −1.8486 < 0.
Therefore, we conclude from Corollary 8.9 that the hybrid system (8.35) is
asymptotically stable. The phase portrait Figure 8.1(a) confirms our find-
ings. It is interesting to note the dynamic movements and the interactions
of the continuous and discrete components. To see the difference between
hybrid systems and ordinary differential equations, we also present the
phase portraits of (8.67) and (8.68) in Figure 8.1(b).
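The numbers in this example are easy to reproduce; a short sketch recomputing π and the criterion (8.33):

```python
import numpy as np

Q = np.array([[-3.0, 3.0], [1.0, -1.0]])
A1 = np.array([[-1.0, 2.0], [0.0, 2.0]])
A2 = np.array([[-3.0, -1.0], [1.0, -2.0]])

# Stationary distribution: pi Q = 0 together with pi 1l = 1.
m0 = Q.shape[0]
M = np.vstack([Q.T, np.ones(m0)])
pi = np.linalg.lstsq(M, np.concatenate([np.zeros(m0), [1.0]]), rcond=None)[0]

# Criterion (8.33) of Corollary 8.9 via the maximum eigenvalues.
crit = sum(p * np.linalg.eigvalsh(Ai + Ai.T).max()
           for p, Ai in zip(pi, [A1, A2]))
print(pi)     # approximately [0.25, 0.75]
print(crit)   # about -1.8486 < 0
```

Here λmax(A1 + A1′) = 1 + √13 ≈ 4.6056, so the weighted sum is 0.25 · 4.6056 + 0.75 · (−4) ≈ −1.8486, matching the text.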

Example 8.16. Consider a system (linear in the x variable)


Ẋ(t) = A(α(t))X(t) (8.69)
with the following specifications. Here we consider an x-dependent generator Q(x). The discrete event process α(t) has two states and is generated by

    Q(x) = Q(x1, x2)
         = ( −2 − sin x1 cos(x1²) − sin(x2²)    2 + sin x1 cos(x1²) + sin(x2²) )
           (  1 − 0.5 sin(x1 x2)               −1 + 0.5 sin(x1 x2)             ),

[Figure 8.1 about here.]

(a) Hybrid linear system: phase portrait of (8.35) with initial condition (x, α) = ([1, 1]′, 1).

(b) Phase portraits of (8.67) (solid line) and (8.68) (starred line) with the same initial condition x = [1, 1]′.

FIGURE 8.1. Comparisons of the switching system and the associated ordinary differential equations.

and

    A1 = A(1) = ( 0  0 )      A2 = A(2) = ( −1   2 )
                ( 3  0 ),                  ( −2  −1 ).

Note that the distinct feature of this example compared with the last one
is that the Q matrix is x dependent, which satisfies the approximation
condition posed in Section 8.3. Associated with the hybrid system (8.35),
there are two ordinary differential equations

Ẋ(t) = A1 X(t), (8.70)

and
Ẋ(t) = A2 X(t) (8.71)
switching from one to another according to the movement of the jump
process α(t). Solving (8.70), we obtain

x1 (t) = c1 , x2 (t) = 3c1 t + c2

for some constants c1 and c2. Hence the motion of (8.70) is unstable if the initial point (c1, c2) does not lie on the y-axis. In addition, by computing the eigenvalues of A2, we obtain that the motion of (8.71) is asymptotically stable.
stable.
By virtue of Corollary 8.9, the hybrid system (8.69) is asymptotically stable due to the stabilizing jump process α(t). We first note that as x → 0,

    Q(x) → Q̂ = ( −2   2 )
                (  1  −1 ),

and hence the stationary distribution of the Markov chain α̂(t) is π = (1/3, 2/3). The maximum eigenvalues of A1 + A1′ and A2 + A2′ are

    λmax(A1 + A1′) = 3,   λmax(A2 + A2′) = −2,

respectively. It follows that

    π1 λmax(A1 + A1′) + π2 λmax(A2 + A2′) = −1/3 < 0.

Therefore, we conclude from Corollary 8.9 that the hybrid system (8.69)
is asymptotically stable. The phase portrait in Figure 8.2(a) confirms our
results. To delineate the difference between hybrid systems and ordinary
differential equations, we also present the phase portraits of (8.70) and
(8.71) in Figure 8.2(b). The phase portraits reveal the interplay of the continuous and discrete components. They illustrate the hybrid characteristics in
an illuminating way.

[Figure 8.2 about here.]

(a) Hybrid linear system: phase portrait of (8.69) with initial condition (x, α) = ([0.5, 2]′, 1).

(b) Phase portraits of (8.70) (starred line) and (8.71) (solid line) with the same initial condition x = [0.5, 2]′.

FIGURE 8.2. Comparisons of the switching system and the associated ordinary differential equations.

Example 8.17. In this example, we consider a nonlinear hybrid system

Ẋ(t) = f (X(t), α(t)), (8.72)

where X(t) is a two-dimensional state trajectory, and α(t) is a jump process


taking value in M = {1, 2, 3} with generator

    Q(x1, x2) =
      ( sin x2 − 2 − cos x1    1 − sin x2                        1 + cos x1                   )
      ( 1 − sin(x1²x2)         sin(x1²x2) − 1 − sin²(x1 x2)      sin²(x1 x2)                  )
      ( 0                      3 + sin x1 + sin x2 cos x2        −3 − sin x1 − sin x2 cos x2  ).

The functions f(x, i), i = 1, 2, 3, are defined as

    f(x, 1) = ( x1 + x2²/(1 + x1² + x2²),  −2x2 + sin(x1 x2) cos x2 )′,
    f(x, 2) = ( −2x1 + x2 − 2x1 sin x2,  −x1 − x2 + x1 cos x1 sin x2 )′,
    f(x, 3) = ( x2 + 2x1 cos x1 sin x2,  −x1 + x2 sin x1 )′.

Note that the matrices Q̂ and Ai, i ∈ M, in assumption (A8.4) can be obtained as follows:

    Q̂ = ( −3   1   2 )
         (  1  −1   0 )        (8.73)
         (  0   3  −3 )

and

    A1 = ( 1   0 )     A2 = ( −2   1 )     A3 = (  0   1 )
         ( 0  −2 ),         ( −1  −1 ),         ( −1   0 ).        (8.74)
For a system of differential equations without switching, the well-known
Hartman–Grobman theorem holds. It indicates that for hyperbolic equilib-
ria, a linear system arising from approximation is topologically equivalent
to the associated nonlinear system, whereas a system with a center is not.
Here we demonstrate that this phenomenon changes a little. We show that
the nonlinear system and its approximation could be equivalent even for

equilibria with a center in one or more of its components as long as the


component with hyperbolic equilibrium dominates the rest of them.
We proceed to use Theorem 8.8 and Corollary 8.9 to verify that the hybrid system (8.72) is asymptotically stable. The stationary distribution of the Markov chain α̂(t) is π = (3/14, 9/14, 1/7). Now consider the symmetric and positive definite matrix

    G = (  3  −1 )
        ( −1   3 )

and compute the maximum eigenvalues of the matrices GAiG⁻¹ + G⁻¹Ai′G for i = 1, 2, 3:

    λmax(GA1G⁻¹ + G⁻¹A1′G) = 11/4,
    λmax(GA2G⁻¹ + G⁻¹A2′G) = −11/4,
    λmax(GA3G⁻¹ + G⁻¹A3′G) = 3/2.

Thus we obtain

    ∑_{i=1}^{3} πi λmax(GAiG⁻¹ + G⁻¹Ai′G) = −27/28 < 0.        (8.75)

Hence Theorem 8.8 implies that the hybrid system (8.72) is asymptotically
stable.
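These weighted eigenvalue computations can be reproduced numerically. Since G is symmetric, (GAiG⁻¹)′ = G⁻¹Ai′G, so each matrix GAiG⁻¹ + G⁻¹Ai′G is symmetric and a symmetric eigensolver applies. A sketch:

```python
import numpy as np

G = np.array([[3.0, -1.0], [-1.0, 3.0]])
Gi = np.linalg.inv(G)
As = [np.array([[1.0, 0.0], [0.0, -2.0]]),
      np.array([[-2.0, 1.0], [-1.0, -1.0]]),
      np.array([[0.0, 1.0], [-1.0, 0.0]])]
pi = np.array([3 / 14, 9 / 14, 1 / 7])   # stationary distribution of alpha_hat

# Each G A G^-1 + G^-1 A' G is symmetric, so eigvalsh is appropriate.
vals = [np.linalg.eigvalsh(G @ A @ Gi + Gi @ A.T @ G).max() for A in As]
crit = pi @ vals
print(vals)   # the values 11/4, -11/4, 3/2
print(crit)   # -27/28, about -0.9643
```

The weighted sum (3/14)(11/4) + (9/14)(−11/4) + (1/7)(3/2) = −27/28 reproduces (8.75).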
It is interesting to note that the linear approximation

    Ẋ(t) = A(α̂(t))X(t)        (8.76)

is also asymptotically stable, where the matrices A(i) = Ai, i ∈ M, are as in (8.74), and the Markov chain α̂(t) is generated by Q̂ in (8.73). To demonstrate, we present the phase portrait of (8.72) in Figure 8.3(a), whereas Figure 8.3(b) presents the phase portrait of its first-order linear approximation (8.76).

8.7 Notes
Consideration of switching ordinary differential equations of the form (8.1)
stems from a wide variety of applications including control, optimization,
estimation, and tracking. For example, in [168], with motivation of using
stochastic recursive algorithms for tracking Markovian parameters such

[Figure 8.3 about here.]

(a) Hybrid nonlinear system: phase portrait of (8.72) with initial condition (x, α) = ([0.8, 1]′, 1).

(b) Hybrid linear system: phase portrait of (8.76) with initial condition (x, α) = ([0.8, 1]′, 1).

FIGURE 8.3. Comparisons of the nonlinear system and its linear approximation.



as those in spreading code optimization in CDMA (Code Division Multi-


ple Access) wireless communication, we used an adaptive algorithm with
constant stepsize to construct a sequence of estimates of the time-varying
distribution. It was shown that under simple conditions, a continuous-time
interpolation of the iteration converges weakly not to an ODE as is widely
known in the literature of stochastic approximation [104], but to a system
of ODEs with regime switching. Subsequently, treating least-squares-type
algorithms involving Markovian jump processes in [169], random-switching
ODEs were also obtained. Thus, not only is the study of systems given by (8.1) and (8.2) of mathematical interest, but it also provides practical guidance for many applications.
Taking into consideration that many real-world systems are in operation for a long period of time, the long-time behavior of such systems is of foremost importance. Recently, much attention has been drawn to the study
of stability of such systems; see [6, 78, 116, 183] among others. Much of
the contemporary study of stochastic stability of dynamic systems can be
traced back to the original work [79] by Kac and Krasovskii, in which a
systematic approach was developed for stability of systems with Markovian
switching using Liapunov function methods. This important work stimu-
lated much of the subsequent development.
This chapter focuses on stability and instability of random switching
systems of differential equations. Sufficient conditions have been derived.
These conditions are easily verifiable and are based on coefficients of the
systems. For systems that are linear in the continuous component, certain necessary and sufficient conditions are derived using a transformation technique, which closes the gap in using maximal and minimal eigenvalues of certain matrices. A particularly interesting and somewhat remarkable discovery is the following: different from a single system of differential equations, in which the Hartman–Grobman theorem is in force, for switching systems of differential equations modulated by a random process, even if some of the equilibria are not hyperbolic (e.g., centers), the original system and the "linearized" system (with respect to the continuous variable) could still have the same asymptotic behavior. This is demonstrated by our analytical results as well as computations with the use of phase portraits. The stability
study is based on the results from Zhu, Yin, and Song [190]; the discussion
on Liapunov exponent is based on He and Yin [65].
Owing to the increasing needs of random environment models, stability
of hybrid systems has received resurgent attention lately. Effort has been
placed on deriving more easily verifiable conditions for stability and insta-
bility. This work also aims to contribute in this direction. One of the main
features of [79] is to use quadratic Liapunov functions to obtain verifiable
conditions for stability of the system Ẋ(t) = A(α(t))X(t), where α(t) is
a continuous-time Markov chain with a constant generator Q. Their con-
ditions amount to solving a system of linear nonhomogeneous equations,
which is generally complicated. Here, we presented easily verifiable conditions for stability and instability with the aid of nonquadratic Liapunov
functions. Compared with the conditions in [79], our conditions in Theo-
rem 8.8, Corollary 8.9, and Corollary 8.10 are simpler and easier to verify.
Moreover, our results can be applied to more general models.
9
Invariance Principles

9.1 Introduction
In the previous two chapters, we have studied stability of switching dif-
fusions and random switching ordinary differential equations. Continuing
our effort, this chapter is concerned with invariance principles of switching
diffusion processes. This chapter together with the previous two chapters
delineates long-time behavior and gives a complete picture of the switching-
diffusion processes under consideration.
The rest of the chapter is arranged as follows. Section 9.2 begins with the
formulation of the problem. Section 9.3 is devoted to the invariance princi-
ples using sample paths and kernels of the associated Liapunov functions.
Here, Liapunov function-type criteria are obtained first. Then linear (in x)
systems are treated. Section 9.4 switches gears to examine the invariance
using the associated measures. Finally, a few more remarks are made in
Section 9.5 to conclude the chapter.

9.2 Formulation
As in the previous chapters, we use z′ to denote the transpose of z ∈ R^{ℓ_1×ℓ_2} with ℓ_i ≥ 1, whereas R^{ℓ×1} is simply written as R^ℓ; 1l = (1, 1, . . . , 1)′ ∈ R^{m_0} is a column vector with all entries being 1; and the Euclidean norm for a row or a column vector x is denoted by |x|. As usual, I denotes the identity matrix with suitable dimension. For a matrix A, its trace norm is denoted by |A| = √(tr(A′A)). If a matrix A is real and symmetric, we use λ_max(A) and λ_min(A) to denote the maximal and minimal eigenvalues of A, respectively, and set ρ(A) := max{|λ_max(A)|, |λ_min(A)|}. When B is a set, I_B(·) denotes the indicator function of B.

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_9, © Springer Science + Business Media, LLC 2010
We work with (Ω, F, P), a complete probability space, and consider a
two-component Markov process (X(t), α(t)), where X(·) is a continuous
component taking values in Rr and α(·) is a jump component taking values
in a finite set M = {1, 2, . . . , m0 }. The process (X(t), α(t)) has a generator
L given as follows. For each i ∈ M and any twice continuously differentiable
function g(·, i),
Lg(x, i) = (1/2) Σ_{j,k=1}^{r} a_{jk}(x, i) ∂²g(x, i)/(∂x_j ∂x_k) + Σ_{j=1}^{r} b_j(x, i) ∂g(x, i)/∂x_j + Q(x)g(x, ·)(i)
         = (1/2) tr(a(x, i)∇²g(x, i)) + b′(x, i)∇g(x, i) + Q(x)g(x, ·)(i),   (9.1)

where x ∈ R^r, and Q(x) = (q_ij(x)) is an m_0 × m_0 matrix depending on x satisfying q_ij(x) ≥ 0 for i ≠ j and Σ_{j∈M} q_ij(x) = 0 for each i ∈ M,

Q(x)g(x, ·)(i) = Σ_{j∈M} q_ij(x) g(x, j) = Σ_{j∈M, j≠i} q_ij(x)(g(x, j) − g(x, i)),   i ∈ M,

and ∇g(·, i) and ∇²g(·, i) denote the gradient and Hessian of g(·, i), respectively.
The process (X(t), α(t)) can be described by

dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),   X(0) = x,   α(0) = α,   (9.2)

and

P{α(t + ∆t) = j | α(t) = i, (X(s), α(s)), s ≤ t} = q_ij(X(t))∆t + o(∆t),   i ≠ j,   (9.3)

where w(t) is a d-dimensional standard Brownian motion, b(·, ·) : R^r × M → R^r, and σ(·, ·) : R^r × M → R^{r×d} satisfies σ(x, i)σ′(x, i) = a(x, i).
Throughout the chapter, we assume that both b(·, i) and σ(·, i) satisfy the
usual Lipschitz condition and linear growth condition for each i ∈ M and
that Q(·) is bounded and continuous. Under these conditions, the system
(9.2)–(9.3) has a unique strong solution; see Chapter 2 and also [77] or
[172] for details. From time to time, we often wish to emphasize the initial
data (X(0), α(0)) = (x, α) dependence of the solution of (9.2)–(9.3), which
is denoted by (X x,α (t), αx,α (t)).
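A simple way to make (9.2)–(9.3) concrete is a Euler–Maruyama scheme in which, at each step, the discrete component switches from i to j with probability q_ij(X(t))∆t. The coefficients b, σ, and Q(·) below are hypothetical choices picked only to respect the stated conditions (Lipschitz growth, bounded continuous Q); the sketch is an illustration, not a production integrator.

```python
import numpy as np

def simulate_switching_diffusion(b, sigma, Qfun, x0, a0, T, dt, rng):
    """Euler-Maruyama for dX = b(X, alpha)dt + sigma(X, alpha)dw, where the
    discrete component jumps i -> j with probability q_ij(X(t))dt + o(dt),
    mirroring (9.3)."""
    x, a = np.array(x0, dtype=float), a0
    for _ in range(int(T / dt)):
        Q = Qfun(x)
        u, cum = rng.random(), 0.0
        for j in range(Q.shape[0]):          # attempt a regime switch
            if j != a:
                cum += Q[a, j] * dt
                if u < cum:
                    a = j
                    break
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + b(x, a) * dt + sigma(x, a) @ dw
    return x, a

rng = np.random.default_rng(0)
b = lambda x, a: -(a + 1.0) * x                      # regime-dependent mean reversion
sigma = lambda x, a: 0.2 * (a + 1.0) * np.eye(2)
Qfun = lambda x: np.array([[-1.0 - 1.0 / (1.0 + x @ x), 1.0 + 1.0 / (1.0 + x @ x)],
                           [2.0, -2.0]])             # bounded, continuous, rows sum to 0
xT, aT = simulate_switching_diffusion(b, sigma, Qfun, [1.0, -1.0], 0, T=5.0, dt=1e-3, rng=rng)
```

Note the x-dependence of the generator: the jump intensity of α changes as X(t) moves, which is exactly what distinguishes (9.3) from switching driven by an autonomous Markov chain.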
9.3 Invariance (I): A Sample Path Approach


Recall that the evolution of the discrete component α(·) can be represented
as a stochastic integral with respect to a Poisson random measure; see
Chapter 2, and also, for example, [52, 150]. Indeed, for x ∈ Rr and i, j ∈ M
with j 6= i, let ∆ij (x) be consecutive (with respect to the lexicographic
ordering on M × M), left-closed, right-open intervals of the real line, each
having length qij (x). Define a function h : Rr × M × R 7→ R by
h(x, i, z) = Σ_{j=1}^{m_0} (j − i) I_{{z ∈ ∆_ij(x)}}.   (9.4)

Then (9.3) is equivalent to

dα(t) = ∫_R h(X(t), α(t−), z) p(dt, dz),   (9.5)

where p(dt, dz) is a Poisson random measure with intensity dt × m(dz),


and m is the Lebesgue measure on R. The Poisson random measure p(·, ·)
is independent of the Brownian motion w(·). Denote the natural filtration
by Ft := σ {(X(s), α(s)), s ≤ t}. Without loss of generality, assume the
filtration {Ft }t≥0 satisfies the usual condition. That is, it is right continuous
with F0 containing all P-null sets.
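The interval construction behind (9.4) translates directly into code. In the sketch below (the generator Q is a made-up example, and for an x-dependent generator the intervals would be rebuilt from Q(x) at each state), the intervals ∆_ij are laid end to end on the real line in lexicographic order of (i, j), and h returns the jump size j − i when z lands in ∆_ij.

```python
import numpy as np

def make_intervals(Q):
    """Consecutive left-closed, right-open intervals Delta_ij, one per pair
    (i, j) with j != i, in lexicographic order; each has length q_ij."""
    intervals, lo = {}, 0.0
    m = Q.shape[0]
    for i in range(m):
        for j in range(m):
            if j != i:
                intervals[(i, j)] = (lo, lo + Q[i, j])   # [lo, lo + q_ij)
                lo += Q[i, j]
    return intervals

def h(i, z, intervals):
    """h(x, i, z) of (9.4): jump size j - i if z is in Delta_ij, else 0."""
    for (i0, j), (lo, hi) in intervals.items():
        if i0 == i and lo <= z < hi:
            return j - i
    return 0

Q = np.array([[-3.0, 1.0, 2.0],
              [0.5, -1.5, 1.0],
              [2.0, 2.0, -4.0]])
iv = make_intervals(Q)
```

In state i, a Poisson point whose mark z falls in ∆_ij triggers a jump to j; points outside ∪_j ∆_ij(x) leave α unchanged, which reproduces the rates q_ij.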
For any (x, i), (y, j) ∈ R^r × M, define

d((x, i), (y, j)) = |x − y| if i = j, and |x − y| + 1 if i ≠ j.   (9.6)

It is easy to verify that for any (x, i), (y, j), and (z, l),
(i) d((x, i), (y, j)) ≥ 0, and d((x, i), (y, j)) = 0 if and only if (x, i) = (y, j);
(ii) d((x, i), (y, j)) = d((y, j), (x, i)); and
(iii) d((x, i), (y, j)) ≤ d((x, i), (z, l)) + d((z, l), (y, j)).
Thus d is a distance function on R^r × M. Also, if U is a subset of R^r × M, we define

d((x, i), U) = inf{d((x, i), (y, j)) : (y, j) ∈ U}.   (9.7)

Let M be endowed with the trivial topology. As usual, we denote by d(x, D) the distance between x ∈ R^r and D ⊂ R^r; that is,

d(x, D) = inf{|x − y| : y ∈ D}.   (9.8)

In addition, we use P_{x,α} and E_{x,α} to denote the probability and expectation with (X(0), α(0)) = (x, α), respectively. Then, for a fixed U ⊂ R^r × M, the function d(·, U) is continuous.
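The metric (9.6) and the set distance (9.7) are simple to mirror in code; the small sketch below (finite sets only, helper names hypothetical) follows the definitions verbatim:

```python
def dist(p, q):
    """The metric (9.6) on R^r x M: Euclidean distance plus a unit penalty
    when the discrete components differ."""
    (x, i), (y, j) = p, q
    base = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    return base if i == j else base + 1.0

def dist_to_set(p, U):
    """d((x, i), U) of (9.7), for a finite collection U of points (y, j)."""
    return min(dist(p, q) for q in U)
```

The "+1" penalty makes points with different discrete components at least distance 1 apart, which is what puts the trivial topology on M.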

9.3.1 Invariant Sets


Inspired by the study in [47], we define the invariant set as follows.

Definition 9.1. A Borel measurable set U ⊂ Rr ×M is said to be invariant


with respect to the solutions of (9.2)–(9.5) or simply, U is invariant with
respect to the process (X(t), α(t)) if

Px,i {(X(t), α(t)) ∈ U, for all t ≥ 0} = 1, for any (x, i) ∈ U.

That is, a process starting from U will remain in U a.s.

As shown in Lemma 7.1, when the coefficients of (9.2)–(9.5) satisfy

b(0, α) = σ(0, α) = 0, for each α ∈ M,

then any solution with initial condition (x, i) satisfying x ≠ 0 will never reach the origin almost surely; in other words, the set (R^r − {0}) × M is invariant with respect to the solutions of (9.2)–(9.5).
Using the terminologies in [47, 83], we recall the definitions of stability
and asymptotic stability of a set. Then general results in terms of the
Liapunov function are provided.

Definition 9.2. A closed and bounded set K ⊂ Rr × M is said to be

(i) stable in probability if for any ε > 0 and ρ > 0, there is a δ > 0 such that

P_{x,i}{ sup_{t≥0} d((X(t), α(t)), K) < ρ } ≥ 1 − ε, whenever d((x, i), K) < δ;

(ii) asymptotically stable in probability if it is stable in probability, and moreover

P_{x,i}{ lim_{t→∞} d((X(t), α(t)), K) = 0 } → 1, as d((x, i), K) → 0;

(iii) stochastically asymptotically stable in the large if it is stable in probability, and

P_{x,i}{ lim_{t→∞} d((X(t), α(t)), K) = 0 } = 1, for any (x, i) ∈ R^r × M;

(iv) asymptotically stable with probability one if

lim_{t→∞} d((X(t), α(t)), K) = 0, a.s.
Theorem 9.3. Assume that there exists a nonnegative function V (·, ·) :


Rr × M 7→ R+ such that

Ker(V ) := {(x, i) ∈ Rr × M : V (x, i) = 0} (9.9)

is nonempty and bounded, and that for each α ∈ M, V (·, α) is twice con-
tinuously differentiable with respect to x, and

LV (x, i) ≤ 0, for all (x, i) ∈ Rr × M. (9.10)

Then

(i) Ker(V ) is an invariant set for the process (X(t), α(t)), and

(ii) Ker(V ) is stable in probability.

Proof. Let (x0 , i0 ) ∈ Ker(V ). By virtue of generalized Itô’s lemma [150],


we have for any t ≥ 0,
Z t
V (X(t), α(t)) = V (x0 , i0 ) + LV (X(s), α(s))ds + M (t), (9.11)
0

where M(t) = M_1(t) + M_2(t) is a local martingale with

M_1(t) = ∫_0^t ⟨∇V(X(s), α(s)), σ(X(s), α(s)) dw(s)⟩,
M_2(t) = ∫_0^t ∫_R [V(X(s), i_0 + h(X(s), α(s−), z)) − V(X(s), α(s))] µ(ds, dz),

where ⟨·, ·⟩ denotes the usual inner product, and

µ(ds, dz) = p(ds, dz) − ds × m(dz)

is a martingale measure, with p(dt, dz) being the Poisson random measure with intensity dt × m(dz) as in (9.5). Taking expectations on both sides of (9.11) (using a sequence of stopping times and Fatou's lemma if necessary, as in the argument of Theorem 3.14), it follows from (9.10) that

Ex0 ,i0 [V (X(t), α(t))] ≤ V (x0 , i0 ) = 0.

The last equality above holds because (x_0, i_0) ∈ Ker(V). But V is nonnegative, so we must have V(X(t), α(t)) = 0 a.s. for any t ≥ 0. Then we have

P_{x_0,i_0}{ sup_{t_n ∈ Q_+} V(X(t_n), α(t_n)) = 0 } = 1,

where Q_+ denotes the set of nonnegative rational numbers. Now, by virtue of Proposition 2.4, the process (X(t), α(t)) is càdlàg (sample paths being right continuous and having left limits). Thus we obtain

P_{x_0,i_0}{ sup_{t≥0} V(X(t), α(t)) = 0 } = 1.

That is,

Px0 ,i0 {(X(t), α(t)) ∈ Ker(V ), for all t ≥ 0} = 1.

This proves the first assertion of the theorem.


We proceed to prove the second assertion. For any δ > 0, let Uδ be a
neighborhood of Ker(V ) such that

Uδ := {(x, i) ∈ Rr × M : d((x, i), Ker(V )) < δ} . (9.12)

Let the initial condition (x, i) ∈ Uδ − Ker(V ) and τ be the first exit time
of the process from Uδ . That is,

τ = inf{t : (X(t), α(t)) 6∈ Uδ }.

Then for any t ≥ 0, by virtue of generalized Itô’s lemma,


V(X(t∧τ), α(t∧τ)) = V(x, i) + ∫_0^{t∧τ} LV(X(s), α(s)) ds + M(t∧τ),

where

M(t∧τ) = ∫_0^{t∧τ} ⟨∇V(X(s), α(s)), σ(X(s), α(s)) dw(s)⟩
        + ∫_0^{t∧τ} ∫_R [V(X(s), i + h(X(s), α(s−), z)) − V(X(s), α(s))] µ(ds, dz).

As argued in the previous paragraph, by virtue of (9.10), we can use a sequence of stopping times and Fatou's lemma, if necessary, to obtain

E_{x,i}[V(X(t∧τ), α(t∧τ))] ≤ V(x, i) + E_{x,i} ∫_0^{t∧τ} LV(X(s), α(s)) ds ≤ V(x, i).

Because V is nonnegative, we further have

V(x, i) ≥ E_{x,i}[V(X(τ), α(τ)) I_{{τ<t}}] + E_{x,i}[V(X(t), α(t)) I_{{t≤τ}}]
        ≥ E_{x,i}[V(X(τ), α(τ)) I_{{τ<t}}].   (9.13)
For notational simplicity, denote (ξ, ℓ) = (X(τ), α(τ)). We claim that

V(ξ, ℓ) ≥ ρ, for some constant ρ > 0.   (9.14)

To this end, write Ker(V) = ∪_{l=1}^{k} (N_{j_l} × {j_l}), where k ≤ m_0, N_{j_l} ⊂ R^r, and j_l ∈ M for l = 1, . . . , k. We denote further J = {j_1, . . . , j_k} ⊂ M. Let us first consider the case when ℓ ∉ J. Note that ξ ∈ D, where D is a bounded neighborhood of ∪_{l=1}^{k} N_{j_l} (such a neighborhood D exists because Ker(V) is bounded by the assumption of the theorem). Then we have

inf{V(x, ℓ) : x ∈ D} ≥ ρ_1 > 0.   (9.15)

Suppose (9.15) were not true. Then there would exist a sequence {x_n} ⊂ D such that lim_{n→∞} V(x_n, ℓ) = 0. Because {x_n} is bounded, there exists a subsequence {x_{n_k}} such that x_{n_k} → x̃. Thus, by the continuity of V(·, ℓ), we have

V(x̃, ℓ) = lim_{k→∞} V(x_{n_k}, ℓ) = 0.

That is, (x̃, ℓ) ∈ Ker(V). This is a contradiction to the assumption that ℓ ∉ J. Thus (9.15) is true and hence V(ξ, ℓ) ≥ ρ_1.
Now let us consider the case ℓ ∈ J. It follows that δ ≤ d(ξ, N_ℓ) ≤ δ̃ < ∞. A similar argument by contradiction as in the previous case shows that

inf{V(x, ℓ) : δ ≤ d(x, N_ℓ) ≤ δ̃} ≥ ρ_2 > 0.

Thus it follows that V(ξ, ℓ) ≥ ρ_2. A combination of the two cases gives us V(ξ, ℓ) ≥ ρ, where ρ = ρ_1 ∧ ρ_2. Hence the claim follows.
Finally, we have from (9.13) and (9.14) that

P_{x,i}{τ < t} ≤ V(x, i)/ρ.

Letting t → ∞,

P_{x,i}{τ < ∞} ≤ V(x, i)/ρ.

Note that

{τ < ∞} = { sup_{0≤t<∞} d((X(t), α(t)), Ker(V)) ≥ δ }.

Therefore, it follows that

P_{x,i}{ sup_{0≤t<∞} d((X(t), α(t)), Ker(V)) ≥ δ } ≤ V(x, i)/ρ → 0,

as d((x, i), Ker(V)) → 0. This finishes the proof of the theorem. □
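The supermartingale inequality E_{x,i}V(X(t), α(t)) ≤ V(x, i) at the heart of the proof can be sanity-checked by Monte Carlo. The model below is a made-up scalar example (V(x, i) = x², linear drifts B_i x, noise s_i x, constant switching rates) chosen so that LV(x, i) = (2B_i + s_i²)x² ≤ 0 in both regimes; it is a numerical illustration, not part of the text's argument.

```python
import numpy as np

rng = np.random.default_rng(1)
B = [-1.0, -0.5]          # drift rates per regime, so LV <= 0 in both
s = [0.5, 0.3]            # noise rates per regime
lam = [2.0, 2.0]          # constant switching rates
dt, n, paths = 1e-3, 2000, 400
sdt = np.sqrt(dt)

vals = []
for _ in range(paths):
    x, a = 1.0, 0          # start at V(x, i) = 1
    for _ in range(n):
        if rng.random() < lam[a] * dt:     # regime switch
            a = 1 - a
        x += B[a] * x * dt + s[a] * x * rng.normal() * sdt
    vals.append(x * x)
mean_V = float(np.mean(vals))  # estimate of E[V(X(t), alpha(t))] at t = 2
```

Because LV ≤ 0 in every regime, the estimated mean of V along paths should not exceed its initial value V(x_0, i_0) = 1, up to Monte Carlo error.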


Next we consider asymptotic stability. To this end, we need the following
lemma.

Lemma 9.4. Assume that there exists a nonnegative function V : R^r × M → R_+ with nonempty and bounded Ker(V) such that for each α ∈ M, V(·, α) is twice continuously differentiable with respect to x, and that for any ε > 0,

LV(x, i) ≤ −κ_ε < 0, for any (x, i) ∈ (R^r × M) − Ū_ε,   (9.16)

where κ_ε is a positive constant depending on ε, U_ε is a neighborhood of Ker(V) as defined in (9.12), and Ū_ε denotes the closure of U_ε. Then for any 0 < ε < r_0, we have

P_{x,i}{τ_{ε,r_0} < ∞} = 1, for any (x, i) ∈ U_{ε,r_0},

where

U_{ε,r_0} = {(y, j) ∈ R^r × M : ε < d((y, j), Ker(V)) < r_0},

and τ_{ε,r_0} is the first exit time from U_{ε,r_0}; that is,

τ_{ε,r_0} := inf{t ≥ 0 : (X(t), α(t)) ∉ U_{ε,r_0}}.

Proof. Fix any (x, i) ∈ U_{ε,r_0}. By virtue of generalized Itô's lemma, we have that for any t ≥ 0,

V(X(t∧τ_{ε,r_0}), α(t∧τ_{ε,r_0})) = V(x, i) + ∫_0^{t∧τ_{ε,r_0}} LV(X(s), α(s)) ds + M(t∧τ_{ε,r_0}),

where M(t∧τ_{ε,r_0}) is a martingale with mean zero. Thus, by taking expectations on both sides and using (9.16), we obtain

E_{x,i}[V(X(t∧τ_{ε,r_0}), α(t∧τ_{ε,r_0}))] ≤ V(x, i) − E_{x,i} ∫_0^{t∧τ_{ε,r_0}} κ_ε ds = V(x, i) − κ_ε E_{x,i}[t∧τ_{ε,r_0}].

Note that V is nonnegative; hence we have

E_{x,i}[t∧τ_{ε,r_0}] ≤ V(x, i)/κ_ε.

But

E_{x,i}[t∧τ_{ε,r_0}] = t P_{x,i}{τ_{ε,r_0} > t} + E_{x,i}[τ_{ε,r_0} I_{{τ_{ε,r_0} ≤ t}}] ≥ t P_{x,i}{τ_{ε,r_0} > t}.

Thus it follows that

t P_{x,i}{τ_{ε,r_0} > t} ≤ V(x, i)/κ_ε.

Now letting t → ∞, we have

P_{x,i}{τ_{ε,r_0} = ∞} = 0; that is, P_{x,i}{τ_{ε,r_0} < ∞} = 1.

The assertion thus follows. □

Theorem 9.5. Assume that there exists a function V satisfying the con-
ditions of Lemma 9.4. Then Ker(V ) is an invariant set for the process
(X(t), α(t)) and Ker(V ) is asymptotically stable in probability.

Proof. Motivated by Mao and Yuan [120, Theorem 5.36], we use similar
ideas and proceed as follows. By virtue of Theorem 9.3, we know that
Ker(V ) is an invariant set for the process (X(t), α(t)) and that Ker(V ) is
stable in probability. Hence it remains to show that
P_{x,i}{ lim_{t→∞} d((X(t), α(t)), Ker(V)) = 0 } → 1 as d((x, i), Ker(V)) → 0.

Because Ker(V) is stable in probability, for any ε > 0 and any θ > 0, there exists some δ > 0 (without loss of generality, we may assume that δ < θ) such that

P_{x,i}{ sup_{t≥0} d((X(t), α(t)), Ker(V)) < θ } ≥ 1 − ε/2,   (9.17)

for any (x, i) ∈ U_δ, where U_δ is defined in (9.12). Now fix any (x, α) ∈ U_δ − Ker(V), let ρ > 0 be arbitrary satisfying 0 < ρ < d((x, α), Ker(V)), and choose some ϱ ∈ (0, ρ). Define

τ_ϱ := inf{t ≥ 0 : d((X(t), α(t)), Ker(V)) ≤ ϱ},
τ_θ := inf{t ≥ 0 : d((X(t), α(t)), Ker(V)) ≥ θ}.

Then it follows from Lemma 9.4 that

P_{x,α}{τ_ϱ ∧ τ_θ < ∞} = P_{x,α}{τ_{ϱ,θ} < ∞} = 1,   (9.18)

where τ_{ϱ,θ} is the first exit time from U_{ϱ,θ}, which is defined as

U_{ϱ,θ} := {(y, j) ∈ R^r × M : ϱ < d((y, j), Ker(V)) < θ}.

But (9.17) implies that P_{x,α}{τ_θ < ∞} ≤ ε/2. Note also

P_{x,α}{τ_ϱ ∧ τ_θ < ∞} ≤ P_{x,α}{τ_ϱ < ∞} + P_{x,α}{τ_θ < ∞}.

Thus it follows that

P_{x,α}{τ_ϱ < ∞} ≥ P_{x,α}{τ_ϱ ∧ τ_θ < ∞} − P_{x,α}{τ_θ < ∞} ≥ 1 − ε/2.   (9.19)
Now let

τ_ρ := inf{t ≥ τ_ϱ : d((X(t), α(t)), Ker(V)) ≥ ρ}.

We have used the convention that inf ∅ = ∞. For any t ≥ 0, we apply generalized Itô's lemma and (9.16) to obtain

E_{x,α} V(X(τ_ρ∧t), α(τ_ρ∧t)) ≤ E_{x,α} V(X(τ_ϱ∧t), α(τ_ϱ∧t)) + E_{x,α} ∫_{τ_ϱ∧t}^{τ_ρ∧t} LV(X(s), α(s)) ds   (9.20)
                              ≤ E_{x,α} V(X(τ_ϱ∧t), α(τ_ϱ∧t)).

Note that τ_ϱ ≥ t implies τ_ρ ≥ t, since ϱ ∈ (0, ρ). As a result, on the set {ω ∈ Ω : τ_ϱ(ω) ≥ t}, we have

E_{x,α} V(X(τ_ρ∧t), α(τ_ρ∧t)) = E_{x,α} V(X(t), α(t)) = E_{x,α} V(X(τ_ϱ∧t), α(τ_ϱ∧t)).   (9.21)

Therefore, it follows from (9.20) and (9.21) that

E_{x,α}[I_{{τ_ϱ<t}} V(X(τ_ρ∧t), α(τ_ρ∧t))] ≤ E_{x,α}[I_{{τ_ϱ<t}} V(X(τ_ϱ∧t), α(τ_ϱ∧t))] = E_{x,α}[I_{{τ_ϱ<t}} V(X(τ_ϱ), α(τ_ϱ))] ≤ V̂_ϱ,

where V̂_ϱ := sup{V(y, j) : d((y, j), Ker(V)) = ϱ}. Note that τ_ρ < t implies τ_ϱ < t. Hence we further have

V̂_ϱ ≥ E_{x,α}[I_{{τ_ϱ<t}} I_{{τ_ρ<t}} V(X(τ_ρ∧t), α(τ_ρ∧t))] = E_{x,α}[I_{{τ_ρ<t}} V(X(τ_ρ∧t), α(τ_ρ∧t))] = E_{x,α}[I_{{τ_ρ<t}} V(X(τ_ρ), α(τ_ρ))] ≥ V_ρ P_{x,α}{τ_ρ < t},

where V_ρ := inf{V(y, j) : ρ ≤ d((y, j), Ker(V)) ≤ ρ̃}, with ρ̃ > 0 being some constant. Recall that we showed in the proof of Theorem 9.3 that V_ρ > 0. Because V is continuous, we may choose ϱ sufficiently small so that

P_{x,α}{τ_ρ < t} ≤ V̂_ϱ / V_ρ ≤ ε/2.
9.3 Invariance (I): A Sample Path Approach 261

Letting t → ∞, we obtain

P_{x,α}{τ_ρ < ∞} ≤ ε/2.   (9.22)

Finally, it follows from (9.19) and (9.22) that

P_{x,α}{τ_ϱ < ∞, τ_ρ = ∞} ≥ P_{x,α}{τ_ϱ < ∞} − P_{x,α}{τ_ρ < ∞} ≥ 1 − ε/2 − ε/2 = 1 − ε.

This implies that

P_{x,α}{ lim sup_{t→∞} d((X(t), α(t)), Ker(V)) ≤ ρ } ≥ 1 − ε.

But ρ > 0 can be chosen arbitrarily small. Therefore we have

P_{x,α}{ lim_{t→∞} d((X(t), α(t)), Ker(V)) = 0 } ≥ 1 − ε.

This finishes the proof of the theorem. □


Theorem 9.6. Assume there exists a function V satisfying the conditions of Lemma 9.4. If V also satisfies

lim_{|x|→∞} inf_{α∈M} V(x, α) = ∞,   (9.23)

then Ker(V) is asymptotically stable in probability in the large; that is, Ker(V) is stable in probability and

P_{x,α}{ lim_{t→∞} d((X(t), α(t)), Ker(V)) = 0 } = 1,   (9.24)

for any (x, α) ∈ R^r × M.

Proof. By virtue of Theorem 9.3, Ker(V) is stable in probability. Thus it remains to verify (9.24). To this end, as in the proof of Theorem 9.3, write Ker(V) = ∪_{l=1}^{k} (N_{j_l} × {j_l}), where k ≤ m_0, N_{j_l} ⊂ R^r, and j_l ∈ M. Since Ker(V) is bounded by assumption, in particular ∪_{l=1}^{k} N_{j_l} is bounded, there exists some R > 0 such that

sup{ |y| : y ∈ ∪_{l=1}^{k} N_{j_l} } ≤ R.   (9.25)

Let ε > 0 and fix any (x, α) ∈ R^r × M. Then (9.23) implies that there exists some positive constant β > (R + 2) ∨ d((x, α), Ker(V)) such that

inf{V(y, j) : |y| ≥ β, j ∈ M} ≥ 2V(x, α)/ε.   (9.26)
Define

τ_β := inf{t ≥ 0 : d((X(t), α(t)), Ker(V)) ≥ 2β}.

For any t ≥ 0, we have by virtue of generalized Itô's lemma and (9.16) that

E_{x,α} V(X(t∧τ_β), α(t∧τ_β)) ≤ V(x, α).   (9.27)

We claim that |X(τ_β)| ≥ β. If this were not true, it would follow from (9.25) that for any (y, j) ∈ Ker(V),

d((X(τ_β), α(τ_β)), (y, j)) ≤ |X(τ_β) − y| + 1 ≤ |X(τ_β)| + |y| + 1 < β + R + 1 < 2β − 1,

where in the last inequality above we used the fact that β > R + 2. Then we have d((X(τ_β), α(τ_β)), Ker(V)) ≤ 2β − 1 < 2β. This contradicts the definition of τ_β. Thus we must have |X(τ_β)| ≥ β. Then it follows from (9.27) that

V(x, α) ≥ E_{x,α}[V(X(τ_β), α(τ_β)) I_{{τ_β<t}}] ≥ inf{V(y, j) : |y| ≥ β, j ∈ M} · P_{x,α}{τ_β < t},

and hence (9.26) implies that

P_{x,α}{τ_β < t} ≤ ε/2.

By letting t → ∞, we have

P_{x,α}{τ_β < ∞} ≤ ε/2.

Then we can finish the proof using the same argument as in the proof of Theorem 9.5. □
The following theorem provides a criterion for asymptotic stability with
probability 1.
Theorem 9.7. Suppose that there exists a nonnegative function V : U → R_+ such that for each α ∈ M, V(·, α) is twice continuously differentiable with respect to x, and that there exists a continuous function Ŵ : R^r × M → R_+ satisfying

LV(x, i) ≤ −Ŵ(x, i), for any (x, i) ∈ U,   (9.28)

where U ⊂ R^r × M is an invariant set for the process (X(t), α(t)). Assume also that either U is bounded or

lim_{|x|→∞, (x,α)∈U} V(x, α) = ∞.   (9.29)

Then for any initial condition (x, α) ∈ R^r × M, the following assertions hold.

(i) lim sup_{t→∞} V(X^{x,α}(t), α^{x,α}(t)) < ∞ a.s.;

(ii) Ker(Ŵ) ≠ ∅;

(iii) lim_{t→∞} d((X^{x,α}(t), α^{x,α}(t)), Ker(Ŵ)) = 0 a.s.; and

(iv) if, moreover, Ker(Ŵ) = {0} × M, then lim_{t→∞} X^{x,α}(t) = 0 a.s.

Proof. This theorem can be proved using the arguments in [117, Theorem 2.1], although some modifications are needed. □

9.3.2 Linear Systems


We end this section with the following results on linear systems. Again, by linear systems we mean systems that are linear in the x variable. Consider

dX(t) = b(α(t))X(t) dt + Σ_{j=1}^{d} σ_j(α(t))X(t) dw_j(t),   (9.30)

where b(i) and σ_j(i) are r × r constant matrices and the w_j(t) are independent one-dimensional standard Brownian motions, for i = 1, 2, . . . , m_0 and j = 1, 2, . . . , d.
Note that 0 is an equilibrium point for the system given by (9.30) and
(9.3). As we indicated earlier, it was shown in [92, 116] that the set Rr × M
is invariant with respect to the process (X(t), α(t)).
Theorem 9.8. Assume that the discrete component α(·) is ergodic with
constant generator Q = (qij ) and invariant distribution π = (π1 , . . . , πm0 ) ∈
R1×m0 . Then the equilibrium point x = 0 of the system given by (9.30) and
(9.3)
(i) is asymptotically stable with probability one if

Σ_{i=1}^{m_0} π_i λ_max( b(i) + b′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) ) < 0;   (9.31)

(ii) is unstable in probability if

Σ_{i=1}^{m_0} π_i λ_min( b(i) + b′(i) + Σ_{j=1}^{d} [ σ_j(i)σ_j′(i) − (1/2) ρ²(σ_j(i) + σ_j′(i)) ] ) > 0.   (9.32)
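Criterion (9.31) is straightforward to evaluate numerically. The function below computes its left-hand side for given data; the two-regime example (an unstable drift balanced by a strongly stable one, with equal stationary weights) is a hypothetical illustration, not data from the text.

```python
import numpy as np

def stability_criterion(pi, b, sigmas):
    """Left-hand side of (9.31): sum_i pi_i * lambda_max(b(i) + b(i)' +
    sum_j sigma_j(i) sigma_j(i)'); a negative value gives a.s. asymptotic
    stability of x = 0 by Theorem 9.8(i)."""
    total = 0.0
    for i, pi_i in enumerate(pi):
        S = b[i] + b[i].T
        for sj in sigmas[i]:
            S = S + sj @ sj.T
        total += pi_i * np.linalg.eigvalsh(S).max()
    return total

# two regimes: regime 0 has an unstable drift, regime 1 a strongly stable one
pi = [0.5, 0.5]
b = [np.array([[0.2, 0.0], [0.0, 0.2]]),
     np.array([[-1.0, 0.0], [0.0, -1.0]])]
sigmas = [[0.1 * np.eye(2)], [0.1 * np.eye(2)]]
lhs = stability_criterion(pi, b, sigmas)   # negative: averaged stability wins
```

The example makes the averaging effect of the criterion visible: neither regime needs to be stable on its own as long as the π-weighted combination is.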
Proof. We need only prove assertion (i), because assertion (ii) was considered in Theorem 7.17 in Chapter 7. For notational simplicity, define the column vector

µ = (µ_1, µ_2, . . . , µ_{m_0})′ ∈ R^{m_0}

with

µ_i = (1/2) λ_max( b(i) + b′(i) + Σ_{j=1}^{d} σ_j(i)σ_j′(i) ).

Also let β := −πµ. Note that β > 0 by (9.31). As in Chapter 7, it follows that the equation

Qc = µ + β 1l

has a solution c = (c_1, c_2, . . . , c_{m_0})′ ∈ R^{m_0}. Thus we have

µ_i − Σ_{j=1}^{m_0} q_ij c_j = −β,   i ∈ M.   (9.33)
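Although Q is singular (its rows sum to zero), Qc = µ + β1l is solvable: the right-hand side satisfies π(µ + β1l) = πµ + β = 0 with β = −πµ, so it lies in the range of Q. A least-squares solve exhibits one such c; the generator and µ below are toy values chosen for illustration.

```python
import numpy as np

Q = np.array([[-2.0, 2.0],
              [1.0, -1.0]])
pi = np.array([1.0, 2.0]) / 3.0          # stationary distribution: pi Q = 0
mu = np.array([-1.0, 0.2])
beta = -pi @ mu                           # beta = 0.2 > 0 here
rhs = mu + beta * np.ones(2)
# least-squares solve; the system is consistent, so the residual vanishes
c, *_ = np.linalg.lstsq(Q, rhs, rcond=None)
residual = np.linalg.norm(Q @ c - rhs)
```

Any solution c then satisfies (9.33), i.e. µ_i − Σ_j q_ij c_j = −β for each i, which is exactly what the Liapunov construction below needs.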

For each i ∈ M, consider the Liapunov function

V (x, i) = (1 − γci )|x|γ ,

where 0 < γ < 1 is sufficiently small so that 1 − γci > 0 for each i ∈ M.
It is readily seen that for each i ∈ M, V (·, i) is continuous, nonnegative,
vanishes only at x = 0, and satisfies (9.29). Detailed calculations as in the
proof of Theorem 7.17 in Chapter 7 reveal that for x 6= 0, we have
n x0 b(i)x X cj − c i
LV (x, i) = γ(1 − γci )|x|γ 2
− qij
|x| 1 − γci
j6=i
 (9.34)
d
1 X x σj (i)σj (i)x
0 0
(x0 σj0 (i)x)2 o
+ + (γ − 2) .
2 j=1 |x|2 |x|4
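For completeness, here is the short Itô computation behind (9.34) for V(x, i) = (1 − γc_i)|x|^γ; the display is a sketch of the intermediate steps, written in LaTeX:

```latex
% gradient and Hessian of (1-\gamma c_i)|x|^\gamma
\nabla V(x,i) = \gamma(1-\gamma c_i)|x|^{\gamma-2}x,
\qquad
\nabla^2 V(x,i) = \gamma(1-\gamma c_i)\bigl(|x|^{\gamma-2}I
                  + (\gamma-2)|x|^{\gamma-4}xx'\bigr).
% with a(x,i) = \sum_{j=1}^d \sigma_j(i)\,x x'\,\sigma_j'(i), the diffusion part is
\tfrac12 \mathrm{tr}\bigl(a(x,i)\nabla^2 V(x,i)\bigr)
  = \frac{\gamma(1-\gamma c_i)|x|^{\gamma}}{2}\sum_{j=1}^{d}
    \Bigl[\frac{x'\sigma_j'(i)\sigma_j(i)x}{|x|^{2}}
          + (\gamma-2)\frac{(x'\sigma_j'(i)x)^{2}}{|x|^{4}}\Bigr],
% and the switching part is
QV(x,\cdot)(i) = -\gamma|x|^{\gamma}\sum_{j\ne i} q_{ij}(c_j-c_i)
  = \gamma(1-\gamma c_i)|x|^{\gamma}
    \Bigl(-\sum_{j\ne i} q_{ij}\,\frac{c_j-c_i}{1-\gamma c_i}\Bigr).
```

Adding the drift term γ(1 − γc_i)|x|^γ · x′b(i)x/|x|² to the diffusion and switching parts recovers (9.34).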

Note that

x′b(i)x/|x|² + (1/2) Σ_{j=1}^{d} x′σ_j′(i)σ_j(i)x/|x|²
  = x′(b′(i) + b(i))x/(2|x|²) + (1/2) Σ_{j=1}^{d} x′σ_j′(i)σ_j(i)x/|x|²   (9.35)
  ≤ (1/2) λ_max( b(i) + b′(i) + Σ_{j=1}^{d} σ_j′(i)σ_j(i) ) = µ_i.

Next, it follows that when γ is sufficiently small,

Σ_{j≠i} q_ij (c_j − c_i)/(1 − γc_i) = Σ_{j=1}^{m_0} q_ij c_j + γ Σ_{j≠i} q_ij c_i(c_j − c_i)/(1 − γc_i) = Σ_{j=1}^{m_0} q_ij c_j + O(γ).   (9.36)
Hence it follows from (9.33)–(9.36) that when 0 < γ < 1 is sufficiently small, we have

LV(x, i) ≤ γ(1 − γc_i)|x|^γ { µ_i − Σ_{j=1}^{m_0} q_ij c_j + O(γ) }
         = γ(1 − γc_i)|x|^γ ( −β + O(γ) )
         ≤ −(β/2) γ(1 − γc_i)|x|^γ := −Ŵ(x, i).

Note that Ker(Ŵ) = {0} × M. Thus, we conclude from Theorem 9.7 that the equilibrium point x = 0 is asymptotically stable with probability 1. □

9.4 Invariance (II): A Measure-Theoretic Approach


This section aims to study the invariance principles by adopting the idea
of Kushner [98]; see also the related work [99, 100]. The motivation stems
from the study of deterministic system ẋ = b(x). In this case, for a suitable
Liapunov function V (x) and a suitable function k(x), let Oλ = {x : V (x) <
λ} and denote Z = {x ∈ Oλ : k(x) = 0}. In his well-known treatment of the
invariance theory, LaSalle [106] showed that if b(·) is continuous and V (·)
is a continuously differentiable function being nonnegative in the bounded
open set Oλ satisfying V̇ (x) = −k(x) ≤ 0, where k(·) is continuous in Oλ ,
then x(t) tends to ZI , the largest invariant set contained in Z as t → ∞.
Note that for the invariance, we often work with the time interval (−∞, ∞)
rather than the usual [0, ∞). As commented by Hale and Infante in [62],
working on [0, ∞) “at first sight to be a reasonable definition; however,
this definition does not impart any special significance to the limit set of
an orbit and appears unreasonable since it generally occurs that trajectories
having limits can be used to define functions on (−∞, ∞).”
In deterministic systems, one usually assumes the boundedness of the trajectories, whereas in the stochastic setup, this boundedness can no longer be assumed. It is replaced by certain weak boundedness (a precise definition
is provided later). The stochastic counterpart of the invariance principle is
based upon a set of measures that have the semigroup property. For our
case, consider the two-component Markov process Y (t) = (X(t), α(t)) in
which we are interested. Roughly, the idea of invariant set from a measure-
theoretic view point can be described as follows. A set of measures S is an
invariant set, if for any ψ ∈ S, there is a process Y (t) for t ∈ (−∞, ∞)
with measure ψ(t) so that ψ(0) = ψ and ψ(t) ∈ S. Thus if ψ ∈ S, so
is an entire trajectory of measures over (−∞, ∞). All of these are made
more precise in what follows. The main point is that the state of the flow
of the process is the measure, analogous to the state of the deterministic
model. To proceed, we first state a result whose proof is essentially in [97]. We recall that the system is stable with respect to (O_1, O_2, ρ) (or stable relative to (O_1, O_2, ρ)) if, for each i ∈ M, x ∈ O_1 implies that P_{x,i}(X(t) ∈ O_2 for all t < ∞) ≥ ρ.

Proposition 9.9. Let Y(t) = (X(t), α(t)) be a switching diffusion process on R^r × M. For each i ∈ M, let V(x, i) be a nonnegative continuous function, O_m = {x : V(x, i) < m} be a bounded set, and

τ = inf{t : X(t) ∉ O_m}.   (9.37)

Suppose that for each i ∈ M, LV(x, i) ≤ −k(x) ≤ 0, where k(·) is continuous in O_m. Denote Ô_m = O_m ∩ {x : k(x) = 0}. Suppose that

P_{x,i}{ sup_{0≤s≤t} |X(s) − x| ≥ η } → 0 as t → 0,

for any η > 0 and for each x ∈ Ō_m. Then X(t) → Ô_m with probability at least 1 − V(x, i)/m.
Remark 9.10. Recall that an η-neighborhood of a set M̃ relative to an open set O is an open set N_η(M̃) ⊂ O such that

N_η(M̃) = {x ∈ O : |x − y| < η for some y ∈ M̃}.

In addition, N̄_η(M̃) = N_η(M̃) + ∂N_η(M̃) is the closure of N_η(M̃). In our case, k(x) is uniformly continuous on Ō_m. If k(x) > 0 for some x ∈ O_m, then for some d_0 > 0 and for each 0 < d < d_0, there is an η_d > 0 such that for the η_d-neighborhood N_{η_d}(Ô_m) of Ô_m relative to O_m, k(x) ≥ d > 0 on O_m − N_{η_d}(Ô_m). Before proceeding to the proof of Proposition 9.9, we establish a lemma below.

Lemma 9.11. In addition to the conditions of Proposition 9.9, assume also that V(0, i) = 0 for each i ∈ M, for some m > 0, and for any d_1 > 0 satisfying d_1 ≤ V(x, i) ≤ m. Then the switching diffusion is stable with respect to (O_{d_1}, O_m, 1 − d_1/m). In addition, for almost all ω ∈ Ω_m = {ω : X(t) ∈ O_m for all t < ∞}, there is a c(ω) satisfying 0 ≤ c(ω) ≤ m such that V(X(t∧τ_m), α(t∧τ_m)) → c(ω), where τ_m = inf{t : X(t) ∉ O_m} is the first exit time from O_m.

Idea of Proof of Lemma 9.11. By Dynkin's formula,

E_{x,i} V(X(t∧τ_m), α(t∧τ_m)) − V(x, i) = E_{x,i} ∫_0^{t∧τ_m} LV(X(s), α(s)) ds ≤ 0.

Then E_{x,i} V(X(t∧τ_m), α(t∧τ_m)) ≤ V(x, i). Thus the stopped process is a supermartingale. We have

P_{x,i}( sup_{0≤t<∞} V(X(t∧τ_m), α(t∧τ_m)) ≥ λ ) ≤ V(x, i)/λ.
Thus, P_{x,i}(ω ∈ Ω_m) ≥ 1 − V(x, i)/m. By virtue of the supermartingale convergence theorem, there exists a c(ω) such that V(X(t∧τ_m), α(t∧τ_m)) → c(ω) a.s. Moreover, the structure of the sets O_m, Ô_m, and Õ_m implies that 0 ≤ c(ω) ≤ m.

Sketch of Proof of Proposition 9.9. The proof, in fact, is almost the same as that of [97, Theorem 2]. If k(x) ≡ 0 in O_m, the result is evident as a consequence of Lemma 9.11. Moreover, by Lemma 9.11, P(X(t) ∈ O_m^0) ≥ 1 − V(x, i)/m, where O_m^0 denotes the interior of O_m.
For each d_1 and d_2 > 0, and d_0 given in Remark 9.10 satisfying d_0 > d_1 > d_2, let η_ℓ be such that k(x) ≥ d_ℓ for x ∈ O_m − N_{η_ℓ}(Ô_m). Without loss of generality, N_{η_2}(Ô_m) is a proper subset of N_{η_1}(Ô_m). Define

χ(s) = I_{x,i}(s, ω, η_ℓ) = 1 if X(s) ∈ O_m − N_{η_ℓ}(Ô_m), and 0 otherwise.

Define

T(t, η_ℓ) = T_{x,i}(t, η_ℓ) = ∫_{t∧τ_m}^{τ_m} χ(s) ds.

That is, T(t, η_ℓ) is the total time that the process spends in O_m − N_{η_ℓ}(Ô_m) during [t∧τ_m, τ_m); in particular, T(t, η_ℓ) = 0 if τ_m < t. Using Lemma 9.11,

Px,i (X(t) leaves Om at least once before t = ∞)


V (x, i)
= 1 − Px,i (Ωm ) ≤ .
m
bm ),
Again, by Dynkin’s formula, for any x ∈ Om − Nη` (O

V (x, i) − Ex,i V (X(t ∧ τm ), α(t ∧ τm ))


Z t∧τm
= − Ex,i LV (X(s), α(s))ds
Z 0t∧τm
≥ γEx,i Ix,i (s)ds = γEx,i (t ∧ τm ),
0

where Ix,i (s) is the indicator of the set of (s, ω)s where LV (x, i) ≤ −γ.
Using the nonnegativity of V(·) and taking the limit as t → ∞, we obtain E_{x,i} τ_m < V(x, i)/γ. Thus, T_{x,i}(t, η_ℓ) < ∞ a.s. and T_{x,i}(t, η_ℓ) → 0 as t → ∞. There are only two possibilities: (i) there is a random variable τ(η_1) < ∞ a.s. such that for all t > τ(η_1), X(t) ∈ N_{η_1}(Ô_m) with probability at least 1 − V(x, i)/m (this can also be represented as X(t) ∈ N_{η_1}(Ô_m) a.s. relative to Ω_m); (ii) for ω ∈ Ω_m, X(t) moves from N_{η_2}(Ô_m) back and forth to O_m − N_{η_1}(Ô_m) infinitely often in any interval [t, ∞). We demonstrate that the second alternative cannot happen.

Consider (ii). Since T_{x,i}(t, η_ℓ) → 0 a.s. as t → ∞, there are infinitely many movements from N_{η_2}(Ô_m) to O_m − N_{η_1}(Ô_m) and back to N_{η_2}(Ô_m) in a total time that is arbitrarily small. We claim that the probability of the second alternative is 0.
For any ∆_1 > 0, choose t̃ > 0 such that

sup_{x ∈ O_m − N_{η_2}(Ô_m)} P_{x,i}( sup_{0≤s≤t̃} |X(s) − x| ≥ η_1 − η_2 ) < ∆_1.   (9.38)

Owing to the stochastic continuity and the compactness of Ō_m (the closure of O_m), for each ∆_2 > 0, there is a t̃ < ∞ such that P_{x,i}(T_{x,i}(t, η_2) > t̃) < ∆_2. This is a contradiction to (9.38). Thus

P_{x,i}(X(s) ∈ O_m − N_{η_1}(Ô_m) for some s ∈ (t, ∞)) → 0 as t → ∞,

for any x ∈ O_m. The arbitrariness of η_ℓ then implies the result. □
Remark 9.12. Denote D_{η_ℓ} = {ω : ω ∈ Ω_m and ∫_0^∞ χ_ℓ(s) ds < ∞}. Without assuming the boundedness of O_m and Ô_m, require that for any ε > 0, P_{x,i}{|X^{x,i}(t)| → ∞, X(t) ∈ O_m, D_{η_ℓ}} = 0. Then the conclusions of Proposition 9.9 continue to hold. The proof of this can be carried out as in [97, Theorem 3]. We omit the details for brevity.
To continue, we first set up some notation and conventions. Consider
the two-component Markov process Y (t) = (X(t), α(t)) with the generator
given by (9.1) or equivalently with the dynamics specified by (9.2) and (9.3)
with initial condition (X(0), α(0)) = (X0 , α0 ) ∈ Rr × M being possibly
random. The process takes values in Rr × M.
Suppose that m(t, ϕ, dx × i) is the measure induced on B(R^r × M), the
Borel sets of R^r × M, at time t such that

    ϕ = m(0, ϕ, dx × i) is the measure induced at t = 0,
    m(t + s, ϕ, dx × i) = m(t, m(s, ϕ), dx × i),  t, s ≥ 0.        (9.39)

It is the measure of the process at time t, given "initial data" ϕ. The second
line of (9.39) is just the semigroup property. In the above, we continue to
use the convention mentioned in Remark 6.6; that is, m(t, ϕ, dx × i) is
meant to be m(t, ϕ, dx × {i}).
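For a pure jump process on a finite state space (no diffusion component), the measure flow and the semigroup property in (9.39) can be checked concretely: the law at time t is m(t, ϕ) = ϕ e^{Qt}. A minimal numerical sketch follows; the generator Q and initial law ϕ below are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

def expm(A, terms=40):
    """Truncated power series for the matrix exponential e^A
    (adequate for this small example; production code would use scipy.linalg.expm)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])       # an illustrative generator
phi = np.array([0.2, 0.3, 0.5])        # initial distribution, m(0, phi)

def m(t, phi):
    """Distribution at time t started from phi: phi e^{Qt}."""
    return phi @ expm(Q * t)

t, s = 0.7, 1.3
lhs = m(t + s, phi)                    # m(t + s, phi)
rhs = m(t, m(s, phi))                  # m(t, m(s, phi))
print(np.max(np.abs(lhs - rhs)))       # ~0: the semigroup property holds
```

The same identity is what (9.39) asserts for the general switching diffusion, with the matrix exponential replaced by the transition semigroup of (X(t), α(t)).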
Let C_b be the space of bounded and continuous functions on R^r × M. Let
M be the collection of all measures on (R^r × M, B(R^r × M)). A sequence
of measures {ψ_n} converges to ψ weakly in M if, as n → ∞,

    Σ_{i=1}^{m_0} ∫_{R^r} h(x, i)ψ_n(dx × i) → Σ_{i=1}^{m_0} ∫_{R^r} h(x, i)ψ(dx × i),  for all h ∈ C_b.

A set S ⊂ M is weakly bounded, or tight, if for any η > 0 and each i ∈ M,
there is a compact set K_η ⊂ R^r such that

    ψ((R^r − K_η) × {i}) < η  for all ψ ∈ S.
9.4 Invariance (II): A Measure-Theoretic Approach 269

Remark 9.13. In the above and henceforth, for simplicity, we use the
phrase "f ∈ C_b" for a function f defined on R^r × M. In fact, because α(t)
takes values in a finite set M, for each x, f(x, ·) is trivially continuous.
Thus the requirement f ∈ C_b can be rephrased as "for each i ∈ M,
f(·, i) ∈ C_b." In all subsequent development, when we say f ∈ C_b, it is
understood in this sense.

9.4.1 ω-Limit Sets and Invariant Sets


Definition 9.14. A measure ψ ∈ M is in the ω-limit set Ω̃(ϕ) (with
ϕ given by (9.39)) if there is a sequence of real numbers {t_n} satisfying
t_n → ∞ such that

    Σ_{j∈M} ∫_{R^r} f(y, j)[m(t_n, ϕ, dy × j) − ψ(dy × j)] → 0 as n → ∞

for each f ∈ C_b.

Definition 9.15. A set S ⊂ M is an invariant set if for each ψ ∈ S, there
is a function m̃(s, ψ, dx × j) defined for all s ∈ (−∞, ∞) and taking values
in S such that

(1) m̃(0, ψ, dx × j) = ψ(dx × j),

(2) m̃(t + s, ψ, dx × j) = m(t, m̃(s, ψ), dx × j) for any s ∈ (−∞, ∞) and
    t ∈ [0, ∞).

Note that m̃(t, ψ, dx × j) is the measure of the process. The variable s
taking values in (−∞, ∞) is consistent with the definitions used in dynamical
systems for the flows of deterministic problems. The idea is that, for any
measure ψ to be an element of the invariant set, there must be a flow defined
for s ∈ (−∞, ∞) that takes the value ψ at time zero. As commented upon at the
beginning of this section, the time interval is two-sided. We aim to obtain
the invariance principle using measure concepts. The result is spelled out
in the following theorem.

Theorem 9.16. Suppose that

(a) for each i ∈ M, ∪_{t∈[0,∞)} m(t, ϕ, dx × i) is weakly bounded;

(b) on any finite time interval,

    Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ, dx × j) is a continuous function of t,

    uniformly in ϕ, for ϕ in any weakly bounded set of M;

(c) for each f(·) ∈ C_b and any weakly bounded sequence {ϕ_n} converging
    weakly to ϕ, we have

    Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ_n, dx × j) → Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ, dx × j).

Then (i) Ω̃(ϕ) ≠ ∅, (ii) Ω̃(ϕ) is weakly bounded, (iii) Ω̃(ϕ) is an invariant
set, and (iv) for each f ∈ C_b,

    inf_{ψ∈Ω̃(ϕ)} | Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)[m(t, ϕ, dx × j) − ψ(dx × j)] | → 0, as t → ∞.

Proof. We divide the proof into several steps.

Step 1. It is well known that on a complete separable metric space,
weak boundedness is equivalent to sequential compactness; see [16, p. 37],
[43, p. 103], and so on. Consequently, the ω-limit set Ω̃(ϕ) ≠ ∅.
Step 2. For some sequence {t_n} with t_n → ∞ as n → ∞, some ψ ∈ M,
and all f(·, j) ∈ C_b for each j ∈ M, suppose that

    Σ_{j=1}^{m_0} ∫ f(x, j)m(t_n, ϕ, dx × j) → Σ_{j=1}^{m_0} ∫ f(x, j)ψ(dx × j) as n → ∞.

Let K be a compact set in R^r. Denote by C_K the restriction of C_b to K.
Then C_K is a complete separable metric space equipped with the sup-norm
restricted to K. Let T_K be a countable dense set in C_K. For each i ∈ M
and each f(·, i) ∈ T_K, define

    f̃_n(t; f) = Σ_{j∈M} ∫_K f(x, j)m(t_n + t, ϕ, dx × j).

For some T > 0, consider the family of functions

    { f̃_n(t; f) : f ∈ T_K, t ∈ [−T, T], t_n ≥ T },

which is uniformly bounded. By virtue of condition (b) of the theorem,
this is an equicontinuous family. The well-known Arzelà–Ascoli lemma implies
that we can extract a subsequence {f̃_{n_k}(t; f)} that converges uniformly
to f̃(t; f) on [−T, T]. By using the diagonal process and repeatedly
selecting subsequences, we can extract a further subsequence {f̃_{n_ℓ}(t; f)}
that converges uniformly to f̃(t; f) on any finite interval in (−∞, ∞).
Moreover, because T_K is dense in C_K, the aforementioned convergence takes
place for any f ∈ C_K. The semigroup property of

m(t, ϕ, dx × i) and the weak convergence of

    f̃_n(t; f) = Σ_{j∈M} ∫_K f(x, j)m(t_n + t, ϕ, dx × j)
              = Σ_{j∈M} ∫_K f(x, j)m(t, m(t_n, ϕ), dx × j)

for each t ∈ (−∞, ∞) and each f(·, i) ∈ C_K, i ∈ M, together with the weak
boundedness of {m(t_n + t, ϕ, dx × i)}, imply that the restriction
of {m(t_n + t, ϕ)} to (K, B((R^r ∩ K) × M)) converges weakly to a measure
ψ_K(t) for each t ∈ (−∞, ∞).
Step 3. By the weak boundedness of {m(t, ϕ)}, there exist a sequence
η_i → 0 and a sequence of compact sets {K_i} in R^r satisfying K_i ⊂ K_{i+1}
such that for each j ∈ M,

    m(t, ϕ, K_i^c × {j}) ≤ η_i,        (9.40)

where K_i^c × {j} = (R^r − K_i) × {j}. By means of a diagonal process, we
can find a subsequence {t_n} and a sequence of measures {ψ_{K_i}(t, · × ·) : t ∈
(−∞, ∞)} such that for each f(·) ∈ C_b,

    Σ_{j∈M} ∫_{K_i} f(x, j)m(t_n + t, ϕ, dx × j) → Σ_{j∈M} ∫_{K_i} f(x, j)ψ_{K_i}(t, dx × j).        (9.41)

For k > i,

    Σ_{j∈M} ∫_{K_i} f(x, j)ψ_{K_i}(t, dx × j) = Σ_{j∈M} ∫_{K_i} f(x, j)ψ_{K_k}(t, dx × j).        (9.42)

By (9.40)–(9.42), there is a measure ψ(·) such that for all t ∈ (−∞, ∞) and
all f(·) ∈ C_b,

    Σ_{j=1}^{m_0} ∫ f(x, j)m(t_n + t, ϕ, dx × j) → Σ_{j=1}^{m_0} ∫ f(x, j)ψ(t, dx × j).

Noting ψ(0) = ψ and using condition (c) of the theorem, for each
f ∈ C_b, s ≥ 0, and t ∈ (−∞, ∞),

    Σ_{j=1}^{m_0} ∫ f(x, j)m(s, m(t_n + t, ϕ), dx × j)
        → Σ_{j=1}^{m_0} ∫ f(x, j)m(s, ψ(t), dx × j) as n → ∞,

and

    Σ_{j=1}^{m_0} ∫ f(x, j)m(s, m(t_n + t, ϕ), dx × j)
        = Σ_{j=1}^{m_0} ∫ f(x, j)m(0, m(t_n + t + s, ϕ), dx × j)
        → Σ_{j=1}^{m_0} ∫ f(x, j)m(0, ψ(t + s), dx × j)
        = Σ_{j=1}^{m_0} ∫ f(x, j)ψ(t + s, dx × j).

This shows that {ψ(t, dx × j)} is an invariant set.


Step 4. Suppose there were a sequence {t_n} such that for any subsequence
{t_{n_k}} and some f(·) ∈ C_b,

    lim sup_{k→∞} inf_{ψ∈Ω̃(ϕ)} | Σ_{j=1}^{m_0} ∫ f(x, j)m(t_{n_k}, ϕ, dx × j)
        − Σ_{j=1}^{m_0} ∫ f(x, j)ψ(dx × j) | > 0.

The weak boundedness of {m(t_{n_k}, ϕ)} implies that there is a subsequence
that converges weakly to some ψ ∈ M. However, ψ ∈ Ω̃(ϕ), which is a
contradiction. Thus the desired result follows. 2

Proposition 9.17. If (X(t), α(t)) is Feller, then condition (c) of Theorem
9.16 holds. That is, for any t ∈ [0, T] with T > 0 and any weakly
bounded sequence {ϕ_n} such that ϕ_n converges weakly to ϕ as n → ∞, we
have

    Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ_n, dx × j) → Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ, dx × j),

for any f ∈ C_b as n → ∞.

Proof. Fix any t ∈ [0, T]. For any f ∈ C_b, let h(x, i) = E_{x,i} f(X(t), α(t)).
Then by virtue of the Feller property, h is continuous and bounded. Note
that

    Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ_n, dx × j) = Σ_{i=1}^{m_0} ∫_{R^r} E_{x,i} f(X(t), α(t))ϕ_n(dx × i)
                                                   = Σ_{i=1}^{m_0} ∫_{R^r} h(x, i)ϕ_n(dx × i).

Because ϕ_n converges weakly to ϕ and h ∈ C_b, it follows that

    lim_{n→∞} Σ_{i=1}^{m_0} ∫_{R^r} h(x, i)ϕ_n(dx × i) = Σ_{i=1}^{m_0} ∫_{R^r} h(x, i)ϕ(dx × i).

But

    Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ, dx × j) = Σ_{i=1}^{m_0} ∫_{R^r} E_{x,i} f(X(t), α(t))ϕ(dx × i)
                                                 = Σ_{i=1}^{m_0} ∫_{R^r} h(x, i)ϕ(dx × i).

Thus we conclude that

    lim_{n→∞} Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ_n, dx × j) = Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ, dx × j).

This finishes the proof. 2

Proposition 9.18. Assume that

(i) (X(t), α(t)) is Feller;

(ii) (X(t), α(t)) is continuous in probability uniformly in t ∈ [0, T], where
     T > 0;

(iii) ∪_{t≥0} m(t, ϕ) is weakly bounded.

Then condition (b) of Theorem 9.16 holds. That is, for any f ∈ C_b, the
function

    t ↦ Σ_{j=1}^{m_0} ∫_{R^r} f(x, j)m(t, ϕ, dx × j)

is continuous, uniformly in ϕ for ϕ in any weakly bounded set of M.

Proof. The proof is divided into several steps.

Step 1. Let f ∈ C_b. Then the weak boundedness, the continuity of f,
and the stochastic continuity of (X(t), α(t)) imply that f(X(t), α(t)) is
continuous in probability in t.

Step 2. Next we show that the function t ↦ E_{x,i} f(X(t), α(t)) is continuous
at any fixed finite t for any (x, i) ∈ R^r × M. In fact, for any fixed
finite t, any (x, i) ∈ R^r × M, and any ε > 0, by virtue of Step 1 we can
find some ∆ = ∆(x, i) > 0 such that

    P_{x,i}{|f(X(t + s), α(t + s)) − f(X(t), α(t))| > ε/6} < ε/(12‖f‖)

for |s| ≤ ∆, where ‖f‖ denotes the sup-norm of f. Then it follows that for
|s| ≤ ∆,

    |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))|
        ≤ E_{x,i} |f(X(t + s), α(t + s)) − f(X(t), α(t))|
        = E_{x,i} [ |f(X(t + s), α(t + s)) − f(X(t), α(t))| (I_1 + I_2) ]        (9.43)
        ≤ ε/6 + 2‖f‖ · ε/(12‖f‖)
        ≤ ε/3,

where I_1 = I_{{|f(X(t+s),α(t+s))−f(X(t),α(t))|≤ε/6}} and I_2 = 1 − I_1. Thus the
function E_{x,i} f(X(t), α(t)) is continuous at t.
Step 3. We claim that the function t ↦ E_{x,i} f(X(t), α(t)) is continuous
at any fixed finite t, uniformly for (x, i) ∈ K × M, where K is any compact
subset of R^r. In fact, because the process (X(t), α(t)) is Feller, the function
x ↦ E_{x,i} f(X(t), α(t)) is continuous, and hence uniformly continuous on K.
Moreover, Proposition 2.30 enables us to conclude that the function
x ↦ E_{x,i} f(X(t), α(t)) is continuous, uniformly in any finite t interval.
Thus for any ε > 0, there exists a δ > 0 such that

    |E_{x_1,i} f(X(t), α(t)) − E_{x_2,i} f(X(t), α(t))| < ε/3,        (9.44)

for any x_1, x_2 ∈ K with |x_1 − x_2| < δ. Because K is compact, we can cover
K by a finite union of balls; that is, there exist a positive integer N and
x_1, . . . , x_N ∈ K such that K ⊂ ∪_{k=1}^N B(x_k, δ). It follows from (9.43) that
for each k = 1, 2, . . . , N and each i ∈ M, there exists a ∆_{k,i} = ∆(x_k, i)
such that

    |E_{x_k,i} f(X(t + s), α(t + s)) − E_{x_k,i} f(X(t), α(t))| < ε/3,        (9.45)

for all |s| ≤ ∆_{k,i}. Let ∆ = min{∆_{k,i} : k = 1, . . . , N, i ∈ M}. Note that for
any x ∈ K, x ∈ B(x_k, δ) for some k = 1, . . . , N. Hence for any i ∈ M and
any |s| ≤ ∆, it follows from (9.44) and (9.45) that

    |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))|
        ≤ |E_{x,i} f(X(t + s), α(t + s)) − E_{x_k,i} f(X(t + s), α(t + s))|
          + |E_{x_k,i} f(X(t + s), α(t + s)) − E_{x_k,i} f(X(t), α(t))|
          + |E_{x_k,i} f(X(t), α(t)) − E_{x,i} f(X(t), α(t))|
        ≤ ε/3 + ε/3 + ε/3 = ε.

The claim thus follows.

Step 4. Now for any compact set K ⊂ R^r and any f ∈ C_b, we have

    | Σ_{j=1}^{m_0} ∫_{R^r} f(y, j)m(t + s, ϕ, dy × j) − Σ_{j=1}^{m_0} ∫_{R^r} f(y, j)m(t, ϕ, dy × j) |
        = | Σ_{i=1}^{m_0} ∫_{R^r} (E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))) ϕ(dx × i) |
        ≤ Σ_{i=1}^{m_0} ∫_K |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))| ϕ(dx × i)
          + Σ_{i=1}^{m_0} ∫_{K^c} |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))| ϕ(dx × i).

Let S ⊂ M be weakly bounded. Fix t ≥ 0 and ε > 0. There is a compact
K_0 ⊂ R^r such that ϕ(K_0^c × M) < ε/(4‖f‖) for all ϕ ∈ S. By virtue of Step
3, there is a ∆ > 0 such that

    |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))| < ε/2,

for all |s| ≤ ∆ and all (x, i) ∈ K_0 × M. Hence it follows that for any ϕ ∈ S
and any |s| ≤ ∆, we have

    | Σ_{j=1}^{m_0} ∫_{R^r} f(y, j)m(t + s, ϕ, dy × j) − Σ_{j=1}^{m_0} ∫_{R^r} f(y, j)m(t, ϕ, dy × j) |
        ≤ Σ_{i=1}^{m_0} ∫_{K_0} |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))| ϕ(dx × i)
          + Σ_{i=1}^{m_0} ∫_{K_0^c} |E_{x,i} f(X(t + s), α(t + s)) − E_{x,i} f(X(t), α(t))| ϕ(dx × i)
        ≤ (ε/2) Σ_{i=1}^{m_0} ∫_{K_0} ϕ(dx × i) + 2‖f‖ Σ_{i=1}^{m_0} ∫_{K_0^c} ϕ(dx × i)
        ≤ ε/2 + 2‖f‖ · ε/(4‖f‖)
        = ε.

This shows that the function t ↦ Σ_{j=1}^{m_0} ∫_{R^r} f(y, j)m(t, ϕ, dy × j) is
continuous, uniformly in ϕ ∈ S, a weakly bounded set of M. 2

9.4.2 Switching Diffusions


Let us recall the definitions of the support of a measure and of a family of
measures. Let µ be a measure on (R^r, B(R^r)), where B(R^r) denotes the
collection of Borel sets on R^r. The support of µ is defined to be the
set of all points x ∈ R^r for which every neighborhood N(x) of x has
positive measure. That is, if µ is a measure, then the support of µ, denoted
by supp(µ), is defined as

    supp(µ) := {x : µ(N(x)) > 0 for every open neighborhood N(x) of x}.

If S is a family of measures, then

    supp(S) := ∪_{µ∈S} supp(µ).

In what follows, for a measure defined on R^r × M, by its x-section (or
x-component), we mean the measure defined on R^r.

Theorem 9.19. Assume the conditions of Theorem 9.16. Then the following
assertions hold.

(a) Suppose that F = supp_x(Ω̃(ϕ)) is the support of the x-section of the set
    Ω̃(ϕ). Then the process X(t) converges to F in probability as t → ∞.
    That is,

        P_ϕ(d(X(t), F) > η) → 0 as t → ∞ for any η > 0.

(b) Suppose X(t) converges in probability to a set G. Let L_I be the largest
    invariant set whose support is contained in G (i.e., supp(L_I) ⊂ G).
    Then X(t) converges to supp(L_I) in probability.

Proof. To prove (a), let O be an arbitrary open set such that F ⊂ O. We
claim

    lim sup_{t→∞} P_ϕ(X(t) ∈ R^r − O) = 0.        (9.46)

We verify (9.46) by contradiction. Suppose it were not true. Then there would
be an open set Õ containing O and a nonnegative function f̃(·, i) ∈ C_b (for
each i ∈ M) satisfying

    f̃(y, i) = 0 if y ∈ O,    f̃(y, i) ≥ 1 if y ∈ R^r − Õ,

such that

    lim sup_{t→∞} Σ_{i=1}^{m_0} ∫_{R^r} f̃(x, i)m(t, ϕ, dx × i) > 0.

It follows that there are a sequence {t_n} and a γ > 0 such that

    Σ_{i=1}^{m_0} ∫ f̃(x, i)m(t_n, ϕ, dx × i) → γ > 0.

By Theorem 9.16, {m(t_n, ϕ)} is weakly bounded. So there are a subsequence
{t_{n_k}} and a ψ ∈ Ω̃(ϕ) such that

    Σ_{i=1}^{m_0} ∫ f(x, i)m(t_{n_k}, ϕ, dx × i) → Σ_{i=1}^{m_0} ∫ f(x, i)ψ(dx × i)

for all f(·, i) ∈ C_b. In particular,

    Σ_{i=1}^{m_0} ∫ f̃(x, i)ψ(dx × i) = γ > 0.

However, this implies that there is some point x ∈ R^r − O such that
ψ(N × {i}) > 0 for every open set N containing x. Hence (x, i) ∈ supp(ψ),
which leads to a contradiction, because supp_x(ψ) ⊂ supp_x(Ω̃(ϕ)) = F ⊂ O.
Therefore (9.46) must be true, and the assertion of (a) is established.

To prove (b), note that by part (a), X(t) converges in probability to F as
t → ∞. In addition, by hypothesis, X(t) converges to G in probability
as t → ∞. Thus F ⊂ G. Meanwhile, Theorem 9.16 implies
that Ω̃(ϕ) is an invariant set. Hence F = supp_x(Ω̃(ϕ)) ⊂ supp(L_I).
As a result, X(t) converges to supp(L_I) in probability as t → ∞. 2
Remark 9.20. Let the conditions of Theorem 2.1 be satisfied. Then by
virtue of Theorems 2.13 and 2.18, the process (X(t), α(t)) is continuous in
probability and Feller. Suppose that for each i ∈ M, there are nonnegative
functions V(·, i) ∈ C² and k(x) ∈ C satisfying

    LV(x, i) ≤ −k(x) ≤ 0.

Assume also that

(i) lim_{|x|→∞} inf_{i∈M} V(x, i) = ∞, and

(ii) EV(X(0), α(0)) = Σ_{i∈M} ∫_{R^r} V(x, i)ϕ(dx × i) < ∞.

Then using Dynkin's formula, we can verify that

    E_ϕ V(X(t), α(t)) ≤ EV(X(0), α(0)) < ∞

for any t ≥ 0. Thus the set of measures induced by {V(X(t), α(t)), t ≥ 0}
is weakly bounded. Hence for any ε > 0, there is a compact K ⊂ R such
that

    P_ϕ{V(X(t), α(t)) ∈ K} ≥ 1 − ε, for any t ≥ 0.

Now let K̃ := {x ∈ R^r : V(x, i) ∈ K for some i ∈ M}. Because K is
compact and V(·, i) is continuous, K̃ is closed. In addition, a simple
contradiction argument and condition (i) imply that K̃ is bounded. Therefore
K̃ is compact. Note that

    V(X(t), α(t)) ∈ K implies (X(t), α(t)) ∈ K̃ × M

for any t ≥ 0. Thus it follows that

    P_ϕ{(X(t), α(t)) ∈ K̃ × M} ≥ 1 − ε.

This shows that the collection of measures induced by {(X(t), α(t)), t ≥ 0},
or ∪_{t≥0} m(t, ϕ), is weakly bounded. Therefore we conclude from Propositions
9.17 and 9.18 that all conditions of Theorem 9.16 are satisfied.

Suppose further that k(X^{x,i}(t)) → 0 in probability for each i ∈ M, and
there is a sequence of compact sets {K_n} with K_n × M ⊂ R^r × M satisfying

    m(t, ϕ, K_n^c × M) < 1/n.

Then for any open set O in R^r containing the set K_n ∩ {x : k(x) = 0}, we
have

    lim_{t→∞} P_ϕ(X(t) ∉ O) < 1/n.

So we conclude from part (b) of Theorem 9.19 that X(t) converges in
probability to the largest support of an invariant set that is contained in
lim_n K_n ∩ {x : k(x) = 0}.

Example 9.21. We consider a randomly switching Liénard equation, a
real-valued equation of the second order of the form

    d²X(t)/dt² + f(X(t), α(t)) dX(t)/dt + g(X(t)) = 0,

where α(t) is a continuous-time Markov chain taking values in M = {1, 2},
and for each i ∈ M, f(·, i) : R ↦ R and g(·) : R ↦ R are continuously
differentiable functions satisfying, for each i ∈ M,

    f(x, i) > 0 for all x ∈ R,
    g(0) = 0,  xg(x) > 0 for all x ≠ 0,
    ∫_0^x g(u) du → ∞ as |x| → ∞.

The above equation may be converted to a system of equations

    dX_1/dt = X_2(t),
    dX_2/dt = −g(X_1(t)) − f(X_1(t), α(t))X_2(t).

For each i ∈ M, we can then define a Liapunov function

    V(x_1, x_2, i) = x_2²/2 + ∫_0^{x_1} g(u) du,

where, as in the case of ordinary differential equations, the first term in
the Liapunov function has the meaning of kinetic energy and the term
involving the integral is the potential energy. Note that the Liapunov function
constructed is independent of i ∈ M. Thus,

    Σ_{j=1}^{2} q_{ij} V(x_1, x_2, j) = 0 for each i = 1, 2.

Denote

    O_λ = {(x_1, x_2, i) : V(x_1, x_2, i) < λ}.

It is easily checked that

    LV(x_1, x_2, i) = −x_2² f(x_1, i) ≤ 0,

because f(x_1, i) > 0 for all x_1 ∈ R. With (X_1(0), X_2(0), α(0)) = (x_1, x_2, α),
by Dynkin's formula,

    E_{x_1,x_2,α} V(X_1(t), X_2(t), α(t)) − V(x_1, x_2, α)
        = E_{x_1,x_2,α} ∫_0^t LV(X_1(u), X_2(u), α(u)) du ≤ 0.

So E_{x_1,x_2,α} V(X_1(t), X_2(t), α(t)) ≤ V(x_1, x_2, α), and V(X_1(t), X_2(t), α(t))
is a supermartingale. It follows that for any t_1 ≥ 0,

    P_{x_1,x_2,α}( sup_{t_1≤t<∞} V(X_1(t), X_2(t), α(t)) ≥ λ )
        ≤ E_{x_1,x_2,α} V(X_1(t_1), X_2(t_1), α(t_1)) / λ.

By virtue of Proposition 9.9, it can be shown that, with probability 1
relative to

    Ω_λ = {ω : sup_{0≤t<∞} V(X_1(t), X_2(t), α(t)) < λ},  where P(Ω_λ) ≥ 1 − V(x_1, x_2, α)/λ,

we have

    (X_1(t), X_2(t)) → G = {(x_1, x_2) : x_2 = 0, V(x_1, x_2, i) < λ, i ∈ M}
                         = {(x_1, x_2) : x_2 = 0, ∫_0^{x_1} g(u) du < λ}.

Another way to phrase the notion "convergence in probability relative to
Ω_λ" is that X(t) → G with probability at least 1 − V(x_1, x_2, α)/λ.

Referring to Theorem 9.19, the function k(x) there now takes the form

    k(x) = x_2² min_{i∈M} f(x_1, i) ≥ 0.

Thus by Theorem 9.19, X(t) = (X_1(t), X_2(t)) converges in probability to
the largest invariant set whose support is contained in G. For ω ∈ Ω − Ω_λ,

(X_1(t), X_2(t)) → ∂O_λ. It follows that (X_1(t), X_2(t)) → (0, 0) in probability
relative to Ω_λ. Therefore, for each η > 0, there is a T < ∞ such that for
all t ≥ T,

    P(V(X_1(t), X_2(t), α(t)) ≥ η) ≤ η.        (9.47)

By the supermartingale inequality, for any λ_0 > 0,

    P_{x_1,x_2,α}( sup_{T≤t<∞} V(X_1(t), X_2(t), α(t)) ≥ λ_0 )
        ≤ E_{x_1,x_2,α} V(X_1(T), X_2(T), α(T)) / λ_0.        (9.48)

Then using (9.47), together with the bound V < λ on Ω_λ,

    E_{x_1,x_2,α} V(X_1(T), X_2(T), α(T)) ≤ η(1 − η) + ηλ ≤ η(1 + λ).

Thus,

    P_{x_1,x_2,α}( sup_{T≤t<∞} V(X_1(t), X_2(t), α(t)) ≥ λ_0 ) ≤ η(1 + λ)/λ_0.

Because η is arbitrary for each fixed λ_0, and λ_0 is itself arbitrary,
(X_1(t), X_2(t)) → (0, 0) almost surely relative to Ω_λ. However, O_λ is
bounded for any λ > 0, and P(Ω_λ) → 1 as λ → ∞; hence
(X_1(t), X_2(t)) → (0, 0) almost surely.
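The conclusion of Example 9.21 is easy to observe numerically. The sketch below integrates the switched Liénard system by an explicit Euler scheme with the illustrative (hypothetical) choices f(x, 1) = 1, f(x, 2) = 2 + x², g(x) = x, and a two-state chain switching at unit rate; these satisfy the standing conditions, and because V does not depend on i and dV/dt = −f X_2² ≤ 0, V(X_1(t), X_2(t), i) = X_2²/2 + X_1²/2 is nonincreasing along every sample path.

```python
import numpy as np

def simulate_switched_lienard(x1, x2, alpha, T, dt, rng):
    """Euler integration of dX1/dt = X2, dX2/dt = -g(X1) - f(X1, alpha) X2,
    where alpha(t) is a two-state Markov chain with unit switching rates."""
    f = lambda x, i: 1.0 if i == 1 else 2.0 + x * x   # f(., i) > 0 (hypothetical)
    g = lambda x: x                                   # g(0) = 0, x g(x) > 0
    t, next_switch = 0.0, rng.exponential(1.0)        # Exp(1) holding times
    while t < T:
        if t >= next_switch:                          # jump of the Markov chain
            alpha = 2 if alpha == 1 else 1
            next_switch = t + rng.exponential(1.0)
        # one Euler step of the continuous component (old values on the right)
        x1, x2 = x1 + dt * x2, x2 + dt * (-g(x1) - f(x1, alpha) * x2)
        t += dt
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = simulate_switched_lienard(1.0, 1.0, 1, T=50.0, dt=1e-3, rng=rng)
V = x2 * x2 / 2 + x1 * x1 / 2     # Liapunov function; here int_0^{x1} g(u) du = x1^2/2
print(V)                          # essentially zero: the path settles at the origin
```

With these coefficients the trajectory is driven to the origin regardless of the realized switching path, in agreement with the almost sure convergence (X_1(t), X_2(t)) → (0, 0).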

9.5 Notes
For systems running for a long time, it is crucial to learn their long-run
behavior; see [92, 116, 183] for recent progress on stability of such systems.
The rapid progress in natural science, life science, engineering, as well as
in social science demands the consideration of stability of such systems. In
fact, the advent of switching diffusions is largely because of the practical
needs in modeling complex dynamic systems; see [7, 52, 74, 92, 116, 168,
180, 183, 187, 188] for some of the recent studies.
Most works to date have concentrated on Markov-modulated diffusions,
in which the Brownian motion and the switching force are independent,
whereas less is known for systems with continuous-component-dependent
switching processes.
dependent switching processes. As demonstrated in Chapter 2 (see also
[188]), when x-dependent switching diffusions are encountered, even such
properties as continuous and smooth dependence on the initial data are
nontrivial and fairly difficult to establish. Nevertheless, studying such sys-
tems is both practically useful and theoretically interesting. In our recent
work, basic properties such as recurrence, positive recurrence, and ergod-
icity are studied in [187]; stability is treated in [92]; stability of randomly
switching ordinary differential equations is treated in [190].

This chapter has examined invariance principles akin to LaSalle's theorem
for deterministic systems [61, 62, 106]. Previous studies of invariance
principles for stochastic systems can be found in [98, 117].
In this chapter, two different approaches are used to study the invariance.
The first one is inspired by the work of Mao [117] using kernels of Liapunov
functions. The second one uses the approach of the measure-theoretic view-
point of Kushner [98]. The results obtained can also be adopted to treat
random-switching ordinary differential equations.
10 Positive Recurrence: Weakly Connected Ergodic Classes

10.1 Introduction
To study the positive recurrence and ergodicity, one of the conditions used
in Chapters 3 and 4 is that the states of the switching process belong to
only one ergodic class. In this chapter, we further our study by treating a
more general class of problems. We consider the case that the states of the
discrete event process belong to several “ergodic” classes that are weakly
connected. This notion is made more precise in what follows. A key idea is
the use of two-time-scale formulation; see [176, 177] and many references
therein.
The rest of the chapter is arranged as follows. Section 10.2 begins with
the formulation. Section 10.3 focuses on hybrid diffusions whose discrete
component lives in weakly connected “ergodic” (irreducible) classes. Fi-
nally, the chapter is concluded with additional remarks in Section 10.4.

10.2 Problem Setup and Notation


Let x ∈ R^r, M = {1, . . . , m_0}, and let Q(x) = (q_{ij}(x)) be an m_0 × m_0
matrix depending on x such that for any x ∈ R^r and i ∈ M, q_{ij}(x) ≥ 0
for i ≠ j and Σ_{j=1}^{m_0} q_{ij}(x) = 0. Consider a switching diffusion
process Y(t) = (X(t), α(t)), which has two components: an r-dimensional
diffusion component X(t) and a jump component α(t) taking values in
M = {1, . . . , m_0} and representing discrete events. The hybrid diffusion process

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 285
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_10,
© Springer Science + Business Media, LLC 2010

Y(t) = (X(t), α(t)) satisfies

    dX(t) = b(X(t), α(t))dt + σ(X(t), α(t))dw(t),
    X(0) = x,  α(0) = α,        (10.1)

and

    P(α(t + ∆t) = j | α(t) = i, X(s), α(s), s ≤ t) = q_{ij}(X(t))∆t + o(∆t),  i ≠ j,        (10.2)

where w(t) is a d-dimensional standard Brownian motion, and b(·, ·) : R^r × M ↦
R^r and σ(·, ·) : R^r × M ↦ R^{r×d} satisfy

    b(x, i) = (b_j(x, i)) ∈ R^r,  σ(x, i)σ′(x, i) = a(x, i) = (a_{jk}(x, i)) ∈ R^{r×r},

with z′ denoting the transpose of z for z ∈ R^{ι_1×ι_2} and ι_1, ι_2 ≥ 1.
Associated with the process given in (10.1) and (10.2) is a generator
L_0, defined as follows. For each i ∈ M and for any twice continuously
differentiable function g(·, i), let

    L_0 g(x, i) = (1/2) tr(a(x, i)∇²g(x, i)) + b′(x, i)∇g(x, i) + Q(x)g(x, ·)(i),        (10.3)

where ∇g(·, i) and ∇²g(·, i) denote the gradient and Hessian of g(·, i),
respectively, and

    Q(x)g(x, ·)(i) = Σ_{j=1}^{m_0} q_{ij}(x)g(x, j)
                   = Σ_{j≠i, j∈M} q_{ij}(x)(g(x, j) − g(x, i)),  i ∈ M.        (10.4)

For further references on stochastic differential equations involving Poisson


measures describing the evolution of the jump processes, we refer the reader
to Chapter 2 of this book; see also Skorohod [150].
Throughout the chapter, we assume that for each i ∈ M, both b(·, i)
and σ(·, i) satisfy the usual local Lipschitz and linear growth conditions.
It is well known that under these conditions, the system (10.1)–(10.2) has
a unique solution; see Chapter 2 of this book and also [150] for details.
In what follows, denote the solution by Y x,α (t) = (X x,α (t), αx,α (t)) to
emphasize the dependence on the initial data when needed.
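For intuition, the dynamics (10.1)–(10.2) can be simulated directly: advance X by an Euler–Maruyama step, and switch α from i to j with probability q_{ij}(X(t))∆t + o(∆t). The sketch below uses hypothetical scalar coefficients (two mean-reverting regimes and an x-dependent switching rate); it only illustrates the mechanism and is not a scheme analyzed in this book.

```python
import numpy as np

def step(x, alpha, dt, rng):
    """One Euler-Maruyama step of (10.1), coupled with the switching rule (10.2).
    Illustrative coefficients: b(x,1) = -x, b(x,2) = -2x, sigma = 1,
    q12(x) = 1/(1 + x^2), q21(x) = 1 (all hypothetical)."""
    b = {1: -x, 2: -2.0 * x}[alpha]                 # drift b(x, i)
    sigma = 1.0                                     # diffusion sigma(x, i)
    q = {1: 1.0 / (1.0 + x * x), 2: 1.0}[alpha]     # rate of leaving state alpha
    # discrete component: switch with probability q_{ij}(x) dt
    if rng.uniform() < q * dt:
        alpha = 2 if alpha == 1 else 1
    # continuous component: X_{k+1} = X_k + b dt + sigma sqrt(dt) xi
    x = x + b * dt + sigma * np.sqrt(dt) * rng.normal()
    return x, alpha

rng = np.random.default_rng(1)
x, alpha = 2.0, 1
for _ in range(20_000):                             # horizon T = 20 with dt = 1e-3
    x, alpha = step(x, alpha, 1e-3, rng)
print(x, alpha)
```

Both regimes here are mean reverting, so a long trajectory stays near the origin; with the x-dependent rate, switches out of regime 1 become rare when |X(t)| is large, which is exactly the kind of state-dependence that distinguishes (10.2) from Markov-modulated models.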

10.3 Weakly Connected, Multiergodic-Class


Switching Processes
As mentioned, one of the conditions used in Chapters 3 and 4 is that the
states of the jump component are in a single ergodic class. This section

deals with the situation that weakly connected, multiple ergodic classes
are included. We assume that the discrete jump component is generated by
    Q^ε = (1/ε)Q̃ + Q̂,        (10.5)

where 0 < ε ≪ 1, and Q̃ and Q̂ are themselves generators of certain Markov
chains. Corresponding to (10.5), the states of the switching process live in a
number of ergodic classes that are weakly connected through the generator
Q̂. We say that switching processes with such a structure have multiergodic
classes that are weakly connected. This section is divided into two parts.
We first concern ourselves with the case that the jump components are
divided into l recurrent classes. Later, we consider the case that transient
states are also included.

10.3.1 Preliminary
Before proceeding further, let us give the motivation for using such models.
First, we note that Qε is a constant matrix independent of x. The ratio-
nale is similar to those considered in the previous chapters, which can be
considered as linearizing Q(x) at “point of ∞.” We may begin with an x-
dependent matrix, say Qε (x). Then for x large enough, it can be replaced
by a constant Qε . To be more precise, assume

Qε (x) = Qε + o(1), as |x| → ∞, (10.6)

where Qε has the form (10.5). Note that in the previous chapters, a con-
dition similar to (10.6) was used without the ε-dependence, but the cor-
responding constant matrix (the limit at |x| → ∞) is a generator of an
ergodic Markov chain. Here, we are mostly concerned with the case that
the generator could possibly be reducible with several ergodic classes. Nevertheless,
the states belonging to different ergodic classes are not completely
separable. They are linked together through weak interaction due to the
presence of the slow part of the generator Q̂.
The formulation of Q^ε being the generator of an ε-dependent Markov chain
for a small parameter ε, stems from an effort of using two-time-scale models
to reduce the complexity of the underlying systems. It has been observed
in [149] that there are natural hierarchical structures in many large-scale
and complex systems. The formulation in (10.5) is an effort to highlight the
different parts of subsystems varying at different rates. It is often possible
to partition the system states into a number of groups so that within each
group, the state transitions take place rapidly, whereas among different
groups, the changes are relatively infrequent. Such scenarios, in fact, appear
in many applications. Thus an effective way is to treat the systems through
decompositions and aggregations. Loosely, one can use the natural scales
shown in the system to aggregate the states in each ergodic class into

one state. In this way, the total number of states is much reduced for the
aggregated system. This point of view was extensively discussed in Yin and
Zhang [176]. To begin, one may not have an ε in the system, but it is
brought into the formulation to highlight the different rates of change so
as to separate the fast and slow motions.
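The two-time-scale structure of Q^ε = Q̃/ε + Q̂ is easy to see in simulation: jumps within an ergodic class occur at rate O(1/ε), while class changes occur at rate O(1). The sketch below, with illustrative 2+2-state generators and ε = 0.01 (all choices hypothetical), counts both kinds of jumps along one sample path.

```python
import numpy as np

def simulate_ctmc(Q, i0, T, rng):
    """Sample the embedded jump sequence of a CTMC with generator Q on {0,...,m-1}."""
    i, t, states = i0, 0.0, [i0]
    while True:
        rate = -Q[i, i]
        t += rng.exponential(1.0 / rate)           # Exp(rate) holding time
        if t >= T:
            return states
        p = Q[i].copy(); p[i] = 0.0; p /= rate     # embedded-chain transition law
        i = int(rng.choice(len(p), p=p))
        states.append(i)

eps = 0.01
# fast part, block diagonal: rapid mixing inside M_1 = {0,1} and M_2 = {2,3}
Qt = np.array([[-1., 1., 0., 0.], [1., -1., 0., 0.],
               [0., 0., -1., 1.], [0., 0., 1., -1.]])
# slow part: weak interaction linking the two classes
Qh = np.array([[-1., 0., 1., 0.], [0., -1., 0., 1.],
               [1., 0., -1., 0.], [0., 1., 0., -1.]])
states = simulate_ctmc(Qt / eps + Qh, 0, T=10.0, rng=np.random.default_rng(2))
group = lambda i: 0 if i < 2 else 1                # the aggregated state
within = sum(group(a) == group(b) for a, b in zip(states, states[1:]))
across = sum(group(a) != group(b) for a, b in zip(states, states[1:]))
print(within, across)   # within-class jumps outnumber class changes by roughly 1/eps
```

Viewed only through the aggregated state, the path looks like a slow two-state chain — the picture that Lemma 10.1(c) makes precise.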

10.3.2 Weakly Connected, Multiple Ergodic Classes


Suppose

    Q̃ = diag(Q̃_1, . . . , Q̃_l).        (10.7)

In view of (10.7), the state space M of the underlying Markov chain is
decomposable into l subspaces. That is, we can relabel the states so that

    M = M_1 ∪ M_2 ∪ · · · ∪ M_l,        (10.8)

with M_i = {s_{i1}, . . . , s_{im_i}} and m_0 = m_1 + m_2 + · · · + m_l, such that
Q̃_i, the generator associated with the subspace M_i, is irreducible for each
i = 1, . . . , l. Thus the corresponding M_i for i = 1, . . . , l consist of recurrent
states belonging to l ergodic classes.
To signal the ε-dependence, we index the process by ε and write it as
Y^ε(t) = (X^ε(t), α^ε(t)). Then (10.1) and (10.2) become

    dX^ε(t) = b(X^ε(t), α^ε(t))dt + σ(X^ε(t), α^ε(t))dw(t),
    X^ε(0) = x,  α^ε(0) = α,        (10.9)

and

    P(α^ε(t + ∆) = j | α^ε(t) = i, X^ε(s), α^ε(s), s ≤ t) = q^ε_{ij}∆ + o(∆),  i ≠ j.        (10.10)

The associated operator for the switching diffusion is given by

    L^ε g(x, ι) = (1/2) tr(a(x, ι)∇²g(x, ι)) + b′(x, ι)∇g(x, ι) + Q^ε g(x, ·)(ι),  ι ∈ M.        (10.11)
To proceed, lump the states of the jump component in each M_i into a
single state and define

    ᾱ^ε(t) = i if α^ε(t) ∈ M_i.        (10.12)

Denote the state space of ᾱ^ε(·) by M̄ = {1, . . . , l}, and let ν̃ = diag(ν^1, . . . , ν^l),
where ν^i is the stationary distribution corresponding to Q̃_i. Define

    Q̄ = ν̃ Q̂ 1̃l        (10.13)

with 1̃l = diag(1l_{m_1}, . . . , 1l_{m_l}) and 1l_ℓ = (1, . . . , 1)′ ∈ R^ℓ. The essence of the
aggregated process is to treat all the states in M_i as one state, so the total
number of states in the "effective" state space is much reduced. We need
the following assumption about the generator of ᾱ^ε(·).

(A10.1) For each i = 1, . . . , l, Q̃_i is irreducible.

Lemma 10.1. Under (A10.1), the following assertions hold.

(a) The probability vector p^ε(t) ∈ R^{1×m_0} with

        p^ε(t) = (P(α^ε(t) = s_{ij}), i = 1, . . . , l, j = 1, . . . , m_i)

    satisfies

        p^ε(t) = θ(t)ν̃ + O(ε(t + 1) + e^{−κ_0 t/ε})

    for some κ_0 > 0, where θ(t) = (θ_1(t), . . . , θ_l(t)) ∈ R^{1×l} satisfies

        dθ(t)/dt = θ(t)Q̄,  θ(0) = p(0)1̃l.

(b) The transition matrix satisfies

        P^ε(t) = P^{(0)}(t) + O(ε(t + 1) + e^{−κ_0 t/ε}),

    where P^{(0)}(t) = 1̃l Θ(t)ν̃ and

        dΘ(t)/dt = Θ(t)Q̄,  Θ(0) = I.

(c) The aggregated process ᾱ^ε(·) converges weakly to ᾱ(·) as ε → 0, where
    ᾱ(·) is a Markov chain generated by Q̄.

(d) For i = 1, . . . , l, j = 1, . . . , m_i,

        E | ∫_0^∞ e^{−t} (I_{{α^ε(t)=s_{ij}}} − ν_j^i I_{{ᾱ^ε(t)=i}}) dt |² = O(ε),

    where ν_j^i denotes the jth component of ν^i for i = 1, . . . , l and j =
    1, . . . , m_i.

Proof. The proofs of (a) and (b) can be found in [176,
Corollary 6.12, p. 130]. The proof of (c) is based on the martingale averaging
method [176, Theorem 7.4, p. 172]; an outline of the idea is in [4], and a
discrete version of the approximation may be found in [177].

As for (d), using (a) and (b), direct calculation reveals that

    E | ∫_0^∞ e^{−t} (I_{{α^ε(t)=s_{ij}}} − ν_j^i I_{{ᾱ^ε(t)=i}}) dt |²
        = ∫_0^∞ ∫_0^t e^{−t−s} O(ε(t + 1) + e^{−κ_0(t−s)/ε}) ds dt
          + ∫_0^∞ ∫_0^s e^{−t−s} O(ε(t + 1) + e^{−κ_0(s−t)/ε}) dt ds.

Detailed calculations then yield the desired result. 2
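The aggregation in (10.13) is straightforward to carry out numerically. The sketch below uses arbitrary illustrative generators Q̃_1, Q̃_2 (irreducible, as in (A10.1)) and Q̂, computes the stationary distributions ν^i, and forms Q̄ = ν̃ Q̂ 1̃l, checking that the result is again a generator.

```python
import numpy as np

def stationary(Qt):
    """Stationary distribution nu of an irreducible generator: nu Qt = 0, sum(nu) = 1."""
    m = Qt.shape[0]
    A = np.vstack([Qt.T, np.ones(m)])     # append the normalization constraint
    b = np.zeros(m + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

Q1 = np.array([[-1.0, 1.0], [2.0, -2.0]])   # \tilde Q_1 (illustrative, irreducible)
Q2 = np.array([[-3.0, 3.0], [1.0, -1.0]])   # \tilde Q_2 (illustrative, irreducible)
Qhat = np.array([[-1.0, 0.0, 1.0, 0.0],     # \hat Q: slow interaction (illustrative)
                 [0.0, -1.0, 0.0, 1.0],
                 [1.0, 0.0, -1.0, 0.0],
                 [0.0, 1.0, 0.0, -1.0]])

nu1, nu2 = stationary(Q1), stationary(Q2)   # nu^1 = (2/3, 1/3), nu^2 = (1/4, 3/4)
nu_tilde = np.zeros((2, 4))                 # \tilde nu = diag(nu^1, nu^2)
nu_tilde[0, :2], nu_tilde[1, 2:] = nu1, nu2
ones = np.zeros((4, 2))                     # \tilde 1l = diag(1l_{m_1}, 1l_{m_2})
ones[:2, 0] = 1.0; ones[2:, 1] = 1.0
Qbar = nu_tilde @ Qhat @ ones               # \bar Q = \tilde nu \hat Q \tilde 1l
print(Qbar)   # a 2x2 generator: rows sum to zero, off-diagonal entries >= 0
```

The resulting Q̄ is the generator of the limit chain ᾱ(·) in Lemma 10.1(c): each ergodic class collapses to one state, and the slow rates of Q̂ are averaged against the within-class stationary distributions.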


Due to the aggregation, certain averages take place. When ε goes to 0,
we obtain a limit system. The following lemma records this fact, whose
proof can be found in, for example, Yin and Zhang [176], and Yin [165].
The basic idea is to use martingale averaging; we omit the details here.
Lemma 10.2. Assume (A10.1). Then $(X^\varepsilon(\cdot), \overline{\alpha}^\varepsilon(\cdot))$ converges weakly to $(X(\cdot), \overline{\alpha}(\cdot))$, whose operator is given by
$$\mathcal{L}g(x,i) = \frac{1}{2}\operatorname{tr}\big(\overline{a}(x,i)\nabla^2 g(x,i)\big) + \overline{b}'(x,i)\nabla g(x,i) + \overline{Q}g(x,\cdot)(i), \tag{10.14}$$
where
$$\overline{b}(x,i) = \sum_{j=1}^{m_i} \nu_j^i\, b(x, s_{ij}), \qquad \overline{a}(x,i) = \sum_{j=1}^{m_i} \nu_j^i\, a(x, s_{ij}), \quad i = 1, \ldots, l. \tag{10.15}$$
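Numerically, forming the averaged coefficients in (10.15) only requires the stationary distribution of each irreducible block $\widetilde{Q}^i$. A minimal scalar sketch (the generator and the per-state coefficient values are illustrative, not from the text):

```python
import numpy as np

def stationary(Qi):
    """Stationary distribution nu of an irreducible generator Qi:
    solve nu Qi = 0 together with nu 1 = 1 as a least-squares system."""
    m = Qi.shape[0]
    A = np.vstack([Qi.T, np.ones(m)])
    rhs = np.zeros(m + 1)
    rhs[-1] = 1.0
    nu, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return nu

# One ergodic class with two states; hypothetical data at some fixed x.
Qtil1 = np.array([[-1.0, 1.0], [2.0, -2.0]])
nu1 = stationary(Qtil1)                  # (2/3, 1/3)

b_states = np.array([0.5, -0.1])         # b(x, s_11), b(x, s_12)
a_states = np.array([0.04, 0.09])        # a(x, s_11), a(x, s_12)

b_bar = nu1 @ b_states                   # averaged drift, per (10.15)
a_bar = nu1 @ a_states                   # averaged diffusion coefficient
sigma_bar = np.sqrt(a_bar)               # scalar case: sigma_bar^2 = a_bar
```

In the vector case `a_states` becomes a stack of matrices and `sigma_bar` any square root of `a_bar`, matching the relation $\overline{\sigma}\,\overline{\sigma}' = \overline{a}$ used in Remark 10.3.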

Remark 10.3. In view of the weak convergence result, the limit stochastic differential equation for (10.9) is
$$dX(t) = \overline{b}(X(t), \overline{\alpha}(t))\,dt + \overline{\sigma}(X(t), \overline{\alpha}(t))\,dw, \tag{10.16}$$
where $\overline{\sigma}(x,i)$ is defined in terms of the average in (10.15); that is, $\overline{\sigma}(x,i)\overline{\sigma}'(x,i) = \overline{a}(x,i)$.

(A10.2) $\overline{Q}$ is irreducible. For each $i \in \overline{\mathcal{M}}$, $\overline{a}(x,i)$ satisfies
$$\kappa_1 |\xi|^2 \le \xi' \overline{a}(x,i)\xi \quad \text{for all } \xi \in \mathbb{R}^r, \tag{10.17}$$
with some constant $\kappa_1 \in (0,1]$ for all $x \in \mathbb{R}^r$.
With the conditions given, using the techniques of Chapter 3 (more specifically, Theorem 3.26), we establish the following assertion.
Proposition 10.4. Under (A10.1) and (A10.2), the switching diffusion
with generator given by (10.14) is positive recurrent.
We now present the result on positive recurrence of the underlying process. The main idea is that although the discrete events, described by a continuous-time Markov chain, may have several weakly connected ergodic classes, when $\varepsilon$ is sufficiently small, we still have positive recurrence for the process $(X^\varepsilon(\cdot), \alpha^\varepsilon(\cdot))$. The proof rests upon the use of perturbed Liapunov
function methods, which were first used to treat diffusion approximations
in [131] by Papanicolaou, Stroock, and Varadhan, and later on have been
successfully used in stochastic systems theory (see Kushner [102]), and
stochastic approximation (see Kushner and Yin [104]), among others. The
basic idea is to introduce perturbations of Liapunov functions that are
small in magnitude and that result in the desired cancellation of unwanted
terms.

Theorem 10.5. Assume that conditions (A10.1) and (A10.2) hold. Then
for sufficiently small ε > 0, the process (X ε (·), αε (·)) is positive recurrent.

Remark 10.6. First note that by Proposition 10.4, the process $(X(\cdot), \overline{\alpha}(\cdot))$ is positive recurrent. It then follows that there are Liapunov functions $V(x,i)$, $i = 1, \ldots, l$, for the limit system (10.16) such that
$$\mathcal{L}V(x,i) \le -\gamma \quad \text{for some } \gamma > 0. \tag{10.18}$$

In Theorem 10.5, the property holding "for sufficiently small $\varepsilon$" means that there exists an $\varepsilon_0 > 0$ such that for all $0 < \varepsilon \le \varepsilon_0$, the property holds.

Proof. To prove this result, we begin with the Liapunov function in (10.18) for the limit system. Choose an integer $n_0$ large enough that $D$ is contained in the ball $\{|x| < n_0\}$. For any $(x,i) \in D^c \times \mathcal{M}$, any $t > 0$, and any positive integer $n > n_0$, define
$$\tau_D = \inf\{t : X^\varepsilon(t) \in D\}, \quad \text{and} \quad \tau_{D,n}(t) = t \wedge \tau_D \wedge \beta_n, \tag{10.19}$$
where $\beta_n$ comes from the regularity consideration and is the first exit time of the process $(X^\varepsilon(t), \alpha^\varepsilon(t))$ from the set $\{\widetilde{x} : |\widetilde{x}| < n\} \times \mathcal{M}$; that is, $\beta_n = \inf\{t : |X^\varepsilon(t)| = n\}$. For $i \in \mathcal{M}$, use the Liapunov function $V(x,i)$ in (10.18) to define
$$\overline{V}(x, \alpha) = \sum_{i=1}^{l} V(x,i)\, I_{\{\alpha \in \mathcal{M}_i\}} = V(x,i) \quad \text{if } \alpha \in \mathcal{M}_i. \tag{10.20}$$

Note that
$$\overline{V}(X^\varepsilon(t), \alpha^\varepsilon(t)) = V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)), \tag{10.21}$$
so these two expressions are used interchangeably in what follows. Observe that $\overline{V}(x, \alpha)$ is orthogonal to $\widetilde{Q}$; that is, $\widetilde{Q}\overline{V}(x, \cdot)(\alpha) = 0$. Then
$$\begin{aligned}
\mathcal{L}^\varepsilon \overline{V}(X^\varepsilon(t), \alpha^\varepsilon(t)) &= \overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\, b(X^\varepsilon(t), \alpha^\varepsilon(t)) \\
&\quad + \frac{1}{2}\operatorname{tr}\big[\overline{V}_{xx}(X^\varepsilon(t), \alpha^\varepsilon(t))\, a(X^\varepsilon(t), \alpha^\varepsilon(t))\big] \\
&\quad + \widehat{Q}\overline{V}(X^\varepsilon(t), \cdot)(\alpha^\varepsilon(t)).
\end{aligned} \tag{10.22}$$

By virtue of Dynkin's formula,
$$E_{x,i}\overline{V}\big(X^\varepsilon(\tau_{D,n}(t)), \alpha^\varepsilon(\tau_{D,n}(t))\big) - \overline{V}(x,i) = E_{x,i}\int_0^{\tau_{D,n}(t)} \mathcal{L}^\varepsilon \overline{V}(X^\varepsilon(s), \alpha^\varepsilon(s))\,ds, \tag{10.23}$$
which involves unwanted terms on the right-hand side. To get rid of these terms, we use methods of perturbed Liapunov functions [104, 176] to average out the "bad" terms. Note that the process $(X^\varepsilon(t), \alpha^\varepsilon(t))$ is Markov. Thus, for a suitable function $\xi(\cdot)$, $\mathcal{L}^\varepsilon \xi(t)$ can be calculated by
$$\mathcal{L}^\varepsilon \xi(t) = \lim_{\delta \to 0} E_t^\varepsilon \frac{\xi(t+\delta) - \xi(t)}{\delta}, \tag{10.24}$$
where $E_t^\varepsilon$ denotes the conditional expectation with respect to the $\sigma$-algebra
$$\mathcal{F}_t^\varepsilon = \sigma\{(X^\varepsilon(s), \alpha^\varepsilon(s)) : s \le t\}.$$

To obtain the desired results, we introduce three perturbations. The in-


tegrand of each of these perturbations is formed by the difference of two
terms, an original term and its “average.” The goal is to use the averages
in the final form for the evaluation of the Liapunov function. To ensure the
integrability in an infinite horizon, exponential discounting is used in the
integrals.
Define

$$\begin{aligned}
V_1^\varepsilon(x, \alpha, t) &= E_t^\varepsilon \int_t^\infty e^{t-u}\, \overline{V}_x(x, \alpha)\big(b(x, \alpha^\varepsilon(u)) - \overline{b}(x, \overline{\alpha}^\varepsilon(u))\big)\,du, \\
V_2^\varepsilon(x, \alpha, t) &= E_t^\varepsilon \int_t^\infty e^{t-u}\, \frac{1}{2}\operatorname{tr}\big[\overline{V}_{xx}(x, \alpha)\big(a(x, \alpha^\varepsilon(u)) - \overline{a}(x, \overline{\alpha}^\varepsilon(u))\big)\big]\,du, \\
V_3^\varepsilon(x, t) &= E_t^\varepsilon \int_t^\infty e^{t-u}\, \big[\widehat{Q}\overline{V}(x, \cdot)(\alpha^\varepsilon(u)) - \overline{Q}V(x, \cdot)(\overline{\alpha}^\varepsilon(u))\big]\,du.
\end{aligned} \tag{10.25}$$
To proceed, we first state a lemma.

Lemma 10.7. Assume the conditions of Theorem 10.5. Then the following
assertions hold.

(a) For $V_i^\varepsilon$ with $i = 1, 2, 3$, we have the following estimates:
$$\begin{aligned}
V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) &= O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big), \\
V_2^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) &= O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big), \\
V_3^\varepsilon(X^\varepsilon(t), t) &= O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big).
\end{aligned} \tag{10.26}$$


(b) Moreover,
$$\begin{aligned}
\mathcal{L}^\varepsilon V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) &= -\overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\big[b(X^\varepsilon(t), \alpha^\varepsilon(t)) - \overline{b}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\big] \\
&\quad + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big), \\
\mathcal{L}^\varepsilon V_2^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) &= -\frac{1}{2}\operatorname{tr}\big[\overline{V}_{xx}(X^\varepsilon(t), \alpha^\varepsilon(t))\big(a(X^\varepsilon(t), \alpha^\varepsilon(t)) - \overline{a}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\big)\big] \\
&\quad + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big), \\
\mathcal{L}^\varepsilon V_3^\varepsilon(X^\varepsilon(t), t) &= -\widehat{Q}\overline{V}(X^\varepsilon(t), \cdot)(\alpha^\varepsilon(t)) + \overline{Q}V(X^\varepsilon(t), \cdot)(\overline{\alpha}^\varepsilon(t)) \\
&\quad + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big).
\end{aligned} \tag{10.27}$$

Proof. The proof is inspired by the perturbed Liapunov function techniques of Kushner and Yin [104, Chapter 6], and the construction of the Liapunov functions is along the lines of Badowski and Yin [4]. By the definition of $\overline{b}(\cdot)$,
$$b(X^\varepsilon(t), \alpha^\varepsilon(u)) - \overline{b}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(u)) = \sum_{i=1}^{l}\sum_{j=1}^{m_i} b(X^\varepsilon(t), s_{ij})\big[I_{\{\alpha^\varepsilon(u) = s_{ij}\}} - \nu_j^i I_{\{\alpha^\varepsilon(u) \in \mathcal{M}_i\}}\big]. \tag{10.28}$$

It can be argued by using the Markov property and the two-time-scale structure (see [176, p. 187]) that for $u \ge t$,
$$E_t^\varepsilon\big[I_{\{\alpha^\varepsilon(u) = s_{ij}\}} - \nu_j^i I_{\{\overline{\alpha}^\varepsilon(u) = i\}}\big] = O\big(\varepsilon + e^{-\kappa_0 (u-t)/\varepsilon}\big). \tag{10.29}$$

Thus, using (10.21), (10.28), and (10.29) in (10.25), we obtain
$$\begin{aligned}
|V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t)| &\le \sum_{i=1}^{l}\sum_{j=1}^{m_i} \big|\overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\, b(X^\varepsilon(t), s_{ij})\big| \int_t^\infty e^{t-u}\, O\big(\varepsilon + e^{-\kappa_0 (u-t)/\varepsilon}\big)\,du \\
&= O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big).
\end{aligned} \tag{10.30}$$

Likewise, we obtain the estimates for $V_2^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t)$ and $V_3^\varepsilon(X^\varepsilon(t), t)$ in (10.26). This establishes statement (a).

Next, we prove (b). For convenience, introduce the notation
$$\Gamma(x, \alpha, \alpha_1) = \overline{V}_x(x, \alpha)\big[b(x, \alpha_1) - \overline{b}(x, \overline{\alpha}_1)\big]. \tag{10.31}$$


By virtue of the definition of $\mathcal{L}^\varepsilon$, we have
$$\begin{aligned}
\mathcal{L}^\varepsilon V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t)
&= -\lim_{\delta \to 0}\frac{1}{\delta}\, E_t^\varepsilon \int_t^{t+\delta} e^{t-u}\, \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\,du \\
&\quad + \lim_{\delta \to 0}\frac{1}{\delta}\, E_t^\varepsilon \int_{t+\delta}^\infty \big[e^{t+\delta-u} - e^{t-u}\big]\,\Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\,du \\
&\quad + \lim_{\delta \to 0}\frac{1}{\delta} \int_{t+\delta}^\infty e^{t+\delta-u}\, E_t^\varepsilon\big[\Gamma(X^\varepsilon(t+\delta), \alpha^\varepsilon(t), \alpha^\varepsilon(u)) - \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\big]\,du \\
&\quad + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big).
\end{aligned}$$
Moreover,
$$-\lim_{\delta \to 0}\frac{1}{\delta}\, E_t^\varepsilon \int_t^{t+\delta} e^{t-u}\, \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\,du
= -\overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\big[b(X^\varepsilon(t), \alpha^\varepsilon(t)) - \overline{b}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\big], \tag{10.32}$$
and
$$\lim_{\delta \to 0}\frac{1}{\delta}\, E_t^\varepsilon \int_{t+\delta}^\infty \big[e^{t+\delta-u} - e^{t-u}\big]\,\Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\,du
= E_t^\varepsilon \int_t^\infty e^{t-u}\, \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\,du = V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t). \tag{10.33}$$
The independence of $\alpha^\varepsilon(\cdot)$ and $w(\cdot)$, together with (10.28) and (10.31), leads to the fact that for $u \ge t$,
$$\begin{aligned}
&E_t^\varepsilon\big[\Gamma(X^\varepsilon(t+\delta), \alpha^\varepsilon(t), \alpha^\varepsilon(u)) - \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\big] \\
&\quad = \sum_{i=1}^{l}\sum_{j=1}^{m_i} E_t^\varepsilon\big[\overline{V}_x(X^\varepsilon(t+\delta), \alpha^\varepsilon(t))\, b(X^\varepsilon(t+\delta), s_{ij}) - \overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\, b(X^\varepsilon(t), s_{ij})\big]\, E_{t+\delta}^\varepsilon\big[I_{\{\alpha^\varepsilon(u) = s_{ij}\}} - \nu_j^i I_{\{\alpha^\varepsilon(u) \in \mathcal{M}_i\}}\big].
\end{aligned} \tag{10.34}$$

Furthermore, we have
$$\begin{aligned}
&\lim_{\delta \to 0}\frac{1}{\delta} \int_{t+\delta}^\infty e^{t+\delta-u}\, E_t^\varepsilon\big[\Gamma(X^\varepsilon(t+\delta), \alpha^\varepsilon(t), \alpha^\varepsilon(u)) - \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\big]\,du \\
&\quad = \lim_{\delta \to 0}\frac{1}{\delta} \int_{t+\delta}^\infty e^{t-u}\, E_t^\varepsilon\big[\Gamma(X^\varepsilon(t+\delta), \alpha^\varepsilon(t), \alpha^\varepsilon(u)) - \Gamma(X^\varepsilon(t), \alpha^\varepsilon(t), \alpha^\varepsilon(u))\big]\,du + o(1),
\end{aligned}$$
where $o(1) \to 0$ in probability uniformly in $t$. Using (10.29), (10.31), and (10.34),
$$\sum_{i=1}^{l}\sum_{j=1}^{m_i} \big(\overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\, b(X^\varepsilon(t), s_{ij})\big)_x\, \overline{b}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\; E_t^\varepsilon \int_t^\infty e^{t-u}\big(I_{\{\alpha^\varepsilon(u) = s_{ij}\}} - \nu_j^i I_{\{\overline{\alpha}^\varepsilon(u) = i\}}\big)\,du = O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big), \tag{10.35}$$
and
$$\sum_{i=1}^{l}\sum_{j=1}^{m_i} \frac{1}{2}\operatorname{tr}\big[\big(\overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\, b(X^\varepsilon(t), s_{ij})\big)_{xx}\, a(X^\varepsilon(t), \alpha^\varepsilon(t))\big]\; E_t^\varepsilon \int_t^\infty e^{t-u}\big(I_{\{\alpha^\varepsilon(u) = s_{ij}\}} - \nu_j^i I_{\{\overline{\alpha}^\varepsilon(u) = i\}}\big)\,du = O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big). \tag{10.36}$$

Thus, (10.32), (10.33), (10.35), and (10.36) lead to
$$\begin{aligned}
\mathcal{L}^\varepsilon V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) &= -\overline{V}_x(X^\varepsilon(t), \alpha^\varepsilon(t))\big[b(X^\varepsilon(t), \alpha^\varepsilon(t)) - \overline{b}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\big] \\
&\quad + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big).
\end{aligned}$$

Similar calculations yield the remaining estimates in part (b) of the lemma. The proof of the lemma is concluded. □
Using Lemma 10.7, we proceed to eliminate the unwanted terms and to obtain the detailed estimates. Define
$$V^\varepsilon(t) = V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + V_1^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) + V_2^\varepsilon(X^\varepsilon(t), \alpha^\varepsilon(t), t) + V_3^\varepsilon(X^\varepsilon(t), t). \tag{10.37}$$

It follows that
$$V^\varepsilon(t) = V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big), \tag{10.38}$$
and
$$\begin{aligned}
\mathcal{L}^\varepsilon V^\varepsilon(t) &= V_x(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\, \overline{b}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) \\
&\quad + \frac{1}{2}\operatorname{tr}\big[V_{xx}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\, \overline{a}(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t))\big] \\
&\quad + \overline{Q}V(X^\varepsilon(t), \cdot)(\overline{\alpha}^\varepsilon(t)) + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big).
\end{aligned} \tag{10.39}$$

Therefore,
$$\mathcal{L}^\varepsilon V^\varepsilon(t) = \mathcal{L}V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + O(\varepsilon)\big(V(X^\varepsilon(t), \overline{\alpha}^\varepsilon(t)) + 1\big). \tag{10.40}$$
Note that through the use of perturbations, the first term on the right-
hand side above involves the limit operator and the Liapunov function of
the limit system, which is crucially important.
For fixed but arbitrary $T > 0$, we then obtain
$$\begin{aligned}
E_{x,i} V^\varepsilon(\tau_{D,n}(t) \wedge T)
&= V(x,i) + E_{x,i} V_1^\varepsilon(x, i, 0) + E_{x,i}\int_0^{\tau_{D,n}(t) \wedge T} \mathcal{L}^\varepsilon V^\varepsilon(s)\,ds \\
&= V(x,i) + E_{x,i} V_1^\varepsilon(x, i, 0) + E_{x,i}\int_0^{\tau_{D,n}(t) \wedge T} \big[\mathcal{L}V(X^\varepsilon(s), \overline{\alpha}^\varepsilon(s)) + O(\varepsilon)\big(V(X^\varepsilon(s), \overline{\alpha}^\varepsilon(s)) + 1\big)\big]\,ds \\
&= V(x,i) + E_{x,i} V_1^\varepsilon(x, i, 0) + E_{x,i}\int_0^{\tau_{D,n}(t) \wedge T} \big[\mathcal{L}V(X^\varepsilon(s), \overline{\alpha}^\varepsilon(s)) + O(\varepsilon)(V^\varepsilon(s) + 1)\big]\,ds \\
&\le V(x,i) + E_{x,i} V_1^\varepsilon(x, i, 0) + [O(\varepsilon) - \gamma]\, E_{x,i}[\tau_{D,n}(t) \wedge T] + O(\varepsilon)\, E_{x,i}\int_0^{\tau_{D,n}(t) \wedge T} V^\varepsilon(s)\,ds.
\end{aligned} \tag{10.41}$$
The expression after the third equality sign follows from the estimates in (10.26), and the expression after the last inequality sign is a consequence of (10.18).
Because $\varepsilon > 0$ is sufficiently small, there is a $\gamma_1 > 0$ such that $-\gamma_1 \ge -\gamma + O(\varepsilon)$. It is clear that for the fixed $T$, $\tau_{D,n}(t) \wedge T \le T$. Thus, the foregoing together with [114, Theorem 2.6.1, p. 68] and [61, p. 36] yields that
$$E_{x,i} V^\varepsilon(\tau_{D,n}(t) \wedge T) \le \left[V(x,i) + \sum_{\iota=1}^{3} E_{x,i} V_\iota^\varepsilon(x, i, 0) - \gamma_1\, E_{x,i}(\tau_{D,n}(t) \wedge T)\right]\exp(\varepsilon T). \tag{10.42}$$

Consequently,
$$\exp(-\varepsilon T)\, E_{x,i} V^\varepsilon(\tau_{D,n}(t) \wedge T) + \gamma_1\, E_{x,i}[\tau_{D,n}(t) \wedge T] \le V(x,i) + \sum_{\iota=1}^{3} E_{x,i} V_\iota^\varepsilon(x, i, 0). \tag{10.43}$$

Taking the limit $T \to \infty$ in (10.43) leads to
$$\gamma_1\, E_{x,i}\tau_{D,n}(t) \le V(x,i) + \sum_{\iota=1}^{3} E_{x,i} V_\iota^\varepsilon(x, i, 0). \tag{10.44}$$

By means of regularity of the underlying process, passing to the limit as $n \to \infty$ in (10.44) together with Fatou's lemma yields that
$$\gamma_1\, E_{x,i}[t \wedge \tau_D] \le V(x,i) + \sum_{\iota=1}^{3} E_{x,i} V_\iota^\varepsilon(x, i, 0).$$

Finally, it is easily seen that
$$\sum_{\iota=1}^{3} E_{x,i} V_\iota^\varepsilon(x, i, 0) < \infty.$$

In addition,
$$E_{x,i}[t \wedge \tau_D] = E_{x,i}\tau_D I_{\{\tau_D \le t\}} + E_{x,i}\, t\, I_{\{\tau_D > t\}}. \tag{10.45}$$

Because
$$P_{x,i}(\tau_D > t) \le \frac{1}{\gamma_1 t}\left[V(x,i) + \sum_{\iota=1}^{3} E_{x,i} V_\iota^\varepsilon(x, i, 0)\right] \to 0 \quad \text{as } t \to \infty,$$
we obtain $E_{x,i}\tau_D < \infty$ as desired. Thus the switching diffusion is positive recurrent. □

10.3.3 Inclusion of Transient Discrete Events


Here, we extend the result to the case in which $\alpha^\varepsilon(t)$ includes transient states in addition to the states of the ergodic classes. Let $\alpha^\varepsilon(\cdot)$ be a Markov chain with generator given by (10.5) with
$$\widetilde{Q} = \begin{pmatrix} \widetilde{Q}^1 & & & & \\ & \widetilde{Q}^2 & & & \\ & & \ddots & & \\ & & & \widetilde{Q}^l & \\ \widetilde{Q}_*^1 & \widetilde{Q}_*^2 & \cdots & \widetilde{Q}_*^l & \widetilde{Q}_* \end{pmatrix}. \tag{10.46}$$

Now the jump process is again nearly completely decomposable. However, in addition to the $l$ recurrent classes, there are a number of transient states. That is, $\mathcal{M} = \mathcal{M}_1 \cup \cdots \cup \mathcal{M}_l \cup \mathcal{M}_*$, where $\mathcal{M}_* = \{s_{*1}, \ldots, s_{*m_*}\}$ contains a collection of transient states. We replace (A10.1) by (A10.3) in this section.

(A10.3) For $i = 1, \ldots, l$, the $\widetilde{Q}^i$ are irreducible, and $\widetilde{Q}_*$ is Hurwitz (i.e., all of its eigenvalues have negative real parts).

The Hurwitz condition implies that the states in M∗ are transient.


Within a short period of time, they will enter one of the recurrent classes.
To proceed, we define the aggregated process. Note, however, that we lump only the states in each recurrent class, not those in the transient class. Define
$$\begin{aligned}
\nu_* &= \operatorname{diag}(\widetilde{\nu}, 0_{m_* \times m_*}) \in \mathbb{R}^{(l + m_*) \times m_0}, \\
\widetilde{a}_i &= -\widetilde{Q}_*^{-1} \widetilde{Q}_*^i\, \mathbb{1}_{m_i} \in \mathbb{R}^{m_* \times 1}, \quad i = 1, \ldots, l, \\
\widetilde{A} &= (\widetilde{a}_1, \ldots, \widetilde{a}_l) \in \mathbb{R}^{m_* \times l}, \\
\mathbb{1}_* &= \begin{pmatrix} \widetilde{\mathbb{1}} & 0_{(m_0 - m_*) \times m_*} \\ \widetilde{A} & 0_{m_* \times m_*} \end{pmatrix} \in \mathbb{R}^{m_0 \times (l + m_*)}.
\end{aligned} \tag{10.47}$$

Write
$$\widehat{Q} = \begin{pmatrix} \widehat{Q}^{11} & \widehat{Q}^{12} \\ \widehat{Q}^{21} & \widehat{Q}^{22} \end{pmatrix}$$
so that $\widehat{Q}^{11} \in \mathbb{R}^{(m_0 - m_*) \times (m_0 - m_*)}$, $\widehat{Q}^{22} \in \mathbb{R}^{m_* \times m_*}$, and $\widehat{Q}^{12}$ and $\widehat{Q}^{21}$ have appropriate dimensions. Denote
$$\overline{Q} = \widetilde{\nu}\big(\widehat{Q}^{11}\, \widetilde{\mathbb{1}} + \widehat{Q}^{12}\, \widetilde{A}\big), \qquad \overline{Q}_* = \operatorname{diag}(\overline{Q}, 0_{m_* \times m_*}). \tag{10.48}$$

Define the aggregated process $\overline{\alpha}^\varepsilon(\cdot)$ by
$$\overline{\alpha}^\varepsilon(t) = \begin{cases} i, & \text{if } \alpha^\varepsilon(t) \in \mathcal{M}_i, \\ U_j, & \text{if } \alpha^\varepsilon(t) = s_{*j}, \end{cases} \tag{10.49}$$
where
$$U_j = \sum_{i=1}^{l} i\, I_{\left\{\sum_{j_0=1}^{i-1} \widetilde{a}_{j_0 j} < U \le \sum_{j_0=1}^{i} \widetilde{a}_{j_0 j}\right\}},$$
and $U$ is a random variable uniformly distributed on $[0, 1]$, independent of $\alpha^\varepsilon(\cdot)$. We lump only the states in each irreducible class, so the state space of the aggregated process $\overline{\alpha}^\varepsilon(\cdot)$ is again $\overline{\mathcal{M}} = \{1, \ldots, l\}$. The detailed proofs are omitted; see [176] and the references therein.
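The vectors $\widetilde{a}_i$ in (10.47) have a direct probabilistic reading: the $j$th entry of $\widetilde{a}_i$ is the probability that, starting from the transient state $s_{*j}$, the fast chain is absorbed into class $\mathcal{M}_i$, so the rows of $\widetilde{A}$ sum to one. The sketch below checks this on illustrative (hypothetical) data with $l = 2$ classes and $m_* = 2$ transient states.

```python
import numpy as np

# Transient-to-transient block (Hurwitz) and transient-to-class blocks.
# Together with the class blocks, full generator rows sum to zero.
Qstar = np.array([[-2.0, 0.5], [1.0, -2.5]])
Qstar_1 = np.array([[0.6, 0.4], [0.25, 0.25]])   # rates into class M_1
Qstar_2 = np.array([[0.3, 0.2], [0.6, 0.4]])     # rates into class M_2

# a_i = -Qstar^{-1} Qstar_i 1, as in (10.47)
ones2 = np.ones(2)
a1 = -np.linalg.solve(Qstar, Qstar_1 @ ones2)
a2 = -np.linalg.solve(Qstar, Qstar_2 @ ones2)
A_tilde = np.column_stack([a1, a2])              # absorption probabilities
```

Because $\widetilde{Q}_*$ is Hurwitz, the inverse exists, and each row of `A_tilde` is a probability vector, which is exactly what the randomization $U_j$ in (10.49) samples from.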
Lemma 10.8. Assume condition (A10.3). Then the following assertions hold.

(a) The probability vector $p^\varepsilon(t) = (p^{\varepsilon,1}(t), p^{\varepsilon,2}(t), \ldots, p^{\varepsilon,l}(t), p^{\varepsilon,*}(t))$, with $p^{\varepsilon,i}(t) \in \mathbb{R}^{1 \times m_i}$ and
$$p^{\varepsilon,*}(t) = \big(\mathbb{P}(\alpha^\varepsilon(t) = s_{*1}), \ldots, \mathbb{P}(\alpha^\varepsilon(t) = s_{*m_*})\big) \in \mathbb{R}^{1 \times m_*},$$
satisfies
$$p^\varepsilon(t) = (\theta(t)\widetilde{\nu}, 0_{m_*}) + O\big(\varepsilon(t+1) + \exp(-\kappa_0 t/\varepsilon)\big)$$
for some $\kappa_0 > 0$, where
$$\frac{d\theta(t)}{dt} = \theta(t)\overline{Q}, \qquad \theta(0) = \big(p^1(0)\mathbb{1}_{m_1} + p^*(0)\widetilde{a}_1, \ldots, p^l(0)\mathbb{1}_{m_l} + p^*(0)\widetilde{a}_l\big).$$

(b) The transition probability matrix $P^\varepsilon(t)$ satisfies
$$P^\varepsilon(t) = P^{(0)}(t) + O\big(\varepsilon(t+1) + e^{-\kappa_0 t/\varepsilon}\big),$$
where $P^{(0)}(t) = \mathbb{1}_*\, \Theta_*(t)\, \nu_*$ with $\Theta_*(t) = \operatorname{diag}(\Theta(t), I_{m_* \times m_*})$, and $\Theta(t)$ satisfies
$$\frac{d\Theta(t)}{dt} = \Theta(t)\overline{Q}, \qquad \Theta(0) = I.$$

(c) $\overline{\alpha}^\varepsilon(\cdot)$ converges weakly to $\overline{\alpha}(\cdot)$, a Markov chain generated by $\overline{Q}$.

(d) For $i = 1, \ldots, l$, $j = 1, \ldots, m_i$,
$$E\left[\int_0^\infty e^{-t}\big(I_{\{\alpha^\varepsilon(t) = s_{ij}\}} - \nu_j^i I_{\{\overline{\alpha}^\varepsilon(t) = i\}}\big)\,dt\right]^2 = O(\varepsilon),$$
and for $i = *$, $j = 1, \ldots, m_*$,
$$E\left[\int_0^\infty e^{-t}\, I_{\{\alpha^\varepsilon(t) = s_{*j}\}}\,dt\right]^2 = O(\varepsilon).$$

(e) $(X^\varepsilon(\cdot), \overline{\alpha}^\varepsilon(\cdot))$ converges weakly to $(X(\cdot), \overline{\alpha}(\cdot))$ such that the limit operator is as given in Lemma 10.2.
Remark 10.9. Although a collection of transient states of the discrete events is included, the limit system is still an average with respect to the stationary measures of the ergodic classes only. Asymptotically, the transient states can be discarded because their probabilities go to 0 rapidly.
Theorem 10.10. Assume (A10.2) and (A10.3), and that for each $i \in \overline{\mathcal{M}}$ there is a Liapunov function $V(x,i)$ such that $\mathcal{L}V(x,i) \le -\gamma$ for some $\gamma > 0$. Then for sufficiently small $\varepsilon > 0$, the process $(X^\varepsilon(t), \alpha^\varepsilon(t))$, with the transient discrete events included, is still positive recurrent.

Idea of Proof. Since the proof is along the lines of that of Theorem 10.5, we only note the differences. For $i = 1, \ldots, l$, let $V(x,i)$ be the Liapunov function associated with the limit process given by Lemma 10.8(e). The perturbed Liapunov function method is used again. This time, redefine
$$\overline{V}(x, \alpha) = \sum_{i=1}^{l} V(x,i)\, I_{\{\alpha \in \mathcal{M}_i\}} + \sum_{j=1}^{m_*} \sum_{i=1}^{l} V(x,i)\, \widetilde{a}_{i,j}\, I_{\{\alpha = s_{*j}\}}. \tag{10.50}$$

It is easy to check that $\widetilde{Q}\overline{V}(x, \cdot)(\alpha) = 0$, where $\widetilde{Q}$ is defined in (10.46). Moreover,
$$b(x, \alpha^\varepsilon(t)) - \overline{b}(x, \overline{\alpha}^\varepsilon(t)) = \sum_{i=1}^{l}\sum_{j=1}^{m_i} b(x, s_{ij})\big[I_{\{\alpha^\varepsilon(t) = s_{ij}\}} - \nu_j^i I_{\{\overline{\alpha}^\varepsilon(t) = i\}}\big] + \sum_{j=1}^{m_*} b(x, s_{*j})\, I_{\{\alpha^\varepsilon(t) = s_{*j}\}}.$$

Therefore, we can carry out the proof in a manner similar to that of Theorem 10.5. The details are omitted.

10.4 Notes
This chapter continues our study of positive recurrence for switching diffu-
sions. One of the crucial assumptions in the ergodicity study in the previous
chapters is that the switching component has a single ergodic class. This
chapter takes up the issue that the discrete component may have multi-
ple ergodic classes that are weakly connected. The main idea is the use
of the two-time-scale approach. Roughly, if the discrete events or certain
components of the discrete events change sufficiently quickly, the positive
recurrence can still be guaranteed. The main ingredient is the use of per-
turbed Liapunov function methods.
11
Stochastic Volatility Using Regime-Switching Diffusions

11.1 Introduction
This chapter aims to model stochastic volatility using regime-switching dif-
fusions. Effort is devoted to developing asymptotic expansions of a system
of coupled differential equations with applications to option pricing under
regime-switching diffusions. By focusing on fast mean reversion, we aim
at finding the “effective volatility.” The main techniques used are singular
perturbation methods. Under simple conditions, asymptotic expansions are
developed with uniform asymptotic error bounds. The leading term in the
asymptotic expansions satisfies a Black–Scholes equation in which the mean
return rate and volatility are averaged out with respect to the stationary
measure of the switching process. In addition, the full asymptotic series is
developed. The asymptotic series helps us to gain insight on the behavior
of the option price when the time approaches maturity. The asymptotic
expansions obtained in this chapter are interesting in their own right and
can be used for other problems in control optimization of systems involving
fast-varying switching processes.
Nowadays, sophisticated financial derivatives such as options are used
widely. The Nobel prize winning Black–Scholes formula provides an im-
portant tool for pricing options on a basic equity. It has encouraged and
facilitated the union of mathematics, finance, computational sciences, and
economics. On the other hand, it has been recognized, especially by practi-
tioners in the financial market, that the assumption of constant volatility,
which is essential in the Black–Scholes formula, is a less-than-perfect
description of the real world.

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_11, © Springer Science + Business Media, LLC 2010

To capture the behavior of stock prices and
other derivatives, there has been much effort in taking into account fre-
quent volatility changes. It has been recognized that it is more suitable to
use a stochastic process to model volatility variations. In [70], instead of
the usual GBM (geometric Brownian motion) model, a second stochastic
differential equation is introduced to describe the random environments.
Such a formulation is known as a stochastic volatility model.
What happens if the stochastic volatility undergoes fast mean rever-
sion? To answer this question, a class of volatility models has recently been studied in detail in [49] and the subsequent papers [50, 51]. Under the
setup of mean reversion, two-time-scale methods are used. The rationale is
to identify the important groupings of market parameters. It also reveals
that the Black–Scholes formula is a “first approximation” to such fast-
varying volatility models. Assume that the volatility is a function f (·) of a
fast-varying diffusion that is mean reverting (or ergodic). The mean rever-
sion implies that although rapidly varying, the volatility does not blow up.
By exploiting the time-scale separation, it was shown in [49, 50] that the
“slow” component (the leading term or the zeroth-order outer expansion
term in the approximation) of the option prices can be approximated by a
Black–Scholes differential equation with constant volatility $\overline{f^2}$, where $\overline{f^2}$ is the average of $f^2(\cdot)$ with respect to the stationary measure of the “fast”
component. Moreover, using a singular perturbation approach, the next
term (the first-order outer expansion term) in the asymptotic expansion
was also found. For convenience, the volatility was assumed to be driven
by an Ornstein–Uhlenbeck (OU) process in [49]–[51]. The fast mean rever-
sion has been further examined in [90] with more general models. A full
asymptotic series with uniform error bounds was obtained; see also related
diffusion approximation in [131] and asymptotic expansions for diffusion
processes [88, 89].
Along another line, increasing attention has been drawn to modeling,
analysis, and computing using regime-switching models [164], which are
alternatives to the stochastic volatility models mentioned above. They
present an effective way to model stochastic volatility with simpler struc-
tures. The use of the Markov chains is much simpler than the use of a
second stochastic differential equation. This is motivated by the desire
to use regime switching to describe uncertainty and stochastic volatility.
It is now well recognized that, due to stochastic volatility, a phenomenon
known as the volatility smile arises. There has been an effort to provide
better models to replicate the “smile.” In [164], the authors successfully
reproduced the volatility smile by using regime-switching models. Earlier
efforts in modeling and analysis of regime-switching models
can be found in [5, 32, 184] among others.
To a large extent, this chapter is motivated by [49], which considers extremely fast mean reversion processes in the driving force of stochastic volatility. Nevertheless, rather than using an additional SDE to represent the stochastic


volatility, we use the same setup of a regime-switching model as that of
[164]. Therefore, it accommodates the desired goal from a different angle.
Here the fast mean reversion is captured by a fast-varying continuous-time
Markov chain. We demonstrate that the model under consideration leads
to effective volatility, which is an average with respect to the stationary
distribution of the fast-changing jump process. In mathematical terms, the
fast reversion corresponds to the Markov chain being ergodic with fast
variations. Using a two-time-scale formulation, the problem itself centers
around fast-varying switching-diffusion processes. The approach that we
are using is analytical. We focus on developing asymptotic expansions that
are approximations of solutions of systems of parabolic equations, and aim
to obtain such expansions with uniform asymptotic error bounds on a compact
set for the continuous component. It should be noted that if one is only
interested in convergence in the pointwise sense (termed pointwise asymptotic
expansions henceforth) rather than obtaining uniform error bounds,
then one can proceed similarly to the diffusion counterpart in [49, 50].
In any event, the “effective volatility” can be obtained. It should also be
noted that the asymptotic expansions presented in this chapter give some
new insight on the construction of approximation of solutions of backward
type systems of PDEs, which is different from [176] where probability dis-
tributions were considered. The methods presented are constructive. The
asymptotic analysis techniques are interesting in their own right.
The rest of the chapter is arranged as follows. Section 11.2 presents the
formulation of the problem. Section 11.3 constructs asymptotic expansions.
Section 11.4 proceeds with validation of the expansions. Finally Section 11.5
issues a few more remarks.

11.2 Formulation
In the rest of the book, r is used as the dimension of the continuous state
variables (i.e., Rr is used as the space for the continuous state variable).
In this chapter, however, to adopt the traditional convention, we use r as
the interest rate throughout. We consider a basic equity, a stock whose
price is given by S(t). Different from the traditional geometric Brownian
motion setup, we assume that the price follows a switching-diffusion model.
Suppose that α(t) is a continuous-time Markov chain with generator Q(t)
and a finite state space M = {1, . . . , m0 }. The price of the stock is a
solution of the stochastic differential equation

dS(t) = µ(α(t))S(t)dt + σ(α(t))S(t)dw(t), (11.1)

where w(·) is a standard Brownian motion independent of α(·), and µ(·) and
σ(·) are the appreciation rate and the volatility rate, respectively. Such a

model is frequently referred to as a regime-switching asset model. Note that


in the above, both the appreciation rate and volatility rate are functions of
the Markov chain.
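As a concrete illustration of (11.1), the regime-switching geometric Brownian motion can be simulated with an Euler–Maruyama scheme in which the Markov chain is advanced on the same time grid. All parameter values below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

def simulate_rs_gbm(S0, mu, sigma, Q, T, n_steps, rng):
    """Euler-Maruyama for dS = mu(alpha) S dt + sigma(alpha) S dw,
    with alpha a continuous-time Markov chain with generator Q,
    both discretized on the same grid of step dt = T / n_steps."""
    dt = T / n_steps
    m = Q.shape[0]
    S = np.empty(n_steps + 1)
    S[0] = S0
    alpha = 0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        S[k + 1] = S[k] * (1.0 + mu[alpha] * dt + sigma[alpha] * dw)
        # one-step chain transition: P(i -> j) ~ q_ij dt for j != i
        p = np.clip(Q[alpha] * dt, 0.0, 1.0)
        p[alpha] = 0.0
        p[alpha] = 1.0 - p.sum()
        alpha = rng.choice(m, p=p)
    return S

# Hypothetical two-regime market: a calm regime and a volatile one.
mu = np.array([0.08, -0.02])
sigma = np.array([0.20, 0.45])
Q = np.array([[-1.0, 1.0], [0.5, -0.5]])
S_path = simulate_rs_gbm(100.0, mu, sigma, Q, T=1.0, n_steps=1000,
                         rng=np.random.default_rng(7))
```

Speeding the chain up by replacing `Q` with `Q / eps` for small `eps` is exactly the two-time-scale regime studied later in (11.4).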
Consider a European type of call option. The payoff at time T is given
by H(S), a nonnegative function. Denote h(S) = H(eS ). Suppose that the
associated risk-free interest rate is given by r. Nowadays, a standard ap-
proach in option pricing is risk-neutral valuation. The rationale is to derive
a suitable probability space on which the expected rate of return of all secu-
rities is equal to the risk-free interest rate. The mathematical requirement
is that the discounted asset price is a martingale. The associated probabil-
ity space is referred to as the risk-neutral world. The price of the option on
the asset is then the expected value, with respect to this martingale mea-
sure, of the discounted option payoff. In what follows, we use the so-called
risk-neutral probability measure; see, e.g., [49] and [69] among others for
diffusion processes and [164] for switching diffusion processes. To proceed,
we first present a lemma, which is essentially a generalized Girsanov theo-
rem for Markov-modulated processes. The results of this type are generally
known, and a proof can be found in [164].
Lemma 11.1. The following assertions hold.
(a) Suppose that $\sigma(i) > 0$ for each $i \in \mathcal{M}$, and let
$$\widetilde{w}(t) := w(t) - \int_0^t \frac{r - \mu(\alpha(u))}{\sigma(\alpha(u))}\,du.$$
Then $\widetilde{w}(\cdot)$ is a $\widetilde{P}$-Brownian motion, where $\widetilde{P}$ is known as the risk-neutral probability measure; see [49, 164] among others.

(b) $S(0)$, $\alpha(\cdot)$, and $\widetilde{w}(\cdot)$ are mutually independent under $\widetilde{P}$.
(c) (Itô's lemma or Dynkin's formula) For each $i \in \mathcal{M}$ and each $g(\cdot, \cdot, i) \in C^{2,1}$, we have
$$g(S(s), s, \alpha(s)) = g(S(t), t, \alpha(t)) + \int_t^s \mathcal{L}g(S(u), u, \alpha(u))\,du + M(s) - M(t),$$
where $M(\cdot)$ is a $\widetilde{P}$-local martingale and $\mathcal{L}$ is a generator given by
$$\mathcal{L}g(S, t, i) = \frac{\partial}{\partial t} g(S, t, i) + \frac{1}{2} S^2 \sigma^2(i) \frac{\partial^2}{\partial S^2} g(S, t, i) + rS \frac{\partial}{\partial S} g(S, t, i) + Q(t)g(S, t, \cdot)(i), \tag{11.2}$$
with
$$Q(t)g(S, t, \cdot)(i) = \sum_{j=1}^{m_0} q_{ij}(t)\, g(S, t, j). \tag{11.3}$$

Note that (S(t), α(t)) is a Markov process with generator L. To proceed,


as alluded to in the introduction and in reference to [49], we also consider a fast mean-reverting driving process; here, however, the driving process is a continuous-time Markov chain, not a diffusion.
By introducing a small parameter $\varepsilon > 0$, we aim to show that the regime-switching model is also a good approximation of the Black–Scholes model. To this end, suppose that the generator of the Markov chain is given by $Q^\varepsilon(t) = (q_{ij}^\varepsilon(t)) \in \mathbb{R}^{m_0 \times m_0}$ with
$$Q^\varepsilon(t) = \frac{Q(t)}{\varepsilon}, \tag{11.4}$$
where $Q(t)$ is the generator of a continuous-time Markov chain. To highlight the $\varepsilon$-dependence, we denote the Markov chain by $\alpha(t) = \alpha^\varepsilon(t)$.
We are mainly interested in uniform asymptotic expansions for the option
price. To obtain such uniform asymptotic expansions, we need to have the
continuous component of the switching diffusion be in a compact set. For
simplicity, we take the compact set to be [0, 1]; see [90] for a comment on
the corresponding stochastic volatility model without switching. We use
this as a standing assumption throughout this chapter. Next, we state a
couple of additional conditions.
(A11.1) There is a T > 0 and for all t ∈ [0, T ], the generator Q(t)
is weakly irreducible. There is an n ≥ 1 such that Q(·) ∈
C n+2 [0, T ].
(A11.2) The function h(·) is sufficiently smooth and vanishes outside of
a compact set.
Remark 11.2. Condition (A11.1) indicates that the weak irreducibility
of the Markov chain implies the existence of the unique quasi-stationary
distribution ν(t) = (ν1 (t), . . . , νm0 (t)) ∈ R1×m0 . Our assumption on the op-
erator Lε then leads to the equity undergoing fast mean reverting switching.
Following [49, §5.4], we have assumed h(S) to be a bounded and smooth
function in (A11.2). For European options, if it is a call option, h(S) = (S −
K0 )+ , and if it is a put option, h(S) = (K0 − S)+ , where K0 is the exercise
price. Thus the function h(·) is not smooth and is unbounded. In [50], using
regularization or smoothing techniques, asymptotic expansions up to the
order O(ε) were obtained without assuming smoothness and boundedness
of the payoff function h(S). Note that the accuracy of the approximation
was obtained only for a fixed state variable x and time variable t with t < T
in [50], however. In this chapter, as in [49], we confine our attention to the
smooth and bounded payoff h(S), and prove the uniform in S accuracy
of approximation. It appears that the unbounded and nonsmooth payoff
function h(S) can be handled and the formal asymptotic expansions can
be obtained, but such expansions are not uniform in S. Moreover, different
methods and ideas must be used to justify the expansions as commented
on in [90].
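The regularization mentioned above can be made concrete by Gaussian mollification of the call payoff: replacing $(S - K_0)^+$ by $E[(S - K_0 + \delta Z)^+]$ with $Z \sim N(0,1)$ yields a smooth approximation with a closed form. This sketch illustrates the smoothing idea only; it does not by itself produce the compact support required in (A11.2), for which an additional truncation would be used.

```python
from math import erf, exp, pi, sqrt

def smoothed_call_payoff(S, K, delta):
    """Gaussian mollification of the call payoff (S - K)^+:
    E[(S - K + delta*Z)^+] = delta * (x*Phi(x) + phi(x)), x = (S - K)/delta,
    where phi and Phi are the standard normal density and cdf."""
    x = (S - K) / delta
    phi = exp(-0.5 * x * x) / sqrt(2.0 * pi)
    Phi = 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return delta * (x * Phi + phi)
```

As $\delta \to 0$ the smoothed payoff converges to $(S - K)^+$; deep in the money it already agrees with the kink payoff to machine precision, while near $S = K$ the kink is rounded off at scale $\delta$.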

Recall that by weak irreducibility, we mean that the system of equations
$$\nu(t)Q(t) = 0, \qquad \nu(t)\mathbb{1} = 1 \tag{11.5}$$
has a unique solution $\nu(t) = (\nu_1(t), \ldots, \nu_{m_0}(t)) \in \mathbb{R}^{1 \times m_0}$ satisfying $\nu_i(t) \ge 0$ for each $i \in \mathcal{M}$. Such a nonnegative solution is termed a quasi-stationary distribution; see [176].
Let
$$V^\varepsilon(S, t, i) = E_{S,i}\big[\exp(-r(T-t))\, h(S^\varepsilon(T))\big] = E\big[\exp(-r(T-t))\, h(S^\varepsilon(T)) \,\big|\, S^\varepsilon(t) = S,\ \alpha^\varepsilon(t) = i\big]. \tag{11.6}$$
The option price can be characterized by the following system of partial differential equations,
$$\frac{\partial V^\varepsilon(S, t, i)}{\partial t} + \frac{1}{2}\sigma^2(i) S^2 \frac{\partial^2 V^\varepsilon(S, t, i)}{\partial S^2} + rS \frac{\partial V^\varepsilon(S, t, i)}{\partial S} - r V^\varepsilon(S, t, i) + Q^\varepsilon(t) V^\varepsilon(S, t, \cdot)(i) = 0, \quad i \in \mathcal{M}, \tag{11.7}$$
which is a generalization of the usual Black–Scholes PDE. Associated with the above system of PDEs, we define an operator $\mathcal{L}^\varepsilon$. For $i \in \mathcal{M}$ and each $g(\cdot, \cdot, i) \in C^{2,1}$, let
$$\mathcal{L}^\varepsilon g(S, t, i) = \frac{\partial g(S, t, i)}{\partial t} + \frac{1}{2}\sigma^2(i) S^2 \frac{\partial^2 g(S, t, i)}{\partial S^2} + rS \frac{\partial g(S, t, i)}{\partial S} - r g(S, t, i) + Q^\varepsilon(t) g(S, t, \cdot)(i). \tag{11.8}$$
Now the setup of the problem is complete. We proceed to obtain the ap-
proximation of the option price by means of asymptotic expansions.

11.3 Asymptotic Expansions


What can one say about the effect of αε (·)? Is there an “effective volatil-
ity?” Is the Black–Scholes formula still a reasonable approximation to such
regime-switching models? To answer these questions, we develop an asymp-
totic series using analytic techniques. We seek asymptotic expansions of $V^\varepsilon(S, t, i)$ of the form
$$\Phi_n^\varepsilon(S, t, i) + \Psi_n^\varepsilon(S, \tau, i) = \sum_{k=0}^{n} \varepsilon^k \varphi_k(S, t, i) + \sum_{k=0}^{n} \varepsilon^k \psi_k(S, \tau, i), \quad i \in \mathcal{M}, \tag{11.9}$$
where $\tau$ is a stretched-time variable defined by
$$\tau = \frac{T - t}{\varepsilon}.$$

The ϕk (·) are called regular terms or outer expansion terms, and the ψk (·)
are the boundary layer correction terms (or to be more precise, terminal
layer corrections). In this problem, the terminal layer correction terms are
particularly useful for behavior of the option price near the time of maturity.
We aim to obtain asymptotic expansions of the order n, and derive the
uniform error bounds. For the purposes of error estimates, we need to
calculate a couple of more terms for analysis reasons.
First let us look at $\Phi_n^\varepsilon(S, t, i)$, the regular part of the asymptotic expansions. Substituting it into (11.7) and comparing coefficients of like powers of $\varepsilon^k$ for $0 \le k \le n+1$, we obtain
$$\begin{aligned}
&Q(t)\varphi_0(S, t, \cdot)(i) = 0, \\
&\frac{\partial \varphi_0(S, t, i)}{\partial t} + \frac{1}{2}\sigma^2(i) S^2 \frac{\partial^2 \varphi_0(S, t, i)}{\partial S^2} + rS \frac{\partial \varphi_0(S, t, i)}{\partial S} - r\varphi_0(S, t, i) + Q(t)\varphi_1(S, t, \cdot)(i) = 0, \\
&\qquad \cdots \\
&\frac{\partial \varphi_k(S, t, i)}{\partial t} + \frac{1}{2}\sigma^2(i) S^2 \frac{\partial^2 \varphi_k(S, t, i)}{\partial S^2} + rS \frac{\partial \varphi_k(S, t, i)}{\partial S} - r\varphi_k(S, t, i) + Q(t)\varphi_{k+1}(S, t, \cdot)(i) = 0.
\end{aligned} \tag{11.10}$$

To ensure the match of the terminal (or boundary) conditions, we choose

ϕ0 (S, T, i) + ψ0 (S, 0, i) = h(S), i ∈ M,


(11.11)
ϕk (S, T, i) + ψk (S, 0, i) = 0, i ∈ M and k ≥ 1.

Taking a Taylor expansion of Q(·) around T , we obtain

\[
Q(t) = Q(T-\varepsilon\tau)
= \sum_{k=0}^{n+1}\frac{(-1)^k(\varepsilon\tau)^k}{k!}\frac{d^kQ(T)}{dt^k}
+ R_{n+1}(T-\varepsilon\tau),
\tag{11.12}
\]

where, by assumption (A11.1), it can be shown that $R_{n+1}(t) = O(t^{n+1})$ uniformly in $t \in [0,T]$.
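The order of the remainder is easy to check numerically. The sketch below truncates the expansion (11.12) after the first-order term for an illustrative two-state generator with smoothly time-varying rates (the particular functions Q and dQ are assumptions of this example, not from the text) and confirms that halving ε divides the remainder by roughly four, consistent with an O(ε²) remainder.

```python
import math

def Q(t):
    # illustrative 2x2 generator with smoothly time-varying rates
    a, b = 2.0 + math.sin(t), 1.0 + 0.5 * math.cos(t)
    return [[-a, a], [b, -b]]

def dQ(t):
    # entrywise derivative dQ(t)/dt of the generator above
    da, db = math.cos(t), -0.5 * math.sin(t)
    return [[-da, da], [db, -db]]

def remainder_norm(T, eps, tau):
    # max-norm of R_1 = Q(T - eps*tau) - [Q(T) - eps*tau*dQ(T)]
    QT, Qp, Qex = Q(T), dQ(T), Q(T - eps * tau)
    return max(abs(Qex[i][j] - (QT[i][j] - eps * tau * Qp[i][j]))
               for i in range(2) for j in range(2))

T, tau = 1.0, 0.7
r1 = remainder_norm(T, 0.1, tau)
r2 = remainder_norm(T, 0.05, tau)
ratio = r1 / r2  # approximately 4 for a second-order remainder
```

Repeating with more terms of the series exhibits the correspondingly higher remainder orders.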
Similar to the equations in (11.10) for Φεn+1 (S, t, i), we can substitute

Ψεn+1 (S, τ, i) into (11.7). This results in


\[
\begin{aligned}
& -\frac{\partial\psi_0(S,\tau,i)}{\partial\tau} + Q(T)\psi_0(S,\tau,i) = 0, \\[4pt]
& -\frac{\partial\psi_1(S,\tau,i)}{\partial\tau} + Q(T)\psi_1(S,\tau,i)
- \tau Q^{(1)}(T)\psi_0(S,\tau,i) \\
& \qquad + \frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\psi_0(S,\tau,i)}{\partial S^2}
+ rS\frac{\partial\psi_0(S,\tau,i)}{\partial S} - r\psi_0(S,\tau,i) = 0, \\[4pt]
& \qquad\cdots \\[4pt]
& -\frac{\partial\psi_k(S,\tau,i)}{\partial\tau} + Q(T)\psi_k(S,\tau,i)
+ \widetilde{R}_k(S,\tau,i) = 0,
\end{aligned}
\tag{11.13}
\]
where the remainder term is given by
\[
\widetilde{R}_k(S,\tau,i)
= \sum_{j=1}^{k}\frac{(-1)^j\tau^j}{j!}\frac{d^jQ(T)}{dt^j}\psi_{k-j}(S,\tau,i)
+ \frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\psi_{k-1}(S,\tau,i)}{\partial S^2}
+ rS\frac{\partial\psi_{k-1}(S,\tau,i)}{\partial S}
- r\psi_{k-1}(S,\tau,i).
\tag{11.14}
\]

Our next task is to construct the sequences $\{\varphi_k(S,t)\}$ and $\{\psi_k(S,\tau)\}$.

11.3.1 Construction of ϕ0 (S, t, i) and ψ0 (S, τ, i)


We claim that $\varphi_0(S,t,i)$ must be independent of $i$. To see this, write
\[
\varphi_0(S,t) = (\varphi_0(S,t,1), \dots, \varphi_0(S,t,m_0))' \in \mathbb{R}^{m_0\times 1}.
\]
Then the first equation in (11.10) may be written as $Q(t)\varphi_0(S,t) = 0$. This equation in turn implies that $\varphi_0(S,t)$ is in the null space of $Q(t)$. Because $Q(t)$ is weakly irreducible, the rank of $Q(t)$ is $m_0 - 1$, so the null space is one-dimensional. Consequently, the null space is spanned by $\mathbb{1} = (1,\dots,1)' \in \mathbb{R}^{m_0\times 1}$. This yields that $\varphi_0(S,t) = \gamma_0(S,t)\mathbb{1}$, where $\gamma_0(S,t)$ is a real-valued function, and hence $\varphi_0(S,t,i)$ is independent of $i$.
Using the above argument in the second equation of (11.10), and multi-
plying from the left through the equation by νi (t), where (ν1 (t), . . . , νm0 (t))
is the quasi-stationary distribution associated with Q(t), and summing over
i ∈ M, we obtain
\[
\frac{\partial\gamma_0(S,t)}{\partial t}
+ \frac{1}{2}\overline{\sigma}^2(t)S^2\frac{\partial^2\gamma_0(S,t)}{\partial S^2}
+ rS\frac{\partial\gamma_0(S,t)}{\partial S} - r\gamma_0(S,t) = 0,
\tag{11.15}
\]
where
\[
\overline{\sigma}^2(t) = \sum_{i=1}^{m_0}\nu_i(t)\sigma^2(i).
\]

The terminal condition is given by

γ0 (S, T ) = h(S). (11.16)

Then (11.15) together with the terminal condition (11.16) has a unique
solution.

Remark 11.3. Note that γ0 (S, t) satisfies the Black–Scholes partial dif-
ferential equation, in which the coefficients are averaged out with respect
to the stationary distributions of the Markov chain. The result reveals that
the Black–Scholes formulation, indeed, is a first approximation to the op-
tion model under regime switching. Thus, the regime-switching model can
be thought of as another stochastic volatility model.
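Remark 11.3 suggests a simple recipe for the leading-order price: average the squared volatilities with respect to the quasi-stationary distribution and apply the classical Black–Scholes formula. The sketch below does exactly this for a two-state chain with a constant generator; all rates, volatilities, and contract parameters are illustrative assumptions, not values from the text.

```python
import math

def stationary_2state(q12, q21):
    # nu Q = 0 with sum(nu) = 1 for Q = [[-q12, q12], [q21, -q21]]
    return (q21 / (q12 + q21), q12 / (q12 + q21))

def bs_call(S, K, r, sigma, T):
    # classical Black-Scholes European call price
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

# two regimes with volatilities 20% and 40%, switching rates q12 = 3, q21 = 1
sigma_regime = (0.20, 0.40)
nu = stationary_2state(3.0, 1.0)
sigma_bar = math.sqrt(nu[0] * sigma_regime[0] ** 2 + nu[1] * sigma_regime[1] ** 2)
# leading term gamma_0: Black-Scholes price with the averaged volatility
price0 = bs_call(S=1.0, K=1.0, r=0.05, sigma=sigma_bar, T=1.0)
```

The averaged volatility lies strictly between the two regime volatilities, and price0 is the common leading-order approximation shared by all regimes i as ε → 0.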

In view of (11.11) and (11.13), ψ0 (S, τ, i) is obtained from the first equa-
tion in (11.13) together with the condition ψ0 (S, 0, i) = 0. It follows that
ψ0 (S, τ, i) = 0 for each i ∈ M.

11.3.2 Construction of ϕ1 (S, t, i) and ψ1 (S, τ, i)


Now, we proceed to obtain $\varphi_1(S,t,i)$ and $\psi_1(S,\tau,i)$. Similar to $\varphi_0(S,t)$, for $i \le n+1$, define
\[
\begin{aligned}
& \varphi_i(S,t) = (\varphi_i(S,t,1), \dots, \varphi_i(S,t,m_0))' \in \mathbb{R}^{m_0}, \\
& \psi_i(S,t) = (\psi_i(S,t,1), \dots, \psi_i(S,t,m_0))' \in \mathbb{R}^{m_0}.
\end{aligned}
\]

Consider ϕ1 (S, t) in (11.10). It is a solution of a nonhomogeneous al-


gebraic equation as in (A.10) in the appendix of this book. By virtue of
Lemma A.12, we can write ϕ1 (S, t, i) as the sum of a general solution of
the associated homogeneous equation and the unique solution of the inho-
mogeneous equation that is orthogonal to the stationary distribution ν(t).
That is,
\[
\varphi_1(S,t,i) = \gamma_1(S,t) + \varphi_1^0(S,t,i), \quad i \in \mathcal{M},
\]
where $\{\varphi_1^0(S,t,i) : i \in \mathcal{M}\}$ is the unique solution of (11.10) satisfying
\[
\sum_{i=1}^{m_0}\nu_i(t)\varphi_1^0(S,t,i) = 0.
\]

Equivalently, denote

$\varphi_1^0(S,t) = (\varphi_1^0(S,t,1), \dots, \varphi_1^0(S,t,m_0))' \in \mathbb{R}^{m_0}$.

Then
\[
\varphi_1(S,t) = \gamma_1(S,t)\mathbb{1} + \varphi_1^0(S,t),
\tag{11.17}
\]

where ϕ01 (S, t) is the unique solution of the system of equations

\[
\begin{aligned}
& Q(t)\varphi_1^0(S,t) = F_0(S,t), \\
& \nu(t)\varphi_1^0(S,t) = 0,
\end{aligned}
\tag{11.18}
\]

where
$F_0(S,t) = (F_0(S,t,1), \dots, F_0(S,t,m_0))' \in \mathbb{R}^{m_0}$,

with
\[
F_0(S,t,i) = -\Big[\frac{\partial\gamma_0(S,t)}{\partial t}
+ \frac{\sigma^2(i)S^2}{2}\frac{\partial^2\gamma_0(S,t)}{\partial S^2}
+ rS\frac{\partial\gamma_0(S,t)}{\partial S} - r\gamma_0(S,t)\Big],
\tag{11.19}
\]
by Lemma A.12. That is, ϕ1 (S, t) is the sum of a general solution of the
homogeneous equation plus a particular solution verifying the orthogonality
condition.
Note that γ1 (S, t) has not been determined yet. We proceed to obtain
it from the next equation in (11.10). By substituting (11.17) into the next
equation in (11.10) and premultiplying it by ν, we obtain

\[
\frac{\partial\gamma_1(S,t)}{\partial t}
+ \frac{1}{2}\overline{\sigma}^2(t)S^2\frac{\partial^2\gamma_1(S,t)}{\partial S^2}
+ rS\frac{\partial\gamma_1(S,t)}{\partial S} - r\gamma_1(S,t) = \overline{F}_1(S,t),
\tag{11.20}
\]
where
$F_1(S,t) = (F_1(S,t,1), \dots, F_1(S,t,m_0))' \in \mathbb{R}^{m_0}$,

with
\[
F_1(S,t,i) = -\Big[\frac{\partial\varphi_1^0(S,t,i)}{\partial t}
+ \frac{\sigma^2(i)S^2}{2}\frac{\partial^2\varphi_1^0(S,t,i)}{\partial S^2}
+ rS\frac{\partial\varphi_1^0(S,t,i)}{\partial S}
- r\varphi_1^0(S,t,i)\Big],
\tag{11.21}
\]
and
$\overline{F}_1(S,t) = \nu(t)F_1(S,t)$.

It is easily seen that (11.20) is a uniquely solvable Cauchy problem if the


terminal condition γ1 (S, T ) ∈ R is specified. We determine this by matching
the terminal condition with the terminal layer term ψ1 (S, 0).
Since ψ0 (S, τ ) = 0, from the second equation in (11.13), ψ1 (S, τ ) satisfies

\[
\frac{\partial\psi_1(S,\tau)}{\partial\tau} = Q(T)\psi_1(S,\tau).
\tag{11.22}
\]
Choose $\psi_1(S,0) = \psi_1^0(S)$ such that $\overline{\psi}_1^0(S) = \nu\psi_1^0(S) = 0$, where $\nu = \nu(T)$.
Here and hereafter, we always use ν to denote ν(T ) for notational simplicity.

Multiplying from the left by ν, the stationary distribution associated with


Q(T ), we obtain
\[
\frac{\partial\overline{\psi}_1(S,\tau)}{\partial\tau} = 0 \in \mathbb{R}, \qquad
\overline{\psi}_1(S,0) = \overline{\psi}_1^0(S) = 0,
\tag{11.23}
\]

where $\overline{\psi}_1(S,\tau) = \nu\psi_1(S,\tau)$. The solution of (11.23) is thus given by $\overline{\psi}_1(S,\tau) = 0$.

Remark 11.4. The above may also be seen as
\[
\psi_1(S,\tau) = \exp(Q(T)\tau)\psi_1^0(S)
\]
for a chosen initial $\psi_1^0(S)$. Using
\[
\exp(Q(T)\tau) = \sum_{j=0}^{\infty}\frac{(Q(T)\tau)^j}{j!},
\]

the orthogonality relation $\nu Q(T) = 0$ leads to
\[
\nu\exp(Q(T)\tau)\psi_1^0(S) = \nu I\psi_1^0(S) = \nu\psi_1^0(S) = \overline{\psi}_1^0(S)
\]
for each $\tau \ge 0$.
Note that once $\psi_1^0(S)$ is chosen, $\varphi_1(S,t)$ and $\psi_1(S,\tau)$ are determined. The choice of $\psi_1^0(S)$ enables us to obtain the exponential decay of $\psi_1(S,\tau)$ easily.

In view of the defining equation for $\varphi_1(S,t)$, equations (11.17)–(11.20), and the terminal condition (11.11), equation (11.20) together with the terminal condition $\gamma_1(S,T) = -\overline{\psi}_1^0(S) = 0$ has a unique solution. Up to now, $\varphi_1(S,t)$ has been completely determined. Next, we proceed to find $\psi_1(S,\tau)$. Condition (11.11) implies that
\[
\psi_1(S,0) = -\varphi_1(S,T) = \overline{\psi}_1^0(S)\mathbb{1} - \varphi_1^0(S,T) = -\varphi_1^0(S,T).
\tag{11.24}
\]

It follows that the Cauchy problem given by (11.22) and (11.24) has a
unique solution. We are in a position to derive the exponential decay prop-
erty of ψ1 (S, τ ).

Lemma 11.5. For $\psi_1(S,\tau)$ obtained from the solution of (11.22) and (11.24), we have that, for some $K > 0$ and $\kappa_0 > 0$,
\[
\sup_{S\in[0,1]}|\psi_1(S,\tau)| \le K\exp(-\kappa_0\tau).
\tag{11.25}
\]

Proof. The weak irreducibility of $Q(T)$ implies that $\exp(Q(T)\tau) \to \mathbb{1}\nu$ as $\tau \to \infty$, and
\[
|\exp(Q(T)\tau) - \mathbb{1}\nu| \le \exp(-\kappa_0\tau) \quad \text{for some } \kappa_0 > 0.
\]

The solution of (11.22) and (11.24) yields that for each $S \in [0,1]$ and for some $K > 0$,
\[
\begin{aligned}
|\psi_1(S,\tau)| &= |\psi_1(S,\tau) - \mathbb{1}\nu\psi_1(S,0)| \\
&= |[\exp(Q(T)\tau) - \mathbb{1}\nu]\psi_1(S,0)| \\
&\le |\exp(Q(T)\tau) - \mathbb{1}\nu|\,|\psi_1(S,0)| \\
&\le K\exp(-\kappa_0\tau).
\end{aligned}
\]

Furthermore, it is readily seen that the above estimate holds uniformly for
S ∈ [0, 1]. The desired result thus follows. 2
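The decay estimate of Lemma 11.5 is easy to observe numerically: for initial data orthogonal to the stationary distribution, exp(Q(T)τ)ψ(0) decays exponentially. The sketch below uses a truncated power series for the matrix exponential and an illustrative two-state generator (all numbers are assumptions of this example); for this generator the ν-orthogonal data decay at rate κ0 = q12 + q21 = 4.

```python
import math

def expm(A, n_terms=60):
    # exp(A) via truncated power series; adequate for small matrices of modest norm
    m = len(A)
    E = [[float(i == j) for j in range(m)] for i in range(m)]   # identity
    term = [row[:] for row in E]
    for k in range(1, n_terms):
        term = [[sum(term[i][l] * A[l][j] for l in range(m)) / k
                 for j in range(m)] for i in range(m)]
        E = [[E[i][j] + term[i][j] for j in range(m)] for i in range(m)]
    return E

Q = [[-3.0, 3.0], [1.0, -1.0]]     # generator at the terminal time, rates 3 and 1
nu = (0.25, 0.75)                  # stationary distribution: nu Q = 0
v = [3.0, -1.0]                    # layer data with nu . v = 0

tau = 1.0
E = expm([[Q[i][j] * tau for j in range(2)] for i in range(2)])
psi_tau = [E[i][0] * v[0] + E[i][1] * v[1] for i in range(2)]
# for nu-orthogonal data, psi(tau) = exp(-4 tau) v exactly in this 2x2 example
```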

Remark 11.6. In what follows, for notational simplicity, K and κ0 are


generic positive constants. Their values may change for different appear-
ances.

We are now in a position to obtain a priori bounds for the derivatives of


ψ1 (S, τ ). This is presented in the next lemma.

Lemma 11.7. The function $\psi_1(S,\tau)$ satisfies
\[
\sup_{S\in[0,1]}\Big|\frac{\partial\psi_1(S,\tau)}{\partial S}\Big| \le K\exp(-\kappa_0\tau),
\qquad
\sup_{S\in[0,1]}\Big|\frac{\partial^2\psi_1(S,\tau)}{\partial S^2}\Big| \le K\exp(-\kappa_0\tau).
\]

Proof. Consider
\[
U(S,\tau) = \frac{\partial\psi_1(S,\tau)}{\partial S}
\quad\text{and}\quad
V(S,\tau) = \frac{\partial^2\psi_1(S,\tau)}{\partial S^2}.
\]
Then $U$ and $V$ satisfy
\[
\begin{aligned}
& \frac{\partial U(S,\tau)}{\partial\tau} = Q(T)U(S,\tau), \quad U(S,0) = \frac{\partial\psi_1(S,0)}{\partial S}, \\
& \frac{\partial V(S,\tau)}{\partial\tau} = Q(T)V(S,\tau), \quad V(S,0) = \frac{\partial^2\psi_1(S,0)}{\partial S^2}, \\
& \frac{\partial\overline{U}(S,\tau)}{\partial\tau} = 0, \quad \overline{U}(S,0) = 0, \\
& \frac{\partial\overline{V}(S,\tau)}{\partial\tau} = 0, \quad \overline{V}(S,0) = 0,
\end{aligned}
\tag{11.26}
\]

respectively, where $\overline{U}(S,\tau) = \nu U(S,\tau)$ and $\overline{V}(S,\tau) = \nu V(S,\tau)$. Thus, we have
\[
\overline{U}(S,\tau) = 0, \qquad \overline{V}(S,\tau) = 0,
\]
and
\[
U(S,\tau) = \exp(Q(T)\tau)U(S,0), \qquad
V(S,\tau) = \exp(Q(T)\tau)V(S,0).
\]
It follows that
\[
\begin{aligned}
|U(S,\tau)| &= |\exp(Q(T)\tau)U(S,0) - \mathbb{1}\nu U(S,0)| \\
&\le K\exp(-\kappa_0\tau).
\end{aligned}
\]
Likewise, $|V(S,\tau)| \le K\exp(-\kappa_0\tau)$, as desired. 2

11.3.3 Construction of ϕk (S, t) and ψk (S, τ )


Denote $\varphi_k(S,t) \in \mathbb{R}^{m_0}$, $\varphi_k^0(S,t) \in \mathbb{R}^{m_0}$, $\gamma_k(S,t) \in \mathbb{R}$, and $\psi_k(S,\tau) \in \mathbb{R}^{m_0}$ analogously to $\varphi_1(S,t)$, $\varphi_1^0(S,t)$, $\gamma_1(S,t)$, and $\psi_1(S,\tau)$, respectively. By induction, we proceed to obtain the terms $\varphi_k(S,t)$ and $\psi_k(S,\tau)$ for $1 < k \le n+1$. Similar to the last section, for $1 < k \le n+1$, denote

\[
\begin{aligned}
& F_k(S,t) = (F_k(S,t,1), \dots, F_k(S,t,m_0))' \in \mathbb{R}^{m_0}, \\
& F_k(S,t,i) = -\Big[\frac{\partial\varphi_k^0(S,t,i)}{\partial t}
+ \frac{\sigma^2(i)S^2}{2}\frac{\partial^2\varphi_k^0(S,t,i)}{\partial S^2}
+ rS\frac{\partial\varphi_k^0(S,t,i)}{\partial S}
- r\varphi_k^0(S,t,i)\Big], \\
& \overline{F}_k(S,t) = \nu(t)F_k(S,t).
\end{aligned}
\tag{11.27}
\]

Suppose that we have constructed ϕk−1 (S, t) and ψk−1 (S, τ ) for 1 < k ≤
n + 1. Then similar to ϕ1 (S, t) and ψ1 (S, τ ), we can define ϕk (S, t) and
ψk (S, τ ). Next, write ϕk (S, t) as

\[
\varphi_k(S,t) = \gamma_k(S,t)\mathbb{1} + \varphi_k^0(S,t),
\tag{11.28}
\]

where ϕ0k (S, t) is the unique solution of the system of equations

\[
\begin{aligned}
& Q(t)\varphi_k^0(S,t) = F_{k-1}(S,t), \\
& \nu(t)\varphi_k^0(S,t) = 0,
\end{aligned}
\tag{11.29}
\]

$\nu(t)$ is the quasi-stationary distribution associated with $Q(t)$, and $\gamma_k(S,t)$ satisfies
\[
\frac{\partial\gamma_k(S,t)}{\partial t}
+ \frac{1}{2}\overline{\sigma}^2(t)S^2\frac{\partial^2\gamma_k(S,t)}{\partial S^2}
+ rS\frac{\partial\gamma_k(S,t)}{\partial S} - r\gamma_k(S,t) = \overline{F}_k(S,t).
\tag{11.30}
\]

By virtue of Lemma A.12, there is a unique solution to (11.29). To deter-


mine the solution of (11.30) as a solution of the Cauchy problem, we select
the terminal condition from the terminal layer ψk (S, τ ) via the matching
condition (11.11). Note that ψk (S, τ ) satisfies

\[
-\frac{\partial\psi_k(S,\tau)}{\partial\tau} + Q(T)\psi_k(S,\tau) + \widetilde{R}_k(S,\tau) = 0,
\tag{11.31}
\]
where $\widetilde{R}_k(S,\tau) = (\widetilde{R}_k(S,\tau,1), \dots, \widetilde{R}_k(S,\tau,m_0))' \in \mathbb{R}^{m_0}$ with $\widetilde{R}_k(S,\tau,i)$ given by (11.14).
Since the construction of ψ2 (S, τ ) is somewhat different from ψ1 (S, τ ),
we single it out first. In view of (11.13) and noting ψ0 (S, τ ) = 0, ψ2 (S, τ )
satisfies
\[
\begin{aligned}
\frac{\partial\psi_2(S,\tau)}{\partial\tau}
&= Q(T)\psi_2(S,\tau) + \widetilde{R}_2(S,\tau) \\
&= Q(T)\psi_2(S,\tau) + \Big[-\tau Q^{(1)}(T)\psi_1(S,\tau)
+ \frac{1}{2}S^2\Sigma^2\frac{\partial^2\psi_1(S,\tau)}{\partial S^2}
+ rS\frac{\partial\psi_1(S,\tau)}{\partial S} - r\psi_1(S,\tau)\Big],
\end{aligned}
\tag{11.32}
\]
where
\[
\Sigma = \mathrm{diag}(\sigma(1), \dots, \sigma(m_0)) \in \mathbb{R}^{m_0\times m_0}.
\tag{11.33}
\]

With given initial data $\psi_2(S,0)$, the solution of (11.32) is given by
\[
\psi_2(S,\tau) = \exp(Q(T)\tau)\psi_2(S,0)
+ \int_0^\tau \exp(Q(T)(\tau-s))\widetilde{R}_2(S,s)\,ds.
\tag{11.34}
\]
The solution of (11.34) can be further expanded as
\[
\begin{aligned}
\psi_2(S,\tau) &= \mathbb{1}\nu\psi_2(S,0) + \mathbb{1}\int_0^\infty \nu\widetilde{R}_2(S,s)\,ds \\
&\quad + [\exp(Q(T)\tau) - \mathbb{1}\nu]\psi_2(S,0)
- \mathbb{1}\nu\int_\tau^\infty \widetilde{R}_2(S,s)\,ds \\
&\quad + \int_0^\tau [\exp(Q(T)(\tau-s)) - \mathbb{1}\nu]\widetilde{R}_2(S,s)\,ds.
\end{aligned}
\tag{11.35}
\]

By virtue of Lemmas 11.5 and 11.7, $\widetilde{R}_2(S,\tau)$ decays exponentially fast to 0, so the integral $\int_0^\infty \widetilde{R}_2(S,s)\,ds$ is well defined. Furthermore,
\[
\Big|\int_\tau^\infty \widetilde{R}_2(S,s)\,ds\Big| \le K\exp(-\kappa_0\tau).
\]

A similar argument to that of Lemma 11.5 shows that
\[
|[\exp(Q(T)\tau) - \mathbb{1}\nu]\psi_2(S,0)| \le K\exp(-\kappa_0\tau).
\]



In addition,
\[
\begin{aligned}
\Big|\int_0^\tau [\exp(Q(T)(\tau-s)) - \mathbb{1}\nu]\widetilde{R}_2(S,s)\,ds\Big|
&\le K\int_0^\tau |\exp(Q(T)(\tau-s)) - \mathbb{1}\nu|\,|\widetilde{R}_2(S,s)|\,ds \\
&\le K\int_0^\tau \exp(-\kappa_0(\tau-s))\exp(-\kappa_0 s)\,ds \\
&\le K\tau\exp(-\kappa_0\tau).
\end{aligned}
\]
Thus, $\psi_2(S,\tau)$ will decay to 0 exponentially fast if we choose
\[
\mathbb{1}\overline{\psi}_2^0(S) = \mathbb{1}\nu\psi_2(S,0) = -\mathbb{1}\int_0^\infty \nu\widetilde{R}_2(S,s)\,ds.
\tag{11.36}
\]
Note that there is only one unknown in (11.36), namely, $\overline{\psi}_2(S,0) = \overline{\psi}_2^0(S)$. That is, (11.36) enables us to obtain $\overline{\psi}_2(S,0)$ uniquely. Using $\gamma_2(S,T) = -\overline{\psi}_2(S,0)$ together with (11.30) enables us to find the unique solution of the Cauchy problem (11.30). Therefore, $\varphi_2(S,t)$ and $\psi_2(S,\tau)$ are completely determined. Moreover, the construction ensures that
\[
\sup_{S\in[0,1]}|\psi_2(S,\tau)| \le K\exp(-\kappa_0\tau).
\]
In addition, it can be shown that
\[
\sup_{S\in[0,1]}\Big|\frac{\partial^i\psi_2(S,\tau)}{\partial S^i}\Big| \le K\exp(-\kappa_0\tau), \quad i = 1, 2.
\]
Recall that for simplicity, we have used the convention that K and κ0 are generic positive real numbers whose values may vary; their precise values are not important, and only the exponential decay property is crucial.
Proceeding in a similar way, we obtain ψk (S, τ ) as follows. Suppose that
ψ1 (S, τ ), . . . , ψk−1 (S, τ ) have been constructed so that ψj (S, τ ) for j =
1, . . . , k − 1 decay exponentially fast together with their first and second
derivatives (∂/∂S)ψj (S, τ ) and (∂ 2 /∂S 2 )ψj (S, τ ), respectively.
Consider
\[
\frac{\partial\psi_k(S,\tau)}{\partial\tau} = Q(T)\psi_k(S,\tau) + \widetilde{R}_k(S,\tau).
\tag{11.37}
\]
The solution is then given by
\[
\begin{aligned}
\psi_k(S,\tau) &= \exp(Q(T)\tau)\psi_k(S,0)
+ \int_0^\tau \exp(Q(T)(\tau-s))\widetilde{R}_k(S,s)\,ds \\
&= \mathbb{1}\nu\psi_k(S,0) + \mathbb{1}\int_0^\infty \nu\widetilde{R}_k(S,s)\,ds \\
&\quad + [\exp(Q(T)\tau) - \mathbb{1}\nu]\psi_k(S,0)
- \mathbb{1}\nu\int_\tau^\infty \widetilde{R}_k(S,s)\,ds \\
&\quad + \int_0^\tau [\exp(Q(T)(\tau-s)) - \mathbb{1}\nu]\widetilde{R}_k(S,s)\,ds.
\end{aligned}
\tag{11.38}
\]

Choose
\[
\begin{aligned}
& \mathbb{1}\overline{\psi}_k^0(S) := \mathbb{1}\nu\psi_k(S,0) = -\mathbb{1}\int_0^\infty \nu\widetilde{R}_k(S,s)\,ds, \\
& \gamma_k(S,T) = -\overline{\psi}_k^0(S), \\
& \psi_k(S,0) = -\varphi_k(S,T) = \overline{\psi}_k^0(S)\mathbb{1} - \varphi_k^0(S,T).
\end{aligned}
\tag{11.39}
\]

Then, ϕk (S, t) and ψk (S, τ ) are completely specified. Moreover, we can


establish the following lemmas.
Lemma 11.8. The following assertions hold.
• ψk(S, τ) can be constructed by using the first line of (11.38) with terminal data (11.39);
• ψk decays exponentially fast in the sense that
\[
\sup_{S\in[0,1]}|\psi_k(S,\tau)| \le K\exp(-\kappa_0\tau)
\tag{11.40}
\]
for some K > 0 and κ0 > 0.


Lemma 11.9. For $k > 1$, the functions $\psi_k(S,\tau)$ satisfy
\[
\sup_{S\in[0,1]}\Big|\frac{\partial\psi_k(S,\tau)}{\partial S}\Big| \le K\exp(-\kappa_0\tau),
\qquad
\sup_{S\in[0,1]}\Big|\frac{\partial^2\psi_k(S,\tau)}{\partial S^2}\Big| \le K\exp(-\kappa_0\tau).
\]

Proof. Redefine
\[
U(S,\tau) = \frac{\partial\psi_k(S,\tau)}{\partial S}
\quad\text{and}\quad
V(S,\tau) = \frac{\partial^2\psi_k(S,\tau)}{\partial S^2}.
\]
Then $U$ and $V$ satisfy
\[
\begin{aligned}
& \frac{\partial U(S,\tau)}{\partial\tau} = Q(T)U(S,\tau), \quad U(S,0) = \frac{\partial\psi_k(S,0)}{\partial S}, \\
& \frac{\partial V(S,\tau)}{\partial\tau} = Q(T)V(S,\tau), \quad V(S,0) = \frac{\partial^2\psi_k(S,0)}{\partial S^2},
\end{aligned}
\tag{11.41}
\]
respectively. The rest of the proof is similar to that of Lemma 11.7. 2
We summarize what we have obtained thus far and put it in the following
theorem.
Theorem 11.10. Under conditions (A11.1) and (A11.2), we can con-
struct sequences {ϕk (S, t, i) : i ∈ M; k = 0, . . . , n} and {ψk (S, τ, i) : i ∈
M; k = 0, . . . , n} such that

• ϕ0 (S, t) = γ0 (S, t)1l with γ0 (S, t) being the solution of (11.15) satisfy-
ing the terminal condition (11.16); ψ0 (S, τ, i) = 0 for each i ∈ M;
• ϕ1 (S, t, i) is given by (11.17) with ϕ01 (S, t) being the unique solution
of (11.18) and γ1 (S, t) given by (11.20) with γ1 (S, T ) = 0 and the
terminal layer term ψ1 (S, τ, i) specified in Lemma 11.5;
• $\varphi_k(S,t)$ is given by (11.28) with $\varphi_k^0(S,t)$ being the unique solution of (11.29) and $\gamma_k(S,t)$ being the solution of (11.30) with $\gamma_k(S,T)$ given by (11.39) and the terminal layer term $\psi_k(S,\tau,i)$ specified in Lemma 11.8.

11.4 Asymptotic Error Bounds


We have constructed the formal asymptotic expansions of the option price.
Here we validate the expansions by deriving
\[
\max_{i\in\mathcal{M}}\sup_{(S,t)\in[0,1]\times[0,T]}
\Big|\sum_{k=0}^{n}\varepsilon^k\varphi_k(S,t,i)
+ \sum_{k=0}^{n}\varepsilon^k\psi_k(S,\tau,i) - V^\varepsilon(S,t,i)\Big|
= O(\varepsilon^{n+1}).
\]

We first deduce a couple of lemmas.


Lemma 11.11. For each $i \in \mathcal{M}$, let $u(\cdot,\cdot,i) \in C^{2,1}([0,1]\times[0,T];\mathbb{R})$ be such that $v(S,t) = (v(S,t,1), \dots, v(S,t,m_0))'$ and
\[
\begin{aligned}
& L^\varepsilon u(S,t,i) = v(S,t,i), \quad t < s \le T, \\
& u(S,s,i) = 0, \quad S \in [0,1],
\end{aligned}
\]
where $L^\varepsilon$ is defined in (11.8). Assume (A11.1) and (A11.2). Then
\[
u(S,t,i) = -E\int_t^s v(X^{\varepsilon,S}(\xi), \alpha^{\varepsilon,i}(\xi), i)\,d\xi,
\tag{11.42}
\]
where $(X^{\varepsilon,S}(t), \alpha^{\varepsilon,i}(t)) = (S,i)$.

Proof. Note that S ∈ [0, 1] and t ∈ [0, T ] for some finite T > 0 so v(·) is
bounded together with its derivatives with respect to S up to the second
order and its derivative with respect to t. In view of (11.2) and (11.3), u(·)
is integrable. Moreover, the local martingale in Lemma 11.1 is in fact a
martingale now. The desired result then follows from Dynkin’s formula. The
proof may also be worked out by means of martingale problem formulation.
2

Lemma 11.12. Suppose that (A11.1) and (A11.2) hold and, for each $i \in \mathcal{M}$, $e^\varepsilon(\cdot,\cdot,i)$ is a suitable function such that
\[
\sup_{(S,t)\in[0,1]\times[0,T]}|e^\varepsilon(S,t,i)| = O(\varepsilon^\ell) \quad \text{for } \ell \le n+1.
\tag{11.43}
\]
Then for each $i \in \mathcal{M}$, the solution of
\[
L^\varepsilon u^\varepsilon(S,t,i) = e^\varepsilon(S,t,i), \qquad u^\varepsilon(S,T,i) = 0
\tag{11.44}
\]
satisfies
\[
\sup_{(S,t)\in[0,1]\times[0,T]}|u^\varepsilon(S,t,i)| = O(\varepsilon^\ell).
\]

Proof. In view of Lemma 11.11, the solution of (11.44) can be written as
\[
u^\varepsilon(S,t,i) = -E\int_t^T e^\varepsilon(S,\xi,i)\,d\xi.
\]
Thus, (11.43) leads to
\[
\sup_{(S,t)\in[0,1]\times[0,T]}|u^\varepsilon(S,t,i)| \le K\int_0^T O(\varepsilon^\ell) \le O(\varepsilon^\ell).
\]
The desired result thus follows. 2


With the preparation above, we proceed to obtain the desired upper
bounds on the approximation errors. For k = 0, . . . , n+1, define a sequence
of approximation errors
\[
e_k^\varepsilon(S,t,i) = \Phi_k^\varepsilon(S,t,i) + \Psi_k^\varepsilon(S,\tau,i) - V^\varepsilon(S,t,i), \quad i \in \mathcal{M},
\]
where for each i ∈ M, V ε (S, t, i) is the option price given by (11.6), and
Φεk (S, t, i) + Ψεk (S, τ, i) is the kth-order approximation to the option price.
We proceed to obtain the order of magnitude estimates of eεn (S, t, i).
Theorem 11.13. Assume (A11.1) and (A11.2). Then for the asymptotic
expansions constructed in Theorem 11.10,
\[
\max_{i\in\mathcal{M}}\sup_{(S,t)\in[0,1]\times[0,T]}|e_n^\varepsilon(S,t,i)| = O(\varepsilon^{n+1}).
\tag{11.45}
\]

Proof. The proof is divided into two steps. In the first step, we obtain an estimate on $L^\varepsilon e_{n+1}^\varepsilon(S,t,i)$, and in the second step, we derive the desired order estimate.
Step 1. Claim: $L^\varepsilon e_{n+1}^\varepsilon(S,t,i) = O(\varepsilon^{n+1})$. To obtain this, first note that $L^\varepsilon V^\varepsilon(S,t,i) = 0$. Thus
\[
L^\varepsilon e_{n+1}^\varepsilon(S,t,i)
= L^\varepsilon\Phi_{n+1}^\varepsilon(S,t,i) + L^\varepsilon\Psi_{n+1}^\varepsilon(S,t,i)
= \sum_{k=0}^{n+1}\varepsilon^k L^\varepsilon\varphi_k(S,t,i)
+ \sum_{k=1}^{n+1}\varepsilon^k L^\varepsilon\psi_k(S,\tau,i),
\]

where in the last line above, we have used $\psi_0(S,\tau,i) = 0$ for each $i \in \mathcal{M}$. For the outer expansions, we have
\[
\begin{aligned}
L^\varepsilon\sum_{k=0}^{n+1}\varepsilon^k\varphi_k(S,t,i)
&= \sum_{k=0}^{n+1}\varepsilon^k\Big[\frac{\partial\varphi_k(S,t,i)}{\partial t}
+ \frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\varphi_k(S,t,i)}{\partial S^2}
+ rS\frac{\partial\varphi_k(S,t,i)}{\partial S}
- r\varphi_k(S,t,i) + \frac{Q(t)}{\varepsilon}\varphi_k(S,t,\cdot)(i)\Big] \\
&= \sum_{k=0}^{n}\varepsilon^k\Big[\frac{\partial\varphi_k(S,t,i)}{\partial t}
+ \frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\varphi_k(S,t,i)}{\partial S^2}
+ rS\frac{\partial\varphi_k(S,t,i)}{\partial S}
- r\varphi_k(S,t,i) + Q(t)\varphi_{k+1}(S,t,\cdot)(i)\Big] \\
&\quad + \frac{Q(t)}{\varepsilon}\varphi_0(S,t,\cdot)(i)
+ \varepsilon^{n+1}\Big[\frac{\partial\varphi_{n+1}(S,t,i)}{\partial t}
+ \frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\varphi_{n+1}(S,t,i)}{\partial S^2}
+ rS\frac{\partial\varphi_{n+1}(S,t,i)}{\partial S}
- r\varphi_{n+1}(S,t,i)\Big] \\
&= \varepsilon^{n+1}\Big[\frac{\partial\varphi_{n+1}(S,t,i)}{\partial t}
+ \frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\varphi_{n+1}(S,t,i)}{\partial S^2}
+ rS\frac{\partial\varphi_{n+1}(S,t,i)}{\partial S}
- r\varphi_{n+1}(S,t,i)\Big].
\end{aligned}
\tag{11.46}
\]
The boundedness of $\varphi_{n+1}(S,t,i)$ and its derivatives up to order 2, together with (11.46), then leads to
\[
\max_{i\in\mathcal{M}}\sup_{(S,t)\in[0,1]\times[0,T]}|L^\varepsilon\Phi_{n+1}^\varepsilon(S,t,i)| = O(\varepsilon^{n+1}).
\tag{11.47}
\]

As for the terminal layer terms, we have
\[
\begin{aligned}
L^\varepsilon\sum_{k=1}^{n+1}\varepsilon^k\psi_k(S,\tau,i)
&= \sum_{k=1}^{n+1}\varepsilon^{k-1}\Big[-\frac{\partial\psi_k(S,\tau,i)}{\partial\tau}
+ \varepsilon\,\frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\psi_k(S,\tau,i)}{\partial S^2}
+ \varepsilon rS\frac{\partial\psi_k(S,\tau,i)}{\partial S}
- \varepsilon r\psi_k(S,\tau,i) + Q(t)\psi_k(S,\tau,\cdot)(i)\Big] \\
&= \sum_{k=1}^{n+1}\varepsilon^{k-1}\Big\{-\frac{\partial\psi_k(S,\tau,i)}{\partial\tau}
+ \varepsilon\,\frac{1}{2}\sigma^2(i)S^2\frac{\partial^2\psi_k(S,\tau,i)}{\partial S^2}
+ \varepsilon rS\frac{\partial\psi_k(S,\tau,i)}{\partial S}
- \varepsilon r\psi_k(S,\tau,i) + Q(T)\psi_k(S,\tau,\cdot)(i) \\
&\qquad + \sum_{j=1}^{k}\frac{(-1)^j\tau^j}{j!}\frac{d^jQ(T)}{dt^j}\psi_k(S,\tau,i)
+ \Big[Q(t) - Q(T) - \sum_{j=1}^{k}\frac{(-1)^j\tau^j}{j!}\frac{d^jQ(T)}{dt^j}\Big]\psi_k(S,\tau,\cdot)(i)\Big\}.
\end{aligned}
\tag{11.48}
\]

Note that for $k = 1, \dots, n+1$,
\[
\Big|Q(t) - Q(T) - \sum_{j=1}^{k}\frac{(-1)^j\tau^j}{j!}\frac{d^jQ(T)}{dt^j}\Big|
= |R_k(T-\varepsilon\tau)| \le K t^{k+1},
\]
and that
\[
\Big|\sum_{j=1}^{n+1}\varepsilon^{j-1}\psi_j(S,\tau,i)\,O(t^{n+1-j})\Big|
\le K\sum_{j=1}^{n+1}\varepsilon^{n+1-j}t^j\exp(-\kappa_0\tau) \le K\varepsilon^{n+1}.
\]

The above observations together with (11.48) and
\[
-\frac{\partial\psi_1(S,\tau,i)}{\partial\tau} + Q(T)\psi_1(S,\tau,i) = 0
\]
give us
\[
L^\varepsilon\sum_{k=1}^{n+1}\varepsilon^k\psi_k(S,\tau,i)
= \sum_{k=2}^{n+1}\varepsilon^{k-1}\Big[-\frac{\partial\psi_k(S,\tau,i)}{\partial\tau}
+ Q(T)\psi_k(S,\tau,i) + \widetilde{R}_k\Big] + O(\varepsilon^{n+1})
= O(\varepsilon^{n+1}),
\tag{11.49}
\]
which holds uniformly in $(S,t) \in [0,1]\times[0,T]$. Thus we obtain
\[
\max_{i\in\mathcal{M}}\sup_{(S,t)\in[0,1]\times[0,T]}
|L^\varepsilon[\Phi_{n+1}(S,t,i) + \Psi_{n+1}(S,\tau,i)]| = O(\varepsilon^{n+1}).
\tag{11.50}
\]

Step 2. Obtain estimate (11.45). Note that the definition of eεn+1 (S, t, i)
and (11.11) yield that eεn+1 (S, T, i) = 0. Because (11.50) holds, it follows
from Lemma 11.12 that
\[
\max_{i\in\mathcal{M}}\sup_{(S,t)\in[0,1]\times[0,T]}|e_{n+1}^\varepsilon(S,t,i)| = O(\varepsilon^{n+1}).
\]

Note that

\[
e_{n+1}^\varepsilon(S,t,i) = e_n^\varepsilon(S,t,i)
+ \varepsilon^{n+1}\varphi_{n+1}(S,t,i) + \varepsilon^{n+1}\psi_{n+1}(S,\tau,i).
\tag{11.51}
\]
The smoothness of ϕn+1 (S, t, i) and the exponential decay of ψn+1 (S, τ, i)
imply that

\[
\max_{i\in\mathcal{M}}\sup_{(S,t)\in[0,1]\times[0,T]}
|\varepsilon^{n+1}\varphi_{n+1}(S,t,i) + \varepsilon^{n+1}\psi_{n+1}(S,\tau,i)| = O(\varepsilon^{n+1}).
\]

Substituting the above into (11.51), we obtain (11.45) as desired. 2



11.5 Notes
In this chapter, we have developed asymptotic expansions for a European-
type option price. The essence is the use of two-time-scale formulation to
deal with solutions of systems of parabolic PDEs. The result is based on
the recent work of Yin [166]. The approach we are using is constructive.
Thus it sheds more light on how these approximations can be carried out.
Full asymptotic expansions have been obtained with uniform asymptotic
error bounds for the continuous component belonging to a compact set. If
one is only interested in getting asymptotic expansions with certain fixed
state variables (as in [49, 50]), then one can work with the entire space R
rather than a compact set. In lieu of Qε (t) considered thus far, we may
treat a slightly more complex model with

\[
Q^\varepsilon(t) = \frac{Q_0(t)}{\varepsilon} + Q_1(t),
\]
where both Q0 (·) and Q1 (·) are generators of continuous-time Markov
chains such that Q0 (t) is weakly irreducible. Then we can still obtain
asymptotic expansions. The notation, however, will be a bit more complex
due to the addition of Q1 (t). In view of the work by Il’in, Khasminskii, and
Yin [73, 74], the results of this chapter can be extended to switching dif-
fusions in which the switching process has generator Qε (x, t) that depends
on x as well.
For risk-neutral valuation, it is natural to let r be a constant. Nevertheless, the techniques presented here carry over to the more general α-dependent process; that is, r(t) = r(α(t)). Although the main motivation is
from mathematical finance, the techniques developed here can be used for
other problems involving systems of coupled differential equations where a
fast-varying switching process is a driving force.
12
Two-Time-Scale Switching Jump
Diffusions

12.1 Introduction
This chapter is concerned with jump diffusions involving Markovian switch-
ing regimes. In the models, there are a finite set of regimes or configura-
tions and a switching process that dictates which regime to take at any
given instance. At each time t, once the configuration is determined by
the switching process, the dynamics of the system follow a jump-diffusion
process. It evolves until the next jump takes place. Then the post-jump
location is determined and the process sojourns in the new location follow-
ing the evolution of another jump-diffusion process and so on. The entire
system consists of random switches and jump-diffusive motions.
One of our motivations stems from insurance risk theory. To capture
the features of insurance policies that are subject to economic or political
environment changes, generalized hybrid risk models may be considered. To
reduce the complexity of the systems, time-scale separation may be used.
Under the classical insurance risk model, the surplus U (t) of an insurance
company at t ≥ 0 is given by
U (t) = u + ct − S(t),
where u is the initial surplus, c > 0 is the rate at which the premiums
are received, and S(t), a compound Poisson process, is the total claim in
the duration [0, t]. In [35], Dufresne and Gerber extended the classical risk
model by adding an independent diffusion process so that the surplus is
given by
U (t) = u + ct − S(t) + σw(t),

G.G. Yin and C. Zhu, Hybrid Switching Diffusions: Properties and Applications, 323
Stochastic Modelling and Applied Probability 63, DOI 10.1007/978-1-4419-1105-6_12,
© Springer Science + Business Media, LLC 2010

where w(t) is a standard real-valued Brownian motion that represents un-


certainty (often referred to as oscillations) of premium incomes and claims.
Subsequently, much work has been devoted to such jump-diffusion models;
see also variants of the models in [128, 140] and the references therein.
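A crude Monte Carlo sketch of the diffusion-perturbed surplus above: discretize time, accrue premiums and Brownian noise, draw compound-Poisson claims, and flag ruin when the surplus goes negative. Every parameter value (and the exponential claim-size distribution) is an illustrative assumption, and the coarse time grid can slightly miss ruin events between grid points.

```python
import math, random

def ruin_prob(u, c, lam, claim_mean, sigma, T, dt=0.02, n_paths=1000, seed=7):
    # P(U(t) < 0 for some t <= T) for U(t) = u + c t - S(t) + sigma w(t)
    random.seed(seed)
    ruined = 0
    for _ in range(n_paths):
        U = u
        for _ in range(int(T / dt)):
            U += c * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
            if random.random() < lam * dt:               # claim arrival
                U -= random.expovariate(1.0 / claim_mean)
            if U < 0.0:
                ruined += 1
                break
    return ruined / n_paths

# premium loading 20% over the expected claim rate lam * claim_mean = 1
p_small_u = ruin_prob(u=1.0, c=1.2, lam=1.0, claim_mean=1.0, sigma=0.3, T=10.0)
p_large_u = ruin_prob(u=10.0, c=1.2, lam=1.0, claim_mean=1.0, sigma=0.3, T=10.0)
```

A larger initial surplus lowers the finite-horizon ruin probability, as Lundberg-type bounds predict.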
Recently, growing attention has been drawn to the use of switching mod-
els in finance and the insurance industry. For instance, taking the oppor-
tunity provided by using a switching process to represent the underlying
economy switching among a finite number of discrete states, the European
options under the Black–Scholes formulation of the stock market were con-
sidered in [32]; the American options were dealt with in [20]; algorithms
for liquidation of a stock were constructed in [170]. Using a random pure
jump process to represent a random environment, we proposed a Markovian
regime-switching formulation in [163] to model the insurance surplus pro-
cess. Suppose that there is a finite set M = {1, . . . , m0 }, representing the
possible regimes (configurations) of the environment. At each i ∈ M, as-
sume that the premium is payable at the rate c(i) continuously. Let U (t, i)
be the surplus process given the initial surplus u > 0 and initial state
α(0) = i:
\[
U(t,i) = u + \int_0^t c(\alpha(s))\,ds - S(t),
\]

where S(t), as in the classical risk model, is a compound Poisson process


and α(t) is a continuous-time Markov chain with state space M repre-
senting the random environment. Under suitable conditions, we obtained
Lundberg-type upper bounds and nonexponential upper bounds for the
ruin probability, and treated a renewal-type system of equations for ruin
probability when the claim sizes are exponentially distributed. One of the
main features of [163] is that there is an additional Markov chain, which
enables the underlying surplus to vary in accordance with different regimes.
Here we are concerned with a class of jump diffusions with regime switch-
ing to prepare us for treating applications involving more general risk mod-
els. In this chapter, we consider jump diffusions modulated by a continuous-
time Markov chain. Because the dynamic systems are complex, it is of fore-
most importance to reduce the complexity. Taking into consideration the
inherent hierarchy in a complex system [149], and different rates of vari-
ations of subsystems and components, we use the two-time-scale method
leading to systems in which the fast and slow rates of change are in sharp
contrast. Then we proceed to reduce the system complexity by aggrega-
tion/decomposition and averaging methods. We demonstrate that under
broad conditions, associated with the original systems, there are limit or
reduced systems, which are averages with respect to certain invariant mea-
sures. Using weak convergence methods [102, 103], we obtain the limit
system (jump diffusion with regime switching) via martingale problem for-
mulation.
Let Γ ⊂ Rr − {0} be the range space of the impulsive jumps, w(·) be a

real-valued standard Brownian motion, and N (·, ·) be a Poisson measure


such that N (t, H) counts the number of impulses on [0, t] with values in
the set H. Let f (·, ·, ·) : [0, T ] × R × M 7→ R, σ(·, ·, ·) : [0, T ] × R × M 7→ R,
g(·, ·, ·) : Γ × R × M 7→ R, and α(·) be a continuous-time Markov chain
having a state space M. A brief description of the jump-diffusion process
with a modulating Markov chain can be found in Section A.6 of this book.
Consider the following jump-diffusion process with regime switching
Z t Z t
X(t) = x + f (s, X(s), α(s))ds + σ(s, X(s), α(s))dw(s)
Z t 0Z 0 (12.1)
− −
+ g(γ, X(s ), α(s ))N (ds, dγ).
0 Γ

Throughout the chapter, we assume that w(·), N (·), and α(·) are mutually
independent. Compared with the traditional jump-diffusion processes, the
coefficients involved in (12.1) all depend on an additional switching process,
namely, the Markov chain α(t).
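To make (12.1) concrete, here is a minimal Euler-type simulation sketch: on each small step the state follows the current regime's drift and diffusion, an impulsive jump arrives with probability λ∆, and the modulating chain switches with probability −q_{aa}∆. The scheme and every numeric parameter below are illustrative assumptions, not a method prescribed in the text.

```python
import math, random

def simulate(x0, T, dt, f, sigma, g, Q, lam, draw_gamma, alpha0=0, seed=11):
    # Euler-type path of the regime-switching jump diffusion (12.1)
    random.seed(seed)
    x, a = x0, alpha0
    for n in range(int(T / dt)):
        t = n * dt
        x += f(t, x, a) * dt + sigma(t, x, a) * math.sqrt(dt) * random.gauss(0.0, 1.0)
        if random.random() < lam * dt:            # Poisson jump epoch
            x += g(draw_gamma(), x, a)
        if random.random() < -Q[a][a] * dt:       # regime switch epoch
            r, acc = random.random() * (-Q[a][a]), 0.0
            for j in range(len(Q)):               # next regime ~ Q[a][j], j != a
                if j != a:
                    acc += Q[a][j]
                    if r <= acc:
                        a = j
                        break
    return x, a

x_T, a_T = simulate(
    x0=1.0, T=1.0, dt=1e-3,
    f=lambda t, x, a: (0.05, 0.02)[a] * x,        # regime-dependent drift
    sigma=lambda t, x, a: (0.2, 0.4)[a] * x,      # regime-dependent volatility
    g=lambda gam, x, a: -gam * x,                 # proportional downward jumps
    Q=[[-3.0, 3.0], [1.0, -1.0]], lam=0.5,
    draw_gamma=lambda: random.uniform(0.0, 0.1))
```

Chapter 12's two-time-scale setting corresponds to replacing Q with Q/ε for small ε, which makes the switches much more frequent than the jumps and the diffusion moves.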
In the context of risk theory, X(t) can be considered as the surplus of the
insurance company at time t, x is the initial surplus, f (t, X(t), α(t)) repre-
sents the premium rate (assumed to be ≥ 0), g(γ, X(t), α(t)) is the amount
of the claim whenever there is one (assumed to be ≤ 0), and the diffusion
is used to model additional uncertainty of the claims and/or premium in-
comes. Similar to the volatility in stock market models, σ(·, ·, i) represents
the amount of oscillations or volatility in an appropriate sense. The model
is sufficiently general to cover the traditional compound Poisson models as
well as the diffusion perturbed ruin models. It may also be used to repre-
sent security price in finance (see [124, Chapter 3]). The process α(t) may
be viewed as an environment variable dictating the regime. The use of the
Markov chain results from consideration of the general trend of the mar-
ket environment as well as other economic factors. The economic and/or
political environment changes lead to the changes of regime of the surplus,
resulting in markedly different behavior of the system across regimes.
Defining a centered (or compensated) Poisson measure and applying gen-
eralized Itô’s rule, we can obtain the generator of the jump-diffusion process
with regime switching, and formulate a related martingale problem. Instead
of a single process, we have to deal with a collection of jump-diffusion pro-
cesses that are modulated by a continuous-time Markov chain. Suppose that
λ is positive such that λ∆ + o(∆) represents the probability of a switch of
regime in the interval [t, t + ∆), and π(·) is the distribution of the jump.
Then the generator of the underlying process can be written as
\[
\begin{aligned}
\mathcal{G}F(t,x,\iota) &= \Big(\frac{\partial}{\partial t} + L\Big)F(t,x,\iota) \\
&\quad + \lambda\int_\Gamma [F(t, x+g(\gamma,x,\iota), \iota) - F(t,x,\iota)]\,\pi(d\gamma) \\
&\quad + Q(t)F(t,x,\cdot)(\iota), \quad \text{for each } \iota \in \mathcal{M},
\end{aligned}
\tag{12.2}
\]

where
\[
\begin{aligned}
LF(t,x,\iota) &= \frac{1}{2}\sigma^2(t,x,\iota)\frac{\partial^2}{\partial x^2}F(t,x,\iota)
+ f(t,x,\iota)\frac{\partial}{\partial x}F(t,x,\iota), \\
Q(t)F(t,x,\cdot)(\iota) &= \sum_{\ell=1}^{m_0} a_{\iota\ell}(t)F(t,x,\ell)
= \sum_{\ell\ne\iota} a_{\iota\ell}(t)[F(t,x,\ell) - F(t,x,\iota)].
\end{aligned}
\tag{12.3}
\]
By concentrating on time-scale separations, in this chapter, we treat two
cases. In the first one, the regime switching is significantly faster than the
dynamics of the jump diffusions, whereas in the second case, the diffusion
varies an order of magnitude faster than the switching processes.
The rest of the chapter is arranged as follows. Section 12.2 is devoted
to the case of fast switching. It begins with the precise formulation of the
problem. Then we derive weak convergence results and demonstrate that
the complicated problem can be “replaced” by a limit problem in which the
system coefficients are averaged out with respect to the stationary measures
of the switching process. In Section 12.3, we continue our study for the case
of fast varying diffusions. Again, by means of weak convergence methods,
we obtain a limit system. Section 12.4 gives remarks on specialization and
generalization of the asymptotic results. Section 12.5 gives remarks on nu-
merical approximation for switching-jump-diffusion processes. Section 12.6
concludes the chapter.

12.2 Fast-Varying Switching


12.2.1 Fast-Varying Markov Chain Model
Consider a continuous-time inhomogeneous Markov chain α(t) with gen-
erator Q(t). Recall that the Markov chain α(t) or the generator Q(t) is
weakly irreducible if the system of equations



 ν(t)Q(t) = 0,
Xm0

 νi (t) = 1

i=1

has a unique nonnegative solution. The nonnegative solution (row-vector-


valued function) ν(t) = (ν1 (t), . . . , νm0 (t)) is termed a quasi-stationary
distribution.
For the fast-varying Markov chain model, by introducing a small param-
eter ε > 0 into the problem, suppose that α(t) = αε (t) with the generator
of the Markov chain given by
\[
Q^\varepsilon(t) = \frac{1}{\varepsilon}\widetilde{Q}(t) + \widehat{Q}(t).
\tag{12.4}
\]
ε

Both Q̃(t) and Q̂(t) are generators, where Q̃(t)/ε represents the rapidly changing part and Q̂(t) describes the slowly varying part. The slow and fast components are coupled through weak and strong interactions in the sense that the underlying Markov chain fluctuates rapidly within a single group M_k of states and jumps less frequently from group M_k to M_j for k ≠ j. Suppose that the generator Q̃(t) has the form

\[
\widetilde Q(t) = \mathrm{diag}\big(\widetilde Q^1(t), \ldots, \widetilde Q^l(t)\big), \tag{12.5}
\]

where, for each k = 1, …, l, Q̃^k(t) is a generator corresponding to the states in M_k = {s_{k1}, …, s_{km_k}}. Naturally, the state space can be decomposed as

\[
\mathcal M = \mathcal M_1 \cup \cdots \cup \mathcal M_l = \{s_{11}, \ldots, s_{1m_1}\} \cup \cdots \cup \{s_{l1}, \ldots, s_{lm_l}\}. \tag{12.6}
\]
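The structure (12.4)-(12.5) can be assembled directly; the following sketch (block sizes and rates are made-up illustrations, and the helper names are ours) builds Q^ε from the fast blocks Q̃^k and a slow part Q̂:

```python
import numpy as np

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def build_Q_eps(Q_tilde_blocks, Q_hat, eps):
    """Q^eps = (1/eps) diag(Qt^1, ..., Qt^l) + Q_hat, as in (12.4)-(12.5)."""
    return block_diag(Q_tilde_blocks) / eps + Q_hat

# Two groups of two states each; every block is itself a generator.
Qt1 = np.array([[-1.0, 1.0], [2.0, -2.0]])
Qt2 = np.array([[-3.0, 3.0], [1.0, -1.0]])
Q_hat = np.array([[-0.5, 0.0, 0.5, 0.0],
                  [0.0, -0.5, 0.0, 0.5],
                  [0.2, 0.0, -0.2, 0.0],
                  [0.0, 0.3, 0.0, -0.3]])
Qe = build_Q_eps([Qt1, Qt2], Q_hat, eps=0.01)
# Qe is again a generator: every row sums to zero.
```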

The associated system can be written as

\[
\begin{aligned}
X^\varepsilon(t) = x &+ \int_0^t f(s, X^\varepsilon(s), \alpha^\varepsilon(s))\,ds + \int_0^t \sigma(s, X^\varepsilon(s), \alpha^\varepsilon(s))\,dw \\
&+ \int_0^t\!\int_\Gamma g(\gamma, X^\varepsilon(s^-), \alpha^\varepsilon(s^-))\,N(ds, d\gamma). \qquad (12.7)
\end{aligned}
\]

Define the centered (or compensated) Poisson measure

\[
\widetilde N(t, H) = N(t, H) - \lambda t\,\pi(H).
\]

Then (12.7) may be rewritten as

\[
\begin{aligned}
X^\varepsilon(t) = x &+ \int_0^t f(s, X^\varepsilon(s), \alpha^\varepsilon(s))\,ds \\
&+ \lambda \int_0^t\!\int_\Gamma g(\gamma, X^\varepsilon(s^-), \alpha^\varepsilon(s^-))\,\pi(d\gamma)\,ds \\
&+ \int_0^t \sigma(s, X^\varepsilon(s), \alpha^\varepsilon(s))\,dw \qquad (12.8)\\
&+ \int_0^t\!\int_\Gamma g(\gamma, X^\varepsilon(s^-), \alpha^\varepsilon(s^-))\,\widetilde N(ds, d\gamma).
\end{aligned}
\]

Note that the last two terms are martingales. The operator of the regime-switching jump-diffusion process is given by

\[
\mathcal G^\varepsilon F(t, x, \iota) = \Big(\frac{\partial}{\partial t} + \mathcal L\Big) F(t, x, \iota) + J(t, x, \iota) + Q^\varepsilon(t) F(t, x, \cdot)(\iota), \tag{12.9}
\]

where

\[
J(t, x, \iota) = \lambda \int_\Gamma \big[F(t, x + g(\gamma, x, \iota), \iota) - F(t, x, \iota)\big]\,\pi(d\gamma).
\]
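For intuition, (12.7) can be simulated with a crude Euler-Maruyama step; everything concrete below (rates, coefficients, the uniform stand-in for the jump-mark distribution π, the Bernoulli approximation of the Poisson clock) is a hypothetical choice of ours, not the chapter's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, a0, Q, f, sigma, g, lam, T, n):
    """Crude Euler-Maruyama sketch for the switching jump diffusion (12.7).

    Regime switching is sampled from the one-step matrix I + Q*dt; jumps
    come from a Bernoulli approximation of a rate-lam Poisson clock, with
    marks gamma drawn uniformly on [0, 1] as a stand-in for pi.
    """
    dt = T / n
    P = np.eye(Q.shape[0]) + Q * dt        # one-step transition probabilities
    x, a = x0, a0
    for _ in range(n):
        x += f(x, a) * dt + sigma(x, a) * np.sqrt(dt) * rng.standard_normal()
        if rng.random() < lam * dt:        # a jump occurs in this step
            x += g(rng.random(), x, a)
        a = rng.choice(Q.shape[0], p=P[a])  # regime switch
    return x

xT = simulate(
    x0=1.0, a0=0,
    Q=np.array([[-50.0, 50.0], [100.0, -100.0]]),  # fast switching
    f=lambda x, a: (-1.0, -2.0)[a] * x,
    sigma=lambda x, a: 0.1,
    g=lambda gam, x, a: 0.1 * gam,
    lam=2.0, T=1.0, n=2000,
)
```

With fast switching (small ε), sample paths of X^ε already behave like paths of the averaged limit system derived later in this section.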
328 12. Two-Time-Scale Switching Jump Diffusions

Remark 12.1. The use of the two-time-scale formulation with a small parameter ε > 0 stems from an effort to reduce complexity. Taking various factors into consideration, the number of states of the Markov chain is usually large, so the system becomes complex and difficult to deal with. Nevertheless, we demonstrate that by letting ε → 0, a limit system can be obtained, which is an average of the original system with respect to a set of quasi-stationary measures.

To carry out the desired analysis, we postulate the following conditions.

(A12.1) E|x|^2 < ∞, and the following conditions hold.

(a) The functions f(·) and σ(·) satisfy: For each α ∈ M, f(·, ·, α) and σ(·, ·, α) are defined and Borel measurable on [0, T] × R; for each (t, x, α) ∈ [0, T] × R × M,

\[
|f(t, x, \alpha)| \le K(1 + |x|) \quad\text{and}\quad |\sigma(t, x, \alpha)| \le K(1 + |x|);
\]

and for any z, x ∈ R,

\[
|f(t, z, \alpha) - f(t, x, \alpha)| \le K|z - x| \quad\text{and}\quad |\sigma(t, z, \alpha) - \sigma(t, x, \alpha)| \le K|z - x|.
\]

(b) 0 < λ < ∞. For each α ∈ M, g(·, ·, α) is a bounded and continuous function, g(0, x, α) = 0, and for each x, the value of γ can be determined uniquely by g(γ, x, α).
(A12.2) Both Q̃(t) and Q̂(t) are generators that are bounded and Borel measurable such that for each k = 1, …, l and t ∈ [0, T], Q̃^k(t) is weakly irreducible with the associated quasi-stationary distribution ν^k(t) = (ν_1^k(t), …, ν_{m_k}^k(t)) ∈ R^{1×m_k}.

Remark 12.2. In the above and throughout the chapter, K is used as a generic positive constant, whose value may change for different usages, so the conventions K + K = K and KK = K are understood.

For each fixed α ∈ M, the conditions on f(·, ·, α) and g(·, ·, α) are the "Itô conditions" that ensure the existence and uniqueness of the solution of the stochastic differential equation. The assumption on Q̃(t) leads to the partition of M, the state space of α^ε(·), as in (12.6), which in turn leads to the natural definition of an aggregated process. That is, by aggregating the states s_{kj} in M_k into one state, we obtain an aggregated process ᾱ^ε(·) defined by

\[
\overline\alpha^\varepsilon(t) = k \quad\text{if } \alpha^\varepsilon(t) \in \mathcal M_k. \tag{12.10}
\]

Thus, lumping all the states in each weakly irreducible class into one state results in a process with a considerably smaller state space. Note that the process ᾱ^ε(·) is not necessarily Markovian.
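The aggregation (12.10) is a plain relabeling of states by their group; a tiny sketch with hypothetical indices (states 0, 1 form the first group and states 2, 3 the second):

```python
import numpy as np

# Group membership: states 0, 1 -> group 0;  states 2, 3 -> group 1.
group_of = np.array([0, 0, 1, 1])

def aggregate(path):
    """Map a sample path of alpha^eps to the aggregated path of (12.10)."""
    return group_of[np.asarray(path)]

agg = aggregate([0, 1, 1, 2, 3, 2, 0])
# Within-group fluctuations disappear; only group changes remain visible.
```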

Condition (A12.2) is concerned with the Markov chain α^ε(·), which is modeled after that of [176, Chapter 7.5, p. 210] and allows the generators to be time-dependent. Note that no continuity of Q̃(·) and Q̂(·) is required; merely boundedness and measurability are assumed; see also [178] for the inclusion of transient states. For our applications, the only requirements are the partitioned form of Q̃(·) and the weak irreducibility of each Q̃^k(·).

The following idea uses an averaging approach by aggregating the states in each weakly irreducible class into a single state, and replacing the original complex system by its limit, an average with respect to the quasi-stationary distributions. Using certain probabilistic arguments, we have shown in [176, Section 7.5] (see also [178]) that

(i) ᾱ^ε(·) converges weakly to ᾱ(·), whose generator is given by

\[
\overline Q(t) = \mathrm{diag}\big(\nu^1(t), \ldots, \nu^l(t)\big)\,\widehat Q(t)\,\mathrm{diag}\big(\mathbb 1_{m_1}, \ldots, \mathbb 1_{m_l}\big), \tag{12.11}
\]

where ν^k(t) is the quasi-stationary distribution of Q̃^k(t), for k = 1, …, l, and 1_ℓ = (1, …, 1)′ ∈ R^ℓ is an ℓ-dimensional column vector with all components equal to 1;

(ii) for k = 1, …, l and j = 1, …, m_k, as ε → 0,

\[
\sup_{0\le t\le T} E\Big|\int_0^t \big(I_{\{\alpha^\varepsilon(s)=s_{kj}\}} - \nu_j^k(s)\, I_{\{\overline\alpha^\varepsilon(s)=k\}}\big)\,ds\Big|^2 \to 0, \tag{12.12}
\]

where I_A is the indicator function of the set A.
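Formula (12.11) is a finite matrix product and can be evaluated directly. The sketch below (our own helper, with made-up ν^k and Q̂) forms the two diagonal factors explicitly:

```python
import numpy as np

def average_generator(nus, Q_hat):
    """Qbar = diag(nu^1, ..., nu^l) Q_hat diag(1_{m_1}, ..., 1_{m_l}),
    as in (12.11)."""
    sizes = [len(nu) for nu in nus]
    L = np.zeros((len(nus), sum(sizes)))   # block-diagonal row vectors nu^k
    R = np.zeros((sum(sizes), len(nus)))   # block-diagonal columns of ones
    i = 0
    for k, m in enumerate(sizes):
        L[k, i:i + m] = nus[k]
        R[i:i + m, k] = 1.0
        i += m
    return L @ Q_hat @ R

nus = [np.array([1 / 3, 2 / 3]), np.array([0.5, 0.5])]
Q_hat = np.array([[-0.5, 0.0, 0.5, 0.0],
                  [0.0, -0.5, 0.0, 0.5],
                  [0.2, 0.0, -0.2, 0.0],
                  [0.0, 0.3, 0.0, -0.3]])
Qbar = average_generator(nus, Q_hat)
# Qbar is an l x l generator for the aggregated chain: rows sum to zero.
```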

12.2.2 Limit System

Working with the pair Y^ε(·) = (X^ε(·), ᾱ^ε(·)), we obtain the following theorem. It indicates that there is a limit system associated with the original process, leading to a reduction of complexity.

Theorem 12.3. Assuming (A12.1) and (A12.2), Y^ε(·) = (X^ε(·), ᾱ^ε(·)) converges weakly to Y(·) = (X(·), ᾱ(·)), which is a solution of the martingale problem with operator

\[
\overline{\mathcal G} F(t, x, i) = \Big(\frac{\partial}{\partial t} + \overline{\mathcal L}\Big) F(t, x, i) + \overline J(t, x, i) + \overline Q(t) F(t, x, \cdot)(i), \tag{12.13}
\]

for i ∈ M̄ = {1, …, l}, where

\[
\overline Q(t) F(t, x, \cdot)(i) = \sum_{j=1}^l \overline q_{ij}(t) F(t, x, j) = \sum_{j\ne i} \overline q_{ij}(t)\big(F(t, x, j) - F(t, x, i)\big), \tag{12.14}
\]
L̄ is a second-order differential operator given by (12.3) with σ²(t, x, ι), J(t, x, ι), and f(t, x, ι) replaced by σ̄²(t, x, i), J̄(t, x, i), and f̄(t, x, i), respectively, and

\[
\begin{aligned}
\overline f(t, x, i) &= \sum_{j=1}^{m_i} \nu_j^i(t)\, f(t, x, s_{ij}),\\
\overline J(t, x, i) &= \sum_{j=1}^{m_i} \nu_j^i(t)\, J(t, x, s_{ij}), \qquad (12.15)\\
\overline\sigma^2(t, x, i) &= \sum_{j=1}^{m_i} \nu_j^i(t)\, \sigma^2(t, x, s_{ij}).
\end{aligned}
\]

Remark 12.4. Theorem 12.3 characterizes the limit as a solution of the associated martingale problem with operator Ḡ. It can also be described by the limit stochastic differential equation

\[
\begin{aligned}
X(t) = x &+ \int_0^t \overline f(s, X(s), \overline\alpha(s))\,ds + \int_0^t \overline\sigma(s, X(s), \overline\alpha(s))\,dw\\
&+ \int_0^t\!\int_\Gamma \overline g(\gamma, X(s^-), \overline\alpha(s^-))\,N(ds, d\gamma), \qquad (12.16)
\end{aligned}
\]

where ᾱ(·) is a Markov chain generated by Q̄(·). In particular, if the Markov chain corresponding to the generator Q̃(t) consists of only one weakly irreducible block, then the limit or the averaged system becomes a jump-diffusion process. We state the result as follows.
Corollary 12.5. Under the conditions of Theorem 12.3 with the modification that Q^ε(t) = Q̃(t)/ε + Q̂(t), where M = {1, …, m_0} and Q̃(t) is weakly irreducible for each t ∈ [0, T] with the associated quasi-stationary distribution ν(t) = (ν_1(t), …, ν_{m_0}(t)), Y^ε(·) converges weakly to Y(·) such that

\[
\begin{aligned}
X(t) = x &+ \int_0^t \sum_{\iota=1}^{m_0} f(s, X(s), \iota)\,\nu_\iota(s)\,ds\\
&+ \int_0^t \sqrt{\sum_{\iota=1}^{m_0} \sigma^2(s, X(s), \iota)\,\nu_\iota(s)}\;dw(s) \qquad (12.17)\\
&+ \int_0^t\!\int_\Gamma \sum_{\iota=1}^{m_0} g(\gamma, X(s^-), \iota)\,\nu_\iota(s^-)\,N(ds, d\gamma).
\end{aligned}
\]
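The averaged coefficients appearing in (12.17) are plain ν-weighted mixtures; a small sketch (our own names and toy coefficients) that also illustrates why the diffusion coefficient is averaged at the level of σ²:

```python
import numpy as np

def averaged_coefficients(nu, fs, sigmas):
    """Averaged drift and diffusion of the limit system (12.17):
    fbar(x) = sum_i f(x, i) nu_i and
    sigmabar(x) = sqrt(sum_i sigma(x, i)^2 nu_i).
    """
    def fbar(x):
        return sum(f(x) * p for f, p in zip(fs, nu))
    def sigmabar(x):
        return np.sqrt(sum(s(x) ** 2 * p for s, p in zip(sigmas, nu)))
    return fbar, sigmabar

nu = [1 / 3, 2 / 3]
fs = [lambda x: -1.0 * x, lambda x: -4.0 * x]
sigmas = [lambda x: 1.0, lambda x: 2.0]
fbar, sigmabar = averaged_coefficients(nu, fs, sigmas)
# fbar(1) = -1/3 - 8/3 = -3;  sigmabar(0)^2 = 1/3 + 8/3 = 3.
```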

Proof of Theorem 12.3. The proof proceeds by establishing a series of lemmas. We first show that an a priori bound holds.

Lemma 12.6. Under the conditions of Theorem 12.3,

\[
\sup_{0\le t\le T} E|X^\varepsilon(t)|^2 = O(1).
\]
Proof. It follows from (12.7) that

\[
\begin{aligned}
E|X^\varepsilon(t)|^2 \le K\Big( E|x|^2 &+ E\Big|\int_0^t f(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big|^2\\
&+ E\Big|\int_0^t \sigma(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,dw(u)\Big|^2\\
&+ E\Big|\int_0^t\!\int_\Gamma g(\gamma, X^\varepsilon(u^-), \alpha^\varepsilon(u^-))\,N(du, d\gamma)\Big|^2\Big).
\end{aligned}
\]

By virtue of the argument of [103, p. 39],

\[
E\Big|\int_0^t\!\int_\Gamma g(\gamma, X^\varepsilon(u^-), \alpha^\varepsilon(u^-))\,N(du, d\gamma)\Big|^2 = O(1),
\]

and the bound holds uniformly in t ∈ [0, T]. Using the linear growth of f(t, x, α) and σ(t, x, α) given in (A12.1) together with properties of stochastic integrals, we obtain

\[
E|X^\varepsilon(t)|^2 \le K + K\int_0^t E|X^\varepsilon(u)|^2\,du.
\]

The well-known Gronwall inequality yields

\[
\sup_{t\in[0,T]} E|X^\varepsilon(t)|^2 \le K\exp(KT) < \infty,
\]

as desired. □
Next, we derive the tightness of Y^ε(·).

Lemma 12.7. Assume that the conditions of Theorem 12.3 are satisfied. Then Y^ε(·) is tight in D([0, T] : R × M̄), the space of functions that are right continuous and have left limits, endowed with the Skorohod topology.

Proof. Because ᾱ^ε(·) converges weakly to ᾱ(·) [176, p. 172], {ᾱ^ε(·)} is tight. Therefore, to obtain the tightness of {Y^ε(·)}, it suffices to derive the tightness of {X^ε(·)}.

Note that for any t > 0, s > 0, and any δ > 0 with 0 < s ≤ δ, we have

\[
\begin{aligned}
E_t^\varepsilon|X^\varepsilon(t+s) - X^\varepsilon(t)|^2 \le\; & K E_t^\varepsilon\Big(\int_t^{t+s} |f(u, X^\varepsilon(u), \alpha^\varepsilon(u))|\,du\Big)^2\\
&+ E_t^\varepsilon\Big|\int_t^{t+s} \sigma(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,dw(u)\Big|^2 \qquad (12.18)\\
&+ E_t^\varepsilon\Big|\int_t^{t+s}\!\int_\Gamma g(\gamma, X^\varepsilon(u^-), \alpha^\varepsilon(u^-))\,N(du, d\gamma)\Big|^2,
\end{aligned}
\]
where E_t^ε denotes the conditional expectation with respect to the σ-algebra generated by {α^ε(u), X^ε(u) : u ≤ t}. Using the argument as in [103, p. 39], since s ≤ δ,

\[
E_t^\varepsilon\Big|\int_t^{t+s}\!\int_\Gamma g(\gamma, X^\varepsilon(u^-), \alpha^\varepsilon(u^-))\,N(du, d\gamma)\Big|^2 \le Ks = O(\delta).
\]

Taking the expectation in (12.18) and applying Lemma 12.6 lead to

\[
\begin{aligned}
E|X^\varepsilon(t+s) - X^\varepsilon(t)|^2 &\le K\Big(\int_t^{t+s} (1 + E|X^\varepsilon(u)|)\,du\Big)^2\\
&\quad + K\int_t^{t+s} (1 + E|X^\varepsilon(u)|^2)\,du + O(s) \qquad (12.19)\\
&\le K(s^2 + s) + O(\delta) = O(\delta).
\end{aligned}
\]

As a result,

\[
\lim_{\delta\to 0}\limsup_{\varepsilon\to 0} E|X^\varepsilon(t+s) - X^\varepsilon(t)|^2 = 0.
\]

By virtue of the tightness criterion (see Lemma A.28 of this book, and also [43, Section 3.8, p. 132], [102, p. 47], or [16]), the tightness of {X^ε(·)} follows. □
Lemma 12.8. Assume the conditions of Theorem 12.3 are fulfilled. Suppose that ζ(t, z), defined on [0, T] × R, is a real-valued function that is Lipschitz continuous in both variables and satisfies |ζ(t, x)| ≤ K(1 + |x|) for each x ∈ R. Denote

\[
v_{ij}^\varepsilon(t) = v_{ij}(t, \alpha^\varepsilon(t)) \quad\text{with}\quad v_{ij}(t, \alpha) = I_{\{\alpha = s_{ij}\}} - \nu_j^i(t)\, I_{\{\alpha\in\mathcal M_i\}}.
\]

Then for any i = 1, …, l, j = 1, …, m_i,

\[
\sup_{0 < t\le T} E\Big|\int_0^t \zeta(u, X^\varepsilon(u))\, v_{ij}(u, \alpha^\varepsilon(u))\,du\Big|^2 \to 0 \quad\text{as } \varepsilon\to 0. \tag{12.20}
\]

Proof. The proof of this lemma is similar to that of Lemma 7.14 in [176]. Pick 0 < Δ < 1. For any t ∈ [0, T], partition [0, t] into subintervals of equal length ε^{1−Δ} (without loss of generality, assume that ℓ_0 = t/ε^{1−Δ} is an integer; otherwise, we can always take its integer part). Denote the partition boundaries by t_k = kε^{1−Δ} for 0 ≤ k ≤ ℓ_0. Define

\[
\widetilde\zeta^\varepsilon(u) = \zeta(t_k, X^\varepsilon(t_k)), \quad u\in[t_k, t_{k+1}),\; 0\le k\le \ell_0 - 1. \tag{12.21}
\]

In view of the processes N(·), α^ε(·), and w(·), the same argument as in the proof of the tightness yields

\[
E|X^\varepsilon(t) - X^\varepsilon(t_k)|^2 = O(t - t_k) = O(\varepsilon^{1-\Delta}) \to 0 \quad\text{as } \varepsilon\to 0 \tag{12.22}
\]
for t ∈ [t_k, t_{k+1}], 0 ≤ k ≤ ℓ_0 − 1. It follows that

\[
\begin{aligned}
E\Big|\int_0^t \zeta(u, X^\varepsilon(u))\, v_{ij}(u, \alpha^\varepsilon(u))\,du\Big|^2
&\le 2E\Big|\int_0^t \big[\zeta(u, X^\varepsilon(u)) - \widetilde\zeta^\varepsilon(u)\big]\, v_{ij}(u, \alpha^\varepsilon(u))\,du\Big|^2\\
&\quad + 2E\Big|\int_0^t \widetilde\zeta^\varepsilon(u)\, v_{ij}(u, \alpha^\varepsilon(u))\,du\Big|^2. \qquad (12.23)
\end{aligned}
\]

We claim that the first term on the right-hand side of (12.23) goes to 0. To see this (recall that K is a generic positive constant), by using the Cauchy-Schwarz inequality, the Lipschitz continuity of ζ(·), and (12.22),

\[
\begin{aligned}
E\Big|\int_0^t \big[\zeta(u, X^\varepsilon(u)) - \widetilde\zeta^\varepsilon(u)\big]\, v_{ij}(u, \alpha^\varepsilon(u))\,du\Big|^2
&\le T\int_0^t E|\zeta(u, X^\varepsilon(u)) - \widetilde\zeta^\varepsilon(u)|^2\,du\\
&\le K\sum_{k=0}^{\ell_0-1}\int_{t_k}^{t_{k+1}} E\big[(u - t_k)^2 + |X^\varepsilon(u) - X^\varepsilon(t_k)|^2\big]\,du\\
&\le K\sum_{k=0}^{\ell_0-1}\int_{t_k}^{t_{k+1}} O(\varepsilon^{1-\Delta})\,du\\
&\to 0 \quad\text{as } \varepsilon\to 0.
\end{aligned}
\]

To estimate the last term of (12.23), for each i = 1, …, l and j = 1, …, m_i, define

\[
\eta_{ij}^\varepsilon(t) = E\Big|\int_0^t \widetilde\zeta^\varepsilon(u)\, v_{ij}(u, \alpha^\varepsilon(u))\,du\Big|^2.
\]

Then, similar to the derivation of [176, pp. 191-192],

\[
\frac{d}{dt}\eta_{ij}^\varepsilon(t) = O(\varepsilon^{1-\Delta}), \qquad \eta_{ij}^\varepsilon(0) = 0,
\]

as ε → 0. Thus, solving the above initial value problem leads to

\[
\sup_{0\le t\le T}\eta_{ij}^\varepsilon(t) = \sup_{0\le t\le T}\int_0^t O(\varepsilon^{1-\Delta})\,ds = O(\varepsilon^{1-\Delta}) \to 0 \quad\text{as } \varepsilon\to 0,
\]

as desired. □
To proceed, for each i ∈ M̄ = {1, …, l} and each F(·, ·, i) ∈ C_0^{1,2} (C_0^{1,2} denotes the class of functions that have compact support and that are continuously differentiable with respect to t and twice continuously differentiable with respect to x), consider the operator defined in (12.13). The next lemma gives the characterization of the limit process as a solution of a martingale problem.
Lemma 12.9. Under the conditions of Theorem 12.3, the limit process {Y(·)} is the solution of the martingale problem with operator Ḡ given by (12.13).

Proof. Using an argument similar to that of Lemma 7.18 in [176], it can be shown that the martingale problem with operator Ḡ has a unique solution for each initial condition.

To obtain the desired result, it suffices to show that for each i ∈ M̄ and F(·, ·, i) ∈ C_0^{1,2},

\[
F(t, X(t), \overline\alpha(t)) - F(0, x, \overline\alpha) - \int_0^t \overline{\mathcal G} F(u, X(u), \overline\alpha(u))\,du
\]

is a martingale. To this end, we show that for any positive integer n_0, any bounded and continuous functions h_ℓ(·), ℓ ≤ n_0, and any t, s, t_ℓ ≥ 0 with t_ℓ ≤ t < t + s ≤ T,

\[
\begin{aligned}
E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big( F(t+s, X(t+s), \overline\alpha(t+s)) - F(t, X(t), \overline\alpha(t))&\\
- \int_t^{t+s} \overline{\mathcal G} F(u, X(u), \overline\alpha(u))\,du\Big) &= 0. \qquad (12.24)
\end{aligned}
\]

Let us begin with the process Y^ε(·). Define

\[
\widehat F(t, x, \alpha) = \sum_{i=1}^l F(t, x, i)\, I_{\{\alpha\in\mathcal M_i\}} \quad\text{for each } \alpha\in\mathcal M.
\]

Clearly, F̂(t, X^ε(t), α^ε(t)) = F(t, X^ε(t), ᾱ^ε(t)). Moreover, for each ι ∈ M, F̂(·, ·, ι) ∈ C_0^{1,2}. The function F̂(·) allows us to conveniently use the available α^ε(·) process in lieu of the aggregated process ᾱ^ε(·). Consider the operator G^ε defined in (12.9). Because Y^ε(·) is a Markov process,

\[
\widehat F(t, X^\varepsilon(t), \alpha^\varepsilon(t)) - \widehat F(0, x, \alpha^\varepsilon(0)) - \int_0^t \mathcal G^\varepsilon \widehat F(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du
\]

is a martingale. Consequently,

\[
\begin{aligned}
E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big[\widehat F(t+s, X^\varepsilon(t+s), \alpha^\varepsilon(t+s)) - \widehat F(t, X^\varepsilon(t), \alpha^\varepsilon(t))&\\
- \int_t^{t+s} \mathcal G^\varepsilon \widehat F(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big] &= 0. \qquad (12.25)
\end{aligned}
\]
We proceed to obtain the limit in (12.25) as ε → 0.
First, by the weak convergence of Y^ε(·) to Y(·), the definition of F̂(·), and the Skorohod representation, as ε → 0,

\[
\begin{aligned}
E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\big[\widehat F(t+s, X^\varepsilon(t+s), \alpha^\varepsilon(t+s)) - \widehat F(t, X^\varepsilon(t), \alpha^\varepsilon(t))\big]&\\
\to E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\big[F(t+s, X(t+s), \overline\alpha(t+s)) - F(t, X(t), \overline\alpha(t))\big].& \qquad (12.26)
\end{aligned}
\]

The definition of G^ε leads to

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \mathcal G^\varepsilon \widehat F(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big)\\
&= E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big[\int_t^{t+s} \frac{\partial}{\partial u}\widehat F(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\\
&\qquad + \int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,f(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\\
&\qquad + \int_t^{t+s} \tfrac12\,\widehat F_{xx}(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,\sigma^2(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\\
&\qquad + \int_t^{t+s} Q^\varepsilon(u)\widehat F(u, X^\varepsilon(u), \cdot)(\alpha^\varepsilon(u))\,du\\
&\qquad + \int_t^{t+s} J(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big].
\end{aligned}
\]

Note that

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,f(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big)\\
&= \sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), s_{ij})\,f(u, X^\varepsilon(u), s_{ij})\,I_{\{\alpha^\varepsilon(u)=s_{ij}\}}\,du\Big)\\
&= \sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), s_{ij})\,f(u, X^\varepsilon(u), s_{ij})\,\nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\,du\Big)\\
&\quad + \sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), s_{ij})\,f(u, X^\varepsilon(u), s_{ij})\\
&\hspace{6cm}\times\big[I_{\{\alpha^\varepsilon(u)=s_{ij}\}} - \nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\big]\,du\Big).
\end{aligned}
\]

By virtue of Lemma 12.8, the use of the Cauchy-Schwarz inequality, and the boundedness of h_ℓ(·), for each i = 1, …, l, j = 1, …, m_i,

\[
\begin{aligned}
&\Big| E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell)) \int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), s_{ij})\,f(u, X^\varepsilon(u), s_{ij})\\
&\hspace{4.5cm}\times\big[I_{\{\alpha^\varepsilon(u)=s_{ij}\}} - \nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\big]\,du\Big|^2\\
&\le K E\Big|\int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), s_{ij})\,f(u, X^\varepsilon(u), s_{ij})\big[I_{\{\alpha^\varepsilon(u)=s_{ij}\}} - \nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\big]\,du\Big|^2\\
&\to 0 \quad\text{as } \varepsilon\to 0.
\end{aligned}
\]

In view of [176, Lemma 2.4] and similar to [176, Theorem 7.30], it can be shown that

\[
\big(I_{\{\overline\alpha^\varepsilon(\cdot)=1\}}, \ldots, I_{\{\overline\alpha^\varepsilon(\cdot)=l\}}\big) \ \text{converges weakly to}\ \big(I_{\{\overline\alpha(\cdot)=1\}}, \ldots, I_{\{\overline\alpha(\cdot)=l\}}\big).
\]

By means of the Cramér-Wold device [16, p. 48], for each i ∈ M̄, I_{{ᾱ^ε(·)=i}} converges weakly to I_{{ᾱ(·)=i}}. By the Skorohod representation (with a slight abuse of notation, without changing notation), we may assume I_{{ᾱ^ε(·)=i}} → I_{{ᾱ(·)=i}} w.p.1. Consequently, the weak convergence of Y^ε(·) to Y(·), the Skorohod representation, and the convergence of I_{{ᾱ^ε(·)=i}} to I_{{ᾱ(·)=i}} imply that

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat F_x(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,f(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big)\\
&\xrightarrow{\varepsilon\to 0} \sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} \widehat F_x(u, X(u), s_{ij})\,f(u, X(u), s_{ij})\,\nu_j^i(u)\,I_{\{\overline\alpha(u)=i\}}\,du\Big)\\
&= \sum_{i=1}^l E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} F_x(u, X(u), i)\,\overline f(u, X(u), i)\,I_{\{\overline\alpha(u)=i\}}\,du\Big)\\
&= E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} F_x(u, X(u), \overline\alpha(u))\,\overline f(u, X(u), \overline\alpha(u))\,du\Big). \qquad (12.27)
\end{aligned}
\]
Exactly the same argument as in the derivation of (12.27) yields

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \tfrac12\,\widehat F_{xx}(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,\sigma^2(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big)\\
&\to E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} \tfrac12\,F_{xx}(u, X(u), \overline\alpha(u))\,\overline\sigma^2(u, X(u), \overline\alpha(u))\,du\Big), \qquad (12.28)
\end{aligned}
\]

as ε → 0, and

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \frac{\partial}{\partial u}\widehat F(u, X^\varepsilon(u), \alpha^\varepsilon(u))\,du\Big)\\
&\to E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} \frac{\partial}{\partial u} F(u, X(u), \overline\alpha(u))\,du\Big), \qquad (12.29)
\end{aligned}
\]

as ε → 0.
Next, since

\[
\widetilde Q^i(u)\,\mathbb 1_{m_i} = 0 \quad\text{for each } i = 1, \ldots, l,
\]

the definition of F̂(·) yields

\[
\widetilde Q(u)\widehat F(u, X^\varepsilon(u), \cdot)(\alpha^\varepsilon(u)) = 0.
\]

Therefore, we have

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} Q^\varepsilon(u)\widehat F(u, X^\varepsilon(u), \cdot)(\alpha^\varepsilon(u))\,du\Big)\\
&= E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat Q(u)\widehat F(u, X^\varepsilon(u), \cdot)(\alpha^\varepsilon(u))\,du\Big)\\
&= \sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat Q(u)\widehat F(u, X^\varepsilon(u), \cdot)(s_{ij})\,\nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\,du\Big)\\
&\quad + \sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat Q(u)\widehat F(u, X^\varepsilon(u), \cdot)(s_{ij})\\
&\hspace{6cm}\times\big[I_{\{\alpha^\varepsilon(u)=s_{ij}\}} - \nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\big]\,du\Big).
\end{aligned}
\]
By virtue of Lemma 12.8 again, the last term above goes to 0 as ε → 0. For the next-to-last term, we have

\[
\begin{aligned}
&\sum_{i=1}^l\sum_{j=1}^{m_i} E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \widehat Q(u)\widehat F(u, X^\varepsilon(u), \cdot)(s_{ij})\,\nu_j^i(u)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\,du\Big)\\
&= \sum_{i=1}^l E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s} \overline Q(u) F(u, X^\varepsilon(u), \cdot)(i)\,I_{\{\overline\alpha^\varepsilon(u)=i\}}\,du\Big)\\
&\to \sum_{i=1}^l E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} \overline Q(u) F(u, X(u), \cdot)(i)\,I_{\{\overline\alpha(u)=i\}}\,du\Big)\\
&= E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\int_t^{t+s} \overline Q(u) F(u, X(u), \cdot)(\overline\alpha(u))\,du\Big),
\end{aligned}
\]

as ε → 0.
Arguing along the same lines as in the above estimates, we obtain

\[
\begin{aligned}
&\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X^\varepsilon(u^-) + g(\gamma, X^\varepsilon(u^-), \alpha^\varepsilon(u^-)), \alpha^\varepsilon(u^-)) - \widehat F(u, X^\varepsilon(u^-), \alpha^\varepsilon(u^-))\big]\,\pi(d\gamma)\,du\\
&= \sum_{i=1}^l\sum_{j=1}^{m_i}\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X^\varepsilon(u^-) + g(\gamma, X^\varepsilon(u^-), s_{ij}), s_{ij}) - \widehat F(u, X^\varepsilon(u^-), s_{ij})\big]\,\pi(d\gamma)\,I_{\{\alpha^\varepsilon(u^-)=s_{ij}\}}\,du\\
&= \sum_{i=1}^l\sum_{j=1}^{m_i}\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X^\varepsilon(u^-) + g(\gamma, X^\varepsilon(u^-), s_{ij}), s_{ij}) - \widehat F(u, X^\varepsilon(u^-), s_{ij})\big]\,\pi(d\gamma)\,\nu_j^i(u^-)\,I_{\{\alpha^\varepsilon(u^-)\in\mathcal M_i\}}\,du\\
&\quad + \sum_{i=1}^l\sum_{j=1}^{m_i}\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X^\varepsilon(u^-) + g(\gamma, X^\varepsilon(u^-), s_{ij}), s_{ij}) - \widehat F(u, X^\varepsilon(u^-), s_{ij})\big]\,\pi(d\gamma)\\
&\hspace{6cm}\times\big[I_{\{\alpha^\varepsilon(u^-)=s_{ij}\}} - \nu_j^i(u^-)\,I_{\{\alpha^\varepsilon(u^-)\in\mathcal M_i\}}\big]\,du\\
&= \sum_{i=1}^l\sum_{j=1}^{m_i}\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X^\varepsilon(u^-) + g(\gamma, X^\varepsilon(u^-), s_{ij}), s_{ij}) - \widehat F(u, X^\varepsilon(u^-), s_{ij})\big]\,\pi(d\gamma)\,\nu_j^i(u^-)\,I_{\{\alpha^\varepsilon(u^-)\in\mathcal M_i\}}\,du + o(1),
\end{aligned}
\]

where o(1) → 0 in probability uniformly in t on any bounded set. By virtue of the weak convergence of Y^ε(·) to Y(·), the Skorohod representation, and

the dominated convergence theorem, we have

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell), \overline\alpha^\varepsilon(t_\ell))\Big(\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X^\varepsilon(u^-) + g(\gamma, X^\varepsilon(u^-), \alpha^\varepsilon(u^-)), \alpha^\varepsilon(u^-))\\
&\hspace{6cm} - \widehat F(u, X^\varepsilon(u^-), \alpha^\varepsilon(u^-))\big]\,\pi(d\gamma)\,du\Big)\\
&\to E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell), \overline\alpha(t_\ell))\Big(\sum_{i=1}^l\sum_{j=1}^{m_i}\int_t^{t+s}\!\int_\Gamma \lambda\big[\widehat F(u, X(u^-) + g(\gamma, X(u^-), s_{ij}), s_{ij})\\
&\hspace{6cm} - \widehat F(u, X(u^-), s_{ij})\big]\,\pi(d\gamma)\,\nu_j^i(u^-)\,I_{\{\overline\alpha(u^-)=i\}}\,du\Big), \qquad (12.30)
\end{aligned}
\]

as ε → 0. Combining (12.26)-(12.30), we obtain the desired result. □

12.3 Fast-Varying Diffusion

This section presents another hybrid jump-diffusion model. Compared with the Markov-modulated jump-diffusion model with fast switching discussed in the last section, there is an additional periodic fast-varying diffusion. From an insurance point of view, the added periodic diffusion may be seen as a way of handling seasonal effects of uncertainty due to claims and premium incomes. This periodic diffusion varies at a faster pace compared with the other random effects.
Consider the system given by

\[
\begin{aligned}
X^\varepsilon(t) &= x + \int_0^t f(X^\varepsilon(s), \alpha(s), z^\varepsilon(s))\,ds + \int_0^t \sigma(X^\varepsilon(s), \alpha(s), z^\varepsilon(s))\,dw\\
&\quad + \int_0^t\!\int_\Gamma g(\gamma, X^\varepsilon(s^-), \alpha(s^-), z^\varepsilon(s^-))\,N(ds, d\gamma),\\
z^\varepsilon(t) &= z_0 + \frac{1}{\varepsilon}\int_0^t f_1(X^\varepsilon(s), z^\varepsilon(s))\,ds + \frac{1}{\sqrt\varepsilon}\int_0^t \sigma_1(X^\varepsilon(s), z^\varepsilon(s))\,dv,
\end{aligned}
\qquad (12.31)
\]
where w(·) and v(·) are independent standard Brownian motions. In the above, z^ε(·) represents a fast-varying diffusion. Relative to z^ε(·), X^ε(·) is a slowly varying jump-diffusion process, which is modulated by a continuous-time Markov chain. Due to its scaling, the process z^ε(·) does not blow up. Under suitable conditions, we show that X^ε(·) converges weakly to a jump-diffusion process modulated by the Markov chain α(·), in which the system dynamics are averaged out with respect to the stationary measure of the fast process z^ε(·). For notational simplicity, we have chosen to treat the case where there is no explicit time dependence in the coefficients f(·), f_1(·), σ(·), and σ_1(·). That is, both the diffusion process z^ε(·) and the jump diffusion X^ε(·) are time homogeneous. Treating z^ε as a parameter, the generator for X^ε(·) can be written as

\[
\mathcal G^\varepsilon F(x, \iota) = \mathcal L^\varepsilon F(x, \iota) + \lambda\int_\Gamma \big[F(x + g(\gamma, x, \iota, z^\varepsilon), \iota) - F(x, \iota)\big]\,\pi(d\gamma) + QF(x, \cdot)(\iota), \ \text{for each } \iota\in\mathcal M, \tag{12.32}
\]

where

\[
\begin{aligned}
\mathcal L^\varepsilon F(x, \iota) &= \frac12\,\sigma^2(x, \iota, z^\varepsilon)\frac{\partial^2}{\partial x^2}F(x, \iota) + f(x, \iota, z^\varepsilon)\frac{\partial}{\partial x}F(x, \iota),\\
QF(x, \cdot)(\iota) &= \sum_{\ell=1}^{m_0} a_{\iota\ell}\, F(x, \ell) = \sum_{\ell\ne\iota} a_{\iota\ell}\big[F(x, \ell) - F(x, \iota)\big]. \qquad (12.33)
\end{aligned}
\]

We need the following conditions.

(A12.3) E|x|^2 < ∞ and E|z_0|^2 < ∞. Moreover,

(a) for each α ∈ M, f(·, α, ·) and σ(·, α, ·) are continuous functions, |f(x, α, z)| ≤ K(1 + |x|), and |σ(x, α, z)| ≤ K(1 + |x|) uniformly in z; for each α ∈ M and x, y, z ∈ R, |f(y, α, z) − f(x, α, z)| ≤ K|y − x| and |σ(y, α, z) − σ(x, α, z)| ≤ K|y − x| uniformly in z. For each α ∈ M, λ < ∞, g(·, ·, α, ·) is bounded and continuous, g(0, x, α, z) = 0 for all x, z, and α, and the value of γ can be determined uniquely by g(γ, x, α, z).

(b) Both f_1(x, ·) and σ_1(x, ·) are periodic functions with period 1 such that σ_1(x, z) > 0 for each x, z, |f_1(x, z)| ≤ K(1 + |z|), |σ_1(x, z)| ≤ K(1 + |z|), |f_1(x, z) − f_1(x, y)| ≤ K|z − y|, and |σ_1(x, z) − σ_1(x, y)| ≤ K|z − y| uniformly in x.

Remark 12.10. Part (b) yields that the fast-varying process z^ε(·) is a so-called periodic diffusion; see [10, 85] among others. This condition guarantees that there is an invariant density µ(x, z) (for each fixed x). In [85], under suitable conditions, it was proved that not only does an invariant measure exist, but asymptotic expansions of the transition density can also be constructed. The choice of period 1 is more or less for convenience; we could in fact use any other positive constant as the period. It seems more instructive to use simpler conditions as in the current setup.

Lemma 12.11. Under (A12.3), {X^ε(·)} is tight in D([0, T]; R).

Proof. The proof is somewhat similar to that of the previous case; therefore, we do not spell out all the details. Similar to Lemma 12.6, it can be shown that sup_{t∈[0,T]} E|X^ε(t)|² ≤ K. For any δ > 0 and any t > 0, s > 0 with s ≤ δ, we still have (12.18). Deriving an estimate as in Lemma 12.7 for the last term of (12.18), using that estimate together with the linear growth in x for the first two terms on the right side of (12.18), and applying Gronwall's inequality, we can show that

\[
\lim_{\delta\to 0}\limsup_{\varepsilon\to 0} E|X^\varepsilon(t+s) - X^\varepsilon(t)|^2 = 0.
\]

Hence the tightness is obtained. □


Theorem 12.12. Assume (A12.3). Then (X^ε(·), α(·)) converges weakly to (X(·), α(·)), a jump-diffusion process modulated by the Markov chain α(·), which is a solution of the equation

\[
X(t) = x + \int_0^t \widehat f(X(u), \alpha(u))\,du + \int_0^t \widehat\sigma(X(u), \alpha(u))\,dw + \int_0^t\!\int_\Gamma \widehat g(\gamma, X(u), \alpha(u))\,N(du, d\gamma), \tag{12.34}
\]

where

\[
\begin{aligned}
\widehat f(x, \alpha) &= \int_0^1 f(x, \alpha, z)\,\mu(x, z)\,dz,\\
\widehat g(\gamma, x, \alpha) &= \int_0^1 g(\gamma, x, \alpha, z)\,\mu(x, z)\,dz,\\
\widehat\sigma^2(x, \alpha) &= \int_0^1 \sigma^2(x, \alpha, z)\,\mu(x, z)\,dz, \qquad (12.35)\\
\widehat\sigma(x, \alpha) &= \sqrt{\widehat\sigma^2(x, \alpha)}.
\end{aligned}
\]
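The averages in (12.35) are one-dimensional integrals against µ(x, ·) and can be approximated by quadrature. A sketch with a hand-rolled trapezoidal rule follows; the uniform density used for the check is only a test case, not an assumption of the theorem:

```python
import numpy as np

def average_against_mu(phi, mu, x, n=400):
    """Trapezoidal quadrature of phi(x, z) against the invariant density
    mu(x, .) over z in [0, 1], as in (12.35)."""
    z = np.linspace(0.0, 1.0, n + 1)
    y = phi(x, z) * mu(x, z)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))

# With the uniform density mu(x, z) = 1 the average is just the z-mean.
uniform_mu = lambda x, z: np.ones_like(z)
fhat = average_against_mu(lambda x, z: x + z, uniform_mu, x=2.0)
# integral of (2 + z) over [0, 1] equals 2.5
```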

Remark 12.13. To proceed, a pertinent way of carrying out the averaging is to work with a truncated process (see [104, p. 248] for instance). The basic idea is as follows: Let M > 0 be given and let S_M be the sphere with radius M centered at the origin. Define X^{ε,M}(t) = X^ε(t) up until the first exit from the M-sphere. Use a smooth truncation function q_M(x) that is equal to 1 inside the M-sphere and is 0 outside the sphere with radius M + 1. Then rewrite the dynamics with the use of the M-truncated process and the truncation function. In this process, in lieu of f(x, α, z), g(γ, x, α, z), and σ(x, α, z), we use

\[
\begin{aligned}
f^M(x, \alpha, z) &\overset{\mathrm{def}}{=} f(x, \alpha, z)\,q_M(x),\\
g^M(\gamma, x, \alpha, z) &\overset{\mathrm{def}}{=} g(\gamma, x, \alpha, z)\,q_M(x),\\
\sigma^M(x, \alpha, z) &\overset{\mathrm{def}}{=} \sigma(x, \alpha, z)\,q_M(x),
\end{aligned}
\]

respectively. We then derive the weak convergence of X^{ε,M}(·) to X^M(·). Finally, using the uniqueness of the martingale problem and letting M → ∞, we conclude that X^ε(·) also converges to X(·). However, for notational simplicity, in what follows we do not use the truncation notation, but simply assume that the process itself is bounded.
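The truncation function q_M can be realized, for instance, with a polynomial (smoothstep) ramp; the specific ramp below is our own choice, required only to interpolate smoothly between 1 on |x| ≤ M and 0 on |x| ≥ M + 1:

```python
import numpy as np

def q_M(x, M):
    """Truncation: 1 inside the M-sphere, 0 outside radius M + 1,
    with a C^1 polynomial ramp on the shell in between."""
    t = np.clip(np.abs(x) - M, 0.0, 1.0)   # position within the transition shell
    return 1.0 - t * t * (3.0 - 2.0 * t)   # smoothstep descending from 1 to 0

vals = q_M(np.array([0.0, 2.0, 2.5, 3.5]), M=2.0)
# 1 for |x| <= 2, 0 for |x| >= 3, and 1/2 at the midpoint 2.5
```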
Proof. Consider the fast-varying process. To proceed, define

\[
\widetilde z^\varepsilon(t) = z^\varepsilon(\varepsilon t), \qquad \widetilde v(t) = v(\varepsilon t)/\sqrt\varepsilon, \qquad \tau = t/\varepsilon,
\]

where v(·) is the Brownian motion given in (12.31). Then the equation for z^ε(t) may be written as

\[
\widetilde z^\varepsilon(\tau) = z_0 + \int_0^\tau f_1(X^\varepsilon(\varepsilon u), \widetilde z^\varepsilon(u))\,du + \int_0^\tau \sigma_1(X^\varepsilon(\varepsilon u), \widetilde z^\varepsilon(u))\,d\widetilde v(u). \tag{12.36}
\]

Note that as ε → 0, τ → ∞. Note also that X^ε(ετ) is slowly varying, whereas z̃^ε(τ) is fast changing. Intuitively, this tells us that we can treat X^ε(ετ) as if it were a "constant" in a small interval. Taking x as a fixed parameter, in what follows we consider the following fixed-x process (see [103] for an explanation of the fixed-x process),

\[
\widetilde z^{\varepsilon,x}(\tau) = z_0 + \int_0^\tau f_1(x, \widetilde z^{\varepsilon,x}(u))\,du + \int_0^\tau \sigma_1(x, \widetilde z^{\varepsilon,x}(u))\,d\widetilde v(u),
\]

and the associated generator is

\[
\widetilde{\mathcal L} = \frac12\,\sigma_1^2(x, z)\frac{\partial^2}{\partial z^2} + f_1(x, z)\frac{\partial}{\partial z}. \tag{12.37}
\]

Let p(t, x, z_1, z) denote the transition density associated with the diffusion generated by L̃. Then p(t, x, z_1, z) satisfies the Kolmogorov forward equation

\[
\frac{\partial p}{\partial t} = \widetilde{\mathcal L}^* p, \qquad \lim_{t\to 0^+} p(t, x, z_1, z) = \delta(z - z_1),
\]

where L̃* is the adjoint of L̃. Because this diffusion is on a compact set, the classical theory of diffusion processes and related results on partial differential equations (see, e.g., Ikeda and Watanabe [72], and Agranovich [1]) yield that there is a unique invariant density µ(x, z) such that p(t, x, z_1, z) → µ(x, z) as t → ∞, where µ(x, z) is the unique solution of

\[
\begin{cases}
\widetilde{\mathcal L}^* \mu(x, z) = 0,\\[3pt]
\mu(x, z) \ge 0 \quad\text{and}\quad \displaystyle\int_0^1 \mu(x, z)\,dz = 1.
\end{cases} \tag{12.38}
\]
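System (12.38) can be solved numerically for each fixed x by discretizing L̃* with finite differences on the periodic unit interval and appending the normalization. The sketch below (scheme and grid size are our own choices) checks itself on the trivial case of zero drift and constant noise, whose invariant density is uniform:

```python
import numpy as np

def invariant_density(f1, s1, x, n=200):
    """Finite-difference sketch of (12.38): Ltilde* mu = 0 on [0, 1] with
    periodic boundary conditions and unit mass, where
    Ltilde* mu = (1/2) d^2/dz^2 (s1^2 mu) - d/dz (f1 mu).
    """
    h = 1.0 / n
    z = np.arange(n) * h
    a = 0.5 * s1(x, z) ** 2
    b = f1(x, z)
    A = np.zeros((n, n))
    for i in range(n):
        ip, im = (i + 1) % n, (i - 1) % n
        # centered differences for (a mu)'' and (b mu)'
        A[i, ip] += a[ip] / h ** 2 - b[ip] / (2 * h)
        A[i, i] += -2.0 * a[i] / h ** 2
        A[i, im] += a[im] / h ** 2 + b[im] / (2 * h)
    # append the mass constraint h * sum(mu) = 1 and solve by least squares
    M = np.vstack([A, h * np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    mu, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return z, mu

# Zero drift and unit noise: the invariant density is uniform.
z, mu = invariant_density(lambda x, z: 0.0 * z, lambda x, z: 1.0 + 0.0 * z, x=0.0)
```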
We note that (X^ε(·), α(·), z^ε(·)) is a Markov process and that z^ε(·) can be treated as a fast-varying noise process that will be averaged out in the limit process. The average is taken with respect to the stationary measure of z̃^ε(·). Because (X^ε(·), α(·)) is tight in D([0, T]; R × M), by Prohorov's theorem we can extract a weakly convergent subsequence. Select such a subsequence and denote the limit by (X(·), α(·)). For notational simplicity, still denote the subsequence by (X^ε(·), α(·)). By virtue of the Skorohod representation, (X^ε(·), α(·)) converges to (X(·), α(·)) w.p.1, and the convergence is uniform on each bounded set. We proceed to show that the limit is a solution of the martingale problem with generator Ĝ given by

\[
\widehat{\mathcal G}F(x, \alpha) = \frac12\,\widehat\sigma^2(x, \alpha)\frac{\partial^2 F(x, \alpha)}{\partial x^2} + \widehat f(x, \alpha)\frac{\partial F(x, \alpha)}{\partial x} + \widehat J(x, \alpha) + QF(x, \cdot)(\alpha), \quad \alpha\in\mathcal M, \tag{12.39}
\]

where

\[
\widehat J(x, \alpha) = \lambda\int_\Gamma \big[F(x + \widehat g(\gamma, x, \alpha), \alpha) - F(x, \alpha)\big]\,\pi(d\gamma),
\]

and φ̂(·) is the average of φ(·) as defined in (12.35) for φ(·) being f(·), σ²(·), and g(·), respectively.

Using an argument as in [176, Lemma 7.18], it can be shown that the martingale problem with operator Ĝ has a unique solution. To show that (X(·), α(·)) is indeed the solution of the martingale problem with operator Ĝ, it suffices to verify that for each α ∈ M and for any F(·, α) ∈ C_0²,

\[
F(X(t), \alpha(t)) - F(x, \alpha) - \int_0^t \widehat{\mathcal G}F(X(u), \alpha(u))\,du \tag{12.40}
\]

is a martingale. As in the previous section, we begin with the X^ε(·) process. For any positive integer n_0, any 0 ≤ t_ℓ ≤ t with ℓ ≤ n_0, and any bounded and continuous functions h_ℓ(·), the weak convergence and the Skorohod representation imply that

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\big[F(X^\varepsilon(t+s), \alpha(t+s)) - F(X^\varepsilon(t), \alpha(t))\big]\\
&\to E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell))\big[F(X(t+s), \alpha(t+s)) - F(X(t), \alpha(t))\big] \quad\text{as } \varepsilon\to 0. \qquad (12.41)
\end{aligned}
\]

In addition,

\[
E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\Big[F(X^\varepsilon(t+s), \alpha(t+s)) - F(X^\varepsilon(t), \alpha(t)) - \int_t^{t+s} \mathcal G^\varepsilon F(X^\varepsilon(u), \alpha(u))\,du\Big] = 0.
\]
Thus, in view of (12.41), it suffices to consider the limit of

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\Big(\int_t^{t+s} \mathcal G^\varepsilon F(X^\varepsilon(u), \alpha(u))\,du\Big)\\
&= E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\Big[\int_t^{t+s}\Big( f(X^\varepsilon(u), \alpha(u), z^\varepsilon(u))\frac{\partial F(X^\varepsilon(u), \alpha(u))}{\partial x}\\
&\qquad + \frac12\,\sigma^2(X^\varepsilon(u), \alpha(u), z^\varepsilon(u))\frac{\partial^2 F(X^\varepsilon(u), \alpha(u))}{\partial x^2}\\
&\qquad + \int_\Gamma \lambda\big[F(X^\varepsilon(u) + g(\gamma, X^\varepsilon(u), \alpha(u), z^\varepsilon(u)), \alpha(u)) - F(X^\varepsilon(u), \alpha(u))\big]\,\pi(d\gamma)\\
&\qquad + QF(X^\varepsilon(u), \cdot)(\alpha(u))\Big)\,du\Big]. \qquad (12.42)
\end{aligned}
\]
Consider the last term in (12.42). Using the weak convergence, the Skorohod representation, and the continuity of the function F(·, α) for each α ∈ M, it can be shown that as ε → 0,

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\Big(\int_t^{t+s} QF(X^\varepsilon(u), \cdot)(\alpha(u))\,du\Big)\\
&\to E\prod_{\ell=1}^{n_0} h_\ell(X(t_\ell))\Big(\int_t^{t+s} QF(X(u), \cdot)(\alpha(u))\,du\Big). \qquad (12.43)
\end{aligned}
\]

Choose δ_ε > 0 such that δ_ε → 0 as ε → 0 but δ_ε/ε → ∞. For any t, s ≥ 0 with 0 ≤ t + s ≤ T, by partitioning the interval [t, t+s] into subintervals of length δ_ε, we can rewrite the terms (except the last one) in (12.42) as

\[
\int_t^{t+s} \widehat H^\varepsilon(u)\,du = \sum_{l=t/\delta_\varepsilon}^{(t+s)/\delta_\varepsilon - 1} \delta_\varepsilon\,\frac{1}{\delta_\varepsilon}\int_{l\delta_\varepsilon}^{l\delta_\varepsilon+\delta_\varepsilon} \widehat H^\varepsilon(u)\,du, \tag{12.44}
\]

where Ĥ^ε(u) is a representation of any of the functions that appeared in the integrand of (12.42). We then work with each of the terms and find the corresponding limits.

Noting that t_ℓ ≤ t, h_ℓ(X^ε(t_ℓ)) is F^ε_{lδ_ε}-measurable, where

\[
\mathcal F^\varepsilon_{l\delta_\varepsilon} = \sigma\{X^\varepsilon(u), \alpha(u), z^\varepsilon(u) : 0\le u\le l\delta_\varepsilon\}.
\]
Making a change of variable from u to εu leads to

\[
\begin{aligned}
&E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\Big(\int_t^{t+s} f(X^\varepsilon(u), \alpha(u), z^\varepsilon(u))\frac{\partial F(X^\varepsilon(u), \alpha(u))}{\partial x}\,du\Big)\\
&= E\prod_{\ell=1}^{n_0} h_\ell(X^\varepsilon(t_\ell))\Big(\sum_{l=t/\delta_\varepsilon}^{(t+s)/\delta_\varepsilon-1} \delta_\varepsilon\,\frac{\varepsilon}{\delta_\varepsilon}\int_{l\delta_\varepsilon/\varepsilon}^{(l\delta_\varepsilon+\delta_\varepsilon)/\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(\varepsilon u), \alpha(\varepsilon u), \widetilde z^\varepsilon(u))\\
&\hspace{6cm}\times\frac{\partial F(X^\varepsilon(\varepsilon u), \alpha(\varepsilon u))}{\partial x}\,du\Big),
\end{aligned}
\]

where E^ε_{lδ_ε} denotes the conditioning on the σ-algebra F^ε_{lδ_ε}. Denote t^l_ε = lδ_ε/ε and T_ε = δ_ε/ε. Then as ε → 0, T_ε → ∞. By the Lipschitz continuity of f(·, α, z) and the boundedness of (∂/∂x)F(·, α),

\[
\begin{aligned}
&\frac{\varepsilon}{\delta_\varepsilon}\, E\Big|\int_{l\delta_\varepsilon/\varepsilon}^{(l\delta_\varepsilon+\delta_\varepsilon)/\varepsilon} E^\varepsilon_{l\delta_\varepsilon}\big[f(X^\varepsilon(\varepsilon u), \alpha(\varepsilon u), \widetilde z^\varepsilon(u)) - f(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u), \widetilde z^\varepsilon(u))\big]\\
&\hspace{6cm}\times\frac{\partial F(X^\varepsilon(\varepsilon u), \alpha(\varepsilon u))}{\partial x}\,du\Big|\\
&\le K\,\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E|X^\varepsilon(\varepsilon u) - X^\varepsilon(l\delta_\varepsilon)|\,du\\
&\le K \sup_{l\delta_\varepsilon\le\varepsilon u\le l\delta_\varepsilon+\delta_\varepsilon} E|X^\varepsilon(\varepsilon u) - X^\varepsilon(l\delta_\varepsilon)| \to 0 \quad\text{as } \varepsilon\to 0.
\end{aligned}
\]
lδε ≤εu≤lδε +δε

Similarly,

\[
\begin{aligned}
&\frac{1}{T_\varepsilon}\, E\Big|\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u), \widetilde z^\varepsilon(u))\\
&\hspace{2.5cm}\times\Big(\frac{\partial F(X^\varepsilon(\varepsilon u), \alpha(\varepsilon u))}{\partial x} - \frac{\partial F(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u))}{\partial x}\Big)\,du\Big|\\
&\le K\,\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E|X^\varepsilon(\varepsilon u) - X^\varepsilon(l\delta_\varepsilon)|\,du\\
&\le K \sup_{l\delta_\varepsilon\le\varepsilon u\le l\delta_\varepsilon+\delta_\varepsilon} E|X^\varepsilon(\varepsilon u) - X^\varepsilon(l\delta_\varepsilon)| \to 0 \quad\text{as } \varepsilon\to 0.
\end{aligned}
\]

Thus

\[
\begin{aligned}
&\frac{1}{\delta_\varepsilon}\int_{l\delta_\varepsilon}^{l\delta_\varepsilon+\delta_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(u), \alpha(u), z^\varepsilon(u))\frac{\partial F(X^\varepsilon(u), \alpha(u))}{\partial x}\,du\\
&= \frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u), \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u))}{\partial x}\,du + o(1),
\end{aligned}
\]

where o(1) → 0 in probability uniformly on any bounded t-set.


By virtue of the measurability of (∂/∂x)F(X^ε(lδ_ε), j) with respect to F^ε_{lδ_ε}, and the independence of α(·) and z^ε(·),

\[
\begin{aligned}
&\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u), \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), \alpha(\varepsilon u))}{\partial x}\,du\\
&= \sum_{j\in\mathcal M}\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), j, \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), j)}{\partial x}\, I_{\{\alpha(\varepsilon u)=j\}}\,du\\
&= \sum_{i,j\in\mathcal M}\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), j, \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), j)}{\partial x}\\
&\hspace{4.5cm}\times P\big(\alpha(\varepsilon u)=j\,\big|\,\alpha(l\delta_\varepsilon)=i\big)\, I_{\{\alpha(l\delta_\varepsilon)=i\}}\,du\\
&= \sum_{i\in\mathcal M}\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), i, \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), i)}{\partial x}\, I_{\{\alpha(l\delta_\varepsilon)=i\}}\,du + o(1),
\end{aligned}
\]

where o(1) → 0 in probability as ε → 0 uniformly on any bounded t-set. The last step above follows from the well-known fact about continuous-time Markov chains: because εu − lδ_ε → 0 as ε → 0,

\[
P\big(\alpha(\varepsilon u)=j\,\big|\,\alpha(l\delta_\varepsilon)=i\big) \to \delta_{ij} = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{otherwise.}\end{cases}
\]

The above estimates indicate that we need only consider the term

\[
\rho_\varepsilon = \frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X^\varepsilon(l\delta_\varepsilon), i, \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), i)}{\partial x}\, I_{\{\alpha(l\delta_\varepsilon)=i\}}\,du
\]

in the averaging.
We approximate X^ε(lδ_ε) by a process taking finitely many values. To be more specific, as in [104, Section 6.1, p. 143 and Section 8.2, p. 227], for any Δ > 0, let {B_n^Δ : n ≤ n_Δ} be a finite collection of disjoint sets that satisfies P(X^ε(lδ_ε) ∈ ∂B_n^Δ) = 0 and that covers the range of X^ε(lδ_ε). Recall that we have assumed the boundedness of X^ε(·) for notational simplicity (see Remark 12.13). Select a point X_n^Δ ∈ B_n^Δ and rewrite ρ_ε as

\[
\begin{aligned}
\rho_\varepsilon &= \sum_{n=1}^{n_\Delta} I_{\{X^\varepsilon(l\delta_\varepsilon)\in B_n^\Delta\}}\, I_{\{\alpha(l\delta_\varepsilon)=i\}}\,\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon} f(X_n^\Delta, i, \widetilde z^\varepsilon(u))\frac{\partial F(X_n^\Delta, i)}{\partial x}\,du\\
&\quad + \sum_{n=1}^{n_\Delta} I_{\{X^\varepsilon(l\delta_\varepsilon)\in B_n^\Delta\}}\, I_{\{\alpha(l\delta_\varepsilon)=i\}}\,\frac{1}{T_\varepsilon}\int_{t^l_\varepsilon}^{t^l_\varepsilon+T_\varepsilon} E^\varepsilon_{l\delta_\varepsilon}\Big[f(X^\varepsilon(l\delta_\varepsilon), i, \widetilde z^\varepsilon(u))\frac{\partial F(X^\varepsilon(l\delta_\varepsilon), i)}{\partial x}\\
&\hspace{5.5cm} - f(X_n^\Delta, i, \widetilde z^\varepsilon(u))\frac{\partial F(X_n^\Delta, i)}{\partial x}\Big]\,du. \qquad (12.45)
\end{aligned}
\]

The term in the last three lines of (12.45) goes to 0 in probability as ε → 0


and then ∆ → 0 by the weak convergence, the Skorohod representation, the
Lipschitz continuity of f (·, i, z), and the smoothness of F (·, i). We proceed
to figure out the limit of the term in the first two lines of the right-hand
side of (12.45).
Note that as ε → 0,
$$
I_{\{X^\varepsilon(l\delta_\varepsilon)\in B_n^\Delta\}}\to
\begin{cases}1,&\text{if } X^\varepsilon(l\delta_\varepsilon)-X_n^\Delta\to 0,\\
0,&\text{otherwise.}\end{cases}
$$

In view of (12.36) and the existence of the unique invariant density µ(x, ·) for each x, we have that
$$
\begin{aligned}
&\sum_{i\in\mathcal M}\sum_{n=1}^{n_\Delta} I_{\{X^\varepsilon(l\delta_\varepsilon)\in B_n^\Delta\}}I_{\{\alpha(l\delta_\varepsilon)=i\}}
\frac{1}{T_\varepsilon}\int_{t_l^\varepsilon}^{t_l^\varepsilon+T_\varepsilon}
E_{l\delta_\varepsilon}^\varepsilon f\big(X_n^\Delta,i,\widetilde z^\varepsilon(u)\big)
\frac{\partial F(X_n^\Delta,i)}{\partial x}\,du\\
&\quad=\sum_{i\in\mathcal M}\sum_{n=1}^{n_\Delta} I_{\{X^\varepsilon(l\delta_\varepsilon)\in B_n^\Delta\}}I_{\{\alpha(l\delta_\varepsilon)=i\}}
\frac{1}{T_\varepsilon}\int_{t_l^\varepsilon}^{t_l^\varepsilon+T_\varepsilon}\!\int_0^1
f\big(X_n^\Delta,i,z\big)\frac{\partial F(X_n^\Delta,i)}{\partial x}\,
E_{l\delta_\varepsilon}^\varepsilon I_{\{\widetilde z^{\varepsilon,X_n^\Delta}(u)\in dz\}}\,du\\
&\quad=\sum_{i\in\mathcal M}\sum_{n=1}^{n_\Delta} I_{\{X^\varepsilon(l\delta_\varepsilon)\in B_n^\Delta\}}I_{\{\alpha(l\delta_\varepsilon)=i\}}
\frac{1}{T_\varepsilon}\int_{t_l^\varepsilon}^{t_l^\varepsilon+T_\varepsilon}\!\int_0^1
f\big(X_n^\Delta,i,z\big)\frac{\partial F(X_n^\Delta,i)}{\partial x}\,
P\big(\widetilde z^{\varepsilon,X_n^\Delta}(u)\in dz\,\big|\,\widetilde z^{\varepsilon,X_n^\Delta}(t_l^\varepsilon)\big)\,du\\
&\quad\to\sum_{i\in\mathcal M} I_{\{\alpha(u)=i\}}\int_0^1
f\big(X(u),i,z\big)\frac{\partial F(X(u),i)}{\partial x}\,\mu(X(u),z)\,dz\\
&\quad=\widehat f\big(X(u),\alpha(u)\big)\frac{\partial F(X(u),\alpha(u))}{\partial x}.
\end{aligned}
$$

Thus, as ε → 0,
$$
E\prod_{\ell=1}^{n_0}h_\ell\big(X^\varepsilon(t_\ell)\big)
\Big[\int_t^{t+s} f\big(X^\varepsilon(u),\alpha(u),z^\varepsilon(u)\big)
\frac{\partial F(X^\varepsilon(u),\alpha(u))}{\partial x}\,du\Big]
\to
E\prod_{\ell=1}^{n_0}h_\ell\big(X(t_\ell)\big)
\Big[\int_t^{t+s}\widehat f\big(X(u),\alpha(u)\big)
\frac{\partial F(X(u),\alpha(u))}{\partial x}\,du\Big].
$$

Likewise, we obtain that as ε → 0,
$$
E\prod_{\ell=1}^{n_0}h_\ell\big(X^\varepsilon(t_\ell)\big)
\Big[\frac12\int_t^{t+s}\sigma^2\big(\alpha(u),X^\varepsilon(u),z^\varepsilon(u)\big)
\frac{\partial^2F(X^\varepsilon(u),\alpha(u))}{\partial x^2}\,du\Big]
\to
E\prod_{\ell=1}^{n_0}h_\ell\big(X(t_\ell)\big)
\Big[\frac12\int_t^{t+s}\widehat\sigma^2\big(X(u),\alpha(u)\big)
\frac{\partial^2F(X(u),\alpha(u))}{\partial x^2}\,du\Big],
$$

and
$$
\begin{aligned}
&E\prod_{\ell=1}^{n_0}h_\ell\big(X^\varepsilon(t_\ell)\big)
\Big[\int_t^{t+s}\!\int_\Gamma\lambda\big[F\big(X^\varepsilon(u)+g(\gamma,X^\varepsilon(u),\alpha(u),z^\varepsilon(u)),\alpha(u)\big)
-F\big(X^\varepsilon(u),\alpha(u)\big)\big]\,\pi(d\gamma)\,du\Big]\\
&\quad\to
E\prod_{\ell=1}^{n_0}h_\ell\big(X(t_\ell)\big)
\Big[\int_t^{t+s}\!\int_\Gamma\lambda\big[F\big(X(u)+\widehat g(\gamma,X(u),\alpha(u)),\alpha(u)\big)
-F\big(X(u),\alpha(u)\big)\big]\,\pi(d\gamma)\,du\Big].
\end{aligned}
$$

Combining these results, we obtain
$$
E\prod_{\ell=1}^{n_0}h_\ell\big(X(t_\ell)\big)
\Big[F\big(X(t+s),\alpha(t+s)\big)-F\big(X(t),\alpha(t)\big)
-\int_t^{t+s}\mathcal L F\big(X(u),\alpha(u)\big)\,du\Big]=0.
$$
Hence (12.40) holds. Consequently, the desired result follows. □

12.4 Discussion and Remarks


We have studied two-time-scale hybrid jump diffusions. The motivational
insurance risk models are more general than the classical compound Poisson
models and compound Poisson models under diffusion perturbations. In
our models, the rate at which the premiums are received and the rate
of oscillation due to diffusion depend on the surplus level. Moreover, the
models also take into consideration Markovian regime switching. Under
suitable conditions, we have derived limit systems for fast switching and
fast diffusion, respectively. The results obtained can be specialized to a
number of cases.

Example 12.14. Consider (12.7) with σ(t, x, α) ≡ 0. Under such a condition Theorem 12.3 still holds with $\sigma^2(t,x,\alpha)\equiv 0$. Then the limit system

becomes
$$
X(t)=x+\int_0^t f\big(s,X(s),\alpha(s)\big)\,ds
+\int_0^t\!\int_\Gamma g\big(\gamma,X(s^-),\alpha(s^-)\big)\,N(ds,d\gamma).
$$

Under further specialized coefficients, it reduces to the surplus model involving regime switching [163].

Example 12.15. We consider the model in (12.7) again with $\widetilde Q(t)=Q(t)$
that is weakly irreducible. Then the limit system is given by Corollary 12.5.
The limit does not involve the Markov switching process. Thus the com-
plexity reduction is more pronounced. Further specification leads to the
classical risk model.
Example 12.16. Consider the system given by (12.31) with σ(t, x, α) ≡ 0.
Then similar to Example 12.14, Theorem 12.3 still holds with the limit
system being a Markovian modulated jump process.

12.5 Remarks on Numerical Solutions for


Switching Jump Diffusions
This chapter focuses on switching jump diffusions. The central theme is the
treatment of two-time-scale systems. Most of the jump-diffusion processes
with regime switching are nonlinear. Even without two-time-scales, closed-
form solutions are virtually impossible to obtain. As a viable alternative,
one has to find feasible numerical schemes. In what follows, we suggest
numerical algorithms for (12.1). The development is for systems without
two-time-scales. The rationale is that if one has a system that involves fast and slow time scales, then one could first use the averaging ideas presented in the previous sections to reduce the amount of computation, leading to a limit or reduced system. Thus, for numerical methods, we concentrate
on the limit systems only. We present the algorithm and the corresponding
convergence properties. The approach we are taking is again the martingale
problem formulation, which brings out the profile and dynamic behavior
of the process rather than dealing with the iterations directly. Because the
detailed proof of convergence of numerical approximation algorithms has a
certain similarity to that of Chapters 5 and 6, we omit the verbatim details,
but provide a reference.
We use the notation given in Section A.6 of the appendix of this book; please consult that section for various definitions such as $\tau_n$, $\psi_n$, $\psi(t)$, and the like.
like. Consider the case where the coefficients have no explicit dependence
on the time variable. For simplicity, we take Q(t) = Q, a constant matrix.
We describe the algorithm as follows.
1. Choose ∆ > 0, a small parameter, as the step size.

2. Construct a discrete-time Markov chain $\alpha_n$ with transition probability matrix $P^\Delta = I + \Delta Q$, where Q is the generator of the continuous-time Markov chain α(t) given in (12.1), and I is an $m_0$-dimensional identity matrix.

3. To approximate the Brownian motion w(·), a usual practice in numerical solutions for stochastic differential equations is to use $\Delta w_n = w(\Delta(n+1)) - w(\Delta n)$ to approximate dw. Because w(·) has independent increments, $\{\Delta w_k\}$ is a sequence of independent and identically distributed random variables with mean 0 and covariance ∆I.

4. Let $\{\tau_n\}$ and $\{\psi_n\}$ be sequences of independent and identically distributed random variables such that $\tau_n$ has an exponential distribution with parameter λ for some λ > 0, and $\psi_n$ is the impulse having distribution π(·) (see Section A.6 of this book in the appendix). Define
$$\widetilde\tau_{n+1}=\widetilde\tau_n+\tau_n,\quad\text{with }\widetilde\tau_0=0.$$
It follows from the independence assumption that $\{\Delta w_k\}$, $\{\alpha_k\}$, $\{\tau_n\}$, and $\{\psi_n\}$ are mutually independent.

5. Construct the approximation algorithm
$$
X_{n+1}=X_0+\Delta\sum_{k=0}^n f(X_k,\alpha_k)+\sqrt\Delta\sum_{k=0}^n\sigma(X_k,\alpha_k)\xi_k
+\sum_{\widetilde\tau_j\le\Delta n} g\big(\psi_j,X_{\lfloor\widetilde\tau_j/\Delta\rfloor},\alpha_{\lfloor\widetilde\tau_j/\Delta\rfloor}\big),
\qquad(12.46)
$$
where $\lfloor y\rfloor$ denotes the integer part of a real number y.
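Scheme (12.46) is straightforward to carry out. The sketch below simulates one path of the recursion; the two-state generator Q, the drift f, the diffusion coefficient σ, the jump function g, and the jump rate λ are illustrative assumptions, not data from the text.

```python
import math
import random

random.seed(7)

def simulate_path(x0, a0, Q, f, sig, g, lam, draw_psi, delta, n_steps):
    """One path of (12.48): X_{n+1} = X_n + Delta f + sqrt(Delta) sigma xi_n + Delta J_n,
    with alpha_n generated from the transition matrix P_Delta = I + Delta Q."""
    m0 = len(Q)
    P = [[(1.0 if i == j else 0.0) + delta * Q[i][j] for j in range(m0)]
         for i in range(m0)]
    # Jump times tilde_tau_j: partial sums of i.i.d. exponential(lam) variables.
    jump_times, t = [], random.expovariate(lam)
    while t <= delta * n_steps:
        jump_times.append(t)
        t += random.expovariate(lam)
    x, a, path, jq = x0, a0, [x0], 0
    for n in range(n_steps):
        xi = random.gauss(0.0, 1.0)                 # xi_n, i.i.d. N(0, 1)
        x = x + delta * f(x, a) + math.sqrt(delta) * sig(x, a) * xi
        # Delta J_n: add jumps whose time tilde_tau_j falls in (Delta n, Delta(n+1)].
        while jq < len(jump_times) and jump_times[jq] <= delta * (n + 1):
            x += g(draw_psi(), x, a)
            jq += 1
        u, cdf = random.random(), 0.0               # one step of the chain alpha_n
        for j in range(m0):
            cdf += P[a][j]
            if u <= cdf:
                a = j
                break
        path.append(x)
    return path, a

# Illustrative two-regime data (assumptions).
Q = [[-1.0, 1.0], [2.0, -2.0]]
f = lambda x, a: -x if a == 0 else 1.0 - x          # regime-dependent drift
sig = lambda x, a: 0.3
g = lambda psi, x, a: 0.1 * psi                     # small impulse-driven jumps
path, a_end = simulate_path(0.0, 0, Q, f, sig, g, lam=2.0,
                            draw_psi=lambda: random.uniform(-1.0, 1.0),
                            delta=0.01, n_steps=1000)
```

Note that ∆ must be small enough for I + ∆Q to have nonnegative entries; otherwise $P^\Delta$ is not a transition matrix.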

Remark 12.17. For construction of a discrete-time Markov chain, see


[177, pp. 315–316]. The discrete-time Markov chain constructed is an ap-
proximation of a discretization obtained from α(t). In fact, we could define
βn = α(∆n) for any positive integer n. It is easily verified that the process
so defined is a discrete-time Markov chain, whose transition probability
matrix is given by exp(∆Q). This Markov chain has stationary transition
probabilities or it is a time-homogeneous chain. The process βn is known
as a skeleton process in the literature [28]. One of the advantages of using
a constant stepsize for the numerical procedure is that the skeleton process
has stationary transition probabilities not depending on time, so it is easier
to generate than that of a nonstationary process.
As in Chapter 5, in the algorithm, we have used another fold of ap-
proximation, namely, using I + ∆Q in lieu of exp(∆Q) for the transition
matrix. This further simplifies the computation and reduces the complexity
in calculating exp(∆Q). Intuitively, the discrete-time Markov chain we are
constructing can be considered as one whose transition probability matrix

is obtained from that of $\beta_n$ by a truncated Taylor expansion. This approximation makes sense because we can invoke the results in [178] to show that
mation makes sense because we can invoke the results in [178] to show that
an interpolated process of αn with interpolation interval [∆n, ∆n + ∆)
converges weakly to α(·) generated by Q.
To approximate the Brownian motion, in lieu of the usual approach as given above, we could generate a sequence of independent and identically distributed random variables $\{\xi_n\}$ such that $E\xi_n = 0$ and $E\xi_n\xi_n' = I$ to further simplify the computation. The functional central limit theorem ensures the approximation to the Brownian motion.
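To see how good the one-term approximation is, one can compare I + ∆Q with exp(∆Q) numerically. In this sketch the matrix exponential is computed by a truncated Taylor series (adequate for small ∆‖Q‖); the generator Q is an illustrative assumption.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(A, terms=30):
    """exp(A) via the truncated series sum_k A^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]   # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

delta = 0.01
Q = [[-3.0, 3.0], [1.0, -1.0]]                 # an assumed 2-state generator
dQ = [[delta * q for q in row] for row in Q]
P_skeleton = expm(dQ)                          # exp(Delta Q): the skeleton's kernel
P_euler = [[float(i == j) + dQ[i][j] for j in range(2)] for i in range(2)]
err = max(abs(P_skeleton[i][j] - P_euler[i][j]) for i in range(2) for j in range(2))
# err is of order Delta^2, much smaller than Delta itself
```

Both matrices are stochastic (rows summing to one), and the discrepancy is $O(\Delta^2)$, which is why the cheaper $I + \Delta Q$ suffices for the numerical procedure.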
To proceed, define
$$
J_n=\sum_{\widetilde\tau_j\le\Delta n} g\big(\psi_j,X_{\lfloor\widetilde\tau_j/\Delta\rfloor},\alpha_{\lfloor\widetilde\tau_j/\Delta\rfloor}\big).
\qquad(12.47)
$$

It is often desirable to write (12.46) recursively. This can be done as follows:
$$
X_{n+1}=X_n+\Delta f(X_n,\alpha_n)+\sqrt\Delta\,\sigma(X_n,\alpha_n)\xi_n+\Delta J_n,
\qquad(12.48)
$$
where $\Delta J_n=J_n-J_{n-1}$.


Note that the sequences {τn } = {e τn+1 − τen } and {ψn } are as in those
discussed in the last section. The process τen , in fact, represents the jump
times of the underlying process.
To proceed, let us define the interpolated processes via piecewise constant interpolations as
$$
X^\Delta(t)=X_n,\qquad
\alpha^\Delta(t)=\alpha_n,\qquad
w^\Delta(t)=\sqrt\Delta\sum_{k=0}^{t/\Delta-1}\xi_k,\qquad
J^\Delta(t)=J_n,
\qquad\text{for } t\in[\Delta n,\Delta n+\Delta).
\qquad(12.49)
$$

For convenience, with a slight abuse of notation, we omitted the floor function above; henceforth, for instance, we use t/∆ to denote the integer part of t/∆. We state a result whose proof is along the lines of martingale averaging. Further details can be found in [173].
Theorem 12.18. Under the conditions in (A12.1), (X ∆ (·), α∆ (·)) con-
verges weakly to (X(·), α(·)) as ∆ → 0 such that (X(·), α(·)) is the solution
of the martingale problem with operator L.
As in Chapter 5, Theorem 12.18 implies that the algorithm we con-
structed is convergent. The limit is nothing but the solution of (12.1). As
in Chapter 5, one of the variations of the algorithm is to use a sequence of
decreasing step sizes. The modifications are as follows. Let $\{\Delta_n\}$ be a sequence of nonnegative real numbers such that $\Delta_n\to 0$ and $\sum_{n=0}^\infty\Delta_n=\infty$.

For example, we may take $\Delta_n=1/n$, or $\Delta_n=1/n^\gamma$ for some $0<\gamma<1$. Define
$$
t_n=\sum_{l=0}^{n-1}\Delta_l,\quad\text{and}\quad \alpha_n=\alpha(t_n),\quad n\ge 0.
$$

It then can be verified that $\{\alpha_n\}$ so defined is a discrete-time Markov chain with transition probability matrix
$$
P^{n,n+1}=\big(p^{n,n+1}_{ij}\big)_{m_0\times m_0}=\exp\big((t_{n+1}-t_n)Q\big)=\exp(\Delta_n Q).
\qquad(12.50)
$$

Using ideas in stochastic approximation [104], define

m(t) = max{n : tn ≤ t}. (12.51)

It is clear that now the Markov chain $\alpha_n$ is not time homogeneous. The approximate solution for the stochastic differential equation with jumps and regime switching (12.1) is given by
$$
\begin{cases}
\displaystyle X_{n+1}=X_0+\sum_{k=0}^n f(X_k,\alpha_k)\Delta_k
+\sum_{k=0}^n\sqrt{\Delta_k}\,\sigma(X_k,\alpha_k)\xi_k
+\sum_{\widetilde\tau_j\le t_n} g\big(\psi_j,X_{m(\widetilde\tau_j)},\alpha_{m(\widetilde\tau_j)}\big),\\[4pt]
X_0=x,\qquad \alpha_0=\alpha.
\end{cases}
\qquad(12.52)
$$

With the algorithm proposed, we can then proceed to study its performance. The proof is along the same lines as that of the constant-stepsize algorithm. The interested reader is referred to [173]; see also Chapters 5 and 6 of this book for further reading.
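As a small illustration of the decreasing-stepsize bookkeeping, the sketch below takes $\Delta_n = 1/n^\gamma$ (starting the recursion at n = 1, an assumption made here because $\Delta_0$ is not defined for these choices) and computes $t_n$ and m(t) of (12.51).

```python
gamma = 0.6                    # any 0 < gamma < 1 makes sum Delta_n diverge

def step(n):
    """Delta_n = 1 / n**gamma for n >= 1 (assumption: indexing starts at 1)."""
    return 1.0 / n ** gamma

def t_of(n):
    """t_n = Delta_1 + ... + Delta_n, with t_0 = 0."""
    return sum(step(l) for l in range(1, n + 1))

def m_of(time):
    """m(t) = max{n : t_n <= t}, cf. (12.51), computed incrementally."""
    n, tn = 0, 0.0
    while tn + step(n + 1) <= time:
        n += 1
        tn += step(n)
    return n
```

Because $\gamma < 1$, the partial sums $t_n$ diverge, so m(t) is finite for every t and the scheme can be run on any bounded horizon.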

12.6 Notes
As a continuation of our study, based on the work Yin and Yang [175], this
chapter has been devoted to two-time-scale jump diffusions with regime
switching. Roughly, the fast-changing driving process can be treated as a noise whose stationary measure exists.
the slow process is averaged out with respect to the stationary measure of
the fast-varying process. Such an idea has been used in the literature. We
refer the reader to Khasminskii [82], Papanicolaou, Stroock, and Varadhan
[131], Khasminskii and Yin [86, 88, 89], Pardoux and Veretennikov [132],
and references therein for related work in diffusions, and Yin [165] for that
of switching diffusions.
The limit results obtained in this chapter can be useful for applications.
For example, one may use such results in a subsequent study for obtaining
bounds on ruin probability in risk management. It is conceivable that this

approach will lead to approximations with tighter error bounds. Certain


generalizations are possible. For example, the finite-state Markov chain may
also include transient states. In treating (12.31), time-inhomogeneous systems
may be considered. Further investigation may also include the replacement
of the diffusion by a wideband noise [102] yielding more realistic systems.
Finally, for notational simplicity, only scalar X ε (t) is treated in this chapter.
The approach presented can be carried over to multidimensional cases.
Towards the end of the chapter, we also outlined numerical schemes for
solutions of switching jump diffusions. For detailed development, the reader
is referred to Yin, Song, and Zhang [173]. For simplicity, the switching
process is assumed to be a continuous-time Markov chain independent of
the Brownian motion. By combining the treatment of jump diffusions with
that of the switching diffusions with x-dependent switching, x-dependent
switching jump diffusions can also be treated.
Appendix A

Serving as a handy reference, this appendix collects a number of results


that are used in the book. These results include Markov chains, martin-
gales, diffusion processes, weak convergence, hybrid jump diffusions, and
other miscellaneous results. In most of the cases, only results are presented.
The detailed developments and discussions are omitted, but pointers are
provided for further reading. We assume the knowledge of basic probability
theory and stochastic processes. These can be found in standard textbooks
for a course in probability, for example, Breiman [19], Chow and Teicher
[27], among others.

A.1 Discrete-Time Markov Chains


The theory of stochastic processes is concerned with structures and proper-
ties of families of random variables Xt , where t is a parameter taken over a
suitable index set T. The index set T may be discrete (i.e., T = {0, 1, . . .}),
or continuous (i.e., an interval of the real line). Stochastic processes asso-
ciated with these index sets are said to be discrete-time processes and
continuous-time processes, respectively; see [80], for instance. The random
variables $X_t$ can be either scalars or vectors. For a continuous-time stochastic process, we use the notation X(t), whereas for a discrete-time process, we
use Xk .
A stochastic process is wide-sense (or covariance) stationary, if it has
finite second moments, a constant mean, and a covariance that depends
only on the time difference. The ergodicity of a discrete-time stationary


sequence $\{X_k\}$ refers to the convergence of the sequence $(X_1+X_2+\cdots+X_n)/n$ to its expectation in an appropriate sense; see, for example, Karlin
and Taylor [80, Theorem 5.6, p. 487] for a strong ergodic theorem of a
stationary process. A stochastic process Xk is adapted to a filtration {Fk },
if for each k, Xk is an Fk -measurable random vector.
Suppose that αk is a stochastic process taking values in M, which is
at most countable (i.e., it is either finite M = {1, 2, . . . , m0 } or countable
M = {1, 2, . . .}). We say that αk is a Markov chain if it possesses the
Markov property,

$$
\begin{aligned}
p^{k,k+1}_{ij}&=P(\alpha_{k+1}=j\,|\,\alpha_k=i)\\
&=P(\alpha_{k+1}=j\,|\,\alpha_0=i_0,\ldots,\alpha_{k-1}=i_{k-1},\alpha_k=i),
\end{aligned}
$$
for any $i_0,\ldots,i_{k-1},i,j\in\mathcal M$.


Given i, j, if $p^{k,k+1}_{ij}$ is independent of time k, that is, $p^{k,k+1}_{ij}=p_{ij}$, $\alpha_k$ is said to have stationary transition probabilities. The corresponding Markov chain is said to be stationary or time-homogeneous or temporally homogeneous or simply homogeneous. In this case, denote the transition matrix by $P=(p_{ij})$. Denote the n-step transition matrix by $P^{(n)}=(p^{(n)}_{ij})$, with
$$
p^{(n)}_{ij}=P(X_n=j\,|\,X_0=i).
$$

Then $P^{(n)}=P^n$. That is, the n-step transition matrix is simply the matrix P to the nth power. Note that

(a) $p_{ij}\ge 0$, $\sum_j p_{ij}=1$, and

(b) $P^{k_1+k_2}=P^{k_1}P^{k_2}$, for $k_1,k_2=1,2,\ldots$ This identity is known as the Chapman–Kolmogorov equation.
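Both facts are easy to check numerically; the 3-state transition matrix below is an illustrative assumption.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(P, n):
    """P^n by repeated multiplication (n >= 0)."""
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]            # an assumed 3-state transition matrix

P5 = mat_pow(P, 5)               # the 5-step transition matrix P^(5)
# Chapman-Kolmogorov: P^(2+3) = P^(2) P^(3)
ck_diff = max(abs(P5[i][j] - mat_mul(mat_pow(P, 2), mat_pow(P, 3))[i][j])
              for i in range(3) for j in range(3))
```

Each row of $P^5$ is again a probability vector, and the two ways of computing the 5-step matrix agree up to rounding.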

Suppose that $P=(p_{ij})\in\mathbb R^{m_0\times m_0}$ is a transition matrix. Then the spectral radius of P satisfies $\rho(P)=1$; see Section A.7 and also Karlin and Taylor [81, p. 3] for a definition of the spectral radius of a matrix. This implies that all eigenvalues of P are on or inside the unit circle.
For a Markov chain $\alpha_k$, state j is said to be accessible from state i if $p^{(k)}_{ij}=P(\alpha_k=j\,|\,\alpha_0=i)>0$ for some $k>0$. Two states i and j,
accessible from each other, are said to communicate with each other. A
Markov chain is irreducible if all states communicate with each other. For
i ∈ M, let d(i) denote the period of state i (i.e., the greatest common
divisor of all k ≥ 1 such that P(αk+n = i|αn = i) > 0, define d(i) = 0 if
P(αk+n = i|αn = i) = 0 for all k). A Markov chain is aperiodic if each state
has period one. In accordance with Kolmogorov’s classification of states,
a state i is recurrent if, starting from state i, the probability of returning
to state i after some finite time is 1. A state is transient if it is not recur-
rent. Criteria on recurrence can be found in most standard textbooks of

stochastic processes or Markov chains. In this book, we consider recurrence of switching diffusions, which, as for its diffusion-process counterpart, can be considered a generalization of Kolmogorov's classifications.
Note that (see Karlin and Taylor [81, p. 4]) if P is a transition matrix
for a finite-state Markov chain, the multiplicity of the eigenvalue 1 is equal
to the number of recurrent classes associated with P. A row vector $\pi=(\pi_1,\ldots,\pi_{m_0})$ with each $\pi_i\ge 0$ is called a stationary distribution of $\alpha_k$ if it is the unique solution to the system of equations
$$
\begin{cases}
\pi P=\pi,\\
\displaystyle\sum_i\pi_i=1.
\end{cases}
$$

As demonstrated in [81, p. 85], for i being in an aperiodic recurrent class,


if πi > 0, which is the limit of the probability of starting from state i and
then entering state i at the nth transition as n → ∞, then for all j in this
class of i, πj > 0, and the class is termed positive recurrent or strongly
ergodic.
Theorem A.1. Let $P=(p_{ij})\in\mathbb R^{m_0\times m_0}$ be the transition matrix of an irreducible aperiodic finite-state Markov chain. Then there exist constants $0<\lambda<1$ and $c_0>0$ such that
$$
\big|P^k-\bar P\big|\le c_0\lambda^k\quad\text{for } k=1,2,\ldots,
$$
where $\bar P=\mathbb 1_{m_0}\pi$, $\mathbb 1_{m_0}=(1,\ldots,1)'\in\mathbb R^{m_0\times 1}$, and $\pi=(\pi_1,\ldots,\pi_{m_0})$ is the stationary distribution of $\alpha_k$. This implies, in particular,
$$
\lim_{k\to\infty}P^k=\mathbb 1_{m_0}\pi.
$$
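For a two-state chain the theorem can be checked by hand: with P = [[1−p, p], [q, 1−q]] the stationary distribution has the closed form π = (q, p)/(p+q), and $P^k-\mathbb 1\pi$ decays like $\lambda^k$ with λ = 1 − p − q. The sketch below (with assumed p, q) verifies both claims.

```python
p, q = 0.3, 0.2
P = [[1.0 - p, p], [q, 1.0 - q]]
pi = (q / (p + q), p / (p + q))            # solves pi P = pi, pi_1 + pi_2 = 1

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pk = [row[:] for row in P]
for _ in range(39):                        # Pk = P^40
    Pk = mat_mul(Pk, P)
dist = max(abs(Pk[i][j] - pi[j]) for i in range(2) for j in range(2))
# dist <= c0 * lam**40 with lam = 1 - p - q = 0.5, so it is tiny
residual = abs(pi[0] * P[0][0] + pi[1] * P[1][0] - pi[0])   # (pi P - pi)_1
```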

Suppose αk is a discrete-time Markov chain with transition probability


matrix P . One of the ergodicity conditions of Markov chains is Doeblin’s
condition (see Doob [33, Hypothesis D, p. 192]; see also Meyn and Tweedie
[125, p. 391]). Suppose that there is a probability measure µ with the
property that for some positive integer n, 0 < δ < 1, and ∆ > 0, µ(A) ≤ δ implies that $P^n(x,A)\le 1-\Delta$ for all x ∈ A. In the above, $P^n(x,A)$ denotes the probability that the chain, starting from x, reaches the set A in n steps. Note
that if αk is a finite-state Markov chain that is irreducible and aperiodic,
then the Doeblin condition is satisfied.
Given an m0 × m0 irreducible transition matrix P and a vector G, con-
sider
F (P − I) = G, (A.1)
where F is an unknown vector. Note that zero is an eigenvalue of the matrix
P − I and the null space of P − I is spanned by 1lm0 . Then by the Fredholm
alternative (see Lemma A.11), (A.1) has a solution if and only if G1lm0 = 0,
where 1lm0 = (1, . . . , 1)0 ∈ Rm0 ×1 .

Define $Q_c=(P-I\;\vdots\;\mathbb 1_{m_0})\in\mathbb R^{m_0\times(m_0+1)}$. Consider (A.1) together with the condition $F\mathbb 1_{m_0}=\sum_{i=1}^{m_0}F_i=\widehat F$, which may be written as $FQ_c=G_c$, where $G_c=(G\;\vdots\;\widehat F)$. Because for each t, (A.9) has a unique solution, it follows that $Q_cQ_c'$ is a matrix with full rank; therefore, the equation
$$
F[Q_cQ_c']=G_cQ_c'\qquad(A.2)
$$
has a unique solution, which is given by $F=G_cQ_c'[Q_cQ_c']^{-1}$.

A.2 Continuous-Time Markov Chains


A right-continuous stochastic process with piecewise-constant sample paths
is a jump process. Suppose that α(·) = {α(t) : t ≥ 0} is a jump pro-
cess defined on (Ω, F, P ) taking values in M. Then {α(t) : t ≥ 0} is a
continuous-time Markov chain with state space M, if
P(α(t) = i|α(r) : r ≤ s) = P(α(t) = i|α(s)),
for all 0 ≤ s ≤ t and i ∈ M, with M being either finite or countable.
For any i, j ∈ M and t ≥ s ≥ 0, let pij (t, s) denote the transition
probability P(α(t) = j|α(s) = i), and P (t, s) the matrix (pij (t, s)). We
name P (t, s) the transition matrix of the Markov chain α(·), and postulate
that
$$
\lim_{t\to s^+}p_{ij}(t,s)=\delta_{ij},
$$
where δij = 1 if i = j and 0 otherwise. It follows that for 0 ≤ s ≤ ς ≤ t,

$$
\begin{aligned}
&p_{ij}(t,s)\ge 0,\quad i,j\in\mathcal M,\\
&\sum_{j\in\mathcal M}p_{ij}(t,s)=1,\quad i\in\mathcal M,\\
&p_{ij}(t,s)=\sum_{k\in\mathcal M}p_{ik}(\varsigma,s)p_{kj}(t,\varsigma),\quad i,j\in\mathcal M.
\end{aligned}
$$

The last identity is the Chapman–Kolmogorov equation as its discrete-time


counterpart. If the transition probability P(α(t) = j|α(s) = i) depends only
on (t − s), then α(·) is said to be time-homogeneous or it is said to have
stationary transition probabilities. Otherwise, the process is nonhomoge-
neous or nonstationary. For time homogeneous Markov chains, we define
pij (h) := pij (s + h, s) for any h ≥ 0.
Suppose that α(t) is a continuous-time Markov chain with stationary
transition probability P (t) = (pij (t)). It then naturally induces a discrete-
time Markov chain. For each h > 0, the transition matrix (pij (h)) is the
transition matrix of the discrete-time Markov chain αk = α(kh), which is
called an h-skeleton of the corresponding continuous-time Markov chain by
Chung; see [28, p. 132].

Definition A.2 (q-Property). A matrix-valued function Q(t) = (qij (t)),


for t ≥ 0, satisfies the q-Property, if

(a) qij (t) is Borel measurable for all i, j ∈ M and t ≥ 0;


(b) qij (t) is uniformly bounded. That is, there exists a constant K such
that |qij (t)| ≤ K, for all i, j ∈ M and t ≥ 0;
(c) $q_{ij}(t)\ge 0$ for $j\ne i$ and $q_{ii}(t)=-\sum_{j\ne i}q_{ij}(t)$, $t\ge 0$.

For any real-valued function f on M and i ∈ M, write
$$
Q(t)f(\cdot)(i)=\sum_{j\in\mathcal M}q_{ij}(t)f(j)=\sum_{j\ne i}q_{ij}(t)\big(f(j)-f(i)\big).
$$

Let us recall the definition of the generator of a Markov chain.


Definition A.3 (Generator). A matrix Q(t), t ≥ 0, is an infinitesimal gen-
erator (or in short, a generator) of α(·) if it satisfies the q-property, and for
any bounded real-valued function f defined on M
$$
f(\alpha(t))-\int_0^t Q(\varsigma)f(\cdot)(\alpha(\varsigma))\,d\varsigma\qquad(A.3)
$$

is a martingale.

Remark A.4. Motivated by the applications we are interested in, a gen-


erator is defined as a matrix satisfying the q-property above, where an
additional condition on the boundedness of the entries of the matrix is
posed. It naturally connects the Markov chain and martingale problems.
Definitions including other classes of matrices may be devised as in Chung
[28]. To proceed, we give an equivalent condition for a finite-state Markov
chain generated by Q(·).
Lemma A.5. Let M = {1, . . . , m0 }. Then α(t) ∈ M, t ≥ 0, is a Markov
chain generated by Q(t) if and only if
$$
\big(I_{\{\alpha(t)=1\}},\ldots,I_{\{\alpha(t)=m_0\}}\big)
-\int_0^t\big(I_{\{\alpha(\varsigma)=1\}},\ldots,I_{\{\alpha(\varsigma)=m_0\}}\big)Q(\varsigma)\,d\varsigma
\qquad(A.4)
$$

is a martingale.
Proof: For a proof, see Yin and Zhang [176, Lemma 2.4]. □
For any given Q(t) satisfying the q-property, there exists a Markov chain
α(·) generated by Q(t). If Q(t) = Q, a constant matrix, the idea of Ethier
and Kurtz [43] can be utilized for the construction. For time-varying gen-
erator Q(t), we need to use the piecewise-deterministic process approach
as described in Davis [30], to define the Markov chain α(·).

The discussion below is taken from that of Yin and Zhang [176], which
was originated in the work of Davis [30]. Let 0 = τ0 < τ1 < · · · < τl < · · · be
a sequence of jump times of α(·) such that the random variables τ1 , τ2 − τ1 ,
. . ., τk+1 − τk , . . . are independent. Let α(0) = i ∈ M. Then α(t) = i on
the interval [τ0 , τ1 ). The first jump time τ1 has the probability distribution
$$
P(\tau_1\in B)=\int_B\exp\Big(\int_0^t q_{ii}(s)\,ds\Big)\big(-q_{ii}(t)\big)\,dt,
$$

where B ⊂ [0, ∞) is a Borel set. The post-jump location of α(t) = j, j 6= i,


is given by
qij (τ1 )
P(α(τ1 ) = j|τ1 ) = .
−qii (τ1 )
If qii (τ1 ) is 0, define P(α(τ1 ) = j|τ1 ) = 0, j 6= i. Then P(qii (τ1 ) = 0) = 0.
In fact, if Bi = {t : qii (t) = 0}, then

$$
P\big(q_{ii}(\tau_1)=0\big)=P(\tau_1\in B_i)
=\int_{B_i}\exp\Big(\int_0^t q_{ii}(s)\,ds\Big)\big(-q_{ii}(t)\big)\,dt=0.
$$

In general, α(t) = α(τl ) on the interval [τl , τl+1 ). The jump time τl+1 has
the conditional probability distribution

$$
P\big(\tau_{l+1}-\tau_l\in B_l\,\big|\,\tau_1,\ldots,\tau_l,\alpha(\tau_1),\ldots,\alpha(\tau_l)\big)
=\int_{B_l}\exp\Big(\int_{\tau_l}^{t+\tau_l} q_{\alpha(\tau_l)\alpha(\tau_l)}(s)\,ds\Big)
\big(-q_{\alpha(\tau_l)\alpha(\tau_l)}(t+\tau_l)\big)\,dt.
$$

The post-jump location of α(t) = j, $j\ne\alpha(\tau_l)$, is given by
$$
P\big(\alpha(\tau_{l+1})=j\,\big|\,\tau_1,\ldots,\tau_l,\tau_{l+1},\alpha(\tau_1),\ldots,\alpha(\tau_l)\big)
=\frac{q_{\alpha(\tau_l)j}(\tau_{l+1})}{-q_{\alpha(\tau_l)\alpha(\tau_l)}(\tau_{l+1})}.
$$
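The construction above translates directly into a simulation recipe when Q is a constant generator: hold in state i an exponential time with rate $-q_{ii}$, then jump to j ≠ i with probability $q_{ij}/(-q_{ii})$. A sketch, with an assumed 3-state generator:

```python
import random
random.seed(1)

def simulate_ctmc(Q, i0, horizon):
    """Sample-path construction of a chain with constant generator Q."""
    t, state = 0.0, i0
    times, states = [0.0], [i0]
    while True:
        rate = -Q[state][state]
        if rate <= 0.0:                    # absorbing state: no more jumps
            break
        t += random.expovariate(rate)      # holding time ~ Exp(rate)
        if t >= horizon:
            break
        u, cdf = random.random(), 0.0      # post-jump location q_ij / (-q_ii)
        for j in range(len(Q)):
            if j != state:
                cdf += Q[state][j] / rate
                if u <= cdf:
                    state = j
                    break
        times.append(t)
        states.append(state)
    return times, states

Q = [[-2.0, 1.5, 0.5],
     [1.0, -1.0, 0.0],
     [0.2, 0.8, -1.0]]                     # an assumed 3-state generator
times, states = simulate_ctmc(Q, 0, horizon=50.0)
```

This is the constant-Q special case; for time-varying Q(t) the holding-time distributions above replace the exponential draws.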
Theorem A.6. Suppose that the matrix Q(t) satisfies the q-property for
t ≥ 0. Then the following statements hold.
(a) The process α(·) constructed above is a Markov chain.
(b) The process
Z t
f (α(t)) − Q(ς)f (·)(α(ς))dς (A.5)
0
is a martingale for any uniformly bounded function f (·) on M. Thus
Q(t) is indeed the generator of α(·).
(c) The transition matrix P(t, s) satisfies the forward differential equation
$$
\begin{cases}
\dfrac{\partial P(t,s)}{\partial t}=P(t,s)Q(t),\quad t\ge s,\\
P(s,s)=I,
\end{cases}
\qquad(A.6)
$$

where I is the identity matrix.

(d) Assume further that Q(t) is continuous in t. Then P (t, s) also satisfies
the backward differential equation

$$
\begin{cases}
\dfrac{\partial P(t,s)}{\partial s}=-Q(s)P(t,s),\quad t\ge s,\\
P(s,s)=I.
\end{cases}
\qquad(A.7)
$$

Proof. For (a)–(c), see Yin and Zhang [176, Theorem 2.5]. As for (d), see [26, p. 402]. □
Note that frequently, working with s ∈ [0, T ], the backward equations are
written slightly differently by using reversed time τ = T − s. In this case,
the minus sign in (A.7) disappears. Suppose that α(t), t ≥ 0, is a Markov
chain generated by an m0 × m0 matrix Q(t). The notions of irreducibility
and quasi-stationary distribution are given next.

Definition A.7 (Irreducibility).

(a) A generator Q(t) is said to be weakly irreducible if, for each fixed t ≥ 0, the system of equations
$$
\begin{cases}
\nu(t)Q(t)=0,\\
\displaystyle\sum_{i=1}^{m_0}\nu_i(t)=1
\end{cases}
\qquad(A.8)
$$
has a unique solution $\nu(t)=(\nu_1(t),\ldots,\nu_{m_0}(t))$ and $\nu(t)\ge 0$.

(b) A generator Q(t) is said to be irreducible, if for each fixed t ≥ 0 the


system of equations (A.8) has a unique solution ν(t) and ν(t) > 0.

By ν(t) ≥ 0, we mean that for each i ∈ M, νi (t) ≥ 0. A similar in-


terpretation holds for ν(t) > 0. It follows from the definitions above that
irreducibility implies weak irreducibility. However, the converse is not true.
For example, the generator
 
$$
Q=\begin{pmatrix}-1&1\\ 0&0\end{pmatrix}
$$

is weakly irreducible, but it is not irreducible because it contains an ab-


sorbing state corresponding to the second row in Q. A moment of reflection
reveals that for a two-state Markov chain with generator
 
$$
Q=\begin{pmatrix}-\lambda(t)&\lambda(t)\\ \mu(t)&-\mu(t)\end{pmatrix}
$$

the weak irreducibility requires only λ(t) + µ(t) > 0, whereas the irre-
ducibility requires that both λ(t) and µ(t) be positive. Such a definition
is convenient for many applications (e.g., the manufacturing systems men-
tioned in Khasminskii, Yin, and Zhang [91, p. 292]).
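For this two-state generator, (A.8) can be solved in closed form, which makes the distinction concrete: ν(t) = (µ(t), λ(t))/(λ(t)+µ(t)) exists whenever λ(t)+µ(t) > 0, and the absorbing example above corresponds to λ = 1, µ = 0, giving ν = (0, 1) ≥ 0 but not ν > 0. A small sketch:

```python
def quasi_stationary(lam, mu):
    """Unique solution of nu Q = 0, nu_1 + nu_2 = 1 for
    Q = [[-lam, lam], [mu, -mu]]; needs only lam + mu > 0 (weak irreducibility)."""
    s = lam + mu
    if s <= 0.0:
        raise ValueError("lam + mu must be positive")
    return (mu / s, lam / s)

nu = quasi_stationary(3.0, 1.0)
residual = -3.0 * nu[0] + 1.0 * nu[1]     # first component of nu Q, should be 0
nu_weak = quasi_stationary(1.0, 0.0)      # weakly irreducible case: nu = (0, 1)
```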
Definition A.8 (Quasi-Stationary Distribution). For t ≥ 0, ν(t) is termed
a quasi-stationary distribution if it is the unique solution of (A.8) satisfying
ν(t) ≥ 0.

Remark A.9. In the study of homogeneous Markov chains, stationary dis-


tributions play an important role. When we are interested in nonstationary
(nonhomogeneous) Markov chains, stationary distributions are replaced by
the corresponding quasi-stationary distributions, as defined above.

If ν(t) = ν > 0, it is a stationary distribution. In view of Definitions


A.7 and A.8, if Q(t) is weakly irreducible, then there is a quasi-stationary
distribution. Note that the rank of a weakly irreducible m0 ×m0 matrix Q(t)
is m0 − 1, for each t ≥ 0. The definition above emphasizes the probabilistic
interpretation. An equivalent definition using the algebraic properties of
Q(t) is provided next. One can verify their equivalence using the Fredholm
alternative; see Lemma A.11.
Definition A.10. A generator Q(t) is said to be weakly irreducible if, for
each fixed t ≥ 0, the system of equations


$$
\begin{cases}
f(t)Q(t)=0,\\
\displaystyle\sum_{i=1}^{m_0}f_i(t)=0
\end{cases}
\qquad(A.9)
$$
has only the trivial (zero) solution.

A.3 Fredholm Alternative and Ramification


The Fredholm alternative, which provides a powerful method for establish-
ing existence and uniqueness of solutions for various systems of equations,
can be found, for example, in Hutson and Pym [71, p. 184].
Lemma A.11 (Fredholm Alternative). Let B be a Banach space and A
a linear compact operator defined on it. Let I : B → B be the identity
operator. Assume γ 6= 0. Then one of the two alternatives holds:

(a) The homogeneous equation (γI − A)f = 0 has only the zero solution,
in which case γ ∈ ρ(A), the resolvent set of A, (γI −A)−1 is bounded,
and the inhomogeneous equation (γI − A)f = g also has one solution
f = (γI − A)−1 g, for each g ∈ B.

(b) The homogeneous equation (γI − A)f = 0 has a nonzero solution, in


which case the inhomogeneous equation (γI − A)f = g has a solution
if and only if hg, f ∗ i = 0 for every solution f ∗ of the adjoint equation
γf ∗ = A∗ f ∗ .
Note that in (b) above, hg, f ∗ i is a pairing defined on B × B∗ (with B∗
denoting the dual of B). This is also known as an “outer product” (see [71,
p. 149]), whose purpose is similar to the inner product in a Hilbert space. If
we work with a Hilbert space, this “outer product” is identical to the usual
inner product. When one considers linear systems of algebraic equations,
the lemma above can be rewritten in a simpler form.
Let B denote an m0 × m0 matrix. For any γ 6= 0, define an operator
A : Rm0 ×m0 → Rm0 ×m0 as
Ay = y(γI − B).
Note that in this case, I is just the m0 × m0 identity matrix. Then the
adjoint operator A∗ : Rm0 ×m0 → Rm0 ×m0 is
A∗ x = (γI − B)x.
Suppose that b and y ∈ R1×m0 . Consider the system yB = b. If the adjoint
system Bx = 0 where x ∈ Rm0 ×1 has only the zero solution, then yB = b
has a unique solution given by y = bB −1 . If Bx = 0 has a nonzero solution
x, then yB = b has a solution if and only if hb, xi = 0.
Suppose that the generator Q of a continuous-time Markov chain α1 (t)
is a constant matrix and is irreducible. Then the rank of Q is m0 − 1.
Denote by R(Q) and N (Q) the range and the null space of Q, respec-
tively. It follows that N (Q) is one-dimensional spanned by 1l (i.e., N (Q) =
span{1l}). As a consequence, the Markov chain α1 (t) with generator Q is
ergodic. In what follows, denote the associated stationary distribution by
ν = (ν1 , ν2 , . . . , νm0 ) ∈ R1×m0 . Consider a linear system of equations
Qc = η, (A.10)
where c and η ∈ Rm0 .
Lemma A.12. The following assertions hold.
(i) Equation (A.10) has a solution if and only if νη = 0.
(ii) Suppose that c1 and c2 are two solutions of (A.10). Then c1 −c2 = γ0 1l
for some γ0 ∈ R.
(iii) Any solution of (A.10) can be written as
c = γ0 1l + h0 ,
where γ0 ∈ R is an arbitrary constant, 1l = (1, . . . , 1)0 ∈ Rm0 , and
h0 ∈ Rm0 is the unique solution of (A.10) satisfying νh0 = 0.

Proof. We begin with a stochastic representation. First, if (A.10) has a


solution c, then νQc = νη. But νQ = 0, and hence νη = 0. Next, suppose that $c_1$ and $c_2$ are two solutions of (A.10). Define $\widetilde c=c_1-c_2$. Then we have $Q\widetilde c=0$. Thus $\widetilde c\in N(Q)=\mathrm{span}\{\mathbb 1\}$, and hence it follows that there exists a $\gamma_0\in\mathbb R$ such that $\widetilde c=\gamma_0\mathbb 1$.
Suppose that νη = 0. Using orthogonal decomposition, we can write η
as $\eta=b+b_1$, where $b\in R(Q')$ and $b_1\in N(Q)$. Thus $b_1=\beta\mathbb 1$ for some $\beta\in\mathbb R$, and η can be written as $\eta=b+\beta\mathbb 1$. Since νη = 0, we obtain β = −νb. Then solving (A.10) is equivalent to solving
$$
Qc=b-(\nu b)\mathbb 1.\qquad(A.11)
$$

Denote by Ei the conditional expectation corresponding to the condi-


tional probability $P_i(\cdot):=P(\,\cdot\,|\,\alpha_1(0)=i)$, and define a column vector $h=(h_1,h_2,\ldots,h_{m_0})'\in\mathbb R^{m_0}$ by
$$
h_i=E_i\int_0^\infty\big(\nu b-b_{\alpha_1(t)}\big)\,dt,\quad i\in\mathcal M.\qquad(A.12)
$$

It is readily seen that
$$
E_i b_{\alpha_1(t)}=\sum_{j=1}^{m_0}b_j P\big(\alpha_1(t)=j\,|\,\alpha_1(0)=i\big)
=\sum_{j=1}^{m_0}p_{ij}(t)b_j\to\sum_{j=1}^{m_0}\nu_j b_j=\nu b\quad\text{as } t\to\infty.
$$

Moreover, because α1 (t) has a finite state space, the convergence above
takes place exponentially fast; see [176, Appendix] and references therein.
Thus, h is well defined. We proceed to show that h, in fact, is a solution of
(A.11).
By direct calculation, it is seen that
Qh = ∫_0^∞ (νb Q1l − QP(t)b) dt = −∫_0^∞ QP(t)b dt = −∫_0^∞ (dP(t)/dt) b dt = −P(t)b |_0^∞ = −1lνb + b,

where P (t) = (pij (t)) is the transition matrix satisfying the Kolmogorov
backward equation (d/dt)P (t) = QP (t) with P (0) = I and limt→∞ P (t) =
1lν. Thus, the vector h satisfies equation (A.11) and hence (A.10). By using
the result proved earlier, any solution of (A.10) can be represented by
c = h + γ1l for some γ ∈ R.
Finally, we verify that any solution of (A.10) can be written as c = h0 + α1l for α ∈ R, where h0 is the unique solution of (A.10) satisfying νh0 = 0. In fact, we have shown that h defined in (A.12) solves (A.10) and any solution of (A.10) can be represented by c = h + γ1l for some γ ∈ R. Now let h0 = h − (νh)1l ∈ Rm0. Then we have
Qh0 = Q(h − (νh)1l) = Qh − (νh)Q1l = Qh = η.
Also, we verify that
νh0 = ν(h − (νh)1l) = νh − (νh)(ν1l) = 0.
Hence h0 satisfies (A.10) and νh0 = 0. Moreover, any solution of (A.10) can be represented by
c = h + γ1l = h0 + (γ + νh)1l = h0 + γ0 1l,

where γ0 = γ + νh ∈ R. It remains to show uniqueness. To this end, define an augmented matrix
Qa = (Q′, ν′)′ ∈ R(m0+1)×m0
and a new vector
b̂ = (η′, 0)′ ∈ Rm0+1.
Then (A.10) together with νh = 0 can be written as
Qa h = b̂.  (A.13)
It can be shown that Qa′Qa has full rank m0 due to the irreducibility of Q, and the solution of (A.13) can be represented by
h = (Qa′Qa)^{−1} Qa′ b̂ = h0.
This leads to the desired uniqueness. □
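The augmented-matrix construction at the end of the proof translates directly into a least-squares computation. The sketch below (Python with NumPy; the generator Q and the vector b are made-up examples, not from the text) recovers the unique solution h0 of (A.10) satisfying νh0 = 0:

```python
import numpy as np

# Hypothetical irreducible generator of a 3-state Markov chain (rows sum to 0).
Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -3.0, 2.0],
              [2.0, 2.0, -4.0]])
m0 = Q.shape[0]

# Stationary distribution nu: solve nu Q = 0 together with nu 1l = 1.
A = np.vstack([Q.T, np.ones(m0)])
rhs = np.concatenate([np.zeros(m0), [1.0]])
nu = np.linalg.lstsq(A, rhs, rcond=None)[0]

# A right-hand side eta with nu eta = 0, so (A.10) is solvable (Lemma A.12(i)).
b = np.array([1.0, -0.5, 2.0])
eta = b - (nu @ b) * np.ones(m0)      # eta = b - (nu b) 1l

# Augmented system Q_a h = (eta, 0)': its least-squares solution is h0.
Qa = np.vstack([Q, nu])
bhat = np.concatenate([eta, [0.0]])
h0 = np.linalg.lstsq(Qa, bhat, rcond=None)[0]

assert np.allclose(Q @ h0, eta)       # h0 solves (A.10)
assert abs(nu @ h0) < 1e-8            # and is orthogonal to nu
```

Since Qa has full column rank under irreducibility, the least-squares solution of the (consistent) augmented system is exactly the h0 of part (iii).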


Remark A.13. In the literature, (A.10) is sometimes referred to as the Poisson equation (see [14]), and the results in (i) and (ii) above are deemed to be well known. They are more or less a consequence of the Fredholm alternative. The irreducibility of Q used in the lemma can be relaxed to weak irreducibility; the result still holds.
Perhaps of interest in its own right, we illustrated in the proof above how the solutions may be obtained through a stochastic representation. This angle of view has not been extensively exploited to our knowledge. Although classifying solutions of an algebraic system as the sum of a homogeneous part and a particular solution is a time-honored concept, characterizing the unique particular solution through orthogonality with respect to the stationary distribution of Q is useful in many applications.
Similar to Lemma A.12, when studying stability of switching diffusions, we need to examine an equation
Qc = b − (1/2)σ2 + β1l,  (A.14)
where β ∈ R, b ∈ Rm0 is a constant vector, and σ2 = (σ1^2, . . . , σ_{m0}^2)′ ∈ Rm0 with σi^2 ≥ 0 for i ∈ M. Then equation (A.14) has a solution if the Markov chain α1(t) is ergodic. Let the associated stationary distribution be denoted by ν. Then β is given by
β = −Σ_{i=1}^{m0} νi (bi − (1/2)σi^2) = −νb + (1/2)νσ2.  (A.15)
Moreover, let the column vector h = (h1, . . . , hm0)′ ∈ Rm0 be defined by
hi = Ei ∫_0^∞ (νb − (1/2)νσ2 − b_{α1(t)} + (1/2)σ_{α1(t)}^2) dt,  i ∈ M.  (A.16)
Then h is well defined and is a solution of (A.14) with β given by (A.15).
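As a quick numerical illustration of (A.14)–(A.15), with a made-up ergodic two-state generator and data (not from the text), one can compute β from the stationary distribution and then solve the resulting system as in Lemma A.12:

```python
import numpy as np

# Made-up ergodic generator and data for (A.14)-(A.15).
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
m0 = Q.shape[0]
b = np.array([0.3, -0.7])
sigma2 = np.array([0.5, 0.1])          # sigma_i^2 >= 0

# Stationary distribution nu of Q (here nu = (2/3, 1/3)).
A = np.vstack([Q.T, np.ones(m0)])
nu = np.linalg.lstsq(A, np.concatenate([np.zeros(m0), [1.0]]), rcond=None)[0]

# beta from (A.15): beta = -nu b + (1/2) nu sigma^2.
beta = -nu @ b + 0.5 * nu @ sigma2

# With this beta, the right-hand side of (A.14) is orthogonal to nu,
# so Qc = b - sigma^2/2 + beta 1l is solvable by Lemma A.12.
rhs = b - 0.5 * sigma2 + beta
c = np.linalg.lstsq(np.vstack([Q, nu]),
                    np.concatenate([rhs, [0.0]]), rcond=None)[0]
assert np.allclose(Q @ c, rhs)
assert abs(nu @ rhs) < 1e-10
```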

A.4 Martingales, Gaussian Processes, and Diffusions
This section briefly reviews several random processes including martingales,
Gaussian processes, and diffusions.

A.4.1 Martingales
Many applications involving stochastic processes depend on the concept of
the martingale. The definition and properties of discrete-time martingales
can be found in Breiman [19, Chapter 5], Chung [28, Chapter 9], and Hall
and Heyde [63] among others. This section provides a brief review.

Discrete-Time Martingales
Definition A.14. Suppose that {Fn } is a filtration, and {Xn } is a se-
quence of random variables. The pair {Xn , Fn } is a martingale if for each
n,

(a) Xn is Fn -measurable;

(b) E|Xn | < ∞;

(c) E(Xn+1 |Fn ) = Xn a.s.


It is a supermartingale (resp., submartingale) if (a) and (b) in the above


hold, and
E(Xn+1 |Fn ) ≤ Xn (resp., E(Xn+1 |Fn ) ≥ Xn ) a.s.
In what follows if the sequence of σ-algebras is clear, we simply say that
{Xn } is a martingale.
Let Xn = Σ_{j=1}^n Yj, where {Yn} is a sequence of independent and identically distributed (i.i.d.) random variables with zero mean. It is plain that
E[Xn+1 | Y1, . . . , Yn] = E[Xn + Yn+1 | Y1, . . . , Yn] = Xn + EYn+1 = Xn  a.s.
The above equation illustrates the defining relation of a martingale.


If {Xn } is a martingale, we can define Yn = Xn − Xn−1 , which is known
as a martingale difference sequence. Suppose that {Xn , Fn } is a martingale.
Then the following properties hold.
(a) Suppose ϕ(·) is an increasing and convex function defined on R. If
for each positive integer n, E|ϕ(Xn )| < ∞, then {ϕ(Xn ), Fn } is a
submartingale.
(b) Let τ be a stopping time with respect to Fn (i.e., an integer-valued
random variable such that {τ ≤ n} is Fn -measurable for each n).
Then {Xτ ∧n , Fτ ∧n } is also a martingale.
(c) The martingale inequality (see Kushner [102, p. 3]) states that for each λ > 0,
P( max_{1≤j≤n} |Xj| ≥ λ ) ≤ (1/λ) E|Xn|,
E max_{1≤j≤n} |Xj|^2 ≤ 4E|Xn|^2, if E|Xn|^2 < ∞ for each n.  (A.17)

(d) The Doob inequality (see Hall and Heyde [63, p. 15]) states that for each p > 1,
E^{1/p}|Xn|^p ≤ E^{1/p}( max_{1≤j≤n} |Xj|^p ) ≤ q E^{1/p}|Xn|^p,
where p^{−1} + q^{−1} = 1.
(e) The Burkholder inequality (see Hall and Heyde [63, p. 23]) states: For 1 < p < ∞, there exist constants K1 and K2 such that
K1 E( Σ_{j=1}^n Yj^2 )^{p/2} ≤ E|Xn|^p ≤ K2 E( Σ_{j=1}^n Yj^2 )^{p/2},
where Yj = Xj − Xj−1.
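To see inequality (d) at work, one can simulate the i.i.d.-sum martingale from the beginning of this subsection and compare both sides of the L² maximal bound. A minimal Monte Carlo sketch (the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Paths of the i.i.d.-sum martingale X_n = Y_1 + ... + Y_n with standard
# normal increments; check E max_{j<=n} |X_j|^2 <= 4 E|X_n|^2 (Doob, p = 2).
n_paths, n_steps = 20000, 50
Y = rng.standard_normal((n_paths, n_steps))
X = np.cumsum(Y, axis=1)

lhs = np.mean(np.max(np.abs(X), axis=1) ** 2)   # estimates E max_j |X_j|^2
rhs = 4.0 * np.mean(X[:, -1] ** 2)              # estimates 4 E |X_n|^2
assert lhs <= rhs
```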
Remark A.15. If X(·) is a right-continuous martingale in continuous time and f(·) is a nonnegative convex function, then
P( sup_{s≤t≤T} f(X(t)) ≥ λ | Fs ) ≤ E[f(X(T)) | Fs] / λ,  (A.18)
E sup_{0≤t≤T} |X(t)|^2 ≤ 4E|X(T)|^2.  (A.19)

Consider a discrete-time Markov chain {αn } with state space M (either


finite or countable) and one-step transition probability matrix P = (pij ).
Recall that a sequence {f (i) : i ∈ M} is P -harmonic or right-regular
(Karlin and Taylor [81, p. 48]), if (a) f (·) is a real-valued function such
that f (i) ≥ 0 for each i ∈ M, and (b)
f(i) = Σ_{j∈M} pij f(j) for each i ∈ M.  (A.20)

If the equality in (A.20) is replaced by ≥ (resp., ≤), {f(i) : i ∈ M} is said to be P-superharmonic or right superregular (resp., P-subharmonic or right subregular). Considering f = (f(i) : i ∈ M) as a column vector, (A.20) can be written as f = Pf. Similarly, we can write f ≥ Pf for P-superharmonic (resp., f ≤ Pf for P-subharmonic). Likewise, {f(i) : i ∈ M} is said to be P left regular if (b) above is replaced by
f(j) = Σ_{i∈M} f(i)pij for each j ∈ M.  (A.21)

Similarly, left superregular and subregular functions can be defined.


There is a natural connection between a martingale and a discrete-time
Markov chain; see Karlin and Taylor [80, p. 241]. Let {αn } be a discrete-
time Markov chain and {f (i) : i ∈ M} be a bounded P -harmonic sequence.
Define Xn = f (αn ). Then E|Xn | < ∞. Moreover, owing to the Markov
property,
E(Xn+1 | Fn) = E(f(αn+1) | αn) = Σ_{j∈M} p_{αn,j} f(j) = f(αn) = Xn  a.s.
Therefore, {Xn , Fn } is a martingale. Note that if M is finite, the bound-
edness of {f (i) : i ∈ M} is not needed.
As explained in Karlin and Taylor [80], one of the widely used ways of
constructing martingales is through the utilization of eigenvalues and eigen-
vectors of a transition matrix. Again, let {αn } be a discrete-time Markov
chain with transition matrix P . Recall that a column vector f is a right
eigenvector of P associated with an eigenvalue λ ∈ C, if P f = λf . Let f
be a right eigenvector of P satisfying E|f (αn )| < ∞ for each n. For λ 6= 0,
define Xn = λ−n f (αn ). Then {Xn } is a martingale.
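This eigenvector construction can be checked algebraically: the identity E[Xn+1 | αn] = λ^{−(n+1)}(Pf)(αn) = λ^{−n} f(αn) = Xn reduces to the eigenvalue relation Pf = λf. A NumPy sketch with a made-up transition matrix:

```python
import numpy as np

# Made-up irreducible transition matrix P on a three-state space.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Pick the eigenpair with the largest-modulus eigenvalue (lam = 1 here).
eigvals, eigvecs = np.linalg.eig(P)
k = np.argmax(np.abs(eigvals))
lam, f = eigvals[k], eigvecs[:, k]        # right eigenvector: P f = lam f

# Martingale identity for X_n = lam^{-n} f(alpha_n):
# E[X_{n+1} | alpha_n = i] = lam^{-(n+1)} (P f)_i = lam^{-n} f_i = X_n.
n = 3
cond_exp = lam ** (-(n + 1)) * (P @ f)    # conditional expectation, all states i
X_n = lam ** (-n) * f
assert np.allclose(cond_exp, X_n)
```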
Continuous-Time Martingales
Next, let us denote the space of Rr -valued continuous functions on [0, T ]
by C([0, T ]; Rr ), and the space of functions that are right-continuous with
left-hand limits endowed with the Skorohod topology by D([0, T ]; Rr ); see
Definition A.18. Consider X(·) = {X(t) ∈ Rr : t ≥ 0}. If for each t ≥ 0,
X(t) is an Rr random vector, we call X(·) a continuous-time stochastic
process and write it as X(t), t ≥ 0, or simply X(t) if there is no confusion.
A process X(·) is adapted to a filtration {Ft }, if for each t ≥ 0, X(t) is an
Ft -measurable random variable; X(·) is progressively measurable if for each
t ≥ 0, the process restricted to [0, t] is measurable with respect to the σ-
algebra B[0, t] × Ft in [0, t] × Ω, where B[0, t] denotes the Borel sets of [0, t].
A progressively measurable process is measurable and adapted, whereas
the converse is not generally true. However, any measurable and adapted
process with right-continuous sample paths is progressively measurable.
Frequently, we need to work with a stopping time for applications. Con-
sider (Ω, F, P ) with a filtration {Ft }. A stopping time τ is a nonnegative
random variable satisfying {τ ≤ t} ∈ Ft for all t ≥ 0.
A stochastic process {X(t) : t ≥ 0} (real- or vector-valued) is a martin-
gale on (Ω, F, P ) with respect to {Ft } if:

(a) For each t ≥ 0, X(t) is Ft -measurable,

(b) E|X(t)| < ∞, and

(c) E[X(t)|Fs] = X(s) a.s. for all t ≥ s.

If Ft is the natural filtration σ{X(s) : s ≤ t}, we often say that X(·)


is a martingale without specifying the filtration Ft . The process X(·) is
a local martingale if there exists a sequence of stopping times {τn } such
that 0 ≤ τ1 ≤ τ2 ≤ · · · ≤ τn ≤ τn+1 ≤ · · ·, τn → ∞ a.s. as n → ∞, and
X (n) (t) := X(t ∧ τn ) is a martingale.

A.4.2 Gaussian Processes and Diffusion Processes


A Gaussian random vector X = (X1, X2, . . . , Xr) is one whose characteristic function has the form
φ(y) = exp( iy′µ − (1/2) y′Σy ),

where µ ∈ Rr is a constant vector, y′µ is the usual inner product, i denotes the pure imaginary number satisfying i^2 = −1, and Σ is a symmetric
nonnegative definite r × r matrix. In the above, µ and Σ are the mean
vector and covariance matrix of X, respectively.
Consider a stochastic process X(t), t ≥ 0. It is a Gaussian process if for
any k = 1, 2, . . . and 0 ≤ t1 < t2 < · · · < tk , (X(t1 ), X(t2 ), . . . , X(tk )) is a
Gaussian vector. A random process X(·) has independent increments if for


any k = 1, 2, . . . and 0 ≤ t1 < t2 < · · · < tk ,

(X(t1 ) − X(0)), (X(t2 ) − X(t1 )), . . . , (X(tk ) − X(tk−1 ))

are independent. A sufficient condition for a process to be Gaussian is given


next, whose proof can be found in Skorohod [150, p. 7].
Lemma A.16. Suppose that the process X(·) has independent increments
and continuous sample paths almost surely. Then X(·) is a Gaussian pro-
cess.
Next, we consider the notion of Brownian motions. An Rr -valued random
process w(t) for t ≥ 0 is a Brownian motion, if
(a) w(0) = 0 almost surely;
(b) w(·) is a process with independent increments;
(c) w(·) has continuous sample paths almost surely;
(d) for all t, s ≥ 0, the increment w(t) − w(s) has a Gaussian distribution with E(w(t) − w(s)) = 0 and Cov(w(t) − w(s)) = Σ|t − s| for some nonnegative definite r × r matrix Σ, where Cov(w(t) − w(s)) denotes the covariance matrix of the increment.

A Brownian motion w(·) with Σ = I is termed a standard Brownian mo-


tion. In view of Lemma A.16, a Brownian motion is necessarily a Gaussian
process. For an Rr -valued Brownian motion w(t), let Ft = σ{w(s) : s ≤ t}.
Let h(·) be an Ft-measurable process taking values in Rr×r such that ∫_0^t E|h(s)|^2 ds < ∞ for all t ≥ 0. Using w(·) and h(·), we may define a stochastic integral ∫_0^t h(s)dw(s) such that it is a martingale with mean 0 and
E|∫_0^t h(s)dw(s)|^2 = ∫_0^t E[tr(h(s)h′(s))] ds.
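The isometry above can be checked by Monte Carlo in a simple scalar case; the integrand h(s) = s and the sample sizes below are arbitrary choices, and the stochastic integral is approximated by an Euler sum:

```python
import numpy as np

# Monte Carlo check of the displayed isometry for the deterministic
# integrand h(s) = s on [0, 1].
rng = np.random.default_rng(3)
n_paths, n_steps, t_end = 50000, 100, 1.0
dt = t_end / n_steps
s = np.arange(n_steps) * dt                 # left endpoints of subintervals
dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
stoch_int = (s * dw).sum(axis=1)            # Euler sum for int_0^1 s dw(s)

lhs = np.mean(stoch_int ** 2)               # estimates E |int h dw|^2
rhs = (s ** 2).sum() * dt                   # Riemann sum for int_0^1 s^2 ds
assert abs(lhs - rhs) < 0.02
```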
Suppose that b(·) and σ(·) are nonrandom Borel measurable functions.
A process X(·) defined as
X(t) = X(0) + ∫_0^t b(s, X(s)) ds + ∫_0^t σ(s, X(s)) dw(s)  (A.22)

is called a diffusion. Then X(·) defined in (A.22) is a Markov process in


the sense that the Markov property

P(X(t) ∈ A|Fs ) = P(X(t) ∈ A|X(s))

holds for all 0 ≤ s ≤ t and for any Borel set A. A slightly more general
definition allows b(·) and σ(·) to be Ft -measurable processes. Nevertheless,
the current definition is sufficient for our purpose.
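A solution of (A.22) can be approximated by the standard Euler–Maruyama scheme. The following sketch simulates one path; the drift and diffusion coefficients (an Ornstein–Uhlenbeck-type drift and a constant volatility) are made-up examples:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = b(t, X)dt + sigma(t, X)dw, as in (A.22)."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    t = 0.0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))   # Brownian increment over [t, t+dt)
        x[k + 1] = x[k] + b(t, x[k]) * dt + sigma(t, x[k]) * dw
        t += dt
    return x

# Example: mean-reverting drift b(t, x) = -x, constant diffusion sigma = 0.5.
rng = np.random.default_rng(1)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.5, 1.0, 5.0, 1000, rng)
assert path.shape == (1001,) and np.isfinite(path).all()
```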
Associated with the diffusion process, there is an operator L, known as


the generator of the diffusion X(·). Let C 1,2 be the class of real-valued func-
tions on (a subset of) Rr × [0, ∞) whose first-order partial derivative with
respect to t and the second-order mixed partial derivatives with respect to
x are continuous. Define an operator L on C1,2 by
Lf(t, x) = ∂f(t, x)/∂t + Σ_{i=1}^r bi(t, x) ∂f(t, x)/∂xi + (1/2) Σ_{i,j=1}^r aij(t, x) ∂^2 f(t, x)/(∂xi ∂xj),  (A.23)
where A(t, x) = (aij(t, x)) = σ(t, x)σ′(t, x). The above may also be written in a more compact form as
Lf(t, x) = ∂f(t, x)/∂t + b′(t, x)∇f(t, x) + (1/2) tr(∇^2 f(t, x)A(t, x)),
where ∇f and ∇2f denote the gradient and Hessian of f, respectively. Note that in this book, we use the notations b′f and ⟨b, f⟩ interchangeably to represent an inner product.
The well-known Itô lemma (see Gihman and Skorohod [53], Ikeda and Watanabe [72], and Liptser and Shiryayev [110]) states that
df(t, X(t)) = Lf(t, X(t)) dt + ∇f′(t, X(t))σ(t, X(t)) dw(t),
or in its integral form,
f(t, X(t)) − f(0, X(0)) = ∫_0^t Lf(s, X(s)) ds + ∫_0^t ∇f′(s, X(s))σ(s, X(s)) dw(s).

By virtue of Itô's lemma,
Mf(t) = f(t, X(t)) − f(0, X(0)) − ∫_0^t Lf(s, X(s)) ds

is a square integrable Ft -martingale. Conversely, suppose that X(·) is right


continuous. Using the notation of martingale problems given by Stroock
and Varadhan [153], X(·) is said to be a solution of the martingale problem
with operator L if Mf (·) is a martingale for each f (·, ·) ∈ C01,2 (the class of
C 1,2 functions with compact support).

A.5 Weak Convergence


The notion of weak convergence is a generalization of convergence in dis-
tribution in elementary probability theory. In what follows, we present def-
initions and results, including tightness, tightness criteria, the martingale
problem, Skorohod representation, Prohorov’s theorem, and so on.
Definition A.17 (Weak Convergence). Let P and Pk, k = 1, 2, . . ., be probability measures defined on a metric space S. The sequence {Pk} converges weakly to P if
∫ f dPk → ∫ f dP
for every bounded and continuous function f (·) on S. Suppose that {Xk }
and X are random variables associated with Pk and P, respectively. The
sequence Xk converges to X weakly if for any bounded and continuous
function f (·) on S, Ef (Xk ) → Ef (X) as k → ∞.

Let D([0, ∞); Rr) be the space of Rr-valued functions defined on [0, ∞) that are right continuous and have left-hand limits; let L be a set of strictly increasing Lipschitz continuous functions ζ(·) : [0, ∞) → [0, ∞) such that the mapping is surjective with ζ(0) = 0, limt→∞ ζ(t) = ∞, and
γ(ζ) := sup_{0≤t<s} | log( (ζ(s) − ζ(t))/(s − t) ) | < ∞.

Similar to D([0, ∞); Rr ), we also use the notation D([0, T ]; F) to denote


the D-space of functions that take values in a metric space F.

Definition A.18 (Skorohod Topology). For ξ, η ∈ D([0, ∞); Rr), the Skorohod topology d(·, ·) on D([0, ∞); Rr) is defined as
d(ξ, η) = inf_{ζ∈L} { γ(ζ) ∨ ∫_0^∞ e^{−s} sup_{t≥0} ( 1 ∧ |ξ(t ∧ s) − η(ζ(t) ∧ s)| ) ds }.

Analogous definitions and results are available for D([0, T ]; F), where F is
a metric space; see Ethier and Kurtz [43] and Billingsley [16] for related
references. Although we frequently work with D([0, T ]; Rr ) in this book, the
following results are often stated with respect to the space D([0, ∞); Rr ).
This enables us to apply them to t ∈ [0, T ] for any T > 0.

Definition A.19 (Tightness). A family of probability measures P defined on a metric space S is tight if for each δ > 0, there exists a compact set Kδ ⊂ S such that
inf_{P∈P} P(Kδ) ≥ 1 − δ.

The notion of tightness is closely related to compactness. The following the-


orem, known as Prohorov’s theorem, gives such an implication. A complete
proof can be found in Ethier and Kurtz [43].

Theorem A.20 (Prohorov’s Theorem). If P is tight, then P is relatively


compact. That is, every sequence of elements in P contains a weakly conver-
gent subsequence. If the underlying metric space is complete and separable,
tightness is equivalent to relative compactness.
Although weak convergence techniques usually allow one to use weaker conditions and lead to a more general setup, it is often more convenient to work with probability one convergence for purely analytic reasons. The Skorohod representation provides us with such opportunities.

Theorem A.21 (The Skorohod representation (Ethier and Kurtz [43])). Let Xk and X be random elements belonging to D([0, ∞); Rr) such that Xk converges weakly to X. Then there exists a probability space (Ω̃, F̃, P̃) on which are defined random elements X̃k, k = 1, 2, . . ., and X̃ in D([0, ∞); Rr) such that for any Borel set B and all k < ∞,
P̃(X̃k ∈ B) = P(Xk ∈ B)  and  P̃(X̃ ∈ B) = P(X ∈ B),
satisfying
lim_{k→∞} X̃k = X̃  a.s.

Elsewhere in the book, when we use the Skorohod representation, with a


slight abuse of notation, we often omit the tilde notation for convenience
and notational simplicity.
Let C([0, ∞); Rr ) be the space of Rr -valued continuous functions equipped
with the sup-norm topology, and C0 be the set of real-valued continuous
functions on Rr with compact support. Let C0l be the subset of C0 functions
that have continuous partial derivatives up to the order l.

Definition A.22. Let S be a metric space and A be a linear operator on


B(S) (the set of all Borel measurable functions defined on S). Let X(·) =
{X(t) : t ≥ 0} be a right-continuous process with values in S such that for
each f (·) in the domain of A,
f(X(t)) − ∫_0^t Af(X(s)) ds

is a martingale with respect to the filtration σ{X(s) : s ≤ t}. Then X(·) is


called a solution of the martingale problem with operator A.

Theorem A.23 (Ethier and Kurtz [43, p. 174]). A right-continuous pro-


cess X(t), t ≥ 0, is a solution of the martingale problem for the operator
A if and only if
" i  Z ti+1 #
Y
E hj (X(tj )) f (X(ti+1 )) − f (X(ti ))− Af (X(s))ds = 0
j=1 ti

whenever 0 ≤ t1 < t2 < · · · < ti+1 , f (·) in the domain of A, and


h1 , . . . , hi ∈ B(S), the Borel field of S.
Theorem A.24 (Uniqueness of Martingale Problems, Ethier and Kurtz


[43, p. 184]). Let X(·) and Y (·) be two stochastic processes whose paths are
in D([0, T ]; Rr ). Denote an infinitesimal generator by A. If for any function
f ∈ D(A) (the domain of A),
f(X(t)) − f(X(0)) − ∫_0^t Af(X(s)) ds,  t ≥ 0,
and
f(Y(t)) − f(Y(0)) − ∫_0^t Af(Y(s)) ds,  t ≥ 0,

are martingales, and X(t) and Y(t) have the same distribution for each t ≥ 0, then X(·) and Y(·) have the same distribution on D([0, ∞); Rr).

Theorem A.25. Let X ε (·) be a solution of the differential equation

dX ε(t)/dt = F ε(t),
and for each T < ∞, {F ε (t) : 0 ≤ t ≤ T } be uniformly integrable. If the
set of initial values {X ε (0)} is tight, then {X ε (·)} is tight in C([0, ∞); Rr ).

Proof: The proof is essentially in Billingsley [16, Theorem 8.2] (see also
Kushner [102, p. 51, Lemma 7]). 2
Define the notion of "p-lim" and an operator Aε as in Ethier and Kurtz [43]. Suppose that the X ε(·) are defined on the same probability space. Let Ftε be the minimal σ-algebra over which {X ε(s), ξ ε(s) : s ≤ t} is measurable, and let Eεt denote the conditional expectation given Ftε. Denote
Mε = { f : f is real-valued with bounded support, is progressively measurable w.r.t. {Ftε}, and sup_t E|f(t)| < ∞ }.
Let g(·), f(·), f δ(·) ∈ Mε. For each δ > 0 and t ≤ T < ∞, f = p-lim_δ f δ if
sup_{t,δ} E|f δ(t)| < ∞  and  lim_{δ→0} E|f(t) − f δ(t)| = 0 for each t.

The function f(·) is said to be in the domain of Aε; that is, f(·) ∈ D(Aε) and Aεf = g, if
p-lim_{δ→0} [ (Eεt f(t + δ) − f(t))/δ − g(t) ] = 0.
If f(·) ∈ D(Aε), then Ethier and Kurtz [43] or Kushner [102, p. 39] implies that
f(t) − ∫_0^t Aεf(u) du
is a martingale, and
Eεt f(t + s) − f(t) = Eεt ∫_t^{t+s} Aεf(u) du  a.s.

In applications, φ-mixing processes frequently arise; see [43] and [102].


The assertion below presents a couple of inequalities for mixing processes.
Further results on various mixing processes are in [43].
Lemma A.26 (Kushner [102, Lemma 4.4]). Let ξ(·) be a φ-mixing process with mixing rate φ(·) and let h(·) be Ft∞-measurable with |h| ≤ 1. Then
|E(h(ξ(t + s)) | F0t) − Eh(ξ(t + s))| ≤ 2φ(s).
If t < u < v and Eh(ξ(s)) = 0 for all s, then
|E(h(ξ(u))h(ξ(v)) | F0t) − Eh(ξ(u))h(ξ(v))| ≤ 4 ( φ(v − u)φ(u − t) )^{1/2},
where Fτt = σ{ξ(s) : τ ≤ s ≤ t}.

Example A.27. A useful example of a mixing process is a function of a


stationary Markov chain with finite state space. Let αk be such a Markov
chain with state space M = {1, . . . , m0 }. Let ξk = g(αk ), where g(·) is a
real-valued function defined on M. Suppose the Markov chain or equiva-
lently, its transition probability matrix, is irreducible and aperiodic. Then
as proved in Billingsley [16, pp. 167–169], ξk is a mixing process with the
mixing measure decaying to 0 exponentially fast.
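The exponential decay in Example A.27 can be observed numerically: for an irreducible aperiodic transition matrix, the n-step probabilities approach the stationary distribution geometrically at the rate given by the second-largest eigenvalue modulus. A sketch with a made-up two-state chain whose second eigenvalue is 0.7:

```python
import numpy as np

# Made-up irreducible, aperiodic two-state chain; eigenvalues are 1 and 0.7.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
nu = np.array([2.0 / 3.0, 1.0 / 3.0])      # stationary distribution: nu P = nu

errs = []
Pn = np.eye(2)
for n in range(1, 21):
    Pn = Pn @ P
    errs.append(np.abs(Pn - nu).max())     # sup_ij |p_ij(n) - nu_j|

# The error contracts geometrically at the rate of the second eigenvalue.
ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
assert all(abs(r - 0.7) < 1e-6 for r in ratios[5:])
```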
A crucial step in obtaining many limit problems depends on the verifica-
tion of tightness of the sequences of interest. A sufficient condition known
as Kurtz’s criterion appears to be rather handy to use.
Lemma A.28 (Kushner [102, Theorem 3, p. 47]). Suppose that {Y ε(·)} is a process with paths in D([0, ∞); Rr), and suppose that
lim_{K1→∞} lim sup_{ε→0} P( sup_{0≤t≤T} |Y ε(t)| ≥ K1 ) = 0 for each T < ∞,  (A.24)
and for all 0 ≤ s ≤ δ, t ≤ T,
Eεt min{1, |Y ε(t + s) − Y ε(t)|^2} ≤ Eεt γε(δ),  lim_{δ→0} lim sup_{ε→0} Eγε(δ) = 0.  (A.25)
Then {Y ε(·)} is tight in D([0, ∞); Rr).
Remark A.29. In lieu of (A.24), one may verify the following condition
(see Kurtz [95, Theorem 2.7, p. 10]). Suppose that for each η > 0 and
rational t ≥ 0 there is a compact set Γt,η ⊂ Rr such that
inf_ε P(Y ε(t) ∈ Γt,η) > 1 − η.  (A.26)

A.6 Hybrid Jump Diffusion


Let us recall the notion of a switching jump diffusion or hybrid jump dif-
fusion. A hybrid jump diffusion is a jump diffusion modulated by an addi-
tional continuous-time switching process. In what follows, we confine our-
selves to the case where the switching process is a continuous-time Markov
chain. In this case, in lieu of one jump diffusion, we have a system of
jump diffusions. The description of the jump-diffusion system below is a
modification of that of [103] due to the appearance of the switching pro-
cess. Suppose that α(·) is a continuous-time Markov chain with state space
M = {1, . . . , m} and generator Q(t) = (qij (t)) [176, Sections 2.3–2.5]. Let
{τn } be an increasing sequence of stopping times independent of α(t). Let
{ψn } be a sequence of random variables representing the “impulses,” and
ψ(·) be a random process defined by

ψ(t) = ψn if t = τn, and ψ(t) = 0 otherwise.

The process is termed a point process if {ψn , τn } is a random sequence and


{τn } has no finite accumulation point. In the above, the ψn are referred to
as impulses.
Let Γ, a compact set not including the origin in some Euclidean space,
be the range space of ψ(·). Denote the σ-algebra of Borel sets of Γ by B(Γ).
Suppose that the impulse time τn → ∞ as n → ∞. For each H ∈ B(Γ),
and each ι ∈ M, define

N (t, H) = {# of impulses of ψ(·) on [0, t] with values in H},

which is a counting process or counting measure. Suppose that EN (t, Γ) <


∞, that Ft is a filtration such that N (·, H) is Ft -adapted for each H ∈ B(Γ),
and that ψ(·) is an Ft -Poisson point process (i.e., {N (t + ·, H) − N (t, H) :
H ∈ B(Γ)} is independent of Ft ) and N (·, ·) is an Ft -Poisson measure. If
ψ(·) is a Poisson process and the distribution of {N (t+s, H)−N (t, H) : H ∈
B(Γ)} is independent of t, then ψ(·) is a stationary Poisson point process.
Then it is known that there exists a λ > 0 and probability measure π(·) on
B(Γ) such that

E[N(t + s, H) − N(t, H) | Ft] = sπ(H)λ,


where λ is known as the impulse rate of ψ(·) and/or the jump rate of
N (·, Γ), and π(H) is the jump distribution in the sense that
P(ψ(t) ∈ H | ψ(t) ≠ 0, ψ(u), u < t) = π(H).
The values and times of the impulses can be recovered from the integral
G(t) = ∫_0^t ∫_Γ γ N(ds, dγ) = Σ_{s≤t} ψ(s).

With the setup above, the jump-diffusion process modulated by α(t) is given by
X(t) = x0 + ∫_0^t f(s, α(s), X(s)) ds + ∫_0^t σ(s, α(s), X(s)) dw(s) + ∫_0^t ∫_Γ g(γ, α(s−), X(s−)) N(ds, dγ),  (A.27)

where w(·) is a real-valued standard Brownian motion, and N (·, ·) is a


Poisson measure.
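A crude Euler-type sketch of (A.27) can be written down directly. All coefficients below (the two-state generator for α(·), the state-proportional drift and volatility, the jump rate, and the uniformly distributed impulses) are illustrative assumptions, not taken from the text:

```python
import numpy as np

def simulate_hybrid_jump_diffusion(T, n_steps, rng):
    """Crude Euler sketch of (A.27) with made-up coefficients."""
    dt = T / n_steps
    Q = np.array([[-1.0, 1.0], [3.0, -3.0]])   # generator of the chain alpha(t)
    drift = [0.5, -1.0]                        # f(s, i, x) = drift[i] * x
    vol = [0.2, 0.6]                           # sigma(s, i, x) = vol[i] * x
    lam = 2.0                                  # jump rate of the Poisson measure
    x, alpha = 1.0, 0
    for _ in range(n_steps):
        if rng.random() < -Q[alpha, alpha] * dt:   # chain switch on [t, t+dt)
            alpha = 1 - alpha
        dw = rng.normal(0.0, np.sqrt(dt))
        x += drift[alpha] * x * dt + vol[alpha] * x * dw
        if rng.random() < lam * dt:                # an impulse gamma arrives
            gamma = rng.uniform(-0.1, 0.1)
            x += gamma * (1.0 + abs(x))            # made-up g(gamma, alpha, x)
    return x

rng = np.random.default_rng(2)
xT = simulate_hybrid_jump_diffusion(1.0, 1000, rng)
assert np.isfinite(xT)
```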

A.7 Miscellany
Suppose that A is an r × r square matrix. Denote the collection of eigenval-
ues of A by Λ. Then the spectral radius of A, denoted by ρ(A), is defined
by ρ(A) = maxλ∈Λ |λ|. Recall that a matrix with real entries is a positive
matrix if it has at least one positive entry and no negative entries. If every
entry of A is positive, we call the matrix strictly positive. Likewise, for a
vector x = (x1 , . . . , xr ), by x ≥ 0, we mean xi ≥ 0 for i = 1, . . . , r; by
x > 0, we mean all entries xi > 0.
By a multi-index ζ, we mean a vector ζ = (ζ1 , . . . , ζr ) with nonnegative
integer components with |ζ| defined as |ζ| = ζ1 + · · · + ζr . For a multi-index
ζ, Dx^ζ is defined to be
Dx^ζ = ∂^ζ/∂x^ζ = ∂^{|ζ|}/(∂x1^{ζ1} · · · ∂xr^{ζr});  (A.28)
see Friedman [47] or Gihman and Skorohod [54].

The following inequalities are widely used. The first of them is known as the Gronwall inequality, and the second is the so-called generalized Gronwall inequality. Both of them can be found in [61, p. 36].
Lemma A.30. If γ ∈ R, β(t) ≥ 0, and ϕ(t) are continuous real-valued functions for a ≤ t ≤ b which satisfy
ϕ(t) ≤ γ + ∫_a^t β(s)ϕ(s) ds,  t ∈ [a, b],
then
ϕ(t) ≤ γ exp( ∫_a^t β(s) ds ),  t ∈ [a, b].
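A numerical sanity check of Lemma A.30: if ϕ solves ϕ′ = (1/2)βϕ with ϕ(a) = γ, then ϕ(t) = γ + ∫_a^t (1/2)βϕ ds ≤ γ + ∫_a^t βϕ ds, so the hypothesis holds and the Gronwall bound γ exp(∫_a^t β ds) must dominate ϕ. The particular β and γ below are arbitrary choices:

```python
import numpy as np

a, gamma = 0.0, 1.5
t = np.linspace(a, 2.0, 2001)
beta = 0.8 + 0.2 * np.sin(t)                       # beta(t) >= 0
# cumulative trapezoid integral B(t) = int_a^t beta(s) ds
B = np.concatenate([[0.0],
                    np.cumsum(0.5 * (beta[1:] + beta[:-1]) * np.diff(t))])

phi = gamma * np.exp(0.5 * B)                      # satisfies the hypothesis
bound = gamma * np.exp(B)                          # Gronwall bound
assert np.all(phi <= bound + 1e-12)
```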

Lemma A.31. Suppose that ϕ(·) and γ(·) are real-valued continuous functions on [a, b], that β(t) ≥ 0 is integrable on [a, b], and that
ϕ(t) ≤ γ(t) + ∫_a^t β(s)ϕ(s) ds,  t ∈ [a, b].
Then
ϕ(t) ≤ γ(t) + ∫_a^t β(s)γ(s) exp( ∫_s^t β(u) du ) ds,  t ∈ [a, b].

When we treat stochastic integrals, the Burkholder–Davis–Gundy in-


equality is used quite often. We state it below, a proof of which may be
found in [120, p. 70].

Lemma A.32. Let ψ : [0, ∞) → Rr×d for some positive integers r and d, and suppose that E∫_0^t |ψ(s)|^2 ds < ∞ for all t ≥ 0. Define
Z(t) = ∫_0^t ψ(s) dw(s)  and  a(t) = ∫_0^t |ψ(s)|^2 ds,
where w(t) is a d-dimensional standard Brownian motion. Then for any p > 0, there exist positive constants cp and Cp such that
cp E|a(t)|^{p/2} ≤ E sup_{0≤s≤t} |Z(s)|^p ≤ Cp E|a(t)|^{p/2},  for all t ≥ 0.

A.8 Notes
One may find a nonmeasure-theoretic introduction to stochastic processes
in Ross [141]. The two volumes by Karlin and Taylor [80, 81] provide
an introduction to discrete-time and continuous-time Markov chains. Ad-
vanced treatments of Markov chains can be found in Chung [28] and Re-
vuz [142]. A book that deals exclusively with finite-state Markov chains is
by Iosifescu [76]. The book of Meyn and Tweedie [125] examines Markov
chains and their stability. Doob’s book [33] gives an introduction to stochas-
tic processes. Gihman and Skorohod’s three-volume work [55] provides a
comprehensive introduction to stochastic processes, whereas Liptser and
Shiryayev’s book [110] presents further topics such as nonlinear filtering.
References

[1] M.S. Agranovich, Elliptic operators on closed manifolds, in Current Prob-


lems in Mathematics: Fundamental Directions, 63 (1994), 1–130.

[2] B.D.O. Anderson and J.B. Moore, Optimal Control: Linear Quadratic
Methods, Prentice Hall, Englewood Cliffs, NJ, 1990.

[3] A. Arapostathis, M.K. Ghosh, and S.I. Marcus, Harnack’s inequality for
cooperative weakly coupled elliptic systems, Comm. Partial Differential
Eqs., 24 (1999), 1555–1571.

[4] G. Badowski and G. Yin, Stability of hybrid dynamic systems containing


singularly perturbed random processes, IEEE Trans. Automat. Control, 47
(2002), 2021–2031.

[5] G. Barone-Adesi and R. Whaley, Efficient analytic approximation of Amer-


ican option values, J. Finance, 42 (1987), 301–320.

[6] G.K. Basak, A. Bisi, and M.K. Ghosh, Stability of a random diffusion with
linear drift, J. Math. Anal. Appl., 202 (1996), 604–622.

[7] G.K. Basak, A. Bisi, and M.K. Ghosh, Stability of degenerate diffusions
with state-dependent switching, J. Math. Anal. Appl., 240 (1999), 219–
248.

[8] M.M. Benderskii and L.A. Pastur, The spectrum of the one-dimensional
Schrödinger equation with random potential, Mat. Sb., 82 (1972), 273–284.

[9] M.M. Benderskii and L.A. Pastur, Asymptotic behavior of the solutions of
a second order equation with random coefficients, Teorija Funkeii i Func-
tional’nyi Analiz, 22 (1973), 3–14.


[10] A. Bensoussan, Perturbation Methods in Optimal Control, J. Wiley, Chich-


ester, 1988.

[11] A. Bensoussan and P.L. Lions, Optimal control of random evolutions,


Stochastics, 5 (1981), 169–190.

[12] A. Bensoussan and J.L. Menaldi, Hybrid control and dynamic program-
ming, Dynamics Continuous Disc. Impulsive Sys., 3 (1997), 395–442.

[13] B. Bercu, F. Dufour, and G. Yin, Almost sure stabilization for feedback
controls of regime-switching linear systems with a hidden Markov chain,
IEEE Trans. Automat. Control, 54 (2009).

[14] A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and


Stochastic Approximations, Springer-Verlag, Berlin, 1990.

[15] R.N. Bhattacharya, Criteria for recurrence and existence of invariant mea-
sures for multidimensional diffusions, Ann. Probab., 6 (1978), 541–553.

[16] P. Billingsley, Convergence of Probability Measures, J. Wiley, New York,


NY, 1968.

[17] T. Björk, Finite dimensional optimal filters for a class of Ito processes with
jumping parameters, Stochastics, 4 (1980), 167–183.

[18] G.B. Blankenship and G.C. Papanicolaou, Stability and control of stochas-
tic systems with wide band noise, SIAM J. Appl. Math., 34 (1978), 437–
476.

[19] L. Breiman, Probability, SIAM, Philadelphia, PA, 1992.

[20] J. Buffington and R.J. Elliott, American options with regime switching,
Int. J. Theoret. Appl. Finance, 5 (2002), 497–514.

[21] P.E. Caines and H.-F. Chen, Optimal adaptive LQG control for systems
with finite state process parameters, IEEE Trans. Automat. Control, 30
(1985), 185–189.

[22] P.E. Caines and J.-F. Zhang, On the adaptive control of jump parameter
systems via nonlinear filtering, SIAM J. Control Optim., 33 (1995), 1758–
1777.

[23] M.-F. Chen, From Markov Chains to Non-equilibrium Particle Systems,


2nd ed., World Scientific, Singapore, 2004.

[24] Z.Q. Chen and Z. Zhao, Potential theory for elliptic systems, Ann. Probab.,
24 (1996), 293–319.

[25] Z.Q. Chen and Z. Zhao, Harnack inequality for weakly coupled elliptic
systems, J. Differential Eqs., 139 (1997), 261–282.

[26] C.L. Chiang, An Introduction to Stochastic Processes and Their Applica-


tions, Kreiger, Huntington, NY, 1980.
[27] Y.S. Chow and H. Teicher, Probability Theory, 3rd ed., Springer-Verlag,
New York, NY, 1997.

[28] K.L. Chung, Markov Chains with Stationary Transition Probabilities, 2nd
ed., Springer-Verlag, New York, NY, 1967.

[29] D.R. Cox and H.D. Miller, The Theory of Stochastic Processes, J. Wiley,
New York, NY, 1965.

[30] M.H.A. Davis, Markov Models and Optimization, Chapman & Hall, Lon-
don, UK, 1993.

[31] D.A. Dawson, Critical dynamics and fluctuations for a mean–field model
of cooperative behavior, J. Statist. Phys., 31 (1983), 29–85.

[32] G.B. Di Masi, Y.M. Kabanov, and W.J. Runggaldier, Mean variance hedg-
ing of options on stocks with Markov volatility, Theory Probab. Appl., 39
(1994), 173–181.

[33] J.L. Doob, Stochastic Processes, Wiley Classic Library Edition, Wiley, New
York, NY, 1990.

[34] N.H. Du and V.H. Sam, Dynamics of a stochastic Lotka–Volterra model


perturbed by white noise, J. Math. Anal. Appl., 324 (2006), 82–97.

[35] F. Dufresne and H.U. Gerber, Risk theory for the compound Poisson pro-
cess that is perturbed by diffusion, Insurance: Math. Economics, 10 (1991),
51–59.

[36] F. Dufour and P. Bertrand, Stabilizing control law for hybrid models, IEEE
Trans. Automat. Control, 39 (1994), 2354–2357.

[37] N.H. Du, R. Kon, K. Sato, and Y. Takeuchi, Dynamical behavior of Lotka-
Volterra competition systems: Non-autonomous bistable case and the effect
of telegraph noise, J. Comput. Appl. Math., 170 (2004), 399–422.

[38] E.B. Dynkin, Markov Processes, Vols. I and II, Springer-Verlag, Berlin,
1965.

[39] S.D. Eidelman, Parabolic Systems, North-Holland, New York, 1969.

[40] R.J. Elliott, Stochastic Calculus and Applications, Springer-Verlag, New


York, NY, 1982.

[41] A. Eizenberg and M. Freidlin, On the Dirichlet problem for a class of second
order PDE systems with small parameter, Stochastics Stochastics Rep., 33
(1990), 111–148.

[42] A. Eizenberg and M. Freidlin, Averaging principle for perturbed random


evolution equations and corresponding Dirichlet problems, Probab. Theory
Related Fields, 94 (1993), 335–374.

[43] S.N. Ethier and T.G. Kurtz, Markov Processes: Characterization and Con-
vergence, J. Wiley, New York, NY, 1986.

[44] W.H. Fleming and R.W. Rishel, Deterministic and Stochastic Optimal
Control, Springer-Verlag, New York, NY, 1975.

[45] W.H. Fleming and H.M. Soner, Controlled Markov Processes and Viscosity
Solutions, Springer-Verlag, New York, 1992.

[46] A. Friedman, Partial Differential Equations of Parabolic Type, Prentice-


Hall, Englewood, Cliffs, NJ, 1967.

[47] A. Friedman, Stochastic Differential Equations and Applications, Vol. I and


Vol. II, Academic Press, New York, NY, 1975.

[48] M.D. Fragoso and O.L.V. Costa, A unified approach for stochastic and
mean square stability of continuous-time linear systems with Markovian
jumping parameters and additive disturbances, SIAM J. Control Optim.,
44 (2005), 1165–1191.

[49] J.P. Fouque, G. Papanicolaou, and R.K. Sircar, Derivatives in Finan-


cial Markets with Stochastic Volatility, Cambridge University Press, Cam-
bridge, UK, 2000.

[50] J.P. Fouque, G. Papanicolaou, R.K. Sircar, and K. Solna, Singular pertur-
bations in option pricing, SIAM J. Appl. Math., 63 (2003), 1648–1665.

[51] J.P. Fouque, G. Papanicolaou, R.K. Sircar, and K. Solna, Multiscale


stochastic volatility asymptotics, Multiscale Modeling & Simulation, 1
(2004), 22–42.

[52] M.K. Ghosh, A. Arapostathis, and S.I. Marcus, Ergodic control of switching
diffusions, SIAM J. Control Optim., 35 (1997), 1952–1988.

[53] I.I. Gihman and A.V. Skorohod, Introduction to the Theory of Random
Processes, W.B. Saunders, Philadelphia, PA, 1969.

[54] I.I. Gihman and A.V. Skorohod, Stochastic Differential Equations,


Springer-Verlag, Berlin, 1972.

[55] I.I. Gihman and A.V. Skorohod, Theory of Stochastic Processes, I, II, III,
Springer-Verlag, Berlin, 1979.

[56] D. Gilbarg and N.S. Trudinger, Elliptic Partial Differential Equations of


Second Order, Springer, Berlin, 2001.

[57] I.V. Girsanov, Strongly Feller processes I. General properties, Theory


Probab. Appl., 5 (1960), 5–24.

[58] K. Glover, All optimal Hankel norm approximations of linear multivariable


systems and their L∞-error bounds, Int. J. Control, 39 (1984), 1145–1193.

[59] G.H. Golub and C.F. Van Loan, Matrix Computations, 2nd ed., Johns
Hopkins University Press, Baltimore, MD, 1989.

[60] M.G. Garroni and J.L. Menaldi, Green Functions for Parabolic Second Or-
der Integro-Differential Equations, Pitman Research Notes in Math. Series,
No. 275, Longman, London, 1992.

[61] J.K. Hale, Ordinary Differential Equations, 2nd ed., R.E. Krieger, Malabar,
FL, 1980.

[62] J.K. Hale and E.P. Infante, Extended dynamical systems and stability the-
ory, Proc. Nat. Acad. Sci. 58 (1967), 405–409.

[63] P. Hall and C.C. Heyde, Martingale Limit Theory and Its Application,
Academic Press, New York, NY, 1980.

[64] F.B. Hanson, Applied Stochastic Processes and Control for Jump-
diffusions: Modeling, Analysis, and Computation, SIAM, Philadelphia, PA,
2007.

[65] Q. He and G. Yin, Invariant density, Liapunov exponent, and almost sure
stability of Markovian-regime-switching linear systems, preprint, 2009.

[66] U. Helmke and J.B. Moore, Optimization and Dynamical Systems,


Springer-Verlag, New York, NY, 1994.

[67] J.P. Hespanha, Stochastic Hybrid Systems: Application to Communication


Networks, Springer, Berlin, 2004.

[68] J.P. Hespanha, A model for stochastic hybrid systems with application to
communication networks, Nonlinear Anal., 62 (2005), 1353–1383.

[69] J.C. Hull, Options, Futures, and Other Derivatives, 3rd ed., Prentice-Hall,
Upper Saddle River, NJ, 1997.

[70] J.C. Hull and A. White, The pricing of options on assets with stochastic
volatilities, J. Finance, 42 (1987), 281–300.

[71] V. Hutson and J.S. Pym, Applications of Functional Analysis and Operator
Theory, Academic Press, London, UK, 1980.

[72] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion


Processes, North-Holland, Amsterdam, 1981.

[73] A.M. Il’in, R.Z. Khasminskii, and G. Yin, Singularly perturbed switching
diffusions: rapid switchings and fast diffusions, J. Optim. Theory Appl.,
102 (1999), 555–591.

[74] A.M. Il’in, R.Z. Khasminskii, and G. Yin, Asymptotic expansions of solu-
tions of integro-differential equations for transition densities of singularly
perturbed switching diffusions, J. Math. Anal. Appl., 238 (1999), 516–539.

[75] J. Imae, J.E. Perkins, and J.B. Moore, Toward time-varying balanced real-
ization via Riccati equations, Math. Control Signals Syst., 5 (1992), 313–
326.

[76] M. Iosifescu, Finite Markov Processes and Their Applications, Wiley,


Chichester, 1980.

[77] J. Jacod and A.N. Shiryayev, Limit Theorems for Stochastic Processes,
Springer-Verlag, New York, NY, 1987.

[78] Y. Ji and H.J. Chizeck, Controllability, stabilizability, and continuous-time


Markovian jump linear quadratic control, IEEE Trans. Automat. Control,
35 (1990), 777–788.

[79] I.I. Kac and N.N. Krasovskii, On the stability of systems with random
parameters, J. Appl. Math. Mech., 24 (1960), 1225–1246.

[80] S. Karlin and H.M. Taylor, A First Course in Stochastic Processes, 2nd
ed., Academic Press, New York, NY, 1975.

[81] S. Karlin and H.M. Taylor, A Second Course in Stochastic Processes, Aca-
demic Press, New York, NY, 1981.

[82] R.Z. Khasminskii, On an averaging principle for Ito stochastic differential


equations, Kybernetika, 4 (1968), 260–279.

[83] R.Z. Khasminskii, Stochastic Stability of Differential Equations, Sijthoff


and Noordhoff, Alphen aan den Rijn, Netherlands, 1980.

[84] R.Z. Khasminskii and F.C. Klebaner, Long term behavior of solutions of
the Lotka-Volterra systems under small random perturbations, Ann. Appl.
Probab., 11 (2001), 952–963.

[85] R.Z. Khasminskii and G. Yin, Asymptotic series for singularly perturbed
Kolmogorov-Fokker-Planck equations, SIAM J. Appl. Math., 56 (1996),
1766–1793.

[86] R.Z. Khasminskii and G. Yin, On transition densities of singularly per-


turbed diffusions with fast and slow components, SIAM J. Appl. Math.,
56 (1996), 1794–1819.

[87] R.Z. Khasminskii and G. Yin, Asymptotic behavior of parabolic equations


arising from one-dimensional null-recurrent diffusions, J. Differential Eqs.,
161 (2000), 154–173.

[88] R.Z. Khasminskii and G. Yin, On averaging principles: An asymptotic


expansion approach, SIAM J. Math. Anal., 35 (2004), 1534–1560.

[89] R.Z. Khasminskii and G. Yin, Limit behavior of two-time-scale diffusions


revisited, J. Differential Eqs., 212 (2005), 85–113.

[90] R.Z. Khasminskii and G. Yin, Uniform asymptotic expansions for pricing
European options, Appl. Math. Optim., 52 (2005), 279–296.

[91] R.Z. Khasminskii, G. Yin, and Q. Zhang, Asymptotic expansions of singu-


larly perturbed systems involving rapidly fluctuating Markov chains, SIAM
J. Appl. Math., 56 (1996), 277–293.

[92] R.Z. Khasminskii, C. Zhu, and G. Yin, Stability of regime-switching diffu-


sions, Stochastic Process. Appl., 117 (2007), 1037–1051.

[93] R.Z. Khasminskii, C. Zhu, and G. Yin, Asymptotic properties of parabolic


systems for null-recurrent switching diffusions, Acta Math. Appl. Sinica,
43 (2007), 177–194.

[94] P.E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential


Equations, Springer-Verlag, New York, NY, 1992.

[95] T.G. Kurtz, Approximation of Population Processes, SIAM, Philadelphia,


PA, 1981.

[96] N.V. Krylov, Controlled Diffusion Processes, Springer-Verlag, New York,


NY, 1980.

[97] H.J. Kushner, Stochastic Stability and Control, Academic Press, New York,
NY, 1967.

[98] H.J. Kushner, The concept of invariant set for stochastic dynamical sys-
tems and applications to stochastic stability, in Stochastic Optimization
and Control, H.F. Karreman, Ed., J. Wiley, New York, NY, 1968, 47–57.

[99] H.J. Kushner, On the stability of stochastic differential-difference equa-


tions, J. Differential Eqs., 4 (1968), 424–443.

[100] H.J. Kushner, On the convergence of Lion’s identification method with


random inputs, IEEE Trans. Automat. Control, 15 (1970), 652–654.

[101] H.J. Kushner, Asymptotic distributions of solutions of ordinary differential


equations with wide band noise inputs; approximate invariant measures,
Stochastics, 6 (1982), 259–278.

[102] H.J. Kushner, Approximation and Weak Convergence Methods for Ran-
dom Processes, with Applications to Stochastic Systems Theory, MIT Press,
Cambridge, MA, 1984.

[103] H.J. Kushner, Weak Convergence Methods and Singularly Perturbed


Stochastic Control and Filtering Problems, Birkhäuser, Boston, MA, 1990.

[104] H.J. Kushner and G. Yin, Stochastic Approximation and Recursive Algo-
rithms and Applications, 2nd ed., Springer-Verlag, New York, NY, 2003.

[105] O.A. Ladyzenskaja, V.A. Solonnikov, and N.N. Ural’ceva, Linear and
Quasi-linear Equations of Parabolic Type, Translations of Math. Mono-
graphs, Vol. 23, Amer. Math. Soc., Providence, RI, 1968.

[106] J.P. LaSalle, The extent of asymptotic stability, Proc. Nat. Acad. Sci., 46
(1960), 365.

[107] J.P. LaSalle and S. Lefschetz, Stability by Liapunov's Direct Method with
Applications, Academic Press, New York, NY, 1961.

[108] G.M. Lieberman, Second-Order Parabolic Differential Equations, World


Scientific, Singapore, 1996.

[109] J.-J. Liou, Recurrence and transience of Gaussian diffusion processes, Ko-
dai Math. J., 13 (1990), 210–230.

[110] R.S. Liptser and A.N. Shiryayev, Statistics of Random Processes I & II,
Springer-Verlag, New York, NY, 2001.

[111] Y.J. Liu, G. Yin, Q. Zhang, and J.B. Moore, Balanced realizations of
regime-switching linear systems, Math. Control, Signals, Sys., 19 (2007),
207–234.

[112] K.A. Loparo and G.L. Blankenship, Almost sure instability of a class of
linear stochastic systems with jump process coefficients, in Lyapunov Ex-
ponents, Lecture Notes in Math., 1186, Springer, Berlin, 1986, 160–190.

[113] Q. Luo and X. Mao, Stochastic population dynamics under regime switch-
ing, J. Math. Anal. Appl., 334 (2007), 69–84.

[114] X. Mao, Stability of Stochastic Differential Equations with Respect to
Semimartingales, Longman Sci. Tech., Harlow, Essex, UK, 1991.

[115] X. Mao, Stochastic Differential Equations and Applications, 2nd ed., Hor-
wood, Chichester, UK, 2007.

[116] X. Mao, Stability of stochastic differential equations with Markovian


switching, Stochastic Process. Appl., 79 (1999), 45–67.

[117] X. Mao, A note on the LaSalle-type theorems for stochastic differential


delay equations, J. Math. Anal. Appl., 268 (2002), 125–142.

[118] X. Mao, S. Sabanis, and E. Renshaw, Asymptotic behavior of the stochastic
Lotka-Volterra model, J. Math. Anal. Appl., 287 (2003), 141–156.

[119] X. Mao, G. Yin, and C. Yuan, Stabilization and destabilization of hybrid


systems of stochastic differential equations, Automatica, 43 (2007), 264–
273.

[120] X. Mao and C. Yuan, Stochastic Differential Equations with Markovian


Switching, Imperial College Press, London, UK, 2006.

[121] X.R. Mao, C. Yuan, and G. Yin, Numerical method for stationary dis-
tribution of stochastic differential equations with Markovian switching, J.
Comput. Appl. Math., 174 (2005), 1–27.

[122] X. Mao, C. Yuan, and G. Yin, Approximations of Euler-Maruyama type
for stochastic differential equations with Markovian switching, under non-
Lipschitz conditions, J. Comput. Appl. Math., 205 (2007), 936–948.

[123] M. Mariton, Jump Linear Systems in Automatic Control, Marcel Dekker,


New York, NY, 1990.

[124] R.C. Merton, Continuous-Time Finance, Blackwell, Cambridge, MA, 1990.



[125] S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability,
Springer-Verlag, London, UK, 1993.

[126] G.N. Milstein, Numerical Integration of Stochastic Differential Equations,


Kluwer, New York, NY, 1995.

[127] G.N. Milstein and M.V. Tretyakov, Stochastic Numerics for Mathematical
Physics, Springer-Verlag, Berlin, 2004.

[128] C.M. Moller, Stochastic differential equations for ruin probability, J. Appl.
Probab., 32 (1995), 74–89.

[129] B.C. Moore, Principal component analysis in linear systems: Controllability,
observability, and model reduction, IEEE Trans. Automat. Control, 26
(1981), 17–31.

[130] B. Øksendal, Stochastic Differential Equations, An Introduction with Ap-


plications, 6th ed., Springer-Verlag, Berlin, 2003.

[131] G.C. Papanicolaou, D. Stroock, and S.R.S. Varadhan, Martingale approach


to some limit theorems, in Proc. 1976 Duke Univ. Conf. on Turbulence,
Durham, NC, 1976.

[132] E. Pardoux and A. Yu. Veretennikov, On Poisson equation and diffusion
approximation. I, Ann. Probab., 29 (2001), 1061–1085.

[133] J.E. Perkins, U. Helmke, and J.B. Moore, Balanced realizations via gradient
flow techniques, Sys. Control Lett., 14 (1990), 369–379.

[134] L. Perko, Differential Equations and Dynamical Systems, 3rd ed., Springer,
New York, NY, 2001.

[135] S. Peszat and J. Zabczyk, Strong Feller property and irreducibility for
diffusions on Hilbert spaces, Ann. Probab., 23 (1995), 157–172.

[136] K. Pichór and R. Rudnicki, Stability of Markov semigroups and applica-


tions to parabolic systems, J. Math. Anal. Appl., 215 (1997), 56–74.

[137] M. Prandini, J. Hu, J. Lygeros, and S. Sastry, A probabilistic approach


to aircraft conflict detection, IEEE Trans. Intelligent Transport. Syst., 1
(2000), 199–220.

[138] M.H. Protter and H.F. Weinberger, Maximum Principles in Differential


Equations, Prentice-Hall, Englewood Cliffs, NJ, 1967.

[139] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math.


Statist. 22 (1951), 400–407.

[140] T. Rolski, H. Schmidli, V. Schmidt, and J. Teugels, Stochastic Processes


for Insurance and Finance, J. Wiley, New York, NY, 1999.

[141] S. Ross, Stochastic Processes, J. Wiley, New York, NY, 1983.

[142] D. Revuz, Markov Chains, 2nd ed., North-Holland, Amsterdam, 1984.



[143] W. Rudin, Real and Complex Analysis, 3rd ed., McGraw-Hill, New York,
NY, 1987.

[144] H. Sandberg and A. Rantzer, Balanced truncation of linear time-varying


systems, IEEE Trans. Automat. Control, 49 (2004), 217–229.

[145] E. Seneta, Non-negative Matrices and Markov Chains, Springer-Verlag,


New York, NY, 1981.

[146] S.P. Sethi and Q. Zhang, Hierarchical Decision Making in Stochastic Man-
ufacturing Systems, Birkhäuser, Boston, 1994.

[147] S.P. Sethi, H. Zhang, and Q. Zhang, Average-cost Control of Stochastic


Manufacturing Systems, Springer, New York, NY, 2005.

[148] S. Shokoohi, L.M. Silverman, and P.M. Van Dooren, Linear time variable
systems: Balancing and model reduction, IEEE Trans. Automat. Control,
28 (1983), 810–822.

[149] H.A. Simon and A. Ando, Aggregation of variables in dynamic systems,


Econometrica, 29 (1961), 111–138.

[150] A.V. Skorohod, Asymptotic Methods in the Theory of Stochastic Differen-


tial Equations, Amer. Math. Soc., Providence, RI, 1989.

[151] M. Slatkin, The dynamics of a population in a Markovian environment,


Ecology, 59 (1978), 249–256.

[152] Q.S. Song and G. Yin, Rates of convergence of numerical methods for
controlled regime-switching diffusions with stopping times in the costs,
SIAM J. Control Optim., 48 (2009), 1831–1857.

[153] D.W. Stroock and S.R.S. Varadhan, Multidimensional Diffusion Processes,


Springer-Verlag, Berlin, 1979.

[154] D.D. Sworder and J.E. Boyd, Estimation Problems in Hybrid Systems,
Cambridge University Press, Cambridge, UK, 1999.

[155] D.D. Sworder and V.G. Robinson, Feedback regulators for jump parameters
systems with state and control dependent transition rates, IEEE Trans.
Automat. Control, AC-18 (1973), 355–360.

[156] Y. Takeuchi, N.H. Du, N.T. Hieu, and K. Sato, Evolution of predator-prey
systems described by a Lotka-Volterra equation under random environment,
J. Math. Anal. Appl., 323 (2006), 938–957.

[157] E.I. Verriest and T. Kailath, On generalized balanced realizations, IEEE


Trans. Automat. Control, 28 (1983), 833–844.

[158] J.T. Wloka, B. Rowley, and B. Lawruk, Boundary Value Problems for El-
liptic Systems, Cambridge University Press, Cambridge, UK, 1995.

[159] W.M. Wonham, Some applications of stochastic differential equations to


optimal nonlinear filtering, SIAM J. Control, 2 (1965), 347–369.

[160] W.M. Wonham, Liapunov criteria for weak stochastic stability, J. Differ-
ential Eqs., 2 (1966), 195–207.

[161] F. Xi, Feller property and exponential ergodicity of diffusion processes with
state-dependent switching, Sci. China Ser. A, 51 (2008), 329–342.

[162] F. Xi and G. Yin, Asymptotic properties of a mean-field model with a


continuous-state-dependent switching process, J. Appl. Probab., 46 (2009),
221–243.

[163] H. Yang and G. Yin, Ruin probability for a model under Markovian switch-
ing regime, in Probability, Finance and Insurance, T.L. Lai, H. Yang, and
S.P. Yung, Eds., World Scientific, 2004, 206–217.

[164] D.D. Yao, Q. Zhang and X.Y. Zhou, A regime-switching model for Euro-
pean options, in Stochastic Processes, Optimization, and Control Theory
Applications in Financial Engineering, Queueing Networks, and Manufac-
turing Systems, H.M. Yan, G. Yin, and Q. Zhang, Eds., Springer, New
York, NY, 2006, 281–300.

[165] G. Yin, On limit results for a class of singularly perturbed switching diffu-
sions, J. Theoret. Probab., 14 (2001), 673–697.

[166] G. Yin, Asymptotic expansions of option price under regime-switching dif-


fusions with a fast-varying switching process, Asymptotic Anal., 63 (2009).

[167] G. Yin and S. Dey, Weak convergence of hybrid filtering problems involving
nearly completely decomposable hidden Markov chains, SIAM J. Control
Optim., 41 (2003), 1820–1842.

[168] G. Yin, V. Krishnamurthy, and C. Ion, Regime switching stochastic ap-


proximation algorithms with application to adaptive discrete stochastic
optimization, SIAM J. Optim., 14 (2004), 1187–1215.

[169] G. Yin and V. Krishnamurthy, Least mean square algorithms with Markov
regime switching limit, IEEE Trans. Automat. Control, 50 (2005), 577–593.

[170] G. Yin, R.H. Liu, and Q. Zhang, Recursive algorithms for stock liquidation:
A stochastic optimization approach, SIAM J. Optim., 13 (2002), 240–263.

[171] G. Yin, X.R. Mao, and K. Yin, Numerical approximation of invariant


measures for hybrid diffusion systems, IEEE Trans. Automat. Control, 50
(2005), 577–593.

[172] G. Yin, X. Mao, C. Yuan, and D. Cao, Approximation methods for hybrid
diffusion systems with state-dependent switching diffusion processes: Nu-
merical algorithms and existence and uniqueness of solutions, preprint,
2007.

[173] G. Yin, Q.S. Song, and Z. Zhang, Numerical solutions for jump-diffusions
with regime switching, Stochastics, 77 (2005), 61–79.

[174] G. Yin, H.M. Yan, and X.C. Lou, On a class of stochastic optimization algo-
rithms with applications to manufacturing models, in Model-Oriented Data
Analysis, W.G. Müller, H.P. Wynn and A.A. Zhigljavsky, Eds., Physica-
Verlag, Heidelberg, 1993, 213–226.

[175] G. Yin and H.L. Yang, Two-time-scale jump-diffusion models with Marko-
vian switching regimes, Stochastics Stochastics Rep., 76 (2004), 77–99.

[176] G. Yin and Q. Zhang, Continuous-time Markov Chains and Applications:


A Singular Perturbations Approach, Springer-Verlag, New York, NY, 1998.

[177] G. Yin and Q. Zhang, Discrete-time Markov Chains: Two-time-scale Meth-


ods and Applications, Springer, New York, NY, 2005.

[178] G. Yin, Q. Zhang, and G. Badowski, Asymptotic properties of a singu-


larly perturbed Markov chain with inclusion of transient states, Ann. Appl.
Probab., 10 (2000), 549–572.

[179] G. Yin and C. Zhu, On the notion of weak stability and related issues of
hybrid diffusion systems, Nonlinear Anal.: Hybrid Systems, 1 (2007), 173–
187.

[180] G. Yin and C. Zhu, Regularity and recurrence of switching diffusions, J.


Syst. Sci. Complexity, 20 (2007), 273–283.

[181] J. Yong and X.Y. Zhou, Stochastic Controls: Hamiltonian Systems and
HJB Equations, Springer-Verlag, New York, 1999.

[182] C. Yuan and J. Lygeros, Stabilization of a class of stochastic differential


equations with Markovian switching, Syst. Control Lett., 54 (2005), 819–
833.

[183] C. Yuan and X. Mao, Asymptotic stability in distribution of stochastic


differential equations with Markovian switching, Stochastic Process Appl.,
103 (2003), 277–291.

[184] Q. Zhang, Stock trading: An optimal selling rule, SIAM J. Control Optim.,
40 (2001), 64–87.

[185] Q. Zhang and G. Yin, On nearly optimal controls of hybrid LQG problems,
IEEE Trans. Automat. Control, 44 (1999), 2271–2282.

[186] X.Y. Zhou and G. Yin, Markowitz mean-variance portfolio selection with
regime switching: A continuous-time model, SIAM J. Control Optim., 42
(2003), 1466–1482.

[187] C. Zhu and G. Yin, Asymptotic properties of hybrid diffusion systems,


SIAM J. Control Optim., 46 (2007), 1155–1179.

[188] C. Zhu and G. Yin, On strong Feller, recurrence, and weak stabilization
of regime-switching diffusions, SIAM J. Control Optim., 48 (2009), 2003–
2031.

[189] C. Zhu and G. Yin, On competitive Lotka–Volterra model in random en-


vironments, J. Math. Anal. Appl., 357 (2009), 154–170.

[190] C. Zhu, G. Yin, and Q.S. Song, Stability of random-switching systems of


differential equations, Quarterly Appl. Math., 67 (2009), 201–220.
Index

Adapted process, 369
Adaptive control, 9
Asymptotic distribution, 129
Asymptotic stability
    in probability, 254
    with probability 1, 254

Backward equation, 361
Balanced realization, 6
Brownian motion, 370
Burkholder's inequality, 367

Cauchy problem, 119
Chapman–Kolmogorov equation, 358
Coupling technique, 59
Cycle, 111

Diffusion, 370
Discrete event, 1
Doeblin's condition, 357
Doob's inequality, 367
Dynkin's formula, 30, 73

Ergodicity, 111, 285
    criterion, 114
    law of large numbers, 119
    Markov chain, 113

Feller property, 41
Filtering
    Wonham, 9
Fredholm alternative, 357, 362

Gaussian process, 369

Hartman–Grobman theorem, 246, 249
Hybrid diffusion, 1
Hybrid jump diffusion, 376
Hybrid process, 1

Inaccessibility
    state zero, 191
Instability, 203
    sufficient condition, 204
Insurance risk, 14, 323
Invariant density, 238
Invariant measure
    approximation, 159
Invariant set
    measure-theoretic approach, 269
    sample path approach, 254
Irreducibility, 361
Itô formula, 30, 371

Jump diffusion, 323, 376
Jump process, 358

Liapunov exponent, 231
Local martingale, 369
Lotka–Volterra model, 5, 37, 210

Markov chain, 356, 358
    continuous time, 358
    discrete time, 355
    generator, 359
    irreducibility, 356, 361
    quasi-stationary distribution, 362
    stationarity, 358
Martingale, 366
    continuous time, 369
    discrete time, 366
Martingale problem, 170, 324, 371, 373
    criterion, 373
    uniqueness, 374
Mean square continuity, 38
Mixing
    φ-mixing, 375
    inequality, 130, 375

Null recurrence, 72
    criterion, 94
Numerical algorithm
    constant stepsize, 138, 161
    decreasing stepsize, 155, 161
Numerical approximation, 137, 139

Omega-limit set, 269
Option
    European style, 301, 324

Perturbed Liapunov function, 290
Poisson equation, 365
Poisson measure, 325, 376
Positive recurrence, 72, 286
    criterion, 85, 87, 290
    linearization, 89
    multiergodic class, 285
    perturbed Liapunov function, 285
    two-time scale, 285
Process
    two-time scale, 16, 323
Progressive measurability, 369
Prohorov's theorem, 373

Quasi-stationary distribution, 362

Random environment
    option price, 301, 324
    risk theory, 14, 323
Recurrence, 69
    criterion, 78
    definition, 72
    independent of region, 78
Regularity, 34
    criterion, 35

Skorohod representation, 373
Skorohod topology, 372
Smooth dependence of initial data, 56
Stability, 183
    p-stability, 188
    criterion, 227
    in probability, 254
    linearization, 203
    necessary condition, 201
    switching ODE, 217
Stabilization
    adaptive system, 9
Stationary distribution, 357, 362
Stochastic approximation, 162
Stochastic asymptotic stability in the large, 254
Stochastic volatility, 301
Stopping time, 369
Strong Feller property, 53
Switching diffusion, 28
    control, 18
    existence, 30
    uniqueness, 30

Tightness, 268, 372
    criterion, 376
    switching diffusion, 125
Transience, 82
    criterion, 82
Two-time scale
    fast diffusion, 339
    fast switching, 326

Weak continuity, 38
Weak convergence, 371, 372
Weak irreducibility, 326
Weak stabilization, 119
