
Introduction

to the Theory of
Random Processes
N.V. Krylov
Graduate Studies
in Mathematics
Volume 43
Selected Titles in This Series
43 N. V. Krylov, Introduction to the theory of random processes, 2002
42 Jin Hong and Seok-Jin Kang, Introduction to quantum groups and crystal bases, 2002
41 Georgi V. Smirnov, Introduction to the theory of differential inclusions, 2002
40 Robert E. Greene and Steven G. Krantz, Function theory of one complex variable, 2002
39 Larry C. Grove, Classical groups and geometric algebra, 2002
38 Elton P. Hsu, Stochastic analysis on manifolds, 2002
37 Hershel M. Farkas and Irwin Kra, Theta constants, Riemann surfaces and the modular
group, 2001
36 Martin Schechter, Principles of functional analysis, second edition, 2002
35 James F. Davis and Paul Kirk, Lecture notes in algebraic topology, 2001
34 Sigurdur Helgason, Differential geometry, Lie groups, and symmetric spaces, 2001
33 Dmitri Burago, Yuri Burago, and Sergei Ivanov, A course in metric geometry, 2001
32 Robert G. Bartle, A modern theory of integration, 2001
31 Ralf Korn and Elke Korn, Option pricing and portfolio optimization: Modern methods
of financial mathematics, 2001
30 J. C. McConnell and J. C. Robson, Noncommutative Noetherian rings, 2001
29 Javier Duoandikoetxea, Fourier analysis, 2001
28 Liviu I. Nicolaescu, Notes on Seiberg-Witten theory, 2000
27 Thierry Aubin, A course in differential geometry, 2001
26 Rolf Berndt, An introduction to symplectic geometry, 2001
25 Thomas Friedrich, Dirac operators in Riemannian geometry, 2000
24 Helmut Koch, Number theory: Algebraic numbers and functions, 2000
23 Alberto Candel and Lawrence Conlon, Foliations I, 2000
22 Günter R. Krause and Thomas H. Lenagan, Growth of algebras and Gelfand-Kirillov
dimension, 2000
21 John B. Conway, A course in operator theory, 2000
20 Robert E. Gompf and András I. Stipsicz, 4-manifolds and Kirby calculus, 1999
19 Lawrence C. Evans, Partial differential equations, 1998
18 Winfried Just and Martin Weese, Discovering modern set theory. II: Set-theoretic
tools for every mathematician, 1997
17 Henryk Iwaniec, Topics in classical automorphic forms, 1997
16 Richard V. Kadison and John R. Ringrose, Fundamentals of the theory of operator
algebras. Volume II: Advanced theory, 1997
15 Richard V. Kadison and John R. Ringrose, Fundamentals of the theory of operator
algebras. Volume I: Elementary theory, 1997
14 Elliott H. Lieb and Michael Loss, Analysis, 1997
13 Paul C. Shields, The ergodic theory of discrete sample paths, 1996
12 N. V. Krylov, Lectures on elliptic and parabolic equations in Hölder spaces, 1996
11 Jacques Dixmier, Enveloping algebras, 1996 Printing
10 Barry Simon, Representations of finite and compact groups, 1996
9 Dino Lorenzini, An invitation to arithmetic geometry, 1996
8 Winfried Just and Martin Weese, Discovering modern set theory. I: The basics, 1996
7 Gerald J. Janusz, Algebraic number fields, second edition, 1996
6 Jens Carsten Jantzen, Lectures on quantum groups, 1996
5 Rick Miranda, Algebraic curves and Riemann surfaces, 1995
4 Russell A. Gordon, The integrals of Lebesgue, Denjoy, Perron, and Henstock, 1994
(Continued in the back of this publication)
Introduction
to the Theory of
Random Processes
N.V. Krylov
Graduate Studies
in Mathematics
Volume 43
American Mathematical Society
Providence, Rhode Island
Editorial Board
Steven G. Krantz
David Saltman (Chair)
David Sattinger
Ronald Stern
2000 Mathematics Subject Classification. Primary 60-01; Secondary 60G99.
The author was supported in part by NSF Grant DMS-9876586.
ABSTRACT. These lecture notes concentrate on some general facts and ideas of the theory of stochastic processes. The main objects of study are the Wiener processes, the stationary processes, the infinitely divisible processes, and the Itô stochastic equations.
Although it is not possible to cover even a noticeable portion of the topics listed above in a short course, the author sincerely hopes that after having followed the material presented here the reader will have acquired a good understanding of what kind of results are available and what kind of techniques are used to obtain them.
These notes are intended for graduate students and scientists in mathematics, physics and engineering interested in the theory of random processes and its applications.
Library of Congress Cataloging-in-Publication Data
Krylov, N. V. (Nikolai Vladimirovich)
Introduction to the theory of random processes/ N. V. Krylov
p. cm. - (Graduate studies in mathematics, ISSN 1065-7339; v. 43)
Includes bibliographical references and index.
ISBN 0-8218-2985-8 (alk. paper)
1. Stochastic processes. I. Title. II. Series.
QA274.K79 2002
519.2/3-dc21 2002018241
Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy a chapter for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for such
permission should be addressed to the Acquisitions Department, American Mathematical Society,
P. O. Box 6248, Providence, Rhode Island 02940-6248. Requests can also be made by e-mail to
reprint-permission@ams.org.
© 2002 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America.
The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at URL: http://www.ams.org/
10 9 8 7 6 5 4 3 2 1    07 06 05 04 03 02
Contents

Preface xi

Chapter 1. Generalities 1
1. Some selected topics from probability theory 1
2. Some facts from measure theory on Polish spaces 5
3. The notion of random process 14
4. Continuous random processes 16
5. Hints to exercises 25

Chapter 2. The Wiener Process 27
1. Brownian motion and the Wiener process 27
2. Some properties of the Wiener process 32
3. Integration against random orthogonal measures 39
4. The Wiener process on $[0, \infty)$ 50
5. Markov and strong Markov properties of the Wiener process 52
6. Examples of applying the strong Markov property 57
7. Itô stochastic integral 61
8. The structure of Itô integrable functions 65
9. Hints to exercises 69

Chapter 3. Martingales 71
1. Conditional expectations 71
2. Discrete time martingales 78
3. Properties of martingales 81
4. Limit theorems for martingales 87
5. Hints to exercises 92

Chapter 4. Stationary Processes 95
1. Simplest properties of second-order stationary processes 95
2. Spectral decomposition of trajectories 101
3. Ornstein-Uhlenbeck process 105
4. Gaussian stationary processes with rational spectral densities 112
5. Remarks about predicting Gaussian stationary processes with rational spectral densities 117
6. Stationary processes and the Birkhoff-Khinchin theorem 119
7. Hints to exercises 127

Chapter 5. Infinitely Divisible Processes 131
1. Stochastically continuous processes with independent increments 131
2. Lévy-Khinchin theorem 137
3. Jump measures and their relation to Lévy measures 144
4. Further comments on jump measures 154
5. Representing infinitely divisible processes through jump measures 155
6. Constructing infinitely divisible processes 160
7. Hints to exercises 166

Chapter 6. Itô Stochastic Integral 169
1. The classical definition 169
2. Properties of the stochastic integral on H 174
3. Defining the Itô integral if $\int_0^T f_s^2\,ds < \infty$ 179
4. Itô integral with respect to a multidimensional Wiener process 186
5. Itô's formula 188
6. An alternative proof of Itô's formula 195
7. Examples of applying Itô's formula 200
8. Girsanov's theorem 204
9. Stochastic Itô equations 211
10. An example of a stochastic equation 216
11. The Markov property of solutions of stochastic equations 220
12. Hints to exercises 225

Bibliography 227
Index 229
Preface

For about ten years between 1973 and 1986 the author was delivering a one-year topics course "Random Processes" at the Department of Mechanics and Mathematics of Moscow State University. This topics course was obligatory for third-fourth year undergraduate students (about 20 years of age) with major in probability theory and its applications. With great sympathy I remember my first students in this course: M. Safonov, A. Veretennikov, S. Anulova, and L. Mikhailovskaya. During these years the contents of the course gradually evolved, simplifying and shortening to the shape which has been presented in two 83 and 73 page long rotaprint lecture notes published by Moscow State University in 1986 and 1987. In 1990 I emigrated to the USA and in 1998 got the opportunity to present parts of the same course as a one-quarter topics course in probability theory for graduate students at the University of Minnesota. I thus had the opportunity to test the course in the USA as well as on several generations of students in Russia. What the reader finds below is a somewhat extended version of my lectures and the recitations which went along with the lectures in Russia.

The theory of random processes is an extremely vast branch of mathematics which cannot be covered even in ten one-year topics courses with minimal intersection of contents. Therefore, the intent of this book is to get the reader acquainted only with some parts of the theory. The choice of these parts was mainly defined by the duration of the course and the author's taste and interests. However, there is no doubt that the ideas, facts, and techniques presented here will be useful if the reader decides to move on and study some other parts of the theory of random processes.

From the table of contents the reader can see that the main topics of the book are the Wiener process, stationary processes, infinitely divisible processes, and Itô integral and stochastic equations. Chapters 1 and 3 are devoted to some techniques needed in other chapters. In Chapter 1 we discuss some general facts from probability theory and stochastic processes from the point of view of probability measures on Polish spaces. The results of this chapter help construct the Wiener process by using Donsker's invariance principle. They also play an important role in other issues, for instance, in statistics of random processes. In Chapter 3 we present basics of discrete time martingales, which then are used in one way or another in all subsequent chapters. Another common feature of all chapters excluding Chapter 1 is that we use stochastic integration with respect to random orthogonal measures. In particular, we use it for spectral representation of trajectories of stationary processes and for proving that Gaussian stationary processes with rational spectral densities are components of solutions to stochastic equations. In the case of infinitely divisible processes, stochastic integration allows us to obtain a representation of trajectories through jump measures. Apart from this and from the obvious connection between the Wiener process and Itô's calculus, all other chapters are independent and can be read in any order.

The book is designed as a textbook. Therefore it does not contain any new theoretical material but rather a new compilation of some known facts, methods, and ways of presenting the material. A relative novelty in Chapter 2 is viewing the Itô stochastic integral as a particular case of the integral of nonrandom functions against random orthogonal measures. In Chapter 6 we give two proofs of Itô's formula: one is more or less traditional, and the other is based on using stochastic intervals. There are about 128 exercises in the book. About 41 of them are used in the main text and are marked with an asterisk. The bibliography contains some references we use in the lectures and which can also be recommended as a source of additional reading on the subjects presented here, deeper results, and further references.

The author is sincerely grateful to Wonjae Chang, Kyeong-Hun Kim, and Kijung Lee, who read parts of the book and pointed out many errors, to Dan Stroock for his friendly criticism of the first draft, and to Naresh Jain for useful suggestions.

Nicolai Krylov
Minneapolis, January 2001
Chapter 1
Generalities
This chapter is of an introductory nature. We start with recalling some basic
probabilistic notions and facts in Sec. 1. Actually, the reader is supposed to
be familiar with the material of this rather short section, which in no way is
intended to be a systematic introduction to probability theory. All missing
details can be found, for instance, in excellent books by R. Dudley [Du]
and D. Stroock [St]. In Sec. 2 we discuss measures on Polish spaces. Quite
often this subject is also included in courses on probability theory. Sec. 3
is devoted to the notion of random process, and in Sec. 4 we discuss the
relation between continuous random processes and measures on the space of
continuous functions.
1. Some selected topics from probability theory
The purpose of this section is to remember some familiar tunes and get
warmed up. We just want to refresh our memory, recall some standard
notions and facts, and introduce the notation to be used in the future.
Let $\Omega$ be a set and $\mathcal{F}$ a collection of its subsets.

1. Definition. We say that $\mathcal{F}$ is a $\sigma$-field if

(i) $\Omega \in \mathcal{F}$,

(ii) for every $A_1, \ldots, A_n, \ldots$ such that $A_n \in \mathcal{F}$, we have $\bigcup_n A_n \in \mathcal{F}$,

(iii) if $A \in \mathcal{F}$, then $A^c := \Omega \setminus A \in \mathcal{F}$.

In the case when $\mathcal{F}$ is a $\sigma$-field the couple $(\Omega, \mathcal{F})$ is called a measurable space, and elements of $\mathcal{F}$ are called events.

2. Example. Let $\Omega$ be a set. Then $\mathcal{F} := \{\emptyset, \Omega\}$ is a $\sigma$-field which is called the trivial $\sigma$-field.

3. Example. Let $\Omega$ be a set. Then the family of all its subsets is a $\sigma$-field.
Example 3 shows, in particular, that if $\mathcal{F}$ is a family of subsets of $\Omega$, then there always exists at least one $\sigma$-field containing $\mathcal{F}$. Furthermore, it is easy to understand that, given a collection of $\sigma$-fields $\mathcal{F}_\alpha$ of subsets of $\Omega$, where $\alpha$ runs through a set of indices, the set of all subsets of $\Omega$ each of which belongs to every $\sigma$-field $\mathcal{F}_\alpha$ is again a $\sigma$-field. In other words, the intersection of every nonempty collection of $\sigma$-fields is a $\sigma$-field. In view of Example 3, it makes sense to consider the intersection of all $\sigma$-fields containing a given family $\mathcal{F}$ of subsets of $\Omega$, and this intersection is a $\sigma$-field. Hence the smallest $\sigma$-field containing $\mathcal{F}$ exists. It is called the $\sigma$-field generated by $\mathcal{F}$ and is denoted by $\sigma(\mathcal{F})$.

If $X$ is a closed subset of $\mathbb{R}^d$, the $\sigma$-field of its subsets generated by the collection of intersections of all closed balls in $\mathbb{R}^d$ with $X$ is called the Borel $\sigma$-field and is denoted by $\mathcal{B}(X)$. Elements of $\mathcal{B}(X)$ are called Borel subsets of $X$.
Assume that $\mathcal{F}$ is a $\sigma$-field (then of course $\sigma(\mathcal{F}) = \mathcal{F}$). Suppose that to every $A \in \mathcal{F}$ there is assigned a number $P(A)$.

4. Definition. We say that $P$ is a probability measure on $(\Omega, \mathcal{F})$ or on $\mathcal{F}$ if

(i) $P(A) \geq 0$ and $P(\Omega) = 1$,

(ii) for every sequence of pairwise disjoint $A_1, \ldots, A_n, \ldots \in \mathcal{F}$, we have
$$P\Big(\bigcup_n A_n\Big) = \sum_n P(A_n).$$

If on a measurable space $(\Omega, \mathcal{F})$ there is defined a probability measure $P$, the triple $(\Omega, \mathcal{F}, P)$ is called a probability space.

5. Example. The triple, consisting of $[0,1]$ ($= \Omega$), the $\sigma$-field $\mathcal{B}([0,1])$ of Borel subsets of $[0,1]$ (taken as $\mathcal{F}$) and Lebesgue measure $\ell$ (as $P$), is a probability space.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A \subset \Omega$ (not necessarily $A \in \mathcal{F}$).

6. Definition. We say that $A$ has zero probability and write $P(A) = 0$ if there exists a set $B \in \mathcal{F}$ such that $A \subset B$ and $P(B) = 0$. The family of all subsets of $\Omega$ of type $C \cup A$, where $C \in \mathcal{F}$ and $A$ has zero probability, is denoted by $\mathcal{F}^P$ and called the completion of $\mathcal{F}$ with respect to $P$. If $\mathcal{G} \subset \mathcal{F}$ is a sub-$\sigma$-field of $\mathcal{F}$, one completes $\mathcal{G}$ in the same way by using again events of zero probability (from $(\Omega, \mathcal{F}, P)$ but not $(\Omega, \mathcal{G}, P)$).

7. Exercise*. Prove that $\mathcal{F}^P$ is a $\sigma$-field.

The measure $P$ extends to $\mathcal{F}^P$ by the formula $P(C \cup A) = P(C)$ if $C \in \mathcal{F}$ and $P(A) = 0$. It is easy to prove that this extension is well defined, preserves the values of $P$ on $\mathcal{F}$, and yields a probability measure on $\mathcal{F}^P$.
8. Definition. The $\sigma$-field $\mathcal{F}$ is said to be complete (with respect to $P$) if $\mathcal{F}^P = \mathcal{F}$. The probability space $(\Omega, \mathcal{F}, P)$ is said to be complete if $\mathcal{F}^P = \mathcal{F}$, that is, if $\mathcal{F}$ contains all sets of zero probability. If $\mathcal{G} \subset \mathcal{F}$ is a sub-$\sigma$-field of $\mathcal{F}$ containing all sets of zero probability, it is also called complete.

The above argument shows that every probability space $(\Omega, \mathcal{F}, P)$ admits a completion $(\Omega, \mathcal{F}^P, P)$. In general there are probability spaces which are not complete. In particular, in Example 5 the completion of $\mathcal{B}([0,1])$ with respect to $\ell$ is the $\sigma$-field of Lebesgue sets (or the Lebesgue $\sigma$-field), which does not coincide with $\mathcal{B}([0,1])$. In other words, there are sets of measure zero which are not Borel.
9. Exercise. Let $f$ be the Cantor function on $[0,1]$, and let $C$ be a non-Borel subset of $[0,1] \setminus \mathbb{Q}$, where $\mathbb{Q}$ is the set of all rational numbers. Existence of such $C$ is guaranteed, for instance, by Vitali's example. Prove that $\{x : f(x) \in C\}$ has Lebesgue measure zero and is not Borel.
By definition, for every $B \in \mathcal{F}^P$ there exists $C \in \mathcal{F}$ such that $P(B \triangle C) = 0$. Therefore, the advantages of considering $\mathcal{F}^P$ may look very slim. However, sometimes it turns out to be very convenient to pass to $\mathcal{F}^P$, because then more sets become measurable and tractable in the framework of measure theory. It is worth noting the following important result even though it will not be used in the future. It turns out that the projection on the $x$-axis of a Borel subset of $\mathbb{R}^2$ is not necessarily Borel, but is always a Lebesgue set (see, for instance, [Me]). Therefore, if $f(x,y)$ is a Borel function on $\mathbb{R}^2$, then for the function $\bar f(x) := \sup\{f(x,y) : y \in \mathbb{R}\}$ and every $c \in \mathbb{R}$ we have
$$\{x : \bar f(x) > c\} = \{x : \exists y \text{ such that } f(x,y) > c\} \in \mathcal{B}^{\ell}(\mathbb{R}),$$
where $\mathcal{B}^{\ell}(\mathbb{R})$ denotes the Lebesgue $\sigma$-field. It follows that $\bar f$ is Lebesgue measurable (but not necessarily Borel measurable) and it makes sense to consider its integral against $dx$. On the other hand, one knows that for every $\mathcal{F}^P$-measurable function there exists an $\mathcal{F}$-measurable one equal to the original almost surely, that is, such that the set where they are different has zero probability. It follows that there exists a Borel function equal to $\bar f(x)$ almost everywhere. However, the last sentence is just a long way of saying that $\bar f(x)$ is measurable, and it also calls for new notation for the modification, which can make the exposition quite cumbersome.
10. Lemma. Let $\Omega$ and $X$ be sets and let $\xi$ be a function defined on $\Omega$ with values in $X$. For every $B \subset X$ set $\xi^{-1}(B) = \{\omega : \xi(\omega) \in B\}$. Then

(i) $\xi^{-1}$ as a mapping between sets preserves all set-theoretic operations (for instance, if we are given a family of subsets $B_\alpha$ of $X$ indexed by $\alpha$, then $\xi^{-1}(\bigcup_\alpha B_\alpha) = \bigcup_\alpha \xi^{-1}(B_\alpha)$, and so on),

(ii) if $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$, then
$$\{B : B \subset X,\ \xi^{-1}(B) \in \mathcal{F}\}$$
is a $\sigma$-field of subsets of $X$.

We leave the proof of these simple facts to the reader.
If $\xi : \Omega \to X$ and there is a $\sigma$-field $\mathcal{B}$ of subsets of $X$, we denote $\sigma(\xi) := \xi^{-1}(\mathcal{B}) := \{\xi^{-1}(B) : B \in \mathcal{B}\}$. By Lemma 10 (i) the family $\xi^{-1}(\mathcal{B})$ is a $\sigma$-field. It is called the $\sigma$-field generated by $\xi$. Observe that, by definition, each element of $\sigma(\xi)$ is representable as $\{\omega : \xi(\omega) \in B\}$ for some $B \in \mathcal{B}$.

11. Definition. Let $(\Omega, \mathcal{F})$ and $(X, \mathcal{B})$ be measurable spaces, and let $\xi : \Omega \to X$ be a function. We say that $\xi$ is a random variable if $\sigma(\xi) \subset \mathcal{F}$.

If, in addition, $(\Omega, \mathcal{F}, P)$ is a probability space and $\xi$ is a random variable, the function defined on $\mathcal{B}$ by the formula
$$P\xi^{-1}(B) = P(\xi^{-1}(B)) = P\{\omega : \xi(\omega) \in B\}$$
is called the distribution of $\xi$. By Lemma 10 (i) the function $P\xi^{-1}$ is a probability measure on $\mathcal{B}$. One also uses the notation
$$F_\xi = P\xi^{-1}.$$
It turns out that every probability measure is the distribution of a random variable.

12. Theorem. Let $\mu$ be a probability measure on a measurable space $(X, \mathcal{B})$. Then there exist a probability space $(\Omega, \mathcal{F}, P)$ and an $X$-valued random variable $\xi$ defined on this space such that $F_\xi = \mu$.

Proof. Let $(\Omega, \mathcal{F}, P) = (X, \mathcal{B}, \mu)$ and $\xi(x) = x$. Then $\{x : \xi(x) \in B\} = B$. Hence for every $B \in \mathcal{B}$ we have $F_\xi(B) = \mu(B)$, and the theorem is proved.
Remember that if $\xi$ is a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ and at least one of the integrals
$$\int_\Omega \xi_+(\omega)\,P(d\omega), \qquad \int_\Omega \xi_-(\omega)\,P(d\omega)$$
($\xi_\pm := (|\xi| \pm \xi)/2$) is finite, then by the expectation of $\xi$ we mean
$$E\xi := \int_\Omega \xi(\omega)\,P(d\omega) := \int_\Omega \xi_+(\omega)\,P(d\omega) - \int_\Omega \xi_-(\omega)\,P(d\omega).$$
The next theorem relates expectations to distributions.

13. Theorem. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(X, \mathcal{B})$ a measurable space and $\xi : \Omega \to X$ a random variable. Let $f$ be a measurable mapping from $(X, \mathcal{B})$ to $([0,\infty), \mathcal{B}([0,\infty)))$. Then $f(\xi)$ is a random variable and
$$Ef(\xi) = \int_X f(x)\,F_\xi(dx). \tag{1}$$
Proof. For $t \geq 0$, let $[t]$ be the integer part of $t$ and $\kappa_n(t) = 2^{-n}[2^n t]$. Drawing the graph of $\kappa_n$ makes it clear that $0 \leq t - \kappa_n(t) \leq 2^{-n}$, $\kappa_n$ increases when $n$ increases, and the $\kappa_n$ are Borel functions. Furthermore, the variables $f(\xi)$, $\kappa_n(f(\xi))$, $\kappa_n(f(x))$ are appropriately measurable and, by the monotone convergence theorem,
$$Ef(\xi) = \lim_n E\kappa_n(f(\xi)), \qquad \int_X f(x)\,F_\xi(dx) = \lim_n \int_X \kappa_n(f(x))\,F_\xi(dx).$$
It follows that it suffices to prove the theorem for the functions $\kappa_n(f)$. Each of them is measurable and only takes countably many nonnegative values; that is, it has the form
$$\sum_k c_k I_{B_k}(x),$$
where $B_k \in \mathcal{B}$ and $c_k \geq 0$. It only remains to notice that by definition
$$EI_{B_k}(\xi) = P\{\xi \in B_k\} = F_\xi(B_k) = \int_X I_{B_k}(x)\,F_\xi(dx)$$
and by the monotone convergence theorem
$$E\sum_k c_k I_{B_k}(\xi) = \sum_k c_k EI_{B_k}(\xi) = \int_X \sum_k c_k I_{B_k}(x)\,F_\xi(dx).$$
The theorem is proved.

Notice that (1) also holds for $f$ taking values of different signs whenever at least one side of (1) makes sense. This follows easily from the equality $f = f_+ - f_-$ and from (1) applied to $f_\pm$.
2. Some facts from measure theory on Polish spaces
In this book the only Polish spaces we will be dealing with are Euclidean spaces and the space of continuous functions defined on $[0,1]$.
2.1. Definitions and simple facts. A complete separable metric space is called a Polish space. Let $X$ be a Polish space with metric $\rho(x,y)$. By definition the closed ball of radius $r$ centered at $x$ is
$$B_r(x) = \{y : \rho(x,y) \leq r\}.$$
The smallest $\sigma$-field of subsets of $X$ containing all closed balls is called the Borel $\sigma$-field and is denoted $\mathcal{B}(X)$. Elements of $\mathcal{B}(X)$ are called Borel sets.

The structure of an arbitrary Borel set, even in $\mathbb{R}$, is extremely complex. However, very often working with all Borel sets is rather convenient.

Observe that
$$\{y : \rho(x,y) < r\} = \bigcup_n \{y : \rho(x,y) \leq r - 1/n\}.$$
Therefore, open balls are Borel. Furthermore, since $X$ is separable, each open set can be represented as the countable union of certain open balls. Therefore, open sets are Borel. Their complements, which are arbitrary closed sets, are Borel sets as well. By the way, it follows from this discussion that one could equivalently define the Borel $\sigma$-field as the smallest $\sigma$-field of subsets of $X$ containing all open balls.

If $X$ and $Y$ are Polish spaces, and $f : X \to Y$, then the function $f$ is called a Borel function if
$$f^{-1}(B) := \{x : f(x) \in B\} \in \mathcal{B}(X) \quad \forall B \in \mathcal{B}(Y).$$
In other words, $f$ is a Borel function if $f : X \to Y$ is a random variable with respect to the $\sigma$-fields $\mathcal{B}(X)$ and $\mathcal{B}(Y)$. An example of Borel functions is given in the following theorem.
1. Theorem. Let $X$ and $Y$ be Polish spaces, and let $f : X \to Y$ be a continuous function. Then $f$ is Borel.

Proof. Remember that by Lemma 1.10 the collection
$$\Sigma := \{B \subset Y : f^{-1}(B) \in \mathcal{B}(X)\}$$
is a $\sigma$-field. Next, for every $B_r(y) \subset Y$ the set $f^{-1}(B_r(y))$ is closed because of the continuity of $f$. Hence $B_r(y) \in \Sigma$. Since $\mathcal{B}(Y)$ is the smallest $\sigma$-field containing all $B_r(y)$, we have $\mathcal{B}(Y) \subset \Sigma$, which is the same as saying that $f$ is Borel. The theorem is proved.

Let us emphasize a very important feature of the above proof. Instead of taking a particular $B \in \mathcal{B}(Y)$ and proving that $f^{-1}(B) \in \mathcal{B}(X)$, we took
the collection of all sets possessing a desired property. This device will be used quite often.
Next, we are going to treat measures on Polish spaces. We recall that a measure is called finite if all its values belong to $(-\infty, \infty)$. Actually, it is safe to say that everywhere in the book we are always dealing with nonnegative measures. The only exception is encountered in Remark 17, and even there we could avoid using signed measures if we rely on $\pi$- and $\lambda$-systems, which come somewhat later in Sec. 2.3.
2. Theorem. Let $X$ be a Polish space and $\mu$ a finite nonnegative measure on $(X, \mathcal{B}(X))$. Then $\mu$ is regular in the sense that for every $B \in \mathcal{B}(X)$ and $\varepsilon > 0$ there exist an open set $G$ and a closed set $\Gamma$ satisfying
$$G \supset B \supset \Gamma, \qquad \mu(G \setminus \Gamma) \leq \varepsilon. \tag{1}$$

Proof. Take a finite nonnegative measure $\mu$ on $(X, \mathcal{B}(X))$ and call a set $B \in \mathcal{B}(X)$ regular if for every $\varepsilon > 0$ there exist open $G$ and closed $\Gamma$ satisfying (1).

Let $\Sigma$ be the set of all regular sets. We are going to prove that

(i) $\Sigma$ is a $\sigma$-field, and

(ii) $B_r(x) \in \Sigma$.

Then by the definition of $\mathcal{B}(X)$ we have $\mathcal{B}(X) \subset \Sigma$, and this is exactly what we need.

Statement (ii) is almost trivial since, for every $n \geq 1$,
$$\Gamma := B_r(x) \subset \{y : \rho(x,y) < r + 1/n\} =: G_n,$$
where $\Gamma$ is closed, the $G_n$ are open, and $\mu(G_n \setminus \Gamma) \to 0$ since the sets $G_n \setminus \Gamma$ are nested and their intersection is empty.

To prove (i), first notice that $X \in \Sigma$ as a set open and closed simultaneously. Furthermore, the complement of an open (closed) set is a closed (respectively, open) set, and if $G \supset B \supset \Gamma$, then $\Gamma^c \supset B^c \supset G^c$ with
$$\Gamma^c \setminus G^c = G \setminus \Gamma.$$
This shows that if $B \in \Sigma$, then $B^c \in \Sigma$. It only remains to check that countable unions of elements of $\Sigma$ belong to $\Sigma$.

Let $B_n \in \Sigma$, $n = 1, 2, 3, \ldots$, $\varepsilon > 0$, and let $G_n$ be open and $\Gamma_n$ be closed and such that
$$G_n \supset B_n \supset \Gamma_n, \qquad \mu(G_n \setminus \Gamma_n) \leq \varepsilon 2^{-n}.$$
Define
$$B = \bigcup_n B_n, \qquad G = \bigcup_n G_n, \qquad D_n = \bigcup_{i=1}^n \Gamma_i.$$
Then $G$ is open, $D_n$ is closed, and obviously the $G \setminus D_n$ are nested, so that
$$\lim_n \mu(G \setminus D_n) = \mu\Big(G \setminus \bigcup_n D_n\Big) \leq \sum_n \mu(G_n \setminus \Gamma_n) \leq \varepsilon.$$
Hence, for appropriate $n$ we have $\mu(G \setminus D_n) \leq 2\varepsilon$, and this brings the proof to an end.
3. Corollary. If $\mu_1$ and $\mu_2$ are finite nonnegative measures on $(X, \mathcal{B}(X))$ and $\mu_1(\Gamma) = \mu_2(\Gamma)$ for all closed $\Gamma$, then $\mu_1 = \mu_2$.

Indeed, then $\mu_1(X) = \mu_2(X)$ ($X$ is closed) and hence the $\mu_i$'s also coincide on all open subsets of $X$. But then they coincide on all Borel sets, as is seen from
$$\mu_i(G_i) \geq \mu_i(B) \geq \mu_i(\Gamma_i), \qquad \mu_i(G \setminus \Gamma) \leq \varepsilon,$$
where $G = G_1 \cap G_2$, $\Gamma = \Gamma_1 \cup \Gamma_2$ and $G \setminus \Gamma$ is open.
4. Theorem. If $\mu_1$ and $\mu_2$ are finite nonnegative measures on $(X, \mathcal{B}(X))$ and
$$\int_X f(x)\,\mu_1(dx) = \int_X f(x)\,\mu_2(dx)$$
for every bounded continuous $f$, then $\mu_1 = \mu_2$.

Proof. By the preceding corollary we only need to check that $\mu_1 = \mu_2$ on closed sets. Take a closed set $\Gamma$ and let
$$\rho(x, \Gamma) = \inf\{\rho(x,y) : y \in \Gamma\}.$$
Since the absolute value of a difference of infs is not greater than the sup of the absolute values of the differences, and since $|\rho(x,y) - \rho(z,y)| \leq \rho(x,z)$, we have that $|\rho(x,\Gamma) - \rho(z,\Gamma)| \leq \rho(x,z)$, which implies that $\rho(x,\Gamma)$ is continuous. Furthermore,
$$\rho(x,\Gamma) > 0 \iff x \notin \Gamma,$$
since $\Gamma$ is closed. Hence, for the continuous functions
$$f_n(x) := (1 + n\rho(x,\Gamma))^{-1}$$
we have $1 \geq f_n(x) \downarrow I_\Gamma(x)$, so that by the dominated convergence theorem
$$\mu_1(\Gamma) = \int I_\Gamma\,\mu_1(dx) = \lim_n \int f_n\,\mu_1(dx) = \lim_n \int f_n\,\mu_2(dx) = \mu_2(\Gamma).$$
The theorem is proved.
2.2. Tightness and convergence of measures. As we have mentioned in the Preface, the results of this chapter help construct the Wiener process by using a version of the central limit theorem for random walks known as Donsker's invariance principle. Therefore we turn our attention to studying convergence of measures on Polish spaces. An important property of a measure on a Polish space is its tightness, which is expressed in the following terms.

5. Theorem (Ulam). Let $\mu$ be a finite nonnegative measure on $(X, \mathcal{B}(X))$. Then for every $\varepsilon > 0$ there exists a compact set $K \subset X$ such that $\mu(K^c) \leq \varepsilon$.

Proof. Let $\{x_i : i = 1, 2, 3, \ldots\}$ be a dense subset of $X$. Observe that for every $n \geq 1$
$$\bigcup_i B_{1/n}(x_i) = X.$$
Therefore, there exists an $i_n$ such that
$$\mu\Big(\bigcup_{i \leq i_n} B_{1/n}(x_i)\Big) \geq \mu(X) - \varepsilon 2^{-n}. \tag{2}$$
Now define
$$K = \bigcap_{n \geq 1} \bigcup_{i \leq i_n} B_{1/n}(x_i). \tag{3}$$
Observe that $K$ is totally bounded in the sense that, for every $\varepsilon > 0$, there exists a finite set $A = \{x_1, \ldots, x_{i(\varepsilon)}\}$, called an $\varepsilon$-net, such that every point of $K$ is in the $\varepsilon$-neighborhood of at least one point in $A$. Indeed, it suffices to take $i(\varepsilon) = i_n$ with any $n \geq 1/\varepsilon$.

In addition, $\bigcup_{i \leq i_n} B_{1/n}(x_i)$ is closed as a finite union of closed sets, and then $K$ is closed as the intersection of closed sets. It follows that $K$ is a compact set (see Exercise 6). Now it only remains to notice that
$$\mu(K^c) \leq \sum_n \mu\Big(\Big(\bigcup_{i \leq i_n} B_{1/n}(x_i)\Big)^c\Big) \leq \sum_n \varepsilon 2^{-n} = \varepsilon.$$
The theorem is proved.
6. Exercise*. Prove that the following are equivalent:

(i) $K$ is a totally bounded closed set.

(ii) For every sequence of points $x_n \in K$, there is a subsequence $x_{n'}$ which converges to an element of $K$.

7. Corollary. For every Borel $B$ and $\varepsilon > 0$ there exists a compact set $\Gamma \subset B$ such that $\mu(B \setminus \Gamma) \leq \varepsilon$.
Now we consider the issue of convergence of measures on $X$.

8. Definition. Let $\mu$ and $\mu_n$ be finite nonnegative measures on $(X, \mathcal{B}(X))$. We say that $\mu_n$ converge weakly to $\mu$ and write $\mu_n \xrightarrow{w} \mu$ if for every bounded continuous function $f$
$$\int_X f\,\mu_n(dx) \to \int_X f\,\mu(dx). \tag{4}$$
A family $\mathcal{M}$ of finite measures on $(X, \mathcal{B}(X))$ is called relatively weakly (sequentially) compact if every sequence of elements of $\mathcal{M}$ has a weakly convergent subsequence.
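A simple example to keep in mind here is $\mu_n = \delta_{1/n}$ (the unit mass at $1/n$) and $\mu = \delta_0$ on $X = \mathbb{R}$:
$$\int f\,\mu_n(dx) = f(1/n) \to f(0) = \int f\,\mu(dx)$$
for every bounded continuous $f$, so $\mu_n \xrightarrow{w} \mu$, although $\mu_n(\{0\}) = 0 \not\to 1 = \mu(\{0\})$ and $\mu_n((0,1]) = 1 \not\to 0 = \mu((0,1])$. This is why Theorem 11 below requires $\mu(\partial B) = 0$ in condition (iv) and imposes only one-sided inequalities on closed and open sets.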
9. Exercise*. Let $\xi, \xi_n$ be random variables with values in $X$ defined on some probability spaces. Assume that the distributions of $\xi_n$ on $(X, \mathcal{B}(X))$ converge weakly to the distribution of $\xi$. Let $f(x)$ be a real-valued continuous function on $X$. Prove that the distributions of $f(\xi_n)$ converge weakly to the distribution of $f(\xi)$.

10. Exercise*. Let $\mathcal{M} = \{\mu_1, \mu_2, \ldots\}$ be a sequence of nonnegative finite measures on $(X, \mathcal{B}(X))$ and let $\mu$ be a nonnegative measure on $(X, \mathcal{B}(X))$. Prove that if every sequence of elements of $\mathcal{M}$ has a subsequence weakly convergent to $\mu$, then $\mu_n \xrightarrow{w} \mu$.
11. Theorem. Let $\mu, \mu_n$, $n = 1, 2, 3, \ldots$, be nonnegative finite measures on $(X, \mathcal{B}(X))$. Then the following conditions are equivalent:

(i) $\mu_n \xrightarrow{w} \mu$;

(ii) $\mu(\Gamma) \geq \varlimsup_n \mu_n(\Gamma)$ for every closed $\Gamma$ and $\mu(X) = \lim_n \mu_n(X)$;

(iii) $\mu(G) \leq \varliminf_n \mu_n(G)$ for every open $G$ and $\mu(X) = \lim_n \mu_n(X)$;

(iv) $\mu(B) = \lim_n \mu_n(B)$ for every Borel $B$ such that $\mu(\partial B) = 0$;

(v) $\int f\,\mu_n(dx) \to \int f\,\mu(dx)$ for every Borel bounded $f$ such that $\mu(\Delta_f) = 0$, where $\Delta_f$ is the set of all points at which $f$ is discontinuous.
Proof. (i) $\Longrightarrow$ (ii). Take a closed set $\Gamma$ and define $f_n$ as in the proof of Theorem 4. Then for every $m \geq 1$
$$\int f_m\,\mu(dx) = \lim_n \int f_m\,\mu_n(dx) \geq \varlimsup_n \mu_n(\Gamma)$$
since $f_m \geq I_\Gamma$. In addition, the left-hand sides converge to $\mu(\Gamma)$ as $m \to \infty$, so that $\mu(\Gamma) \geq \varlimsup_n \mu_n(\Gamma)$. The second equality in (ii) is obvious since
$$\int 1\,\mu_n(dx) \to \int 1\,\mu(dx).$$

Obviously (ii) $\Longleftrightarrow$ (iii).

(ii)&(iii) $\Longrightarrow$ (iv). Indeed,
$$\bar B \supset B \supset B^\circ,$$
where the closure $\bar B$ is closed, the interior $B^\circ$ is open, and $\mu(\bar B \setminus B^\circ) = \mu(\partial B) = 0$. Hence
$$\mu(\bar B) = \mu(B^\circ) = \mu(B)$$
and
$$\mu(B) = \mu(\bar B) \geq \varlimsup_n \mu_n(\bar B) \geq \varlimsup_n \mu_n(B) \geq \varliminf_n \mu_n(B) \geq \varliminf_n \mu_n(B^\circ) \geq \mu(B^\circ) = \mu(B).$$

(iv) $\Longrightarrow$ (v). First, since $\partial X = \emptyset$, $\mu_n(X) \to \mu(X)$. It follows that we can add any constant to $f$ without altering (4), which allows us to concentrate only on $f \geq 0$. For such a bounded $f$ we have
$$\int f\,\mu_n(dx) = \int \Big(\int_0^M I_{f(x)>t}\,dt\Big)\,\mu_n(dx) = \int_0^M \mu_n\{x : f(x) > t\}\,dt,$$
where $M = \sup f$. It is seen now that, to prove (4), it suffices to show that
$$\mu_n\{x : f(x) > t\} \to \mu\{x : f(x) > t\} \tag{5}$$
for almost all $t$. We will see that this convergence holds at every point $t$ at which $\mu\{x : f(x) = t\} = 0$; that is, one needs to exclude not more than a countable set.

Take a $t > 0$ such that $\mu\{x : f(x) = t\} = 0$ and let $B = \{x : f(x) > t\}$. If $y \in \partial B$ and $f$ is continuous at $y$, then $f(y) = t$. Hence $\partial B \subset \{f(x) = t\} \cup \Delta_f$, $\mu(\partial B) = 0$, and (5) follows from the assumption.

Finally, since the implication (v) $\Longrightarrow$ (i) is obvious, the theorem is proved.
Before stating the following corollary we remind the reader that we have defined weak convergence (Definition 8) only for nonnegative finite measures.

12. Corollary. Let $X$ be a closed subset of $\mathbb{R}^d$ and $\mu_n \xrightarrow{w} \mu$, where $\mu$ is Lebesgue measure. Then (4) holds for every Borel Riemann integrable function $f$, since for such a function $\mu(\Delta_f) = 0$.
13. Exercise. If $\alpha$ is an irrational number in $(0,1)$, then, for every integer $m \neq 0$ and every $x \in \mathbb{R}$,
$$\frac{1}{n+1}\sum_{k=0}^n e^{im2\pi(x+k\alpha)} = e^{im2\pi x}\,\frac{e^{im2\pi(n+1)\alpha} - 1}{(n+1)(e^{im2\pi\alpha} - 1)} \to 0 \quad \text{as } n \to \infty. \tag{6}$$
Also, if $m = 0$, the limit is just 1. By using Fourier series, prove that
$$\frac{1}{n+1}\sum_{k=0}^n f(x + k\alpha) \to \int_0^1 f(y)\,dy \tag{7}$$
for every $x \in [0,1]$ and every 1-periodic continuous function $f$. By writing the sum in (7) as the integral against a measure $\mu_n$ and applying Corollary 12 for indicators, prove that, for every $0 \leq a < b \leq 1$, the asymptotic frequency of fractional parts of the numbers $\alpha, 2\alpha, 3\alpha, \ldots$ in the interval $(a,b)$ is $b - a$.
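The last claim of the exercise can be watched numerically. A minimal sketch (assuming NumPy; the choices $\alpha = \sqrt{2} - 1$ and $(a,b) = (0.25, 0.7)$ are arbitrary):

```python
import numpy as np

alpha = np.sqrt(2) - 1                            # an irrational number in (0, 1)
n = 100_000
frac = np.modf(alpha * np.arange(1, n + 1))[0]    # fractional parts of alpha, 2*alpha, ...
a, b = 0.25, 0.7
print(np.mean((frac > a) & (frac < b)), b - a)    # empirical frequency vs. b - a
```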
14. Exercise. Take the sequence $2^n$, $n = 1, 2, \ldots$, and, for each $n$, let $a_n$ be the first digit in the decimal form of $2^n$. Here is the sequence of the first 45 values of $a_n$ obtained by using Matlab:
2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3.
We see that there are no 7s or 9s in this sequence. Let $N_b(n)$ denote the number of appearances of digit $b = 1, \ldots, 9$ in the sequence $a_1, \ldots, a_n$. By using Exercise 13 find the limit of $N_b(n)/n$ as $n \to \infty$ and, in particular, show that this limit is positive for every $b = 1, \ldots, 9$.
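A quick numerical check of what Exercise 13 suggests here (applied with $\alpha = \log_{10} 2$): the limit of $N_b(n)/n$ should be $\log_{10}(1 + 1/b)$. The sketch below is written in Python rather than Matlab (assuming NumPy) and compares the empirical digit frequencies with these values.

```python
import numpy as np

n = 100_000
# The first decimal digit of 2^i is the integer part of 10^{frac(i * log10(2))}.
frac = np.modf(np.arange(1, n + 1) * np.log10(2))[0]
first_digit = (10 ** frac).astype(int)
for b in range(1, 10):
    print(b, np.mean(first_digit == b), np.log10(1 + 1 / b))
```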
15. Exercise. Prove that for every function $f$ (measurable or not) the set $\Delta_f$ is Borel.

We will use the following theorem, the proof of which can be found in [Bi].

16. Theorem (Prokhorov). A family $\mathcal{M}$ of probability measures on the space $(X, \mathcal{B}(X))$ is relatively weakly compact if and only if it is tight in the sense that for every $\varepsilon > 0$ there exists a compact set $K$ such that $\mu(K^c) \leq \varepsilon$ for every $\mu \in \mathcal{M}$.
Let us give an outline of a proof of this theorem (a complete proof can be found, for instance, in [Bi], [Du], [GS]). The necessity is proved in the same way as Ulam's theorem. Indeed, use the notation from its proof and first prove that for every $n \geq 1$
$$\inf\Big\{\mu\Big(\bigcup_{i \leq m} B_{1/n}(x_i)\Big) : \mu \in \mathcal{M}\Big\} \to 1 \tag{8}$$
as $m \to \infty$. By way of getting a contradiction, assume that this is wrong. Then, for an $\varepsilon > 0$ and every $m$, there would exist a measure $\mu_m \in \mathcal{M}$ such that
$$\mu_m\Big(\bigcup_{i \leq m} B_{1/n}(x_i)\Big) \leq 1 - \varepsilon.$$
By assumption there exists a (probability) measure $\mu$ which is a weak limit point of $\{\mu_m\}$. By Ulam's theorem there is a compact set $K$ such that $1 - \varepsilon/2 \leq \mu(K)$. Since $K$ admits a finite $1/(2n)$-net, there exists $k$ such that $K \subset \bigcup_{i \leq k} B^o_{1/n}(x_i)$, where $B^o_r(x)$ is the open ball of radius $r$ centered at $x$. By Theorem 11 (iii)
$$1 - \varepsilon/2 \leq \mu\Big(\bigcup_{i \leq k} B^o_{1/n}(x_i)\Big) \leq \varliminf_m \mu_m\Big(\bigcup_{i \leq k} B^o_{1/n}(x_i)\Big) \leq \varliminf_m \mu_m\Big(\bigcup_{i \leq k} B_{1/n}(x_i)\Big) \leq 1 - \varepsilon.$$
We have a contradiction which proves (8). Now it is clear how to choose $i_n$ in order to have (2) satisfied for all $\mu \in \mathcal{M}$, and then the desired set $K$ can be given by (3).
Proof of sufficiency can be based on Riesz's remarkable theorem on the general form of continuous linear functionals defined on the set of continuous functions on a compact set. Let $K$ be a compact subset of $X$, $C(K)$ the set of all continuous functions on $K$, and assume that on $C(K)$ we have a linear function $\ell(f)$ such that
$$|\ell(f)| \leq N \sup\{|f(x)| : x \in K\}$$
for all $f \in C(K)$ with $N$ independent of $f$. Then it turns out that there is a measure $\mu$ such that
$$\ell(f) = \int_K f\,\mu(dx).$$
Now fix $\varepsilon > 0$ and take an appropriate $K(\varepsilon)$. For every countable set of $f_m \in C(K(\varepsilon))$ and every sequence of measures $\mu \in \mathcal{M}$, by using Cantor's diagonalization method one can extract a subsequence $\mu_n$ such that
$$\int_{K(\varepsilon)} f_m\,\mu_n(dx)$$
would have limits as $n \to \infty$. One can choose such a sequence of $f$'s to be dense in $C(K(\varepsilon))$, and then $\lim_n \int_{K(\varepsilon)} f\,\mu_n(dx)$ exists for every continuous $f$ and defines a linear bounded functional on $C(K(\varepsilon))$, and hence defines a measure on $K(\varepsilon)$. It remains to paste these measures obtained for different $\varepsilon$ and get a measure on $X$, and also arrange for one sequence $\mu_n$ to be good for all $K(\varepsilon)$ with $\varepsilon$ running through $1, 1/2, 1/3, \ldots$.
17. Remark. In the above explanation we used the fact that if $\mu$ is a finite measure on $(X, \mathcal{B}(X))$ and $\int f\,\mu(dx) \geq 0$ for all nonnegative bounded continuous functions $f$, then $\mu \geq 0$.

To prove that this is indeed true, remember that by Hahn's theorem there exist two measurable (Borel) sets $B_1$ and $B_2$ such that $B_1 \cup B_2 = X$, $B_1 \cap B_2 = \emptyset$, and $\mu_i(B) = (-1)^i\mu(B \cap B_i) \geq 0$ for $i = 1, 2$ and every $B \in \mathcal{B}(X)$.

Then $\mu = \mu_2 - \mu_1$ and $\int f\,\mu_2(dx) \geq \int f\,\mu_1(dx)$ for all nonnegative continuous $f$. One derives from here, as in the proof of Theorem 4, that $\mu_2(\Gamma) \geq \mu_1(\Gamma)$ for all closed $\Gamma$, and by regularity $\mu_2(B) \geq \mu_1(B)$ for all $B \in \mathcal{B}(X)$. Plugging in $B \cap B_1$ in place of $B$, we get $0 = \mu_2(B \cap B_1) \geq \mu_1(B \cap B_1) = \mu_1(B) \geq 0$ and $\mu_1(B) = 0$, as claimed.
3. The notion of random process

Let $T$ be a set, $(\Omega, \mathcal{F}, P)$ a probability space, $(X, \mathcal{B})$ a measurable space, and assume that, for every $t \in T$, we are given an $X$-valued $\mathcal{F}$-measurable function $\xi_t = \xi_t(\omega)$. Then we say that $\xi_t$ is a random process on $T$ with values in $X$. For individual $\omega$ the function $\xi_t(\omega)$ as a function of $t$ is called a path or a trajectory of the process.

The set $T$ may be different in different settings. If $T = \{0, 1, 2, \ldots\}$, then $\xi_t$ is called a random sequence. If $T = (a,b)$, then $\xi_t$ is a continuous-time random process. If $T = \mathbb{R}^2$, then $\xi_t$ is called a two-parameter random field.
In the following lemma, for a measurable space $(X, \mathcal{B})$ and integer $n$, we denote by $(X^n, \mathcal{B}^n)$ the product of $n$ copies of $(X, \mathcal{B})$; that is,
$$X^n = \{(x_1, \ldots, x_n) : x_1, \ldots, x_n \in X\},$$
and $\mathcal{B}^n$ is the smallest $\sigma$-field of subsets of $X^n$ containing every $B^{(n)}$ of type
$$B_1 \times \ldots \times B_n,$$
where $B_i \in \mathcal{B}$.
1. Lemma. Let $t_1, \ldots, t_n \in T$. Then $(\xi_{t_1}, \ldots, \xi_{t_n})$ is a random variable with values in $(X^n, \mathcal{B}^n)$.

Proof. The function $\eta(\omega) := (\xi_{t_1}(\omega), \ldots, \xi_{t_n}(\omega))$ maps $\Omega$ into $X^n$. The set $\Sigma$ of all subsets $B^{(n)}$ of $X^n$ for which $\eta^{-1}(B^{(n)}) \in \mathcal{F}$ is a $\sigma$-field. In addition, $\Sigma$ contains every $B^{(n)}$ of type $B_1 \times \ldots \times B_n$, where $B_i \in \mathcal{B}$. This is seen from the fact that
$$\eta^{-1}(B_1 \times \ldots \times B_n) = \{\omega : \eta(\omega) \in B_1 \times \ldots \times B_n\} = \{\omega : \xi_{t_1}(\omega) \in B_1, \ldots, \xi_{t_n}(\omega) \in B_n\} = \bigcap_i \{\omega : \xi_{t_i}(\omega) \in B_i\} \in \mathcal{F}.$$
Hence $\Sigma$ contains the $\sigma$-field generated by those $B^{(n)}$. Since the latter is $\mathcal{B}^n$ by definition, we have $\Sigma \supset \mathcal{B}^n$, i.e. $\eta^{-1}(B^{(n)}) \in \mathcal{F}$ for every $B^{(n)} \in \mathcal{B}^n$. The lemma is proved.
2. Remark. In particular, we have proved that $\{\omega : \xi^2(\omega) + \eta^2(\omega) \leq 1\}$ is a random event if $\xi$ and $\eta$ are random variables.

The random variable $(\xi_{t_1}, \ldots, \xi_{t_n})$ has a distribution on $(X^n, \mathcal{B}^n)$. This distribution is called the finite-dimensional distribution corresponding to $t_1, \ldots, t_n$.
So-called cylinder sets play an important role in the theory of random processes.

Let $(X, \mathcal{B}(X))$ be a Polish space and $T$ a set. Denote by $X^T$ the set of all $X$-valued functions on $T$. This notation is natural if one observes that if $T$ only consists of two points, $T = \{1, 2\}$, then every $X$-valued function on $T$ is just a pair $(x, y)$, where $x$ is the value of the function at $t = 1$ and $y$ is the value of the function at $t = 2$. So the set of $X$-valued functions on $T$ is just the set of all pairs $(x, y)$, and $X^T = X \times X = X^2$.

We denote by $x_\cdot$ the points in $X^T$ and by $x_t$ the value of $x_\cdot$ at $t$. Every set of type
$$\{x_\cdot : (x_{t_1}, \ldots, x_{t_n}) \in B^{(n)}\},$$
where $t_i \in T$ and $B^{(n)} \in \mathcal{B}^n$, is called the finite dimensional cylinder set with base $B^{(n)}$ attached to $t_1, \ldots, t_n$. The $\sigma$-field generated by all finite dimensional cylinder sets is called the cylinder $\sigma$-field.
3. Exercise*. Prove that the family of all finite dimensional cylinder sets is an algebra; that is, $X^T$ is a cylinder set and complements and finite unions and intersections of cylinder sets are cylinder sets.

4. Exercise. Let $\Sigma$ denote the cylinder $\sigma$-field of subsets of the set of all $X$-valued functions on $[0,1]$. Prove that for every $A \in \Sigma$ there exists a countable set $\{t_1, t_2, \ldots\} \subset [0,1]$ such that if $x_\cdot \in A$ and $y_\cdot$ is a function such that $y_{t_n} = x_{t_n}$ for all $n$, then $y_\cdot \in A$. In other words, elements of $\Sigma$ are defined by specifying conditions on trajectories only at countably many points of $[0,1]$.

5. Exercise. Give an example of a Polish space $(X, \mathcal{B}(X))$ such that the set $C([0,1], X)$ of all bounded and continuous $X$-valued functions on $[0,1]$ is not an element of the $\sigma$-field $\Sigma$ from the previous exercise. Thus you will see that there exists a very important and quite natural set which is not measurable.
4. Continuous random processes

For simplicity consider real-valued random processes on $T = [0,1]$. Such a process is called continuous if all its trajectories are continuous functions on $T$. In that case, for each $\omega$, we have a continuous trajectory, or in other words an element of the space $C = C([0,1])$ of continuous functions on $[0,1]$. You know that this is a Polish space when provided with the metric
$$\rho(x_\cdot, y_\cdot) = \sup_{t \in [0,1]} |x_t - y_t|.$$
Apart from the Borel $\sigma$-field, which is convenient as far as convergence of distributions is concerned, there is the cylinder $\sigma$-field $\Sigma(C)$, defined as the $\sigma$-field of subsets of $C$ generated by the collection of all subsets of the form
$$\{x_\cdot \in C : x_t \in B\}, \quad t \in [0,1], \quad B \in \mathcal{B}(\mathbb{R}).$$
Observe that $\Sigma(C)$ is not the cylinder $\sigma$-field in the space of all real-valued functions on $[0,1]$ as defined before Exercise 3.3.
1. Lemma. $\Sigma(C) = \mathcal{B}(C)$.

Proof. For fixed $t$, denote by $\pi_t$ the function on $C$ defined by
$$\pi_t(x_\cdot) = x_t.$$
Obviously $\pi_t$ is a real-valued continuous function on $C$. By Theorem 2.1 it is Borel, i.e. for every $B \in \mathcal{B}(\mathbb{R})$ we have $\pi_t^{-1}(B) \in \mathcal{B}(C)$, i.e. $\{x_\cdot : x_t \in B\} \in \mathcal{B}(C)$. It follows easily (for instance, as in the proof of Theorem 2.1) that $\Sigma(C) \subset \mathcal{B}(C)$.

To prove the opposite inclusion it suffices to prove that all closed balls are cylinder sets. Fix $x^0_\cdot \in C$ and $\varepsilon > 0$. Then obviously
$$B_\varepsilon(x^0_\cdot) = \{x_\cdot \in C : \rho(x^0_\cdot, x_\cdot) \leq \varepsilon\} = \bigcap_r \{x_\cdot \in C : x_r \in [x^0_r - \varepsilon,\ x^0_r + \varepsilon]\},$$
where the intersection is taken over all rational $r \in [0,1]$. This intersection being countable, we have $B_\varepsilon(x^0_\cdot) \in \Sigma(C)$, and the lemma is proved.
The following theorem allows one to treat continuous random processes just like $C$-valued random elements.

2. Theorem. If $\xi_t(\omega)$ is a continuous process on $[0,1]$, then $\xi_\cdot$ is a $C$-valued random variable. Conversely, if $\xi_\cdot$ is a $C$-valued random variable, then $\xi_t(\omega)$ is a continuous process on $[0,1]$.

Proof. To prove the direct statement, it suffices to notice that, by definition, the $\sigma$-field of all those $B \subset C$ for which $\xi_\cdot^{-1}(B) \in \mathcal{F}$ contains all sets of the type
$$\{x_\cdot : x_t \in B\}, \quad t \in [0,1], \quad B \in \mathcal{B}(\mathbb{R}),$$
and hence contains all cylinder subsets of $C$, that is, by Lemma 1, all Borel subsets of $C$.

The converse follows at once from the fact that $\xi_t = \pi_t(\xi_\cdot)$, which shows that $\xi_t$ is a superposition of two measurable functions. The theorem is proved.
By Ulam's theorem the distribution of a process with continuous trajectories is concentrated up to $\varepsilon$ on a compact set $K_\varepsilon \subset C$. Remember the following necessary and sufficient condition for a subset of $C$ to be compact (the Arzelà-Ascoli theorem).

3. Theorem. Let $K$ be a closed subset of $C$. It is compact if and only if the family of functions $x_\cdot \in K$ is uniformly bounded and equicontinuous, i.e. if and only if

(i) there is a constant $N$ such that
$$\sup_t |x_t| \leq N \quad \forall x_\cdot \in K,$$
and

(ii) for each $\varepsilon > 0$ there exists a $\delta > 0$ such that $|x_t - x_s| \leq \varepsilon$ whenever $x_\cdot \in K$ and $|t - s| \leq \delta$, $t, s \in [0,1]$.
4. Lemma. Let $x_t$ be a real-valued function defined on $[0,1]$ (independent of $\omega$). Assume that there exist a constant $a > 0$ and an integer $n \geq 0$ such that
$$|x_{(i+1)/2^m} - x_{i/2^m}| \leq 2^{-ma}$$
for all $m \geq n$ and $0 \leq i \leq 2^m - 1$. Then for all binary rational numbers $t, s \in [0,1]$ satisfying $|t - s| \leq 2^{-n}$ we have
$$|x_t - x_s| \leq N(a)|t - s|^a,$$
where $N(a) = 2^{2a+1}(2^a - 1)^{-1}$.

Proof. Let $t, s \in [0,1]$ be binary rational. Then
$$t = \sum_{i=0}^\infty \varepsilon_1(i)2^{-i}, \qquad s = \sum_{i=0}^\infty \varepsilon_2(i)2^{-i}, \tag{1}$$
where $\varepsilon_k(i) = 0$ or $1$ and the series are actually finite sums. Let
$$t_k = \sum_{i=0}^k \varepsilon_1(i)2^{-i}, \qquad s_k = \sum_{i=0}^k \varepsilon_2(i)2^{-i}. \tag{2}$$
Observe that if $|t - s| \leq 2^{-k}$, then $t_k = s_k$ or $|t_k - s_k| = 2^{-k}$. This follows easily from the following picture, in which | shows numbers of type $r2^{-k}$ with integral $r$, the short arrow shows the set of possible values for $t$ and the long one the set of possible values of $s$.

[picture: the grid of points $r2^{-k}$ with the ranges of possible positions of $t$ and $s$]

Now let $k \geq n$ and $|t - s| \leq 2^{-k}$. Write
$$x_t = x_{t_k} + \sum_{m=k}^\infty (x_{t_{m+1}} - x_{t_m}),$$
write a similar representation for $x_s$ and subtract these formulas to get
$$|x_t - x_s| \leq |x_{t_k} - x_{s_k}| + \sum_{m=k}^\infty \big(|x_{t_{m+1}} - x_{t_m}| + |x_{s_{m+1}} - x_{s_m}|\big). \tag{3}$$
Here $t_k = r2^{-k}$ for an integer $r$, and there are only three possibilities for $s_k$: $s_k = (r-1)2^{-k}$ or $= r2^{-k}$ or $= (r+1)2^{-k}$. In addition, $|t_{m+1} - t_m| \leq 2^{-(m+1)}$ since, for an integer $p$, we have $t_m = p2^{-m} = (2p)2^{-(m+1)}$ and $t_{m+1}$ equals either $t_m$ or $t_m + 2^{-(m+1)}$. Therefore, by the assumption,
$$|x_t - x_s| \leq 2\sum_{m=k}^\infty 2^{-ma} = 2^{-ka}\,2^{a+1}(2^a - 1)^{-1}. \tag{4}$$
We have proved this inequality if $k \geq n$ and $|t - s| \leq 2^{-k}$.

It is easy to prove that, for every $t$ and $s$ satisfying $|t - s| \leq 2^{-n}$, one can take $k = [\log_2(1/|t-s|)]$ and then one has $k \geq n$, $|t - s| \leq 2^{-k}$, and $2^{-ka} \leq 2^a|t-s|^a$. This proves the lemma.
For integers $n \geq 0$ and $a > 0$ denote
$$K_n(a) = \{x_\cdot \in C : |x_0| \leq 2^n,\ |x_t - x_s| \leq N(a)|t-s|^a\ \ \forall\, |t-s| \leq 2^{-n}\}.$$

5. Exercise*. Prove that the $K_n(a)$ are compact sets in $C$.
6. Theorem. Let $\xi_t$ be a continuous process and let $\alpha > 0$, $\beta > 0$, $N \in (0,\infty)$ be constants such that
$$E|\xi_t - \xi_s|^\alpha \leq N|t-s|^{1+\beta} \quad \forall s, t \in [0,1].$$
Then for $0 < a < \beta/\alpha$ and for every $\varepsilon > 0$ there exists $n$ such that
$$P\{\xi_\cdot \in K_n(a)\} \geq 1 - \varepsilon$$
(observe that $P\{\xi_\cdot \in K_n(a)\}$ makes sense by Theorem 2).

Proof. Denote
$$A_n = \{\omega : |\xi_0| \geq 2^n\} \cup \{\omega : \sup_{m \geq n}\ \max_{i=0,\ldots,2^m-1} |\xi_{(i+1)/2^m} - \xi_{i/2^m}|\,2^{ma} > 1\}.$$
For $\omega \notin A_n$, we have $\xi_\cdot \in K_n(a)$ by the previous lemma. Hence by Chebyshev's inequality
$$P\{\xi_\cdot \notin K_n(a)\} \leq P(A_n) \leq P\{|\xi_0| \geq 2^n\} + E\sup_{m \geq n}\ \max_{i=0,\ldots,2^m-1} |\xi_{(i+1)/2^m} - \xi_{i/2^m}|^\alpha\, 2^{ma\alpha}.$$
We replace the sup and the max with sums of the random variables involved and we find
$$P\{\xi_\cdot \notin K_n(a)\} \leq P(A_n) \leq P\{|\xi_0| \geq 2^n\} + \sum_{m=n}^\infty \sum_{i=0}^{2^m-1} 2^{ma\alpha}\, E|\xi_{(i+1)/2^m} - \xi_{i/2^m}|^\alpha \leq P\{|\xi_0| \geq 2^n\} + N\sum_{m=n}^\infty 2^{-m(\beta - a\alpha)}. \tag{5}$$
It only remains to notice that the last expression tends to zero as $n \to \infty$. The theorem is proved.
Remember that if $\xi_\cdot$ is a $C$-valued random variable, then the measure $P\{\xi_\cdot \in B\}$, $B \in \mathcal{B}(C)$, is called the distribution of $\xi_\cdot$. From (5) and Prokhorov's theorem we immediately get the following.

7. Theorem. Let $\xi^k_t$, $k = 1, 2, 3, \ldots$, be continuous processes on $[0,1]$ such that, for some constants $\alpha > 0$, $\beta > 0$, $N \in (0,\infty)$, we have
$$E|\xi^k_t - \xi^k_s|^\alpha \leq N|t-s|^{1+\beta} \quad \forall s, t \in [0,1],\ k \geq 1.$$
Also assume that $\sup_k P\{|\xi^k_0| \geq c\} \to 0$ as $c \to \infty$. Then the sequence of distributions of $\xi^k_\cdot$ on $C$ is relatively compact.
Lemma 4 is the main tool in proving Theorems 6 and 7. It also allows us to prove Kolmogorov's theorem on existence of continuous modifications. If $T$ is a set on which we are given two processes $\xi^1_t$ and $\xi^2_t$ such that $P(\xi^1_t = \xi^2_t) = 1$ for every $t \in T$, then we call $\xi^1_\cdot$ a modification of $\xi^2_\cdot$ (and vice versa).
8. Theorem (Kolmogorov). Let $\xi_t$ be a process defined for $t \in [0,\infty)$ such that, for some $\alpha > 0$, $\beta > 0$, $N < \infty$, we have
$$E|\xi_t - \xi_s|^\alpha \leq N|t-s|^{1+\beta} \quad \forall t, s \geq 0.$$
Then the process $\xi_t$ has a continuous modification.

Proof. Take $a = \beta/(2\alpha)$ and define
$$\Omega_{kn} = \{\omega : \sup_{m \geq n}\ \max_{i=0,\ldots,k2^m-1} 2^{ma}|\xi_{(i+1)/2^m} - \xi_{i/2^m}| \leq 1\}, \qquad \Omega' = \bigcap_{k \geq 1} \bigcup_n \Omega_{kn}.$$
If $\omega \in \Omega'$, then for every $k \geq 1$ there exists $n$ such that for all $m \geq n$ and $i = 0, \ldots, k2^m - 1$ we have
$$|\xi_{(i+1)/2^m}(\omega) - \xi_{i/2^m}(\omega)| \leq 2^{-ma}.$$
It follows by Lemma 4 that, for $\omega \in \Omega'$ and every $k$, the function $\xi_t(\omega)$ is uniformly continuous on the set $\{r/2^m\}$ of binary fractions intersected with $[0,k]$. By using Cauchy's criterion, it is easy to prove that, for $\omega \in \Omega'$ and every $t \in [0,\infty)$, there exists
$$\lim_{r/2^m \to t} \xi_{r/2^m}(\omega) =: \tilde\xi_t(\omega),$$
and in addition, $\tilde\xi_t(\omega)$ is continuous in $t$ and $\tilde\xi_t(\omega) = \xi_t(\omega)$ for all binary rational $t$. We have defined $\tilde\xi_t(\omega)$ for $\omega \in \Omega'$. For $\omega \notin \Omega'$ define $\tilde\xi_t(\omega) \equiv 0$. The process $\tilde\xi_t$ is continuous, and it only remains to prove that it is a modification of $\xi_t$.

First we claim that $P(\Omega') = 1$. To prove this it suffices to prove that $P(\bigcup_n \Omega_{kn}) = 1$. Since $(\bigcup_n \Omega_{kn})^c = \bigcap_n \Omega^c_{kn}$ and
$$\Omega^c_{kn} \subset \bigcup_{m \geq n}\ \bigcup_{i=0}^{k2^m-1} \{\omega : |\xi_{(i+1)/2^m} - \xi_{i/2^m}| > 2^{-ma}\},$$
we have (cf. (5))
$$1 - P\Big(\bigcup_n \Omega_{kn}\Big) \leq \lim_n \sum_{m \geq n} kN2^{-m(\beta - a\alpha)} = 0.$$
Thus $P(\Omega') = 1$. Furthermore, we noticed above that $\tilde\xi_{r/2^m} = \xi_{r/2^m}$ on $\Omega'$. Therefore,
$$P\{\tilde\xi_{r/2^m} = \xi_{r/2^m}\} = 1.$$
For other values of $t$, by Fatou's theorem
$$E|\tilde\xi_t - \xi_t|^\alpha \leq \varliminf_{r/2^k \to t} E|\xi_{r/2^k} - \xi_t|^\alpha \leq N\lim_{r/2^k \to t} |r/2^k - t|^{1+\beta} = 0.$$
Hence $P\{\tilde\xi_t = \xi_t\} = 1$ for every $t \in [0,\infty)$, and the theorem is proved.
For Gaussian processes the above results can be improved. Remember that a random vector $\xi = (\xi^1, \ldots, \xi^k)$ with values in $\mathbb{R}^k$ is called Gaussian or normal if there exist a vector $m \in \mathbb{R}^k$ and a symmetric nonnegative $k \times k$ matrix $R = (R^{ij})$ such that
$$\phi(\lambda) := E\exp(i(\lambda,\xi)) = \exp\big(i(\lambda,m) - (R\lambda,\lambda)/2\big) \quad \forall \lambda \in \mathbb{R}^k,$$
where
$$(\lambda,\xi) = \sum_{i=1}^k \lambda^i\xi^i$$
is the scalar product in $\mathbb{R}^k$ and
$$(R\lambda,\lambda) = \sum_{i,j=1}^k R^{ij}\lambda^i\lambda^j.$$
In this case one also writes $\xi \sim N(m,R)$. One knows that
$$m = E\xi, \qquad R^{ij} = E(\xi^i - m^i)(\xi^j - m^j),$$
so that $m$ is the mean value of $\xi$ and $R$ is its covariance matrix. It is known that linear transformations of Gaussian vectors are Gaussian. In particular, $(\xi^2, \xi^1, \xi^3, \ldots, \xi^k)$ is Gaussian.
9. Definition. A real-valued process $\xi_t$ is called Gaussian if all its finite-dimensional distributions are Gaussian. The function $m_t = E\xi_t$ is called the mean value function of $\xi_t$, and $R(t,s) = \operatorname{cov}(\xi_t,\xi_s) = E(\xi_t - m_t)(\xi_s - m_s)$ is called the covariance function of $\xi_t$.

10. Remark. Very often it is useful to remember that $(x_{t_1}, \ldots, x_{t_k})$ is a $k$-dimensional Gaussian vector if and only if, for arbitrary constants $c_1, \ldots, c_k$, the random variable $\sum_i c_i x_{t_i}$ is Gaussian.
11. Exercise. Let $x_t$ be a real-valued function defined on $[0,1]$ (independent of $\omega$). Let $g(x)$ be a nonnegative increasing function defined on $(0,1/2]$ and such that
$$G(x) = \int_0^x y^{-1}g(y)\,dy$$
is finite for every $x \in [0,1/2]$. Assume that there exists an integer $n \geq 3$ such that
$$|x_{(i+1)/2^m} - x_{i/2^m}| \leq g(2^{-m})$$
for all $m \geq n$ and $0 \leq i \leq 2^m - 1$. By analyzing the proof of Lemma 4, show that for all binary rational numbers $t, s \in [0,1]$ satisfying $|t-s| \leq 2^{-n}$ we have
$$|x_t - x_s| \leq NG(4|t-s|), \qquad N = 2/\ln 2.$$
12. Exercise. Let $\xi$ be a normal random variable with zero mean and variance less than or equal to $\sigma^2$, where $\sigma > 0$. Prove that, for every $x > 0$,
$$\sqrt{2\pi}\,P(|\xi| \geq x) \leq 2\sigma x^{-1}\exp(-x^2/(2\sigma^2)).$$
13. Exercise. Let $\xi_t$ be a Gaussian process with zero mean given on $[0,1]$ and satisfying $E|\xi_t - \xi_s|^2 \leq R(|t-s|)$, where $R$ is a continuous function defined on $(0,1]$. Denote $g(x) = \sqrt{-R(x)\ln x}$ and suppose that $g$ satisfies the assumptions of Exercise 11. For a constant $a > \sqrt{2}$ and $n \geq 3$ define
$$\Omega_n = \{\omega : \sup_{m \geq n}\ \max_{i=0,\ldots,2^m-1} |\xi_{(i+1)/2^m}(\omega) - \xi_{i/2^m}(\omega)|/g(2^{-m}) \leq a\},$$
$\Omega' = \bigcup_{n \geq 3} \Omega_n$. Notice that
$$\Omega^c_n = \bigcup_{m=n}^\infty\ \bigcup_{i=0}^{2^m-1} \big\{\omega : |\xi_{(i+1)/2^m}(\omega) - \xi_{i/2^m}(\omega)| > ag(2^{-m})\big\}$$
and, by using Exercise 12, prove that
$$P(\Omega^c_n) \leq N_1\sum_{m \geq n} 2^m\,\frac{\sqrt{R(2^{-m})}}{g(2^{-m})}\exp\Big(-\frac{a^2g^2(2^{-m})}{2R(2^{-m})}\Big) = N_2\sum_{m \geq n} \frac{1}{\sqrt{m}}\,2^{m(1 - a^2/2)},$$
where the $N_i$ are independent of $n$. Conclude that $P(\Omega') = 1$. By using Exercise 11, derive from here that $\xi_t$ has a continuous modification. In particular, prove that, if
$$E|\xi_t - \xi_s|^2 \leq N\big|\ln|t-s|\big|^{-p} \quad \forall t, s \in [0,1],\ |t-s| \leq 1/2$$
with a constant $N$ and $p > 3$, then $\xi_t$ has a continuous modification.
14. Exercise. Let $\xi_t$ be a process satisfying the assumptions in Exercise 13 and let $\tilde\xi_t$ be its continuous modification. Prove that, for almost every $\omega$, there exists $n \geq 1$ such that for all $t, s \in [0,1]$ satisfying $|t-s| \leq 2^{-n}$ we have
$$|\tilde\xi_t - \tilde\xi_s| \leq 8G(4|t-s|) \qquad \Big(G(x) = \int_0^x y^{-1}g(y)\,dy\Big).$$
Sometimes one needs the following multidimensional version of Kolmogorov's theorem. To prove it we first generalize Lemma 4. Denote by $Z^d_n$ the lattice in $[0,1]^d$ consisting of all points $(k_1 2^{-n}, \ldots, k_d 2^{-n})$, where $k_i = 0, 1, 2, \ldots, 2^n$. Also let
$$\|t - s\| = \max\{|t^i - s^i| : i = 1, \ldots, d\}.$$
15. Lemma. Let $d \geq 1$ be an integer and $x_t$ a real-valued function defined for $t \in [0,1]^d$. Assume that there exist $a > 0$ and an integer $n \geq 0$ such that
$$m \geq n,\ t, s \in Z^d_m,\ \|t-s\| \leq 2^{-m} \implies |x_t - x_s| \leq 2^{-ma}.$$
Then, for every $t, s \in \bigcup_m Z^d_m$ satisfying $\|t-s\| \leq 2^{-n}$ we have
$$|x_t - x_s| \leq N(a)\|t-s\|^a.$$

Proof. Let $t, s \in Z^d_m$, $t = (t^1, \ldots, t^d)$, $s = (s^1, \ldots, s^d)$. Represent $t^j$ and $s^j$ as (cf. (1))
$$t^j = \sum_{i=0}^\infty \varepsilon^j_1(i)2^{-i}, \qquad s^j = \sum_{i=0}^\infty \varepsilon^j_2(i)2^{-i},$$
define $t^j_k$ and $s^j_k$ as these sums for $i \leq k$ (cf. (2)), and let
$$t_k = (t^1_k, \ldots, t^d_k), \qquad s_k = (s^1_k, \ldots, s^d_k).$$
Then $\|t-s\| \leq 2^{-k}$ implies $|t^j_k - s^j_k| \leq 2^{-k}$, and as in Lemma 4 we get $t_k, s_k \in Z^d_k$ and $\|t_k - s_k\| \leq 2^{-k}$. We use (3) again and the fact that, as before, $t_{m+1}, t_m \in Z^d_{m+1}$, $\|t_{m+1} - t_m\| \leq 2^{-(m+1)}$. Then we get (4) again and finish the proof by the same argument as before. The lemma is proved.
Now we prove a version of Theorem 6. For an integer $n \geq 0$ denote
$$\Gamma_n(a) = \big\{x_\cdot : x_t \text{ is a real-valued function given on } [0,1]^d \text{ such that}$$
$$|x_t - x_s| \leq N(a)\|t-s\|^a \text{ for all } t, s \in \bigcup_m Z^d_m \text{ with } \|t-s\| \leq 2^{-n}\big\}.$$
16. Lemma. Let a random field $\xi_t$ be defined on $[0,1]^d$. Assume that there exist constants $\alpha > 0$, $\beta > 0$, $K < \infty$ such that
$$E|\xi_t - \xi_s|^\alpha \leq K\|t-s\|^{d+\beta}$$
provided $t, s \in [0,1]^d$. Then, for every $0 < a < \beta/\alpha$,
$$P\{\xi_\cdot \notin \Gamma_n(a)\} \leq 2^{-n(\beta - a\alpha)}KN(d,\alpha,\beta,a).$$
Proof. Let
$$A_n = \Big\{\omega : \sup_{m \geq n}\ \sup\big\{2^{ma}|\xi_t - \xi_s| : t, s \in Z^d_m,\ \|t-s\| \leq 2^{-m}\big\} > 1\Big\}.$$
For $\omega \notin A_n$ we get $\xi_\cdot(\omega) \in \Gamma_n(a)$ by Lemma 15. Hence, $P\{\xi_\cdot \notin \Gamma_n(a)\} \leq P(A_n)$. The probability of $A_n$ we again estimate by Chebyshev's inequality and estimate the $\alpha$th power of the sup through the sum of $\alpha$th powers of the random variables involved. For each $m$ the number of these random variables is not greater than the number of couples $t, s \in Z^d_m$ for which $\|t-s\| \leq 2^{-m}$ (and the number of disjoint ones is less than half this number). This number is not bigger than the number of points in $Z^d_m$ times $3^d$, the latter being the number of neighbors of $t$. Hence
$$P(A_n) \leq \sum_{m=n}^\infty (1 + 2^m)^d 3^d K 2^{ma\alpha} 2^{-m(d+\beta)} \leq 6^d K \sum_{m=n}^\infty 2^{-m(\beta - a\alpha)} = 2^{-n(\beta - a\alpha)} K 6^d (1 - 2^{-(\beta - a\alpha)})^{-1}.$$
The lemma is proved.
17. Theorem (Kolmogorov). Under the conditions of Lemma 16 the random field $\xi_t$ has a continuous modification.

Proof. By Lemma 16, with probability one, $\xi_\cdot$ belongs to one of the sets $\Gamma_n(a)$. The elements of these sets are uniformly continuous on $\bigcup_m Z^d_m$ and therefore can be redefined outside $\bigcup_m Z^d_m$ to become continuous on $[0,1]^d$. Hence, with probability one there exists a continuous function $\tilde\xi_t$ coinciding with $\xi_t$ on $\bigcup_m Z^d_m$. To finish the proof it suffices to repeat the end of the proof of Theorem 8. The theorem is proved.
5. Hints to exercises

1.7 It suffices to prove that $A^c \in \mathcal{F}^P$ if $P(A) = 0$.

2.6 To prove (i) $\Longrightarrow$ (ii), observe that, for every $k \geq 1$, in the $1/k$-neighborhood of a point from a $1/k$-net there are infinitely many elements of $\{x_n\}$, which allows one to choose a Cauchy subsequence. To prove (ii) $\Longrightarrow$ (i), assume that for an $\varepsilon > 0$ there is no finite $\varepsilon$-net, and find a sequence of $x_n \in K$ such that $\rho(x_n, x_m) \geq \varepsilon/3$ for all $n, m$.

2.10 Assume the contrary.

2.14 Observe that $N_b(n)$ is the number of $i = 1, \ldots, n$ such that $10^k b \leq 2^i < 10^k(b+1)$ for some $k = 0, 1, 2, \ldots$, and then take $\log_{10}$.

2.15 Define
$$\bar f(x) = \lim_{\varepsilon \downarrow 0}\ \sup_{y : |y-x| < \varepsilon} f(y), \qquad \underline f(x) = \lim_{\varepsilon \downarrow 0}\ \inf_{y : |y-x| < \varepsilon} f(y)$$
and prove that $\Delta_f = \{\bar f \neq \underline f\}$ and the sets $\{x : \underline f(x) < c\}$ and $\{x : \bar f(x) > c\}$ are open.

3.3 Attached points $t_1, \ldots, t_n$ and $n$ may vary, and $t_1, \ldots, t_n$ are not supposed to be distinct.

3.4 Show that the set of all such $A$ is a $\sigma$-field.

4.12 Let $\sigma_1^2 = E\xi^2$. Observe that $P(|\xi| \geq x) = P(|\xi/\sigma_1| \geq x/\sigma_1)$. Then in the integral $\int_{x/\sigma_1}^\infty \exp(-y^2/2)\,dy$ first replace $\sigma_1$ with $\sigma$ and after that divide and multiply the integrand by $y$.
Chapter 2

The Wiener Process

1. Brownian motion and the Wiener process

Robert Brown, an English botanist, observed (1828) that pollen grains suspended in water perform an unending chaotic motion. L. Bachelier (1900) derived the law governing the position $w_t$ at time $t$ of a single grain performing a one-dimensional Brownian motion starting at $a \in \mathbb{R}$ at time $t = 0$:
$$P_a\{w_t \in dx\} = p(t,a,x)\,dx, \tag{1}$$
where
$$p(t,a,x) = \frac{1}{\sqrt{2\pi t}}\,e^{-(x-a)^2/(2t)}$$
is the fundamental solution of the heat equation
$$\frac{\partial u}{\partial t} = \frac{1}{2}\,\frac{\partial^2 u}{\partial a^2}.$$
Bachelier (1900) also pointed out the Markovian nature of the Brownian path and used it to establish the law of maximum displacement
$$P_a\Big\{\max_{s \leq t}(w_s - a) \leq b\Big\} = \frac{2}{\sqrt{2\pi t}}\int_0^b e^{-x^2/(2t)}\,dx, \quad t > 0,\ b \geq 0.$$
Einstein (1905) also derived (1) from statistical mechanics considerations and applied it to the determination of molecular diameters. Bachelier was unable to obtain a clear picture of the Brownian motion, and his ideas were unappreciated at the time. This is not surprising, because the precise mathematical definition of the Brownian motion involves a measure on the path space, and even after the ideas of Borel, Lebesgue, and Daniell appeared, N. Wiener (1923) only constructed a Daniell integral on the path space which later was revealed to be the Lebesgue integral against a measure, the so-called Wiener measure.
The simplest model describing movement of a particle subject to hits by much smaller particles is the following. Let $\eta_k$, $k = 1, 2, \ldots$, be independent identically distributed random variables with $E\eta_k = 0$ and $E\eta^2_k = 1$. Fix an integer $n$, and at times $1/n, 2/n, \ldots$ let our particle experience instant displacements by $\eta_1 n^{-1/2}, \eta_2 n^{-1/2}, \ldots$. At moment zero let our particle be at zero. If
$$S_k := \eta_1 + \ldots + \eta_k,$$
then at moment $k/n$ our particle will be at the point $S_k/\sqrt{n}$ and will stay there during the time interval $[k/n, (k+1)/n)$. Since real Brownian motion has continuous paths, we replace our piecewise constant trajectory by a continuous piecewise linear one preserving its positions at times $k/n$. Thus we come to the process
$$\xi^n_t := S_{[nt]}/\sqrt{n} + (nt - [nt])\eta_{[nt]+1}/\sqrt{n}. \tag{2}$$
This process gives a very rough caricature of Brownian motion. Clearly, to get a better model we have to let $n \to \infty$. By the way, precisely this necessity dictates the intervals of time between collisions to be $1/n$ and the displacements due to collisions to be $\eta_k/\sqrt{n}$, since then $\xi^n_1$ is asymptotically normal with parameters $(0,1)$.
It turns out that under a very special organization of randomness, which
generates dierent
k
; k 1 for dierent n, one can get the situation where
the
n
t
converge for each uniformly on each nite interval of time. This
is a consequence of a very general result due to Skorokhod. We do not use
this result, conning ourselves to the weak convergence of the distributions
of
n

.
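For readers who like to experiment, here is a minimal numerical sketch of the piecewise linear process (2); Python with NumPy and the names `xi_n`, `n_steps` are our own choices, not anything used in the text. It uses Bernoulli $\pm1$ summands, for which $E\xi_k = 0$, $E\xi_k^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def xi_n(n, ts, rng):
    """Values of the piecewise linear process (2) at the times in ts,
    from one sample of n i.i.d. summands with mean 0 and variance 1."""
    xi = rng.choice([-1.0, 1.0], size=n)         # P(xi = 1) = P(xi = -1) = 1/2
    S = np.concatenate(([0.0], np.cumsum(xi)))   # S_0, ..., S_n
    k = np.minimum(np.floor(n * ts).astype(int), n - 1)  # [nt]
    # S_[nt]/sqrt(n) + (nt - [nt]) * xi_{[nt]+1} / sqrt(n)
    return (S[k] + (n * ts - k) * xi[k]) / np.sqrt(n)

ts = np.linspace(0.0, 1.0, 501)
for n_steps in (10, 100, 10000):
    path = xi_n(n_steps, ts, rng)
    print(n_steps, float(path[-1]))  # xi^n_1 is asymptotically N(0, 1)
```

As $n$ grows, sampled values of $\xi^n_1$ indeed look like draws from $N(0,1)$, in line with the central limit theorem.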
1. Lemma. The sequence of distributions of $\xi^n_\cdot$ in $C$ is relatively compact.

Proof. For simplicity we assume that $m_4 := E\xi_k^4 < \infty$, referring the reader to [Bi] for the proof in the general situation. Since $\xi^n_0 = 0$, by Theorem 1.4.7 it suffices to prove that
$$E|\xi^n_t - \xi^n_s|^4 \le N|t-s|^2 \quad \forall\, s,t \in [0,1], \tag{3}$$
where $N$ is independent of $n, t, s$.

Without loss of generality, assume that $s < t$. Denote $a_n = E(S_n)^4$. By virtue of the independence of the $\xi_k$ and the conditions $E\xi_k = 0$ and $E\xi_k^2 = 1$, we have
$$a_{n+1} = E(S_n + \xi_{n+1})^4 = a_n + 4ES_n^3\,E\xi_{n+1} + 6ES_n^2\,E\xi_{n+1}^2 + 4ES_n\,E\xi_{n+1}^3 + m_4 = a_n + 6n + m_4.$$
Hence (for instance, by induction),
$$a_n = 3n(n-1) + nm_4 \le 3n^2 + nm_4.$$
Furthermore, if $s$ and $t$ belong to the same interval $[k/n, (k+1)/n]$, then
$$|\xi^n_t - \xi^n_s| = \sqrt{n}\,|\xi_{k+1}|\,|t-s|, \qquad E|\xi^n_t - \xi^n_s|^4 = n^2 m_4 |t-s|^4 \le m_4|t-s|^2. \tag{4}$$
Now, consider the following picture, where $s$ and $t$ belong to different intervals of type $[k/n, (k+1)/n)$ and by crosses we denote points of type $k/n$:

[picture: a line with crosses at the points $k/n$; $s$ lies just before the cross $s_1$, and $t$ just after the cross $t_1$]

Clearly
$$s_1 - s \le t - s, \quad t - t_1 \le t - s, \quad t_1 - s_1 \le t - s, \quad (t_1 - s_1)/n \le (t_1 - s_1)^2,$$
$$s_1 = ([ns]+1)/n, \quad t_1 = [nt]/n, \quad [nt] - ([ns]+1) = n(t_1 - s_1).$$
Hence and from (4) and the inequality $(a+b+c)^4 \le 81(a^4 + b^4 + c^4)$ we conclude that
$$E|\xi^n_t - \xi^n_s|^4 \le 81E\big(|\xi^n_t - \xi^n_{t_1}|^4 + |\xi^n_{t_1} - \xi^n_{s_1}|^4 + |\xi^n_{s_1} - \xi^n_s|^4\big)$$
$$\le 162(t-s)^2 m_4 + 81E\big|S_{[nt]}/\sqrt{n} - S_{[ns]+1}/\sqrt{n}\big|^4 = 162(t-s)^2 m_4 + 81n^{-2}a_{[nt]-([ns]+1)}$$
$$\le 162(t-s)^2 m_4 + 243(t-s)^2 + 81(t_1 - s_1)m_4/n \le 243(m_4 + 1)|t-s|^2.$$
Thus for all positions of $s$ and $t$ we have (3) with $N = 243(m_4 + 1)$. The lemma is proved.
Remember yet another definition from probability theory. We say that a sequence $\xi_n$, $n \ge 1$, of $\mathbb{R}^k$-valued random variables is asymptotically normal with parameters $(m, R)$ if $F_{\xi_n}$ converges weakly to the Gaussian distribution with parameters $(m, R)$ (by $F_\xi$ we denote the distribution of a random variable $\xi$). Below we use the fact that the weak convergence of distributions is equivalent to the pointwise convergence of their characteristic functions.

2. Lemma. For every $0 \le t_1 < t_2 < \dots < t_k \le 1$ the vectors $(\xi^n_{t_1}, \xi^n_{t_2}, \dots, \xi^n_{t_k})$ are asymptotically normal with parameters $(0, (t_i \wedge t_j))$.

Proof. We only consider the case $k = 2$. Other $k$'s are treated similarly. We have
$$\lambda_1\xi^n_{t_1} + \lambda_2\xi^n_{t_2} = (\lambda_1+\lambda_2)S_{[nt_1]}/\sqrt{n} + \lambda_2\big(S_{[nt_2]} - S_{[nt_1]+1}\big)/\sqrt{n}$$
$$+\ \lambda_1(nt_1 - [nt_1])\xi_{[nt_1]+1}/\sqrt{n} + \lambda_2\xi_{[nt_1]+1}/\sqrt{n} + \lambda_2(nt_2 - [nt_2])\xi_{[nt_2]+1}/\sqrt{n}.$$
On the right, we have a sum of independent terms. In addition, the coefficients of $\xi_{[nt_1]+1}$ and $\xi_{[nt_2]+1}$ go to zero and
$$E\exp(ia_n\xi_{[nt]+1}) = E\exp(ia_n\xi_1) \to 1 \quad \text{as } a_n \to 0.$$
Finally, by the central limit theorem, for $\phi(\lambda) = E\exp(i\lambda\xi_1)$,
$$\lim_{n\to\infty}\phi^n(\lambda/\sqrt{n}) = e^{-\lambda^2/2}.$$
Hence,
$$\lim_{n\to\infty} Ee^{i(\lambda_1\xi^n_{t_1}+\lambda_2\xi^n_{t_2})} = \lim_{n\to\infty}\big(\phi((\lambda_1+\lambda_2)/\sqrt{n})\big)^{[nt_1]}\big(\phi(\lambda_2/\sqrt{n})\big)^{[nt_2]-[nt_1]-1}$$
$$= \exp\big({-\big((\lambda_1+\lambda_2)^2 t_1 + \lambda_2^2(t_2-t_1)\big)/2}\big)$$
$$= \exp\big({-\big(\lambda_1^2(t_1\wedge t_1) + 2\lambda_1\lambda_2(t_1\wedge t_2) + \lambda_2^2(t_2\wedge t_2)\big)/2}\big).$$
The lemma is proved.
3. Theorem (Donsker). The sequence of distributions $F_{\xi^n_\cdot}$ weakly converges on $C$ to a measure. This measure is called the Wiener measure.

Proof. Owing to Lemma 1, there is a sequence $n_i \to \infty$ such that $F_{\xi^{n_i}_\cdot}$ converges weakly to a measure $\mu$. By Exercise 1.2.10 it only remains to prove that the limit is independent of the choice of subsequences.

Let $F_{\xi^{m_i}_\cdot}$ be another weakly convergent subsequence and $\nu$ its limit. Fix $0 \le t_1 < t_2 < \dots < t_k \le 1$ and define a continuous function $\pi$ on $C$ by the formula $\pi(x_\cdot) = (x_{t_1}, \dots, x_{t_k})$. By Lemma 2, considering $\pi$ as a random element on $(C, \mathcal{B}(C), \mu)$, for every bounded continuous $f(x^1, \dots, x^k)$, we get
$$\int_{\mathbb{R}^k} f(x^1,\dots,x^k)\,\mu\pi^{-1}(dx) = \int_C f(x_{t_1},\dots,x_{t_k})\,\mu(dx_\cdot)$$
$$= \lim_i\int_C f(x_{t_1},\dots,x_{t_k})\,F_{\xi^{n_i}_\cdot}(dx_\cdot) = \lim_i Ef(\xi^{n_i}_{t_1},\dots,\xi^{n_i}_{t_k}) = Ef(\eta_1,\dots,\eta_k),$$
where $(\eta_1,\dots,\eta_k)$ is a random vector normally distributed with parameters $(0, (t_i\wedge t_j))$. One gets the same result considering $m_i$ instead of $n_i$. By Theorem 1.2.4, we conclude that $\mu\pi^{-1} = \nu\pi^{-1}$. This means that for every Borel $B^{(k)} \subset \mathbb{R}^k$ the measures $\mu$ and $\nu$ coincide on the set $\{x_\cdot : (x_{t_1},\dots,x_{t_k}) \in B^{(k)}\}$. The collection of all such sets (with varying $k, t_1,\dots,t_k$) is an algebra. By a result from measure theory, a measure on a $\sigma$-field is uniquely determined by its values on an algebra generating the $\sigma$-field. Thus $\mu = \nu$ on $\mathcal{B}(C)$, and the theorem is proved.

Below we will need the conclusion of the last argument from the above proof, showing that there can be only one measure on $\mathcal{B}(C)$ with given values on finite-dimensional cylinder subsets of $C$.

4. Remark. Since Gaussian distributions are uniquely determined by their means and covariances, finite-dimensional distributions of Gaussian processes are uniquely determined by mean value and covariance functions. Hence, given a continuous Gaussian process $\xi_t$, its distribution on $(C, \mathcal{B}(C))$ is uniquely determined by the functions $m_t$ and $R(s,t)$.

5. Definition. By a Wiener process we mean a continuous Gaussian process on $[0,1]$ with $m_t \equiv 0$ and $R(s,t) = s\wedge t$.

As follows from above, the distributions of all Wiener processes on $(C, \mathcal{B}(C))$ coincide if the processes exist at all.

6. Exercise*. Prove that if $w_t$ is a Wiener process on $[0,1]$ and $c$ is a constant with $c \ge 1$, then $cw_{t/c^2}$ is also a Wiener process on $[0,1]$. This property is called self-similarity of the Wiener process.

7. Theorem. There exists a Wiener process, and its distribution on $(C, \mathcal{B}(C))$ is the Wiener measure.

Proof. Let $\mu$ be the Wiener measure. On the probability space $(C, \mathcal{B}(C), \mu)$ define the process $w_t(x_\cdot) = x_t$. Then, for every $0 \le t_1 < \dots < t_k \le 1$ and continuous bounded $f(x^1,\dots,x^k)$, as in the proof of Donsker's theorem, we have
$$Ef(w_{t_1},\dots,w_{t_k}) = \int_C f(x_{t_1},\dots,x_{t_k})\,\mu(dx_\cdot) = \lim_{n\to\infty} Ef(\xi^n_{t_1},\dots,\xi^n_{t_k}) = Ef(\eta_1,\dots,\eta_k),$$
where $\eta$ is a Gaussian vector with parameters $(0, (t_i\wedge t_j))$. Since $f$ is arbitrary, we see that the distributions of $(w_{t_1},\dots,w_{t_k})$ and $(\eta_1,\dots,\eta_k)$ coincide, and hence $(w_{t_1},\dots,w_{t_k})$ is Gaussian with parameters $(0, (t_i\wedge t_j))$. Thus, $w_t$ is a Gaussian process, $Ew_{t_i} = 0$, and $R(t_i,t_j) = Ew_{t_i}w_{t_j} = E\eta_i\eta_j = t_i\wedge t_j$. The theorem is proved.

This theorem and the remark before it show that the limit in Donsker's theorem is independent of the distributions of the $\xi_k$ as long as $E\xi_k = 0$ and $E\xi_k^2 = 1$. In this framework Donsker's theorem is called the invariance principle (although there is no more "invariance" in this theorem than in the central limit theorem).
2. Some properties of the Wiener process

First we prove two criteria for a process to be a Wiener process.

1. Theorem. A continuous process $w_t$ on $[0,1]$ is a Wiener process if and only if
(i) $w_0 = 0$ (a.s.),
(ii) $w_t - w_s$ is normal with parameters $(0, |t-s|)$ for every $s,t \in [0,1]$,
(iii) $w_{t_1},\, w_{t_2}-w_{t_1},\,\dots,\, w_{t_n}-w_{t_{n-1}}$ are independent for every $n \ge 2$ and $0 \le t_1 \le t_2 \le \dots \le t_n \le 1$.

Proof. First assume that $w_t$ is a Wiener process. We have $w_0 \sim N(0,0)$, hence $w_0 = 0$ (a.s.). Next take $0 \le t_1 \le t_2 \le \dots \le t_n \le 1$ and let
$$\eta_1 = w_{t_1}, \quad \eta_2 = w_{t_2} - w_{t_1}, \quad \dots, \quad \eta_n = w_{t_n} - w_{t_{n-1}}.$$
The vector $\eta = (\eta_1,\dots,\eta_n)$ is a linear transform of $(w_{t_1},\dots,w_{t_n})$. Therefore $\eta$ is Gaussian. In particular $\eta_i$ and, generally, $w_t - w_s$ are Gaussian. Obviously, $E\eta_i = 0$ and, for $i > j$,
$$E\eta_i\eta_j = E(w_{t_i}-w_{t_{i-1}})(w_{t_j}-w_{t_{j-1}}) = Ew_{t_i}w_{t_j} - Ew_{t_{i-1}}w_{t_j} - Ew_{t_i}w_{t_{j-1}} + Ew_{t_{i-1}}w_{t_{j-1}}$$
$$= t_j - t_j - t_{j-1} + t_{j-1} = 0.$$
Similarly, the equality $Ew_tw_s = s\wedge t$ implies that $E|w_t - w_s|^2 = |t-s|$. Thus $w_t - w_s \sim N(0,|t-s|)$, and we have proved (ii). In addition $\eta_i \sim N(0, t_i - t_{i-1})$, $E\eta_i^2 = t_i - t_{i-1}$, and
$$E\exp\Big(i\sum_k\lambda_k\eta_k\Big) = \exp\Big(-\frac12\sum_{k,r}\lambda_k\lambda_r\,\mathrm{cov}(\eta_k,\eta_r)\Big) = \exp\Big(-\frac12\sum_k\lambda_k^2(t_k - t_{k-1})\Big) = \prod_k E\exp(i\lambda_k\eta_k).$$
This proves (iii).

Conversely, let $w_t$ be a continuous process satisfying (i) through (iii). Again take $0 \le t_1 \le t_2 \le \dots \le t_n \le 1$ and the same $\eta_i$'s. From (i) through (iii), it follows that $(\eta_1,\dots,\eta_n)$ is a Gaussian vector. Since $(w_{t_1},\dots,w_{t_n})$ is a linear function of $(\eta_1,\dots,\eta_n)$, $(w_{t_1},\dots,w_{t_n})$ is also a Gaussian vector; hence $w_t$ is a Gaussian process. Finally, for every $t_1, t_2 \in [0,1]$ satisfying $t_1 \le t_2$, we have
$$m_{t_1} = E\eta_1 = 0, \quad R(t_1,t_2) = R(t_2,t_1) = Ew_{t_1}w_{t_2} = E\eta_1(\eta_1+\eta_2) = E\eta_1^2 = t_1 = t_1\wedge t_2.$$
The theorem is proved.
2. Theorem. A continuous process $w_t$ on $[0,1]$ is a Wiener process if and only if
(i) $w_0 = 0$ (a.s.),
(ii) $w_t - w_s$ is normal with parameters $(0,|t-s|)$ for every $s,t \in [0,1]$,
(iii) for every $n \ge 2$ and $0 \le t_1 \le t_2 \le \dots \le t_n \le 1$, the random variable $w_{t_n} - w_{t_{n-1}}$ is independent of $w_{t_1}, w_{t_2}, \dots, w_{t_{n-1}}$.

Proof. It suffices to prove that properties (iii) of this and the previous theorems are equivalent under the condition that (i) and (ii) hold. We are going to use the notation from the previous proof. If (iii) of the present theorem holds, then
$$E\exp\Big(i\sum_{k=1}^n\lambda_k\eta_k\Big) = E\exp(i\lambda_n\eta_n)\,E\exp\Big(i\sum_{k=1}^{n-1}\lambda_k\eta_k\Big),$$
since $(\eta_1,\dots,\eta_{n-1})$ is a function of $(w_{t_1},\dots,w_{t_{n-1}})$. By induction,
$$E\exp\Big(i\sum_{k=1}^n\lambda_k\eta_k\Big) = \prod_k E\exp(i\lambda_k\eta_k).$$
This proves property (iii) of the previous theorem. Conversely, if (iii) of the previous theorem holds, then one can carry out the same computation in the opposite direction and get that $\eta_n$ is independent of $(\eta_1,\dots,\eta_{n-1})$ and of $(w_{t_1},\dots,w_{t_{n-1}})$, since the latter is a function of the former. The theorem is proved.
3. Theorem (Bachelier). For every $t \in (0,1]$ we have $\max_{s\le t}w_s \overset{d}{=} |w_t|$, which is to say that for every $x \ge 0$
$$P\{\max_{s\le t}w_s \le x\} = \frac{2}{\sqrt{2\pi t}}\int_0^x e^{-y^2/(2t)}\,dy.$$

Proof. Take independent identically distributed random variables $\xi_k$ so that $P(\xi_k = 1) = P(\xi_k = -1) = 1/2$, and define $\xi^n_t$ by (1.2). First we want to find the distribution of
$$\eta^n = \max_{[0,1]}\xi^n_t = n^{-1/2}\max_{k\le n}S_k.$$
Observe that, for each $n$, the sequence $(S_1,\dots,S_n)$ takes its every particular value with the same probability $2^{-n}$. In addition, for each integer $i > 0$, the number of sequences favorable for the events
$$\{\max_{k\le n}S_k \ge i,\ S_n < i\} \quad\text{and}\quad \{\max_{k\le n}S_k \ge i,\ S_n > i\} \tag{1}$$
is the same. One proves this by using the reflection principle; that is, one takes each sequence favorable for the first event, keeps it until the moment when it reaches the level $i$ and then reflects its remaining part about this level. This implies equality of the probabilities of the events in (1). Furthermore, due to the fact that $i$ is an integer, we have
$$\{\eta^n \ge in^{-1/2},\ \xi^n_1 < in^{-1/2}\} = \{\max_{k\le n}S_k \ge i,\ S_n < i\}$$
and
$$\{\eta^n \ge in^{-1/2},\ \xi^n_1 > in^{-1/2}\} = \{\max_{k\le n}S_k \ge i,\ S_n > i\}.$$
Hence,
$$P\{\eta^n \ge in^{-1/2},\ \xi^n_1 < in^{-1/2}\} = P\{\eta^n \ge in^{-1/2},\ \xi^n_1 > in^{-1/2}\}.$$
Moreover, obviously,
$$P\{\eta^n \ge in^{-1/2},\ \xi^n_1 > in^{-1/2}\} = P\{\xi^n_1 > in^{-1/2}\},$$
$$P\{\eta^n \ge in^{-1/2}\} = P\{\eta^n \ge in^{-1/2},\ \xi^n_1 > in^{-1/2}\} + P\{\eta^n \ge in^{-1/2},\ \xi^n_1 < in^{-1/2}\} + P\{\xi^n_1 = in^{-1/2}\}.$$
It follows that
$$P\{\eta^n \ge in^{-1/2}\} = 2P\{\xi^n_1 > in^{-1/2}\} + P\{\xi^n_1 = in^{-1/2}\} \tag{2}$$
for every integer $i > 0$. The last equality also obviously holds for $i = 0$. We see that for numbers $a$ of type $in^{-1/2}$, where $i$ is a nonnegative integer, we have
$$P\{\eta^n \ge a\} = 2P\{\xi^n_1 > a\} + P\{\xi^n_1 = a\}. \tag{3}$$
Certainly, the last probability goes to zero as $n \to \infty$ since $\xi^n_1$ is asymptotically normal with parameters $(0,1)$. Also, keeping in mind Donsker's theorem, it is natural to think that
$$P\{\max_{s\le 1}\xi^n_s \ge a\} \to P\{\max_{s\le 1}w_s \ge a\}, \qquad 2P\{\xi^n_1 > a\} \to 2P\{w_1 > a\}.$$
Therefore, (3) naturally leads to the conclusion that
$$P\{\max_{s\le 1}w_s \ge a\} = 2P\{w_1 > a\} = P\{|w_1| > a\} \quad \forall\, a \ge 0,$$
and this is our statement for $t = 1$.

To justify the above argument, notice that (2) implies that
$$P\{\eta^n = in^{-1/2}\} = P\{\eta^n \ge in^{-1/2}\} - P\{\eta^n \ge (i+1)n^{-1/2}\}$$
$$= 2P\{\xi^n_1 = (i+1)n^{-1/2}\} + P\{\xi^n_1 = in^{-1/2}\} - P\{\xi^n_1 = (i+1)n^{-1/2}\}$$
$$= P\{\xi^n_1 = (i+1)n^{-1/2}\} + P\{\xi^n_1 = in^{-1/2}\}, \quad i \ge 0.$$
Now for every bounded continuous function $f(x)$ which vanishes for $x < 0$ we get
$$Ef(\eta^n) = \sum_{i=0}^\infty f(in^{-1/2})P\{\eta^n = in^{-1/2}\} = Ef(\xi^n_1 - n^{-1/2}) + Ef(\xi^n_1).$$
By Donsker's theorem and by the continuity of the function $x_\cdot \to \max_{[0,1]}x_t$ we have
$$Ef(\max_{[0,1]}w_t) = 2Ef(w_1) = Ef(|w_1|).$$
We have proved our statement for $t = 1$. For smaller $t$ one uses Exercise 1.6, saying that $cw_{s/c^2}$ is a Wiener process for $s \in [0,1]$ if $c \ge 1$. The theorem is proved.
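A quick Monte Carlo sanity check of the identity $\max_{s\le1}w_s \overset{d}{=} |w_1|$ can be run on the random walk approximation; the sketch below (Python/NumPy, our illustration with hypothetical names) compares the empirical tail frequencies of the running maximum and of $|w_1|$ at a few levels.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 1000, 5000                       # walk length, number of samples
xi = rng.choice([-1.0, 1.0], size=(m, n))
S = np.cumsum(xi, axis=1) / np.sqrt(n)  # S_k / sqrt(n), k = 1..n
running_max = S.max(axis=1)             # approximates max_{s<=1} w_s
w1 = S[:, -1]                           # approximates w_1

for a in (0.5, 1.0, 1.5):
    p_max = (running_max >= a).mean()
    p_abs = (np.abs(w1) >= a).mean()
    print(a, round(p_max, 3), round(p_abs, 3))  # the two columns should agree
```

Up to Monte Carlo and discretization error, the two columns match, which is exactly the content of the reflection argument above.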
4. Theorem (on the modulus of continuity). Let $w_t$ be a Wiener process on $[0,1]$, $1/2 > \varepsilon > 0$. Then for almost every $\omega$ there exists $n \ge 0$ such that for each $s,t \in [0,1]$ satisfying $|t-s| \le 2^{-n}$, we have
$$|w_t - w_s| \le N|t-s|^{1/2-\varepsilon},$$
where $N$ depends only on $\varepsilon$. In particular, $|w_t| = |w_t - w_0| \le Nt^{1/2-\varepsilon}$ for $t \le 2^{-n}$.

Proof. Take a number $\beta > 2$ and denote $\gamma = \beta/2 - 1$. Let $\xi \sim N(0,1)$. Since $w_t - w_s \sim N(0,|t-s|)$, we have $w_t - w_s \overset{d}{=} \xi|t-s|^{1/2}$. Hence
$$E|w_t - w_s|^\beta = |t-s|^{\beta/2}E|\xi|^\beta = N_1(\beta)|t-s|^{1+\gamma}.$$
Next, let
$$K_n(a) = \{x_\cdot \in C : |x_0| \le 2^n,\ |x_t - x_s| \le N(a)|t-s|^a\ \ \forall\,|t-s| \le 2^{-n}\}.$$
By Theorem 1.4.6, for $0 < a < \gamma/\beta$, we have
$$P\Big\{w_\cdot \in \bigcup_{n=1}^\infty K_n(a)\Big\} = 1.$$
Therefore, for almost every $\omega$ there exists $n \ge 0$ such that for all $s,t \in [0,1]$ satisfying $|t-s| \le 2^{-n}$, we have $|w_t(\omega) - w_s(\omega)| \le N(a)|t-s|^a$. It only remains to observe that we can take $a = 1/2 - \varepsilon$ if from the very beginning we take $\beta > 1/\varepsilon$ (for instance $\beta = 2/\varepsilon$). The theorem is proved.

5. Exercise. Prove that there exists a constant $N$ such that for almost every $\omega$ there exists $n \ge 0$ such that for each $s,t \in [0,1]$ satisfying $|t-s| \le 2^{-n}$, we have
$$|w_t - w_s| \le N\sqrt{|t-s|\,(-\ln|t-s|)}.$$
The result of Exercise 5 is not far from the best possible. P. Lévy proved that
$$\lim_{\substack{0\le s<t\le 1\\ u=t-s\downarrow 0}}\frac{|w_t - w_s|}{\sqrt{2u(-\ln u)}} = 1 \quad \text{(a.s.)}.$$
6. Theorem (on quadratic variation). Let $0 = t_{0n} \le t_{1n} \le \dots \le t_{k_nn} = 1$ be a sequence of partitions of $[0,1]$ such that $\max_i(t_{i+1,n} - t_{in}) \to 0$ as $n \to \infty$. Also let $0 \le s \le t \le 1$. Then, in probability as $n \to \infty$,
$$\sum_{s\le t_{in}\le t_{i+1,n}\le t}(w_{t_{i+1,n}} - w_{t_{in}})^2 \to t - s. \tag{4}$$

Proof. Let
$$\eta_n := \sum_{s\le t_{in}\le t_{i+1,n}\le t}(w_{t_{i+1,n}} - w_{t_{in}})^2$$
and observe that $\eta_n$ is a sum of independent random variables. Also use that if $\xi \sim N(0,\sigma^2)$, then $\xi \overset{d}{=} \sigma\eta$, where $\eta \sim N(0,1)$, and $\mathrm{Var}\,\xi^2 = \sigma^4\,\mathrm{Var}\,\eta^2$. Then, for $N := \mathrm{Var}\,\eta^2$, we obtain
$$\mathrm{Var}\,\eta_n = \sum_{s\le t_{in}\le t_{i+1,n}\le t}\mathrm{Var}\big[(w_{t_{i+1,n}} - w_{t_{in}})^2\big] = N\sum_{s\le t_{in}\le t_{i+1,n}\le t}(t_{i+1,n} - t_{in})^2$$
$$\le N\max_i(t_{i+1,n} - t_{in})\sum_{0\le t_{in}\le t_{i+1,n}\le 1}(t_{i+1,n} - t_{in}) = N\max_i(t_{i+1,n} - t_{in}) \to 0.$$
In particular, $\eta_n - E\eta_n \to 0$ in probability. In addition,
$$E\eta_n = \sum_{s\le t_{in}\le t_{i+1,n}\le t}(t_{i+1,n} - t_{in}) \to t - s.$$
Hence $\eta_n - (t-s) = \eta_n - E\eta_n + E\eta_n - (t-s) \to 0$ in probability, and the theorem is proved.

7. Exercise. Prove that if $t_{in} = i/2^n$, then the convergence in (4) holds almost surely.
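The convergence (4) is easy to watch numerically. The following sketch (Python/NumPy; an illustration of ours, not part of the text) evaluates the dyadic sums of Exercise 7 on one approximate Brownian path with $s = 0$, $t = 1$ and prints their drift toward $t - s = 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2**16                                   # fine simulation grid
dw = rng.standard_normal(N) * np.sqrt(1.0 / N)
w = np.concatenate(([0.0], np.cumsum(dw)))  # w at times j/N, j = 0..N

for n in range(2, 11):                      # partitions t_in = i / 2^n
    idx = np.arange(0, N + 1, N // 2**n)
    incr = np.diff(w[idx])
    print(n, float(np.sum(incr**2)))        # should approach t - s = 1
```

Note that the same path whose squared increments stabilize at 1 has absolute increments summing to something that blows up as the partition is refined, which is the content of Corollary 9 below.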
8. Corollary. It is not true that there exist functions $\varepsilon(\omega)$ and $N(\omega)$ such that with positive probability $\varepsilon(\omega) > 0$, $N(\omega) < \infty$, and
$$|w_t(\omega) - w_s(\omega)| \le N(\omega)|t-s|^{1/2+\varepsilon(\omega)}$$
whenever $t,s \in [0,1]$ and $|t-s| \le \varepsilon(\omega)$.

Indeed, if $|w_t(\omega) - w_s(\omega)| \le N(\omega)|t-s|^{1/2+\varepsilon(\omega)}$ for $|t-s|$ sufficiently small, then
$$\sum_i(w_{t_{i+1,n}}(\omega) - w_{t_{in}}(\omega))^2 \le N^2\sum_i(t_{i+1,n} - t_{in})^{1+2\varepsilon} \to 0.$$

9. Corollary. $P\{\mathrm{Var}_{[0,1]}w_t = \infty\} = 1$.

This follows from the fact that, owing to the continuity of $w_t$,
$$\sum_i(w_{t_{i+1,n}}(\omega) - w_{t_{in}}(\omega))^2 \le \max_i|w_{t_{i+1,n}}(\omega) - w_{t_{in}}(\omega)|\,\mathrm{Var}_{[0,1]}w_t(\omega) \to 0$$
if $\mathrm{Var}_{[0,1]}w_t(\omega) < \infty$.

10. Exercise. Let $w_t$ be a one-dimensional Wiener process. Find
$$P\{\max_{s\le 1}w_s \ge b,\ w_1 \le a\}.$$

The following exercise is a particular case of the Cameron–Martin theorem regarding the process $w_t - \int_0^t f_s\,ds$ with nonrandom $f$. Its extremely powerful generalization for random $f$ is known as Girsanov's Theorem 6.8.8.

11. Exercise. Let $w_t$ be a one-dimensional Wiener process on a probability space $(\Omega,\mathcal{F},P)$. Prove that
$$Ee^{w_t - t/2} = 1.$$
Introduce a new measure by $Q(d\omega) = e^{w_1 - 1/2}P(d\omega)$. Prove that $(\Omega,\mathcal{F},Q)$ is a probability space, and that $w_t - t$, $t \in [0,1]$, is a Wiener process on $(\Omega,\mathcal{F},Q)$.

12. Exercise. By using the results in Exercise 11 and the fact that the distributions on $(C,\mathcal{B}(C))$ of Wiener processes coincide, show that
$$P\{\max_{s\le 1}[w_s + s] \ge a\} = Ee^{w_1 - 1/2}I_{\max_{s\le 1}w_s \ge a}.$$
Then by using the result in Exercise 10, compute the last expectation.

Unboundedness of the variation of Wiener trajectories makes it hard to justify the following argument. In real situations the variance of Brownian motion of pollen grains should depend on the water temperature. If the temperature is piecewise constant taking constant value on each interval of a partition $0 \le t_1 < t_2 < \dots < t_n = 1$, then the trajectory can be modeled by
$$\sum_{t_{i+1}\le t}(w_{t_{i+1}} - w_{t_i})f_i + (w_t - w_{t_k})f_k,$$
where $k = \max\{i : t_i \le t\}$ and the factor $f_i$ reflects the dependence of the variance on temperature for $t \in [t_i, t_{i+1})$. The difficulty comes when one tries to pass from piecewise constant temperatures to continuously changing ones, because the sum should converge to an integral against $w_t$ as we make partitions finer and finer. On the other hand, the integral against $w_t$ is not defined since the variation of $w_t$ is infinite for almost each $\omega$. Yet there is a rather narrow class of functions $f$, namely functions of bounded variation, for which one can define the Riemann integral against $w_t$ pathwise (see Theorem 3.22). For more general functions one defines the integral against $w_t$ in the mean-square sense.
3. Integration against random orthogonal measures

The reader certainly knows the basics of the theory of $L_p$ spaces, which can be found, for instance, in [Du] and which we only need for $p = 1$ and $p = 2$. Our approach to integration against random orthogonal measures requires a version of this theory which starts with introducing step functions using not all measurable sets but rather some collection of them. Actually, the version is quite parallel to the usual theory, and what follows below should be considered as just a reminder of the general scheme of the theory of $L_p$ spaces.

Let $X$ be a set, $\Pi$ some family of subsets of $X$, $\mathcal{A}$ a $\sigma$-algebra of subsets of $X$, and $\mu$ a measure on $(X,\mathcal{A})$. Suppose that $\Pi \subset \mathcal{A}$ and $\Pi_0 := \{\Delta \in \Pi : \mu(\Delta) < \infty\} \ne \emptyset$. Let $S(\Pi) = S(\Pi,\mu)$ denote the set of all step functions, that is, functions of type
$$\sum_{i=1}^n c_iI_{\Delta(i)}(x),$$
where the $c_i$ are complex numbers, $\Delta(i) \in \Pi_0$ (not $\Pi$!), and $n < \infty$ is an integer. For $p \in [1,\infty)$, let $L_p(\Pi,\mu)$ denote the set of all $\mathcal{A}^\mu$-measurable complex-valued functions $f$ on $X$ for each of which there exists a sequence $f_n \in S(\Pi)$ such that
$$\int_X|f - f_n|^p\,\mu(dx) \to 0 \quad \text{as } n \to \infty. \tag{1}$$
A sequence $f_n \in S(\Pi)$ that satisfies (1) will be called a defining sequence for $f$. From the convexity of $|t|^p$, we infer that $|a+b|^p \le 2^{p-1}|a|^p + 2^{p-1}|b|^p$, $|f|^p \le 2^{p-1}|f_n|^p + 2^{p-1}|f - f_n|^p$, and therefore, if $f \in L_p(\Pi,\mu)$, then
$$\|f\|_p := \Big(\int_X|f|^p\,\mu(dx)\Big)^{1/p} < \infty. \tag{2}$$
The expression $\|f\|_p$ is called the $L_p$ norm of $f$. For $p = 2$ it is also useful to define the scalar product $(f,g)$ of elements $f,g \in L_2(\Pi,\mu)$:
$$(f,g) := \int_X f\bar g\,\mu(dx). \tag{3}$$
This integral exists and is finite, since $|f\bar g| \le |f|^2 + |g|^2$. The expression $\|f - g\|_p$ defines a distance in $L_p(\Pi,\mu)$ between the elements $f,g \in L_p(\Pi,\mu)$. It is almost a metric on $L_p(\Pi,\mu)$, in the sense that, although the equality $\|f-g\|_p = 0$ implies that $f = g$ only almost everywhere with respect to $\mu$, nevertheless $\|f-g\|_p = \|g-f\|_p$ and the triangle inequality holds:
$$\|f+g\|_p \le \|f\|_p + \|g\|_p.$$
If $f_n, f \in L_p(\Pi,\mu)$ and $\|f_n - f\|_p \to 0$ as $n \to \infty$, we will naturally say that $f_n$ converges to $f$ in $L_p(\Pi,\mu)$. If $\|f_n - f_m\|_p \to 0$ as $n,m \to \infty$, we will call $f_n$ a Cauchy sequence in $L_p(\Pi,\mu)$. The following results are useful. For their proofs we refer the reader to [Du].

1. Theorem. (i) If $f_n$ is a Cauchy sequence in $L_p(\Pi,\mu)$, then there exists a subsequence $f_{n(k)}$ such that $f_{n(k)}$ has a limit $\mu$-a.e. as $k \to \infty$.
(ii) $L_p(\Pi,\mu)$ is a linear space, that is, if $a,b$ are complex numbers and $f,g \in L_p(\Pi,\mu)$, then $af + bg \in L_p(\Pi,\mu)$.
(iii) $L_p(\Pi,\mu)$ is a complete space, that is, for every Cauchy sequence $f_n \in L_p(\Pi,\mu)$, there exists an $\mathcal{A}$-measurable function $f$ for which (1) is true; in addition, every $\mathcal{A}^\mu$-measurable function $f$ that satisfies (1) for some sequence $f_n \in L_p(\Pi,\mu)$ is an element of $L_p(\Pi,\mu)$.

2. Exercise*. Prove that if $\Pi$ is a $\sigma$-field, then $L_p(\Pi,\mu)$ is simply the set of all $\Pi$-measurable functions $f$ that satisfy (2).

3. Exercise. Prove that if $\Pi_0$ consists of only one set $\Delta$, then $L_p(\Pi,\mu)$ is the set of all functions $\mu$-almost everywhere equal to a constant times the indicator of $\Delta$.

4. Exercise. Prove that if $(X,\mathcal{A},\mu) = ([0,1],\mathcal{B}[0,1],\ell)$ and $\Pi = \{(0,t] : t \in (0,1)\}$, then $L_p(\Pi,\mu)$ is the space of all Lebesgue measurable functions summable to the $p$th power on $[0,1]$.
We now proceed to the main contents of this section. Let $(\Omega,\mathcal{F},P)$ be a probability space and suppose that to every $\Delta \in \Pi_0$ there is assigned a random variable $\zeta(\Delta) = \zeta(\Delta,\omega)$.

5. Definition. We say that $\zeta$ is a random orthogonal measure with reference measure $\mu$ if (a) $E|\zeta(\Delta)|^2 < \infty$ for every $\Delta \in \Pi_0$, (b) $E\zeta(\Delta_1)\bar\zeta(\Delta_2) = \mu(\Delta_1\cap\Delta_2)$ for all $\Delta_1,\Delta_2 \in \Pi_0$.

6. Example. If $(X,\mathcal{A},\mu) = (\Omega,\mathcal{F},P)$ and $\Pi = \mathcal{A}$, then $\zeta(\Delta) := I_\Delta$ is a random orthogonal measure with reference measure $P$. In this case, for each $\omega$, $\zeta$ is just the Dirac measure concentrated at $\omega$.

Generally, random orthogonal measures are not measures for each $\omega$, because they need not even be defined on a $\sigma$-field. Actually, the situation is even more interesting, as the reader will see from Exercise 21.

7. Example. Let $w_t$ be a Wiener process on $[0,1]$ and
$$(X,\mathcal{A},\mu) = ([0,1],\mathcal{B}([0,1]),\ell).$$
Let $\Pi = \{[0,t] : t \in (0,1]\}$ and, for each $\Delta = [0,t] \in \Pi$, let $\zeta(\Delta) = w_t$. Then, for $\Delta_i = [0,t_i] \in \Pi$, we have
$$E\zeta(\Delta_1)\zeta(\Delta_2) = Ew_{t_1}w_{t_2} = t_1\wedge t_2 = \ell(\Delta_1\cap\Delta_2),$$
which shows that $\zeta$ is a random orthogonal measure with reference measure $\ell$.

8. Exercise*. Let $\tau_n$ be a sequence of independent random variables exponentially distributed with parameter 1. Define a sequence of random variables $\sigma_n = \tau_1 + \dots + \tau_n$ and the corresponding counting process
$$\pi_t = \sum_{n=1}^\infty I_{[\sigma_n,\infty)}(t).$$
Observe that $\pi_t$ is a function of locally bounded variation (at least for almost all $\omega$), so that the usual integral against $d\pi_t$ is well defined: if $f$ vanishes outside a finite interval, then
$$\int_0^\infty f(t)\,d\pi_t = \sum_{n=1}^\infty f(\sigma_n).$$
Prove that, for every bounded continuous real-valued function $f$ given on $\mathbb{R}$ and having compact support and every $s \in \mathbb{R}$,
$$\phi(s) := E\exp\Big(i\int_0^\infty f(s+t)\,d\pi_t\Big) = \exp\Big(\int_0^\infty\big(e^{if(s+t)} - 1\big)\,dt\Big).$$
Conclude from here that $\pi_t - \pi_s$ has Poisson distribution with parameter $|t-s|$. In particular, prove $E\pi_t = t$ and $E(\pi_t - t)^2 = t$. Also prove that $\pi_t$ is a process with independent increments, that is, $\pi_{t_2} - \pi_{t_1}, \dots, \pi_{t_{k+1}} - \pi_{t_k}$ are independent as long as the intervals $(t_j, t_{j+1}]$ are disjoint. The process $\pi_t$ is called a Poisson process with parameter 1.
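A simulation of the construction in Exercise 8 (a Python/NumPy sketch with names of our choosing) builds the $\sigma_n$ as cumulative exponential sums and checks $E\pi_t = t$ and $E(\pi_t - t)^2 = t$ empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_terms = 20000, 100
# sigma_n = tau_1 + ... + tau_n with tau_k ~ Exp(1), one row per sample path
sigma = np.cumsum(rng.exponential(1.0, size=(m, n_terms)), axis=1)

def pi_t(t):
    """pi_t = number of sigma_n <= t, computed for each of the m paths."""
    return (sigma <= t).sum(axis=1)

for t in (1.0, 5.0, 20.0):
    counts = pi_t(t)
    print(t, counts.mean(), counts.var())   # both should be close to t
```

With 100 exponential terms per path, counts up to $t = 20$ are captured with overwhelming probability, so the truncation of the infinite sum is harmless here.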
9. Example. Take the Poisson process $\pi_t$ from Exercise 8. Denote $m_t = \pi_t - t$. If $0 \le s \le t$, then
$$Em_sm_t = Em_s^2 + Em_s(m_t - m_s) = Em_s^2 = s = s\wedge t.$$
Therefore, if in Example 7 we replace $w_t$ with $m_t$, we again have a random orthogonal measure with reference measure $\ell$.

We will always assume that $\zeta$ satisfies the assumptions of Definition 5. Note that by Exercise 2 we have $\zeta(\Delta) \in L_2(\mathcal{F},P)$ for every $\Delta \in \Pi_0$. The word "orthogonal" in Definition 5 comes from the fact that if $\Delta_1\cap\Delta_2 = \emptyset$, then $\zeta(\Delta_1) \perp \zeta(\Delta_2)$ in the Hilbert space $L_2(\mathcal{F},P)$. The word "measure" is explained by the property that if $\Delta,\Delta_i \in \Pi_0$, the $\Delta_i$'s are pairwise disjoint, and $\Delta = \bigcup_i\Delta_i$, then $\zeta(\Delta) = \sum_i\zeta(\Delta_i)$, where the series converges in the mean-square sense. Indeed,
$$\lim_{n\to\infty}E\Big|\zeta(\Delta) - \sum_{i\le n}\zeta(\Delta_i)\Big|^2 = \lim_{n\to\infty}\Big[E|\zeta(\Delta)|^2 + \sum_{i\le n}E|\zeta(\Delta_i)|^2 - 2\,\mathrm{Re}\sum_{i\le n}E\zeta(\Delta)\bar\zeta(\Delta_i)\Big]$$
$$= \lim_{n\to\infty}\Big[\mu(\Delta) + \sum_{i\le n}\mu(\Delta_i) - 2\sum_{i\le n}\mu(\Delta_i)\Big] = 0.$$
Interestingly enough, our explanation of the word "measure" is void in Examples 7 and 9, since there is no $\Delta \in \Pi$ which is representable as a countable union of disjoint members of $\Pi$.

10. Lemma. Let $\Delta_i,\Gamma_j \in \Pi_0$, and let $c_i, d_j$ be complex numbers, $i = 1,\dots,n$, $j = 1,\dots,m$. Assume $\sum_{i\le n}c_iI_{\Delta_i} = \sum_{j\le m}d_jI_{\Gamma_j}$ ($\mu$-a.e.). Then
$$\sum_{i\le n}c_i\zeta(\Delta_i) = \sum_{j\le m}d_j\zeta(\Gamma_j) \quad \text{(a.s.)}, \tag{4}$$
$$E\Big|\sum_{i\le n}c_i\zeta(\Delta_i)\Big|^2 = \int_X\Big|\sum_{i\le n}c_iI_{\Delta_i}\Big|^2\,\mu(dx). \tag{5}$$

Proof. First we prove (5). We have
$$E\Big|\sum_{i\le n}c_i\zeta(\Delta_i)\Big|^2 = \sum_{i,j\le n}c_i\bar c_jE\zeta(\Delta_i)\bar\zeta(\Delta_j) = \sum_{i,j\le n}c_i\bar c_j\mu(\Delta_i\cap\Delta_j)$$
$$= \int_X\sum_{i,j\le n}c_i\bar c_jI_{\Delta_i}I_{\Delta_j}\,\mu(dx) = \int_X\Big|\sum_{i\le n}c_iI_{\Delta_i}\Big|^2\,\mu(dx).$$
Hence,
$$E\Big|\sum_{i\le n}c_i\zeta(\Delta_i) - \sum_{j\le m}d_j\zeta(\Gamma_j)\Big|^2 = \int_X\Big|\sum_{i\le n}c_iI_{\Delta_i} - \sum_{j\le m}d_jI_{\Gamma_j}\Big|^2\,\mu(dx) = 0.$$
The lemma is proved.

11. Remark. The first statement of the lemma looks quite surprising in the situation when $\mu$ is concentrated at only one point $x_0$. Then the equality $\sum_{i\le n}c_iI_{\Delta_i} = \sum_{j\le m}d_jI_{\Gamma_j}$ holds $\mu$-almost everywhere if and only if
$$\sum_{i\le n}c_iI_{\Delta_i}(x_0) = \sum_{j\le m}d_jI_{\Gamma_j}(x_0),$$
and this may hold for very different $c_i,\Delta_i,d_j,\Gamma_j$. Yet each time (4) holds true.
Next, on $S(\Pi)$ define an operator $I$ by the formula
$$I : \sum_{i\le n}c_iI_{\Delta_i} \to \sum_{i\le n}c_i\zeta(\Delta_i).$$
In the future we will always identify two elements of an $L_p$ space which coincide almost everywhere. Under this stipulation, Lemma 10 shows that $I$ is a well defined linear isometric operator from a subset $S(\Pi)$ of $L_2(\Pi,\mu)$ into $L_2(\mathcal{F},P)$. In addition, by definition $S(\Pi)$ is dense in $L_2(\Pi,\mu)$, and every isometric operator is uniquely extendible from a dense subspace to the whole space. By this we mean the following result, which we suggest as an exercise.

12. Lemma. Let $B_1$ and $B_2$ be Banach spaces and $B_0$ a linear subset of $B_1$. Let a linear isometric operator $I$ be defined on $B_0$ with values in $B_2$ ($|Ib|_{B_2} = |b|_{B_1}$ for every $b \in B_0$). Then there exists a unique linear isometric operator $\bar I : \bar B_0 \to B_2$ ($\bar B_0$ is the closure of $B_0$ in $B_1$) such that $\bar Ib = Ib$ for every $b \in B_0$.

Combining the above arguments, we arrive at the following.

13. Theorem. There exists a unique linear operator $I : L_2(\Pi,\mu) \to L_2(\mathcal{F},P)$ such that
(i) $I(\sum_{i\le n}c_iI_{\Delta_i}) = \sum_{i\le n}c_i\zeta(\Delta_i)$ (a.s.) for all finite $n$, $\Delta_i \in \Pi_0$ and complex $c_i$;
(ii) $E|If|^2 = \int_X|f|^2\,\mu(dx)$ for all $f \in L_2(\Pi,\mu)$.

For $f \in L_2(\Pi,\mu)$ we write
$$If = \int_X f(x)\,\zeta(dx)$$
and we call $If$ the stochastic integral of $f$ with respect to $\zeta$. Observe that, by continuity of $I$, to find $If$ it suffices to construct step functions $f_n$ converging to $f$ in the $L_2(\Pi,\mu)$ sense, and then
$$\int_X f(x)\,\zeta(dx) = \underset{n\to\infty}{\mathrm{l.i.m.}}\int_X f_n(x)\,\zeta(dx).$$
The operator $I$ preserves not only the norm but also the scalar product:
$$E\int_X f(x)\,\zeta(dx)\,\overline{\int_X g(x)\,\zeta(dx)} = \int_X f\bar g\,\mu(dx), \quad f,g \in L_2(\Pi,\mu). \tag{6}$$
This follows after comparing the coefficients of the complex parameter $\lambda$ in the equal (by Theorem 13) polynomials $E|I(f+\lambda g)|^2$ and $\int|f+\lambda g|^2\,\mu(dx)$.

14. Exercise. Take $\pi_t$ from Example 9. Prove that for every Borel $f \in L_2(0,1)$ the stochastic integral of $f$ against $\pi_t - t$ equals the usual integral; that is,
$$-\int_0^1 f(s)\,ds + \sum_{\sigma_n\le 1}f(\sigma_n).$$

15. Remark. If $E\zeta(\Delta) = 0$ for every $\Delta \in \Pi_0$, then for every $f \in L_2(\Pi,\mu)$, we have
$$E\int_X f\,\zeta(dx) = 0.$$
Indeed, for $f \in S(\Pi)$, this equality is verified directly; for arbitrary $f \in L_2(\Pi,\mu)$ it follows from the fact that, by Cauchy's inequality, for $f_n \in S(\Pi)$,
$$\Big|E\int_X f\,\zeta(dx)\Big|^2 = \Big|E\int_X(f - f_n)\,\zeta(dx)\Big|^2 \le E\Big|\int_X(f - f_n)\,\zeta(dx)\Big|^2 = \int_X|f - f_n|^2\,\mu(dx) \to 0.$$

We now proceed to the question as to when $L_p(\Pi,\mu)$ and $L_p(\mathcal{A},\mu)$ coincide, which is important in applications. Remember the following definitions.
16. Definition. Let $X$ be a set, $B$ a family of subsets of $X$. Then $B$ is called a $\pi$-system if $A_1\cap A_2 \in B$ for every $A_1, A_2 \in B$. It is called a $\lambda$-system if
(i) $X \in B$ and $A_2\setminus A_1 \in B$ for every $A_1, A_2 \in B$ such that $A_1 \subset A_2$;
(ii) for every $A_1, A_2, \dots \in B$ such that $A_i\cap A_j = \emptyset$ when $i \ne j$, we have $\bigcup_{n=1}^\infty A_n \in B$.

A typical example of $\lambda$-system is given by the collection of all subsets on which two given probability measures coincide.

17. Exercise*. Prove that if $B$ is both a $\pi$-system and a $\lambda$-system, then it is a $\sigma$-field.

A very important property of $\pi$- and $\lambda$-systems is stated as follows.

18. Lemma. If $\Pi$ is a $\pi$-system and $\Lambda$ is a $\lambda$-system and $\Pi \subset \Lambda$, then $\sigma(\Pi) \subset \Lambda$.

Proof. Let $\Lambda_1$ denote the smallest $\lambda$-system containing $\Pi$ ($\Lambda_1$ is the intersection of all $\lambda$-systems containing $\Pi$). It suffices to prove that $\Lambda_1 \supset \sigma(\Pi)$. To do this, it suffices to prove, by Exercise 17, that $\Lambda_1$ is a $\pi$-system, that is, it contains the intersection of every two of its sets. For $B \in \Lambda_1$ let $\Lambda(B)$ denote the family of all $A \in \Lambda_1$ such that $A\cap B \in \Lambda_1$. Obviously, $\Lambda(B)$ is a $\lambda$-system. In addition, if $B \in \Pi$, then $\Lambda(B) \supset \Pi$ (since $\Pi$ is a $\pi$-system). Consequently, if $B \in \Pi$, then by the definition of $\Lambda_1$, we have $\Lambda(B) \supset \Lambda_1$. But this means that $\Lambda(A) \supset \Pi$ for each $A \in \Lambda_1$, so that, as before, $\Lambda(A) \supset \Lambda_1$ for each $A \in \Lambda_1$, that is, $\Lambda_1$ is a $\pi$-system. The lemma is proved.

19. Theorem. Let $\mathcal{A}_1 = \sigma(\Pi)$. Assume that $\Pi$ is a $\pi$-system and that there exists a sequence $\Delta(1), \Delta(2), \dots \in \Pi_0$ such that $\Delta(n) \subset \Delta(n+1)$, $X = \bigcup_n\Delta(n)$. Then $L_p(\Pi,\mu) = L_p(\mathcal{A}_1,\mu)$.

Proof. Let $\Lambda$ denote the family of all subsets $A$ of $X$ such that
$$I_AI_{\Delta(n)} \in L_p(\Pi,\mu)$$
for every $n$. Observe that $\Lambda$ is a $\lambda$-system. Indeed, for instance, if $A_1, A_2, \dots \in \Lambda$ are pairwise disjoint and $A = \bigcup_kA_k$, then
$$I_AI_{\Delta(n)} = \sum_kI_{A_k}I_{\Delta(n)},$$
where the series converges in $L_p(\Pi,\mu)$ since $\bigcup_{k\ge m}A_k \downarrow \emptyset$ as $m \to \infty$, $\mu(\Delta(n)) < \infty$, and
$$\int_X\Big|\sum_{k\ge m}I_{A_k}I_{\Delta(n)}\Big|^p\,\mu(dx) = \int_X\sum_{k\ge m}I_{A_k}I_{\Delta(n)}\,\mu(dx) = \mu\Big(\Delta(n)\cap\bigcup_{k\ge m}A_k\Big) \to 0$$
as $m \to \infty$.

Since $\Lambda \supset \Pi$, because $\Pi$ is a $\pi$-system, it follows by Lemma 18 that $\Lambda \supset \mathcal{A}_1$. Consequently, it follows from the definition of $L_p(\mathcal{A}_1,\mu)$ that $I_{\Delta(n)}f \in L_p(\Pi,\mu)$ for $f \in L_p(\mathcal{A}_1,\mu)$ and $n \ge 1$. Finally, a straightforward application of the dominated convergence theorem shows that $\|I_{\Delta(n)}f - f\|_p \to 0$ as $n \to \infty$. Hence $f \in L_p(\Pi,\mu)$ if $f \in L_p(\mathcal{A}_1,\mu)$, and $L_p(\mathcal{A}_1,\mu) \subset L_p(\Pi,\mu)$. Since the reverse inclusion is obvious, the theorem is proved.
It turns out that, under the conditions of Theorem 19, one can extend $\zeta$ from $\Pi_0$ to the larger set $\mathcal{A}_0 := \{\Delta \in \sigma(\Pi) : \mu(\Delta) < \infty\}$. Indeed, for $\Delta \in \mathcal{A}_0$ we have $I_\Delta \in L_2(\Pi,\mu)$, so that the definition
$$\tilde\zeta(\Delta) = \int_XI_\Delta\,\zeta(dx)$$
makes sense. In addition, if $\Delta_1,\Delta_2 \in \mathcal{A}_0$, then by (6)
$$E\tilde\zeta(\Delta_1)\bar{\tilde\zeta}(\Delta_2) = E\int_XI_{\Delta_1}\,\zeta(dx)\,\overline{\int_XI_{\Delta_2}\,\zeta(dx)} = \int_XI_{\Delta_1}I_{\Delta_2}\,\mu(dx) = \mu(\Delta_1\cap\Delta_2).$$
Since obviously $\zeta(\Delta) = \tilde\zeta(\Delta)$ (a.s.) for every $\Delta \in \Pi_0$, we have an extension indeed. In Sec. 7 we will see that sometimes one can extend $\zeta$ even to a larger set than $\mathcal{A}_0$.

20. Exercise. Let $X \in \Pi_0$, and let $\Pi$ be a $\pi$-system. Show that if $\zeta_1$ and $\zeta_2$ are two extensions of $\zeta$ to $\sigma(\Pi)$, then
$$\int_Xf(x)\,\zeta_1(dx) = \int_Xf(x)\,\zeta_2(dx)$$
(a.s.) for every $f \in L_2(\sigma(\Pi),\mu)$. In particular, $\zeta_1(\Delta) = \zeta_2(\Delta)$ (a.s.) for any $\Delta \in \sigma(\Pi)$.

21. Exercise. Come back to Example 7. By what is said above there is an extension of $\zeta$ to $\mathcal{B}([0,1])$. By using the independence of increments of $w_t$, prove that
$$E\exp\Big(-\sum_n|\zeta((a_{n+1},a_n])|\Big) = 0,$$
where $a_n = 1/n$. Derive from here that for almost every $\omega$ the function $\zeta(\Delta)$, $\Delta \in \mathcal{B}([0,1])$, has unbounded variation and hence cannot be a measure.

Let us apply the above theory of stochastic integration to modeling Brownian motion when the temperature varies in time.

Take the objects introduced in Example 7. By Theorem 19 (and by Exercise 2), for every $f \in L_2(0,1)$ (where $L_2(0,1)$ is the usual $L_2$ space of square integrable functions on $(0,1)$) the stochastic integral $\int_Xf(t)\,\zeta(dt)$ is well defined. Usually, one writes this integral as
$$\int_0^1f(t)\,dw_t.$$
Observe that (by the continuity of the integral) if $f_n \to f$ in $L_2(0,1)$, then $\int_0^1f_n(t)\,dw_t \to \int_0^1f(t)\,dw_t$ in the mean-square sense. In addition, if
$$f_n(t) = \sum_if(t_{i+1,n})I_{t_{in}<t\le t_{i+1,n}} = \sum_if(t_{i+1,n})\big[I_{t\le t_{i+1,n}} - I_{t\le t_{in}}\big]$$
with $0 \le t_{in} \le t_{i+1,n} \le 1$, then (by definition and linearity)
$$\int_0^1f(t)\,dw_t = \underset{n\to\infty}{\mathrm{l.i.m.}}\int_0^1f_n(t)\,dw_t = \underset{n\to\infty}{\mathrm{l.i.m.}}\sum_if(t_{i+1,n})(w_{t_{i+1,n}} - w_{t_{in}}). \tag{7}$$
Naturally, the integral
$$\int_0^tf(s)\,dw_s := \int_0^1I_{s\le t}f(s)\,dw_s$$
gives us a representation of Brownian motion in the environment with changing temperature. However, for each individual $t$ this integral is an element of $L_2(\mathcal{F},P)$ and thus is uniquely defined only up to sets of probability zero. For describing individual trajectories of Brownian motion we should take an appropriate representative of $\int_0^tf(s)\,dw_s$ for each $t \in [0,1]$. At this moment it is absolutely not clear whether this choice can be performed so that we will have continuous trajectories, which is crucial from the practical point of view. Much later (see Theorem 6.1.10) we will prove that one can indeed make the right choice even when $f$ is a random function. The good news is that this issue can be easily settled at least for some functions $f$.
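To see (7) and the isometry $E(\int f\,dw)^2 = \int f^2\,dt$ of Theorem 13(ii) in action, here is a small sketch (Python/NumPy, our illustration; the names are ours): it forms the sums in (7) for $f(t) = \sin(2\pi t)$ on a fine grid, over many freshly simulated sets of increments of $w$.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda t: np.sin(2 * np.pi * t)
n, m = 500, 10000                      # grid size, Monte Carlo samples

t = np.linspace(0.0, 1.0, n + 1)
dw = rng.standard_normal((m, n)) * np.sqrt(1.0 / n)
# sum_i f(t_{i+1,n}) (w_{t_{i+1,n}} - w_{t_{i,n}}), as in (7)
integrals = dw @ f(t[1:])

print(integrals.mean())                # ~ 0 (Remark 15)
print(integrals.var())                 # ~ int_0^1 f^2 dt = 1/2
```

The empirical variance settles near $\int_0^1\sin^2(2\pi t)\,dt = 1/2$, as the isometry predicts.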
22. Theorem. Let $t \in [0,1]$, and let $f$ be absolutely continuous on $[0,t]$. Then
$$\int_0^tf(s)\,dw_s = f(t)w_t - \int_0^tw_sf'(s)\,ds \quad \text{(a.s.)}.$$

Proof. Define $t_{in} = ti/n$. Then the functions $f_n(s) := f(t_{in})$ for $s \in (t_{in}, t_{i+1,n}]$ converge to $f(s)$ uniformly on $[0,t]$, so that (cf. (7)) we have
$$\int_0^tf(s)\,dw_s = \int_0^1I_{s\le t}f(s)\,dw_s = \underset{n\to\infty}{\mathrm{l.i.m.}}\sum_{i\le n-1}f(t_{in})(w_{t_{i+1,n}} - w_{t_{in}})$$
$$= f(t)w_t - \underset{n\to\infty}{\mathrm{l.i.m.}}\sum_{i\le n-1}w_{t_{i+1,n}}\big(f(t_{i+1,n}) - f(t_{in})\big)$$
(summation by parts), where the last sum is written as
$$\int_0^tw_{\kappa(s,n)}f'(s)\,ds \tag{8}$$
with $\kappa(s,n) = t_{i+1,n}$ for $s \in (t_{in}, t_{i+1,n}]$. By the continuity of $w_s$ we have $w_{\kappa(s,n)} \to w_s$ uniformly on $[0,t]$, and by the dominated convergence theorem ($f'$ is integrable) we see that (8) converges to $\int_0^tw_sf'(s)\,ds$ for every $\omega$. It only remains to remember that the mean-square limit coincides (a.s.) with the pointwise limit if both exist. The theorem is proved.
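One can check Theorem 22 numerically by comparing, on a single simulated path, the Riemann-type sums with the pathwise expression $f(t)w_t - \int_0^tw_sf'(s)\,ds$; below is a sketch (Python/NumPy, our illustration, with $f(s) = e^{-s}$ chosen by us).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200000
t = np.linspace(0.0, 1.0, n + 1)
w = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n)) * np.sqrt(1.0 / n)))

f = np.exp(-t)                         # f(s) = e^{-s}
df = -np.exp(-t)                       # f'(s) = -e^{-s}

riemann = np.sum(f[:-1] * np.diff(w))  # sum f(t_in)(w_{t_{i+1,n}} - w_{t_in})
g = w * df                             # integrand w_s f'(s)
pathwise = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))  # trapezoidal rule
print(riemann, f[-1] * w[-1] - pathwise)  # the two numbers nearly coincide
```

Here the parts formula gives, path by path, a continuous-in-$t$ version of the stochastic integral, which is exactly why it is used in the proof of Theorem 4.1 below.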
23. Exercise*. Prove that if a real-valued $f \in L_2(0,1)$, then $\int_0^tf(s)\,dw_s$, $t \in [0,1]$, is a Gaussian process with zero mean and covariance
$$R(s,t) = \int_0^{s\wedge t}f^2(u)\,du = \Big(\int_0^sf^2(u)\,du\Big)\wedge\Big(\int_0^tf^2(u)\,du\Big).$$

The construction of the stochastic integral with respect to a random orthogonal measure is not specific to probability theory. We have considered the case in which $\zeta(\Delta) \in L_2(\mathcal{F},P)$, where $P$ is a probability measure. Our arguments could be repeated almost word for word for the case of an arbitrary measure. It would then turn out that the Fourier integral of $L_2$ functions is a particular case of integrals with respect to random orthogonal measure. In this connection we offer the reader the following exercise.

24. Exercise. Let $\Pi$ be the set of all intervals $(a,b]$, where $a,b \in (-\infty,\infty)$, $a < b$. For $\Delta = (a,b] \in \Pi$, define a function $\zeta(\Delta) = \zeta(\Delta,\lambda)$ on $(-\infty,\infty)$ by
$$\zeta(\Delta) = \frac{1}{i\lambda}\big(e^{i\lambda b} - e^{i\lambda a}\big) = \int_\Delta e^{i\lambda x}\,dx.$$
Define $L_p = L_p(\Pi,\ell) = L_p(\mathcal{B}(\mathbb{R}),\ell)$. Prove, using a change of variable, that the number $(\zeta(\Delta_1),\zeta(\Delta_2))$ equals its complex conjugate, that is, it is real, and that $\|\zeta(\Delta)\|_2^2 = c\,\ell(\Delta)$ for $\Delta_1,\Delta_2,\Delta \in \Pi$, where $c$ is a constant independent of $\Delta$. Use this and the observation that $\zeta(\Delta_1\cup\Delta_2) = \zeta(\Delta_1) + \zeta(\Delta_2)$ if $\Delta_1,\Delta_2,\Delta_1\cup\Delta_2 \in \Pi$, $\Delta_1\cap\Delta_2 = \emptyset$, to deduce that in that case $(\zeta(\Delta_1),\zeta(\Delta_2)) = 0$. Using the fact that $\Delta_1 = (\Delta_1\setminus\Delta_2)\cup(\Delta_1\cap\Delta_2)$ and adding an interval between $\Delta_1,\Delta_2$ if they do not intersect, prove that $(\zeta(\Delta_1),\zeta(\Delta_2)) = c\,\ell(\Delta_1\cap\Delta_2)$ for every $\Delta_1,\Delta_2 \in \Pi$ and, consequently, that we can construct an integral with respect to $\zeta$, such that Parseval's equality holds for every $f \in L_2$:
$$c\|f\|_2^2 = \Big\|\int f\,\zeta(dx)\Big\|_2^2.$$
Keeping in mind that for $f \in S(\Pi)$, obviously,
$$\int f\,\zeta(dx) = \int f(x)e^{i\lambda x}\,dx \quad \text{(a.e.)},$$
generalize this equality to all $f \in L_2\cap L_1$. Putting $f = \exp(-x^2)$ and using the characteristic function of the normal distribution, prove that $c = 2\pi$. Finally, use Fubini's theorem to prove that for $f \in L_1$ and $-\infty < a < b < \infty$, we have
$$\int_a^b\Big[\int_{-\infty}^\infty f(\lambda)e^{i\lambda x}\,d\lambda\Big]\,dx = \int_{-\infty}^\infty\frac{1}{i\lambda}\big(e^{i\lambda b} - e^{i\lambda a}\big)f(\lambda)\,d\lambda.$$
In other words, if $f \in L_1\cap L_2$, then $(\zeta(\Delta),f) = c(I_\Delta,g)$, where
$$g(x) = c^{-1}\int_{-\infty}^\infty f(\lambda)\,\overline{\zeta(x,d\lambda)}$$
(here $\zeta(x,d\lambda)$ is the same function with the roles of $x$ and $\lambda$ interchanged), and (by definition) this leads to the inversion formula for the Fourier transform:
$$f(\lambda) = \int g(x)\,\zeta(\lambda,dx).$$
Generalize this formula from the case $f \in L_1\cap L_2$ to all $f \in L_2$.
4. The Wiener process on $[0,\infty)$

The definition of the Wiener process on $[0,\infty)$ is the same as on $[0,1]$ (cf. Definition 1.5). Clearly for the Wiener process on $[0,\infty)$ one has the corresponding counterparts of Theorems 2.1 and 2.2 about the independence of increments and the independence of increments of previous values of the process. Also as in Exercise 1.6, if $w_t$ is a Wiener process on $[0,\infty)$ and $c$ is a strictly positive constant, then $cw_{t/c^2}$ is also a Wiener process on $[0,\infty)$. This property is called self-similarity of the Wiener process.

1. Theorem. There exists a Wiener process defined on $[0,\infty)$.

Proof. Take any smooth function $f(t) > 0$ on $[0,1)$ such that
$$\int_0^1f^2(t)\,dt = \infty.$$
Let $\tau(r)$ be the inverse function to $\int_0^tf^2(s)\,ds$. For $t < 1$ define
$$y(t) = f(t)w_t - \int_0^tw_sf'(s)\,ds.$$
Obviously $y(t)$ is a continuous process. By Theorem 3.22 we have
$$y(t) = \int_0^tf(s)\,dw_s = \int_0^1I_{s\le t}f(s)\,dw_s \quad \text{(a.s.)}.$$
By Exercise 3.23, $y_t$ is a Gaussian process with zero mean and covariance
$$\int_0^{s\wedge t}f^2(u)\,du = \Big(\int_0^sf^2(u)\,du\Big)\wedge\Big(\int_0^tf^2(u)\,du\Big), \quad s,t < 1.$$
Now, as is easy to see, $x(r) := y(\tau(r))$ is a continuous Gaussian process defined for $r \in [0,\infty)$ with zero mean and covariance $r_1\wedge r_2$. The theorem is proved.

Apart from the properties of the Wiener process on $[0,\infty)$ stated in the beginning of this section, which are similar to the properties on $[0,1]$, there are some new ones, of which we will state and prove only two.
2. Theorem. Let $w_t$ be a Wiener process for $t \in [0,\infty)$ defined on a probability space $(\Omega,\mathcal{F},P)$. Then there exists a set $\Omega' \in \mathcal{F}$ such that $P(\Omega') = 1$ and, for each $\omega \in \Omega'$, we have
$$\lim_{t\downarrow 0}tw_{1/t}(\omega) = 0.$$
Furthermore, for $t > 0$ define
$$\eta_t(\omega) = \begin{cases} tw_{1/t}(\omega) & \text{if } \omega \in \Omega',\\ 0 & \text{if } \omega \notin \Omega',\end{cases}$$
and let $\eta_0(\omega) \equiv 0$. Then $\eta_t$ is a Wiener process.

Proof. Define $\tilde\eta_t = tw_{1/t}$ for $t > 0$ and $\tilde\eta_0 \equiv 0$. As is easy to see, $\tilde\eta_t$ is a Gaussian process with zero mean and covariance $s\wedge t$. It is also continuous on $(0,\infty)$. It follows, in particular, that $\sup_{s\in(0,t]}|\tilde\eta_s(\omega)|$ equals the sup over rational numbers on $(0,t]$. Since this sup is an increasing function of $t$, its limit as $t\downarrow 0$ can also be calculated along rational numbers. Thus,
$$\Omega' := \{\omega : \lim_{t\downarrow 0}\sup_{s\in(0,t]}|\tilde\eta_s(\omega)| = 0\} \in \mathcal{F}.$$
Next, let $C'$ be the set of all (maybe unbounded) continuous functions on $(0,1]$, and $\Sigma(C')$ the cylinder $\sigma$-field of subsets of $C'$, that is, the smallest $\sigma$-field containing all sets $\{x_\cdot \in C' : x_t \in B\}$ for all $t \in (0,1]$ and $B \in \mathcal{B}(\mathbb{R})$. Then the distributions of $\tilde\eta_\cdot$ and $w_\cdot$ on $(C',\Sigma(C'))$ coincide (cf. Remark 1.4). Define
$$A = \{x_\cdot \in C' : \lim_{t\downarrow 0}\sup_{s\in(0,t]}|x_s| = 0\}.$$
Since $x_\cdot \in C'$ are continuous in $(0,1]$, it is easy to see that $A \in \Sigma(C')$. Therefore,
$$P(\tilde\eta_\cdot \in A) = P(w_\cdot \in A),$$
which is to say,
$$P\big(\lim_{t\downarrow 0}\sup_{s\in(0,t]}|\tilde\eta_s| = 0\big) = P\big(\lim_{t\downarrow 0}\sup_{s\in(0,t]}|w_s| = 0\big).$$
The last probability being 1, we conclude that $P(\Omega') = 1$, and it only remains to observe that $\eta_t$ is a continuous process and $\eta_t = \tilde\eta_t$ almost surely, so that $\eta_t$ is a Gaussian process with zero mean and covariance $s\wedge t$. The theorem is proved.
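The covariance identity $\mathrm{cov}(sw_{1/s}, tw_{1/t}) = st\,(1/s\wedge 1/t) = s\wedge t$ behind Theorem 2 can be verified by simulation; in the sketch below (Python/NumPy, our illustration) $w$ is sampled at the reciprocal times by summing independent Gaussian increments.

```python
import numpy as np

rng = np.random.default_rng(6)
ts = np.array([0.2, 0.5, 1.0])          # times for eta_t = t * w_{1/t}
rec = np.sort(1.0 / ts)                 # 1.0, 2.0, 5.0
m = 200000

# w at times rec[0] < rec[1] < rec[2] via independent Gaussian increments
incr = rng.standard_normal((m, len(rec))) * np.sqrt(np.diff(np.concatenate(([0.0], rec))))
w_rec = np.cumsum(incr, axis=1)         # columns: w_1, w_2, w_5
eta = ts * w_rec[:, ::-1]               # eta_t = t * w_{1/t} for t = 0.2, 0.5, 1.0

print(np.cov(eta, rowvar=False).round(3))  # should be close to min(s, t)
```

The printed matrix is close to $(s\wedge t)_{s,t\in\{0.2,0.5,1\}}$, i.e. the covariance of a Wiener process at those times.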
3. Corollary. Let $1/2 > \varepsilon > 0$. By Theorem 2.4 for almost every $\omega$ there exists $n(\omega) < \infty$ such that $|\eta_t(\omega)| \le Nt^{1/2-\varepsilon}$ for $t \le 2^{-n(\omega)}$, where $N$ depends only on $\varepsilon$. Hence, for $w_t$, for almost every $\omega$ we have $|w_t| \le Nt^{1/2+\varepsilon}$ if $t \ge 2^{n(\omega)}$.

4. Remark. Having the Wiener process on $[0,\infty)$, we can repeat the construction of the stochastic integral and define $\int_0^\infty f(t)\,dw_t$ for every $f \in L_2([0,\infty))$ starting with the random orthogonal measure $\zeta((0,a]) = w_a$ defined for all $a \ge 0$. Of course, this integral has properties similar to those of $\int_0^1f(t)\,dw_t$. In particular, the results of Theorem 3.22 on integrating by parts and of Exercise 3.23 still hold.
5. Markov and strong Markov properties of the Wiener process

Let $(\Omega,\mathcal{F},P)$ be a probability space carrying a Wiener process $w_t$, $t \in [0,\infty)$. Also assume that for every $t \in [0,\infty)$ we are given a $\sigma$-field $\mathcal{F}_t \subset \mathcal{F}$ such that $\mathcal{F}_s \subset \mathcal{F}_t$ for $t \ge s$. We call such a collection of $\sigma$-fields an (increasing) filtration of $\sigma$-fields.

A trivial example of filtration is given by $\mathcal{F}_t \equiv \mathcal{F}$.

1. Definition. Let $\mathcal{G}$ be a $\sigma$-field, $\mathcal{G} \subset \mathcal{F}$, and $\xi$ a random variable taking values in a measurable space $(X,\mathcal{B})$. We say that $\xi$ and $\mathcal{G}$ are independent if $P(A,\ \xi \in B) = P(A)P(\xi \in B)$ for every $A \in \mathcal{G}$ and $B \in \mathcal{B}$.

2. Exercise*. Prove that if $\xi$ and $\mathcal{G}$ are independent, $f(x)$ is a measurable function, and $\eta$ is $\mathcal{G}$-measurable, then $f(\xi)$ and $\eta$ are independent as well.

3. Definition. We say that $w_t$ is a Wiener process relative to the filtration $\mathcal{F}_t$ if $w_t$ is $\mathcal{F}_t$-measurable for every $t$ and $w_{t+h} - w_t$ is independent of $\mathcal{F}_t$ for every $t,h \ge 0$. In that case the couple $(w_t,\mathcal{F}_t)$ is called a Wiener process.

Below we assume that $(w_t,\mathcal{F}_t)$ is a Wiener process, explaining first that there always exists a filtration with respect to which $w_t$ is a Wiener process.

4. Lemma. Let
$$\mathcal{F}^w_t := \sigma\{\omega : w_s(\omega) \in B,\ s \le t,\ B \in \mathcal{B}(\mathbb{R})\}.$$
Then $(w_t,\mathcal{F}^w_t)$ is a Wiener process.

Proof. By definition $\mathcal{F}^w_t$ is the smallest $\sigma$-field containing all sets $\{\omega : w_s(\omega) \in B\}$ for $s \le t$ and Borel $B$. Since each of them is (as an element) in $\mathcal{F}$, $\mathcal{F}^w_t \subset \mathcal{F}$. The inclusion $\mathcal{F}^w_s \subset \mathcal{F}^w_t$ for $t \ge s$ is obvious, since the $\{\omega : w_r(\omega) \in B\}$ belong to $\mathcal{F}^w_t$ for $r \le s$ and $\mathcal{F}^w_s$ is the smallest $\sigma$-field containing them. Therefore $\mathcal{F}^w_t$ is a filtration.

Next, $\{\omega : w_t(\omega) \in B\} \in \mathcal{F}^w_t$ for $B \in \mathcal{B}(\mathbb{R})$; hence $w_t$ is $\mathcal{F}^w_t$-measurable. To prove the independence of $w_{t+h} - w_t$ and $\mathcal{F}^w_t$, fix a $B \in \mathcal{B}(\mathbb{R})$, $t,h \ge 0$, and define
$$\mu(A) = P(A,\ w_{t+h} - w_t \in B), \qquad \nu(A) = P(A)P(w_{t+h} - w_t \in B).$$
One knows that $\mu$ and $\nu$ are measures on $(\Omega,\mathcal{F})$. By Theorem 2.2 these measures coincide on every $A$ of type $\{\omega : (w_{t_1}(\omega),\dots,w_{t_n}(\omega)) \in B^{(n)}\}$ provided that $t_i \le t$ and $B^{(n)} \in \mathcal{B}(\mathbb{R}^n)$. The collection of these sets is an algebra (Exercise 1.3.3). Therefore $\mu$ and $\nu$ coincide on the smallest $\sigma$-field, say $\mathcal{G}$, containing these sets. Observe that $\mathcal{G} \supset \mathcal{F}^w_t$, since the collection generating $\mathcal{G}$ contains $\{\omega : w_s(\omega) \in D\}$ for $s \le t$ and $D \in \mathcal{B}(\mathbb{R})$. Hence $\mu$ and $\nu$ coincide on $\mathcal{F}^w_t$. It only remains to remember that $B$ is an arbitrary element of $\mathcal{B}(\mathbb{R})$. The lemma is proved.

We see that one can always take $\mathcal{F}^w_t$ as $\mathcal{F}_t$. However, it turns out that sometimes it is very inconvenient to restrict our choice of $\mathcal{F}_t$ to $\mathcal{F}^w_t$. For instance, we can be given a multi-dimensional Wiener process $(w^1_t,\dots,w^d_t)$ (see Definition 6.4.1) and study only its first coordinate. In particular, while introducing stochastic integrals of random processes against $dw^1_t$ we may be interested in integrating functions depending not only on $w^1_t$ but on all other components as well.

5. Exercise*. Let $\bar{\mathcal{F}}^w_t$ be the completion of $\mathcal{F}^w_t$. Prove that $(w_t,\bar{\mathcal{F}}^w_t)$ is a Wiener process.
6. Theorem (Markov property). Let $(w_t,\mathcal{F}_t)$ be a Wiener process. Fix $t, h_1, \dots, h_n \ge 0$. Then the vector $(w_{t+h_1} - w_t,\dots,w_{t+h_n} - w_t)$ and the $\sigma$-field $\mathcal{F}_t$ are independent. Furthermore, $w_{t+s} - w_t$, $s \ge 0$, is a Wiener process.

Proof. The last statement follows directly from the definitions. To prove the first one, without losing generality we assume that $h_1 \le \dots \le h_n$ and notice that, since $(w_{t+h_1} - w_t,\dots,w_{t+h_n} - w_t)$ is obtained by a linear transformation from $\eta^n$, where $\eta^k = (w_{t+h_1} - w_{t+h_0},\dots,w_{t+h_k} - w_{t+h_{k-1}})$ and $h_0 = 0$, we need only show that $\eta^n$ and $\mathcal{F}_t$ are independent. We are going to use the theory of characteristic functions. Take $A \in \mathcal{F}_t$ and a vector $\lambda \in \mathbb{R}^n$. Notice that
$$EI_A\exp(i\lambda\eta^n) = EI_A\exp(i\mu\eta^{n-1})\exp\big(i\lambda_n(w_{t+h_n} - w_{t+h_{n-1}})\big),$$
where $\mu = (\lambda_1,\dots,\lambda_{n-1})$. Here $I_A$ is $\mathcal{F}_t$-measurable and, since $\mathcal{F}_t \subset \mathcal{F}_{t+h_{n-1}}$, it is $\mathcal{F}_{t+h_{n-1}}$-measurable as well. It follows that $I_A\exp(i\mu\eta^{n-1})$ is $\mathcal{F}_{t+h_{n-1}}$-measurable. Furthermore, $w_{t+h_n} - w_{t+h_{n-1}}$ is independent of $\mathcal{F}_{t+h_{n-1}}$. Hence, by Exercise 2,
$$EI_A\exp(i\lambda\eta^n) = EI_A\exp(i\mu\eta^{n-1})\,E\exp\big(i\lambda_n(w_{t+h_n} - w_{t+h_{n-1}})\big),$$
and by induction and independence of increments of $w_t$,
$$EI_A\exp(i\lambda\eta^n) = EI_A\prod_{j=1}^nE\exp\big(i\lambda_j(w_{t+h_j} - w_{t+h_{j-1}})\big) = P(A)E\exp(i\lambda\eta^n).$$
It follows from the theory of characteristic functions that for every Borel bounded $g$
$$EI_Ag(\eta^n) = P(A)Eg(\eta^n).$$
It only remains to substitute here the indicator of a Borel set in place of $g$. The theorem is proved.
Theorem 6 says that, for every fixed $t \ge 0$, the process $w_{t+s} - w_t$, $s \ge 0$, starts afresh as a Wiener process forgetting everything that happened to $w_r$ before time $t$. This property is quite natural for Brownian motion. It also has a natural extension when $t$ is replaced with a random time $\tau$, provided that $\tau$ does not depend on the future in a certain sense. To describe exactly what we mean by this, we need the following.

7. Definition. Let $\tau$ be a random variable taking values in $[0,\infty]$ (including $\infty$). We say that $\tau$ is a stopping time (relative to $\mathcal{F}_t$) if $\{\omega : \tau(\omega) > t\} \in \mathcal{F}_t$ for every $t \in [0,\infty)$.

The term "stopping time" is discussed after Exercise 3.3.3. Trivial examples of stopping times are given by nonrandom positive constants. A much more useful example is the following.

8. Example. Fix $a \ge 0$ and define
$$\tau = \tau_a = \inf\{t \ge 0 : w_t \ge a\} \quad (\inf\emptyset := \infty)$$
as the first hitting time of the point $a$ by $w_t$. It turns out that $\tau$ is a stopping time.

Indeed, one can easily see that
$$\{\omega : \tau(\omega) > t\} = \{\omega : \max_{s\le t}w_s(\omega) < a\}, \tag{1}$$
where, for $Q$ defined as the set of all rational points on $[0,\infty)$,
$$\max_{s\le t}w_s = \sup_{r\in Q,\,r\le t}w_r,$$
which shows that $\max_{s\le t}w_s$ is an $\mathcal{F}_t$-measurable random variable.

9. Exercise*. Let $a < 0 < b$ and let $\gamma$ be the first exit time of $w_t$ from $(a,b)$:
$$\gamma = \inf\{t \ge 0 : w_t \notin (a,b)\}.$$
Prove that $\gamma$ is a stopping time.
10. Definition. Random processes $\xi^1_t,\dots,\xi^n_t$ defined for $t \ge 0$ are called independent if for every $t_1,\dots,t_k \ge 0$ the vectors $(\xi^1_{t_1},\dots,\xi^1_{t_k}),\dots,(\xi^n_{t_1},\dots,\xi^n_{t_k})$ are independent.

In what follows we consider some processes at random times, and these times occasionally can be infinite even though this happens with probability zero. In such situations we use the notation
$$x_\tau = x_\tau(\omega) = \begin{cases} x_{\tau(\omega)}(\omega) & \text{if } \tau(\omega) < \infty,\\ 0 & \text{if } \tau(\omega) = \infty.\end{cases}$$

11. Lemma. Let $(w_t,\mathcal{F}_t)$ be a Wiener process and let $\tau$ be an $\mathcal{F}_t$-stopping time. Assume $P(\tau < \infty) = 1$. Then the processes $w_{t\wedge\tau}$ and $B_t := w_{\tau+t} - w_\tau$ are independent and the latter one is a Wiener process.

Proof. Take $0 \le t_1 \le \dots \le t_k$. As is easy to see, we need only prove that for any Borel nonnegative functions $f(x_1,\dots,x_k)$ and $g(x_1,\dots,x_k)$
$$I := Ef(w_{t_1\wedge\tau},\dots,w_{t_k\wedge\tau})g(B_{t_1},\dots,B_{t_k}) = Ef(w_{t_1\wedge\tau},\dots,w_{t_k\wedge\tau})\,Eg(w_{t_1},\dots,w_{t_k}). \tag{2}$$
Assume for a moment that the set of values of $\tau$ is countable, say $r_1 < r_2 < \dots$. By noticing that $\{\tau = r_n\} = \{\tau > r_{n-1}\}\setminus\{\tau > r_n\} \in \mathcal{F}_{r_n}$ and
$$F_n := f(w_{t_1\wedge\tau},\dots,w_{t_k\wedge\tau})I_{\tau=r_n} = f(w_{t_1\wedge r_n},\dots,w_{t_k\wedge r_n})I_{\tau=r_n},$$
we see that the first term is $\mathcal{F}_{r_n}$-measurable. Furthermore,
$$I_{\tau=r_n}g(B_{t_1},\dots,B_{t_k}) = I_{\tau=r_n}g(w_{r_n+t_1} - w_{r_n},\dots,w_{r_n+t_k} - w_{r_n}),$$
where, by Theorem 6, the last factor is independent of $\mathcal{F}_{r_n}$, and
$$Eg(w_{r_n+t_1} - w_{r_n},\dots,w_{r_n+t_k} - w_{r_n}) = Eg(w_{t_1},\dots,w_{t_k}).$$
Therefore,
$$I = \sum_{r_n}EF_ng(w_{r_n+t_1} - w_{r_n},\dots,w_{r_n+t_k} - w_{r_n}) = Eg(w_{t_1},\dots,w_{t_k})\sum_{r_n}EF_n.$$
The last sum equals the first term on the right in (2). This proves the theorem for our particular $\tau$.

In the general case we approximate $\tau$ and first notice (see, for instance, Theorem 1.2.4) that equation (2) holds for all Borel nonnegative $f,g$ if and only if it holds for all bounded continuous $f,g$. Therefore, we assume $f,g$ to be bounded and continuous.

Now, for $n = 1, 2, \dots$, define
$$\tau_n(\omega) = (k+1)2^{-n} \quad \text{for $\omega$ such that } k2^{-n} < \tau(\omega) \le (k+1)2^{-n}, \tag{3}$$
$k = -1, 0, 1, \dots$. It is easily seen that $\tau_n \le \tau + 2^{-n}$, $\tau_n \downarrow \tau$, and for $t \ge 0$
$$\{\omega : \tau_n > t\} = \{\omega : \tau(\omega) > 2^{-n}[2^nt]\} \in \mathcal{F}_{2^{-n}[2^nt]} \subset \mathcal{F}_t,$$
so that the $\tau_n$ are stopping times. Hence, by the above result,
$$I = \lim_{n\to\infty}I_n = Eg(w_{t_1},\dots,w_{t_k})\lim_{n\to\infty}Ef(w_{t_1\wedge\tau_n},\dots,w_{t_k\wedge\tau_n}),$$
and this leads to (2). The lemma is proved.
The following theorem states that the Wiener process has the strong Markov property.

12. Theorem. Let $(w_t,\mathcal{F}_t)$ be a Wiener process and $\tau$ an $\mathcal{F}_t$-stopping time. Assume that $P(\tau < \infty) = 1$. Let
$$\mathcal{F}^w_\tau = \sigma\{\omega : w_{s\wedge\tau} \in B,\ s \ge 0,\ B \in \mathcal{B}(\mathbb{R})\},$$
$$\mathcal{G}^w_\tau = \sigma\{\omega : w_{\tau+s} - w_\tau \in B,\ s \ge 0,\ B \in \mathcal{B}(\mathbb{R})\}.$$
Then the $\sigma$-fields $\mathcal{F}^w_\tau$ and $\mathcal{G}^w_\tau$ are independent in the sense that for every $A \in \mathcal{F}^w_\tau$ and $B \in \mathcal{G}^w_\tau$ we have $P(AB) = P(A)P(B)$. Furthermore, $w_{\tau+t} - w_\tau$ is a Wiener process.

Proof. The last assertion is proved in Lemma 11. To prove the first one we follow the proof of Lemma 4 and first let $B = \{\omega : (w_{\tau+s_1} - w_\tau,\dots,w_{\tau+s_k} - w_\tau) \in \Gamma\}$, where $\Gamma \in \mathcal{B}(\mathbb{R}^k)$. Consider two measures $\mu(A) = P(AB)$ and $\nu(A) = P(A)P(B)$ as measures on sets $A$. By Lemma 11 these measures coincide on every $A$ of type $\{\omega : (w_{t_1\wedge\tau},\dots,w_{t_n\wedge\tau}) \in B^{(n)}\}$ provided that $B^{(n)} \in \mathcal{B}(\mathbb{R}^n)$. The collection of these sets is an algebra (Exercise 1.3.3). Therefore $\mu$ and $\nu$ coincide on the smallest $\sigma$-field, which is $\mathcal{F}^w_\tau$, containing these sets. Hence $P(AB) = P(A)P(B)$ for all $A \in \mathcal{F}^w_\tau$ and our particular $B$. It only remains to repeat this argument relative to $B$ upon fixing $A$. The theorem is proved.
6. Examples of applying the strong Markov property

First, we want to apply Theorem 5.12 to $\tau_a$ from Example 5.8. Notice that Bachelier's Theorem 2.3 holds not only for $t \in (0,1]$ but for $t \ge 1$ as well. One proves this by using the self-similarity of the Wiener process ($cw_{t/c^2}$ is a Wiener process for every constant $c \ne 0$). Then, owing to (5.1), for $t > 0$ we find that $P(\tau_a > t) = P(|w_t| < a) = P(|w_1|\sqrt{t} < a)$, which tends to zero as $t \to \infty$, showing that $P(\tau_a < \infty) = 1$. Now Theorem 5.12 allows us to conclude that $w_{\tau+t} - w_\tau = w_{\tau+t} - a$ is a Wiener process independent of the trajectory on $[0,\tau]$. This makes rigorous what is quite clear intuitively. Namely, after reaching $a$, the Wiener process starts afresh, "forgetting" everything which happened to it before. The same happens when it reaches a higher level $b > a$ after reaching $a$, and moreover, $\tau_b - \tau_a$ has the same distribution as $\tau_{b-a}$. This is part of the following theorem, in which, as well as above, we allow ourselves to consider random variables like $\tau_b - \tau_a$ which may not be defined on a set of probability zero. We set $\tau_b(\omega) - \tau_a(\omega) = 0$ if $b > a > 0$ and $\tau_b(\omega) = \tau_a(\omega) = \infty$.

1. Theorem. (i) For every $0 < a_1 < a_2 < \dots < a_n < \infty$ the random variables $\tau_{a_1}, \tau_{a_2} - \tau_{a_1}, \dots, \tau_{a_n} - \tau_{a_{n-1}}$ are independent.
(ii) For $0 < a < b$, the law of $\tau_b - \tau_a$ coincides with that of $\tau_{b-a}$, and $\tau_a$ has Wald's distribution with density
$$p(t) = (2\pi)^{-1/2}at^{-3/2}\exp(-a^2/(2t)), \quad t > 0.$$

Proof. (i) It suffices to prove that $\tau_{a_n} - \tau_{a_{n-1}}$ is independent of $\tau_{a_1},\dots,\tau_{a_{n-1}}$ (cf. the proof of Theorem 2.2). To simplify notation, put $\tau(a) = \tau_a$. Since $a_i \le a_{n-1}$ for $i \le n-1$, we can rewrite (5.1) as
$$\{\omega : \tau(a_i) > t\} = \{\omega : \sup_{s\in Q,\,s\le t}w_{s\wedge\tau(a_{n-1})} < a_i\},$$
which implies that the $\tau(a_i)$ are $\mathcal{F}^w_{\tau(a_{n-1})}$-measurable. On the other hand, for $t \ge 0$,
$$\{\omega : \tau(a_n) - \tau(a_{n-1}) > t\} = \{\omega : \tau(a_n) - \tau(a_{n-1}) > t,\ \tau(a_{n-1}) < \infty\}$$
$$= \{\omega : \sup_{s\in Q,\,s\le t}\big(w_{\tau(a_{n-1})+s} - w_{\tau(a_{n-1})}\big) < a_n - a_{n-1},\ \tau(a_{n-1}) < \infty\}$$
$$= \{\omega : 0 < \sup_{s\in Q,\,s\le t}\big(w_{\tau(a_{n-1})+s} - w_{\tau(a_{n-1})}\big) < a_n - a_{n-1}\}, \tag{1}$$
which shows that $\tau(a_n) - \tau(a_{n-1})$ is $\mathcal{G}^w_{\tau(a_{n-1})}$-measurable. Referring to Theorem 5.12 finishes the proof of (i).

(ii) Let $n = 2$, $a_1 = a$, and $a_2 = b$. Then in the above notation $\tau(a_n) = \tau_b$ and $\tau(a_{n-1}) = \tau_a$. Since $w_{\tau(a_{n-1})+t} - w_{\tau(a_{n-1})} = w_{\tau_a+t} - w_{\tau_a}$ is a Wiener process and the distributions of Wiener processes coincide, the probability of the event on the right in (1) equals
$$P\big(\sup_{s\in Q,\,s\le t}w_s < a_n - a_{n-1} = b - a\big) = P(\tau_{b-a} > t).$$
This proves the first assertion in (ii). To find the distribution of $\tau_a$, remember that
$$P(\tau_a > t) = P(\max_{s\le t}w_s < a) = P(|w_1|\sqrt{t} < a) = \frac{2}{\sqrt{2\pi}}\int_0^{a/\sqrt{t}}e^{-y^2/2}\,dy.$$
By differentiating this formula we immediately get our density. The theorem is proved.
2. Exercise. We know that the Wiener process is self-similar in the sense that $cw_{t/c^2}$ is a Wiener process for every constant $c \ne 0$. The process $\tau_a$, $a \ge 0$, also has this kind of property. Prove that, for every $c > 0$, the process $c\tau_{a/\sqrt{c}}$, $a \ge 0$, has the same finite-dimensional distributions as $\tau_a$, $a \ge 0$. Such processes are called stable. The Wiener process is a stable process of order 2, and the process $\tau_a$ is a stable process of order 1/2.
Our second application exhibits the importance of the operator $u \to \lambda u - \frac12u''$ in computing various expectations related to the Wiener process. The following results can be obtained quite easily on the basis of Itô's formula from Chapter 6. However, the reader might find it instructive to see that there is a different approach using the strong Markov property.

3. Lemma. Let $u$ be a twice continuously differentiable function defined on $\mathbb{R}$ such that $u$, $u'$, and $u''$ are bounded. Then, for every $\lambda > 0$,
$$u(0) = E\int_0^\infty e^{-\lambda t}\big(\lambda u(w_t) - \tfrac12u''(w_t)\big)\,dt. \tag{2}$$

Proof. Since $w_t$ is a normal $(0,t)$ variable, the right-hand side of (2) equals
$$I := \int_0^\infty e^{-\lambda t}E\big(\lambda u(w_t) - \tfrac12u''(w_t)\big)\,dt = \int_0^\infty e^{-\lambda t}\Big[\int_{\mathbb{R}}\big(\lambda u(x) - \tfrac12u''(x)\big)p(t,x)\,dx\Big]\,dt,$$
where
$$p(t,x) := \frac{1}{\sqrt{2\pi t}}e^{-x^2/(2t)}, \quad t > 0.$$
We continue our computation, integrating by parts. One can easily check that
$$\frac12\frac{\partial^2p}{\partial x^2} = \frac{\partial p}{\partial t}, \qquad \lambda e^{-\lambda t}p - \frac{e^{-\lambda t}}{2}\frac{\partial^2p}{\partial x^2} = -\frac{\partial}{\partial t}\big(e^{-\lambda t}p\big).$$
Hence
$$I = \lim_{\varepsilon\downarrow 0}\int_\varepsilon^\infty e^{-\lambda t}\Big[\int_{\mathbb{R}}\big(\lambda u(x) - \tfrac12u''(x)\big)p(t,x)\,dx\Big]\,dt$$
$$= -\lim_{\varepsilon\downarrow 0}\int_\varepsilon^\infty\frac{\partial}{\partial t}\Big[e^{-\lambda t}\int_{\mathbb{R}}u(x)p(t,x)\,dx\Big]\,dt = \lim_{\varepsilon\downarrow 0}e^{-\lambda\varepsilon}\int_{\mathbb{R}}u(x)p(\varepsilon,x)\,dx$$
$$= \lim_{\varepsilon\downarrow 0}Eu(w_\varepsilon) = u(0).$$
The lemma is proved.
4. Theorem. Let $-\infty < a < 0 < b < \infty$, and let $u$ be a twice continuously differentiable function given on $[a,b]$. Let $\gamma$ be the first exit time of $w_t$ from the interval $(a,b)$ (see Exercise 5.9). Then, for every $\lambda \ge 0$,
$$u(0) = E\int_0^\gamma e^{-\lambda t}\big(\lambda u(w_t) - \tfrac12u''(w_t)\big)\,dt + Ee^{-\lambda\gamma}u(w_\gamma). \tag{3}$$

Proof. If needed, one can continue $u$ outside $[a,b]$ and have a function, for which we keep the same notation, satisfying the assumptions of Lemma 3. Denote $f = \lambda u - \tfrac12u''$. Notice that obviously $\gamma \le \tau_b$, and, as we have seen above, $P(\tau_b < \infty) = 1$. Therefore by Lemma 3 we find that, for $\lambda > 0$,
$$u(0) = E\int_0^\infty\dots = E\int_0^\gamma\dots + E\int_\gamma^\infty\dots$$
$$= E\int_0^\gamma e^{-\lambda t}f(w_t)\,dt + \int_0^\infty e^{-\lambda t}Ee^{-\lambda\gamma}f(w_\gamma + B_t)\,dt =: I + J,$$
where $B_t = w_{\gamma+t} - w_\gamma$. Now we want to use Theorem 5.12. The reader who did Exercise 5.9 understands that $\gamma$ is $\mathcal{F}^w_\gamma$-measurable. Furthermore, $w_{t\wedge\gamma} \to w_\gamma$ as $t \to \infty$, so that $w_\gamma$ is also $\mathcal{F}^w_\gamma$-measurable. Hence $(\gamma,w_\gamma)$ and $B_t$ are independent, and
$$J = \int_0^\infty e^{-\lambda t}Ee^{-\lambda\gamma}f(w_\gamma + B_t)\,dt = Ee^{-\lambda\gamma}v(w_\gamma),$$
where
$$v(y) := E\int_0^\infty e^{-\lambda t}f(y + B_t)\,dt = E\int_0^\infty e^{-\lambda t}f(y + w_t)\,dt.$$
Upon applying Lemma 3 to $u(x+y)$ in place of $u(x)$, we immediately get that $v = u$, and this proves the theorem if $\lambda > 0$.

To prove (3) for $\lambda = 0$ it suffices to pass to the limit, which is possible due to the dominated convergence theorem if we know that $E\gamma < \infty$. However, for the function $u_0(x) = (x-a)(b-x)$ and the result for $\lambda > 0$, we get
$$|a|b = u_0(0) = E\int_0^\gamma e^{-\lambda t}\big(\lambda u_0(w_t) + 1\big)\,dt \ge E\int_0^\gamma e^{-\lambda t}\,dt, \qquad E\int_0^\gamma e^{-\lambda t}\,dt \le |a|b,$$
and it only remains to apply the monotone convergence theorem to get $E\gamma \le |a|b < \infty$. The theorem is proved.
In the following exercises we suggest the reader use Theorem 4.

5. Exercise. (i) Prove that $E\gamma = |a|b$.
(ii) By noticing that
$$Eu(w_\gamma) = u(b)P(\gamma = \tau_b) + u(a)P(\gamma < \tau_b)$$
and taking an appropriate function $u$, show that the probability that the Wiener process hits $b$ before hitting $a$ is $|a|/(|a| + b)$.
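Both claims of Exercise 5 can be tested by simulation; the sketch below (Python/NumPy, our illustration with an Euler-type walk) estimates $E\gamma$ and $P(w$ hits $b$ before $a)$ for $a = -1$, $b = 2$.

```python
import numpy as np

rng = np.random.default_rng(8)
a, b = -1.0, 2.0
dt, m = 1e-3, 5000

w = np.zeros(m)                 # current positions of m paths
t_exit = np.zeros(m)            # recorded exit times
alive = np.ones(m, dtype=bool)  # paths still inside (a, b)
step = 0
while alive.any():
    step += 1
    w[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
    just_exited = alive & ((w <= a) | (w >= b))
    t_exit[just_exited] = step * dt
    alive &= ~just_exited

print(t_exit.mean(), abs(a) * b)               # E gamma = |a| b = 2
print((w >= b).mean(), abs(a) / (abs(a) + b))  # P(hit b first) = 1/3
```

The estimates carry a small discretization bias (the walk can only exit at grid times), which shrinks with `dt`.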
6. Exercise. Sometimes one is interested in knowing how much time the Wiener process spends in a subinterval $[c,d] \subset (a,b)$ before exiting from $(a,b)$. Of course, by this time we mean Lebesgue measure of the set $\{t < \gamma : w_t \in [c,d]\}$.
(i) Prove that this time equals
$$\theta := \int_0^\gamma I_{[c,d]}(w_t)\,dt.$$
(ii) Prove that for any Borel nonnegative $f$ we have
$$E\int_0^\gamma f(w_t)\,dt = \frac{2}{b-a}\Big[b\int_a^0f(y)(y-a)\,dy - a\int_0^bf(y)(b-y)\,dy\Big],$$
and find $E\theta$.

7. Exercise. Define $x_t = w_t + t$, and find the probability that $x_t$ hits $b$ before hitting $a$.
7. Itô stochastic integral

In Sec. 3 we introduced the stochastic integral of nonrandom functions on $[0,1]$ against $dw_t$. It turns out that a slight modification of this procedure allows one to define stochastic integrals of random functions as well. The way we proceed is somewhat different from the traditional one, which will be presented in Sec. 6.1. We decided to give this definition just in case the reader decides to study stochastic integration with respect to arbitrary square integrable martingales.

Let $(w_t,\mathcal{F}_t)$ be a Wiener process in the sense of Definition 5.3, given on a probability space $(\Omega,\mathcal{F},P)$. To proceed with defining the Itô stochastic integral in the framework of Sec. 3 we take
$$X = \Omega\times(0,\infty), \quad \mathcal{A} = \mathcal{F}\otimes\mathcal{B}((0,\infty)), \quad \mu = P\times\ell \tag{1}$$
and define $\Pi$ as the collection of all sets $A\times(s,t]$ where $0 \le s \le t < \infty$ and $A \in \mathcal{F}_s$. Notice that, for $A\times(s,t] \in \Pi$,
$$\mu(A\times(s,t]) = P(A)(t-s) < \infty,$$
so that $\Pi_0 = \Pi$. For $A\times(s,t] \in \Pi$ let
$$\zeta(A\times(s,t]) = (w_t - w_s)I_A.$$

1. Definition. Denote $\mathcal{P} = \sigma(\Pi)$ and call $\mathcal{P}$ the $\sigma$-field of predictable sets. The functions on $\Omega\times(0,\infty)$ which are $\mathcal{P}$-measurable are called predictable (relative to $\mathcal{F}_t$).

By the way, the name "predictable" comes from the observation that the simplest $\mathcal{P}$-measurable functions are indicators of elements of $\Pi$, which have the form $I_AI_{(s,t]}$ and are left-continuous, thus predictable on the basis of past observations, functions of time.

2. Exercise*. Prove that $\Pi$ is a $\pi$-system, and by relying on Theorem 3.19 conclude that $L_2(\Pi,\mu) = L_2(\mathcal{P},\mu)$.

3. Theorem. The function $\zeta$ on $\Pi$ is a random orthogonal measure with reference measure $\mu$, and $E\zeta(\Delta) = 0$ for every $\Delta \in \Pi$.

Proof. We have to check the conditions of Definition 3.5. Let $\Delta_1 = A_1\times(t_1,t_2]$, $\Delta_2 = A_2\times(s_1,s_2] \in \Pi$. Define
$$f_t(\omega) = I_{\Delta_1}(\omega,t) + I_{\Delta_2}(\omega,t)$$
and introduce the points $r_1 \le \dots \le r_4$ by ordering $t_1, t_2, s_1$, and $s_2$. Obviously, for every $t \ge 0$, the functions $I_{\Delta_i}(\cdot,t+)$ are $\mathcal{F}_t$-measurable and the same holds for $f_{t+}(\cdot)$. Furthermore, for each $\omega$, $f_t(\omega)$ is piecewise constant and left continuous in $t$. Therefore,
$$f_t(\omega) = \sum_{i=1}^3g_i(\omega)I_{(r_i,r_{i+1}]}(t), \tag{2}$$
where the $g_i = f_{r_i+}$ are $\mathcal{F}_{r_i}$-measurable.

It turns out that for every $\omega$
$$\zeta(\Delta_1) + \zeta(\Delta_2) = \sum_{i=1}^3g_i(\omega)(w_{r_{i+1}} - w_{r_i}). \tag{3}$$
One can prove (3) in the following way. Fix an $\omega$ and define a continuous function $A_t$, $t \in [r_1,r_4]$, so that $A_t$ is piecewise linear and equals $w_{r_i}$ at all $r_i$'s. Then by integrating through (2) against $dA_t$, remembering the definition of $f_t$ and the fact that the integral of a sum equals the sum of integrals, we come to (3).

It follows from (3) that
$$E(\zeta(\Delta_1) + \zeta(\Delta_2))^2 = \sum_{i=1}^3Eg_i^2(w_{r_{i+1}} - w_{r_i})^2 + 2\sum_{i<j}Eg_ig_j(w_{r_{i+1}} - w_{r_i})(w_{r_{j+1}} - w_{r_j}),$$
where all expectations make sense because $0 \le f \le 2$ and $Ew_t^2 = t < \infty$. Remember that $E(w_{r_{j+1}} - w_{r_j}) = 0$ and $E(w_{r_{i+1}} - w_{r_i})^2 = r_{i+1} - r_i$. Also notice that $(w_{r_{i+1}} - w_{r_i})^2$ and $g_i^2$ are independent by Exercise 5.2 and, for $i < j$, the $g_i$ are $\mathcal{F}_{r_i}$-measurable and $\mathcal{F}_{r_j}$-measurable, owing to $\mathcal{F}_{r_i} \subset \mathcal{F}_{r_j}$, so that $g_ig_j(w_{r_{i+1}} - w_{r_i})$ is $\mathcal{F}_{r_j}$-measurable and hence independent of $w_{r_{j+1}} - w_{r_j}$. Then we see that
$$E(\zeta(\Delta_1) + \zeta(\Delta_2))^2 = \sum_{i=1}^3Ef^2_{r_i+}(r_{i+1} - r_i) = E\int_{r_1}^{r_4}f_t^2\,dt$$
$$= E\int_{r_1}^{r_4}(I_{\Delta_1} + I_{\Delta_2})^2\,dt = E\int_{r_1}^{r_4}I_{\Delta_1}\,dt + 2E\int_{r_1}^{r_4}I_{\Delta_1}I_{\Delta_2}\,dt + E\int_{r_1}^{r_4}I_{\Delta_2}\,dt$$
$$= \mu(\Delta_1) + 2\mu(\Delta_1\cap\Delta_2) + \mu(\Delta_2). \tag{4}$$
By plugging in $\Delta_1 = \Delta_2 = \Delta$, we find that $E\zeta^2(\Delta) = \mu(\Delta)$. Then, developing $E(\zeta(\Delta_1) + \zeta(\Delta_2))^2$ and coming back to (4), we get $E\zeta(\Delta_1)\zeta(\Delta_2) = \mu(\Delta_1\cap\Delta_2)$. Thus by Definition 3.5 the function $\zeta$ is a random orthogonal measure with reference measure $\mu$.

The fact that $E\zeta = 0$ follows at once from the independence of $\mathcal{F}_s$ and $w_t - w_s$ for $t \ge s$. The theorem is proved.
Theorem 3 allows us to apply Theorem 3.13. By combining it with Exercise 2 and Remark 3.15 we come to the following result.

4. Theorem. In notation (1) there exists a unique linear isometric operator $I : L_2(\mathcal{P},\mu)\to L_2(\mathcal{F},P)$ such that, for every $n = 1, 2, \dots$, constants $c_i$, $s_i\le t_i$, and $A_i\in\mathcal{F}_{s_i}$ given for $i = 1, \dots, n$, we have
$$I\Big(\sum_{i=1}^{n} c_iI_{A_i}I_{(s_i,t_i]}\Big) = \sum_{i=1}^{n} c_iI_{A_i}(w_{t_i}-w_{s_i}) \quad (a.s.). \qquad(5)$$
In addition, $EIf = 0$ for every $f\in L_2(\mathcal{P},\mu)$.
5. Exercise*. Formula (5) admits the following generalization. Prove that for every $n = 1, 2, \dots$, constants $s_i\le t_i$, and $\mathcal{F}_{s_i}$-measurable functions $g_i$ given for $i = 1, \dots, n$ and satisfying $Eg_i^2 < \infty$, we have
$$I\Big(\sum_{i=1}^{n} g_iI_{(s_i,t_i]}\Big) = \sum_{i=1}^{n} g_i(w_{t_i}-w_{s_i}) \quad (a.s.).$$

6. Definition. We call $If$, introduced in Theorem 4, the Itô stochastic integral of $f$, and write
$$If =: \int_0^{\infty} f(\omega,t)\,dw_t.$$
The Itô integral between nonrandom $a$ and $b$ such that $0\le a\le b\le\infty$ is naturally defined by
$$\int_a^b f(\omega,t)\,dw_t = \int_0^{\infty} f(\omega,t)I_{(a,b]}(t)\,dw_t.$$
The comments in Sec. 3 before Theorem 3.22 are valid for Itô stochastic integrals as well as for integrals of nonrandom functions against $dw_t$. It is natural to notice that for nonrandom functions both integrals, introduced in this section and in Sec. 3, coincide (a.s.). This follows from formula (3.7), valid for both integrals (and from the possibility of finding appropriate $f_n$, a possibility which is either known to the reader or will be seen from Remark 8.6).

Generally it is safe to say that the properties of the Itô integral are absolutely different from those of the integral of nonrandom functions. For instance, Exercise 3.23 implies that for nonrandom integrands the integral is either zero or its distribution has a density. About 1981 M. Safonov constructed an example of a random $f_t$ satisfying $1\le f_t\le 2$ and such that the distribution of $\int_0^1 f_t\,dw_t$ is singular with respect to Lebesgue measure.

One may wonder why we took sets like $A\times(s,t]$ and not $A\times[s,t)$ as a starting point for stochastic integration. Actually, for the Itô stochastic integral against the Wiener process this is irrelevant, and the second approach even has some advantages, since then (cf. Exercise 5) almost by definition we would have a very natural formula:
$$\int_0^{\infty} f(t)\,dw_t = \sum_{i=1}^{n} f(t_i)(w_{t_{i+1}}-w_{t_i})$$
provided that $f(t)$ is $\mathcal{F}_t$-measurable and $E|f(t)|^2 < \infty$ for every $t$, and $0\le t_1\le\dots\le t_{n+1} < \infty$ are nonrandom and such that $f(t) = f(t_i)$ for $t\in[t_i,t_{i+1})$ and $f(t) = 0$ for $t\ge t_{n+1}$. We show that this formula is indeed true in Theorem 8.8.
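Numerically the role of the left endpoints is easy to see. Below is a small sketch (illustrative, not from the text): for the step function $f(t) = w_{t_i}$ on $[t_i,t_{i+1})$, which satisfies the measurability requirement above, the sums have mean zero, whereas the "anticipating" right-endpoint sums have mean $1$ on $[0,1]$.

```python
import numpy as np

# Left-endpoint sums of the displayed type with f(t) = w_{t_i} on
# [t_i, t_{i+1}): they have mean 0 (as Ito integrals must), while
# right-endpoint sums have mean sum over i of (t_{i+1} - t_i) = 1.
rng = np.random.default_rng(0)
n, paths = 1000, 20000
dw = rng.normal(0.0, np.sqrt(1.0 / n), size=(paths, n))
w = np.cumsum(dw, axis=1)
w_left = np.hstack([np.zeros((paths, 1)), w[:, :-1]])  # w at left endpoints

left_sums = (w_left * dw).sum(axis=1)    # Ito-type sums -> (w_1^2 - 1)/2
right_sums = (w * dw).sum(axis=1)        # anticipating sums

print("mean of left sums :", left_sums.mean())   # ~ 0
print("mean of right sums:", right_sums.mean())  # ~ 1
```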
However, there is a significant difference between the two approaches if one tries to integrate with respect to discontinuous processes. Several unusual things may happen, and we offer the reader the following exercises showing one of them.

7. Exercise. In completely the same way as above one introduces a stochastic integral against $\tilde\pi_t := \pi_t - t$, where $\pi_t$ is the Poisson process with parameter 1. Of course, one needs an appropriate filtration of $\sigma$-fields $\mathcal{F}_t$ such that $\pi_t$ is $\mathcal{F}_t$-measurable and $\pi_{t+h}-\pi_t$ is independent of $\mathcal{F}_t$ for all $t, h\ge0$. On the other hand, one can integrate against $\tilde\pi_t$ as usual, since this function has bounded variation on each interval $[0,T]$. In connection with this, prove that
$$E\,(\text{usual})\int_0^1 \tilde\pi_t\,d\tilde\pi_t \ne 0,$$
so that either $\tilde\pi_t$ is not stochastically integrable or the usual integral is different from the stochastic one. (As follows from Theorem 8.2, the latter is true.)
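For orientation, the claim of Exercise 7 can be watched in a simulation (an illustrative sketch; the identity of Hint 7.7 gives the exact means). Evaluating the integrand at the jump itself, as the usual integral does for the right-continuous $\tilde\pi_t$, produces mean $1$ on $(0,1]$, whereas the left-limit (predictable) version has mean $0$.

```python
import numpy as np

# Usual vs. predictable integration against d(tilde_pi) on (0, 1] for a
# rate-1 Poisson process pi_t, tilde_pi_t = pi_t - t.  At a jump time s
# the usual (Stieltjes) integral uses tilde_pi_s, the predictable one
# uses the left limit tilde_pi_{s-}.
rng = np.random.default_rng(1)
paths = 20000
usual, predictable = [], []
for _ in range(paths):
    n = rng.poisson(1.0)                    # number of jumps on (0, 1]
    s = np.sort(rng.uniform(0.0, 1.0, n))   # jump times
    k = np.arange(1, n + 1)                 # pi at the jumps (jump included)
    drift = (n - np.sum(s)) - 0.5           # int_0^1 tilde_pi_t dt, exact
    usual.append(np.sum(k - s) - drift)
    predictable.append(np.sum((k - 1) - s) - drift)

print("usual      :", np.mean(usual))        # ~ 1, not 0
print("predictable:", np.mean(predictable))  # ~ 0
```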
8. Exercise. In the situation of Exercise 7, prove that for every predictable nonnegative $f_t$ we have
$$E\,(\text{usual})\int_0^1 f_t\,d\pi_t = E\int_0^1 f_t\,dt.$$
Conclude that $\tilde\pi_t$ is not predictable, and $\pi_t$ is not $\mathcal{P}$-measurable either.
8. The structure of Itô integrable functions

Dealing with Itô stochastic integrals quite often requires much attention to tiny details, since often what seems true turns out to be absolutely wrong. For instance, we will see below that the function $I_{(0,\infty)}(w_t)I_{(0,1)}(t)$ is Itô integrable and consequently its Itô integral has zero mean. This may look strange due to the following.

Represent the open set $\{t : w_t > 0\}$ as the countable union of disjoint intervals $(\alpha_i,\beta_i)$. Clearly $w_{\alpha_i} = w_{\beta_i} = 0$, and
$$I_{(0,\infty)}(w_t)I_{(0,1)}(t) = \sum_i I_{(0,1)\cap(\alpha_i,\beta_i)}(t). \qquad(1)$$
In addition it looks natural that
$$\int_0^{\infty} I_{(0,1)\cap(\alpha_i,\beta_i)}(t)\,dw_t = w_{1\wedge\beta_i} - w_{1\wedge\alpha_i}, \qquad(2)$$
where the right-hand side is different from zero only if $\alpha_i < 1$, $\beta_i > 1$, and $w_1 > 0$, i.e. if $1\in(\alpha_i,\beta_i)$. In that case the right-hand side of (2) equals $(w_1)^+$, and since the integral of a sum should be equal to the sum of integrals, formula (1) shows that the Itô integral of $I_{(0,\infty)}(w_t)I_{(0,1)}(t)$ should equal $(w_1)^+$. However, this is impossible since $E(w_1)^+ > 0$.

The contradiction here comes from the fact that the terms in (1) are not Itô integrable and (2) just does not make sense.

One more example of an integral with no sense gives $\int_0^1 w_1\,dw_t$. Again its mean value should be zero, but under every reasonable way of defining this integral it should equal $w_1\int_0^1 dw_t = w_1^2$.

All this leads us to the necessity of investigating the set of Itô integrable functions. Due to Theorem 3.19 and Exercise 3.2 this is equivalent to investigating which functions are $\mathcal{P}$-measurable.
1. Definition. A function $f_t(\omega)$ given on $\Omega\times(0,\infty)$ is called $\mathcal{F}_t$-adapted if it is $\mathcal{F}_t$-measurable for each $t > 0$. By $H$ we denote the set of all real-valued $\mathcal{F}_t$-adapted functions $f_t(\omega)$ which are $\mathcal{F}\otimes\mathcal{B}(0,\infty)$-measurable and satisfy
$$E\int_0^{\infty} f_t^2\,dt < \infty.$$

The following theorem says that all elements of $H$ are Itô integrable. The reader is sent to Sec. 7 for the necessary notation.

2. Theorem. We have $H\subset L_2(\mathcal{P},\mu)$.
Proof (Doob). It suffices only to prove that $f\in L_2(\mathcal{P},\mu)$ for $f\in H$ such that $f_t(\omega) = 0$ for $t\ge T$, where $T$ is a constant. Indeed, by the dominated convergence theorem
$$\int_X |f_t - f_tI_{t\le n}|^2\,dP\,dt = E\int_n^{\infty} f_t^2\,dt \to 0$$
as $n\to\infty$, so that, if $f_tI_{t\le n}\in L_2(\mathcal{P},\mu)$, then $f_t\in L_2(\mathcal{P},\mu)$ due to the completeness of $L_2(\mathcal{P},\mu)$.

Therefore we fix an $f\in H$ and $T < \infty$ and assume that $f_t = 0$ for $t\ge T$. It is convenient to assume that $f_t$ is defined for negative $t$ as well, and $f_t = 0$ for $t\le 0$. Now we recall that it is known from integration theory that every $L_2$-function is continuous in $L_2$. More precisely, if $h\in L_2([0,T])$ and $h(t) = 0$ outside $[0,T]$, then
$$\lim_{a\to0}\int_{-T}^{T} |h(t+a)-h(t)|^2\,dt = 0.$$
This and the inequality
$$\int_{-T}^{T} |f_{t+a}-f_t|^2\,dt \le 2\Big(\int_{-T}^{T} f_{t+a}^2\,dt + \int_{-T}^{T} f_t^2\,dt\Big) \le 4\int_0^T f_t^2\,dt$$
along with the dominated convergence theorem imply that
$$\lim_{a\to0} E\int_{-T}^{T} |f_{t+a}-f_t|^2\,dt = 0. \qquad(3)$$
Now let
$$\kappa_n(t) = k2^{-n} \quad\text{for}\quad t\in(k2^{-n},(k+1)2^{-n}].$$
Changing variables $t+s = u$, $t = v$ shows that
$$\int_0^1 E\int_0^T |f_{\kappa_n(t+s)-s}-f_t|^2\,dt\,ds = \int_0^{T+1}\Big(E\int_{u-1}^{u\wedge T} |f_{\kappa_n(u)-u+v}-f_v|^2\,dv\Big)du.$$
The last expectation tends to zero owing to (3) uniformly with respect to $u$, since $0\le u-\kappa_n(u)\le 2^{-n}$. It follows that there is a sequence $n(k)\to\infty$ such that for almost every $s\in[0,1]$
$$\lim_{k\to\infty} E\int_0^T |f_{\kappa_{n(k)}(t+s)-s}-f_t|^2\,dt = 0. \qquad(4)$$
Fix any $s$ for which (4) holds, and denote $f^k_t = f_{\kappa_{n(k)}(t+s)-s}$. Then (4) and the inequality $|a|^2\le 2|b|^2 + 2|a-b|^2$ show that $|f^k_t|^2$ is $\mu$-integrable at least for all large $k$.

Furthermore, it turns out that the $f^k_t$ are predictable. Indeed,
$$f_{\kappa_n(t+s)-s} = \sum_i f_{i2^{-n}-s}\,I_{(i2^{-n}-s,(i+1)2^{-n}-s]}(t) = \sum_{i:\,i2^{-n}-s>0} f_{i2^{-n}-s}\,I_{(i2^{-n}-s,(i+1)2^{-n}-s]}(t). \qquad(5)$$
In addition, $f_{t_1}I_{(t_1,t_2]}$ is predictable if $0\le t_1\le t_2$, since for any Borel $B$
$$\{(\omega,t): f_{t_1}(\omega)I_{(t_1,t_2]}(t)\in B\} = \big(\{\omega: f_{t_1}(\omega)\in B\}\times(t_1,t_2]\big)\cup\{(\omega,t): I_{(t_1,t_2]}(t) = 0,\ 0\in B\}\in\mathcal{P}.$$
Therefore (5) yields the predictability of $f^k_t$, and the $\mu$-integrability of $|f^k_t|^2$ now implies that $f^k_t\in L_2(\mathcal{P},\mu)$. The latter space is complete, and owing to (4) we have $f_t\in L_2(\mathcal{P},\mu)$. The theorem is proved.
3. Exercise*. By following the above proof, show that left-continuous $\mathcal{F}_t$-adapted processes are predictable.

4. Exercise. Go back to Exercise 7.7 and prove that if $f_t$ is left continuous, $\mathcal{F}_t$-adapted, and $E\int_0^1 f_t^2\,dt < \infty$, then the usual integral $\int_0^1 f_t\,d\tilde\pi_t$ coincides with the stochastic one (a.s.). In particular, prove that the usual integral $\int_0^1 \tilde\pi_{t-}\,d\tilde\pi_t$ coincides with the stochastic integral $\int_0^1 \tilde\pi_{t-}\,d\tilde\pi_t$ (a.s.).

5. Exercise. Prove that if $f\in L_2(\mathcal{P},\mu)$, then there exists $h\in H$ such that $f = h$ ($\mu$-a.e.), and in this sense $H = L_2(\mathcal{P},\mu)$.
6. Remark. If $f_t$ is independent of $\omega$, (4) implies that for almost any $s\in[0,1]$
$$\lim_{k\to\infty}\int_0^T |f_{\kappa_{n(k)}(t+s)-s}-f_t|^2\,dt = 0, \qquad \int_0^T f_t\,dt = \lim_{k\to\infty}\int_0^T f_{\kappa_{n(k)}(t+s)-s}\,dt.$$
This means that appropriate Riemann sums converge to the Lebesgue integral of $f$.

7. Remark. It is seen from the proof of Theorem 2 that, if $f\in H$, then for any integer $n\ge1$ one can find a partition $0 = t_{n0} < t_{n1} < \dots < t_{nk(n)} = n$ such that $\max_i(t_{n,i+1}-t_{ni})\le 1/n$ and
$$\lim_{n\to\infty} E\int_0^{\infty} |f_t - f^n_t|^2\,dt = 0,$$
where $f^n\in H$ are defined by $f^n_t = f_{t_{ni}}$ for $t\in(t_{ni},t_{n,i+1}]$, $i\le k(n)-1$, and $f^n_t = 0$ for $t > n$. Furthermore, the $f^n_t$ are predictable, and by Theorem 7.4
$$\int_0^{\infty} f_t\,dw_t = \underset{n\to\infty}{\mathrm{l.i.m.}}\int_0^{\infty} f^n_t\,dw_t. \qquad(6)$$
One can apply the same construction to vector-valued functions $f$, and then one sees that the above partitions can be taken the same for any finite number of $f$'s.
Next we prove two properties of the Itô integral. The first one justifies the notation $\int_0^{\infty} f_t\,dw_t$, and the second one shows a kind of local property of this integral.

8. Theorem. (i) If $f\in H$, $0 = t_0 < t_1 < \dots < t_n < \dots$, $f_t = f_{t_i}$ for $t\in[t_i,t_{i+1})$ and $i\ge0$, then in the mean-square sense
$$\int_0^{\infty} f_t\,dw_t = \sum_{i=0}^{\infty} f_{t_i}(w_{t_{i+1}}-w_{t_i}).$$
(ii) If $g, h\in H$, $A\in\mathcal{F}$, and $h_t(\omega) = g_t(\omega)$ for $t\ge0$ and $\omega\in A$, then
$$\int_0^{\infty} g_t\,dw_t = \int_0^{\infty} h_t\,dw_t \quad\text{on } A \quad (a.s.).$$

Proof. (i) Define $f^i_t = f_{t_i}I_{(t_i,t_{i+1}]}$ and observe the simple fact that $f = \sum_i f^i$ ($\mu$-a.e.). Then the linearity and continuity of the Itô integral show that to prove (i) it suffices to prove that
$$\int_0^{\infty} gI_{(r,s]}(t)\,dw_t = (w_s-w_r)g \qquad(7)$$
(a.s.) if $g$ is $\mathcal{F}_r$-measurable, $Eg^2 < \infty$, and $0\le r < s < \infty$.

If $g$ is a step function (having the form $\sum_{i=1}^{n} c_iI_{A_i}$ with constant $c_i$ and $A_i\in\mathcal{F}_r$), then (7) follows from Theorem 7.4. The general case is suggested as Exercise 7.5.

To prove (ii), take common partitions for $g$ and $h$ from Remark 7 and on their basis construct the sequences $g^n_t$ and $h^n_t$. Then by (i) the left-hand sides of (6) for $f^n_t = g^n_t$ and $f^n_t = h^n_t$ coincide on $A$ (a.s.). Formula (6) then says that the same is true for the integrals of $g$ and $h$. The theorem is proved.

Much later (see Sec. 6.1) we will come back to Itô stochastic integrals with variable upper limit. We want these integrals to be continuous. For this purpose we need some properties of martingales, which we present in the following chapter. The reader can skip it if he/she is only interested in stationary processes.
9. Hints to exercises

2.5 Use Exercise 1.4.14, with $R(x) = x$, and estimate $\int_0^x \sqrt{(\ln y)/y}\,dy$ through $\sqrt{x\ln x}$ by using l'Hospital's rule.

2.10 The cases $a\le b$ and $a > b$ are different. At some moment you may like to consult the proof of Theorem 2.3, taking there $2^{2n}$ in place of $n$.

2.12 If $P(\tau\ge a,\ \xi\le b) = \int_{-\infty}^{b} f(x)\,dx$ for every $b$, then $Eg(\xi)I_{\tau\ge a} = \int_{\mathbb R} g(x)f(x)\,dx$. The result of these computations is given in Sec. 6.8.

3.4 It suffices to prove that the indicators of sets $(s,t]$ are in $L_p(\Pi,\mu)$.

3.8 Observe that
$$\phi(s) = E\exp\Big(i\sum_{n=1}^{\infty} f(s+\sigma_n)\Big),$$
and by using the independence of the $\tau_n$ and the fact that $EF(\sigma_1,\sigma_2,\dots) = E\Phi(\sigma_1)$, where $\Phi(t) = EF(t,\sigma_2,\dots)$, show that
$$\phi(s) = \int_0^{\infty} e^{if(s+t)-t}\phi(s+t)\,dt = e^{s}\int_s^{\infty} e^{if(t)}\big(e^{-t}\phi(t)\big)\,dt.$$
Conclude first that $\phi$ is continuous, then that $\phi(s)e^{-s}$ is differentiable, and solve the above equation. After that, approximate by continuous functions the function which is constant on each interval $(t_j,t_{j+1}]$ and vanishes outside of the union of these intervals.

3.14 Prove that, for every Borel nonnegative $f$, we have
$$E\sum_{n:\,\sigma_n\le1} f(\sigma_n) = \int_0^1 f(s)\,ds,$$
and use it to pass to the limit from step functions to arbitrary ones.

3.21 For $b_n > 0$ with $b_n\le1$, we have $\prod_n b_n = 0$ if and only if $\sum_n(1-b_n) = \infty$.

3.23 Use Remark 1.4.10 and (3.6).

5.9 Take any continuous function $u(x)$ defined on $[a,b]$ such that $u < 0$ in $(a,b)$ and $u(a) = u(b) = 0$, and use it to write a formula similar to (5.1).

6.7 Define $\tau$ as the first exit time of $x_t$ from $(a,b)$ and, similarly to (6.3), prove that
$$u(0) = E\int_0^{\tau} e^{-\lambda t}\big(\lambda u(x_t) - u'(x_t) - \tfrac12 u''(x_t)\big)\,dt + Ee^{-\lambda\tau}u(x_{\tau}).$$

7.7 Observe that
$$\int_{(0,t]} \pi_s\,d\pi_s = \pi_t(\pi_t+1)/2.$$

7.8 First take $f_t = I_{\Delta}$.

8.4 Keep in mind the proof of Theorem 8.2, and redo Exercise 7.5 for $\tilde\pi_t$ in place of $w_t$.

8.5 Take a sequence of step functions converging to $f$ ($\mu$-a.e.), and observe that step functions are $\mathcal{F}_t$-adapted.
Chapter 3
Martingales
1. Conditional expectations
The notion of conditional expectation plays a tremendous role in probability theory. In this book it appears in the first place in connection with the theory of martingales, which we will use several times in the future, in particular, to construct a continuous version of the Itô stochastic integral with variable upper limit.

Let $(\Omega,\mathcal F,P)$ be a probability space and $\mathcal G$ a $\sigma$-field with $\mathcal G\subset\mathcal F$.

1. Definition. Let $\xi$ and $\eta$ be random variables, and moreover let $\eta$ be $\mathcal G$-measurable. Assume that $E|\xi|, E|\eta| < \infty$ and for every $A\in\mathcal G$ we have
$$E\xi I_A = E\eta I_A.$$
Then we call $\eta$ a conditional expectation of $\xi$ given $\mathcal G$ and write $E(\xi|\mathcal G) = \eta$. If $\mathcal G$ is generated by a random element $\zeta$, one also uses the notation $\eta = E(\xi|\zeta)$. Finally, if $\xi = I_A$ with $A\in\mathcal F$, then we write $P(A|\mathcal G) = E(I_A|\mathcal G)$.
The notation $E(\xi|\mathcal G)$ needs a justification.

2. Theorem. If $\eta_1$ and $\eta_2$ are conditional expectations of $\xi$ given $\mathcal G$, then $\eta_1 = \eta_2$ (a.s.).

Proof. By definition, for any $A\in\mathcal G$,
$$E\eta_1I_A = E\eta_2I_A, \qquad E(\eta_1-\eta_2)I_A = 0.$$
Since $\eta_1-\eta_2$ is $\mathcal G$-measurable, one can take $A = \{\omega : \eta_1(\omega)-\eta_2(\omega) > 0\}$. Then one gets $E(\eta_1-\eta_2)^+ = 0$, $(\eta_1-\eta_2)^+ = 0$, and $\eta_1\le\eta_2$ (a.s.). Similarly, $\eta_2\le\eta_1$ (a.s.). The theorem is proved.
The definition of conditional expectation involves only expectations. Therefore, if $\eta = E(\xi|\mathcal G)$, then any $\mathcal G$-measurable function coinciding with $\eta$ almost surely also is a conditional expectation of $\xi$ given $\mathcal G$. Theorem 2 says that the converse is also true. To avoid misunderstanding, let us emphasize that if $\eta_1 = E(\xi|\mathcal G)$ and $\eta_2 = E(\xi|\mathcal G)$, then we cannot say that $\eta_1(\omega) = \eta_2(\omega)$ for all $\omega$, although this equality does hold for almost every $\omega$.

3. Exercise. Let $\Omega = \bigcup_n A_n$ be a partition of $\Omega$ into disjoint sets $A_n\in\mathcal F$, $n = 1, 2, \dots$. Let $\mathcal G = \sigma(A_n,\ n = 1, 2, \dots)$. Prove that
$$E(\xi|\mathcal G) = \frac{1}{P(A_n)}E\xi I_{A_n} \quad \Big(\frac{0}{0} := 0\Big)$$
almost surely on $A_n$ for any $n$.
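In code the content of Exercise 3 is just "average over each cell of the partition". A sketch (with illustrative data and names):

```python
import numpy as np

# E(xi | G) for G generated by a finite partition {A_n}: on each A_n the
# conditional expectation equals E(xi I_{A_n}) / P(A_n), here estimated
# by empirical averages over the cells.
rng = np.random.default_rng(2)
labels = rng.integers(0, 3, size=100000)     # cell index: omega in A_n
xi = labels + rng.normal(0.0, 1.0, size=labels.shape)

cond_exp = np.empty_like(xi, dtype=float)
for n in range(3):
    cell = (labels == n)                     # the event A_n
    cond_exp[cell] = xi[cell].mean()         # constant on A_n

print(sorted(set(np.round(cond_exp, 2))))    # roughly [0.0, 1.0, 2.0]
```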
4. Exercise. Let $(\xi,\eta)$ be an $\mathbb R^2$-valued random variable and $p(x,y)$ a nonnegative Borel function on $\mathbb R^2$. Remember that $p$ is called a density of $(\xi,\eta)$ if for any Borel $B\in\mathcal B(\mathbb R^2)$ we have
$$P\big((\xi,\eta)\in B\big) = \int_B p(x,y)\,dx\,dy.$$
Denote $\rho = \int_{\mathbb R} p(x,\eta)\,dx$, assume $E|\xi| < \infty$, and prove that (a.s.)
$$E(\xi|\eta) = \frac{1}{\rho}\int_{\mathbb R} x\,p(x,\eta)\,dx \quad \Big(\frac{0}{0} := 0\Big).$$
We need some properties of conditional expectations.

5. Theorem. Let $E|\xi| < \infty$. Then $E(\xi|\mathcal G)$ exists.

Proof. On the probability space $(\Omega,\mathcal G,P)$ consider the set function $\nu(A) = E\xi^+I_A$, $A\in\mathcal G$. Obviously $\nu\ge0$ and $\nu(\Omega) = E\xi^+ < \infty$. Furthermore, from measure theory we know that $\nu$ is a measure and $\nu(A) = 0$ if $P(A) = 0$. Thus $\nu$ is absolutely continuous with respect to $P$, and by the Radon-Nikodým theorem there is a $\mathcal G$-measurable function $\eta^{(+)}\ge0$ such that
$$\nu(A) = \int_A \eta^{(+)}\,P(d\omega) = E\eta^{(+)}I_A$$
for any $A\in\mathcal G$. Similarly, there is a $\mathcal G$-measurable $\eta^{(-)}$ such that $E\xi^-I_A = E\eta^{(-)}I_A$ for any $A\in\mathcal G$. The random variable $\eta^{(+)}-\eta^{(-)}$ is obviously a conditional expectation of $\xi$ given $\mathcal G$. The theorem is proved.
The next theorem characterizes computing conditional expectation as a linear operation.

6. Theorem. Let $E|\xi| < \infty$. Then

(i) for any constant $c$ we have $E(c\xi|\mathcal G) = cE(\xi|\mathcal G)$ (a.s.); in particular, $E(0|\mathcal G) = 0$ (a.s.);

(ii) we have $EE(\xi|\mathcal G) = E\xi$;

(iii) if $E|\xi_1|, E|\xi_2| < \infty$, then $E(\xi_1\pm\xi_2|\mathcal G) = E(\xi_1|\mathcal G)\pm E(\xi_2|\mathcal G)$ (a.s.);

(iv) if $\xi$ is $\mathcal G$-measurable, then $E(\xi|\mathcal G) = \xi$ (a.s.);

(v) if a $\sigma$-field $\mathcal G_1\subset\mathcal G$, then
$$E\big(E(\xi|\mathcal G_1)\big|\mathcal G\big) = E\big(E(\xi|\mathcal G)\big|\mathcal G_1\big) = E(\xi|\mathcal G_1)$$
(a.s.), which can be expressed as the statement that the smallest $\sigma$-field prevails.

Proof. Assertions (i) through (iv) are immediate consequences of the definitions and Theorem 2. To prove (v), let $\eta = E(\xi|\mathcal G)$, $\eta_1 = E(\xi|\mathcal G_1)$. Since $\eta_1$ is $\mathcal G_1$-measurable and $\mathcal G_1\subset\mathcal G$, we have that $\eta_1$ is $\mathcal G$-measurable. Hence $E(\eta_1|\mathcal G) = \eta_1$ (a.s.) by (iv); that is,
$$E\big(E(\xi|\mathcal G_1)\big|\mathcal G\big) = E(\xi|\mathcal G_1).$$
Furthermore, if $A\in\mathcal G_1$, then $A\in\mathcal G$ and $E\eta I_A = E\xi I_A = E\eta_1I_A$ by definition. The equality of the extreme terms by definition means that $E(\eta|\mathcal G_1) = \eta_1$. The theorem is proved.
Next we study the properties of conditional expectations related to inequalities and limits.

7. Theorem. (i) If $E|\xi_1|, E|\xi_2| < \infty$ and $\xi_2\ge\xi_1$ (a.s.), then $E(\xi_2|\mathcal G)\ge E(\xi_1|\mathcal G)$ (a.s.).

(ii) (The monotone convergence theorem) If $E|\xi_i|, E|\xi| < \infty$, $\xi_{i+1}\ge\xi_i$ (a.s.) for $i = 1, 2, \dots$, and $\xi = \lim_i\xi_i$ (a.s.), then
$$\lim_{i\to\infty} E(\xi_i|\mathcal G) = E(\xi|\mathcal G) \quad (a.s.).$$

(iii) (Fatou's theorem) If $\xi_i\ge0$, $E\xi_i < \infty$, $i = 1, 2, \dots$, and $E\varliminf_i\xi_i < \infty$, then
$$E\big(\varliminf_i\xi_i\,\big|\,\mathcal G\big) \le \varliminf_i E(\xi_i|\mathcal G) \quad (a.s.).$$

(iv) (The dominated convergence theorem) If $|\xi_i|\le\eta$, $E\eta < \infty$, and the limit $\lim_i\xi_i =: \xi$ exists, then
$$E(\xi|\mathcal G) = \lim_i E(\xi_i|\mathcal G) \quad (a.s.).$$

(v) (Jensen's inequality) If $\phi(t)$ is finite and convex on $\mathbb R$ and $E|\xi| + E|\phi(\xi)| < \infty$, then $E(\phi(\xi)|\mathcal G)\ge\phi(E(\xi|\mathcal G))$ (a.s.).

Proof. (i) Let $\eta_i = E(\xi_i|\mathcal G)$, $A = \{\omega : \eta_2(\omega)-\eta_1(\omega)\le0\}$. Then
$$E(\eta_2-\eta_1)^- = E\eta_1I_A - E\eta_2I_A = E\xi_1I_A - E\xi_2I_A \le 0.$$
Hence $E(\eta_2-\eta_1)^- = 0$ and $\eta_2\ge\eta_1$ (a.s.).

(ii) Again let $\eta_i = E(\xi_i|\mathcal G)$. Then the sequence $\eta_i$ increases (a.s.) and if $\eta := \lim_i\eta_i$ on the set where the limit exists, then by the monotone convergence theorem
$$E\eta I_A = \lim_i E\eta_iI_A = \lim_i E\xi_iI_A = E\xi I_A$$
for every $A\in\mathcal G$. Hence by definition $\eta = E(\xi|\mathcal G)$.

(iii) Observe that if $a_n, b_n$ are two sequences of numbers, the $\lim a_n$ exists, and $a_n\le b_n$, then $\lim a_n\le\varliminf b_n$. Since $\inf(\xi_i, i\ge n)$ increases with $n$ and is less than $\xi_n$ for each $n$, by the above we have
$$E\big(\lim_n\inf(\xi_i, i\ge n)\big|\mathcal G\big) = \lim_n E\big(\inf(\xi_i, i\ge n)\big|\mathcal G\big) \le \varliminf_n E(\xi_n|\mathcal G) \quad (a.s.).$$

(iv) Owing to (iii),
$$\varliminf_i E(\xi_i|\mathcal G) = \varliminf_i E(\xi_i+\eta|\mathcal G) - E(\eta|\mathcal G) \ge E(\xi+\eta|\mathcal G) - E(\eta|\mathcal G) = E(\xi|\mathcal G)$$
(a.s.). Upon replacing $\xi_i$ and $\xi$ with $-\xi_i$ and $-\xi$, we also get
$$\varlimsup_i E(\xi_i|\mathcal G) \le E(\xi|\mathcal G)$$
(a.s.). The combination of these two inequalities proves (iv).

(v) It is well known that there exists a countable set of pairs $(a_i,b_i)\in\mathbb R^2$ such that for all $t$
$$\phi(t) = \sup_i(a_it+b_i).$$
Hence, for any $i$, $\phi(\xi)\ge a_i\xi+b_i$ and $E(\phi(\xi)|\mathcal G)\ge a_iE(\xi|\mathcal G)+b_i$ (a.s.). It only remains to take the sup with respect to countably many $i$'s (preserving (a.s.)). The theorem is proved.
8. Corollary. If $\xi\ge0$ and $E\xi < \infty$, then $E(\xi|\mathcal G)\ge0$ (a.s.).

9. Corollary. If $p\ge1$ and $E|\xi|^p < \infty$, then $|E(\xi|\mathcal G)|^p\le E(|\xi|^p|\mathcal G)$ (a.s.). In particular, $|E(\xi|\mathcal G)|\le E(|\xi|\,|\mathcal G)$ (a.s.).

10. Corollary. If $E|\xi| < \infty$ and $E|\xi-\xi_i|\to0$ as $i\to\infty$, then
$$E|E(\xi|\mathcal G)-E(\xi_i|\mathcal G)| \le E|\xi-\xi_i| \to 0.$$

11. Remark. The monotone convergence theorem can be used to define $E(\xi|\mathcal G)$ as the limit of the increasing sequence $E(\xi\wedge n|\mathcal G)$ as $n\to\infty$ for any $\xi$ satisfying $E\xi^- < \infty$. With this definition we would not need the condition $E|\phi(\xi)| < \infty$ in Theorem 7 (v), and some other results would hold true under less restrictive assumptions. However, in this book the notation $E(\xi|\mathcal G)$ is only used for $\xi$ with $E|\xi| < \infty$.
The following theorem shows the relationship between conditional expectations and independence.

12. Theorem. (i) Let $E|\xi| < \infty$ and assume that $\xi$ and $\mathcal G$ are independent (see Definition 2.5.1). Then $E(\xi|\mathcal G) = E\xi$ (a.s.). In particular, $E(c|\mathcal G) = c$ (a.s.) for any constant $c$.

(ii) Let $E|\xi| < \infty$ and $B\in\mathcal G$. Then $E(\xi I_B|\mathcal G) = I_BE(\xi|\mathcal G)$ (a.s.).

(iii) Let $E|\xi\eta| < \infty$, let $\eta$ be $\mathcal G$-measurable, and let $E|\xi| < \infty$. Then $E(\xi\eta|\mathcal G) = \eta E(\xi|\mathcal G)$ (a.s.).

Proof. (i) Let $\kappa_n(t) = 2^{-n}[2^nt]$ and $\xi_n = \kappa_n(\xi)$. Take $A\in\mathcal G$ and notice that $|\xi-\xi_n|\le2^{-n}$. Then our assertion follows from
$$E\xi I_A = \lim_n E(\xi_nI_A) = \lim_n\sum_{k=-\infty}^{\infty} k2^{-n}P\big(k2^{-n}\le\xi<(k+1)2^{-n},\ A\big)$$
$$= \lim_n\sum_{k=-\infty}^{\infty} k2^{-n}P\big(k2^{-n}\le\xi<(k+1)2^{-n}\big)P(A)$$
$$= P(A)\lim_n\sum_{k=-\infty}^{\infty} k2^{-n}P\big(k2^{-n}\le\xi<(k+1)2^{-n}\big) = P(A)E\xi = E(I_AE\xi).$$

(ii) For $\eta = E(\xi|\mathcal G)$ and any $A\in\mathcal G$ we have
$$E(\eta I_B)I_A = E\eta I_{A\cap B} = E\xi I_{A\cap B} = E(\xi I_B)I_A,$$
which yields the result by definition.

(iii) Denote $\eta_n = \kappa_n(\eta)$ and observe that $|\eta_n|\le|\eta|$. Therefore, $E\eta_n\xi$ and $E(\eta_n\xi|\mathcal G)$ exist. Also let $B_{nk} = \{\omega : k2^{-n}\le\eta<(k+1)2^{-n}\}$. Then
$$I_{B_{nk}}E(\eta_n\xi|\mathcal G) = E(I_{B_{nk}}\eta_n\xi|\mathcal G) = k2^{-n}I_{B_{nk}}E(\xi|\mathcal G) = I_{B_{nk}}\eta_nE(\xi|\mathcal G)$$
(a.s.). In other words, $E(\eta_n\xi|\mathcal G) = \eta_nE(\xi|\mathcal G)$ on $B_{nk}$ (a.s.). Since $\bigcup_kB_{nk} = \Omega$, this equality holds almost surely. By letting $n\to\infty$ and using $|\eta-\eta_n|\le2^{-n}$, we get the result. The theorem is proved.
Sometimes the following generalization of Theorem 12 (iii) is useful.

13. Theorem. Let $f(x,y)$ be a Borel nonnegative function on $\mathbb R^2$, let $\eta$ be $\mathcal G$-measurable, and let $\xi$ be independent of $\mathcal G$. Assume $Ef(\xi,\eta) < \infty$. Denote $\phi(y) := Ef(\xi,y)$. Then $\phi(y)$ is a Borel function of $y$ and
$$E\big(f(\xi,\eta)|\mathcal G\big) = \phi(\eta) \quad (a.s.). \qquad(1)$$

Proof. We just repeat part of the usual proof of Fubini's theorem. Let $\Lambda$ be the collection of all Borel sets $B\subset\mathbb R^2$ such that $EI_B(\xi,y)$ is a Borel function and
$$E\big(I_B(\xi,\eta)|\mathcal G\big) = \big(EI_B(\xi,y)\big)\big|_{y=\eta} \quad (a.s.). \qquad(2)$$
On the basis of the above results it is easy to check that $\Lambda$ is a $\lambda$-system. In addition, $\Lambda$ contains the $\pi$-system $\Pi$ of all sets $A\times B$ with $A, B\in\mathcal B(\mathbb R)$, since $I_{A\times B}(\xi,y) = I_B(y)I_A(\xi)$. Therefore, $\Lambda$ contains the smallest $\sigma$-field generated by $\Pi$. Since $\sigma(\Pi) = \mathcal B(\mathbb R^2)$, $EI_B(\xi,y)$ is a Borel function for all Borel $B\subset\mathbb R^2$.

Now a standard approximation of nonnegative Borel functions by linear combinations of indicators shows that $\phi(y)$ is indeed a Borel function and leads from (2) to (1). The theorem is proved.
In some cases one can find conditional expectations by using Exercise 3 and the following result, the second assertion of which is called the normal correlation theorem.

14. Theorem. (i) Let $\mathcal G$ be a $\sigma$-field, $\mathcal G\subset\mathcal F$. Denote $H = L_2(\mathcal F,P)$, $H_1 = L_2(\mathcal G,P)$, and let $\pi$ be the orthogonal projection operator of $H$ on $H_1$. Then, for each random variable $\xi$ with $E\xi^2 < \infty$, we have $E(\xi|\mathcal G) = \pi\xi$ (a.s.). In particular,
$$E(\xi-\pi\xi)^2 = \inf\{E(\xi-\eta)^2 : \eta \text{ is } \mathcal G\text{-measurable}\}.$$

(ii) Let $(\xi,\eta_1,\dots,\eta_n)$ be a Gaussian vector and $\mathcal G$ the $\sigma$-field generated by $(\eta_1,\dots,\eta_n)$. Then $E(\xi|\mathcal G) = a + b_1\eta_1 + \dots + b_n\eta_n$ (a.s.), where $(a,b_1,\dots,b_n)$ is any solution of the system
$$E\xi = a + b_1E\eta_1 + \dots + b_nE\eta_n,$$
$$E\xi\eta_i = aE\eta_i + b_1E\eta_1\eta_i + \dots + b_nE\eta_n\eta_i, \quad i = 1, \dots, n. \qquad(3)$$
Furthermore, system (3) always has at least one solution.

Proof. (i) We have that $\pi\xi$ is $\mathcal G$-measurable, or at least has a $\mathcal G$-measurable modification for which we use the same notation. Furthermore, $\xi-\pi\xi\perp H_1$, so that $E(\xi-\pi\xi)\eta = 0$ for any $\eta\in H_1$. For $\eta = I_A$ with $A\in\mathcal G$ this yields $E\xi I_A - EI_A\pi\xi = 0$, which by definition means that $E(\xi|\mathcal G) = \pi\xi$.

(ii) The function $E\big(\xi-(a+b_1\eta_1+\dots+b_n\eta_n)\big)^2$ is a nonnegative quadratic function of $(a,b_1,\dots,b_n)$.

15. Exercise. Prove that any nonnegative quadratic function attains its minimum at at least one point.

Now take a point $(a,b_1,\dots,b_n)$ at which $E\big(\xi-(a+b_1\eta_1+\dots+b_n\eta_n)\big)^2$ takes its minimum value, and write that all first derivatives with respect to $(a,b_1,\dots,b_n)$ vanish at this point. The most convenient way to do this is to express $E\big(\xi-(a+b_1\eta_1+\dots+b_n\eta_n)\big)^2$ by developing the second power and factoring out all products of constants. Then we will see that system (3) has a solution. Next, notice that for any solution of (3) and $\zeta = \xi-(a+b_1\eta_1+\dots+b_n\eta_n)$ we have
$$E\zeta\eta_i = 0, \qquad E\zeta = 0.$$
It follows that in the Gaussian vector $(\zeta,\eta_1,\dots,\eta_n)$ the first component is uncorrelated with the others. The theory of characteristic functions implies that in that case $\zeta$ is independent of $(\eta_1,\dots,\eta_n)$. Since any event $A\in\mathcal G$ has the form $\{\omega : (\eta_1,\dots,\eta_n)\in\Gamma\}$ with Borel $\Gamma\subset\mathbb R^n$, we conclude that $\zeta$ and $\mathcal G$ are independent. Hence $E(\zeta|\mathcal G) = E\zeta = 0$ (a.s.), and, adding that the $\eta_i$ are $\mathcal G$-measurable, we find that
$$0 = E(\zeta|\mathcal G) = E(\xi|\mathcal G) - (a+b_1\eta_1+\dots+b_n\eta_n)$$
(a.s.). The theorem is proved.
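The normal correlation theorem is, in modern language, linear regression, and system (3) is the set of normal equations. The sketch below (illustrative; the coefficients of the simulated model are assumptions of the example) solves the empirical form of (3) by least squares and recovers the affine conditional expectation.

```python
import numpy as np

# Normal correlation theorem numerically: for a Gaussian vector
# (xi, eta_1, eta_2), E(xi | eta) = a + b_1 eta_1 + b_2 eta_2, where
# (a, b_1, b_2) solves system (3).  Least squares solves exactly the
# empirical version of (3).
rng = np.random.default_rng(3)
m = 200000
eta = rng.normal(size=(m, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
xi = 2.0 + eta @ np.array([1.0, -1.0]) + rng.normal(size=m)

X = np.hstack([np.ones((m, 1)), eta])        # columns: 1, eta_1, eta_2
coef, *_ = np.linalg.lstsq(X, xi, rcond=None)
print(np.round(coef, 3))                     # ~ [2.0, 1.0, -1.0]
```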
16. Remark. Theorem 14 (i) shows another way to introduce conditional expectations on the basis of Hilbert space theory without using the Radon-Nikodým theorem.

17. Exercise. Let $(\xi,\eta_1,\dots,\eta_n)$ be a Gaussian vector with mean zero, and $L$ the set of all linear combinations of the $\eta_i$ with constant coefficients. Prove that $E(\xi|\mathcal G)$ coincides with the orthogonal projection in $L_2(\mathcal F,P)$ of $\xi$ on $L$.
2. Discrete time martingales

A notion close to martingale was used by S. N. Bernstein. According to J. Doob [Do] the notion of martingale was introduced in 1939 by J. Ville, who is not very well known in the theory of probability. At the present time the theory of martingales is a very wide and well developed branch of probability theory with many applications in other areas of mathematics. Many mathematicians took part in developing this theory; J. Doob, P. Lévy, P.-A. Meyer, H. Kunita, S. Watanabe, and D. Burkholder should be named in any list of the main contributors to the theory.

Let $(\Omega,\mathcal F,P)$ be a complete probability space and let $\mathcal F_n$, $n = 1, \dots, N$, be a sequence of $\sigma$-fields satisfying $\mathcal F_1\subset\mathcal F_2\subset\dots\subset\mathcal F_N\subset\mathcal F$.
1. Definition. A sequence of real-valued random variables $\xi_n$, $n = 1, \dots, N$, such that $\xi_n$ is $\mathcal F_n$-measurable and $E|\xi_n| < \infty$ for every $n$ is called

(i) a martingale if, for each $1\le n\le m\le N$, $E(\xi_m|\mathcal F_n) = \xi_n$ (a.s.);

(ii) a submartingale if, for each $1\le n\le m\le N$, $E(\xi_m|\mathcal F_n)\ge\xi_n$ (a.s.);

(iii) a supermartingale if, for each $1\le n\le m\le N$, $E(\xi_m|\mathcal F_n)\le\xi_n$ (a.s.).

In those cases in which the $\sigma$-field $\mathcal F_n$ should be mentioned one says, for instance, that $\xi_n$ is a martingale relative to $\mathcal F_n$ or that $(\xi_n,\mathcal F_n)$ is a martingale.

Obviously, $\xi_n$ is a supermartingale if and only if $-\xi_n$ is a submartingale, and $\xi_n$ is a martingale if and only if $\pm\xi_n$ are supermartingales. Because of these simple facts we usually state the results only for submartingales or supermartingales, whichever is more convenient. Also trivially, $E\xi_n$ is constant for a martingale, increases with $n$ for submartingales, and decreases for supermartingales.
2. Exercise*. By using properties of conditional expectations, prove that:

(i) If $E|\xi| < \infty$, then $\xi_n := E(\xi|\mathcal F_n)$ is a martingale.

(ii) If $\eta_1,\dots,\eta_N$ are independent, $\mathcal F_n = \sigma(\eta_1,\dots,\eta_n)$, and $E\eta_n = 0$, then $\xi_n := \eta_1+\dots+\eta_n$ is an $\mathcal F_n$-martingale.

(iii) If $w_t$ is a Wiener process and $\mathcal F_n = \sigma(w_1,\dots,w_n)$, then $(w_n,\mathcal F_n)$ and $(\exp(w_n-n/2),\mathcal F_n)$ are martingales.

(iv) If $\xi_n$ is a martingale, $\phi$ is convex and $E|\phi(\xi_n)| < \infty$ for any $n$, then $\phi(\xi_n)$ is a submartingale. In particular, $|\xi_n|$ is a submartingale.

(v) If $\xi_n$ is a submartingale and $\phi$ is a convex increasing function satisfying $E|\phi(\xi_n)| < \infty$ for any $n$, then $\phi(\xi_n)$ is a submartingale. In particular, $(\xi_n)^+$ is a submartingale.
3. Exercise. By Definition 1 and properties of conditional expectations, a sequence of real-valued random variables $\xi_n$, $n = 1, \dots, N$, such that $\xi_n$ is $\mathcal F_n$-measurable and $E|\xi_n| < \infty$ for every $n$, is a martingale if and only if, for each $1\le n\le N$, $\xi_n = E(\xi_N|\mathcal F_n)$ (a.s.). This describes all martingales defined for a finite number of times. Prove that a sequence of real-valued random variables $\xi_n\ge0$, $n = 1, \dots, N$, such that $\xi_n$ is $\mathcal F_n$-measurable and $E|\xi_n| < \infty$ for every $n$, is a submartingale if and only if, for each $1\le n\le N$, we have $\xi_n = E(\eta_n|\mathcal F_n)$ (a.s.), where $\eta_n$ is an increasing sequence of nonnegative random variables such that $\eta_N = \xi_N$.
One also has a different characterization of submartingales.

4. Exercise. (i) (Doob's decomposition) Prove that a sequence of real-valued random variables $\xi_n$, $n = 1, \dots, N$, such that $\xi_n$ is $\mathcal F_n$-measurable and $E|\xi_n| < \infty$ for every $n$, is a submartingale if and only if $\xi_n = A_n + m_n$, where $m_n$ is an $\mathcal F_n$-martingale and $A_n$ is an increasing sequence such that $A_1 = 0$ and $A_n$ is $\mathcal F_{n-1}$-measurable for every $n\ge2$.

(ii) (Multiplicative decomposition) Prove that a sequence of real-valued random variables $\xi_n > 0$, $n = 1, \dots, N$, such that $\xi_n$ is $\mathcal F_n$-measurable and $E|\xi_n| < \infty$ for every $n$, is a submartingale if and only if $\xi_n = A_nm_n$, where $m_n$ is a nonnegative $\mathcal F_n$-martingale and $A_n$ is an increasing sequence such that $A_1 = 1$ and $A_n$ is $\mathcal F_{n-1}$-measurable for every $n\ge2$.
5. Exercise. As a generalization of Exercise 2 (iii), prove that if $(w_t,\mathcal F_t)$ is a Wiener process, $0\le t_0\le t_1\le\dots\le t_N$, and the $b_n$ are $\mathcal F_{t_n}$-measurable random variables, then
$$\Big(\exp\Big(\sum_{i=0}^{n-1} b_i(w_{t_{i+1}}-w_{t_i}) - \frac12\sum_{i=0}^{n-1} b_i^2(t_{i+1}-t_i)\Big),\ \mathcal F_{t_n}\Big), \quad n = 1, \dots, N,$$
is a martingale with expectation 1.

The above definition describes martingales with discrete time parameter. Similarly one introduces martingales defined on any subset of $\mathbb R$. A distinguished feature of discrete time martingales is described in the following lemma.
6. Lemma. $(\xi_n,\mathcal F_n)_{n=1}^{N}$ is a martingale if and only if the $\xi_n$ are $\mathcal F_n$-measurable and
$$E|\xi_n| < \infty \ \ \forall n\le N, \qquad E(\xi_{n+1}|\mathcal F_n) = \xi_n \ (a.s.) \ \ \forall n\le N-1.$$
Similar assertions are true for sub- and supermartingales.

Proof. The "only if" part is obvious. To prove the "if" part, notice that, for $m = n$, $\xi_n$ is $\mathcal F_n$-measurable and $E(\xi_m|\mathcal F_n) = \xi_n$ (a.s.). For $m = n+1$ this equality holds by the assumption. For $m = n+2$, since $\mathcal F_n\subset\mathcal F_{n+1}$ we have
$$E(\xi_m|\mathcal F_n) = E\big(E(\xi_{n+2}|\mathcal F_{n+1})\big|\mathcal F_n\big) = E(\xi_{n+1}|\mathcal F_n) = \xi_n \quad (a.s.).$$
In the same way one considers other $m\in\{n,\dots,N\}$. The lemma is proved.
7. Definition. Let real-valued random variables $\xi_n$ and $\sigma$-fields $\mathcal F_n\subset\mathcal F$ be defined for $n = 1, 2, \dots$ and be such that $\xi_n$ is $\mathcal F_n$-measurable and $E|\xi_n| < \infty$, $\mathcal F_{n+1}\subset\mathcal F_n$, $E(\xi_n|\mathcal F_{n+1}) = \xi_{n+1}$ (a.s.) for all $n$. Then we say that $(\xi_n,\mathcal F_n)$ is a reverse martingale.
An important and somewhat unexpected example of a reverse martingale is given in the following theorem.

8. Theorem. Let $\eta_1,\dots,\eta_N$ be independent identically distributed random variables with $E|\eta_1| < \infty$. Define
$$\xi_n = (\eta_1+\dots+\eta_n)/n, \qquad \mathcal F_n = \sigma(\xi_m : m = n,\dots,N).$$
Then $(\xi_n,\mathcal F_n)$ is a reverse martingale.

Proof. Simple manipulations show that it suffices to prove that
$$E(\eta_i|\xi_n,\dots,\xi_N) = \xi_n \quad (a.s.) \qquad(1)$$
for $n = 1,\dots,N$ and $i = 1,\dots,n$. In turn (1) will be proved if we prove that
$$E(\eta_1|\xi_n,\dots,\xi_N) = E(\eta_i|\xi_n,\dots,\xi_N) \quad \forall i = 1,\dots,n \qquad(2)$$
(a.s.). Indeed, then, upon letting $\eta = E(\eta_1|\xi_n,\dots,\xi_N)$, we find that
$$\xi_n = E(\xi_n|\xi_n,\dots,\xi_N) = \frac1nE\big((\eta_1+\dots+\eta_n)\big|\xi_n,\dots,\xi_N\big) = \frac1n\,n\eta$$
(a.s.), which implies (1).

To prove (2), observe that any event $A\in\sigma(\xi_n,\dots,\xi_N)$ can be written as $\{\omega : (\xi_n,\dots,\xi_N)\in B\}$, where $B$ is a Borel subset of $\mathbb R^{N-n+1}$. In addition, the vectors $(\eta_1,\eta_2,\dots,\eta_N)$ and $(\eta_2,\eta_1,\eta_3,\dots,\eta_N)$ have the same distribution. Therefore, the vectors
$$(\eta_1,\ \eta_1+\eta_2+\dots+\eta_n,\ \dots,\ \eta_1+\eta_2+\dots+\eta_N)$$
and
$$(\eta_2,\ \eta_2+\eta_1+\eta_3+\dots+\eta_n,\ \dots,\ \eta_2+\eta_1+\eta_3+\dots+\eta_N)$$
have the same distribution. In particular, for $n\ge2$,
$$E\eta_2I_{(\xi_n,\dots,\xi_N)\in B} = E\eta_1I_{(\xi_n,\dots,\xi_N)\in B} = E\eta I_{(\xi_n,\dots,\xi_N)\in B}.$$
Hence $\eta = E(\eta_2|\xi_n,\dots,\xi_N)$ (a.s.). Similarly one proves (2) for other values of $i$. The theorem is proved.
3. Properties of martingales

First we adapt the definition of filtration of $\sigma$-fields from Sec. 2.7 to the case of sequences.

1. Definition. Let $\mathcal F_n$ be $\sigma$-fields defined for $n = 0, 1, 2, \dots$ and such that $\mathcal F_n\subset\mathcal F$ and $\mathcal F_n\subset\mathcal F_{n+1}$. Then we say that we are given an (increasing) filtration of $\sigma$-fields $\mathcal F_n$.

2. Definition. Let $\tau$ be a random variable with values in $\{0, 1, \dots, \infty\}$. We say that $\tau$ is a stopping time (relative to $\mathcal F_n$) if $\{\omega : \tau(\omega) > n\}\in\mathcal F_n$ for all $n = 0, 1, 2, \dots$.

Observe that we do not assume $\tau$ to be finite; on a subset of $\Omega$ it may be equal to $\infty$. The simplest examples of stopping times are given by nonrandom nonnegative integers.

3. Exercise*. Prove that a nonnegative integer-valued random variable $\tau$ is a stopping time if and only if $\{\omega : \tau = n\}\in\mathcal F_n$ for all $n\ge0$. Also prove that $\tau\wedge\sigma$, $\tau\vee\sigma$, and $\tau+\sigma$ are stopping times if $\tau$ and $\sigma$ are stopping times.

In applications, quite often the $\sigma$-field $\mathcal F_n$ is interpreted as the set of all events observable or happening up to moment of time $n$ when we conduct a series of experiments. Assume that we decided to stop our experiments at a random time $\tau$ and then, of course, stop observing its future development. Then, for every $n$, the event $\{\tau = n\}$ definitely either occurs or does not occur on the interval of time $[0,n]$, which is transformed into the requirement $\{\tau = n\}\in\mathcal F_n$. This is the origin of the term "stopping time".
4. Example. Let $\xi_n$, $n = 0, 1, 2, \dots$, be a sequence of $\mathcal F_n$-measurable random variables, and let $c\in\mathbb R$ be a constant. Define
$$\tau = \inf\{n\ge0 : \xi_n(\omega)\ge c\} \quad (\inf\emptyset := \infty)$$
as the first time when $\xi_n$ hits $[c,\infty)$ (making the definition $\inf\emptyset := \infty$ natural). It turns out that $\tau$ is a stopping time.

Intuitively it is clear, since, for every $\omega$, knowing $\xi_0,\dots,\xi_n$ we know whether one of them is higher than $c$ or not, that is, whether $\tau > n$ or not, which shows that $\{\omega : \tau > n\}\in\sigma(\xi_0,\dots,\xi_n)\subset\mathcal F_n$. To get a rigorous argument, observe that
$$\{\omega : \tau > n\} = \{\omega : \xi_0 < c,\dots,\xi_n(\omega) < c\}$$
and this set is in $\mathcal F_n$ since, for $i = 0, 1, \dots, n$, the $\xi_i$ are $\mathcal F_i$-measurable, and because $\mathcal F_i\subset\mathcal F_n$ they are $\mathcal F_n$-measurable as well.
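The point of the example, that the event $\{\tau > n\}$ is decided by $\xi_0,\dots,\xi_n$ alone, is transparent in code (an illustrative sketch):

```python
import numpy as np

# Example 4: tau = first n with xi_n >= c.  Whether tau > n holds is a
# function of xi_0, ..., xi_n only, which is why tau is a stopping time.
rng = np.random.default_rng(4)
xi = np.cumsum(rng.choice([-1.0, 1.0], size=100))  # a random walk
c = 5.0

hits = np.nonzero(xi >= c)[0]
tau = int(hits[0]) if hits.size else None          # None plays "infinity"
print("tau =", tau)

n = 10
print("tau > n ?", bool(np.all(xi[: n + 1] < c)))  # uses xi_0..xi_n only
```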
5. Exercise. Let $\xi_n$ be integer valued and $c$ an integer. Assume that $\tau$ from Example 4 is finite. Is it true that $\xi_{\tau} = c$?

If $\xi_n$, $n = 0, 1, 2, \dots$, is a sequence of random variables and $\tau$ is an integer-valued random variable, then the sequence $\eta_n = \xi_{\tau\wedge n}$ coincides with $\xi_n$ for $n\le\tau$ and equals $\xi_{\tau}$ after that. Therefore, we say that $\xi_{\tau\wedge n}$ is the sequence $\xi_n$ stopped at time $\tau$.
6. Theorem (Doob). Let $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, be a submartingale, and let $\tau$ be an $\mathcal F_n$-stopping time. Then $(\xi_{\tau\wedge n},\mathcal F_n)$, $n = 0, 1, 2, \dots$, is a submartingale.

Proof. Observe that
$$\xi_{\tau\wedge n} = \xi_0I_{\tau=0}+\dots+\xi_nI_{\tau=n}+\xi_nI_{\tau>n}, \qquad I_{\tau\le n}\xi_{\tau} = \xi_0I_{\tau=0}+\dots+\xi_nI_{\tau=n}.$$
It follows by Exercise 3 that $\xi_{\tau\wedge n}$ and $I_{\tau\le n}\xi_{\tau}$ are $\mathcal F_n$-measurable. By factoring out $\mathcal F_n$-measurable random variables, we find that
$$E(\xi_{\tau\wedge(n+1)}|\mathcal F_n) = E(I_{\tau>n}\xi_{n+1}|\mathcal F_n) + E(I_{\tau\le n}\xi_{\tau}|\mathcal F_n) \ge I_{\tau>n}\xi_n + I_{\tau\le n}\xi_{\tau} = \xi_{\tau\wedge n}$$
(a.s.). The theorem is proved.

7. Corollary. If $\tau$ is a bounded stopping time, then on the set $\{\omega : \tau\ge n\}$ we have (a.s.)
$$E(\xi_{\tau}|\mathcal F_n) \ge \xi_n.$$
Indeed, if $\tau(\omega)\le N$, then $\xi_{\tau} = \xi_{\tau\wedge(N+n)}$ and
$$E(\xi_{\tau}|\mathcal F_n) = E(\xi_{\tau\wedge(N+n)}|\mathcal F_n) \ge \xi_{\tau\wedge n},$$
where the last term equals $\xi_n$ if $\tau\ge n$.
8. Definition. Let $\tau$ be a stopping time. Define $\mathcal F_{\tau}$ as the family of all events $A\in\mathcal F$ such that
$$A\cap\{\omega : \tau(\omega)\le n\}\in\mathcal F_n \quad \forall n = 0, 1, 2, \dots.$$

9. Exercise*. The notation $\mathcal F_{\tau}$ needs a justification. Prove that, if for an integer $n$ we have $\tau\equiv n$, then $\mathcal F_{\tau} = \mathcal F_n$.

Clearly, if $\tau\equiv\infty$, then $\mathcal F_{\tau} = \mathcal F$. Also it is not hard to see that $\mathcal F_{\tau}$ is always a $\sigma$-field and $\mathcal F_{\tau}\subset\mathcal F$. This $\sigma$-field is interpreted as the collection of all events which happen during the time interval $[0,\tau]$. The simplest properties of the $\sigma$-fields $\mathcal F_{\tau}$ are collected in the following lemma.
10. Lemma. Let $\tau$ and $\sigma$ be stopping times, let $\xi$, $\xi_n$, $n = 0, 1, 2, \dots, \infty$, be random variables, let $\xi_n$ be $\mathcal F_n$-measurable ($\mathcal F_{\infty} := \mathcal F$), and let $E|\xi| < \infty$. Then

(i) $A\in\mathcal F_{\tau} \iff A\cap\{\omega : \tau(\omega) = n\}\in\mathcal F_n \ \forall n = 0, 1, 2, \dots, \infty$;

(ii) $\{\omega : \tau\le\sigma\}\in\mathcal F_{\sigma}$ and, if $\tau\le\sigma$, then $\mathcal F_{\tau}\subset\mathcal F_{\sigma}$;

(iii) $\tau$, $\xi_{\tau}$, $\xi_nI_{\tau=n}$ are $\mathcal F_{\tau}$-measurable for all $n = 0, 1, 2, \dots, \infty$;

(iv) $E(\xi|\mathcal F_{\tau}) = E(\xi|\mathcal F_n)$ (a.s.) on the set $\{\tau = n\}$ for any $n = 0, 1, 2, \dots, \infty$.

Proof. We set the proof of (i) as an exercise. To prove (ii) notice that
$$\{\tau\le\sigma\}\cap\{\sigma = n\} = \{\tau\le n\}\cap\{\sigma = n\}\in\mathcal F_n.$$
Hence $\{\tau\le\sigma\}\in\mathcal F_{\sigma}$ by (i). In addition, if $\tau\le\sigma$ and $A\in\mathcal F_{\tau}$, then
$$A\cap\{\sigma = n\} = \bigcup_{i\le n}\big(A\cap\{\tau = i\}\cap\{\sigma = n\}\big)\in\mathcal F_n$$
because $A\cap\{\tau = i\}\in\mathcal F_i\subset\mathcal F_n$. Therefore, $A\in\mathcal F_{\sigma}$ for each $A\in\mathcal F_{\tau}$; that is, $\mathcal F_{\tau}\subset\mathcal F_{\sigma}$.

(iii) Since constants are stopping times, (ii) leads to $\{\tau\le n\}\in\mathcal F_{\tau}$, so that $\tau$ is $\mathcal F_{\tau}$-measurable. Furthermore, for $A := \{\xi_{\tau} < c\}$, where $c$ is a constant, we have
$$A\cap\{\tau = n\} = \{\xi_n < c,\ \tau = n\}\in\mathcal F_n$$
for any $n = 0, 1, 2, \dots, \infty$. Hence $A\in\mathcal F_{\tau}$ and $\xi_{\tau}$ is $\mathcal F_{\tau}$-measurable. That the same holds for $\xi_nI_{\tau=n}$ follows from $\xi_nI_{\tau=n} = \xi_{\tau}I_{\tau=n}$.

(iv) Define $\eta = I_{\tau=n}E(\xi|\mathcal F_{\tau})$ and $\zeta = I_{\tau=n}E(\xi|\mathcal F_n)$. Notice that by (i), for any constant $c$,
$$\{\eta < c\} = \{\tau\ne n,\ 0 < c\}\cup\big(\{\tau = n\}\cap\{E(\xi|\mathcal F_{\tau}) < c\}\big)\in\mathcal F_n.$$
Hence $\eta$ is $\mathcal F_n$-measurable. Also $\zeta$ is $\mathcal F_n$-measurable due to Exercise 3. Furthermore, for any $A\in\mathcal F_n$ assertion (iii) (with $\xi_n = I_A$ and $\xi_k = 0$ for $k\ne n$) implies that
$$E\eta I_A = EI_AI_{\tau=n}E(\xi|\mathcal F_{\tau}) = EI_AI_{\tau=n}\xi = E\zeta I_A.$$
Since both $\eta$ and $\zeta$ are $\mathcal F_n$-measurable, we conclude that $\eta = \zeta$ (a.s.). The lemma is proved.
11. Theorem (Doob's optional sampling theorem). Let $(\xi_n,\mathcal F_n)$, $n\ge0$, be a submartingale, and let $\tau_i$, $i = 1, \dots, m$, be bounded stopping times satisfying $\tau_1\le\dots\le\tau_m$. Then $(\xi_{\tau_i},\mathcal F_{\tau_i})$, $i = 1, \dots, m$, is a submartingale.

Proof. The $\mathcal F_{\tau_i}$-measurability of $\xi_{\tau_i}$ follows from Lemma 10. This lemma and Corollary 7 imply also that on the set $\{\tau_i = n\}$ we have
$$E(\xi_{\tau_{i+1}}|\mathcal F_{\tau_i}) = E(\xi_{\tau_{i+1}}|\mathcal F_n) \ge \xi_n = \xi_{\tau_i} \quad (a.s.)$$
since $\tau_{i+1}\ge n$. Upon noticing that the union of $\{\tau_i = n\}$ is $\Omega$, we get the result. The theorem is proved.

12. Corollary. If $\tau$ and $\sigma$ are bounded stopping times and $\sigma\le\tau$, then $\xi_{\sigma}\le E(\xi_{\tau}|\mathcal F_{\sigma})$ and $E\xi_{\sigma}\le E\xi_{\tau}$.

Surprisingly enough the inequality $E\xi_{\sigma}\le E\xi_{\tau}$ in Corollary 12 can be taken as a definition of submartingale. An advantage of this definition is that it allows one to avoid using the theory of conditional expectations altogether. In connection with this we set the reader the following exercise.

13. Exercise. Let $\xi_n$ be summable $\mathcal F_n$-measurable random variables given for $n = 0, 1, \dots$. Assume that for any bounded stopping times $\sigma\le\tau$ we have $E\xi_{\sigma}\le E\xi_{\tau}$, and prove that $(\xi_n,\mathcal F_n)$ is a submartingale.
14. Theorem (Doob-Kolmogorov inequality). (i) If $(\xi_n,\mathcal F_n)$, $n = 0, 1, \dots$, is a submartingale, $c > 0$ is a constant, and $N$ is an integer, then
$$P\big\{\max_{n\le N}\xi_n\ge c\big\} \le \frac1cE\xi_NI_{\max_{n\le N}\xi_n\ge c} \le \frac1cE(\xi_N)^+, \qquad(1)$$
$$P\big\{\sup_n\xi_n\ge c\big\} \le \frac1c\sup_nE(\xi_n)^+. \qquad(2)$$

(ii) If $(\xi_n,\mathcal F_n)$, $n = 0, 1, \dots$, is a supermartingale, $\xi_n\ge0$, and $c > 0$ is a constant, then
$$P\big\{\sup_n\xi_n\ge c\big\} \le \frac1cE\xi_0.$$

Proof. (i) First we prove (1). Since the second inequality in (1) obviously follows from the first one, we only need to prove the latter. Define
$$\tau = \inf(n\ge0 : \xi_n\ge c).$$
By applying Corollary 12 with $\tau\wedge N$ and $N$ in place of $\sigma$ and $\tau$, and also using Chebyshev's inequality, we find that
$$P\big\{\max_{n\le N}\xi_n\ge c\big\} = P\{\xi_{\tau}I_{\tau\le N}\ge c\} \le \frac1cE\xi_{\tau}I_{\tau\le N} = \frac1cE\xi_{\tau\wedge N}I_{\tau\le N}$$
$$\le \frac1cEI_{\tau\le N}E(\xi_N|\mathcal F_{\tau\wedge N}) = \frac1cEI_{\tau\le N}\xi_N = \frac1cE\xi_NI_{\max_{n\le N}\xi_n\ge c}.$$

To prove (2) notice that, for any $\varepsilon > 0$,
$$\big\{\sup_n\xi_n\ge c\big\} \subset \bigcup_N\big\{\max_{n\le N}\xi_n\ge c-\varepsilon\big\}$$
and the terms in the union expand as $N$ grows. Hence, for $\varepsilon < c$,
$$P\big\{\sup_n\xi_n\ge c\big\} \le \lim_NP\big\{\max_{n\le N}\xi_n\ge c-\varepsilon\big\} \le \lim_N\frac{1}{c-\varepsilon}E(\xi_N)^+ \le \frac{1}{c-\varepsilon}\sup_nE(\xi_n)^+.$$
The arbitrariness of $\varepsilon$ proves (2).

(ii) Introduce $\tau$ as above and fix an integer $N$. Then, as in the beginning of the proof,
$$P\big\{\max_{n\le N}\xi_n\ge c\big\} \le \frac1cE\xi_{\tau}I_{\tau\le N} = \frac1cE\xi_{\tau\wedge N}I_{\tau\le N} \le \frac1cE\xi_{\tau\wedge N} \le \frac1cE\xi_0.$$
Now one can let $N\to\infty$ as above. The theorem is proved.
15. Theorem (Doob's inequality). If $(\xi_n,\mathcal F_n)$, $n = 0, 1, \dots$, is a nonnegative submartingale and $p > 1$, then
$$E\Big(\sup_n\xi_n\Big)^p \le q^p\sup_nE\xi_n^p, \qquad(3)$$
where $q = p/(p-1)$. In particular,
$$E\Big(\sup_n\xi_n\Big)^2 \le 4\sup_nE\xi_n^2.$$

Proof. Without losing generality we assume that the right-hand side of (3) is finite. Then for any integer $N$
$$\Big(\sup_{n\le N}\xi_n\Big)^p \le \Big(\sum_{n\le N}\xi_n\Big)^p \le (N+1)^p\sum_{n\le N}\xi_n^p, \qquad E\Big(\sup_{n\le N}\xi_n\Big)^p < \infty.$$
Next, by the Doob-Kolmogorov inequality, for $c > 0$,
$$P\big\{\sup_{n\le N}\xi_n\ge c\big\} \le \frac1cE\xi_NI_{\sup_{n\le N}\xi_n\ge c}.$$
We multiply both sides by $pc^{p-1}$, integrate with respect to $c\in(0,\infty)$, and use
$$P(\eta\ge c) = EI_{\eta\ge c}, \qquad \eta^p = p\int_0^{\infty} c^{p-1}I_{\eta\ge c}\,dc,$$
where $\eta$ is any nonnegative random variable. We also use Hölder's inequality. Then we find that
$$E\Big(\sup_{n\le N}\xi_n\Big)^p \le qE\xi_N\Big(\sup_{n\le N}\xi_n\Big)^{p-1} \le q\big(E\xi_N^p\big)^{1/p}\Big(E\Big(\sup_{n\le N}\xi_n\Big)^p\Big)^{1-1/p}.$$
Upon dividing through by the last factor (which is finite by the above) we conclude that
$$E\Big(\sup_{n\le N}\xi_n\Big)^p \le q^p\sup_nE\xi_n^p.$$
It only remains to use Fatou's theorem and let $N\to\infty$. The theorem is proved.
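As an empirical check (illustrative sketch), take the nonnegative submartingale $|w_n|$ built from partial sums of $N(0,1)$ variables; with $p = 2$ Doob's inequality bounds the second moment of the running maximum by $4\sup_nE|w_n|^2$:

```python
import numpy as np

# Doob's inequality with p = 2 for the nonnegative submartingale |w_n|,
# w_n a partial sum of N(0,1) steps: E(max_{n<=N} |w_n|)^2 <= 4 * N.
rng = np.random.default_rng(5)
paths, N = 20000, 200
w = np.cumsum(rng.normal(size=(paths, N)), axis=1)

lhs = np.mean(np.max(np.abs(w), axis=1) ** 2)
rhs = 4 * np.max(np.mean(w ** 2, axis=0))  # 4 * sup_n E w_n^2 = 4N
print(lhs, "<=", rhs)
```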
4. Limit theorems for martingales

Let $(\xi_n,\mathcal F_n)$, $n = 0, 1, \dots, N$, be a submartingale, and let $a$ and $b$ be fixed numbers such that $a < b$. Define consecutively the following:
$$\tau_1 = \inf(n\ge0 : \xi_n\le a)\wedge N, \qquad \sigma_1 = \inf(n\ge\tau_1 : \xi_n\ge b)\wedge N,$$
$$\tau_m = \inf(n\ge\sigma_{m-1} : \xi_n\le a)\wedge N, \qquad \sigma_m = \inf(n\ge\tau_m : \xi_n\ge b)\wedge N.$$
Clearly $0\le\tau_1\le\sigma_1\le\tau_2\le\sigma_2\le\dots$ and $\tau_{N+i} = \sigma_{N+i} = N$ for all $i\ge0$. We have seen before that $\tau_1$ is a stopping time.

1. Exercise*. Prove that all $\tau_m$ and $\sigma_m$ are stopping times.

The points $(n,\xi_n)$ belong to $\mathbb R^2$. We join the points $(n,\xi_n)$ and $(n+1,\xi_{n+1})$ for $n = 0,\dots,N-1$ by straight segments. Then we obtain a piecewise linear function, say $l$. Let us say that if $\xi_{\tau_m}\le a$ and $\xi_{\sigma_m}\ge b$, then on $[\tau_m,\sigma_m]$ the function $l$ upcrosses $(a,b)$. Denote by $\beta(a,b)$ the number of upcrossings of the interval $(a,b)$ by $l$. It is seen that $\beta(a,b) = m$ if and only if $\xi_{\tau_m}\le a$, $\xi_{\sigma_m}\ge b$ and either $\xi_{\tau_{m+1}} > a$ or $\xi_{\sigma_{m+1}} < b$.

The following theorem is the basis for obtaining limit theorems for martingales.

2. Theorem (Doob's upcrossing inequality). If $(\xi_n,\mathcal F_n)$, $n = 0, 1, \dots, N$, is a submartingale and $a < b$, then
$$E\beta(a,b) \le \frac{1}{b-a}E(\xi_N-a)^+.$$

Proof. Notice that $\beta(a,b)$ is also the number of upcrossings of $(0,b-a)$ by the piecewise linear function constructed from $(\xi_n-a)^+$. Furthermore, $\xi_n-a$ and $(\xi_n-a)^+$ are submartingales along with $\xi_n$. It follows that without loss of generality we may assume that $\xi_n\ge0$ and $a = 0$. In that case notice that any upcrossing of $(0,b)$ can only occur on an interval of type $[\tau_i,\sigma_i]$ with $\xi_{\sigma_i}\ge b$. Also, in any case, $\xi_{\tau_n}\ge0$. Hence,
$$b\beta(a,b) \le (\xi_{\sigma_1}-\xi_{\tau_1}) + (\xi_{\sigma_2}-\xi_{\tau_2}) + \dots + (\xi_{\sigma_N}-\xi_{\tau_N}).$$
Furthermore, $\sigma_n\le\tau_{n+1}$ and $E\xi_{\tau_{n+1}}\ge E\xi_{\sigma_n}$. It follows that
$$bE\beta(a,b) \le E\xi_{\sigma_1} + (E\xi_{\sigma_2}-E\xi_{\tau_2}) + \dots + (E\xi_{\sigma_N}-E\xi_{\tau_N})$$
$$\le E\xi_{\sigma_1} + (E\xi_{\sigma_2}-E\xi_{\sigma_1}) + \dots + (E\xi_{\sigma_N}-E\xi_{\sigma_{N-1}}) = E\xi_{\sigma_N} = E\xi_N,$$
thus proving the theorem.
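Counting upcrossings is a simple state machine, and the bound can be eyeballed on simulated martingale paths (an illustrative sketch):

```python
import numpy as np

# Upcrossings of (a, b) by a path, and Doob's bound
# E beta(a,b) <= E(xi_N - a)^+ / (b - a), on random-walk martingales.
def upcrossings(path, a, b):
    count, below = 0, False
    for v in path:
        if v <= a:
            below = True                  # path has reached level a
        elif v >= b and below:
            count += 1                    # completed an upcrossing
            below = False
    return count

rng = np.random.default_rng(6)
paths, N, a, b = 10000, 200, -1.0, 1.0
walks = np.cumsum(rng.choice([-1.0, 1.0], size=(paths, N)), axis=1)

beta = np.array([upcrossings(w, a, b) for w in walks])
bound = np.mean(np.maximum(walks[:, -1] - a, 0.0)) / (b - a)
print(beta.mean(), "<=", bound)
```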
3. Exercise. For $\xi_n\ge0$ and $a = 0$ it seems that typically $\xi_{\sigma_n}\ge b$ and $\xi_{\tau_{n+1}} = 0$. Then why do we have $E\xi_{\tau_{n+1}}\ge E\xi_{\sigma_n}$?
If we have a submartingale $(\xi_n,\mathcal F_n)$ defined for all $n = 0, 1, 2, \dots$, then we can construct our piecewise linear function on $(0,\infty)$ and define $\beta_{\infty}(a,b)$ as the number of upcrossings of $(a,b)$ on $[0,\infty)$ by this function. Obviously $\beta_{\infty}(a,b)$ is the monotone limit of the upcrossing numbers on $[0,N]$. By Fatou's theorem we obtain the following.

4. Corollary. If $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, is a submartingale, then
$$E\beta_{\infty}(a,b) \le \frac{1}{b-a}\sup_nE(\xi_n-a)^+ \le \frac{1}{b-a}\big(\sup_nE(\xi_n)^+ + |a|\big).$$
5. Theorem. Let one of the following conditions hold:

(i) $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, is a submartingale and $\sup_nE(\xi_n)^+ < \infty$;

(ii) $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, is a supermartingale and $\sup_nE(\xi_n)^- < \infty$;

(iii) $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, is a martingale and $\sup_nE|\xi_n| < \infty$.

Then the limit $\lim_n\xi_n$ exists with probability one.

Proof. Obviously we only need prove the assertion under condition (i). Define $Q$ as the set of all rational numbers on $\mathbb R$, and notice that almost obviously
$$\big\{\omega : \varlimsup_n\xi_n(\omega) > \varliminf_n\xi_n(\omega)\big\} = \bigcup_{a,b\in Q,\ a<b}\big\{\omega : \beta_{\infty}(a,b) = \infty\big\}.$$
Then it only remains to notice that the events on the right have probability zero since
$$E\beta_{\infty}(a,b) \le \frac{1}{b-a}\big(\sup_nE(\xi_n)^+ + |a|\big) < \infty,$$
so that $\beta_{\infty}(a,b) < \infty$ (a.s.). The theorem is proved.

6. Corollary. Any nonnegative supermartingale converges at infinity with probability one.
7. Corollary (cf. Exercise 2.2). If $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, is a martingale and $\eta$ is a random variable such that $|\xi_n|\le\eta$ for all $n$ and $E\eta < \infty$, then $\xi_n = E(\xi_{\infty}|\mathcal F_n)$ (a.s.), where $\xi_{\infty} = \lim_n\xi_n$.

Indeed, by the dominated convergence theorem for martingales
$$\xi_n = E(\xi_{n+m}|\mathcal F_n) = \lim_{m\to\infty}E(\xi_{n+m}|\mathcal F_n) = E(\xi_{\infty}|\mathcal F_n).$$

Corollary 7 describes all bounded martingales. The situation with unbounded, even nonnegative, martingales is much more subtle.

8. Exercise. Let $\xi_n = \exp(w_n-n/2)$, where $w_t$ is a Wiener process. By using Corollary 2.4.3, show that $\xi_{\infty} = 0$, so that $\xi_n > E(\xi_{\infty}|\mathcal F_n)$. Conclude that $E\sup_n\xi_n = \infty$ and, moreover, that for every nonrandom sequence $n(k)\to\infty$, no matter how sparse it is, $E\sup_k\xi_{n(k)} = \infty$.
In the case of reverse martingales one does not need any additional conditions for the limit to exist.

9. Theorem. Let $(\xi_n,\mathcal F_n)$, $n = 0, 1, 2, \dots$, be a reverse martingale. Then $\lim_n\xi_n$ exists with probability one.

Proof. By definition $(\xi_{-n},\mathcal F_{-n})$, $n = \dots, -2, -1, 0$, is a martingale. Denote by $\beta_N(a,b)$ the number of upcrossings of $(a,b)$ by the piecewise linear function constructed from $\xi_n$ restricted to $[-N,0]$. By Doob's theorem, $E\beta_N(a,b)\le(E|\xi_0|+|a|)/(b-a)$. Hence $E\lim_N\beta_N(a,b) < \infty$, and we get the result as in the proof of Theorem 5.
10. Theorem (Lévy-Doob). Let $\xi$ be a random variable such that $E|\xi| < \infty$, and let $\mathcal F_n$ be $\sigma$-fields defined for $n = 0, 1, 2, \dots$ and satisfying $\mathcal F_n\subset\mathcal F$.

(i) Assume $\mathcal F_n\subset\mathcal F_{n+1}$ for each $n$, and denote by $\mathcal F_{\infty}$ the smallest $\sigma$-field containing all $\mathcal F_n$ ($\mathcal F_{\infty} = \bigvee_n\mathcal F_n$). Then
$$\lim_nE(\xi|\mathcal F_n) = E(\xi|\mathcal F_{\infty}) \quad (a.s.), \qquad(1)$$
$$\lim_nE|E(\xi|\mathcal F_n)-E(\xi|\mathcal F_{\infty})| = 0. \qquad(2)$$

(ii) Assume $\mathcal F_n\supset\mathcal F_{n+1}$ for all $n$ and denote $\mathcal F_{\infty} = \bigcap_n\mathcal F_n$. Then (1) and (2) hold again.

To prove the theorem we need the following remarkable result.

11. Lemma (Scheffé). Let $\xi$, $\xi_n$, $n = 1, 2, \dots$, be nonnegative random variables such that $\xi_n\xrightarrow{P}\xi$ and $E\xi_n\to E\xi$ as $n\to\infty$. Then $E|\xi_n-\xi|\to0$.

This lemma follows immediately from the dominated convergence theorem and from the relations
$$|\xi_n-\xi| = 2(\xi-\xi_n)^+ - (\xi-\xi_n), \qquad (\xi-\xi_n)^+\le\xi^+, \qquad (\xi-\xi_n)^+\xrightarrow{P}0.$$
Proof of Theorem 10. (i) Writing $\xi = \xi^+-\xi^-$ shows that we may concentrate on $\xi\ge0$. Then Lemma 11 implies that we only need to prove (1).

Denote $\eta = E(\xi|\mathcal F_{\infty})$ and observe that
$$\xi_n := E(\xi|\mathcal F_n) = E\big(E(\xi|\mathcal F_{\infty})\big|\mathcal F_n\big) = E(\eta|\mathcal F_n).$$
Therefore it only remains to prove that if $\eta$ is $\mathcal F_{\infty}$-measurable, $\eta\ge0$, and $E\eta < \infty$, then $\xi_n := E(\eta|\mathcal F_n)\to\eta$ (a.s.).

Obviously $(\xi_n,\mathcal F_n)$ is a nonnegative martingale. By Theorem 5 it has a limit at infinity, which we denote $\xi_{\infty}$. Since the $\xi_n$ are $\mathcal F_{\infty}$-measurable, $\xi_{\infty}$ is $\mathcal F_{\infty}$-measurable as well. Now for each $k = 0, 1, 2, \dots$ and $A\in\mathcal F_k$ we have $A\in\mathcal F_n$ for all large $n$, and by Fatou's theorem
$$EI_A\xi_{\infty} \le \varliminf_nEI_A\xi_n = \lim_nEI_AE(\eta|\mathcal F_n) = EI_A\eta. \qquad(3)$$
Hence $EI_A(\eta-\xi_{\infty})$ is a nonnegative measure defined on the algebra $\bigcup_n\mathcal F_n$. This measure uniquely extends to $\mathcal F_{\infty}$ and yields a nonnegative measure on $\mathcal F_{\infty}$. Since $EI_A(\eta-\xi_{\infty})$ considered on $\mathcal F_{\infty}$ is obviously one of the extensions, we have $EI_A(\eta-\xi_{\infty})\ge0$ for all $A\in\mathcal F_{\infty}$. Upon taking $A = \{\omega : \eta-\xi_{\infty}\le0\}$, we see that $E(\eta-\xi_{\infty})^- = 0$, that is, $\xi_{\infty}\le\eta$ (a.s.).

Furthermore, if $\eta$ is bounded, then the inequality in (3) becomes an equality, implying $\eta = \xi_{\infty}$ (a.s.). Thus, in general $\xi_{\infty}\le\eta$ (a.s.) and, if $\eta$ is bounded, then $\eta = \xi_{\infty}$ (a.s.). It only remains to notice that, for any constant $a\ge0$,
$$\xi_{\infty} = \lim_nE(\eta|\mathcal F_n) \ge \lim_nE(\eta\wedge a|\mathcal F_n) = \eta\wedge a \quad (a.s.),$$
that is, $\xi_{\infty}\ge\eta\wedge a$ (a.s.), and let $a\to\infty$. This proves (i).

(ii) As in (i) we may and will assume that $\xi\ge0$. Denote $\xi_n = E(\xi|\mathcal F_n)$. Then $(\xi_n,\mathcal F_n)$ is a reverse martingale, and $\lim_n\xi_n$ exists with probability one. We define $\xi_{\infty}$ to be this limit where it exists, and $0$ otherwise. Obviously $\xi_{\infty}$ is $\mathcal F_n$-measurable for any $n$, and therefore $\mathcal F_{\infty}$-measurable. Let $\eta := E(\xi|\mathcal F_{\infty})$ and for any $A\in\mathcal F_{\infty}$ write
$$EI_A\xi_{\infty} \le \varliminf_nEI_A\xi_n = EI_A\xi = EI_A\eta.$$
It follows that $\xi_{\infty}\le\eta$ (a.s.). Again, if $\xi$ is bounded, then $\xi_{\infty} = \eta$ (a.s.). Next,
$$\xi_{\infty} = \lim_nE(\xi|\mathcal F_n) \ge \lim_nE(\xi\wedge a|\mathcal F_n) = E(\xi\wedge a|\mathcal F_{\infty})$$
(a.s.). By letting $a\to\infty$ and using the monotone convergence theorem we conclude that
$$\xi_{\infty} \ge \lim_{a\to\infty}E(\xi\wedge a|\mathcal F_{\infty}) = E(\xi|\mathcal F_{\infty}) = \eta$$
(a.s.). The theorem is proved.
From the Lévy-Doob theorem one gets one more proof of the strong law of large numbers.

12. Theorem (Kolmogorov). Let $\eta_1, \eta_2, \dots$ be independent identically distributed random variables with $E|\eta_1| < \infty$. Denote $m = E\eta_1$. Then
$$\lim_n\frac1n(\eta_1+\dots+\eta_n) = m \ (a.s.), \qquad \lim_nE\Big|\frac1n(\eta_1+\dots+\eta_n) - m\Big| = 0.$$

Proof. Without losing generality we assume that $m = 0$. Define
$$\xi_n = (\eta_1+\dots+\eta_n)/n, \qquad \mathcal F^N_n = \sigma(\xi_n,\dots,\xi_N), \qquad \mathcal F_n = \bigvee_{N\ge n}\mathcal F^N_n.$$
We know that $(\xi_n,\mathcal F^N_n)$, $n = 1, 2, \dots, N$, is a reverse martingale (Theorem 2.8). In particular, $\xi_n = E(\xi_1|\mathcal F^N_n)$ (a.s.), whence by Lévy's theorem $\xi_n = E(\xi_1|\mathcal F_n)$ (a.s.). Again by Lévy's theorem, $\zeta := \lim_n\xi_n$ exists almost surely and in $L_1(\mathcal F,P)$. It only remains to prove that $\zeta = 0$ (a.s.).

Since $E|\eta_1| < \infty$ and $E\eta_1 = 0$, the function $\phi(t) = Ee^{it\eta_1}$ is continuously differentiable and $\phi'(0) = 0$. In particular, $\phi(t) = 1+o(t)$ as $t\to0$ and
$$Ee^{it\zeta} = \lim_n\big(\phi(t/n)\big)^n = \lim_n\big(1+o(t/n)\big)^n = 1$$
for any $t$. This implies $\zeta = 0$ (a.s.). The theorem is proved.
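The statement is easy to watch numerically (illustrative sketch): running averages of i.i.d. variables settle at the mean.

```python
import numpy as np

# Strong law of large numbers: (eta_1 + ... + eta_n)/n -> m (a.s.).
# Exponential(1) variables, so m = 1.
rng = np.random.default_rng(7)
eta = rng.exponential(1.0, size=100000)
running = np.cumsum(eta) / np.arange(1, eta.size + 1)
print(running[[99, 9999, eta.size - 1]])   # drifts toward 1.0
```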
One application of martingales outside of probability theory is related to differentiation.

13. Exercise. Prove the following version of Lebesgue's differentiation theorem. Let $f(t)$ be a finite monotone function on $[0,1]$. For $x\in[0,1]$ and integer $n\ge0$ write $x = k2^{-n}+\alpha$, where $k$ is an integer and $0 < \alpha\le2^{-n}$, and define $a_n(x) = k2^{-n}$ and $b_n(x) = (k+1)2^{-n}$. Prove that
$$\lim_n\frac{f(b_n(x))-f(a_n(x))}{b_n(x)-a_n(x)}$$
exists for almost every $x\in[0,1]$.

The following exercise bears on a version of Lebesgue's differentiation theorem for measures.
14. Exercise. Let $(\Omega,\mathcal F)$ be a measurable space and $(\mathcal F_n)$ an increasing filtration of $\sigma$-fields $\mathcal F_n\subset\mathcal F$. Assume $\mathcal F = \bigvee_n\mathcal F_n$. Let $\mu$ and $\nu$ be two probability measures on $(\Omega,\mathcal F)$. Denote by $\mu_n$ and $\nu_n$ the restrictions of $\mu$ and $\nu$ respectively on $(\Omega,\mathcal F_n)$, and show that for any $n$ and nonnegative $\mathcal F_n$-measurable function $f$ we have
$$\int_{\Omega} f\,\mu(d\omega) = \int_{\Omega} f\,\mu_n(d\omega). \qquad(4)$$
Next, assume that $\nu_n$ is absolutely continuous with respect to $\mu_n$ and let $\rho_n(\omega)$ be the Radon-Nikodým derivative $\nu_n(d\omega)/\mu_n(d\omega)$. Prove that:

(i) $\lim_n\rho_n$ exists $\mu$-almost everywhere and, if we denote
$$\rho(\omega) = \begin{cases}\lim_n\rho_n(\omega) & \text{if } \bar\rho := \sup_n\rho_n < \infty,\\ \infty & \text{otherwise},\end{cases}$$
then

(ii) if $\nu$ is absolutely continuous with respect to $\mu$, then $\rho = \nu(d\omega)/\mu(d\omega)$ and $\rho_n\to\rho$ in $L_1(\mathcal F,\mu)$, while

(iii) in the general case $\nu$ admits the following decomposition into the sum of absolutely continuous and singular parts: $\nu = \nu_a+\nu_s$, where
$$\nu_a(A) = \int_{\Omega} I_A\rho\,\mu(d\omega), \qquad \nu_s(A) = \nu\big(A\cap\{\rho = \infty\}\big).$$
5. Hints to exercises

1.4 Notice that, for any Borel $f(y)$ with $E|f(\eta)| < \infty$, we have
$$Ef(\eta) = \int_{\mathbb R^2} f(y)p(x,y)\,dx\,dy.$$

2.2 In (iii) notice that $w_m = w_n+(w_m-w_n)$, where $w_m-w_n$ is independent of $\mathcal F_n$.

2.3 In the proof of necessity start by writing ($0\cdot0^{-1} := 0$)
$$\xi_n = \xi_n\big(E(\xi_{n+1}|\mathcal F_n)\big)^{-1}E(\xi_{n+1}|\mathcal F_n) = \gamma_nE(\xi_{n+1}|\mathcal F_n) = E(\gamma_n\xi_{n+1}|\mathcal F_n),$$
where $\gamma_n := \xi_n\big(E(\xi_{n+1}|\mathcal F_n)\big)^{-1}\le1$; then iterate.

2.4 (ii) If the decomposition exists, then $E(\xi_n|\mathcal F_{n-1}) = A_nm_{n-1}$.

2.5 Use Theorem 1.13.

3.3 Consider the event $\{\tau+\sigma\le n\}$.

3.5 In some cases the answer is no.

3.13 For $A\in\mathcal F_n$ define $\tau = n$ on $A$ and $\tau = n+1$ on $A^c$.

4.13 On the probability space $([0,1],\mathcal B([0,1]),\ell)$ take the filtration of $\sigma$-fields $\mathcal F_n$ each of which is defined as the $\sigma$-field generated by the sets
$$[k2^{-n},(k+1)2^{-n}), \quad k = 0, 1, \dots, 2^n-1.$$
Then check that $\big(f(b_n(x))-f(a_n(x))\big)2^n$ is a martingale relative to $\mathcal F_n$.

4.14 (i) By using (4.4) prove that $\rho_n$ is an $\mathcal F_n$-martingale on $(\Omega,\mathcal F,\mu)$. (iii) For each $a > 0$ define $\tau_a = \inf\{n\ge0 : \rho_n > a\}$ and show that for every $n$, $A\in\mathcal F_n$, and $m\ge n$
$$\nu(A) = \nu_m(A) = \int_A I_{\tau_a>m}\,\rho_m\,\mu(d\omega) + \nu\big(A\cap\{\tau_a\le m\}\big).$$
By letting $m\to\infty$, derive that
$$\nu(A) = \int_A I_{\bar\rho\le a}\,\rho\,\mu(d\omega) + \nu\big(A\cap\{\bar\rho > a\}\big).$$
Next let $a\to\infty$ and extend the formula from $A\in\mathcal F_n$ to all of $\mathcal F$. (ii) Use (iii), remember that $\nu$ is a probability measure, and use Scheffé's lemma.
Chapter 4
Stationary Processes
1. Simplest properties of second-order
stationary processes
1. Definition. Let $T\in[-\infty,\infty)$. A complex-valued random process $\xi_t$ defined on $(T,\infty)$ is called second-order stationary if $E|\xi_t|^2 < \infty$, $E\xi_t$ is constant, and the function $E\xi_s\bar\xi_t$ depends only on the difference $s-t$ for $t, s > T$.

The function $R(s-t) = E\xi_s\bar\xi_t$ is called the correlation function of $\xi_t$. We will always assume that $E\xi_t\equiv0$ and that $R(t)$ is continuous in $t$.

2. Exercise*. Prove that $R$ is continuous if and only if the function $\xi_t$ is continuous in $t$ in the mean-square sense, that is, as a function from $(T,\infty)$ to $L_2(\mathcal F,P)$.

Notice some simple properties of $R$. Obviously, $R(0) = E\xi_t\bar\xi_t = E|\xi_t|^2$ is a real number. Also $R(t) = E\xi_t\bar\xi_0$ if $0\in(T,\infty)$, and generally
$$R(t) = E\xi_{r+t}\bar\xi_r, \qquad R(-t) = E\xi_{s-t}\bar\xi_s = \overline{E\xi_s\bar\xi_{s-t}} = \bar R(t) \qquad(1)$$
provided $r, r+t, s, s-t\in(T,\infty)$. The most important property of $R$ is that it is positive definite.
3. Definition. A complex-valued function $r(t)$ given on $(-\infty,\infty)$ is called positive definite if for every integer $n\ge1$, $t_1,\dots,t_n\in\mathbb R$, and complex $z_1,\dots,z_n$ we have
$$\sum_{j,k=1}^{n} r(t_j-t_k)z_j\bar z_k \ge 0 \qquad(2)$$
(in particular, it is assumed that the sum in (2) is a real number).

That $R$ is positive definite one proves in the following way: take $s$ large enough and write
$$\sum_{j,k=1}^{n} R(t_j-t_k)z_j\bar z_k = E\sum_{j,k=1}^{n} z_j\bar z_k\xi_{s+t_j}\bar\xi_{s+t_k} = E\Big|\sum_{j=1}^{n} z_j\xi_{s+t_j}\Big|^2 \ge 0.$$
Below we prove the Bochner-Khinchin theorem on the general form of positive definite functions. We need the following.

4. Lemma. Let $r(t)$ be a continuous positive definite function. Then

(i) $r(0)\ge0$;

(ii) $r(-t) = \bar r(t)$, $|r(t)|\le r(0)$ and, in particular, $r(t)$ is a bounded function;

(iii) if $\int_{-\infty}^{\infty}|r(t)|\,dt < \infty$, then $\int_{-\infty}^{\infty} r(t)\,dt \ge 0$;

(iv) for every $x\in\mathbb R$, the function $e^{itx}r(t)$, as a function of $t$, is positive definite.

Proof. Assertion (i) follows from (2) with $n = 1$, $z = 1$. Assertion (iv) also trivially follows from (2) if one replaces $z_k$ with $z_ke^{it_kx}$.

To prove (ii), take $n = 2$, $t_1 = t$, $t_2 = 0$, $z_1 = z$, $z_2 = \lambda$, where $\lambda$ is a real number. Then (2) becomes
$$r(0)(|z|^2+\lambda^2) + \lambda r(t)z + \lambda r(-t)\bar z \ge 0. \qquad(3)$$
It follows immediately that $r(t)z+r(-t)\bar z$ is real for any complex $z$. Furthermore, since $\overline{r(t)z}+r(t)z = 2\,\mathrm{Re}\,r(t)z$ is real, the number $\big(r(-t)-\bar r(t)\big)\bar z$ is real for any complex $z$, which is only possible when $r(-t)-\bar r(t) = 0$.

Next, from (3) with $z = \bar r(t)$ we get
$$r(0)|r(t)|^2 + r(0)\lambda^2 + 2\lambda|r(t)|^2 \ge 0$$
for all real $\lambda$. It follows that $|r(t)|^4 - r^2(0)|r(t)|^2 \le 0$. This proves assertion (ii).

Turning to assertion (iii), remember that $r$ is continuous and its integral is the limit of appropriate sums. Viewing $dt$ and $ds$ as $z_j$ and $\bar z_k$, respectively, from (2) we get
$$\int_{-N}^{N}\int_{-N}^{N} r(t-s)\,dt\,ds \approx \sum_{i,j} r(t_i-t_j)\Delta t_i\Delta t_j \ge 0,$$
$$0 \le \frac1N\int_{-N}^{N}\int_{-N}^{N} r(t-s)\,dt\,ds = \int_{-\infty}^{\infty} r(t)\Big(2-\frac{|t|}{N}\Big)I_{|t|\le2N}\,dt,$$
where the equality follows after the change of variables $t-s = t'$, $t+s = s'$. By the Lebesgue dominated convergence theorem the last integral converges to $2\int_{-\infty}^{\infty} r(t)\,dt$. This proves assertion (iii) and finishes the proof of the lemma.
5. Theorem (Bochner-Khinchin). Let $r(t)$ be a continuous positive definite function. Then there exists a unique nonnegative measure $F$ on $\mathbb R$ such that $F(\mathbb R) = r(0)$ and
$$r(t) = \int_{\mathbb R} e^{itx}\,F(dx) \quad \forall t\in\mathbb R. \qquad(4)$$

Proof. The uniqueness follows at once from the theory of characteristic functions. In the proof of existence, without loss of generality, one may assume that $r(0)\ne0$ and even that $r(0) = 1$.

Assuming that $r(0) = 1$, we first prove (4) in the particular case in which
$$\int_{\mathbb R} |r(t)|\,dt < \infty. \qquad(5)$$
Then by Lemma 4 (ii) we have
$$\int_{\mathbb R} |r(t)|^2\,dt < \infty.$$
Next, define $f$ as the Fourier transform of $r$:
$$f(x) = \frac{1}{2\pi}\int_{\mathbb R} e^{-itx}r(t)\,dt.$$
By Lemma 4 (iii), (iv) we have $f(x)\ge0$. From the theory of the Fourier transform we obtain that $f\in L_2(\mathbb R)$ and
$$r(t) = \int_{\mathbb R} e^{itx}f(x)\,dx \qquad(6)$$
for almost all $t$, where the last integral is understood in the sense of $L_2$ (as the limit in $L_2$ of $\int_{|x|\le n} e^{itx}f(x)\,dx$). To finish the proof of the theorem in our particular case, we prove that $f$ is integrable, so that the integral in (6) exists in the usual sense and is a continuous function of $t$, which along with the continuity of $r$ implies that (6) holds for all $t$ rather than only almost everywhere.

By Parseval's identity, for $s > 0$,
$$\int_{\mathbb R} e^{-sx^2/2}f(x)\,dx = \frac{1}{\sqrt{2\pi s}}\int_{\mathbb R} e^{-t^2/(2s)}r(t)\,dt$$
(knowing the characteristic function of the normal law, we know that the function $\sqrt{s/(2\pi)}\,e^{-sx^2/2}$ is the Fourier transform of $e^{-t^2/(2s)}$). The last integral is rewritten as $Er(\sqrt s\,\eta)$, where $\eta\sim N(0,1)$, and it is seen that, owing to the boundedness and continuity of $r$, this integral converges to $r(0)$ as $s\downarrow0$. Now the monotone convergence theorem ($f\ge0$) shows that $\int_{\mathbb R} f(x)\,dx = r(0) < \infty$. This proves the theorem under condition (5).

In the general case, for $\varepsilon > 0$, define
$$r_{\varepsilon}(t) := r(t)e^{-\varepsilon^2t^2/2} = Er(t)e^{it\varepsilon\eta}.$$
The second equality and Lemma 4 (iv) show that $r_{\varepsilon}$ is positive definite. Since $r_{\varepsilon}(0) = 1$ and $\int_{\mathbb R}|r_{\varepsilon}|\,dt < \infty$, there exists a distribution for which $r_{\varepsilon}$ is the characteristic function. Now remember that in probability theory one proves that if a sequence of characteristic functions converges to a function which is continuous at zero, then this function is also the characteristic function of a distribution. Since obviously $r_{\varepsilon}\to r$ as $\varepsilon\downarrow0$, the above-mentioned fact brings the proof of our theorem to an end.
6. Definition. The measure $F$ corresponding to $R$ is called the spectral measure of $R$ or of the corresponding second-order stationary process. If $F$ is absolutely continuous, its density is called a spectral density of $R$.

From the first part of the proof of Theorem 5 we get the following.

7. Corollary. If $\int_{\mathbb R}|R(t)|\,dt < \infty$, then $R$ admits a bounded continuous spectral density.
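Corollary 7 is constructive: the density is $f(x) = (2\pi)^{-1}\int e^{-itx}R(t)\,dt$. A quadrature sketch (illustrative, not from the text) for $R(t) = e^{-|t|}$, whose spectral density is $1/(\pi(1+x^2))$:

```python
import numpy as np

# Spectral density by quadrature: f(x) = (1/2pi) int e^{-itx} R(t) dt.
# For R(t) = exp(-|t|) the exact density is 1 / (pi (1 + x^2)).
t = np.linspace(-50.0, 50.0, 200001)
R = np.exp(-np.abs(t))

for x in (0.0, 1.0, 2.0):
    f_num = np.trapz(np.exp(-1j * t * x) * R, t).real / (2 * np.pi)
    print(x, f_num, 1.0 / (np.pi * (1.0 + x ** 2)))
```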
From the uniqueness of the representation and from (1) one easily obtains the following.

8. Corollary. If $R$ is real valued ($\bar R = R$) and the spectral density $f$ exists, then $R$ is even and $f$ is even ($f(x) = f(-x)$ (a.e.)). Conversely, if $f$ is even, then $R$ is real valued and even.

Yet another description of positive definite functions is given in the following theorem.

9. Theorem. A function $r(t)$ is continuous and positive definite if and only if it is the correlation function of a second-order stationary process.

Proof. The sufficiency has been proved above. While proving the necessity, without loss of generality, we may and will assume that $r(0) = 1$. By the Bochner-Khinchin theorem the spectral distribution $F$ exists. By Theorem 1.1.12 there exists a random variable $\eta$ with distribution $F$ and characteristic function $r$. Finally, take a random variable $\varphi$ uniformly distributed on $[-\pi,\pi]$ and independent of $\eta$, and define
$$\xi_t = e^{i(t\eta+\varphi)}, \quad t\in\mathbb R.$$
Then
$$E\xi_t = r(t)Ee^{i\varphi} = 0, \qquad E\xi_s\bar\xi_t = Ee^{i(s-t)\eta} = r(s-t),$$
which proves the theorem.
10. Remark. We have two representations for correlation functions of second-
order stationary processes:
R(s t) = E
s

t
and R(s t) =
_
R
e
isx
e
itx
F(dx).
Hence, in some sense, the random variable
t
given on corresponds to the
function e
itx
on R. We will see in the future that this correspondence turns
out to be very deep.
11. Exercise. In a natural way one gives the denition of a second-order
stationary sequence
n
given only for integers n (T, ). For a second-order
stationary sequence
n
its correlation function R(n) is dened on integers
n R. Prove that, for each such R(n), there exists a nonnegative measure
F on [, ] such that R(n) =
_

e
inx
F(dx) for all integers n R.
100 Chapter 4. Stationary Processes, Sec 1
Various representation formulas play an important role in the theory of
second-order stationary processes. We are going to prove several of them,
starting with the following.
12. Theorem (Kotelnikov-Shannon). Let the spectral measure F of a
second-order stationary process
t
, given on R, be concentrated on (, ),
so that F(, ) = F(R). Then for every t

t
=

n=
sin (t n)
(t n)

n
_
sin 0
0
:= 1
_
,
which is understood as

t
= l.i.m.
m
m

n=m
sin(t n)
(t n)

n
.
Proof. We have to prove that
lim
m
E

t

m

n=m
sin(t n)
(t n)

n

2
= 0. (7)
This equality can be expressed in terms of the correlation function alone.
It follows that we need only prove (7) for some second-order stationary
process with the same correlation function R. We choose the process from
Theorem 9. Then we see that the expression in (7) under the limit sign
equals
E

e
it

n=m
sin (t n)
(t n)
e
in

2
= 0. (8)
Since the function e
itx
is continuously dierentiable in x, its partial
Fourier sums are uniformly bounded and converge to e
itx
on (, ). The
random variable takes values in (, ) by assumption, and the sum in
(8) is a partial Fourier sum of e
itx
evaluated at x = . Now the assertion of
the theorem follows from the Lebesgue dominated convergence theorem.
13. Exercise. Let be a uniformly distributed random variable on (, )
and
t
:= e
i(t+)
, so that the corresponding spectral measure is concen-
trated at . Prove that (for all )

n=
sin (t n)
(t n)

n
= e
i
cos t ,=
t
.
Ch 4 Section 2. Spectral decomposition of trajectories 101
14. Exercise*. Remember the way the one-dimensional Riemann integral
of continuous functions is dened. It turns out that this denition is easily
extendible to continuous functions with values in Banach spaces. We mean
the following denition.
Let f(t) be a continuous function dened on a nite interval [0, 1] with
values in a Banach space H. Then
_
1
0
f(t) dt := lim
n
2
n
1

i=0
f(i2
n
)2
n
,
where the limit is understood in the sense of convergence in H. Similarly one
denes the integrals over nite intervals [a, b]. Prove that the limit indeed
exists.
15. Exercise. The second-order stationary processes that we concentrate
on are assumed to be continuous as L
2
(T, P)-valued functions. Therefore,
according to Exercise 14 for nite a and b, the integral
_
b
a

t
dt is well-dened
as the integral of an L
2
(T, P)-valued continuous function
t
. We say that
this is the mean-square integral .
By using the same method as in the proof of Theorem 9, prove that if

t
is a second-order stationary process dened for all t, then
l.i.m.
T
1
2T
_
T
T

t
dt
always exists. Also prove that this limit equals zero if and only if F0 = 0.
Finally prove that F0 = 0 if R(t) 0 as t .
2. Spectral decomposition of trajectories
H. Cramer discovered a representation of trajectories of second-order sta-
tionary processes as sums of harmonics with random amplitudes. To prove
his result we need the following.
1. Lemma. Let F be a nite measure on B(R). Then the set of all func-
tions
f(x) =
n

j=1
c
j
e
it
j
x
, (1)
where c
j
, t
j
, and n are arbitrary, is everywhere dense in L
2
(B(R), F).
102 Chapter 4. Stationary Processes, Sec 2
Proof. If the assertion of the lemma is false, then there exists a nonzero
element g L
2
(B(R), F) such that
_
R
g(x)e
itx
F(dx) = 0
for all t. Multiply this by a function

f(t) L
1
(B(R), ) and integrate with
respect to t R. Then by Fubinis theorem and the inequality
_
R
[g(x)[ F(dx)
_
_
R
[g(x)[
2
F(dx)
_
1/2
<
we obtain
_
R
g(x)f(x) F(dx) = 0, f(x) :=
_
R

f(t)e
itx
dt.
One knows that every smooth function f with compact support can be
written in the above form. Therefore, g is orthogonal to all such functions.
The same obviously holds for its real and imaginary parts, which we denote
g
r
and g
i
, respectively. Now for the measures

(dx) = g
r
(x) F(dx) we
have
_
R
f(x)
+
(dx) =
_
R
f(x)

(dx) (2)
for every smooth function f with compact support. Then, as in Theorem
1.2.4, we obtain that
+
=

, so that (2) holds for all f 0. Substituting


f = g
r+
and noticing that the right-hand side of (2) vanishes, we see that
g
r+
= 0 F-almost everywhere. Similarly g
r
= 0 and g
i
= 0 F-almost
everywhere, so that g = 0 F-almost everywhere, contradicting the choice of
g. The lemma is proved.
2. Theorem (Cramer). Let
t
be a (mean-square continuous) second-order
stationary process on R. Let F be the spectral measure of
t
. Then, on
the collection of all sets of type (, a], there exists a random orthogonal
measure with reference measure F such that E(, a] = 0 and

t
=
_
R
e
itx
(dx) (a.s.) t. (3)
If
1
is another random orthogonal measure having these properties, then,
for any a R, we have
1
(, a] = (, a] (a.s.).
Ch 4 Section 2. Spectral decomposition of trajectories 103
Proof. Instead of nding in the rst place, we will nd the stochastic
integral against . To do so, dene an operator : L
2
(B(R), F) L
2
(T, P)
in the following way. For f given by (1), dene (cf. (3) and Remark 1.10)
f =
n

j=1
c
j

t
j
.
It is easy to check that if f is as in (1), then
[f[
2
L
2
(B(R),F)
= E

j=1
c
j

t
j

2
= E[f[
2
, Ef = 0. (4)
It follows, in particular, that the operator is well dened on f of type (1)
(cf. the argument after Remark 2.3.11). By the way, the fact that it is well
dened does not follow from the fact that, if we are given some constants
c
j
, t
j
, c
t
j
, t
t
j
, j = 1, ..., n, t
1
< ... < t
n
, t
t
1
< ... < t
t
n
, and
n

j=1
c
j
e
it
j
x
=
n

j=1
c
t
j
e
it

j
x
for all x, then the families (c
j
, t
j
) and (c
t
j
, t
t
j
) are the same.
We also see that the operator is a linear isometry dened on the
linear subspace of functions (1) as a subspace of L
2
(B(R), F) and maps
it into L
2
(T, P). By Lemma 2.3.12 it admits a unique extension to an
operator dened on the closure in L
2
(B(R), F) of this subspace. We keep
the notation for this extension and remember that the closure in question
coincides with L
2
(B(R), F) by Lemma 1. Thus, we have a linear isometric
operator : L
2
(B(R), F) L
2
(T, P) such that e
it
=
t
(a.s.).
Next, observe that I
(,a]
L
2
(B(R), F) and dene
(, a] = I
(,a]
. (5)
Since preserves scalar products, we have
E(, a]

(, b] = F
_
(, a] (, b]
_
.
Hence is a random orthogonal measure with reference measure F. Fur-
thermore, it follows from (5) that
104 Chapter 4. Stationary Processes, Sec 2
f =
_
R
f(x) (dx) (6)
if f is a step function. Since and the stochastic integral are continuous
operators, (6) holds (a.s.) for any f L
2
(B(R), F). For f = e
itx
we
conclude that

t
= f =
_
R
e
itx
(dx) (a.s.).
Finally, as has been noticed in (4), we have Ef = 0 if f is a function
of type (1). For any f L
2
(B(R), F), take a sequence of functions f
n
of
type (1) converging to f in L
2
(B(R), F) and observe that
[Ef[ = [E(f f
n
)[
_
E[f f
n
[
2
_
1/2
,
where the last expression tends to zero by the isometry of and the choice
of f
n
. Thus, Ef = 0 for any f L
2
(B(R), F). By taking f = I
(,a]
, we
conclude that E(, a] = 0.
We have proved the existence part of our theorem. To prove the
uniqueness, dene
1
f =
_
R
f
1
(dx). The isometric operators and
1
coincide on all functions of type (1), and hence on L
2
(B(R), F). In partic-
ular, (, a] = I
(,a]
=
1
I
(,a]
=
1
(, a] (a.s.). The theorem is
proved.
3. Remark. We have seen in the proof that, for each f L
2
(B(R), F),
E
_
R
f (dx) = 0.
4. Remark. Let L

2
be the smallest linear closed subspace of L
2
(T, P) con-
taining all
t
, t R. Obviously the operator in the proof of Theorem 2
is acting from L
2
(B(R), F) into L

2
. Therefore, (, a] and each integral
_
R
g (dx) with g L
2
(B(R), F) belong to L

2
.
Furthermore, every element of L

2
is representable as
_
R
g (dx) with g
L
2
(B(R), F), due to the equality
t
=
_
R
exp(itx) (dx) and the isometric
property of stochastic integrals.
5. Denition. We say that a complex-valued random vector (
1
, ...,
n
) is
Gaussian if for any complex numbers
1
, ...,
n
we have

j

j

j
=
1
+
i
2
, where = (
1
,
2
) is a two-dimensional Gaussian vector (with real
coordinates). As usual, a complex-valued or a real-valued Gaussian process
is one whose nite-dimensional distributions are all Gaussian.
Ch 4 Section 3. Ornstein-Uhlenbeck process 105
6. Corollary. If
t
is a Gaussian process, then
(
_
R
f
1
(x) (dx), ...,
_
R
f
n
(x) (dx))
is a Gaussian vector for any f
j
L
2
(B(R), F).
This assertion follows from the facts that f is Gaussian for trigonomet-
ric polynomials and mean-square limits of Gaussian variables are Gaussian.
7. Corollary. If
t
is a real valued second-order stationary process, then
_
R
f(x) (dx) =
_
R

f(x) (dx) (a.s.) f L


2
(B(R), F).
This follows from the fact that the equality holds for f = e
itx
.
8. Exercise. Prove that if both
t
and are real valued, then
t
is inde-
pendent of t in the sense that
t
=
s
(a.s.) for any s, t.
9. Exercise. Let be a random orthogonal measure, dened on all sets
(, a], satisfying E(, a] = 0 and having nite reference measure.
Prove that
_
R
e
itx
(dx)
is a mean square continuous second-order stationary process.
10. Denition. The random orthogonal measure whose existence is as-
serted in Theorem 2 is called the random spectral measure of
t
, and formula
(3) is called the spectral representation of the process
t
.
For processes with spectral densities which are rational functions, one
can give yet another representation of their trajectories. In order to under-
stand how to do this, we start with an important example.
3. Ornstein-Uhlenbeck process
The Wiener process is not second-order stationary because Ew
2
t
= t is not
a constant and the distribution of w
t
spreads out when time is growing.
However, if we add to w
t
a drift which would keep the variance moderate,
then we can hope to construct a second-order stationary process on the basis
of w
t
. The simplest way to do so is to consider the following equation:

t
=
0

_
t
0

s
ds +w
t
, (1)
106 Chapter 4. Stationary Processes, Sec 3
where and are real numbers, > 0,
0
is a real-valued random variable
independent of w

, and w
t
is a one dimensional Wiener process. For each
, equation (1) has a unique solution
t
dened for all t 0, which follows
after writing down the equation for
t
:=
t
w
t
. Indeed,

t
= (
t
+w
t
),
0
=
0
,

t
=
0
e
t

_
t
0
e
(st)
w
s
ds,

t
=
0
e
t

_
t
0
e
(st)
w
s
ds +w
t
. (2)
By Theorem 2.3.22 bearing on integration by parts (cf. Remark 2.4.4), the
last formula reads

t
=
0
e
t
+
_
t
0
e
(st)
dw
s
(a.s.). (3)
1. Theorem. Let
0
N(0,
2
/(2)). Then the solution
t
of equation
(1) is a Gaussian second-order stationary process on [0, ) with zero mean,
correlation function
R(t) =

2
2
e
[t[
,
and spectral density
f(x) =

2
2
1
x
2
+
2
.
Proof. It follows from (3) that E
t
= 0 and E[
t
[
2
< . The reader
who did Exercise 2.3.23 will understand that the fact that
t
is a Gaussian
process is proved as in this exercise with the additional observation that, by
assumption,
0
is Gaussian and independent of w

.
Next, for t
1
t
2
0, from (3) and the isometric property of stochastic
integrals, we get
E
t
1

t
2
=

2
2
e
(t
1
+t
2
)
+
2
_
t
2
0
e
(2st
1
t
2
)
ds =

2
2
e
(t
2
t
1
)
.
It follows that
t
is second-order stationary with correlation function R. The
fact that f is indeed its spectral density is checked by simple computation.
The theorem is proved.
Ch 4 Section 3. Ornstein-Uhlenbeck process 107
2. Denition. A real-valued Gaussian second-order stationary process de-
ned on R is called an Ornstein-Uhlenbeck process if its correlation function
satises R(t) = R(0) exp([t[), where is a nonnegative constant.
3. Exercise. Prove that if
t
is a real-valued Gaussian second-order station-
ary Markov process dened on R, then it is an Ornstein-Uhlenbeck process.
Also prove the converse.
Here by the Markov property we mean that
Ef(
t
)[
t
1
, ...,
t
n
= Ef(
t
)[
t
n
(a.s.)
for any t
1
... t
n
t and Borel f satisfying E[f(
t
)[ < .
Theorem 1 makes it natural to conjecture that any Ornstein-Uhlenbeck
process
t
should satisfy equation (1) for t 0 with some Wiener process
w
t
. To prove the conjecture, for
t
satisfying (1), we nd w
t
in terms of the
random spectral measure of
t
if > 0 and > 0. We need a stochastic
version of Fubinis theorem, the proof of which we suggest as an exercise.
4. Exercise*. Let be a family of subsets of a set X. Let be a random
orthogonal measure dened on with reference measure dened on ().
Take a nite interval [a, b] R and assume that on [a, b] X we are given
a bounded function g(t, x) which is continuous in t [a, b] for any x X,
belongs to L
2
(, ) for any t [a, b], and satises
_
X
sup
t[a,b]
[g(t, x)[
2
(dx) < .
Prove that
_
X
g(t, x) (dx) is continuous in t as an L
2
(T, P)-valued func-
tion and
_
b
a
_
_
X
g(t, x) (dx)
_
dt =
_
X
_
_
b
a
g(t, x) dt
_
(dx),
where the rst integral against dt is the mean-square integral (see Exercise
1.14) and the second one is the Riemann integral of a continuous function.
By using this result, we nd that
w
t
=
t

0
+
_
t
0

s
ds =
_
R
_
e
itx
1 +
_
t
0
e
isx
ds
_
(dx)
=
_
R
e
itx
1
ix
(ix +) (dx),
108 Chapter 4. Stationary Processes, Sec 3
w
t
=
_
R
e
itx
1
ix
ix +

(dx). (4)
This representation of w
t
will look more natural and invariant if one
replaces

2(ix+)
1
(dx) with a dierential of a new random orthogonal
measure. To do this rigorously, let

= (a, b] : < a b < and for
(a, b]

dene
(a, b] =

2
_
R
I
(a,b]
(x)
ix +

(dx).
It turns out that is a random orthogonal measure with reference measure
. Indeed,
E
_
R
I
(a
1
,b
1
]
(x)
ix +

(dx)
_
R
I
(a
2
,b
2
]
(x)
ix +

(dx)
=
_
R
I
(a
1
,b
1
]
(x)I
(a
2
,b
2
]
(x)
ix +

ix +

f(x) dx
=
1
2
_
R
I
(a
1
,b
1
]
(x)I
(a
2
,b
2
]
(x) dx =
1
2
((a
1
, b
1
] (a
2
, b
2
])
(remember that f =
2
(x
2
+
2
)
1
(2)
1
and the product of indicators is
the indicator of the intersection). By the way, random orthogonal measures
with reference measure are called standard random orthogonal measures.
Next for any g S(

, ), obviously,
_
R
g(x) (dx) =

2
_
R
g(x)
ix +

(dx) (a.s.).
Actually, this equality holds for any g L
2
(B(R), ), which is proved by
standard approximation after noticing that if g
n
S(

, ) and g
n
g in
L
2
(B(R), ), then
_
R

g
n
(x)
ix +

g(x)
ix +

2
f(x) dx =
1
2
_
R
[g
n
(x) g(x)[
2
dx 0.
In terms of formula (4) takes the form
w
t
=
1

2
_
R
e
itx
1
ix
(dx), t 0. (5)
Also
Ch 4 Section 3. Ornstein-Uhlenbeck process 109

t
=
1

2
_
R
e
itx

ix +
(dx), t R. (6)
Now we want to prove that every Ornstein-Uhlenbeck process
t
satises
(1) with the Wiener process w
t
dened by (5). First of all we need to
prove that w
t
is indeed a Wiener process. In the future we need a stronger
statement, which we prove in the following lemma.
5. Lemma. Let
t
be a real-valued Gaussian second-order stationary pro-
cess dened on R. Assume that it has a spectral density f(x) , 0 which
is represented as (x) (x), where (x) is a rational function such that
(x) = (x) and all poles of (z) lie in the upper half plane Imz > 0. Let
be the random spectral measure of
t
. For < a < b < dene
(a, b] =
_
R
I
(a,b]
(x)
1
(x)
(dx)
_
1
(x)
:= 0 if (x) = 0
_
,
w
t
=
1

2
_
R
e
itx
1
ix
(dx), t 0. (7)
Then w
t
has a continuous modication which is a Wiener process indepen-
dent of
s
, s 0.
Proof. Notice that the number of points where (x) = 0 is nite and has
zero Lebesgue measure. Therefore, in the same way as before the lemma,
it is proved that is a standard random orthogonal measure, and since
(exp(itx) 1)/(ix) L
2
(B(R), ), the integral in (7) is well dened and
w
t
=
1

2
_
R
e
itx
1
ix
1
(x)
(dx).
By Corollary 2.6 the process w
t
is Gaussian. By virtue of (x) = (x)
and Corollary 2.7 we get
w
t
=
1

2
_
R
e
itx
1
ix
1
(x)
(dx) = w
t
,
so that w
t
is real valued. In addition, w
0
= 0, Ew
t
= 0, and
Ew
t
w
s
= Ew
t
w
s
=
1
2
_
R
e
itx
1
ix
e
isx
1
ix
dx. (8)
110 Chapter 4. Stationary Processes, Sec 3
One can compute the last integral in two ways. First, if we take
t
from
Theorem 1, then (4) holds with its left-hand side being a Wiener process by
construction. For this process (8) holds, with the rst expression known to
be t s.
On the other hand, the Fourier transform of I
(0,s]
(z) is easily computed
and turns out to be proportional to (e
isx
1)/(ix). Therefore, by Parsevals
identity
1
2
_
R
e
itx
1
ix
e
isx
1
ix
dx =
_
R
I
(0,t]
(z)I
(0,s]
(z) dz = t s.
It follows in particular that E[w
t
w
s
[
2
= [t s[ and E[w
t
w
s
[
4
=
c[t s[
2
, where c is a constant. By Kolmogorovs theorem, w
t
has a con-
tinuous modication. This modication, again denoted w
t
, is the Wiener
process we need.
It only remains to prove that w
t
, t 0, and
s
, s 0, are independent.
Since (w
t
1
, ..., w
t
n
,
s
1
, ...,
s
m
) is a Gaussian vector for any t
1
, ..., t
n
0,
s
1
, ..., s
m
0, we need only prove that Ew
t

s
= 0 for all t 0 s. From

s
=
_
R
e
isx
(dx) =
_
R
e
isx
(x) (dx)
and (7) we obtain
E
s
w
t
=
1

2
_
R
e
isx
(x)
_
e
itx
1
ix
_
dx =
1

2
_
R
e
isx
(x)
e
itx
1
ix
dx.
Remember that (z) is square integrable over the real line and is a rational
function with poles in the upper half plane. Also the functions e
isz
and e
itz
are bounded in the lower half plane. It follows easily that
[(z)[ = O
_
1
[z[
_
,

e
isz
(z)
e
itz
1
iz

= O
_
1
[z[
2
_
for [z[ with Imz 0. By adding to this the fact that the function
e
isz
(z)
e
itz
1
iz
has no poles in the lower half plane, so that by Jordans lemma its integral
over the real line is zero, we conclude that E
s
w
t
= 0. The lemma is proved.
6. Remark. We know that the Wiener process is not dierentiable in t.
However, especially in technical literature, its derivative, called the white
noise, is used quite often.
Ch 4 Section 3. Ornstein-Uhlenbeck process 111
Mathematically speaking, the white noise is a generalized function de-
pending on . We want to discuss why it is called white. There is a
complete analogy with white light, which is a mixture of colors correspond-
ing to electromagnetic waves with dierent frequencies. If one dierentiates
(7) formally, then
w
t
=
1

2
_
R
e
itx
(dx),
which shows that w
t
is a mixture of all harmonics e
itx
each taken with the
same mean amplitude (2)
1
E[(dx)[
2
= (2)
1
dx, and the amplitudes
corresponding to dierent frequencies are uncorrelated and moreover inde-
pendent.
7. Theorem. Let
t
be an Ornstein-Uhlenbeck process with
R(t) =
2
(2)
1
e
[t[
, > 0, > 0.
Then, for t 0, the process
t
admits a continuous modication

t
and there
exists a Wiener process w
t
such that

t
=

0

_
t
0

s
ds +w
t
t 0 (9)
and w
t
, t 0, and
s
, s 0, are independent.
Proof. Dene w
t
by (4). Obviously Lemma 5 is applicable with (x) =
(2)
1/2
(ix + )
1
. Therefore, the process w
t
has a continuous modica-
tion, which is a Wiener process independent of
s
, s 0, and for which we
keep the same notation. Let

t
=
0
e
t

_
t
0
e
(st)
w
s
ds +w
t
.
By the stochastic Fubini theorem

t
=
_
R
_
e
t

_
t
0
e
(st)
e
isx
1
ix
ix +

ds +
e
itx
1
ix
(ix +)
_
(dx)
=
_
R
e
itx
(dx) =
t
(a.s.). In addition

t
is continuous and satises (9), which is shown by
reversing the arguments leading to (2). The theorem is proved.
8. Exercise. We assumed that > 0. Prove that if = 0, then
t
=
0
(a.s.) for any t.
112 Chapter 4. Stationary Processes, Sec 4
4. Gaussian stationary processes with rational
spectral densities
Let
t
be a real-valued Gaussian second-order stationary process. Assume
that it has a spectral density f(x) and f(x) = P
n
(x)/P
m
(x), where P
n
and P
m
are polynomials of degree n and m respectively. Without loss of
generality we assume that P
n
and P
m
do not have common roots and P
m
has the form x
m
+....
1. Exercise*. Assume that f(x) =

P
n
(x)/

P
m
(x), where

P
n
and

P
m
do not
have common roots and

P
m
(x) = x
m
+ .... Prove that

P
m
(x) P
m
(x) and

P
n
(x) P
n
(x).
Exercise 1 shows that n, m, P
n
and P
m
are determined uniquely. More-
over, since

f = f, we get that

P
n
= P
n
and

P
m
= P
m
, so that P
n
and P
m
are real valued. Furthermore, f is summable, so that the denominator P
m
does not have real zeros, m is even, and certainly n < m. Next, f 0 and
therefore each real zero of P
n
has even multiplicity. Since
t
is real valued,
by Corollary 1.8, we have f(x) = f(x), which along with the uniqueness
of representation implies that
P
n
(x) = P
n
(x), P
m
(x) = P
m
(x).
In turns it follows at once that if a is a root of P
m
, then a, a, and a
are also roots of P
m
. Remember that m is even, and dene
Q
+
(x) = i
m/2

Ima
j
>0
(x a
j
), Q

(x) = i
m/2

Ima
j
<0
(x a
j
),
where a
j
, j = 1, ..., m are the roots of P
m
. Notice that Q
+
(x)Q

(x) =
P
m
(x) and that, as follows from the above analysis, for real x,
Q
+
(x) = Q

(x) = i
m/2
(1)
m/2

Ima
j
<0
(x (a
j
))
= i
m/2
(1)
m/2

Ima
j
>0
(x a
j
) = Q
+
(x), Q
+
(x)Q
+
(x) = P
m
(x).
Similarly, if P
n
does not have real roots, then there exist polynomials
P
+
and P

such that
P

(x) = P
+
(x) = P
+
(x), P
+
(x)P
+
(x) = P
n
(x).
Such polynomials exist in the general case as well. In order to prove this, it
suces to notice that, for real a,
Ch 4 Section 4. Gaussian processes with rational spectral densities 113
(x a)
2k
(x +a)
2k
= (x
2
a
2
)
k
(x
2
a
2
)
k
, x
2k
= (ix)
k
(ix)
k
.
We have proved the following fact with = P
+
/Q
+
.
2. Lemma. Let the spectral density f(x) of a real-valued second-order sta-
tionary process
t
be rational, namely f(x) = P
n
(x)/P
m
(x), where P
n
and
P
m
are nonnegative polynomials of degree n and m respectively without com-
mon roots. Then m is even and f(x) = (x)(x), where the rational func-
tion (z) has exactly m/2 poles all of which lie in the upper half plane and
(x) = (x) for all x R.
3. Exercise. From the equality (x) = (x), valid for all x R, derive
that (ix) is real valued for real x.
4. Theorem. Let the spectral density f(x) of a real-valued Gaussian second-
order stationary process
t
be a rational function with simple poles. Then
there exist an integer k 1, (complex) constants
j
and
j
, and continuous
Gaussian processes
j
t
and w
t
dened for t [0, ) and j = 1, ..., k such
that
(i) w
t
is a Wiener process, (
1
0
, ...,
k
0
) is independent of w

, and w
t
,
t 0, is independent of
s
, s 0;
(ii)
j
t
=
j
0

j
_
t
0

j
s
ds +
j
w
t
for any t 0;
(iii) for t 0 we have

t
=
1
t
+... +
k
t
(a.s.).
Proof. As in the case of Ornstein-Uhlenbeck processes, we replace the
spectral representation
t
=
_
R
exp(itx) (dx) with

t
=
_
R
(x)e
itx
(dx),
where is taken from Lemma 2 and
(a, b] =
_
R
I
(a,b]
1
(x)
(dx)
_
1
0
:= 0
_
.
Such replacement is possible owing to the fact that
1

= 1 almost every-
where. It is also seen that is a standard orthogonal measure.
Next let
(x) =

1
ix +
1
+... +

k
ix +
k
(1)
114 Chapter 4. Stationary Processes, Sec 4
be the decomposition of into partial fractions. Since the poles of lie
only in the upper half plane, we have Re
j
> 0. For t 0 denote
w
t
=
1

2
_
R
e
itx
1
ix
(dx),
j
t
=
_
R

j
ix +
j
e
itx
(dx).
Observe that
j
t
are Gaussian processes by Corollary 2.6 and w
t
is a
Wiener process by Lemma 3.5. Furthermore, by following our treatment of
the Ornstein-Uhlenbeck process one proves existence of a continuous modi-
cation
j
t
of
j
t
, the independence of
0
= (
1
0
, ...,
k
0
) and w

, and the fact


that

j
t
=
j
0

j
_
t
0

j
s
ds +
j
w
t
, t 0.
It only remains to notice that
t
=
1
t
+... +
k
t
=
1
t
+... +
k
t
(a.s.). The
theorem is proved.
Consider the following system of equations:
_

1
t
=
1
0

1
_
t
0

1
s
ds +
1
w
t
,
. . .
. . .

k
t
=
k
0

k
_
t
0

k
s
ds +
k
w
t
,

t
=
0

k
j=1

j
_
t
0

j
s
ds +w
t

k
j=1

j
,
and the system obtained from it for the real and imaginary parts of
j
t
. Then
we get the following result.
5. Theorem. Under the conditions of Theorem 4, for t 0 the process
t
has a continuous modication which is represented as the last coordinate of
a multidimensional real-valued Gaussian continuous process
t
satisfying

t
=
0

_
t
0
A
s
ds +w
t
B, t 0, (2)
where A, B are nonrandom, A is a matrix, B is a vector, w
t
is a one-
dimensional Wiener process, and
0
and w

are independent.
6. Remark. Theorem 5 is also true if the multiplicities of the poles are
greater than 1. To explain this, observe that then in (1) we also have terms
which are constants times higher negative powers of ix+
j
, so that we need
to understand what kind of equation holds for
Ch 4 Section 4. Gaussian processes with rational spectral densities 115

n
t
() :=
_
R

(ix +)
n+1
e
itx
(dx), (3)
where n 1, and are some complex numbers, and Re > 0. Arguing
formally, one sees that
d
n
d
n

0
t
() = (1)
n
n!
n
t
(),
and this is the clue. From above we know that there is a continuous modi-
cation
0
t
() of
0
t
() satisfying the equation

0
t
() =
0
0
()
_
t
0

0
s
() ds +w
t
.
If we are allowed to dierentiate this equation with respect to , then for

j
t
() = (1)
j
(j!)
1
d
j

0
t
()/d
j
,
after simple manipulations we get
_

1
t
() =
1
0
()
_
t
0

1
s
() ds +
_
t
0

0
s
() ds,
...

n
t
() =
n
0
()
_
t
0

n
s
() ds +
_
t
0

n1
s
() ds.
(4)
After having produced (4), we forget the way we did it and derive the
result we need rigorously. Dene

j
0
=
_
R

(ix +)
j+1
(dx)
and solve the system
_

0
t
=
0
0

_
t
0

0
s
ds +w
t
,

1
t
=
1
0

_
t
0

1
s
ds +
_
t
0

0
s
ds,
...

n
t
=
n
0

_
t
0

n
s
ds +
_
t
0

n1
s
ds,
(5)
which is equivalent to a system of rst-order linear ordinary dierential
equations. It turns out that
116 Chapter 4. Stationary Processes, Sec 4

j
t
=
_
R

(ix +)
j+1
e
itx
(dx) (6)
(a.s.) for each t 0 and j = 0, ..., n. One proves this by induction, noticing
that for j = 0 this fact is known and, for j 1,

j
t
=
j
0
e
t
+
_
t
0
e
(st)

j1
s
ds,
so that if (6) holds with j 1 in place of j, then, owing to the stochastic
Fubini theorem,

j
t
=
_
R
_

(ix +)
j
e
t
+
_
t
0
e
(st)

(ix +)
j1
e
isx
ds
_
(dx)
=
_
R

(ix +)
j
e
itx
(dx) (a.s.).
This completes the induction. Furthermore,
j
0
and w

are independent,
which is proved in the same way as in Lemma 3.5.
Thus, we see that the processes (3) are also representable as the last
coordinates of solutions of linear systems of type (5), and the argument
proving Theorem 5 works again.
7. Remark. Equation (2) is a multidimensional version of (3.1). In the
same way in which we arrived at (3.3), one proves that the solution to (2)
is given by

t
= e
At

0
+
_
t
0
e
A(st)
Bdw
s
= e
At

0
+e
At
_
t
0
e
As
Bdw
s
, (7)
where the vector
0
is composed of

jk
:=
_
R
1
(ix +
j
)
k
(dx), (8)
where k = 1, ..., n
j
, the i
j
s are the roots of Q
+
, and the n
j
are their
multiplicities.
Ch 4 Section 5. Remarks about predicting stationary processes 117
8. Remark. Similarly to the one-dimensional case, one gives the denition
of stationary vector-valued process and, as in Section 3, one proves that
the right-hand side of (7) is a Gaussian stationary process even if B is a
matrix and w
t
is a multidimensional Wiener process, provided that
0
is
appropriately distributed and A only has eigenvalues with strictly positive
real parts.
9. Remark. We will see later (Sec. 6.11) that solutions of stochastic equa-
tions even more complex than (2) have the Markov property, and then we
will be able to say that real-valued Gaussian second-order stationary pro-
cesses with rational spectral density are just components of multidimensional
Gaussian Markov processes.
5. Remarks about predicting Gaussian stationary
processes with rational spectral densities
We follow the notation from Sec. 4 and again take a real-valued Gaussian
second-order stationary process
t
and assume that it has a spectral density
f(x) which is a rational function. In Sec. 4 we showed that there is a rep-
resentation of the form f = [[
2
and constructed satisfying = P
+
/Q
+
.
Actually, all results of Sec. 4 also hold if we take = P

/Q
+
. It turns out
that the choice of = P
+
/Q
+
is crucial in applications, in particular, in
solving the problem of predicting
t
for t 0 given observations of
s
for
s 0. We explain this in the series of exercises and remarks below.
1. Exercise. Take = P
+
/Q
+
. Prove that for each g L
2
(B(R), ),
_
R
e
itx
g(x)(x) dx = 0 t < 0 =
_
R
g(x)
1
(ix +
j
)
k
dx = 0,
where k = 1, ..., n
j
, the i
j
s are the roots of Q
+
, and the n
j
are their
multiplicities.
2. Exercise. Let L

2
(a, b) be the smallest linear closed subspace of L
2
(T, P)
containing all
t
, t (a, b). By using Exercise 1 prove that (see (4.8))

jk
,
_
R
1
Q
+
(x)
(dx) L

2
(, 0). (1)
3. Remark. Now we can explain why we prefer to take P
+
/Q
+
and not
P

/Q
+
. Here it is convenient to assume that the space (, T, P) is complete,
so that, if a complete -eld ( T and for some functions and we have
= (a.s.) and is (-measurable, so is . Let T

0
be the completion of the
-eld generated by the
t
, t 0. Notice that in formula (4.7) the random
118 Chapter 4. Stationary Processes, Sec 5
vector
0
is T

0
-measurable by Exercise 2. Owing to the independence of
w
t
, t 0, and
s
, s 0, for any bounded Borel h() (for instance, depending
only on the last coordinate of the vector ), we have (a.s.)
E[h(
t
)[T

0
] = [Eh( +
_
t
0
e
A(st)
Bdw
s
)][
=
0
e
At .
We see that now the problem of prediction is reduced to the problem
of expressing
0
or equivalently
jk
in terms of
t
, t 0. The following few
exercises are aimed at showing how this can be done.
4. Exercise. As a continuation of Exercise 2, prove that if all roots of
P
+
are real, then in (1) one can replace L

2
(, 0) with L

2
(0, ) or with
L

2
(, 0)L

2
(0, ). By the way, this intersection can be much richer than
only multiples of
0
. Consider the case (x) = ix/(ix +1)
2
and prove that
:=
_
R
1
(ix + 1)
2
(dx) L

2
(, 0) L

2
(0, ) and
0
.
5. Exercise*. We say that a process
t
, given in a neighborhood of a point
t
0
, is dierentiable in the mean-square sense at the point t = t
0
and its
derivative equals if
l.i.m.
tt
0

t

t
0
t t
0
= .
In an obvious way one gives the denition of higher order mean-square
derivatives. Prove that
t
has (mn)/2 1 mean-square derivatives. Fur-
thermore, for j (mn)/2 1
d
j
dt
j

t
= i
j
_
R
e
itx
x
j
(x) (dx),
where by d
j
/dt
j
we mean the jth mean-square derivative.
6. Exercise*. As we have seen in the situation of Exercise 4, L

2
(, 0)
L

2
(0, ) is not a linear subspace generated by
0
. Neither is this a linear
subspace generated by
0
and the derivatives of
t
at zero, which is seen
from the same example given in Exercise 4. In this connection, prove that if
P
+
= const, then
jk
from (4.8) and
0
do admit representations as values
at zero of certain ordinary dierential operators applied to
t
.
7. Remark. In what concerns
0
, the general situation is not too much
more complicated than the one described in Exercise 6. Observe that by
Exercise 5, the process
Ch 4 Section 6. Stationary processes 119

t
=
_
R
1
Q
+
(x)
e
itx
(dx)
satises the equation P
+
(iD
t
)

t
=
t
, t 0, which is understood as an
equation for L
2
-valued functions with appropriate conditions at (cf. the
hint to Exercise 1 and Exercise 8). The theory of ordinary dierential equa-
tions for Banach-space-valued functions is quite parallel to that of real-
valued functions. In particular, well known formulas for solutions of linear
equations are available. Therefore, there are formulas expressing

t
through

s
, s 0. Furthermore, as in Exercise 6 the random variables
jk
from (4.8)
are representable as values at zero of certain ordinary dierential operators
applied to

t
.
8. Exercise. For > 0, let P

+
(x) = P
+
(x i). Prove that there exists a
unique solution of P

+
(iD
t
)

t
=
t
on (, 0) in the class of functions

t
for which E[

t
[
2
is bounded. Also prove that l.i.m.
0

t
=

t
.
6. Stationary processes and the
Birkho-Khinchin theorem
For second-order stationary processes the covariance between
t
and
t+s
is
independent of the time shift. There are processes possessing stronger time
shift invariance properties.
1. Denition. A real-valued process
n
given for integers n (T, ) is
said to be stationary if for any integers k
1
, ..., k
n
(T, ) and i 0 the
distributions of the vectors (
k
1
, ...,
k
n
) and (
k
1
+i
, ...,
k
n
+i
) coincide.
Usually we assume that T = 1, so that
n
is given for n = 0, 1, 2, ....
Observe that, obviously, if
n
is stationary, then f(
n
) is stationary for any
Borel f.
2. Exercise*. Prove that
n
, n = 0, 1, 2, ..., is stationary if and only if
for each integer n, the vectors (
0
, ...,
n
) and (
1
, ...,
n+1
) have the same
distribution.
3. Example. The sequence
n
is stationary for any random variable .
4. Example. Any sequence of independent identically distributed random
variables is stationary.
5. Example. This example generalizes both Examples 3 and 4. Remember
that a random sequence
0
,
1
, ... is called exchangeable if for every n and
every permutation of 0, 1, 2, ..., n, the distribution of (
(0)
, ...,
(n)
)
coincides with that of (
0
, ...,
n
).
It turns out that if a sequence
0
,
1
, ... is exchangeable, then it is sta-
tionary. Indeed, for any Borel bounded f
120 Chapter 4. Stationary Processes, Sec 6
Ef(
0
, ...,
n+1
) = Ef(
1
, ...,
n+1
,
0
).
By taking f independent of the last coordinate, we see that the laws of
the vectors (
1
, ...,
n+1
) and (
0
, ...,
n
) coincide, so that
n
is stationary by
Exercise 2.
6. Example. Clearly, if
0
,
1
, ... is stationary and E[
k
[
2
< for some
k, the same holds for any k > T and E
n

k
does not change under the
translations n n + i, k k + i. Therefore, E
n

k
depends only on the
dierence k n, and
n
is a mean-square stationary process (sequence).
The converse is also true if
n
is a Gaussian sequence, since then the nite-
dimensional distributions of

, which are uniquely determined by the mean


value and the covariance function, do not change under translations of time.
In particular, the Ornstein-Uhlenbeck process (considered at integral times)
is stationary.
7. Example. Let be a circle of length 1 centered at zero with Borel -
eld and linear Lebesgue measure. Fix a point x
0
and think of any
other point x as the length of the arc from x
0
to x in the clockwise
direction. Then the operation x
1
+x
2
is well dened. Fix and dene

n
() = +n. Since the distribution of +x is the same as that of for
any x, we have that the distribution of (
0
( + x), ...,
n
( + x)) coincides
with that of (
0
(), ...,
n
()) for any x. By taking x = , we conclude that

n
is a stationary process.
For stationary processes we will prove only one theorem, namely the
Birkho-Khinchin theorem. This theorem was rst proved by Birkho, then
generalized by Khinchin. Kolmogorov, F. Riesz, E. Hopf and many others
invented various proofs and generalizations of the theorem. All these proofs,
however, were quite involved. Only at the end of the sixties did Garsia
nd an elementary proof of the key Hopf inequality which made it possible
to present the proof of the Birkho-Khinchin theorem in this introductory
book.
The proof, given below, consists of two parts, the rst being the proof
of the Hopf maximal inequality, and the second being some more or less
general manipulations. In order to get acquainted with these manipulations,
we show them rst not for stationary processes but for reverse martingales.
We will see again that they have (a.s.) limits as n , this time without
using Doobs upcrossing theorem.
Remember that a sequence (
n
, T
n
) is called a reverse martingale if the
-elds T
n
T decrease in n,
n
is T
n
-measurable, E[
n
[ < and
E
n
[T
n+1
=
n+1
(a.s.).
Ch 4 Section 6. Stationary processes 121
Then
n
= E
0
[T
n
and, as we know (Theorem 3.4.9), the limit of
n
exists
almost surely as n .
Let us prove this fact starting with the Kolmogorov-Doob inequality:
for any p R (and not only p > 0),
E
0
I
max
in

i
>p
=
n1

i=0
E
0
I

n
,...,
ni
p,
ni1
>p
+E
0
I

n
>p
=
n1

i=0
E
ni1
I

n
,...,
ni
p,
ni1
>p
+E
n
I

n
>p
p
_
n1

i=0
P(
n
, ...,
ni
p,
ni1
> p) +P(
n
> p)
_
= pP(max
in

i
> p).
From the above proof it is also seen that if A T

:=

n
T
n
, then
E
0
I
A,max
in

i
>p
pP(A, max
in

i
> p). (1)
Take here
A = B C
p
, C
p
:= : lim
n

n
> p, B T

.
Clearly, for n n
0
, the random variable sup
in

i
is T
n
- and T
n
0
-measurable.
Hence, C
p
T
n
0
and C
p
T

. Furthermore, C
p
max
in

i
> p C
p
as
n . Therefore, employing also the dominated convergence theorem,
from (1), we get
E
0
I
BC
p
,max
in

i
>p
pP(B C
p
, max
in

i
> p), (2)
E
0
I
BC
p
pP(B C
p
). (3)
By replacing
n
with
n
and p with p, for any q R, we obtain
E
0
I
BD
q
qP(B D
q
) with D
q
:= : lim
n

n
< q. (4)
Now take B = D
q
in (3) and B = C
p
in (4). Then
122 Chapter 4. Stationary Processes, Sec 6
pP( lim
n

n
< q, lim
n

n
> p) E
0
I
D
q
C
p
qP( lim
n

n
< q, lim
n

n
> p).
For p > q, these inequalities are only possible if
P( lim
n

n
< q, lim
n

n
> p) = 0.
Therefore, the set
: lim
n

n
< lim
n

n
=
_
rational p,q
p>q
: lim
n

n
< q, lim
n

n
> p
has probability zero, and this proves that lim
n

n
exists almost surely.
Coming back to stationary processes, we give the following denition.
8. Denition. An event A is called invariant if for each n 0 and Borel
f(x
0
, ..., x
n
) such that E[f(
0
, ...,
n
)[ < , we have
Ef(
1
, ...,
n+1
)I
A
= Ef(
0
, ...,
n
)I
A
.
Denote
S
n
=
0
+... +
n
n 0,

l = lim
n
S
n
n
, l = lim
n
S
n
n
.
9. Lemma. For any Borel B R
2
, the event : (

l, l) B is invariant.
Proof. Fix n 0 and without loss of generality only concentrate on
Borel bounded f 0. Dene

i
(B) = Ef(
i
, ...,
n+i
)I
(

l,l)B
.
We need to prove that
0
(B) =
1
(B). Since the
i
s are nite measures
on Borel Bs, it suces to prove that the integrals of bounded continuous
functions against
i
s coincide. Let g be such a function. Then
_
R
2
g(x, y)
i
(dxdy) = Ef(
i
, ...,
n+i
)g(

l, l).
Next, let S
t
n
=
1
+... +
n+1
. Then, by using the stationarity of
k
and
the dominated convergence theorem and denoting F
i
= f(
i
, ...,
n+i
), we
nd that
EF
0
g(

l, l) = lim
k
1

EF
0
g(

l, inf
rk
1
[S
r
/(r + 1)])
= lim
k
1

lim
k
2

EF
0
g(

l, min
k
2
rk
1
[S
r
/(r + 1)])
Ch 4 Section 6. Stationary processes 123
= lim
k
1

lim
k
2

lim
k
3

lim
k
4

EF
0
g( max
k
4
rk
3
[S
r
/(r + 1)], min
k
2
rk
1
[S
r
/(r + 1)])
= EF
1
g( lim
n
[S
t
n
/(n + 1)], lim
n
[S
t
n
/(n + 1)]) = EF
1
g(

l, l).
The lemma is proved.
Now comes the key lemma.
10. Lemma (Hopf). Let A be an invariant event and E[
0
[ < . Then
for all p R and n = 1, 2, ..., we have
E
0
I
A,max
0in
[S
i
/(i+1)]>p
pPA, max
0in
[S
i
/(i + 1)] > p.
Proof (Garsia). First assume p = 0 and use the obvious equality
max
0in
S
i
=
0
+ max0, S
1
1
, ..., S
1
n
=
0
+ ( max
1in
S
1
i
)
+
,
where S
1
n
=
1
+... +
n
. Also notice that
E max
0in
[S
i
[ E([
0
[ +... +[
n
[) = (n + 1)E[
0
[ < .
Then, for any invariant A,
E
0
I
A,max
0in
[S
i
/(i+1)]>0
= E
0
I
A,max
0in
S
i
>0
= E( max
0in
S
i
)I
A,max
0in
S
i
>0
E( max
1in
S
1
i
)
+
I
A,max
0in
S
i
>0
E( max
0in
S
i
)
+
I
A
E( max
1in+1
S
1
i
)
+
I
A
.
The last expression is zero by denition, since A is invariant.
This proves the lemma for p = 0. In the general case, it suces to
consider
i
p instead of
i
and notice that S
i
/(i + 1) > p if and only if
(
0
p) +... + (
i
p) > 0. The lemma is proved.
11. Theorem (Birkho-Khinchin). Let
n
be a stationary process and f(x)
a Borel function such that E[f(
0
)[ < . Then (i) the limit
f

:= lim
n
1
n + 1
[f(
0
) +... +f(
n
)]
exists almost surely, and (ii) we have
E[f

[ E[f(
0
)[, lim
n
E

1
n + 1
[f(
0
) +... +f(
n
)]

= 0. (5)
124 Chapter 4. Stationary Processes, Sec 6
Proof. (i) Since f(
n
) is a stationary process, without loss of generality
we may and will take f(
n
) =
n
and assume that E[
0
[ < . In this
situation we just repeat almost word for word the above proof of convergence
for reverse martingales.
Denote
n
= S
n
/(n+1). Then Hopfs lemma says that (2) holds provided
B C
p
is invariant. By letting n we obtain (3). Changing signs leads
to (4) provided B D
q
is invariant. Lemma 9 allows us to take B = D
q
in
(3) and B = C
p
in (4). The rest is exactly the same as above, and assertion
(i) follows.
(ii) The rst equation in (5) follows from (i), Fatous lemma, and the
fact that E[f(
k
)[ = E[f(
0
)[. The second one follows from (i) and the
dominated convergence theorem if f is bounded. In the general case, take
any > 0 and nd a bounded Borel g such that E[f(
0
) g(
0
)[ . Then
lim
n
E

1
n + 1
[f(
0
) +... +f(
n
)]

lim
n
E


1
n + 1
[f(
0
) g(
0
) +... +f(
n
) g(
n
)]

E[(f g)

[ +E[f(
0
) g(
0
)[ 2E[f(
0
) g(
0
)[ 2.
Since > 0 is arbitrary, we get the second equation in (5). The theorem is
proved.
12. Exercise. We concentrated on real-valued stationary processes only for
the sake of convenience of notation. One can consider stationary processes
with values in arbitrary measure spaces, and the Birkho-Khinchin theorem
with its proof carries over to them without any change. Moreover, obviously
instead of real-valued f one can take R
d
-valued functions. In connection
with this, prove that if
n
is a (real-valued) stationary process, f is a Borel
function satisfying E[f(
0
)[ < , and z is a complex number with [z[ = 1,
then the limit
lim
n
1
n + 1
[z
0
f(
0
) +... +z
n
f(
n
)]
exists almost surely.
The Birkho-Khinchin theorem looks like the strong law of large num-
bers, and its assertion is most valuable when the limit is nonrandom. In
that case (5) implies that f

= Ef(
0
). In other words,
lim
n
1
n + 1
[f(
0
) +... +f(
n
)] = Ef(
0
) (a.s.). (6)
Ch 4 Section 6. Stationary processes 125
Let us give some conditions for the limit to be constant.
13. Denition. A stationary process
n
is said to be ergodic if any invari-
ant event A belonging to (
0
,
1
, ...) has probability zero or one.
14. Theorem. If
n
is a stationary ergodic process, then (6) holds for any
Borel function f satisfying E[f(
0
)[ < .
Proof. By Lemma 9, for any constant c, the event f

c is invariant
and, obviously, belongs to (
0
,
1
, ...). Because of ergodicity, P(f

c) = 0
or 1. Since this holds for any constant c, f

= const (a.s.) and, as we have


seen before the theorem, (6) holds indeed. The theorem is proved.
The Birkho-Khinchin theorem for ergodic processes is important in
physics. For instance, take the problem of nding the average magnitude
of the speed of molecules of a gas in a given volume. Assume that the gas
is in a stationary regime, so that, in particular, this average is independent
of time. It is absolutely impossible to measure the speeds of all molecules
at a given time and then compute the average in question. The Birkho-
Khinchin theorem guarantees, on the intuitive level, that if we take almost
any particular molecule and measure its speed at moments 0, 1, 2, ..., then
the arithmetic means of magnitudes of these measurements will converge
to the average magnitude of speed of all molecules. Physical intuition tells
us that in order for this to be true, the molecules of gas should intermix
well during their displacements. In mathematical terms this translates to
the requirement of ergodicity. We may say that if there is a good mixing or
ergodicity, then the individual average over time coincides with the average
over the ensemble of all molecules.
Generally, stationary processes need not be ergodic, as it is seen from
Example 3. For many of those that are ergodic, proving ergodicity turns out
to be very hard. On the other hand, there are some cases in which checking
ergodicity is rather simple.
15. Theorem. Any sequence of i.i.d. random variables is ergodic.
Proof. Let
n
be a sequence of i.i.d. random variables and let A be an
invariant event belonging to := (
0
,
1
, ...). Dene =

n
(
0
,
1
, ...
n
).
Then, for each n and Borel , we have
:
n
(
0
,
1
, ...
n
) (),
so that (). On the other hand, and () . Thus () = ,
which by Theorem 2.3.19 implies that L
1
(, P) = L
1
(, P). In particular,
for any (0, 1), there are an n and a (
0
,
1
, ...
n
)-measurable random
variable f such that
126 Chapter 4. Stationary Processes, Sec 6
E[I
A
f[ .
Without loss of generality, we may assume that [f[ 2 and that f takes
only nitely many values.
Next, by using the fact that any element of (
0
,
1
, ...
n
) has the form
: (
0
,
1
, ...
n
) B, where B is an appropriate Borel set in R
n+1
, it is
easy to prove that f = f(
0
, ...,
n
), where f(x
0
, ..., x
n
) is a Borel function.
Therefore, the above assumptions imply that
P(A) = EI
A
I
A
+Ef(
0
, ...,
n
)I
A
= +Ef(
n+1
, ...,
2n+1
)I
A
3 +Ef(
n+1
, ...,
2n+1
)f(
0
, ...,
n
)
= 3 + [Ef(
0
, ...,
n
)]
2
3 + [P(A) +]
2
.
By letting 0, we conclude that P(A) [P(A)]
2
, and our assertion follows.
The theorem is proved.
From this theorem and the Birkho-Khinchin theorem we get Kolmo-
gorovs strong law of large numbers for i.i.d. random variables with E[
0
[ <
. This theorem also allows one to get stronger results even for the case
of
n
which are i.i.d. As an example let
n
= f(
n
,
n+1
, ...), where f is
independent of n. Assume E[
0
[ < . Then
n
is a stationary process and
the event
: lim
n
1
n + 1
(
0
+... +
n
) < c
is invariant with respect to the process
n
. Therefore, the limit is constant
with probability 1. As above, one proves that the limit equals E
0
(a.s.).
In Example 7, for irrational, one could also prove that
n
is an ergodic
process (see Exercise 16), and this would lead to (1.2.7) for almost every x.
Notice that in Exercise 1.2.13 we have already seen that actually (1.2.7) holds
for any x provided that f is Borel and Riemann integrable. The application
of the Birkho-Khinchin theorem allows one to extend this result for any
Borel function that is integrable with convergence for almost all x.
16. Exercise. Prove that the process from Example 7 is ergodic if is
irrational.
Finally, let us prove that if
n
is a real-valued Gaussian second order
stationary process with correlation function tending to zero at innity, then
f

= Ef(
0
) (a.s.) for any Borel f such that E[f(
0
)[ < . By the way, f

exists due to Example 6 and the Birkho-Khinchin theorem.


Furthermore, owing to the rst relation in (5), to prove f

= Ef(
0
)
it suces to concentrate on bounded and uniformly continuous f. In that
case, g(
n
) := f(
n
) Ef(
0
) is a second order stationary process. As in
Ch 4 Section 7. Hints to exercises 127
Exercise 1.14 (actually easier because there is no need to use mean-square
integrals) one proves that
l.i.m.
1
n + 1
(g(
0
) +... +g(
n
)) = 0 (7)
if
lim
n
Eg(
0
)g(
n
) 0. (8)
By the Birkho-Khinchin theorem, the limit in (7) exists pointwise (a.s.)
and it coincides, of course, with the mean-square limit. It follows that we
need only prove (8).
Without loss of generality assume R(0) = 1. Then
n
:=
n
R(n)
0
and

0
are uncorrelated and hence independent. By using this and the uniform
continuity of g we conclude that
lim
n
Eg(
0
)g(
n
) = lim
n
Eg(
0
)g(
n
+R(n)
0
)
= lim
n
Eg(
0
)g(
n
) = lim
n
Eg(
0
)Eg(
n
) = 0,
the last equality being true because Eg(
0
) = 0.
7. Hints to exercises
1.11 Instead of Fourier integrals, consider Fourier series.
1.14 Use that continuous H-valued functions are uniformly continuous.
1.15 Observe that our assertions can be expressed in terms of R only, since,
for every continuous nonrandom f,
E

_
b
a

t
f
t
dt

2
= E

_
b
a

t
f
t
dt

2
whenever
t
and
t
have the same correlation function. Another useful
observation is that, if R(0) = 1, then R(t) = Ee
it
= F0 +Ee
it
I
,=0
, and
1
T
_
T
0
R(t) dt = F0 +EI
,=0
[e
iT
1]/(iT).
3.3 In the proof of the converse, notice that, if R(0) = 1, then
r
and

t+s
e
s

t
are uncorrelated, hence independent for r t, s 0.
128 Chapter 4. Stationary Processes, Sec 7
3.4 Write the left-hand side as the mean-square limit of integral sums, and
use the isometric property of the stochastic integral along with the domi-
nated convergence theorem to nd the L
2
-limit.
4.1 From P
m
(x)

P
n
(x)

P
m
(x)P
n
(x) conclude that any root of P
m
is a root
of

P
m
, but not of P
n
since P
m
and P
n
do not have common roots. Then
derive that

P
m
(x) P
m
(x).
4.3 Observe that (x)[
x=z
= (x)[
x=z
for all complex z and (x)[
x=iy
=
(iy) for real y.
5.1 Dene
G(t) =
_
R
e
itx
g(x)
1
Q
+
(x)
dx
and prove that G is m/21 times continuously dierentiable in t and tends
to zero as [t[ as the Fourier transform of an L
1
function. Then prove
that G satises the equation P
+
(iD
t
)G(t) = 0 for t 0, where D
t
= d/dt.
Solutions of this linear equation are linear combinations of some integral
powers of t times exponential functions. Owing to the choice of P
+
, its
roots lie in the closed upper half plane, which implies that the exponential
functions are of type exp(at) with Re a 0, none of which goes to zero as
t . Since G(t) 0 as t , we get that G(t) = 0 for t 0. Now
apply linear dierential operators to G to get the conclusion.
5.2 Remember the denition of L

2
from Remark 2.4. By this remark, if
L

2
, then =
_
R
g(x) (dx) with g L
2
(B(R), ). If in addition
L

2
(, 0), then
_
R
g(x)e
itx
(x) dx = 0 for t 0. Exercise 5.1 shows then
that is orthogonal to the random variables in (5.1).
5.8 For the uniqueness see the hint to Exercise 5.1. Also notice that P

+
does not have real roots, and

t
=
_
R
P
+
(x)
P

+
(x)Q
+
(x)
e
itx
(dx).
6.2 If the distributions of two vectors coincide, the distributions of their
respective subvectors coincide too. Therefore, for any i n, the vectors
(
i
, ...,
n
) and (
i+1
, ...,
n+1
) have the same distribution.
6.12 Notice that the process
n
:= z
n
e
i
, where is = [0, 2] with
Lebesgue measure, is stationary. Also notice that the product of two inde-
pendent stationary processes is stationary.
6.16 For an invariant set A and any integers m R and k 0 we have
_

e
2im
I
A
() d = e
2im
_

e
2im
I
A
() d
Ch 4 Section 7. Hints to exercises 129
= e
2imk
_

e
2im
I
A
() d,
where d is the dierential of the linear Lebesgue measure and k is any
integer. By using (1.2.6), conclude that, for any square-integrable random
variable f, EfI
A
= P(A)Ef. Then take f = I
A
.
Chapter 5
Innitely Divisible
Processes
The Wiener process has independent increments and the distribution of each
increment depends only on the length of the time interval over which the
increment is taken. There are many other processes possessing this property;
for instance, the Poisson process or the process
a
, a 0, from Example 2.5.8
are examples of those (see Theorem 2.6.1).
In this chapter we study what can be said about general processes of
that kind. They are supposed to be given on a complete probability space
(, T, P) usually behind the scene. The assumption that this space is com-
plete will turn out to be convenient to use starting with Exercise 5.5. One
more stipulation is that unless explicitely stated otherwise, all the processes
under consideration are assumed to be real valued. Finally, after Theorem
1.5 we tacitly assume that all processes under consideration are stochasti-
cally continuous without specifying this each time.
1. Stochastically continuous processes with
independent increments
We start with processes having independent increments. The main goal of
this section is to show that these processes, or at least their modications,
have rather regular trajectories (see Theorem 11).
1. Denition. A real- or vector-valued random process
t
given on [0, )
is said to be a process with independent increments if
0
= 0 (a.s.) and

t
1
,
t
2

t
1
, ...,
t
n

t
n1
are independent provided 0 t
1
... t
n
< .
131
132 Chapter 5. Innitely Divisible Processes, Sec 1
We will be only dealing with stochastically continuous processes.
2. Denition. A real- or vector-valued random process
t
given on [0, )
is said to be stochastically continuous at a point t
0
[0, ) if
t
P

t
0
as t t
0
. We say that
t
is stochastically continuous on a set if it is
stochastically continuous at each point of the set.
Clearly,
t
is stochastically continuous at t
0
if E[
t

t
0
[ 0 as t t
0
.
Stochastic continuity is very weakly related to the continuity of trajectories.
For instance, for the Poisson process with parameter 1 (see Exercise 2.3.8)
we have E[
t

t
0
[ = [t t
0
[. However, all trajectories of
t
are discontin-
uous. By the way, this example shows also that the requirement > 0 in
Kolmogorovs Theorem 1.4.8 is essential. The trajectories of
a
, a 0, are
also discontinuous, but this process is stochastically continuous too since
(see Theorem 2.6.1 and (2.5.1))
P([
b

a
[ > ) = P(
[ba[
> ) = P(max
t
w
s
< [b a[) 0 as b a.
3. Exercise. Prove that, for any , the function
a
, a > 0, is left continuous
in a.
4. Denition. A (real-valued) random process
t
given on [0, ) is said to
be bounded in probability on a set I [0, ) if
lim
c
sup
tI
P([
t
[ > c) = 0.
As in usual analysis, one proves the following.
5. Theorem. If the process
t
is stochastically continuous on [0, T] (T <
), then
(i) it is uniformly stochastically continuous on [0, T], that is, for any
, > 0 there exists > 0 such that
P([
t
1

t
2
[ > ) < ,
whenever t
1
, t
2
[0, T] and [t
1
t
2
[ ;
(ii) it is bounded in probability on [0, T].
The proof of this theorem is left to the reader as an exercise.
From this point on we will only consider stochastically continuous pro-
cesses on [0, ), without specifying this each time.
To prove that processes with independent increments admit modica-
tions without second-type discontinuities, we need the following lemma.
Ch 5 Section 1. Processes with independent increments 133
6. Lemma (Ottavianis inequality). Let
k
, k = 1, ..., n, be independent
random variables, S
k
=
1
+... +
k
, a 0, 0 < 1, and
P[S
n
S
k
[ a k.
Then for all c 0
Pmax
kn
[S
k
[ a +c
1
1
P[S
n
[ c. (1)
Proof. The probability on the left in (1) equals
n

k=1
P[S
i
[ < a +c, i < k, [S
k
[ a +c

1
1
n

k=1
P[S
i
[ < a +c, i < k, [S
k
[ a +c, [S
n
S
k
[ < a

1
1
n

k=1
P[S
i
[ < a+c, i < k, [S
k
[ a+c, [S
n
[ c
1
1
P[S
n
[ c.
The lemma is proved.
7. Theorem. Let
t
be a process with independent increments on [0, ),
T [0, ), and let be the set of all rational points on [0, T]. Then
Psup
r
[
r
[ < = 1.
Proof. Obviously it suces to prove that for some h > 0 and all t [0, T]
we have
P sup
r[t,t+h]
[
r
[ < = 1. (2)
Take h > 0 so that P[
u

u+s
[ 1 1/2 for all s, u such that
0 s h and s + u T. Such a choice is possible owing to the uniform
stochastic continuity of
t
on [0, T]. Fix t [0, T] and let
r
1
, ..., r
n
[t, t +h] , r
1
... r
n
.
Observe that
r
k
=
r
1
+(
r
2

r
1
) +... +(
r
k

r
k1
), where the summands
are independent. In addition, P[
r
n

r
k
[ 1 1/2. Hence by Lemma 6
134 Chapter 5. Innitely Divisible Processes, Sec 1
Psup
kn
[
r
k
[ 1 +c 2 sup
t[0,T]
P[
t
[ c. (3)
The last inequality is true for any arrangement of the points r
k
[t, t+h]
which may not be necessarily ordered increasingly. Therefore, now we can
think of the set r
1
, r
2
, ... as being the whole [t, t + h]. Then, passing
to the limit in (3) as n and noticing that
sup[
r
k
[ : k = 1, 2, ... sup[
r
[ : r [t, t +h],
we nd that
P sup
r[t,t+h]
[
r
[ > 1 +c 2 sup
t[0,T]
P[
t
[ c.
Finally, by letting c and using the uniform boundedness of
r
in
probability, we come to (2). The theorem is proved.
Dene D[0, ) to be the set of all complex-valued right-continuous func-
tions on [0, ) which have nite left limits at each point t (0, ). Similarly
one denes D[0, T]. We say that a function x

is a cadlag function on [0, T]


if x

D[0, T], and just cadlag if x

D[0, ).
8. Exercise*. Prove that if x
n

D[0, ), n = 1, 2, ..., and the x


n
t
converge
to x
t
as n uniformly on each nite time interval, then x

D[0, ).
9. Lemma. Let = r
1
, r
2
, ... be the set of all rational points on [0, 1], x
t
a real-valued (nonrandom) function given on . For a < b dene
n
(x

, a, b)
to be the number of upcrossings of the interval (a, b) by the function x
t
restricted to the set r
1
, r
2
, ..., r
n
. Assume that
lim
n

n
(x

, a, b) <
for any rational a and b. Then the function
x
t
:= lim
rt
x
r
is well dened for any t [0, 1), is right continuous on [0, T), and has
(perhaps innite) left limits on (0, T].
This lemma is set as an exercise on properties of lim and lim.
Ch 5 Section 1. Processes with independent increments 135
10. Lemma. Let $\phi(t,\lambda)$ be a complex-valued function defined for $\lambda\in\mathbb{R}$ and $t\in[0,1]$. Assume that $\phi(t,\lambda)$ is continuous in $t$ and never takes the zero value. Let $\xi_t$ be a stochastically continuous process such that

(i) $\sup_{r\in Q}|\xi_r|<\infty$ (a.s.);

(ii) $\varlimsup_{n}E\nu_n(\eta^i_\cdot(\lambda),a,b)<\infty$ for any $-\infty<a<b<\infty$, $\lambda\in\mathbb{R}$, $i=1,2$, where
$$\eta^1_t(\lambda)=\mathrm{Re}\,[\phi(t,\lambda)e^{i\lambda\xi_t}],\qquad \eta^2_t(\lambda)=\mathrm{Im}\,[\phi(t,\lambda)e^{i\lambda\xi_t}].$$
Then the process $\xi_t$ admits a modification, all trajectories of which belong to $D[0,1]$.

Proof. Denote $\eta_t(\lambda)=\phi(t,\lambda)e^{i\lambda\xi_t}$ and
$$\Omega'=\bigcap_{m=1}^{\infty}\ \bigcap_{\substack{a<b\\ a,b\ \mathrm{rational}}}\big\{\lim_{n}\nu_n(\eta^i_\cdot(1/m),a,b)<\infty,\ i=1,2\big\}\cap\big\{\sup_{r\in Q}|\xi_r|<\infty\big\}.$$
Obviously, $P(\Omega')=1$. For $\omega\in\Omega'$ Lemma 9 allows us to let
$$\bar\eta_t(1/m)=\lim_{Q\ni r\downarrow t}\eta_r(1/m),\quad t<1,\qquad \bar\eta_1(1/m)=\eta_1(1/m).$$
For $\omega\notin\Omega'$ let $\bar\eta_t(1/m)\equiv 0$. Observe that, since $\phi$ is continuous in $t$ and $P(\Omega')=1$ and $\xi_t$ is stochastically continuous, we have that
$$\bar\eta_t(1/m)=P\text{-}\lim_{Q\ni r\downarrow t}\eta_r(1/m)=\eta_t(1/m)\ \text{(a.s.)}\quad\forall t<1,\qquad \bar\eta_1(1/m)=\eta_1(1/m).\qquad(4)$$
Furthermore, $|\bar\eta_t(1/m)\phi^{-1}(t,1/m)|\le 1$ for all $\omega$ and $t$.

Now define $\kappa=\kappa(\omega)=[\sup_{r\in Q}|\xi_r|]+1$ and
$$\bar\xi_t=\kappa\arcsin\big(\mathrm{Im}\,[\bar\eta_t(1/\kappa)\phi^{-1}(t,1/\kappa)]\big)I_{\Omega'}.$$
By Lemma 9, $\bar\eta_\cdot(1/m)\in D[0,1]$ for any $\omega$. Hence, $\bar\xi_\cdot\in D[0,1]$ for any $\omega$.

It only remains to prove that $P(\bar\xi_t=\xi_t)=1$ for any $t\in[0,1]$. For $\omega\in\Omega'$ we have this equality from (4) and from the formula
$$\xi_t=\kappa\arcsin\big(\mathrm{Im}\,[\eta_t(1/\kappa)\phi^{-1}(t,1/\kappa)]\big),$$
which holds for $\omega\in\Omega'$. For other $t$, owing to the stochastic continuity of $\xi_t$ and the right continuity of $\bar\xi_t$, we have
$$\bar\xi_t=P\text{-}\lim_{r\downarrow t}\bar\xi_r=P\text{-}\lim_{r\downarrow t}\xi_r=\xi_t\quad\text{(a.s.)}.$$
The lemma is proved.
11. Theorem. Stochastically continuous processes with independent increments admit modifications which are right continuous and have finite left limits for any $\omega$.

Proof. Let $\xi_t$ be a process in question. It suffices to construct a modification with the described properties on each interval $[n,n+1]$, $n=0,1,2,...$. The reader can easily combine these modifications to get what we want on $[0,\infty)$. We will confine ourselves to the case $n=0$. Let $Q$ be the set of all rational points on $[0,1]$, and let
$$\phi(t,\lambda)=Ee^{i\lambda\xi_t},\qquad \phi(t_1,t_2,\lambda)=Ee^{i\lambda(\xi_{t_2}-\xi_{t_1})}.$$
Since the process $\xi_t$ is stochastically continuous, the function $\phi(t_1,t_2,\lambda)$ is continuous in $(t_1,t_2)\in[0,1]\times[0,1]$ for any $\lambda$. Therefore, this function is uniformly continuous on $[0,1]\times[0,1]$, and, because $\phi(t,t,\lambda)=1$, there exists $\delta(\lambda)>0$ such that $|\phi(t_1,t_2,\lambda)|\ge 1/2$ whenever $|t_1-t_2|<\delta(\lambda)$ and $t_1,t_2\in[0,1]$. Furthermore, for any $t\in[0,1]$ and $\lambda\in\mathbb{R}$ one can find $n\ge 1$ and $0=t_1\le t_2\le...\le t_n=t$ such that $|t_k-t_{k-1}|<\delta(\lambda)$. Then, using the independence of increments, we find that
$$\phi(t,\lambda)=\phi(t_1,t_2,\lambda)\cdot...\cdot\phi(t_{n-1},t_n,\lambda),$$
which implies that $\phi(t,\lambda)\ne 0$. In addition, $\phi(t,\lambda)$ is continuous in $t$.

For fixed $\lambda$ consider the process
$$\eta_t=\eta_t(\lambda)=\phi^{-1}(t,\lambda)e^{i\lambda\xi_t}.$$
Let $s_1,s_2,...,s_n$ be rational numbers in $[0,1]$ such that $s_1\le...\le s_n$. Define
$$\mathcal{F}_k=\sigma\big(\xi_{s_1},\xi_{s_2}-\xi_{s_1},...,\xi_{s_k}-\xi_{s_{k-1}}\big).$$
Notice that $(\mathrm{Re}\,\eta_{s_k},\mathcal{F}_k)$ and $(\mathrm{Im}\,\eta_{s_k},\mathcal{F}_k)$ are martingales. Indeed, by virtue of the independence of $\xi_{s_{k+1}}-\xi_{s_k}$ and $\mathcal{F}_k$, we have
$$E\{\mathrm{Re}\,\eta_{s_{k+1}}|\mathcal{F}_k\}=\mathrm{Re}\,E\big\{e^{i\lambda\xi_{s_k}}\phi^{-1}(s_{k+1},\lambda)e^{i\lambda(\xi_{s_{k+1}}-\xi_{s_k})}\big|\mathcal{F}_k\big\}$$
$$=\mathrm{Re}\,\big[e^{i\lambda\xi_{s_k}}\phi^{-1}(s_{k+1},\lambda)\phi(s_k,s_{k+1},\lambda)\big]=\mathrm{Re}\,\eta_{s_k}\quad\text{(a.s.)}.$$
Hence by Doob's upcrossing theorem, if $r_1,...,r_n\in Q$, $\{r_1,...,r_n\}=\{s_1,...,s_n\}$, and $0\le s_1\le...\le s_n$, then
$$E\nu_n(\mathrm{Re}\,\eta_\cdot,a,b)\le\big(E|\mathrm{Re}\,\eta_{s_n}|+|a|\big)/(b-a)$$
$$\le\big(\sup_{t\in[0,1]}|\phi^{-1}(t,\lambda)|+|a|\big)/(b-a)<\infty,$$
$$\sup_n E\nu_n(\mathrm{Im}\,\eta_\cdot,a,b)<\infty.$$
It only remains to apply Lemma 10. The theorem is proved.
12. Exercise* (cf. Exercise 3). Take the stable process $\tau_a$, $a\ge 0$, from Theorem 2.6.1. Observe that $\tau_a$ increases in $a$ and prove that its cadlag modification, the existence of which is asserted in Theorem 11, is given by $\tau_{a+}$, $a\ge 0$.
2. Lévy-Khinchin theorem

In this section we prove a remarkable Lévy-Khinchin theorem. It is worth noting that this theorem was originally proved for so-called infinitely divisible laws and not for infinitely divisible processes. As usual we are only dealing with one-dimensional processes (the multidimensional case is treated, for instance, in [GS]).
1. Definition. A process $\xi_t$ with independent increments is called time homogeneous if, for every $h>0$, the distribution of $\xi_{t+h}-\xi_t$ is independent of $t$.

2. Definition. A stochastically continuous time-homogeneous process $\xi_t$ with independent increments is called an infinitely divisible process.
3. Theorem (Lévy-Khinchin). Let $\xi_t$ be an infinitely divisible process on $[0,\infty)$. Then there exist a finite nonnegative measure $\mu$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and a number $b\in\mathbb{R}$ such that, for any $t\in[0,\infty)$ and $\lambda\in\mathbb{R}$, we have
$$Ee^{i\lambda\xi_t}=\exp\Big\{t\int_{\mathbb{R}}f(\lambda,x)\,\mu(dx)+i\lambda tb\Big\},\qquad(1)$$
where
$$f(\lambda,x)=(e^{i\lambda x}-1-i\lambda\sin x)\,\frac{1+x^2}{x^2},\ \ x\ne 0,\qquad f(\lambda,0):=-\frac{\lambda^2}{2}.$$
Proof. Denote $\phi(t,\lambda)=Ee^{i\lambda\xi_t}$. In the proof of Theorem 1.11 we saw that $\phi(t,\lambda)$ is continuous in $t$ and $\phi(t,\lambda)\ne 0$. In addition $\phi(t,\lambda)$ is continuous with respect to the pair $(t,\lambda)$. Define
$$a(t,\lambda)=\arg\phi(t,\lambda),\qquad l(t,\lambda)=\ln|\phi(t,\lambda)|.$$
By using the continuity of $\phi$ and the fact that $\phi\ne 0$, one can uniquely define $a(t,\lambda)$ to be continuous in $t$ and in $\lambda$ and satisfy $a(0,\lambda)=a(t,0)=0$. Clearly, $l(t,\lambda)$ is a finite function which is also continuous in $t$ and in $\lambda$. Furthermore,
$$\phi(t,\lambda)=\exp\{l(t,\lambda)+ia(t,\lambda)\}.$$
Next, it follows from the homogeneity and independence of increments of $\xi_t$ that
$$\phi(t+s,\lambda)=\phi(t,\lambda)\phi(s,\lambda).$$
Hence, by definition of $a$, we get that, for each $\lambda$, it satisfies the equation
$$f(t+s)=f(t)+f(s)+2\pi k(s,t),$$
where $k(s,t)$ is a continuous integer-valued function. Since $k(t,0)=0$, in fact, $k\equiv 0$, and $a$ satisfies $f(t+s)=f(t)+f(s)$. The same equation is also valid for $l$. Any continuous solution of this equation has the form $ct$, where $c$ is a constant. Thus,
$$a(t,\lambda)=ta(\lambda),\qquad l(t,\lambda)=tl(\lambda),$$
where $a(\lambda)=a(1,\lambda)$ and $l(\lambda)=l(1,\lambda)$. By defining $g(\lambda):=l(\lambda)+ia(\lambda)$, we write
$$\phi(t,\lambda)=e^{tg(\lambda)},$$
where $g$ is a continuous function of $\lambda$ and $g(0)=0$. We have reduced our problem to finding $g$.

Observe that
$$g(\lambda)=\lim_{t\downarrow 0}\frac{e^{tg(\lambda)}-1}{t}=\lim_{t\downarrow 0}\frac{\phi(t,\lambda)-1}{t}.\qquad(2)$$
Moreover, from Taylor's expansion of $\exp(tg(\lambda))$ with respect to $t$ one easily sees that the convergence in (2) is uniform on each set of values of $\lambda$ on which $g(\lambda)$ is bounded. In particular, this is true on each set $[-h,h]$ with $0\le h<\infty$.

By taking $t$ of type $1/n$ and denoting by $F_t$ the distribution of $\xi_t$, we conclude that
$$n\int_{\mathbb{R}}(e^{i\lambda x}-1)\,F_{1/n}(dx)\to g(\lambda)\qquad(3)$$
as $n\to\infty$ uniformly in $\lambda$ on any finite interval. Integrate this against $d\lambda$ to get
$$\lim_{n\to\infty}n\int_{\mathbb{R}}\Big(1-\frac{\sin xh}{xh}\Big)\,F_{1/n}(dx)=-\frac{1}{2h}\int_{-h}^{h}g(\lambda)\,d\lambda.\qquad(4)$$
Notice that the right-hand side of (4) can be made arbitrarily small by choosing $h$ small, since $g$ is continuous and vanishes at zero. Furthermore, as is easy to see, $1-\sin xh/(xh)\ge 1/2$ for $|xh|\ge 2$. It follows that, for any $\varepsilon>0$, there exists $h>0$ such that
$$\varlimsup_{n\to\infty}\,(n/2)\int_{|x|\ge 2/h}F_{1/n}(dx)\le\varepsilon.$$
In turn, it follows that, for all large $n$,
$$n\int_{|x|\ge 2/h}F_{1/n}(dx)\le 4\varepsilon.\qquad(5)$$
By reducing $h$ one can accommodate any finite set of values of $n$ and find an $h$ such that (5) holds for all $n\ge 1$ rather than only for large ones.

To derive yet another consequence of (4), notice that there exists a constant $\delta>0$ such that
$$1-\frac{\sin x}{x}\ge\delta\,\frac{x^2}{1+x^2}\quad\forall x\in\mathbb{R}.$$
Therefore, from (4) with $h=1$, we obtain that there exists a finite constant $c$ such that for all $n$
$$n\int_{\mathbb{R}}\frac{x^2}{1+x^2}\,F_{1/n}(dx)\le c.\qquad(6)$$
Finally, upon introducing measures $\mu_n$ by the formula
$$\mu_n(dx)=n\,\frac{x^2}{1+x^2}\,F_{1/n}(dx),$$
and noticing that $\mu_n\le nF_{1/n}$, from (5) and (6), we see that the family $\{\mu_n,\ n=1,2,...\}$ is weakly compact. Therefore, there exist a subsequence $n'\to\infty$ and a finite measure $\mu$ such that
$$\int_{\mathbb{R}}f(x)\,\mu_{n'}(dx)\to\int_{\mathbb{R}}f(x)\,\mu(dx)$$
for every bounded and continuous $f$. As is easy to check, $f(\lambda,x)$ is bounded and continuous in $x$. Hence,
$$g(\lambda)=\lim_{n\to\infty}n\int_{\mathbb{R}}(e^{i\lambda x}-1)\,F_{1/n}(dx)=\lim_{n\to\infty}\Big[\int_{\mathbb{R}}f(\lambda,x)\,\mu_n(dx)+i\lambda n\int_{\mathbb{R}}\sin x\,F_{1/n}(dx)\Big]$$
$$=\lim_{n'\to\infty}\Big[\int_{\mathbb{R}}f(\lambda,x)\,\mu_{n'}(dx)+i\lambda n'\int_{\mathbb{R}}\sin x\,F_{1/n'}(dx)\Big]=\int_{\mathbb{R}}f(\lambda,x)\,\mu(dx)+i\lambda b,$$
where
$$b:=\lim_{n'\to\infty}n'\int_{\mathbb{R}}\sin x\,F_{1/n'}(dx),$$
and the existence and finiteness of this limit follows from the above computations, in which all other limits exist and are finite. The theorem is proved.
Formula (1) is called Khinchin's formula. The following Lévy formula sheds more light on the structure of the process $\xi_t$:
$$\phi(t,\lambda)=\exp t\Big\{\int_{\mathbb{R}}(e^{i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda b-\sigma^2\lambda^2/2\Big\},$$
where $\Lambda$ is called the Lévy measure of $\xi_t$. This is a nonnegative, generally speaking, infinite measure on $\mathcal{B}(\mathbb{R})$ such that
$$\int_{\mathbb{R}}\frac{x^2}{1+x^2}\,\Lambda(dx)<\infty,\qquad \Lambda(\{0\})=0.\qquad(7)$$
Any such measure is called a Lévy measure. One obtains one formula from the other by introducing the following relations between $\mu$ and the pair $(\Lambda,\sigma^2)$:
$$\mu(\{0\})=\sigma^2,\qquad \Lambda(\Gamma)=\int_{\Gamma\setminus\{0\}}\frac{1+x^2}{x^2}\,\mu(dx).$$
4. Exercise*. Prove that if one introduces $(\Lambda,\sigma^2)$ by the above formulas, then one gets Lévy's formula from Khinchin's formula, and, in addition, $\Lambda$ satisfies (7).

5. Exercise*. Let a measure $\Lambda$ satisfy (7). Define
$$\mu(\Gamma)=\int_{\Gamma}\frac{x^2}{1+x^2}\,\Lambda(dx)+I_{\Gamma}(0)\sigma^2.$$
Show that $\mu$ is a finite measure for which Lévy's formula transforms into Khinchin's formula.
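To see Exercises 4 and 5 in action, here is a minimal numerical sketch (all parameter values and names are illustrative, not from the text): for a toy triple consisting of a single-atom Lévy measure $\Lambda=\theta\delta_{x_0}$, a drift $b$, and a diffusion coefficient $\sigma$, one builds $\mu$ by the formulas above and checks that Khinchin's and Lévy's exponents agree.

import numpy as np

# Toy example: Lambda = theta * delta_{x0}; then Khinchin's mu is
#   mu({0}) = sigma^2,  mu({x0}) = theta * x0^2 / (1 + x0^2).
theta, x0, b, sigma = 0.7, 1.3, 0.2, 0.5

def f_khinchin(lam, x):
    # f(lambda, x) = (e^{i lam x} - 1 - i lam sin x)(1 + x^2)/x^2, f(lambda, 0) = -lam^2/2
    if x == 0.0:
        return -lam**2 / 2.0
    return (np.exp(1j*lam*x) - 1 - 1j*lam*np.sin(x)) * (1 + x**2) / x**2

def khinchin_exponent(lam):
    mu0 = sigma**2                       # mu({0})
    mux0 = theta * x0**2 / (1 + x0**2)   # mu({x0})
    return mu0 * f_khinchin(lam, 0.0) + mux0 * f_khinchin(lam, x0) + 1j*lam*b

def levy_exponent(lam):
    jump = theta * (np.exp(1j*lam*x0) - 1 - 1j*lam*np.sin(x0))
    return jump + 1j*lam*b - sigma**2 * lam**2 / 2.0

for lam in [-2.0, -0.5, 1.0, 3.0]:
    assert abs(khinchin_exponent(lam) - levy_exponent(lam)) < 1e-12
print("Khinchin and Levy exponents agree on the test points")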
6. Theorem (uniqueness). There can exist only one finite measure $\mu$ and one number $b$ for which $\phi(t,\lambda)$ is representable by Khinchin's formula. There can exist only one measure $\Lambda$ satisfying (7) and unique numbers $b$ and $\sigma^2$ for which $\phi(t,\lambda)$ is representable by Lévy's formula.

Proof. Exercises 4 and 5 show that we may concentrate only on the first part of the theorem. The exponent in Khinchin's formula is continuous in $\lambda$ and vanishes at $\lambda=0$. Therefore it is uniquely determined by $\phi(t,\lambda)$, and we only need prove that $\mu$ and $b$ are uniquely determined by the function
$$g(\lambda):=\int_{\mathbb{R}}f(\lambda,x)\,\mu(dx)+i\lambda b.$$
Clearly, it suffices only to show that $\mu$ is uniquely determined by $g$.

For $h>0$, we have
$$g(\lambda)-\frac{g(\lambda+h)+g(\lambda-h)}{2}=\int_{\mathbb{R}}e^{i\lambda x}\,\frac{1-\cos xh}{x^2}\,(1+x^2)\,\mu(dx)\qquad(8)$$
with the agreement that $(1-\cos xh)/x^2=h^2/2$ if $x=0$. Define a new measure
$$\mu_h(\Gamma)=\int_{\Gamma}\kappa(x,h)\,\mu(dx),\qquad \kappa(x,h)=\frac{1-\cos xh}{x^2}\,(1+x^2),$$
and use
$$\int_{\mathbb{R}}f(x)\,\mu_h(dx)=\int_{\mathbb{R}}f(x)\kappa(x,h)\,\mu(dx)$$
for all bounded Borel $f$. Then we see from (8) that the characteristic function of $\mu_h$ is uniquely determined by $g$. Therefore, $\mu_h$ is uniquely determined by $g$ for any $h>0$.

Now let $\Gamma$ be a bounded Borel set and $h$ be such that $\Gamma\subset[-1/h,1/h]$. Take $f(x)=\kappa^{-1}(x,h)$ for $x\in\Gamma$ and $f(x)=0$ elsewhere. By the way, observe that $f$ is a bounded Borel function. For this $f$
$$\int_{\mathbb{R}}f(x)\,\mu_h(dx)=\int_{\mathbb{R}}f(x)\kappa(x,h)\,\mu(dx)=\mu(\Gamma),$$
where the left-hand side is uniquely determined by $g$. The theorem is proved.
7. Corollary. Define
$$\mu_t(dx)=\frac{x^2}{t(1+x^2)}\,F_t(dx),\qquad b_t=\frac{1}{t}\int_{\mathbb{R}}\sin x\,F_t(dx).$$
Then $\mu_t\to\mu$ weakly and $b_t\to b$ as $t\downarrow 0$.

Indeed, similarly to (3) we have
$$\frac{1}{t}\int_{\mathbb{R}}(e^{i\lambda x}-1)\,F_t(dx)\to g(\lambda),$$
which as in the proof of the Lévy-Khinchin theorem shows that the family $\{\mu_t;\ t\le 1\}$ is weakly compact. Next, if $\mu_{t_n}\xrightarrow{w}\nu$, then, again as in the proof of the Lévy-Khinchin theorem, $b_{t_n}$ converges, and if we denote its limit by $c$, then Khinchin's formula holds with $\mu=\nu$ and $b=c$. Finally, the uniqueness implies that all weak limit points of $\mu_t$, $t\downarrow 0$, coincide with $\mu$ and hence (cf. Exercise 1.2.10) $\mu_t\xrightarrow{w}\mu$ as $t\downarrow 0$. This obviously implies that $b_t$ also converges and its limit is $b$.
8. Corollary. In Lévy's formula
$$\sigma^2=\lim_{n\to\infty}\lim_{t\downarrow 0}\frac{1}{t}\,E\xi^2_tI_{|\xi_t|\le\varepsilon_n},$$
where $\varepsilon_n$ is a sequence such that $\varepsilon_n>0$ and $\varepsilon_n\downarrow 0$. Moreover, $F_t/t$ converges weakly on $\mathbb{R}\setminus\{0\}$ as $t\downarrow 0$ to $\Lambda$, that is,
$$\lim_{t\downarrow 0}\frac{1}{t}\int_{\mathbb{R}}f(x)\,F_t(dx)=\lim_{t\downarrow 0}\frac{1}{t}\,Ef(\xi_t)=\int_{\mathbb{R}}f(x)\,\Lambda(dx)\qquad(9)$$
for each bounded continuous function $f$ which vanishes in a neighborhood of $0$.

Proof. By the definition of $\Lambda$ and Corollary 7, for each bounded continuous function $f$ which vanishes in a neighborhood of $0$, we have
$$\int_{\mathbb{R}}f(x)\,\Lambda(dx)=\int_{\mathbb{R}}f(x)\,\frac{1+x^2}{x^2}\,\mu(dx)=\lim_{t\downarrow 0}\int_{\mathbb{R}}f(x)\,\frac{1+x^2}{x^2}\,\mu_t(dx)=\lim_{t\downarrow 0}\frac{1}{t}\int_{\mathbb{R}}f(x)\,F_t(dx).$$
This proves (9).

Let us prove the first assertion. By the dominated convergence theorem, for every sequence of nonnegative $\varepsilon_n\downarrow 0$ we have
$$\sigma^2=\mu(\{0\})=\int_{\mathbb{R}}I_{\{0\}}(x)\,\mu(dx)=\int_{\mathbb{R}}I_{\{0\}}(x)(1+x^2)\,\mu(dx)=\lim_{n\to\infty}\int_{\mathbb{R}}I_{[-\varepsilon_n,\varepsilon_n]}(x)(1+x^2)\,\mu(dx).$$
By Theorem 1.2.11 (v), if $\mu(\{\varepsilon_n\})=\mu(\{-\varepsilon_n\})=0$, then
$$\int_{\mathbb{R}}I_{[-\varepsilon_n,\varepsilon_n]}(x)(1+x^2)\,\mu(dx)=\lim_{t\downarrow 0}\frac{1}{t}\int_{\mathbb{R}}I_{[-\varepsilon_n,\varepsilon_n]}(x)\,x^2\,F_t(dx)=\lim_{t\downarrow 0}\frac{1}{t}\,E\xi^2_tI_{|\xi_t|\le\varepsilon_n}.$$
It only remains to notice that the set of $x$ such that $\mu(\{x\})>0$ is countable, so that there exists a sequence $\varepsilon_n$ such that $\varepsilon_n\downarrow 0$ and $\mu(\{\varepsilon_n\})=\mu(\{-\varepsilon_n\})=0$. The corollary is proved.
9. Exercise. Prove that, if $\xi_t\ge 0$ for all $t\ge 0$ and $\omega$, then $\Lambda((-\infty,0])=0$. One can say more in that case, as we will see in Exercise 3.15.
We know that the Wiener process has independent increments, and also that it is homogeneous and stochastically continuous (even just continuous). In Lévy's formula, to get $E\exp(i\lambda w_t)$ one takes $\Lambda=0$, $b=0$, and $\sigma=1$.

If in Lévy's formula we take $\sigma=0$, $\Lambda(\Gamma)=\pi I_{\Gamma}(1)$, and $b=\pi\sin 1$, where $\pi$ is a nonnegative number, then the corresponding process is called the Poisson process with parameter $\pi$.

If $\sigma=b=0$ and $\Lambda(dx)=ax^{-2}\,dx$ with a constant $a>0$, the corresponding process is called the Cauchy process.

Clearly, for the Poisson process $\pi_t$ with parameter $\pi$ we have
$$Ee^{i\lambda\pi_t}=e^{t\pi(e^{i\lambda}-1)},$$
so that $\pi_t$ has Poisson distribution with parameter $t\pi$. In particular,
$$E|\pi_{t+h}-\pi_t|=E\pi_h=\pi h$$
for $t,h\ge 0$. The values of $\pi_t$ are integers and $\pi_t$ is not identically constant (the expectation grows). Therefore $\pi_t$ does not have a continuous modification, which shows, in particular, that the positivity requirement on the exponent in Theorem 1.4.8 is essential. For $\pi=1$ we come to the Poisson process introduced in Exercise 2.3.8.
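A quick way to get a feeling for these formulas is to simulate. The sketch below (illustrative code; the function and parameter names are made up) builds a Poisson process from i.i.d. exponential interarrival times and checks $E\pi_h=\pi h$ empirically.

import numpy as np

rng = np.random.default_rng(0)

def poisson_count(rate, t, rng):
    """Number of jumps of a Poisson process with parameter `rate` on (0, t],
    built from i.i.d. Exp(rate) interarrival times."""
    n, s = 0, 0.0
    while True:
        s += rng.exponential(1.0 / rate)
        if s > t:
            return n
        n += 1

rate, h = 2.5, 0.4
samples = [poisson_count(rate, h, rng) for _ in range(20000)]
print(np.mean(samples), "vs", rate * h)   # E pi_h = pi * h, here 1.0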
10. Exercise. Prove that for the Cauchy process we have $\phi(t,\lambda)=\exp(-ct|\lambda|)$, with a constant $c>0$.
11. Exercise*. Prove that the Lévy measure of the process $\tau_{a+}$, $a\ge 0$ (see Theorem 2.6.1 and Exercise 1.12), is concentrated on the positive half line and is given by $I_{x>0}(2\pi)^{-1/2}x^{-3/2}\,dx$. This result will be used in Sec. 6. You may also like to show that
$$\phi(t,\lambda)=\exp\big(-t|\lambda|^{1/2}(a-ib\,\mathrm{sign}\,\lambda)\big),$$
where
$$a=(2\pi)^{-1/2}\int_0^{\infty}x^{-3/2}(1-\cos x)\,dx,\qquad b=(2\pi)^{-1/2}\int_0^{\infty}x^{-3/2}\sin x\,dx,$$
and, furthermore, that $a=b=1$.
12. Exercise. Prove that if in Lévy's formula we have $\sigma=0$ and $\Lambda=0$, then $\xi_t=bt$ (a.s.) for all $t$, where $b$ is a constant.
3. Jump measures and their relation to Lévy measures

Let $\xi_t$ be an infinitely divisible cadlag process on $[0,\infty)$. Define
$$\Delta\xi_t=\xi_t-\xi_{t-}.$$
For any set $\Gamma\subset\mathbb{R}_+\times\mathbb{R}:=[0,\infty)\times\mathbb{R}$ let $p(\Gamma)$ be the number of points $(t,\Delta\xi_t)\in\Gamma$. It may happen that $p(\Gamma)=\infty$. Obviously $p(\Gamma)$ is a $\sigma$-additive measure on the family of all subsets of $\mathbb{R}_+\times\mathbb{R}$. The measure $p(\Gamma)$ is called the jump measure of $\xi_t$.

For $T,\varepsilon\in(0,\infty)$ define
$$R_{T,\varepsilon}=[0,T]\times\{x:|x|\ge\varepsilon\}.$$

1. Remark. Notice that $p(R_{T,\varepsilon})<\infty$ for any $\omega$, which is to say that on $[0,T]$ there may be only finitely many $t$ such that $|\Delta\xi_t|\ge\varepsilon$. This property follows immediately from the fact that the trajectories of $\xi_t$ do not have discontinuities of the second kind. It is also worth noticing that $p(\Gamma)$ is concentrated at points $(t,\Delta\xi_t)$ and each point of this type receives a unit mass.

We will need yet another measure defined on subsets of $\mathbb{R}$. For any $B\subset\mathbb{R}$ define
$$p_t(B)=p((0,t]\times B).$$
2. Remark. By Remark 1, if $B$ is separated from zero, then $p_t(B)$ is finite. Moreover, let $f(x)$ be a Borel function (perhaps unbounded) vanishing for $|x|<\varepsilon$, where $\varepsilon>0$. Then, the process
$$\pi_t:=\pi_t(f):=\int_{\mathbb{R}}f(x)\,p_t(dx)$$
is well defined and is just equal to the (finite) sum of $f(\Delta\xi_s)$ for all $s\le t$ such that $|\Delta\xi_s|\ge\varepsilon$.

The structure of $\pi_t$ is pretty simple. Indeed, fix an $\omega$ and let $0\le s_1<...<s_n<...$ be all $s$ for which $|\Delta\xi_s|\ge\varepsilon$ (if there are only $m<\infty$ such $s$, we let $s_n=\infty$ for $n\ge m+1$). Then, of course, $s_n\to\infty$ as $n\to\infty$. Also $s_1>0$, because $\xi_t$ is right continuous and $\Delta\xi_0=0$. With this notation
$$\pi_t=\sum_{s_n\le t}f(\Delta\xi_{s_n}).\qquad(1)$$
We see that $\pi_t$ starts from zero, is constant on each interval $[s_{n-1},s_n)$, $n=1,2,...$ (with $s_0:=0$), and
$$\Delta\pi_{s_n}=f(\Delta\xi_{s_n}).\qquad(2)$$
3. Lemma. Let $f(x)$ be a function as in Remark 2. Assume that $f$ is continuous. Let $0\le s<t<\infty$ and let $t^n_i$ be such that
$$s=t^n_1<...<t^n_{k(n)+1}=t,\qquad \max_{j=1,...,k(n)}(t^n_{j+1}-t^n_j)\to 0$$
as $n\to\infty$. Then for any $\omega$
$$\pi_t(f)-\pi_s(f)=\int_{\mathbb{R}_+\times\mathbb{R}}I_{(s,t]}(u)f(x)\,p(dudx)=\lim_{n\to\infty}\sum_{j=1}^{k(n)}f(\xi_{t^n_{j+1}}-\xi_{t^n_j}).\qquad(3)$$

Proof. We have noticed above that the set of all $u\in(s,t]$ for which $|\Delta\xi_u|\ge\varepsilon$ is finite. Let $\{u_1,...,u_N\}$ be this set. Single out those intervals $(t^n_j,t^n_{j+1}]$ which contain at least one of the $u_i$'s. For large $n$ we will have exactly $N$ such intervals. First we prove that, for large $n$,
$$|\xi_{t^n_{j+1}}-\xi_{t^n_j}|<\varepsilon,\qquad f(\xi_{t^n_{j+1}}-\xi_{t^n_j})=0$$
if the interval $(t^n_j,t^n_{j+1}]$ does not contain any of the $u_i$'s. Indeed, if this were not true, then one could find a sequence $s_k,t_k$ such that $|\xi_{t_k}-\xi_{s_k}|\ge\varepsilon$, $s_k,t_k\in(s,t]$, $s_k<t_k$, $t_k-s_k\to 0$, and on $(s_k,t_k]$ there are no points $u_i$. Without loss of generality, we may assume that $s_k,t_k\to u\in(s,t]$ (actually, one can obviously assume that $u\in[s,t]$, but since the trajectories are right continuous, $\xi_{s_k},\xi_{t_k}\to\xi_s$ if $s_k,t_k\downarrow s$, so that $u\ne s$).

Furthermore, there are infinitely many $s_k$'s either to the right or to the left of $u$. Therefore, using subsequences if needed, we may assume that the sequence $s_k$ is monotone and then that $t_k$ is monotone as well. Then, since $\xi_t$ has finite right and left limits, we have that $s_k\uparrow u$, $s_k<u$, and $t_k\downarrow u$, which implies that $|\Delta\xi_u|\ge\varepsilon$. But then we would have a point $u\in\{u_1,...,u_N\}$ which belongs to $(s_k,t_k]$ for all $k$ (after passing to subsequences). This is a contradiction, which proves that for all large $n$ the sum on the right in (3) contains at most $N$ nonzero terms. These terms correspond to the intervals $(t^n_j,t^n_{j+1}]$ containing $u_i$'s, and they converge to $f(\Delta\xi_{u_i})$.

It only remains to observe that the first equality in (3) is obvious and, by Remark 2,
$$\int_{\mathbb{R}_+\times\mathbb{R}}I_{(s,t]}(u)f(x)\,p(dudx)=\sum_{i=1}^{N}f(\Delta\xi_{u_i}).$$
The lemma is proved.
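The following sketch illustrates formula (3) on a simulated compound Poisson path (an illustration with made-up parameters, not the book's construction): as the partition is refined, the sums $\sum_j f(\xi_{t^n_{j+1}}-\xi_{t^n_j})$ stabilize at the sum of $f$ over the jumps.

import numpy as np

rng = np.random.default_rng(1)

# Compound Poisson path on [0, T]: jump times ~ Poisson(rate), jump sizes ~ N(2, 1).
T, rate = 1.0, 3.0
n_jumps = rng.poisson(rate * T)
jump_times = np.sort(rng.uniform(0, T, n_jumps))
jump_sizes = rng.normal(2.0, 1.0, n_jumps)

def xi(t):
    return jump_sizes[jump_times <= t].sum()

eps = 0.5
# f vanishes on (-eps, eps); its discontinuity at +-eps is avoided a.s.
# by the continuous jump-size distribution, so the demo is unaffected.
f = lambda x: x**2 * (abs(x) >= eps)

exact = sum(f(x) for x in jump_sizes)   # pi_T(f): sum of f over the jumps
for k in [4, 8, 12]:
    grid = np.linspace(0.0, T, 2**k + 1)
    approx = sum(f(xi(b) - xi(a)) for a, b in zip(grid[:-1], grid[1:]))
    print(2**k, approx, "->", exact)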
4. Definition. For $0\le s<t<\infty$ define $\mathcal{F}^{\xi}_{s,t}$ as the completion of the $\sigma$-field generated by $\xi_r-\xi_s$, $r\in[s,t]$. Also set $\mathcal{F}^{\xi}_t=\mathcal{F}^{\xi}_{0,t}$.

5. Remark. Since the increments of $\xi_t$ are independent, the $\sigma$-fields $\mathcal{F}^{\xi}_{0,t_1}$, $\mathcal{F}^{\xi}_{t_1,t_2}$, ..., $\mathcal{F}^{\xi}_{t_{n-1},t_n}$ are independent for any $0<t_1<...<t_n$.

Next remember Definition 2.5.10.
6. Definition. Random processes $\eta^1_t,...,\eta^n_t$ defined for $t\ge 0$ are called independent if for any $t_1,...,t_k\ge 0$ the vectors $(\eta^1_{t_1},...,\eta^1_{t_k})$, ..., $(\eta^n_{t_1},...,\eta^n_{t_k})$ are independent.
7. Lemma. Let $\eta_t$ be an $\mathbb{R}^d$-valued process starting from zero and such that $\eta_t-\eta_s$ is $\mathcal{F}^{\xi}_{s,t}$-measurable whenever $0\le s<t<\infty$. Also assume that, for all $0\le s<t<\infty$, the random variables $\eta^1_t-\eta^1_s$, ..., $\eta^d_t-\eta^d_s$ are independent. Then the process $\eta_t$ has independent increments and the processes $\eta^1_t$, ..., $\eta^d_t$ are independent.

Proof. That $\eta_t$ has independent increments follows from Remark 5. To prove that the vectors
$$(\eta^1_{t_1},...,\eta^1_{t_n}),\ ...,\ (\eta^d_{t_1},...,\eta^d_{t_n})\qquad(4)$$
are independent if $0=t_0<t_1<...<t_n$, it suffices to prove that
$$(\eta^1_{t_1}-\eta^1_{t_0},\eta^1_{t_2}-\eta^1_{t_1},...,\eta^1_{t_n}-\eta^1_{t_{n-1}}),\ ...,\ (\eta^d_{t_1}-\eta^d_{t_0},\eta^d_{t_2}-\eta^d_{t_1},...,\eta^d_{t_n}-\eta^d_{t_{n-1}})\qquad(5)$$
are independent. Indeed, the vectors in (4) can be obtained after applying a linear transformation to the vectors in (5). Now take $\lambda^k_j\in\mathbb{R}$ for $k=1,...,d$ and $j=1,...,n$, and write
$$E\exp\Big(i\sum_{k,j}\lambda^k_j(\eta^k_{t_j}-\eta^k_{t_{j-1}})\Big)=E\Big[\exp\Big(i\sum_{j\le n-1,k}\lambda^k_j(\eta^k_{t_j}-\eta^k_{t_{j-1}})\Big)\,E\Big\{\exp\Big(i\sum_k\lambda^k_n(\eta^k_{t_n}-\eta^k_{t_{n-1}})\Big)\Big|\mathcal{F}^{\xi}_{0,t_{n-1}}\Big\}\Big]$$
$$=E\exp\Big(i\sum_{j\le n-1,k}\lambda^k_j(\eta^k_{t_j}-\eta^k_{t_{j-1}})\Big)\,E\exp\Big(i\sum_k\lambda^k_n(\eta^k_{t_n}-\eta^k_{t_{n-1}})\Big)$$
$$=E\exp\Big(i\sum_{j\le n-1,k}\lambda^k_j(\eta^k_{t_j}-\eta^k_{t_{j-1}})\Big)\,\prod_k E\exp\big(i\lambda^k_n(\eta^k_{t_n}-\eta^k_{t_{n-1}})\big).$$
An obvious induction allows us to represent the characteristic function of the family $\{\eta^k_{t_j}-\eta^k_{t_{j-1}},\ k=1,...,d,\ j=1,...,n\}$ as the product of the characteristic functions of its members, thus proving the independence of all $\eta^k_{t_j}-\eta^k_{t_{j-1}}$ and, in particular, of the vectors (5). The lemma is proved.
8. Lemma. Let $f$ be as in Remark 2 and let $f$ be continuous. Take $\lambda\in\mathbb{R}$ and denote $\eta_t=\pi_t(f)+\lambda\xi_t$. Then

(i) for every $0\le s<t<\infty$, the random variable $\eta_t-\eta_s$ is $\mathcal{F}^{\xi}_{s,t}$-measurable;

(ii) the process $\eta_t$ is an infinitely divisible cadlag process and
$$Ee^{i\eta_t}=\exp t\Big\{\int_{\mathbb{R}}(e^{i(f(x)+\lambda x)}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda b-\lambda^2\sigma^2/2\Big\}.\qquad(6)$$

Proof. Assertion (i) is a trivial consequence of (3). In addition, Remark 5 shows that $\eta_t$ has independent increments.

(ii) The homogeneity of $\eta_t$ follows immediately from (3) and the similar property of $\xi_t$. Furthermore, Remark 2 shows that $\eta_t$ is cadlag. From the homogeneity and right continuity of $\eta_t$ we get
$$\lim_{s\uparrow t}Ee^{i(\eta_t-\eta_s)}=\lim_{s\uparrow t}Ee^{i\eta_{t-s}}=Ee^{i\eta_0}=1,\qquad t>0.$$
Similar equations hold for $s\downarrow t$ with $t\ge 0$. Therefore, $\eta_s\xrightarrow{P}\eta_t$ as $s\to t$, and $\eta_t$ is stochastically continuous.

To prove (6), take Khinchin's measure $\mu$ and take $\mu_t$ and $b_t$ from Corollary 2.7. Also observe that
$$\lim_{n\to\infty}a_n^n=\lim_{n\to\infty}e^{n\log a_n}=\lim_{n\to\infty}e^{n(a_n-1)}$$
provided $a_n\to 1$ and one of the limits exists. Then we have
$$Ee^{i(\pi_t+\lambda\xi_t)}=\lim_{n\to\infty}\big(Ee^{i(f(\xi_{t/n})+\lambda\xi_{t/n})}\big)^n=\lim_{n\to\infty}\exp\Big\{n\int_{\mathbb{R}}(e^{i(f(x)+\lambda x)}-1)\,F_{t/n}(dx)\Big\},$$
with
$$\lim_{n\to\infty}n\int_{\mathbb{R}}(e^{i(f(x)+\lambda x)}-1)\,F_{t/n}(dx)$$
$$=\lim_{n\to\infty}\Big[t\int_{\mathbb{R}}(e^{i(f(x)+\lambda x)}-1-i\lambda\sin x)\,\frac{1+x^2}{x^2}\,\mu_{t/n}(dx)+i\lambda tb_{t/n}\Big]$$
$$=t\int_{\mathbb{R}}(e^{i(f(x)+\lambda x)}-1-i\lambda\sin x)\,\frac{1+x^2}{x^2}\,\mu(dx)+i\lambda tb.$$
Now to get (6) one only has to refer to Exercise 2.4. The lemma is proved.
9. Theorem. (i) For $ab>0$ the process $p_t(a,b]$ is a Poisson process with parameter $\Lambda((a,b])$, and, in particular,
$$Ep_t(a,b]=t\Lambda((a,b]);\qquad(7)$$
(ii) if $a_m<b_m$, $a_mb_m>0$, $m=1,...,n$, and the intervals $(a_m,b_m]$ are pairwise disjoint, then the processes $p_t(a_1,b_1]$, ..., $p_t(a_n,b_n]$ are independent.

Proof. To prove (i), take a sequence of bounded continuous functions $f_k(x)$ such that $f_k(x)\to I_{(a,b]}(x)$ as $k\to\infty$ and $f_k(x)=0$ for $|x|<\varepsilon:=(|a|\wedge|b|)/2$. Then, for each $\omega$,
$$\int_{\mathbb{R}}f_k(x)\,p_t(dx)\to p_t(a,b].\qquad(8)$$
Moreover, $|\exp\{i\lambda f_k(x)\}-1|\le 2I_{|x|\ge\varepsilon}$ and
$$\int_{\mathbb{R}}I_{|x|\ge\varepsilon}\,\Lambda(dx)\le\frac{1+\varepsilon^2}{\varepsilon^2}\int_{\mathbb{R}}\frac{x^2}{1+x^2}\,\Lambda(dx)<\infty.\qquad(9)$$
Hence, by Lemma 8 and by the dominated convergence theorem,
$$Ee^{i\lambda p_t(a,b]}=\exp t\int_{\mathbb{R}}(e^{i\lambda I_{(a,b]}(x)}-1)\,\Lambda(dx)=\exp\{t\Lambda((a,b])(e^{i\lambda}-1)\}.$$
The homogeneity of $p_t(a,b]$ and independence of its increments follow from (8) and Lemma 8. Remark 2 shows that $p_t(a,b]$ is a cadlag process. As in Lemma 8, this leads to the conclusion that $p_t(a,b]$ is stochastically continuous. This proves (i).

(ii) Formula (8) and Lemma 8 imply that $p_t(a,b]-p_s(a,b]$ is $\mathcal{F}^{\xi}_{s,t}$-measurable if $s<t$. By Lemma 7, to prove that the processes $p_t(a_1,b_1]$, ..., $p_t(a_n,b_n]$ are independent, it suffices to prove that, for any $s<t$, the random variables
$$p_t(a_1,b_1]-p_s(a_1,b_1],\ ...,\ p_t(a_n,b_n]-p_s(a_n,b_n]\qquad(10)$$
are independent.

Take $\lambda_1,...,\lambda_n\in\mathbb{R}$ and define $f(x)=\lambda_m$ for $x\in(a_m,b_m]$ and $f=0$ outside the union of the $(a_m,b_m]$. Also take a sequence of bounded continuous functions $f_k$ vanishing in a neighborhood of zero such that $f_k(x)\to f(x)$ for all $x\in\mathbb{R}$. Then
$$\pi_t(f_k)-\pi_s(f_k)\to\pi_t(f)-\pi_s(f)=\sum_{m=1}^{n}\lambda_m\{p_t(a_m,b_m]-p_s(a_m,b_m]\}.$$
Hence and from Lemma 8 we get
$$E\exp\Big(i\sum_{m=1}^{n}\lambda_m\{p_t(a_m,b_m]-p_s(a_m,b_m]\}\Big)=\lim_{k\to\infty}Ee^{i(\pi_t(f_k)-\pi_s(f_k))}=\lim_{k\to\infty}Ee^{i\pi_{t-s}(f_k)}$$
$$=\lim_{k\to\infty}\exp\Big\{(t-s)\int_{\mathbb{R}}(e^{if_k(x)}-1)\,\Lambda(dx)\Big\}=\exp\Big\{(t-s)\int_{\mathbb{R}}(e^{if(x)}-1)\,\Lambda(dx)\Big\}$$
$$=\prod_{m=1}^{n}\exp\big\{(t-s)\Lambda((a_m,b_m])(e^{i\lambda_m}-1)\big\}.$$
This and assertion (i) prove that the random variables in (10) are independent. The theorem is proved.
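Assertion (i) can be checked by simulation. In the sketch below (illustrative; the test process and all names are my own choice) the test process is compound Poisson with jump intensity `rate` and standard normal jump sizes, so that $\Lambda(dx)=\mathrm{rate}\cdot\varphi(x)\,dx$ with $\varphi$ the standard normal density; the count of jumps with sizes in $(a,b]$ up to time $t$ should then have mean and variance $t\Lambda((a,b])$.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

rate, t = 4.0, 2.0
a, b = 0.5, 1.5

def count_jumps_in(a, b):
    n = rng.poisson(rate * t)        # total number of jumps on (0, t]
    sizes = rng.normal(0.0, 1.0, n)
    return np.count_nonzero((a < sizes) & (sizes <= b))

samples = np.array([count_jumps_in(a, b) for _ in range(50000)])
Lambda_ab = rate * 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))
print("mean:", samples.mean(), "vs", t * Lambda_ab)
print("var :", samples.var(), "vs", t * Lambda_ab)   # Poisson: mean = variance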
10. Corollary. Let $f$ be a Borel nonnegative function. Then, for each $t\ge 0$,
$$\int_{\mathbb{R}\setminus\{0\}}f(x)\,p_t(dx)$$
is a random variable and
$$E\int_{\mathbb{R}\setminus\{0\}}f(x)\,p_t(dx)=t\int_{\mathbb{R}}f(x)\,\Lambda(dx).\qquad(11)$$

Notice that on the right in (11) we write the integral over $\mathbb{R}$ instead of $\mathbb{R}\setminus\{0\}$ because $\Lambda(\{0\})=0$ by definition. To prove the assertion, take $\varepsilon>0$ and let $\Sigma$ be the collection of all Borel $\Gamma$ such that $p_t(\Gamma\setminus(-\varepsilon,\varepsilon))$ is a random variable and
$$\nu(\Gamma):=Ep_t(\Gamma\setminus(-\varepsilon,\varepsilon))=t\nu_{\Lambda}(\Gamma):=t\Lambda(\Gamma\setminus(-\varepsilon,\varepsilon)).$$
It follows from (7) and from the finiteness of $\Lambda(\mathbb{R}\setminus(-\varepsilon,\varepsilon))$ that $\mathbb{R}\in\Sigma$. By adding an obvious argument we conclude that $\Sigma$ is a $\lambda$-system. Furthermore, from Theorem 9 (i) we know that $\Sigma$ contains $\Pi:=\{(a,b]:ab>0\}$, which is a $\pi$-system. Therefore, $\Sigma\supset\sigma(\Pi)=\mathcal{B}(\mathbb{R})$. Now a standard measure-theoretic argument shows that, for every Borel nonnegative $f$, we have
$$E\int_{\mathbb{R}\setminus(-\varepsilon,\varepsilon)}f(x)\,p_t(dx)=\int_{\mathbb{R}}f(x)\,\nu(dx)=t\int_{\mathbb{R}}f(x)\,\nu_{\Lambda}(dx)=t\int_{\mathbb{R}\setminus(-\varepsilon,\varepsilon)}f(x)\,\Lambda(dx).$$
It only remains to let $\varepsilon\downarrow 0$ and use the monotone convergence theorem.
11. Corollary. Every continuous infinitely divisible process has the form $bt+\sigma w_t$, where $\sigma$ and $b$ are the constants from Lévy's formula and $w_t$ is a Wiener process if $\sigma\ne 0$ and $w_t\equiv 0$ if $\sigma=0$.

Indeed, for a continuous $\xi_t$ we have $p_t(a,b]=0$ if $ab>0$. Hence $\Lambda((a,b])=0$ and $\phi(t,\lambda)=\exp\{i\lambda bt-\sigma^2\lambda^2t/2\}$. For $\sigma\ne 0$, it follows that $\eta_t:=(\xi_t-bt)/\sigma$ is a continuous process with independent increments, $\eta_0=0$, and $\eta_t-\eta_s\sim N(0,|t-s|)$. As we know, $\eta_t$ is a Wiener process. If $\sigma=0$, then $\xi_t-bt=0$ (a.s.) for any $t$ and, actually, $\xi_t-bt=0$ for all $t$ at once (a.s.) since $\xi_t-bt$ is continuous.
12. Corollary. Let an open set $G\subset\mathbb{R}\setminus\{0\}$ be such that $\Lambda(G)=0$. Then there exists $\Omega'\in\mathcal{F}$ such that $P(\Omega')=1$ and, for each $t\ge 0$ and $\omega\in\Omega'$, $\Delta\xi_t(\omega)\notin G$.

Indeed, represent $G$ as a countable union (perhaps with intersections) of intervals $(a_m,b_m]$. Since $\Lambda((a_m,b_m])=0$, we have $Ep_t(a_m,b_m]=0$ and $p_t(a_m,b_m]=0$ (a.s.). Adding to this that $p_t(a_m,b_m]$ increases in $t$, we conclude that $p_t(a_m,b_m]=0$ for all $t$ (a.s.). Now let
$$\Omega'=\bigcap_m\{\omega:p_t(a_m,b_m]=0\ \ \forall t\ge 0\}.$$
Then $P(\Omega')=1$ and
$$p((0,t]\times G)\le\sum_m p_t(a_m,b_m]=0$$
for each $\omega\in\Omega'$ and $t\ge 0$, as asserted.
The following corollary will be used for deriving an integral representation of $\xi_t$ through jump measures.

13. Corollary. Denote $q_t(a,b]=p_t(a,b]-t\Lambda((a,b])$. Let some numbers satisfying $a_i\le b_i$ and $a_ib_i>0$ be given for $i=1,2$. Then, for all $t,s\ge 0$,
$$Eq_t(a_1,b_1]\,q_s(a_2,b_2]=(s\wedge t)\,\Lambda((a_1,b_1]\cap(a_2,b_2]).\qquad(12)$$

Indeed, without loss of generality assume $t\ge s$. Notice that both parts of (12) are additive in the sense that if, say, $(a_1,b_1]=(a_3,b_3]\cup(a_4,b_4]$ and $(a_3,b_3]\cap(a_4,b_4]=\emptyset$, then
$$q_t(a_1,b_1]=q_t(a_3,b_3]+q_t(a_4,b_4],$$
$$\Lambda((a_1,b_1]\cap(a_2,b_2])=\Lambda((a_3,b_3]\cap(a_2,b_2])+\Lambda((a_4,b_4]\cap(a_2,b_2]).$$
It follows easily that to prove (12) it suffices to prove it only for two cases: (i) $(a_1,b_1]\cap(a_2,b_2]=\emptyset$ and (ii) $a_1=a_2$, $b_1=b_2$.

In the first case (12) follows from the independence of the processes $p_\cdot(a_1,b_1]$ and $p_\cdot(a_2,b_2]$ and from (7). In the second case, it suffices to remember that the variance of a random variable having the Poisson distribution with parameter $\theta$ is $\theta$ and use the fact that
$$q_t(a,b]=q_s(a,b]+\big(q_t(a,b]-q_s(a,b]\big),$$
where the summands are independent.
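Formula (12) can also be checked by Monte Carlo on a compound Poisson test process (an illustrative sketch with made-up parameters; here $\Lambda$ is `rate` times the Exp(1) law of the jump sizes).

import numpy as np

rng = np.random.default_rng(3)

rate, t, s = 3.0, 1.0, 0.6
I1, I2 = (0.0, 1.0), (0.5, 2.0)      # overlapping intervals (a_i, b_i]

def Lam(a, b):                        # Lambda((a,b]) for Lambda = rate * Exp(1) law
    return rate * (np.exp(-a) - np.exp(-b))

def q_product():
    # one path on (0, T]: given the count, jump times are uniform, sizes ~ Exp(1)
    T = max(t, s)
    n = rng.poisson(rate * T)
    times = rng.uniform(0.0, T, n)
    sizes = rng.exponential(1.0, n)
    def q(u, a, b):                   # q_u(a,b] = p_u(a,b] - u * Lambda((a,b])
        p = np.count_nonzero((times <= u) & (a < sizes) & (sizes <= b))
        return p - u * Lam(a, b)
    return q(t, *I1) * q(s, *I2)

est = np.mean([q_product() for _ in range(100000)])
inter = (max(I1[0], I2[0]), min(I1[1], I2[1]))   # (a1,b1] cap (a2,b2]
print(est, "vs", min(s, t) * Lam(*inter))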
We will also use the following theorem, which is closely related to Theorem 9.

14. Theorem. Take $a>0$ and define
$$\zeta_t=\int_{[a,\infty)}x\,p_t(dx)+\int_{(-\infty,-a]}x\,p_t(dx).\qquad(13)$$
Then:

(i) the process $\zeta_t$ is infinitely divisible, cadlag, with $\sigma=0$ and Lévy measure $\Lambda(\Gamma\setminus(-a,a))$;

(ii) the process $\xi_t-\zeta_t$ is infinitely divisible, cadlag, and does not have jumps larger in magnitude than $a$;

(iii) the processes $\zeta_t$ and $\xi_t-\zeta_t$ are independent.

Proof. Assertion (i) is proved like the similar assertion in Theorem 9 on the basis of Lemma 8. Indeed, take a sequence of continuous functions $f_k(x)\to x(1-I_{(-a,a)}(x))$ such that $f_k(x)=0$ for $|x|\le a/2$. Then, for any $\omega$,
$$\int_{\mathbb{R}}f_k(x)\,p_t(dx)\to\zeta_t.\qquad(14)$$
This and Lemma 8 imply that $\zeta_t$ is a homogeneous process with independent increments. That it is cadlag follows from Remark 2. The stochastic continuity of $\zeta_t$ follows from its right continuity and homogeneity as in Lemma 8. To find the Lévy measure of $\zeta_t$, observe that $|\exp\{i\lambda f_k(x)\}-1|\le 2I_{|x|\ge a/2}$. By using (9), Lemma 8, and the dominated convergence theorem, we conclude that
$$Ee^{i\lambda\zeta_t}=\exp t\int_{\mathbb{R}}(e^{i\lambda x(1-I_{(-a,a)}(x))}-1)\,\Lambda(dx)=\exp t\int_{\mathbb{R}\setminus(-a,a)}(e^{i\lambda x}-1)\,\Lambda(dx).$$

In assertion (ii) the fact that $\xi_t-\zeta_t$ is an infinitely divisible cadlag process is proved as above on the basis of Lemma 8. The assertion about its jumps is obvious because of Remark 2. Another explanation of the same fact can be obtained from Lemma 8, which implies that
$$Ee^{i(\mu\zeta_t+\lambda\xi_t)}=\exp t\Big\{\int_{\mathbb{R}}(e^{i\mu x(1-I_{(-a,a)}(x))+i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda b-\lambda^2\sigma^2/2\Big\}$$
$$=\exp t\Big\{\int_{(-a,a)}(e^{i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+\int_{\mathbb{R}\setminus(-a,a)}(e^{i(\lambda+\mu)x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda b-\lambda^2\sigma^2/2\Big\},\qquad(15)$$
where, for $\mu=-\lambda$, the expression in the last braces is
$$\int_{(-a,a)}(e^{i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda\Big(b-\int_{\mathbb{R}\setminus(-a,a)}\sin x\,\Lambda(dx)\Big)-\lambda^2\sigma^2/2,$$
which shows that the Lévy measure of $\xi_t-\zeta_t$ is concentrated on $(-a,a)$.

To prove (iii), first take $\mu=\nu-\lambda$ in (15). Then we see that
$$Ee^{i\nu\zeta_t+i\lambda(\xi_t-\zeta_t)}=e^{tg},$$
where
$$g=\int_{\mathbb{R}\setminus(-a,a)}(e^{i\nu x}-1)\,\Lambda(dx)+\int_{(-a,a)}(e^{i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda\Big(b-\int_{\mathbb{R}\setminus(-a,a)}\sin x\,\Lambda(dx)\Big)-\lambda^2\sigma^2/2,$$
so that $Ee^{i\nu\zeta_t+i\lambda(\xi_t-\zeta_t)}=Ee^{i\nu\zeta_t}\,Ee^{i\lambda(\xi_t-\zeta_t)}$. Hence, for any $t$, $\zeta_t$ and $\xi_t-\zeta_t$ are independent.

Furthermore, for any constants $\nu,\lambda\in\mathbb{R}$, the process $\nu\zeta_t+\lambda(\xi_t-\zeta_t)=(\nu-\lambda)\zeta_t+\lambda\xi_t$ is a homogeneous process, which is proved as above by using Lemma 8. It follows that the two-dimensional process $(\zeta_t,\xi_t-\zeta_t)$ has homogeneous increments. In particular, if $s<t$, the distributions of $(\zeta_{t-s},\xi_{t-s}-\zeta_{t-s})$ and $(\zeta_t-\zeta_s,\xi_t-\zeta_t-(\xi_s-\zeta_s))$ coincide, and since the first pair is independent, so is the second. Now the independence of the processes $\zeta_t$ and $\xi_t-\zeta_t$ follows from Lemma 7 and from the fact that $\zeta_t-\zeta_s$, $\xi_t-\xi_s$, and $(\zeta_t-\zeta_s,\xi_t-\zeta_t-(\xi_s-\zeta_s))$ are $\mathcal{F}^{\xi}_{s,t}$-measurable (see (14) and Lemma 8). The theorem is proved.
The following exercise describes all nonnegative infinitely divisible cadlag processes.

15. Exercise. Let $\xi_t$ be an infinitely divisible cadlag process satisfying $\xi_t\ge 0$ for all $t\ge 0$ and $\omega$. Take $\zeta_t=\zeta_t(a)$ from Theorem 14.

(i) By using Exercise 2.9, show that all jumps of $\xi_t$ are nonnegative.

(ii) Prove that for every $t\ge 0$, we have $P(\zeta_t(a)=0)=\exp(-t\Lambda([a,\infty)))$.

(iii) From Theorem 14 and (ii), derive that $\xi_t-\zeta_t(a)\ge 0$ (a.s.) for each $t\ge 0$.

(iv) Since obviously $\zeta_t(a)$ increases as $a$ decreases, conclude that $\zeta_t(0+)\le\xi_t<\infty$ (a.s.) for each $t\ge 0$. From (15) with $\lambda=0$ find the characteristic function of $\zeta_t(0+)$ and prove that $\xi_t-\zeta_t(0+)$ has normal distribution. By using that $\xi_t-\zeta_t(0+)\ge 0$ (a.s.), prove that $\xi_t=\zeta_t(0+)$ (a.s.).

(v) Prove that
$$\int_0^1x\,\Lambda(dx)<\infty,\qquad \xi_t=\int_{(0,\infty)}x\,p(t,dx)\ \text{(a.s.)},$$
and, in particular, $\xi_t$ is a pure jump process with nonnegative jumps.
4. Further comments on jump measures

1. Exercise. Let $f(t,x)$ be a Borel nonnegative function such that $f(t,0)=0$. Prove that $\int_{\mathbb{R}_+\times\mathbb{R}}f(s,x)\,p(dsdx)$ is a random variable and
$$E\int_{\mathbb{R}_+\times\mathbb{R}}f(s,x)\,p(dsdx)=\int_{\mathbb{R}_+\times\mathbb{R}}f(s,x)\,ds\Lambda(dx).\qquad(1)$$

2. Exercise. Let $f(t,x)=f(\omega,t,x)$ be a bounded function such that $f=0$ for $|x|<\varepsilon$ and for $t\ge T$, where the constants $\varepsilon,T\in(0,\infty)$. Also assume that $f(\omega,t,x)$ is left continuous in $t$ for any $(\omega,x)$ and $\mathcal{F}^{\xi}_t\otimes\mathcal{B}(\mathbb{R})$-measurable for any $t$. Prove that the following version of (1) holds:
$$E\int_{\mathbb{R}_+\times\mathbb{R}}f(s,x)\,p(dsdx)=\int_{\mathbb{R}_+\times\mathbb{R}}Ef(s,x)\,ds\Lambda(dx).$$

The following two exercises are aimed at generalizing Theorem 3.9.

3. Exercise. Let $f(t,x)$ be a bounded Borel function such that $f=0$ for $|x|<\varepsilon$, where the constant $\varepsilon>0$. Prove that, for $t\in[0,\infty)$,
$$\phi(t):=E\exp\Big(i\int_{(0,t]\times\mathbb{R}}f(s,x)\,p(dsdx)\Big)=\exp\int_{(0,t]\times\mathbb{R}}(e^{if(s,x)}-1)\,ds\Lambda(dx).$$

4. Exercise. By taking $f$ in Exercise 3 as linear combinations of the indicators of Borel subsets $\Gamma_1,...,\Gamma_n$ of $\mathbb{R}_+\times\mathbb{R}$, prove that, if the sets are disjoint, then $p(\Gamma_1),...,p(\Gamma_n)$ are independent. Also prove that, if $\Gamma_1\subset R_{T,\varepsilon}$, then $p(\Gamma_1)$ is Poisson with parameter $(\ell\times\Lambda)(\Gamma_1)$.

The following exercise shows that Poisson processes without common jumps are independent.

5. Exercise. Let $(\Omega,\mathcal{F},P)$ be a probability space, and let $\mathcal{F}_t$ be $\sigma$-fields defined for $t\ge 0$ such that $\mathcal{F}_s\subset\mathcal{F}_t\subset\mathcal{F}$ for $s\le t$. Assume that $\pi_t$ and $\eta_t$ are two Poisson processes with parameters $\lambda$ and $\mu$ respectively defined on $\Omega$, and such that $\pi_t$ and $\eta_t$ are $\mathcal{F}_t$-measurable for each $t$ and $\pi_{t+h}-\pi_t$ and $\eta_{t+h}-\eta_t$ are independent of $\mathcal{F}_t$ for all $t,h\ge 0$. Finally, assume that $\pi_t$ and $\eta_t$ do not have common jumps, that is, $(\Delta\pi_t)\Delta\eta_t=0$ for all $t$ and $\omega$. Prove that the processes $\pi_t$ and $\eta_t$ are independent.
5. Representing infinitely divisible processes through jump measures

We start with a simple result.

1. Theorem. Let $\xi_t$ be an infinitely divisible cadlag process with parameters $\sigma$, $b$, and Lévy measure $\Lambda$ concentrated at points $x_1,...,x_n$.

(i) If $\sigma\ne 0$, then there exist a Wiener process $w_t$ and Poisson processes $p^1_t$, ..., $p^n_t$ with parameters $\Lambda(\{x_1\}),...,\Lambda(\{x_n\})$, respectively, such that $w_t,p^1_t,...,p^n_t$ are mutually independent and
$$\xi_t=x_1p^1_t+...+x_np^n_t+bt+\sigma w_t\quad\forall t\ge 0\ \text{(a.s.)}.\qquad(1)$$

(ii) If $\sigma=0$, assertion (i) still holds if one does not mention $w_t$ and drops the term $\sigma w_t$ in (1).

Proof. (i) Of course, we assume that $x_i\ne x_j$ for $i\ne j$. Notice that $\Lambda(\{0\})=0$. Therefore, $x_m\ne 0$. Also
$$\Lambda(\mathbb{R}\setminus\{x_1,...,x_n\})=0.$$
Hence, by Corollary 3.12, we may assume that all jumps of $\xi_t$ belong to the set $\{x_1,...,x_n\}$.

Now take $a>0$ such that $a<|x_i|$ for all $i$, and define $\zeta_t$ by (3.13). By Theorem 3.14 the process $\xi_t-\zeta_t$ does not have jumps and is infinitely divisible. By Corollary 3.11 we conclude that
$$\xi_t-\zeta_t=bt+\sigma w_t.$$
In addition, formula (3.1) shows also that
$$\zeta_t=x_1p_t(\{x_1\})+...+x_np_t(\{x_n\})=x_1p_t(a_1,b_1]+...+x_np_t(a_n,b_n],$$
where $a_m,b_m$ are any numbers satisfying $a_mb_m>0$, $a_m<x_m\le b_m$, and such that the $(a_m,b_m]$ are mutually disjoint. This proves (1) with $p^m_t=p_t(a_m,b_m]$, which are Poisson processes with parameters $\Lambda(\{x_m\})$.

To prove that $w_t,p^1_t,...,p^n_t$ are mutually independent, introduce $p^{\zeta}$ as the jump measure of $\zeta_t$ and observe that by Theorem 3.14 the processes $\xi_t-\zeta_t=bt+\sigma w_t$ and $\zeta_t$ (that is, $w_t$ and $\zeta_t$) are independent. It follows from Lemma 3.3 that, if we take any continuous functions $f_1,...,f_n$ vanishing in the neighborhood of the origin, then the process $w_t$ and the vector-valued process
$$\Big(\int_{\mathbb{R}}f_1(x)\,p^{\zeta}_t(dx),\ ...,\ \int_{\mathbb{R}}f_n(x)\,p^{\zeta}_t(dx)\Big)$$
are independent. By taking appropriate approximations we conclude that the process $w_t$ and the vector-valued process
$$\big(p^{\zeta}_t(a_1,b_1],\ ...,\ p^{\zeta}_t(a_n,b_n]\big)$$
are independent. Finally, by observing that, by Theorem 3.9, the processes $p^{\zeta}_t(a_1,b_1]$, ..., $p^{\zeta}_t(a_n,b_n]$ are independent and, obviously (cf. (3.2)), $p^{\zeta}=p$, we get that $w_t,p^1_t,...,p^n_t$ are mutually independent. The theorem is proved.
The above proof is based on the formula
$$\xi_t=\zeta^a_t+\eta^a_t,\qquad(2)$$
where
$$\zeta^a_t=\int_{\mathbb{R}\setminus(-a,a)}x\,p_t(dx),\qquad \eta^a_t=\xi_t-\zeta^a_t,\quad a>0,$$
and the fact that for small $a$ all processes $\zeta^a_t$ are the same. In the general case we want to let $a\downarrow 0$ in (2). The only trouble is that generally there is no limit of $\zeta^a_t$ as $a\downarrow 0$. On the other hand, the left-hand side of (2) does have a limit, just because it is independent of $a$. So there is a hope that if we subtract an appropriate quantity from $\zeta^a_t$ and add it to $\eta^a_t$, the results will converge. This appropriate quantity turns out to be the stochastic integral against the centered Poisson measure $q$ introduced by
$$q_t(a,b]=p_t(a,b]-t\Lambda((a,b])\quad\text{if}\ ab>0.$$
2. Lemma. Let $\Pi=\{(0,t]\times(a,b]:t>0,\ a<b,\ ab>0\}$ and for $A=(0,t]\times(a,b]\in\Pi$ let $q(A)=q_t(a,b]$. Then $\Pi$ is a $\pi$-system and $q$ is a random orthogonal measure on $\Pi$ with reference measure $\ell\times\Lambda$.

Proof. Let $A=(0,t_1]\times(a_1,b_1]$, $B=(0,t_2]\times(a_2,b_2]\in\Pi$. Then
$$A\cap B=(0,t_1\wedge t_2]\times(c,d],\qquad (c,d]:=(a_1,b_1]\cap(a_2,b_2],$$
which shows that $\Pi$ is a $\pi$-system. That $q$ is a random orthogonal measure on $\Pi$ with reference measure $\ell\times\Lambda$ is stated in Corollary 3.13. The lemma is proved.
3. Remark. We may consider $\Pi$ as a system of subsets of $\mathbb{R}_+\times(\mathbb{R}\setminus\{0\})$. Then, as is easy to see, $\sigma(\Pi)=\mathcal{B}(\mathbb{R}_+)\otimes\mathcal{B}(\mathbb{R}\setminus\{0\})$. By Theorem 2.3.19, $L_2(\Pi,\ell\times\Lambda)=L_2(\sigma(\Pi),\ell\times\Lambda)$. Therefore, Lemma 2 and Theorem 2.3.13 allow us to define the stochastic integral
$$\int_{\mathbb{R}_+\times(\mathbb{R}\setminus\{0\})}f(t,x)\,q(dtdx)$$
for every Borel $f$ satisfying
$$\int_{\mathbb{R}_+\times\mathbb{R}}|f(t,x)|^2\,dt\Lambda(dx)<\infty$$
(we write this integral over $\mathbb{R}_+\times\mathbb{R}$ instead of $\mathbb{R}_+\times(\mathbb{R}\setminus\{0\})$ because $\Lambda(\{0\})=0$ by definition). Furthermore,
$$E\Big|\int_{\mathbb{R}_+\times(\mathbb{R}\setminus\{0\})}f(t,x)\,q(dtdx)\Big|^2=\int_{\mathbb{R}_+\times\mathbb{R}}|f(t,x)|^2\,dt\Lambda(dx),\qquad E\int_{\mathbb{R}_+\times(\mathbb{R}\setminus\{0\})}f(t,x)\,q(dtdx)=0,\qquad(3)$$
the latter following from the fact that $Eq(A)=0$ if $A\in\Pi$ (see Remark 2.3.15).
4. Remark. Denote
$$\int_{\mathbb{R}\setminus\{0\}}f(x)\,q_t(dx)=\int_{\mathbb{R}_+\times(\mathbb{R}\setminus\{0\})}I_{(0,t]}(u)f(x)\,q(dudx).\qquad(4)$$
Then (3) shows that, for each Borel $f$ satisfying $\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx)<\infty$ and every $t,s\in[0,\infty)$,
$$E\Big|\int_{\mathbb{R}\setminus\{0\}}f(x)\,q_t(dx)-\int_{\mathbb{R}\setminus\{0\}}f(x)\,q_s(dx)\Big|^2=|t-s|\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx),$$
$$E\Big|\int_{\mathbb{R}\setminus\{0\}}f(x)\,q_t(dx)\Big|^2=t\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx).$$
In the following exercise we use for the first time our assumption that $(\Omega,\mathcal{F},P)$ is a complete probability space. This assumption allowed us to complete $\sigma(\xi_s:s\le t)$ and have this completion, denoted $\mathcal{F}^{\xi}_t$, be part of $\mathcal{F}$. This assumption implies that, if we are given two random variables $\zeta$ and $\eta$ satisfying $\zeta=\eta$ (a.s.) and $\eta$ is $\mathcal{F}^{\xi}_t$-measurable, then so is $\zeta$.
5. Exercise*. Prove that if $f$ is a bounded Borel function vanishing in a neighborhood of zero, then $\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx)<\infty$ and
$$\int_{\mathbb{R}\setminus\{0\}}f(x)\,q_t(dx)=\int_{\mathbb{R}}f(x)\,p_t(dx)-t\int_{\mathbb{R}}f(x)\,\Lambda(dx)\quad\text{(a.s.)}.\qquad(5)$$
By using Lemma 3.8, conclude that the left-hand side of (5) is $\mathcal{F}^{\xi}_t$-measurable for every $f\in L_2(\mathcal{B}(\mathbb{R}),\Lambda)$.
6. Exercise*. As a continuation of Exercise 5, prove that (5) holds for every Borel $f$ satisfying $f(0)=0$ and
$$\int_{\mathbb{R}}(|f|+|f|^2)\,\Lambda(dx)<\infty.$$
7. Lemma. For every Borel $f\in L_2(\mathcal{B}(\mathbb{R}),\Lambda)$ the stochastic integral
$$\eta_t:=\int_{\mathbb{R}\setminus\{0\}}f(x)\,q_t(dx)$$
is an infinitely divisible $\mathcal{F}^{\xi}_t$-adapted process such that, if $0\le s\le t<\infty$, then $\eta_t-\eta_s$ and $\mathcal{F}^{\xi}_s$ are independent. By Theorem 1.11 the process $\eta_t$ admits a modification with trajectories in $D[0,\infty)$. If we keep the same notation for the modification, then for every $T\in[0,\infty)$
$$E\sup_{t\le T}\eta^2_t\le 4T\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx).\qquad(6)$$

Proof. If $f$ is a bounded continuous function vanishing in a neighborhood of zero, the first statement follows from Exercise 5 and Lemma 3.8. An obvious approximation argument and Remark 4 allow us to extend the result to arbitrary $f$ in question.

To prove (6) take $0\le t_1\le...\le t_n\le T$ and observe that, owing to the independence of $\eta_{t_{k+1}}-\eta_{t_k}$ and $\mathcal{F}^{\xi}_{t_k}$, we have
$$E(\eta_{t_{k+1}}-\eta_{t_k}|\mathcal{F}^{\xi}_{t_k})=E(\eta_{t_{k+1}}-\eta_{t_k})=0.$$
Therefore, $(\eta_{t_k},\mathcal{F}^{\xi}_{t_k})$ is a martingale. By Doob's inequality
$$E\sup_k\eta^2_{t_k}\le 4E\eta^2_T=4T\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx).$$
Clearly the inequality between the extreme terms has nothing to do with ordering the $t_k$. Therefore by ordering the set of all rationals on $[0,T]$ and taking the first $n$ rationals as $t_k$, $k=1,...,n$, and then sending $n$ to infinity, by Fatou's theorem we find that
$$E\sup_{r\in Q,\ r<T}\eta^2_r\le 4T\int_{\mathbb{R}}|f(x)|^2\,\Lambda(dx).$$
Now equation (6) immediately follows from the right continuity and the stochastic continuity (at point $T$) of $\eta_\cdot$, since (a.s.)
$$\sup_{t\le T}\eta^2_t=\sup_{t<T}\eta^2_t=\sup_{r\in Q,\ r<T}\eta^2_r.$$
The lemma is proved.
8. Theorem. Let $\xi_t$ be an infinitely divisible cadlag process with parameters $\sigma$, $b$, and Lévy measure $\Lambda$.

(i) If $\sigma\ne 0$, then there exist a constant $\bar b$ and a Wiener process $w_t$, which is independent of all processes $p_t(c,d]$, such that, for each $t\ge 0$,
$$\xi_t=\bar bt+\sigma w_t+\int_{(-1,1)}x\,q_t(dx)+\int_{\mathbb{R}\setminus(-1,1)}x\,p_t(dx)\quad\text{(a.s.)}.\qquad(7)$$

(ii) If $\sigma=0$, assertion (i) still holds if one does not mention $w_t$ and drops the term $\sigma w_t$ in (7).

Proof. For $a\in(0,1)$ write (2) as
$$\xi_t=\eta^a_t+\int_{(-1,1)\setminus(-a,a)}x\,p_t(dx)+\int_{\mathbb{R}\setminus(-1,1)}x\,p_t(dx).$$
Here, by Exercise 5,
$$\int_{(-1,1)\setminus(-a,a)}x\,p_t(dx)=\int_{(-1,1)\setminus(-a,a)}x\,q_t(dx)+t\int_{(-1,1)\setminus(-a,a)}x\,\Lambda(dx),$$
so that
$$\xi_t=\kappa^a_t+\int_{(-1,1)\setminus(-a,a)}x\,q_t(dx)+\int_{\mathbb{R}\setminus(-1,1)}x\,p_t(dx),\qquad(8)$$
where
$$\kappa^a_t=\eta^a_t+t\int_{(-1,1)\setminus(-a,a)}x\,\Lambda(dx).$$
By Lemma 7, for any $T\in(0,\infty)$,
$$E\sup_{t\le T}\Big|\int_{(-1,1)\setminus(-a,a)}x\,q_t(dx)-\int_{(-1,1)}x\,q_t(dx)\Big|^2\to 0$$
as $a\downarrow 0$. Therefore, there exists a sequence $a_n\downarrow 0$, along which with probability one the first integral on the right in (8) converges uniformly on each finite time interval to the first integral on the right in (7). It follows from (8) that almost surely $\kappa^{a_n}_t$ also converges uniformly on each finite time interval to a process, say $\kappa_t$. Bearing in mind that the $\kappa^a_t$ are cadlag and using Exercise 1.8, we see that $\kappa_t$ is cadlag too. By Theorem 3.14, the process $\eta^a_t$ is infinitely divisible cadlag. It follows that $\kappa^a_t$ and $\kappa_t$ are infinitely divisible cadlag as well.

Furthermore, since $\kappa^a_t$ does not have jumps larger in magnitude than $a$, the process $\kappa_t$ does not have jumps at all and hence is continuous (the last conclusion is easily proved by contradiction). Again by Theorem 3.14, the process $\eta^a_t$ is independent of $\zeta^a_t$ and, in particular, is independent of the jump measure of $\zeta^a_t$ (cf. Lemma 3.3). The latter being $p_t((c,d]\setminus(-a,a))$ (cf. (3.2)) shows that $\eta^a_t$ as well as $\kappa^a_t$ are independent of all processes $p_t((c,d]\setminus(-a,a))$. By letting $a\downarrow 0$, we conclude that $\kappa_t$ is independent of all processes $p_t(c,d]$. To conclude the proof it only remains to use Corollary 3.11. The theorem is proved.
9. Exercise. It may look as though assertion (i) of Theorem 8 holds even if $\sigma=0$. Indeed, in this case $\sigma w_t\equiv 0$ anyway. However, generally this assertion is false if $\sigma=0$. The reader is asked to give an example in which this happens.
6. Constructing infinitely divisible processes

Here we want to show that for an arbitrary Lévy measure $\Lambda$ and constants $b$ and $\sigma$ there exists an infinitely divisible process $\xi_t$, defined on an appropriate probability space, such that
$$Ee^{i\lambda\xi_t}=\exp t\Big\{\int_{\mathbb{R}}(e^{i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda b-\sigma^2\lambda^2/2\Big\}.\qquad(1)$$
By the way, this will show that generally there are no additional properties of $\Lambda$ apart from those listed in (2.7).

The idea is that if we have at least one process with arbitrarily small jumps, then by redirecting the jumps we can get jump measures corresponding to an arbitrary infinitely divisible process. We know that at least one such test process exists, the increasing 1/2-stable process $\tau_{a+}$, $a\ge 0$ (see Theorem 2.6.1 and Exercise 1.12).

The following lemma shows how to redirect the jumps of $\tau_{a+}$.
The following lemma shows how to redirect the jumps of
a+
.
1. Lemma. Let be a positive measure on B(R) such that (R(a, a)) <
for any a > 0 and (0) = 0. Then there exists a nite Borel function
f(x) on R such that f(0) = 0 and for any Borel
() =
_
f
1
()
[x[
3/2
dx.
Proof. For x > 0, dene 2F(x) = (x, ). Notice that F(x) is right
continuous on (0, ) and F() = 0. For x > 0 let
f(x) = infy > 0 : 1 xF
2
(y).
Since F() = 0, f is a nite function.
Next notice that, if t > 0 and f(x) > t, then for any y > 0 satisfying
1 xF
2
(y), we have y > t, which implies that 1 < xF
2
(t). Hence,
x > 0 : f(x) > t x > 0 : xF
2
(t) > 1. (2)
On the other hand, if t > 0 and xF
2
(t) > 1, then due to the right continuity
of F also xF
2
(t +) > 1, where > 0. In that case, f(x) t + > t. Thus
the sets in (2) coincide if t > 0, and hence
(t, ) = 2F(t) =
_

1/F
2
(t)
x
3/2
dx =
_
x:xF
2
(t)>1
x
3/2
dx = (t, ),
where
() =
_
x>0:f(x)
x
3/2
dx.
A standard measure-theoretic argument allows us to conclude that
162 Chapter 5. Innitely Divisible Processes, Sec 6
( (0, )) = ()
not only for = (t, ), t > 0, but for all Borel (0, ).
Similarly, one constructs a negative function g(x) on (, 0) such that
( (, 0)) =
_
x<0:g(x)
[x[
3/2
dx.
Finally, the function we need is given by f(x)I
x>0
+ g(x)I
x<0
. The lemma
is proved.
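A small sketch of the construction in Lemma 1 (illustrative; the target measure is my choice, $\Lambda(t,\infty)=e^{-t}$, for which the infimum can be computed in closed form): it checks numerically that $\int_{\{x:f(x)>t\}}x^{-3/2}\,dx=2F(t)=\Lambda(t,\infty)$.

import numpy as np

# Illustrative target measure: Lambda(t, infinity) = e^{-t}, t > 0.
def F(t):                       # 2F(t) = Lambda(t, infinity)
    return 0.5 * np.exp(-t)

def f(x):                       # f(x) = inf{y > 0 : 1 >= x F(y)^2}
    # for this F the infimum solves x e^{-2y}/4 <= 1 in closed form
    return np.maximum(0.0, 0.5 * np.log(x / 4.0))

for t in [0.3, 1.0, 2.5]:
    # {x > 0 : f(x) > t} = (c, infinity) with c = 1/F(t)^2, and
    # int_c^infty x^{-3/2} dx = 2/sqrt(c) = 2F(t) = Lambda(t, infinity)
    c = 1.0 / F(t)**2
    assert abs(f(c * 1.001) - t) < 1e-2 and f(c * 0.999) < t
    print(2.0 / np.sqrt(c), "vs", np.exp(-t))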
We also need the following version of Lemma 3.8.

2. Lemma. Let $p_t$ be the jump measure of an infinitely divisible cadlag process with Lévy measure $\Lambda$, and let $f$ be a finite Borel function such that $f(0)=0$ and $\Lambda(\{x:f(x)\ne 0\})<\infty$. Then

(i) we have
$$\int_{\mathbb{R}\setminus\{0\}}|f(x)|\,p_t(dx)<\infty$$
(a.s.), and
$$\pi_t:=\int_{\mathbb{R}\setminus\{0\}}f(x)\,p_t(dx)$$
is well defined and is cadlag;

(ii) $\pi_t$ is an infinitely divisible process, and
$$Ee^{i\pi_t}=\exp t\int_{\mathbb{R}}(e^{if(x)}-1)\,\Lambda(dx).\qquad(3)$$

Proof. (i) By Corollary 3.10
$$Ep_t(\{x:f(x)\ne 0\})=t\Lambda(\{x:f(x)\ne 0\})<\infty.$$
Since the measure $p_t$ is integer valued, it follows that (a.s.) there are only finitely many points in $\{x:f(x)\ne 0\}$ to which $p_t$ assigns a nonzero mass. This proves (i).

To prove (ii) we use approximations. The inequality $|e^{if}-1|\le 2I_{f\ne 0}$ and the dominated convergence theorem show that, if assertion (ii) holds for some functions $f_n(x)$ such that $f_n\to f$, $\Lambda(\{x:\sup_n|f_n(x)|>0\})<\infty$, and
$$\int_{\mathbb{R}\setminus\{0\}}|f-f_n|\,p_t(dx)\xrightarrow{P}0,\qquad(4)$$
then (ii) is also true for $f$. By taking $f_n=(-n)\vee f\wedge n$, we see that it suffices to prove (ii) for bounded $f$. Then considering $f_n=fI_{1/n<|x|<n}$ reduces the general case further to bounded functions vanishing for small and large $|x|$. Any such function can be approximated in $L_1(\mathcal{B}(\mathbb{R}),\Lambda)$ by continuous functions $f_n$, for which (4) holds automatically due to Corollary 3.10 and (3) holds due to Lemma 3.8 (ii) with $\lambda=0$. The lemma is proved.
Now let $\Lambda$ be a Lévy measure and $b$ and $\sigma$ some constants. Take a probability space carrying processes $\tau^+_t$ and $\tau^-_t$ such that $\pm\tau^{\pm}_t$ are two independent copies of the process $\tau_{t+}$, $t\ge 0$, and a Wiener process $w_t$ independent of $(\tau^+_t,\tau^-_t)$. By Exercise 2.11, the Lévy measure of $\pm\tau^{\pm}_t$ is given by $c_0x^{-3/2}I_{x>0}\,dx$, where $c_0$ is a constant. Define
$$\Lambda_0(dx)=c_0|x|^{-3/2}\,dx$$
and take the function $f$ from Lemma 1 constructed from $\Lambda/c_0$ in place of $\Lambda$, so that, for any $\Gamma\in\mathcal{B}(\mathbb{R})$,
$$\Lambda(\Gamma)=\Lambda_0(f^{-1}(\Gamma))=\Lambda_0(\{x:f(x)\in\Gamma\}).\qquad(5)$$
3. Remark. Equation (5) means that, for any $\Gamma\in\mathcal{B}(\mathbb{R})$ and $h=I_{\Gamma}$,
$$\int_{\mathbb{R}}h(x)\,\Lambda(dx)=\int_{\mathbb{R}}h(f(x))\,\Lambda_0(dx).\qquad(6)$$
A standard measure-theoretic argument shows that (6) is true for each Borel nonnegative $h$ and also for each Borel $h$ for which at least one of
$$\int_{\mathbb{R}}|h(x)|\,\Lambda(dx)\qquad\text{and}\qquad\int_{\mathbb{R}}|h(f(x))|\,\Lambda_0(dx)$$
is finite. In particular, if $h$ is a Borel function, then $h\in L_2(\mathcal{B}(\mathbb{R}),\Lambda)$ if and only if $h(f)\in L_2(\mathcal{B}(\mathbb{R}),\Lambda_0)$.
4. Theorem. Let $p^{\pm}$ be the jump measures of $\tau^{\pm}_t$ and $q^{\pm}$ the centered Poisson measures of $\tau^{\pm}_t$, and let $\Lambda^{\pm}_0$ denote the restrictions of $\Lambda_0$ to $(0,\infty)$ and $(-\infty,0)$, respectively. Define
$$\eta^{\pm}_t=\int_{\mathbb{R}\setminus\{0\}}f(x)I_{|f(x)|<1}\,q^{\pm}_t(dx)+\int_{\mathbb{R}\setminus\{0\}}f(x)I_{|f(x)|\ge 1}\,p^{\pm}_t(dx)=:\alpha^{\pm}_t+\beta^{\pm}_t.$$
Then, for a constant $\bar b$, the process
$$\xi_t=\bar bt+\sigma w_t+\eta^{+}_t+\eta^{-}_t$$
is an infinitely divisible process satisfying (1).

Proof. Observe that
$$\int_{\mathbb{R}}f^2(x)I_{|f(x)|<1}\,\Lambda_0(dx)=\int_{(-1,1)}x^2\,\Lambda(dx)<\infty.$$
Therefore, the processes $\alpha^{\pm}_t$ are well defined. In addition,
$$\Lambda^{\pm}_0(\{x:|f(x)|\ge 1\})\le\Lambda_0(\{x:|f(x)|\ge 1\})=\Lambda(\{x:|x|\ge 1\})<\infty.$$
Hence, $\beta^{\pm}_t$ is well defined due to Lemma 2.

Next, in order to find the characteristic function of $\eta^{\pm}_t$, notice that $fI_{|f|<a}\to 0$ in $L_2(\mathcal{B}(\mathbb{R}),\Lambda_0)$ as $a\downarrow 0$, so that upon remembering the properties of stochastic integrals, in particular, Exercise 5.6, we obtain
$$\alpha^{\pm}_t=\operatorname*{l.i.m.}_{a\downarrow 0}\Big[\int_{\mathbb{R}\setminus\{0\}}f(x)I_{a\le|f(x)|<1}\,p^{\pm}_t(dx)-t\int_{\mathbb{R}}f(x)I_{a\le|f(x)|<1}\,\Lambda^{\pm}_0(dx)\Big].$$
It follows that
$$\eta^{\pm}_t=P\text{-}\lim_{a\downarrow 0}\Big[\int_{\mathbb{R}\setminus\{0\}}f(x)I_{a\le|f(x)|}\,p^{\pm}_t(dx)-t\int_{\mathbb{R}}f(x)I_{a\le|f(x)|<1}\,\Lambda^{\pm}_0(dx)\Big].$$
Now Lemma 2 implies that the $\eta^{\pm}_t$ are infinitely divisible and
$$Ee^{i\lambda\eta^{\pm}_t}=\lim_{a\downarrow 0}\exp t\int_{\mathbb{R}}\big[(e^{i\lambda f(x)}-1)I_{a\le|f(x)|}-i\lambda f(x)I_{a\le|f(x)|<1}\big]\,\Lambda^{\pm}_0(dx).$$

In the next few lines we use the fact that $|e^{ix}-1-ixI_{|x|<1}|$ is less than $2x^2$ if $|x|<1$ and less than $2$ otherwise. Then, owing to Remark 3, we find that
$$g(\lambda,a):=\int_{\mathbb{R}}\big[(e^{i\lambda f(x)}-1)I_{a\le|f(x)|}-i\lambda f(x)I_{a\le|f(x)|<1}\big]\,\Lambda^{+}_0(dx)$$
$$+\int_{\mathbb{R}}\big[(e^{i\lambda f(x)}-1)I_{a\le|f(x)|}-i\lambda f(x)I_{a\le|f(x)|<1}\big]\,\Lambda^{-}_0(dx)$$
$$=\int_{\mathbb{R}}\big[(e^{i\lambda f(x)}-1)I_{a\le|f(x)|}-i\lambda f(x)I_{a\le|f(x)|<1}\big]\,\Lambda_0(dx)=\int_{\mathbb{R}\setminus(-a,a)}\big[e^{i\lambda x}-1-i\lambda xI_{|x|<1}\big]\,\Lambda(dx).$$
This along with the dominated convergence theorem implies that
$$g(\lambda,a)\to\int_{\mathbb{R}}(e^{i\lambda x}-1-i\lambda xI_{|x|<1})\,\Lambda(dx)=\int_{\mathbb{R}}(e^{i\lambda x}-1-i\lambda\sin x)\,\Lambda(dx)+i\lambda\tilde b,$$
where
$$\tilde b=\int_{\mathbb{R}}(\sin x-xI_{|x|<1})\,\Lambda(dx)$$
is a well-defined constant because $|\sin x-xI_{|x|<1}|\le 2\wedge x^2$.

Finally, upon remembering that the processes $w_t,p^{+},p^{-}$ are independent, we conclude that $\xi_t$ is infinitely divisible and
$$Ee^{i\lambda\xi_t}=\lim_{a\downarrow 0}\exp t\big[i\lambda\bar b-\lambda^2\sigma^2/2+g(\lambda,a)\big],$$
which equals the right-hand side of (1) if $\bar b+\tilde b=b$. The theorem is proved.
The theory in this chapter admits a very natural generalization for vector-valued infinitely divisible processes, which are defined in the same way as in Sec. 2. Also as above, having an infinitely divisible process with jumps of all sizes in all directions allows one to construct all other infinitely divisible processes. In connection with this we set the reader the following.
5. Exercise. Let $w_t,w^1_t,...,w^d_t$ be independent Wiener processes. Define
$$\tau_t=\inf\{s\ge 0:w_s\ge t\}$$
and
$$\eta_t=(w^1_t,...,w^d_t),\qquad \xi_t=\eta_{\tau_t}.$$
Prove that:

(i) The process $\xi_t$ is infinitely divisible.

(ii) $E\exp(i(\lambda,\xi_t))=\exp(-ct|\lambda|)$ for any $\lambda\in\mathbb{R}^d$, where $c>0$ is a constant, so that $\xi_t$ has a multidimensional Cauchy distribution.

(iii) It follows from (ii) that the components of $\xi_t$ are not independent. On the other hand, the components of $\eta_t$ are independent random processes and we do a change of time in $\eta_t$, random but yet independent of $\eta$. Explain why this makes the components of $\xi_t=\eta_{\tau_t}$ depend on each other. What kind of nontrivial information about the trajectory of $\xi^2_t$ can one get if one knows the trajectory $\xi^1_t$, $t>0$?
7. Hints to exercises

1.8 Assume the contrary.

1.12 For any cadlag modification $\eta_t$ of a process $\xi_t$ we have $\xi_t\xrightarrow{P}\eta_s$ as $t\downarrow s$.

2.10 Use $\int_{\mathbb{R}}(\sin(x/c)-\sin x)\,x^{-2}\,dx=0$ for $c>0$, which is true since $\sin x$ is an odd function.

2.11 To show that $a=b=1$, observe that
$$\psi(z):=\int_0^{\infty}x^{-3/2}(e^{-zx}-1)\,dx$$
is an analytic function for $\mathrm{Re}\,z>0$ which is continuous for $\mathrm{Re}\,z\ge 0$. Furthermore, for real $z\ge 0$, changing variables, prove that $\psi(z)=\sqrt{z}\,\psi(1)$ and express $\psi(1)$ through the gamma function by integrating by parts. Then notice that $-(2\pi)^{-1/2}\psi(-i)=a-ib$.

3.15 (ii) $P(\zeta_t(a)=0)=P(p_t([a,\infty))=0)$. (iii) Use that $\xi_t-\zeta_t(a)$ and $\zeta_t(a)$ are independent and their sum is positive. (iv)&(v) Put $\lambda=0$ in (3.15) to get the characteristic function of $\zeta_t(0+)$ and also the fact that
$$\lim_{a\downarrow 0}\int_{[a,\infty)}(e^{i\mu x}-1)\,\Lambda(dx)$$
exists.

4.1 Corollary 3.10 says that the finite measures
$$\nu_{\varepsilon,T}(\Gamma):=Ep\big(\Gamma\cap((0,T]\times(\mathbb{R}\setminus(-\varepsilon,\varepsilon)))\big)$$
and
$$(\ell\times\Lambda)\big(\Gamma\cap((0,T]\times(\mathbb{R}\setminus(-\varepsilon,\varepsilon)))\big)$$
coincide on sets of the form $(0,t]\times(a,b]$.

4.2 Assume $f\ge 0$, approximate $f$ by the functions $f([tn]/n,x)$, and prove that
$$E\int_{(k/n,(k+1)/n]\times\mathbb{R}}f(k/n,x)\,p(dsdx)=E\int_{(k/n,(k+1)/n]\times\mathbb{R}}\big(Ef(k/n,x)\big)\,p(dsdx).$$
To do this step, use $\pi$- and $\lambda$-systems in order to show that it suffices to take $f(k/n,x)$ equal to $I_A(\omega)I_{\Gamma}(x)$, where $A$ and $p_{(k+1)/n}(\Gamma)-p_{k/n}(\Gamma)$ are independent.

4.3 First let there be an integer $n$ such that $f(t,x)=f((k+1)2^{-n},x)$ whenever $k$ is an integer and $t\in(k2^{-n},(k+1)2^{-n}]$, and let $f((k+1)2^{-n},x)$ be continuous in $x$. In that case use Lemma 3.8. Then derive the result for any continuous function $f(t,x)$ vanishing for $|x|<\varepsilon$. Finally, pass to the limit from continuous functions to arbitrary ones by using (4.1).

4.5 Take some constants $\alpha$ and $\beta$ and define $\zeta_t=\alpha\pi_t+\beta\eta_t$, $\phi(t)=Ee^{i\zeta_t}$. Notice that
$$e^{i\zeta_t}=1+\int_{(0,t]}e^{i\zeta_{s-}}\big([e^{i\alpha}-1]\,d\pi_s+[e^{i\beta}-1]\,d\eta_s\big),$$
where on the right we just have a telescoping sum. By taking expectations derive that
$$\phi(t)=1+\int_0^t\phi(s)\big(\lambda[e^{i\alpha}-1]+\mu[e^{i\beta}-1]\big)\,ds.$$
This will prove the independence of $\pi_t$ and $\eta_t$ for any $t$. To prove the independence of the processes, repeat part of the proof of Lemma 3.7.

5.5 First check (5.5) for $f=I_{(a,b]}$ with $ab>0$, and then use Corollary 3.10, the equality $L_2(\Pi,\ell\times\Lambda)=L_2(\sigma(\Pi),\ell\times\Lambda)$, and (2.7), which shows that $\Lambda(\mathbb{R}\setminus(-a,a))<\infty$ for any $a>0$.

5.6 The functions $((-n)\vee f\wedge n)I_{|x|>1/n}$ converge to $f$ in $L_1(\mathcal{B}(\mathbb{R}),\Lambda)$ and in $L_2(\mathcal{B}(\mathbb{R}),\Lambda)$.

6.5 (i) Use Theorem 2.6.1. (ii) Add that
$$E\exp(i(\lambda,\xi_t))=\int_0^{\infty}E\exp(i(\lambda,\eta_s))\,P(\tau_t\in ds).$$
(iii) Think of jumps.
Chapter 6

Itô Stochastic Integral

The reader may have noticed that stochastic integrals or stochastic integral equations appear in every chapter in this book. Here we present a systematic study of the Itô stochastic integral against the Wiener process. This integral has already been introduced in Sec. 2.7 by using an approach which is equally good for defining stochastic integrals against martingales. That approach also exhibits the importance of the $\sigma$-field of predictable sets. Traditionally the Itô stochastic integral against $dw_t$ is introduced in a different way, with discussion of which we start the chapter.

1. The classical definition

Let $(\Omega,\mathcal{F},P)$ be a complete probability space, $\mathcal{F}_t$, $t\ge 0$, an increasing filtration of $\sigma$-fields $\mathcal{F}_t\subset\mathcal{F}$, and $w_t$, $t\ge 0$, a Wiener process relative to $\mathcal{F}_t$.

1. Definition. Let $f_t=f_t(\omega)$ be a function defined on $\Omega\times[0,\infty)$. We write $f\in H_0$ if there exist nonrandom points $0=t_0\le t_1\le...\le t_n<\infty$ such that the $f_{t_i}$ are $\mathcal{F}_{t_i}$-measurable, $Ef^2_{t_i}<\infty$, and $f_t=f_{t_i}$ for $t\in[t_i,t_{i+1})$ if $i<n$, whereas $f_t=0$ for $t\ge t_n$.

2. Exercise. Why does it not make much sense to consider functions satisfying $f_t=f_{t_i}$ for $t\in(t_i,t_{i+1}]$?

For $f\in H_0$ we set
$$If=\sum_{i=0}^{n-1}(w_{t_{i+1}}-w_{t_i})f_{t_i}.$$
Obviously this definition is independent of the partition $\{t_i\}$ of $[0,\infty)$ provided that $f\in H_0$. In particular, the notation $If$ makes sense, and $I$ is a linear operator on $H_0$.
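The operator $I$ on $H_0$ is easy to implement. The sketch below is illustrative (made-up names; here $f$ is deterministic, so the $\mathcal{F}_{t_i}$-measurability of $f_{t_i}$ is trivial, while in general $f_{t_i}$ would be computed from the path up to time $t_i$): the Wiener increments are drawn directly.

import numpy as np

rng = np.random.default_rng(5)

def ito_integral_step(t, f_vals, rng):
    """I f = sum_i f_{t_i} (w_{t_{i+1}} - w_{t_i}) for a step function from H_0:
    f_t = f_vals[i] on [t[i], t[i+1]), and f_t = 0 for t >= t[-1]."""
    dw = rng.normal(0.0, np.sqrt(np.diff(t)))   # independent Wiener increments
    return float(np.dot(f_vals, dw))

t = np.array([0.0, 0.5, 1.0, 2.0])
f_vals = np.array([1.0, -2.0, 0.5])
print(ito_integral_step(t, f_vals, rng))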
3. Lemma. If $f\in H_0$, then
$$E(If)^2=E\int_0^{\infty}f^2_t\,dt,\qquad EIf=0.$$

Proof. We have (see Theorem 3.1.12)
$$Ef^2_{t_j}(w_{t_{j+1}}-w_{t_j})^2=E\big[f^2_{t_j}E\{(w_{t_{j+1}}-w_{t_j})^2|\mathcal{F}_{t_j}\}\big]=Ef^2_{t_j}(t_{j+1}-t_j),$$
since $w_{t_{j+1}}-w_{t_j}$ is independent of $\mathcal{F}_{t_j}$ and $f_{t_j}$ is $\mathcal{F}_{t_j}$-measurable. This and Cauchy's inequality imply that the first expression in the following relations makes sense:
$$Ef_{t_i}(w_{t_{i+1}}-w_{t_i})f_{t_j}(w_{t_{j+1}}-w_{t_j})=E\big[f_{t_i}(w_{t_{i+1}}-w_{t_i})f_{t_j}E\{w_{t_{j+1}}-w_{t_j}|\mathcal{F}_{t_j}\}\big]=0$$
if $i<j$, since $t_{i+1}\le t_j$ and $f_{t_j}$, $w_{t_{i+1}}-w_{t_i}$, $f_{t_i}$ are $\mathcal{F}_{t_j}$-measurable, whereas $w_{t_{j+1}}-w_{t_j}$ is independent of $\mathcal{F}_{t_j}$. Hence
$$E(If)^2=\sum_{j=0}^{n-1}Ef^2_{t_j}(w_{t_{j+1}}-w_{t_j})^2+2\sum_{i<j\le n-1}Ef_{t_i}(w_{t_{i+1}}-w_{t_i})f_{t_j}(w_{t_{j+1}}-w_{t_j})$$
$$=\sum_{j=0}^{n-1}Ef^2_{t_j}(t_{j+1}-t_j)=E\int_0^{\infty}f^2_t\,dt.$$
Similarly, $Ef_{t_j}(w_{t_{j+1}}-w_{t_j})=0$ and $EIf=0$. The lemma is proved.
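Both identities of Lemma 3 are easy to check by Monte Carlo for an adapted step function, e.g. $f_{t_i}=w_{t_i}$ (an illustrative sketch; for this choice $E(If)^2=\sum_i t_i(t_{i+1}-t_i)$, here $0.375$).

import numpy as np

rng = np.random.default_rng(6)

t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
dt = np.diff(t)

n_paths = 200000
dw = rng.normal(0.0, np.sqrt(dt), (n_paths, dt.size))
# w at the left endpoints t_0, ..., t_3 (with w_{t_0} = 0): adapted choice f_{t_i} = w_{t_i}
w_left = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dw[:, :-1], axis=1)])
f = w_left
If = np.sum(f * dw, axis=1)

lhs = np.mean(If**2)                          # E (If)^2
rhs = np.sum(np.mean(f**2, axis=0) * dt)      # E int f_t^2 dt = sum_i E f_{t_i}^2 dt_i
print(lhs, "vs", rhs, "; E If =", If.mean())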
The next step was not done in Secs. 2.7 and 2.8 because we did not have the necessary tools at that time. In the following lemma we use the notion of continuous time martingale, which is introduced in the same way as in Definition 3.2.1, just allowing $m$ and $n$ to be arbitrary numbers satisfying $0\le n\le m$.

4. Lemma. For $f\in H_0$, define $I_sf=I(I_{[0,s)}f)$. Then $(I_sf,\mathcal{F}_s)$ is a martingale for $s\ge 0$.

Proof. Fix $s$ and without loss of generality assume that $s\in\{t_0,...,t_n\}$. If $s=t_k$, then
$$I_{[0,s)}f_t=\sum_{i=0}^{k-1}f_{t_i}I_{[t_i,t_{i+1})}(t),\qquad I_sf=\sum_{i=0}^{k-1}f_{t_i}(w_{t_{i+1}}-w_{t_i}).$$
It follows that $I_sf$ is $\mathcal{F}_s$-measurable. Furthermore, if $t\le s$, $t,s\in\{t_0,...,t_n\}$, $t=t_r$, and $s=t_k$, then
$$E\{I_sf-I_tf|\mathcal{F}_t\}=\sum_{i=r}^{k-1}E\big[f_{t_i}E\{w_{t_{i+1}}-w_{t_i}|\mathcal{F}_{t_i}\}\big|\mathcal{F}_t\big]=0.$$
The lemma is proved.
Next by using the theory of martingales we derive an inequality allowing us to define stochastic integrals with variable upper limit as continuous processes.

5. Lemma (Doob's inequality). For $f\in H_0$, we have
$$E\sup_{s\ge 0}(I_sf)^2\le 4E\int_0^{\infty}f^2_t\,dt.\qquad(1)$$

Proof. First, notice that
$$I_sf=\sum_{i=0}^{n-1}f_{t_i}(w_{t_{i+1}\wedge s}-w_{t_i\wedge s}).$$
Therefore, the process $I_sf$ is continuous in $s$ and the sup in (1) can be taken over the set of rational $s$. In particular, the sup is a random variable, and (1) makes sense.

Next, let $0\le s_0\le...\le s_m<\infty$. Since $(I_{s_k}f,\mathcal{F}_{s_k})$ is a martingale, by Doob's inequality
$$E\sup_{k\le m}(I_{s_k}f)^2\le 4E(I_{s_m}f)^2=4E\int_0^{s_m}f^2_t\,dt\le 4E\int_0^{\infty}f^2_t\,dt.$$
Clearly the inequality between the extreme terms holds for any $s_0,...,s_m$, not necessarily ordered. In particular, one can number all rationals on $[0,\infty)$ and then take the first $m+1$ rational numbers as $s_0,...,s_m$. If after that one lets $m\to\infty$, then one gets (1) by the monotone convergence theorem. The lemma is proved.
Lemma 3 allows us to follow an already familiar pattern. Namely, consider $H_0$ as a subset of $L_2(\mathcal{F}\otimes\mathcal{B}[0,\infty),\ell)$, where $\ell(d\omega dt)=P(d\omega)dt$. On $H_0$ we have defined the operator $I$ which maps $H_0$ isometrically to a subset of $L_2(\mathcal{F},P)$. By Lemma 2.3.12 the operator $I$ admits a unique extension to an isometric operator acting from $\bar H_0$ into $L_2(\mathcal{F},P)$. We keep the same notation $I$ for the extension, and for a function $f\in\bar H_0$ we define its Itô stochastic integral by the formula
$$\int_0^{\infty}f_t\,dw_t=If.$$
We have to explain that this integral coincides with the one introduced in Sec. 2.7.
6. Remark. Obviously
$$H_0\subset H,$$
where $H$ is introduced in Definition 2.8.1 as the set of all real-valued $\mathcal{F}_t$-adapted functions $f_t(\omega)$ which are $\mathcal{F}\otimes\mathcal{B}(0,\infty)$-measurable and belong to $L_2(\mathcal{F}\otimes\mathcal{B}[0,\infty),\ell)$.

7. Remark. Generally the processes from $H_0$ are not predictable, since they are right continuous in $t$. However, if one redefines them at points of discontinuity by taking the left limits, then one gets left-continuous, hence predictable, processes (see Exercise 2.8.3) coinciding with the initial ones for almost all $t$. It follows that $H_0\subset L_2(\mathcal{P},\ell)$, where $\mathcal{P}$ denotes the $\sigma$-field of predictable sets.

Observe that $\bar H_0$, which is the closure of $H_0$ in $L_2(\mathcal{F}\otimes\mathcal{B}[0,\infty),\ell)$, coincides with the closure of $H_0$ in $L_2(\mathcal{P},\ell)$, since $L_2(\mathcal{P},\ell)\subset L_2(\mathcal{F}\otimes\mathcal{B}[0,\infty),\ell)$. Furthermore, replacing the left-continuous functions used in the proof of Theorem 2.8.2 with the right continuous $2^{-n}[2^nt]$, we see that $f\in\bar H_0$ if $f_t$ is an $\mathcal{F}_t$-adapted $\mathcal{F}\otimes\mathcal{B}(0,\infty)$-measurable function belonging to $L_2(\mathcal{F}\otimes\mathcal{B}(0,\infty),\ell)$. In other words, $H\subset\bar H_0$.

8. Remark. It follows by Theorem 2.8.8 (i) that $If$ coincides with the Itô stochastic integral introduced in Sec. 2.7 on functions $f\in H_0$. Since $H_0\subset H$ and $H_0$ is dense in $H$ (Remarks 6 and 7) and $H=L_2(\mathcal{P},\ell)$ in the sense described in Exercise 2.8.5, we have that $\bar H_0=L_2(\mathcal{P},\ell)$, implying that both stochastic integrals are defined on the same set and coincide there.
9. Definition. For $f\in\bar H_0$ and $s\ge 0$ define
$$\int_0^{s}f_t\,dw_t=\int_0^{\infty}I_{[0,s)}(t)f_t\,dw_t.$$

This is the traditional way of introducing the stochastic Itô integral against the Wiener process with variable upper limit. Notice that for many other martingales such as $m_t:=\pi_t-t$, where $\pi_t$ is a Poisson process with parameter one, it is much more natural to replace $I_{[0,s)}$ with $I_{[0,s]}$, since then $\int_0^{s}1\,dm_t=m_s$ on each trajectory. In our situation the integral on each particular trajectory makes no sense, and taking $I_{[0,s]}$ leads to the same result since $I_{[0,s]}=I_{[0,s)}$ as elements of $L_2(\mathcal{F}\otimes\mathcal{B}(0,\infty),\ell)$.
Defining the stochastic integral as the result of a mapping into $L_2(\mathcal{F},P)$ specifies the result only almost surely, so that for any $s\ge 0$ there are many candidates for $\int_0^{s}f_t\,dw_t$. If one chooses these candidates arbitrarily for each $s$, one can easily end up with a process which has nonmeasurable trajectories for each $\omega$. It is very important for the theory of stochastic integration that one can arrange the choosing in such a way that almost all trajectories become continuous in $s$.
10. Theorem. Let f

H
0
. Then the process
_
s
0
f
t
dw
t
admits a continuous
modication.
Proof. Take f
n
H
0
such that
E
_

0
[f
t
f
n
t
[
2
dt 2
n
.
Then for each s 0 in the sense of convergence in L
2
(T B(0, ), ) we
have
I
[0,s)
(t)f
t
= I
[0,s)
(t)f
1
t
+I
[0,s)
(t)(f
2
t
f
1
t
) +... +I
[0,s)
(t)(f
n+1
t
f
n
t
) +....
Hence by continuity (or isometry), in the sense of the mean-square conver-
gence we have
_
s
0
f
t
dw
t
=
_
s
0
f
1
t
dw
t
+
_
s
0
(f
2
t
f
1
t
) dw
t
+... +
_
s
0
(f
n+1
t
f
n
t
) dw
t
+....
(2)
Here each term is continuous as the integral of an H
0
-function, so that
to prove the theorem it suces to prove that the series in (2) converges
uniformly for almost every .
By Doobs inequality
E sup
s0

_
s
0
(f
n+1
t
f
n
t
) dw
t

2
4E
_

0
(f
n+1
t
f
n
t
)
2
dt 16 2
n
.
By Chebyshevs inequality
Psup
s0

_
s
0
(f
n+1
t
f
n
t
) dw
t

n
2
16n
4
2
n
,
and since the series

n
4
2
n
converges, by the Borel-Cantelli lemma with
probability one for all large n we have
174 Chapter 6. It o Stochastic Integral, Sec 1
sup
s0

_
s
0
(f
n+1
t
f
n
t
) dw
t

< n
2
.
Finally, we remember that

n
2
< . The theorem is proved.
2. Properties of the stochastic integral on H
The Ito integral is dened on the set

H
0
, which is a space of equivalence
classes. By Remarks 1.7 and 1.8, in each equivalence class there is a function
belonging to H. As usual we prefer to deal not with equivalence classes
but rather with their particular representatives, and now we concentrate on
integrating processes of class H. Furthermore, Theorem 1.10 allows us to
consider only continuous versions of stochastic integrals.
1. Theorem. Let f, g H, a, b R. Then:
(i) (linearity) for all t at once with probability one
_
t
0
(af
s
+bg
s
) dw
s
= a
_
t
0
f
s
dw
s
+b
_
t
0
g
s
dw
s
; (1)
(ii) E
_

0
f
t
dw
t
= 0;
(iii) the process
_
t
0
f
s
dw
s
is a martingale relative to T
P
t
;
(iv) Doobs inequality holds:
E sup
t
_
_
t
0
f
s
dw
s
_
2
4E
_

0
f
2
t
dt;
(v) if A T, T [0, ], and f
t
() = g
t
() for all A and t [0, T),
then
I
A
_
t
0
f
s
dw
s
= I
A
_
t
0
g
s
dw
s
(2)
for all t [0, T] at once with probability one.
Proof. (i) For each t 0 equation (1) (a.s.) follows by denition (see
Lemma 2.3.12). Furthermore, both sides of (1) are continuous in t, and
hence they coincide for all t if they coincide for all rational t. Since for
each rational t, (1) holds almost surely and the set of rational numbers is
countable and the intersection of countably many events of full probability
has probability one, (1) indeed holds for all t 0 on a set of full probability.
Ch 6 Section 2. Properties of the stochastic integral on H 175
(ii) Take f
n
H
0
such that f
n
f in L
2
(T B(0, ), ). Then use
Cauchys inequality and Lemma 1.3 to nd that
[EIf[ = [EI(f f
n
)[
_
EI(f f
n
)
2

1/2
=
_
E
_

0
(f
t
f
n
t
)
2
dt

1/2
0.
(iii) Take the same sequence f
n
as above and remember that Lemma 1.4
allows us to write
E
_
_
t
0
f
n
s
dw
s
[T
r
_
=
_
r
0
f
n
s
dw
s
(a.s.) 0 r t. (3)
Furthermore, E
_ _
t
0
f
n
s
dw
s

_
t
0
f
s
dw
s
_
2
= E
_
t
0
(f
n
s
f
s
)
2
ds 0, and eval-
uating conditional expectation is a continuous operator in L
2
(T, P) as a
projection operator in L
2
(T, P) (Theorem 3.1.14). Hence upon passing to
the limit in (3) in the mean-square sense, we get an equality which shows
that
(a)
_
r
0
f
s
dw
s
is T
P
r
-measurable as a function almost surely equal to an
T
r
-measurable E([T
r
);
(b) the martingale equality holds.
This proves (iii). Assertion (iv) is proved in the same way as Lemma 1.5.
(v) By the argument in (i) it suces to prove that (2) holds with prob-
ability one for each particular t [0, T]. In addition, I
t
f = I(I
[0,t)
f), which
shows that it only remains to prove that
I
A
_

0
f
s
dw
s
= I
A
_

0
g
s
dw
s
(a.s.) if f
s
() = g
s
() for all A and s 0. But this is just statement
(ii) of Theorem 2.8.8. The theorem is proved.
Further properties of the stochastic integral are related to the notion of
stopping time (Denition 2.5.7), which, in particular, will allow us to extend
the domain of denition of the stochastic integral from H to a larger set.
2. Exercise*. Prove that if a random process
t
is right continuous for
each (or left continuous for each ), then it is T B[0, )-measurable.
3. Exercise*. Let be an T
t
-stopping time. Prove that I
t<
and I
t
are
T
t
-adapted and T B[0, )-measurable, and that : t T
t
for
every t 0.
176 Chapter 6. It o Stochastic Integral, Sec 2
4. Exercise*. Let
t
be an T
t
-adapted continuous real-valued process, and
take real numbers a < b. Dene = inft 0 :
t
, (a, b) (inf := ) so
that is the rst exit time of
t
from (a, b). Prove that is an T
t
-stopping
time.
The major part of stopping times we are going to deal with will be
particular applications of Exercise 4 and the following.
5. Lemma. Let f = f
t
() be nonnegative, T
t
-adapted, and T B[0, )-
measurable. Assume that the -elds T
t
are complete, that is, T
t
= T
P
t
.
Then, for any t 0,
_
t
0
f
s
ds is T
t
-measurable.
Proof. If the assertion holds for f n in place of f, then by letting
n and using the monotone convergence theorem we get the result for
our f. It follows that without losing generality we may assume that f is
bounded. Furthermore, we can cut o the function f in t by taking t 0
and setting f
s
= 0 for s t. Then we see that it suces to prove our
assertion for f H.
In that case, as in the proof of Theorem 2.8.2 we conclude that there
exist f
n
H
0
such that
E

_
t
0
f
s
ds
_
t
0
f
n
s
ds

E
_
t
0
[f
s
f
n
s
[ ds

t
_
E
_

0
[f
s
f
n
s
[
2
ds
_
1/2
0.
Furthermore,
_
t
0
f
n
s
ds is obviously written as a sum in which all terms are
T
t
-measurable. The mean-square limit of T
t
-measurable variables is at least
T
P
t
-measurable, and the lemma is proved.
6. Remark. Due to this lemma, everywhere below we assume that T
P
t
=
T
t
. This assumption does not restrict generality at all, since, as is easy to
see, (w
t
, T
P
t
) is again a Wiener process and passing from T
t
to T
P
t
only
enlarges the set H. Actually, the change of H is almost unnoticeable, since
the set

H
0
remains unchanged as well as the stochastic integral and the
inclusions H
0
H

H
0
hold true as before. Also, as we have pointed out
before, we always take continuous versions of stochastic integrals, which is
possible due to Theorem 1.10.
Before starting to use stopping times we point out two standard ways of
approximating an arbitrary stopping time with discrete ones
n
. One can
use (2.5.3), or alternatively one lets

n
() = (k + 1)2
n
if () [k2
n
, (k + 1)2
n
)
and
n
() = if () = . In other words,
Ch 6 Section 2. Properties of the stochastic integral on H 177

n
= 2
n
[2
n
] + 2
n
. (4)
Then
n
,
n
2
n
, and
: t <
n
() = : 2
n
[2
n
t] T
2
n
[2
n
t]
T
t
,
so that
n
are stopping times.
7. Theorem. Let f H. Denote
t
=
_
t
0
f
s
dw
s
, t [0, ], and let be
an T
t
-stopping time. Then

=
_

0
I
s<
f
s
dw
s
=
_

0
I
s
f
s
dw
s
(5)
(a.s.) and (Walds identity)
E
_
_

0
f
s
dw
s
_
2
= E
_

0
f
2
s
ds. (6)
Proof. To prove (5), rst assume that takes only countably many
values t
1
, t
2
, .... On the set
k
:= : () = t
k
we have I
s<t
k
f
s
= I
s<
f
s
for all s 0. By denition, on
k
(a.s.) we have

=
t
k
=
_

0
I
s<t
k
f
s
dw
s
,
and by Theorem 1 (v) on
k
(a.s.)
_

0
I
s<t
k
f
s
dw
s
=
_

0
I
s<
f
s
dw
s
.
Thus the rst equality in (5) holds on any
k
(a.s.). Since

k

k
= , this
equality holds almost surely. To prove it in the general case it suces to
dene
n
by (4) and notice that

because
t
is continuous, whereas
E
_
_

0
I
s<
f
s
dw
s

_

0
I
s<
n
f
s
dw
s
_
2
= E
_

0
I
s<
n
f
2
s
ds 0
by the dominated convergence theorem. The second equality in (5) is obvious
since the integrands coincide for almost all (, s).
On the basis of (5) and the isometry of stochastic integration we conclude
that
178 Chapter 6. It o Stochastic Integral, Sec 2
E
2

= E
_

0
I
s<
f
2
s
ds = E
_

0
f
2
s
ds.
The theorem is proved.
The following fundamental inequality can be extracted from the original
memoir of Ito [It].
8. Theorem. Let f H, and let N, c > 0, and T be constants. Then
P
_
sup
tT

_
t
0
f
s
dw
s

c
_
P
_
_
T
0
f
2
s
ds N
_
+
1
c
2
E
_
N
_
T
0
f
2
s
ds
_
.
Proof. We use the standard way of stopping stochastic integrals

t
=
_
t
0
f
s
dw
s
by using their brackets, dened as
)
t
:=
_
t
0
f
2
s
ds.
Let = inft 0 : )
t
N, so that is the rst exit time of )
t
from
(1, N). By Exercise 4 and Lemma 5 we have that is a stopping time.
Furthermore,
: < T : )
T
N
and on the set : T we have I
s<
f
s
= f
s
if s < T. Therefore, upon
denoting
A = : sup
tT
[
t
[ c,
by the Doob-Kolmogorov inequality for submartingales we get
P(A, T) = P
_
T, sup
tT

_
t
0
I
s<
f
s
dw
s

c
_
P
_
sup
tT

_
t
0
I
s<
f
s
dw
s

2
c
2
_

1
c
2
E
_
_
T
0
I
s<
f
s
dw
s
_
2
=
1
c
2
E
_
T
0
I
s<
f
2
s
ds =
1
c
2
E
_
_
T
0
I
s<
f
2
s
ds
_

0
I
s<
f
2
s
ds
_

1
c
2
E
_
N
_
T
0
f
2
s
ds
_
,
Ch 6 Section 3. It o integral if
_
T
0
f
2
s
ds < 179
where in the last inequality we have used the fact that, if < , then
obviously )

= N, and if = , then )

N. Hence
P(A) = P(A, < T) +P(A < T) P( < T) +P(A, T)
P
_
_
T
0
f
2
s
ds N
_
+
1
c
2
E
_
N
_
T
0
f
2
s
ds
_
.
The theorem is proved.
9. Exercise. Under the assumptions of Theorem 8, prove that
P()
T
N)
1
N
E
_
c
2
sup
tT

2
t
_
+P(sup
tT
[
t
[ c).
10. Exercise. Prove Daviss inequality: If f H, then
1
3
E)
1/2
T
E sup
tT
[
t
[ 3E)
1/2
T
.
3. Dening the It o integral if
_
T
0
f
2
s
ds <
Denote by o the set of all T
t
-adapted, T B(0, )-measurable processes
f
t
such that
_
T
0
f
2
s
ds < (a.s.) T < .
Our task here is to dene
_
t
0
f
t
dw
t
for f o.
Dene
(n) = inft 0 :
_
t
0
f
2
s
ds n.
In Sec. 2 we have already seen that (n) are stopping times and
_
(n)
0
f
2
s
ds n.
Furthermore, obviously (n) (a.s.) as n . Finally, notice that
I
s<(n)
f
s
H. Indeed, the fact that this process is T
t
-adapted follows from
Exercise 2.3. Also
E
_

0
I
s<(n)
f
2
s
ds = E
_
(n)
0
f
2
s
ds n < .
It follows from the above that the stochastic integrals
180 Chapter 6. It o Stochastic Integral, Sec 3

t
(n) :=
_
t
0
I
s<(n)
f
s
dw
s
are well dened. If
_
t
0
f
s
dw
s
were dened, it would certainly satisfy
_
t
0
I
s<(n)
f
s
dw
s
=
_
t(n)
0
f
s
dw
s
.
This observation is a clue to dening
_
t
0
f
s
dw
s
.
1. Lemma. Let f o. Then there exists a set
t
such that P(
t
) = 1
and, for every
t
, m n, and t [0, (n, )], we have
t
(n) =
t
(m).
Proof. Fix t and n, and notice that on the set A = : t (n) we
have
I
s<t(n)
f
s
= I
s<t(m)
f
s
for all s. By Theorem 2.1, almost surely on A
_
t
0
I
s<(n)
f
s
dw
s
=
_

0
I
s<t(n)
f
s
dw
s
=
_
t
0
I
s<(m)
f
s
dw
s
.
In other words, almost surely
I
t(n)
_
t
0
I
s<(n)
f
s
dw
s
= I
t(n)
_
t
0
I
s<(m)
f
s
dw
s
(1)
for any t and m n. Clearly, the set
t
of all for each of which (1)
holds for all m n and rational t has full probability. If
t
, then (1) is
actually true for all t, since both sides are left continuous in t. This is just
a restatement of our assertion, so the lemma is proved.
2. Corollary. If f o, then with probability one the sequence
t
(n) con-
verges uniformly on each nite time interval.
3. Denition. Let f o. For those for which the sequence
t
(n) con-
verges uniformly on each nite time interval we dene
_
t
0
f
s
dw
s
= lim
n
_
t
0
I
s<(n)
f
s
dw
s
.
For all other we dene
_
t
0
f
s
dw
s
= 0.
Ch 6 Section 3. It o integral if
_
T
0
f
2
s
ds < 181
Of course, one has to check that Denition 3 does not lead to anything
new if f H. Observe that if f H, then by Fatous theorem and the
dominated convergence theorem
E

lim
n
_
t
0
I
s<(n)
f
s
dw
s

_
t
0
f
s
dw
s

2
lim
n
E
_
t
0
(1 I
s<(n)
)f
2
s
ds = 0.
Therefore both denitions give the same result almost surely for any given
t. Since Denition 3 yields a continuous process, we see that, for f H, the
new integral is also the integral in the previous sense.
Also notice that f 1 o, (n) = n, and hence (a.s.)
_
t
0
1 dw
s
= lim
n
_
t
0
I
s<n
dw
s
= lim
n
_

0
I
s<nt
dw
s
= lim
n
w
nt
= w
t
.
Now come some properties of the stochastic integral on o.
4. Exercise. By using Fatous theorem and Exercise 2.10, prove Daviss
inequality for f o.
5. Theorem. Let f, f
n
, g o, and let , > 0, T [0, ) be constants.
Then:
(i) the stochastic integral
_
t
0
f
s
dw
s
is continuous in t and T
t
-adapted;
(ii) we have
P
_
sup
tT

_
t
0
f
s
dw
s

_
t
0
g
s
dw
s


_
P
_
_
T
0
[f
s
g
s
[
2
ds
_
+
1

2
E
_
T
0
(f
s
g
s
)
2
ds P
_
_
T
0
[f
s
g
s
[
2
ds
_
+

2
; (2)
(iii) we have
_
T
0
[f
n
s
f
s
[
2
ds
P
0 = sup
tT

_
t
0
f
n
s
dw
s

_
t
0
f
s
dw
s

P
0.
Proof. (i) The continuity of
_
t
0
f
s
ds follows from Denition 3, in which
_
t
0
I
s<(n)
f
s
ds
are continuous and T
t
-adapted (even T
t
-martingales). Their limit is also
T
t
-adapted.
To prove (ii), rst notice that all expressions in (2) are monotone and
right continuous in and . Therefore, it suces to prove (2) only at points
182 Chapter 6. It o Stochastic Integral, Sec 3
of their continuity. Also notice that the second inequality in (2) is obvious
since .
Now x appropriate , and and dene
(n) = inft 0 :
_
t
0
f
2
s
ds n, (n) = inft 0 :
_
t
0
g
2
s
ds n,
f
n
s
= I
s<(n)
f
s
, g
n
s
= I
s<(n)
g
s
.
Since f
n
and g
n
belong to H, inequality (2) holds with f
n
, g
n
in place of
f, g due to the linearity of the stochastic integral on H and Theorem 2.8.
Furthermore, almost surely, as n ,
sup
tT

_
t
0
f
n
s
dw
s

_
t
0
g
n
s
dw
s

sup
tT

_
t
0
f
s
dw
s

_
t
0
g
s
dw
s

,
_
T
0
[f
n
s
g
n
s
[
2
ds
_
T
0
[f
s
g
s
[
2
ds.
These convergences of random variables imply convergence of the corre-
sponding distribution functions at all points of their continuity. Adding to
this tool the dominated convergence theorem, we get (2) from its version for
f
n
, g
n
.
To prove (iii) it suces to take g = f
n
in (2) and let rst n and
then 0. The theorem is proved.
6. Exercise. Prove that the converse implication in assertion (iii) of The-
orem 5 is also true.
Before discussing further properties of the stochastic integral we denote

n
(x) = (n) x n,
so that
n
(x) = x for [x[ n and
n
(x) = nsign x otherwise. Observe that,
if f o, T [0, ), and f
n
s
:=
n
(f
s
)I
s<T
, then f
n
H and (a.s.)
_
T
0
[f
n
s
f
s
[
2
ds 0.
This way of approximating f o along with Theorem 5 and known proper-
ties of the stochastic integral on H immediately yields assertions (i) through
(iii) of the following theorem.
Ch 6 Section 3. It o integral if
_
T
0
f
2
s
ds < 183
7. Theorem. (i) If f, g o, a, b R, then (a.s.)
_
t
0
(af
s
+bg
s
) dw
s
= a
_
t
0
f
s
dw
s
+b
_
t
0
g
s
dw
s
t [0, ).
(ii) If f
s
= f
t
i
for s [t
i
, t
i+1
), 0 = t
0
< t
1
< ..., t
i
as i , and
the f
t
i
are T
t
i
-measurable, then f o and (a.s.) for every t 0
_
t
0
f
s
dw
s
=

t
i+1
<t
f
t
i
(w
t
i+1
w
t
i
) +f
t
k
(w
t
w
t
k
),
where k is such that t
k
t and t
k+1
> t.
(iii) If f, g o, T < , A T, and f
s
() = g
s
() for all s [0, T] and
A, then almost surely on A
_
t
0
f
s
dw
s
=
_
t
0
g
s
dw
s
t T.
(iv) If f o, T < , and is a stopping time satisfying T, then
(a.s.)
_

0
f
s
dw
s
=
_
T
0
I
s<
f
s
dw
s
. (3)
Assertion (iv) is obtained from the fact that due to Theorem 2.7, if
f H, then the left-hand side of (3) equals (a.s.)
_
T
0
f
s
dw
s
=
_

0
I
s<T
f
s
dw
s
=
_

0
I
s<T
I
s<
f
s
dw
s
=
_
T
0
I
s<
f
s
dw
s
.
In the statement of the following theorem we use the fact that if f o,
is a stopping time, and
E
_

0
f
2
s
ds = E
_

0
I
s<
f
2
s
ds < ,
then I
s<
f
s
H and
_

0
I
s<
f
s
dw
s
makes sense.
8. Theorem. Let f o, T , and let be an almost surely nite
stopping time (that is, () < for almost every ).
(i) If E
_

0
f
2
s
ds < , then
184 Chapter 6. It o Stochastic Integral, Sec 3
_

0
f
s
dw
s
=
_

0
I
s<
f
s
dw
s
(a.s.) (4)
and Walds identities hold:
E
_
_

0
f
s
dw
s
_
2
= E
_

o
f
2
s
ds, E
_

0
f
s
dw
s
= 0. (5)
In particular (for f 1), if E < , then Ew
2

= E and Ew

= 0.
(ii) If E
_
T
0
f
2
t
dt < , then
_
t
0
f
s
dw
s
is a martingale for t [0, T]
[0, ).
Proof. To prove (4) it suces to remember what has been said before
the theorem and use Theorems 7 and 2.7, which imply that (a.s.)
_

0
f
s
dw
s
= lim
n
_
n
0
f
s
dw
s
= lim
n
_
n
0
I
s<n
f
s
dw
s
= lim
n
_

0
I
s<n
I
s<n
f
s
dw
s
= lim
n
_

0
I
s<n
f
s
dw
s
=
_

0
I
s<
f
s
dw
s
,
where the last equality holds (a.s.) because
E
_

0
[I
s<n
f
s
I
s<
f
s
[
2
ds 0.
Equation (5) follows from (4) and the properties of the stochastic integral
on H.
To prove (ii) it suces to notice that
_
t
0
f
s
dw
s
=
_
t
0
I
s<T
f
s
dw
s
for t T, I
s<T
f
s
H,
and that stochastic integrals of elements of H are martingales. The theorem
is proved.
9. Example. Sometimes Walds identities can be used for concrete com-
putations. To show an example, let a, b > 0, and let be the rst exit time
of w
t
from (a, b). Then, for each t, we have [w
t
[ a +b and
(a +b)
2
Ew
2
t
= E
_
_
t
0
I
s<
dw
s
_
2
= E
_
t
0
I
s<
ds = Et .
Ch 6 Section 3. It o integral if
_
T
0
f
2
s
ds < 185
Since this is true for any t, by the monotone convergence theorem E
(a +b)
2
< .
It follows that Walds identities hold true for this , so that, in particular,
Ew

= 0, which is written as
aP(w

= a) +bP(w

= b) = 0.
Adding to this that P(w

= a) + P(w

= b) = 1 since < (a.s.), we


get
P(w

= a) =
b
a +b
, P(w

= b) =
a
a +b
.
Furthermore, Ew
2

= E. In other words,
E = a
2
P(w

= a) +b
2
P(w

= b) = ab.
We thus rediscover the results of Exercise 2.6.5.
10. Remark. Generally Walds identities are wrong if E
_

0
f
2
s
ds = .
For instance, let = inft 0 : w
t
1. We know that has Walds
distribution, so that P( < ) = 1 and w

= 1 (a.s.). Hence Ew

= 1 ,= 0,
and one of identities is violated. It follows that E = and Ew
2

= 1 ,= E.
11. Exercise. In Remark 10 we have that E

= , which follows from


the explicit formula for Walds distribution. In connection with this, prove
that, if E(
_

0
f
2
s
ds)
1/2
< , then E
_

0
f
s
dw
s
= 0.
Regarding assertion (ii) of Theorem 8, it is worth noting that generally
stochastic integrals are not martingales. We give two exercises to that eect:
Exercises 12 and 7.4.
12. Exercise. For t < 1 consider the process 1 +
_
t
0
(1 s)
1
dw
s
and let
be the rst time it hits zero. Prove that < 1 (a.s.) and
_
t
0
1
1 s
I
s<
dw
s
+ 1 = 0
(a.s.) for all t 1.
13. Exercise. Prove that if E
_ _
T
0
f
2
s
ds
_
1/2
< , then
_
t
0
f
s
dw
s
is a mar-
tingale for t T.
In the future we also use the stochastic integral with variable lower
limit. If 0 t
1
t
2
< , f
t
() is measurable with respect to (, t) and
T
t
-adapted, and
_
t
2
t
1
f
2
t
dt < (a.s.),
186 Chapter 6. It o Stochastic Integral, Sec 3
then dene
_
t
2
t
1
f
s
dw
s
=
_
t
2
0
I
[t
1
,t
2
)
(s)f
s
dw
s
. (6)
We have I
[t
1
,t
2
)
= I
[0,t
2
)
I
[0,t
1
)
. Hence, if f o, then (a.s.)
_
t
2
t
1
f
s
dw
s
=
_
t
2
0
f
s
dw
s

_
t
1
0
f
s
dw
s
. (7)
14. Theorem. Let f o, 0 t
1
t
2
< , and let g be T
t
1
-measurable.
Then (a.s.)
g
_
t
2
t
1
f
s
dw
s
=
_
t
2
t
1
gf
s
dw
s
, (8)
that is, one can factor out appropriately measurable random variables.
Proof. First of all, notice that the right-hand side of (8) is well dened
by virtue of denition (6) but not (7). Next, (8) is trivial for f H
0
, since
both sides are just simple sums. If f H, one can approximate f with
f
n
H
0
so that
_

0
[f
n
t
f
t
[
2
dt
P
0,
_

0
[gf
n
t
gf
t
[
2
dt
P
0.
Then one can pass to the limit on the basis of Theorem 5. After having
proved (8) for f H, one easily gets (8) in the general case by noticing that
in the very Denition 3 we use f
n
H such that
_
T
0
[f
n
t
f
t
[
2
dt 0 (a.s.)
for every T < . The theorem is proved.
4. Ito integral with respect to a
multidimensional Wiener process
1. Denition. Let (, T, P) be a complete probability space, let T
t
,t 0,
be a ltration of complete -elds T
t
T, and let w
t
= (w
1
t
, ..., w
d
t
) be a d-
dimensional process on . We say that w
t
is a d-dimensional Wiener process
relative to T
t
, t 0, or that (w
t
, T
t
) is a d-dimensional Wiener process, if
(i) w
k
t
are Wiener processes for each k = 1, ..., d,
(ii) the processes w
1
t
, ..., w
d
t
are independent,
Ch 6 Section 4. Multidimensional It o integral 187
(iii) w
t
is T
t
-adapted and w
t+h
w
t
is independent of T
t
if t, h 0.
If f
t
= (f
1
t
, ..., f
d
t
) is a d-dimensional process, we write f o whenever
f
i
o for any i. If f
t
= (f
1
t
, ..., f
d
t
) o, we dene
_
t
0
f
s
dw
s
=
_
t
0
f
1
s
dw
1
s
+... +
_
t
0
f
d
s
dw
d
s
, (1)
so that f
s
dw
s
is interpreted as the scalar product of f
s
and dw
s
.
The stochastic integral against multidimensional Wiener processes pos-
sesses properties quite similar to the ones in the one-dimensional case. We
neither list nor prove all of them, pointing out only that, if f, g o, T < ,
and E
_
T
0
([f
s
[
2
+[g
s
[
2
) ds < , then
E
_
T
0
f
s
dw
s
_
T
0
g
s
dw
s
= E
_
T
0
f
s
g
s
ds.
This property is easily proved on the basis of (1) and the fact that, for
instance,
E
_
T
0
f
1
s
dw
1
s
_
T
0
g
2
s
dw
2
s
= 0,
which in turn is almost obvious for f, g H
0
and extends to f, g o by
standard passages to the limit.
We also need to integrate matrix-valued processes. If
t
= (
ik
t
), i =
1, ..., d
1
, k = 1, ..., d, and
ij
o, then we write o and by
_
t
0

s
dw
s
we naturally mean the d
1
-dimensional process, the ith coordinate of which
is given by
d

k=1
_
t
0

ik
s
dw
k
s
.
In other terms we look at
s
dw
s
as the product of the matrix
s
and the
column vector dw
s
.
2. Exercise*. Prove that if E
_
t
0
tr
s

s
ds < , then
E

_
t
0

s
dw
s

2
= E
_
t
0
tr
s

s
ds.
188 Chapter 6. It o Stochastic Integral, Sec 4
3. Exercise. Let b
t
be a d-dimensional process, b
t
o. Prove that
_
exp(
_
t
0
b
t
dw
t
(1/2)
_
t
0
[b
t
[
2
dt), T
t
_
is a supermartingale.
5. Itos formula
In the usual calculus, after the notion of integral is introduced one discusses
the rules of integration and compiles the table of elementary integrals. The
most important tools of integration are change of variable and integration
by parts, which are proved on the basis of the formula for dierentiating
superpositions. The formula for the stochastic dierential of a superposition
is called It os formula. This formula was discovered in [It] as a curious fact
and then became the main tool of modern stochastic calculus.
1. Denition. Let (, T, P) be a complete probability space carrying a
d
1
-dimensional Wiener process (w
t
, T
t
) and a continuous d-dimensional T
t
-
adapted process
t
. Assume that we are also given a d d
1
matrix valued
process
t
and a d-dimensional process b
t
such that o and b is jointly
measurable in (, t), T
t
-adapted, and
_
T
0
[b
s
[ ds < (a.s.) for any T < .
Then we write
d
t
=
t
dw
t
+b
t
dt
if and only if (a.s.) for all t

t
=
0
+
_
t
0

s
dw
s
+
_
t
0
b
s
ds. (1)
In that case one says that
t
has stochastic dierential equal to
t
dw
t
+b
t
dt.
From calculus we know that if f(x) and g(t) are dierentiable, then
df(g(t)) = f
t
(g(t)) dg(t).
It turns out that stochastic dierentials possess absolutely dierent proper-
ties. For instance, consider d(w
2
t
) for one-dimensional w
t
. If the usual rules
were true, we would have dw
2
t
= 2w
t
dw
t
, that is,
w
2
t
= 2
_
t
0
w
s
dw
s
.
However, this is impossible since
Ch 6 Section 5. It os formula 189
Ew
2
t
= t, E
_
t
0
w
2
s
ds < , E
_
t
0
w
s
dw
s
= 0.
Still, there is a case in which the usual formula holds. This case was
found by Hitsuda. Let (w
t
t
, w
tt
t
) be a two-dimensional Wiener process and
dene the complex Wiener process by
z
t
= w
t
t
+iw
tt
t
.
It turns out (see Exercise 5) that, for any analytic function f(z), we have
df(z
t
) = f
t
(z
t
) dz
t
, that is,
f(z
t
) = f(0) +
_
t
0
f(z
s
) dz
s
. (2)
We have what would be the usual formula if z
t
were piecewise dieren-
tiable.
We have introduced formal d
1
-dimensional expressions
t
dw
t
+ b
t
dt.
Now we dene rules of operating with them. We assume that while multi-
plying them by constants, adding up, and evaluating their scalar products
the usual algebraic rules of factoring out and combining similar terms are
enforced along with the following multiplication table (which, by the way,
keeps the products of stochastic dierentials in the set of stochastic dier-
entials):
dw
i
t
dw
j
t
=
ij
dt, dw
i
t
dt = (dt)
2
= 0. (3)
A crucial role in the proof of It os formula is played by the following.
2. Lemma. Let
t
,
t
be real-valued processes having stochastic dierentials.
Then
t

t
also has a stochastic dierential, and
d(
t

t
) =
t
d
t
+
t
d
t
+ (d
t
)d
t
.
Proof. Let

t
=
0
+
_
t
0

s
dw
s
+
_
t
0
b
s
ds,
t
=
0
+
_
t
0

s
dw
s
+
_
t
0

b
s
ds,
where
s
and
s
are vector-valued processes and b
s
and

b
s
are real-valued
ones. By the above rules, assuming the summation convention, we can write
190 Chapter 6. It o Stochastic Integral, Sec 5

t
d
t
=
t
(
k
t
dw
k
t
+b
t
dt) =
t

k
t
dw
k
t
+
t
b
t
dt =
t

t
dw
t
+
t
b
t
dt,

t
d
t
=
t

t
dw
t
+
t

b
t
dt, (d
t
)d
t
=
j
t
dw
j
t

k
t
dw
k
t
=
j
t

j
t
dt =
t

t
dt.
Therefore our assertion means that, for all t [0, ) at once, with proba-
bility one,

t
=
0

0
+
_
t
0
(
s

s
+
s

s
) dw
s
+
_
t
0
(
s
b
s
+
s

b
s
+
s

s
) ds. (4)
First, notice that the right-hand side of (4) makes sense because (a.s.)
_
t
0
[
s
b
s
[ ds max
st
[
s
[
_
t
0
[b
s
[ ds < ,
_
t
0
[
s

j
s
[
2
ds max
st
[
s
[
_
t
0
[
j
s
[
2
ds < ,
_
t
0
[
s

s
[ ds
_
t
0
[
s
[
2
ds +
_
t
0
[
s
[
2
ds < .
Next, notice that if d
t
t
=
t
t
dw
t
+ b
t
t
dt and d
tt
t
=
tt
t
dw
t
+ b
tt
t
dt and
(4) holds with
t
,
t
, b
t
and
tt
,
tt
, b
tt
in place of , , b, then it also holds
for
t
+
tt
,
t
+
tt
, b
t
+b
tt
. It follows that we may concentrate only on two
possibilities for d
t
: d
t
=
t
dw
t
and d
t
= b
t
dt. We have the absolutely
similar situation with . Therefore, we have to deal only with four pairs of
d
t
and d
t
. To nish our preparation, we also notice that both sides of (4)
are continuous in t, so that to prove that they coincide with probability one
for all t at once, it suces to prove that they are equal almost surely for
each particular t.
Thus, x t, and rst let d
t
= b
t
dt and d
t
=

b
t
dt. Then (4) follows
from the usual calculus (or is proved as in the following case).
The two cases, (i) d
t
=
t
dw
t
and d
t
=

b
t
dt and (ii) d
t
= b
t
dt and
d
t
=
t
dw
t
, are similar, and we concentrate on (i).
Let 0 = t
m0
t
m1
... t
mk
m
= t be a sequence of partitions of [0, t]
such that max
i
(t
m,i+1
t
mi
) 0 as m . Dene

m
(s) = t
mi
,
m
(s) = t
m,i+1
if s [t
mi
, t
m,i+1
).
Obviously
m
(s),
m
(s) s uniformly on [0, t]. In addition, the formula
ab cd = (a c)d + (b d)a
and Theorem 3.14 show that (a.s.)
Ch 6 Section 5. It os formula 191

t

0

0
=
k
m
1

i=0
(
t
m,i+1

t
m,i+1

t
mi

t
mi
)
=
k
m
1

i=0

t
mi
_
t
m,i+1
t
mi

s
dw
s
+
k
m
1

i=0

t
m,i+1
_
t
m,i+1
t
mi

b
s
ds
=
_
t
0

m
(s)

s
dw
s
+
_
t
0


m
(s)

b
s
ds. (5)
Furthermore, as m , we have (a.s.)

_
t
0


m
(s)

b
s
ds
_
t
0

b
s
ds

sup
st
[

m
(s)

s
[
_
t
0
[

b
s
[ ds 0,
_
t
0
[

m
(s)

s
[
2
(
j
s
)
2
ds sup
st
[

m
(s)

s
[
2
_
t
0
(
j
s
)
2
ds 0,
and the last relation by Theorem 3.5 (iii) implies that
_
t
0

m
(s)

s
dw
s
P

_
t
0

s
dw
s
. (6)
Now by letting m in (5) we get (4) (a.s.) in our particular case.
Thus it only remains to consider the case d
t
=
t
dw
t
, d
t
=
t
dw
t
, and
prove that

t
=
0

0
+
_
t
0
(
s

s
+
s

s
) dw
s
+
_
t
0

s

s
ds. (7)
Notice that we may assume that
0
=
0
= 0, since in the initial reduction
to four cases we could absorb the initial values in the terms with dt.
Now we again use bilinearity and conclude that, since and can
be represented as sums of vector-valued processes each of which has only
one nonidentically zero element, we only have to prove (7) for such simple
vector-valued processes. Furthermore, keeping in mind that each f o can
be approximated by f
n
H
0
(see, for instance, the proof of Theorem 3.14),
we see that we may assume that
j
,
j
H
0
.
In this way we conclude that to prove (7) in the general case, it suces
to prove that, if f, g H
0
,
r
=
_
r
0
f
s
dw
i
s
, and
r
=
_
r
0
g
s
dw
j
s
, then (a.s.)
192 Chapter 6. It o Stochastic Integral, Sec 5

t
=
_
t
0
f
s

s
dw
i
s
+
_
t
0
g
s

s
dw
j
s
+
_
t
0
f
s
g
s

ij
ds. (8)
Remember that t is xed, and without losing generality assume that the
partitions corresponding to f and g coincide and t is one of the partition
points. Let t
0
, t
1
, ... be the common partition with t = t
k
. Next, as above
we take the sequence of partitions dened by t
mi
of [0, t] and again without
loss of generality assume that each t
i
lying in [0, t] belongs to t
mi
: i =
0, 1, .... We use the formula
ab cd = (a c)d + (b d)c + (a c)(b d) (9)
and Theorem 3.14. Fix a q = 0, ..., k 1 and, by default summing up with
respect to those r for which t
q
t
mr
< t
q+1
, write (a.s.)

t
q+1

t
q+1

t
q

t
q
=

(
t
m,r+1

t
m,r+1

t
mr

t
mr
)
=

t
mr
_
t
m,r+1
t
mr
f
s
dw
i
s
+

t
mr
_
t
m,r+1
t
mr
g
s
dw
j
s
+

_
t
m,r+1
t
mr
f
s
dw
i
s
_
t
m,r+1
t
mr
g
s
dw
j
s
=
_
t
q+1
t
q

m
(s)
f
s
dw
i
s
+
+
_
t
q+1
t
q

m
(s)
g
s
dw
j
s
+f
t
q
g
t
q

(w
i
t
m,r+1
w
i
t
mr
)(w
j
t
m,r+1
w
j
t
mr
). (10)
In the expression after the last equality sign the rst two terms converge in
probability to
_
t
q+1
t
q

s
f
s
dw
i
s
,
_
t
q+1
t
q

s
g
s
dw
j
s
respectively, which is proved in the same way as (6). If i = j, the last term
converges in probability to
f
t
q
g
t
q
(t
q+1
t
q
) =
_
t
q+1
t
q
f
s
g
s
ds
by Theorem 2.2.6. Consequently, by letting m in (10) and then adding
up the results for q = 0, ..., k 1, we come to (7) if i = j. For i ,= j one
uses the same argument complemented by the observation that the last
Ch 6 Section 5. It os formula 193
sum in (10) tends to zero in probability, since its mean is zero due to the
independence of w
i
and w
j
, and
E
_

(w
i
t
m,r+1
w
i
t
mr
)(w
j
t
m,r+1
w
j
t
mr
)

2
= Var
_
...

E(w
i
t
m,r+1
w
i
t
mr
)
2
(w
j
t
m,r+1
w
j
t
mr
)
2
=

(t
m,r+1
t
mr
)
2
max
i
(t
m,i+1
t
mi
)t 0.
The lemma is proved.
3. Exercise. Explain why in the treatment of the fourth case one cannot
use a formula similar to (5) in place of (10).
4. Theorem (It os formula). Let a d
1
-dimensional process
t
have stochas-
tic dierential, and let u(x) = u(x
1
, ..., x
d
1
) be a real-valued twice continu-
ously dierentiable function of x R
d
1
. Then u(
t
) has a stochastic dier-
ential, and
du(
t
) = u
x
i (
t
) d
i
t
+ (1/2)u
x
i
x
j (
t
) d
i
t
d
j
t
. (11)
Proof. Let C
2
be the set of all real-valued twice continuously dieren-
tiable function on R
d
1
. We are going to use the fact that for every u C
2
there is a sequence of polynomials u
m
such that u
m
, u
m
x
i
, u
m
x
i
x
j
converge to
u, u
x
i , u
x
i
x
j uniformly on each ball. For such a sequence and any , t, i, j
sup
st
[u
m
x
i
(
t
) u
x
i (
t
)[ + sup
st
[u
m
x
i
x
j
(
t
) u
x
i
x
j (
t
)[ 0,
since each trajectory of
s
, s t, lies in a ball. It follows easily that, if (11)
is true for u
m
, then it is also true for u.
Thus, we only need to prove (11) for polynomials, and to do this it
obviously suces to show that (11) holds for linear function and also for the
product of any two functions u and v for each of which (11) holds.
For linear u formula (11) is obvious. If (11) holds for u and v, then by
Lemma 2
d(u(
t
)v(
t
)) = u(
t
) dv(
t
) +v(
t
) du(
t
) + (du(
t
))dv(
t
)
= [uv
x
i +vu
x
i ](
t
) d
i
t
+ (1/2)[uv
x
i
x
j +vu
x
i
x
j ](
t
) d
i
t
d
i
t
+u
x
i v
x
j (
t
) d
i
t
d
i
t
= (uv)
x
i (
t
) d
i
t
+ (1/2)(uv)
x
i
x
j (
t
) d
i
t
d
i
t
.
The theorem is proved.
194 Chapter 6. It o Stochastic Integral, Sec 5
Itos formula (11) looks very much like Taylors formula with two terms.
Usually one rewrites it in a dierent way. Namely, let d
t
=
t
dw
t
+ b
t
dt,
a = (1/2)
t

t
. Simple manipulations show that (d
i
t
)d
j
t
= 2a
ij
t
dt and
hence
du(
t
) = L
t
u(
t
) dt +

t
u
x
(
t
) dw
t
,
where u
x
= grad u is a column vector and L
t
is the second-order dierential
operator given by
L
t
v(x) = a
ij
t
v
x
i
x
j (x) +b
i
t
v
x
i (x).
In this notation (11) means that for all t (a.s.)
u(
t
) = u(
0
) +
_
t
0
L
s
u(
s
) ds +
_
t
0

s
u
x
(
s
) dw
s
. (12)
5. Exercise. Prove that (2) holds for analytic functions f.
Itos formula leads to extremely important formulas relating the theory
of stochastic integration with the theory of partial dierential equations.
One of them is the following theorem.
6. Theorem. Let
0
be nonrandom, let Q be a domain in R
d
1
, let
0
Q,
let be the rst exit time of
t
from Q, and let u be a function which
is continuous in

Q and has continuous rst and second derivatives in Q.
Assume that
P( < ) = 1, E
_

0
[L
s
u(
s
)[ ds < .
Then
u(
0
) = Eu(

) E
_

0
L
s
u(
s
) ds.
We give no proof to this theorem because it is just a particular result,
and usually when one needs such results it is easier and shorter to prove
what is needed directly instead of trying to nd the corresponding result in
the literature. We will see examples of this in Sec. 7.
Roughly speaking, to prove Theorem 6 one plugs in place of t in (12)
and takes expectations. The main diculties on the way are caused by
the fact that u is not even given in the whole R
d
1
and the expectation of
a stochastic integral does not necessarily exist, let alone equal zero. One
overcomes these diculties by taking smaller domains Q
m
Q, extending
u outside Q
m
, taking even smaller that the rst exit time from Q
m
, and
then passing to the limit.
Ch 6 Section 6. An alternative proof of It os formula 195
6. An alternative proof of It os formula
The approach we have in mind is based on using stopping times and sto-
chastic intervals. It turns out that these tools could be used right from the
beginning, even for dening It o integral. First we briey outline how to do
this, to give the reader one more chance to go through the basics of the the-
ory and also to show a way which is valid for integrals against more general
martingales.
1. Denition. Let = () be a [0, )-valued function on taking only
nitely many values, say t
1
, ..., t
n
0. We say that is a simple stopping
time (relative to T
t
) if : () = t
k
T
t
k
for any k = 1, ..., n. The set of
all simple stopping times is denoted by /.
Below in this section we only use simple stopping times.
2. Exercise*. (i) Prove that simple stopping times are stopping times, and
that : () t T
t
for any t.
(ii) Derive from (i) that if
1
and
2
are simple stopping times, then

1

2
and
1

2
are simple stopping times as well.
3. Lemma. For a real-valued function (), dene the stochastic interval
[
(0, ]] as the set (, t) : , 0 < t () and let be the collection
of all stochastic intervals
[
(0, ]] with running through the set of all simple
stopping times. Finally, for =
[
(0, ]] , dene () = w

. Then is
a random orthogonal measure on with reference measure = P and
E() = 0 for any .
Proof. Let be a simple stopping time and t
1
, ..., t
n
the set of its
values. Then
w

= w
t
1
I
=t
1
+... +w
t
n
I
=t
n
and, since Ew
2
s
< , E
2
(
[
(0, ]]) = Ew
2

< .
Next we will be using the simple fact that, if is a simple stopping time
and the set 0 = t
0
< t
1
< ... < t
n
contains all possible values of , then
w

=
n1

i=0
f
t
i
(w
t
i+1
w
t
i
), =
n1

i=0
f
t
i
(t
i+1
t
i
), (1)
where f
t
:= I
>t
is T
t
-measurable (Exercise 2). Since : () > t
i
T
t
i
and w
t
i+1
w
t
i
is independent of T
t
i
, we have
Ef
t
i
(w
t
i+1
w
t
i
) = Ef
t
i
E(w
t
i+1
w
t
i
) = 0, E(
[
(0, ]]) = 0.
196 Chapter 6. It o Stochastic Integral, Sec 6
Now, let and be simple stopping times, t
1
, ..., t
n
the ordered set of
their values, and
1
=
[
(0, ]] and
2
=
[
(0, ]]. By using (1) we have
E(
1
)(
2
) = Ew

=
n1

i,j=0
Ef
t
i
g
t
j
(w
t
i+1
w
t
i
)(w
t
j+1
w
t
j
),
which, in the same way as in the proofs of Theorem 2.7.3 or Lemma 1.3 used
in other approaches, is shown to be equal to
n1

i=0
Ef
t
i
g
t
i
(t
i+1
t
i
) = E
n1

i=0
I
>t
i
(t
i+1
t
i
) = E .
Since
E =
_

_

0
I
|
(0,]]
|
(0,]]
(, t) P(d)dt = (
1

2
),
the lemma is proved.
From this lemma we derive the following version of Walds identities.
4. Corollary. Let
1
and
2
be simple stopping times. Then Ew
2

1
= E
1
and E(w

1
w

2
)
2
= E[
1

2
[.
Indeed, we get the rst equality from the proof of Lemma 3 by taking
= . To prove the second one, dene =
1

2
, =
1

2
and notice
that
E(w

1
w

2
)
2
= E(w

)
2
= Ew
2

2Ew

+Ew
2

= E E = E( ) = E[
1

2
[.
5. Exercise. Carry over the result of Corollary 4 to all bounded stopping
times.
6. Remark. Lemma 3 and the general Theorem 2.3.13 imply that there
is a stochastic integral operator, say I, dened on L
2
(, ) with values in
L
2
(T, P). Since is a -system of subsets of (0, ), we have L
2
(, ) =
L
2
((), ) due to Theorem 2.3.19.
7. Remark. It turns out that () = T. Indeed, on the one hand the
indicators of the sets
[
(0, ]] generating () are left-continuous and T
t
-
adapted, hence predictable (Exercise 2.8.3). In other words,
[
(0, ]] T
and () T. On the other hand, if A T
s
, s 0, and for n > s we dene

n
= s on A and
n
= n on A, then
n
are simple stopping times and
[
(0,
n
]] = (, t) : 0 < t
n
()
Ch 6 Section 6. An alternative proof of It os formula 197
= (, t) : 0 < t s, A
_
(, t) : 0 < t n, A
c
,
_
n
[
(0,
n
]] = (A(0, s]) (A
c
(0, )) (),
so that (

n
[
(0,
n
]])
c
= A(s, ) (). It follows that the set generating
T is a subset of () and T ().
8. Remark. Remark 7 and the denition of L
2
(, ) imply the somewhat
unexpected result that for every f L
2
(T, ), in particular, f H, there
are simple stopping times
m
i
and constants c
m
i
dened for m = 1, 2, ... and
i = 1, ..., k(m) < such that
E
_

0
[f
t

k(m)

i=1
c
m
i
I
|
(0,
m
i
]]
(t)[
2
dt 0
as m .
9. Exercise. Find simple stopping times
m
i
and constants c
m
i
such that,
for the one-dimensional Wiener process w
t
,
E
_

0
[I
t1
w
t

k(m)

i=1
c
m
i
I
| (0,
m
i
]]
(t)[
2
dt 0
as m .
10. Remark. The operator I from Remark 6 coincides on L
2
(, ) with
the operator of stochastic integration introduced before Remark 1.6. This
follows easily from the uniqueness of continuation and Theorem 2.7, showing
that the old stochastic integral coincides with the new one on the indicators
of
[
(0, ]] and both are equal to w

.
After making sure that we deal with the same objects as in Sec. 5,
we start proving It os formula, allowing ourselves to use everything proved
before Sec. 5. As in Sec. 5, we need only prove Lemma 5.2. Dene
n
(t) =
2
n
[2
n
t].
Due to (5.9) we have
w
i
t
w
j
t
=
_
t
0
w
i

n
(s)
dw
j
s
+
_
t
0
w
j

n
(s)
dw
i
s
+

k=0
(w
i
k+1
2
n
t
w
i
k
2
n
t
)(w
j
k+1
2
n
t
w
j
k
2
n
t
), i, j = 1, ..., d (a.s.). (2)
198 Chapter 6. It o Stochastic Integral, Sec 6
By sending n to innity, from the theorem on quadratic variation of the
Wiener process we get that (a.s.)
w
i
t
w
j
t
=
_
t
0
w
i
s
dw
j
s
+
_
t
0
w
j
s
dw
i
s
+
ij
t, i, j = 1, ..., d. (3)
Furthermore, for , /, , by using the fact that the sets of all
values of , are nite, we obtain that
_

0
w
j

I
<s
dw
i
s
= w
j

(w
i

w
i

) (a.s.).
Hence and from (3) for i, j = 1, ..., d, , /, = we have (a.s.)
w
i

w
j

= (w
i

w
i

)w
j

+ (w
j

w
j

)w
i

+w
i

w
j

=
_

0
w
j

I
<s
dw
i
s
+
_

0
w
i

I
<s
dw
j
s
+
_

0
w
j
s
I
s
dw
i
s
+
_

0
w
i
s
I
s
dw
j
s
+
ij

=
_

0
w
j
s
I
s
dw
i
s
+
_

0
w
i
s
I
s
dw
j
s
+
_

0
I
s
I
s
ds.
By replacing here , by t, t, we conclude that (a.s.)
w
i
t
=
_
t
0
I
s
dw
i
s
, w
j
t
=
_
t
0
I
s
dw
j
s
,
w
i
t
w
j
t
=
_
t
0
w
j
s
I
s
dw
i
s
+
_
t
0
w
i
s
I
s
dw
j
s
+
_
t
0
I
s
I
s
ds. (4)
Next, similarly to our argument about (2) and (3), by replacing w
j
t
with
t and then w
i
t
with t as well, instead of (4) we get
t =
_
t
0
I
s
ds, t =
_
t
0
I
s
ds,
Ch 6 Section 6. An alternative proof of It os formula 199
(t )w
i
t
=
_
t
0
(s )I
s
dw
i
s
+
_
t
0
w
i
s
I
s
ds,
(t )(t ) =
_
t
0
(s )I
s
ds +
_
t
0
(s )I
s
ds.
(5)
To nish the preliminaries, we observe that for each T
0
-measurable ran-
dom variable
0
, obviously

0
w
i
t
=
_
t
0

0
I
s
dw
i
s
, (t )
0
=
_
t
0

0
I
s
ds. (6)
Now we recall the notion of stochastic dierential from before Lemma
5.2, and the multiplication table (5.3). Then we automatically have the
following.
11. Lemma. All the formulas (4), (5), and (6) can be written in one and
the same way: If
t
,
t
are real-valued processes and
d
t
=
t
dw
t
+b
t
dt, d
t
=
t
t
dw
t
+b
t
t
dt, (7)
where all entries of
t
,
t
t
and of b
t
, b
t
t
are indicators of elements of , then
d(
t

t
) =
t
d
t
+
t
d
t
+ (d
t
)(d
t
). (8)
Also notice that since both sides of equality (8) are linear in and in
, equality (8) immediately extends to all processes
t
,
t
satisfying (7) with
functions ,
t
, b, b
t
of class S().
Now we are ready to prove Lemma 5.2, saying that (8) holds true for all
scalar processes
t
,
t
possessing stochastic dierentials. To this end, assume
rst that
t
, b
t
S() and take a sequence of processes
n
, b
n
of class S()
such that (a.s.)
_
T
0
([
t

nt
[
2
+[b
t
b
nt
[) dt 0 T [0, ).
Dene also processes
n
t
, replacing , b in (6) by
n
, b
n
. As is well known, in
probability
200 Chapter 6. It o Stochastic Integral, Sec 6
sup
tT
[[
_
t
0
(
s

ns
) dw
s
[ +[
_
t
0
(b
s
b
ns
) ds[] 0,
sup
sT
[
t

n
t
[ 0 T [0, ). (9)
If necessary, we take a subsequence and we assume that the convergences
in (9) hold almost surely. Then by the dominated convergence theorem we
have (a.s.)
_
T
0
[
t

n
t
[([
t
t
[
2
+[b
t
t
[) dt 0,
_
T
0
[
t
[([
t

nt
[
2
+[b
t
b
nt
[) dt 0,
_
T
0
[
t

t
t

nt

t
t
[ dt
(
_
T
0
[
t

nt
[
2
dt)
1/2
(
_
T
0
[
t
t
[
2
dt)
1/2
0 T [0, ).
This and an argument similar to the one which led us to (9) show that in
the integral form of (8), with
n
t
instead of
t
, we can pass to the limit in
probability and get (8) for the limit process
t
. Of course, after this we x
the process
t
and we carry out a similar limit passage in (8) aecting the
second factor. In this way we get Lemma 5.2 in a straightforward way from
the quite elementary Lemma 11.
7. Examples of applying It os formula
In this section w
t
is a d-dimensional Wiener process.
1. Example. Let be the rst exit time of w
t
from B
R
= x : [x[ < R,
where R > 0 is a number. As we know, is a stopping time. Take
u(x) = (1/d)(R
2
[x[
2
)
and apply It os formula to u(w
t
). Here
t
= w
t
, is the identity matrix,
b = 0, and the corresponding dierential operator L
t
= (1/2). We have
(a.s.)
u(w
t
) = t
_
t
0
(2/d)w
s
dw
s
+ (1/d)R
2
t.
Substitute t in place of t, take expectations, and notice that, since
[w
t
[ R before , we have 0 u(w
t
) (1/d)R
2
and
Ch 6 Section 7. Examples of applying It os formula 201
E
_
t
0
[w
s
[
2
ds R
2
t < .
Then we obtain
Eu(w
t
) = E(t ) + (1/d)R
2
, E(t ) = (1/d)R
2
Eu(w
t
).
It follows in particular that
E(t ) (1/d)R
2
, E (1/d)R
2
, < (a.s.).
Furthermore, by letting t and noticing that on the set < we
obviously have u(w
t
) u(w

) = 0, by the monotone convergence and


dominated convergence theorems we conclude that
E = lim
t
E(t ) = (1/d)R
2
lim
t
Eu(w
t
) = (1/d)R
2
.
Notice that, for d = 1, we have the result which we know already:
E = R
2
. Also notice that if we wanted to use Theorem 5.6, then we would
have to nd a function u such that L
t
u(w
t
) = 1 for s and u(w

) = 0.
In other words, we needed u such that (1/2)u = 1 in B
R
and u = 0 on
B
R
. This is exactly the one we used above. Finally, notice that in order
to apply Theorem 5.6 we have to be sure in advance that P( < ) = 1.
2. Example. Fix > 0 and x
0
R
d
with [x
0
[ > . Let us nd the
probability P that w
t
will ever reach

B

(x
0
) = x : [x x
0
[ .
First nd the probability P
R
that w
t
reaches [x x
0
[ = before
reaching [x x
0
[ = R, where R > [x
0
[. We want to apply Theorem 5.6
and therefore represent the desired probability P
R
as E(w

), where is
the rst exit time of w
t
from x : < [xx
0
[ < R, = 1 on [xx
0
[ =
and = 0 on [x x
0
[ = R. Notice that, owing to Example 1, we have
< (a.s.).
Now it is natural to try to nd a function u such that u = on [xx
0
[ =
[x x
0
[ = R and u = 0 in x : < [x x
0
[ < R. This is natural,
since then P
R
= u(0) by Theorem 5.6. It turns out that an appropriate
function u exists and is given by
u(x) =
_

_
A([x x
0
[
(d2)
R
(d2)
) if d 3,
A(ln [x x
0
[ ln R) if d = 2,
A([x x
0
[ R) if d = 1,
where
202 Chapter 6. It o Stochastic Integral, Sec 7
A =
_

_
(
(d2)
R
(d2)
)
1
if d 3,
(ln ln R)
1
if d = 2,
( R)
1
if d = 1.
Next, since the trajectories of w are continuous and for any T, are
bounded on [0, T], the event that w
t
ever reaches

B

(x
0
) is the union of
nested events, say E
n
that w
t
reaches [x x
0
[ = before reaching
[x x
0
[ = n. Hence P = lim
n
P
n
and
P =
_
_
_

d2
[x
0
[
d2
if d 3,
1 if d 2.
We see that one- and two-dimensional Wiener processes reach any neigh-
borhood of any point with probability one. For d 3 this probability is
strictly less than one, and this leads to the conjecture that [w
t
[ as
t for d 3.
3. Example. Our last example is aimed at proving the conjecture from the
end of Example 2. Fix x
0
,= 0 and take > 0 such that < [x
0
[. Denote by

the rst time w


t
reaches x : [x x
0
[ .
First we prove that

t
:= [w
t

x
0
[
2d
is a bounded martingale. The boundedness of
t
is obvious: 0 <
t

2d
.
To prove that it is a martingale, construct a smooth function f(x) on R
d
such that f(x) = [x x
0
[
2d
for [x x
0
[ . Then
t
= f(w
t

), and by
Itos formula
f(w
t
) = f(0) +
_
t
0
(1/2)f(w
s
) ds +
_
t
0
f
x
(w
s
) dw
s
.
Hence, owing to f(x) = 0, which holds for [x x
0
[ , we have

t
= f(w
t

) = [x
0
[
2d
+
_
t
0
I
s

f
x
(w
s
) dw
s
.
Here the second term and the right-hand side are martingales since [f
x
[ is
bounded. By the theorem on convergence of nonnegative (super)martingales,
lim
t

t
exists with probability one. We certainly have to remember that
this theorem was proved only for discrete time supermartingales. But its
Ch 6 Section 7. Examples of applying It os formula 203
proof is based on Doobs upcrossing inequality, and for continuous super-
martingales this inequality and the convergence theorem are extended with-
out any diculty as in the case of Lemma 1.5.
Now use that
t
is bounded to conclude that
[x
0
[
2d
= E
0
= E
t
= E lim
t

t
= E
1
lim
t
[w
t
x
0
[
d2
I

=
+EI

<

2d
.
By using the result of Example 2 we get that the last expectation is [x
0
[
2d
,
and therefore
E
1
lim
t
[w
t
x
0
[
d2
I

=
= 0,
so that lim
t
[w
t
[ = (a.s.) on the set

= for each > 0. Finally,


P
_
_
=1/m

=
_
= lim
m
P(
1/m
= ) = lim
m
(1 1/[mx
0
[
d2
) = 1,
and lim
t
[w
t
[ = (a.s.) indeed.
4. Exercise. Let d = 2 and take

from Example 3.
(i) Example 2 shows that

< (a.s.). Prove that

(a.s.) as
0, so that the probability that the two-dimensional Wiener process hits
a particular point is zero even though it hits any neigborhood of this point
with probability one.
(ii) Use the method in Example 3 to show that for d = 2 and 0 < < [x
0
[
ln [w
t

x
0
[ = ln [x
0
[ +
_
t
0
I
s

[w
s
x
0
[
2
(w
s
x
0
) dw
s
.
Let 0 here, and by using (i) conclude that [w
t
x
0
[
2
(w
t
x
0
) o and
ln [w
t
x
0
[ = ln [x
0
[ +
_
t
0
[w
s
x
0
[
2
(w
s
x
0
) dw
s
. (1)
(iii) Prove that Eln [w
t
x
0
[ > ln [x
0
[ for t > 0, so that the stochastic
integral in (1) is not a martingale.
204 Chapter 6. It o Stochastic Integral, Sec 8
8. Girsanovs theorem
Itos formula allows one to obtain an extremely important theorem about
change of probability measure. We consider here a d-dimensional Wiener
process (w
t
, T
t
) given on a complete probability space (, T, P) and assume
that the T
t
are complete.
We need the following lemma in which, in particular, we show how one
can do Exercises 3.2.5 and 4.3 by using It os formula.
1. Lemma. Let b o be an R
d
-valued process. Denote

t
=
t
(b) = exp
_
_
t
0
b
s
dw
s
(1/2)
_
t
0
(b
s
)
2
ds
_
= exp
_
d

i=1
_
t
0
b
i
s
dw
i
s
(1/2)
d

i=1
_
t
0
(b
i
s
)
2
ds
_
. (1)
Then
(i) d
t
= b
t

t
dw
t
;
(ii)
t
is a supermartingale;
(iii) if the process b
t
is bounded, then
t
is a martingale and, in partic-
ular, E
t
= 1;
(iv) if T [0, ) and E
T
= 1, then (
t
, T
t
) is a martingale for t
[0, T], and also for any sequence of bounded b
n
o such that
_
T
0
[b
n
s
b
s
[
2
ds
0 (a.s.) we have
E[
T
(b
n
)
T
(b)[ 0. (2)
Proof. Assertion (i) follows at once from Itos formula. To prove (ii)
dene

n
= inft 0 :
_
t
0
[b
s
[
2

2
s
ds n.
Then I
t<
n
b
t

t
H (see the beginning of Sec. 3), and so
_
t
0
I
s<
n
b
s

s
dw
s
is
a martingale. By adding that

t
n
= 1 +
_
t
n
0
b
s

s
dw
s
= 1 +
_
t
0
I
s<
n
b
s

s
dw
s
,
we see that
t
n
is a martingale. Consequently, for t
1
t
2
(a.s.)
Ch 6 Section 8. Girsanovs theorem 205
E(
t
2

n
[T
t
1
) =
t
1

n
.
As n , we have
n
and t
i

n
t
i
, so that by Fatous theorem
(a.s.)
E(
t
2
[T
t
1
)
t
1
.
This proves (ii) and implies that
E exp
_
_
t
0
b
s
dw
s
(1/2)
_
t
0
[b
s
[
2
ds
_
1. (3)
To prove (iii) let [b
s
[ K, where K is a constant, and notice that by
virtue of (3)
E
_
t
0
[b
s
[
2

2
s
ds K
2
E
_
t
0

2
s
ds
= K
2
_
t
0
E
2
s
(2b) exp
_
_
t
0
[b
s
[
2
ds
_
ds K
2
_
t
0
e
K
2
s
ds < .
Hence
_
t
0
b
s

s
dw
s
and
t
= 1 +
_
t
0
b
s

s
dw
s
are martingales.
To prove (iv), rst notice that E
T
(b
n
) = 1 by (iii), E
T
(b) = 1 by the
assumption, and
T
(b
n
)
T
(b) in probability by properties of stochastic
integrals. This implies (2) by Schees theorem. Furthermore, for t T
(a.s.)

t
(b
n
) = E(
T
(b
n
)[T
t
).
Letting n here and using Corollary 3.1.10 lead to a similar equality for
b in place of b
n
, and the martingale property of
t
(b) for t T now follows
from Exercise 3.2.2. The lemma is proved.
2. Remark. Notice again that
t
is a solution of d
t
= b
t

t
dw
t
. We know
that in the usual calculus solutions of df
t
= f
t
dt (that is, exponential
functions) play a very big role. As big a role in stochastic calculus is played
by exponential martingales
t
(b).
Inequality (3) implies the following.
3. Corollary. If b
s
is a bounded process or
_
t
0
[b
s
[
2
ds is bounded, then
Eexp
_
t
0
b
s
dw
s
< .
206 Chapter 6. It o Stochastic Integral, Sec 8
4. Exercise. (i) By following the argument in the proof of Lemma 1 (ii),
prove that if E sup
tT

t
< , then (
t
, T
t
) is a martingale for t [0, T].
(ii) Use the result of (i) to prove that, if p > 1 and N < and E
p

N
for every stopping time T, then (
t
, T
t
) is a martingale for t [0, T].
5. Exercise. Use Holders inequality and Exercise 4 (ii) to prove that if
E exp
_
T
0
c[b
t
[
2
dt <
for a constant c > 1/2, then E
T
(b) = 1.
6. Exercise. By using Exercise 5 and inspecting the inequality
1 = E
T
((1 )b)
_
E
T
(b)

1
_
Eexp
1
2
_
T
0
[b
t
[
2
dt

,
improve the result of Exercise 5 and show that it holds if
lim
0
ln E exp
1
2
_
T
0
[b
t
[
2
dt = 0, (4)
which is true if, for instance, E exp(1/2)
_
T
0
[b
t
[
2
dt < (A. Novikov). It
turns out that condition (4) can be relaxed even further by replacing = 0
with < on the right and lim with lim on the left.
The next lemma treats
t
(b) for complex-valued d-dimensional b
t
. In
this situation we introduce
t
(b) by the same formula (1) and for d-vectors
f = (f
1
, ..., f
d
) with complex entries f
k
denote (f)
2
=

k
f
2
k
.
7. Lemma. If b
t
is a bounded d-dimensional complex-valued process of
class o, then
t
(b) is a (complex-valued) martingale and, in particular,
E
t
(b) = 1 for any t.
Proof. Take t
2
> t
1
0 and A T
t
1
. To prove the lemma it suces to
prove that, if f
t
and g
t
are bounded R
d
-valued processes of class o, then for
all complex z
EI
A
exp
_
_
t
2
0
(f
s
+zg
s
) dw
s
(1/2)
_
t
2
0
(f
s
+zg
s
)
2
ds
_
= EI
A
exp
_
_
t
1
0
(f
s
+zg
s
) dw
s
(1/2)
_
t
1
0
(f
s
+zg
s
)
2
ds
_
. (5)
Ch 6 Section 8. Girsanovs theorem 207
Observe that (5) holds for real z by Lemma 1 (iii). Therefore we will prove
(5) if we prove that both sides are analytic functions of z. In turn to prove
this it suces to show that both sides are continuous and their integrals
along closed bounded paths vanish. Finally, due to the analyticity of the ex-
pressions under expectation signs and Fubinis theorem we only need to show
that, for every R [0, ) and all [z[ R, these expressions are bounded
by a summable function independent of z. This boundedness follows easily
from Corollary 3, boundedness of f, g, and the fact that

exp
_
t
j
0
(f
s
+zg
s
) dw
s

= exp
_
t
j
0
(f
s
+g
s
Re z) dw
s
exp
_
t
j
0
(f
s
+Rg
s
) dw
s
+ exp
_
t
j
0
(f
s
Rg
s
) dw
s
,
where we have used the inequality
e

+e

+e

if [[ [[. The lemma is proved.


8. Theorem (Girsanov). Let T [0, ), and let b be an R
d
-valued process
of class o satisfying
E
T
(b) = 1.
On the measurable space (, T) introduce the measure

P by

P(d) =
T
(b)() P(d).
Then (, T,

P) is a probability space and w
t
is a d-dimensional Wiener
process on (, T,

P) for t T.
Proof. That (, T,

P) is a probability space follows from

P() =
_

T
(b) P(d) = E
T
(b) = 1.
Next denote
t
= w
t

_
t
0
b
s
ds. Since
0
= 0 and
t
is continuous in
t, to prove that
t
is a Wiener process, it suces to show that relative to
(, T,

P) the joint distributions of the increments of the
t
, t T, are the
same as for w
t
relative to (, T, P).
Let 0 t
0
t
1
... t
n
= T. Fix
j
R
d
, j = 0, ..., n 1, and dene
the function
s
as i
j
on [t
j
, t
j+1
), j = 0, ..., n 1. Also denote by

E the
expectation sign relative to

P. By Lemma 7, if b is bounded, then
208 Chapter 6. It o Stochastic Integral, Sec 8

E exp i
n1

j=0

j
(
t
j+1

t
j
) = E exp
_
_
T
0

s
dw
s

_
T
0

s
b
s
ds
_

T
(b)
= E
T
( +b)e
(1/2)
R
T
0
(
s
)
2
ds
= e
(1/2)
R
T
0
(
s
)
2
ds
.
It follows that

E exp i
n1

j=0

j
(
t
j+1

t
j
) = exp
_
(1/2)
n1

j=0
[
j
[
2
(t
j+1
t
j
)
_
. (6)
This proves the theorem if b is bounded. In the general case take a sequence
of bounded b
n
o such that (a.s.)
_
T
0
[b
n
s
b
s
[
2
ds 0 (for instance, cutting
o large values of [b
s
[). Then
E
T
( +b) = lim
n
E
T
( +b
n
),
since by Lemma 1 (iv) and the dominated convergence theorem (remember

s
is imaginary)
E[
T
( +b
n
)
T
( +b)[
= e
(1/2)
R
T
0
[
s
[
2
ds
E

T
(b
n
)e

R
T
0

s
b
n
s
ds

T
(b)e

R
T
0

s
b
s
ds

e
(1/2)
R
T
0
[
s
[
2
ds
_
E[
T
(b
n
)
T
(b)[
+E

R
T
0

s
b
s
ds
e

R
T
0

s
b
n
s
ds

T
(b)
_
0.
This and (6) yield the result in the general case. The theorem is proved.
Girsanovs theorem and the lemmas proved before it have numerous
applications. We discuss only few of them.
From the theory of ODEs it is known that the equation dx
t
= b(t, x
t
) dt
need not have a solution for any bounded Borel b. In contrast with this it
turns out that, for almost any trajectory of the Wiener process, the equation
dx
t
= b(t, x
t
+w
t
) dt does have a solution whenever b is Borel and bounded.
This fact is obtained from the following theorem after replacing x
t
with

t
w
t
.
Ch 6 Section 8. Girsanovs theorem 209
9. Theorem. Let b(t, x) be an R
d
-valued Borel bounded function on (0, )
R
d
. Then there exist a probability space (, T, P), a d-dimensional continu-
ous process
t
and a d-dimensional Wiener process w
t
dened on that space
for t [0, T] such that

t
=
_
t
0
b(s,
s
) ds +w
t
(7)
for all t [0, T] and .
Proof. Take any complete probability space (, T,

P) carrying a d-
dimensional Wiener process, say
t
. Dene
w
t
=
t

_
t
0
b(s,
s
) ds
and on (, T) introduce a new measure P by the formula
P(d) = exp
_
_
T
0
b(s,
s
) d
s
(1/2)
_
T
0
[b(s,
s
)[
2
ds
_

P(d).
Then (, T, P) is a probability space, w
t
is a Wiener process on (, T, P)
for t [0, T], and, by denition,
t
solves (7). The theorem is proved.
The proof of this theorem looks like a trick and usually leaves the reader
unsatised. Indeed rstly, no real method is given such as Picards method of
successive approximations or Eulers method allowing one to nd solutions.
Secondly, the question remains as to whether one can nd solutions on a
given probability space without changing it, so that
t
would be dened
by the Wiener process w
t
and not conversely. Theorem 9 was proved by
I. Girsanov around 1965. Only in 1978 did A. Veretennikov prove that
indeed the solutions can be found on any probability space, and only in
1996 did it become clear that Eulers method allows one to construct the
solutions eectively.
Let us also show the application of Girsanovs theorem to nding
P(max
t1
(w
t
+t) 1),
where w
t
is a one-dimensional Wiener process. Let b = 1 and

P(d) = e
w
t
1/2
P(d).
By Girsanovs theorem w
t
:= w
t
+t is a Wiener process for t [0, 1]. Since
the distributions of Wiener processes in the space of continuous functions
are all the same and are given by Wiener measure, we conclude
210 Chapter 6. It o Stochastic Integral, Sec 8
P(max
t1
(w
t
+t) 1) =
_

I
max
t1
w
t
1
e
w
1
1/2
e
w
1
1/2
P(d)
=
_

I
max
t1
w
t
1
e
w
1
1/2

P(d) = EI
max
t1
w
t
1
e
w
1
1/2
.
Now remember the result of Exercise 2.2.10, which is
P(max
t1
w
t
1, w
1
x) =
_
_
_
P(w
1
2 x) if x 1,
2P(w
1
1) P(w
1
x) if x 1.
Then by using the hint to Exercise 2.2.12, we get
P(max
t1
(w
t
+t) 1) =
_

1
e
x1/2
1

2
e
x
2
/2
dx
+
_
1

e
x1/2
1

2
e
(2x)
2
/2
dx =
1

2e
_

1
(e
x
+e
2x
)e
x
2
/2
dx.
In the following exercise we suggest that the reader derive a particular case of the Burkholder-Davis-Gundy inequalities.

10. Exercise. Let $\tau$ be a bounded stopping time. Then for any real $\lambda$ we have
$$Ee^{\lambda w_\tau-\lambda^2\tau/2}=1.$$
By using Corollary 3, prove that we can differentiate this equality with respect to $\lambda$ as many times as we wish, bringing all derivatives inside the expectation sign. Then, for any integer $k\ge1$, prove that
$$E(a_0w_\tau^{2k}+a_2w_\tau^{2k-2}\tau+a_4w_\tau^{2k-4}\tau^2+\dots+a_{2k}\tau^k)=0,$$
where $a_0,\dots,a_{2k}$ are certain absolute constants (depending on $k$) with $a_0\ne0$ and $a_{2k}\ne0$. Finally, remembering Hölder's inequality, prove that
$$Ew_\tau^{2k}\le NE\tau^k,\qquad E\tau^k\le NEw_\tau^{2k},$$
where the constant $N$ depends only on $k$.
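For a feeling of what the final two-sided bound says, one can compare the two moments by simulation for a concrete bounded stopping time, say the exit time of $(-1,1)$ capped at $T=1$ (a rough sketch of ours in Python/NumPy; the grid discretization of $\tau$ is crude but enough to see the comparability):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n_paths, n_steps, T = 2, 10_000, 1_000, 1.0
dt = T / n_steps

w = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
hit = np.abs(w) >= 1.0
first = np.where(hit.any(axis=1), hit.argmax(axis=1), n_steps - 1)
tau = (first + 1) * dt                        # exit time of (-1,1), capped at T
w_tau = w[np.arange(n_paths), first]

print(np.mean(w_tau ** (2 * k)), np.mean(tau ** k))
# comparable up to a constant N = N(k), in both directions
```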
9. Stochastic Itô equations

A very wide class of continuous stochastic processes can be obtained by modeling various diffusion processes. They are generally characterized by being Markov and by having a local drift and diffusion; that is, they behave near a point $x$ on a time interval $\Delta t$ like $\sigma(x)\Delta w_t+b(x)\Delta t$, where $\sigma(x)$ is the local diffusion coefficient and $b(x)$ is the local drift. A quite satisfactory model of such processes is given by solutions of stochastic Itô equations.

Let $(\Omega,\mathcal{F},P)$ be a complete probability space and $(w_t,\mathcal{F}_t)$ a $d$-dimensional Wiener process given for $t\ge0$. Assume that the $\sigma$-fields $\mathcal{F}_t$ are complete (which is needed, for instance, to define stochastic integrals as continuous $\mathcal{F}_t$-adapted processes). Let $b(t,x)$ and $\sigma(t,x)$ be Borel functions defined on $(0,\infty)\times\mathbb{R}^{d_1}$. We assume that $b$ is $\mathbb{R}^{d_1}$-valued and $\sigma$ takes values in the set of $d_1\times d$ matrices. Finally, assume that there exists a constant $K<\infty$ such that for all $x,y,t$
$$\|\sigma(t,x)\|+|b(t,x)|\le K(1+|x|),$$
$$\|\sigma(t,x)-\sigma(t,y)\|+|b(t,x)-b(t,y)|\le K|x-y|, \qquad (1)$$
where by $\|\sigma\|$ for a matrix $\sigma$ we mean $\big(\sum_{i,j}(\sigma^{ij})^2\big)^{1/2}$.
Take an $\mathcal{F}_0$-measurable $\mathbb{R}^{d_1}$-valued random variable $\xi_0$ and consider the following Itô equation:
$$\xi_t=\xi_0+\int_0^t\sigma(s,\xi_s)\,dw_s+\int_0^t b(s,\xi_s)\,ds,\qquad t\ge0. \qquad (2)$$
By a solution of this equation we mean a continuous $\mathcal{F}_t$-adapted process given for $t\ge0$ and such that (2) holds for all $t\ge0$ at once with probability one. Notice that, for any continuous $\mathcal{F}_t$-adapted process $\xi_t$ given for $t\ge0$, the function $\xi_t(\omega)$ is jointly measurable in $(\omega,t)$, and the functions $\sigma(t,\xi_t)$ and $b(t,\xi_t)$ are jointly measurable in $(\omega,t)$ and $\mathcal{F}_t$-adapted. In addition, $\sigma(t,\xi_t)$ and $b(t,\xi_t)$ are bounded for each $\omega$ on $[0,t]$ for any $t<\infty$. It follows that for such processes $\xi_t$ the right-hand side of (2) makes sense.
In our investigation of the solvability of (2) we use the following lemma, in which $M$ is the set of all finite stopping times.

1. Lemma. (i) Let $\gamma_t$ and $\pi_t$ be continuous nonnegative $\mathcal{F}_t$-adapted processes, $f\in S$, and
$$\gamma_t\le\pi_t+\int_0^t f_s\,dw_s.$$
Let $\pi_t$ be nondecreasing in $t$ and $E\pi_\tau<\infty$ for every $\tau\in M$. Then
$$E\gamma_\tau\le E\pi_\tau\quad\forall\tau\in M.$$
(ii) Let $\gamma_t$ be a continuous nonnegative $\mathcal{F}_t$-adapted process and $E\gamma_\tau\le N$ for all $\tau\in M$, where $N$ is a constant (independent of $\tau$). Then, for every $\varepsilon>0$,
$$P(\sup_t\gamma_t>\varepsilon)\le N/\varepsilon.$$

Proof. (i) Denote
$$\tau_n=\inf\Big\{t\ge0:\int_0^t|f_s|^2\,ds\ge n\Big\}.$$
Then $\tau_n\to\infty$ as $n\to\infty$ and $I_{s<\tau_n}f_s\in H$. Hence, for every $\tau\in M$,
$$\gamma_{\tau\wedge\tau_n}\le\pi_{\tau\wedge\tau_n}+\int_0^{\tau\wedge\tau_n}I_{s<\tau_n}f_s\,dw_s,\qquad E\gamma_{\tau\wedge\tau_n}\le E\pi_\tau.$$
After that, Fatou's theorem proves (i).
(ii) Define
$$\tau=\inf\{t\ge0:\gamma_t\ge\varepsilon\}.$$
Then
$$P(\sup_t\gamma_t>\varepsilon)\le P(\tau<\infty)=\lim_{t\to\infty}P(\tau<t)\le\lim_{t\to\infty}P(\gamma_{\tau\wedge t}\ge\varepsilon)\le\lim_{t\to\infty}\frac1\varepsilon E\gamma_{\tau\wedge t}\le\frac N\varepsilon.$$
The lemma is proved.
2. Theorem. Equation (2) has a solution.

Proof. We apply the usual Picard method of successive approximations. For $n\ge0$ define
$$\xi_t(n+1)=\xi_0+\int_0^t\sigma(s,\xi_s(n))\,dw_s+\int_0^t b(s,\xi_s(n))\,ds,\qquad \xi_t(0)\equiv\xi_0. \qquad (3)$$
Notice that all the processes $\xi_t(n)$ are continuous and $\mathcal{F}_t$-adapted, and our definition makes sense for all $n\ge0$. Define
$$\varphi_t=e^{-N_0t-|\xi_0|},$$
where the constant $N_0\ge1$ will be specified later. We want to show by induction that
$$\sup_{\tau\in M}E\Big\{\varphi_\tau|\xi_\tau(n)|^2+\int_0^\tau\varphi_s|\xi_s(n)|^2\,ds\Big\}<\infty. \qquad (4)$$
For $n=0$ estimate (4) is obvious, since $a^2e^{-|a|}$ is a bounded function and $\int_0^\infty e^{-N_0t}\,dt<\infty$. Next, by Itô's formula we find that
$$d(\varphi_t|\xi_t(n+1)|^2)=|\xi_t(n+1)|^2\,d\varphi_t+\varphi_t\,d|\xi_t(n+1)|^2$$
$$=\varphi_t\big[-N_0|\xi_t(n+1)|^2+2\xi_t(n+1)\cdot b(t,\xi_t(n))+\|\sigma(t,\xi_t(n))\|^2\big]\,dt+2\varphi_t\sigma^*(t,\xi_t(n))\xi_t(n+1)\cdot dw_t.$$
Here, to estimate the expression in the brackets, we use $2ab\le a^2+b^2$ and (1) to find that
$$2\xi_t(n+1)\cdot b(t,\xi_t(n))+\|\sigma(t,\xi_t(n))\|^2\le|\xi_t(n+1)|^2+|b(t,\xi_t(n))|^2+\|\sigma(t,\xi_t(n))\|^2$$
$$\le|\xi_t(n+1)|^2+2K^2(1+|\xi_t(n)|)^2\le|\xi_t(n+1)|^2+4K^2(1+|\xi_t(n)|^2).$$
Hence, for $N_0\ge2$,
$$\varphi_t|\xi_t(n+1)|^2+\int_0^t\varphi_s|\xi_s(n+1)|^2\,ds\le\varphi_0|\xi_0|^2+4K^2\int_0^t(1+|\xi_s(n)|^2)\varphi_s\,ds+2\int_0^t\varphi_s\sigma^*(s,\xi_s(n))\xi_s(n+1)\cdot dw_s.$$
Applying Lemma 1 leads to (4).
Further,
$$d(\varphi_t|\xi_t(n+1)-\xi_t(n)|^2)=\varphi_t\big[-N_0|\xi_t(n+1)-\xi_t(n)|^2+2(\xi_t(n+1)-\xi_t(n))\cdot(b(t,\xi_t(n))-b(t,\xi_t(n-1)))$$
$$+\|\sigma(t,\xi_t(n))-\sigma(t,\xi_t(n-1))\|^2\big]\,dt+2\varphi_t[\sigma(t,\xi_t(n))-\sigma(t,\xi_t(n-1))]^*(\xi_t(n+1)-\xi_t(n))\cdot dw_t.$$
Due to (1) the expression in the brackets is less than
$$-(N_0-1)|\xi_t(n+1)-\xi_t(n)|^2+2K^2|\xi_t(n)-\xi_t(n-1)|^2.$$
Now we make the final choice of $N_0$ and take it equal to $4K^2+2$, so that $N_0\ge2$ as we needed above and $c:=N_0-1$ satisfies $c/2\ge2K^2$. Then we get
$$d(\varphi_t|\xi_t(n+1)-\xi_t(n)|^2)+c\varphi_t|\xi_t(n+1)-\xi_t(n)|^2\,dt\le(c/2)\varphi_t|\xi_t(n)-\xi_t(n-1)|^2\,dt$$
$$+2\varphi_t[\sigma(t,\xi_t(n))-\sigma(t,\xi_t(n-1))]^*(\xi_t(n+1)-\xi_t(n))\cdot dw_t.$$
It follows by Lemma 1 that for any $\tau\in M$
$$E\varphi_\tau|\xi_\tau(n+1)-\xi_\tau(n)|^2+cE\int_0^\tau\varphi_t|\xi_t(n+1)-\xi_t(n)|^2\,dt\le(c/2)E\int_0^\tau\varphi_t|\xi_t(n)-\xi_t(n-1)|^2\,dt. \qquad (5)$$
By iterating (5) we get
$$E\int_0^\tau\varphi_t|\xi_t(n+1)-\xi_t(n)|^2\,dt\le2^{-n}E\int_0^\tau\varphi_t|\xi_t(1)-\xi_t(0)|^2\,dt=:N2^{-n},$$
where $N$ can be taken independent of $\tau$ by (4). Coming back to (5), we now see that
$$E\varphi_\tau|\xi_\tau(n+1)-\xi_\tau(n)|^2\le cN2^{-n},$$
which by Lemma 1 yields
$$P\Big(\sup_{t\ge0}\big(\varphi_t|\xi_t(n+1)-\xi_t(n)|^2\big)\ge n^{-4}\Big)\le n^4cN2^{-n}.$$
By the Borel-Cantelli lemma we conclude that the series
$$\sum_{n=1}^\infty\varphi_t^{1/2}|\xi_t(n+1)-\xi_t(n)|$$
converges uniformly on $[0,\infty)$ with probability one. Obviously this implies that $\xi_t(n)$ converges uniformly on each finite time interval with probability one. Let $\xi_t$ denote the limit. Then, by the dominated convergence theorem (or just because of the uniform convergence to zero of the integrands),
$$\int_0^t\|\sigma(s,\xi_s(n))-\sigma(s,\xi_s)\|^2\,ds+\int_0^t|b(s,\xi_s(n))-b(s,\xi_s)|\,ds\to0$$
(a.s.). Furthermore, $\xi_t$ is continuous in $t$ and $\mathcal{F}_t$-adapted. Therefore, by letting $n\to\infty$ in (3) we obtain (a.s.)
$$\xi_t=\xi_0+\int_0^t\sigma(s,\xi_s)\,dw_s+\int_0^t b(s,\xi_s)\,ds.$$
The theorem is proved.
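The successive approximations (3) are also a perfectly practical computational scheme once the Wiener path is fixed on a time grid. A minimal sketch (Python/NumPy; the names and the test equation are our own), with the stochastic integral replaced by its left-endpoint Itô sums:

```python
import numpy as np

def picard_sde(sigma, b, xi0, T=1.0, n_steps=2_000, n_iter=8, rng=None):
    """Picard iterates xi(n) of (3) along one fixed sampled Wiener path
    (scalar case d = d_1 = 1; sigma, b assumed to satisfy (1))."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    dw = rng.normal(scale=np.sqrt(dt), size=n_steps)   # Wiener increments
    xi = np.full(n_steps + 1, float(xi0))              # xi(0) == xi_0
    for _ in range(n_iter):
        # left-endpoint (Ito) sums of sigma(s, xi_s(n)) dw_s and b(s, xi_s(n)) ds
        incr = sigma(t[:-1], xi[:-1]) * dw + b(t[:-1], xi[:-1]) * dt
        xi = xi0 + np.concatenate([[0.0], np.cumsum(incr)])   # xi(n+1)
    return t, xi

# Example: sigma = 1, b(t,x) = -x; a few iterations already settle the path.
t, xi = picard_sde(lambda t, x: np.ones_like(x), lambda t, x: -x, xi0=1.0)
```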
It is convenient to deduce the uniqueness of solutions to (2) from the following theorem on the continuous dependence of solutions on the initial data.

3. Theorem. Let $\mathcal{F}_0$-measurable $d_1$-vector valued random variables $\xi_0^n$ satisfy $\xi_0^n\to\xi_0$ (a.s.) as $n\to\infty$. Let (a.s.) for $t\ge0$
$$\xi_t=\xi_0+\int_0^t\sigma(s,\xi_s)\,dw_s+\int_0^t b(s,\xi_s)\,ds,$$
$$\xi_t^n=\xi_0^n+\int_0^t\sigma(s,\xi_s^n)\,dw_s+\int_0^t b(s,\xi_s^n)\,ds.$$
Then
$$P\Big(\sup_{t\le T}|\xi_t^n-\xi_t|\ge\varepsilon\Big)\to0$$
as $n\to\infty$ for any $\varepsilon>0$ and $T\in[0,\infty)$.
Proof. Take $N_0$ from the previous proof and denote
$$\varphi_t=\exp\big(-N_0t-\sup_n|\xi_0^n|\big).$$
Notice that $|\xi_0|\le\sup_n|\xi_0^n|$ (a.s.). Also the last sup is finite, and hence $\varphi_t>0$ (a.s.). By Itô's formula
$$d(\varphi_t|\xi_t-\xi_t^n|^2)=\varphi_t\big[-N_0|\xi_t-\xi_t^n|^2+2(\xi_t-\xi_t^n)\cdot(b(t,\xi_t)-b(t,\xi_t^n))$$
$$+\|\sigma(t,\xi_t)-\sigma(t,\xi_t^n)\|^2\big]\,dt+2\varphi_t[\sigma(t,\xi_t)-\sigma(t,\xi_t^n)]^*(\xi_t-\xi_t^n)\cdot dw_t.$$
By following the proof of Theorem 2 we see that the expression in brackets is nonpositive. Hence for any $\tau\in M$
$$E\varphi_\tau|\xi_\tau-\xi_\tau^n|^2\le E\varphi_0|\xi_0-\xi_0^n|^2.$$
Here the random variables $\varphi_0|\xi_0-\xi_0^n|^2$ are bounded by a constant independent of $n$ and tend to zero (a.s.). By the dominated convergence theorem and Lemma 1,
$$P\Big(\sup_{t\ge0}\varphi_t|\xi_t-\xi_t^n|^2>\varepsilon\Big)\le\frac1\varepsilon E\varphi_0|\xi_0-\xi_0^n|^2\to0.$$
Consequently, $\sup_{t\ge0}\varphi_t|\xi_t-\xi_t^n|^2$ converges to zero in probability and, since
$$\sup_{t\le T}|\xi_t-\xi_t^n|^2\le\varphi_T^{-1}\sup_{t\ge0}\varphi_t|\xi_t-\xi_t^n|^2,$$
the random variables $\sup_{t\le T}|\xi_t-\xi_t^n|^2$ converge to zero in probability as well. The theorem is proved.
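Numerically, the continuous dependence just proved (and the Feller property derived from it in Corollary 5 below) shows up as stability of Monte Carlo estimates of $Ef(\xi_t(x))$ under small changes of the starting point $x$. A sketch (Python/NumPy; the coefficients below are our own Lipschitz examples, not from the book):

```python
import numpy as np

def mean_f(x0, f, sigma, b, t=1.0, n_steps=500, n_paths=20_000, seed=0):
    """Monte Carlo estimate of E f(xi_t(x0)) for (2) with xi_0 = x0, Euler scheme."""
    rng = np.random.default_rng(seed)   # same noise for every x0, fair comparison
    dt = t / n_steps
    x = np.full(n_paths, float(x0))
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=n_paths)
        x = x + b(k * dt, x) * dt + sigma(k * dt, x) * dw
    return f(x).mean()

f = lambda x: np.cos(x)
sigma = lambda t, x: 1.0 + 0.1 * np.sin(x)   # Lipschitz, linear growth
b = lambda t, x: -x
for x0 in (0.0, 0.1, 0.2):                   # nearby x give nearby estimates
    print(x0, mean_f(x0, f, sigma, b))
```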
4. Corollary (uniqueness). If $\xi_t$ and $\eta_t$ are two solutions of (2), then
$$P(\sup_{t\ge0}|\xi_t-\eta_t|>0)=0.$$

The following corollary states the so-called Feller property of solutions of (2).

5. Corollary. For $x\in\mathbb{R}^{d_1}$, let $\xi_t(x)$ be a solution of equation (2) with $\xi_0\equiv x$. Then, for every bounded continuous function $f$ and $t\ge0$, the function $Ef(\xi_t(x))$ is a continuous function of $x$.
6. Corollary. In the notation from the proof of Theorem 3,
$$P\Big(\varphi_T\sup_{t\le T}|\xi_t-\xi_t^n|^2>\varepsilon\Big)\le\frac1\varepsilon E\varphi_0|\xi_0-\xi_0^n|^2.$$
10. An example of a stochastic equation

In one-dimensional space we consider the following equation:
$$\xi_t=\int_0^t\sigma(\xi_s)\,dw_s+\int_0^t b(\xi_s)\,ds \qquad (1)$$
with a one-dimensional Wiener process $w_t$ which is Wiener relative to a filtration of complete $\sigma$-fields $\mathcal{F}_t$. We assume that $\sigma$ and $b$ are bounded functions satisfying a Lipschitz condition, so that there exists a unique solution of (1).

Fix $r<0$ and let
$$\tau(r)=\inf\{t\ge0:\xi_t\notin(r,1)\}.$$
By Exercise 2.4 we have that $\tau(r)$ is a stopping time relative to $\mathcal{F}_t$. We want to find $E\tau(r)$. By Itô's formula, for twice continuously differentiable functions $u$ we have
$$u(\xi_t)=u(0)+\int_0^t Lu(\xi_s)\,ds+\int_0^t\sigma(\xi_s)u'(\xi_s)\,dw_s, \qquad (2)$$
where the operator $L$, called the generator of the process $\xi_t$, is given by
$$Lu(x)=a(x)u''(x)+b(x)u'(x),\qquad a=(1/2)\sigma^2.$$
If we can substitute $\tau(r)$ in place of $t$ in (2), take the expectation of both sides, and be sure that the expectation of the stochastic integral vanishes, then we find that
$$u(0)=Eu(\xi_{\tau(r)})-E\int_0^{\tau(r)}Lu(\xi_s)\,ds. \qquad (3)$$
Upon noticing after this that
$$E\tau(r)=E\int_0^{\tau(r)}dt,$$
we arrive at the following way to find $E\tau(r)$: Solve the equation $Lu=-1$ on $(r,1)$ with boundary conditions $u(r)=u(1)=0$ (in order to have $u(\xi_{\tau(r)})=0$); then $E\tau(r)$ should be equal to $u(0)$.
1. Lemma. Let $a(x)\ge\delta$, where $\delta$ is a constant, $\delta>0$. For $x,y\in[r,1]$ define
$$\psi(x)=\exp\Big(-\int_0^x b(s)/a(s)\,ds\Big),\qquad\alpha=\int_r^1\psi(x)\,dx,$$
$$g(x,y)=\frac1{\alpha\,a(y)\psi(y)}\int_{x\vee y}^1\psi(s)\,ds\int_r^{x\wedge y}\psi(s)\,ds.$$
Then, for any continuous function $f(x)$ given on $[r,1]$, the function
$$u(x):=\int_r^1 g(x,y)f(y)\,dy \qquad (4)$$
is twice continuously differentiable on $[r,1]$, vanishes at the end points of this interval, and satisfies $Lu=-f$ on $[r,1]$.

The proof of this lemma is suggested as an exercise, which the reader is supposed to do by solving the equation $au''+bu'=-f$ on $[r,1]$ with boundary conditions $u(r)=u(1)=0$ and then transforming the result into the right-hand side of (4).
2. Theorem. Under the assumptions of Lemma 1, for any Borel nonnegative function $f$ we have
$$E\int_0^{\tau(r)}f(\xi_t)\,dt=\int_r^1 g(0,y)f(y)\,dy. \qquad (5)$$
In particular,
$$E\tau(r)=\int_r^1 g(0,y)\,dy.$$
Proof. A standard measure-theoretic argument shows that it suffices to prove the theorem for nonnegative bounded continuous $f$. For such a function, define $u(x)$ on $[r,1]$ as a solution of the equation $au''+bu'=-f$ on $[r,1]$ with boundary conditions $u(r)=u(1)=0$. Then continue $u$ outside $[r,1]$ to get a twice continuously differentiable function on $\mathbb{R}$ and, keeping the same notation for the continuation, use (2) with $t\wedge\tau(r)$ in place of $t$. After that take expectations, and notice that the expectation of the stochastic integral vanishes since $u'(\xi_s)$ is bounded on $[0,\tau(r)]$. Then we get
$$Eu(\xi_{t\wedge\tau(r)})=u(0)-E\int_0^{t\wedge\tau(r)}f(\xi_s)\,ds. \qquad (6)$$
If we take $f\equiv1$ here, then we see that $E(t\wedge\tau(r))$ is bounded by a constant independent of $t$. It follows by the monotone convergence theorem that $E\tau(r)<\infty$ and $\tau(r)<\infty$ (a.s.). Hence, letting $t\to\infty$ in (6) and noticing that $u(\xi_{t\wedge\tau(r)})\to u(\xi_{\tau(r)})=0$ (a.s.) due to the boundary conditions, by the dominated convergence theorem and the monotone convergence theorem ($f\ge0$) we get
$$u(0)=E\int_0^{\tau(r)}f(\xi_s)\,ds,$$
which is (5) owing to Lemma 1. The theorem is proved.
As a consequence of (5), as in Exercise 2.6.6, one finds that the average time the process $\xi_t$ spends in an interval $[c,d]\subset(r,1)$ before exiting from $(r,1)$ is given by
$$\int_c^d g(0,y)\,dy.$$
The remaining part of this section is aimed at exhibiting a rather unexpected effect which happens when $b\equiv1$ and $a$ is very close to zero for $x<0$ and very large for $x>0$. It turns out that in this situation $E\tau(r)$ is close to 1, and the average time spent by the process in a very small neighborhood of zero before exiting from $(r,1)$ is also close to 1. Hence the process spends almost all of its time near the origin and then immediately jumps out of $(r,1)$. What is unexpected here is that there is the unit drift pushing the particle to the right, and the diffusion is usually expected to spread the particle around this deterministic motion, not to practically stop it. Furthermore, it turns out that the process spends almost all of its time in a small region where the diffusion is small and, remember, $b\equiv1$.

The following exercise makes it natural that if the diffusion on $(-\infty,0)$ is slow, then $E\tau(r)$ is close to 1. Assertion (i) in Exercise 3 looks quite natural because neither a diffusion vanishing for $x\le c$ nor a positive drift can bring our moving particle $\xi_t$, starting at zero, below $c\le0$.
3. Exercise. Assume that $\sigma(x)=0$ for $x\le c$, where $c$ is a constant, $c\le0$, and $b(x)\ge0$ for all $x$.
(i) Prove that $\xi_t\ge c$ for all $t\ge0$ (a.s.).
(ii) Prove that, if $c>r$ and $b\ge\delta$, where $\delta$ is a constant and $\delta>0$, then
$$E\int_0^{\tau(r)}b(\xi_t)\,dt=1$$
and, in particular, if $b\equiv1$, then $E\tau(r)=1$.
4. Exercise*. Let $b\equiv1$. Prove that $E\tau(r)\le1$.
We will be dealing with $\sigma$ depending on $r\in(-1,0)$, which will be sent to zero. Let
$$b\equiv1,\qquad a(x)=r^4\ \text{if }x<0,\qquad a(x)=|r|^{-1}\ \text{if }x>|r|,$$
and let $a$ be linear on $[0,|r|]$ with $a(0)=r^4$ and $a(|r|)=|r|^{-1}$, so that $a$ is a Lipschitz continuous function. Naturally, we take $\sigma=\sqrt{2a}$. Then $\sigma$ is also Lipschitz continuous, and the corresponding process $\xi_t$ is well defined. Now we can make precise what was stated before Exercise 3.
5. Theorem. As $r\uparrow0$ we have
$$E\tau(r)\to1,\qquad E\int_0^{\tau(r)}I_{[r,0]}(\xi_t)\,dt\to1.$$
Proof. Owing to Exercise 4, the first relation follows from the second one, which in turn, by virtue of Theorem 2, can be rewritten as
$$\int_r^0 g(0,y)\,dy\to1. \qquad (7)$$
Next, in the notation of Lemma 1 the integral in (7) equals
$$\alpha^{-1}\int_0^1\psi(s)\,ds\int_r^0\frac1{a(y)\psi(y)}\Big(\int_r^y\psi(s)\,ds\Big)dy$$
$$=\alpha^{-1}\int_0^1\psi(s)\,ds\int_r^0\Big(\int_r^y\psi(s)\,ds\Big)\,d\psi^{-1}(y)=\alpha^{-1}\int_0^1\psi(s)\,ds\Big(\int_r^0\psi(s)\,ds-|r|\Big). \qquad (8)$$
Furthermore, $\psi(s)=e^{|s|/r^4}$ if $s\le0$, whence
$$\int_r^0\psi(s)\,ds=\int_r^0 e^{|s|/r^4}\,ds\to\infty$$
as $r\uparrow0$. To investigate the other terms in (8), observe that, for $s\in[0,|r|]$,
$$\int_0^s a^{-1}(t)\,dt=\int_0^s\frac{r^2}{r^6+(1-|r|^5)t}\,dt=\frac{r^2}{1-|r|^5}\ln\big(1+(r^{-6}-|r|^{-1})s\big)$$
$$\le\frac{r^2}{1-|r|^5}\ln\big(1+(|r|^{-5}-1)\big)=-\frac{5r^2}{1-|r|^5}\ln|r|\to0.$$
For $s\ge|r|$,
$$\int_0^s a^{-1}(t)\,dt\le-\frac{5r^2}{1-|r|^5}\ln|r|+\int_{|r|}^s a^{-1}(t)\,dt=-\frac{5r^2}{1-|r|^5}\ln|r|+(s-|r|)|r|\to0.$$
It follows that
$$\int_0^1\psi(s)\,ds\to1,\qquad\alpha=\int_r^0\psi(s)\,ds+\int_0^1\psi(s)\,ds\sim\int_r^0\psi(s)\,ds,$$
so that indeed the last expression in (8) tends to 1 as $r\uparrow0$. The theorem is proved.
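Both assertions of Theorem 5 are easy to check numerically from (5) by evaluating $g(0,y)$ with the trapezoid rule (a sketch of ours in Python/NumPy; $r=-0.25$ is used because the factor $e^{|s|/r^4}$ overflows double precision for much smaller $|r|$):

```python
import numpy as np

def exit_stats(a, b, r, n=4_000):
    """E tau(r) = int_r^1 g(0,y) dy and its part over [r,0], g as in Lemma 1;
    all integrals computed by the trapezoid rule on a uniform grid."""
    y = np.linspace(r, 1.0, n + 1)
    h = (1.0 - r) / n
    ba = np.array([b(s) / a(s) for s in y])
    B = np.concatenate([[0.0], np.cumsum((ba[1:] + ba[:-1]) * h / 2)])
    i0 = int(round(-r / h))                  # grid index of the point y = 0
    psi = np.exp(-(B - B[i0]))               # psi(y) = exp(-int_0^y b/a ds)
    P = np.concatenate([[0.0], np.cumsum((psi[1:] + psi[:-1]) * h / 2)])
    alpha = P[-1]                            # int_r^1 psi ds
    av = np.array([a(s) for s in y])
    g0 = np.where(y <= 0.0,                  # the two cases y <= 0 and y >= 0
                  (alpha - P[i0]) * P / (alpha * av * psi),
                  P[i0] * (alpha - P) / (alpha * av * psi))
    cell = (g0[1:] + g0[:-1]) * h / 2
    return float(cell.sum()), float(cell[:i0].sum())

r = -0.25                                    # chosen so that -r/h lands on the grid
def a(x):
    if x < 0:
        return r**4
    if x > -r:
        return -1.0 / r
    return r**4 + (-1.0 / r - r**4) * x / (-r)

E_tau, time_near_zero = exit_stats(a, lambda s: 1.0, r)
print(E_tau, time_near_zero)  # E tau(r) <= 1 (Exercise 4); both tend to 1 as r -> 0
```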
11. The Markov property of solutions of stochastic equations

1. Definition. A vector-valued random process $\xi_t$ given for $t\ge0$ is called Markov if for every Borel bounded function $f(x)$, every $n=1,2,\dots$, and all $t,t_1,\dots,t_n$ such that $0\le t_1\le\dots\le t_n\le t$, we have (a.s.)
$$E(f(\xi_t)|\xi_{t_1},\dots,\xi_{t_n})=E(f(\xi_t)|\xi_{t_n}).$$
Remember that $E(f(\xi_t)|\xi_{t_1},\dots,\xi_{t_n})$ was defined as the conditional expectation of $f(\xi_t)$ given the $\sigma$-field $\sigma(\xi_{t_1},\dots,\xi_{t_n})$ generated by $\xi_{t_1},\dots,\xi_{t_n}$. We know that $E(f(\xi_t)|\xi_{t_1},\dots,\xi_{t_n})$ is the best (in the mean-square sense) estimate of $f(\xi_t)$ which can be constructed on the basis of $\xi_{t_1},\dots,\xi_{t_n}$. If we treat this estimate as the prediction of $f(\xi_t)$ on the basis of $\xi_{t_1},\dots,\xi_{t_n}$, then we see that for Markov processes there is no need to remember the past in order to predict the future: remembering the past does not affect our best prediction.

In this section we make the same assumptions as in Sec. 9, and first we try to explain why the solution of equation (9.2) should possess the Markov property.
Let $0\le t_1\le\dots\le t_n$. Obviously, for $t\ge t_n$ the process $\xi_t$ satisfies
$$\xi_t=\xi_{t_n}+\int_{t_n}^t\sigma(s,\xi_s)\,dw_s+\int_{t_n}^t b(s,\xi_s)\,ds.$$
This makes it more or less clear that $\xi_t$ is completely defined by $\xi_{t_n}$ and the increments of $w_s$ after time $t_n$. For $t$ fixed one may write this fact as
$$\xi_t=g(\xi_{t_n},\,w_u-w_v,\,u\ge v\ge t_n),\qquad f(\xi_t)=h(\xi_{t_n},\,w_u-w_v,\,u\ge v\ge t_n).$$
Next, observe that $\xi_t$ is $\mathcal{F}_t$-measurable and $w_u-w_v$ is independent of $\mathcal{F}_t$ for $u\ge v\ge t$ by definition. Then we see that $w_u-w_v$ is independent of $\xi_t$ if $u\ge v\ge t$, and Theorem 3.1.13 seems to imply that
$$E(f(\xi_t)|\xi_{t_1},\dots,\xi_{t_n})=E(h(\xi_{t_n},w_u-w_v,u\ge v\ge t_n)|\xi_{t_1},\dots,\xi_{t_n})$$
$$=Eh(x,w_u-w_v,u\ge v\ge t_n)\big|_{x=\xi_{t_n}} \qquad (1)$$
(a.s.). Since one gets the same result for $E(f(\xi_t)|\xi_{t_n})$, we see that $\xi_t$ is a Markov process. Unfortunately, this very convincing explanation cannot count as a proof, since to apply Theorem 3.1.13 in (1) we have to know that $h(x,w_u-w_v,u\ge v\ge t_n)$ is measurable with respect to $(x,\omega)$. Actually, on the basis of Kolmogorov's theorem for random fields one can prove that $g(x,w_u-w_v,u\ge v\ge t_n)$ has a modification which is continuous in $x$, so that $h(x,w_u-w_v,u\ge v\ge t_n)$ has a modification measurable with respect to $(x,\omega)$. However, we prefer a different way of proving the Markov property, because it is shorter and applicable in many other situations.
Let us fix $x$ and consider the equation
$$\xi_t=x+\int_{t_n}^t\sigma(s,\xi_s)\,dw_s+\int_{t_n}^t b(s,\xi_s)\,ds,\qquad t\ge t_n. \qquad (2)$$
Above we have investigated such equations only with $t_n=0$. The case $t_n\ne0$ is not any different. Therefore, for $t\ge t_n$ equation (2) has a continuous $\mathcal{F}_t$-adapted solution. We denote this solution by $\xi_t(x)$. As in Theorem 9.3, one proves that $\xi_t(x_n)\overset{P}\to\xi_t(x)$ if $x_n\to x$. Among other things, this implies the uniqueness of solutions to (2).
2. Lemma. For any $t\ge t_n$ and $x\in\mathbb{R}^{d_1}$ the random variable $\xi_t(x)$ is measurable with respect to the completion of the $\sigma$-field $\mathcal{G}_t$ generated by $w_s-w_{t_n}$, $s\in[t_n,t]$.
Proof. It is easy to understand that the process $\tilde w_t:=w_{t+t_n}-w_{t_n}$ is a Wiener process and that, by definition, $\mathcal{G}_t=\mathcal{F}^{\tilde w}_{t-t_n}$. Let $\tilde S$ be the set $S$ constructed from $\tilde{\mathcal{F}}^{\tilde w}_t:=(\mathcal{F}^{\tilde w}_t)^P$. It turns out that, for any $\mathbb{R}^{d_1}$-valued process $\eta\in\tilde S$, we have $I_{t\ge t_n}\eta_{t-t_n}\in S$ and (a.s.)
$$\int_0^t\eta_s\,d\tilde w_s=\int_{t_n}^{t_n+t}\eta_{s-t_n}\,dw_s\quad\forall t\ge0. \qquad (3)$$
Here the first statement is obvious, since $\tilde{\mathcal{F}}^{\tilde w}_t\subset\mathcal{F}_{t+t_n}$ and an $\tilde{\mathcal{F}}^{\tilde w}_t$-measurable variable $\eta_t$ is also $\mathcal{F}_{t+t_n}$-measurable. In (3) both sides are continuous in $t$. Therefore, it suffices to prove (3) for each particular $t$. Since the equality is obvious for step functions, a standard argument, applied already at least twice in previous sections, proves (3) in the general case.
Next, by Theorem 9.2 there is an $\tilde{\mathcal{F}}^{\tilde w}_t$-adapted solution of
$$\eta_t=x+\int_0^t\sigma(s+t_n,\eta_s)\,d\tilde w_s+\int_0^t b(s+t_n,\eta_s)\,ds.$$
By virtue of (3) the process $\eta_{t-t_n}$ satisfies (2) for $t\ge t_n$ and is $\mathcal{F}_t$-adapted. It follows from uniqueness that $\xi_t(x)=\eta_{t-t_n}$ (a.s.), and since $\eta_{t-t_n}$ is $\tilde{\mathcal{F}}^{\tilde w}_{t-t_n}$-measurable (that is, measurable with respect to the completion of $\mathcal{G}_t$), the lemma is proved.
3. Lemma. The $\sigma$-fields $\mathcal{G}_t$ and $\mathcal{F}_{t_n}$ are independent. That is, $P(AB)=P(A)P(B)$ for each $A\in\mathcal{G}_t$ and $B\in\mathcal{F}_{t_n}$.
Proof. Let $B\in\mathcal{F}_{t_n}$, let $\Gamma_1,\dots,\Gamma_m\subset\mathbb{R}^{d_1}$ be Borel, and let $0\le s_1\le\dots\le s_m$. By using properties of conditional expectations we find that
$$P\{B,(\tilde w_{s_1},\tilde w_{s_2}-\tilde w_{s_1},\dots,\tilde w_{s_m}-\tilde w_{s_{m-1}})\in\Gamma_1\times\dots\times\Gamma_m\}$$
$$=EI_BI_{\Gamma_1}(w_{t_n+s_1}-w_{t_n})\cdots I_{\Gamma_{m-1}}(w_{t_n+s_{m-1}}-w_{t_n+s_{m-2}})\,E\{I_{\Gamma_m}(w_{t_n+s_m}-w_{t_n+s_{m-1}})|\mathcal{F}_{t_n+s_{m-1}}\}$$
$$=P\{B,(\tilde w_{s_1},\tilde w_{s_2}-\tilde w_{s_1},\dots,\tilde w_{s_{m-1}}-\tilde w_{s_{m-2}})\in\Gamma_1\times\dots\times\Gamma_{m-1}\}\cdot P(\tilde w_{s_m}-\tilde w_{s_{m-1}}\in\Gamma_m)$$
$$=P(B)P\{(\tilde w_{s_1},\tilde w_{s_2}-\tilde w_{s_1},\dots,\tilde w_{s_m}-\tilde w_{s_{m-1}})\in\Gamma_1\times\dots\times\Gamma_m\},$$
where the last equality follows by induction on $m$ and the independence of Wiener increments. Therefore, $P(AB)=P(A)P(B)$ for $A$ from a $\pi$-system generating
$$\sigma(\tilde w_{s_1},\tilde w_{s_2}-\tilde w_{s_1},\dots,\tilde w_{s_m}-\tilde w_{s_{m-1}})=\sigma(\tilde w_{s_1},\tilde w_{s_2},\dots,\tilde w_{s_m}).$$
Since both sides of $P(AB)=P(A)P(B)$ are measures in $A$, they coincide on this $\sigma$-field. Now $P(AB)=P(A)P(B)$ holds for any $A$ of the type
$$\{\omega:(\tilde w_{s_1},\tilde w_{s_2},\dots,\tilde w_{s_m})\in\Gamma_1\times\dots\times\Gamma_m\}.$$
The collection of those sets is again a $\pi$-system, this time generating $\mathcal{G}_t$. Since both sides of $P(AB)=P(A)P(B)$ are measures, they coincide for all $A\in\mathcal{G}_t$. The lemma is proved.
Lemmas 2 and 3 imply the following.

4. Corollary. For $t\ge t_n$ and $x\in\mathbb{R}^{d_1}$ the random vector $\xi_t(x)$ and the $\sigma$-field $\mathcal{F}_{t_n}$ are independent.

In the following lemma we use the notation $[x]=([x^1],\dots,[x^{d_1}])$ for $x=(x^1,\dots,x^{d_1})$.
5. Lemma. Let $\kappa_m=2^{-m}[2^m\xi_{t_n}]$, where $\xi_t$ is the solution of (9.2). Then $\xi_t(\kappa_m)\overset P\to\xi_t$ as $m\to\infty$ for each $t\ge t_n$.

Proof. On the set $\{\omega:\kappa_m=x\}$ the process $\xi_t(\kappa_m)$ coincides (a.s.) for $t\ge t_n$ with $\xi_t(x)$, which satisfies equation (2). Since the union of such sets is $\Omega$, the process $\xi_t(\kappa_m)$ satisfies equation (2) (a.s.) with $x$ replaced by $\kappa_m$. We have already noticed above that $\xi_t$ for $t\ge t_n$ satisfies (2) with $\xi_{t_n}$ in place of $x$. By noticing that $\kappa_m\to\xi_{t_n}$ uniformly in $\omega$, we get the result as in Theorem 9.3. The lemma is proved.
6. Theorem. The solution of equation (9.2) is a Markov process.

Proof. Take $t\ge t_n$ and a bounded continuous function $f(x)\ge0$. Define
$$\phi(x)=Ef(\xi_t(x))$$
and let $\Lambda_m$ be the countable set of all values of $2^{-m}[2^mx]$, $x\in\mathbb{R}^{d_1}$. Since $\xi_t(x)$ is continuous in $x$ in probability, the function $\phi$ is continuous. Therefore, for $B\in\mathcal{F}_{t_n}$, by Corollary 4 and Lemma 5 we obtain
$$EI_Bf(\xi_t)=\lim_m EI_Bf(\xi_t(\kappa_m))=\lim_m\sum_{r\in\Lambda_m}EI_Bf(\xi_t(r))I_{\kappa_m=r}$$
$$=\lim_m\sum_{r\in\Lambda_m}EI_{B,\kappa_m=r}\,\phi(r)=\lim_m EI_B\phi(\kappa_m)=EI_B\phi(\xi_{t_n}).$$
By the definition and properties of conditional expectations this yields (a.s.)
$$E(f(\xi_t)|\mathcal{F}_{t_n})=\phi(\xi_{t_n}),$$
$$E(f(\xi_t)|\xi_{t_1},\dots,\xi_{t_n})=E\{E(f(\xi_t)|\mathcal{F}_{t_n})|\xi_{t_1},\dots,\xi_{t_n}\}=E(\phi(\xi_{t_n})|\xi_{t_1},\dots,\xi_{t_n})=\phi(\xi_{t_n}),$$
$$E(f(\xi_t)|\xi_{t_1},\dots,\xi_{t_n})=E(f(\xi_t)|\xi_{t_n}).$$
It remains to extend the last equality to all Borel bounded $f$. Again fix a $B\in\mathcal{F}_{t_n}$ and consider the two measures
$$\mu(\Gamma)=P(B,\xi_t\in\Gamma),\qquad\nu(\Gamma)=EI_\Gamma(\xi_t)P(B|\xi_{t_n}).$$
If $f$ is a step function, one easily checks that
$$\int f(x)\,\mu(dx)=EI_Bf(\xi_t),\qquad\int f(x)\,\nu(dx)=Ef(\xi_t)E(I_B|\xi_{t_n})=EI_BE(f(\xi_t)|\xi_{t_n}).$$
These equalities actually hold for all Borel bounded functions, as one can see upon remembering that such functions can be approximated uniformly by step functions. Hence, what we have proved above means that
$$\int f\,\mu(dx)=\int f\,\nu(dx)$$
for all bounded continuous $f\ge0$. We know that in this case the measures $\mu$ and $\nu$ coincide. Then the integrals against them also coincide, so that for any Borel bounded $f$ and $B\in\mathcal{F}_{t_n}$ we have
$$EI_Bf(\xi_t)=EI_BE(f(\xi_t)|\xi_{t_n}).$$
This yields $E(f(\xi_t)|\mathcal{F}_{t_n})=E(f(\xi_t)|\xi_{t_n})$. The theorem is proved.
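The identity $EI_Bf(\xi_t)=EI_B\phi(\xi_{t_n})$ from the proof can be tested by simulation even for a past-dependent event $B$: estimate $\phi$ on a grid by independent restarts and compare the two sides. A sketch of ours in Python/NumPy for a one-dimensional example (the Euler chain is itself Markov on the grid, so the two numbers agree up to Monte Carlo and interpolation error):

```python
import numpy as np

rng = np.random.default_rng(4)
dt, n1, n2, n_paths = 1 / 500, 250, 250, 100_000   # t_n = 0.5, t = 1.0
sigma = lambda x: 1.0 + 0.1 * np.sin(x)            # Lipschitz coefficients
b = lambda x: -x
f = np.cos

def euler(x, k):
    for _ in range(k):
        x = x + b(x) * dt + sigma(x) * rng.normal(scale=np.sqrt(dt), size=x.shape)
    return x

x = euler(np.zeros(n_paths), n1 // 2)   # up to t_1 = 0.25
B = x > 0                               # an event from the past; B is in F_{t_n}
x_tn = euler(x, n1 - n1 // 2)           # on to t_n
x_t = euler(x_tn, n2)                   # on to t

grid = np.linspace(-3.0, 3.0, 41)       # phi(x) = E f(xi_t(x)) by fresh restarts
phi = np.array([f(euler(np.full(10_000, g), n2)).mean() for g in grid])

print((B * f(x_t)).mean(), (B * np.interp(x_tn, grid, phi)).mean())
```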
12. Hints to exercises

2.2 If $\xi_t$ is right continuous, then $\xi_t=\lim_n\xi_{\kappa_n(t)}$, where $\kappa_n(t)=2^{-n}[2^nt]+2^{-n}$.

2.4 If $a=-1$ and $b=1$, then for every $t\ge0$
$$\{\omega:\tau>t\}=\{\omega:\sup_{r\in Q,\,r\le t}\xi_r^2<1\},$$
where $Q$ is the set of all rational numbers on $[0,\infty)$.

2.9 Define $\tau=\inf\{t\ge0:|\xi_t|\ge c\}$ and use Chebyshev's inequality.

2.10 In Theorem 2.8 and Exercise 2.9 put $N=c^2$ and integrate with respect to $c$ over $(0,\infty)$.

3.6 Use Exercise 2.9.

3.11 Use Davis's inequality.

3.12 See the proof of Theorem 2.4.1 in order to get that $\tau<1$ and $(1-s)^{-1}I_{s<\tau}\in S$. Then prove that, for each $t\ge0$, on the set $\{\tau\le t\}$ we have (a.s.)
$$\int_0^t\frac1{1-s}I_{s<\tau}\,dw_s=\int_0^\tau\frac1{1-s}\,dw_s.$$

3.13 Use Davis's inequality.

4.3 Use Exercise 3.2.5 and Fatou's theorem for conditional expectations.

6.2 In (i) consider $\tau\wedge t$.

6.5 Approximate stopping times by simple ones and use Bachelier's theorem.

7.4 In (iii) use the fact that
$$\int_0^ns^{-1}e^{-|x|^2/(2s)}\,ds-\int_0^ns^{-1}e^{-1/(2s)}\,ds=\int_n^{n/|x|^2}s^{-1}e^{-1/(2s)}\,ds\to-2\ln|x|$$
as $n\to\infty$.

8.4 For appropriate stopping times $\tau_n$, the processes $\rho_{t\wedge\tau_n}$ are martingales on $[0,T]$ and the processes $\rho^p_{t\wedge\tau_n}$ are submartingales. By Doob's inequality conclude that $E\sup_{t\le T}\rho^p_{t\wedge\tau_n}\le N$.

8.10 Observe that for $\lambda=\varepsilon\tau^{-1/2}$ and $r=w_\tau\tau^{-1/2}$ we have
$$\exp(\lambda w_\tau-\lambda^2\tau/2)=\exp(\varepsilon r-\varepsilon^2/2)=:f(r,\varepsilon).$$
Furthermore, Leibniz's rule shows that $f^{(2k)}_\varepsilon(r,0)$ is a polynomial (called a Hermite polynomial) in $r$ of degree $2k$ with nonzero free coefficient.

10.3 In (i) take any smooth decreasing function $u(x)$ such that $u(x)>0$ for $x<c$ and $u(x)=0$ for $x\ge c$, and prove that $u(\xi_t)=\int_0^tu'(\xi_s)b(\xi_s)\,ds$. By comparing the signs of the two sides of this formula, conclude that $u(\xi_t)=0$.

10.4 Observe that $E\xi_{t\wedge\tau(r)}=E(t\wedge\tau(r))$.
Index

$A^c$, 1
$B^o_r(x)$, 13
$\mathcal{B}(X)$, 2, 6
$\mathcal{B}_n$, 15
$C$, 16
$D[0,T]$, 134
$D[0,\infty)$, 134
$E(\xi|\eta)$, 71
$E(\xi|\mathcal{G})$, 71
$F_\xi$, 4
$\mathcal{F}^P$, 2
$\mathcal{F}^w_t$, 52
$\mathcal{F}^\xi_{s,t}$, 146
$\mathcal{F}_\tau$, 83, 89
$H$, 66
$H_0$, 169
$L_p$, 39
$L_2(0,1)$, 47
$L_p$-norm, 39
$M$, 211
$N(a)$, 18
$N(m,R)$, 22
$P(A|\cdot)$, 71
$P\xi^{-1}$, 4
$\mathbb{R}_+$, 144
$S(\mu)$, 39
$S$, 179
$X^n$, 15
$Z^d_n$, 23
$\lambda$-system, 45
$\xi^{-1}(B)$, 3
$\pi$-system, 45
$\rho_t(b)$, 204
$\sigma(C)$, 16
$\sigma$-field, 1
$\sigma$-field generated by, 2, 4
$\sigma(\mathcal{F})$, 2
$\sigma(\xi)$, 4
$\tau_a$, 54
$(0,\tau]]$, 195
$\|\cdot\|_p$, 39
$\|\sigma\|$, 211
adapted functions, 66
almost surely, 3
asymptotically normal sequences, 30
Borel functions, 6
Borel sets, 2, 6
Borel σ-field, 2, 6
cadlag functions, 134
Cauchy process, 143
centered Poisson measure, 156
complete σ-field, 3
completion of a σ-field, 2
complex Wiener process, 189
conditional expectation, 71
continuous process, 16
continuous-time random process, 14
correlation function, 95
covariance function, 22
cylinder σ-field, 16
defining sequence, 39
distribution, 4
Doob's decomposition, 79
Doob's inequality for moments, 85
Doob-Kolmogorov inequality, 84
ergodic process, 125
exchangeable sequence, 119
expectation, 4
exponential martingales, 205
Feller property, 216
filtration of σ-fields, 52, 81
finite dimensional cylinder sets, 16
finite-dimensional distribution, 15
Gaussian process, 22, 104
Gaussian vector, 21, 104
generator of a process, 216
independence, 52, 55
independent processes, 55, 146
infinitely divisible process, 137
invariance principle, 32
invariant event, 122
Itô stochastic integral, 63, 171
Itô's formula, 193
jump measure, 144
Khinchin's formula, 140
Lebesgue σ-field, 3
Lévy measure, 140
Lévy's formula, 140
Markov process, 220
martingale, 78
mean-square differentiability, 118
mean-square integral, 101
measurable space, 1
modification of a process, 20
multidimensional Wiener process, 186
multiplicative decomposition, 79
normal correlation theorem, 76
normal vectors, 21
number of upcrossings, 87
Ornstein-Uhlenbeck process, 107
Parseval's equality, 49
path, 14
Poisson process, 41, 42, 143
Polish space, 6
positive definite function, 95
predictable functions, 61
probability measure, 2
probability space, 2
processes bounded in probability, 132
random field, 14
random orthogonal measure, 40
random process, 14
random sequence, 14
random spectral measure, 105
random variable, 4
reference measure, 40
regular measure, 7
relatively weakly compact family, 10
reverse martingale, 80
scalar product, 39
Scheffé's theorem, 89
second-order stationary process, 95
self-similarity, 31, 50
simple stopping time, 195
spectral density, 98
spectral measure, 98
spectral representation, 105
stable processes, 58
standard random orthogonal measure, 108
stationary process, 119
step function, 39
stochastic differential, 188
stochastic integral, 44
stochastic interval, 195
stochastically continuous process, 132
stopped sequences, 82
stopping time, 54, 81
submartingale, 78
supermartingale, 78
time homogeneous process, 137
trajectory, 14
Wald's distribution, 57
Wald's identity, 177, 184
weak convergence, 10
white noise, 110
Wiener measure, 30
Wiener process, 52
Wiener process relative to a filtration, 52
Selected Titles in This Series
(Continued from the front of this publication)
3 William W. Adams and Philippe Loustaunau, An introduction to Gröbner bases, 1994
2 Jack Graver, Brigitte Servatius, and Herman Servatius, Combinatorial rigidity,
1993
1 Ethan Akin, The general topology of dynamical systems, 1993