Lecture 2006
GEORG LINDGREN
October 2006
Faculty of Engineering
Centre for Mathematical Sciences
Mathematical Statistics
Contents
Foreword vii
2 Stochastic analysis 27
2.1 Quadratic mean properties . . . . . . . . . . . . . . . . . . . . . 27
2.2 Sample function continuity . . . . . . . . . . . . . . . . . . . . . 28
2.2.1 Countable and uncountable events . . . . . . . . . . . . . 28
2.2.2 Conditions for sample function continuity . . . . . . . . . 30
2.2.3 Probability measures on C[0, 1] . . . . . . . . . . . . . . . 37
2.3 Derivatives, tangents, and other characteristics . . . . . . . . . . 37
2.3.1 Differentiability . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.2 Jump discontinuities and Hölder conditions . . . . . . . . 40
2.4 Quadratic mean properties a second time . . . . . . . . . . . . . 45
2.4.1 Quadratic mean continuity . . . . . . . . . . . . . . . . . 45
2.4.2 Quadratic mean differentiability . . . . . . . . . . . . . . 46
2.4.3 Higher order derivatives and their correlations . . . . . . 48
2.5 Summary of smoothness conditions . . . . . . . . . . . . . . . . . 49
2.6 Stochastic integration . . . . . . . . . . . . . . . . . . . . . . . . 49
2.7 An ergodic result . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3 Crossings 57
3.1 Level crossings and Rice’s formula . . . . . . . . . . . . . . . . . 57
3.1.1 Level crossings . . . . . . . . . . . . . . . . . . . . . . . . 57
3.1.2 Rice’s formula for absolutely continuous processes . . . . 58
3.1.3 Alternative proof of Rice’s formula . . . . . . . . . . . . . 60
3.1.4 Rice’s formula for differentiable Gaussian processes . . . . 62
3.2 Prediction from a random crossing time . . . . . . . . . . . . . . 63
3.2.1 Prediction from upcrossings . . . . . . . . . . . . . . . . . 64
3.2.2 The Slepian model . . . . . . . . . . . . . . . . . . . . . . 67
3.2.3 Excursions and related distributions . . . . . . . . . . . . 73
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Literature 199
Index 202
Foreword
The book Stationary and Related Stochastic Processes [9] appeared in 1967.
Written by Harald Cramér and M.R. Leadbetter, it drastically changed the
life of PhD students in Mathematical statistics with an interest in stochastic
processes and their applications, as well as that of students in many other fields
of science and engineering. Through that book, they gained access to tools and results for
stationary stochastic processes that until then had been available only in rather
advanced mathematical textbooks, or through specialized statistical journals.
The impact of the book can be judged from the fact that still in 1999, after
more than thirty years, it is a standard reference to stationary processes in PhD
theses and research articles.
Unfortunately, the book appeared only in a first edition and it has long been out of print. Even though many of the more specialized results in the book have now been superseded by more general results, and simpler proofs have been found for some of the statements, the general attitude in the book makes it
enjoyable reading both for the student and for the teacher. It will remain a
definite source of reference for many standard results on sample function and
crossings properties of continuous time processes, in particular in the Gaussian
case.
These lecture notes are the result of a series of PhD courses on stationary stochastic processes held at the Department of Mathematical Statistics, Lund University, over a number of years, all based on and inspired by the book by Cramér and Leadbetter. The aim of the notes is to provide a reasonably condensed presentation of sample function properties, limit theorems, and representation theorems for stationary processes, in the spirit of
[9]. It must be said, however, that they represent only a selection of the material, and the reader who has taken an interest in the present course should take the time to read the original.
Even if the Cramér and Leadbetter book is the basic source of inspiration,
other texts have influenced these notes. The most important of these is the
now reprinted book on Probability [5] by Leo Breiman. The Ergodic chapter is
a mixture of the two approaches. The Karhunen-Loève expansion follows the
book by Wong [35]. Finally, the classical memoirs by S.O. Rice [27] have also
been a source of inspiration.
Some knowledge of the mathematical foundations of probability helps while
Chapter 1
Some probability and process background
Figure 1.1: Overview of the three types of worlds in which our processes live.
During the experiment one can record the time evolution of a number of
things, such as rudder angle, which we call {x(t), t ∈ R}, ship head angle,
called {y(t), t ∈ R}, and roll angle {z(t), t ∈ R}. Each observed function is an
observation of a continuous random process. In the figure, the randomness is
indicated by the dependence on the experiment outcome ω. The distributions of the different processes are Px, Py, Pz – we need one probability measure for each of the phenomena we have chosen to observe.¹
In practice, the continuous functions are sampled in discrete time steps, t = t1, . . . , tn, resulting in a finite-dimensional observation vector, (x1, . . . , xn), with an n-dimensional distribution, P_x^{(n)}, etc. This is illustrated in the third box in the figure.
Since we do not always want to specify a finite value for n, the natural
mathematical model for the practical situation is to replace the middle box,
the sample space C of continuous functions, by the sample space R∞ of infinite
sequences of real numbers (x0 , x1 , . . .). This is close, as we shall see later, really
very close, to the finite-dimensional space Rn , and mathematically not much
more complicated.
Warning: Taking the set C of continuous functions as a sample space and
assigning probabilities Px , etc, on it, is not as innocent as it may sound from
the description above. Chapter 2 deals with conditions that guarantee that a
stochastic process is continuous, i.e. has continuous sample functions. In fact,
these conditions are all on the finite-dimensional distributions.
Summary: The abstract sample space Ω contains everything that can conceiv-
ably happen and is therefore very complex and detailed. Each outcome ω ∈ Ω is
unique, and we need only one comprehensive probability measure P to describe
every outcome of every experiment we can do. An experiment is a way to “observe
the world”.
¹The symbol ω is here used to represent the elementary experimental outcome, a practice that is standard in probability theory. In most parts of this book, ω will stand for (angular) frequency; no confusion should arise from this.
²See Appendix A.
³That is, if a set A is in the family F0, then also the complement A* belongs to F0, etc.
1.2.2 Probabilities
Probabilities are defined for events, i.e. subsets of a sample space Ω. By a
probability measure is meant any function P defined for every event in a field
F0 , such that
0 ≤ P (A) ≤ 1, P (∅) = 0, P (Ω) = 1,
and such that, first of all, for any finite number of disjoint events A1, . . . , An in F0 one has
P(∪_{k=1}^n A_k) = Σ_{k=1}^n P(A_k). (1.1)
That is, probabilities are finitely additive. In order to deal with limiting events and with infinity, they are also required to be countably additive, i.e. equation (1.1) holds for infinitely many disjoint events, i.e. it holds with n = ∞,
P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k)
for some distribution function F . By additivity and the property of fields, one
then also assigns probability to the field F0 of finite unions of intervals.
A natural question to ask is whether this also assigns probabilities to the events in the σ-field F generated by F0. In fact, it does, and in a unique way:
For outcomes such that f(y) = 0, ϕ(y) can be defined arbitrarily. We write E(x | y) = ϕ(y). It satisfies
E(x) = E(ϕ(y)) = E(E(x | y)) = ∫_y ϕ(y)f(y) dy, (1.5)
Theorem 1:1 The best predictor of x given y in least squares sense is given by ϕ(y), i.e.
E((x − ϕ(y))²) ≤ E((x − ψ(y))²)
for every function ψ(y).
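A small numerical sketch of Theorem 1:1 (the bivariate normal pair and the competing predictor ψ below are illustrative assumptions): for jointly normal (x, y) with unit variances the conditional mean is ϕ(y) = ρy, and its mean square error is smaller than that of any other function of y.

import numpy as np

# Sketch: for a bivariate normal pair (x, y) with zero means, unit variances
# and correlation rho, the conditional mean is E(x | y) = rho * y.
# The data and the competitor psi are illustrative assumptions.
rng = np.random.default_rng(0)
rho = 0.7
n = 200_000
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)

phi = rho * y          # the conditional expectation E(x | y)
psi = 0.9 * y          # any other predictor based on y

print("E(x - phi(y))^2 =", np.mean((x - phi) ** 2))   # approx 1 - rho^2 = 0.51
print("E(x - psi(y))^2 =", np.mean((x - psi) ** 2))   # larger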
T × Ω ∋ (t, ω) → x(t, ω) ∈ R,
such that for fixed t = t0, x(t0, ·) is a random variable, i.e. a Borel measurable function, Ω ∋ ω → x(t0, ω) ∈ R, and for fixed ω = ω0, x(·, ω0) is a function T ∋ t → x(t, ω0) ∈ R.
The family {F_{t_n}}_{n=1}^∞ of finite-dimensional distributions is the family of distribution functions
F(a1, . . . , an; t1, . . . , tn) = Prob(x1 ≤ a1, . . . , xn ≤ an); n = 1, 2, . . . ; tj ∈ T.
The finite-dimensional distributions in {F_{t_n}}_{n=1}^∞ of a stochastic process satisfy some trivial conditions to make sure they are consistent with each other, of the type
F(a1, a2; t1, t2) = F(a2, a1; t2, t1),
F(a1, ∞; t1, t2) = F(a1; t1).
By this definition we have the following concepts at our disposal in the three
scenes from Section 1.1:
sample space              events            probability
abstract space: Ω         σ-field F         P
continuous functions: C   ???               ???
real sequences: R∞        ???               ???
real vectors: Rn          Borel sets: Bn    P_{t_n} from finite-dimensional distribution functions F_{t_n}
real line: R              Borel sets: B     P from a distribution function F
In the table, the ??? indicate what we yet have to define – or even show
existence of – to reach beyond the elementary probability theory, and into the
world of stochastic processes.
{y = (x1, x2, . . .) ∈ R∞; (x1, x2, . . . , xn) ∈ Bn} = Bn × R × R × . . . = Bn × R∞.
B∞ = σ(∪_{n=1}^∞ (Bn × R∞)).
A particularly simple form of rectangle is the interval, which is a set of the form
I = (a1, b1] × (a2, b2] × . . . × (an, bn] × R∞,
where each (aj, bj] is a half-open interval. Thus, the sequence x = (x1, x2, . . .) belongs to the interval I if aj < xj ≤ bj for j = 1, . . . , n.
Sets which are unions of a finite number of intervals will be important later;
they form a field, which we denote I . The σ -field generated by I is exactly
B∞ , i.e.
σ(I) = B∞ .
Probabilities on R∞
The next step is to assign probabilities to the events in B∞ , and this can
be done in either of two ways, from the abstract side or from the observable,
finite-dimensional side:
F = {F_{t_n}}_{n=1}^∞
is given a priori, then one can define probabilities PF for all half-open
n-dimensional intervals in R∞ , by, for n = 1, 2, . . . , taking (cf. (1.2))
more random variables, x1, x2, etc., and also find their distribution by some statistical procedure. There is no serious difficulty in allowing the outcome to be any real number, and in defining probability distributions on R.
When the result of an experiment is a function with continuous parameter,
the situation is more complicated. In principle, all functions of t ∈ T are
potential outcomes, and the sample space of all functions on T is simply too
big to allow any sensible probabilistic structure. There are too many possible
realizations that ask for probability.
Here practice comes to our assistance. In an experiment one can only ob-
serve the values of x(t) at a finite number of times, t1 , t2 , . . . , tn ; with n = ∞
we allow an unlimited series of observations. The construction of processes with
continuous time is built on exactly this fact: the observable events are those
which can be defined by countably many x(tj), j ∈ N, and the probability
measure shall assign probabilities to only such events.
Write R^T for the set of all real-valued functions of t ∈ T. By an interval in R^T is meant any set of functions x(t) which are characterized by finitely many
To show the other inclusion we show that the family of sets with countable basis is a σ-field which contains the intervals, and then it must be at least as large as the smallest σ-field that contains all intervals, namely B^T. First, we note that taking complements still gives a set with countable basis. Then, take a sequence C1, C2, . . . of sets, all with countable basis, and let T1 = {t_1^{(1)}, t_2^{(1)}, . . .}, T2 = {t_1^{(2)}, t_2^{(2)}, . . .}, . . . be the corresponding countable sets of time points, so that
C_j = {x ∈ R^T; (x(t_1^{(j)}), x(t_2^{(j)}), . . .) ∈ B_j}, with B_j ∈ B∞.
Then T = ∪_j T_j is a countable set, T = (t1, t2, . . .), and ∪_{j=1}^∞ C_j is characterized by its values on T.
Example 1:3 Here are some examples of function sets with and without count-
able basis, when T = [0, 1]:
• {x ∈ R^T; lim_{n→∞} x(1/n) exists} ∈ B^T,
• {x ∈ R^T; x is a continuous function} ∉ B^T,
where the function F (ω) is called the spectral distribution function. It is char-
acterized by the properties:
As indicated by the way we write the three properties, F (ω) is defined only up
to an additive constant, and we usually take F (−∞) = 0. The spectral distri-
bution function is then equal to a cumulative distribution function multiplied
by a positive constant, equal to the variance of the process.
If F(ω) is absolutely continuous with F(ω) = ∫_{s=−∞}^ω f(s) ds, then the spectrum is said to be (absolutely) continuous, and f(ω) is the spectral density function; see Section 5.6.2.1 for more discussion of absolute continuity.
The spectral moments are defined as
ωk = ∫_{−∞}^∞ |ω|^k dF(ω).
Note that the odd spectral moments are defined as absolute moments. Since F is symmetric around 0, the signed odd moments are always 0.

Figure 1.2: Processes with narrow band spectrum, moderate width JONSWAP wave spectrum, and low frequency white noise spectrum.
Spectral moments may be finite or infinite. As we shall see in the next chapter, the finiteness of the spectral moments is coupled to the smoothness properties of the process x(t). For example, the process is differentiable (in quadratic mean), see Section 2.1, if ω2 = −r″(0) < ∞, and similarly for higher order derivatives.
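When the spectral density is known numerically, the spectral moments can be approximated by numerical integration. The following sketch in Python uses an illustrative density, not one from the text:

import numpy as np

# Sketch: spectral moments omega_k = int |w|^k f(w) dw for a symmetric
# two-sided spectral density f.  The density below is an illustrative choice.
w = np.linspace(-10, 10, 4001)
f = np.exp(-0.5 * (np.abs(w) - 1.0) ** 2 / 0.3 ** 2)   # assumed density

def spectral_moment(k):
    return np.trapz(np.abs(w) ** k * f, w)

omega0, omega2, omega4 = (spectral_moment(k) for k in (0, 2, 4))
print("omega0 = V(x(t))  :", omega0)
print("omega2 = V(x'(t)) :", omega2)   # finite, so x is q.m. differentiable
print("omega4 = V(x''(t)):", omega4)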
As we shall see in later sections, ω is in a natural way interpreted as an
angular frequency, not to be confused with the elementary event ω in basic
probability theory.
[Figure: panels for ζ = 0.01, ζ = 0.1, and ζ = 0.5.]
the variance of a · ξ is
V(a · ξ) = a′Σa.
If the determinant of Σ is positive, the distribution of ξ is non-singular and has a density
f_ξ(x) = (2π)^{−p/2} (det Σ)^{−1/2} e^{−(1/2)(x−m)′Σ^{−1}(x−m)}.
Σ = Cov((ξ, η); (ξ, η)) = | Σξξ  Σξη |
                          | Σηξ  Σηη |

f_{ξη}(x, y) = (2π)^{−(m+n)/2} (det Σ)^{−1/2} e^{−(1/2)(x−mξ, y−mη) Σ^{−1} (x−mξ, y−mη)′}.
The density of η is
f_η(y) = (2π)^{−m/2} (det Σηη)^{−1/2} e^{−(1/2)(y−mη)′Σηη^{−1}(y−mη)},
and
f_{ξ|η}(x | y) = f_{ηξ}(y, x) / f_η(y),
E(ξ | η = y) = ξ̂(y) = E(ξ) + Cov(ξ, η) Σηη^{−1} (y − E(η)) = mξ + Σξη Σηη^{−1} (y − mη), (1.9)

Σξξ|η = E((ξ − ξ̂(η)) · (ξ − ξ̂(η))′) = Σξξ − Σξη Σηη^{−1} Σηξ. (1.11)
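Formulas (1.9) and (1.11) are a few lines of linear algebra. The following Python sketch uses arbitrary illustrative covariance blocks:

import numpy as np

# Sketch of (1.9) and (1.11): conditional mean and covariance in a
# partitioned normal vector (xi, eta).  The numbers are illustrative.
m_xi, m_eta = np.array([0.0, 0.0]), np.array([1.0])
S_xixi = np.array([[2.0, 0.5], [0.5, 1.0]])
S_xieta = np.array([[0.8], [0.3]])
S_etaeta = np.array([[1.5]])

y = np.array([2.0])                       # observed value of eta
K = S_xieta @ np.linalg.inv(S_etaeta)     # regression coefficient matrix
cond_mean = m_xi + K @ (y - m_eta)        # (1.9)
cond_cov = S_xixi - K @ S_xieta.T         # (1.11)
print(cond_mean)
print(cond_cov)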
φ(x)(1/x − 1/x³) ≤ 1 − Φ(x) ≤ φ(x)/x, (1.13)
for x > 0.
The following asymptotic expansion is useful as x → ∞,
V(w(t + h) − w(t)) = hσ²,
and that therefore Cov(w(s), w(t)) = σ² min(s, t). Since the increments over disjoint intervals are uncorrelated and, by the definition of a normal process, jointly normal, they are also independent.
A characteristic feature of the Wiener process is that its future changes are statistically independent of its current and previous values. It is intuitively clear that a process with this property cannot be differentiable. The increment over a small time interval from t to t + h is of the order √h, which is small enough to make the process continuous, but too large to give a differentiable process.⁵ The sample functions are in fact objects that have fractal dimension, and the process is self similar in the sense that, when magnified with proper scales, it retains its statistical geometrical properties. More precisely, for each a > 0, the process √a · w(t/a) has the same distributions as the original process w(t).
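A minimal simulation sketch of the standard Wiener process (step size, horizon and the checked time points are arbitrary choices), verifying Cov(w(s), w(t)) = min(s, t) and the scaling property of √a · w(t/a):

import numpy as np

# Sketch: simulate standard Wiener paths (sigma = 1) on [0, 1] and check
# Cov(w(s), w(t)) = min(s, t) and the self-similarity w(t) ~ sqrt(a) w(t/a).
rng = np.random.default_rng(2)
n_paths, n_steps = 5000, 1000
dt = 1.0 / n_steps
t = np.arange(1, n_steps + 1) * dt
w = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)

s_idx, t_idx = 299, 699                       # s = 0.3, t = 0.7
print("empirical Cov :", np.mean(w[:, s_idx] * w[:, t_idx]))
print("min(s, t)     :", min(t[s_idx], t[t_idx]))

a = 4.0                                       # compare sqrt(a) w(t/a) with w(t) at t = 0.8
print("V(sqrt(a) w(t/a)) :", a * np.var(w[:, 199]))   # t/a = 0.2
print("V(w(t))           :", np.var(w[:, 799]))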
The Wiener process is commonly used to model phenomena where the local changes are virtually independent. Symbolically, one usually writes dw(t) for the
⁵Chapter 2 gives conditions for continuity and differentiability of sample functions.
V(x(t)) = V(y(t)) = V(z(t)) = (4RT/(Nf)) t = tσ², (1.14)
dv(t) + α v(t) dt = (1/m) dw(t), (1.15)
History
Three years ago the first co-author of the present work collaborated with Weinblum in the writing of a paper entitled “On the motion of ships at sea”. In that paper Lord Rayleigh was quoted as saying: “The basic law of the seaway is the apparent lack of any law”. Having made this quotation, however, the authors then proceed to consider the seaway as being composed of “a regular train of waves defined by simple equations”. This artificial substitution of pattern for chaos was dictated by the necessity of reducing the utterly confused reality to a simple form amenable to mathematical treatment.

Yet at the same time and in other fields the challenging study of confusion was being actively pursued. Thus in 1945 Rice was writing on the mathematical analysis of random noise and in 1949 Tukey and Hamming were writing on the properties of stationary time series and their power spectra in connection with colored noise. In the same year Wiener published his now famous book on time series. These works were written as contributions to the theory of communication. Nevertheless the fundamental mathematical discipline expounded therein can readily be extended to other fields of scientific endeavor. Thus in 1952 the second co-author, inspired by a contribution of Tukey, was able to apply the foregoing theories to the study of actual ocean waves. As the result of analyses of actual wave records, he succeeded in giving not only a logical explanation as to why waves are irregular, but a statement as well of the laws underlying the behavior of a seaway. There is indeed a basic law of the seaway. Contrary to the obvious inference from the quotation of Lord Rayleigh, the seaway can be described mathematically and precisely, albeit in a statistical way.
While Rice's work was in the vein of generally accepted ideas in communication theory, the St Denis and Pierson paper represented a complete revolution in common naval practice. Nevertheless, its treatment of irregular water waves as what is now called a random field was almost immediately accepted, and set a standard for much of naval architecture.
One possible reason for this may be that the authors succeeded in formulating and analyzing, in a rational way, the motions of a ship moving with constant speed through the field. The random sea could directly be used as input to a linear (later also non-linear) filter representing the ship.
St. Denis and Pierson extended the one-dimensional description of a time
dependent process {x(t), t ∈ R}, useful for example to model the waves mea-
sured at a single point, to a random field x(t, (s1 , s2 )) with time and location
parameter (s1 , s2 ). They generalized the sum (1.17) to be a sum of a packet of
directed waves, with ω = (ω, κ1, κ2),
Σ_ω A_ω cos(ωt − κ1 s1 − κ2 s2 + φ_ω). (1.18)

A_ω cos(ωt − κs + φ_ω).
By physical considerations one can derive an explicit relation, called the dis-
persion relation, between wave number κ and frequency ω . If h is the water
depth, then
ω 2 = κg tanh(hκ),
which for infinite depth reduces to ω 2 = κg . Here g is the constant of gravity.
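At finite depth the dispersion relation must be solved numerically for κ; a simple fixed point iteration starting from the deep water value κ = ω²/g converges quickly. The sketch below assumes g = 9.81 m/s² and illustrative wave data:

import math

def wave_number(omega, depth, g=9.81, tol=1e-12):
    """Solve omega^2 = kappa*g*tanh(depth*kappa) for kappa by fixed point iteration."""
    kappa = omega**2 / g                      # deep water starting value
    for _ in range(200):
        new = omega**2 / (g * math.tanh(depth * kappa))
        if abs(new - kappa) < tol:
            break
        kappa = new
    return kappa

# Example: a wave with 8 s period on 20 m water depth (illustrative numbers).
omega = 2 * math.pi / 8.0
print(wave_number(omega, depth=20.0))       # finite depth
print(omega**2 / 9.81)                      # deep water limit kappa = omega^2/g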
In the case of a two-dimensional time dependent Gaussian wave x(t, s1, s2), the elementary wave with frequency ω and direction θ becomes
H0 : m(t) = 0,
H1 : m(t) = s(t).
Exercises
1:1. Consider the sample space Ω = [0, 1] with uniform probability P, i.e. P([a, b]) = b − a, 0 ≤ a ≤ b ≤ 1. Construct a stochastic process y = (x1, x2, . . .) on Ω such that the components are independent zero-one variables, with P(xk = 0) = P(xk = 1). What is the distribution of Σ_{k=1}^∞ xk/2^k?
Chapter 2
Stochastic analysis
Definition 2:1 Let {xn}_{n=1}^∞ be a random sequence, with the random variables x1(ω), x2(ω), . . . defined on the same probability space as a random variable x = x(ω). Then, the convergence xn → x as n → ∞ can be defined in three ways:
• almost surely, with probability one (xn →^{a.s.} x): P({ω; xn → x}) = 1;
• in quadratic mean (xn →^{q.m.} x): E|xn − x|² → 0;
• in probability (xn →^{P} x): for every ε > 0, P(|xn − x| > ε) → 0.
In Appendix B we give several conditions, necessary and sufficient, as well as
only sufficient, for convergence of a random sequence xn . The most useful of
these involve only conditions on the bivariate distributions of xm and xn . We
shall in this chapter examine such conditions for sample function continuity,
differentiability, and integrability. We shall also give conditions which guar-
antee that only simple discontinuities occur. In particular, we shall formulate
conditions in terms of bivariate distributions, which are easily checked for most
standard processes, such as the normal and the Poisson process.
for a proof, see Lemma 2.3, page 47. Since ω2 = −r″(0) = V(x′(t)), the finiteness of ω2 is necessary and sufficient for the existence of a quadratic mean derivative. Analogous relations hold for higher derivatives of order k and the spectral moments ω_{2k} = ∫ ω^{2k} dF(ω). We will give some more details on quadratic mean properties in Section 2.4.
does not have a countable basis, and is not a Borel set, i.e. C ∉ B^T. If {x(t); t ∈ T} is a stochastic process on a probability space (Ω, F, P), then the probability P(C) need not be defined – it depends on the structure of (Ω, F) and on how complicated x is in itself, as a function on Ω. In particular, even if P(C) is
Then y has the same finite-dimensional distributions as x but its sample func-
tions are always discontinuous at τ .
2.2.1.1 Equivalence
In the constructed example, the two processes x and y differ only at a single
point τ , and as we constructed τ to be random with continuous distribution,
we have
P (x(t) = y(t)) = 1, for all t. (2.1)
Two processes x and y which satisfy (2.1) are called equivalent. The sample
paths of two equivalent processes always coincide, with probability one, when observed at a fixed, pre-determined time point. (In the example above, the time
τ where they differed was random.)
2.2.1.2 Separability
The annoying fact that a stochastic process can fail to fulfill some natural
regularity condition, such as continuity, even if it by all natural standards should
be regular, can be partly neutralized by the concept of separability, introduced
by Doob. It uses the approximation by sets with countable basis mentioned
in Section 1.3.3. Loosely speaking, a process {x(t), t ∈ R} is separable in an
interval I if there exists a countable set of t-values T = {tk } ⊂ I such that
the process, with probability one, does not behave more irregularly on I than
it does already on T . An important consequence is that for all t in the interior
of I , there are sequences τ1 < τ2 < . . . τn ↑ t and τ1 > τ2 > . . . τn ↓ t such
that, with probability one,
lim inf_{n→∞} x(τn) = lim inf_{τ↑t} x(τ) ≤ lim sup_{τ↑t} x(τ) = lim sup_{n→∞} x(τn),
with a similar set of relations for the sequence τn . Hence, if the process is
continuous on any discrete set of points then it is continuous. Every process
has an equivalent separable version; see [11].
¹This is where it is necessary that Ω is rich enough so that we can define an independent τ.
Σ_{n=1}^∞ g(2^{−n}) < ∞ and Σ_{n=1}^∞ 2^n q(2^{−n}) < ∞,
then there exists an equivalent stochastic process y(t) whose sample paths are,
with probability one, continuous on [0, 1].
Proof: Start with the process x(t) with given finite-dimensional distributions.
Such a process exists, and what is questioned is whether it has continuous sam-
ple functions if its bivariate distributions satisfy the conditions in the theorem.
We shall now explicitly construct a process y(t), equivalent to x(t), and with
continuous sample paths. Then y(t) will automatically have the same finite-
dimensional distributions as x(t). The process y(t) shall be constructed as the
limit of a sequence of piecewise linear functions xn (t), which have the correct
distribution at the dyadic time points of order n,
t_n^{(k)} = k/2^n, k = 0, 1, . . . , 2^n; n = 1, 2, . . . ,
x_n(t) = x(t), for t = t_n^{(k)}, k = 0, 1, . . . , 2^n,
[Figure: the piecewise linear approximations x_n and x_{n+1} on the dyadic grid, t_n^{(k)} = t_{n+1}^{(2k)}, t_{n+1}^{(2k+1)}, t_n^{(k+1)} = t_{n+1}^{(2k+2)}.]
The tail distribution of the maximal difference between two successive approximations,
M_n^{(k)} = max_{t_n^{(k)} ≤ t ≤ t_n^{(k+1)}} |x_{n+1}(t) − x_n(t)| ≤ (1/2)A + (1/2)B,
satisfies
P(M_n^{(k)} ≥ c) ≤ P(A ≥ c) + P(B ≥ c),
since if M_n^{(k)} ≥ c, then either A ≥ c or B ≥ c, or both.
Now take c = g(2^{−n−1}) and use the bound (2.2) to get
P(max_{0≤t≤1} |x_{n+1}(t) − x_n(t)| ≥ g(2^{−n−1})) = P(∪_{k=0}^{2^n−1} {M_n^{(k)} ≥ g(2^{−n−1})}) ≤ 2^{n+1} q(2^{−n−1}).
Now Σ_n 2^{n+1} q(2^{−n−1}) < ∞ by assumption, and then the Borel-Cantelli lemma (see Exercises in Appendix B) gives that, with probability one, only finitely many of the events
occur. This means that there is a set Ω0 with P(Ω0) = 1, such that for every outcome ω ∈ Ω0, from some integer N (depending on the outcome, N = N(ω)) and onwards (n ≥ N),
First of all, this shows that there exists a limiting function y(t) for all
ω ∈ Ω0 ; the condition (B.4) for almost sure convergence, given in Appendix B,
says that limn→∞ xn (t) exists with probability one.
It also shows that the convergence is uniform: for ω ∈ Ω0 and n ≥ N ,
m > 0,
Letting m → ∞, so that xn+m (t) → y(t), and observing that the inequalities
hold for all t ∈ [0, 1], we get that
max_{0≤t≤1} |y(t) − x_n(t)| ≤ Σ_{j=0}^∞ g(2^{−n−j}) = Σ_{j=n}^∞ g(2^{−j}).
t_n^{(k_n)} ≤ t < t_n^{(k_n)} + 2^{−n}.
Since both g(h) and q(h) are non-decreasing, we have from (2.2),
P(|x(t_n^{(k_n)}) − x(t)| ≥ g(2^{−n})) ≤ P(|x(t_n^{(k_n)}) − x(t)| ≥ g(t − t_n^{(k_n)})) ≤ q(t − t_n^{(k_n)}) ≤ q(2^{−n}),
and it follows from the Borel-Cantelli lemma that it can happen only finitely many times that |x(t_n^{(k_n)}) − x(t)| ≥ g(2^{−n}). Since g(2^{−n}) → 0 as n → ∞, we have proved that x(t_n^{(k_n)}) → x(t) with probability one. Further, since y(t) is continuous, y(t_n^{(k_n)}) → y(t). But x(t_n^{(k_n)}) = y(t_n^{(k_n)}), and therefore the two limits are equal, with probability one, as was to be proved. □
The theorem says that for each process x(t) that satisfies the conditions
there exists at least one other equivalent process y(t) with continuous sample
paths, and with exactly the same finite-dimensional distributions. Of course it
seems unnecessary to start with x(t) and immediately change to an equivalent
continuous process y(t). In the future we assume that we only have the contin-
uous version, whenever the sufficient conditions for sample function continuity
are satisfied.
Theorem 2:1 is simple to use, since it depends only on the distribution of the
increments of the process, and involves only bivariate distributions. For special
processes conditions that put bounds on the moments of the increments are
even simpler to use. One such is the following.
Corollary 2.1 If there exist constants C and r > p > 0 such that, for all small enough h > 0,
E(|x(t + h) − x(t)|^p) ≤ C |h| / |log |h||^{1+r}, (2.3)
then the condition in Theorem 2:1 is satisfied and the process has, with probability one, continuous sample paths.
Note that many processes satisfy a stronger inequality than (2.3), namely
E(|x(t + h) − x(t)|^p) ≤ C|h|^{1+c}, (2.4)
for some constants C, c > 0, p > 0. Then (2.3) is automatically satisfied with any r > p, and the process has, with probability one, continuous sample paths.
Proof: Markov's inequality, a generalization of Chebyshev's inequality, states that for all random variables U, P(|U| ≥ λ) ≤ E(|U|^p)/λ^p. Apply the theorem with g(h) = |log |h||^{−b}, 1 < b < r/p. One gets
P(|x(t + h) − x(t)| > g(h)) ≤ C|h| / |log |h||^{1+r−bp}.
Since b > 1, one has g(2^{−n}) = 1/(n log 2)^b with Σ_n g(2^{−n}) < ∞, and, with 1 + r − bp > 1,
2^n q(2^{−n}) = C / (n log 2)^{1+r−bp}, Σ_n 2^n q(2^{−n}) < ∞,
which proves the assertion. □
Example 2:1 We show that the Wiener process W(t) has, with probability one, continuous sample paths. In the standard Wiener process, the increment W(t + h) − W(t), h > 0, is Gaussian with mean 0 and variance h. Thus,
E(|W(t + h) − W(t)|^p) = C|h|^{p/2},
with C = E(|U|^p) < ∞ for a standard normal variable U, giving the moment bound
E(|W(t + h) − W(t)|^4) = C|h|² < |h| / |log |h||^6,
for small h. We see that condition (2.3) in the corollary is satisfied with r = 5 > 4 = p. Condition (2.4) is satisfied with p = 3, c = 1/2.
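A quick numerical check of the moment identity behind Example 2:1 (sample sizes and step sizes are arbitrary choices): E|W(t + h) − W(t)|⁴ = 3h², with C = E(|U|⁴) = 3.

import numpy as np

# Sketch: verify E|W(t+h) - W(t)|^4 = 3 h^2 for the standard Wiener process
# (C = E(|U|^4) = 3 for a standard normal U); the bound |h|/|log|h||^6 in
# Example 2:1 then dominates 3|h|^2 as h -> 0.
rng = np.random.default_rng(3)
for h in (1e-1, 1e-2, 1e-3):
    incr = np.sqrt(h) * rng.standard_normal(1_000_000)
    print(h, np.mean(incr ** 4), 3 * h ** 2)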
which is a function only of the time lag t. Since the increments have variance E(|x(t + h) − x(t)|²) = 2(r(0) − r(h)), if
r(t) = r(0) − O(|t| / |log |t||^q), (2.6)
for some q > 3, then x(t) has³ continuous sample functions.⁴
Theorem 2:3 A stationary Gaussian process x(t) has, with probability one,
continuous sample paths if, for some a > 3, any of the following conditions is
satisfied:
r(t) = r(0) − O(|log |t||^{−a}), as t → 0, (2.9)
∫_0^∞ (log(1 + ω))^a dF(ω) < ∞. (2.10)
³Or rather "is equivalent to a process that has . . . ".
⁴The notation f(x) = g(x) + O(h(x)) as x → 0 means that |(f(x) − g(x))/h(x)| is bounded by some finite constant C as x → 0.
Remark 2:1 The sufficient conditions for sample function continuity given in
the theorems are satisfied for almost all covariance functions that are encoun-
tered in applied probability. But for Gaussian processes, even the weak condition
(2.9) for a > 3, can be relaxed to require only that a > 1, which is very close
to being necessary; see [9, Sect. 9.5].
Gaussian stationary processes which are not continuous necessarily behave very badly, and it can be shown that the sample functions are, with probability one, unbounded in any interval. This was shown by Belyaev [4] but is also a consequence of a theorem by Dobrushin [10]; see also [9, Ch. 9.5].
⁵At this stage you should convince yourself: why is α > 2 impossible?
as conditions similar to those for sample function continuity, but now with
bounds on the second order differences. By pasting together piecewise linear
approximations by means of smooth arcs, one can prove the following theorem;
see [9, Sect. 4.3].
Σ_{n=1}^∞ 2^n g1(2^{−n}) < ∞ and Σ_{n=1}^∞ 2^n q1(2^{−n}) < ∞,
E(|x(t + h) − 2x(t) + x(t − h)|^p) ≤ K|h|^{1+p} / |log |h||^{1+r}, (2.12)
for some constants C , and c > 0, p > 0. Then (2.12) is satisfied, and the
process has, with probability one, continuously differentiable sample paths.
Example 2:3 Condition (2.14) is easy to use for Gaussian processes. Most
covariance functions used in practice have an expansion
r(t) = r(0) − ω2 t²/2 + O(|t|^a),
where a is an integer, either 3 or 4. Then the process is continuously differ-
entiable. Processes with covariance function admitting an expansion r(t) =
r(0) − C|t|α + o(|t|α ) with α < 2 are not differentiable; they are not even dif-
ferentiable in quadratic mean. An example is the Ornstein-Uhlenbeck process
with r(t) = r(0)e−C|t| .
Theorem 2:8 If there are positive constants C , p, r , such that for all s, t with
0 ≤ t < s < t + h ≤ 1,
then the process {x(t); 0 ≤ t ≤ 1} has, with probability one,7 sample functions
with at most jump discontinuities, i.e.
Example 2:4 The Poisson process with intensity λ has independent incre-
ments and hence
E(|x(t + h) − x(s)|² · |x(s) − x(t)|²) = E(|x(t + h) − x(s)|²) · E(|x(s) − x(t)|²) = (λ(t + h − s) + (λ(t + h − s))²) · (λ(s − t) + (λ(s − t))²) ≤ Cλ²h².
since r − ap > 0. The Borel-Cantelli lemma gives that only finitely many of the events
A_n = {max_{0≤k≤2^n−1} |x(t_n^{(k+1)}) − x(t_n^{(k)})| > δ_n}
occur, which means that there exists a random index ν such that for all n > ν,
|x(t_n^{(k+1)}) − x(t_n^{(k)})| ≤ δ_n for all k = 0, 1, . . . , 2^n − 1. (2.19)
Next we estimate the increment from a dyadic point t_n^{(k)} to an arbitrary point t. To that end, take any t ∈ [t_n^{(k)}, t_n^{(k+1)}), and consider its dyadic expansion (α_m = 0 or 1),
t = t_n^{(k)} + Σ_{m=1}^∞ α_m / 2^{n+m}.
Summing all the inequalities (2.19), we obtain that the increment from t_n^{(k)} to t is bounded (for n > ν),
|x(t) − x(t_n^{(k)})| ≤ Σ_{m=1}^∞ δ_{n+m} = δ_{n+1}/(1 − δ). (2.20)
The final estimate relates t + h to the dyadic points. Let ν < ∞ be the random index just found to exist. Then, suppose h < 2^{−ν} and find n, k such that 2^{−n} ≤ h < 2^{−n+1} and k/2^n ≤ t < (k + 1)/2^n. We see that n > ν and
t_n^{(k)} ≤ t < t_n^{(k+1)} < t + h ≤ t_n^{(k+ℓ)},
where ℓ is either 2 or 3. As above, we obtain
|x(t + h) − x(t_n^{(k+1)})| ≤ δ_{n+1}/(1 − δ) + δ_n. (2.21)
Summing the three estimates (2.19)-(2.21), we see that
|x(t + h) − x(t)| ≤ δ_n + δ_{n+1}/(1 − δ) + δ_{n+1}/(1 − δ) + δ_n = (2/(1 − δ))(2^{−n})^a ≤ (2/(1 − δ))h^a,
for 2^{−n} ≤ h < 2^{−ν}. For h ≥ 2^{−ν} it is always true that
|x(t + h) − x(t)| ≤ M ≤ M (h/2^{−ν})^a
for some random M. If we take A = max(M 2^ν, 2/(1 − δ)), we complete the proof by combining the last two inequalities, to obtain |x(t + h) − x(t)| ≤ Ah^a. □
Lemma 2.2 Let x(t) be a stochastic process with continuous sample functions in 0 ≤ t ≤ 1, and let ωx(h) be its (random) continuity modulus, defined by (2.17). Then, to every ε > 0 there is a (deterministic) function ω_ε(h) such that ω_ε(h) ↓ 0 as h ↓ 0, and
Proof: The sample continuity of x(t) says that the continuity modulus tends to 0 as h → 0,
lim_{h→0} P(ωx(h) < c) = 1
for every fixed c > 0. Take a sequence c1 > c2 > . . . > cn ↓ 0. For a given ε > 0 we can find a decreasing sequence hn ↓ 0 such that
2.3.2.3 Tangencies
We start with a theorem due to E.V. Bulinskaya on the non-existence of tangencies at a pre-specified level.
ft(x) ≤ c0 < ∞,
and that x(t) has, with probability one, continuously differentiable sample paths. Then,
a) for any level u, the probability is zero that there exists a t ∈ [0, 1] such that simultaneously x(t) = u and x′(t) = 0, i.e. there exist no points where x(t) has a tangent at the level u in [0, 1],
b) there are only finitely many t ∈ [0, 1] for which x(t) = u.
Proof: a) By assumption, x(t) has continuously differentiable sample paths. We identify the location of those t-values for which x′(t) = 0 and x(t) is close to u. To that end, take an integer n and a constant h > 0, let Hτ be the event
Hτ = {x′(τ) = 0} ∩ {|x(τ) − u| ≤ h},
and define, for k = 1, 2, . . . , n,
Now take a sample function that satisfies the conditions for A_h(k, n) and let ω_{x′} be the continuity modulus of its derivative. For such a sample function,
We now use Lemma 2.2 to bound ω_{x′}. If ω(t) ↓ 0 as t ↓ 0, let B_ω denote the sample functions for which ω_{x′}(t) ≤ ω(t) for all t in [0, 1]. By the lemma, given ε > 0, there exists at least one function ω_ε(t) ↓ 0 such that P(B_{ω_ε}) > 1 − ε/2. For outcomes satisfying (2.23) we use the bound ω_ε, and obtain
P(A_h) ≤ Σ_{k=1}^n P(A_h(k, n) ∩ B_{ω_ε}) + (1 − P(B_{ω_ε})) ≤ Σ_{k=1}^n P(|x(k/n) − u| ≤ h + n^{−1} ω_ε(n^{−1})) + ε/2
Theorem 2:11 A stochastic process x(t) with mean zero is continuous in qua-
dratic mean at t0 if and only if the covariance function r(s, t) is continuous on
the diagonal point s = t = t0 .
Proof: If r(s, t) is continuous at s = t = t0, then
E(|x(t0 + h) − x(t0)|²) = r(t0 + h, t0 + h) − 2r(t0 + h, t0) + r(t0, t0) → 0
as h → 0, so x(t) is continuous in quadratic mean at t0. Conversely, if x(t) is continuous in quadratic mean at t0, then
r(t0 + h, t0 + k) − r(t0, t0) = E((x(t0 + h) − x(t0)) · (x(t0 + k) − x(t0))) + E((x(t0 + h) − x(t0)) · x(t0)) + E(x(t0) · (x(t0 + k) − x(t0))) = e1 + e2 + e3, say.
Here
|e1| ≤ √(E(|x(t0 + h) − x(t0)|²) · E(|x(t0 + k) − x(t0)|²)) → 0,
|e2| ≤ √(E(|x(t0 + h) − x(t0)|²) · E(|x(t0)|²)) → 0,
|e3| ≤ √(E(|x(t0)|²) · E(|x(t0 + k) − x(t0)|²)) → 0,
so r(t0 + h, t0 + k) → r(t0, t0) as h, k → 0. □
Proof: For the "if" part we use the Loève criterion, and show that, if h, k → 0 independently of each other, then

Lemma 2.3 a) lim_{t→0} 2(r(0) − r(t))/t² = ω2 = ∫_{−∞}^∞ ω² dF(ω) ≤ ∞.

c) If r″(0) exists, finite, then ω2 < ∞ and then, by (b), r″(t) exists for all t.

for some θ1, θ2 as h, k → 0. □
Theorem 2:13 The cross-covariance between derivatives x^{(j)}(s) and x^{(k)}(t) of a stationary process {x(t), t ∈ R} is
Cov(x^{(j)}(s), x^{(k)}(t)) = (−1)^j r^{(j+k)}(t − s).
In particular
In Chapter 3 we will need the covariances between the process and its first two derivatives. The covariance matrix for x(t), x′(t), x″(t) is
| ω0    0   −ω2 |
| 0     ω2   0  |
| −ω2   0    ω4 |,
where ω_{2k} = (−1)^k r_x^{(2k)}(0) = ∫ ω^{2k} dF(ω) are spectral moments. Thus, the slope at a specified point is uncorrelated both with the process value at that point and with the curvature, while process value and curvature have negative correlation. We have, for example, V(x″(0) | x(0), x′(0)) = ω4 − ω2²/ω0.
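The matrix above, and the conditional variance V(x″(0) | x(0), x′(0)), follow directly from the spectral moments; the sketch below uses illustrative moment values and the conditioning formula (1.11):

import numpy as np

# Sketch: covariance matrix of (x(0), x'(0), x''(0)) from spectral moments
# (illustrative values) and the conditional variance of the curvature.
w0, w2, w4 = 1.0, 1.3, 3.0
Sigma = np.array([[ w0, 0.0, -w2],
                  [0.0,  w2, 0.0],
                  [-w2, 0.0,  w4]])

# Condition x''(0) on (x(0), x'(0)) using (1.11).
S_aa = Sigma[2:, 2:]            # Var(x''(0))
S_ab = Sigma[2:, :2]            # Cov(x''(0), (x(0), x'(0)))
S_bb = Sigma[:2, :2]
cond_var = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ab.T
print(cond_var[0, 0], w4 - w2**2 / w0)    # the two numbers agree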
where g(t) is a deterministic function and x(t) a stochastic process with mean
0. The integrals can be defined either as quadratic mean limits of approximating
Riemann or Riemann-Stieltjes sums, and depending on the type of convergence
we require, the process x(t) has to satisfy suitable regularity conditions.
The two types of integrals are sufficient for our needs in these notes. A
third type of stochastic integrals, needed for stochastic differential equations,
are those of the form
b
J3 = g(t, x(t)) dx(t),
a
in which g also is random and dependent on the integrator x(t). These will
not be dealt with here.
The integrals are defined as limits in quadratic mean of the approximating sums
J1 = lim_{n→∞} Σ_{k=1}^n g(tk) x(tk)(tk − tk−1),
J2 = lim_{n→∞} Σ_{k=1}^n g(tk)(x(tk) − x(tk−1)).
Theorem 2:14 a) If r(s, t) is continuous in [a, b] × [a, b], and g(t) is such that the Riemann integral
Q1 = ∫∫_{[a,b]×[a,b]} g(s)g(t) r(s, t) ds dt < ∞,
then J1 = ∫_a^b g(t)x(t) dt exists as a quadratic mean limit, and E(J1) = 0 and E(|J1|²) = Q1.
b) If r(s, t) has bounded variation⁸ in [a, b] × [a, b] and g(t) is such that the Riemann-Stieltjes integral
Q2 = ∫∫_{[a,b]×[a,b]} g(s)g(t) d_{s,t} r(s, t) < ∞,
then J2 = ∫_a^b g(t) dx(t) exists as a quadratic mean limit, and E(J2) = 0 and E(|J2|²) = Q2.
Proof: The simple proof uses the Loève criterion (B.8) for quadratic mean convergence: take two sequences of partitions of [a, b] with points s0, s1, . . . , sm
⁸That f(t) is of bounded variation in [a, b] means that sup Σ |f(tk) − f(tk−1)| is bounded, with the sup taken over all possible partitions.
Example 2:6 Take x(t) = w(t), the Wiener process. Since r_w(s, t) = σ² min(s, t), we see that ∫_a^b g(t)w(t) dt exists for all integrable g(t).
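A Monte Carlo sketch of Theorem 2:14 b) for the Wiener integral (the integrand g and the discretization are arbitrary choices): the variance of the approximating Riemann-Stieltjes sums should be close to ∫_0^1 g(t)² dt.

import numpy as np

# Sketch: approximate J2 = int_0^1 g(t) dw(t) by sums of g(t_k) times Wiener
# increments, and compare Var(J2) with int_0^1 g(t)^2 dt.
rng = np.random.default_rng(4)
n, n_paths = 1000, 5000
t = np.linspace(0, 1, n + 1)[:-1]
g = np.sin(2 * np.pi * t) + 0.5               # illustrative integrand
dW = np.sqrt(1.0 / n) * rng.standard_normal((n_paths, n))
J2 = dW @ g                                    # Riemann-Stieltjes sums
print("mean(J2)      :", J2.mean())
print("Var(J2)       :", J2.var())
print("int g(t)^2 dt :", np.trapz(g ** 2, dx=1.0 / n))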
Theorem 2:15 If x(s) and y(t) are stochastic processes with cross-covariance
rx,y (s, t) = Cov(x(s), y(t)),
and if the conditions of Theorem 2:14 are satisfied, then
E(∫_a^b g(s)x(s) ds · ∫_c^d h(t)y(t) dt) = ∫_a^b ∫_c^d g(s)h(t) r_{x,y}(s, t) ds dt, (2.29)
E(∫_a^b g(s) dx(s) · ∫_c^d h(t) dy(t)) = ∫_a^b ∫_c^d g(s)h(t) d_{s,t} r_{x,y}(s, t). (2.30)
Theorem 2:16 For the Wiener process with r_{x,x}(s, t) = min(s, t) one has
Remark 2:2 A natural question is: are quadratic mean integrals and ordinary
integrals equal? If a stochastic process has a continuous covariance function,
and continuous sample paths, with probability one, and if g(t) is, for example,
b
continuous, then ∫_a^b g(t)x(t) dt exists both as a regular Riemann integral and
as a quadratic mean integral. Both integrals are random variables and they are
limits of the same approximating Riemann sum, the only difference being the
mode of convergence – with probability one, and in quadratic mean, respectively.
But then the limits are equivalent, i.e. equal with probability one.
That the sum (2.33) is finite implies, by the Borel-Cantelli lemma and the Chebyshev inequality, see (B.3) in Appendix B, that T_n^{−1} ∫_0^{T_n} x(t) dt → 0 a.s., and so we have shown the convergence for a special sequence of times. To complete the proof, we have to show that
sup_{T_n ≤ T ≤ T_{n+1}} |(1/T) ∫_0^T x(t) dt − (1/T_n) ∫_0^{T_n} x(t) dt| → 0 a.s.,
as n → ∞; see [9, p. 95]. □
For stationary processes, the theorem yields the following ergodic theorem
about the observed average.
Theorem 2:18 a) If x(t) is stationary and (1/T) ∫_0^T r(t) dt → 0 as T → ∞, then (1/T) ∫_0^T x(t) dt → 0 in quadratic mean.
b) If moreover there is a constant K > 0 and a β > 0 such that |r(t)| ≤ K/|t|^β as t → ∞, then (1/T) ∫_0^T x(t) dt → 0 almost surely.
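A discrete-time illustration of Theorem 2:18 (a Gaussian AR(1) sequence stands in for a continuous-time process; this is only an analogy, not the theorem itself): its covariance decays geometrically, and the time averages approach the mean 0.

import numpy as np

# Sketch: time averages (1/T) * sum x(t) of a stationary Gaussian AR(1)
# sequence with covariance r(k) = phi^|k| / (1 - phi^2); r decays quickly,
# so the averages converge to the mean 0.
rng = np.random.default_rng(5)
phi, n = 0.9, 200_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)        # start in the stationary distribution
for k in range(1, n):
    x[k] = phi * x[k - 1] + e[k]

for T in (100, 1000, 10_000, 100_000):
    print(T, x[:T].mean())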
Exercises
2:1. Prove the following useful inequality valid for any non-negative, integer-valued random variable N,
E(N) − (1/2)E(N(N − 1)) ≤ P(N > 0) ≤ E(N).
Generalize it to the following inequalities, where
αi = E(N(N − 1) · . . . · (N − i + 1))
2:3. Find the values of the constants a and b that make a Gaussian process twice continuously differentiable if its covariance function is
2:4. Complete the proof of Theorem 2:3 and show that, in the notation of the proof,
Σ_n g(2^{−n}) < ∞ and Σ_n 2^n q(2^{−n}) < ∞.
2:5. Show that the sample paths of the Wiener process have infinite variation, a.s., by showing the stronger statement that if
Yn = Σ_{k=0}^{2^n−1} |W((k + 1)/2^n) − W(k/2^n)|,
then Σ_{n=1}^∞ P(Yn < n) < ∞.
2:7. Convince yourself of the “trivial” fact that if a sequence of normal variables {xn, n ∈ Z} is such that E(xn) and V(xn) have finite limits, then the sequence converges in distribution to a normal variable.
2:8. Give an example of a stationary process that violates the sufficient conditions in Theorem 2:10 and for which the sample functions can be tangent to the level u = 1.
2:9. Assume that sufficient conditions on r(s, t) = E(x(s)x(t)) are satisfied so that the integral
∫_0^T g(t)x(t) dt
exists for all T, both as a quadratic mean integral and as a sample function integral. Show that, if
∫_0^∞ |g(t)| √(r(t, t)) dt < ∞,
then the generalized integral ∫_0^∞ g(t)x(t) dt exists as a limit as T → ∞, both in quadratic mean and with probability one.
2:10. Let (xn, yn) have a bivariate Gaussian distribution with mean 0, variance 1, and correlation coefficient ρn.
a) Show that P(xn < 0 < yn) = (1/2π) arccos ρn.
b) Calculate the conditional density functions for
(xn + yn) | xn < 0 < yn, and (yn − xn) | xn < 0 < yn.
c) Let zn and un be distributed with the density functions derived in (b) and assume that ρn → 1 as n → ∞. Take cn = 1/√(2(1 − ρn)), and show that the density functions for cn zn and cn un converge to density functions f1 and f2, respectively.
Hint: f2(u) = u exp(−u²/2), u > 0, is the Rayleigh density.
2:11. Let {x(t), t ∈ R} be a stationary Gaussian process with mean 0, and with a covariance function that satisfies
−r″(t) = −r″(0) + o(|t|^a), t → 0,
for some a > 0. Define xn = x(0), yn = x(1/n), ρn = r(1/n) and use the previous exercise to derive the asymptotic distribution of
(x(1/n) − x(0)) / (1/n), given x(0) < 0 < x(1/n),
as n → ∞. What conclusion do you draw about the derivative at a point with an upcrossing of the zero level? (Answer: it has a Rayleigh distribution, not a half normal distribution.)
2:12. Find an example of two dependent normal random variables U and V
such that C(U, V ) = 0; obviously you cannot let (U, V ) have a bivariate
normal distribution.
2:13. Prove that Theorem 2:18 follows from Theorem 2:17.
Chapter 3
Crossings
x(t) is strictly less than u immediately to the left and strictly greater than u
immediately to the right of the upcrossing point. Also define
NI = NI (x, u) = the number of t ∈ I such that x(t) = u.
By the intensity of upcrossings we mean any function μ_t^+(u) such that
∫_{t∈I} μ_t^+(u) dt = E(N_I^+(x, u)).
Theorem 3:1 (Rice's formula) For any stationary process {x(t), t ∈ R} with density f_{x(0)}(u), the crossings and upcrossings intensities are given by
μ(u) = E(N_{[0,1]}(x, u)) = ∫_{−∞}^∞ |z| f_{x(0),x′(0)}(u, z) dz, (3.1)
μ^+(u) = E(N^+_{[0,1]}(x, u)) = ∫_0^∞ z f_{x(0),x′(0)}(u, z) dz. (3.2)
These expressions hold for almost every u, whenever the involved densities exist.
Before we state the short proof we shall review some facts about functions
of bounded variation, proved by Banach. To formulate the proof, write for any
continuous function f (t), t ∈ [0, 1], and interval I = [a, b] ⊂ [0, 1],
Lemma 3.1 (Banach) For any continuous function f(t), t ∈ I, the total variation is equal to
∫_{−∞}^∞ N_I(f, u) du.
Taking expectations and using Fubini's theorem to change the order of integration and expectation, we get
|I| ∫_{u∈A} μ(u) du = ∫_{−∞}^∞ 1_A(u) E(N_I(x, u)) du = E(∫_I 1_A(x(t)) |x′(t)| dt) = |I| E(1_A(x(0)) |x′(0)|) = |I| ∫_{u∈A} f_{x(0)}(u) E(|x′(0)| | x(0) = u) du;
Theorem 3:2 For a stationary process {x(t), t ∈ R} with almost surely con-
tinuous sample paths, suppose x(0) and ζn = 2n (x(1/2n ) − x(0)) have a joint
density gn (u, z) which is continuous in u for all z and all sufficiently large n.
Also suppose gn (u, z) → p(u, z) uniformly in u for fixed z as n → ∞ and that
gn(u, z) ≤ h(z) with ∫_0^∞ z h(z) dz < ∞.² Then
μ^+(u) = E(N^+_{[0,1]}(x, u)) = ∫_0^∞ z p(u, z) dz. (3.4)
{x(0) < u < x(0) + ζn /2n } = {x(0) < u} ∩ {ζn > 2n (u − x(0))},
we have
Remark 3:1 The proof of Rice’s formula as illustrated in Figure 3.1 shows the
relation to the Kac and Slepian horizontal window conditioning: one counts the
number of times the process passes through a small horizontal window; we will
dwell upon this concept in Section 3.2.1, page 65.
The integral
μt(u) = ∫_{−∞}^∞ |z| f_{x(t),x′(t)}(u, z) dz
is the local crossings intensity at time t.
[Figure 3.1: illustration for the proof of Rice's formula: x′(t) = z, x(t + Δt) ≈ x(t) + zΔt, with the level u marked.]
Simple integration of (3.1) and (3.2) gives that for Gaussian stationary processes,
μ(u) = E(N_{[0,1]}(x, u)) = (1/π) √(ω2/ω0) e^{−(u−m)²/(2ω0)},
μ^+(u) = E(N^+_{[0,1]}(x, u)) = (1/2π) √(ω2/ω0) e^{−(u−m)²/(2ω0)},
which are the original forms of Rice's formula. These formulas hold regardless of whether ω2 is finite or not, so ω2 = ∞ if and only if the expected number of crossings in any interval is infinite. This does not mean, however, that there necessarily are infinitely many crossings, but if there is a crossing, then there may be infinitely many in its neighborhood.
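The Gaussian form of Rice's formula is easy to check by simulation. The sketch below uses an assumed spectral density and a crude sum-of-cosines simulation; both are illustrative choices, not taken from the text:

import numpy as np

# Sketch: Rice's formula for a zero-mean Gaussian process, checked against a
# spectral simulation.  The one-sided spectral density is an illustrative choice.
rng = np.random.default_rng(6)
w = np.linspace(0.01, 5.0, 1000)                  # one-sided frequency grid
S = np.exp(-0.5 * (w - 1.0) ** 2 / 0.2 ** 2)      # assumed one-sided density
dw = w[1] - w[0]
w0, w2 = np.trapz(S, w), np.trapz(w**2 * S, w)    # spectral moments

def mu_up(u):
    """Expected number of u-upcrossings per time unit (Rice, Gaussian case)."""
    return np.sqrt(w2 / w0) / (2 * np.pi) * np.exp(-u**2 / (2 * w0))

# Simulate x(t) as a sum of cosines with Rayleigh amplitudes and random phases.
amp = rng.rayleigh(np.sqrt(S * dw))
phi = rng.uniform(0, 2 * np.pi, size=w.size)
t = np.arange(0, 1000, 0.05)
x = np.zeros_like(t)
for a, wk, p in zip(amp, w, phi):
    x += a * np.cos(wk * t + p)

u = 1.0
print("Rice      :", mu_up(u) * (t[-1] - t[0]))
print("simulated :", np.sum((x[:-1] < u) & (x[1:] >= u)))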
³If you have not seen this before, prove it by showing that x(t) and x′(t) = lim_{h→0}(x(t + h) − x(t))/h are uncorrelated.
Remark 3:3 The expected number of mean-level upcrossings per time unit in a stationary Gaussian process is
μ^+(m) = (1/2π) √(ω2/ω0) = (1/2π) √(∫ ω² f(ω) dω / ∫ f(ω) dω),
and it is called the (root) mean square frequency of the process. The inverse is equal to the long run average time distance between successive mean level upcrossings, 1/μ^+(m) = 2π √(ω0/ω2), also called the mean period.
A local extreme, minimum or maximum, for a differentiable stochastic process {x(t), t ∈ R} corresponds to, respectively, an upcrossing and a downcrossing of the zero level by the process derivative {x′(t), t ∈ R}. Rice's formula applied to x′(t) therefore gives the expected number of local extremes. For a Gaussian process the formulas involve the fourth spectral moment ω4 = V(x″(t)) = ∫ ω⁴ f(ω) dω. The general and Gaussian expressions are, respectively,
μ_min = ∫_0^∞ z f_{x′,x″}(0, z) dz = (1/2π) √(ω4/ω2),
μ_max = ∫_{−∞}^0 |z| f_{x′,x″}(0, z) dz = (1/2π) √(ω4/ω2).
If we combine this with Remark 3:3 we get the average number of local maxima per mean level upcrossing,
1/α = ((1/2π)√(ω4/ω2)) / ((1/2π)√(ω2/ω0)) = √(ω0ω4/ω2²).
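Given the spectral moments, the mean period, the intensity of local maxima, and the number of maxima per mean level upcrossing are one-line computations (a sketch with illustrative moment values):

import numpy as np

# Sketch: mean period, intensity of local maxima, and the spectral width
# parameter alpha = omega2 / sqrt(omega0 * omega4) (illustrative moments).
w0, w2, w4 = 1.0, 1.3, 3.0
mean_frequency = np.sqrt(w2 / w0) / (2 * np.pi)      # mu_plus(m)
mean_period = 1.0 / mean_frequency                   # 2*pi*sqrt(w0/w2)
mu_max = np.sqrt(w4 / w2) / (2 * np.pi)              # intensity of local maxima
alpha = w2 / np.sqrt(w0 * w4)
print(mean_period, mu_max, 1 / alpha)                # 1/alpha = maxima per upcrossing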
P(x(t0 + τ) ≤ v | x(t0) = u) = lim_{ε→0} P(x(t0 + τ) ≤ v | u − ε ≤ x(t0) ≤ u), (3.9)
E(x(t0 + τ) | x(t0) = u) = lim_{ε→0} E(x(t0 + τ) | u − ε ≤ x(t0) ≤ u), (3.10)
calculated at time t0 , and (3.10) gives the best predictor of the future value
x(t0 + τ ) in the sense that it minimizes the squared error taken as an average
over all the possible outcomes of x(t0 ). By ”average” we then mean expected
value as well as an empirical average over many realizations observed at the
fixed predetermined time t0 , chosen independently of the process.
We now consider prediction from the times of upcrossings of a fixed level u. This
differs from the previous type of conditioning in that the last observed value
of the process is known, and the time points are variable. The interpretation
of ”average future value” is then not clear at this moment and has to be made
precise. Obviously, what we should aim at is a prediction method that works
well on the average, in the long run, for all the u-level upcrossings we observe
in the process. Call these upcrossings time points tk > 0. To this end, define
the following distribution.
Definition 3:1 For a stationary process {x(t), t ∈ R}, the (long run, ergodic)
conditional distribution of x(t0 + ·) after u-upcrossing at t0 is defined as
Thus, P^u(A) counts all those u-upcrossings tk for which the process, taken with tk as new origin, satisfies the condition given by A.
The definition makes sense only if the limit exists; as we shall prove in Chapter 5, the limit exists for every stationary process {x(t), t ∈ R}, but it may be random. If the process is ergodic the limit is non-random and it defines a proper distribution on C for a (non-stationary) stochastic process.
The empirical long run distribution is related to Kac and Slepian’s hori-
zontal window conditioning, [19]. For a stationary process {x(t), t ∈ R}, the
(horizontal window) conditional distribution of x(t0 + ·) after u-upcrossing at
Figure 3.2: Excursions above u = 1.5 contribute to the distribution P^{1.5}(·).
t0 is
P^{hw}(A) = P(x(t0 + ·) ∈ A | x(t0) = u, upcrossing in h.w. sense) (3.12)
= lim_{ε→0} P(x(t0 + ·) ∈ A | x(s) = u, upcrossing, for some s ∈ [t0 − ε, t0]).
It is easy to show that the two distributions just defined are identical, P u (A) =
P hw (A), for every (ergodic) stationary process such that there are only a finite
number of u-upcrossings in any finite interval; see [9].
With point process terminology one can look at the upcrossings as a se-
quence of points in a stationary point process and the shape of the process
around the upcrossing as a mark, attached to the point. The conditional distri-
bution of the shape is then treated as a Palm distribution in the marked point
process; see [9].
The term, horizontal window condition, is natural, since the process has
to pass a horizontal window at level u somewhere near t0 . In analogy, the
condition in (3.9) is called vertical window condition, since the process has to
pass a vertical window near v exactly at time t0 .
The distribution P u can be found via its finite-dimensional distribution
functions. Take s = (s1 , . . . , sn ), v = (v1 , . . . , vn ), write
x(s) ≤ v for x(sj ) ≤ vj , j = 1, . . . , n,
and define
N[0,T ] (x, u; s, v) = #{tk ; 0 ≤ tk ≤ T, and x(tk + s) ≤ v},
as the number of u-upcrossings in [0, T] which are such that the process, at each of the times sj after the upcrossing, is less than vj. Thus N_{[0,T]}(x, u; s, v)/N_{[0,T]}(x, u), v ∈
Proof: We need a result from Chapter 5, namely that for an ergodic process (with probability one)
N_{[0,T]}(x, u)/T → E(N_{[0,1]}(x, u)),
N_{[0,T]}(x, u; s, v)/T → E(N_{[0,1]}(x, u; s, v)),
as T → ∞. This gives (3.13).
The proof of (3.14) is analogous to that of Theorem 3:2 and can be found
in [22, Ch. 10] under the (unnecessarily strict) condition that {x(t), t ∈ R} has
continuously differentiable sample paths. □
Noting that
pu(z) = z f_{x(0),x′(0)}(u, z) / ∫_{ζ=0}^∞ ζ f_{x(0),x′(0)}(u, ζ) dζ, z ≥ 0, (3.15)
• Crest shape: What is the shape of the process near its local maxima?
We can immediately solve the first problem: the best predictor x̂^u(t0 + τ) after u-upcrossings is the expectation of the Slepian model:
x̂^u(t0 + τ) = E(ξu(τ)), (3.17)
in the sense that the average of (x(tk + τ) − a)², when tk runs over all u-upcrossings, takes its minimum value when a = E(ξu(τ)).
pu(z) = (z/ω2) e^{−z²/(2ω2)}, z ≥ 0, (3.18)
say. Here
Theorem 3:4 a) The Slepian model for a Gaussian process {x(t), t ∈ R} after u-upcrossings has the form
ξu(t) = u r(t)/ω0 − ζ r′(t)/ω2 + κ(t), (3.20)
where ζ has the Rayleigh density pζ(z) = (z/ω2) e^{−z²/(2ω2)}, and {κ(t), t ∈ R} is a non-stationary Gaussian process, independent of ζ, with mean zero and covariance function rκ(s1, s2) given by (3.19).
b) In particular, the best prediction of x(tk + τ) taken over all u-upcrossings tk is obtained by taking E(ζ) = √(πω2/2) and κ(τ) = 0 in (3.20) to get
x̂^u(t0 + τ) = u r(τ)/ω0 − E(ζ) r′(τ)/ω2 = (u/ω0) r(τ) − √(π/(2ω2)) r′(τ). (3.21)
We have now found the correct way of taking the apparent positive slope at
a u-upcrossing into account in predicting the near future. Note that the simple
formula (3.6),
x̂(t0 + τ) = E(x(t0 + τ) | x(t0) = u) = (u/ω0) r(τ),
in the Gaussian case, lacks the slope term. Of course, the slope at an upcrossing
is always positive, but it is perhaps intuitively obvious that the observed slopes
at the upcrossings are “more positive” than that. The slope = derivative of a
stationary Gaussian process is normal with mean zero, but we do not expect
slopes at fixed level upcrossings to have a half-normal distribution with a mode
(= most likely value) at 0. The r′-term in (3.21) tells us how we shall take
this “sample bias” into account.
The prediction of the slope is the expectation of the Rayleigh variable ζ . If
the slope at the u-upcrossing is observed and used in the prediction, then the
difference between the two approaches disappears; see [23].
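A sketch comparing the two predictors (3.6) and (3.21) for an assumed covariance function r(τ) = ω0 e^{−τ²/2}, for which ω2 = −r″(0) = ω0; all numbers are illustrative:

import numpy as np

# Sketch: predictors of x(t0 + tau) after observing x(t0) = u, with and
# without the slope correction from the Slepian model (assumed covariance).
w0 = 1.0
r = lambda tau: w0 * np.exp(-tau**2 / 2)          # assumed covariance
dr = lambda tau: -tau * w0 * np.exp(-tau**2 / 2)  # its derivative r'(tau)
w2 = w0                                            # -r''(0) for this r

def predictor_fixed_time(u, tau):
    """E(x(t0 + tau) | x(t0) = u), formula (3.6)."""
    return u * r(tau) / w0

def predictor_after_upcrossing(u, tau):
    """Long-run best predictor after u-upcrossings, formula (3.21)."""
    return u * r(tau) / w0 - np.sqrt(np.pi / (2 * w2)) * dr(tau)

u = 1.5
for tau in (0.5, 1.0, 2.0):
    print(tau, predictor_fixed_time(u, tau), predictor_after_upcrossing(u, tau))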
Example 3:1 To illustrate the efficiency of the Slepian model, we shall analyse the shape of an excursion above a very high level u in a Gaussian process, and then expand the Slepian model ξu(t) in a Taylor series as u → ∞. It will turn out that the length and height of the excursion will both be of the order u^{−1}, so we normalize the scales of ξu(t) by that factor. Using
r(t/u) = ω0 − ω2 t²/(2u²) (1 + o(1)), r′(t/u) = −ω2 (t/u)(1 + o(1)),
as t/u → 0, and that κ(t/u) = o(t/u), and omitting all o-terms, we get
u{ξu(t/u) − u} = u(u(r(t/u)/ω0 − 1) − ζ r′(t/u)/ω2 + κ(t/u)) ≈ ζt − ω2 t²/(2ω0).
Thus, the excursion above a high level u takes the form of a parabola with height ζ²ω0/(2uω2) and length 2ω0ζ/(uω2). It is easy to check that the normalized height of the excursion above u has an exponential distribution.
Remark 3:4 One should be aware that a Slepian model as it is described here
represents the “marginal distribution” of the individual excursions above the
defined level u. Of course one would like to use it also to analyse the dependence
there may exist between successive excursions in the original process x(t), and
this is in fact possible. For example, suppose we want to find how often it
happens that two successive excursions both exceed a critical limit T0 in length.
Then, writing τ1 = inf{τ > 0; ξu(τ) = u, upcrossing} for the first u-upcrossing
in ξu (t) strictly on the positive side, one can calculate
P (ξu (s) > u, for 0 < s < T0 , and ξu (τ1 + s) > u, for 0 < s < T0 ).
Theorem 3:5 If {x(t), t ∈ R} is twice differentiable and ergodic, the long run empirical distribution of x(tk + s) around local maxima is equal to
P_1^max(A) = ∫_{z=−∞}^0 |z| f_{x′(0),x″(0)}(0, z) P(x(s) ≤ v | 0, z) dz / ∫_{−∞}^0 |z| f_{x′(0),x″(0)}(0, z) dz,
ξ_1^max(t) = ζ_1^max r″(t)/ω4 + Δ1(t), (3.22)
where ζ_1^max has a negative Rayleigh distribution with density
p_1^max(z) = (|z|/ω4) e^{−z²/(2ω4)}, z < 0, (3.23)
with standard Rayleigh and normal variables, illustrating the relevance of the spectral width parameter α = √(1 − ε²) = √(ω2²/(ω0ω4)).
Theorem 3:5 is the basis for numerical calculations of wave characteristic
distributions like height and time difference between local maxima and minima,
as they can be made by the routines in the Matlab package WAFO, [34].
The model (3.22) contains an explicit function r″(t)/ω4 with a simple random
factor, representing the Rayleigh distributed curvature at the maximum, plus
a continuous parameter Gaussian process. The numerical procedures work by
successively replacing the continuous parameter process by explicit functions
multiplied by random factors. We illustrate now the first steps in this procedure.
The model (3.22) contains the random curvature, and it is the simplest
form of the Slepian model after maximum. There is nothing that prevents us from also including the random height of the local maximum in the model. We have
seen in (3.25) how the height and the curvature depend on each other, so we
can build an alternative Slepian model after maximum that explicitly includes
both the height of the maximum and the curvature.
To formulate the extended model we define three functions, A(t), B(t), C(t),
by
The conditional covariance between x(s1) and x(s2) is found from the same theorem, and the explicit expression is given in the following Theorem 3:6.
As we have seen in Section 2.4.3 the derivative x′(0) is uncorrelated with both x(0) and x″(0), but x(0) and x″(0) are correlated. To formulate the effect of observing a local maximum we will first introduce the crest height,
x(0), and then find the conditional properties of x″(0) given x(0) and x′(0). We use Theorem 2:13 and define the function
b(t) = Cov(x(t), x″(0) | x(0), x′(0)) / V(x″(0) | x(0), x′(0)) = (r″(t) + (ω2/ω0) r(t)) / (ω4 − ω2²/ω0).
Theorem 3:6 If {x(t), t ∈ R} is twice differentiable and ergodic the long run empirical distribution of x(tk + s) around local maxima is equal to

P2max(A) = ∫_{u=−∞}^{∞} ∫_{z=−∞}^{0} |z| f_{x(0),x'(0),x''(0)}(u, 0, z) P(x(s) ≤ v | u, 0, z) dz du  /  ∫_{u=−∞}^{∞} ∫_{−∞}^{0} |z| f_{x(0),x'(0),x''(0)}(u, 0, z) dz du,

where the random pair (η2max, ζ2max) has the two-dimensional density (with normalizing constant c)

p2max(u, z) = c |z| exp( − (ω0 z² + 2ω2 uz + ω4 u²) / (2(ω0 ω4 − ω2²)) ),   −∞ < u < ∞,  z < 0.
numerically. The only problems are the residual processes which require infinite
dimensional probabilities to be calculated. To overcome this in a numerical
algorithm one can use a successive conditioning technique that first introduces
the value of the normal residual at a single point, say κ(s1), and includes that
as a separate term in the model. The residual process will be correspondingly
reduced and the procedure repeated.
For numerical calculations of interesting crossings probabilities one can trun-
cate the conditioning procedure when sufficient accuracy is attained. This ap-
proximation technique is called regression approximation in crossing theory.
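To make the successive conditioning concrete, here is a minimal sketch (not a WAFO routine; the function name and the residual covariance are illustrative assumptions only) of a single conditioning step: a zero-mean Gaussian residual κ with a given covariance function is conditioned on its value at one point s1, which gives a regression term plus a reduced residual covariance.

    import numpy as np

    def condition_on_point(rk, s_grid, s1):
        # One step of successive conditioning for a zero-mean Gaussian residual
        # with covariance function rk: condition on kappa(s1).
        C = rk(np.subtract.outer(s_grid, s_grid))   # covariance matrix on the grid
        c1 = rk(s_grid - s1)                        # Cov(kappa(s), kappa(s1))
        v1 = rk(0.0)                                # Var(kappa(s1))
        weights = c1 / v1                           # E(kappa(s) | kappa(s1)) = weights * kappa(s1)
        C_reduced = C - np.outer(c1, c1) / v1       # covariance of the reduced residual
        return weights, C_reduced

    # Example with an assumed residual covariance rk(t) = exp(-t^2/2):
    w, Cr = condition_on_point(lambda t: np.exp(-np.asarray(t, float)**2 / 2),
                               np.linspace(0.0, 5.0, 6), 2.0)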
where pu(z) is the Rayleigh density for the derivative at a u-upcrossing, and the expectation

E(Iz(κ, t)) = P( inf_{0<s<t} ( u r(s)/ω0 − z r'(s)/ω2 + κ(s) ) > u ),

where ξ'u(t)− = min(0, ξ'u(t)) is the negative part of the derivative. The expectation can be calculated by means of algorithms from WAFO, by means of the regression technique with successive conditioning on the residual process.
Figure 3.3: Probability densities for excursions above u = −1, 0, 1, 2 for a process with North Sea wave spectrum Jonswap. (Levels from left to right: 2, 1, 0, −1; horizontal axis: period [s].)
ξ1max(t) = ζ1max r''(t)/ω4 + Δ1(t),

which completely describes the stochastic properties of the shape around the maximum. The simplest, zero order, approximation is to delete the residual process Δ1(t) completely, only keeping the curvature dependent term ζ1max r''(t)/ω4. By replacing ζ1max by its average −√(πω4/2) we can, for example, get the average shape, as

ξ̄max(t) = −√(πω0/2) α r''(t)/ω2.
The zero order approximation is usually too crude to be of any use. A
better approximation is obtained from the model (3.26), which also includes
the (random) height at the maximum point,
and define the random variable T as the time of the first local minimum of
ξ2max (t), t > 0. The height drop is then H = ξ2max (0) − ξ2max (T ) and we ask for
the joint distribution of T and H .
Using the fact that A(0) = 1, C(0) = 0 and ξ'2max(T) = 0, we get the following relations that need to be satisfied. With

G(t) = [ 1 − A(t)   C(t)  ]
       [ A'(t)      C'(t) ],

we can write the equations (3.28) and (3.29) as (^T for matrix transpose),
If det G(T^r) ≠ 0 we get from (η2max, ζ2max)^T = G(T^r)^{−1} (H^r, 0)^T that the variables with known distribution (η2max and ζ2max) are simple functions of the variables with unknown distribution,
where

p(t) = −C'(t)/A'(t),    q(t) = −A'(t) / ( (1 − A(t))C'(t) − A'(t)C(t) ).
We want the density at the point T^r = t, H^r = h; let ξ(t, h), ζ(t, h) be the corresponding solution and define the indicator function I(t, h) to be 1 if the approximating process ξ(t, h)A(s) + ζ(t, h)C(s) is strictly decreasing for 0 < s < t.
The Jacobian for the transformation is J(t, h) = h p'(t) q(t)², and therefore the density of T^r, H^r is
Figure 3.4: Probability density for T, H for a process with North Sea wave spectrum Jonswap, together with 343 observed cycles.
This form of the T, H distribution is common in the technical literature, where Tm = π√(ω2/ω4) is called the mean half wave period. Note that the dependence on the spectrum is only through the spectral width parameter ε = √(1 − ω2²/(ω0 ω4)) = √(1 − α²).
This first order approximation of the T, H -density is not very accurate, but
it illustrates the basic principle of the regression approximation. The WAFO
toolbox, [34], contains algorithms for very accurate higher order approxima-
tions. Figure 3.4 shows the result for a process with a common North Sea
Jonswap spectrum.
Exercises
3:1. Prove that κ(0) = κ'(0) = 0 in the Slepian model after upcrossing.
3:2. Formulate conditions on the covariance function rx (t) that guarantee that
the residual process κ(t) has differentiable sample paths.
This chapter deals with the spectral representation of weakly stationary pro-
cesses – stationary in the sense that the mean is constant and the covariance
Cov(x(s), x(t)) only depends on the time difference t − s. For real-valued
Gaussian processes, the mean and covariance function determine all finite-dimensional distributions, and hence the entire process distribution. However,
the spectral representation requires complex-valued processes, and then one
needs to specify also the correlation structure between the real and the imag-
inary part of the process. We therefore start with a summary of the basic
properties of complex-valued processes, in general, and in the Gaussian case.
We remind the reader of the classical memoirs by S.O. Rice, [27], which can be
recommended to anyone with the slightest historical interest. That work also
contains many old references.
is Hermitian, i.e.

r(−τ) = r̄(τ).
For real-valued processes, the covariance function r(τ ) determines all co-
variances between x(t1 ), . . . , x(tn ),
Σ(t1, . . . , tn) = ( r(0)          r(t1 − t2)   . . .   r(t1 − tn)
                     . . .
                     r(tn − t1)    r(tn − t2)   . . .   r(0)       )    (4.1)

                 = ( V(x(t1))           C(x(t1), x(t2))   . . .   C(x(t1), x(tn))
                     . . .
                     C(x(tn), x(t1))    C(x(tn), x(t2))   . . .   V(x(tn))        ).
Theorem 4:2 A complex normal process x(t) = y(t) + iz(t) with mean zero is
strictly stationary if and only if the two functions
r(s, t) = E(x(s) x̄(t)),
q(s, t) = E(x(s) x(t)),
only depend on t − s.
Proof: To prove the ”if” part, express r(s, t) and q(s, t) in terms of y and z ,
r(s, t) = E(y(s)y(t) + z(s)z(t)) + iE(z(s)y(t) − y(s)z(t)),
q(s, t) = E(y(s)y(t) − z(s)z(t)) + iE(z(s)y(t) + y(s)z(t)).
Since these only depend on t−s, the same is true for the sums and differences of
their real and imaginary parts, i.e. for E(y(s)y(t)), E(z(s)z(t)), E(z(s)y(t)),
E(y(s)z(t)). Therefore, the 2n-dimensional distribution of
y(t1 ), . . . , y(tn ), z(t1 ), . . . , z(tn )
only depends on time differences, and x(t) is strictly stationary. The converse
is trivial. 2
Example 4:1 If x(t) is a real and stationary normal process, and μ is a con-
stant, then
x∗ (t) = eiμt x(t)
is a weakly, but not strictly, stationary complex normal process,
The function F (ω) is the spectral distribution function of the process, and it
has all the properties of a statistical distribution function except that F (+∞) −
F (−∞) = r(0) need not be equal to one. The function F (ω) is defined only up
to an additive constant, and one usually takes F (−∞) = 0.
Proof: The "if" part is clear, since if r(t) = ∫ exp(iωt) dF(ω), then

Σ_{j,k} z_j z̄_k r(t_j − t_k) = Σ_{j,k} z_j z̄_k ∫ e^{iωt_j} · e^{−iωt_k} dF(ω)
    = ∫ Σ_{j,k} z_j e^{iωt_j} · ( z_k e^{iωt_k} )‾ dF(ω)
    = ∫ | Σ_j z_j e^{iωt_j} |² dF(ω) ≥ 0.
For the ”only if” part we shall use some properties of characteristic func-
tions, which are proved elsewhere in the probability course. We shall show that,
given r(t), there exists a proper distribution function F∞ (ω) = F (ω)/F (∞)
such that
F∞(∞) − F∞(−∞) = 1,
∫ e^{iωt} dF∞(ω) = r(t)/r(0).
To this end, take a real A > 0, and define

g(ω, A) = (1/(2πA)) ∫_0^A ∫_0^A r(t − u) e^{−iω(t−u)} dt du
    = lim (1/(2πA)) Σ_{j,k} r(t_j − t_k) e^{−iωt_j} e^{iωt_k} Δt_j Δt_k
    = lim (1/(2πA)) Σ_{j,k} r(t_j − t_k) Δt_j e^{−iωt_j} · ( Δt_k e^{−iωt_k} )‾ ≥ 0,
where

μ(t) = 1 − |t| for |t| ≤ 1, and 0 otherwise.

lim_{A→∞} (1 − |t|/A) r(t) = r(t).    (4.6)
Here,

∫_{−∞}^{∞} μ(ω/(2M)) e^{−iωt} dω = ∫_{−2M}^{2M} (1 − |ω|/(2M)) e^{−iωt} dω
    = ∫_{−2M}^{2M} (1 − |ω|/(2M)) cos ωt dω = 2M ( sin Mt / (Mt) )²,

so (4.7) is equal to

(M/π) ∫_{−∞}^{∞} μ(t/A) r(t) ( sin Mt / (Mt) )² dt = (1/π) ∫_{−∞}^{∞} μ(s/(MA)) r(s/M) ( sin s / s )² ds
    ≤ r(0) (1/π) ∫_{−∞}^{∞} ( sin s / s )² ds = r(0).

We have now shown that g(ω, A) and μ(t/A)r(t) are both absolutely integrable over the whole real line. Since they form a Fourier transform pair,
We have now shown that g(ω, A) and μ(t/A)r(t) are both absolutely in-
tegrable over the whole real line. Since they form a Fourier transform pair,
i.e.

g(ω, A) = (1/2π) ∫_{−∞}^{∞} μ(t/A) r(t) e^{−iωt} dt,

we can use the Fourier inversion theorem, which states that

μ(t/A) r(t) = ∫_{−∞}^{∞} g(ω, A) e^{iωt} dω,
For step (3) we need one of the basic lemmas in probability theory, the
convergence properties of characteristic functions: if FA (x) is a family of dis-
tribution functions with characteristic functions φA (t), and φA (t) converges to
a continuous function φ(t), as A → ∞, then there exists a distribution func-
tion F (x) with characteristic function φ(t) and FA (x) → F (x), for all x where
F (x) is continuous.
Here the characteristic functions φ_A(t) = μ(t/A) r(t)/r(0) converge to φ(t) = r(t)/r(0), and since we have assumed r(t) to be continuous, we know from the
basic lemma that F_A(x) = ∫_{−∞}^{x} f_A(ω) dω converges to a distribution function F∞(x) as A → ∞, with characteristic function φ(t):

r(t)/r(0) = ∫_{−∞}^{∞} e^{iωt} dF∞(ω).
then the spectrum is absolutely continuous and the Fourier inversion formula holds,

f(ω) = (1/2π) ∫_{−∞}^{∞} e^{−iωt} r(t) dt.    (4.10)
Remark 4:1 The inversion formula (4.9) defines the spectral distribution for all continuous covariance functions. One can also use (4.10) to calculate the spectral density in case r(t) is absolutely integrable, but if it is not, one may use (4.9) and take f̃(ω) = lim_{h→0} (F(ω + h) − F(ω − h))/(2h). This is always possible, but one has to be careful in case f(ω) is not continuous. Even when the limit f̃(ω) exists it need not be equal to f(ω), as the following example shows. The limit, which always exists, is called the Cauchy principal value.
¹ Note that F(ω) can have only a denumerable number of discontinuity points.
Example 4:2 We use (4.9) to find the spectral density of low frequency white noise, with covariance function r(t) = (sin t)/t. We get

(F(ω + h) − F(ω − h))/(2h)
    = (1/2π)(1/2h) lim_{T→∞} ∫_{−T}^{T} ( (e^{−i(ω+h)t} − e^{−i(ω−h)t}) / (−it) ) · ( (sin t)/t ) dt
    = (1/2π) ∫_{−∞}^{∞} e^{−iωt} ( (sin ht)/(ht) ) · ( (sin t)/t ) dt
    =  1/2                        for |ω| < 1 − h,
       (1/4)(1 + (1 − |ω|)/h)     for 1 − h < |ω| < 1 + h,
       0                          for |ω| > 1 + h.

The limit as h → 0 is 1/2, 1/4, and 0, respectively, which gives as spectral density

f(ω) =  1/2   for |ω| < 1,
        1/4   for |ω| = 1,
        0     for |ω| > 1.
Note that the Fourier inversion formula (4.10) gives 1/4 for ω = 1 as the Cauchy principal value,

lim_{T→∞} (1/2π) ∫_{−T}^{T} e^{−iωt} r(t) dt = lim_{T→∞} (1/2π) ∫_{−T}^{T} (sin 2t)/(2t) dt = 1/4.
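The piecewise limits in Example 4:2 can also be verified numerically (a sketch with assumed truncation parameters, not from the text): the symmetric difference quotient of F is computed by numerical integration for a few frequencies.

    import numpy as np
    from scipy.integrate import quad

    def density_estimate(w, h=0.05, T=500.0):
        # (F(w+h) - F(w-h))/(2h) = (1/2pi) int e^{-iwt} (sin ht)/(ht) (sin t)/t dt
        integrand = lambda t: np.cos(w * t) * np.sinc(h * t / np.pi) * np.sinc(t / np.pi)
        val, _ = quad(integrand, 0.0, T, limit=1000)
        return val / np.pi      # even integrand, so twice the half-line integral, over 2*pi

    for w in (0.0, 0.5, 1.0, 1.5):
        print(w, density_estimate(w))   # approximately 1/2, 1/2, 1/4, 0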
and the inversion formula (4.9) gives that, for any continuity points,

G(ω) = F(ω) − F(−ω) = (2/π) ∫_0^∞ ( (sin ωt)/t ) r(t) dt.

Note that ∫_{−∞}^{∞} sin ωt dF(ω) = 0, since F is symmetric.
This means that all frequencies ω + 2kπ for k ≠ 0 are lumped together with the frequency ω and cannot be individually distinguished. This is the aliasing or folding effect of sampling a continuous time process.
For a stationary sequence {xn , n ∈ Z}, the covariance function r(t) is
defined only for t ∈ Z. Instead of Bochner’s theorem we have the following
theorem, in the literature called Herglotz’ lemma.
Note that the spectrum is defined over the half-open interval to keep the right-
continuity of F (ω). It is possible to move half the spectral mass in π to −π
without changing the representation (4.12).
The inversion theorem states that if Σ_{t=−∞}^{∞} |r(t)| < ∞ then the spectrum is absolutely continuous with spectral density given by

f(ω) = (1/2π) Σ_{t=−∞}^{∞} e^{−iωt} r(t),
F̃(ω2) − F̃(ω1) = (1/2π) r(0)(ω2 − ω1) + lim_{T→∞} (1/2π) Σ_{t=−T, t≠0}^{T} r(t) (e^{−iω2 t} − e^{−iω1 t})/(−it),

where as before F̃(ω) is defined as the average of the left and right hand side limits of F(ω).
where ωk > 0 are fixed frequencies, while Ak are random amplitudes, and
φk random phases, uniformly distributed in (0, 2π) and independent of the
Ak . The uniformly distributed phases make the process stationary, and its
spectrum is discrete, concentrated at {ωk }. The covariance function and one-
sided spectral distribution function are, respectively,

r(t) = Σ_k E(A_k²/2) cos ω_k t,
G(ω) = Σ_{k; ω_k ≤ ω} E(A_k²/2),   ω > 0.

The process (4.13) can also be defined as the real part of a complex process

x(t) = Σ_k A_k e^{iφ_k} e^{iω_k t},
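A direct simulation of the random-phase model (a sketch with arbitrarily chosen frequencies and amplitudes, not from the text) confirms the covariance formula above.

    import numpy as np

    rng = np.random.default_rng(1)
    wk = np.array([0.5, 1.0, 2.0])      # fixed frequencies (assumed values)
    Ak = np.array([1.0, 0.7, 0.3])      # fixed amplitudes

    def x_at(t):
        phi = rng.uniform(0, 2 * np.pi, wk.size)        # independent uniform phases
        return (Ak[:, None] * np.cos(np.outer(wk, t) + phi[:, None])).sum(axis=0)

    tau = 1.3
    pairs = np.array([x_at(np.array([0.0, tau])) for _ in range(20_000)])
    print(pairs[:, 0] @ pairs[:, 1] / len(pairs),        # empirical Cov(x(0), x(tau))
          np.sum(Ak**2 / 2 * np.cos(wk * tau)))          # r(tau) = sum E(A_k^2/2) cos(w_k tau)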
for ω1 < ω2 < ω3 < ω4 . The variance of its increments is equal to the incre-
ments of the spectral distribution, i.e. for ω1 < ω2 ,
It follows that Z(ω) is continuous in quadratic mean if and only if the spectral distribution function F is continuous. If F has a jump of size σ0², say, at a point ω0, then lim_{ε→0} (Z(ω0 + ε) − Z(ω0 − ε)) exists and has variance σ0².
Now, let us start with a spectral process {Z(ω); ω ∈ R}, a complex process
with E(Z(ω)) = 0 and with orthogonal increments, and define the function
F(ω) by

F(ω) =   E(|Z(ω) − Z(0)|²)   for ω ≥ 0,
        −E(|Z(ω) − Z(0)|²)   for ω < 0.

Since only the increments of Z(ω) are used in the theory, we can fix its value at any point, and we take Z(0) = 0. Following the definition of a stochastic integral in Section 2.6, we can define a stochastic process

x(t) = ∫ e^{iωt} dZ(ω) = lim Σ_k e^{iω_k t} (Z(ω_{k+1}) − Z(ω_k)),
where the limit is in quadratic mean. It is then easy to prove that E(x(t)) = 0
and that its covariance function is given by the Fourier-Stieltjes transform of
F (ω): use Theorem 2:14, and (4.14), to get
E( ∫ e^{iωs} dZ(ω) · ∫ e^{−iμt} dZ̄(μ) ) = ∫∫ e^{i(ωs−μt)} E( dZ(ω) · dZ̄(μ) ) = ∫ e^{iω(s−t)} dF(ω).
and prove that it has all the required properties. This is the technique used
in Yaglom’s classical book, [38]; see Exercise 5. We shall present a functional
analytic proof, as in [9], and find a relation between H(x) = S(x(t); t ∈ R) and
H(F ) = L2 (F ) = the set of all functions g(ω) with |g(ω)|2 dF (ω) < ∞. We
start by the definition of an isometry.
Proof: We shall build an isometry between the Hilbert space of random vari-
ables H(x) = S(x(t); t ∈ R) and the function Hilbert space H(F ) = L2 (F ),
with scalar products defined as
‖y‖²_{H(x)} = E(|y|²),
‖g‖²_{H(F)} = ∫ |g(ω)|² dF(ω).

This just means that x(t) has the same length as an element of H(x) as has e^{i·t} as an element of H(F), i.e. ‖x(t)‖_{H(x)} = ‖e^{i·t}‖_{H(F)}. Furthermore, scalar products are preserved,

(x(s), x(t))_{H(x)} = E(x(s) x̄(t)) = ∫ e^{iωs} e^{−iωt} dF(ω) = (e^{i·s}, e^{i·t})_{H(F)}.
This is the start of our isometry: x(t) and ei·t are the corresponding el-
ements of the two spaces. Instead of looking for random variables Z(ω0 ) in
H(x) we shall look for functions gω0 (·) in H(F ) with the same properties.
Step 1: Extend the correspondence to finite linear combinations of x(t) and
eiωt by letting
Step 6: Let Z(ω) be the elements in H(x) that correspond to gω (·) in H(F ). It
is easy to see that Z(ω) is a process with orthogonal increments and incremental
variance given by F (ω):
E( (Z(ω4) − Z(ω3)) · ( Z(ω2) − Z(ω1) )‾ )
    = ∫ (g_{ω4}(ω) − g_{ω3}(ω)) · ( g_{ω2}(ω) − g_{ω1}(ω) )‾ dF(ω) = 0,

for an increasingly dense subdivision {ωk} with ωk < ωk+1. But we have that x(t) ∈ H(x) and e^{i·t} ∈ H(F) are corresponding elements. Further,

e^{iωt} = lim Σ_k e^{iω_k t} ( g_{ω_{k+1}}(ω) − g_{ω_k}(ω) ) = lim g_t^{(n)}(ω),

and g^{(n)}(·) = Σ_k α_k^{(n)} e^{i·t_k^{(n)}} converges in H(F) to some function g(·), and then

∫ g^{(n)}(ω) dZ(ω) → ∫ g(ω) dZ(ω)
covariance function then has the corresponding form, r(t) = Σ_k ΔF_k e^{iω_k t}.
In the general spectral representation, the complex Z(ω) defines a random amplitude and phase for the different components e^{iωt}. This fact is perhaps
difficult to appreciate in the integral form, but is easily understood for processes
with discrete spectrum. Take the polar form, ΔZk = |ΔZk |ei arg ΔZk = ρk eiθk .
Then,
x(t) = Σ_k ρ_k e^{i(ω_k t + θ_k)} = Σ_k ρ_k cos(ω_k t + θ_k) + i Σ_k ρ_k sin(ω_k t + θ_k).
For a real process, the imaginary part vanishes, and we have the form, well
known from elementary courses – see also later in this section –
x(t) = Σ_k ρ_k cos(ω_k t + φ_k).    (4.18)
Theorem 4:7 If F(ω) is a step function, with jumps of size ΔF_k at ω_k, then

lim_{T→∞} (1/T) ∫_0^T r(t) e^{−iω_k t} dt = ΔF_k,    (4.19)

lim_{T→∞} (1/T) ∫_0^T |r(t)|² dt = Σ_k (ΔF_k)²,    (4.20)

lim_{T→∞} (1/T) ∫_0^T x(t) e^{−iω_k t} dt = ΔZ_k.    (4.21)
with Z̃(ω) = ∫_{x≤ω; f(x)>0} dZ(x)/√(f(x)), and E(|dZ̃(ω)|²) = dF(ω)/f(ω) = dω. Even if Z̃(ω) is not a true spectral process – it may for example have infinite incremental variance – it is useful as a model for white noise. We will meet this "constant spectral density" formulation several times in later sections.
For this to be real for all t it is necessary that ΔZ0 is real, and also that dZ(ω) + dZ(−ω) is real, and dZ(ω) − dZ(−ω) is purely imaginary, which implies dZ(−ω) = dZ̄(ω), i.e. arg dZ(−ω) = −arg dZ(ω) and |dZ(−ω)| = |dZ(ω)|. (These properties also imply that x(t) is real.)
Now, introduce two real processes {u(λ), 0 ≤ λ < ∞} and {v(λ), 0 ≤ λ <
∞}, with mean zero, and with u(0−) = v(0−) = 0, du(0) = ΔZ0 , v(0+) = 0,
and such that, for ω > 0,
The real spectral representation of x(t) will then take the form
x(t) = ∫_0^∞ cos ωt du(ω) + ∫_0^∞ sin ωt dv(ω)
     = ∫_{0+}^∞ cos ωt du(ω) + ∫_0^∞ sin ωt dv(ω) + du(0).    (4.23)
It is easily checked that with the one-sided spectral distribution function G(ω),
defined by (4.11),
In almost all applications, when a spectral density for a time process x(t) is
presented, it is the one-sided density g(ω) = 2f (ω) = dG(ω)/dω that is given.
processes {u(λ), 0 ≤ λ < ∞} and {v(λ), 0 ≤ λ < ∞} are Gaussian, and since
they have uncorrelated increments, they are Gaussian processes with indepen-
dent increments.
The sample paths of u(ω) and v(ω) can be continuous, or they could contain
jump discontinuities, which then are normal random variables. In the contin-
uous case, when there is a spectral density f (ω), they are almost like Wiener
processes, and they can be transformed into Wiener processes by normalizing
the incremental variance. In analogy with Z̃(ω) in (4.22), define w1(ω) and w2(ω) by

w1(ω) = ∫_{x≤ω; f(x)>0} du(x)/√(2f(x)),    w2(ω) = ∫_{x≤ω; f(x)>0} dv(x)/√(2f(x)),    (4.27)
with the previously mentioned Langevin equation, (1.15), and deal with these
more in detail in Section 4.4.4.
with Gaussian white noise w'(t). We met this equation in Section 1.6.1 under the name Langevin's equation.
For large α (α → ∞), the covariance function falls off very rapidly around t = 0 and the correlation between x(s) and x(t) becomes negligible when s ≠ t. In the integral (4.31) each x(t) depends asymptotically only on the increment dw(t), and the values are hence approximately independent. With increasing α, the spectral density becomes increasingly flat at the same time as f(ω) → 0. In order to keep the variance of the process constant, not going to 0 or ∞, we
to get

x(t) = ∫_{−∞}^{∞} e^{iωt} √(f(ω)) dwC(ω).    (4.33)
We then do some formal calculation with white noise: w'(t) is the formal derivative of the Wiener process, and it is a stationary process with constant spectral density equal to 1/2π over the whole real line, i.e. by (4.33),

w'(t) = (1/√(2π)) ∫_{−∞}^{∞} e^{iωt} dwC(ω),
for some complex Wiener process wC(ω). Inserting this in (4.31), we obtain

x(t) = √(2ασ²) ∫_{−∞}^{t} e^{−α(t−τ)} w'(τ) dτ
     = (√(2ασ²)/√(2π)) ∫_{τ=−∞}^{t} e^{−α(t−τ)} ∫_{ω=−∞}^{∞} e^{iωτ} dwC(ω) dτ
     = (√(2ασ²)/√(2π)) ∫_{ω=−∞}^{∞} ( ∫_{τ=−∞}^{t} e^{−(α+iω)(t−τ)} dτ ) e^{iωt} dwC(ω)
     = (√(2ασ²)/√(2π)) ∫_{ω=−∞}^{∞} ( 1/(α + iω) ) e^{iωt} dwC(ω)
     = (√(2ασ²)/√(2π)) ∫_{ω=−∞}^{∞} ( 1/√(α² + ω²) ) e^{i(−arg(α+iω) + ωt)} dwC(ω)
     = ∫_{ω=−∞}^{∞} e^{i(ωt + γ(ω))} √(f(ω)) dwC(ω),

with γ(−ω) = −γ(ω). The same Wiener process wC(ω) which works in the spectral representation of the white noise in (4.31) can be used as spectral process in (4.33) after correction of the phase.
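As a numerical check of this example (a sketch, not from the text), the Ornstein-Uhlenbeck spectral density implied by the calculation above, f(ω) = ασ²/(π(α² + ω²)), can be Fourier inverted and compared with the covariance function σ² e^{−α|t|}; the parameter values below are assumed.

    import numpy as np
    from scipy.integrate import quad

    alpha, sigma2 = 0.8, 2.0
    f = lambda w: alpha * sigma2 / (np.pi * (alpha**2 + w**2))   # OU spectral density

    for t in (0.0, 0.5, 2.0):
        r, _ = quad(lambda w: 2 * np.cos(w * t) * f(w), 0.0, 500.0, limit=500)
        print(t, r, sigma2 * np.exp(-alpha * abs(t)))            # the two values agree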
is minimal. This linear combination is characterized by the requirement that the residual x − Σ_j c_j y_j is orthogonal, i.e. uncorrelated with all the y_j-variables. This is the least squares solution to the common linear regression problem.
Example 4:4 For the MA(1)-process, x(t) = e(t) + b1 e(t − 1), we found in Example C:1 that

e(t) =  Σ_{k=0}^{∞} (−b1)^k x(t − k),                   if |b1| < 1,
        lim_{n→∞} Σ_{k=0}^{n} (1 − k/n) x(t − k),        for b1 = −1.

Thus, x(t + 1) = e(t + 1) + b1 e(t) has been written as the sum of one variable e(t + 1) ⊥ H(x, t) and one variable b1 e(t) ∈ H(x, t). The projection theorem implies that the best linear prediction of x(t + 1) based on x(s), s ≤ t, is

x̂_t(t + 1) = b1 e(t) =  b1 Σ_{k=0}^{∞} (−b1)^k x(t − k),                if |b1| < 1,
                         lim_{n→∞} − Σ_{k=0}^{n} (1 − k/n) x(t − k),     for b1 = −1.
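The MA(1) prediction formula is easy to check by simulation (a sketch with an assumed parameter b1 = 0.6 and a truncated geometric sum, not from the text): the prediction error x(t + 1) − x̂_t(t + 1) should be the innovation e(t + 1), with variance 1.

    import numpy as np

    rng = np.random.default_rng(2)
    b1, n, K = 0.6, 100_000, 40
    e = rng.standard_normal(n)
    x = e.copy()
    x[1:] += b1 * e[:-1]                       # x(t) = e(t) + b1 e(t-1)

    coeff = b1 * (-b1) ** np.arange(K)         # b1 * (-b1)^k, k = 0..K-1
    ts = np.arange(K, n - 1)[:5000]
    xhat = np.array([coeff @ x[s - np.arange(K)] for s in ts])
    err = x[ts + 1] - xhat
    print(err.var())                           # close to Var(e(t)) = 1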
Example 4:5 We can extend the previous example to have H(x, t) ⊂ H(e, t)
with strict inclusion. Take a series of variables e∗ (t) and a random variable U
with E(U ) = 0 and V (U ) < ∞, everything uncorrelated, and set
e(t) = U + e∗ (t).
Then x(t) = e(t) − e(t − 1) = e∗ (t) − e∗ (t − 1), and H(e, t) = H(U ) ⊕ H(e∗ , t)
with H(U ) and H(e∗ , t) orthogonal, and H(e, t) ⊇ H(e∗ , t) = H(x, t).
where g(ω)
is the transfer function (also called frequency function). It has to
satisfy |g(ω)|2 dF (ω) < ∞. That the filter is linear and time-invariant means
that, for any (complex) constants a1 , a2 and time delay τ ,
In particular, if x(t) has spectral density fx (ω) then the spectral density of
y(t) is
fy (ω) = |g(ω)|2 fx (ω). (4.37)
Lemma 4.1 If g_n → g in H(F), i.e. ∫ |g_n(ω) − g(ω)|² dF(ω) → 0, then

∫ g_n(ω) e^{iωt} dZ(ω) → ∫ g(ω) e^{iωt} dZ(ω)

in H(x).
2
The frequency function for differentiation is therefore g(ω) = iω, and the spectral density of the derivative is f_{x'}(ω) = ω² f_x(ω).
In general, writing y(t) = ∫ |g(ω)| e^{i(ωt + arg g(ω))} dZ(ω), we see how the filter amplifies the amplitude of dZ(ω) by a factor |g(ω)| and adds arg g(ω) to the phase. For the derivative, the phase increases by π/2, while the amplitude increases by a frequency dependent factor ω.
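The relation f_y(ω) = |g(ω)|² f_x(ω) for the derivative filter can be illustrated in a few lines (a sketch with an assumed triangular input spectral density, not from the text): integrating f_{x'}(ω) = ω² f_x(ω) gives the second spectral moment ω2.

    import numpy as np

    w = np.linspace(-1, 1, 2001)
    dw = w[1] - w[0]
    fx = 0.75 * (1 - np.abs(w))          # assumed input spectral density, Var(x) = 0.75

    g = 1j * w                            # frequency function of differentiation
    fderiv = np.abs(g) ** 2 * fx          # spectral density of x'(t), cf. (4.37)

    print((fx * dw).sum())                # omega_0 = Var(x)
    print((fderiv * dw).sum())            # omega_2 = Var(x') = int w^2 fx(w) dw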
then, by (4.14),
Cov(x(s), u(t)) = ∫_ω ∫_μ e^{iωs} e^{−iμt} ḡ(μ) E( dZ(ω) · dZ̄(μ) ) = ∫ e^{i(s−t)ω} ḡ(ω) dF_x(ω),

and similarly,

Cov(u(s), v(t)) = ∫ g(ω) h̄(ω) e^{i(s−t)ω} dF_x(ω).    (4.38)
Inserting the spectral representation of x(t) and changing the order of integra-
tion, we obtain a filter in frequency response form,
y(t) = ∫_{ω=−∞}^{∞} ( ∫_{u=−∞}^{∞} e^{iωu} h(t − u) du ) dZ(ω) = ∫_{−∞}^{∞} g(ω) e^{iωt} dZ(ω),

with

g(ω) = ∫_{u=−∞}^{∞} e^{−iωu} h(u) du,    (4.39)

if ∫ |h(u)| du < ∞.
Conversely, if h(u) is absolutely integrable, ∫ |h(u)| du < ∞, then g(ω), defined by (4.39), is bounded and hence ∫ |g(ω)|² dF(ω) < ∞. Therefore

y(t) = ∫ g(ω) e^{iωt} dZ(ω)

defines a linear filter with frequency function g(ω) as in (4.36). Inserting the expression for g(ω) and changing the order of integration we get the impulse response form,

y(t) = ∫ e^{iωt} ( ∫ e^{−iωu} h(u) du ) dZ(ω)
     = ∫ h(u) ( ∫ e^{iω(t−u)} dZ(ω) ) du = ∫ h(u) x(t − u) du.

The impulse response and frequency response function form a Fourier transform pair, and

h(u) = (1/2π) ∫_{ω=−∞}^{∞} e^{iωu} g(ω) dω.    (4.40)

If h(u) = 0 for u < 0 the filter is called causal or physically realizable, indicating that then y(t) = ∫_{u=0}^{∞} h(u) x(t − u) du depends only on x(s) for s ≤ t, i.e. the output from the filter at time t depends on the past and not on the future.
Proof: We show part a); part b) is quite similar. For the "only if" part, use that y_k = ∫_{−π}^{π} e^{iωk} dZ(ω), where E(|dZ(ω)|²) = dω/(2π). Then

x_t = Σ_k h_{t−k} ∫_{−π}^{π} e^{iωk} dZ(ω) = ∫_{−π}^{π} e^{iωt} Σ_k h_{t−k} e^{−iω(t−k)} dZ(ω)
    = ∫_{−π}^{π} e^{iωt} g(ω) dZ(ω),

with g(ω) = Σ_k h_k e^{−iωk}. Thus, the spectral distribution of x_k has

dF(ω) = E(|g(ω) dZ(ω)|²) = |g(ω)|² dω/(2π),

with spectral density f(ω) = (1/2π) |g(ω)|².
For the "if" part, F(ω) = ∫_{−∞}^{ω} f(x) dx, write f(ω) = (1/2π)|g(ω)|², and expand |g(ω)| in a Fourier series,

|g(ω)| = Σ_k c_k e^{iωk}.
we then get

x_t = ∫_{−π}^{π} e^{iωt} (1/√(2π)) Σ_k c_k e^{iωk} dZ(ω)
    = Σ_k (c_k/√(2π)) ∫_{−π}^{π} e^{iω(t+k)} dZ(ω) = Σ_k c_k e_{t+k} = Σ_k h_{t−k} e_k,

with e_k = (1/√(2π)) ∫_{−π}^{π} e^{iωk} dZ(ω) and h_k = c_{−k}. Since Z(ω) has constant incremental variance, the e_k-variables are uncorrelated and normalized as required. 2
Figure 4.1: Input x(t) and output y(t) in an exponential smoother (RC-filter).
By “solution” we mean either that (almost all) sample functions satisfy the
equations or that there exists a process {y(t), t ∈ R} such that the two sides are
equivalent. Note that (4.48) is only marginally more general than (4.47), since
both right hand sides are stationary processes without any further assumption.
What can then be said about the solution to these equations: when does it
exist and when is it a stationary process; and in that case, what is its spectrum
and covariance function?
For the linear differential equation (4.47),
a0 y^(p)(t) + a1 y^(p−1)(t) + . . . + a_{p−1} y'(t) + a_p y(t) = x(t),    (4.49)
we define the generating function,
A(r) = a0 + a1 r + . . . + ap r p ,
and the corresponding characteristic equation
r p A(r −1 ) = a0 r p + a1 r p−1 + . . . + ap−1 r + ap = 0. (4.50)
The existence of a stationary process solution depends on the solutions to the
characteristic equation. The differential equation (4.49) is called stable if the
roots of the characteristic equation all have negative real part.
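Stability is thus a statement about the roots of (4.50), which is easy to check numerically (a sketch, assuming the coefficients of the example equation y'' + 2y' + y = x(t)):

    import numpy as np

    a = [1.0, 2.0, 1.0]                   # a0 r^p + a1 r^(p-1) + ... + ap, cf. (4.50)
    roots = np.roots(a)
    print(roots, np.all(roots.real < 0))  # stable iff all roots have negative real part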
One can work with (4.49) as a special case of a multivariate first order
differential equation. Dividing both sides by a0 the form is
y' = Ay + x,    (4.51)

with y(t) = (y(t), y'(t), · · · , y^(p−1)(t))', x(t) = (0, 0, · · · , x(t))', and

A = [ 0        1          0          . . .   0   ]
    [ 0        0          1          · · ·   0   ]
    [ .        .          .          . . .   1   ]
    [ −a_p     −a_{p−1}   −a_{p−2}   · · ·   −a_1 ].
This is the formulation which is common in linear and non-linear systems the-
ory; cf. for example [14, Ch. 8], to which we refer for part of the following
theorem.
Theorem 4:9 a) If the differential equation (4.49) is stable, and the right hand
side {x(t), t ∈ R} is a stationary process, then there exists a stationary process
{y(t), t ∈ R} that solves the equation. The solution can be written as the output
of a linear filter

y(t) = ∫_{−∞}^{t} h(t − u) x(u) du,    (4.52)

with initial conditions h(0) = h'(0) = . . . = h^(p−2)(0) = 0, h^(p−1)(0) = 1/a0.
Further, ∫_{−∞}^{∞} |h(u)| du < ∞.
b) If {x(t), t ∈ R} is a p times differentiable stationary process with spectral density f_x(ω), then also Σ_{j=0}^{p} a_j x^(j)(t) is a stationary process, and it has the spectral density

| Σ_{j=0}^{p} a_j (iω)^j |² f_x(ω).    (4.54)
[Figure: a linear oscillator, a mass m attached to a spring with constant k and a damper with constant c, driven by the force x(t) and with displacement y(t).]
The relation between the force X(t) and the resulting displacement is de-
scribed by the following differential equation,
This equation can be solved just like an ordinary differential equation with a
continuous x(t) and, from Theorem 4:9, it has the solution
y(t) = ∫_{−∞}^{t} h(t − u) x(u) du,
α = ζω0,    ω̃0 = ω0 (1 − ζ²)^{1/2}.
To find the frequency function g(ω) for the linear oscillator we consider each term on the left hand side in (4.56). Since differentiation has frequency function iω, and hence repeated differentiation has frequency function −ω², we see that g(ω) satisfies the equation

(−mω² + icω + k) · g(ω) = 1,
and hence

g(ω) = 1 / (−mω² + icω + k).    (4.57)

Since

|g(ω)|² = 1 / ((k − mω²)² + c²ω²),

the spectral density for the output signal y(t) is

f_y(ω) = f_x(ω) / ((k − mω²)² + c²ω²) = (f_x(ω)/m²) / ((ω0² − ω²)² + 4α²ω²).    (4.58)
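The two expressions in (4.58) can be compared numerically (a sketch with assumed parameter values m = 1, c = 0.4, k = 4 and a white-noise input, not from the text):

    import numpy as np

    m, c, k = 1.0, 0.4, 4.0
    w0, alpha = np.sqrt(k / m), c / (2 * m)       # w0^2 = k/m, 2*alpha = c/m

    w = np.linspace(-10, 10, 5001)
    g = 1.0 / (-m * w**2 + 1j * c * w + k)        # frequency function (4.57)
    fx = np.ones_like(w) / (2 * np.pi)            # white-noise input with sigma^2 = 1

    fy = np.abs(g) ** 2 * fx                      # output spectral density
    fy_check = (fx / m**2) / ((w0**2 - w**2) ** 2 + 4 * alpha**2 * w**2)
    print(np.allclose(fy, fy_check))              # True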
Example 4:8 A resonance circuit with one inductance, one resistance, and one
capacitance in series is an electronic counterpart to the harmonic mechanical
oscillator; see Figure 4.3.
Figure 4.3: A resonance circuit with inductance L, resistance R and capacitance C in series; input potential x(t) between A1 and A2, output potential y(t) between B1 and B2.
If the input potential between A1 and A2 is x(t), the current I(t) through
the circuit obeys the equation
L I'(t) + R I(t) + (1/C) ∫_{−∞}^{t} I(s) ds = x(t).

The output potential between B1 and B2, which is y(t) = R I(t), therefore follows the same equation (4.56) as the linear mechanical oscillator,

L y''(t) + R y'(t) + (1/C) y(t) = R x'(t),    (4.59)

but this time with x'(t) as driving force. The frequency function for the filter between x(t) and y(t) is (cf. (4.57)),

g(ω) = iω / ( −(L/R)ω² + iω + 1/(RC) ).

The response frequency ω0 = 1/√(LC) is here called the resonance frequency. The relative damping ζ corresponds to the relative bandwidth 1/Q = 2ζ, where

1/Q = Δω/ω0 = R √(C/L),

and Δω = ω2 − ω1 is such that |g(ω1)| = |g(ω2)| = |g(ω0)|/√2.
with the stationary process {x(t), t ∈ R} as input. The impulse response for the filter in (4.52) is the solution to h''(u) + 2h'(u) + h(u) = 0, and is of the form

h(u) = e^{−u}(C1 + C2 u),

f_y(ω) = f_x(ω) / |1 + 2(iω) + (iω)²|² = f_x(ω)/(1 + ω²)².
where a1 depends on the viscosity and a0 is the particle mass. If the force x(t)
is caused by collisions from independent molecules it is reasonable that different
x(t) be independent. Adding the assumption that they are Gaussian leads us to take x(t) = σ w'(t) as the "derivative of a Wiener process", i.e. Gaussian white noise,

a0 y'(t) + a1 y(t) = σ w'(t).

This equation can be solved as an ordinary differential equation by

y(t) = (1/a0) ∫_{−∞}^{t} e^{−α(t−u)} σ w'(u) du = (1/a0) ∫_{−∞}^{t} e^{−α(t−u)} σ dw(u).    (4.61)
a0 −∞ a0 −∞
Here the last integral is well defined, Example 2:7 on page 51, even if the
differential equation we started out from is not.
By carrying out the integration it is easy to see that the process y(t) defined by (4.61) satisfies

∫_{u=t0}^{t} a1 y(u) du = −a0 (y(t) − y(t0)) + σ(w(t) − w(t0)),
which means that, instead of equation (4.60), we could have used the integral equation

a0 (y(t) − y(t0)) + a1 ∫_{u=t0}^{t} y(u) du = σ (w(t) − w(t0)),    (4.62)
as in Theorem 4:9. The formal differential equation can be replaced by the well
defined differential-integral equation
a completely new theory is needed, namely stochastic calculus; the reader is referred to [39] for a good introduction.
f_n(ω) = σ²/(2π),   −∞ < ω < ∞.

Strictly, this is not a proper spectral density of a stationary process since it has infinite integral, but used as input in a linear system with an impulse response function that satisfies ∫ |h(u)|² du < ∞, it produces a stationary output process.
a0 y^(p)(t) + a1 y^(p−1)(t) + . . . + a_{p−1} y'(t) + a_p y(t) = σ w'(t),    (4.64)

f_x(ω) = (σ²/(2π)) · 1 / | Σ_{k=0}^{p} a_k (iω)^{p−k} |².
b) The relation between the impulse response and the frequency response function g(ω) is a property of the systems equation (4.64) and does not depend on any stochastic property. One can therefore use the established relation

g(ω) = 1 / Σ_{k=0}^{p} a_k (iω)^{p−k}.

Part (b) of the theorem finally confirms our claim that the Gaussian white noise σ w'(t) can be treated as if it has constant spectral density σ²/(2π).
A stationary process with spectral density of the form C/|P(iω)|², where P(ω)

where the white noise input n(t) has constant spectral density, f_n(ω) = σ²/(2π), has spectral density (cf. (4.58)),

f_y(ω) = (σ²/(2π)) · 1 / ((ω0² − ω²)² + 4α²ω²).
The covariance function is found, for example by residue calculus, from r_y(t) = ∫ e^{iωt} f_y(ω) dω. With α = ζω0 and ω̃0 = ω0 √(1 − ζ²) one gets the covariance function

r_y(t) = (σ²/(4αω0²)) e^{−α|t|} ( cos ω̃0 t + (α/ω̃0) sin ω̃0 |t| ).    (4.65)
where ΔZ(0) is the jump of Z(ω) at the origin, we obtain a particularly useful
linear transform of x(t). One can obtain x∗ (t) as the limit, as h ↓ 0 through
continuity points of F (ω), of the linear operation with frequency function
g_h(ω) =  0,   ω < −h,
          1,   |ω| ≤ h,
          2,   ω > h.
The process x̂(t) is defined by

x∗(t) = x(t) + i x̂(t).

Thus, x∗(t) is a complex process with x(t) as real part and x̂(t) as imaginary part. All involved processes can be generated by the same real spectral processes {u(λ), 0 ≤ λ < ∞} and {v(λ), 0 ≤ λ < ∞}.
Theorem 4:11 Let {x(t), t ∈ R} be stationary and real, with mean 0, co-
variance function r(t) and spectral distribution function F (ω), with a possible
jump ΔF(0) at ω = 0. Denote the Hilbert transform of x(t) by x̂(t). Then, with G(ω) denoting the one-sided spectrum,

³ Matlab, Signal processing toolbox, contains a routine for making Hilbert transforms. Try it!
a) {x̂(t), t ∈ R} is stationary and real, with mean 0, and covariance function

r̂(t) = r(t) − ΔF(0) = ∫_{−∞}^{∞} e^{iωt} dF(ω) − ΔF(0) = ∫_{0+}^{∞} cos ωt dG(ω).

b) The process x̂(t) has the same spectrum F(ω) as x(t), except that any jump at ω = 0 has been removed.
c) The cross-covariance function between x(t) and x̂(t) is

r∗(t) = E(x(s) · x̂(s + t)) = ∫_{0}^{∞} sin ωt dG(ω).

Proof: Part (a) and (c) follow from (4.23), (4.66), and the correlation properties (4.24–4.26) of the real spectral processes. Then part (b) is immediate. 2
and, in combination,

x(t) = 2 Re{ Y(t) e^{iω0 t} }.

Here, Y(t) is a complex process

Y(t) = Y1(t) + i Y2(t) = ∫_{−d}^{d} √(f0(ω)/2) e^{iωt} dWC(ω + ω0)
with only low frequencies. With R(t) = 2|Y (t)| and Θ(t) = arg Y (t), we obtain
The envelope R(t) has here a real physical meaning as the slowly varying am-
plitude of the narrow band process.
Figure 4.4: Gaussian processes and their Hilbert transforms and envelopes. Left: Pierson-Moskowitz waves, Right: a narrowband process with triangular spectrum over (0.8, 1.2).
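In the same spirit as the Matlab routine mentioned in the footnote above, the discrete Hilbert transform and the envelope can be computed with scipy.signal.hilbert; the sketch below, with an assumed narrow band spectrum over (0.8, 1.2), is only illustrative.

    import numpy as np
    from scipy.signal import hilbert

    rng = np.random.default_rng(3)
    t = np.linspace(0, 200, 4000)
    wk = rng.uniform(0.8, 1.2, 400)                     # narrow band of frequencies
    phi = rng.uniform(0, 2 * np.pi, 400)
    x = np.sqrt(2.0 / 400) * np.cos(np.outer(t, wk) + phi).sum(axis=1)

    xa = hilbert(x)              # analytic signal x(t) + i*xhat(t) (discrete approximation)
    xhat, R = xa.imag, np.abs(xa)
    print(np.all(R >= np.abs(x) - 1e-12))   # the envelope R(t) dominates |x(t)|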
i.e. its spectrum is restricted to the interval [−ω0 , ω0 ]. We require that there
is no spectral mass at the points ±ω0 .
For a fixed t, the function gt (ω) = eiωt · I[−ω0,ω0 ] is square integrable over
(−ω0 , ω0 ), and from the theory of Fourier series, it can be expanded as
e^{iωt} = lim_{N→∞} Σ_{k=−N}^{N} e^{iωkt0} · sin ω0(t − kt0) / (ω0(t − kt0)),
∫_{−ω0}^{ω0} | e^{iωt} − Σ_{k=−N}^{N} e^{iωkt0} · sin ω0(t − kt0)/(ω0(t − kt0)) |² dF(ω) → 0.

The convergence is also uniform for |ω| < ω0 = π/t0. For ω = ±ω0 it converges to

(e^{iω0 t} + e^{−iω0 t})/2 = cos ω0 t.

Therefore, if dF(±ω0) = 0, then

Σ_{k=−N}^{N} e^{iωkt0} · sin ω0(t − kt0)/(ω0(t − kt0)) → e^{iωt},    (4.68)

in H(F) as N → ∞.
Inserting this expansion in the spectral representation of {x(t), t ∈ R} we
obtain,
E( | x(t) − Σ_{k=−N}^{N} x(kt0) · sin ω0(t − kt0)/(ω0(t − kt0)) |² )    (4.69)
    = E( | ∫_{−ω0}^{ω0} ( e^{iωt} − Σ_{k=−N}^{N} e^{iωkt0} · sin ω0(t − kt0)/(ω0(t − kt0)) ) dZ(ω) |² )
    ≤ ∫_{−ω0}^{ω0} | e^{iωt} − Σ_{k=−N}^{N} e^{iωkt0} · sin ω0(t − kt0)/(ω0(t − kt0)) |² dF(ω).

x(t) = lim_{N→∞} Σ_{k=−N}^{N} x(kt0) · sin ω0(t − kt0)/(ω0(t − kt0)),
sin2 ω0 t(F + + F − ),
An example of this is the simple random cosine process, x(t) = cos(ω0 t+φ),
which has covariance function r(t) = 12 cos ω0 t, and spectrum concentrated at
±ω0 . Then
x(α + kt0 ) = (−1)k x(α),
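As a numerical illustration of the sampling expansion (a sketch with an assumed band-limited random-phase process, not from the text), a value x(t) between the sampling points is reconstructed from the samples x(k t0):

    import numpy as np

    rng = np.random.default_rng(4)
    w0 = np.pi
    t0 = np.pi / w0                       # sampling interval

    wk = rng.uniform(-0.9 * w0, 0.9 * w0, 50)   # frequencies strictly inside (-w0, w0)
    phi = rng.uniform(0, 2 * np.pi, 50)

    def x(t):
        t = np.atleast_1d(t)
        return np.cos(np.outer(t, wk) + phi).sum(axis=1) / np.sqrt(50)

    k = np.arange(-200, 201)
    samples = x(k * t0)
    t = 7.3
    recon = np.sum(samples * np.sinc((t - k * t0) / t0))   # np.sinc(u) = sin(pi u)/(pi u)
    print(recon, x(t)[0])                                  # the two values are close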
V(z_k) = p_k' Σ p_k / ω_k = ω_k p_k' p_k / ω_k = 1,

Cov(z_j, z_k) = E(z_j z_k) = (1/√(ω_j ω_k)) E(p_j' x x' p_k) = (1/√(ω_j ω_k)) p_j' Σ p_k = 0,
uniformly in t, as n → ∞.
The functions ck (t) depend on the choice of observation interval [a, b], and
the random variables zk are elements in the Hilbert space spanned by x(t); t ∈
[a, b],
zk ∈ H(x(t); t ∈ [a, b]).
Let us first investigate what properties such an expansion should have, if it
exists. Write H(x) instead of H(x(t); t ∈ [a, b]). Suppose there exists zk with
‖z_k‖²_{H(x)} = 1,   (z_j, z_k) = E(z_j z̄_k) = 0, j ≠ k,
and assume that the family {zk } is complete for H(x), i.e. the zk form a basis
for H(x). In particular this means that for every U ∈ H(x),
U ⊥ zk , ∀k ⇒ U = 0.
Now take any y ∈ H(x), and define ck = (y, zk ). Then, by the orthogonal-
ity,
E( | y − Σ_{k=0}^{n} c_k z_k |² ) = . . . = E(|y|²) − Σ_{k=0}^{n} |c_k|²,

so Σ_{k=0}^{n} |c_k|² ≤ ‖y‖²_{H(x)} for all n, and hence Σ_{k=0}^{∞} |c_k|² ≤ ‖y‖²_{H(x)}. This means that Σ_{k=0}^{∞} c_k z_k exists as a limit in quadratic mean, and also that

y − Σ_{k=0}^{∞} c_k z_k ⊥ z_n

for all n. But since {z_k} is a complete family, y = Σ_{k=0}^{∞} c_k z_k = Σ_{k=0}^{∞} (y, z_k) z_k.
Now replace y by a fixed time observation of x(t). Then, naturally, the c_k will depend on the time t and are functions c_k(t), x(t) = Σ_k c_k(t) z_k. For the covariance function of x(t) = Σ_k c_k(t) z_k we have, by the orthogonality of z_k,

r(s, t) = E(x(s) x̄(t)) = Σ_{j,k} c_j(s) c̄_k(t) E(z_j z̄_k) = Σ_k c_k(s) c̄_k(t).
Thus, we shall investigate the existence and properties of the following pair
of expansions,
x(t) = Σ_k c_k(t) z_k,    (4.71)
r(s, t) = Σ_k c_k(s) c̄_k(t).    (4.72)

Not only can the z_k be taken as uncorrelated but also the functions c_k(t) can be chosen as orthogonal, i.e.

∫_a^b c_j(t) c̄_k(t) dt = 0,  j ≠ k,
∫_a^b |c_k(t)|² dt = ω_k ≥ 0.
Example 4:11 The standard Wiener process w(t), observed over [0, T], has covariance function r(s, t) = min(s, t), and eigenfunctions can be found explicitly: from

∫_0^T min(s, t) φ(t) dt = ω φ(s),

it follows by twice differentiation,

∫_0^s t φ(t) dt + s ∫_s^T φ(t) dt = ω φ(s),    (4.74)
s φ(s) − s φ(s) + ∫_s^T φ(t) dt = ω φ'(s).    (4.75)

T/√ω_k = π/2 + kπ,   k = 0, 1, 2, . . . ,

we have that the Wiener process can be defined as the infinite (uniformly convergent in quadratic mean) sum

w(t) = √(2/T) Σ_{k=0}^{∞} ( sin(π(k + 1/2)t/T) / (π(k + 1/2)/T) ) z_k,
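The truncated Karhunen-Loève sum is a convenient way to simulate the Wiener process, and Mercer's identity Σ ω_k φ_k(t)² = min(t, t) = t gives a direct check; the following is a sketch with an assumed truncation at 2000 terms.

    import numpy as np

    rng = np.random.default_rng(5)
    T, n_terms = 1.0, 2000
    t = np.linspace(0.0, T, 501)
    lam = np.pi * (np.arange(n_terms) + 0.5) / T       # 1/sqrt(omega_k)

    z = rng.standard_normal(n_terms)                   # independent N(0,1) variables z_k
    w = np.sqrt(2.0 / T) * (np.sin(np.outer(t, lam)) / lam) @ z   # truncated KL sum

    var_t = (2.0 / T) * (np.sin(np.outer(t, lam)) ** 2 / lam**2).sum(axis=1)
    print(np.abs(var_t - t).max())                     # close to 0: Var(w(t)) = t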
Proof: (of Theorem 4:13) We only indicate the steps in the proof, following
the outline in [35]. One has to show: the mathematical facts about existence
and properties of eigenvalues and eigenfunctions, the convergence of the series
(4.73), and finally the stochastic properties of the variables zk . This is done in
a series of steps.
(i) If ∫_a^b r(s, t) φ(t) dt = ω φ(s), then ω is real and non-negative. This follows from

0 ≤ ∫_a^b ∫_a^b r(s, t) φ̄(s) φ(t) ds dt = ω ∫_a^b |φ(s)|² ds.
where the maximum is taken over ‖φ‖² = ∫_a^b |φ(t)|² dt = 1. As stated in [35],
”this is not easily proved”. The corresponding eigenfunction is denoted by
φ0 (t), and it is continuous.
(iii) The function r1(s, t) = r(s, t) − ω0 φ0(s) φ̄0(t) is a continuous covariance function, namely for the process

x1(t) = x(t) − φ0(t) ∫_a^b φ̄0(s) x(s) ds,
Repeating step (ii) with r1(s, t) instead of r(s, t) we get a new eigenvalue ω1 ≥ 0 with eigenfunction φ1(t). Since ∫_a^b r1(s, t) φ1(t) dt = ω1 φ1(s), we have

∫_a^b φ1(s) φ̄0(s) ds = (1/ω1) ∫_a^b φ̄0(s) ( ∫_a^b r1(s, t) φ1(t) dt ) ds
    = (1/ω1) ∫_a^b φ1(t) ( ∫_a^b r1(s, t) φ̄0(s) ds ) dt = 0,

so φ0 and φ1 are orthogonal, and

∫_a^b r(s, t) φ1(t) dt = ∫_a^b r1(s, t) φ1(t) dt + ω0 φ0(s) ∫_a^b φ̄0(t) φ1(t) dt = ω1 φ1(s) + 0.
r_n(s, t) = r(s, t) − Σ_{k=0}^{n} ω_k φ_k(s) φ̄_k(t).

as n → ∞, i.e.

r(s, t) = Σ_{k=0}^{∞} ω_k φ_k(s) φ̄_k(t),

with uniform convergence. (This is Mercer's theorem from 1909.)
(vi) For the representation (4.73) we have

E( | x(t) − Σ_{k=0}^{n} √ω_k φ_k(t) z_k |² ) = E(|x(t)|²) − Σ_{k=0}^{n} ω_k |φ_k(t)|²
    = r(t, t) − Σ_{k=0}^{n} ω_k |φ_k(t)|² → 0,
H0 : m(t) = m0 (t),
H1 : m(t) = m1 (t)
Writing

a_k = ∫_a^b φ_k(t) m(t) dt,
a_{ik} = ∫_a^b φ_k(t) m_i(t) dt,   i = 0, 1,

we have that y_k ∈ N(a_k, √ω_k) and the hypotheses are transformed into hypotheses about a_k:

H0: a_k = a_{0k}, k = 0, 1, . . .
H1: a_k = a_{1k}, k = 0, 1, . . .
and

V(U_k) = (a_{1k} − a_{0k})²/ω_k

under both H0 and H1, we have that Σ_{k=0}^{∞} U_k is normal with mean

E( Σ_{k=0}^{∞} U_k ) =  m0 = Σ_{k=0}^{∞} (1/2)(a_{1k} − a_{0k})²/ω_k        if H0 is true,
                         m1 = −m0 = −Σ_{k=0}^{∞} (1/2)(a_{1k} − a_{0k})²/ω_k   if H1 is true.

Σ_{k=0}^{∞} U_k = ∫_a^b f(t) ( x(t) − (m0(t) + m1(t))/2 ) dt,

where

f(t) = Σ_{k=0}^{∞} ( (a_{0k} − a_{1k})/ω_k ) φ_k(t).
Exercises
4:1. Let x(t) be a stationary Gaussian process with E(x(t)) = 0, covariance
function rx (t) and spectral density fx (ω). Calculate the covariance func-
tion for the process
y(t) = x2 (t) − rx (0),
and show that it has the spectral density

f_y(ω) = 2 ∫_{−∞}^{∞} f_x(μ) f_x(ω − μ) dμ.
4:2. Derive the spectral density for u(t) = 2x(t)x'(t) if x(t) is a differentiable stationary Gaussian process with spectral density f_x(ω).
4:3. Let et , t = 0, ±1, ±2, . . . be independent N (0, 1)-variables and define the
stationary processes
x_t = θ x_{t−1} + e_t = Σ_{n=−∞}^{t} θ^{t−n} e_n,
y_t = e_t + ψ e_{t−1},
with |θ| < 1. Find the expressions for the spectral processes Zx (ω)
and Zy (ω) in terms of the spectral process Ze (ω), and derive the cross
spectrum between xt and yt . (Perhaps you should read Chapter 6 first.)
4:4. Let un and vn be two sequences of independent, identically distributed
variables with zero mean and let the stationary sequences xn and yn be
defined by
yn = a1 + b1 xn−1 + un
xn = a2 − b2 yn + vn .
a) First show that the following integral and limit exists in quadratic mean:

Z(ω) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ( (e^{−iωt} − 1)/(−it) ) x(t) dt.
b) Then show that the process Z(ω), −∞ < ω < ∞, has orthogonal increments and that

E( |Z(ω2) − Z(ω1)|² ) = F(ω2) − F(ω1)

for ω1 < ω2.
c) Finally, show that the integral

∫_{−∞}^{∞} e^{iωt} dZ(ω) = lim Σ_k e^{iω_k t} (Z(ω_{k+1}) − Z(ω_k))

exists, and that E| x(t) − ∫ e^{iωt} dZ(ω) |² = 0.
r_y(t) = (σ²/(4αω0²)) e^{−α|t|} ( cos ω̃0 t + (α/ω̃0) sin ω̃0 |t| ),
(1/T) ∫_0^T r(t) dt → 0   implies   (1/T) ∫_0^T x(t) dt → 0 in quadratic mean,
Example 5:1 (The irrational modulo game) A simple game, that can go on
forever and has almost all interesting properties of a stationary process, is the
adding of an irrational number. Take a random x0 with uniform distribution
between 0 and 1, and let θ be an irrational number. Define
xk+1 = xk + θ mod 1,
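A few lines of code illustrate the ergodic behaviour of this game (a sketch, with θ = √2 as the irrational number): time averages computed along one realization approach the corresponding uniform-distribution quantities.

    import numpy as np

    theta = np.sqrt(2.0)                               # an irrational number
    x0 = np.random.default_rng(6).uniform()
    x = (x0 + theta * np.arange(100_000)) % 1.0        # x_{k+1} = x_k + theta mod 1

    print(x.mean(), 0.5)                               # time average of x_k vs E(x) = 1/2
    print((x < 0.3).mean(), 0.3)                       # relative visit time vs P(x < 0.3)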
Example 5:2 One can define other stationary sequences by applying any time
invariant rule to a stationary sequence xn , e.g. with 0 ≤ x0 ≤ 1, yn = 1 if
xn+1 > x2n and yn = 0 otherwise. A more complicated rule is
xk+1 = cxk (1 − xk ),
S_n/n = (1/n) Σ_{k=1}^{n} x_k.

y = lim (x_2 + x_3 + . . . + x_{n+1})/n + lim (x_1 − x_{n+1})/n
  = lim (x_2 + x_3 + . . . + x_{n+1})/n − lim x_{n+1}/n.
Example 5:3 The limit of Sn /n is invariant (when it exists) under the shift
transformation (x1 , x2 , . . .) → (x2 , x3 , . . .) of R∞ . The random variables y =
lim sup xn and lim sup Sn /n are always invariant.
³ Show as an exercise, if you have not done so already, that if x_n is a stationary sequence with E(|x_n|) < ∞, then P(x_n/n → 0, as n → ∞) = 1; Exercise 5:6.
with starting distribution p(0) = (2p/5, 3p/5, 3q/5, 2q/5) is stationary for every
p + q = 1. The sets A1 = {ω = (x1 , x2 , . . .); xk ∈ {1, 2}} and A2 = {ω =
(x1 , x2 , . . .); xk ∈ {3, 4}} are both invariant under the shift transformation.
J = {invariant sets A ∈ F },
is a σ -field.
(b) A random variable y is invariant if and only if it is measurable with respect
to the family J of invariant sets.
5.3.2 Ergodicity
The fundamental property of ergodic systems ω → T ω with a stationary (or
invariant) distribution P , is that the T k ω , with increasing k , visits every corner
of the state space, exactly with the correct frequency as required by P . Another
way of saying this is that the ”histogram”, counting the number of visits to any
neighborhood of states, converges to a limiting ”density”, namely the density
for the invariant distribution P over the state space. If we make a measurement
x(ω) on the system, then the expected value E(x) is the ”ensemble average”
with respect to the measure P ,
E(x) = x(ω) dP (ω),
ω∈Ω
and – here is the ergodicity – this is exactly the limit of the ”time average”
1
n
x(T k−1 ω).
n
1
Proof: First assume that every bounded invariant variable is constant, and
take an invariant set A. Its indicator function χA (ω) = 1 if ω ∈ A is then
an invariant random variable, and by assumption it is an a.s. constant. This
means that the sets where it is 0 and 1, respectively, have probability either 0
or 1, i.e. P (A) = 0 or 1. Hence T is ergodic.
Conversely, take an ergodic T , and consider an arbitrary invariant random
variable x. We shall show that x is a.s. constant. Define, for real x0 ,
Then

∫_{ω; M_n > 0} x(ω) dP(ω) ≥ 0.

x ≥ S_k − M'_n,   for k = 1, 2, . . . , n + 1,

which implies

x ≥ max(S_1, . . . , S_n) − M'_n.

Thus (with M'_n = M_n(Tω)),

∫_{M_n > 0} x(ω) dP(ω) ≥ ∫_{M_n > 0} { max(S_1(ω), . . . , S_n(ω)) − M_n(Tω) } dP(ω).

But on the set {ω; M_n(ω) > 0}, one has that M_n = max(S_1, . . . , S_n), and thus

∫_{M_n > 0} x(ω) dP(ω) ≥ ∫_{M_n > 0} { M_n(ω) − M_n(Tω) } dP(ω)
    ≥ ∫ { M_n(ω) − M_n(Tω) } dP(ω) = 0,

since increasing the integration area does not change the integral of M_n(ω) while it can only make the integral of M_n(Tω) larger. Further, T is measure
preserving, i.e. shifting the variables one step does not change the distribution,
nor the expectation. 2
Proof of theorem: We first assume that E(x | J) = 0 and prove that the average converges to 0, a.s. For the general case consider x − E(x | J) and use that

E(x | J)(Tω) = E(x | J)(ω),

since E(x | J) is invariant by Theorem 5:1(b), page 139.
We show that x̄ = lim sup S_n/n ≤ 0 and, similarly, x̲ = lim inf S_n/n ≥ 0, giving lim S_n/n = 0. Take an ε > 0 and denote D = {ω; x̄ > ε}: we shall show that P(D) = 0. Since, from Example 5:3, x̄ is an invariant random variable, also the event D is invariant. Define a new random variable,

x∗(ω) =  x(ω) − ε   if ω ∈ D,
         0           otherwise,

and set S∗_n(ω) = Σ_{k=1}^{n} x∗(T^{k−1}ω), with M∗_n defined from S∗_k. From Lemma 5.1 we know

∫_{M∗_n > 0} x∗(ω) dP(ω) ≥ 0.    (5.2)
We now only have to replace this inequality by an inequality for a similar
integral over the set D to be finished. The sets
F_n = {M∗_n > 0} = { max_{1≤k≤n} S∗_k > 0 }
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} x(T^k ω) = E(x),   a.s.

(1/n) Σ_{k=0}^{n−1} χ_A(T^k ω) → P(A),   a.s.

(1/n) Σ_{k=1}^{n} x_k → E(x_1 | J),   a.s.

(1/n) Σ_{k=1}^{n} x_k → E(x_1),   a.s.
yn = φ(xn , xn+1 , . . .)
is stationary and ergodic. The special case that φ(xn , xn+1 , . . .) = φ(xn ) only
depends on xn should be noted in particular.
Proof: It is easy to reformulate the results of Theorem 5:3 to yield the the-
orem. However, we shall give the direct proof once more, but use the process
formulation, in order to get a slightly better understanding in probabilistic
terms. We give a parallel proof of part (a), and leave the rest to the reader.
(a) First the counterpart of Lemma 5.1: From the sequence {x_n, n ∈ Z}, define S_k = Σ_{j=1}^{k} x_j and M_n = max(0, S_1, S_2, . . . , S_n). We prove that

x_1 ≥ max(S_1, . . . , S_n) − M'_n.
which is 0, since M_n and M'_n have the same expectation. This proves (5.4).
We continue with the rest of the proof of part (a). Suppose E(x_1 | J) = 0 and consider the invariant random variable x̄ = lim sup S_n/n. Take an ε > 0 and introduce the invariant event D = {x̄ > ε}. Then, by the assumption,

a fact which is basic in the proof. We intend to prove that P(D) = 0 for every ε > 0, thereby showing that x̄ ≤ 0.
However, before we prove that P(D) = P(x̄ > ε) = 0, we need to discuss
the meaning of (5.5). A conditional expectation is defined as a random variable,
measurable with respect to the conditioning σ -field, in this case J . In (5.5)
we conditioned on one of the events D ∈ J and that is fine if P (D) > 0,
but if P (D) = 0, the claim (5.5) makes no sense. The conditional expectation
given an event of probability 0 can be given any value we like, since the only
requirement on the conditional expectation is that it should give a correct value
when integrated over a J -event. If that event has probability 0 the integral is
0 regardless of how the expectation is defined. We return to this at the end of
the proof.
From (x_1, x_2, . . .), define a new sequence of variables

x∗_k =  x_k − ε   if x̄ > ε,
        0          otherwise,

and define S∗_k = Σ_{j=1}^{k} x∗_j and M∗_n = max(0, S∗_1, S∗_2, . . . , S∗_n), in analogy with S_k and M_n. The sequence {x∗_k} is stationary, since x̄ is an invariant random variable, so we can apply (5.4) to get

But x̄ = lim sup S_k/k ≤ sup_{k≥1} S_k/k, so the right hand side is just D = {x̄ > ε}; F_n ↑ D. This implies (here E(|x∗_1|) ≤ E(|x_1|) + ε < ∞ is needed),
Theorem 5:5 (a) For any stationary process {x(t), t ∈ R} with E(|x(t)|) <
∞ and integrable sample paths, as T → ∞,
(1/T) ∫_0^T x(t) dt → E(x(0) | J),   a.s.
Proof of "only if" part: If x(t) is ergodic, so is x²(t), and therefore the time average of x²(t) tends to E(x²(0)) = 1,

S_T/T = (1/T) ∫_0^T x²(t) dt → 1,   a.s.

E((S_T/T − 1)²) = (1/T²) E( ∫_0^T ∫_0^T x²(s) x²(t) ds dt ) − 1
    = (2/T²) ∫_0^T ∫_0^T r²(t − s) ds dt = (4/T²) ∫_0^T t · ( (1/t) ∫_0^t r²(s) ds ) dt.    (5.8)

But according to Theorem 4:7, page 94, relation (4.20), (1/t) ∫_0^t r²(s) ds tends to the sum of squares of all jumps of the spectral distribution function F(ω), Σ_k (ΔF_k)². Hence, if this sum is strictly positive, the right hand side in (5.8) has a positive limit, which contradicts what we proved above, and we have concluded the "only if" part of the theorem. 2
The "if" part is more difficult, and we can prove it here only under the additional condition that the process has a spectral density, i.e. the spectral distribution is F(ω) = ∫_{−∞}^{ω} f(x) dx, because then

r(t) = ∫ e^{iωt} f(ω) dω → 0,
Proof of lemma, and the ”if” part of theorem: We show that if r(t) → 0,
then every invariant set has probability 0 or 1. Let S be an a.s. invariant set
for the x(t)-process, i.e. the translated event Sτ differs from S by an event of
probability zero. But every event in F can be approximated arbitrarily well
by a finite-dimensional event, B , depending only on x(t) for a finite number of
time points tk , k = 1, . . . , n; cf. Section 1.3.3. From stationarity, also Sτ can
be approximated by the translated event Bτ = Uτ B , with the same error, and
combining S with Sτ can at most double the error. Thus, we have
and
H(x, −∞) ⊆ H(x, t) ⊆ H(x).
The subspace H(x, t) is the space of random variables that can be obtained as
limits of linear combinations of variables x(tk ) with tk ≤ t, and H(x, −∞) is
what can be obtained from old variables, regardless of how old they may be. It can be called the infinitely remote past, or the primordial randomness.⁴
Two extremes may occur, depending on the size of H(x, −∞):
• if H(x, −∞) = H(x), then {x(t), t ∈ R} is purely deterministic, or singular,
• if H(x, −∞) = {0}, then {x(t), t ∈ R} is purely non-deterministic, or regular.
Number (3) follows from the projection properties, since the residual y(s) =
x(s) − P−∞ (x(s)) is uncorrelated with every element in H(x, −∞), i.e. y(s) ⊥
H(x, −∞). Since z(t) ∈ H(x, −∞), number (3) follows.
Further, H(y, t) ⊆ H(x, t) and H(y, t) ⊥ H(x, −∞). Therefore H(y, −∞)
is equal to 0, because if y is an element of H(y, −∞) then both y ∈ H(y, t) ⊂
H(x, t) for all t, i.e. y ∈ H(x, −∞), and at the same time y ⊥ H(x, −∞). The
only element that is both in H(x, −∞) and is orthogonal to H(x, −∞) is the
zero element, showing (2).
Finally, H(z, t) = H(x, −∞) for every t. To see this, note that
As the band-limited white noise example shows, there are natural deter-
ministic processes. Other common process models are regular. Examples of
processes combining the two properties seem to be rather artificial.
so the spectrum is discrete if F (ω) = F (d) (ω). The part of the spectrum which
is neither absolutely continuous nor discrete is called the singular part:
F (s) (ω) = F (ω) − F (ac) (ω) − F (d) (ω).
Note that both F (d) (ω) and F (s) (ω) are bounded non-decreasing functions,
differentiable almost everywhere, with zero derivative. The question of singu-
larity or regularity of the process {x(t), t ∈ R} depends on the behavior of the
spectrum for large |ω|.
Theorem 5:8 For a stationary sequence xt , t ∈ Z the following cases can oc-
cur.
a) If P = −∞, then xt is singular.
b) If P > −∞ and the spectrum is absolutely continuous with f (ω) > 0 for
almost all ω , then xt is regular.
x_t = Σ_{k=−∞}^{t} h_{t−k} y_k,
with uncorrelated yk ; cf. Theorem 4:8 which also implies that it must have a
spectral density.
A sequence that is neither singular nor regular can be represented as sum
of two uncorrelated sequences,
⁵ Singularity also occurs if f(ω) = 0 at a single point ω0 and is very close to 0 nearby, such as when f(ω) ∼ exp(−1/(ω − ω0)²) as ω → ω0.
The regular part x_t^(r) has absolutely continuous spectrum,

F^(ac)(ω) = ∫_{−π}^{ω} f(x) dx,

while the singular part x_t^(s) has spectral distribution F^(d)(ω) + F^(s)(ω).
It is also possible to express the prediction error in terms of the integral

P = (1/2π) ∫_{−π}^{π} log f(ω) dω,

where, as before, f(ω) = dF(ω)/dω is the almost everywhere existing derivative of the spectral distribution function.
Theorem 5:9 For a stationary process {x(t), t ∈ R}, one has that
a) if Q = −∞, then x(t) is singular,
b) if Q > −∞ and the spectrum is absolutely continuous then x(t) is regular.
The decomposition of x(t) into one singular component which can be pre-
dicted, and one regular component which is a moving average, is analogous to
the discrete time case,

x(t) = x^(s)(t) + ∫_{u=−∞}^{t} h(t − u) dζ(u),
then

(1/B_N) Σ_{k=1}^{N} x(k) = (y(N + 1) − y(1))/B_N.
strong mixing: x(t) is called strongly mixing (or α-mixing) if there is a non-
negative function α(n) such that α(n) → 0 as n → ∞, and for all t and
events A ∈ Mt−∞ and B ∈ M∞ t+n ,
weak mixing: x(t) is called weakly mixing if, for all events A ∈ M^t_{−∞} and B ∈ M^∞_t,

lim_{n→∞} (1/n) Σ_{k=1}^{n} |P(A ∩ U^{−k} B) − P(A) P(B)| = 0.
Of these, uniform mixing is the most demanding and weak mixing the least.
P (B | A) − P (B) ≥ c0 > 0
is bounded away from 0 by a positive constant c0 , and hence can not tend to
0.
For simplicity, assume E(x(t)) = 0, E(x(t)²) = 1, and assume there is an infinite number of t = t_k for which ρ_k = r(t_k) > 0, but still r(t_k) → 0. Define A = {x(0) > 1/ρ_k} (obviously P(A) > 0) and let B = {x(t_k) > 1}. Since x(0), x(t_k) have a bivariate normal distribution, the conditional distribution of x(t_k) given x(0) = x is normal with mean ρ_k x and variance 1 − ρ_k². As ρ_k → 0, the conditional distribution of x(0) given x(0) > 1/ρ_k will be concentrated near 1/ρ_k, and then x(t_k) will be approximately N(1, 1). Therefore, as ρ_k → 0, P(B | A) → 1/2. On the other hand, P(B) = (2π)^{−1/2} ∫_1^∞ exp(−y²/2) dy < 0.2. Hence (5.10) does not hold for φ(t_k) → 0.
b) For a proof of this, see [18] or [17]. 2
and hence
P (B ∩ Bn ) − P (B) · P (Bn ) → 0.
As in the proof of Lemma 5.2, it follows that P (B)2 = P (B) and hence P (S)2 =
P (S). 2
The reader could take as a challenge to prove that also weak mixing implies
that x(t) is ergodic; see also Exercise 14.
Exercises
5:1. Show that the following transformation of Ω = [0, 1), F = B , P =
Lebesgue measure, is measurable and measure preserving,
T x = 2x mod 1.
(1/n) Σ_{j=1}^{n} χ_A(x_j, . . . , x_{j+k}) → P( (x_1, . . . , x_{k+1}) ∈ A ).
5:10. Show that if x is non-negative with E(x) = ∞ and x_n(ω) = x(T^{n−1}ω), S_n = Σ_{k=1}^{n} x_k, then S_n/n → ∞ if T is ergodic.
5:11. Prove Theorem 5:4.
5:12. Take two stationary and ergodic sequences xn and yn . Take one of the
two sequences at random with equal probability, zn = xn , n = 1, 2, . . . or
zn = yn , n = 1, 2, . . . . Show that zn is not ergodic.
5:13. Let {xn } and {yn } be two ergodic sequences, both defined on (Ω, F, P ),
and consider the bivariate sequence zn = (xn , yn ). Construct an exam-
ple that shows that zn need not be ergodic, even if {xn } and {yn } are
independent.
5:14. Prove that a sufficient condition for z(n) = (x(n), y(n)) to be ergodic,
if {x(n)} and {y(n)} are independent ergodic sequences, is that one of
{x(n)} and {y(n)} is weakly mixing.
Chapter 6
Vector processes and random fields
where F(ω) is a function of positive type, i.e. for every pair j, k, complex z = (z1, . . . , zn), and frequency interval ω1 < ω2,

Σ_{j,k} (F_{jk}(ω2) − F_{jk}(ω1)) z_j z̄_k ≥ 0.

This says that ΔF(ω) = (F_{jk}(ω2) − F_{jk}(ω1)) is a non-negative definite Hermitian matrix.
(b) If F_{jj}(ω), F_{kk}(ω) are absolutely continuous with spectral densities f_{jj}(ω), f_{kk}(ω), then F_{jk}(ω) is absolutely continuous, with
Together with r_{jj}(t) = ∫ e^{iωt} dF_{jj}(ω) and r_{kk}(t) = ∫ e^{iωt} dF_{kk}(ω), we get

r_{jk}(t) + r_{kj}(t) = ∫ e^{iωt} ( dG1(ω) − dF_{jj}(ω) − dF_{kk}(ω) ),
i r_{jk}(t) − i r_{kj}(t) = ∫ e^{iωt} ( dG2(ω) − dF_{jj}(ω) − dF_{kk}(ω) ),
which implies

r_{jk}(t) = r_{kj}(t) = ∫ e^{iωt} · (1/2) ( dG1(ω) − i dG2(ω) − (1 − i)(dF_{jj}(ω) + dF_{kk}(ω)) ) = ∫ e^{iωt} dF_{jk}(ω),

which in turn implies that for any ω-interval, |ΔF_{jk}|² ≤ ΔF_{jj} · ΔF_{kk}. Thus, if F_{jj} and F_{kk} have spectral densities, so does F_{jk}, and
where Gjk (ω) and Hjk (ω) are functions of bounded variation.
processes with discrete spectrum, for which the spectral representations are sums of the form (4.29),

x_j(t) = Σ_n σ_j(n) ( U_j(n) cos ω(n)t + V_j(n) sin ω(n)t ).

Here U_j(n) and V_j(n) are real random variables with mean 0, variance 1, uncorrelated for different n-values. The correlation between the x_j- and the x_k-process is caused by a correlation between the U:s and V:s in the two representations:

E(U_j(n) U_k(n)) = E(V_j(n) V_k(n)) = ρ_{jk}(n),   j ≠ k,
E(U_j(n) V_k(n)) = −E(V_j(n) U_k(n)) = −ρ̃_{jk}(n),   j ≠ k,
E(U_j(n) V_j(n)) = 0,

for some ρ_{jk}(n) and ρ̃_{jk}(n) such that 0 ≤ ρ_{jk}(n)² + ρ̃_{jk}(n)² ≤ 1. Direct calculation of auto- and cross-covariances gives

r_{jk}(t) = Σ_n σ_j(n)σ_k(n) ( ρ_{jk}(n) cos ω(n)t + ρ̃_{jk}(n) sin ω(n)t )
          = Σ_n A_{jk}(n) cos( ω(n)t − Φ_{jk}(n) ),   j ≠ k,    (6.3)
r_{kj}(t) = Σ_n σ_j(n)σ_k(n) ( ρ_{jk}(n) cos ω(n)t − ρ̃_{jk}(n) sin ω(n)t )
          = Σ_n A_{kj}(n) cos( ω(n)t − Φ_{kj}(n) ),   j ≠ k,    (6.4)
r_j(t) = Σ_n σ_j(n)² cos ω(n)t,   j = 1, . . . , p.    (6.5)

Here, A_{jk}(n) = A_{kj}(n) = σ_j(n)σ_k(n) √(ρ_{jk}(n)² + ρ̃_{jk}(n)²) represent the correlation between the amplitudes, while Φ_{jk}(n) = −Φ_{kj}(n), with

cos Φ_{jk}(n) = ρ_{jk}(n)/√(ρ_{jk}(n)² + ρ̃_{jk}(n)²),    sin Φ_{jk}(n) = ρ̃_{jk}(n)/√(ρ_{jk}(n)² + ρ̃_{jk}(n)²),

represent the phase relations.
The corresponding spectral distributions Fkk (ω) are symmetric with mass
ΔFk = σk (n)2 /2
at the frequencies ±ω(n), while for j = k , the cross spectrum Fjk (ω) is skewed
if ρjk (n) = 0, with mass
ΔFjk = (1/2) Ajk(n) e^{−iΦjk(n)} = (1/2) Ajk(n) (ρjk(n) + i ρ̃jk(n)) / √(ρjk(n)² + ρ̃jk(n)²),   at ω = ω(n),
ΔFjk = (1/2) Ajk(n) e^{iΦjk(n)} = (1/2) Ajk(n) (ρjk(n) − i ρ̃jk(n)) / √(ρjk(n)² + ρ̃jk(n)²),   at ω = −ω(n).
|ΔFjk(n)|² / (ΔFjj(n) ΔFkk(n)) = ρjk(n)² + ρ̃jk(n)².
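As a numerical illustration of these relations, the following sketch simulates one spectral component of xj and xk with prescribed ρjk(n) and ρ̃jk(n) and compares the empirical cross-covariance with the corresponding term of (6.3). The code, the variable names (rho, rho_t), and the parameter values are our own illustration choices, not part of the notes.

```python
import numpy as np

# One elementary component of two processes, with
#   E(U_j U_k) = E(V_j V_k) = rho  and  E(U_j V_k) = -E(V_j U_k) = -rho_t,
# so that r_jk(tau) should contain the term
#   sigma_j*sigma_k*(rho*cos(w*tau) + rho_t*sin(w*tau))   (one term of (6.3)).
rng = np.random.default_rng(1)
sigma_j, sigma_k, w = 1.0, 2.0, 0.7
rho, rho_t = 0.5, 0.3                      # rho_jk(n) and "rho-tilde"_jk(n)

# Covariance of (U_j, V_j, U_k, V_k) consistent with the relations quoted above.
C = np.array([[1.0,    0.0,   rho,   -rho_t],
              [0.0,    1.0,   rho_t,  rho  ],
              [rho,    rho_t, 1.0,    0.0  ],
              [-rho_t, rho,   0.0,    1.0  ]])

M = 200_000                                 # Monte Carlo replications
U_j, V_j, U_k, V_k = rng.multivariate_normal(np.zeros(4), C, size=M).T

tau = np.linspace(0.0, 15.0, 7)
x_j_tau = sigma_j * (U_j[:, None] * np.cos(w * tau) + V_j[:, None] * np.sin(w * tau))
x_k_0 = sigma_k * U_k                       # x_k(0) = sigma_k * U_k

r_emp = (x_j_tau * x_k_0[:, None]).mean(axis=0)   # estimate of E[x_j(tau) x_k(0)]
r_th = sigma_j * sigma_k * (rho * np.cos(w * tau) + rho_t * np.sin(w * tau))

print(np.round(r_emp, 2))                   # the two rows should agree approximately
print(np.round(r_th, 2))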
t = (t1 , . . . , tp ).
The covariance of the process values at two parameter points depends on dis-
tance as well as on direction of the vector between the two points.
In spatial applications it is popular to use the variogram defined by
2γ(u, v) = E(|x(u) − x(v)|²).
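A minimal sketch of how the variogram can be estimated from scattered observations; the function name, the binning scheme and the toy data below are our own illustration choices, and an isotropic field is assumed so that only the distance |u − v| matters.

```python
import numpy as np

def empirical_variogram(points, values, bins):
    """Naive estimator of 2*gamma(h): average of |x(u)-x(v)|^2 over pairs with |u-v| in a bin."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)          # use each pair once
    d, sq = d[iu], sq[iu]
    which = np.digitize(d, bins)                    # bin index of each pair distance
    centers = 0.5 * (bins[:-1] + bins[1:])
    two_gamma = np.array([sq[which == i].mean() if np.any(which == i) else np.nan
                          for i in range(1, len(bins))])
    return centers, two_gamma

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(300, 2))             # random observation sites (toy data)
vals = np.sin(pts[:, 0]) + 0.1 * rng.standard_normal(300)
h, tg = empirical_variogram(pts, vals, np.linspace(0, 5, 11))
print(np.round(tg, 3))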
Theorem 6:2 (a) The covariance function r(t) of a homogeneous random field has a spectral distribution,
r(t) = ∫ e^{iω·t} dF(ω),
One should also note the particularly simple form of the covariance function for the case p = 3,
r(t) = ∫_0^∞ (sin(ωt)/(ωt)) dG(ω).
Another special case that needs separate treatment is when the parameter t is composed of both a time and a space parameter, t = (t, s1, . . . , sp). One could then hope for isotropy in (s1, . . . , sp) only, in which case the spectral form becomes
r(t, (s1, . . . , sp)) = ∫_{ν=−∞}^{∞} ∫_{ω=0}^{∞} e^{iνt} · Hp(ω ‖(s1, . . . , sp)‖) dG(ν, ω),
where
Hp(x) = (2/x)^{(p−2)/2} Γ(p/2) J_{(p−2)/2}(x).
The form of the covariance function for an isotropic field as a mixture of
Bessel functions is useful for non-parametric estimation from data. It is only
the weight function G(ω) that needs to be estimated, for example as a discrete
distribution.
The theorem gives all possible isotropic covariance functions valid in the specific dimension p. Some functions can be used as a covariance function in any dimension, for example
r(t) = σ² exp(−φ t^α),   0 < α ≤ 2,
and the Matérn family
r(t) = (σ² / (2^{ν−1} Γ(ν))) (2√ν tφ)^ν K_ν(2√ν tφ),
where K_ν is a modified Bessel function of the second kind.
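The two families above are easy to evaluate numerically; the sketch below does so with SciPy's modified Bessel function, following the parametrization quoted in the text. Parameter values and function names are arbitrary illustration choices.

```python
import numpy as np
from scipy.special import gamma, kv

def exp_power_cov(t, sigma2=1.0, phi=1.0, alpha=1.5):
    """r(t) = sigma^2 exp(-phi t^alpha), a valid covariance in any dimension for 0 < alpha <= 2."""
    return sigma2 * np.exp(-phi * np.asarray(t, float) ** alpha)

def matern_cov(t, sigma2=1.0, phi=1.0, nu=1.5):
    """r(t) = sigma^2 / (2^(nu-1) Gamma(nu)) * (2 sqrt(nu) t phi)^nu * K_nu(2 sqrt(nu) t phi)."""
    t = np.asarray(t, float)
    u = 2.0 * np.sqrt(nu) * t * phi
    r = np.empty_like(u)
    r[u == 0] = sigma2                      # the limit as t -> 0 is the variance sigma^2
    pos = u > 0
    r[pos] = sigma2 / (2 ** (nu - 1) * gamma(nu)) * u[pos] ** nu * kv(nu, u[pos])
    return r

t = np.linspace(0.0, 3.0, 7)
print(np.round(exp_power_cov(t), 3))
print(np.round(matern_cov(t), 3))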
Aω cos(κ1 s1 + κ2 s2 + ωt + φω).
For fixed t this is a cosine function in the plane, which is zero along the lines κ1 s1 + κ2 s2 + ωt + φω = π/2 + kπ, k integer. For fixed (s1, s2) it is a cosine wave in time with frequency ω. The parameters κ1 and κ2 are called the wave numbers.
In general there is no particular relation between the time frequency ω and the wave numbers κ1, κ2, except for water waves, which we shall deal with later.
However, one important application of space-time random fields is the model-
ing of environmental variables, like the concentration of a hazardous pollutant.
Over a reasonably short period of time the concentration variation may be
regarded as statistically stationary in time, at least averaged over a 24 hour
period. But it is often unlikely that the correlation structure in space is inde-
pendent of the absolute location. Topography and the location of cities and pollutant sources make the process inhomogeneous in space.
One way to overcome the inhomogeneity is to make a transformation of the
space map and move each observation point (s1 , s2 ) to a new location (ŝ1 , ŝ2 )
so that the field x̂(t, ŝ1 , ŝ2 ) = x(t, s1 , s2 ) is homogeneous. This may not be
exactly attainable, but the technique is often used in environmental statistics
for planning of measurements.
Aω cos(ωt − κs + φω).
The time frequency ω and the wave number κ are linked by the dispersion relation
ω² = κg tanh(hκ),
which for infinite depth² reduces to ω² = κg. Here g is the constant of gravity and h the water depth.
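For finite depth the dispersion relation has no closed-form solution for κ, but it is easily solved numerically. The sketch below uses a damped fixed-point iteration started at the deep-water value κ = ω²/g; the function name and the iteration scheme are our own choices, not prescribed by the notes.

```python
import numpy as np

def wave_number(w, h, g=9.81, tol=1e-12, max_iter=500):
    """Solve w^2 = g*k*tanh(h*k) for k, given frequency w and depth h."""
    k = w * w / g                                        # deep-water first guess
    for _ in range(max_iter):
        k_new = 0.5 * (k + w * w / (g * np.tanh(h * k)))   # damped fixed-point step
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k

w = 2 * np.pi / 8.0                                      # an 8-second wave
for h in (5.0, 20.0, 1e6):                               # shallow, intermediate, "infinite" depth
    print(h, round(wave_number(w, h), 5), round(w * w / 9.81, 5))  # compare with k = w^2/g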
A Gaussian random wave is a mixture of elementary waves of this form, in spectral language, with κ > 0 solving the dispersion relation,
x(t, s) = ∫_{ω=−∞}^{∞} e^{i(ωt−κs)} dZ+(ω) + ∫_{ω=−∞}^{∞} e^{i(ωt+κs)} dZ−(ω) = x+(t, s) + x−(t, s).
Here is a case when it is important to use both positive and negative frequencies;
cf. the comments in Section 4.3.3.4. Waves described by x+ (t, s) move to the
right and waves in x− (t, s) move to the left with increasing t.
Keeping t = t0 or s = s0 fixed, one obtains a space wave, x(t0, s), and a time wave, x(t, s0), respectively. The spectral density of the time wave, x(t, s0) = x+(t, s0) + x−(t, s0), is called the wave frequency spectrum,
and we see again that it is not possible to distinguish between the two wave
directions by just observing the time wave.
² tanh x = (e^x − e^{−x})/(e^x + e^{−x}).
The space wave has a wave number spectrum given by the equation, for infinite water depth, with ω² = gκ > 0,
fx^time(ω) = (2ω/g) fx^space(ω²/g),
fx^space(κ) = (1/2) √(g/κ) fx^time(√(gκ)).
One obvious effect of these relations is that the space process seems to have
more short waves than can be inferred from the time observations. Physically
this is due to the fact that short waves travel with lower speed than long waves,
and they are therefore not observed as easily in the time process. Both the
time wave observations and the space registrations are in a sense “biased” as
representatives for the full time-space wave field.
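A small sketch of the deep-water conversion fx^space(κ) = (1/2)√(g/κ) fx^time(√(gκ)); the frequency spectrum used below is a generic toy density, not one of the spectra discussed in these notes, and the check at the end verifies that both spectra carry the same total variance.

```python
import numpy as np

g = 9.81

def f_time(w):
    """Toy one-sided frequency spectrum (arbitrary smooth shape, for illustration only)."""
    w = np.asarray(w, float)
    return np.where(w > 0, w ** 2 * np.exp(-w ** 2), 0.0)

def f_space(kappa):
    """Wave-number spectrum obtained from f_time via the deep-water dispersion relation."""
    kappa = np.asarray(kappa, float)
    out = np.zeros_like(kappa)
    pos = kappa > 0
    out[pos] = 0.5 * np.sqrt(g / kappa[pos]) * f_time(np.sqrt(g * kappa[pos]))
    return out

# Sanity check: both spectra should carry the same total variance.
w = np.linspace(1e-4, 10.0, 200_000)
kappa = np.linspace(1e-6, w[-1] ** 2 / g, 200_000)
var_time = np.sum(f_time(w)) * (w[1] - w[0])
var_space = np.sum(f_space(kappa)) * (kappa[1] - kappa[0])
print(round(var_time, 6), round(var_space, 6))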
In Chapter 3, Remark 3:3, we introduced the mean period 2π√(ω0/ω2) of a stationary time process. The corresponding quantity for the space wave is the mean wave length, i.e. the average distance between two upcrossings of the mean level by the space process x(t0, s). It is expressed in terms of the spectral moments of the wave number spectrum, in particular
κ0 = ∫ fx^space(κ) dκ = ∫ fx^time(ω) dω = ω0,
κ2 = ∫ κ² fx^space(κ) dκ = ∫ (ω⁴/g²) fx^time(ω) dω = ω4/g²,
where the second equalities follow from the substitution κ = ω²/g, dκ = (2ω/g) dω. The average wave length is therefore 2π√(κ0/κ2) = 2πg√(ω0/ω4). We see that the average wave
length is more sensitive to the tail of the spectral density than is the average
wave period. Considering the difficulties in estimating the high frequency part
of the wave spectrum, all statements that rely on high spectral moments are
unreliable.
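To make the comparison concrete, the following sketch computes the spectral moments ω0, ω2, ω4 of a toy frequency spectrum and the resulting mean period 2π√(ω0/ω2) and mean wave length 2πg√(ω0/ω4); all numerical choices are our own illustration.

```python
import numpy as np

g = 9.81
w = np.linspace(1e-4, 10.0, 200_000)
f = w ** 2 * np.exp(-w ** 2)                 # toy one-sided frequency spectrum f_time(w)
dw = w[1] - w[0]

def moment(n):
    """Spectral moment omega_n = integral of w^n f_time(w) dw."""
    return np.sum(w ** n * f) * dw

mean_period = 2 * np.pi * np.sqrt(moment(0) / moment(2))
mean_wave_length = 2 * np.pi * g * np.sqrt(moment(0) / moment(4))
print(round(mean_period, 3), round(mean_wave_length, 3))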
In the case of a two-dimensional time dependent Gaussian wave x(t, s1, s2), the elementary waves with frequency ω and direction θ become
Aω cos(ωt − κ(s1 cos θ + s2 sin θ) + φω), (6.7)
where ω > 0 and κ > 0 is given by the dispersion relation. With this choice of
sign, θ determines the direction in which the waves move.
The spectral density for the time-space wave field specifies the contribution
to x(t, (s1 , s2 )) from elementary waves of the form (6.7). Summed (or rather
integrated) over all directions 0 ≤ θ < 2π , they give the time wave x(t, s0 ), in
which one cannot identify the different directions. The spectral distribution,
called the directional spectrum, is therefore often written in polar form, based
on the spectral density fxtime (ω) for the time wave, as
f(ω, θ) = fx^time(ω) g(ω, θ).
The spreading function g(ω, θ), with ∫_0^{2π} g(ω, θ) dθ = 1, specifies the relative contribution of waves from different directions. It may be frequency dependent.
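As an illustration of a spreading function, the sketch below implements a "cos-2s"-type g(θ) concentrated around a main direction θ0. This particular parametric form is a common choice in the wave literature, not one prescribed by the notes, and the exponent s may be made frequency dependent to obtain g(ω, θ).

```python
import numpy as np

def spreading(theta, theta0=0.0, s=2.0):
    """Unnormalized cos-2s spreading: cos(theta - theta0)^(2s) on |theta - theta0| < pi/2, else 0."""
    theta = np.asarray(theta, float)
    d = np.mod(theta - theta0 + np.pi, 2 * np.pi) - np.pi   # wrap direction difference to (-pi, pi]
    raw = np.zeros_like(d)
    mask = np.abs(d) < np.pi / 2
    raw[mask] = np.cos(d[mask]) ** (2 * s)
    return raw

theta = np.linspace(0, 2 * np.pi, 2001)
raw = spreading(theta)
g = raw / (np.sum(raw) * (theta[1] - theta[0]))              # numerical normalization
print(round(np.sum(g) * (theta[1] - theta[0]), 6))           # approximately 1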
Figure 6.1: Left: Level curves for directional spectrum with frequency depen-
dent spreading. Right: Level curves for simulated Gaussian space sea.
Example 6:1 Wave spectra for the ocean under different weather conditions
are important to characterize the input to (linear or non-linear) ship models.
Much effort has been spent on design and estimation of typical wave spectra.
One of the most popular is the Pierson-Moskowitz spectrum,
α −1.25(ωm /ω)4
fPtime
M (ω) = e .
ω5
or the variant, the Jonswap spectrum, in which an extra factor γ > 1 is
introduced to enhance the peak of the spectrum,
α −1.25(ωm /ω)4 exp(−(1−ω/ωm )2 /2σm
2 )
fJtime (ω) = 5
e γ .
ω
In both spectra, α is a main parameter for the total variance, and ωm defines the “peak frequency”. The parameters γ and σm determine the peakedness of the spectrum. The spectrum and a realization of a Gaussian process with Jonswap spectrum were shown in Example 1:4.
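Both densities are straightforward to code; the sketch below follows the formulas above, with arbitrary illustration values for α, ωm, γ and σm (the function names are our own).

```python
import numpy as np

def pierson_moskowitz(w, alpha=0.01, w_m=0.9):
    """Pierson-Moskowitz density, valid for w > 0."""
    w = np.asarray(w, float)
    return alpha / w ** 5 * np.exp(-1.25 * (w_m / w) ** 4)

def jonswap(w, alpha=0.01, w_m=0.9, gamma=3.3, sigma_m=0.08):
    """Jonswap density: PM density times the peak-enhancement factor."""
    w = np.asarray(w, float)
    peak = gamma ** np.exp(-(1 - w / w_m) ** 2 / (2 * sigma_m ** 2))
    return pierson_moskowitz(w, alpha, w_m) * peak

w = np.linspace(0.3, 3.0, 6)
print(np.round(pierson_moskowitz(w), 5))
print(np.round(jonswap(w), 5))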
Figure 6.1 shows, to the right, the level curves of a simulated Gaussian wave surface with the directional spectrum shown on the left, which has frequency dependent spreading. The frequency spectrum is of Jonswap type.
As mentioned in the historical Section 1.6.3, Gaussian waves have been used since the early 1950s, with great success. However, since Gaussian processes are symmetric (x(t) has the same distribution as −x(t) and as x(−t)), they are not very realistic for actual water waves except in special situations: deep water and no strong wind. Much research is presently devoted to the development of “non-linear” stochastic wave models, where elementary waves with different frequencies can interact, in contrast to the “linear” Gaussian model, where they just add up.
Exercises
6:1. To be written.
Appendix A
Here is a collection of the basic probability axioms, together with proofs of the
extension theorem for probabilities on a field (page 5), and of Kolmogorov’s
extension theorem, from finite-dimensional probabilities to infinite-dimensional
ones (page 10).
A1, A2, . . . ∈ F implies ∪_{n=1}^{∞} An ∈ F.
Theorem A:1 Suppose P is a function which is defined for all sets in a field F0 and satisfies the probability axioms there, i.e. (with three equivalent formulations of Condition 4),

≤ Σ_{n=1}^{m} P(Am − Kn) ≤ Σ_{n=1}^{m} P(An − Kn) ≤ Σ_{n=1}^{m} ε/2^n ≤ ε,

P(x1 ≤ b1, . . . , xn ≤ bn) = Fn(b1, . . . , bn).
Proof: We prove first the Extension formulation. We are given one probability measure Pn on each one of (R^n, Bn), n = 1, 2, . . . . Consider the intervals in R^∞, i.e. the sets of the form
I = {x ∈ R^∞; ai < xi ≤ bi, i = 1, . . . , n}
for some n, and unions of a finite number of intervals. Define P for each interval by
P(I) = Pn(∏_{i=1}^{n} (ai, bi]).
Let I1 and I2 be two disjoint intervals. They may have different dimensions
(n1 = n2 ), but setting suitable ai or bi equal to ±∞, we may assume that
they have the same dimension. The consistency of the family {Pn } guarantees
that this does not change their probabilities, and that the additivity property
holds, that is, if also I1 ∪ I2 is an interval, then P (I1 ∪ I2 ) = P (I1 ) + P (I2 ). It
is easy to extend P with additivity to all finite unions of intervals. By this we
have defined P on the field F0 of finite unions of intervals, and checked that
properties (1), (2), and (3) of Theorem A:1 hold.
Now check property (4a) in the same way as for Theorem A:2, for a decreasing sequence of non-empty intervals with empty intersection,
I1 ⊇ I2 ⊇ . . . , with ∩_{n=1}^{∞} In = ∅,
and suppose P(In) ↓ h > 0.¹ We can always assume In to have dimension n,
In = {x ∈ R^∞; aj^(n) < xj ≤ bj^(n), j = 1, . . . , n}.
For each j, [αj^(n), βj^(n)], n = j, j + 1, . . . , is a decreasing sequence of non-empty, closed and bounded intervals, and by Cantor's theorem they have at least one common point, xj ∈ ∩_{n=j}^{∞} [αj^(n), βj^(n)]. Then, x = (x1, x2, . . .) ∈ Ln for all n. Hence x ∈ Ln ⊆ In for all n and the intersection ∩_{n=1}^{∞} In is not empty. This contradiction shows that P(In) ↓ 0, and (3″) is shown to hold.
The conditions (1), (2), (3), and (4a) of Theorem A:1 are all satisfied, and
hence P can be extended to the σ -field F generated by the intervals.
To get the Existence formulation, just observe that the family of finite-dimen-
sional distributions uniquely defines Pn on (Rn , Bn ), and use the extension.
□
Exercises
A:1. Let Z be the integers, and A the family of subsets A such that either A or its complement A^c is finite. Let P(A) = 0 in the first case and P(A) = 1 in the second case. Show that P cannot be extended to a probability on σ(A), the smallest σ-field that contains A.
¹ Property (4a) deals with a decreasing sequence of finite unions of intervals. It is easy to convince oneself that it suffices to show that (4a) holds for a decreasing sequence of intervals.
Appendix B
Stochastic convergence
Here we summarize the basic types of stochastic convergence and the ways we
have to check the convergence of a random sequence with specified distributions.
Definition B:1 Let {xn}_{n=1}^{∞} be a sequence of random variables x1(ω), x2(ω), . . . defined on the same probability space, and let x = x(ω) be a random variable defined on the same probability space. Then, the convergence xn → x as n → ∞ can be defined in three ways:
• almost surely, with probability one (xn →^{a.s.} x): P({ω; xn(ω) → x(ω)}) = 1;
• in quadratic mean (xn →^{q.m.} x): E|xn − x|² → 0;
• in probability (xn →^{P} x): for every ε > 0, P(|xn − x| > ε) → 0.
Furthermore, xn tends in distribution to x (in symbols xn →^{L} x) if
P(xn ≤ a) → P(x ≤ a)
at every point a where the limit distribution function P(x ≤ a) is continuous.
To prove this, note that if ω is an outcome such that the real sequence
xn (ω) does not converge to x(ω), then
ω ∈ ∪_{q=1}^{∞} ∩_{m=1}^{∞} ∪_{n=m}^{∞} {|xn(ω) − x(ω)| > 1/q},
and the probability of the event ∩_{m=1}^{∞} ∪_{n=m}^{∞} {|xn − x| > 1/q} is 0 for all q if and only if (B.1) holds for all δ > 0. The reader should
complete the argument, using that P (∪k Ak ) = 0 if and only if P (Ak ) = 0 for
all k , and the fact that if B1 ⊇ B2 ⊇ . . . is a non-increasing sequence of events,
then P (∩k Bk ) = limk→∞ P (Bk ). Now,
P(|xn − x| > δ for at least one n ≥ m) ≤ Σ_{n=m}^{∞} P(|xn − x| > δ),
and hence a simple sufficient condition for (B.1), and thereby for almost sure convergence, is that for all δ > 0,
Σ_{n=1}^{∞} P(|xn − x| > δ) < ∞. (B.2)
(In fact, the first Borel-Cantelli lemma directly shows that (B.2) is sufficient
for almost sure convergence.)
A simple moment condition is obtained from the inequality P(|xn − x| > δ) ≤ E(|xn − x|^h)/δ^h, giving that a sufficient condition for almost sure convergence is
Σ_{n=1}^{∞} E(|xn − x|^h) < ∞, (B.3)
for some h > 0.
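As a small worked illustration of how (B.3) is used: let z1, z2, . . . be independent with E(zk) = 0, E(zk²) = σ² and E(zk⁴) < ∞, and let xn = (z1 + · · · + zn)/n. Then
E|xn|⁴ = (n E(z1⁴) + 3n(n − 1)σ⁴)/n⁴ ≤ C/n²,
so Σ_{n=1}^{∞} E|xn|⁴ ≤ C Σ_{n=1}^{∞} 1/n² < ∞, and (B.3) with h = 4 and x = 0 gives xn → 0 almost surely. Note that h = 2 would not suffice here, since Σ_{n=1}^{∞} E|xn|² = Σ_{n=1}^{∞} σ²/n = ∞.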
A Cauchy convergence type condition is the following sufficient condition for almost sure convergence: there exist two sequences of positive numbers δn and εn such that Σ_{n=1}^{∞} δn < ∞ and Σ_{n=1}^{∞} εn < ∞, and such that
P(|xn+1 − xn| > δn) ≤ εn for all n.
A sequence of random variables can converge almost surely, and we have just
given sufficient conditions for this. But we shall also need convergence of a
sequence of random functions {xn (t); t ∈ T }, where T = [a, b] is a closed
bounded interval.
that is, if xn lies close to the limiting function x in the entire interval [a, b]
for all sufficiently large n.
then there exists a random function x(t), a ≤ t ≤ b, such that xn(t) →^{a.s.} x(t) uniformly for t ∈ [a, b].
E(|xm − xn |2 ) → 0, (B.6)
The limit x has E(|x|²) = lim E(|xn|²) < ∞, and E(xn) → E(x). If there are two convergent sequences, xn →^{q.m.} x and yn →^{q.m.} y, then
If xn →^{P} x, take any δ > 0 and consider P(|xn − x| > δ) → 0 as n → ∞. The meaning of the convergence is that for each k there is an Nk such that
which is finite by construction. The sufficient criterion (B.2) gives the desired almost sure convergence of the subsequence x_{n_k}.
Exercises
B:1. Prove the Borel-Cantelli lemma:
B:3. Suppose the random sequences xn and x′n have the same distribution. Prove that if xn →^{a.s.} x then there exists a random variable x′ such that x′n →^{a.s.} x′.
Appendix C
1. The operations addition and subtraction are defined, and there exists a unique “zero” element 0 ∈ H, and to each x ∈ H there is a unique inverse −x:
x + y = y + x ∈ H,
x + 0 = x,
x + (−x) = 0.
c · x ∈ H,
0 · x = 0,
1 · x = x.
(x, y) = \overline{(y, x)} ∈ C,
(ax + by, z) = a(x, z) + b(y, z),
(x, x) ≥ 0,
(x, x) = 0 if and only if x = 0.
4. A norm ‖x‖ and a distance d(x, y) = ‖x − y‖ are defined, and convergence has the standard meaning: if x ∈ H then ‖x‖ = (x, x)^{1/2}, and if xn, x ∈ H then lim_{n→∞} xn = x if and only if ‖xn − x‖ → 0.
We list some further properties of Hilbert spaces and scalar products, which
will be seen to have parallels as concepts for random variables:
Schwarz inequality: |(x, y)| ≤ ‖x‖ · ‖y‖ with equality if and only if (y, x)x = (x, x)y,
V = M1 ⊕ . . . ⊕ Mk
x=y+z
y1 = c11 x1 ,
y2 = c21 x1 + c22 x2 ,
···
yn = cn1 x1 + cn2 x2 + . . . + cnn xn ,
···
First, it is clear that (x, y) = E(x ȳ) has the properties of a scalar product; check that. It is also clear that we can add random variables with mean zero and finite variance to obtain new random variables with the same properties.
Also, ‖x‖ = E(|x|²)^{1/2}, which means that if ‖x‖ = 0, then P(x = 0) = 1, so random variables which are zero with probability one are, in this context, defined to be equal to the zero element 0. Convergence in the norm ‖·‖ is the same as convergence in quadratic mean of random variables, and if a sequence of random variables xn is a Cauchy sequence, i.e. ‖xm − xn‖ → 0 as m, n → ∞, then we know that it converges to a random variable x with finite variance, which means that H(Ω) is complete. Therefore it has all the properties of a Hilbert space.
H(x, t) = S(x(s); s ≤ t)
e(t), t = . . . , −1, 0, 1, 2, . . . ,
If |b1 | < 1, the process can be inverted and e(t) simply retrieved from x(s), s ≤
t:
Here,
yn (t) ∈ S(x(s); s = t − n, . . . , t) ⊆ S(x(s); s ≤ t) = H(x, t),
while
‖zn(t)‖ = |b1|^{n+1} → 0
as n → ∞. Thus ‖e(t) − yn(t)‖ → 0 and we have that
e(t) = Σ_{k=0}^{∞} (−b1)^k x(t − k) = lim_{n→∞} Σ_{k=0}^{n} (−b1)^k x(t − k) ∈ H(x, t).
e(t) = Σ_{k=0}^{n} x(t − k) + e(t − n − 1),
and so, since the left hand side does not depend on n,
e(t) = (1/N) Σ_{n=1}^{N} e(t) = (1/N) Σ_{n=1}^{N} ( Σ_{k=0}^{n} x(t − k) + e(t − n − 1) )
     = Σ_{k=0}^{N} (1 − k/N) x(t − k) + (1/N) Σ_{n=1}^{N} e(t − n − 1) = yN(t) + zN(t).
Now, zN(t) = (1/N) Σ_{n=1}^{N} e(t − n − 1) = e(t) − yN(t) → 0 by the law of large numbers, since all e(t) are uncorrelated with E(e(t)) = 0 and V(e(t)) = 1. We have shown that e(t) is in fact the limit of finite linear combinations of x(s)-variables, i.e. e(t) ∈ H(x, t).
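A quick numerical check of the first inversion above; the simulation, with its variable names, is our own illustration. For x(t) = e(t) + b1 e(t − 1) with |b1| < 1, the truncated sums Σ_{k=0}^{n} (−b1)^k x(t − k) approach e(t) at the geometric rate |b1|^{n+1}.

```python
import numpy as np

rng = np.random.default_rng(0)
b1, T = 0.8, 10_000
e = rng.standard_normal(T)                     # innovations, mean 0, variance 1
x = e.copy()
x[1:] += b1 * e[:-1]                           # x(t) = e(t) + b1*e(t-1)

t = T - 1
for n in (1, 5, 20, 50):
    approx = sum((-b1) ** k * x[t - k] for k in range(n + 1))
    print(n, round(abs(approx - e[t]), 5))     # error shrinks like |b1|^(n+1)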
Appendix D
z(n) = Σ_{k=0}^{N−1} Z(k) exp(i2πkn/N), (D.1)
Before we describe the steps in the simulation we repeat the basic facts
about processes with discrete spectrum, and the special problems that arise
when sampling a continuous time process.
Here ρ0 is a random level shift, while {ρk } are the amplitudes and {φk } the
phases of the different harmonic components of x(t). The frequencies ωk > 0
can be any set of fixed positive frequencies.
If we define
Z(0) = ρ0,
Z(k) = ρk exp(iφk), for k = 1, 2, . . . ,
it is easy to see that x(t) in (D.2) is the real part of a complex sum, so if we write y(t) for the imaginary part, then
x(t) + iy(t) = Σ_{k=0}^{∞} Z(k) exp(iωk t). (D.3)
D.3 Aliasing
If a stationary process {x(t), t ∈ R} with continuous two-sided spectral density fx(ω) is sampled with a sampling interval d, the sequence {x(nd), n = 0, ±1, . . .} has a spectral density fx^(d)(ω) that can be restricted to any interval of length 2π/d, for example the interval (−π/d, π/d]. There it can be written as a folding of the original spectral density,
fx^(d)(ω) = Σ_{j=−∞}^{∞} fx(ω + 2πj/d), for −π/d < ω ≤ π/d.
The corresponding one-sided spectral density gx^(d)(ω) can then be defined on [0, π/d] as
fx^(d)(ω) + fx^(d)(−ω).
For reasons that will become clear later (Section D.6) we prefer to define it instead on [0, 2π/d) by
gx^(d)(ω) = Σ_{j=−∞}^{∞} fx(ω + 2πj/d), for 0 ≤ ω < 2π/d. (D.4)
z(n) = Σ_{k=0}^{N−1} Z(k) exp(i2πkn/N), (D.5)
x(t) = Re Σ_{k=0}^{∞} Z(k) exp(iωk t), (D.6)
x(nd) = Re z(n), n = 0, 1, . . . , N − 1.
σk² = (2π/(dN)) Σ_{j=−J}^{J} fx(ωk + 2πj/d), k = 0, 1, . . . , N − 1, (D.8)
where J is taken large enough. If fx(ω) ≈ 0 for ω ≥ ωmax one can take J = 0.
D.7 Summary
In order to simulate a sample sequence of a stationary process {x(t), t ∈ R}
with spectral density fx (ω) over a finite time interval one should do the follow-
ing:
σk² = (2π/(dN)) Σ_{j=−J}^{J} fx(ωk + 2πj/d), k = 0, 1, . . . , N − 1,
Uk, Vk, for k = 0, 1, . . . , N − 1,
5. Set Z(k) = σk(Uk + iVk) and calculate the (inverse) Fourier transform
z(n) = Σ_{k=0}^{N−1} Z(k) exp(i2πkn/N).
x(nd) = Re z(n), n = 0, 1, . . . , N − 1;
y(nd) = Im z(n), n = 0, 1, . . . , N − 1;
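The whole scheme fits in a few lines. The sketch below follows the steps above; the spectral density, the parameter values, and the assumption that Uk, Vk are independent standard normal variables (as in the representation (4.29)) are our illustration choices.

```python
import numpy as np

def simulate(fx, d, N, J=2, rng=None):
    """Return x(nd), y(nd), n = 0..N-1, for a process with two-sided spectral density fx."""
    rng = rng or np.random.default_rng()
    k = np.arange(N)
    wk = 2 * np.pi * k / (d * N)                       # frequencies on [0, 2*pi/d)
    js = np.arange(-J, J + 1)
    sigma2 = (2 * np.pi / (d * N)) * fx(wk[:, None] + 2 * np.pi * js / d).sum(axis=1)
    U, V = rng.standard_normal(N), rng.standard_normal(N)
    Z = np.sqrt(sigma2) * (U + 1j * V)
    z = N * np.fft.ifft(Z)                             # z(n) = sum_k Z(k) exp(i 2 pi k n / N)
    return z.real, z.imag

def fx(w):
    """Toy two-sided density of Jonswap-like shape (illustration only, not from the notes)."""
    a = np.abs(w)
    with np.errstate(divide="ignore", invalid="ignore"):
        f = np.where(a > 0, 0.005 / a ** 5 * np.exp(-1.25 * (0.9 / a) ** 4), 0.0)
    return 0.5 * f                                     # split the mass over +/- frequencies

x, y = simulate(fx, d=0.5, N=2048, rng=np.random.default_rng(2))
print(round(x.var(), 4))      # should be close to the integral of fx (approximately)
```

Note that z(n) = Σ_k Z(k) exp(i2πkn/N) equals N times NumPy's inverse FFT of the sequence Z(0), . . . , Z(N − 1), which is why the factor N appears in front of np.fft.ifft.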