Background Notes To Course Stochastic Processes
Spring 2013
always under construction as well
1 Review of definitions
Recall the following definitions: let a space E be given. A collection A of subsets of E is called an algebra (or a field) if the following three conditions hold:
i) E ∈ A;
ii) A ∈ A ⇒ Ac = E \ A ∈ A;
iii) A, B ∈ A ⇒ A ∪ B ∈ A.
Definition 1.1 A collection T of so-called open subsets of the space E is called a topology if
i) ∅, E ∈ T ;
ii) arbitrary unions of sets in T belong to T ;
iii) finite intersections of sets in T belong to T .
The σ-algebra B(E) := σ(T ) generated by the open sets is called the Borel-σ-algebra on E. If (E, ρ) is a metric space, a set A is open if for every x ∈ A
there exists rx such that Brx (x) = {y | ρ(x, y) < rx } is contained in A. If, in addition, the metric
space E is separable (i.e. there is a countable dense subset), then B(E) = σ(Bq (x), q ∈ Q, q > 0, x ∈ D), for D a countable dense subset of E.
We use B to denote the Borel sets in R.
Many statements concerning σ-algebras can be reduced to statements on certain generic collections
of sets: a main one is the notion of π-system. A collection A of subsets of E is called a π-system if
A is invariant under finite intersections, i.e. A, B ∈ A ⇒ A ∩ B ∈ A.
Desirable properties of π-systems are the following (see Williams, PwM).
Lemma 1.2 i) Let I be a π-system of subsets of E. Suppose that µ1 , µ2 are
measures on (E, σ(I)), such that µ1 (E) = µ2 (E) < ∞, and µ1 (A) = µ2 (A) for all A ∈ I.
Then µ1 (A) = µ2 (A) for all A ∈ σ(I);
ii) Let a probability space (Ω, F, P) be given. Let I, J be π-systems consisting of subsets of Ω, with σ(I), σ(J ) ⊆ F. Then σ(I) and σ(J ) are independent if I and J are independent, that is, if P{A ∩ B} =
P{A}P{B} for all A ∈ I, B ∈ J .
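For instance, the intervals (−∞, x], x ∈ R, form a π-system generating B, so by (i) two probability measures on (R, B) with the same distribution function are equal.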
Suppose that we have measurable spaces (Ω, F) and (E, E). Then the function X : Ω → E is called
a random element if X is F/E-measurable. In other words, if X^{−1}(A) ∈ F for all A ∈ E. The
following lemma is helpful in checking measurability.
Lemma 1.3 Let (Ω, F) and (E, E) be measurable spaces. Let C be a collection of subsets of E, such
that σ(C) = E. Let X : Ω → E be a map. Then X is F/E-measurable if X^{−1}(A) ∈ F for all
A ∈ C.
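For instance, a map X : Ω → R is F/B-measurable as soon as {X ≤ c} ∈ F for all c ∈ R, since the intervals (−∞, c] generate B.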
2 Stochastic processes

Let the probability space (Ω, F, P) be given, as well as the measurable space (E, E). Let further
X = (Xt )t∈T be a s.p. (stochastic process) with state space E, for index set T . This means that Xt , t ∈ T ,
are all E-valued, F/E-measurable random elements. We assume that T is an interval of R.
Given ω ∈ Ω, the corresponding realisation X(ω) = (Xt (ω))t∈T of the s.p. X is called a sample path
or trajectory of X.
One can also view X(ω) as an E-valued function on T , i.e. there exists f : T → E, such that
X(ω) = f , and Xt (ω) = f (t), t ∈ T . Hence, the process X takes values in the function space E^T .
The question is: how should one define a σ-algebra on E^T in a consistent way, such that we
can answer questions concerning the probability of events of interest that concern the whole path?
We take the following approach.
A finite-dimensional rectangle in E^T is a set of the form

    S = {x ∈ E^T | xt1 ∈ E1 , . . . , xtn ∈ En },

where n ∈ N, t1 , . . . , tn ∈ T and E1 , . . . , En ∈ E. On E^T we use the σ-algebra E^T generated by the one-dimensional rectangles,

    E^T = σ{1-dimensional rectangles}.
For understanding the structure of this σ-algebra, let us recall the concept of product-σ-algebras E^n ,
n = 1, 2, . . . , ∞. For n finite, this is simply the σ-algebra generated by all n-dimensional rectangles,

    E^n = σ{E1 × · · · × En | E1 , . . . , En ∈ E}.

For n = ∞, it is generated by the rectangles in which only finitely many coordinates are restricted,

    E^∞ = σ{E1 × · · · × En × E × E × · · · | E1 , . . . , En ∈ E, n = 1, 2, . . .}.
Now we shall call a set A ⊆ E^T a cylinder, if there exists a finite subset {t1 , . . . , tn } of T and a set
B ∈ E^n , such that

    A = {x | (xt1 , . . . , xtn ) ∈ B}.    (2.1)
It will be called a σ-cylinder, if there exists an (at most) countable, not necessarily ordered, subset {t1 , t2 , . . .} of T
and a set B ∈ E^∞ , such that (2.1) holds.
We have the following characterisation.

Lemma 2.1 E^T coincides with the collection of all σ-cylinders.
Proof. First note that the σ-cylinders are contained in the σ-algebra generated by finite-dimensional
cylinders. Hence E T contains all σ-cylinders. Notice that by definition E T is the smallest σ-algebra
containing all 1-dimensional cylinders. Hence it is the smallest σ-algebra containing all σ-cylinders.
So it is sufficient to show that the σ-cylinders themselves form a σ-algebra.
First, the whole space E^T = {x | xt ∈ E} for arbitrary, but fixed t. Hence it is a σ-cylinder.
Let A = {x | (xt1 , xt2 , . . .) ∈ B}, B ∈ E ∞ . Then Ac = {x | (xt1 , xt2 , . . .) ∈ E ∞ \ B} is a σ-cylinder.
Finally, let An = {x | (xtn1 , xtn2 , . . .) ∈ Bn }, Bn ∈ E ∞ , n = 1, 2, . . .. It is sufficient to show that the
intersection is a σ-cylinder. To this end, we need to describe all An in terms of the same ‘time’
points. We therefore need to form the concatenation of the time points that define the different
σ-cylinders A1 , . . .. Let Sn = {tnk }k=1,... , put S = ∪n Sn . S is countable.
Let f : S → {1, 2, . . .} be a bijective map (representing an enumeration of the points of S). The set
An can be described in terms of all time points f^{−1}(1), f^{−1}(2), . . .: at time points f^{−1}(m) ∉ {tnk }k any
path x ∈ An is allowed to take arbitrary values, while at time points f^{−1}(m) ∈ {tnk }k the values are prescribed by Bn .
To formalise this, define B̃n ⊂ E^∞ by

    B̃n = {(z1 , z2 , . . .) ∈ E^∞ | (z_{f(tn1)} , z_{f(tn2)} , . . .) ∈ Bn }.

Then B̃n ∈ E^∞ , being the preimage of Bn under a measurable coordinate-selection map, and An = {x | (x_{f^{−1}(1)} , x_{f^{−1}(2)} , . . .) ∈ B̃n }. Consequently

    ∩n An = {x | (x_{f^{−1}(1)} , x_{f^{−1}(2)} , . . .) ∈ ∩n B̃n },

which is a σ-cylinder. QED
Remark A stochastic process X = (Xt )t , with each Xt F/E-measurable, is F/E^T -measurable by construction. Hence, the induced probability measure P^X on (E^T , E^T ) is given by

    P^X {A} = P{X ∈ A} = P{(Xt1 , Xt2 , . . .) ∈ B}

for any σ-cylinder A = {x | (xt1 , xt2 , . . .) ∈ B}. This probability measure is uniquely determined by any π-system generating
E^T . A convenient π-system is given by
    I = {A ⊆ E^T | ∃ n ∈ Z+ , t1 < · · · < tn , ti ∈ T, i = 1, . . . , n, A1 , . . . , An ∈ E,
         such that A = {x ∈ E^T | xti ∈ Ai , i = 1, . . . , n}}.
Note that a random variable Xt can be obtained from X by projection or co-ordinate maps. For
any S ⊆ T put E^S = {x = (xt )t∈S | xt ∈ E}. The precise indices
belonging to S do not disappear in this notation. For instance, if T = {1, 2, 3, 4, 5} and S = {1, 5},
then R^S = {x = (x1 , x5 ) | x1 , x5 ∈ R}. This looks artificial, but is natural in the context of viewing
the index set as the collection of observation instants of a process.
The projection πS : E^T → E^S is then defined by

    πS (x) = (xt )t∈S , x ∈ E^T ;

in particular Xt = π{t} (X).
One can view σ-algebras as describing the amount of information on individual states that is available to the
observer: of individual states one can only observe the measurable sets to which a state belongs. By
virtue of Lemma 2.1, one can only observe the values of a function in E^T at countably many time
instants in T .
It is often convenient to complete a σ-algebra. Consider the probability space (Ω, F, P). The
completion F̄ of F w.r.t. P is the σ-algebra generated by F and the sets A ⊂ Ω for which there exist sets
A1 , A2 ∈ F with A1 ⊂ A ⊂ A2 and P{A2 \ A1 } = 0; put P{A} = P{A1 }. In words, the completion
of F is the σ-algebra obtained by adding all subsets of null sets to F.
Problem 2.2 Show that the set of measurable real-valued functions on T = [0, ∞) is not measurable,
in other words, show that

    M := {x ∈ R^T | t 7→ xt is Borel-measurable} ∉ B^T .

Moreover, show that there is no probability measure P on (R^T , B^T ) such that M belongs to the
completion of B^T w.r.t. P. Hint: consider M^c ; show that no σ-cylinder can
contain only measurable functions x.
σ-Algebra generated by a random variable or a stochastic process
In a similar fashion one can consider the σ-algebra generated by random variables or by a stochastic
process.
Let Y : Ω → E be an F/E-measurable random variable. Then F^Y ⊂ F denotes the σ-algebra generated
by Y . This is the minimal σ-algebra that makes Y a measurable random variable. Suppose that C
is a π-system generating E; then Y^{−1}C = {Y^{−1}(C) | C ∈ C} is a π-system generating F^Y .
Let us consider a stochastic process X = (Xt )t∈T with T = Z+ or T = R+ . The σ-algebra FtX is the
σ-algebra generated by X up to time t. It is the minimal σ-algebra that makes the random variables
Xs , s ≤ t, measurable, and so it contains each σ-algebra FsX , for s ≤ t.
How can one generate this σ-algebra? If C is a π-system for E, then FtX = σ{Xs^{−1}(C) | C ∈ C, s ≤ t}.
Unfortunately, the collection of sets {Xs^{−1}(C) | C ∈ C, s ≤ t} itself is not a π-system. However, the
following is a π-system generating FtX :

    C^X = {Xt1^{−1}(C1 ) ∩ · · · ∩ Xtn^{−1}(Cn ) | t1 < · · · < tn ≤ t, C1 , . . . , Cn ∈ C, n = 1, 2, . . .}.
This can be used in checking independence of σ-algebras, measurability issues, etc. What we see
here again is that the information contained in the minimal σ-algebra FtX concerns the behaviour
of paths of X up to time t, observed at at most countably many time points.
3 Measurable maps
Continuous maps between two metric spaces are always measurable with respect to the Borel-σ-
algebras on these spaces.
Corollary 3.1 Let E 0 , E be metric spaces. Let f : E 0 → E be a continuous map, i.e. f −1 (B) is
open in E 0 for each open set B ⊂ E. Then f is B(E 0 )/B(E)-measurable.
Proof. The open sets of E generate B(E). The statement follows from Lemma 1.3. QED
In stochastic process theory, right-continuous functions on the positive real line R+ play an important
role. We want to show that such functions are measurable with respect to the Borel-σ-algebra on
R+ . We will be more specific.
Let E be a Polish space, with metric ρ say, and let E be the Borel-σ-algebra on E.
Lemma 3.2 There exists a countable class H of continuous functions f : E → [0, 1], such that
xn → x in E iff f (xn ) → f (x) in [0, 1] for all f ∈ H.
Proof. Take a countable, dense subset y1 , y2 , . . . of E. Let fk,n : E → [0, 1] be a continuous function
with fk,n (y) = 1 for ρ(y, yk ) ≤ 1/n, and fk,n (y) = 0 for ρ(y, yk ) ≥ 2/n. Then H = {fk,n | k, n = 1, . . .}
has the desired property. QED
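An explicit choice is fk,n (y) = min{1, max{0, 2 − nρ(y, yk )}}: this function is continuous, equals 1 for ρ(y, yk ) ≤ 1/n, and vanishes for ρ(y, yk ) ≥ 2/n.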
Lemma 3.3 Let x : R+ → E be a right-continuous function. Then x has at most countably many discontinuities.

Proof. In view of the previous lemma, it is sufficient to show that t 7→ f (xt ) has the desired property
for each f ∈ H. That is, it is sufficient to consider the case E = [0, 1].
For t > 0 let

    yt1 = lim sup_{s↑t} xs − lim inf_{s↑t} xs (≥ 0 !),
    yt2 = |xt − lim_{s↑t} xs |, if yt1 = 0, and yt2 = 0 otherwise.
If x is discontinuous at t we have either yt1 > 0 or yt2 > 0. Hence, it is sufficient to show that for any
ε > 0 the sets Ai = {t | yti > ε}, i = 1, 2, are at most countable (why?).
By right-continuity,

    0 = lim_{s↓t} ys1 = lim_{s↓t} ys2 , t ≥ 0.    (3.1)
In particular, taking t = 0 gives that Ai ∩ (0, δ] = ∅ for some δ > 0. It follows that

    τ^i := sup{t > 0 | Ai ∩ (0, t] is at most countable} > 0.

But if Ai were uncountable, we would have τ^i < ∞. This implies the existence of a sequence τni ∈
Ai ∩ (τ^i , ∞) with τni ↓ τ^i . Then lim sup_{s↓τ^i} ysi ≥ ε, contradicting (3.1). QED
Let now X = {Xt }t be a stochastic process defined on the probability space (Ω, F, P) with right-
continuous paths. For f ∈ H, with H a convergence determining class as in Lemma 3.2, and yti , i = 1, 2, defined
as in the above proof, put

    Y(i)t,f (ω) = yti evaluated for the path s 7→ f (Xs (ω)),

    Cu = {ω | t 7→ Xt (ω) is continuous at t = u} = ∩_{f∈H} {Y(1)u,f = 0, Y(2)u,f = 0}.
It follows by right-continuity that lim inf_{t↑u} f (Xt ) and lim sup_{t↑u} f (Xt ) are both measurable. Hence
Y(i)t,f and Cu are as well. It makes sense to define u to be a fixed discontinuity of X if P{Cu } < 1.
Corollary 3.4 Let X = {Xt }t be a stochastic process defined on the probability space (Ω, F, P) with
right-continuous paths. Then X has at most countably many fixed discontinuities.
Proof. By (3.1) and dominated convergence we have lim_{s↓t} E Y(i)s,f = 0 for all t. Exactly as in the
proof of the previous lemma we may conclude that E Y(i)t,f = 0 except for t in an at most countable
set N(i)f . But when t ∉ ∪_{i,f} N(i)f , we have Y(i)t,f = 0 a.s. for all f , implying that P{Ct } = 1. QED
Lemma 3.5 Let x : R+ → E be a right-continuous function. Then x is B(R+ )/E-measurable.

Proof. It is sufficient to show for any open set A ∈ E that the set C = {t | xt ∈ A} ∈ B(R+ ). Let
T be the countable set of discontinuities of x. Let t ∈ C \ T . Then there exist qt1 < t < qt2 , qti ∈ Q,
such that xs ∈ A for s ∈ (qt1 , qt2 ). Let further T ∩ C = TC . Then C = TC ∪ ∪t∈C\T (qt1 , qt2 ). Since
the collection of sets {(q1 , q2 ) | q1 , q2 ∈ Q, q1 < q2 } is at most countable, C is the countable union of
measurable sets. QED
Consider again a metric space E, with metric ρ say. Then ρ(x, A) = inf y∈A ρ(x, y) is a continuous
function of x ∈ E, and hence it is B(E)/B(R)-measurable by Corollary 3.1. Let (Ω, F, P) be a
probability space, and let X be an E-valued r.v. Then ρ(X, A) is F/B(R)-measurable and so it is a
r.v.
Now, is ρ(X, Y ) F/B(R)-measurable, for Y another E-valued r.v. on Ω? This map is a composition of
the map (X, Y ) : Ω → E 2 and the map ρ : E 2 → R. The map (X, Y ) is F/B(E) × B(E)-measurable,
where B(E) × B(E) is the product-σ-algebra on E 2 . Consider on E 2 the topology generated by the
rectangles A×B with A, B open in E. This generates the Borel-σ-algebra B(E 2 ) of E 2 . The function
ρ(., .) : E 2 → R is continuous, hence B(E 2 )/B(R)-measurable. As a consequence, for measurability
of ρ(X, Y ), it is sufficient that B(E 2 ) = B(E) × B(E). This is guaranteed if E is separable!
Lemma 3.6 Let E, E 0 be separable, metric spaces. Then B(E) × B(E 0 ) = B(E × E 0 ).
It is clear that B(E) × B(E) ⊂ B(E²). Can you define a space E and metric ρ, such that B(E²) ≠
B(E) × B(E)?
Monotone class theorems Statements about σ-algebras can often be deduced from the corresponding statements
for a generating π-system. We have seen this with respect to measurability and independence issues, where we
have theorems asserting this.
Suppose one does not have such a theorem at one's disposal. Then the idea is to show that the sets
satisfying a certain condition form a monotone class or a d-system, see below for the definition. One
then shows that the monotone class or the d-system contains a π-system generating the σ-algebra
of interest. By Lemma 3.8 below, the d-system then contains the σ-algebra of interest, and so the desired
property applies to our σ-algebra.
Definition 3.7 A collection S of subsets of the space Ω, say, is called a d-system or a monotone
class if
i) Ω ∈ S;
ii) A, B ∈ S and A ⊂ B imply B \ A ∈ S;
iii) An ∈ S, n = 1, 2, . . ., with An ↑ A, implies A ∈ S.
Lemma 3.8 If a d-system contains a π-system, I say, then the d-system contains the σ-algebra
σ(I) generated by the π-system.
Standard machinery The 'standard machinery' is a four-step procedure for extending a statement from indicator functions to general measurable functions:
i) First show that the result holds for indicator functions 1{A} of sets A in the σ-algebra under consideration.
ii) Argue by linearity that this implies the result to hold for step functions, that is, finite linear
combinations of indicator functions.
iii) Consider non-negative (measurable) functions. Any such function can be approximated by
a non-decreasing sequence of step functions. Use monotone convergence to deduce the desired
result for non-negative measurable functions.
iv) Consider general measurable functions (if appropriate) and write such functions as the difference
of two non-negative functions. Apply (iii).
It is sometimes not easy to show the first step (i). Instead one would like to consider only indicator
functions of elements of a π-system generating the σ-algebra. The following theorem allows one to deduce
results on general (real-valued) measurable functions from results on indicators of the elements of a
π-system.
Theorem 3.9 ((Halmos) Monotone Class Theorem: elementary version) Let H be a class
of bounded functions from a set S to R, satisfying the following conditions:
i) H is a vector space over R (i.e. it is an Abelian group w.r.t addition of functions, it is closed
under scalar multiplication, such that (αβ)f = α(βf ), (−1)f = −f and (α + β)f = αf + βf ,
for f ∈ H, α, β ∈ R);
ii) the constant function 1 is an element of H;
iii) if fn , n = 1, 2, . . ., is a non-decreasing sequence of non-negative functions in H such that f = limn fn is a bounded function, then f ∈ H.
If H contains the indicator function of every set in a π-system I, then H contains every bounded
σ(I)-measurable function.
This theorem yields the following factorisation result, used below for conditional expectations.

Lemma 3.10 Let h : E^T → R be E^T -measurable. Then there exist an at most countable subset S ⊂ T and an E^S -measurable function h0 : E^S → R, such that h = h0 ◦ πS .

Proof. One should use the Monotone Class Theorem: apply it to the class H of bounded functions of the form h0 ◦ πS , with S ⊂ T at most countable and h0 bounded and E^S -measurable; H contains the indicators of the sets of a π-system generating E^T . This proves the claim for bounded h. Then for
non-negative functions, write h = Σ_{n≥0} hn , where hn = h 1{n≤h<n+1} ; the general case follows by taking differences. Notice that πS is E^T /E^S -
measurable as a consequence of Lemma 2.1. QED
4 Multivariate normal distribution
Definition. The random vector X = (X1 , . . . , Xn )^T has a multivariate normal distribution iff
there exist an n × k matrix B, a vector a ∈ R^n and independent random variables Z1 , . . . , Zk , with
Zi =d N(0, σ²(Zi )), σ²(Zi ) ≥ 0, defined on the same probability space (Ω, F, P), such that

    X = a + BZ,    (4.1)
where we write Z = (Z1 , Z2 , . . . , Zk )^T . In this case, X is said to have the multivariate normal
distribution N(a, Σ), where a = (EX1 , . . . , EXn )^T is the mean vector and

    Σ = B diag(σ²(Z1 ), . . . , σ²(Zk )) B^T    (4.2)

is the covariance matrix.
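To make the definition concrete, here is a minimal simulation sketch (Python with NumPy; the particular a, B and σ²(Zi ) are illustrative choices, not taken from the text): sample X = a + BZ and compare the empirical covariance of X with (4.2).

    import numpy as np

    rng = np.random.default_rng(0)
    a = np.array([1.0, -2.0])                    # mean vector, n = 2
    B = np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 2.0]])              # n x k matrix, k = 3
    var_Z = np.array([1.0, 4.0, 0.25])           # sigma^2(Z_1), ..., sigma^2(Z_k)

    # m independent copies of Z (one per column), then X = a + B Z
    m = 200_000
    Z = np.sqrt(var_Z)[:, None] * rng.standard_normal((3, m))
    X = a[:, None] + B @ Z

    Sigma = B @ np.diag(var_Z) @ B.T             # covariance matrix (4.2)
    print(np.cov(X))                             # empirical covariance, close to Sigma
    print(Sigma)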
Corollary 4.1 Suppose that the random vector X = (X1 , . . . , Xn )^T has a multivariate normal distribution with mean vector a and covariance matrix Σ_X . The following holds:
i) Xi =d N(ai , Σ_X,ii );
ii) let B be an m × n-matrix; then the random vector BX has a multivariate normal distribution
with mean Ba and covariance matrix B Σ_X B^T .

Due to the above result, we will write Σ_X = Σ for the covariance matrix of X. Then

    Σ_Z = diag(σ²(Z1 ), . . . , σ²(Zk )).
Problem 4.1.a Prove Corollary 4.1. You may use that a linear combination of independent normally
distributed random variables is normally distributed with the appropriate mean and variance.
So far, it is not clear that the multivariate normal distribution is completely determined by mean
vector and covariance matrix. In order that this be true, we must show that for any l ≥ 1, any n × l matrix C, and
independent, normally distributed, non-degenerate, random variables Z′1 , . . . , Z′l with

    C Σ_{Z′} C^T = B Σ_Z B^T ,    (4.3)

one has

    a + C Z′ =d a + B Z.    (4.4)
Lemma 4.2 i) Suppose that X = (X1 , . . . , Xn )^T has a multivariate normal distribution with mean
vector a and covariance matrix

    Σ_X = diag(σ1² , σ2² , . . . , σn² ).

Then X1 , . . . , Xn are independent random variables, with Xi =d N(ai , σi² ), i = 1, . . . , n.
ii) For any l ≥ 1, any n × l matrix C and independent, normally distributed random variables
Z′1 , . . . , Z′l satisfying (4.3), assertion (4.4) holds. In other words: if the random vectors X =
(X1 , . . . , Xn )^T and Y = (Y1 , . . . , Yn )^T have a multivariate normal distribution with the same
mean vector and covariance matrix, then X =d Y .
Proof. We first prove (i). To this end, let Z1 , . . . , Zk be independent, normally distributed random
variables on (Ω, F, P), with variances σ²(Z1 ), . . . , σ²(Zk ), and let B be an n × k matrix, such that
(4.1) holds, with mean vector a = 0 (without loss of generality). By assumption, (4.2) equals the above diagonal matrix, that is,

    diag(σ1² , . . . , σn² ) = B diag(σ²(Z1 ), . . . , σ²(Zk )) B^T .
Further, for j ≠ i,

    cov(Xi , Xj ) = Σ_{m=1}^k Bim Bjm σ²(Zm ) = 0.    (4.5)
• No row of B is identically equal to 0: if row i were identically 0, then Xi would be identically
equal to 0 (recall that a = 0), and then Xi would be independent of (X1 , . . . , Xi−1 , Xi+1 , . . . , Xn ). Without loss of
generality, one can thus 'delete' the rows of B that are identically equal to 0.
• If n < k, one can add k − n rows to the matrix B in the following manner. Consider the n × k
matrix B̃ with elements

    B̃ij = Bij √(σ²(Zj )), 1 ≤ i ≤ n, 1 ≤ j ≤ k.

Then the rows of B̃ are mutually orthogonal by (4.5). Add k − n rows to B̃, such that the
resulting k × k matrix, again denoted B̃, has mutually orthogonal rows. Define the k × k extended
matrix B by Bij = B̃ij / √(σ²(Zj )), when σ²(Zj ) > 0, and arbitrary otherwise, for i = n + 1, . . . , k.
• If n > k, the Xi are linearly dependent and Σ_X is not a diagonal matrix.
The conclusion is that, without loss of generality, we may assume that n = k, and B is non-singular.
By virtue of Corollary 4.1 (i), Xi has a normal distribution with mean 0 and variance

    σi² = Σ_X,ii = Σ_j Bij² σ²(Zj ),

for i = 1, . . . , n. We want to show that P_X = P_{X1} × P_{X2} × · · · × P_{Xn} , with the latter denoting the
product measure.
Suppose first that the Zi are all non-degenerate, i.e. σ²(Zi ) > 0, i = 1, . . . , n.
Since P_Z = P_{Z1} × · · · × P_{Zn} , the product measure has a density, f say, where

    f(z1 , . . . , zn ) = (2π)^{−n/2} (Π_i σ²(Zi ))^{−1/2} e^{−Σ_i zi²/(2σ²(Zi ))}
                      = ((2π)^n det(Σ_Z ))^{−1/2} e^{−z^T Σ_Z^{−1} z / 2}, z ∈ R^n .
It holds that Z = B^{−1}X. Using a change of variables, we get that X has the density fX with

    fX (x) = f(B^{−1}x) |det(B^{−1})| = ((2π)^n det(Σ_X ))^{−1/2} e^{−x^T Σ_X^{−1} x / 2},

since det(Σ_X ) = det(B)² det(Σ_Z ) and

    (B^{−1}x)^T Σ_Z^{−1} (B^{−1}x) = x^T Σ_X^{−1} x.

As Σ_X is diagonal, fX factorises into a product of one-dimensional normal densities, so that indeed P_X is the desired product measure. If some of the σi² vanish, say σ1² = · · · = σr² = 0, then X1 , . . . , Xr are a.s. equal to 0, and it is sufficient to study the distribution of (Xr+1 , . . . , Xn )^T . But this follows similarly to the arguments above.
We prove part (ii). Since Σ_X = Σ_Y is symmetric, we can write Σ_X = U D U^T , where U^T = U^{−1} .
Then V = U^T B Z is an n-dimensional random vector with a multivariate normal distribution, and a
diagonal covariance matrix. Thus, V has independent components by part (i), and the distribution is
a product measure of the separate components. Similarly, W = U^T C Z′ is an n-dimensional random
vector with a multivariate normal distribution, with the same diagonal covariance matrix as V .
Thus the distribution of W is the product measure of the separate components. Since corresponding
components of V and W have the same normal distributions, and the distributions of V and W are
product measures, the distributions of V and W are equal.
Since X = U V and Y = U W , it is straightforward to check that X and Y have the same distribution.
Indeed, for any product set A = A1 × · · · × An ∈ B(Rn ), taking the integral over the set {U V ∈ A}
and {U W ∈ A} with respect to the probability distribution of V and W respectively, yields the same
value, since the distributions of V and W are equal. As product sets form a π-system generating
B(Rn ), the probability distributions of X = U V and Y = U W are equal.
QED
By a similar transformation argument as in the proof of the above lemma, X has a density whenever
det Σ ≠ 0, given by

    fX (x) = ((2π)^n det(Σ))^{−1/2} exp{− ½ (x − a)^T Σ^{−1} (x − a)}, x = (x1 , . . . , xn )^T ∈ R^n .
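As a numerical sanity check of this formula, one can compare it with a library implementation (a sketch in Python, assuming NumPy and SciPy are available; the numbers are arbitrary):

    import numpy as np
    from scipy.stats import multivariate_normal

    a = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])               # non-singular covariance matrix
    x = np.array([0.5, -1.0])

    # ((2 pi)^n det Sigma)^{-1/2} exp(-(x-a)^T Sigma^{-1} (x-a)/2)
    d = x - a
    n = len(a)
    fx = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) \
         / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

    print(fx)
    print(multivariate_normal(mean=a, cov=Sigma).pdf(x))   # agrees with fx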
Remark 4.1 Word of caution: in general it is not true that two normally distributed random
variables X and Y with cov(X, Y ) = 0, are independent!
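A standard counterexample: let X =d N(0, 1) and let S, independent of X, take the values ±1 with probability 1/2 each. Then Y := SX =d N(0, 1) and cov(X, Y ) = E(S) E(X²) = 0, but X and Y are not independent, since |X| = |Y |. The point is that the vector (X, Y ) is not multivariate normal.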
The following lemma shows that sub-vectors also have a multivariate normal distribution.
Lemma 4.3 i) The vector (Xν1 , . . . , Xνr ), ν1 < . . . < νr ≤ n, has a N(a′, Σ′) distribution, where
a′ = (aν1 , . . . , aνr )^T and Σ′ = (Σ_X,ij )_{i,j∈{ν1 ,...,νr }} .
ii) Suppose that Σ_X has a block diagonal structure, in other words, there are indices r0 = 0 <
r1 < · · · < rm−1 < n = rm , such that Σ_X,ij ≠ 0 implies that rs−1 + 1 ≤ i, j ≤ rs for some
s ∈ {1, . . . , m}.
Then the vectors (X1 , . . . , Xr1 ), (Xr1 +1 , . . . , Xr2 ), . . . , (Xrm−1 +1 , . . . , Xn ) are mutually independent.

Proof. For the proof of (i), take the restriction of the mean vector a and the matrix B to the rows
corresponding to the indices ν1 , . . . , νr . The result follows immediately.
We prove (ii). Denote as = (ars−1 +1 , . . . , ars )^T , and let Σ^s = (Σ_X,ij )_{rs−1 +1≤i,j≤rs } be the s-th diagonal
block of Σ_X .
By (i), (Xrs−1 +1 , . . . , Xrs ) =d N(as , Σ^s ). Also note that det(Σ_X ) = Π_{s=1}^m det(Σ^s ). By the symmetry of
Σ^s , diagonalisation leads to the decomposition

    Σ^s = Us Ds Us^T ,

where Us , Ds are (rs − rs−1 ) × (rs − rs−1 ) matrices, s = 1, . . . , m. Put U the matrix with the same
block structure as Σ_X , but with s-th diagonal block equal to Us , and put D the diagonal matrix
with diagonal equal to the successive diagonal elements from D1 , . . . , Dm . Then U D U^T = Σ_X .
Following notation and construction in the proof of Lemma 4.2, V = U^T B Z is a multivariate normally
distributed random vector with independent components and X = U V . Thus, for Ai ∈ B(R),
i = 1, . . . , n,

    P{Xi ∈ Ai , i = 1, . . . , n}
      = ∫_{z∈R^n : Uz ∈ A1 ×···×An} dP_{V1}(z1 ) × · · · × dP_{Vn}(zn )
      = ∫_{z¹∈R^{r1} : U1 z¹ ∈ A1 ×···×Ar1} dP_{(V1 ,...,Vr1 )}(z¹) · · · ∫_{z^m ∈R^{rm −rm−1} : Um z^m ∈ Arm−1 +1 ×···×An} dP_{(Vrm−1 +1 ,...,Vn )}(z^m )
      = P{(X1 , . . . , Xr1 ) ∈ A1 × · · · × Ar1 } × · · · × P{(Xrm−1 +1 , . . . , Xn ) ∈ Arm−1 +1 × · · · × An }.

The result follows. QED
The characteristic function yields a further characterisation: the random vector X = (X1 , . . . , Xn )^T has a multivariate normal distribution if and only if
there exist a vector a ∈ R^n and a symmetric, non-negative definite n × n matrix Γ, such that
for all θ ∈ R^n

    E e^{i(θ,X)} = e^{i(θ,a) − θ^T Γθ/2}.

In that case, a is the mean vector and Γ = Σ_X the covariance matrix of X.
In the case of random variables assuming non-negative values only, it may be convenient to work with
the Laplace transform. Let g : [0, ∞) → R. Then the Laplace transform L(g) is defined by

    Lg(s) = ∫_{0−}^∞ e^{−st} g(t) dt,

with s ∈ C, and very often Re(s) ≥ 0 (in the case of random variables), provided the integral exists.
An inversion formula exists in this case as well.
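For instance, for the density g(t) = λ e^{−λt} of the exp(λ)-distribution one computes

    Lg(s) = ∫_0^∞ e^{−st} λ e^{−λt} dt = λ/(λ + s),

which exists for all s with Re(s) > −λ, in particular for all s with Re(s) ≥ 0.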
Remark It is a consequence of both theorems that, for distribution functions F and G, φF (θ) = φG (θ)
for all θ ∈ R implies that F ≡ G. Here φF and φG denote the characteristic functions of F and G.
The same conclusion holds for F and G the distributions of non-negative r.v.: if LF (s) = LG(s) for
all s ≥ 0, then F ≡ G.
5 Addendum on Measurability
In the proof of Lemma 1.6.11 we need that the limit of a certain sequence of measurable maps is
measurable. We will show this.
Lemma 5.1 Let (E, d) be a metric space and let B(E) be the Borel-σ-algebra of open sets compatible
with d. Let (Xsn )s≤t , n = 1, . . ., be (E, B(E))-valued adapted stochastic processes on the filtered
probability space (Ω, F, (Fs )s≤t , P), such that the map

    (s, ω) 7→ Xsn (ω)

is measurable as a map from ([0, t] × Ω, B([0, t]) × Ft ) to (E, B(E)). Suppose that Xsn (ω) converges
to Xs (ω), n → ∞, for every (s, ω). Then the map (s, ω) 7→ Xs (ω) is measurable.
Proof. Note that B(E) = σ(G | G open in E). Hence it is sufficient to check that

    X^{−1}(G) = {(s, ω) | Xs (ω) ∈ G} ∈ B([0, t]) × Ft ,

for all G open in E. Write G = ∪_{k=1}^∞ Fk , with Fk = {x ∈ E | d(x, G^c ) ≥ 1/k}. We need the closed
sets Fk to exclude that Xsm (ω) ∈ G for all m but lim_{m→∞} Xsm (ω) ∉ G. By pointwise convergence of
Xsm (ω) to Xs (ω) we have that

    X^{−1}(G) = ∪_{k=1}^∞ ∪_{n=1}^∞ ∩_{m=n}^∞ (X^m )^{−1}(Fk ).

Each set (X^m )^{−1}(Fk ) belongs to B([0, t]) × Ft by assumption, and hence so does X^{−1}(G). QED
We now give an example of a stochastic process X = (Xt )0≤t≤1 , that is not progressively measurable.
Moreover, for this example we can construct a stopping time τ , for which Xτ is not Fτ -measurable.
Example 5.1 Let Ω = [0, 1], F = B[0, 1], and let P be the Lebesgue measure. Let A ⊂ [0, 1] be non-measurable, i.e. A ∉ B. Put

    Xt (ω) = t + ω, if t ∈ A;  Xt (ω) = −t − ω, if t ∉ A.
Note that σ(Xt ) = B, for all t. Hence, the natural filtration FtX = B. Further |Xt (ω)| = t + ω is
continuous in t and ω.
Now {(s, ω) | Xs (ω) ≥ 0} = A×Ω is not measurable. Consequently X is not progressively measurable.
Define the stopping time T by

    T = inf{t | 2t ≥ |Xt |},

so that T (ω) = ω. Clearly {ω : T (ω) ≤ t} = {ω ≤ t} ∈ B = FtX , so that indeed T is a stopping time.
One has FT = B. However,

    XT(ω) (ω) = 2ω, if ω ∈ A;  XT(ω) (ω) = −2ω, if ω ∉ A,

and so {XT > 0} is not B-measurable, hence not FT -measurable.
Next, define the hitting time of x of a Brownian motion W : τx = inf{t > 0 | Wt = x}. A direct proof of measurability of τx for
x ≠ 0 follows here (cf. LN Example 1.6.9). To show F/B(R)-measurability of τx , it is sufficient to show
that τx^{−1}(t, ∞) is measurable for each t. This is because the sets {(t, ∞), t ∈ R} generate B(R).
For x ≠ 0 one has, by continuity of the paths,

    {τx > t} = ∪_{n≥1} ∩_{q∈Q∩[0,t]} {|Wq − x| ≥ 1/n}.

We have in fact proved that {τx > t} ∈ σ(Ws , s ≤ t), hence {τx ≤ t} ∈ σ(Ws , s ≤ t) and so τx is a
stopping time.
It is slightly more involved to show that {τ0 > t} is a measurable event, since W0 = 0, a.s. and so
there does not exist a uniformly positive distance between the path identically equal to 0 and Ws (ω)
on (0, t].
6 Convergence of measures and random variables

Let µ, µ1 , µ2 , . . . be probability measures on a measurable space (Ω, F).

Definition 6.1 Suppose additionally that Ω is a metric space and F the Borel-σ-algebra. Then
µn converges weakly to µ (in formula: µn →w µ), if ∫ f dµn → ∫ f dµ for all f ∈ C(Ω) =
{g | g : Ω → R continuous, bounded}.
The following theorem gives equivalent criteria for weak convergence.

Theorem 6.2 (Portmanteau Theorem) µn →w µ, n → ∞, if and only if one (and hence all) of
the following criteria holds:
i) ∫ f dµn → ∫ f dµ for all bounded, uniformly continuous f : Ω → R;
ii) lim sup_n µn (F ) ≤ µ(F ) for all closed sets F ⊂ Ω;
iii) lim inf_n µn (G) ≥ µ(G) for all open sets G ⊂ Ω;
iv) µn (A) → µ(A) for all A ∈ F with µ(∂A) = 0.
Assume that (E, E) is a measurable space, with E separable, metric (with metric ρ) and E = B(E)
is the σ-algebra generated by the open sets. Let X, X1 , X2 , . . . : Ω → E be F/E-measurable random
variables.
Definition 6.3 i) Xn converges a.s. to X (in formula: Xn →a.s. X), if P{limn→∞ Xn = X} = 1;
ii) Xn converges in probability to X (in formula: Xn →P X), if P{ρ(Xn , X) ≥ ε} → 0, n → ∞, for
each ε > 0;
iii) Xn converges to X in L1 (in formula: Xn →L1 X), if E|Xn − X| → 0, n → ∞;
iv) Xn converges to X in distribution (in formula: Xn →D X), if PXn →w PX , where PXn (PX ) is the
distribution of Xn (X).
In Williams PwithM, Appendix to Chapter 13, you can find the following characterisation of con-
vergence in probability.
Lemma 6.4 i) Xn →a.s. X implies Xn →P X, which in turn implies Xn →D X.
ii) Xn →D a implies Xn →P a, where a is a constant (or degenerate r.v.).
iii) Xn →P X if and only if every subsequence {Xnk }k=1,2,... of {Xn }n contains a further subsequence
{Xn′k }k such that Xn′k → X, a.s.
Suppose that Xn are i.i.d. integrable r.v.'s. Then the (Strong) Law of Large Numbers states that

    (1/n) Σ_{i=1}^n Xi →a.s. EX1 .

If also EXn² < ∞, then the Central Limit Theorem states that

    (1/√(σ²(X1 ) n)) Σ_{i=1}^n (Xi − EX1 ) →D X* ,

where X* =d N(0, 1).
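Both statements are easy to observe in simulation; a small sketch (Python with NumPy; the exponential distribution and the sample sizes are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 10_000, 5_000                   # n terms per sum, m independent replications
    X = rng.exponential(1.0, size=(m, n))  # X_i i.i.d. exp(1): EX_1 = 1, sigma^2(X_1) = 1

    # LLN: the sample means concentrate around EX_1 = 1
    means = X.mean(axis=1)
    print(means.mean(), means.std())       # close to 1, with small spread

    # CLT: normalised centred sums are approximately N(0, 1)
    S = (X - 1.0).sum(axis=1) / np.sqrt(n)
    print(S.mean(), S.std())               # approximately 0 and 1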
Suppose that Xn are i.i.d. random variables, defined on a common probability space, with values in
a real, separable Banach space. A famous paper by Itô and Nisio contains the following surprising
convergence results. Write Sn = Σ_{i=1}^n Xi .
Theorem 6.5 • The following are equivalent:
1. Sn , n = 1, 2, . . ., converges in probability;
2. Sn , n = 1, 2, . . ., converges in distribution;
3. Sn , n = 1, 2, . . ., converges a.s.
• If Sn , n = 1, 2, . . ., are uniformly tight, i.e. for each ε > 0 there exists a compact set K ⊂ E,
such that

    P{Sn ∈ K} ≥ 1 − ε, n = 1, 2, . . . ,

then there exist c1 , c2 , . . . ∈ E, such that Sn − cn converges a.s.
If the random variables are symmetric, i.e. Xi =d −Xi , then even more can be said. First, let E* be
the dual space of E, i.e. the collection of bounded linear functionals on E.

Theorem 6.6 If Xi , i = 1, 2, . . ., are all symmetric random variables, then a.s. convergence is
equivalent to uniform tightness. In particular, the following additional equivalences hold:
• Sn , n = 1, 2, . . ., converges in probability;
• Sn , n = 1, 2, . . ., is uniformly tight;
• there exists an E-valued random variable S, such that limn→∞ E e^{if(Sn)} = E e^{if(S)} for every f ∈ E*.
These convergence results play an important role in the analysis of processes with independent
increments.
7 Conditional expectation

Theorem 7.1 Let X be an integrable random variable on the probability space (Ω, F, P), and let A ⊂ F be a sub-σ-algebra. Then there exists a random variable Y such that
i) Y is A-measurable;
ii) Y is integrable, E|Y | < ∞;
iii) ∫_A Y dP = ∫_A X dP for all A ∈ A.
If Y ′ is another r.v. with properties (i,ii,iii), then Y ′ = Y with probability 1, i.e. P{Y ′ = Y } = 1.
We call Y a version of the conditional expectation E(X|A) of X given A and we write Y = E(X|A)
a.s.
Proof. To prove existence, suppose first that X ≥ 0. Define the measure µ on (Ω, A) by

    µ(A) = ∫_A X dP, A ∈ A.
Since X is integrable, µ is finite. Now for all A ∈ A we have that P(A) = 0 implies µ(A) = 0. In
other words, µ is absolutely continuous with respect to P. By the Radon-Nikodym theorem, there
exists a measurable random variable Y on (Ω, A) such that
    µ(A) = ∫_A Y dP, A ∈ A.
Clearly, Y has the desired properties. The general case follows by linearity.
As regards a.s. unicity of Y , suppose that we have two random variables Y, Y 0 satisfying (i,ii,iii).
Let An = {Y − Y ′ ≥ 1/n}, n = 1, 2, . . .. Then An ∈ A, and so, by (iii) applied to Y and Y ′,

    0 = ∫_{An} (Y − Y ′) dP ≥ (1/n) P{An },

whence P{An } = 0 for every n and P{Y > Y ′} = P{∪n An } = 0.
Hence Y ≤ Y ′ a.s. Interchanging the roles of Y and Y ′ yields that Y ′ ≤ Y a.s. Thus Y = Y ′ a.s.
QED
N.B.2 Suppose we have constructed an A-measurable r.v. Z, with E|Z| < ∞, such that (iii) holds
for all A in a π-system generating A and containing the whole space Ω. Then (iii)
holds for all A ∈ A, and so Z is a version of the conditional expectation E(X|A). This follows from
the interpretation of A 7→ ∫_A X dP as a finite measure on A and the fact that two measures that are equal on a
π-system, are equal on the generated σ-algebra, provided they assign equal mass to the whole space.
This can be used in determining conditional expectations: make a guess and check that it is the right
one on a π-system.
Elementary properties The following properties follow either immediately from the definition,
or from the corresponding properties of ordinary expectations.
Lemma 7.2 Let X be integrable and A ⊂ F a sub-σ-algebra.
i) If Y is a version of E(X | A), then EY = EX;
ii) if X is A-measurable, then E(X | A) = X a.s.;
iii) Linearity: E(a1 X1 + a2 X2 | A) = a1 E(X1 | A) + a2 E(X2 | A) a.s.;
iv) Positivity: if X ≥ 0 a.s., then E(X | A) ≥ 0 a.s.;
v) Conditional monotone convergence: if 0 ≤ Xn ↑ X a.s., then E(Xn | A) ↑ E(X | A) a.s.;
vi) Fatou: if Xn ≥ 0 a.s., then E(lim inf Xn | A) ≤ lim inf E(Xn | A) a.s.;
vii) Dominated convergence: suppose that |Xn | ≤ Y a.s. and EY < ∞. Then Xn → X a.s. implies
E(Xn | A) → E(X | A) a.s.;
viii) Jensen: if φ is a convex function such that E|φ(X)| < ∞, then E(φ(X) | A) ≥ φ(E(X | A)) a.s.;
ix) Tower property: if B ⊂ A is a sub-σ-algebra, then E(E(X | A) | B) = E(X | B) a.s.;
x) Taking out what is known: if Y is A-measurable and bounded, then E(Y X | A) = Y E(X | A) a.s.
The same assertion holds if X, Y ≥ 0 a.s. and EX, E(XY ) < ∞, or if E|X|^q , E|Y |^p <
∞, with 1/q + 1/p = 1, p, q > 1;
xi) Role of independence: if B is independent of σ(σ(X), A), then E(X | σ(B, A)) = E(X | A) a.s.
Proof. Exercise! The proofs of parts (iv), (viii), (x) and (xi) are the most challenging, see Williams
PwithM. QED
Lemma 7.2A Suppose X, Y are integrable r.v. defined on the probability space (Ω, F, P). Then X ≥ Y a.s.
if and only if

    ∫_F X dP = E(1{F } X) ≥ E(1{F } Y ) = ∫_F Y dP

for all F ∈ F.
Theorem 7.3 Let (Ω1 , F1 ) be a measurable space with the property that F1 is the σ-algebra generated
by a map g on Ω1 with values in a measurable space (Ω2 , F2 ), i.e. F1 = σ(g). Then a real-valued
function f on (Ω1 , F1 ) is measurable if and only if there exists a real-valued measurable function h
on (Ω2 , F2 ), such that f = h(g).
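As an illustration, take Ω1 = R and g(ω) = ω², so that F1 = σ(g) consists of the symmetric Borel sets. A σ(g)-measurable f is necessarily even, and indeed f = h(g) for h(u) = f(√u), u ≥ 0 (h may be defined arbitrarily, say 0, for u < 0).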
Proof. If there exists such a function h, then f −1 (B) = g −1 (h−1 (B)) ∈ F1 , by measurability of g
and h. Suppose therefore that f is measurable. We have to show the existence of a function h with
the above required properties.
The procedure goes from indicator functions, via elementary functions, to non-negative
functions, and then to general functions f :
i) Assume that f = 1{A} for some set A ∈ F1 . Since F1 = σ(g) = {g^{−1}(B) | B ∈ F2 }, there exists B ∈ F2 with A = g^{−1}(B). Put h = 1{B} ; then f = h(g).
ii) Let f = Σ_{i=1}^n ai 1{Ai } , Ai ∈ F1 . It follows from (i) that 1{Ai } = hi (g) with hi an indicator function
measurable on Ω2 . But then also h = Σ_{i=1}^n ai hi is measurable on Ω2 , and f = h(g).
iii) Let f ≥ 0. Then there is a non-decreasing sequence of elementary functions fn , with f (ω) = limn→∞ fn (ω),
ω ∈ Ω1 . By (ii), fn = hn (g) with hn measurable on Ω2 . It follows that limn hn (ω2 ) exists for
all ω2 ∈ {g(ω1 ) : ω1 ∈ Ω1 }. Define

    h(ω2 ) = limn hn (ω2 ), if this limit exists, and h(ω2 ) = 0 otherwise.

Then h is measurable on Ω2 and f = h(g).
iv) For general f , write f = f⁺ − f⁻ and apply (iii) to both parts. QED
Corollary 7.4 Let X be an integrable random variable and let Y be a random element with values in a measurable space. Then there exists a measurable function h, such that E(X | Y ) := E(X | σ(Y )) = h(Y ) a.s. In particular, if Y = (Yt )t∈T is an E-valued stochastic process, then there exist an at most countable subset S = {t1 , t2 , . . .} ⊂ T and an E^S -measurable function h0 , such that E(X | Y ) = h0 (Yt1 , Yt2 , . . .) a.s.

Proof. By Theorem 7.3, E(X | Y ) = h(Y ) for a measurable function h : E^T → R. By Lemma 3.10,
h = h0 ◦ πS for a countable subset S ⊂ T and an E^S -measurable function h0 : E^S → R. As a
consequence, h(Y ) = h0 (πS (Y )) = h0 (Yt1 , . . . ), where S = {t1 , . . .}. QED
Suppose X is a (measurable) map from (Ω, F, P) to (E, E). Let A ∈ F. You are accustomed to
interpreting

    P(X ∈ B | A) = P({X ∈ B} ∩ A) / P(A),    (7.1)

provided P(A) > 0. Formally, conditioning on the σ-algebra σ(A) = {∅, A, A^c , Ω}, we get

    E(1{X∈B} | σ(A)) = (P({X ∈ B} ∩ A) / P(A)) 1{A} + (P({X ∈ B} ∩ A^c ) / P(A^c )) 1{A^c },

and so we obtain

    E(1{X∈B} | σ(A))(ω) = P({X ∈ B} ∩ A) / P(A), ω ∈ A,

if P(A) > 0. In view of (7.1) we see that P(X ∈ B | A) stands for the value of E(1{X∈B} | σ(A))(ω) on A.
Which value can we take for E(1{X∈B} | σ(A))(ω) on A, if P(A) = 0?
The following observation is useful for computations.
Lemma 7.5 Let (Ω, F, P) be a probability space. Let F1 , F2 ⊆ F be independent σ-algebras. Let
(E1 , E1 ), (E2 , E2 ) be measurable spaces. Suppose that X : Ω → E1 is F1 -measurable and that Y : Ω → E2
is F2 -measurable. Let further f : E1 × E2 → R be E1 × E2 -measurable with E|f (X, Y )| < ∞. Then
there exists an E2 -measurable function g : E2 → R, such that E(f (X, Y ) | F2 ) = g(Y ), where
    g(y) = ∫_Ω f (X(ω), y) dP(ω) = ∫_{E1} f (x, y) dPX (x).    (7.2)
Proof. Define H as the class of bounded, measurable functions f : E1 × E2 → R, with the property
that for each f ∈ H the function g : E2 → R given by (7.2) is E2 -measurable, with E(f (X, Y ) | F2 ) =
g(Y ). It is straightforward to check that H is a monotone class in the sense of Theorem 3.9.
We will check that f = 1{B1 ×B2 } ∈ H, for B1 ∈ E1 , B2 ∈ E2 . By virtue of Theorem 3.9, H then
contains all bounded E1 × E2 -measurable functions.
Now, for any F ∈ F2 , independence of F1 and F2 gives

    E(1{F } 1{B1 ×B2 } (X, Y )) = E(1{B1 } (X)) E(1{F } 1{B2 } (Y )).

Put g(y) = 1{B2 } (y) E(1{B1 } (X)); then the above implies that g(Y ) is a version of
E(1{B1 ×B2 } (X, Y ) | F2 ).
Derive the result for unbounded functions f yourself. QED
Example
Let Ω = [0, 1], F = B[0, 1], P = λ the Lebesgue measure, and let X(ω) = ω². Consider first F0 = σ([0, 1/2]) = {∅, [0, 1/2], (1/2, 1], Ω}.
Put

    Z(ω) = 1/12 for ω ∈ [0, 1/2],  and  Z(ω) = 7/12 for ω ∈ (1/2, 1].
The sets [0, 1/2], (1/2, 1] form a π-system for F0 . Since ∫_A Z dP = ∫_A X dP for the sets A in a π-system
for F0 , and thus also for A = Ω, we have that Z is a version of E(X | F0 ), in other words Z = E(X | F0 ) a.s.
Next let Y : Ω → R be given by Y (ω) = (1/2 − ω)2 . We want to determine E(X | Y ) = E(X | σ(Y )).
σ(Y ) is the σ-algebra generated by the π-system {[0, ω] ∪ [1 − ω, 1] | ω ∈ [0, 1/2]}. By Corollary 7.4
there exists a B(R)/B(R)-measurable function h : R → R, such that E(X | Y ) = h(Y ). Since Y is
constant on sets of the form {ω, 1 − ω}, necessarily E(X | Y ) is constant on these sets!
The easiest way to determine E(X | Y ) is by introducing an auxiliary random variable and by subsequently
applying Lemma 7.5. Say V : Ω → {0, 1} is defined by V (ω) = 0 for ω ≤ 1/2 and V (ω) = 1 for ω >
1/2. Then σ(V ) and σ(Y ) are independent, and σ(Y, V ) = F (check this). Hence E(X | Y, V ) = X,
a.s.
By Corollary 7.4 there exists a B(R+ ) × 2^{{0,1}} -measurable function f , such that X = f (Y, V ).
By computation we get

    f (y, v) = (−√y + 1/2)² 1{v=0} + (√y + 1/2)² 1{v=1} .
By Lemma 7.5,

    g(Y ) = E(X | Y ) = E(f (Y, V ) | Y ),

where

    g(y) = ∫ f (y, v) dPV (v) = ½ (−√y + 1/2)² + ½ (√y + 1/2)² = y + 1/4.

It follows that E(X | Y ) = Y + 1/4 a.s.
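The identity can be verified by simulation, estimating E(X | Y ) by averaging X over narrow bins of Y (a sketch in Python with NumPy, using X(ω) = ω² as in this example; the sample size and bin grid are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    omega = rng.uniform(0.0, 1.0, size=1_000_000)
    X = omega ** 2                       # X(omega) = omega^2
    Y = (0.5 - omega) ** 2               # Y takes values in [0, 1/4]

    bins = np.linspace(0.0, 0.25, 26)
    idx = np.digitize(Y, bins)
    for j in range(1, len(bins)):
        sel = idx == j
        if sel.any():
            y_mid = 0.5 * (bins[j - 1] + bins[j])
            # empirical E(X | Y near y_mid) versus the formula y + 1/4
            print(f"{y_mid:.3f}  {X[sel].mean():.4f}  {y_mid + 0.25:.4f}")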
Problem 7.1 Let X and Y be two r.v. defined on the same probability space, such that X =d exp(λ),
i.e. P{X > t} = e^{−λt} , t ≥ 0; Y ≥ 0 a.s.; and X and Y are independent. Show the memoryless property of
the (negative) exponential distribution:

    P{X > Y + t | X > Y } = P{X > t} = e^{−λt} , t ≥ 0.
Problem 7.2 A rather queer example. Let Ω = (0, 1]. Let A be the σ-algebra generated by all
one-point sets {x}, x ∈ (0, 1]. Then A ⊂ B(0, 1].
i) Classify A.
ii) Let λ be the Lebesgue measure on ((0, 1], B(0, 1]). Let X : ((0, 1], B(0, 1]) → (R, B) be any
integrable r.v. Determine E(X|A). Explain heuristically.
Gambling systems A casino offers the following game, consisting of n rounds, to a gambler. The outcome of round t is ηt , where η1 , . . . , ηn are i.i.d. with P{ηt = 1} = P{ηt = −1} = 1/2; the gambler wins his stake in round t if ηt = 1 and loses it if ηt = −1. In every round t he
bets αt ≥ 0. His bet in round t may depend on his knowledge of the game's past: a strategy is called admissible if αt is σ(η1 , . . . , ηt−1 )-measurable for every t. The gambler's gain after round t is Xt = Σ_{j=1}^t αj ηj .
Problem 7.3 By the distribution of outcomes, one has E(Xt ) = 0. Prove this.
Lemma 7.6 Let Y be a σ(η1 , . . . , ηn )-measurable function. Then there is an admissible gambling
strategy α1 , . . . , αn such that

    Y − E(Y ) = Σ_{j=1}^n αj ηj .
8 Uniform integrability
Suppose that {Xn }n is a sequence of integrable random variables with Xn →a.s. X for some integrable
random variable X. Under what conditions does EXn → EX, n → ∞, or, even stronger, when does
Xn →L1 X?
If Xn ≤ Xn+1 , a.s., both are true by monotone convergence. If this is not the case, we need additional
properties to hold.
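A standard example of what can go wrong: on ([0, 1], B[0, 1], λ), let Xn = n 1[0,1/n] . Then Xn → 0 a.s., but EXn = 1 for all n, so EXn does not converge to E(limn Xn ) = 0. As will become clear below, the point is that {Xn }n is not uniformly integrable.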
For an integrable random variable X, the map A 7→ ∫_A |X| dP is "uniformly continuous" in the
following sense.

Lemma 8.1 Let X be an integrable random variable. Then for every ε > 0 there exists a δ > 0 such
that for all A ∈ F, P{A} ≤ δ implies

    ∫_A |X| dP < ε.
Definition 8.2 Let C be an arbitrary collection of random variables on a probability space (Ω, F, P).
We call the collection uniformly integrable (UI) if for every ε > 0 there exists a constant K ≥ 0 such
that

    ∫_{|X|>K} |X| dP ≤ ε, for all X ∈ C.
Lemma 8.3 Let C be a collection of random variables on a probability space (Ω, F, P).
i) If the class C is bounded in Lp (P) for some p > 1, then C is uniformly integrable.
Another useful characterisation of UI is the following. It allows for instance to conclude that the
sum of two UI sequences of random variables is UI.
Lemma 8.4 Let C be a collection of random variables on a probability space (Ω, F, P). C is UI if
and only if C is bounded in L1 and

    lim_{ε↓0} sup_{X∈C} sup_{A∈F: P{A}≤ε} E(1{A} |X|) = 0.    (8.1)
Proof. Let C be a UI collection. We have to show that for all δ > 0 there exists ε > 0 such that

    sup_{A∈F: P{A}≤ε} E(1{A} |X|) ≤ δ, for all X ∈ C.    (8.2)

Notice that
    E(1{A} |X|) ≤ x P{A} + E{1{(x,∞)} (|X|) |X|}.    (8.3)

Let x be large enough, so that E{1{(x,∞)} (|X|) |X|} ≤ δ/2 for all X ∈ C. Next, choose ε with εx ≤ δ/2. The result
follows. Boundedness in L1 follows by putting A = Ω in (8.3) and taking x large enough, so that
sup_{X∈C} E{1{(x,∞)} (|X|) |X|} ≤ 1, say.
For the reverse statement, assume L1 -boundedness of C as well as (8.1). By Chebychev's inequality

    sup_{X∈C} P{|X| > x} ≤ (1/x) sup_{X∈C} E|X| → 0, x → ∞.    (8.4)
To show UI, we have to show that for each δ > 0 there exists x ∈ R such that

    E{1{(x,∞)} (|X|) |X|} ≤ δ, X ∈ C.

Fix δ > 0. By assumption there exists ε > 0 for which (8.2) holds. By virtue of (8.4), there exists x
so that P{|X| > x} ≤ ε for all X ∈ C. Put AX = {|X| > x}. Then (8.2) implies that

    sup_X E{1{(x,∞)} (|X|) |X|} = sup_X E{1{AX } |X|} ≤ sup_{A: P{A}≤ε} sup_X E{1{A} |X|} ≤ δ.
QED
Conditional expectations give us the following important example of a uniformly integrable class.

Lemma 8.5 Let X be an integrable random variable on (Ω, F, P). Then the class

    C = {E(X | A) | A is a sub-σ-algebra of F}

is uniformly integrable.
Proof. By conditional Jensen (Lemma 7.2), E|E(X | A)| ≤ E(E(|X| | A)) = E|X|. This yields L1 -
boundedness of C. Let F ∈ F. Then

    E(1{F } |E(X | A)|) ≤ E(1{F } E(|X| | A)) = E(|X| E(1{F } | A)).

Let δ > 0. In view of Lemma 8.4 it is sufficient to show the existence of ε > 0 such that

    E(|X| E(1{F } | A)) ≤ δ, for every sub-σ-algebra A and every F ∈ F with P{F } ≤ ε.

Suppose that this is not true. Then for any n there exist Fn ∈ F with P{Fn } ≤ 1/n and An , such
that

    E(|X| E(1{Fn } | An )) > δ.
Since P{Fn } ≤ 1/n, we can choose a subsequence, indexed by n again, such that E(E(1{Fn } | An )) =
P{Fn } ↓ 0 as n → ∞. Hence E(1{Fn } | An ) →P 0 (see Theorem 8.6 below). The sequence has a
subsequence converging a.s. to 0 by virtue of Lemma 6.4 (iii). By dominated convergence it follows
that E(|X| E(1{Fn } | An )) ↓ 0 along this subsequence. Contradiction. QED
Uniform integrability is the necessary property for strengthening convergence in probability to convergence in L1 (see Section 6).
Theorem 8.6 Let (Xn )n∈Z+ and X be integrable random variables. Then Xn →L1 X if and only if
i) Xn →P X, and
ii) the collection {Xn }n∈Z+ is uniformly integrable.
9 Augmentation of a filtration

Let (Ω, F, {Ft }t≥0 , P) be a filtered probability space, with F P-complete, and let N denote the collection of P-null sets in F. The usual augmentation {Gt }t of the filtration is defined by Gt = ∩_{u>t} σ(Fu , N ). Recall the notation Ft+ = ∩_{u>t} Fu .

Lemma 9.1 Gt = σ(Ft+ , N ). Moreover, if t ≥ 0 and G ∈ Gt , then there exists F ∈ Ft+ , such that
F ∆G := (F \ G) ∪ (G \ F ) ∈ N .
Problem 9.1 i) Let (Ω, A, P) be a probability space, such that A is P-complete. Let N be the
collection of P-null sets in A. Let K be a sub-σ-algebra of A. Prove that

    σ(K, N ) = {A ∈ A | there exists K ∈ K such that A∆K ∈ N }.

ii) Prove Lemma 9.1. Hint: use (i). Note that it amounts to proving that ∩_{u>t} σ(Fu , N ) =
σ(∩_{u>t} Fu , N )!
The problem with augmentation is that crucial properties of the processes under consideration might
change. The next lemma shows that supermartingales with cadlag paths stay supermartingales (with
cadlag paths) after augmentation.
Lemma 9.2 Suppose that X is a supermartingale with cadlag paths relative to the filtered probability
space (Ω, F, {Ft }t , P). Then X is also a supermartingale with cadlag paths relative to the usual
augmentation (Ω, G∞ , {Gt }t , P).
Problem 9.2 Prove Lemma 9.2. You may use the previous exercise.
For Markov process theory it is useful to see that by augmentation of σ-algebras, certain measurability
properties do not change intrinsically. Let (Ω, F, P) be a probability space. Let A ⊂ F be a sub-σ-
algebra, let N be the collection of P-null sets, and put G = σ(A, N ).

Lemma 9.3 i) Let X be an F-measurable, integrable random variable. Then E(X | G) = E(X | A), P-a.s.
ii) Suppose that Z is G-measurable. Then there exist A-measurable random variables Z1 , Z2 with
Z1 ≤ Z ≤ Z2 and Z1 = Z2 , P-a.s.
10 Some functional analysis

Recall that a (real) vector space V is called a normed linear space, if there exists a norm on V , i.e. a
map || · || : V → [0, ∞) such that
i) ||v|| = 0 if and only if v = 0;
ii) ||λv|| = |λ| ||v|| for all λ ∈ R, v ∈ V ;
iii) ||v + w|| ≤ ||v|| + ||w|| for all v, w ∈ V .
A normed linear space may be regarded as a metric space, the distance between the vectors v, w ∈ V
being given by ||v − w||.
If V, W are two normed linear spaces and A : V → W is a linear map, we define the norm of the
operator A by
||A|| = sup{||Av|| | v ∈ V and ||v|| = 1}.
If ||A|| < ∞, we call A a bounded linear transformation from V to W . Observe that by construction
||Av|| ≤ ||A||||v|| for all v ∈ V . A bounded linear transformation from V to R is called a bounded linear
functional on V .
Theorem 10.1 Let W be a linear subspace of a normed linear space V and let A be a bounded linear
functional on W . Then A can be extended to a bounded linear functional on V without increasing
its norm.
Proof. See for instance Rudin (1987), pp. 104–107. QED
Corollary 10.2 Let W be a linear subspace of a normed linear space V . If every bounded linear
functional on V that vanishes on W vanishes on the whole space V , then the closure of W equals V , i.e. W is dense
in V .
Proof. Suppose that W is not dense in V . Then there exist v ∈ V and ε > 0, such that ||v − w|| > ε
for all w ∈ W . Let W ′ be the subspace generated by W and v and define a linear functional A on
W ′ by putting A(w + λv) = λ for w ∈ W and λ ∈ R. For λ ≠ 0, ||w + λv|| = |λ| ||v − (−λ^{−1}w)|| ≥ |λ|ε.
Hence |A(w + λv)| = |λ| ≤ ||w + λv||/ε. It follows that ||A|| ≤ 1/ε, with A considered as a linear
functional on W ′. Hence A is bounded on W ′. By the Hahn-Banach theorem, A can be extended
to a bounded linear functional on V . Since A vanishes on W and A(v) = 1, the proof is complete.
QED
Riesz representation theorem Let E ⊆ Rd be an arbitrary set and consider the class C0 (E)
of continuous functions on E that become small outside compacta. We endow C0 (E) with the
supremum norm
    ||f ||∞ = sup_{x∈E} |f (x)|.
This turns C0 (E) into a normed linear space, even a Banach space. The version of the Riesz
representation theorem that we consider here, describes the bounded linear functionals on C0 (E).
If µ is a finite Borel measure on E, then clearly the map

    f 7→ ∫_E f dµ
is a linear functional on C0 (E), with norm equal to µ(E). The Riesz representation theorem states
that every bounded linear functional on C0 (E) can be represented as the difference of two functionals
of this type.
Theorem 10.3 Let A be a bounded linear functional on C0 (E). Then there exist two finite Borel
measures µ and ν such that

    A(f ) = ∫_E f dµ − ∫_E f dν,

for every f ∈ C0 (E).
Theorem 10.4 (Banach–Steinhaus) Let {Tα }α∈A be a family of bounded linear operators from a Banach space V to a normed linear space W . Then either there exists M < ∞ such that

    ||Tα || ≤ M, α ∈ A,

or

    sup_{α∈A} ||Tα f || = ∞

for all f in some dense Gδ -subset of V .
11 Setwise convergence of measures

The following two results are taken from Royden's book on Real Analysis. Let (Ω, F) be a measurable
space. Let {µn }n be a sequence of measures on (Ω, F). Then µn is said to converge setwise to the
measure µ on (Ω, F), if µn (B) → µ(B) for every B ∈ F.
Lemma 11.1 (Generalised Fatou) Let (Ω, F) be a measurable space, {µn }n a sequence of mea-
sures that converges setwise to a measure µ and {fn : (Ω, F) → (R, B)}n a sequence of non-negative
functions that converges pointwise to the function f : (Ω, F) → (R, B). Then,
    lim inf_{n→∞} ∫ fn dµn ≥ ∫ f dµ.
Lemma 11.2 (Generalised dominated convergence) Let, in the setting of Lemma 11.1, {fn }n be measurable functions (not necessarily non-negative) converging pointwise to f , and suppose there are non-negative measurable functions gn , converging pointwise to g, with |fn | ≤ gn for all n and

    lim_{n→∞} ∫ gn dµn = ∫ g dµ < ∞.

Then

    lim_{n→∞} ∫ fn dµn = ∫ f dµ.