Notes On Measure, Probability and Stochastic Processes
Contents

Chapter 1. Introduction
1. Classical definitions of probability
2. Mathematical expectation

Part 2. Probability

Chapter 5. Distributions
1. Definition
2. Simple examples
3. Distribution functions
4. Classification of distributions
5. Convergence in distribution
6. Characteristic functions

Chapter 6. Independence
1. Independent events
2. Independent random variables
3. Independent σ-algebras

Chapter 7. Conditional expectation
1. Conditional expectation
2. Conditional probability
CHAPTER 1

Introduction
These are the lecture notes for the course “Probability Theory and Stochastic Processes” of the Master in Mathematical Finance (since 2016/2017) at ISEG–University of Lisbon. Good knowledge of calculus and basic probability is required. I would like to thank several people for their comments, corrections and suggestions, in particular my colleague Telmo Peixe.
and

P(no pair of 6's out of 2 dice in 24 attempts) = (35/36)^{24} ≃ 0.508.
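This is easy to confirm numerically; the following Python snippet (an illustrative sketch, not part of the original computation) evaluates (35/36)^{24}.

\begin{verbatim}
# Probability of never seeing a double six in 24 throws of two dice.
p_no_double_six = (35 / 36) ** 24
print(p_no_double_six)  # ~0.5086, so betting on a double six is unfavourable
\end{verbatim}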
The Laplace law is far from what one could consider a fully satisfactory definition of probability. For instance, we would also like to examine “biased” experiments, that is, experiments whose outcomes are not equally likely. A way to deal with this question is to define probability as the frequency with which some event occurs when the experiment is repeated many times under the same conditions. So,

P(“event”) = lim_{n→+∞} (number of favourable cases in n experiments) / n.
Example 1.3. In 2015 there were 85500 births in Portugal, 43685 of which were boys. So,

P(“boy”) ≈ 43685/85500 ≃ 0.511.
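The following Python sketch illustrates the frequentist definition; the probability value 0.511 and the trial counts are our own illustrative assumptions.

\begin{verbatim}
import random

# Estimate a probability by the relative frequency of successes in n
# independent trials of a biased experiment (p_true is an assumption).
random.seed(0)
p_true = 0.511
for n in (100, 10_000, 1_000_000):
    successes = sum(random.random() < p_true for _ in range(n))
    print(n, successes / n)  # the frequency approaches p_true as n grows
\end{verbatim}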
2. Mathematical expectation
and E(S_2) = 7.
Example 1.8. Toss two fair coins. If we get two heads we win €4, two tails €1, otherwise we lose €3. Moreover, P(two heads) = P(two tails) = 1/4 and P(one head one tail) = 1/2. Let X be the gain for a given outcome, i.e. X(two heads) = 4, X(two tails) = 1 and X(one head one tail) = −3. The profit expectation for this game is therefore

E(X) = 4P(X = 4) + 1P(X = 1) − 3P(X = −3) = 4·(1/4) + 1·(1/4) − 3·(1/2) = −1/4.
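One can confirm this by enumerating the four equally likely outcomes; the snippet below is an illustrative sketch.

\begin{verbatim}
from itertools import product

# Average the gain X over the four equally likely outcomes of two coins.
def gain(outcome):
    heads = outcome.count("H")
    if heads == 2:
        return 4   # two heads: win 4
    if heads == 0:
        return 1   # two tails: win 1
    return -3      # one head, one tail: lose 3

outcomes = list(product("HT", repeat=2))
print(sum(gain(o) for o in outcomes) / len(outcomes))  # -0.25
\end{verbatim}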
Part 1. Measure theory

CHAPTER 2

Measure and Probability

1. Algebras
Proof.
(1) As ∅ ∈ F_α for every α, we get ∅ ∈ F.
(2) Let A ∈ F. So, A ∈ F_α for every α. Thus, A^c ∈ F_α and A^c ∈ F.
(3) If A_n ∈ F, we have A_n ∈ F_α for every α. So, ∪_n A_n ∈ F_α and ∪_n A_n ∈ F.
Exercise 2.11. Is the union of σ-algebras also a σ-algebra?
2. Monotone classes
Proof.
3. Product algebras
We denote it by A = F1 × F2 .
Proposition 2.21. A is an algebra (called the product algebra).
4. Measures
Proof.
(2) Since μ(A_1) < +∞, any subset of A_1 also has finite measure. Notice that

∩_n A_n = (∪_n A_n^c)^c = A_1 \ ∪_n C_n,

where C_n = A_n^c ∩ A_1. We also have C_k ⊂ C_{k+1}. Hence, by the previous case,

μ(∩_n A_n) = μ(A_1) − μ(∪_n C_n)
          = lim_{n→+∞} (μ(A_1) − μ(C_n))     (2.3)
          = lim_{n→+∞} μ(A_n).
Example 2.34. Consider the counting measure μ. Let

A_n = {n, n+1, ...}.

Therefore, A = ∩_{n=1}^{+∞} A_n = ∅ and A_{n+1} ⊂ A_n. However, μ(A_n) = +∞ does not converge to μ(A) = 0. Notice that the previous theorem does not apply because μ(A_1) = +∞.
then P(B) = 0.
(2) *(Second Borel-Cantelli lemma) If

∑_{n=1}^{+∞} P(A_n) = +∞

and

P(∩_{i=1}^{n} A_i) = ∏_{i=1}^{n} P(A_i)

for every n ∈ N (i.e. the events are mutually independent; see section 2), then P(B) = 1.
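As an illustration of the second lemma, the following sketch simulates independent events A_n with P(A_n) = 1/n (our choice), for which ∑ P(A_n) = +∞; along a sample path the events keep occurring.

\begin{verbatim}
import random

# Independent events A_n with P(A_n) = 1/n: the sum of probabilities
# diverges, so by the second Borel-Cantelli lemma the events occur
# infinitely often almost surely. We count occurrences on one path.
random.seed(1)
occurrences = [n for n in range(1, 100_001) if random.random() < 1 / n]
print(len(occurrences), occurrences[-5:])  # keeps growing as n increases
\end{verbatim}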
5. Examples
Exercise 2.45. Is

P = ∑_{n=1}^{+∞} (1/2^n) δ_{1/n}

a probability measure on P(R)?
So, A is a Borel set and m(A) = 0. Notice that A is not empty, as for instance it includes 0. In fact, A is uncountable. To prove this we just need to find a bijection between A and [0, 1]. For that we use the base 3 representation of numbers in [0, 1]. Observe that

A_1 = {x ∈ [0,1] : x = (0.a_1 a_2 a_3 ...)_3, a_1 ≠ 1}.

Moreover,

A_n = {x ∈ [0,1] : x = (0.a_1 a_2 a_3 ...)_3, a_1 ≠ 1, ..., a_n ≠ 1}

and

A = {x ∈ [0,1] : x = (0.a_1 a_2 a_3 ...)_3, a_i ≠ 1, i ∈ N}.

So, elements of A are characterized as the points that have no 1's in their base 3 expansion. Finally we choose the bijection h : A → [0,1] given by

h((0.a_1 a_2 a_3 ...)_3) = (0.b_1 b_2 b_3 ...)_2

where b_i = a_i/2 ∈ {0, 1}.
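The bijection h is easy to illustrate numerically; the sketch below (with truncated digit sequences, an assumption of ours) maps base-3 digits in {0, 2} to base-2 digits.

\begin{verbatim}
# A point of A has base-3 digits a_i in {0, 2}; halving each digit
# gives the base-2 digits of h(x) in [0, 1]. Finitely many digits only.
def h(ternary_digits):
    """Map digits a_i in {0,2} to sum of (a_i/2) * 2^(-i)."""
    return sum((a // 2) * 2 ** -(i + 1) for i, a in enumerate(ternary_digits))

print(h([0, 2, 0, 2]))  # 0.3125 = (0.0101)_2
print(h([2, 2, 2, 2]))  # 0.9375, close to 1
\end{verbatim}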
then

∑_i μ(A_i × B_i) = ∑_i μ(A'_i × B'_i).

So, μ(A) is well-defined.
(2) Show that µ can be extended to every measurable set in the
product σ-algebra F.
CHAPTER 3

Measurable functions
1. Definition
Proof.
(1) (⇒) Since any I ∈ I also belongs to σ(I), if f is measurable
then f −1 (I) is in F1 .
(2) (⇐) Let
F = {B ∈ σ(I) : f −1 (B) ∈ F1 }.
2. Simple functions
Proof.
(⇒) For x ∈ R, take the set J(x) = {j : c_j ≤ x} ⊂ {1, ..., N}. Hence,

φ^{-1}(]−∞, x]) = ∪_{j∈J(x)} φ^{-1}(c_j) = ∪_{j∈J(x)} A_j.
These are also functions with values in R̄. Moreover, the sequence f_n converges when lim inf f_n and lim sup f_n are finite and equal, their common value being lim f_n, as discussed in the next section.
Proposition 3.15. For any sequence of measurable functions fn ,
the functions inf n fn , supn fn , lim inf n→+∞ fn and lim supn→+∞ fn are
also measurable.
Exercise 3.16. Prove it.
• f_n converges almost everywhere to f (i.e. f_n →^{a.e.} f) iff there is A ∈ F such that μ(A) = 0 and

lim_{n→+∞} f_n(x) = f(x) for every x ∈ A^c.

• f_n converges in measure to f (i.e. f_n →^{μ} f) iff for every ε > 0

lim_{n→+∞} μ({x ∈ Ω : |f_n(x) − f(x)| ≥ ε}) = 0.
Proof.
(1) Consider the simple functions

φ_n = ∑_{j=0}^{n2^{n+1}} (−n + j/2^n) X_{A_{n,j}} + n X_{f^{-1}([n,+∞[)} − n X_{f^{-1}(]−∞,−n[)}

where

A_{n,j} = f^{-1}([−n + j/2^n, −n + (j+1)/2^n[).

Notice that for any ω ∈ A_{n,j} we have

−n + j/2^n ≤ f(ω) < −n + (j+1)/2^n

and

φ_n(ω) = −n + j/2^n.

So,

f(ω) − 1/2^n < φ_n(ω) ≤ f(ω).

Therefore, φ_n(ω) → f(ω) for every ω ∈ Ω since for n sufficiently large ω belongs to some A_{n,j}.
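The construction of φ_n is easy to implement; the following sketch (our own, with f(x) = x² as an illustrative choice) evaluates φ_n at a point and shows the convergence from below.

\begin{verbatim}
# Sketch of the simple functions phi_n above: cap f at +-n and round
# down to the grid of step 1/2^n.
def phi_n(f, x, n):
    v = f(x)
    if v >= n:
        return n
    if v < -n:
        return -n
    j = int((v + n) * 2 ** n)   # -n + j/2^n <= f(x) < -n + (j+1)/2^n
    return -n + j / 2 ** n

f = lambda x: x * x
for n in (1, 4, 8, 16):
    print(n, phi_n(f, 1.3, n))  # increases towards f(1.3) = 1.69
\end{verbatim}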
5. Induced measure
CHAPTER 4

Lebesgue integral
1. Definition
(2) If φ_1 ≤ φ_2, then

∫ φ_1 dμ ≤ ∫ φ_2 dμ.

Proof.
(1) Let φ, φ̃ be simple functions of the form

φ = ∑_{j=1}^{N} c_j X_{A_j},   φ̃ = ∑_{j=1}^{Ñ} c̃_j X_{Ã_j}
2. Properties
Proof.
(It is also known as expectation, mathematical expectation, mean value, mean, average or first moment. It is sometimes denoted by E[X], E(X) or ⟨X⟩.)
∫_{A∪B} φ dμ = ∑_{j=1}^{N} c_j μ(A_j ∩ (A ∪ B))
            = ∑_{j=1}^{N} c_j (μ(A_j ∩ A) + μ(A_j ∩ B))
            = ∫_A φ dμ + ∫_B φ dμ.

= ∫_A f dμ + ∫_B f dμ.

(5) We have 0 ≤ f ≤ 0 a.e. Then, by the first property, ∫ 0 dμ ≤ ∫ f dμ ≤ ∫ 0 dμ.
(6) Let A = {x ∈ Ω : f(x) ≥ λ}. Then,

∫ f dμ ≥ ∫_A f dμ ≥ ∫_A λ dμ = λμ(A).
(7) We want to show that μ({x ∈ Ω : f(x) > 0}) = μ∘f^{-1}(]0,+∞[) = 0. The Markov inequality implies that for any n ∈ N,

μ∘f^{-1}([1/n, +∞[) ≤ n ∫ f dμ = 0.

Since

f^{-1}(]0,+∞[) = ∪_{n=1}^{+∞} f^{-1}([1/n, +∞[),

we have

μ∘f^{-1}(]0,+∞[) ≤ ∑_{n=1}^{+∞} μ∘f^{-1}([1/n, +∞[) = 0.
(8) It is enough to notice that inf f ≤ f ≤ sup f .
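Property (6), the Markov inequality, can be illustrated by Monte Carlo; the sketch below uses f(x) = x² on [0, 1] with Lebesgue measure (our illustrative choice).

\begin{verbatim}
import random

# For f >= 0, mu({f >= lam}) <= (1/lam) * integral of f.
# Monte Carlo estimate with f(x) = x^2, x uniform on [0,1].
random.seed(2)
samples = [random.random() ** 2 for _ in range(100_000)]
lam = 0.5
lhs = sum(s >= lam for s in samples) / len(samples)   # ~ mu({f >= 1/2})
rhs = sum(samples) / len(samples) / lam               # ~ E(f)/lam = 2/3
print(lhs, "<=", rhs)
\end{verbatim}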
3. Examples
4. Convergence theorems
Let

A_n = {x ∈ Ω : g_n(x) ≥ cφ(x)}.

So, A_n ↑ Ω. In addition,

∫_{A_n} cφ dμ ≤ ∫_{A_n} g_n dμ ≤ ∫_{A_n} f_k dμ ≤ ∫ f_k dμ

for any k ≥ n. Finally,

∫_{A_n} cφ dμ ≤ inf_{k≥n} ∫ f_k dμ ≤ lim inf ∫ f_n dμ.

Therefore, since the previous inequality is valid for any 0 < c < 1 and any n large,

∫ φ dμ ≤ lim inf ∫ f_n dμ.
The next result is the first one for limits and not just liminf.
Theorem 4.14 (Monotone convergence). Let f_n ≥ 0 be a sequence of measurable functions. If f_n ↗ f a.e., then

lim_{n→+∞} ∫ f_n dμ = ∫ lim_{n→+∞} f_n dμ.

Proof. Notice that f_n ≤ lim_{n→+∞} f_n. Hence,

lim sup ∫ f_n dμ ≤ ∫ lim_{n→+∞} f_n dμ = ∫ lim inf f_n dμ ≤ lim inf ∫ f_n dμ,

where we have used Fatou's lemma. Since lim inf is always less than or equal to lim sup, the above inequality implies that they have to be the same and equal to lim.
Remark 4.15. This result applied to a sequence of random variables X_n ≥ 0 on a probability space reads as follows: if X_n ↗ X a.s., then E(lim X_n) = lim E(X_n).
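A numerical illustration of the monotone convergence theorem (our own choice of f_n, with a Riemann-sum approximation standing in for the Lebesgue integral):

\begin{verbatim}
# f_n = min(f, n) increases to f(x) = 1/sqrt(x) on ]0,1]; the integrals
# increase to the integral of f, which equals 2. Midpoint-rule sum used
# as a stand-in for the Lebesgue integral on [0,1].
def integral(g, steps=10**5):
    return sum(g((k + 0.5) / steps) for k in range(steps)) / steps

f = lambda x: x ** -0.5
for n in (1, 10, 100, 1000):
    print(n, integral(lambda x: min(f(x), n)))  # tends to 2
\end{verbatim}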
Proof.
≤ ∫ f^+ dμ + ∫ f^- dμ
= ∫ (f^+ + f^-) dμ
= ∫ |f| dμ.
Then,

∫_{Ω_2} φ dμ_2 = ∑_{j=1}^{+∞} c_j ∫_{Ω_2} X_{A_j} dμ_2
            = ∑_{j=1}^{+∞} c_j μ_1∘f^{-1}(A_j)
            = ∑_{j=1}^{+∞} c_j ∫_{f^{-1}(A_j)} dμ_1
            = ∑_{j=1}^{+∞} c_j ∫_{Ω_1} X_{f^{-1}(A_j)} dμ_1
            = ∫_{Ω_1} φ∘f dμ_1.
Example 4.18. Consider a probability space (Ω, F, P) and a random variable X : Ω → R. By setting the induced measure α = P∘X^{-1} and taking g : R → R measurable, we have

E(g(X)) = ∫ g∘X dP = ∫_R g(x) dα(x).

In particular, E(X) = ∫_R x dα(x).
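This change of variables can be checked by Monte Carlo; in the sketch below X is uniform on [0, 1] and g(x) = x² (illustrative assumptions), so both sides equal 1/3.

\begin{verbatim}
import random

# E(g(X)) computed by averaging g over samples of X (integration on
# Omega); here alpha = P o X^{-1} is Lebesgue measure on [0,1], and the
# integral of x^2 against alpha is exactly 1/3.
random.seed(3)
g = lambda x: x * x
mc = sum(g(random.random()) for _ in range(200_000)) / 200_000
print(mc, "~", 1 / 3)
\end{verbatim}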
Proposition 4.19. Consider the measure

μ = ∑_{n=1}^{+∞} a_n μ_n,

where

b_{k,m} = ∑_{n=1}^{m} a_n ∫ φ_k dμ_n,

and

∫ f dμ = ∫ f^+ dμ − ∫ f^- dμ = ∑_{n=1}^{+∞} a_n ∫ (f^+ − f^-) dμ_n.
(2)

lim_{n→+∞} ∫_{R²} e^{−(x²+y²)^n} dx dy = ∫_{R²} lim_{n→+∞} e^{−(x²+y²)^n} dx dy = ∫_D dm = π,

where we have used the fact that 0 ≤ e^{−(x²+y²)^n} ≤ 1 on the closure of D and e^{−(x²+y²)^n} ≤ e^{−(x²+y²)} outside it, so the integrands are dominated by an integrable function, and

lim_{n→+∞} e^{−(x²+y²)^n} = 1/e if (x,y) ∈ ∂D;  1 if (x,y) ∈ D;  0 otherwise,

with D = {(x,y) ∈ R² : x² + y² < 1}.
Exercise 4.22. Determine the following limits:
(1) lim_{n→+∞} ∫_0^{+∞} r^n/(1 + r^{n+2}) dr
(2) lim_{n→+∞} ∫_0^{π} x^{1/n}/(1 + x²) dx
(3) lim_{n→+∞} ∫_{−∞}^{+∞} e^{−|x|} cos^n(x) dx
(4) lim_{n→+∞} ∫_{R²} (1 + cos^n(x−y))/(x² + y² + 1)² dx dy
5. Fubini theorem
So, I ⊂ G. The same can be checked for finite unions of measurable rectangles, which form an algebra A, so that A ⊂ G.
We now show that G is a monotone class. Take an increasing sequence A_n ↑ A in G. Hence, their sections are increasing as well as f_{A_n} and g_{A_n}. Moreover, f_A = lim f_{A_n} and g_A = lim g_{A_n} are measurable. Finally, since ∫ f_{A_n} dP_1 = ∫ g_{A_n} dP_2 holds for every n, by the monotone convergence theorem ∫ f_A dP_1 = ∫ g_A dP_2. That means that A ∈ G. The same argument can be carried over to decreasing sequences A_n ↓ A. Therefore, G is a monotone class.

By Theorem 2.19 we know that σ(A) ⊂ G. Since F = σ(A) and G ⊂ F we obtain that G = F. Also, P(A) = ∫ f_A dP_1 for any A ∈ F by extending this property from measurable rectangles.
Remark 4.26. There exist examples of non-measurable sets (A ⊂ Ω but A ∉ F) with measurable sections and measurable functions P_2(A_{x_1}) and P_1(A_{x_2}) whose integrals differ.
Define

I_1 : Ω_1 → R,  I_1(x_1) = ∫ f_{x_1} dP_2

and

I_2 : Ω_2 → R,  I_2(x_2) = ∫ f_{x_2} dP_1.
6. Signed measures
7. Radon-Nikodym theorem
and

λ({a_n}) = ∫_{{a_n}} (dλ/dμ)(x) dμ(x) = (dλ/dμ)(a_n) μ({a_n}),

we obtain

(dλ/dμ)(a_n) = λ({a_n})/μ({a_n}),  n ∈ N.

This defines the Radon-Nikodym derivative at μ-almost every point.
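In the countable case the derivative is just a ratio of point masses, as the following sketch (with made-up measures μ and λ) illustrates.

\begin{verbatim}
# Two measures supported on the same points, with lambda << mu.
mu = {1: 0.2, 2: 0.5, 3: 0.3}
lam = {1: 0.1, 2: 0.6, 3: 0.3}          # lambda(A)=0 whenever mu(A)=0
dlam_dmu = {a: lam[a] / mu[a] for a in mu}
print(dlam_dmu)
# Check: lambda(A) = sum over a in A of (dlam/dmu)(a) * mu({a})
A = {2, 3}
print(sum(dlam_dmu[a] * mu[a] for a in A), "=", lam[2] + lam[3])
\end{verbatim}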
(2) Suppose that Ω = [0,1] ⊂ R and F = B([0,1]). Take the Dirac measure δ_0 at 0, the Lebesgue measure m on [0,1] and μ = (1/2)δ_0 + (1/2)m. If μ(A) = 0 then (1/2)δ_0(A) + (1/2)m(A) = 0, which implies that δ_0(A) = 0 and m(A) = 0.
Proof.
(1) If A ∈ F is such that μ(A) = 0, then λ(A) = 0 because λ ≪ μ. Furthermore, since ν ≪ λ we also have ν(A) = 0. This means that ν ≪ μ.
(2) We know that

λ(A) = ∫_A (dλ/dμ) dμ.

So,

ν(A) = ∫_A (dν/dλ) dλ = ∫_A (dν/dλ)(dλ/dμ) dμ,
Part 2. Probability
CHAPTER 5
Distributions
1. Definition
α = P ◦ X −1 .
2. Simple examples
Here are some examples of random variables whose distributions can be found explicitly.
Example 5.7. Consider X to be constant, i.e. X(x) = c for any x ∈ Ω and some c ∈ R. Then, given B ∈ B we obtain

X^{-1}(B) = ∅ if c ∉ B;  Ω if c ∈ B.

Hence,

α(B) = P(X^{-1}(B)) = P(∅) = 0 if c ∉ B;  P(Ω) = 1 if c ∈ B.

That is, α = δ_c is the Dirac distribution at c. Finally, E(g(X)) = ∫ g(x) dα(x) = g(c), so m_n = c^n and in particular

E(X) = c and Var(X) = 0.
Example 5.8. Given A ∈ F and constants c_1, c_2 ∈ R, let X = c_1 X_A + c_2 X_{A^c}. Then, for B ∈ B we get

X^{-1}(B) = A if c_1 ∈ B, c_2 ∉ B;  A^c if c_1 ∉ B, c_2 ∈ B;  Ω if c_1, c_2 ∈ B;  ∅ otherwise.
3. Distribution functions
Proof.
(1) For any x_1 ≤ x_2 we have ]−∞, x_1] ⊂ ]−∞, x_2]. Thus, F(x_1) ≤ F(x_2) and F is increasing. Now, given any sequence x_n → a^+,

lim_{n→+∞} F(x_n) = lim_{n→+∞} α(]−∞, x_n]) = α(∩_{n=1}^{+∞} ]−∞, x_n]) = α(]−∞, a]) = F(a).

That is, F is continuous from the right at any a ∈ R. Finally, using Theorem 2.33,

F(−∞) = lim_{n→+∞} F(−n) = lim_{n→+∞} α(]−∞, −n]) = α(∩_{n=1}^{+∞} ]−∞, −n]) = α(∅) = 0

and

F(+∞) = lim_{n→+∞} F(n) = lim_{n→+∞} α(]−∞, n]) = α(∪_{n=1}^{+∞} ]−∞, n]) = α(R) = 1.

(2) Consider the algebra A(R) that contains every finite union of intervals of the form ]a, b] (see section 1.2). Take a sequence of disjoint intervals ]a_n, b_n], −∞ ≤ a_n ≤ b_n ≤ +∞, whose union is in A(R) and define

α(∪_{n=1}^{+∞} ]a_n, b_n]) = ∑_{n=1}^{+∞} (F(b_n) − F(a_n)).
4. Classification of distributions
Proposition 5.14.
(1) α({a}) > 0 iff a ∈ D.
(2) D is countable.
Proof.
(1) Recall that α({a}) = F(a) − F(a^−). So, it is positive iff a is a discontinuity point of F.
(2) For each x ∈ D we can choose a rational number g(x) such that F(x^−) < g(x) < F(x). This defines a function g : D → Q. Now, for x_1, x_2 ∈ D satisfying x_1 < x_2, we have

g(x_1) < F(x_1) ≤ F(x_2^−) < g(x_2).

Hence g is injective and D is countable.
Thus,

{X ≤ x} = ∅ if x < 0;  ]−∞, x^{1/r}] if x ≥ 0.
5. Convergence in distribution
Proof.
(1)⇒(2) Assume that Fn → F on the set Dc of continuity points of
F . Let ε > 0 and a, b ∈ Dc such that a < b, F (a) ≤ ε and
F (b) ≥ 1 − ε. Then, there is n0 ∈ N satisfying
Fn (a) ≤ 2ε and Fn (b) ≥ 1 − 2ε
for all n ≥ n0 .
Let δ > 0 and f continuous such that |f(x)| ≤ M for some M > 0. Take the following partition

]a, b] = ∪_{j=1}^{N} I_j,  I_j = ]a_j, a_{j+1}],
Hence,
|f (x) − h(x)| ≤ δ, x ∈]a, b].
In addition,

∫ (f − h) dα_n = ∫_{]a,b]} (f − h) dα_n + ∫_{]a,b]^c} f dα_n
             ≤ δ α_n(]a,b]) + (max |f|)(F_n(a) + 1 − F_n(b))
             ≤ δ + 4Mε.

Similarly,

∫ (f − h) dα ≤ δ + 2Mε.
In addition,

α_n(I_j) − α(I_j) = F_n(a_{j+1}) − F(a_{j+1}) − (F_n(a_j) − F(a_j))

converges to zero as n → +∞, and the same for

∫ h dα_n − ∫ h dα = ∑_{j=1}^{N} f(a_j) (α_n(I_j) − α(I_j)).
Therefore, using

∫ f dα_n − ∫ f dα = ∫ (f − h) dα_n − ∫ (f − h) dα + ∫ h dα_n − ∫ h dα,

we obtain

lim sup_{n→+∞} |∫ f dα_n − ∫ f dα| ≤ 2δ + 6Mε.

Being ε and δ arbitrary, we get α_n →^w α.
(2)⇒(1) Let y be a continuity point of F. So, α({y}) = 0. Consider A = ]−∞, y[ and the sequence of functions

f_k(x) = 1 if x ≤ y − 1/2^k;  −2^k(x − y) if y − 1/2^k < x ≤ y;  0 if x > y,

where k ∈ N. Notice that f_k ↗ X_A. Thus, using the dominated convergence theorem,

F(y) = α(A) = ∫ X_A dα = ∫ lim_k f_k dα = lim_k ∫ f_k dα.
6. Characteristic functions
is a characteristic function.
(2) If φ is a characteristic function, then there is a unique distribution α such that

∫ e^{itx} dα(x) = φ(t),  t ∈ R.
Proof.
(1) Take n = 2, t_1 = 0 and t_2 = t. Hence,

φ(0)z_1 z̄_1 + φ(−t)z_1 z̄_2 + φ(t)z_2 z̄_1 + φ(0)z_2 z̄_2 ∈ R_0^+
Proof.
(1) φ(0) = ∫ dα = 1.
(2) For any s, t ∈ R we have

|φ(t) − φ(s)| = |∫ (e^{itx} − e^{isx}) dα(x)|
             ≤ ∫ |e^{isx}| |e^{i(t−s)x} − 1| dα(x)
             = ∫ |e^{i(t−s)x} − 1| dα(x).
Proposition 5.34. If φ : R → C is a characteristic function, then there is a unique distribution α such that

∫ e^{itx} dα(x) = φ(t).
CHAPTER 6

Independence
1. Independent events
(1) If A1 and A2 are independent, then Ac1 and A2 are also inde-
pendent.
(2) Any full probability event is independent of any other event.
The same for any zero probability event.
(3) Two disjoint events are independent iff at least one of them
has zero probability.
(4) Consider two events A1 ⊂ A2 . They are independent iff A1
has zero probability or A2 has full probability.
Example 6.2. Consider the Lebesgue measure m on Ω = [0,1] and the event I_1 = [0, 1/2]. Any other interval I_2 = [a, b] with 0 ≤ a < b ≤ 1 that is independent of I_1 has to satisfy the relation P(I_2 ∩ [0, 1/2]) = (b − a)/2. Notice that a ≤ 1/2 (otherwise I_1 ∩ I_2 = ∅) and b ≥ 1/2 (otherwise I_2 ⊂ I_1). So, b = 1 − a. That is, any interval [a, 1 − a] with 0 ≤ a ≤ 1/2 is independent of [0, 1/2].
Exercise 6.3. Suppose that A and C are independent events as
well as B and C with A ∩ B = ∅. Show that A ∪ B and C are also
independent.
These are independent events iff A_i and A'_j are independent for every i, j.
Proposition 6.8. Let X and Y be independent random variables. Then, there are sequences φ_n and φ'_n of simple functions such that
(1) φ_n ↗ X and φ'_n ↗ Y,
(2) φ_n and φ'_n are independent for every n ∈ N.
where

A_{n,j} = X^{-1}([−n + j/2^n, −n + (j+1)/2^n[),

and

φ'_n = ∑_{j=0}^{n2^{n+1}} (−n + j/2^n) X_{A'_{n,j}} + n X_{Y^{-1}([n,+∞[)} − n X_{Y^{-1}(]−∞,−n[)}

where

A'_{n,j} = Y^{-1}([−n + j/2^n, −n + (j+1)/2^n[).

It remains to check that φ_n and φ'_n are independent for any given n. This follows from the fact that X and Y are independent, since the pre-images of Borel sets under X and under Y are independent events.
Proposition 6.9. If X and Y are independent, then
E(XY ) = E(X) E(Y )
and
Var(X + Y ) = Var(X) + Var(Y ).
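A Monte Carlo check of the first identity (with our own illustrative choice of distributions for X and Y):

\begin{verbatim}
import random

# For independent X, Y: E(XY) = E(X)E(Y). Here X is uniform on [0,1]
# and Y an independent fair coin in {0,1}, so both sides are ~0.25.
random.seed(4)
xs = [random.random() for _ in range(200_000)]
ys = [random.randint(0, 1) for _ in range(200_000)]
mean = lambda v: sum(v) / len(v)
print(mean([x * y for x, y in zip(xs, ys)]), "~", mean(xs) * mean(ys))
\end{verbatim}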
3. Independent σ-algebras
CHAPTER 7

Conditional expectation
1. Conditional expectation
Proof.
(1) If X is G-measurable, then it is the Radon-Nikodym derivative
of λG with respect to P .
(2) This follows from (7.1) with A = Ω.
(3) Consider the set A = {E(X|G) < 0}, which is in G since E(X|G) is G-measurable. If P(A) > 0, then by (7.1),

0 ≤ ∫_A X dP = ∫_A E(X|G) dP < 0,

a contradiction. Hence P(A) = 0.
(7) Assume that h ≥ 0 (the general case follows from the decomposition h = h^+ − h^- with h^+, h^- ≥ 0). Take a sequence of G-measurable non-negative simple functions φ_n ↗ h of the form

φ_n = ∑_j c_j X_{A_j},
where each Aj ∈ G. We will show first that the claim holds
for simple functions and later use the monotone convergence
theorem to deduce it for h. For any A ∈ G we have that
A ∩ Aj ∈ G. Hence
∫_A E(φ_n X|G) dP = ∫_A φ_n X dP
                 = ∑_j c_j ∫_{A∩A_j} X dP
                 = ∑_j c_j ∫_{A∩A_j} E(X|G) dP
                 = ∫_A φ_n E(X|G) dP.

By the monotone convergence theorem applied twice,

∫_A E(hX|G) dP = lim ∫_A φ_n X dP
              = lim ∫_A E(φ_n X|G) dP
              = lim ∫_A φ_n E(X|G) dP
              = ∫_A h E(X|G) dP.
(8) Let A ∈ G_1. Then,

∫_A E(E(X|G_2)|G_1) dP = ∫_A E(X|G_2) dP = ∫_A X dP = ∫_A E(X|G_1) dP

since A is also in G_2.
(9) Do it as an exercise.
Remark 7.3. Whenever the σ-algebra is generated by the random
variables Y1 , . . . , Yn , we use the notation
E(X|Y1 , . . . , Yn ) = E(X|σ(Y1 , . . . , Yn ))
which reads as the conditional expectation of X given Y1 , . . . , Yn .
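For a discrete Y, E(X|Y) is obtained by averaging X over each event {Y = y}; the sketch below (with X = U + Y for U uniform, an illustrative choice of ours) estimates it from samples.

\begin{verbatim}
import random
from collections import defaultdict

# E(X|Y) for discrete Y: average X over each event {Y = y}. As a random
# variable, E(X|Y) is constant on those events.
random.seed(5)
data = [(random.random(), random.randint(0, 2)) for _ in range(90_000)]
groups = defaultdict(list)
for u, y in data:
    groups[y].append(u + y)            # X = U + Y, so E(X|Y=y) = 0.5 + y
cond_exp = {y: sum(v) / len(v) for y, v in groups.items()}
print(cond_exp)                        # ~ {0: 0.5, 1: 1.5, 2: 2.5}
\end{verbatim}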
2. Conditional probability
Part 3. Stochastic processes

CHAPTER 8
where A − x2 = {y − x2 ∈ R : y ∈ A}.
(2) The characteristic function of α1 ∗ α2 is
φα1 ∗α2 = φα1 φα2 ,
where φαi is the characteristic function of αi .
(3) α1 ∗ α2 = α2 ∗ α1 .
(1)

(α_1 ∗ α_2)(A) = ∫ X_A(y) d(α_1 ∗ α_2)(y)
             = ∫ X_A ∘ f(x_1, x_2) d(α_1 × α_2)(x_1, x_2)
             = ∫ (∫ X_{f_{x_2}^{-1}(A)}(x_1) dα_1(x_1)) dα_2(x_2)
             = ∫ α_1(f_{x_2}^{-1}(A)) dα_2(x_2)
             = ∫ α_1(A − x_2) dα_2(x_2).
(2)

φ_{α_1∗α_2}(t) = ∫ e^{ity} d(α_1 ∗ α_2)(y)
             = ∫ e^{itf(x_1,x_2)} d(α_1 × α_2)(x_1, x_2)
             = ∫ e^{itx_1} dα_1 ∫ e^{itx_2} dα_2.
(3) By the previous result, this follows from the fact that the characteristic functions of α_1 ∗ α_2 and α_2 ∗ α_1 are equal.
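Property (2) can also be checked numerically: for independent samples, the empirical characteristic function of the sum is close to the product of the empirical ones (the distributions below are our illustrative choices).

\begin{verbatim}
import cmath
import random

# Empirical characteristic function of X1 + X2 versus the product of
# the individual empirical characteristic functions.
random.seed(6)
x1 = [random.gauss(0, 1) for _ in range(100_000)]
x2 = [random.expovariate(1.0) for _ in range(100_000)]
phi = lambda xs, t: sum(cmath.exp(1j * t * x) for x in xs) / len(xs)
t = 0.7
print(phi([a + b for a, b in zip(x1, x2)], t))
print(phi(x1, t) * phi(x2, t))   # close to the line above
\end{verbatim}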
Proposition 9.3. Let X1 , . . . , Xn be independent random variables
with distributions α1 , . . . , αn , respectively. Then,
CHAPTER 10

Markov chains
S = {1, 2, . . . , N }
(1) P (Xn ∈ S) = 1,
(2) it satisfies the Markov property: for every i0 , . . . , in ∈ S,
This means that the next future state (at time n + 1) only depends on
the present one (at time n). The system does not have “memory” of
the past.
We will see that the distributions of each X1 , X2 , . . . are determined
by the knowledge of the above conditional probabilities (that control
the evolution of the system) and the initial distribution of X0 .
Denote by

π_{i,j}^n = P(X_n = j | X_{n−1} = i)

the transition probabilities. Notice that

∑_{j∈S} π_{i,j}^n = P(X_n ∈ S | X_{n−1} = i) = 1

because P(X_n ∈ S) = 1.
2. Distributions
Proof.
(1) It is enough to observe that

P(X_1 = i_1, ..., X_n = i_n | X_0 = i) = P(X_0 = i, X_1 = i_1, ..., X_n = i_n) / P(X_0 = i)

and use Proposition 10.7.
(2) Using the previous result and (7.2),

P(X_n = j | X_0 = i) = ∑_{i_1,...,i_{n−1}} P(X_1 = i_1, ..., X_{n−1} = i_{n−1}, X_n = j | X_0 = i)
                    = ∑_{i_1,...,i_{n−1}} π_{i,i_1}^1 π_{i_1,i_2}^2 ... π_{i_{n−2},i_{n−1}}^{n−1} π_{i_{n−1},j}^n.
[Figure: transition diagram of a Markov chain on the states 1, 2, 3, with the transition probabilities 1/4, 1/3, 1/2 and 1 indicated on the arrows.]
T =
[ 1−p   p    0   ... ]
[  0   1−p   p   ... ]
[  ⋮         ⋱   ⋱  ]
i.e. for i, j ∈ N

π_{i,j} = 1−p if i = j;  p if j = i+1;  0 otherwise.

Show that

P(X_n = j | X_0 = i) = C^n_{j−i} p^{j−i} (1−p)^{n−(j−i)},  0 ≤ j − i ≤ n.
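The formula can be checked against a truncated matrix power (the truncation size and parameter values below are our own choices):

\begin{verbatim}
from math import comb

# Compare the n-step transition probability from the matrix power with
# the binomial formula, for the stay/step-right chain above.
p, n, i, j = 0.3, 6, 0, 2
size = n + 2                                 # truncate the infinite matrix
T = [[(1 - p) if r == c else p if c == r + 1 else 0.0
      for c in range(size)] for r in range(size)]
Tn = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
for _ in range(n):                           # Tn = T^n by repeated product
    Tn = [[sum(Tn[r][k] * T[k][c] for k in range(size))
           for c in range(size)] for r in range(size)]
print(Tn[i][j], comb(n, j - i) * p**(j - i) * (1 - p)**(n - (j - i)))
\end{verbatim}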
4. Recurrence time
5. Classification of states
Proof.
(1)
(⇒) If i is recurrent, then there is m ≥ 1 such that

P(X_m = i | X_0 = i) > 0.

From (7.3) and the Markov property, for any q > m we have

P(X_q = j | X_0 = i) = ∑_{k∈S} P(X_q = j | X_m = k) P(X_m = k | X_0 = i).

Thus,

∑_{n=1}^{+∞} π_{i,i}^{(n)} = ∑_{n=1}^{+∞} P(X_n = i | X_0 = i) ≥ ∑_{s=1}^{+∞} P(X_m = i | X_0 = i)^2 = +∞.
(⇐) Suppose now that ∑_n π_{i,i}^{(n)} = +∞. Using (7.3), since

∑_{k=1}^{+∞} P(t_i = k) = 1,

we have

π_{j,i}^{(n)} = P(X_n = i | X_0 = j) = ∑_{k=1}^{+∞} P(X_n = i | t_i = k, X_0 = j) P(t_i = k | X_0 = j).

Since P(X_n = i | t_i = k, X_0 = j) = π_{i,i}^{(n−k)} for k ≤ n, and the terms with k > n vanish,

π_{j,i}^{(n)} = ∑_{k=1}^{n} π_{i,i}^{(n−k)} P(t_i = k | X_0 = j).   (10.2)
Finally,

1 ≥ ∑_{k=1}^{N} P(t_i = k | X_0 = i) ≥ (∑_{n=1}^{N} π_{i,i}^{(n)}) / (1 + ∑_{n=1}^{N} π_{i,i}^{(n)}) → 1
Example 10.20. Consider the Markov chain with two states and transition matrix

T = [ 0 1 ; 1 0 ].

Thus,

T^n = T if n is odd;  I if n is even.
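A quick check of the alternation (illustrative sketch):

\begin{verbatim}
# T^n alternates between T and the identity, so pi^(n)_{i,i} has no limit.
def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

T = [[0, 1], [1, 0]]
P = [[1, 0], [0, 1]]
for n in range(1, 5):
    P = matmul(P, T)
    print(n, P)
\end{verbatim}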
So, only recurrent states can have finite mean recurrence time. We will classify them accordingly. A recurrent state i is called null iff τ_i = +∞ (τ_i^{-1} = 0); we write i ∈ R_0. Otherwise it is called positive, i.e. τ_i^{-1} > 0, and i ∈ R_+. Hence,

R = R_0 ∪ R_+.
Proposition 10.22. Let i ∈ S. Then,
(1) τ_i = +∞ iff

lim_{n→+∞} π_{i,i}^{(n)} = 0.

(2) If τ_i = +∞, then for any j ∈ S

lim_{n→+∞} π_{j,i}^{(n)} = 0.
6. Decomposition of chains
Let i, j ∈ S. We write

i → j

whenever there is n ≥ 0 such that π_{i,j}^{(n)} > 0. That is, the probability of eventually moving from i to j is positive. Moreover, we use the notation

i ←→ j

if i → j and j → i simultaneously.

Exercise 10.28. Consider i ≠ j. Show that i → j is equivalent to

∑_{n=1}^{+∞} P(t_j = n | X_0 = i) > 0.
Proof. We will just prove (2). The remaining cases are similar
and left as an exercise.
Notice first that

π_{i,i}^{(m+n+r)} = ∑_k π_{i,k}^{(m+n)} π_{k,i}^{(r)}
               ≥ π_{i,j}^{(m+n)} π_{j,i}^{(r)}
               = ∑_k π_{i,k}^{(m)} π_{k,j}^{(n)} π_{j,i}^{(r)}
               ≥ π_{i,j}^{(m)} π_{j,j}^{(n)} π_{j,i}^{(r)}.

Since i ←→ j, there are m, r ≥ 0 such that π_{i,j}^{(m)} π_{j,i}^{(r)} > 0. So,

π_{j,j}^{(n)} ≤ π_{i,i}^{(m+n+r)} / (π_{i,j}^{(m)} π_{j,i}^{(r)}).

This implies that

∑_{n=1}^{+∞} π_{j,j}^{(n)} ≤ (1/(π_{i,j}^{(m)} π_{j,i}^{(r)})) ∑_{n=1}^{+∞} π_{i,i}^{(m+n+r)}
                        ≤ (1/(π_{i,j}^{(m)} π_{j,i}^{(r)})) ∑_{n=1}^{+∞} π_{i,i}^{(n)}.

Therefore, if i is transient, i.e.

∑_{n=1}^{+∞} π_{i,i}^{(n)} < +∞,

then j is also transient.
Proof. Suppose that [i] is not closed. Then, there is some j ∉ [i] such that π_{i,j}^{(1)} > 0. That is, i → j but j ̸→ i (otherwise j would be in [i]). So,

P(∩_{n≥1} {X_n ≠ i} | X_0 = i) ≥ P({X_1 = j} ∩ ∩_{n≥2} {X_n ≠ i} | X_0 = i) = P(X_1 = j | X_0 = i) = π_{i,j}^{(1)} > 0.

Taking the complementary set,

P(∪_{n≥1} {X_n = i} | X_0 = i) = 1 − P(∩_{n≥1} {X_n ≠ i} | X_0 = i) < 1.
Proof. Suppose that all states are transient. Then, for any i, j ∈ C we have π_{j,i}^{(n)} → 0 as n → +∞ by Proposition 10.18. Moreover, for any j ∈ C we have

∑_{i∈C} π_{j,i}^{(n)} = 1.

So, for any ε > 0 there is N ∈ N such that for any n ≥ N we have π_{j,i}^{(n)} < ε. Therefore,

1 = ∑_{i∈C} π_{j,i}^{(n)} < ε #C,

which implies that #C > 1/ε for every ε. That is, C is infinite.

Assume now that there is i ∈ R_0 ∩ C. So, by Proposition 10.22 we have for any j ∈ C that π_{j,i}^{(n)} → 0 as n → +∞. As before,

∑_{i∈C} π_{j,i}^{(n)} = 1

and the limit of the left hand side is zero unless C is infinite.
Finally, if C is irreducible all its states have the same recurrence
property. Since at least one is in R+ , then all are in R+ .
Remark 10.36. The previous result implies that if [i] is finite and
closed, then [i] ⊂ R+ . In particular, if S is finite and irreducible (notice
that it is always closed), then S = R+ .
Example 10.37. Consider the finite state space S = {1, 2, 3, 4, 5, 6} and the transition probabilities matrix

T =
[ 1/2  1/2   0    0    0    0  ]
[ 1/4  3/4   0    0    0    0  ]
[ 1/4  1/4  1/4  1/4   0    0  ]
[  0   1/4  1/4  1/4   0   1/4 ]
[  0    0    0    0   1/2  1/2 ]
[  0    0    0    0   1/2  1/2 ].
It is simple to check that 1 ←→ 2, 3 ←→ 4 and 5 ←→ 6. We have that [1] = {1,2} and [5] = {5,6} are irreducible closed sets, while [3] = {3,4} is not closed. So, the states in [1] and [5] are positive recurrent and those in [3] are transient; see the computational sketch below.
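The communication classes can be recovered computationally from the reachability relation i → j (an illustration of ours; states are indexed from 0):

\begin{verbatim}
# Communication classes of Example 10.37 via transitive closure of the
# one-step reachability relation.
T = [
    [1/2, 1/2, 0,   0,   0,   0],
    [1/4, 3/4, 0,   0,   0,   0],
    [1/4, 1/4, 1/4, 1/4, 0,   0],
    [0,   1/4, 1/4, 1/4, 0,   1/4],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   0,   1/2, 1/2],
]
n = len(T)
reach = [[T[i][j] > 0 or i == j for j in range(n)] for i in range(n)]
for k in range(n):                 # Floyd-Warshall style closure
    for i in range(n):
        for j in range(n):
            reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
classes = {frozenset(j for j in range(n) if reach[i][j] and reach[j][i])
           for i in range(n)}
print({tuple(sorted(c)) for c in classes})  # {(0,1), (2,3), (4,5)}
\end{verbatim}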
7. Stationary distributions
be the random variable that counts the number of visits to state j up to time t_i. That is, the chain visits the state j exactly N_j times before returning to i. Notice that N_i = 1.
The mean of N_j starting at X_0 = i is

ρ_j = E(N_j | X_0 = i).
Furthermore,

ρ_j = π_{i,j} + ∑_{n≥2} ∑_{k≠i} P(X_n = j, X_{n−1} = k, t_i ≥ n | X_0 = i)
   = π_{i,j} + ∑_{n≥2} ∑_{k≠i} π_{k,j} P(X_{n−1} = k, t_i ≥ n | X_0 = i)
   = π_{i,j} + ∑_{k≠i} π_{k,j} ∑_{n≥1} P(X_n = k, t_i ≥ n+1 | X_0 = i).

That is,

ρ = ρT

where ρ = (ρ_1, ρ_2, ...). We therefore take μ^{(i)}({j}) = ρ_j.
The sum of all the N_j's is equal to t_i. Indeed,

∑_{j∈S} N_j = ∑_{j∈S} ∑_{n≥1} X_{{X_n=j, t_i≥n}} = ∑_{n≥1} X_{{t_i≥n}} = t_i.
Exercise 10.47. Show that μ_i^{(i)} = 1.

Exercise 10.48. Show that

μ_j^{(i)} = π_{i,j} + ∑_{n≥1} ∑_{k_1,...,k_n≠i} π_{i,k_1} π_{k_1,k_2} ... π_{k_{n−1},k_n} π_{k_n,j}.
Proof. Given j ∈ S there is n such that π_{j,i}^{(n)} > 0 by the irreducibility of S. Using also the stationarity property of the measures (νT^n = ν and μ^{(i)}T^n = μ^{(i)}),

∑_{k∈S} ν_k π_{k,i}^{(n)} = ν_i  and  ∑_{k∈S} μ_k^{(i)} π_{k,i}^{(n)} = μ_i^{(i)} = 1.
Exercise 10.50. Complete the proof of Theorem 10.41.
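The fixed-point equation ν = νT can be solved numerically by power iteration; the two-state matrix below is an illustrative choice of ours.

\begin{verbatim}
# Approximate a stationary distribution of a finite irreducible chain
# by iterating nu -> nu T.
T = [[0.9, 0.1], [0.4, 0.6]]
nu = [0.5, 0.5]
for _ in range(200):
    nu = [sum(nu[k] * T[k][j] for k in range(2)) for j in range(2)]
print(nu)  # ~ [0.8, 0.2], which satisfies nu T = nu
\end{verbatim}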
8. Limit distributions
Notice that

τ_j = E(t_j | X_0 = j) = ∑_{n≥1} n f_n + ∞ · f_∞.

Suppose that

f_∞ = P(R_1 = ∞ | R_0 = m) = P(t_j = ∞ | X_0 = j) > 0,

so that j is transient for the chain X_n with mean recurrence time τ_j = ∞. Also,

lim_{n→+∞} π_{j,j}^{(n)} = 0,

which means that it is equal to 1/τ_j.
It remains to be proved that for f_∞ = 0 we have

lim_{n→+∞} π_{j,j}^{(n)} = 1/τ_j.
Take first the sup limit

a_0 = lim sup π_{j,j}^{(n)}.

By considering a subsequence k_n for which the limit is a_0, we use the diagonalization argument to obtain a further subsequence for which there exists

a_k = lim π_{j,j}^{(k_n−k)} ≤ a_0

for any k ∈ N. Here we make the convention that π_{j,j}^{(k)} = 0 for any k ≤ −1.
Recall that

π_{j,j}^{(n)} = ∑_{m=1}^{n} f_m π_{j,j}^{(n−m)}.

Taking the limit along the sequence k_n, by the dominated convergence theorem, we get

a_0 = ∑_{m=1}^{+∞} f_m a_m.

Since ∑_k f_k = 1 and a_k ≤ a_0, then a_k = a_0 for every k ∈ D = {n ∈ N : f_n > 0}. Similarly,

a_k = lim ∑_{m=1}^{k_n−k} f_m π_{j,j}^{(k_n−k−m)} = ∑_{m=1}^{+∞} f_m a_{k+m}.
So,

lim sup π_{j,j}^{(n)} = 1/τ_j.
The same idea can be used for the lim inf proving that the limit
exists. This completes the proof of Theorem 10.51.
CHAPTER 11
Martingales
where
τ ∧ n = min{τ, n}.
So, on average it does not take too long to get a win. However, what
matters to avoid ruin is the mean capital just before a win. Whilst
E(K_τ) = K_0 + b, we have

E(K_{τ−1}) = K_0 − E(b ∑_{i=1}^{τ−1} 2^{i−1})
          = K_0 − b E(2^{τ−1} − 1)
          = K_0 − b ∑_{n=1}^{+∞} P(τ = n)(2^{n−1} − 1)
          = K_0 − b ∑_{n=1}^{+∞} (1/2^n)(2^{n−1} − 1)
          = −∞.
That is, the mean value for the capital just before winning is −∞. Notice also that E(K_1) = K_0. In general, for any n, since K_{n+1} = K_n + 2^n b Y_{n+1} and Y_{n+1} is independent of K_n (K_n is a sum involving only Y_1, ..., Y_n and the sequence Y_n is independent) we have

E(K_{n+1} | K_n) = K_n + 2^n b E(Y_{n+1} | K_n) = K_n.
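The doubling strategy is easy to simulate; the sketch below (fair game, b = 1, K_0 = 0, our own choices) shows that K_τ = K_0 + b always, while the capital just before the win can be very negative.

\begin{verbatim}
import random

# Doubling strategy: bet b, double the stake after each loss; tau is
# the time of the first win.
random.seed(7)
K0, b = 0, 1
for trial in range(5):
    capital, stake, t = K0, b, 0
    while True:
        t += 1
        if random.random() < 0.5:      # win: receive the stake
            capital += stake
            break
        capital -= stake               # loss: pay and double the stake
        stake *= 2
    print(f"tau={t}, K_tau={capital}, K_(tau-1)={capital - stake}")
\end{verbatim}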
3. Examples
Now,

E(X_{n+1} − X_n | F_n) = X_n E(Y_{n+1} − 1).

Thus, X_n is a martingale iff E(Y_n) = 1 for every n ∈ N.
(3) Consider now the stochastic process

X_n = (∑_{i=1}^{n} Y_i)²
4. Stopping times
Proposition 11.12.

E(Z_{n+1} − Z_n | F_n) = E(X_{n+1} − X_n | F_n) X_{{τ≥n+1}}.

Exercise 11.13. Prove it.

Remark 11.14. From the above result we can conclude that:
(1) If X_n is a martingale, then Z_n is also a martingale.
(2) If X_n is a submartingale, then Z_n is also a submartingale.
(3) If X_n is a supermartingale, then Z_n is also a supermartingale.

So,

E(X_τ) = ∑_{n=1}^{+∞} E(Y_n X_{{τ≥n}}).
Example A.4.
(1) “If Portugal is bordered by the Atlantic Ocean, then Portugal
is bordered by the sea” (T)
(2) “x = 0 iff |x| = 0” (T)
Let A, B ⊂ Ω.
• A \ B = {x ∈ Ω : x ∈ A ∧ x ∉ B} is the difference between A and B (A minus B).
• A^c = {x ∈ Ω : x ∉ A} is the complementary set of A in Ω.
As in (A.1) we can write:
A \ B = {x : p(x)∧ ∼ q(x)} and Ac = {x : ∼ p(x)}.
Properties A.17.
(1) A \ B = A ∩ B c
(2) A ∩ Ac = ∅
(3) A ∪ Ac = Ω.
Example A.18.
(1) Let A_n = [n, n+1] ⊂ R, with n ∈ N (notice that I = N). Then

∩_{n∈N} A_n = ∅,  ∪_{n∈N} A_n = [1, +∞[.
(2)

(∪_{α∈I} A_α)^c = ∩_{α∈I} A_α^c
Notation:
• A is the domain of f .
• f (C) = {f (x) ∈ B : x ∈ C} is the image of C ⊂ A.
• f −1 (D) = {x ∈ A : f (x) ∈ D} is the pre-image of D ⊂ B.
Example A.21.
(1) Let A = {a, b, c, d}, B = N and f : A → B the function defined by the following table:

x    | a | b | c | d
f(x) | 3 | 5 | 7 | 9

Then, f({b,c}) = {5,7}, f^{-1}({1}) = ∅, f^{-1}({3,5}) = {a,b}, f^{-1}({n ∈ N : n/2 ∈ N}) = ∅.
The pre-image behaves nicely with the union, intersection and complement of sets. Let I be the set of indices of A_α ⊂ A and B_α ⊂ B with α ∈ I.

Proposition A.22.
(1) f(∪_{α∈I} A_α) = ∪_{α∈I} f(A_α)
(2) f(∩_{α∈I} A_α) ⊂ ∩_{α∈I} f(A_α)
(3) f^{-1}(∪_{α∈I} B_α) = ∪_{α∈I} f^{-1}(B_α)
(4) f^{-1}(∩_{α∈I} B_α) = ∩_{α∈I} f^{-1}(B_α)
(5) f^{-1}(B_α^c) = f^{-1}(B_α)^c
(6) f(f^{-1}(B_α)) ⊂ B_α
(7) f^{-1}(f(A_α)) ⊃ A_α
Exercise A.23. Prove it.
Proof.
4. Topological notions in R
In fact, we could have defined a distance only using the above properties, since they are the relevant ones. An example of another distance d on R satisfying the same properties is:

d(x, y) = |x − y| / (1 + |x − y|).

Notice that with this distance we have d(0,1) = d(1,2) = 1/2 and d(0,2) = 2/3. On the other hand, there are no points whose distance between each other is more than 1.
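A quick check of these values (illustrative sketch):

\begin{verbatim}
# The bounded distance d(x,y) = |x-y| / (1 + |x-y|) reproduces the
# values above and never exceeds 1.
d = lambda x, y: abs(x - y) / (1 + abs(x - y))
print(d(0, 1), d(1, 2), d(0, 2))            # 0.5 0.5 0.666...
print(max(d(0, 10**k) for k in range(8)))   # always below 1
\end{verbatim}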
We will restrict our study to the usual distance in (A.2). However, with some care we could have developed our study for a generic distance.
Proof.
(1) A open ⇔ front A ⊂ A^c ⇔ front A^c ⊂ A^c ⇔ A^c closed.
(2) Let a ∈ A ∩ B. Then, as A and B are open, there are ε_1, ε_2 > 0 such that

V_{ε_1}(a) ⊂ A and V_{ε_2}(a) ⊂ B.

Choosing ε = min{ε_1, ε_2}, we have that

V_ε(a) ⊂ V_{ε_1}(a) ∩ V_{ε_2}(a) ⊂ A ∩ B.

The same idea works for A ∪ B.
and

lim sup u_n = inf_{n≥1} sup_{k≥n} u_k = lim_{n→+∞} sup_{k≥n} u_k.
P(x) = ∑_{i=0}^{k} (f^{(i)}(x_0)/i!) (x − x_0)^i,

f(x) − P(x) = (f^{(k+1)}(ξ)/(k+1)!) (x − x_0)^{k+1}
6. Greek alphabet