Elements of Probability Theory - Lecture Notes
CHUNG-MING KUAN
Department of Finance
National Taiwan University
December 5, 2009
C.-M. Kuan (National Taiwan Univ.) Elements of Probability Theory December 5, 2009 1 / 58
Lecture Outline
7 Stochastic Processes
Brownian motion
Weak Convergence
Probability Space and σ-Algebra
Probability Measure
Borel Field
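The Borel field on $\mathbb{R}^d$, $\mathcal{B}^d$, is generated by all open hypercubes:
\[ (a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_d, b_d), \]
or by
\[ (-\infty, b_1] \times (-\infty, b_2] \times \cdots \times (-\infty, b_d]. \]
The sets that generate the Borel field $\mathcal{B}^d$ are all Borel sets.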
Random Variable
\[ z^{-1}(B) = \{\omega : z(\omega) \in B\} \in \mathcal{F}. \]
Borel Measurable
All the inverse images of a random vector $z$, $z^{-1}(B)$, form a σ-algebra,
denoted as σ(z).
It is known as the σ-algebra generated by z, or the information set
associated with z.
It is the smallest σ-algebra in F such that z is measurable.
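A function $g : \mathbb{R} \mapsto \mathbb{R}$ is $\mathcal{B}$-measurable or Borel measurable if, for every $b \in \mathbb{R}$,
\[ \{\zeta \in \mathbb{R} : g(\zeta) \le b\} \in \mathcal{B}. \]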
Distribution Function
with
Independence
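$y$ and $z$ are (pairwise) independent iff for any Borel sets $B_1$ and $B_2$,
\[ \operatorname{IP}(y \in B_1 \text{ and } z \in B_2) = \operatorname{IP}(y \in B_1)\,\operatorname{IP}(z \in B_2). \]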
Lemma 5.1
Let $\{z_i\}$ be a sequence of independent random variables and $h_i$,
$i = 1, 2, \ldots$, be Borel-measurable functions. Then $\{h_i(z_i)\}$ is also a
sequence of independent random variables.
Expectation
A function $g$ is convex on a set $S$ if for any $a \in [0, 1]$ and any $x$, $y$ in $S$,
\[ g\big(ax + (1-a)y\big) \le a\,g(x) + (1-a)\,g(y). \]
$L_p$-Norm
\[ \langle z_i, z_j \rangle = \operatorname{IE}(z_i z_j). \]
Inequalities
\[ \operatorname{IP}(|z| \ge c) \le \frac{\operatorname{IE}|z|^p}{c^p}, \]
where $c$ is a positive real number.
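As a quick numerical check of the inequality above, the following sketch evaluates both sides exactly for a small discrete distribution. The distribution (`values`, `probs`) and the helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical discrete distribution: values of z and their probabilities.
values = [-2, -1, 0, 1, 3]
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10),
         Fraction(2, 10), Fraction(1, 10)]

def tail_prob(c):
    """Exact IP(|z| >= c)."""
    return sum(p for v, p in zip(values, probs) if abs(v) >= c)

def markov_bound(c, p):
    """Exact IE|z|^p / c^p."""
    moment = sum(pr * abs(v) ** p for v, pr in zip(values, probs))
    return moment / Fraction(c) ** p

for c in (1, 2, 3):
    for p in (1, 2):
        # The tail probability never exceeds the moment bound.
        assert tail_prob(c) <= markov_bound(c, p)
        print(c, p, tail_prob(c), markov_bound(c, p))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.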
Lemma 5.4 (Hölder)
Let $y$ be a random variable with finite $p$th moment ($p > 1$) and $z$ a
random variable with finite $q$th moment ($q = p/(p-1)$). Then,
\[ \operatorname{IE}|yz| \le \|y\|_p \, \|z\|_q. \]
\[ |\operatorname{IE}(yz)| \le \|y\|_2 \, \|z\|_2. \]
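The two inequalities above can be checked numerically on a toy example. This is a minimal sketch: the four equally likely outcomes and the values of `y` and `z` are assumptions made for illustration, and `lp_norm` and `holder_gap` are hypothetical helper names.

```python
# Hypothetical random variables on four equally likely outcomes.
y = [1.0, -2.0, 0.5, 3.0]
z = [2.0, 1.0, -1.0, 0.5]

def expect(values):
    """Expectation under the uniform distribution on the outcomes."""
    return sum(values) / len(values)

def lp_norm(values, p):
    """(IE|x|^p)^(1/p)."""
    return expect([abs(v) ** p for v in values]) ** (1.0 / p)

def holder_gap(p):
    """||y||_p ||z||_q - IE|yz| with q = p / (p - 1); non-negative by Hölder."""
    q = p / (p - 1.0)
    lhs = expect([abs(a * b) for a, b in zip(y, z)])
    return lp_norm(y, p) * lp_norm(z, q) - lhs

for p in (1.5, 2.0, 3.0):
    print(p, holder_gap(p))
```

The case `p = 2.0` corresponds to the second (Cauchy-Schwarz type) inequality.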
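Let $y = 1$ and $x = |z|^p$. For $q > p$ and $r = q/p$, by Hölder’s inequality,
\[ \operatorname{IE}|z|^p = \operatorname{IE}|xy| \le \|x\|_r \, \|y\|_{r/(r-1)} = \big[\operatorname{IE}|z|^{pr}\big]^{1/r} = \big[\operatorname{IE}|z|^{q}\big]^{p/q}. \]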
Conditional Distributions
\[ f_{z|y}(\zeta \mid y = \eta) = \frac{f_{z,y}(\zeta, \eta)}{f_y(\eta)}. \]
Given the conditional density function $f_{z|y}$, for $A \in \mathcal{B}^d$,
\[ \operatorname{IP}(z \in A \mid y = \eta) = \int_A f_{z|y}(\zeta \mid y = \eta) \, d\zeta. \]
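Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. The conditional expectation $\operatorname{IE}(z \mid \mathcal{G})$
is the integrable and $\mathcal{G}$-measurable random variable satisfying
\[ \int_G \operatorname{IE}(z \mid \mathcal{G}) \, d\operatorname{IP} = \int_G z \, d\operatorname{IP}, \quad \forall G \in \mathcal{G}. \]
Suppose that $\mathcal{G}$ is the trivial σ-algebra $\{\Omega, \emptyset\}$; then $\operatorname{IE}(z \mid \mathcal{G})$ must be
a constant $c$, so that
\[ \operatorname{IE}(z) = \int_\Omega z \, d\operatorname{IP} = \int_\Omega c \, d\operatorname{IP} = c. \]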
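By definition,
\[ \operatorname{IE}[\operatorname{IE}(z \mid \mathcal{G})] = \int_\Omega \operatorname{IE}(z \mid \mathcal{G}) \, d\operatorname{IP} = \int_\Omega z \, d\operatorname{IP} = \operatorname{IE}(z). \]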
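The defining property of the conditional expectation and the law of iterated expectations can be illustrated on a finite sample space, where a sub-σ-algebra is generated by a partition and the conditional expectation is the cell-wise weighted average. The sample space, the random variable and the partition below are hypothetical choices; this is a sketch, not a general implementation.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely outcomes.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
z = {w: w * w for w in omega}            # random variable z(w) = w^2

# G is generated by this partition of Omega (an illustrative choice).
partition = [{1, 2}, {3, 4, 5}, {6}]

def cond_exp(z, partition):
    """IE(z | G): on each cell, the probability-weighted average of z."""
    ce = {}
    for cell in partition:
        mass = sum(prob[w] for w in cell)
        avg = sum(z[w] * prob[w] for w in cell) / mass
        for w in cell:
            ce[w] = avg
    return ce

def integral(f, event):
    """Integral of f over an event with respect to IP."""
    return sum(f[w] * prob[w] for w in event)

ce = cond_exp(z, partition)
# Defining property on the generating cells (hence on all of G by additivity).
for cell in partition:
    assert integral(ce, cell) == integral(z, cell)
# Law of iterated expectations: IE[IE(z | G)] = IE(z).
assert integral(ce, omega) == integral(z, omega)
print(integral(z, omega))
```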
Lemma 5.11
Let z be a square integrable random variable. Then
It follows that
The conditional variance-covariance matrix of z given y is
Then,
Almost Sure Convergence
Lemma 5.13
Let $g : \mathbb{R} \mapsto \mathbb{R}$ be a function continuous on $S_g \subseteq \mathbb{R}$. If $z_n \xrightarrow{a.s.} z$, where $z$
is a random variable such that $\operatorname{IP}(z \in S_g) = 1$, then $g(z_n) \xrightarrow{a.s.} g(z)$.
Convergence in Probability
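$\{z_n\}$ is said to converge to $z$ in probability if for every $\varepsilon > 0$,
\[ \lim_{n \to \infty} \operatorname{IP}\big(\omega : |z_n(\omega) - z(\omega)| > \varepsilon\big) = 0. \]
Note: In this definition, the events $\Omega_n(\varepsilon) = \{\omega : |z_n(\omega) - z(\omega)| \le \varepsilon\}$ may
vary with $n$, and convergence refers to the probabilities of such events,
$p_n = \operatorname{IP}(\Omega_n(\varepsilon))$, rather than to the random variables $z_n$.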
Almost sure convergence implies convergence in probability.
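To see this, let $\Omega_0$ denote the set of $\omega$ such that $z_n(\omega) \to z(\omega)$. For
$\omega \in \Omega_0$, there is some $m$ such that $\omega$ is in $\Omega_n(\varepsilon)$ for all $n > m$. That is,
\[ \Omega_0 \subseteq \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Omega_n(\varepsilon) \in \mathcal{F}. \]
As $\bigcap_{n=m}^{\infty} \Omega_n(\varepsilon)$ is non-decreasing in $m$, it follows that
\[ \operatorname{IP}(\Omega_0) \le \operatorname{IP}\Big( \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Omega_n(\varepsilon) \Big) = \lim_{m \to \infty} \operatorname{IP}\Big( \bigcap_{n=m}^{\infty} \Omega_n(\varepsilon) \Big) \le \lim_{m \to \infty} \operatorname{IP}\big( \Omega_m(\varepsilon) \big). \]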
Example 5.15
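Let $\Omega = [0, 1]$ and $\operatorname{IP}$ be the Lebesgue measure. Consider the
sequence of intervals $\{I_n\}$ in $[0, 1]$: $[0, 1/2)$, $[1/2, 1]$, $[0, 1/3)$,
$[1/3, 2/3)$, $[2/3, 1]$, $\ldots$, and let $z_n = \mathbf{1}_{I_n}$. When $n$ tends to infinity,
$I_n$ shrinks toward a singleton. For $0 < \varepsilon < 1$, we have
\[ \operatorname{IP}(|z_n| > \varepsilon) = \operatorname{IP}(I_n) \to 0. \]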
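The interval sequence of Example 5.15 can be enumerated directly. The sketch below, with hypothetical helper names and an arbitrarily chosen sample point `omega`, shows that the interval lengths shrink while the indicator at a fixed point keeps returning to 1 (once in every block of intervals), so the sequence of indicators does not settle at that point.

```python
from fractions import Fraction

def intervals(max_k):
    """Yield (left, right, closed_right) for the blocks k = 2, ..., max_k."""
    for k in range(2, max_k + 1):
        for j in range(k):
            # Only the last interval of each block is closed on the right.
            yield Fraction(j, k), Fraction(j + 1, k), j == k - 1

def indicator(omega, left, right, closed_right):
    """z_n(omega) = 1 if omega lies in I_n, else 0."""
    if closed_right:
        return int(left <= omega <= right)
    return int(left <= omega < right)

omega = Fraction(1, 3)           # an arbitrary sample point (assumption)
seq = list(intervals(50))
lengths = [right - left for left, right, _ in seq]
hits = [indicator(omega, *iv) for iv in seq]

print(lengths[-1])               # IP(I_n) for the last interval: 1/50
print(sum(hits))                 # one hit per block k = 2, ..., 50: 49
```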
Lemma 5.16
Let $\{z_n\}$ be a sequence of square integrable random variables. If
$\operatorname{IE}(z_n) \to c$ and $\operatorname{var}(z_n) \to 0$, then $z_n \xrightarrow{\operatorname{IP}} c$.
Lemma 5.17
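Let $g : \mathbb{R} \mapsto \mathbb{R}$ be a function continuous on $S_g \subseteq \mathbb{R}$. If $z_n \xrightarrow{\operatorname{IP}} z$, where $z$
is a random variable such that $\operatorname{IP}(z \in S_g) = 1$, then $g(z_n) \xrightarrow{\operatorname{IP}} g(z)$.
Proof: By the continuity of $g$, for each $\varepsilon > 0$, we can find a $\delta > 0$ s.t.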
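Lemma 5.13 and Lemma 5.17 are readily generalized to $\mathbb{R}^d$-valued random
variables. For instance, $z_n \xrightarrow{a.s.} z$ ($z_n \xrightarrow{\operatorname{IP}} z$) implies
\[ z_{1,n} + z_{2,n} \xrightarrow{a.s.} (\xrightarrow{\operatorname{IP}})\ z_1 + z_2, \]
\[ z_{1,n} z_{2,n} \xrightarrow{a.s.} (\xrightarrow{\operatorname{IP}})\ z_1 z_2, \]
\[ z_{1,n}^2 + z_{2,n}^2 \xrightarrow{a.s.} (\xrightarrow{\operatorname{IP}})\ z_1^2 + z_2^2, \]
where $z_{1,n}$, $z_{2,n}$ are two elements of $z_n$ and $z_1$, $z_2$ are the corresponding
elements of $z$. Also, provided that $z_2 \ne 0$ with probability one,
\[ z_{1,n}/z_{2,n} \xrightarrow{a.s.} (\xrightarrow{\operatorname{IP}})\ z_1/z_2. \]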
Convergence in Distribution
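$\{z_n\}$ is said to converge to $z$ in distribution, denoted as $z_n \xrightarrow{D} z$, if
\[ \lim_{n \to \infty} F_{z_n}(\zeta) = F_z(\zeta) \]
for every continuity point $\zeta$ of $F_z$.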
Lemma 5.19
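If $z_n \xrightarrow{\operatorname{IP}} z$, then $z_n \xrightarrow{D} z$. For a constant $c$, $z_n \xrightarrow{\operatorname{IP}} c$ iff $z_n \xrightarrow{D} c$.
For any $\varepsilon > 0$,
\[ \operatorname{IP}(z_n \le \zeta) = \operatorname{IP}\big(\{z_n \le \zeta\} \cap \{|z_n - z| \le \varepsilon\}\big) + \operatorname{IP}\big(\{z_n \le \zeta\} \cap \{|z_n - z| > \varepsilon\}\big) \le \operatorname{IP}(z \le \zeta + \varepsilon) + \operatorname{IP}(|z_n - z| > \varepsilon). \]
Similarly, $\operatorname{IP}(z \le \zeta - \varepsilon) \le \operatorname{IP}(z_n \le \zeta) + \operatorname{IP}(|z_n - z| > \varepsilon)$. If $z_n \xrightarrow{\operatorname{IP}} z$, then
by passing to the limit and noting that $\varepsilon$ is arbitrary,
\[ \lim_{n \to \infty} \operatorname{IP}(z_n \le \zeta) = \operatorname{IP}(z \le \zeta) \]
at every continuity point $\zeta$ of $F_z$.
That is, $F_{z_n}(\zeta) \to F_z(\zeta)$. The converse is not true in general, however.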
Theorem 5.20 (Continuous Mapping Theorem)
Let $g : \mathbb{R} \mapsto \mathbb{R}$ be a function continuous almost everywhere on $\mathbb{R}$, except
for at most countably many points. If $z_n \xrightarrow{D} z$, then $g(z_n) \xrightarrow{D} g(z)$.
For example, $z_n \xrightarrow{D} N(0, 1)$ implies $z_n^2 \xrightarrow{D} \chi^2(1)$.
Theorem 5.21
Let $\{y_n\}$ and $\{z_n\}$ be two sequences of random vectors such that
$y_n - z_n \xrightarrow{\operatorname{IP}} 0$. If $z_n \xrightarrow{D} z$, then $y_n \xrightarrow{D} z$.
Theorem 5.22
If $y_n$ converges in probability to a constant $c$ and $z_n$ converges in
distribution to $z$, then $y_n + z_n \xrightarrow{D} c + z$, $y_n z_n \xrightarrow{D} cz$, and $z_n / y_n \xrightarrow{D} z/c$
if $c \ne 0$.
Non-Stochastic Order Notations
Theorem 5.23
(a) If $a_n = O(n^r)$ and $b_n = O(n^s)$, then $a_n b_n = O(n^{r+s})$, $a_n + b_n = O(n^{\max(r,s)})$.
(b) If $a_n = o(n^r)$ and $b_n = o(n^s)$, then $a_n b_n = o(n^{r+s})$, $a_n + b_n = o(n^{\max(r,s)})$.
(c) If $a_n = O(n^r)$ and $b_n = o(n^s)$, then $a_n b_n = o(n^{r+s})$, $a_n + b_n = O(n^{\max(r,s)})$.
Stochastic Order Notations
The order notations defined earlier easily extend to describe the behavior
of sequences of random variables.
$\{z_n\}$ is $O_{a.s.}(c_n)$ (or $O(c_n)$ almost surely) if $z_n / c_n$ is $O(1)$ a.s.
$\{z_n\}$ is $O_{\operatorname{IP}}(c_n)$ (or $O(c_n)$ in probability) if for every $\varepsilon > 0$, there is
some $\Delta$ such that $\operatorname{IP}(|z_n| / c_n \ge \Delta) \le \varepsilon$ for all $n$ sufficiently large.
Theorem 5.23 holds for stochastic order notations. For example, if
$y_n = O_{\operatorname{IP}}(1)$ and $z_n = o_{\operatorname{IP}}(1)$, then $y_n z_n$ is $o_{\operatorname{IP}}(1)$.
It is very restrictive to require a random variable to be bounded
almost surely, but a well-defined random variable is typically bounded
in probability, i.e., $O_{\operatorname{IP}}(1)$.
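Let $\{z_n\}$ be a sequence of random variables such that $z_n \xrightarrow{D} z$, and let $\zeta$ be a
continuity point of $F_z$. Then for any $\varepsilon > 0$, we can choose a sufficiently
large $\zeta$ such that $\operatorname{IP}(|z| > \zeta) < \varepsilon/2$. As $z_n \xrightarrow{D} z$, we can also choose $n$
large enough such that
\[ \operatorname{IP}(|z_n| > \zeta) - \operatorname{IP}(|z| > \zeta) < \varepsilon/2, \]
which implies $\operatorname{IP}(|z_n| > \zeta) < \varepsilon$.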
Lemma 5.24
Let $\{z_n\}$ be a sequence of random variables such that $z_n \xrightarrow{D} z$. Then
$z_n = O_{\operatorname{IP}}(1)$.
Law of Large Numbers
Note that i.i.d. random variables need not obey Kolmogorov’s SLLN if
they do not have a finite mean, e.g., i.i.d. Cauchy random variables.
Theorem 5.26 (Markov’s SLLN)
Let $\{z_t\}$ be a sequence of independent random variables such that for
some $\delta > 0$, $\operatorname{IE}|z_t|^{1+\delta}$ is bounded for all $t$. Then,
\[ \frac{1}{T} \sum_{t=1}^{T} [z_t - \operatorname{IE}(z_t)] \xrightarrow{a.s.} 0. \]
Note that here zt need not have a common mean, and the average of
their means need not converge.
Compared with Kolmogorov’s SLLN, Markov’s SLLN requires a
stronger moment condition but not identical distribution.
An LLN usually holds when the moments of the random variables, and the
dependence across them, are suitably restricted.
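The content of Markov's SLLN can be illustrated by simulation. In the sketch below the design is entirely hypothetical: the $z_t$ are independent Gaussian draws whose means flip sign on blocks of doubling length (so the average of the means keeps oscillating) and whose standard deviations cycle through 1, 2, 3 (so all moments are bounded). The centered average should nevertheless be close to zero for large $T$.

```python
import random

def averages(T, rng):
    """Return (average of means, centered sample average) for independent z_t."""
    sum_means, sum_centered = 0.0, 0.0
    for t in range(1, T + 1):
        # Means switch sign on blocks of doubling length, so their running
        # average does not converge (an illustrative, hypothetical design).
        mean_t = 1.0 if t.bit_length() % 2 else -1.0
        scale_t = 1.0 + (t % 3)            # bounded standard deviations 1, 2, 3
        z_t = rng.gauss(mean_t, scale_t)
        sum_means += mean_t
        sum_centered += z_t - mean_t
    return sum_means / T, sum_centered / T

rng = random.Random(3)                     # fixed seed, for reproducibility only
for T in (1000, 4000, 16000):
    print(T, averages(T, rng))
```

A single simulated path is of course not a proof; it only shows the centered average shrinking while the average of the means does not settle.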
Examples
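\[ \operatorname{var}\Big( \sum_{t=1}^{T} y_t \Big) = \sum_{t=1}^{T} \operatorname{var}(y_t) + 2 \sum_{\tau=1}^{T-1} (T - \tau) \operatorname{cov}(y_t, y_{t-\tau}) \le \sum_{t=1}^{T} \operatorname{var}(y_t) + 2T \sum_{\tau=1}^{T-1} |\operatorname{cov}(y_t, y_{t-\tau})| = O(T), \]
so that $\operatorname{var}\big( T^{-1} \sum_{t=1}^{T} y_t \big) = O(T^{-1})$. As $\operatorname{IE}\big( T^{-1} \sum_{t=1}^{T} y_t \big) = 0$,
\[ \frac{1}{T} \sum_{t=1}^{T} y_t \xrightarrow{\operatorname{IP}} 0. \]
In Example 5.27, $y_t = \sum_{i=0}^{\infty} \alpha_o^i u_{t-i}$ with $|\alpha_o| < 1$, so that
$\sum_{i=0}^{\infty} |\alpha_o|^i < \infty$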
Example 5.29: For the sequences $\{t\}$ and $\{t^2\}$,
\[ \sum_{t=1}^{T} t = T(T+1)/2, \qquad \sum_{t=1}^{T} t^2 = T(T+1)(2T+1)/6. \]
Hence, $T^{-1} \sum_{t=1}^{T} t$ and $T^{-1} \sum_{t=1}^{T} t^2$ both diverge.
Example 5.30: $u_t$ are i.i.d. with mean zero and variance $\sigma_u^2$. Consider
now $\{t u_t\}$, which does not have bounded $(1+\delta)$th moment and does not
obey Markov’s SLLN. Moreover,
\[ \operatorname{var}\Big( \sum_{t=1}^{T} t u_t \Big) = \sum_{t=1}^{T} t^2 \operatorname{var}(u_t) = \sigma_u^2 \, \frac{T(T+1)(2T+1)}{6}, \]
so that $\sum_{t=1}^{T} t u_t = O_{\operatorname{IP}}(T^{3/2})$. It follows that $T^{-1} \sum_{t=1}^{T} t u_t = O_{\operatorname{IP}}(T^{1/2})$.
That is, $\{t u_t\}$ does not obey a WLLN.
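The variance formula in Example 5.30 can be verified by direct summation. This is a small sketch with a hypothetical helper `var_weighted_sum`; it also prints the variance of the sample average $T^{-1}\sum t u_t$, which grows roughly like $\sigma_u^2 T/3$ instead of vanishing.

```python
from fractions import Fraction

def var_weighted_sum(T, sigma2=Fraction(1)):
    """var(sum_{t=1}^T t u_t) for i.i.d. u_t with variance sigma2."""
    return sigma2 * sum(t * t for t in range(1, T + 1))

for T in (10, 100, 1000):
    v = var_weighted_sum(T)
    # Closed form stated in the text.
    assert v == Fraction(T * (T + 1) * (2 * T + 1), 6)
    # Variance of the sample average, and the same quantity divided by T.
    print(T, float(v / T**2), float(v / T**3))
```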
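Example 5.31: $y_t$ is a random walk: $y_t = y_{t-1} + u_t$. For $s < t$,
\[ y_t = y_s + \sum_{i=s+1}^{t} u_i = y_s + v_{t-s}, \]
\[ \operatorname{var}\Big( \sum_{t=1}^{T} y_t \Big) = \sum_{t=1}^{T} \operatorname{var}(y_t) + 2 \sum_{\tau=1}^{T-1} \sum_{t=\tau+1}^{T} \operatorname{cov}(y_t, y_{t-\tau}) = O(T^3), \]
for $\sum_{t=1}^{T} \operatorname{var}(y_t) = \sum_{t=1}^{T} t \sigma_u^2 = O(T^2)$ and
\[ 2 \sum_{\tau=1}^{T-1} \sum_{t=\tau+1}^{T} \operatorname{cov}(y_t, y_{t-\tau}) = 2 \sum_{\tau=1}^{T-1} \sum_{t=\tau+1}^{T} (t - \tau) \sigma_u^2 = O(T^3). \]
Then, $\sum_{t=1}^{T} y_t = O_{\operatorname{IP}}(T^{3/2})$ and $T^{-1} \sum_{t=1}^{T} y_t$ diverges in probability.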
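The orders of magnitude in Example 5.31 can be checked by evaluating the two sums exactly. The sketch assumes $y_0 = 0$, so that $\operatorname{var}(y_t) = t\sigma_u^2$ as in the text; the function name is hypothetical.

```python
def var_sum_random_walk(T, sigma2=1):
    """Return (sum of variances, twice the sum of covariances) of y_1, ..., y_T."""
    variances = sum(t * sigma2 for t in range(1, T + 1))
    covariances = 2 * sum(
        (t - tau) * sigma2
        for tau in range(1, T)
        for t in range(tau + 1, T + 1)
    )
    return variances, covariances

for T in (10, 50, 200):
    v, c = var_sum_random_walk(T)
    # v / T^2 and c / T^3 stay bounded, so the total variance is O(T^3).
    print(T, v / T**2, c / T**3, (v + c) / T**3)
```

Under this assumption the two parts add up to $\sigma_u^2 T(T+1)(2T+1)/6$, the same closed form as in Example 5.30.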
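Example 5.32: $y_t$ is the random walk in Example 5.31. Then,
$\operatorname{IE}(y_{t-1} u_t) = 0$, $\operatorname{var}(y_{t-1} u_t) = \operatorname{IE}(y_{t-1}^2) \operatorname{IE}(u_t^2) = (t-1)\sigma_u^4$, and for $s < t$,
\[ \operatorname{cov}(y_{t-1} u_t, y_{s-1} u_s) = \operatorname{IE}(y_{t-1} y_{s-1} u_s) \operatorname{IE}(u_t) = 0. \]
This yields
\[ \operatorname{var}\Big( \sum_{t=1}^{T} y_{t-1} u_t \Big) = \sum_{t=1}^{T} \operatorname{var}(y_{t-1} u_t) = \sum_{t=1}^{T} (t-1) \sigma_u^4 = O(T^2), \]
and $\sum_{t=1}^{T} y_{t-1} u_t = O_{\operatorname{IP}}(T)$. As $\operatorname{var}\big( T^{-1} \sum_{t=1}^{T} y_{t-1} u_t \big)$ converges to $\sigma_u^4 / 2$,
rather than 0, $\{y_{t-1} u_t\}$ does not obey a WLLN, even though its partial
sums are $O_{\operatorname{IP}}(T)$.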
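A small Monte Carlo experiment can illustrate the conclusion of Example 5.32. The sketch assumes Gaussian innovations with $\sigma_u = 1$ and $y_0 = 0$ (the text only requires i.i.d. innovations), and compares the simulated variance of $T^{-1}\sum y_{t-1}u_t$ with the exact finite-sample value $(T-1)/(2T)$, which tends to $1/2$.

```python
import random
import statistics

def scaled_sum(T, rng, sigma=1.0):
    """One draw of T^{-1} sum_{t=1}^T y_{t-1} u_t with y_0 = 0 (assumed)."""
    y_prev, total = 0.0, 0.0
    for _ in range(T):
        u = rng.gauss(0.0, sigma)
        total += y_prev * u          # uses y_{t-1}, before it is updated
        y_prev += u
    return total / T

rng = random.Random(0)               # fixed seed, for reproducibility only
T, reps = 100, 4000
draws = [scaled_sum(T, rng) for _ in range(reps)]
exact_var = (T - 1) / (2 * T)        # sigma^4 * sum_{t}(t - 1) / T^2 with sigma = 1
print(statistics.mean(draws), statistics.variance(draws), exact_var)
```

The simulated variance stays near $1/2$ instead of shrinking with $T$, in line with the failure of the WLLN.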
Central Limit Theorem (CLT)
i.i.d. random variables need not obey this CLT if they do not have a
finite variance, e.g., $t(2)$ random variables.
Note that $\bar{z}_T$ converges to $\mu_o$ in probability, and its variance $\sigma_o^2 / T$
vanishes when $T$ tends to infinity. A normalizing factor $T^{1/2}$ suffices
to prevent a degenerate distribution in the limit.
When $\{z_t\}$ obeys a CLT, $\bar{z}_T$ is said to converge to $\mu_o$ at the rate
$T^{-1/2}$, and $\bar{z}_T$ is understood as a root-$T$ consistent estimator.
Lemma 5.36 (Liapunov’s CLT)
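Let $\{z_{Tt}\}$ be a triangular array of independent random variables with mean
$\mu_{Tt}$ and variance $\sigma_{Tt}^2 > 0$ such that $\bar{\sigma}_T^2 = \frac{1}{T} \sum_{t=1}^{T} \sigma_{Tt}^2 \to \sigma_o^2 > 0$. If for
some $\delta > 0$, $\operatorname{IE}|z_{Tt}|^{2+\delta}$ are bounded, then $\sqrt{T} (\bar{z}_T - \bar{\mu}_T) / \sigma_o \xrightarrow{D} N(0, 1)$.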
Examples
It follows that
\[ \frac{\sqrt{3}}{T^{1/2} \sigma_u} \sum_{t=1}^{T} \frac{t}{T} u_t \xrightarrow{D} N(0, 1). \]
These results show that $\{(t/T) u_t\}$ obeys a CLT, whereas $\{t u_t\}$ does not.
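The normalization by $\sqrt{3}$ can be motivated by computing the exact variance of the normalized sum, which equals $3\sum_{t=1}^{T}(t/T)^2/T = (T+1)(2T+1)/(2T^2)$ and tends to 1. The helper name below is hypothetical, and the calculation only uses that $u_t$ are i.i.d. with variance $\sigma_u^2$.

```python
from fractions import Fraction

def normalized_variance(T):
    """Exact variance of sqrt(3) T^{-1/2} sigma_u^{-1} sum_{t=1}^T (t/T) u_t."""
    return 3 * sum(Fraction(t, T) ** 2 for t in range(1, T + 1)) / T

for T in (10, 100, 1000):
    v = normalized_variance(T)
    assert v == Fraction((T + 1) * (2 * T + 1), 2 * T * T)
    print(T, float(v))
```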
Example 5.38: $y_t$ is a random walk: $y_t = y_{t-1} + u_t$, where $u_t$ are i.i.d.
with mean zero and variance $\sigma_u^2$. We know that $\{y_t\}$ does not obey an LLN and
hence does not obey a CLT.
CLT for Triangular Array
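$\{z_{Tt}\}$ is a triangular array of random variables and obeys a CLT if
\[ \frac{1}{\sigma_o \sqrt{T}} \sum_{t=1}^{T} [z_{Tt} - \operatorname{IE}(z_{Tt})] = \frac{\sqrt{T} (\bar{z}_T - \bar{\mu}_T)}{\sigma_o} \xrightarrow{D} N(0, 1), \]
where $\bar{z}_T = T^{-1} \sum_{t=1}^{T} z_{Tt}$, $\bar{\mu}_T = \operatorname{IE}(\bar{z}_T)$, and
\[ \sigma_T^2 = \operatorname{var}\Big( T^{-1/2} \sum_{t=1}^{T} z_{Tt} \Big) \to \sigma_o^2 > 0. \]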
t=1
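Consider an array of square integrable random vectors $z_{Tt}$ in $\mathbb{R}^d$. Let
$\bar{z}_T$ denote the average of $z_{Tt}$, $\bar{\mu}_T = \operatorname{IE}(\bar{z}_T)$, and
\[ \Sigma_T = \operatorname{var}\Big( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} z_{Tt} \Big) \to \Sigma_o, \]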
Stochastic Processes
Brownian motion
The process {w (t), t ∈ [0, ∞)} is the standard Wiener process (standard
Brownian motion) if it has continuous sample paths almost surely and
satisfies:
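1 $\operatorname{IP}\big( w(0) = 0 \big) = 1$.
2 For $0 \le t_0 \le t_1 \le \cdots \le t_k$,
\[ \operatorname{IP}\big( w(t_i) - w(t_{i-1}) \in B_i,\ i \le k \big) = \prod_{i \le k} \operatorname{IP}\big( w(t_i) - w(t_{i-1}) \in B_i \big), \]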
$w(t) \sim N(0, t)$ such that for $r \le t$,
\[ \operatorname{cov}\big( w(r), w(t) \big) = \operatorname{IE}[w(r) w(t)] = r. \]
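The $d$-dimensional, standard Wiener process $\mathbf{w}$ consists of $d$ mutually
independent, standard Wiener processes, so that for $s < t$,
\[ \mathbf{w}(t) - \mathbf{w}(s) \sim N\big( \mathbf{0}, (t - s) \mathbf{I}_d \big). \]
Lemma 5.39
Let $\mathbf{w}$ be the $d$-dimensional, standard Wiener process.
1 $\mathbf{w}(t) \sim N(\mathbf{0}, t \, \mathbf{I}_d)$.
2 $\operatorname{cov}(\mathbf{w}(r), \mathbf{w}(t)) = \min(r, t) \, \mathbf{I}_d$.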
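The covariance structure in Lemma 5.39 can be illustrated for a single coordinate by sampling $(w(r), w(t))$ from independent Gaussian increments. This is a minimal sketch: the time points, the sample size and the helper names are arbitrary choices, not part of the notes.

```python
import math
import random

def sample_pair(r, t, rng):
    """Draw (w(r), w(t)) for r <= t using independent Gaussian increments."""
    w_r = math.sqrt(r) * rng.gauss(0.0, 1.0)               # w(r) ~ N(0, r)
    w_t = w_r + math.sqrt(t - r) * rng.gauss(0.0, 1.0)     # add N(0, t - r)
    return w_r, w_t

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

rng = random.Random(1)               # fixed seed, for reproducibility only
r, t, reps = 0.3, 0.8, 20000
w_r, w_t = zip(*(sample_pair(r, t, rng) for _ in range(reps)))
# Sample covariance versus min(r, t), and sample variance of w(t) versus t.
print(sample_cov(w_r, w_t), min(r, t))
print(sample_cov(w_t, w_t), t)
```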
Weak Convergence
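$\operatorname{IP}_n$ converges weakly to $\operatorname{IP}$, denoted as $\operatorname{IP}_n \Rightarrow \operatorname{IP}$, if for every bounded,
continuous real function $f$ on $S$,
\[ \int f(s) \, d\operatorname{IP}_n(s) \to \int f(s) \, d\operatorname{IP}(s), \]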
Continuous Mapping Theorem
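Proof: Let $S$ and $S'$ be two metric spaces with Borel σ-algebras $\mathcal{S}$ and $\mathcal{S}'$ and
$g : S \mapsto S'$ be a measurable mapping. For $\operatorname{IP}$ on $(S, \mathcal{S})$, define $\operatorname{IP}^*$ on $(S', \mathcal{S}')$ as
\[ \operatorname{IP}^*(A') = \operatorname{IP}\big( g^{-1}(A') \big), \quad A' \in \mathcal{S}'. \]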
Functional Central Limit Theorem (FCLT)
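\[ z_n(t) = z_n\big( (i-1)/n \big) = \frac{1}{\sigma \sqrt{n}} \, s_{[nt]}, \]
where $[nt]$ is the largest integer less than or equal to $nt$.
From Lindeberg-Lévy’s CLT,
\[ \frac{1}{\sigma \sqrt{n}} \, s_{[nt]} = \Big( \frac{[nt]}{n} \Big)^{1/2} \frac{1}{\sigma \sqrt{[nt]}} \, s_{[nt]} \xrightarrow{D} \sqrt{t} \, N(0, 1), \]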
For $r < t$, we have
\[ \big( z_n(r), z_n(t) - z_n(r) \big) \xrightarrow{D} \big( w(r), w(t) - w(r) \big), \]
and hence $\big( z_n(r), z_n(t) \big) \xrightarrow{D} \big( w(r), w(t) \big)$. This is easily extended to
establish convergence of any finite-dimensional distributions and leads
to the functional central limit theorem.
Then, zT ⇒ w as T → ∞.
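The step-function process $z_n(t) = s_{[nt]} / (\sigma \sqrt{n})$ used in the FCLT can be built directly from a sequence of increments. The sketch below uses a hypothetical helper `partial_sum_process` and a short, hand-picked sequence of $\pm 1$ steps (so $\sigma = 1$) to show how the path is constant on each interval $[(i-1)/n, i/n)$.

```python
import math

def partial_sum_process(steps, sigma=1.0):
    """Return the function t -> s_[nt] / (sigma sqrt(n)) for t in [0, 1]."""
    n = len(steps)
    cumulative = [0.0]                       # s_0 = 0
    for x in steps:
        cumulative.append(cumulative[-1] + x)

    def z(t):
        # [nt] is the largest integer less than or equal to n * t.
        return cumulative[math.floor(n * t)] / (sigma * math.sqrt(n))

    return z

z = partial_sum_process([1, -1, 1, 1])
print([z(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)])   # [0.0, 0.5, 0.0, 0.5, 1.0]
```

With many i.i.d. steps in place of the four used here, the resulting paths are the objects whose weak limit is the Wiener process.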
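Let $\zeta_t$ be random variables with mean $\mu_t$ and variance $\sigma_t^2 > 0$. Define the long-run
variance of $\zeta_t$ as
\[ \sigma_*^2 = \lim_{T \to \infty} \operatorname{var}\Big( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \zeta_t \Big), \]
$\{\zeta_t\}$ is said to obey an FCLT if $z_T \Rightarrow w$ as $T \to \infty$, where
\[ z_T(r) = \frac{1}{\sigma_* \sqrt{T}} \sum_{t=1}^{[Tr]} (\zeta_t - \mu_t), \quad r \in [0, 1]. \]
In the multivariate context, FCLT is $\mathbf{z}_T \Rightarrow \mathbf{w}$ as $T \to \infty$, where
\[ \mathbf{z}_T(r) = \frac{1}{\sqrt{T}} \, \Sigma_*^{-1/2} \sum_{t=1}^{[Tr]} (\boldsymbol{\zeta}_t - \boldsymbol{\mu}_t), \quad r \in [0, 1], \]
$\mathbf{w}$ is the $d$-dimensional, standard Wiener process, and
\[ \Sigma_* = \lim_{T \to \infty} \frac{1}{T} \operatorname{IE}\bigg[ \Big( \sum_{t=1}^{T} (\boldsymbol{\zeta}_t - \boldsymbol{\mu}_t) \Big) \Big( \sum_{t=1}^{T} (\boldsymbol{\zeta}_t - \boldsymbol{\mu}_t) \Big)' \bigg], \]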
Example 5.43
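\[ \frac{1}{T^{3/2}} \sum_{t=1}^{T} y_t = \sigma_u \sum_{t=1}^{T} \int_{(t-1)/T}^{t/T} \frac{1}{\sqrt{T} \sigma_u} \, y_{[Tr]} \, dr \Rightarrow \sigma_u \int_0^1 w(r) \, dr, \]
This result also verifies that $\sum_{t=1}^{T} y_t$ is $O_{\operatorname{IP}}(T^{3/2})$. Similarly,
\[ \frac{1}{T^2} \sum_{t=1}^{T} y_t^2 = \frac{1}{T} \sum_{t=1}^{T} \Big( \frac{y_t}{\sqrt{T}} \Big)^2 \Rightarrow \sigma_u^2 \int_0^1 w(r)^2 \, dr, \]
so that $\sum_{t=1}^{T} y_t^2$ is $O_{\operatorname{IP}}(T^2)$.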
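The two limits in Example 5.43 can be examined by simulation. The sketch assumes Gaussian innovations with $\sigma_u = 1$ and $y_0 = 0$, which the text does not require. Under these assumptions the limits have $\operatorname{var}\big(\int_0^1 w(r)\,dr\big) = 1/3$ and $\operatorname{IE}\big(\int_0^1 w(r)^2\,dr\big) = 1/2$, so the simulated moments should be close to these values.

```python
import random
import statistics

def scaled_sums(T, rng, sigma=1.0):
    """One draw of (T^{-3/2} sum y_t, T^{-2} sum y_t^2) for a random walk."""
    y, s1, s2 = 0.0, 0.0, 0.0
    for _ in range(T):
        y += rng.gauss(0.0, sigma)   # y_t = y_{t-1} + u_t with y_0 = 0
        s1 += y
        s2 += y * y
    return s1 / T**1.5, s2 / T**2

rng = random.Random(2)               # fixed seed, for reproducibility only
T, reps = 100, 2000
a, b = zip(*(scaled_sums(T, rng) for _ in range(reps)))
# Compare with the limiting values 1/3 and 1/2 (sigma = 1).
print(statistics.variance(a), statistics.mean(b))
```

Both scaled sums remain stochastically bounded as $T$ grows, consistent with the stated orders $O_{\operatorname{IP}}(T^{3/2})$ and $O_{\operatorname{IP}}(T^2)$.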