
Elements of Probability Theory

CHUNG-MING KUAN

Department of Finance
National Taiwan University

December 5, 2009

C.-M. Kuan (National Taiwan Univ.) Elements of Probability Theory December 5, 2009 1 / 58
Lecture Outline

1 Probability Space and Random Variables


Probability Space
Random Variables
Moments and Norms
2 Conditional Distributions and Moments
Conditional Distributions
Conditional Moments
3 Modes of Convergence
Almost Sure Convergence
Convergence in Probability
Convergence in Distribution
4 Stochastic Order Notations

Lecture Outline (cont’d)

5 Law of Large Numbers

6 Central Limit Theorem

7 Stochastic Processes
Brownian motion
Weak Convergence

8 Functional Central Limit Theorem

Probability Space and σ-Algebra

A probability space is a triplet (Ω, F, IP), where


1 Ω is the outcome space, whose elements ω are outcomes of the random
experiment,
2 F is a σ-algebra, a collection of subsets of Ω,
3 IP is a probability measure assigned to the elements in F.
F is a σ-algebra if
1 Ω ∈ F,
2 if A ∈ F, then Ac ∈ F,
3 if A1 , A2 , · · · ∈ F, then ⋃_{n=1}^∞ An ∈ F.
By (2), Ωc = ∅ ∈ F. From De Morgan's law,

( ⋃_{n=1}^∞ An )c = ⋂_{n=1}^∞ Anc ∈ F.

Probability Measure

IP : F 7→ [0, 1] is a real-valued set function such that


1 IP(Ω) = 1,
2 IP(A) ≥ 0 for all A ∈ F.
3 if A1 , A2 , . . . ∈ F are disjoint, then IP( ⋃_{n=1}^∞ An ) = ∑_{n=1}^∞ IP(An ).
IP(∅) = 0, IP(Ac ) = 1 − IP(A), IP(A) ≤ IP(B) if A ⊆ B, and

IP(A ∪ B) = IP(A) + IP(B) − IP(A ∩ B).

If {An } is an increasing (decreasing) sequence in F with the limiting set A, then limn→∞ IP(An ) = IP(A).

Borel Field

Let C be a collection of subsets of Ω. The σ-algebra generated by C,


σ(C), is the intersection of all σ-algebras that contain C and hence
the smallest σ-algebra containing C.
When Ω = R, the Borel field, B, is the σ-algebra generated by all
open intervals (a, b) in R.
Note that (a, b), [a, b], (a, b], and (−∞, b] can be obtained from each
other by taking complement, union and/or intersection. For example,
(a, b] = ⋂_{n=1}^∞ ( a, b + 1/n ),   (a, b) = ⋃_{n=1}^∞ ( a, b − 1/n ].

Thus, the collection of all open intervals (closed intervals, half-open intervals or half lines) generates the same Borel field.

The Borel field on Rd , B d , is generated by all open hypercubes:

(a1 , b1 ) × (a2 , b2 ) × · · · × (ad , bd ).

B d can be generated by all closed hypercubes:

[a1 , b1 ] × [a2 , b2 ] × · · · × [ad , bd ],

or by

(−∞, b1 ] × (−∞, b2 ] × · · · × (−∞, bd ].

The elements of the Borel field B d are known as Borel sets.

Random Variable

A random variable z defined on (Ω, F, IP) is a function z : Ω 7→ R


such that for every B in the Borel field B, its inverse image is in F:

z −1 (B) = {ω : z(ω) ∈ B} ∈ F.

That is, z is an F/B-measurable (or simply F-measurable) function.

Given ω, the resulting value z(ω) is known as a realization of z.
An Rd -valued random variable (random vector) z defined on (Ω, F, IP) is a function z : Ω 7→ Rd such that for every B ∈ B d ,

z−1 (B) = {ω : z(ω) ∈ B} ∈ F;

i.e., z is an F/B d -measurable function.

Borel Measurable

All the inverse images of random vector z, z−1 (B), form a σ-algebra,
denoted as σ(z).
It is known as the σ-algebra generated by z, or the information set
associated with z.
It is the smallest σ-algebra in F such that z is measurable.
A function g : R 7→ R is B-measurable or Borel measurable if, for every b ∈ R,

{ζ ∈ R : g (ζ) ≤ b} ∈ B.

For random variable z defined on (Ω, F, IP) and Borel measurable


function g (·), g (z) is a random variable defined on (Ω, F, IP). The
same conclusion holds for d-dimensional random vector z and
B d -measurable function g (·).

Distribution Function

The joint distribution function of z is a non-decreasing,


right-continuous function Fz such that for ζ = (ζ1 , . . . , ζd )0 ∈ Rd ,

Fz (ζ) = IP{ω ∈ Ω : z1 (ω) ≤ ζ1 , . . . , zd (ω) ≤ ζd },

with

lim_{ζ1 →−∞, ..., ζd →−∞} Fz (ζ) = 0,   lim_{ζ1 →∞, ..., ζd →∞} Fz (ζ) = 1.

The marginal distribution function of the i th component of z is

Fzi (ζi ) = IP{ω ∈ Ω : zi (ω) ≤ ζi } = Fz (∞, . . . , ∞, ζi , ∞, . . . , ∞).

Independence

y and z are (pairwise) independent iff for any Borel sets B1 and B2 ,

IP(y ∈ B1 and z ∈ B2 ) = IP(y ∈ B1 ) IP(z ∈ B2 ).

A sequence of random variables {zi } is totally independent if

IP( ⋂_{all i} {zi ∈ Bi } ) = ∏_{all i} IP(zi ∈ Bi ).

Lemma 5.1
Let {zi } be a sequence of independent random variables and hi ,
i = 1, 2, . . . be Borel-measurable functions. Then {hi (zi )} is also a
sequence of independent random variables.

Expectation

The expectation of zi is the Lebesgue integral of zi with respect to IP:

IE(zi ) = ∫_Ω zi (ω) d IP(ω).

In terms of its distribution function,

IE(zi ) = ∫_{Rd} ζi dFz (ζ) = ∫_R ζi dFzi (ζi ).

For Borel measurable function g (·) of z,

IE[g (z)] = ∫_Ω g (z(ω)) d IP(ω) = ∫_{Rd} g (ζ) dFz (ζ).

For example, the covariance matrix of z is var(z) = IE(zz′) − IE(z) IE(z)′.

A function g is convex on a set S if for any a ∈ [0, 1] and any x, y in S,

g (ax + (1 − a)y ) ≤ ag (x) + (1 − a)g (y );

g is concave on S if the inequality above is reversed.

Lemma 5.2 (Jensen)


Let g be a convex function on the support of z. For an integrable random
variable z such that g (z) is integrable, g (IE(z)) ≤ IE[g (z)]; the inequality
reverses if g is concave.
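Jensen's inequality is easy to verify numerically. The following Python sketch (NumPy assumed; the exponential distribution and sample size are arbitrary illustrative choices) compares g(IE z) with IE[g(z)] for a convex and a concave g, using sample moments in place of expectations. Note that Jensen's inequality holds exactly for the empirical distribution of any sample, so the comparisons below are guaranteed, not approximate.

```python
# Numerical check of Jensen's inequality (Lemma 5.2) using sample moments.
import numpy as np

rng = np.random.default_rng(0)
z = rng.exponential(scale=2.0, size=100_000)  # an integrable random variable

# Convex g(x) = x^2: g(IE z) <= IE[g(z)].
lhs = z.mean() ** 2          # g(IE z), with IE z approximated by the sample mean
rhs = (z ** 2).mean()        # IE[g(z)]
print(lhs <= rhs)            # True

# Concave g(x) = sqrt(x): the inequality reverses.
print(np.sqrt(z.mean()) >= np.sqrt(z).mean())  # True
```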

Lp -Norm

For a random variable z with finite p th moment, its Lp -norm is

‖z‖p = [IE |z|p ]1/p .

The inner product of square integrable random variables zi and zj is

⟨zi , zj ⟩ = IE(zi zj ).

The L2 -norm of zi can be obtained as ‖zi ‖2 = ⟨zi , zi ⟩1/2 .


For any c > 0 and p > 0, note that

c p IP(|z| ≥ c) = c p ∫ 1{ζ:|ζ|≥c} dFz (ζ) ≤ ∫_{ζ:|ζ|≥c} |ζ|p dFz (ζ) ≤ IE |z|p ,

where 1A is the indicator function of the event A.

Inequalities

Lemma 5.3 (Markov)


Let z be a random variable with finite p th moment. Then,

IP(|z| ≥ c) ≤ IE |z|p / c p ,

where c is a positive real number.

For p = 2, Markov’s inequality is also known as Chebyshev’s


inequality.
Markov’s inequality is trivial if c is small such that IE |z|p /c p > 1.
When c becomes large, the probability that z assumes very extreme
values will be vanishing at the rate c −p .
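Both remarks above can be seen in a quick simulation. A Python sketch (NumPy assumed; the standard normal draws and the grid of c values are arbitrary illustrations), with p = 2 so that the bound is Chebyshev's inequality. Since the inequality holds exactly for the empirical distribution of any sample, the empirical tail never exceeds the empirical bound.

```python
# Empirical illustration of Markov's inequality with p = 2 (Chebyshev).
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(200_000)
p = 2

for c in [0.5, 1.0, 2.0, 3.0]:
    tail = (np.abs(z) >= c).mean()            # IP(|z| >= c), empirically
    bound = (np.abs(z) ** p).mean() / c ** p  # IE|z|^p / c^p, empirically
    print(f"c={c}: tail={tail:.4f} <= bound={bound:.4f}")
# For c = 0.5 the bound exceeds 1 and is trivial; for large c the tail
# probability vanishes at rate c^(-p), as noted above.
```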

Lemma 5.4 (Hölder)
Let y be a random variable with finite p th moment (p > 1) and z a random variable with finite q th moment (q = p/(p − 1)). Then,

IE |yz| ≤ ‖y ‖p ‖z‖q .

Since | IE(yz)| ≤ IE |yz|, we also have:

Lemma 5.5 (Cauchy-Schwarz)

Let y and z be two square integrable random variables. Then,

| IE(yz)| ≤ ‖y ‖2 ‖z‖2 .
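The Cauchy-Schwarz inequality also holds exactly for sample moments, which makes a one-screen numerical check possible (a NumPy sketch; the correlated pair of variables below is an arbitrary example):

```python
# Empirical check of Holder's inequality with p = q = 2 (Cauchy-Schwarz).
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(50_000)
z = 0.5 * y + rng.standard_normal(50_000)   # correlated with y

lhs = abs(np.mean(y * z))                                  # |IE(yz)|
rhs = np.sqrt(np.mean(y ** 2)) * np.sqrt(np.mean(z ** 2))  # ||y||_2 ||z||_2
print(lhs <= rhs)   # True
```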

Let y = 1 and x = |z|p . For q > p and r = q/p, by Hölder's inequality,

IE |z|p ≤ ‖x‖r ‖y ‖r /(r −1) = [IE |z|pr ]1/r = [IE |z|q ]p/q .

Lemma 5.6 (Liapunov)


Let z be a random variable with finite q th moment. Then for p < q, ‖z‖p ≤ ‖z‖q .

Lemma 5.7 (Minkowski)


Let zi , i = 1, . . . , n, be random variables with finite p th moment (p ≥ 1). Then, ‖∑_{i=1}^n zi ‖p ≤ ∑_{i=1}^n ‖zi ‖p .

When n = 2, this is just the triangle inequality for Lp -norms.

Conditional Distributions

Given A, B ∈ F, suppose we know B has occurred. Given the


outcome space is restricted to B, the likelihood of A is characterized
by the conditional probability: IP(A | B) = IP(A ∩ B)/ IP(B).
The conditional density function of z given y = η is

fz|y (ζ | y = η) = fz,y (ζ, η) / fy (η).

fz|y (ζ | y = η) is clearly non-negative. Also,

∫_{Rd} fz|y (ζ | y = η) dζ = (1/fy (η)) ∫_{Rd} fz,y (ζ, η) dζ = fy (η)/fy (η) = 1.

That is, fz|y (ζ | y = η) is a legitimate density function.

Given the conditional density function fz|y , for A ∈ B d ,

IP(z ∈ A | y = η) = ∫_A fz|y (ζ | y = η) dζ.

This probability is defined even when IP(y = η) is zero.


When A = (−∞, ζ1 ] × · · · × (−∞, ζd ], the conditional distribution
function is

Fz|y (ζ | y = η) = IP(z1 ≤ ζ1 , . . . , zd ≤ ζd | y = η).

When z and y are independent, the conditional density (distribution)


reduces to the unconditional density (distribution).

Let G be a sub-σ-algebra of F. The conditional expectation IE(z | G) is the integrable, G-measurable random variable satisfying

∫_G IE(z | G) d IP = ∫_G z d IP,   for all G ∈ G.

Suppose that G is the trivial σ-algebra {Ω, ∅}. Then IE(z | G) must be a constant c, so that

IE(z) = ∫_Ω z d IP = ∫_Ω c d IP = c.

Consider G = σ(y), the σ-algebra generated by y.

IE(z | y) = IE[z | σ(y)],

which is interpreted as the prediction of z given all the information


associated with y.

By definition,

IE[IE(z | G)] = ∫_Ω IE(z | G) d IP = ∫_Ω z d IP = IE(z).

More generally, only the smaller σ-algebra matters in iterated conditional expectations:

Lemma 5.9 (Law of Iterated Expectations)


Let G and H be two sub-σ-algebras of F such that G ⊆ H. Then for the
integrable random vector z,

IE[IE(z | H) | G] = IE[IE(z | G) | H] = IE(z | G).

If z is G-measurable, then IE[g (z)x | G] = g (z) IE(x | G) with prob. 1.

Lemma 5.11
Let z be a square integrable random variable. Then

IE[z − IE(z | G)]2 ≤ IE(z − z̃)2 ,

for any G-measurable random variable z̃.

Proof: For any square integrable, G-measurable random variable z̃,

IE{[z − IE(z | G)] z̃} = IE{ z̃ IE[ z − IE(z | G) | G ] } = IE{ z̃ [IE(z | G) − IE(z | G)] } = 0.

It follows that

IE(z − z̃)2 = IE[z − IE(z | G) + IE(z | G) − z̃]2

= IE[z − IE(z | G)]2 + IE[IE(z | G) − z̃]2

≥ IE[z − IE(z | G)]2 . 2

The conditional variance-covariance matrix of z given y is

var(z | y) = IE{ [z − IE(z | y)][z − IE(z | y)]′ | y } = IE(zz′ | y) − IE(z | y) IE(z | y)′ ,

which leads to the analysis-of-variance decomposition:

var(z) = IE[var(z | y)] + var( IE(z | y) ).

Example 5.12: Suppose that

( y )       ( ( µy )   ( Σyy   Σ′xy ) )
( x )  ∼ N  ( ( µx ) , ( Σxy   Σxx  ) ).

Then,

IE(y | x) = µy + Σ′xy Σxx−1 (x − µx ),
var(y | x) = var(y) − var( IE(y | x) ) = Σyy − Σ′xy Σxx−1 Σxy .
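A Monte Carlo sketch of Example 5.12 with scalar y and x (NumPy assumed; the particular means, covariances, conditioning point x0 and window width are arbitrary illustrative choices), using the sign convention IE(y | x) = µy + Σ′xy Σxx−1 (x − µx ):

```python
# Check the conditional mean and variance formulas for a bivariate normal.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, 2.0])              # (mu_y, mu_x)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])         # [[S_yy, S_xy], [S_xy, S_xx]]
draws = rng.multivariate_normal(mu, Sigma, size=500_000)
y, x = draws[:, 0], draws[:, 1]

beta = Sigma[0, 1] / Sigma[1, 1]                         # S_xy / S_xx
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]  # S_yy - S_xy^2/S_xx

x0 = 2.5
sel = np.abs(x - x0) < 0.05            # draws with x near x0
print(y[sel].mean(), mu[0] + beta * (x0 - mu[1]))  # both close to 1.4
print(y[sel].var(), cond_var)                      # both close to 1.36
```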

Almost Sure Convergence

A sequence of random variables {zn (·)}n=1,2,... is such that for a given ω, {zn (ω)} is a sequence of realizations indexed by n, and for a given n, zn (·) is a random variable.

Almost Sure Convergence


Suppose {zn } and z are all defined on (Ω, F, IP). {zn } is said to converge to z almost surely if, and only if,

IP(ω : zn (ω) → z(ω) as n → ∞) = 1,

denoted as zn −a.s.→ z or zn → z a.s. (with prob. 1).

Lemma 5.13
Let g : R 7→ R be a function continuous on Sg ⊆ R. If zn −a.s.→ z, where z is a random variable such that IP(z ∈ Sg ) = 1, then g (zn ) −a.s.→ g (z).

Proof: Let Ω0 = {ω : zn (ω) → z(ω)} and Ω1 = {ω : z(ω) ∈ Sg }. For ω ∈ Ω0 ∩ Ω1 , continuity of g ensures that g (zn (ω)) → g (z(ω)). Note that

(Ω0 ∩ Ω1 )c = Ω0c ∪ Ω1c ,

which has probability zero because IP(Ω0c ) = IP(Ω1c ) = 0. (Why?) It follows that Ω0 ∩ Ω1 has probability one, showing that g (zn ) → g (z) with probability one. 2

Convergence in Probability

Convergence in Probability
{zn } is said to converge to z in probability if for every ε > 0,

lim_{n→∞} IP(ω : |zn (ω) − z(ω)| > ε) = 0,

or equivalently, limn→∞ IP(ω : |zn (ω) − z(ω)| ≤ ε) = 1. This is denoted as zn −IP→ z or zn → z in probability.

Note: In this definition, the events Ωn (ε) = {ω : |zn (ω) − z(ω)| ≤ ε} may vary with n, and convergence refers to the probability of such events, pn = IP(Ωn (ε)), rather than to the random variables zn themselves.

Almost sure convergence implies convergence in probability.

To see this, let Ω0 denote the set of ω such that zn (ω) → z(ω). For ω ∈ Ω0 , there is some m such that ω is in Ωn (ε) for all n > m. That is,

Ω0 ⊆ ⋃_{m=1}^∞ ⋂_{n=m}^∞ Ωn (ε) ∈ F.

As ⋂_{n=m}^∞ Ωn (ε) is non-decreasing in m, it follows that

IP(Ω0 ) ≤ IP( ⋃_{m=1}^∞ ⋂_{n=m}^∞ Ωn (ε) ) = lim_{m→∞} IP( ⋂_{n=m}^∞ Ωn (ε) ) ≤ lim_{m→∞} IP( Ωm (ε) ).

Example 5.15
Let Ω = [0, 1] and IP be the Lebesgue measure. Consider the sequence of intervals {In } in [0, 1]: [0, 1/2), [1/2, 1], [0, 1/3), [1/3, 2/3), [2/3, 1], . . . , and let zn = 1In . As n tends to infinity, the length of In shrinks to zero. For 0 < ε < 1, we have

IP(|zn | > ε) = IP(In ) → 0,

which shows zn −IP→ 0. On the other hand, each ω ∈ [0, 1] must be covered by infinitely many intervals, so that zn (ω) = 1 for infinitely many n. This shows that zn (ω) does not converge to zero. 2
Note: Convergence in probability permits zn to deviate from the probability limit infinitely often, but almost sure convergence does not, except for ω in a set of probability zero.
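The example can be traced in a few lines of Python (the fixed outcome ω = 0.37 and the number of blocks are arbitrary choices; half-open intervals are used throughout for simplicity):

```python
# Sliding indicators: convergence in probability without a.s. convergence.
def intervals(n_blocks):
    """Yield [0,1/2), [1/2,1), [0,1/3), [1/3,2/3), [2/3,1), ..."""
    for k in range(2, n_blocks + 2):
        for j in range(k):
            yield (j / k, (j + 1) / k)

omega = 0.37                  # a fixed outcome in [0, 1]
lengths, hits = [], 0
for a, b in intervals(50):
    lengths.append(b - a)     # = IP(|z_n| > eps), the length of I_n
    hits += (a <= omega < b)  # z_n(omega) = 1 whenever omega is covered

print(lengths[-1])  # interval lengths -> 0: z_n -> 0 in probability
print(hits)         # omega is covered once per block: 50 times here
```

Every ω keeps being covered (once per block), so zn (ω) = 1 infinitely often even though the covering probabilities vanish.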

Lemma 5.16
Let {zn } be a sequence of square integrable random variables. If IE(zn ) → c and var(zn ) → 0, then zn −IP→ c.

Lemma 5.17
Let g : R 7→ R be a function continuous on Sg ⊆ R. If zn −IP→ z, where z is a random variable such that IP(z ∈ Sg ) = 1, then g (zn ) −IP→ g (z).

Proof: By the continuity of g , for each ε > 0, we can find a δ > 0 such that

{ω : |zn (ω) − z(ω)| ≤ δ} ∩ {ω : z(ω) ∈ Sg } ⊆ {ω : |g (zn (ω)) − g (z(ω))| ≤ ε}.

Taking complements of both sides, we have

IP(|g (zn ) − g (z)| > ε) ≤ IP(|zn − z| > δ) → 0.

Lemma 5.13 and Lemma 5.17 are readily generalized to Rd -valued random variables. For instance, zn −a.s.→ z (zn −IP→ z) implies

z1,n + z2,n −→ z1 + z2 ,   z1,n z2,n −→ z1 z2 ,   z1,n² + z2,n² −→ z1² + z2² ,

almost surely (in probability), where z1,n , z2,n are two elements of zn and z1 , z2 are the corresponding elements of z. Also, provided that z2 ≠ 0 with probability one,

z1,n /z2,n −→ z1 /z2 .

Convergence in Distribution

Convergence in Distribution
{zn } is said to converge to z in distribution, denoted as zn −D→ z, if

lim_{n→∞} Fzn (ζ) = Fz (ζ),

for every continuity point ζ of Fz .

We also say that zn is asymptotically distributed as Fz , denoted as zn ∼A Fz ; Fz is thus known as the limiting distribution of zn .
Cramér-Wold Device: Let {zn } be a sequence of random vectors in Rd . Then zn −D→ z if and only if α′zn −D→ α′z for every α ∈ Rd such that α′α = 1.

Lemma 5.19
If zn −IP→ z, then zn −D→ z. For a constant c, zn −IP→ c iff zn −D→ c.

Proof: For some arbitrary ε > 0 and a continuity point ζ of Fz , we have

IP(zn ≤ ζ) = IP({zn ≤ ζ} ∩ {|zn − z| ≤ ε}) + IP({zn ≤ ζ} ∩ {|zn − z| > ε})
           ≤ IP(z ≤ ζ + ε) + IP(|zn − z| > ε).

Similarly, IP(z ≤ ζ − ε) ≤ IP(zn ≤ ζ) + IP(|zn − z| > ε). If zn −IP→ z, then by passing to the limit and noting that ε is arbitrary,

lim_{n→∞} IP(zn ≤ ζ) = IP(z ≤ ζ).

That is, Fzn (ζ) → Fz (ζ). The converse is not true in general, however.

Theorem 5.20 (Continuous Mapping Theorem)
Let g : R 7→ R be a function continuous on R except for at most countably many points. If zn −D→ z, then g (zn ) −D→ g (z).

For example, zn −D→ N (0, 1) implies zn² −D→ χ2 (1).

Theorem 5.21
Let {yn } and {zn } be two sequences of random vectors such that yn − zn −IP→ 0. If zn −D→ z, then yn −D→ z.

Theorem 5.22
If yn converges in probability to a constant c and zn converges in distribution to z, then yn + zn −D→ c + z, yn zn −D→ cz, and zn /yn −D→ z/c if c ≠ 0.

Non-Stochastic Order Notations

Order notations are used to describe the behavior of real sequences.


bn is (at most) of order cn , denoted as bn = O(cn ), if there exists a
∆ < ∞ such that |bn |/cn ≤ ∆ for all sufficiently large n.
bn is of smaller order than cn , denoted as bn = o(cn ), if bn /cn → 0.
An O(1) sequence is bounded; an o(1) sequence converges to zero.
The product of O(1) and o(1) sequences is o(1).

Theorem 5.23
(a) If an = O(nr ) and bn = O(ns ), then an bn = O(nr +s ), an + bn = O(nmax(r ,s) ).
(b) If an = o(nr ) and bn = o(ns ), then an bn = o(nr +s ), an + bn = o(nmax(r ,s) ).
(c) If an = O(nr ) and bn = o(ns ), then an bn = o(nr +s ), an + bn = O(nmax(r ,s) ).

Stochastic Order Notations

The order notations defined earlier easily extend to describe the behavior
of sequences of random variables.

{zn } is Oa.s. (cn ) (or O(cn ) almost surely) if zn /cn is O(1) a.s.
{zn } is OIP (cn ) (or O(cn ) in probability) if for every ε > 0, there is some ∆ such that IP(|zn |/cn ≥ ∆) ≤ ε for all n sufficiently large.
Theorem 5.23 also holds for stochastic order notations. For example, if yn = OIP (1) and zn = oIP (1), then yn zn is oIP (1).
It is very restrictive to require a random variable to be bounded almost surely, but a well defined random variable is typically bounded in probability, i.e., OIP (1).

Let {zn } be a sequence of random variables such that zn −D→ z and let ζ be a continuity point of Fz . For any ε > 0, we can choose ζ sufficiently large such that IP(|z| > ζ) < ε/2. As zn −D→ z, we can also choose n large enough such that

IP(|zn | > ζ) − IP(|z| > ζ) < ε/2,

which implies IP(|zn | > ζ) < ε. We have proved:

Lemma 5.24
Let {zn } be a sequence of random variables such that zn −D→ z. Then zn = OIP (1).

Law of Large Numbers

When a law of large numbers holds almost surely, it is a strong law of


large numbers (SLLN); when a law of large numbers holds in
probability, it is a weak law of large numbers (WLLN).
A sequence of random variables obeys a LLN when its sample average
essentially follows its mean behavior; random irregularities (deviations
from the mean) are “wiped out” in the limit by averaging.
Kolmogorov’s SLLN : Let {zt } be a sequence of i.i.d. random
a.s.
variables with mean µo . Then, T −1 T
P
t=1 zt −→ µo .

Note that i.i.d. random variables need not obey Kolmogorov’s SLLN if
they do not have a finite mean, e.g., i.i.d. Cauchy random variables.
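The contrast is visible in a simulation (a NumPy sketch; sample size, seed, and the exponential distribution are arbitrary illustrative choices): exponential draws have a finite mean, so their running average settles down, while the running average of Cauchy draws keeps fluctuating.

```python
# Running averages: SLLN for a finite-mean law vs. i.i.d. Cauchy draws.
import numpy as np

rng = np.random.default_rng(4)
T = 100_000
expo = rng.exponential(scale=1.0, size=T)   # mean 1
cauchy = rng.standard_cauchy(size=T)        # no finite mean

avg_expo = np.cumsum(expo) / np.arange(1, T + 1)
avg_cauchy = np.cumsum(cauchy) / np.arange(1, T + 1)

print(avg_expo[-1])    # close to 1, the mean
print(avg_cauchy[-1])  # itself Cauchy distributed: no convergence
```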

Theorem 5.26 (Markov's SLLN)
Let {zt } be a sequence of independent random variables such that for some δ > 0, IE |zt |1+δ is bounded for all t. Then,

(1/T ) ∑_{t=1}^T [zt − IE(zt )] −a.s.→ 0.

Note that here zt need not have a common mean, and the average of their means need not converge.
Compared with Kolmogorov's SLLN, Markov's SLLN requires a stronger moment condition but not identical distributions.
A LLN is usually obtained by restricting the moments of, and the dependence across, the random variables.

Examples

Example 5.27: Suppose that yt = αo yt−1 + ut with |αo | < 1. Then, var(yt ) = σu²/(1 − αo²) and cov(yt , yt−j ) = αo^j σu²/(1 − αo²). Thus,

var( ∑_{t=1}^T yt ) = ∑_{t=1}^T var(yt ) + 2 ∑_{τ=1}^{T−1} (T − τ ) cov(yt , yt−τ )
                    ≤ ∑_{t=1}^T var(yt ) + 2T ∑_{τ=1}^{T−1} | cov(yt , yt−τ )| = O(T ),

so that var( T −1 ∑_{t=1}^T yt ) = O(T −1 ). As IE( T −1 ∑_{t=1}^T yt ) = 0,

T −1 ∑_{t=1}^T yt −IP→ 0

by Lemma 5.16. That is, {yt } obeys a WLLN.
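A small simulation of Example 5.27 (a NumPy sketch; the choices α = 0.5, σu = 1, T, and the number of replications are arbitrary): the variance of the sample average shrinks at rate 1/T, here toward the long-run variance over T, σu²/[(1 − αo )² T ] = 4/T.

```python
# WLLN for a stationary AR(1): the sample average concentrates at 0.
import numpy as np

rng = np.random.default_rng(5)
alpha, T, reps = 0.5, 1_000, 200

means = []
for _ in range(reps):
    u = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = alpha * y[t - 1] + u[t]   # y_t = alpha*y_{t-1} + u_t, y_0 = 0
    means.append(y.mean())
means = np.asarray(means)

print(means.mean())  # near 0
print(means.var())   # near 4/T = 0.004 (long-run variance over T)
```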


Lemma 5.28
Let yt = ∑_{i=0}^∞ πi ut−i , where ut are i.i.d. random variables with mean zero and variance σu². If ∑_{i=−∞}^∞ |πi | < ∞, then T −1 ∑_{t=1}^T yt −a.s.→ 0.

In Example 5.27, yt = ∑_{i=0}^∞ αo^i ut−i with |αo | < 1, so that ∑_{i=0}^∞ |αo |^i < ∞.
Lemma 5.28 is quite general and applicable to processes that can be expressed as an MA process with absolutely summable weights, e.g., weakly stationary AR(p) processes.
For random variables with strong correlations over time, the variation of their partial sums may grow too rapidly and cannot be eliminated by simple averaging.

Example 5.29: For the sequences {t} and {t²},

∑_{t=1}^T t = T (T + 1)/2,   ∑_{t=1}^T t² = T (T + 1)(2T + 1)/6.

Hence, T −1 ∑_{t=1}^T t and T −1 ∑_{t=1}^T t² both diverge.

Example 5.30: ut are i.i.d. with mean zero and variance σu². Consider now {tut }, which does not have bounded (1 + δ) th moment and does not obey Markov's SLLN. Moreover,

var( ∑_{t=1}^T tut ) = ∑_{t=1}^T t² var(ut ) = σu² T (T + 1)(2T + 1)/6,

so that ∑_{t=1}^T tut = OIP (T 3/2 ). It follows that T −1 ∑_{t=1}^T tut = OIP (T 1/2 ). That is, {tut } does not obey a WLLN.

Example 5.31: yt is a random walk: yt = yt−1 + ut . For s < t,

yt = ys + ∑_{i=s+1}^t ui = ys + vt−s ,

where vt−s is independent of ys and cov(yt , ys ) = IE(ys²) = sσu². Thus,

var( ∑_{t=1}^T yt ) = ∑_{t=1}^T var(yt ) + 2 ∑_{τ=1}^{T−1} ∑_{t=τ+1}^T cov(yt , yt−τ ) = O(T ³),

for ∑_{t=1}^T var(yt ) = ∑_{t=1}^T tσu² = O(T ²) and

2 ∑_{τ=1}^{T−1} ∑_{t=τ+1}^T cov(yt , yt−τ ) = 2 ∑_{τ=1}^{T−1} ∑_{t=τ+1}^T (t − τ )σu² = O(T ³).

Then, ∑_{t=1}^T yt = OIP (T 3/2 ) and T −1 ∑_{t=1}^T yt diverges in probability.
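The T^{3/2} rate is easy to see in simulation (a NumPy sketch; the grid of T values, σu = 1, and the number of replications are arbitrary): rescaled by T^{3/2}, the standard deviation of ∑ yt stabilizes near 1/√3 ≈ 0.577, consistent with var(∑_{t=1}^T yt ) ≈ T³σu²/3.

```python
# Partial sums of a random walk grow like O_P(T^{3/2}).
import numpy as np

rng = np.random.default_rng(6)
reps = 2_000

ratios = []
for T in [100, 400, 1_600]:
    u = rng.standard_normal((reps, T))
    y = np.cumsum(u, axis=1)       # random walk paths, sigma_u = 1
    s = y.sum(axis=1)              # sum_{t=1}^T y_t
    ratios.append(s.std() / T ** 1.5)
    print(T, ratios[-1])           # roughly constant, near 1/sqrt(3)
```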

Example 5.32: yt is the random walk in Example 5.31. Then, IE(yt−1 ut ) = 0, var(yt−1 ut ) = IE(yt−1²) IE(ut²) = (t − 1)σu⁴, and for s < t,

cov(yt−1 ut , ys−1 us ) = IE(yt−1 ys−1 us ) IE(ut ) = 0.

This yields

var( ∑_{t=1}^T yt−1 ut ) = ∑_{t=1}^T var(yt−1 ut ) = ∑_{t=1}^T (t − 1)σu⁴ = O(T ²),

and ∑_{t=1}^T yt−1 ut = OIP (T ). As var( T −1 ∑_{t=1}^T yt−1 ut ) converges to σu⁴/2, rather than 0, {yt−1 ut } does not obey a WLLN, even though its partial sums are OIP (T ).

Central Limit Theorem (CLT)

Lemma 5.35 (Lindeberg-Lévy's CLT)

Let {zt } be a sequence of i.i.d. random variables with mean µo and variance σo² > 0. Then, √T (z̄T − µo )/σo −D→ N (0, 1).

i.i.d. random variables need not obey this CLT if they do not have a
finite variance, e.g., t(2) r.v.
Note that z̄T converges to µo in probability, and its variance σo2 /T
vanishes when T tends to infinity. A normalizing factor T 1/2 suffices
to prevent a degenerate distribution in the limit.
When {zt } obeys a CLT, z̄T is said to converge to µo at the rate
T −1/2 , and z̄T is understood as a root-T consistent estimator.
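A simulation sketch of the CLT (NumPy assumed; uniform draws, T = 200, and 20,000 replications are arbitrary illustrative choices): standardized sample means are close to N (0, 1).

```python
# Lindeberg-Levy CLT: standardized means of i.i.d. uniforms are ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(7)
T, reps = 200, 20_000
mu, sigma = 0.5, np.sqrt(1 / 12)     # mean and std dev of Uniform(0, 1)

z = rng.uniform(size=(reps, T))
stat = np.sqrt(T) * (z.mean(axis=1) - mu) / sigma

print(stat.mean(), stat.std())        # approximately 0 and 1
print((np.abs(stat) <= 1.96).mean())  # approximately 0.95
```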

Lemma 5.36 (Liapunov's CLT)
Let {zTt } be a triangular array of independent random variables with mean µTt and variance σTt² > 0 such that σ̄T² = (1/T ) ∑_{t=1}^T σTt² → σo² > 0. If for some δ > 0, IE |zTt |2+δ are bounded, then √T (z̄T − µ̄T )/σo −D→ N (0, 1).

A CLT usually requires stronger conditions on the moments of, and the dependence across, random variables than those needed to ensure a LLN.
Moreover, every random variable must be asymptotically negligible, in the sense that no single random variable dominates the partial sums.

Examples

Example 5.37: {ut } is a sequence of independent random variables with mean zero, variance σu², and bounded (2 + δ) th moment. We know var( ∑_{t=1}^T tut ) is O(T ³), which implies that the variance of T −1/2 ∑_{t=1}^T tut diverges at the rate O(T ²). On the other hand, observe that

var( T −1/2 ∑_{t=1}^T (t/T ) ut ) = σu² T (T + 1)(2T + 1)/(6T ³) → σu²/3.

It follows that

( √3 /(T 1/2 σu ) ) ∑_{t=1}^T (t/T ) ut −D→ N (0, 1).

These results show that {(t/T )ut } obeys a CLT, whereas {tut } does not.

Example 5.38: yt is a random walk: yt = yt−1 + ut , where ut are i.i.d. with mean zero and variance σu². We know yt do not obey a LLN and hence do not obey a CLT.

CLT for Triangular Array
{zTt } is a triangular array of random variables and obeys a CLT if

(1/(σo √T )) ∑_{t=1}^T [zTt − IE(zTt )] = √T (z̄T − µ̄T )/σo −D→ N (0, 1),

where z̄T = T −1 ∑_{t=1}^T zTt , µ̄T = IE(z̄T ), and

σT² = var( T −1/2 ∑_{t=1}^T zTt ) → σo² > 0.
Consider an array of square integrable random vectors zTt in Rd . Let z̄T denote the average of zTt , µ̄T = IE(z̄T ), and

ΣT = var( T −1/2 ∑_{t=1}^T zTt ) → Σo ,

a positive definite matrix. Using the Cramér-Wold device, {zTt } is said to obey a multivariate CLT, in the sense that

Σo−1/2 (1/√T ) ∑_{t=1}^T [zTt − IE(zTt )] = Σo−1/2 √T (z̄T − µ̄T ) −D→ N (0, Id ),

if {α′zTt } obeys a CLT for any α ∈ Rd such that α′α = 1.
Stochastic Processes

A d-dimensional stochastic process with the index set T is a measurable mapping z : Ω 7→ (Rd )T such that

z(ω) = {zt (ω), t ∈ T }.

For each t ∈ T , zt (·) is an Rd -valued r.v.; for each ω, z(ω) is a sample path (realization) of z, an Rd -valued function on T .
The finite-dimensional distributions of {z(t, ·), t ∈ T } are

IP(zt1 ≤ a1 , . . . , ztn ≤ an ) = Ft1 ,...,tn (a1 , . . . , an ).

z is stationary if Ft1 ,...,tn are invariant under index displacement.
z is Gaussian if Ft1 ,...,tn are all (multivariate) normal.

Brownian motion

The process {w (t), t ∈ [0, ∞)} is the standard Wiener process (standard
Brownian motion) if it has continuous sample paths almost surely and
satisfies:

1 IP( w (0) = 0 ) = 1.
2 For 0 ≤ t0 ≤ t1 ≤ · · · ≤ tk ,

IP( w (ti ) − w (ti−1 ) ∈ Bi , i ≤ k ) = ∏_{i≤k} IP( w (ti ) − w (ti−1 ) ∈ Bi ),

where Bi are Borel sets.
3 For 0 ≤ s < t, w (t) − w (s) ∼ N (0, t − s).
Note: w here has independent and Gaussian increments.
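A discrete sketch of the standard Wiener process (NumPy assumed; the grid size and number of paths are arbitrary choices): cumulating independent N (0, dt) increments reproduces properties such as w (1) ∼ N (0, 1) and cov( w (1/2), w (1) ) = 1/2.

```python
# Simulate standard Brownian motion on [0, 1] from N(0, dt) increments.
import numpy as np

rng = np.random.default_rng(8)
n, paths = 500, 10_000
dt = 1.0 / n

increments = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
w = np.cumsum(increments, axis=1)    # w(i/n) for i = 1, ..., n; w(0) = 0

print(w[:, -1].var())                        # var of w(1): approximately 1
print(np.mean(w[:, n // 2 - 1] * w[:, -1]))  # cov(w(1/2), w(1)): approx 0.5
```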

w (t) ∼ N (0, t), and for r ≤ t,

cov( w (r ), w (t) ) = IE[ w (r )( w (t) − w (r ) ) ] + IE[ w (r )² ] = r .

The sample paths of w are a.s. continuous but highly irregular (nowhere differentiable).
To see this, note that wc (t) = w (c² t)/c for c > 0 is also a standard Wiener process. (Why?) Then, wc (1/c) = w (c)/c. For a large c such that w (c)/c > 1,

wc (1/c) / (1/c) = w (c) > c.

That is, the sample path of wc has a slope larger than c on the very small interval (0, 1/c).
The difference quotient

[ w (t + h) − w (t) ]/h ∼ N (0, 1/|h|)

cannot converge to a finite limit (as h → 0) with a positive probability.

The d-dimensional, standard Wiener process w consists of d mutually
independent, standard Wiener processes, so that for s < t,
w(t) − w(s) ∼ N (0, (t − s) Id ).

Lemma 5.39
Let w be the d-dimensional, standard Wiener process.
1 w(t) ∼ N (0, t Id ).
2 cov(w(r ), w(t)) = min(r , t) Id .

The Brownian bridge w0 on [0, 1] is w0 (t) = w(t) − tw(1). Clearly, IE[w0 (t)] = 0, and for r < t,

cov( w0 (r ), w0 (t) ) = cov( w(r ) − r w(1), w(t) − tw(1) ) = r (1 − t) Id .
Weak Convergence

IPn converges weakly to IP, denoted as IPn ⇒ IP, if for every bounded, continuous real function f on S,

∫ f (s) dIPn (s) → ∫ f (s) d IP(s),

where {IPn } and IP are probability measures on (S, S).

When zn and z are all Rd -valued random variables, IPn ⇒ IP reduces to the usual notion of convergence in distribution: zn −D→ z.
When zn and z are d-dimensional stochastic processes with the distributions induced by IPn and IP, zn −D→ z, also denoted as zn ⇒ z, implies that all the finite-dimensional distributions of zn converge to the corresponding distributions of z.

Continuous Mapping Theorem

Lemma 5.40 (Continuous Mapping Theorem)

Let g : Rd 7→ R be a function continuous on Rd except for at most countably many points. If zn ⇒ z, then g (zn ) ⇒ g (z).

Proof: Let S and S ′ be two metric spaces with Borel σ-algebras S and S ′ and g : S 7→ S ′ be a measurable mapping. For IP on (S, S), define IP∗ on (S ′ , S ′ ) as

IP∗ (A′ ) = IP(g −1 (A′ )), A′ ∈ S ′ .

For every bounded, continuous f on S ′ , f ◦ g is also bounded and continuous on S. IPn ⇒ IP now implies that

∫ f ◦ g (s) dIPn (s) → ∫ f ◦ g (s) d IP(s),

which is equivalent to ∫ f (a) dIP∗n (a) → ∫ f (a) dIP∗ (a), proving IP∗n ⇒ IP∗ .

Functional Central Limit Theorem (FCLT)

ζi are i.i.d. with mean zero and variance σ². Let sn = ζ1 + · · · + ζn and zn (i/n) = (σ√n)−1 si .
For t ∈ [(i − 1)/n, i/n), the constant interpolation of zn is

zn (t) = zn ((i − 1)/n) = s[nt] /(σ√n),

where [nt] is the largest integer less than or equal to nt.
From Lindeberg-Lévy's CLT,

s[nt] /(σ√n) = ([nt]/n)1/2 · s[nt] /(σ√[nt]) −D→ √t N (0, 1),

which is just N (0, t), the distribution of w (t).
For r < t, we have

( zn (r ), zn (t) − zn (r ) ) −D→ ( w (r ), w (t) − w (r ) ),

and hence (zn (r ), zn (t)) −D→ (w (r ), w (t)). This is easily extended to establish convergence of any finite-dimensional distributions and leads to the functional central limit theorem.

Lemma 5.41 (Donsker)

Let ζt be i.i.d. with mean µo and variance σo² > 0 and

zT (r ) = (1/(σo √T )) ∑_{t=1}^{[Tr ]} (ζt − µo ),   r ∈ [0, 1].

Then, zT ⇒ w as T → ∞.
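A simulation sketch of Donsker's FCLT (NumPy assumed; the normal draws, µo = 1, σo = 2, and the grid of r values are arbitrary illustrative choices): the variance of zT (r ) is approximately r, matching var( w (r ) ) = r.

```python
# Rescaled partial-sum process: var(z_T(r)) ~ r, as for the Wiener process.
import numpy as np

rng = np.random.default_rng(9)
T, reps = 1_000, 5_000
mu, sigma = 1.0, 2.0

zeta = rng.normal(mu, sigma, size=(reps, T))
z = np.cumsum(zeta - mu, axis=1) / (sigma * np.sqrt(T))  # z_T(t/T)

rs = [0.25, 0.5, 1.0]
vars_ = [z[:, int(T * r) - 1].var() for r in rs]
for r, v in zip(rs, vars_):
    print(r, v)   # v approximately equal to r
```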

Let ζt be r.v.s with mean µt and variance σt² > 0. Define the long-run variance of ζt as

σ∗² = lim_{T→∞} var( T −1/2 ∑_{t=1}^T ζt ).

{ζt } is said to obey an FCLT if zT ⇒ w as T → ∞, where

zT (r ) = (1/(σ∗ √T )) ∑_{t=1}^{[Tr ]} (ζt − µt ),   r ∈ [0, 1].

In the multivariate context, the FCLT is zT ⇒ w as T → ∞, where

zT (r ) = T −1/2 Σ∗−1/2 ∑_{t=1}^{[Tr ]} (ζ t − µt ),   r ∈ [0, 1],

w is the d-dimensional, standard Wiener process, and

Σ∗ = lim_{T→∞} (1/T ) IE[ ( ∑_{t=1}^T (ζ t − µt ) )( ∑_{t=1}^T (ζ t − µt ) )′ ].
Example 5.43

yt = yt−1 + ut , t = 1, 2, . . ., with y0 = 0, where ut are i.i.d. with mean zero and variance σu².
By Donsker's FCLT, the partial sum y[Tr ] = ∑_{t=1}^{[Tr ]} ut is such that

T −3/2 ∑_{t=1}^T yt = σu ∑_{t=1}^T ∫_{(t−1)/T}^{t/T} (1/(√T σu )) y[Tr ] dr ⇒ σu ∫_0^1 w (r ) dr .

This result also verifies that ∑_{t=1}^T yt is OIP (T 3/2 ). Similarly,

T −2 ∑_{t=1}^T yt² = (1/T ) ∑_{t=1}^T ( yt /√T )² ⇒ σu² ∫_0^1 w (r )² dr ,

so that ∑_{t=1}^T yt² is OIP (T ²).
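A simulation sketch of Example 5.43 (NumPy assumed; σu = 1, T = 1,000, and 10,000 replications are arbitrary choices). The limits have known moments, var( ∫_0^1 w dr ) = 1/3 and IE( ∫_0^1 w² dr ) = 1/2, which the rescaled statistics reproduce:

```python
# T^{-3/2} sum y_t and T^{-2} sum y_t^2 settle into nondegenerate limits.
import numpy as np

rng = np.random.default_rng(10)
T, reps = 1_000, 10_000
u = rng.standard_normal((reps, T))   # sigma_u = 1
y = np.cumsum(u, axis=1)             # random walk paths

a = y.sum(axis=1) / T ** 1.5         # ~ int_0^1 w(r) dr
b = (y ** 2).sum(axis=1) / T ** 2    # ~ int_0^1 w(r)^2 dr

print(a.var())   # approximately 1/3 = var(int_0^1 w dr)
print(b.mean())  # approximately 1/2 = IE(int_0^1 w^2 dr)
```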
