
Martingale Limit Theory

and
Stochastic Regression Theory

Ching-Zong Wei
Contents

1 Martingale Limit Theory
  1.1 Conditional Expectation
  1.2 Martingale
  1.3 Basic Inequalities (maximum inequalities)
  1.4 Square function inequality
  1.5 Series Convergence

2 Stochastic Regression Theory
  2.1 Introduction
Chapter 1

Martingale Limit Theory

Some examples of martingales:

Example 1.1 Let yi = a yi−1 + εi, where the εi are i.i.d. with E(εi) = 0 and Var(εi) = σ². If we estimate a by least squares,

â = Σ_{i=1}^n yi−1 yi / Σ_{i=1}^n yi−1²,   â − a = Σ_{i=1}^n yi−1 εi / Σ_{i=1}^n yi−1²,

then Sn = Σ_{i=1}^n yi−1 εi is a martingale.

Example 1.2 (Likelihood Ratio) Given Θ, let

Ln(θ) = fθ(X1, . . . , Xn)
      = fθ(Xn|X1, . . . , Xn−1) · fθ(X1, . . . , Xn−1)
      = [∏_{i=2}^n fθ(Xi|X1, . . . , Xi−1)] · fθ(X1).

Then Rn(θ) = Ln(θ)/Ln(θ0) is a martingale.

For example, if Xi = θui + εi, where the ui are constants and {εi} is i.i.d. N(0, 1), then

fθ(x1, . . . , xn) = (1/√(2π))^n e^{−Σ_{i=1}^n (xi−θui)²/2},

and

fθ(x1, . . . , xn)/fθ0(x1, . . . , xn) = e^{−Σ_{i=1}^n (xi−θui)²/2 + Σ_{i=1}^n (xi−θ0 ui)²/2}
 = e^{(θ−θ0) Σ_{i=1}^n ui xi − ((θ²−θ0²)/2) Σ_{i=1}^n ui²}.
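As a quick numerical sanity check of the normalization Eθ0 Rn(θ) = 1 (which is what makes Rn(θ) a martingale under θ0), here is a minimal Monte Carlo sketch; the choices ui = 1, θ = 0.5, θ0 = 0 are illustrative, not from the notes:

```python
import numpy as np

# Monte Carlo check that E_{theta0} R_n(theta) = 1 for every n.
# Normal model X_i = theta*u_i + eps_i with u_i = 1 (illustrative choice).
rng = np.random.default_rng(8)
theta, theta0, n, reps = 0.5, 0.0, 10, 200_000

X = theta0 + rng.normal(size=(reps, n))          # data generated under theta0
loglik = lambda t: -0.5 * ((X - t) ** 2).sum(axis=1)
R = np.exp(loglik(theta) - loglik(theta0))       # R_n = L_n(theta)/L_n(theta0)
print(R.mean())                                  # ~ 1, as required
```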

Example 1.3 (Score function) With L0 = 1, d log Ln(θ)/dθ is a martingale. Write

log Ln(θ) = log fθ(Xn|X1, . . . , Xn−1) + log Ln−1(θ),

ui(θ) = d log fθ(Xi|X1, . . . , Xi−1)/dθ = d[log Li(θ) − log Li−1(θ)]/dθ,

In(θ) = Σ_{i=1}^n Eθ(ui²(θ)|X1, . . . , Xi−1).

Let

Vi(θ) = dui(θ)/dθ = d² log fθ(Xi|X1, . . . , Xi−1)/dθ².

Since

Eθ(ui²(θ)|X1, . . . , Xi−1) = −Eθ(Vi(θ)|X1, . . . , Xi−1)

and Jn(θ) = Σ_{i=1}^n Vi(θ), it follows that Jn(θ) + In(θ) is a martingale.

Example 1.4 (Branching Process with Immigration)
Let Zn+1 = Σ_{i=1}^{Zn} Yn+1,i + In+1, where {Yj,i} is i.i.d. with mean E(Yj,i) = m and Var(Yj,i) = σ², and {In} is i.i.d. with mean E(In) = b and Var(In) = λ. Then

E(Zn+1|Fn) = mZn + b.

Write Zn+1 = E(Zn+1|Fn) + δn+1, i.e. δn+1 = Zn+1 − E(Zn+1|Fn); then

E(δn+1²|Fn) = σ²Zn + λ,

and

Zn+1 = mZn + b + {Σ_{i=1}^{Zn} (Yn+1,i − m) + (In+1 − b)}
     = mZn + b + √(σ²Zn + λ) εn+1,

where εn+1 = δn+1/√(σ²Zn + λ).
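A minimal simulation sketch of the conditional-mean identity E(Zn+1|Fn) = mZn + b; the Poisson offspring and immigration laws are assumptions made only for the illustration:

```python
import numpy as np

# Monte Carlo check of E(Z_{n+1} | Z_n = z) = m*z + b for a branching process
# with immigration.  Poisson laws are assumed for illustration.
rng = np.random.default_rng(0)
m, b = 1.0, 2.0                      # offspring mean m, immigration mean b

def step(z):
    """One generation: z Poisson(m) offspring counts plus Poisson(b) immigrants."""
    return rng.poisson(m, size=z).sum() + rng.poisson(b)

z = 50
samples = [step(z) for _ in range(100_000)]
print(np.mean(samples), m * z + b)   # both ~ 52
```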
Consider (Ω, F, P), where

Ω: Sample space

F: σ–algebra ⊂ 2Ω

P: probability

For a discrete random variable X with {X = ai} = Ei, i = 1, . . . , n:
FX = the minimal σ–algebra containing {E1, . . . , En};
FX1,X2 = the minimal σ–algebra containing all sets {X1 = ai} ∩ {X2 = bj}.
Note that FX1,X2 ⊃ FX1.

{Xn } is said to be {Fn }–adaptive if Xn is Fn –measurable (i.e. FXn ⊂ Fn .)

1.1 Conditional Expectation


Main purpose: Given X1 = a1, . . . , Xn = an, to find the expectation of Y, i.e. to find E(Y|X1 = a1, . . . , Xn = an).
(Ω, F, P) is a probability space. Given an event B with P(B) > 0, the conditional probability given B is defined to be

P(A|B) = P(A ∩ B)/P(B) ∀ A ∈ F;

then (Ω, F, P(·|B)) is a probability space. Given X, we can define

E(X|B) = ∫ X dP(·|B).

Example 1.5 Let X = Σ_{i=1}^n ai I_{Ai} where Ai = {X = ai}; then E(X|B) = Σ_{i=1}^n ai P(Ai|B).

Let Ω = ∪_{i=1}^∞ Bi, where Bi ∩ Bj = ∅ if i ≠ j, and F = σ(Bi, 1 ≤ i < ∞). Define

E(X|F) = Σ_{i=1}^∞ E(X|Bi) I_{Bi}.

Observe that if X = Σ_{i=1}^n ai I_{Ai}, Ω = ∪_{i=1}^l Bi, Bi ∩ Bj = ∅ if i ≠ j, then

(i) E(X|F) is F–measurable and E(X|F) ∈ L¹,

(ii) ∀ G ∈ F, ∫_G E(X|F) dP = ∫_G X dP.

Solution:
(i) E(X|F) = Σ_{i=1}^l E(X|Bi) I_{Bi}, and

|E(X|F)| ≤ Σ_{i=1}^l |E(X|Bi)| < ∞  ⇒  E(X|F) ∈ L¹.

(ii) ∀ G ∈ F,

∫_G E(X|F) dP = ∫_G Σ_{i=1}^l E(X|Bi) I_{Bi} dP
 = Σ_{i=1}^l E(X|Bi) P(Bi ∩ G)
 = Σ_{i=1}^l Σ_{j=1}^n aj P(Aj|Bi) P(Bi ∩ G)
 = Σ_{j=1}^n aj (Σ_{i=1}^l P(Aj|Bi) P(Bi ∩ G))
 = Σ_{j=1}^n aj P(Aj ∩ G) = ∫_G X dP.

Since by hypothesis G ∈ F, ∃ an index set I s.t. G = ∪_{i∈I} Bi, so

Σ_{i=1}^l P(Aj|Bi) P(Bi ∩ G) = Σ_{i∈I} P(Aj|Bi) P(Bi) = Σ_{i∈I} P(Aj ∩ Bi)
 = P(Aj ∩ (∪_{i∈I} Bi)) = P(Aj ∩ G).

Definition 1.1 (Ω, G, P) is a probability space. Let F ⊂ G, X ∈ L1 . Define the


conditional expectation of X given F to be a random variable that satisfies (i) and
(ii).

Existence and Uniqueness:


Uniqueness: Assume Z and W both satisfy (i) and (ii). Let G = {Z > W}. By

(i) G ∈ F (since Z − W is F–measurable),

(ii) ∫_G (Z − W) dP = ∫_G X dP − ∫_G X dP = 0,

so P(G) = 0. (Recall that Z ≥ 0 a.s. and E(Z) = 0 ⇒ P(Z > 0) = 0.)

Similarly, P(W > Z) = 0.

Existence: First let X ≥ 0 be simple, X = Σ_{i=1}^l ai I_{Ai}.

Define ν(G) = ∫_G X dP = Σ_{i=1}^l ai P(Ai ∩ G) ∀ G ∈ F.

Then ν is a (σ–finite) measure on F, and ν ≪ P|_F = P̃ (i.e. P̃(G) = 0 ⇒ ν(G) = 0).

By the Radon–Nikodym theorem ∃ an F–measurable function f s.t.

∫_G f dP = ∫_G f dP̃ = ν(G) ∀ G ∈ F,

so f = E(X|F) a.s.

Interpretations of the Radon–Nikodym derivative:

• derivative: Δf/Δt
• density: contents/unit volume
• ratio

Radon–Nikodym Theorem: Assume that ν and µ are σ–finite measures on F s.t. ν ≪ µ. Then ∃ an F–measurable function f s.t.

∫_A f dµ = ν(A) ∀ A ∈ F   (f = dν/dµ).

Remarks:
1. Transformation of X −→ a new measure.
2. F_A ≠ F_B ⇒ E(X|F_A) ≠ E(X|F_B) in general.

Example 1.6

1. Discrete: F = σ(Bi, 1 ≤ i < ∞), X ∈ L¹:

E(X|F) = Σ_{i=1}^∞ (∫_{Bi} X dP / P(Bi)) I_{Bi}.

2. Continuous: Let f(x, y1, . . . , yn) be the joint density of (X, Y1, . . . , Yn) and g(y1, . . . , yn) = ∫ f(x, y1, . . . , yn) dx. Set

f(x|y1, . . . , yn) = (f(x, y1, . . . , yn)/g(Ỹ)) I_{[g(Ỹ)≠0]},  Ỹ = (y1, . . . , yn).

Then E(ϕ(X)|Y1, . . . , Yn) = h(Y1, . . . , Yn) a.s., where h(y1, . . . , yn) = ∫ ϕ(x) f(x|y1, . . . , yn) dx.
We only have to show that for any Borel set B ⊂ Rⁿ,

E(h(Ỹ) I_B(Ỹ)) = ∫_B h(Ỹ) g(Ỹ) dỸ
 = ∫_B [∫ ϕ(x) f(x|Ỹ) dx] g(Ỹ) dỸ
 = ∫_B ∫ ϕ(x) f(x, Ỹ) dx dỸ
 = ∫∫ ϕ(x) I_B(Ỹ) f(x, Ỹ) dx dỸ
 = E(ϕ(X) I_B(Ỹ)),

⇒ E(ϕ(X)|Ỹ) = h(Ỹ).

Proposition 1.1 Let X, Y ∈ L¹.

1. E[E(X|F)] = E X.
   Proof: ∫_Ω E(X|F) dP = ∫_Ω X dP.

2. E(X|{∅, Ω}) = E X.

3. If X is F–measurable then E(X|F) = X a.s..
   Proof: ∀ G ∈ F, ∫_G E(X|F) dP = ∫_G X dP.

4. If X = c, a constant, a.s. then E(X|F) = c a.s..
   Proof: ∫_G X dP = ∫_G c dP, and Y ≡ c is F–measurable.

5. ∀ constants a, b: E(aX + bY|F) = aE(X|F) + bE(Y|F).
   Proof: ∫_G (rhs) = ∫_G (lhs) ∀ G ∈ F.

6. X ≤ Y a.s. ⇒ E(X|F) ≤ E(Y|F).
   Proof: By (5), we only need to show that Z = Y − X ≥ 0 a.s. ⇒ E(Z|F) ≥ 0 a.s..
   Let A = {E(Z|F) < 0}; then

   0 ≤ ∫_A Z dP = ∫_A E(Z|F) dP ⇒ P(A) = 0.

7. |E(X|F)| ≤ E(|X| |F) a.s..

8. (Conditional dominated convergence) If |Xn| ≤ Y a.s. with Y ∈ L¹, and lim_{n→∞} Xn = X a.s., then

   lim_{n→∞} E(Xn|F) = E(X|F) a.s..

   Proof: Set Zn = sup_{k≥n} |Xk − X|; then Zn ≤ 2Y, so Zn ∈ L¹, and Zn ↓ ⇒ E(Zn|F) ↓.
   So ∃ Z s.t. lim_{n→∞} E(Zn|F) = Z a.s.. Since

   |E(Xn|F) − E(X|F)| ≤ E(|Xn − X| |F) ≤ E(Zn|F),

   we only have to show that Z = 0 a.s.. Note that Z ≥ 0 a.s., so it suffices to prove E Z = 0. Since E(Zn|F) ↓ Z,

   E Z ≤ lim_{n→∞} E(E(Zn|F)) = lim_{n→∞} E(Zn) = E(lim_{n→∞} Zn) = 0 ⇒ E Z = 0.
Theorem 1.1 If X is F–measurable and Y, XY ∈ L¹, then E(XY|F) = XE(Y|F).
Proof:
1. X = I_G where G ∈ F: ∀ B ∈ F,

∫_B E(XY|F) dP = ∫_B XY dP = ∫_B I_G Y dP = ∫_{B∩G} Y dP
 = ∫_{B∩G} E(Y|F) dP   (since B ∩ G ∈ F)
 = ∫_B I_G E(Y|F) dP = ∫_B XE(Y|F) dP.

So E(XY|F) = XE(Y|F), and by linearity the result holds for simple F–measurable X.

2. Take the simple approximants Xn = Σ_{k=0}^{n²} (k/n) I_{[k/n ≤ X < (k+1)/n]} − (k/n) I_{[−(k+1)/n < X ≤ −k/n]};
then |Xn| ≤ |X| and Xn → X a.s.. From (1), E(Xn Y|F) = Xn E(Y|F).
Now Xn Y → XY a.s. and |Xn Y| = |Xn||Y| ≤ |XY|, so by conditional dominated convergence,

lim_{n→∞} E(Xn Y|F) = E(lim_{n→∞} Xn Y|F) = E(XY|F).

But lim_{n→∞} Xn E(Y|F) = XE(Y|F) a.s.. So E(XY|F) = XE(Y|F).

Theorem 1.2 (Towering)
If X ∈ L¹ and F1 ⊂ F2, then E[E(X|F2)|F1] = E(X|F1).
Proof: ∀ B ∈ F1, we have B ∈ F2 and

∫_B E[E(X|F2)|F1] dP = ∫_B E(X|F2) dP   (since B ∈ F1)
 = ∫_B X dP   (since B ∈ F2).

So E[E(X|F2)|F1] = E(X|F1) a.s..

Remark 1.1 E[E(X|F1)|F2] = E(X|F1)E[1|F2] = E(X|F1), since E(X|F1) is F2–measurable.

Jensen's Inequality: If ϕ is a convex function on R and X, ϕ(X) ∈ L¹, then

ϕ(E(X|F)) ≤ E(ϕ(X)|F) a.s..

Proof:
1. Let X = Σ_{i=1}^k ai I_{Ai}, where ∪_{i=1}^k Ai = Ω and Ai ∩ Aj = ∅ if i ≠ j. Then

E(X|F) = Σ_{i=1}^k ai E(I_{Ai}|F).

Since

Σ_{i=1}^k E(I_{Ai}|F) = E(Σ_{i=1}^k I_{Ai}|F) = E(1|F) = 1 a.s.,

convexity of ϕ gives

ϕ(E(X|F)) ≤ Σ_{i=1}^k E(I_{Ai}|F) ϕ(ai) = E(Σ_{i=1}^k ϕ(ai) I_{Ai}|F) = E(ϕ(X)|F).

2. Take simple Xn as before (Xn of the form Σ ai I_{Ai}, |Xn| ≤ |X|, Xn → X a.s.). Then ϕ(E(Xn|F)) ≤ E(ϕ(Xn)|F).
First observe that E(Xn|F) → E(X|F) a.s.. By continuity of ϕ,

lim_{n→∞} ϕ(E(Xn|F)) = ϕ(lim_{n→∞} E(Xn|F)) = ϕ(E(X|F)).

For each m we can find a convex function ϕm such that ϕm(x) = ϕ(x) ∀ |x| ≤ m, |ϕm(x)| ≤ Cm(|x| + 1) ∀ x, and ϕ(x) ≥ ϕm(x) ∀ x. Fix m; for all n,

|ϕm(Xn)| ≤ Cm(|Xn| + 1) ≤ Cm(|X| + 1),

so by conditional dominated convergence

lim_{n→∞} E[ϕm(Xn)|F] = E[lim_{n→∞} ϕm(Xn)|F] = E[ϕm(X)|F],

and therefore

E[ϕ(X)|F] ≥ sup_m E[ϕm(X)|F] = sup_m lim_{n→∞} E[ϕm(Xn)|F]
 ≥ sup_m lim_{n→∞} ϕm(E(Xn|F)) = sup_m ϕm[lim_{n→∞} E(Xn|F)]
 = sup_m ϕm[E(X|F)] = ϕ[E(X|F)] a.s..

Some properties of a convex function ϕ:

• If λi ≥ 0 and Σ_{i=1}^n λi = 1, then ϕ(Σ_{i=1}^n λi xi) ≤ Σ_{i=1}^n λi ϕ(xi).
• The supporting-line (geometric) property.
• ϕ is continuous (since the right and left derivatives exist).
Corollary 1.1 If X ∈ Lp , p ≥ 1 then E(X|F) ∈ Lp .
Proof : Since ϕ(x) = |x|p is convex if p ≥ 1, then

|E(X|F)|p ≤ E(|X|p |F) a.s.

and
E|E(X|F)|p ≤ EE(|X|p |F) = E|X|p < ∞.

Homework:
1. If p > 1, 1/p + 1/q = 1, X ∈ L^p, Y ∈ L^q, then

E(|XY| |F) ≤ E(|X|^p|F)^{1/p} E(|Y|^q|F)^{1/q} a.s..

2. If X ∈ L² and Y ∈ L²(F) = {U : U ∈ L² and U is F–measurable}, then

E(X − Y)² = E(X − E(X|F))² + E(E(X|F) − Y)².

Therefore

inf_{Y ∈ L²(F)} E(X − Y)² = E(X − E(X|F))².

Proof:

E(X − Y)² = E(X − E(X|F) + E(X|F) − Y)²
 = E(X − E(X|F))² + E(E(X|F) − Y)² + 2E[(X − E(X|F))(E(X|F) − Y)].

Lemma 1.1 E[(X − E(X|F))U] = 0 if U ∈ L²(F).
proof:

E[(X − E(X|F))U] = E[E((X − E(X|F))U|F)] = E[U E((X − E(X|F))|F)]
 = E[U(E(X|F) − E(X|F))] = E[U · 0] = 0.

Application: Bayes estimate. (X1, · · · , Xn) ∼ f(x⃗|θ), θ ∈ L², Xi ∈ L². Use X1, · · · , Xn to estimate θ.
Method: find θ̂(X1, · · · , Xn) ∈ L² such that E(θ − θ̂)² is minimal.
Remark 1.2 Let Fn = σ(X1 , · · · , Xn ). Then θ̂ is Fn –measurable
⇔ ∃ measurable function h such that θ̂ = h(X1 , · · · , Xn ) a.s.
So θ̂n = E(θ|Fn ) is the solution.

Question : In what sense θ̂n −→ θ ?
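As a numerical illustration of why θ̂n = E(θ|Fn) solves the minimization, here is a small Monte Carlo sketch; the normal prior/noise model is an assumption made for the illustration (under it the posterior mean is nX̄n/(n+1)):

```python
import numpy as np

# Projection property: E(theta|F_n) minimizes E(theta - Y)^2 over F_n-measurable Y.
# Assumed model (illustration only): theta ~ N(0,1), X_i = theta + N(0,1).
rng = np.random.default_rng(1)
n, reps = 5, 200_000
theta = rng.normal(size=reps)
X = theta[:, None] + rng.normal(size=(reps, n))
xbar = X.mean(axis=1)

post_mean = n * xbar / (n + 1)            # E(theta | F_n) in this model
print(np.mean((theta - post_mean) ** 2))  # ~ 1/(n+1) = 0.167, the minimum
print(np.mean((theta - xbar) ** 2))       # ~ 1/n = 0.2, strictly larger
```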

1.2 Martingale
(Ω, F, P)
Fn ⊂ F, Fn ⊂ Fn+1 : history(filtration)

Definition 1.2

(i) Xn is Fn–adaptive (or adapted to Fn) if Xn is Fn–measurable ∀ n.

(ii) Yn is Fn–predictive (predictable w.r.t. Fn) if Yn is Fn−1–measurable ∀ n.

(iii) The σ–fields Fn = σ(X1, · · · , Xn) are said to be the natural history of {Xn}. (It is obvious that Fn ↑.)

(iv) {Xn, n ≥ 1} is said to be a martingale w.r.t. {Fn, n ≥ 1} if
   (1) Xn is Fn–adaptive with Xn ∈ L¹, and
   (2) E(Xn|Fn−1) = Xn−1, ∀ n ≥ 2.

(v) {εn, n ≥ 1} is said to be a martingale difference sequence w.r.t. {Fn, n ≥ 0} if E(εn|Fn−1) = 0 a.s., ∀ n ≥ 1.

Remark 1.3 If {Xn, n ≥ 1} is a martingale w.r.t. {Fn, n ≥ 1} and E(X1) = 0, then ε1 = X1, εn = Xn − Xn−1 for n ≥ 2 is a martingale difference sequence w.r.t. {Fn, n ≥ 0}, where F0 = {∅, Ω} and E(ε1|F0) = E(X1|F0) = E(X1) = 0.
If {εn, n ≥ 1} is a martingale difference sequence w.r.t. {Fn, n ≥ 0}, {Yn, n ≥ 1} is {Fn, n ≥ 0}–predictive, and εn ∈ L¹, Ynεn ∈ L¹, then Sn = Σ_{i=1}^n Yiεi is a martingale w.r.t. {Fn, n ≥ 0}.
Proof:

E(Sn|Fn−1) = E(Ynεn + Sn−1|Fn−1) = E(Ynεn|Fn−1) + Sn−1
 = Yn E(εn|Fn−1) + Sn−1 = Yn · 0 + Sn−1 = Sn−1 a.s..

Example 1.7

(a) If {εi} are independent r.v.'s with E(εi) = 0 and Var(εi) = 1 ∀ i, let Sn = Σ_{i=1}^n εi and Fn = σ(ε1, · · · , εn); then E(εn|Fn−1) = E(εn) = 0.

(b) Let Xn = ρXn−1 + εn, |ρ| < 1, where the εn are i.i.d. with E(εn) = 0, E(εn²) < ∞, and X0 ∈ L² is independent of {εi, i ≥ 1}. Then Σ_{i=1}^n Xi−1εi is a martingale w.r.t. {Fn, n ≥ 0}, where Fn = σ(X0, ε1, · · · , εn) ∀ n ≥ 0.
proof: Xn−1 is Fn−1–measurable, since

Xn = ρ²Xn−2 + ρεn−1 + εn = · · · = ρ^n X0 + ρ^{n−1}ε1 + · · · + εn.

(c) Bayes estimate: θ̂n = E(θ|Fn) where Fn ↑:

E(θ̂n+1|Fn) = E(E(θ|Fn+1)|Fn) = E(θ|Fn) = θ̂n.

(d) Likelihood ratio: Pθ with dPθ = fθ(X1, · · · , Xn)dµ,

Yn(θ, θ0, X1, · · · , Xn) = fθ(X1, · · · , Xn)/fθ0(X1, · · · , Xn) = (dPθ/dµ)/(dPθ0/dµ) = dPθ/dPθ0,

Fn = σ(X1, · · · , Xn),
Ln(θ, X1, · · · , Xn) = fθ(Xn|X1, · · · , Xn−1) Ln−1(θ, X1, · · · , Xn−1).

Fix θ0 and θ; then {Yn(θ), Fn, n ≥ 1} is a martingale:

Eθ0(Yn(θ)|Fn−1) = Eθ0(Ln(θ)/Ln(θ0)|Fn−1)
 = Eθ0([fθ(Xn|X1, · · · , Xn−1)/fθ0(Xn|X1, · · · , Xn−1)] · Ln−1(θ)/Ln−1(θ0)|Fn−1)
 = [Ln−1(θ)/Ln−1(θ0)] Eθ0(fθ(Xn|X1, · · · , Xn−1)/fθ0(Xn|X1, · · · , Xn−1)|Fn−1)
 = Yn−1(θ) · ∫ [fθ(xn|X1, · · · , Xn−1)/fθ0(xn|X1, · · · , Xn−1)] fθ0(xn|X1, · · · , Xn−1) dxn
 = Yn−1(θ),

using E(ϕ(X)|X1, · · · , Xn) = ∫ ϕ(x)f(x|X1, · · · , Xn)dx and ∫ fθ(xn|X1, · · · , Xn−1)dxn = 1.

(e) {d log Ln(θ)/dθ, Fn = σ(X1, · · · , Xn)} is a martingale if

∫ (∂fθ(xn|X1, · · · , Xn−1)/∂θ) dxn = (∂/∂θ) ∫ fθ(xn|X1, · · · , Xn−1) dxn = 0.

Indeed,

Eθ(d log Ln(θ)/dθ|Fn−1)
 = Eθ(d log fθ(Xn|X1, · · · , Xn−1)/dθ + d log Ln−1(θ)/dθ|Fn−1)
 = Eθ[(∂fθ(Xn|X1, · · · , Xn−1)/∂θ)/fθ(Xn|X1, · · · , Xn−1)|Fn−1] + d log Ln−1(θ)/dθ
 = ∫ [(∂fθ(xn|X1, · · · , Xn−1)/∂θ)/fθ(xn|X1, · · · , Xn−1)] fθ(xn|X1, · · · , Xn−1) dxn + d log Ln−1(θ)/dθ
 = d log Ln−1(θ)/dθ.

Lemma: If Xn is Fn–adaptive and Xn ∈ L¹, then S1 = X1, Sn = X1 + Σ_{i=2}^n (Xi − E(Xi|Fi−1)) is a martingale w.r.t. {Fn, n ≥ 1}.
proof: For n ≥ 2,

E(Sn|Fn−1) = X1 + Σ_{i=2}^{n−1} (Xi − E(Xi|Fi−1)) + E[(Xn − E(Xn|Fn−1))|Fn−1] = Sn−1,

since E[(Xn − E(Xn|Fn−1))|Fn−1] = E(Xn|Fn−1) − E(Xn|Fn−1) = 0.

(f) Let

un(θ) = d log fθ(Xn|X1, . . . , Xn−1)/dθ,   d log Ln(θ)/dθ = Σ_{i=1}^n ui(θ),
In(θ) = Σ_{i=1}^n E[ui²(θ)|Fi−1],   vn(θ) = dun(θ)/dθ,   Jn(θ) = Σ_{i=1}^n vi(θ).

Then Jn(θ) + In(θ) is a martingale, and Jn(θ) − Σ_{i=1}^n E(vi(θ)|Fi−1) is a martingale. We only have to show that

E[vi(θ)|Fi−1] = −E[ui²(θ)|Fi−1] a.s..

Example: Xn = θXn−1 + εn, n = 1, 2, . . ., where X0 ∼ N(0, c²) is independent of the i.i.d. sequence εn ∼ N(0, σ²). Assume that σ² and c² are known. Then

Ln(θ, X0, . . . , Xn) = fθ(X0) fθ(X1|X0) · · · fθ(Xn|X0, . . . , Xn−1)
 = (1/(√(2π)c)) e^{−x0²/(2c²)} · · · (1/(√(2π)σ)) e^{−(xn−θxn−1)²/(2σ²)}
 = (1/√(2π))^{n+1} (1/(cσ^n)) e^{−[x0²/(2c²) + (1/(2σ²)) Σ_{i=1}^n (xi−θxi−1)²]}.

Hence

log Ln(θ) = −((n+1)/2) log(2π) − log c − n log σ − [x0²/(2c²) + (1/(2σ²)) Σ_{i=1}^n (xi − θxi−1)²],

therefore

d log Ln(θ)/dθ = (1/σ²) Σ_{i=1}^n xi−1(xi − θxi−1) = (1/σ²) Σ_{i=1}^n xi−1 εi,

i.e. ui(θ) = (1/σ²) Xi−1(Xi − θXi−1), so ui²(θ) = (1/σ⁴) Xi−1²(Xi − θXi−1)². Then

E[ui²(θ)|Fi−1] = (1/σ⁴) Xi−1² E[(Xi − θXi−1)²|Fi−1] = (1/σ⁴) Xi−1² σ² = Xi−1²/σ²,

so

In(θ) = (1/σ²) Σ_{i=1}^n Xi−1²,   vi(θ) = dui(θ)/dθ = −Xi−1²/σ²,   Jn(θ) = Σ_{i=1}^n vi(θ) = −(1/σ²) Σ_{i=1}^n Xi−1²,

⇒ In(θ) + Jn(θ) = 0. And Σ_{i=1}^n ui²(θ) + Σ_{i=1}^n E[vi(θ)|Fi−1] is also a martingale, since

(1/σ⁴) Σ_{i=1}^n Xi−1²[Xi − θXi−1]² − (1/σ²) Σ_{i=1}^n Xi−1² = (1/σ⁴) Σ_{i=1}^n Xi−1²[εi² − σ²]

and E[εi² − σ²|Fi−1] = E(εi² − σ²) = σ² − σ² = 0.

Definition 1.3 An {Fn, n ≥ 1}–adaptive sequence {Xn} is defined to be a sub–martingale (super–martingale) if E(Xn|Fn−1) ≥ (≤) Xn−1 for n = 2, . . ..

(1) Intuition:  martingale — constant
               submartingale — increasing
               supermartingale — decreasing
(2) Game:      martingale — fair game
               submartingale — favorable game
               supermartingale — unfavorable game

Theorem 1.3
(i) Assume that {Xn, Fn} is a martingale. If ϕ is convex and ϕ(Xn) ∈ L¹, then {ϕ(Xn), Fn} is a submartingale.
(ii) Assume that {Xn, Fn} is a submartingale. If ϕ is convex, increasing and ϕ(Xn) ∈ L¹, then {ϕ(Xn), Fn} is a submartingale.
Proof: By Jensen's inequality,

E[ϕ(Xn)|Fn−1] ≥ ϕ(E[Xn|Fn−1]) = ϕ(Xn−1)   (≥ ϕ(Xn−1) in case (ii), since ϕ is increasing).

For example, ϕ(x) = |x|^p, p ≥ 1, or ϕ(x) = (x − a)⁺.

Corollary 1.2 If {Xn, Fn} is a martingale and Xn ∈ L^p with p ≥ 1, then h(n) = E|Xn|^p is an increasing function.
Proof: Since {|Xn|^p, Fn} is a submartingale,

E{E(|Xn+1|^p|Fn)} ≥ E{|Xn|^p}.

Exercise: Prove that if Xn = Σ_{i=1}^n εi, where the εi's are i.i.d. r.v.'s with E(εi) = 0 and E|εi|³ < ∞, then

E|Xn|³ ≤ E|Xn+1|³ ≤ . . . .

(iii) [Gilat, D. (1977), Ann. Prob. 5, pp. 475–481]
For a nonnegative submartingale {Xn, σ(X1, . . . , Xn)}, there is a martingale {Yn, σ(Y1, . . . , Yn)} s.t. {Xn} =_D {|Yn|}.

(iv) Assume that {Xn, σ(X1, . . . , Xn)} is a nonnegative submartingale. If ϕ is convex and ϕ(Xn) ∈ L¹, then there is a submartingale {Zn, σ(Z1, . . . , Zn)} s.t. {ϕ(Xn)} =_D {Zn}.
Proof: Let ψ(x) = ϕ(|x|); then ψ is convex. By Gilat's theorem, ∃ a martingale {Yn} s.t. {Xn} =_D {|Yn|}, so

{ϕ(Xn)} =_D {ϕ(|Yn|)} = {ψ(Yn)} = {Zn},

which is a submartingale by (i).


Homework : Assume that {Xn , Fn } is a submartingale. If ∃ m > 1 s.t. E(Xm ) =


E(X1 ), then {Xi , Fi , 1 ≤ i ≤ m} is a martingale.

Definition 1.4 Let N∞ = {1, 2, . . . , ∞}, and T : Ω → N∞ . Then T is said to be a


Fn –stopping time if {T = n} ∈ Fn , n = 1, 2, . . . .

Remark 1.4 Let F∞ = ∨n Fn . Since {T = ∞} = {T < ∞}c and {T < ∞} =


∪n {T = n} ∈ F∞ so {T = ∞} ∈ F∞ .

We said that a stopping time T is finite if P {T = ∞} = 0.

Remark 1.5 Since {T ≥ n} = {T < n}c ∈ Fn−1 , then

{T ≤ n} ∈ Fn , ∀ n ⇔ {T = n} ∈ Fn , ∀ n.

Definition 1.5 Let T be an Fn –stopping time. The pre–T σ–field FT is defined to


be {Λ ∈ F : Λ ∩ {T = n} ∈ Fn , ∀ n ∈ N∞ }.

If Λ ∈ FT , then Λ = ∪n∈N∞ (Λ ∩ {T = n}) ∈ F∞ , so FT ⊂ F∞ .

Example 1.8 Let Xn be Fn–adaptive. For a Borel set Γ, define T = inf{n : Xn ∈ Γ} (inf Ø = ∞). Then T is an Fn–stopping time.
Proof: {T = k} = {X1 ∉ Γ, . . . , Xk−1 ∉ Γ, Xk ∈ Γ} ∈ Fk.
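A small sketch of this first-entrance stopping time for a ±1 random walk; the set Γ = [3, ∞) is an arbitrary illustrative choice:

```python
import numpy as np

# First-entrance time T = inf{n : X_n in Gamma} for a +-1 random walk.
# Deciding {T = k} only requires X_1, ..., X_k: the stopping-time property.
rng = np.random.default_rng(9)
X = np.cumsum(rng.choice([-1, 1], size=1000))
hits = np.nonzero(X >= 3)[0]
T = hits[0] + 1 if hits.size else np.inf   # inf of the empty set is infinity
print(T)
```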

Theorem 1.4 Assume that T1 and T2 are Fn –stopping times.

(i) Then so are T1 ∧ T2 and T1 ∨ T2 .

(ii) If T1 ≤ T2 then FT1 ⊂ FT2 .

Proof :

(i) {T1 ∧ T2 ≤ n} = {T1 ≤ n} ∪ {T2 ≤ n} ∈ Fn


{T1 ∨ T2 ≤ n} = {T1 ≤ n} ∩ {T2 ≤ n} ∈ Fn

(ii) Let Λ ∈ FT1 , then Λ ∩ {T1 ≤ n} ∈ Fn . Since {T2 ≤ n} ∈ Fn , we have Λ ∩ {T1 ≤


n} ∩ {T2 ≤ n} ∈ Fn and Λ ∩ {T1 ≤ n} ∩ {T2 ≤ n} = Λ ∩ {T2 ≤ n} ∈ FT2 , so
Λ ∈ FT2 .

Theorem 1.5 (Optional Sampling Theorem)
Let α and β be two Fn–stopping times s.t. α ≤ β ≤ K, where K is a positive integer. Then for any (sub or super) martingale {Xn, Fn}, {Xα, Fα; Xβ, Fβ} is a (sub or super) martingale.
Proof: We only have to consider the case when Xn is a submartingale.
Lemma: Assume that β is an Fn–stopping time s.t. β ≤ K. If {Xn, Fn} is a submartingale, then

E[Xβ|Fn] ≥ Xn a.s. on {β ≥ n},  i.e.  E[Xβ|Fn] I[β≥n] ≥ Xn I[β≥n] a.s..

Proof of Lemma: In general, for Z ∈ L¹ and U Fn–measurable,

E(Z|Fn) ≥ U a.s. ⇔ ∀ A ∈ Fn, ∫_A Z dP ≥ ∫_A U dP

(for "⇐" take A = {U > E(Z|Fn)} ∈ Fn; then ∫_A [E(Z|Fn) − U] dP is both ≥ 0 by hypothesis and ≤ 0 by choice of A, hence = 0 and P(A) = 0). So it is sufficient to show that

∀ A ∈ Fn,  ∫_A Xβ I[β≥n] dP ≥ ∫_A Xn I[β≥n] dP.

Now, for A ∈ Fn,

∫_A Xn I[β≥n] dP = ∫_{A∩[β≥n]} Xn dP = ∫_{A∩[β=n]} Xn dP + ∫_{A∩[β≥n+1]} Xn dP
 ≤ ∫_{A∩[β=n]} Xβ dP + ∫_{A∩[β≥n+1]} Xn+1 dP,

since for B = A ∩ [β ≥ n+1] ∈ Fn,

∫_B Xn+1 dP = ∫_B E[Xn+1|Fn] dP ≥ ∫_B Xn dP.

Iterating,

∫_A Xn I[β≥n] dP ≤ ∫_{A∩[β=n]} Xβ dP + . . . + ∫_{A∩[β=K]} Xβ dP + ∫_{A∩[β≥K+1]} X_{K+1} dP
 = ∫_{A∩[n≤β≤K]} Xβ dP = ∫_{A∩[n≤β]} Xβ dP,

the integral over A ∩ [β ≥ K+1] vanishing since β ≤ K.

Continuation of the proof of the theorem: It is sufficient to show that ∀ Λ ∈ Fα, ∫_Λ Xβ dP ≥ ∫_Λ Xα dP. Given Λ ∈ Fα, Λ = ∪_{n=1}^K (Λ ∩ {α = n}), so it is sufficient to show ∀ 1 ≤ n ≤ K,

∫_{Λ∩[α=n]} Xβ dP ≥ ∫_{Λ∩[α=n]} Xα dP = ∫_{Λ∩[α=n]} Xn dP.

However, ∫_{Λ∩[α=n]} Xβ dP = ∫_{Λ∩[α=n]} E(Xβ|Fn) dP, and since {α = n} ⊂ {β ≥ n} (because β ≥ α = n), the lemma gives ∫_{Λ∩[α=n]} E(Xβ|Fn) dP ≥ ∫_{Λ∩[α=n]} Xn dP.

Finally, Xα is Fα–measurable:

∀ n, {Xα ≤ x} ∩ {α = n} = {Xn ≤ x} ∩ {α = n} ∈ Fn,

so {Xα ≤ x} ∈ Fα.

Remark 1.6 If, taking α = 1, we have EXβ = EX1 for every stopping time β ≤ K, then {Xn, Fn} is a martingale.

How to prove the convergence of a sequence:

1. Find the limit X and try to show |Xn − X| → 0.

2. Without knowing the limit:
(i) Cauchy criterion: sup_{m>n} |Xm − Xn| → 0 as n → ∞.
(ii) Limit set A = [lim inf Xn, lim sup Xn]: show (a) lim inf Xn = lim sup Xn, or (b) ∀ a ∈ A, ψ(a) = 0 for a function ψ with a unique root.

Consider

{lim inf Xn < lim sup Xn} = ∪_{a<b, a,b rational} {lim inf Xn < a < b < lim sup Xn}.

Define

α1 = inf{m : Xm ≤ a},  β1 = inf{m > α1 : Xm ≥ b},  . . . ,
αk = inf{m > βk−1 : Xm ≤ a},  βk = inf{m > αk : Xm ≥ b},

and define the upcrossing number Un = Un[a, b] = sup{j : βj ≤ n, j < ∞}. Note that if αi′ = αi ∧ n and βi′ = βi ∧ n, then αn′ = βn′ = n.
Then define τ0 = 1, τ1 = α1′, . . . , τ_{2n−1} = αn′, τ_{2n} = βn′; clearly τ_{2n} = n.
If {Xn, Fn} is a submartingale, then {Xτk, Fτk, 0 ≤ k ≤ 2n} is a submartingale by the optional sampling theorem (since τk ≤ n for all k).
Theorem 1.6 (Upcrossing Inequality)
If {Xn, Fn} is a submartingale, then (b − a)EUn ≤ E(Xn − a)⁺ − E(X1 − a)⁺.
Proof: Observe that the upcrossing number Un[0, b − a] of (Xn − a)⁺ is the same as Un[a, b] of Xn. Furthermore, {(Xn − a)⁺, Fn} is also a submartingale, since ϕ(x) = (x − a)⁺ is convex and increasing. Hence we only have to treat the case Xn ≥ 0 a.s. and Un = Un[0, C], C = b − a. Now consider

Xn − X1 = Xτ_{2n} − Xτ_{2n−1} + . . . + Xτ_1 − Xτ_0 = Σ_{i=0}^{2n−1} (Xτ_{i+1} − Xτ_i) = Σ_{i even} + Σ_{i odd}.

Since each completed upcrossing contributes at least C,

Σ_{i odd} (Xτ_{i+1} − Xτ_i) ≥ Un C,

and each even-indexed increment has EXτ_{i+1} − EXτ_i ≥ 0 by the optional sampling theorem, so

EXn − EX1 ≥ CEUn + E(Σ_{i even}) ≥ CEUn.


Theorem 1.7 (Global convergence theorem)
Assume that {Xn, Fn} is a submartingale s.t. sup_n E(Xn⁺) < ∞. Then Xn converges a.s. to a limit X∞ with E|X∞| < ∞.
Proof: We only have to show that

P[lim inf Xn < a < b < lim sup Xn] = 0.   (∗)

Let U∞[a, b] be the total upcrossing number of {Xn}. Then {lim inf Xn < a < b < lim sup Xn} ⊂ {U∞[a, b] = ∞} and Un[a, b] ↑ U∞[a, b], so

EU∞[a, b] = lim_{n→∞} E(Un[a, b]) ≤ sup_n (E(Xn − a)⁺ − E(X1 − a)⁺)/(b − a) < ∞.

Hence U∞[a, b] < ∞ a.s., i.e. P[U∞[a, b] = ∞] = 0, which implies (∗). Now

E|Xn| = EXn⁺ + EXn⁻ = 2EXn⁺ − (EXn⁺ − EXn⁻) = 2EXn⁺ − EXn ≤ 2EXn⁺ − EX1,

so sup_n E|Xn| ≤ 2 sup_n EXn⁺ − EX1 < ∞. By Fatou's Lemma,

E|X∞| = E(lim_{n→∞} |Xn|) ≤ lim inf E|Xn| ≤ sup_n E|Xn| < ∞.

Remark 1.7 Analogy: a monotone sequence Xn ↑ converges when bounded above; here sup_n EXn⁺ < ∞ plays the role of the upper bound.

Corollary 1.3 If {Xn} is a nonnegative supermartingale, then ∃ X ∈ L⁰ s.t. Xn → X a.s..
Proof: −Xn is a nonpositive submartingale with E(−Xn)⁺ = 0 ∀ n.
Example 1.9

1. Likelihood ratio:

Yn(θ) = Ln(θ)/Ln(θ0) ≥ 0.

So Yn(θ) → Y(θ) a.s. (Pθ0); and Y(θ) = 0 if θ and θ0 are distinguishable.

2. Bayes estimate:

θ̂n = E[θ|X1, . . . , Xn], E(θ²) < ∞,
E|θ̂n| ≤ E{E(|θ| |X1, . . . , Xn)} = E|θ| < ∞.

So sup_n E|θ̂n| < ∞, and θ̂n → θ∞ a.s..

Definition 1.6 {Xn} is said to be uniformly integrable (u.i.) if ∀ ε > 0, ∃ A s.t.

sup_n ∫_{|Xn|>A} |Xn| dP ≤ ε,  i.e.  lim_{A→∞} sup_n ∫_{|Xn|>A} |Xn| dP = 0.

Theorem 1.8 {Xn} is u.i. ⇐⇒

(i) sup_n E|Xn| < ∞, and

(ii) ∀ ε > 0, ∃ δ > 0 s.t. ∀ E ∈ F, P(E) < δ ⇒ sup_n ∫_E |Xn| dP < ε.

How to prove {Xn} is u.i.?

1. If Z = sup_n |Xn| ∈ L¹, then {Xn} is u.i..
Proof:
(i) is obvious, since E|Xn| ≤ E(Z) < ∞.
(ii) For E ∈ F and c > 0,

∫_E |Xn| dP ≤ ∫_E Z dP ≤ ∫_E Z I[Z≤c] dP + ∫_E Z I[Z>c] dP ≤ cP(E) + ∫_{Z>c} Z dP.

2. If ∃ a Borel–measurable function f : [0, ∞) → [0, ∞) s.t. sup_n Ef(|Xn|) < ∞ and lim_{t→∞} f(t)/t = ∞, then {Xn} is u.i..
Theorem 1.9 Assume that Xn →p X. Then the following statements are equivalent:
(i) {|Xn|^p} is u.i.;
(ii) Xn → X in L^p (i.e. E|Xn − X|^p → 0 as n → ∞);
(iii) E|Xn|^p → E|X|^p as n → ∞.

Remark 1.8 If Xn →D X and {|Xn|^p} is u.i., then E|Xn|^p → E|X|^p.
Proof: By the Skorokhod representation we can reconstruct the probability space and r.v.'s Xn′, X′ s.t. Xn′ =D Xn, X′ =D X and Xn′ → X′ a.s..

Ex. If Xn →D N(0, σ²) and {Xn²} is u.i., then E(Xn²) → σ². How do we know that max_{1≤i≤n} |Xi|^p ∈ L¹?


1.3 Basic Inequalities (maximum inequalities)

Theorem 1.10 (Fundamental Inequality)
If {Xi, Fi, 1 ≤ i ≤ n} is a submartingale, then ∀ λ,

λP[max_{1≤i≤n} Xi > λ] ≤ E(Xn I[max_{1≤i≤n} Xi > λ]).

Proof: Define τ = inf{i : Xi > λ} (recall inf Ø = ∞); then {max_{1≤i≤n} Xi > λ} = {τ ≤ n}. On the set {τ = k}, k ≤ n, we have Xτ > λ, so

λP[τ = k] ≤ ∫_{[τ=k]} Xτ dP = ∫_{[τ=k]} Xk dP ≤ ∫_{[τ=k]} Xn dP,

using {τ = k} = {X1 ≤ λ, . . . , Xk−1 ≤ λ, Xk > λ} ∈ Fk and the submartingale property. Summing over k,

λP[max_{1≤i≤n} Xi > λ] = λ Σ_{k=1}^n P[τ = k] ≤ ∫_{[τ≤n]} Xn dP = ∫_{[max_{1≤i≤n} Xi > λ]} Xn dP.

Theorem 1.11 (Doob's Inequality)
If {Xi, Fi, 1 ≤ i ≤ n} is a martingale, then ∀ p > 1,

‖Xn‖p ≤ ‖max_{1≤i≤n} |Xi|‖p ≤ q‖Xn‖p,

where ‖X‖p = (E|X|^p)^{1/p} and 1/p + 1/q = 1.
Proof: {|Xn|, Fn} is a submartingale, so the fundamental inequality applies. Let Z = max_{1≤i≤n} |Xi|; then

E(Z^p) = p ∫_0^∞ x^{p−1} P[Z > x] dx
 ≤ p ∫_0^∞ x^{p−2} E(|Xn| I[Z>x]) dx = pE[|Xn| ∫_0^∞ I[Z>x] x^{p−2} dx]
 = pE[|Xn| ∫_0^Z x^{p−2} dx] = pE[|Xn| Z^{p−1}/(p − 1)]
 ≤ (p/(p−1)) ‖Xn‖p ‖Z^{p−1}‖q = q‖Xn‖p [E(Z^p)]^{1/q},

where

‖Z^{p−1}‖q = {E(Z^{p−1})^q}^{1/q} = [E(Z^p)]^{1/q}.

Hence (assuming E(Z^p) < ∞, which can be arranged by truncating Z)

‖Z‖p = [E(Z^p)]^{1/p} = [E(Z^p)]^{1−1/q} ≤ q‖Xn‖p.

Corollary 1.4 If {Xn, Fn, n ≥ 1} is a martingale s.t. sup_n E|Xn|^p < ∞ for some p > 1, then {|Xn|^p} is u.i. and Xn converges in L^p.
Proof: p > 1 ⇒ sup_n E|Xn| < ∞, so Xn converges a.s. to a r.v. X. By Doob's inequality,

‖max_{1≤i≤n} |Xi|‖p ≤ q‖Xn‖p ≤ q sup_n ‖Xn‖p < ∞.

By the monotone convergence theorem,

E sup_{1≤i<∞} |Xi|^p = lim_{n→∞} E max_{1≤i≤n} |Xi|^p ≤ q^p sup_n E|Xn|^p < ∞.

So sup_{1≤i<∞} |Xi|^p ∈ L¹, {|Xn|^p} is u.i., and Xn → X in L^p.
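A quick numerical sanity check of Doob's inequality for p = 2 on the ±1 random-walk martingale (a sketch; the sample sizes are arbitrary):

```python
import numpy as np

# Check ||max_i |X_i|||_p <= q ||X_n||_p for the simple random walk, p = 2 (q = 2).
rng = np.random.default_rng(2)
p, n, reps = 2.0, 200, 20_000
q = p / (p - 1)

eps = rng.choice([-1.0, 1.0], size=(reps, n))    # martingale differences
X = np.cumsum(eps, axis=1)                        # X_k = eps_1 + ... + eps_k

lhs = np.mean(np.max(np.abs(X), axis=1) ** p) ** (1 / p)
rhs = q * np.mean(np.abs(X[:, -1]) ** p) ** (1 / p)
print(lhs, rhs)    # lhs <= rhs; here roughly 17 vs 28
```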


Homework : Show without using martingale convergence theorem that if {Xn , Fn }


is a martingale and supn E|Xn |p < ∞ for some p > 1 then Xn converges a.s.


Ex. (Bayes Est.) θ̂n = E[θ|X1, . . . , Xn]. If θ ∈ L², then θ̂n → θ∞ a.s. and E[θ̂n − θ∞]² → 0.
pf: E θ̂n² ≤ Eθ² < ∞ (apply the corollary with p = 2).

What is θ∞ ? Is θ∞ equal to E[θ|Xi , i ≥ 1]?

Theorem 1.12 If X ∈ L¹, Xn = E(X|Fn) and X∞ = lim_{n→∞} Xn, then (i) {Xn} is u.i., and (ii) X∞ = E(X|F∞), where F∞ = ∨_{n=1}^∞ Fn.
pf: Fix n; {Xn, Fn; X, F} is a martingale, so {|Xn|, Fn; |X|, F} is a submartingale. Thus

∫_{|Xn|>λ} |Xn| dP ≤ ∫_{|Xn|>λ} |X| dP.

Now P{|Xn| > λ} ≤ E|Xn|/λ ≤ E|X|/λ → 0 as λ → ∞, and for any c > 0,

∫_{|Xn|>λ} |X| dP ≤ cP{|Xn| > λ} + ∫_{|X|>c} |X| dP ≤ cE|X|/λ + ∫_{|X|>c} |X| dP,

so

sup_n E|Xn| I[|Xn|>λ] ≤ cE|X|/λ + ∫_{|X|>c} |X| dP,
lim_{λ→∞} sup_n E|Xn| I[|Xn|>λ] ≤ ∫_{|X|>c} |X| dP ∀ c,

and the right side → 0 as c → ∞; hence {Xn} is u.i..

Therefore Xn → X∞ in L¹, so ∀ Λ ∈ F, ∫_Λ Xn dP → ∫_Λ X∞ dP, since |∫_Λ Xn dP − ∫_Λ X∞ dP| ≤ ∫_Λ |Xn − X∞| dP ≤ E|Xn − X∞| → 0. Fix n and Λ ∈ Fn; ∀ m ≥ n,

∫_Λ X dP = ∫_Λ Xn dP = ∫_Λ Xm dP = ∫_Λ X∞ dP.

Let G = {Λ : ∫_Λ X dP = ∫_Λ X∞ dP}. Then G is a σ–field s.t. G ⊃ ∪_{n=1}^∞ Fn, so G ⊃ ∨_{n=1}^∞ Fn = F∞. Observe that X∞ is F∞–measurable. Hence E(X|F∞) = X∞.

Corollary 1.5 Assume that θ ∈ L², θ̂n = E(θ|X1, . . . , Xn) and θ∞ = E(θ|Xi, i ≥ 1). If ∃ θ̃n = θ̃n(X1, . . . , Xn) s.t. θ̃n →p θ, then θ∞ = θ a.s.
pf: Since θ̃n →p θ, ∃ a subsequence nj s.t. θ̃_{nj} → θ a.s. as nj → ∞. Hence θ is F∞ = σ(Xi, i ≥ 1)–measurable. By the theorem stated above, θ∞ = E[θ|F∞] = θ a.s.

Example: yi = θxi + εi, with xi constants, θ ∈ L² with known density f(θ), εi i.i.d. N(0, σ²) with σ² known, and {εi} independent of θ. Assume f(θ) ∼ N(µ, c²) with µ, c² known. Then

θ̂n = E(θ|Y1, . . . , Yn) = (µ/c² + Σ_{i=1}^n xi Yi/σ²) / (1/c² + Σ_{i=1}^n xi²/σ²).

Indeed,

g(θ, y1, . . . , yn) = (1/(√(2π)c)) e^{−(θ−µ)²/(2c²)} (1/(√(2π)σ))^n e^{−Σ_{i=1}^n (yi−θxi)²/(2σ²)},

g(θ|y1, . . . , yn) = g(θ, y1, . . . , yn)/∫ g(θ, y1, . . . , yn) dθ
 ∝ K(y1, . . . , yn) e^{−(1/(2c²) + Σ_{i=1}^n xi²/(2σ²))θ² + (µ/c² + Σ_{i=1}^n xi yi/σ²)θ},

a normal density in θ.

When Σ_{i=1}^∞ xi² < ∞: since Σ xi Yi = θ Σ xi² + Σ xi εi,

θ̂n → (µ/c² + (Σ_{i=1}^∞ xi²/σ²)θ + Σ_{i=1}^∞ xi εi/σ²) / (1/c² + Σ_{i=1}^∞ xi²/σ²) =: θ∞ as n → ∞,

and θ∞ ∼ N(µ, [(Σ_{i=1}^∞ xi²/σ²)²c² + Σ_{i=1}^∞ xi²/σ²] / (1/c² + Σ_{i=1}^∞ xi²/σ²)²); in particular θ∞ ≠ θ a.s.

When Σ_{i=1}^n xi² → ∞:

θ̂n ∼ Σ_{i=1}^n xi yi / Σ_{i=1}^n xi² = θ + Σ_{i=1}^n xi εi / Σ_{i=1}^n xi² → θ a.s.

In general, let θ̃n = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi². When Σ_{i=1}^n xi² → ∞,

E(θ̃n − θ)² = E{Σ_{i=1}^n xi εi / Σ_{i=1}^n xi²}² = σ²/Σ_{i=1}^n xi² → 0,

so θ̃n →p θ. By our theorem (Corollary 1.5), θ̂n → θ a.s. and in L².

How do we calculate upper and lower bounds for E|Xn|^p and E|Σ_{i=1}^n Xi εi|^p ?
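A minimal numerical check of the posterior-mean formula above and of its convergence to θ when Σ xi² → ∞; the particular µ, c², σ² and the normal design are illustrative assumptions:

```python
import numpy as np

# Posterior mean from the conjugate-normal formula, tracked as n grows.
rng = np.random.default_rng(3)
mu, c2, sigma2 = 1.0, 4.0, 1.0          # prior N(mu, c2), noise variance sigma2
x = rng.normal(size=1000)                # design with sum x_i^2 -> infinity
theta = rng.normal(mu, np.sqrt(c2))      # one draw of the parameter
y = theta * x + rng.normal(0, np.sqrt(sigma2), size=x.size)

num = mu / c2 + np.cumsum(x * y) / sigma2
den = 1 / c2 + np.cumsum(x * x) / sigma2
theta_hat = num / den
print(theta, theta_hat[[9, 99, 999]])    # converges toward the realized theta
```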

1.4 Square function inequality

Let {Xn, Fn} be a martingale and d1 = X1, di = Xi − Xi−1 for i ≥ 2.

Theorem 1.13 (Burkholder's inequality)
∀ 1 < p < ∞, ∃ C1 and C2 depending only on p such that

C1 E|Σ_{i=1}^n di²|^{p/2} ≤ E|Xn|^p ≤ C2 E|Σ_{i=1}^n di²|^{p/2}.

Cor. For p > 1, ∃ C2′ depending only on p s.t.

C1 E|Σ_{i=1}^n di²|^{p/2} ≤ E(Xn*)^p ≤ C2′ E|Σ_{i=1}^n di²|^{p/2},

where Xn* = max_{1≤i≤n} |Xi| and C1 is as in the theorem.
proof: Since E(Xn*)^p ≥ E|Xn|^p, the lower half is immediate. By Doob's inequality, ‖Xn*‖p ≤ q‖Xn‖p, so

E(Xn*)^p = ‖Xn*‖p^p ≤ q^p E|Xn|^p ≤ q^p C2 E|Σ_{i=1}^n di²|^{p/2}.

Remark: When the di are independent, this is the Marcinkiewicz–Zygmund inequality. Note that for p ≥ 2, by Minkowski's inequality in L^{p/2},

E|Σ_{i=1}^n di²|^{p/2} = ‖Σ_{i=1}^n di²‖_{p/2}^{p/2} ≤ (Σ_{i=1}^n ‖di²‖_{p/2})^{p/2} = {Σ_{i=1}^n (E|di|^p)^{2/p}}^{p/2}.

Example: Let Y = Σ_{i=−∞}^∞ ai εi, where Σ_{i=−∞}^∞ ai² < ∞ and the εi are i.i.d. random variables with E(εi) = 0 and Var(εi) = σ² < ∞. Assume E|εi|^p < ∞ and set Yn = Σ_{i=−n}^n ai εi; then (a_{−n}ε_{−n}, a_{−n}ε_{−n} + a_{−n+1}ε_{−n+1}, · · · , Yn) is a martingale, and

E|Yn|^p ≤ C2 {Σ_{i=−n}^n (E|ai εi|^p)^{2/p}}^{p/2}
 = C2 {Σ_{i=−n}^n (|ai|^p E|εi|^p)^{2/p}}^{p/2}
 = C2 (E|ε1|^p) {Σ_{i=−n}^n ai²}^{p/2}.

By Fatou's lemma, E|Y|^p ≤ C2 (E|ε1|^p) {Σ_{i=−∞}^∞ ai²}^{p/2}; i.e. ∃ C1, C2 depending only on p and E|εi|^p s.t.

C1 (Σ_{i=−∞}^∞ ai²)^{p/2} ≤ E|Y|^p ≤ C2 (Σ_{i=−∞}^∞ ai²)^{p/2}.

For comparison, if εi ∼ N(0, σ²), then Y ∼ N(0, C²) with C² = (Σ_{i=−∞}^∞ ai²)σ², and

E|Y|^p = E|Y/C|^p C^p = (E|N(0, 1)|^p) C^p = {E|N(0, 1)|^p} σ^p (Σ_{i=−∞}^∞ ai²)^{p/2},

which matches the two-sided bound.

Example: Consider yi = α + βxi + εi, where the εi are i.i.d. with mean 0 and E|εi|^p < ∞ for some p ≥ 2. Assume the xi are constants and s²n = Σ_{i=1}^n (xi − x̄n)² → ∞, where x̄n = (1/n) Σ_{i=1}^n xi. If p > 2, then the least squares estimator β̂n is strongly consistent.

β̂n − β = Σ_{i=1}^n (xi − x̄n)εi / Σ_{i=1}^n (xi − x̄n)²   (Var(β̂n) = σ²/s²n).

Let Sn = Σ_{i=1}^n (xi − x̄n)εi, n ≥ 2, and write Sn = S2 + (S3 − S2) + · · · + (Sn − Sn−1). When n > m,

Sn − Sm = Σ_{i=1}^n (xi − x̄n)εi − Σ_{i=1}^m (xi − x̄m)εi
 = Σ_{i=1}^m (x̄m − x̄n)εi + Σ_{i=m+1}^n (xi − x̄n)εi,

E[(Sn − Sn−1)Sm] = Σ_{i=1}^m (xi − x̄m)(x̄m − x̄n)σ² = (x̄m − x̄n)[Σ_{i=1}^m (xi − x̄m)]σ² = 0,

so the increments are orthogonal, and s²n = Σ_{i=2}^n Ci², where C2² = E(S2²)/σ² and Cn² = E(Sn − Sn−1)²/σ². We want to show Sn/s²n → 0 a.s.

Móricz: if E|Σ_{i=m}^n Zi|^p ≤ Cp (Σ_{i=m}^n Ci²)^{p/2} ∀ n, m, Σ_{i=1}^n Ci² → ∞ and p > 2, then Σ_{i=1}^n Zi / Σ_{i=1}^n Ci² → 0 a.s.

Here Zi = Si − Si−1 and Sn = Σ_{i=1}^n (xi − x̄n)εi. Note that Σ_{i=m}^n Zi = Σ_{i=1}^n ai(n, m)εi, where the ai(n, m) may depend on n and m, so by the Marcinkiewicz–Zygmund bound and orthogonality,

E|Σ_{i=m}^n Zi|^p ≤ Cp (Σ_{i=1}^n ai²(n, m))^{p/2}
 ≤ Cp [Var(Σ_{i=m}^n Zi)/σ²]^{p/2}
 = Cp [Σ_{i=m}^n Var(Zi)/σ²]^{p/2}
 = (Cp/σ^p) (Σ_{i=m}^n Ci²)^{p/2}.

If ai is Fi−1–measurable, recall:

{Σ_{i=1}^n (E|di|^p)^{2/p}}^{p/2} = {Σ_{i=1}^n (E|ai εi|^p)^{2/p}}^{p/2}
 = {Σ_{i=1}^n (E[|ai|^p E(|εi|^p|Fi−1)])^{2/p}}^{p/2}.
Theorem 1.14 (Burkholder–Davis–Gundy)
∀ p > 0, ∃ C depending only on p s.t.

E(Xn*)^p ≤ C{E[Σ_{i=1}^n E(di²|Fi−1)]^{p/2} + E(max_{1≤i≤n} |di|^p)}.

Theorem 1.15 (Rosenthal's inequality)
∀ 2 ≤ p < ∞, ∃ C1, C2 depending only on p s.t.

C1 {E[Σ_{i=1}^n E(di²|Fi−1)]^{p/2} + Σ_{i=1}^n E|di|^p} ≤ E|Xn|^p
 ≤ C2 {E[Σ_{i=1}^n E(di²|Fi−1)]^{p/2} + Σ_{i=1}^n E|di|^p}.

Cor. (Wei, 1987, Ann. Statist. 15, 1667–1682)
Assume that {εi, Fi} is a martingale difference sequence s.t. sup_n E{|εn|^p|Fn−1} ≤ C for some p ≥ 2 and constant C. Assume that un is Fn−1–measurable. Let Xn = Σ_{i=1}^n ui εi and Xn* = max_{1≤i≤n} |Xi|. Then ∃ K depending only on C and p s.t. E(Xn*)^p ≤ K E(Σ_{i=1}^n ui²)^{p/2}.
Proof: By the B–D–G inequality,

E(Xn*)^p ≤ Cp {E[Σ_{i=1}^n E(ui²εi²|Fi−1)]^{p/2} + E max_{1≤i≤n} |ui εi|^p}.

For the first term,

Σ_{i=1}^n E(ui²εi²|Fi−1) ≤ Σ_{i=1}^n ui² [E(|εi|^p|Fi−1)]^{2/p} ≤ C^{2/p} Σ_{i=1}^n ui²,

so (first term) ≤ Cp C E(Σ_{i=1}^n ui²)^{p/2}. For the second term,

E max_{1≤i≤n} |ui εi|^p ≤ E Σ_{i=1}^n |ui|^p|εi|^p = Σ_{i=1}^n E{E(|ui|^p|εi|^p|Fi−1)}
 ≤ C Σ_{i=1}^n E|ui|^p = C E(Σ_{i=1}^n |ui|^p)
 ≤ C E[(Σ_{i=1}^n ui²)(max_{1≤j≤n} |uj|^{p−2})]
 ≤ C E[(Σ_{i=1}^n ui²)(Σ_{i=1}^n ui²)^{(p−2)/2}] = C E(Σ_{i=1}^n ui²)^{p/2}.

Let K = Cp C + C.

(For constants ai and p ≥ 2: Σ_{i=1}^n |ai|^p ≤ (Σ_{i=1}^n ai²)^{p/2}.)

The comparison of local convergence theorems and global convergence theorems:

Conditional Borel–Cantelli Lemma.
Classical results (Ai events):

1. If Σ P(Ai) < ∞, then P(Ai i.o.) = 0.
2. If the Ai are independent and P(Ai i.o.) = 0, then Σ P(Ai) < ∞.

Define X = Σ_{i=1}^∞ I_{Ai}; then {Ai i.o.} = {X = ∞} and

Σ P(Ai) = Σ E(I_{Ai}) = E(Σ I_{Ai}) = E(X).

The classical result connects the finiteness of X and of E(X):

1. X ≥ 0, E(X) < ∞ ⇒ X < ∞ a.s.
2. What is the conditional analogue?

With Fn = σ(A1, · · · , An) and Mi = E(I_{Ai}|Fi−1) (which equals P(Ai) in the independent case), the conditional version will give, e.g., Σ_{i=1}^∞ Mi < ∞ a.s. when Σ_{i=1}^∞ E I_{Ai} < ∞, and in the independent case

P(Σ_{i=1}^∞ I_{Ai} < ∞) > 0 ⇒ Σ_{i=1}^∞ P(Ai) < ∞.
Theorem: Let {Xn} be a sequence of nonnegative random variables and {Fn, n ≥ 0} a sequence of increasing σ–fields. Let Mn = E(Xn|Fn−1). Then

1. Σ_{i=1}^∞ Xi < ∞ a.s. on {Σ_{i=1}^∞ Mi < ∞}, and

2. if Y = sup_n Xn/(1 + X1 + · · · + Xn−1) ∈ L¹ and Xn is Fn–measurable, then Σ_{i=1}^∞ Mi < ∞ a.s. on {Σ_{i=1}^∞ Xi < ∞}.

Remark: If the Xi are uniformly bounded by C, then Y ≤ C a.s., so Y ∈ L¹. In this case, with the assumption that Xn is Fn–measurable,

P[({Σ_{i=1}^∞ Xi < ∞} △ {Σ_{i=1}^∞ Mi < ∞}) ∪ ({Σ_{i=1}^∞ Xi = ∞} △ {Σ_{i=1}^∞ Mi = ∞})] = 0.

proof: (Due to Louis H. Y. Chen, Ann. Prob. 1978.)

Theorem 1.16 Let {Xn} be a sequence of nonnegative random variables and {Fn} a sequence of increasing σ–fields. Let Mn = E(Xn|Fn−1) for n ≥ 1.

1. Σ_{i=1}^∞ Xi < ∞ a.s. on {Σ_{i=1}^∞ Mi < ∞}.

2. If Xn is Fn–measurable and Y = sup_n Xn/(1 + X1 + · · · + Xn−1) ∈ L¹, then Σ_{i=1}^∞ Mi < ∞ a.s. on {Σ_{i=1}^∞ Xi < ∞}.

Classical results (Ai events): Σ_{i=1}^∞ P(Ai) < ∞ ⇒ P(An i.o.) = 0; and if the Ai are independent, P(An i.o.) = 0, i.e. P(Σ_{i=1}^∞ I_{Ai} < ∞) = 1, ⇒ Σ_{i=1}^∞ P(Ai) < ∞. These follow from the theorem with Xi = I_{Ai}, Fn = σ(A1, · · · , An):

Σ_{i=1}^∞ P(Ai) = Σ_{i=1}^∞ E(I_{Ai}) = E(Σ_{i=1}^∞ I_{Ai}) = E(Σ_{i=1}^∞ Xi) = E{Σ_{i=1}^∞ E(Xi|Fi−1)},

Σ_{i=1}^∞ E(Xi|Fi−1) < ∞ a.s. ⇒ Σ_{i=1}^∞ Xi < ∞ a.s.,

{An i.o.} = {Σ_{i=1}^∞ I_{Ai} = ∞} = {Σ_{i=1}^∞ Xi = ∞},

and in the independent case Σ_{i=1}^∞ Mi = Σ_{i=1}^∞ E(I_{Ai}|Fi−1) = Σ_{i=1}^∞ P(Ai), so P{Σ_{i=1}^∞ I_{Ai} < ∞} > 0 ⇒ Σ_{i=1}^∞ P(Ai) < ∞.

proof of theorem:
(i) Let M0 = 1 and Sn = M0 + · · · + Mn, so Sn is Fn−1–measurable. Consider

Σ_{i=1}^n Mi/[(M0 + · · · + Mi−1)(M0 + · · · + Mi)] = Σ_{i=1}^n {1/(M0 + · · · + Mi−1) − 1/(M0 + · · · + Mi)}
 = 1/M0 − 1/(M0 + · · · + Mn) ≤ 1.

Since

1 ≥ E Σ_{i=1}^∞ Mi/(Si−1 Si) = Σ_{i=1}^∞ E(Mi/(Si−1 Si)) = Σ_{i=1}^∞ E(E(Xi|Fi−1)/(Si−1 Si))
 = Σ_{i=1}^∞ E{E(Xi/(Si−1 Si)|Fi−1)} = Σ_{i=1}^∞ E(Xi/(Si−1 Si)) = E(Σ_{i=1}^∞ Xi/(Si−1 Si)),

we get Σ_{i=1}^∞ Xi/(Si−1 Si) < ∞ a.s. On the set {S∞ < ∞},

Σ_{i=1}^∞ Xi/(Si−1 Si) ≥ Σ_{i=1}^∞ Xi/S∞² = (1/S∞²) Σ_{i=1}^∞ Xi  ⇒  Σ_{i=1}^∞ Xi < ∞.

(ii) Let X0 = 1 and Un = Σ_{i=0}^n Xi, which is Fn–measurable. Then

E(Σ_{i=1}^∞ Mi/Ui−1²) = Σ_{i=1}^∞ E(Mi/Ui−1²) = Σ_{i=1}^∞ E[E(Xi|Fi−1)/Ui−1²]
 = E(Σ_{i=1}^∞ Xi/Ui−1²) = E(Σ_{i=1}^∞ [Xi/(Ui−1 Ui)] · [Ui/Ui−1])
 ≤ E[(Σ_{i=1}^∞ Xi/(Ui−1 Ui)) (sup_i Ui/Ui−1)] ≤ E(sup_i Ui/Ui−1)
 = E(sup_i (1 + Xi/Ui−1)) = E(1 + Y) < ∞,

using the telescoping bound Σ_{i=1}^∞ Xi/(Ui−1 Ui) = Σ_{i=1}^∞ {1/Ui−1 − 1/Ui} ≤ 1/U0 = 1.

So Σ_{i=1}^∞ Mi/Ui−1² < ∞ a.s. On the set {U∞ < ∞},

Σ_{i=1}^∞ Mi/Ui−1² ≥ (Σ_{i=1}^∞ Mi)/U∞²  ⇒  Σ_{i=1}^∞ Mi < ∞.

Remark: Under condition (ii),

P[{Σ_{i=1}^∞ Mi < ∞} △ {Σ_{i=1}^∞ Xi < ∞}] = 0, and
P[{Σ_{i=1}^∞ Mi = ∞} △ {Σ_{i=1}^∞ Xi = ∞}] = 0.

1.5 Series Convergence

Recall the (global) convergence theorem: if {Xn, Fn} is a martingale and sup_n E|Xn| < ∞, then Xn converges a.s. Let ε1 = X1 and εn = Xn − Xn−1 for n ≥ 2; then

sup_n E|Σ_{i=1}^n εi| < ∞  ⇒  Σ_{i=1}^n εi converges a.s.
Pn
Theorem 1.17 (Doob) Let {Xn = Σ_{i=1}^n εi, Fn} be a martingale. Then Xn converges a.s. on {Σ_{i=1}^∞ E(εi²|Fi−1) < ∞}.
proof: Fix K > 0. Define τ = inf{n : Σ_{i=1}^{n+1} E(εi²|Fi−1) > K}. Then {Xn∧τ, Fn} is a martingale, and since [τ ≥ i] ∈ Fi−1 and the cross terms vanish,

E(Xn∧τ²) = E(Σ_{i=1}^{n∧τ} εi)² = E(Σ_{i=1}^n εi I[τ≥i])²
 = Σ_{i=1}^n E(I[τ≥i] εi²) = Σ_{i=1}^n E[E(I[τ≥i] εi²|Fi−1)]
 = E{Σ_{i=1}^n I[τ≥i] E(εi²|Fi−1)} = E{Σ_{i=1}^{n∧τ} E(εi²|Fi−1)} ≤ K.

Since sup_n E(Xn∧τ²) < ∞, Xn∧τ converges a.s. But on the event AK = {Σ_{i=1}^∞ E(εi²|Fi−1) ≤ K}, τ = ∞ and Xn∧τ = Xn. So Xn converges a.s. on AK, hence it also converges
a.s. on ∪_{K=1}^∞ AK = {Σ_{i=1}^∞ E(εi²|Fi−1) < ∞}.
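A small simulation sketch of this local convergence theorem with εi = xi ηi, ηi i.i.d. signs, xi = 1/i, so that Σ E(εi²|Fi−1) = Σ 1/i² < ∞:

```python
import numpy as np

# S_n = sum_{i<=n} (1/i) * eta_i with eta_i i.i.d. +-1: the conditional variances
# sum to pi^2/6 < infinity, so S_n converges a.s. by Theorem 1.17.
rng = np.random.default_rng(7)
n = 1_000_000
eta = rng.choice([-1.0, 1.0], size=n)
x = 1.0 / np.arange(1, n + 1)
S = np.cumsum(x * eta)
print(S[[999, 99_999, 999_999]])   # partial sums settle down to a limit
```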

Theorem 1.18 (Three Series Theorem)
Let Xn = Σ_{i=1}^n εi be Fn–adaptive and C a positive constant. Then Xn converges a.s. on the event where

(i) Σ_{i=1}^∞ P[|εi| > C|Fi−1] < ∞,

(ii) Σ_{i=1}^n E(εi I[|εi|≤C]|Fi−1) converges, and

(iii) Σ_{i=1}^∞ {E(εi² I[|εi|≤C]|Fi−1) − E²(εi I[|εi|≤C]|Fi−1)} < ∞.

Remark: When the εi are independent, (i), (ii) and (iii) are also necessary for Xn to be an a.s. convergent series.
proof:

Xn = Σ_{i=1}^n εi I[|εi|>C] + Σ_{i=1}^n {εi I[|εi|≤C] − E(εi I[|εi|≤C]|Fi−1)} + Σ_{i=1}^n E(εi I[|εi|≤C]|Fi−1)
 = I1n + I2n + I3n.

Let Ω0 = {(i), (ii) and (iii) hold}. By (i) and the conditional Borel–Cantelli lemma,

Σ_{i=1}^∞ I[|εi|>C] < ∞ a.s. on Ω0.

Hence I[|εi|>C] = 0 eventually on Ω0, so I1n converges a.s. on Ω0. The convergence of I2n on Ω0 follows from (iii) and Doob's theorem: with Zi = εi I[|εi|≤C] − E(εi I[|εi|≤C]|Fi−1),

E(Zi²|Fi−1) = E(εi² I[|εi|≤C]|Fi−1) − E²(εi I[|εi|≤C]|Fi−1).

The convergence of I3n follows from (ii).

Counterexample (the conditions are not necessary without independence): Let {Xn} be a sequence of independent random variables s.t.

P[Xn = 1/√n] = P[Xn = −1/√n] = 1/2.

Let Fn = σ(X1, · · · , Xn), ε1 = X1 and εn = Xn − Xn−1 for n ≥ 2. Then (i) Σ εi = Xn → 0 a.s., since |Xn| = 1/√n a.s. (ii) Let C = 2; then I[|εi|≤2] = 1, since |εn| ≤ 2, and

Σ_{i=1}^n E(εi|Fi−1) = Σ_{i=2}^n {E(Xi) − Xi−1} = −Σ_{i=2}^n Xi−1.

But Σ_{i=2}^∞ Var(Xi−1) = Σ_{i=2}^∞ EXi−1² = Σ_{i=2}^∞ 1/(i−1) = ∞, so Σ Xi−1 diverges a.s., i.e. condition (ii) fails even though Σ εi converges.

Theorem 1.19 (Chow)
Let {Xn = Σ_{i=1}^n εi, Fn} be a martingale and 1 ≤ p ≤ 2. Then Xn converges a.s. on {Σ_{i=1}^∞ E(|εi|^p|Fi−1) < ∞}.
proof: Let C > 0 and verify the three series conditions:
(i) P[|εi| > C|Fi−1] ≤ E(|εi|^p|Fi−1)/C^p.
(ii) Since E(εi|Fi−1) = 0,

Σ_{i=2}^∞ |E(εi I[|εi|≤C]|Fi−1)| = Σ_{i=2}^∞ |E(εi I[|εi|>C]|Fi−1)|
 ≤ Σ_{i=2}^∞ E(|εi| I[|εi|>C]|Fi−1) ≤ Σ_{i=2}^∞ E(|εi|^p|Fi−1)/C^{p−1}.

(iii) E{εi² I[|εi|≤C]|Fi−1} ≤ E{|εi|^p C^{2−p}|Fi−1} ≤ C^{2−p} E{|εi|^p|Fi−1}.

New proof: Let τ = inf{n : Σ_{i=1}^{n+1} E(|εi|^p|Fi−1) > K}, 1 < p ≤ 2. Then

E|Xτ∧n|^p = E|Σ_{i=1}^n I[τ≥i] εi|^p
 ≤ Cp E(Σ_{i=1}^n I[τ≥i] εi²)^{p/2}   (Burkholder)
 ≤ Cp E{Σ_{i=1}^n I[τ≥i] |εi|^p}      (since p/2 ≤ 1)
 = Cp E{Σ_{i=1}^{n∧τ} E(|εi|^p|Fi−1)} ≤ K Cp.

When p = 1,

E|Xτ∧n| ≤ E Σ_{i=1}^n I[τ≥i] |εi| = E Σ_{i=1}^{n∧τ} E(|εi||Fi−1) ≤ K.

Corollary. Let {εn, Fn} be a sequence of martingale differences and 1 ≤ p ≤ 2. Let Xn be Fn−1–measurable. Then Σ_{i=1}^n Xi εi converges a.s. on {Σ_{i=1}^∞ |Xi|^p E(|εi|^p|Fi−1) < ∞}.
Remark: We do not assume that Xi is integrable.
Proof: For any r.v. Z and α > 0 we can find a so that P[|Z| > a] ≤ α; hence we can find constants ai so that Σ_{i=1}^∞ P[|Xi| > ai] < ∞ (take αi = 1/i²). Then P[|Xn| > an i.o.] = 0, so we can replace Xi by X̃i = Xi I[|Xi|≤ai]. In this case Σ_{i=1}^n X̃i εi is a martingale and E(|X̃i εi|^p|Fi−1) = |X̃i|^p E(|εi|^p|Fi−1). The corollary then follows from Chow's result.
Remark: If sup_n E(|εn|^p|Fn−1) < ∞, then Σ_{i=1}^n Xi εi converges a.s. on {Σ_{i=1}^∞ |Xi|^p < ∞}.

Application: yi = βxi + εi, εi i.i.d. with Eεi = 0, Var(εi) = σ², and xi Fi−1 = σ(ε1, · · · , εi−1)–measurable. Then

β̂n = β + Σ_{i=1}^n xi εi / Σ_{i=1}^n xi²  converges a.s. to  β + Σ_{i=1}^∞ xi εi / Σ_{i=1}^∞ xi²  on {Σ_{i=1}^∞ xi² < ∞}.
Chow's Theorem:

Σ_{i=1}^n εi converges a.s. on {Σ_{i=1}^∞ E(|εi|^p|Fi−1) < ∞}, where 1 ≤ p ≤ 2.

Special case: if sup_i E(|εi|²|Fi−1) < ∞, then

Σ_{i=1}^n xi εi converges a.s. on {Σ_{i=1}^∞ xi² < ∞}.

Corollary: If un is Fn−1–measurable, then

Σ_{i=1}^n εi = o(un) a.s. on the set {un ↑ ∞, Σ_{i=1}^∞ |ui|^{−p} E{|εi|^p|Fi−1} < ∞}.

pf: Take xi = 1/ui. Then Σ_{i=1}^∞ εi/ui converges a.s. by the previous corollary, and in view of Kronecker's Lemma,

(Σ_{i=1}^n εi)/un → 0 when un ↑ ∞.

Corollary: Let f : [0, ∞) → (0, ∞) be an increasing function s.t. ∫_0^∞ f^{−2}(t)dt < ∞. Let s²n = Σ_{i=1}^n E(εi²|Fi−1), which is Fn−1–measurable. Then

Σ_{i=1}^n εi/f(s²i) converges a.s., and Σ_{i=1}^n εi = o(f(s²n)) a.s.

on {s²n → ∞}, where lim_{t→∞} f(t) = ∞.

pf:

Σ_{i=1}^∞ E[(εi/f(s²i))²|Fi−1] = Σ_{i=1}^∞ E(εi²|Fi−1)/f²(s²i)
 = Σ_{i=1}^∞ (s²i − s²i−1)/f²(s²i) ≤ Σ_{i=1}^∞ ∫_{s²i−1}^{s²i} dt/f²(t)
 ≤ ∫_0^∞ dt/f²(t) < ∞.

Remark: Typical choices are

f(t) = t^{1/2}(log t)^{(1+δ)/2} for δ > 0, t ≥ 2 (and f(t) = f(2) otherwise), or f(t) = t.

Applying this with εi replaced by xi εi, we have

s²∞ = Σ_{i=1}^∞ xi² E(εi²|Fi−1),
Σ_{i=1}^n xi εi = o(Σ_{i=1}^n xi² E(εi²|Fi−1)) a.s. on {Σ_{i=1}^∞ xi² E(εi²|Fi−1) = ∞}.

If we assume sup_i E(εi²|Fi−1) < ∞, then

Σ_{i=1}^n xi εi = o(Σ_{i=1}^n xi²) on {Σ_{i=1}^∞ xi² = ∞}.

In summary, under the assumption sup_i E(εi²|Fi−1) < ∞,

Σ_{i=1}^n xi εi = O(1) on {Σ_{i=1}^∞ xi² < ∞},  and  Σ_{i=1}^n xi εi = o(Σ_{i=1}^n xi²) on {Σ_{i=1}^∞ xi² = ∞}.
Example: yi = βxi + εi, where {εi, Fi} is a martingale difference sequence s.t. sup_n E(εn²|Fn−1) < ∞ a.s. and xi is Fi−1–measurable. Then

β̂n = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi² = β + Σ_{i=1}^n xi εi / Σ_{i=1}^n xi²

converges a.s., and the limit is β on {Σ_{i=1}^∞ xi² = ∞}.
pf: On {Σ_{i=1}^∞ xi² < ∞}, Σ_{i=1}^n xi εi converges, so

β̂n → β + Σ_{i=1}^∞ xi εi / Σ_{i=1}^∞ xi².

On {Σ_{i=1}^∞ xi² = ∞},

Σ_{i=1}^n xi εi / Σ_{i=1}^n xi² → 0 as n → ∞,

so β̂n → β.

Application (control): yi = βxi + εi (β ≠ 0), where the εi are i.i.d. with E(εi) = 0 and Var(εi) = σ².
Goal: design xi, depending on the previous observations, so that yn ≈ y* ≠ 0.
Strategy: choose x1 arbitrary and set

xn+1 = y*/β̂n.

Question: does xn → y*/β a.s.? Equivalently, does β̂n → β a.s.?

By the previous result, β̂n always converges. Then xn+1² = (y*)²/β̂n² is bounded away from zero, so Σ_{n=1}^∞ xn+1² = ∞ a.s. Therefore β̂n → β a.s.
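A minimal simulation of this certainty-equivalence control loop (β, σ, y* and x1 are illustrative choices; the sketch ignores the negligible risk of β̂n being exactly 0):

```python
import numpy as np

# Certainty-equivalence loop x_{n+1} = y*/bhat_n for y = beta*x + eps.
rng = np.random.default_rng(4)
beta, sigma, y_star, n = 2.0, 1.0, 3.0, 5000

sxy = sxx = 0.0
x = 1.0                                   # arbitrary x_1
for _ in range(n):
    y = beta * x + sigma * rng.normal()   # observe y_i
    sxy += x * y
    sxx += x * x
    bhat = sxy / sxx                      # least squares estimate
    x = y_star / bhat                     # next design point

print(bhat, beta)          # bhat -> beta
print(x, y_star / beta)    # x_n -> y*/beta
```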

Open Question: Is there a corresponding result for

yi = α + βxi + εi  or  yi = αyi−1 + βxi + εi ?

Open Questions: Assume that Σ_{i=1}^∞ |xi|^p < ∞ a.s. and sup_n E(|εn|^p|Fn−1) < ∞ a.s. for some 1 ≤ p ≤ 2. What are the distributional properties of S = Σ_{i=1}^∞ xi εi ?

If the xi are constants with xi ≠ 0 i.o., p = 2, and lim inf_{n→∞} E(|εn| |Fn−1) > 0 a.s., then S has a continuous distribution.
Almost Supermartingales
Theorem (Robbins and Siegmund)
Let {Fn} be a sequence of increasing σ–fields and xn, βn, yn, zn nonnegative Fn–measurable random variables s.t.

E(xn+1|Fn) ≤ xn(1 + βn) + yn − zn a.s.

Then on {Σ_{i=1}^∞ βi < ∞, Σ_{i=1}^∞ yi < ∞},

xn converges and Σ_{i=1}^∞ zi < ∞ a.s.

pf: 1° Reduction to the case βn = 0 ∀ n. Set

xn′ = xn ∏_{i=1}^{n−1} (1 + βi)^{−1},  yn′ = yn ∏_{i=1}^n (1 + βi)^{−1},  zn′ = zn ∏_{i=1}^n (1 + βi)^{−1}.

Then

E(xn+1′|Fn) = E(xn+1|Fn) ∏_{i=1}^n (1 + βi)^{−1}
 ≤ [xn(1 + βn) + yn − zn] ∏_{i=1}^n (1 + βi)^{−1}
 = xn′ + yn′ − zn′.

On {Σ_{i=1}^∞ βi < ∞}, ∏_{i=1}^n (1 + βi)^{−1} converges to a nonzero limit. Therefore

(i) Σ yi < ∞ ⇐⇒ Σ yi′ < ∞,
(ii) xn converges ⇐⇒ xn′ converges,
(iii) Σ zi < ∞ ⇐⇒ Σ zi′ < ∞.

2° Assume that βn = 0 ∀ n, so E(xn+1|Fn) ≤ xn + yn − zn. Let

un = xn − Σ_{i=1}^{n−1} (yi − zi) = xn + Σ_{i=1}^{n−1} zi − Σ_{i=1}^{n−1} yi.

Then

E(un+1|Fn) = E(xn+1|Fn) − Σ_{i=1}^n (yi − zi)
 ≤ xn + yn − zn − Σ_{i=1}^n (yi − zi)
 = xn − Σ_{i=1}^{n−1} (yi − zi) = un,

so {un} is a supermartingale. Given a > 0, define

τ = inf{n : Σ_{i=1}^n yi > a}.

Observe that [τ = ∞] = [Σ_{i=1}^∞ yi ≤ a], and uτ∧n is also a supermartingale with

uτ∧n ≥ −Σ_{i=1}^{τ∧n−1} yi ≥ −a ∀ n

(since xn, zi ≥ 0), so uτ∧n converges a.s. Consequently un = uτ∧n converges on [τ = ∞] = {Σ_{i=1}^∞ yi ≤ a}. Since a is arbitrary, un converges a.s. on {Σ_{i=1}^∞ yi < ∞}.
So xn + Σ_{i=1}^{n−1} zi converges a.s. on {Σ_{i=1}^∞ yi < ∞}; since the zi are nonnegative, Σ_{i=1}^n zi converges, and so does xn.
Example (stochastic approximation; finding the root):
Assume y = α + βx + ε where β > 0. Given y*, we want to find x* with α + βx* = y*.
Method: choose x1 arbitrary and set

xn+1 = xn + an(y* − yn),  an > 0,

where an is the control step and (y* − yn) the control direction (=⇒ Stochastic Approximation).
Question: does xn → x* ?

(xn+1 − x*) = (xn − x*) + an(α + βx* − α − βxn − εn) = (xn − x*)(1 − anβ) − anεn.

Here xn+1 is Fn–measurable, with Fn = σ(x0, ε1, · · · , εn), and assuming the εi are i.i.d. with Eεi = 0, Var(εi) = σ²,

E((xn+1 − x*)²|Fn−1) = (xn − x*)²(1 − anβ)² + an²σ².

Apply Robbins–Siegmund with

Xn = (xn+1 − x*)²,  Zn−1 = 2anβ(xn − x*)² = 2βanXn−1,  Yn−1 = an²σ²,  βn−1 = an²β²:
E(Xn|Fn−1) ≤ Xn−1(1 + βn−1) + Yn−1 − Zn−1.

Condition (1): Σ an² < ∞. Then Xn converges a.s. and Σ Zi < ∞ a.s.
Condition (2): Σ an = ∞. Since Xn converges to some X and Σ Zi = 2β Σ ai+1 Xi < ∞, we get X = 0 a.s., i.e. xn → x* a.s.
Remark: Assume Σ ai < ∞. Then

(xn+1 − x*) = (xn − x*)(1 − anβ) − anεn
 = ∏_{j=1}^n (1 − ajβ)(x1 − x*) − Σ_{j=1}^n [∏_{ℓ=j+1}^n (1 − aℓβ)] aj εj
 = ∏_{ℓ=1}^n (1 − aℓβ) {(x1 − x*) − Σ_{j=1}^n [∏_{ℓ=1}^j (1 − aℓβ)]^{−1} aj εj}.

When Σ aj < ∞ (and ajβ < 1), Cn = ∏_{j=1}^n (1 − ajβ) converges to some C > 0, so that

xn − x* → C [(x1 − x*) − Σ_{j=1}^∞ Cj^{−1} aj εj].

Note that Σ_{j=1}^∞ (Cj^{−1} aj)² < ∞ and Cj^{−1} aj > 0 ∀ j, so Σ_{j=1}^∞ Cj^{−1} aj εj has a continuous distribution. This implies that, when x1 is a constant,

P[(x1 − x*) − Σ_{j=1}^∞ Cj^{−1} aj εj = 0] = 0,

i.e. xn fails to converge to x* with probability one.
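A short sketch contrasting the two gain regimes: an = 1/n (Σan = ∞, so xn → x*) versus an = 1/n² (Σan < ∞, so xn stalls at a random limit), as the analysis above predicts; the parameter values are illustrative:

```python
import numpy as np

# Robbins-Monro recursion x_{n+1} = x_n + a_n (y* - y_n), y = alpha + beta*x + eps.
rng = np.random.default_rng(5)
alpha, beta, y_star = 1.0, 2.0, 5.0
x_star = (y_star - alpha) / beta          # = 2.0
n = 100_000

for gain in (lambda k: 1.0 / k, lambda k: 1.0 / k**2):
    x = 0.0
    for k in range(1, n + 1):
        y = alpha + beta * x + rng.normal()
        x += gain(k) * (y_star - y)
    print(x)   # ~ 2.0 for 1/k; noticeably off x* for 1/k^2
```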

Central Limit Theorems (CLT)

Reference: I. S. Helland (1982). Central limit theorems for martingales with discrete or continuous time. Scand. J. Statist. 9, 79–94.

Classical CLT:
Assume that ∀ n, the Xn,i, 1 ≤ i ≤ kn, are independent with EXn,i = 0. Let s²n = Σ_{i=1}^{kn} E(Xn,i²).

Thm. If ∀ ε > 0,

Σ_{i=1}^{kn} (1/s²n) E(Xn,i² I[|Xn,i|>snε]) → 0,

then Σ_{i=1}^{kn} Xn,i/sn →D N(0, 1).

* Reformulation: with X̃n,i = Xn,i/sn,

(i) Σ_{i=1}^{kn} E(X̃n,i²) = 1,
(ii) Σ_{i=1}^{kn} E[X̃n,i² I[|X̃n,i|>ε]] → 0 ∀ ε.

(ii) is Lindeberg's condition.

* Uniform negligibility (how to formulate it mathematically?):

max_{1≤i≤kn} |Xn,i| →D 0, together with control of the Xn,i².

* Condition on the variance. Recall Burkholder's inequality: ∀ 1 < p < ∞,

Cp′ E(Σ_{i=1}^n di²)^{p/2} ≤ E|Sn|^p ≤ Cp E(Σ_{i=1}^n di²)^{p/2},

i.e. EZ^p with Z = (Σ di²)^{1/2} controls E|Sn|^p. This suggests formalizing the variance condition through either

Σ_{i=1}^{kn} Xn,i²   (optional quadratic variation)   or   Σ_{i=1}^{kn} E(Xn,i²|Fn,i−1)   (predictable quadratic variation),

and, for 1 ≤ j ≤ kn, their partial sums Σ_{i=1}^j Xn,i² and Σ_{i=1}^j E(Xn,i²|Fn,i−1).
Thm. ∀ n ≥ 1, {Fn,j ; 1 ≤ j ≤ kn < ∞} is a sequence of increasing σ–fields. Let {Sn,j = Σ_{i=1}^j Xn,i, 1 ≤ j ≤ kn} be {Fn,j}–adaptive. Define

Xn* = max_{1≤i≤kn} |Xn,i|,  Un,j² = Σ_{i=1}^j Xn,i², 1 ≤ j ≤ kn.

Assume that

(i) Un² = Un,kn² = Σ_{i=1}^{kn} Xn,i² →D C0, where C0 > 0 is a constant;
(ii) Xn* →D 0;
(iii) sup_{n≥1} E(Xn*)² < ∞;
(iv) Σ_{j=1}^{kn} E{Xn,j|Fn,j−1} →D 0 and Σ_{j=1}^{kn} E²{Xn,j|Fn,j−1} →D 0.

Then

Sn = Sn,kn = Σ_{i=1}^{kn} Xn,i →D N(0, C0).

Remark: {Xn,j, 1 ≤ j ≤ kn} can be defined on a different probability space for each n.
Step 1. Reduce the problem to the case where {Sn,j, Fn,j, 1 ≤ j ≤ kn} is a martingale. Set

X̃n,j = Xn,j − E(Xn,j|Fn,j−1), 1 ≤ j ≤ kn, with Fn,0 the trivial field,
Ũn² = Σ_{j=1}^{kn} X̃n,j²,  X̃n* = max_{1≤j≤kn} |X̃n,j|,  S̃n = Σ_{j=1}^{kn} X̃n,j.

(a) Sn − S̃n = Σ_{j=1}^{kn} E(Xn,j|Fn,j−1) →D 0 by (iv).

(b) X̃n* ≤ max_{1≤j≤kn} |Xn,j| + max_{1≤j≤kn} |E(Xn,j|Fn,j−1)|
 ≤ Xn* + {Σ_{j=1}^{kn} E²(Xn,j|Fn,j−1)}^{1/2},

so X̃n* →D 0 by (ii) and (iv). Moreover

(X̃n*)² ≤ 2(Xn*)² + 2 max_{1≤j≤kn} E²(Xn,j|Fn,j−1) ≤ 2(Xn*)² + 2 max_{1≤j≤kn} E²(Xn*|Fn,j−1),

since |E(Xn,j|Fn,j−1)| ≤ E(|Xn,j| |Fn,j−1) ≤ E(Xn*|Fn,j−1). Now Vj = E(Xn*|Fn,j), 1 ≤ j ≤ kn, is a martingale, and

E(sup_{1≤j≤kn} Vj²) ≤ 4E(Xn*)²

by Doob's inequality (for ∞ > p > 1, ‖sup_{1≤j≤n} |Xj|‖p ≤ q‖Xn‖p). So

E(X̃n*)² ≤ 2E(Xn*)² + 2 × 4E(Xn*)² = 10E(Xn*)² < ∞.

(c) Ũn² − Un² = Σ_{j=1}^{kn} E²(Xn,j|Fn,j−1) − 2Σ_{j=1}^{kn} Xn,j E(Xn,j|Fn,j−1) →D 0,

because Σ_{j=1}^{kn} E²(Xn,j|Fn,j−1) →D 0 by (iv), and by Cauchy–Schwarz,

|Σ_{j=1}^{kn} Xn,j E(Xn,j|Fn,j−1)| ≤ (Σ_{j=1}^{kn} Xn,j²)^{1/2} (Σ_{j=1}^{kn} E²(Xn,j|Fn,j−1))^{1/2} →D 0,

since (Σ_{j=1}^{kn} Xn,j²)^{1/2} = (Un²)^{1/2} →D C0^{1/2} and (Σ_{j=1}^{kn} E²(Xn,j|Fn,j−1))^{1/2} →D 0.

So Ũn² →D C0.

Thm. ∀ n ≥ 1, {Fn,j, 1 ≤ j ≤ kn < ∞} is a sequence of increasing σ–fields. Let {Sn,j = Σ_{i=1}^j Xn,i, 1 ≤ j ≤ kn} be an {Fn,j}–martingale. Define Xn* = max_{1≤i≤kn} |Xn,i| and Un,j² = Σ_{i=1}^j Xn,i², 1 ≤ j ≤ kn. Assume that

(i) Un² = Un,kn² = Σ_{i=1}^{kn} Xn,i² →D C0, where C0 > 0 is a constant;
(ii) Xn* →D 0;
(iii) sup_{n≥1} E(Xn*)² < ∞.

Then

Sn = Σ_{i=1}^{kn} Xn,i →D N(0, C0).
Step 2. Further reduction. Define

τ = inf{i : 1 ≤ i ≤ kn, Un,i² > C} when Un² > C;  τ = kn when Un² ≤ C,

where C > C0. Define X̂n,j = Xn,j I[τ≥j],

Ŝn = Σ_{j=1}^{kn} X̂n,j = Σ_{j=1}^τ Xn,j,  Ûn,j² = Σ_{i=1}^j X̂n,i²,
X̂n* = max_{1≤j≤kn} |X̂n,j|,  Ûn² = Ûn,kn² = Σ_{j=1}^τ Xn,j².

Then

P(Sn ≠ Ŝn) ≤ P(Un² > C) → 0,

so it is sufficient to show that Ŝn →D N(0, C0).

If C ≥ Un², then Ûn² = Un². If C < Un², then τ ≤ kn and

C < Ûn² = Σ_{j=1}^{τ−1} Xn,j² + Xn,τ² ≤ C + (Xn*)².

So Un² ∧ C ≤ Ûn² ≤ (Un² ∧ C) + (Xn*)², where Un² ∧ C →D C0 ∧ C = C0 and (Xn*)² →D 0; hence Ûn² →D C0.
Clearly X̂n* ≤ Xn*, so X̂n* →D 0 by (ii), and sup_{n≥1} E(X̂n*)² ≤ sup_{n≥1} E(Xn*)² < ∞.

Step 3. E e^{iŜn} → e^{−C0/2}.

Claim: this suffices to show Sn →D N(0, C0).
Reason: Step 3 gives E e^{iSn} → e^{−C0/2}; replacing Sn by tSn and using Step 3 again, E e^{itSn} → e^{−t²C0/2}.

(a) Expansion:

e^{ix} = (1 + ix) e^{−x²/2 + r(x)}, where |r(x)| ≤ |x|³ for |x| < 1.

Indeed, for |x| < 1,

ix = log(1 + ix) − x²/2 + r(x)
⇒ r(x) = x²/2 + ix − log(1 + ix) = x²/2 + ix − Σ_{j=1}^∞ (−1)^{j+1} (ix)^j/j
 = Σ_{j=3}^∞ (−1)^j (ix)^j/j = −(ix)³/3 + (ix)⁴/4 − · · ·
 = x⁴ a(x) + x³ b(x) i,

where

a(x) = 1/4 − x²/6 + x⁴/8 − · · · < 1/4,  b(x) = 1/3 − x²/5 + x⁴/7 − · · · < 1/3,

so

|r(x)| = √(x⁸a²(x) + x⁶b²(x)) ≤ √(x⁸/16 + x⁶/9) ≤ |x|³ √(1/16 + 1/9) ≤ |x|³.

Now

e^{iŜn} = ∏_{j=1}^{kn} e^{iX̂n,j} = [∏_{j=1}^{kn} (1 + iX̂n,j)] e^{−Σ_{j=1}^{kn} X̂n,j²/2 + Σ_{j=1}^{kn} r(X̂n,j)}
 =: Tn e^{−Ûn²/2 + Rn}
 = (Tn − 1)e^{−C0/2} + (Tn − 1)[e^{−Ûn²/2+Rn} − e^{−C0/2}] + e^{−Ûn²/2+Rn}
 = In + IIn + IIIn.
Note that on {X̂n* < 1},

|Rn| ≤ Σ_{j=1}^{kn} |r(X̂n,j)| ≤ Σ_{j=1}^{kn} |X̂n,j|³ ≤ X̂n* Σ_{j=1}^{kn} X̂n,j² = X̂n* Ûn².

So |Rn| ≤ |Rn| I[X̂n*≥1] + X̂n* Ûn², where |Rn| I[X̂n*≥1] →D 0, X̂n* →D 0 and Ûn² →D C0; hence Rn →D 0, and IIIn →D e^{−C0/2}.

Now

E|Tn|² = E ∏_{j=1}^{kn} (1 + X̂n,j²) = E (1 + X̂n,τ²) ∏_{j<τ} (1 + X̂n,j²)
 ≤ E (1 + X̂n*²) e^{Σ_{j=1}^{τ−1} X̂n,j²} ≤ e^C E(1 + X̂n*²) < ∞,

uniformly in n, so {Tn} is u.i. (⇒ convergence in distribution implies convergence in expectation). Also

|IIn| = |Tn − 1| |IIIn − e^{−C0/2}| →D 0,

since |Tn − 1| = Op(1) and IIIn − e^{−C0/2} →D 0.
E(In) = e^{−C0/2}[E(Tn) − 1] = 0, since

E(Tn) = E ∏_{j=1}^{kn} (1 + iX̂n,j)
 = E{[∏_{j=1}^{kn−1} (1 + iX̂n,j)] E(1 + iX̂n,kn|Fn,kn−1)}
 = E ∏_{j=1}^{kn−1} (1 + iX̂n,j) = · · · = E{1 + iX̂n,1} = 1,

using E(X̂n,j|Fn,j−1) = I[τ≥j] E(Xn,j|Fn,j−1) = 0 (note [τ ≥ j] ∈ Fn,j−1).

So e^{iŜn} = In + IIn + IIIn, with E(In) = 0, IIn →D 0, IIIn →D e^{−C0/2}, and In = (Tn − 1)e^{−C0/2} u.i.; hence

e^{iŜn} − In = IIn + IIIn →D e^{−C0/2},

and e^{iŜn} − In is u.i. Therefore

E(e^{iŜn}) = E(e^{iŜn} − In) → E(e^{−C0/2}) = e^{−C0/2} as n → ∞.

Note (summary): ∀ n, {Sn,j = Σ_{i=1}^j Xn,i, Fn,j} a martingale with

(i) Un² = Σ_{i=1}^{kn} Xn,i² →D C > 0,
(ii) max_{1≤i≤kn} |Xn,i| →D 0,
(iii) sup_n E(max_{1≤i≤kn} |Xn,i|)² < ∞

implies Sn = Σ_{i=1}^{kn} Xn,i →D N(0, C).
Lemma 1. Assume that F0 ⊂ F1 ⊂ · · · ⊂ Fn. Then ∀ ε > 0,

P(∪_{i=1}^n Ai) ≤ ε + P{Σ_{j=1}^n P(Aj|Fj−1) > ε}.

pf: Let µk = Σ_{j=1}^k P(Aj|Fj−1); then µk is Fk−1–measurable, so

P((∪_{i=1}^n Ai) ∩ [µn ≤ ε]) ≤ Σ_{i=1}^n P(Ai ∩ [µn ≤ ε]) ≤ Σ_{i=1}^n P(Ai ∩ [µi ≤ ε])
 = Σ_{i=1}^n E[E(I_{Ai} I[µi≤ε]|Fi−1)] = Σ_{i=1}^n E[E(I_{Ai}|Fi−1) I[µi≤ε]] ≤ ε,

and P(∪_{i=1}^n Ai) ≤ P((∪_{i=1}^n Ai) ∩ [µn ≤ ε]) + P(µn > ε).

Lemma: For Zj ≥ 0 and µj = Σ_{i=1}^j E(Zi|Fi−1),

E Σ_{i=1}^n Zi I[µi≤ε] = E Σ_{i=1}^n E(Zi|Fi−1) I[µi≤ε] ≤ ε.

pf: Set τ = max{j : 1 ≤ j ≤ n, µj ≤ ε}. Then, since µ1 ≤ µ2 ≤ · · · ≤ µτ,

Σ_{i=1}^n E(Zi|Fi−1) I[µi≤ε] = Σ_{i=1}^τ E(Zi|Fi−1) = µτ ≤ ε.
Corollary. Assume that Yn,j ≥ 0 a.s. and Fn,1 ⊂ · · · ⊂ Fn,kn. Then

Σ_{j=1}^{kn} P(Yn,j > ε|Fn,j−1) →D 0 ∀ ε  ⇒  max_{1≤j≤kn} Yn,j →D 0.

Remark: Σ_{j=1}^{kn} E[Yn,j I[Yn,j>ε]|Fn,j−1] →D 0 is sufficient.
pf: Let Yn* = max_{1≤j≤kn} Yn,j. Then, by Lemma 1, ∀ η > 0,

P(Yn* > ε) = P[∪_{j=1}^{kn} (Yn,j > ε)] ≤ η + P[Σ_{j=1}^{kn} P([Yn,j > ε]|Fn,j−1) > η],

so lim sup_{n→∞} P[Yn* > ε] ≤ η; let η → 0.

Lemma 2. ∀ n, {Yn,j} is {Fn,j}–adaptive. Assume that Yn,j ≥ 0 a.s. and E(Yn,j) < ∞. Let

Un,j = Σ_{i=1}^j Yn,i,  Vn,j = Σ_{i=1}^j E(Yn,i|Fn,i−1),  Un = Un,kn,  Vn = Vn,kn.

If

Σ_{j=1}^{kn} E{Yn,j I[Yn,j>ε]|Fn,j−1} →D 0 ∀ ε > 0

and {Vn} is tight (i.e. lim_{λ→∞} sup_n P(Vn > λ) = 0), then

max_{1≤j≤kn} |Un,j − Vn,j| →D 0.

pf: By the previous corollary, Yn* →D 0. Let Yn,j′ = Yn,j I[Yn,j≤δ, Vn,j≤λ].

Define Un,j′, Vn,j′, Un′, Vn′ similarly. Then

P(max_{1≤j≤kn} |Un,j − Vn,j| > 3γ)
 ≤ P(max_{1≤j≤kn} |Un,j − Un,j′| > γ) + P(max_{1≤j≤kn} |Un,j′ − Vn,j′| > γ) + P(max_{1≤j≤kn} |Vn,j′ − Vn,j| > γ)
 =: In + IIn + IIIn.

(1) In ≤ P[∃ j s.t. Yn,j > δ or Vn,j > λ] ≤ P[Yn* > δ] + P[Vn > λ].

(2) Since Un,j′ − Vn,j′ is a martingale in j, Doob's inequality gives

IIn ≤ (1/γ²) E[max_{1≤j≤kn} (Un,j′ − Vn,j′)²] ≤ (4/γ²) E(Un′ − Vn′)²
 = (4/γ²) Σ_{j=1}^{kn} [E(Yn,j′²) − E(E²(Yn,j′|Fn,j−1))]
 ≤ (4/γ²) Σ_{j=1}^{kn} E(Yn,j′²)
 ≤ (4δ/γ²) Σ_{j=1}^{kn} E(Yn,j′) = (4δ/γ²) E(Σ_{j=1}^{kn} E(Yn,j′|Fn,j−1))
 ≤ (4δ/γ²) E(Σ_{j=1}^{kn} E[Yn,j I[Vn,j≤λ]|Fn,j−1])
 = (4δ/γ²) E(Σ_{j=1}^{kn} E[Yn,j|Fn,j−1] I[Vn,j≤λ]) ≤ 4δλ/γ².

(3) Note that

max_{1≤j≤kn} |Vn,j − Vn,j′| ≤ Σ_{i=1}^{kn} E(|Yn,i − Yn,i′| |Fn,i−1)
 = Σ_{j=1}^{kn} E(Yn,j I[Yn,j>δ or Vn,j>λ]|Fn,j−1)
 ≤ Σ_{j=1}^{kn} E(Yn,j I[Yn,j>δ]|Fn,j−1) + Σ_{j=1}^{kn} E(Yn,j|Fn,j−1) I[Vn,j>λ]
 ≤ Σ_{j=1}^{kn} E(Yn,j I[Yn,j>δ]|Fn,j−1) + Vn I[Vn>λ],

so

IIIn ≤ P[Σ_{j=1}^{kn} E(Yn,j I[Yn,j>δ]|Fn,j−1) > γ/2] + P[Vn I[Vn>λ] > γ/2]
 ≤ P[Σ_{j=1}^{kn} E(Yn,j I[Yn,j>δ]|Fn,j−1) > γ/2] + P[Vn > λ].

So

lim sup_{n→∞} P(max_{1≤j≤kn} |Un,j − Vn,j| > 3γ) ≤ 2 sup_n P[Vn > λ] + 4δλ/γ².

Let λ → ∞ with δ = 1/λ². The proof is completed.

Thm. ∀ n, {Sn,j = Σ_{i=1}^j Xn,i, Fn,j} is a martingale. If

(i) Vn² = Σ_{i=1}^{kn} E(Xn,i²|Fn,i−1) →D C > 0, and

(ii) Σ_{i=1}^{kn} E(Xn,i² I[Xn,i²>ε]|Fn,i−1) →D 0   (conditional Lindeberg's condition),

then Sn = Σ_{i=1}^{kn} Xn,i →D N(0, C).

pf: Set Yn,j = Xn,j². By (ii) and Lemma 1 (via its corollary), Yn* = max_{1≤j≤kn} Xn,j² →D 0, i.e. max_{1≤j≤kn} |Xn,j| →D 0. By (i), {Vn²} is tight. Therefore, by (ii) and Lemma 2,

Vn² − Un² →D 0, so Un² →D C by (i).

Now define

Xn,j′ = Xn,j I[Σ_{i=1}^j E(Xn,i² I[Xn,i²>ε]|Fn,i−1) ≤ 1],  Sn′ = Σ_{j=1}^{kn} Xn,j′.

Since

P[Sn ≠ Sn′] ≤ P[Σ_{j=1}^{kn} E(Xn,j² I[Xn,j²>ε]|Fn,j−1) > 1] → 0,

it is sufficient to show that Sn′ →D N(0, C). (Note {Sn,j′} is still a martingale, since the truncation indicator is Fn,j−1–measurable.) We verify the conditions of the previous theorem:

(a) max_{1≤j≤kn} |Xn,j′| ≤ Xn* →D 0.

(b) P[Un′² ≠ Un²] ≤ P[Σ_{j=1}^{kn} E(Xn,j² I[Xn,j²>ε]|Fn,j−1) > 1] → 0, so Un′² →D C.

(c) E max_{1≤j≤kn} (Xn,j′)² ≤ E max_{1≤j≤kn} (Xn,j′)² I[(Xn,j′)²≤ε] + E max_{1≤j≤kn} (Xn,j′)² I[(Xn,j′)²>ε]
 ≤ ε + E Σ_{j=1}^{kn} (Xn,j′)² I[(Xn,j′)²>ε]
 = ε + E Σ_{j=1}^{kn} Xn,j² I[Xn,j²>ε] I[Σ_{i=1}^j E(Xn,i² I[Xn,i²>ε]|Fn,i−1) ≤ 1]
 ≤ ε + 1 < ∞,

the last step by the lemma above.

Thm. Let {Sn,i = Σ_{j=1}^i Xn,j, Fn,i, 1 ≤ i ≤ kn} be a martingale s.t.

(i) Σ_{i=1}^{kn} E(Xn,i²|Fn,i−1) →D C > 0, and

(ii) An = Σ_{i=1}^{kn} E(Xn,i² I[Xn,i²>ε]|Fn,i−1) →D 0 ∀ ε.

Then Sn = Σ_{i=1}^{kn} Xn,i →D N(0, C).

Conditional Lyapounov condition:

Bn = Σ_{i=1}^{kn} E(|Xn,i|^{2+δ}|Fn,i−1) →D 0 for some δ > 0.

Lyapounov's condition ⇒ Lindeberg's condition:

Σ_{i=1}^{kn} E(Xn,i² I[Xn,i²>ε]|Fn,i−1) ≤ Σ_{i=1}^{kn} E(|Xn,i|^{2+δ}/(√ε)^δ |Fn,i−1) →D 0.

Moreover, E(An) = Σ_{i=1}^{kn} E(Xn,i² I[Xn,i²>ε]) → 0 and E(Bn) = Σ_{i=1}^{kn} E|Xn,i|^{2+δ} → 0 are each sufficient, since An ≥ 0 and Bn ≥ 0.
Example: yi = βxi + εi, i = 1, 2, · · ·,

β̂n = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi² = β + Σ_{i=1}^n xi εi / Σ_{i=1}^n xi².

Assumptions:
(1) ∃ an > 0 s.t. an ↑ ∞, an/an+1 → 1 and Σ_{i=1}^n xi²/an → 1 a.s.;
(2) εi i.i.d., E(εi) = 0, Var(εi) = σ²;
(3) xi is Fi−1 = σ(x0, ε1, · · · , εi−1)–measurable.

(a) If E|ε1|^{2+δ} < ∞, then √an (β̂n − β) →D N(0, σ²).
(b) If the (xi, εi) are identically distributed with E(x1²) < ∞, and an = n, then √n (β̂n − β) →D N(0, σ²).

Consider Sn = Σ_{i=1}^n xi εi/√an, i.e. Xn,i = xi εi/√an, kn = n.

(1) Σ_{i=1}^{kn} E(Xn,i²|Fn,i−1) = Σ_{i=1}^n (xi²/an) E(εi²) = σ² Σ_{i=1}^n xi²/an → σ² a.s.

(a) Conditional Lyapounov:

Σ_{i=1}^n E(|Xn,i|^{2+δ}|Fn,i−1) = Σ_{i=1}^n |xi/√an|^{2+δ} (E|ε1|^{2+δ})
 ≤ (max_{1≤i≤n} |xi|/√an)^δ (Σ_{i=1}^n xi²/an) E|ε1|^{2+δ} → 0 a.s.,

since

xn²/an = Σ_{i=1}^n xi²/an − (an−1/an)(Σ_{i=1}^{n−1} xi²/an−1) → 0 a.s.
⇒ max_{1≤i≤n} xi²/an → 0 a.s.

(b) Unconditional Lindeberg:

E(Σ_{i=1}^n (xi²εi²/n) I[xi²εi²/n > δ]) = (1/n) Σ_{i=1}^n E(x1²ε1² I[x1²ε1² > nδ])
 = E(x1²ε1² I[x1²ε1² > nδ]) → 0 as n → ∞.

Note that E(x1²ε1²) = E(x1² E(ε1²|F0)) = σ² E(x1²) < ∞.

Lemma: If Z ≥ 0 and E(Z) < ∞, then lim_{n→∞} E(Z I[Z>Cn]) = 0 when Cn → ∞.
pf: 0 ≤ Zn = Z I[Z>Cn] ≤ Z and Zn → 0 a.s.; apply the Lebesgue dominated convergence theorem.
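A Monte Carlo sketch of conclusion (b); the parameter values are illustrative, and the normal design gives E(x1²) = 1, so an = n satisfies assumption (1):

```python
import numpy as np

# sqrt(n)(betahat_n - beta) -> N(0, sigma^2) for y_i = beta*x_i + eps_i with
# (x_i, eps_i) i.i.d. and E(x_1^2) = 1.
rng = np.random.default_rng(6)
beta, sigma, n, reps = 1.5, 2.0, 500, 10_000

x = rng.normal(size=(reps, n))
y = beta * x + sigma * rng.normal(size=(reps, n))

bhat = (x * y).sum(axis=1) / (x * x).sum(axis=1)
z = np.sqrt(n) * (bhat - beta)
print(z.mean(), z.std())      # ~ 0 and ~ sigma = 2.0
```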

Theorem 1 (Unconditional form). Let {Sn,i = Σ_{j=1}^i Xn,j, Fn,i, 1 ≤ i ≤ kn} be a martingale s.t.

(1) Σ_{j=1}^{kn} Xn,j² →D C > 0,
(2) Xn* = max_{1≤i≤kn} |Xn,i| →D 0,
(3) sup_n E(Xn*)² < ∞.

Then Sn = Σ_{i=1}^{kn} Xn,i →D N(0, C).

Theorem 3. (1) + (2) + E(Xn*) → 0 is sufficient. (Note that (2) together with (3) ⇒ {Xn*} is u.i. ⇒ lim_{n→∞} E(Xn*) = 0.)

Theorem 3′. (1) + (2) +

Σ_{j=1}^{kn} |E(Xn,j I[|Xn,j|>1]|Fn,j−1)| →D 0

is sufficient.
j=1

Lemma. Assume that Yn,j ≥ 0 is Fnj -adaptive


 

If E(Yn ) = E max Ynj = 0(1)
1≤j≤kn
kn
D
X
then E(Yn,j I[Yn,j >ε] | Fn,j−1 ) → 0 ∀ ε > 0
j=1

S kn
> ε] = [Yn∗ > ε]

inf {1 ≤ j ≤ kn : Yn,j > ε} on j=1 [Yn,j
pf : Define τn =
kn Otherwise

61
∀δ(> 0 )
kn
X
P E(Yn,j I[Yn,j >ε] | Fn,j−1 ) > δ Fn,j−1 −measurable
j=1
(τ )
Xn

≤ P {τn < kn } + P E(Yn,j I[Yn,j >ε] | Fn,j−1 ) > δ


j=1
(k )
X n

≤ P {Yn∗ > ε} + P I[τn ≥j] E(Yn,j I[Yn,j >ε] | Fn,j−1 ) > δ


j=1
(k )
X n

≤ P {Yn∗ > ε} + P E(Yn,j I[τn ≥j,Yn,j >ε] | Fn,j−1 ) > δ


j=1
kn
!
X
≤ ε−1 E(Yn∗ ) + δ −1 E Yn,j I[τn ≥j,Yn,j >ε]
j=1
kn
!
X
≤ ε−1 E(Yn∗ ) + δ −1 E Yn∗ I[τn ≥j,Yn,j >ε]
j=1
−1
≤ε E(Yn∗ ) +δ −1
E(Yn∗ ) → 0.

Corollary 1. Yn,j ≥ 0 is Fn,j -adaptive


kn
D D
X
If Yn∗ → 0 then P [Yn,j > ε | Fn,j−1 ] → 0, ∀ ε > 0
j=1

pf: Fix ε > 0

Let znj = I[Yn,j >ε] ≥ 0


zn∗ = max I[Yn,j >ε] = I[Yn∗ >ε]
1≤j≤kn
E(zn∗ ) = P [Yn∗ > ε] = 0(1)

62
kn
D
X
Therefore E(zn,j I[zn,j > 1 ] | Fn,j−1 ) → 0
2
j=1
kn
X
= E(I[Yn,j >ε] I[zn,j =1] | Fn,j−1 )
j=1
kn
X
= E(I[Yn,j >ε] | Fn,j−1 )
j=1
kn
X
= P (Yn,j > ε | Fn,j−1 ).
j=1

Corollary 2. Thm 3. is a corollary of Thm 30 .


pf: Let Yn,j =| Xn,j |
Then E(Yn∗ ) = E(Xn∗ ) → 0
kn
D
X
So that E(| Xn,j | I[|Xn,j |>ε] | Fn,j−1 ) → 0.
j=1
Corollary 3.
If (1) Yn,j ≥ 0 is Fn,j -adaptive
(2) | Yn,j |≤ C ∀ n, j
D
(3) Yn∗ → 0
kn
D
X
2
then E(Yn,j 2 >ε] | Fn,j−1 ) → 0
I[Yn,j
j=1
k
X n
2
pf: E(Yn,j 2 >ε] | Fn,j−1 )
I[Yn,j
j=1
√ D
≤ C 2 kj=1
Pn
P [Yn,j > ε | Fn,j−1 ] → 0 by (3) and Corollary 1.
kn
D
X
E(Yn,j I[Yn,j >ε] | Fn,j−1 ) → 0
j=1
kn
D
Pkn X
Vn = j=1 E(Yn,j | Fn,j−1 ) is tight ⇒| Yn,j − Vn |→ 0
j=1

63
pf. of Theorem 30
kn
X
Sn = Xn,i
i=1
Xkn kn
X
= Xn,i I[|Xn,i |≤1] + Xn,i I[|Xn,i |>1]
i=1 i=1

Let X
en,i = Xn,i I[X |≤1]
n,i

Note that
P [Xn,j 6= X
en,j , for some 1 ≤ j ≤ kn ]
≤ P [Xn∗ > 1] → 0 by (2)
D
So that Sn − Sen → 0
kn
D
X
2
and (1) gives Xen,j →C
j=1

en,j − E(X
X̄n,j = X en,j | Fn,j−1 )
kn
X
Sn − S̄n =
e E(Xn,j I[|Xn,j |≤1] | Fn,j−1 )
j=1
kn
X
= − E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) By martingale properties.
j=1
kn
D
X
So that | Sen − S̄n | ≤ | E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |→ 0
j=1

Observe that

|Xen,j |≤ 1 ⇒| X̄n,j |≤ 2
So that sup E(X̄n∗ ) ≤ 2 [(3) is satisfied]
n

64

Xn = max | X
en,j − E(X
en,j | Fn,j−1 ) |
1≤j≤n

≤ max | X
en,j | + max | E(Xn,j I[|X |>1] | Fn,j−1 ) |
n,j
1≤j≤n 1≤j≤n
kn
X
≤ max | Xnj | + | E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |
1≤j≤n
j=1

X kn kn
2 X
2
X n,j − Xn,j
e


j=1 j=1

X kn X kn
= −2 Xen,j E(X en,j | Fn,j−1 ) + E 2 (X
en,j | Fn,j−1 )


j=1 j=1
k kn
X n X
≤ 2 Xn,j E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) + E 2 (Xn,j I[|Xn,j |>1] | Fn,j−1 )
e

j=1 j=1

kn
! 1/2 kn
!1/2
X X
2
≤2 Xen,j E 2 (Xn,j I[|X |>1] | Fn,j−1 )
n,j
j=1 j=1
kn
X
+ E 2 (Xn,j I[|Xn,j |>1] | Fn,j−1 )
j=1

It is sufficient to show
kn
D
X
| E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |2 → 0 (By the assumption ∀ 0 < δ < 1)
j=1
kn
(k )2
n
D
X X
| E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) |≤ | E(Xn,j I[|Xn,j |>1] | Fn,j−1 ) | →0
j=1 j=1

65
Homework: Assume that Xn,j is Fnj -measurable
kn
D
X
2
(1) E(Xn,j 2 >ε] | Fn,j−1 ) → 0
I[Xn,j
j=1
kn
D
X
(2) E(Xn,j | Fn,j−1 ) → 0
j=1
kn
D
X
2
(3) {E(Xn,j | Fn,j−1 ) − E 2 (Xn,j | Fn,j−1 )} → C > 0
j=1
kn
D
X
Then Sn = Xn,j → N (0, C)
j=1

Exponential Inequality:
Theorem 1 (Bennett0 inequality):
Assume that {Xn } is a martingale difference with respect to {Fn } and τ is an {Fn }-
stopping time (with possible value ∞). Let σn2 = E(Xn2 | Fn−1 ) for n ≥ 1. Assume

that ∃ positive constants U and V such that Xn ≤ U a.s. for n ≥ 1 and i=1 σi2 ≤ V
a.s., Then ∀ λ > 0
( τ )  
X 1 2 −1 −1
P Xi ≥ λ ≤ exp − λ V ψ(4λV )
i=1
2

where ψ(λ) = (2/λ2 )[(1 + λ)log(1 + λ) − λ], ψ(0) = 1.


Note:

n ∞

Z
X 1 x2 1 1 − λ2
(i) Xi / n =⇒ √ e− 2 dx ∼ √ e 2.
i=1
2π λ 2π λ

(ii) Prokhorov0 s “arcsinh” inequality:


Its upper bound is
 
1 −1 −1
h = exp − λ(2υ) arcsinh(υλ(2V )
2
where υλV ≈ 0, arcsinh[υλ(2V )−1 ] ∼
−1
= υλ(2V )−1
λ2
   
∼ 1 −1 −1
h = exp − λ(2υ) υλ(2V ) = exp −
2 8V

66
Reference: (i) Annals probability (1985).
Johson, Schechtman, and Zin.
(ii) Journal of theoretical probalility (1989) (Levental).
Corollary:(Bernsteins in equality).
τ  
X 1 2 1
P( Xi ≥ λ) ≤ exp − λ /(V + υλ)
i=1
2 3

proof:
λ
By ψ(λ) ≥ (1 + )−1 , ∀ λ > 0.
3
idea:(i) Note that on (τ = ∞)

X τ
X
since E(Xi2 | Fi−1 ) = σi2 ≤ V a.s.
i=1 i=1
τ
X
Xi coverges a.s. on(τ = ∞).
i=1

(By Chow0 s Theorem).


(ii) We can replace
( τ ) ( τ )
X X
P Xi ≥ λ by P Xi > λ
i=1 i=1
since λ > 0, δ > 0.
( τ )  
X 1 2 −1 −1
P Xi > λ + δ ≤ exp − (λ + δ) V ψ(υ(λ + δ)V
i=1
2

( τ )
X
Let δ ↓ 0. Left = P Xi ≥ λ
i=1
 
1 2 −1 −1
right = exp − λ V ψ(υλV )
2

67
(iii)
τ
X ∞
X n
X
Xi = Xi I[τ ≥i] = lim Xi I[τ ≥i] a.s. (By (i))
n→∞
i=1 i=1 i=1
 
τ
!
X  
P Xi > λ = E I X
 
τ 
Xi > λ]
i=1
 
[
i=1
 
 
≤ E lim inf I  (Fatou0 s Lemma)
 
n
n→∞  
X 


Xi I[τ ≥i] > λ
 
i=1
 
 
≤ lim inf E I X
 
n 
n→∞
Xi I[τ ≥i] > λ}
 
{
i=1

Therefore, it is sufficient to show that


n
!  
X 1 2 −1 −1
P Xi I[τ ≥i] > λ ≤ exp − λ V ψ(υλV ) , ∀ n
i=1
2

(iv) {Xi I[τ ≥i] , Fi } is a martingale difference sequence.


X
since [τ ≥ i] = Ω\ (τ = j)Fi−1 − measurable.
j<i
=⇒ E(Xi I[τ ≥i] | Fi−1 ) = I[τ ≥i] E(Xi | Fi−1 ) = 0

So that,
n
X n
X
Xi2 I[τ ≥i] I[τ ≥i] E(Xi2 | Fi−1 )

E | Fi−1 =
i=1 i=1
τ
X
≤ σi2 ≤ V.
i=1

68
and Xi I[τ ≥i] ≤ υ a.s.
Proof: Let Yi = Xi I[τ ≥i] .

E(etYi | Fi−1 ), t > 0 (etYi ≤ etυ )


∞ j j
!
X t Yi
= E 1 + tYi + | Fi−1
j=2
j!
∞ j
X t E[Yi2 | Fi−1 ] j−2
≤ 1+ υ , Yij = Yi2 Yij−2 ≤ Yi2 υ j−2
j=2
j!
∞ j
X t I[τ ≥i]
= 1+ σi2 υ j−2
j=2
j!

!
X tj υ j
= 1+ I[τ ≥i] σi2
j=2
j!υ 2
2
= 1 + g(t)I[τ ≥i] σi2 ≤ eg(t)I[τ ≥i] σi ,

where

g(t) = (etυ − 1 − tυ)/υ 2 , and


∞ j j ∞
X tυ X (υt)j
= − 1 − tυ = etυ − 1 − tυ
j=2
j! j=0
j!

Claim:
 
j j
X
I[τ ≥i] σi2 
X  
t Yi 
 g(t)
i=1
e i=1 /e

is a supermartingale.

69
proof:
 n
X n
X

 t Yi − g(t) I[τ ≥i] σi2 
E e i=1 i=1 | Fn−1 
 
 

n−1
X n
X
t Yi − g(t) I[τ ≥i] σi2
E etYn | Fn−1
 
=e i=1 i=1

n−1
X n−1
X
t Yi − g(t) I[τ ≥i] σi2
≤e i=1 i=1

n
X n
X
 n
X

t Yi t Yi g(t)V −

I[τ ≥i] σi2 
Ee i=1 ≤ Ee i=1 · e i=1
 n n 
X X
 t Yi − g(t) I[τ ≥i] σi2 
 g(t)V
= E e e
 i=1 i=1
 

n
!
X
≤ eg(t)V since V − I[τ ≥i] σi2 > 0
i=1

( n )
X  Pn 
−λt t Yi
P Yi > λ ≤ e E e i=1

i=1

≤ e−λt · eg(t)V = e−λt+g(t)V , ∀ t > 0,


( n )
X inf (−λt + g(t)V )
P Yi > λ ≤ et>0
i=1

Differentiate h(t) = −λt + g(t)V


we obtain the minmizer to = υ −1 log(1 + υλV −1 )

70
Therefore
( n )
X
P Yi > λ ≤ eh(to)
i=1
 2 
λ −1 −1
= exp − V ψ(υλV ) .
2

Note:
Pn   Pn 
Eet i=1 Yi
= E E et i=1 Yi | Fn−1

Remark:

(i) ψ(0+ ) = 1
(ii) ψ(λ) ∼= 2λ−1 logλ, as λ → ∞.
λ
(iii) ψ(λ) ≥ (1 + )−1 , ∀λ > 0.
3
Reference: Appendix of shorack and wellner (1986, p.852).
∀ λ>0
( τ τ
)  2 
X X
2 λ −1 −1
P Xi > λ, σi ≤ V ≤ exp − V ψ(υλV )
i=1 i=1
2

also holds.
Example:

X
V = σi2 < ∞
( ni=1 ) ( τ )
X X
P Xi > λ, for some n ≤ P Xi > λ
i=1 i=1
( n
)
X
Let τ = inf n: Xi > λ .
i=1

Theorem 2 (Hoeffding0 s inequality):


Let {Xn , Fn } be an adaptive sequence such that ai ≤ Xi ≤ bi , a.s.
and µi = E[Xi | Fi−1 ]

71
Then ∀ λ > 0,
( n n
)
2λ2
X X  
P Xi − µi ≥ λ ≤ exp − Pn 2
i=1 i−1 i=1 (bi − ai )

2n2 λ2
 

or P X̄n − µ̄n ≥ λ ≤ exp − Pn 2
i=1 (bi − ai )

proof: By convexity of etx , (t > 0)

bi − Xi tai Xi − ai tbi
etXi ≤ e + e
b i − ai b i − ai

bi − µi t(ai −µi ) µi − ai t(bi −µi )


E et(Xi −µi ) | Fi−1

≤ e + e
b i − ai b i − ai
= eL(hi )

where L(hi ) = −hi Pi + `n(1 − Pi + Pi ehi )


µ i − ai
hi = t(bi − ai ), Pi =
b i − ai
t(ai −µi )
+ Pi et(bi −µi )
 
L(hi ) = `n 1 − Pi )e
= `n et(ai −µi ) (1 − Pi ) + Pi et(bi −ai )
 

L0 (hi ) = −Pi + Pi / (1 − Pi )e−hi + Pi


 

Pi (1 − Pi )e−hi
L00 (hi ) = = ui (1 − ui )
[(1 − Pi )e−hi + Pi ]2

where 0 ≤ ui = Pi /[(1 − Pi )e−hi + Pi ] ≤ 1


1
L(hi ) = L(0) + L0 (0)hi + L00 (h∗i )h2i
2
1 0 1
≤ L(0) + L (0)hi + h2i
2 8
L(hi ) ≤ h2i /8 ≤ t2 (bi − ai )2 /8

t2 (bi − ai )2
 
t(Xi −µi )
So that E(e ) ≤ exp
8

72
n
X
t (Xi − µi )
Ee i=1

≤ E{E(· · · | Fn−1 )}
n−1
X
t (Xi − µi )
1 2
t (bi −ai )2
≤ e 8 Ee i=1
n
X
1 2
8
t (bi − ai )2
≤ e i=1

( n ) " n
#
X 1 X
So that P (Xi − µi ) > λ ≤ exp −λt + t2 (bi − ai )2
i=1
8 i=1

n
1 X
Leth(t) = −λt + t2 (bi − ai )2
8 i=1

n
X
minimizer t0 = 4λ (bi − ai )2
i=1

 2
  n
4λ 1 4λ  X
h(t0 ) = −λ n +  n
  (bi − ai )2
X 8 X 
(bi − ai )2 (bi − ai )2
  i=1
i=1 i=1
n
X
= −2λ2 (bi − ai )2
i=1

( n ) " n
X #
X
So that P (Xi − µi ) > λ ≤ exp −2λ2 (bi − ai )2
i=1 i=1

Application: yn = βXn + εn , where


Xn is Fn -measurable r.v.s.
εn i.i.d. with common distriburtion F.

73
εn is independent of Fn−1 ⊃ σ(ε1 , · · · , εn )
Eεn = 0, 0 < V ar(εn ) = σ 2 < ∞ .
Question : Test F = Fo (Ho )
Example : AR(1) process
yn = βyn−1 + εn , yo Fo − measurable
n
1X
F̂n (u) = I ,
n i=1 [yi −β̂n xi ≤u]

where β̂n an estimator of β based on {(y1 , x1 ), · · · , (yn , xn )}.


n
1X
idea : F̂n (u) ∼
= Fn (u) = I[ε ≤u] , if β̂n xi ∼
= βxi
n i=1 i
P
sup | Fn (u) − F0 (u) |→ 0
u
√ D o
n sup | Fn (u) − F0 (u) |→ sup |ω (t) |, (Under Ho )
u 0≤t≤1
o o
where ω (t) is the Brownian Bridge which is defined by ω (t) = w(t) − tw(1) and
w(t) is the Brownian Motion
(i) {w(ti ) − w(si )} are independent,
∀ 0 = s0 ≤ t0 ≤ s1 ≤ t1 ≤ · · · ≤ sn ≤ tn
(ii) w(t) − w(s) = N (0, t − s)
(iii) w(0) = 0
If the εn are independent and have a cemmon distribution function F (t). Then for
large n,
Fn (t, w) → F (t).
Glivenko-Cantelli theoren:
sup | Fn (t) − F (t) |→ 0 a.s.
0≤t≤1
n
1X
Fn (t) = I[ε ≤t]
n i=1 i
Basic Theorem:
If εi are i.i.d. U (0, 1).
Then
n
!
1 X  
αn (t) = √ I[εi ≤t] − F (t)
n i=1
D o
→ω (t) in D − space.

74
√ P
Wish : n sup | F̂n (u) − Fn (u) |→ 0 (In general, it is wrong)
u
√ D o
n sup | F̂n (u) − Fn (u) |→ sup |ω (t) |
u 0≤t≤1

Reject if n sup | F̂n (u) − Fn (u) |> Cα
u
Compare:
n
√ 1X P
(i) n sup | F̂n (u) − F (u + (β̂n − β)xi ) − Fn (u) + F (u) |→ 0 (right)
u n i=1
√ P
(ii) n sup | [F̂n (u) − F (u)] − [Fn (u) − F (u)] |→ 0 (It is wrong, in general)
u
n
1X
F̂n (u) = I
n i=1 [yi −β̂n xi ≤u]
n
1X
= I
n i=1 [εi ≤u+(β̂n −β)xi ]
F (c xi + u)
= E(I[εi ≤c xi +u] | Fi−1 )
(If C is constant, we can use the exponential bound).

n(F̂n (u) − F (u))
n
√ 1X
= n(F̂n (u) − F (·) − Fn (u) + F (u)) · · · (1)
n i=1
n
!
√ 1X
+ n F (·) − F (u) · · · (2)
n i=1

+ n(Fn (u) − F (u)) · · · (3)
In fact, tell us:
n
1 X
√ [F (u + (β̂n − β)xi ) − F (u)]
n i=1
n
∼ 1 X 0
=√ F (u)(β̂n − β)xi
n i=1
n
!
1 X
= F 0 (u) √ xi (β̂n − β) does not converge to zero.
n i=1

75
Example:

yi = βxi + εi , xi = 1, β̂n − β = ε̄n


n
!
1 X √ D
√ xi (β̂n − β) = n(ε̄n ) → N (0, 1)
n i=1

wish:(1) → 0p (1)
(2) → 0, and
D
known (3) → Wo (t), 0 ≤ t ≤ 1
Classical result: υ(0, 1) = F
Define: √
αn (t) = n(Fn (t) − t)
Oscillation modulus:

Wn (δ) = sup | αn (t) − αn (u) |


|t−u|≤δ

Lemma:∀ ε > 0, ∀ η > 0, ∃ δ and N 3 n ≥ N, P {Wn (δ) ≥ ε} ≤ η.


Reference:
Billingsley, (1968)
Convergence of probability measures. (Book).
Papers:
(i) W. Stute (1982, 1984). Ann. Prob. p.86-107, p.361-379.
(ii) The Oscillation behavior of empirical process.: The Multivariate case.
Key idea:
• If (β̂n − β) ∼
= C and u fixed. Then
n
1 X 
√ I(εi ≤Cxi +u) − F (Cxi + u) − I[εi ≤u] + F (u)
n i=1

n
X
By Yi , (Yi | Fi−1 ) ∼ b(1, Pi ) and exponential bound. Pi ∈ Fi−1 -measurable.
i=1
0
• Lemma: If k F∞ k, Then
n
√ X
n sup | I[εi ≤u+δni ] − F (u + δni ) − I[εi ≤u] + F (u) |
u
i=1
P 1
→ 0, if δn = op ( √ )
n

76
•(β̂n − β) = op (an ) PP
∃ c ∈ Cn lattice points and ∀x ∈ ( : square set) .
1
3 (c − x) sup | xi |= 0( √ )
1≤i≤n n
# (Cn ) ≤ nk .
wish:
n
√ 1X P
n sup | F̂n (u) − F (u + (β̂n − β)xi ) − Fn (u) + F (u) |→ 0
u n i=1

By
n
√ 1X
n sup sup | F̂n (u) − F (u + cxi ) − Fn (u) + F (u) |
u c∈Cn n i=1

∀ ε >0

X 
P n sup | F̂n (u) − · · · |> ε
u
c∈Cn
XX n√ o
≤ P n | F̂n (u) · · · |> ε
u∈Un c∈Cn
nε2
0
  0
≤ nk+k e− 2
t
. if #(Un ) ≤ nk

Question:
n
1 X 
√ I[εi ≤(β̂n −β)xi +u] − F ((β̂n − β)xi + u) − I[εi ≤u] + F (u)
n i=1

•(β̂n − β) = Op (an )
Yi = βXi + εi , εi i.i.d. with distribution ft. F
Xi ∈ Fi−1 -measurable, εi independent of Fi−1 .
n
1 X 
√ I[εi ≤δXi +u] − F (δXi + u) − I[εi ≤u] + F (u)
n i=1
n
1 X
=√ Yi
n i=1

77
Z Z
(a) E[Yi | Fi−1 ] = dF (ε) − F (δXi + u) − dF (ε) + F (u)
[ε≤δXi +u] [ε≤u]
= 0

(b) − 1 ≤ yi ≤ 1

Hoeffding0 s inequality: (not good)


{Yi , Fi } is a martingale difference.

−1 = ai ≤ Yi ≤ bi = 1
 

2n2 t2
 
 
P {Ȳn ≥ t} ≤ exp − n
 .

X
(bi − ai )2
 

( n i=1 ) 2
1 X 2n2 λn
So that P √ Yi ≥ λ ≤ 2e− 2n = 2exp[−λ2 ]
n i=1

— It can0 t reflect the true variance.


Bennett0 s inequality: (better)
τ
X
Yi ≤ υ, E(Yi2 | Fi−1 ) ≤ V
i=1
( τ )  2 
X t −1
P Yi ≥ t ≤ exp − ψ(U tV )
i=1
2V

E(Yi2 | Fi−1 ) = | F (δXi + u) − F (u) || 1 − (· · · ) |


≤ | F (δXi + u) − F (u) |
≤ k F 0 k∞ | δ || Xi |

78
n
X n
X
0
E[Yi2 | Fi−1 ] ≤k F k∞ | δ | | xi |
i=1 i−1
n
X n
X
x i εi x i εi
i=1 i=1 1
β̂n − β = n = ! 12 ! 12
X n n
x2i
X X
x2i x2i
i=1
i=1 i=1
n
x2i ∼
X
= a2n cn
i=1
n
X
n
| xi | n
! 12
1
X i=1
X
(β̂n − β) | xi |≈ Op (1) ! 12 ≤ n 2 x2i
n
i=1 X i=1
x2i
i=1


take V = nc, τ = n, υ = 1
( n
)
1 X
P |√ Yi |> λ
n i=1
 √
( nλ)2 √ √

≤ exp − √ ψ( nλ/ nc)
2 nc
 √ 2  
nλ λ
= exp − ψ
2c c
Law of the iteratived logarithm:
classical: Xn i.i.d., EXn = 0, 0 < V ar(Xn ) = σ 2 < ∞
Sn
lim sup √ = σ a.s.
n→∞ 2nloglogn



D
(a) Zn = Sn nσ ∼ N (0, 1)

p
Sn = Zn 2loglogn

79
(b) if m and n very closeness.
If Zm and Zn are very closeness.
n
!2
X
E Xi r
i=1 n 1 n
E(Zm Zn ) = 2
√ = 2√ == 2 .
σ mn σ mn σ m
n
(c) m
= 1c , c large enough.

n1 = c, n2 = c2 , · · · , nk = ck
Zn1 , Zn2 , · · · , Znk ' i.i.d.N (0, 1).

(d) if Yi is i.i.d. N(0,1)



p
lim sup Yn 2logn = 1 a.s.
n→∞

proof: ∀ ε > 0
p
P {Yn ≥ (1 + ε) 2logn i.o.} = 0
p
P {Yn ≥ (1 − ε) 2logn i.o.} = 1

By Borel-Contelli lemma, we only have to check



X p
P {Yn ≥ (1 + δ) 2logn} < ∞
n=1

X 1 2(1+δ)2 logn
∼ √ e− 2

n=1
(1 + δ) 2logn

X 1 1
= √ (1+δ) 2 < ∞ if δ > 0 .
n=1
(1 + δ) 2logn n

Zn,k
(e) lim sup √ = 1 a.s.
n→∞ 2logk
nk = ck , loglognk = logk + loglogc.

S ck
(f) lim sup p = 1 a.s.
k→∞ ck · 2 · loglogck

80
Sn
(g) lim sup √ = 1 a.s.
n→∞ σ 2nloglogn
Theorem A: Let {Xi , Fi } be a martingale difference such that | Xi |≤ υ a.s. and
n
X
s2n = E(Xi2 | Fi−1 ) → ∞ a.s.
i=1

Then
Sn
lim sup 1 ≤ 1 a.s.
n→∞ sn (2loglogs2n ) 2
where
n
X
Sn = Xi
i=1

Corollary:
Sn
lim inf 1 ≥ −1
n→∞ sn (2loglogs2n ) 2
| Sn |
and lim sup 1 ≤ 1 a.s.
n→∞ sn (2loglogs2n ) 2
proof: (theorem A)
c>1
∀ k, let Tk = inf {n : s2n+1 ≥ c2k }
So that Tk is a stopping time
Tk < ∞ a.s. since s2n → ∞ a.s.
Consider STk

a.s.
ST2k ≤c ,2k
ST2k c2k → 1.

Want to show:
n p o
k
(∗) P STk > (1 + ε)c 2logk, i.o. = 0
  
2 12
⇒ lim sup (STk STk (2loglogsTk ) )] ≤ 1 + ε a.s.
k→∞

81
By Bennett0 s inequality, let
p
λ = (1 + ε)ck 2logk, V = c2k , υ = υ

λ2
  
X υλ
(∗) ≤ exp − ψ
k=1
2V V
  k
√  
X∞ (1 + ε)2 c2k ψ υ(1+ε)cc2k 2logk 2logk
= exp − 2k

k=1
2c

X
≤ c0 exp[−(1 + ε0 )2 logk]
k=1

X 1
= c0 <∞
k=1
k (1+ε0 )2

1
Because (1 + ε)2 logk · ≥ (1 + ε0 )2 logk


υ(1+ε)ck 2logk
1+ c2k
∀ n, ∃ Tk , Tk+1 , s.t. Tk ≤ n ≤ Tk+1
Sn = STk + Sn − STk
S ST Sn − STk
p n ≤ p k + p
sn 2loglogsn2 sn 2loglogsn sn 2loglogs2n
2

Given ε > 0 , choose c > 1


So that ε2 /(c2 − 1) > 1
( n )
X
Xi I[Tk <i≤Tk+1 ] , Fn is a martingale
i=1
n
!
X
sup (Sn − STk ) ≤ sup Xi I[Tk <i≤Tk+1 ]
Tk <n≤Tk+1 1≤n<∞
i=1
Tk+1
X
Since E(Xi2 | Fi−1 ) = ST2k+1 − ST2k +1 ≤ c2(k+1) − c2k = c2k (c2 − 1).
i=Tk +1

Want to prove:
p
P{ sup (Sn − STk ) > εck 2logk, i.o.} = 0
Tk <n≤Tk+1
j
X p
pf : Def τ = inf {j : Xi I[Tk <i≤Tk+1 ] > εck 2logk}
i=1

82

( )
X p
k
P sup (Sn − STk ) > εc 2logk
Tk <n≤Tk+1
k=1

( τ )
X X p
= P Xi I[Tk <i≤Tk+1 ] > εck 2logk
k=1 i=1
∞  k√
ε2 c2k 2logk
 
X υc 2logk
≤ exp − 2 2k
ψ
k=1
2(c − 1)c (c2 − 1)c2k
∞  2  √
υ 2logkck

X ε logk
≤ exp − 2 ψ
k=1
c −1 (c2 − 1)c2k

when k is large, [ε2 /(c2 − 1)]ψ(·) ≥ 1 + δ, for some δ > 0.



X
0
≤ C exp[−(1 + δ) log k]
k=1

X
= C0 k −(1+δ) < ∞.
k=1

Reference:

1. W. Stout: A martingale analysis of kolmogorov0 s law of the iteratived logarithm.


Z.W. Verw. Geb. 15, 279∼290, (1970).

2. D.A. Freedman, Ann. Prob. (1975), 3, 100-118. On Tail Probability For


Martingale.

Exponential Centering:
X ∼ F, ∃ ϕ(t) = EetX

P {X > µ}
etx dF (x)
Z Z
= dF (x) = ϕ(t)e−tx
[x>µ] [x>µ] ϕ(t)
Z
= ϕ(t) e−tx dG(x)
[x>µ]

Under G, X have the mean=ψ 0 (t) and Variance=ψ 00 (t).


tx dF (x)
where ψ(t) = log ϕ(t), G(x) = e ϕ(t)

83
d
R
xetx etx dF ϕ0 (t)
Z Z
• xdG(x) = dF (x) = dt
= = [log ϕ(t)]0
ϕ(t) ϕ(t) ϕ(t)
= ψ 0 (t)
R
Similarly, for x2 dG(x).

So, P {x > u}
Z
= ϕ(t) e−tx dG(x)
[x>u]
Z
0 0
= ϕ(t)e−tψ (t)   e−t(x−ψ (t)) dG(x)
x−ψ 0 (t) u−ψ 0−1 (t)
√ > √
ψ 00 (t) ψ 00 (t)

0
Z √ 00
= eψ(t)−tψ (t)   e−t ψ (t)z dH(z)
u−ψ 0 (t)
z> √ 00
ψ (t)
p
where H(z) = G( ψ 00 (t)z + ψ 0 (t)).

Example: X ∼ N (0, 1)
t2
ϕ(t) = e− 2
ψ(t) = t2 /2, ψ 0 (t) = t, ψ 00 (t) = 1
Z
t2
−t2
P {X > u} = e 2 e−tz dH(z)
[z>u−t]
H(z) ∼ N (0, 1)

Simulation: t = u
2
Z
− u2
P {X > u} = e e−uz dH(z)
[z>0]

Exponential bound : t = u(1 + ε)


2
Z
− u2 (1+ε)2
P {X > u} = e e−u(1+ε)z dH(z)
[z>−εu]
2
Z
− u2 (1+ε)2
≥ e e−u(1+ε)z dH(z)
[0≥z>−εu]

Ref: R.B. Bahadur : Some limit theorems in statisties. SIAM.


Lemma 1: If E[X | F] = 0, E[X 2 | F] ≥ c > 0 and E[X 4 | F] ≤ d < ∞

84
Then P {X > 0 | F} ∧ P {X < 0 | F} ≥ c2 /4d.
proof:

E[X | F] = 0 ⇔ E[X + | F] = E[X − | F]


c
E[X 2 | F] ≥ c ⇔ E[X +2 | F] ≥ or
2
−2 c
E[X | F] ≥
2
2 c
Assume that:E(X + | F) ≥ 2

c 2 4 1
≤ E(X +2 | F) = E[(X + ) 3 · (X + ) 3 | F ] (Hölder inequality)
2
2 1
≤ E 3 (X + | F)(E(X + )4 ) 3

c
So that ( )3 ≤ E 2 (X + | F)E(X 4 | F)
2
c 3
( ) /d ≤ E 2 (X + | F)
2
c 3 1
( ) 2 /d 2 ≤ E(X + | F) = E(X + I[X>0] | F)
2
1 3
≤ E 4 (X 4 | F)E 4 (I[X>0] | F) (Hölder inequality)
1 3
≤ d 4 P 4 {X > 0 | F}
c
( )6 /d2 ≤ dP 3 {X > 0 | F}, implies
2
c2
P {X > 0 | F} ≥
4d
c 32
( )
Similarly, E(X − | F) ≥ 2 1 , and
d2
P {X < 0 | F} ≥ c2 /4d.

Lemma 2: !Assume that {εn , Fn } is a martingale difference sequence such that


Xn
E ε2i | Fo ≥ c2 > 0.
i=1

n
X
E(ε2i | Fi−1 ) ≤ c1 and sup | εi |≤ M a.s.
1≤i≤n
i=1

85
Then there is a universal constant B s.t.
( n ) ( n )
X X
p εi < 0 | Fo ∧ P εi > 0 | Fo
i=1 i=1
≥ Bc22 /(c21 +M )4

proof: (i) Burkholder-Gundy-Davis

p>0
" i
#
X
E ( sup | εj |P | Fo
1≤i≤n
j=1

n
! P2 
X
≤ kE  E(ε2j | Fj−1 ) | Fo 
j=1
  
P
+kE max | εi | | Fo
1≤j≤n

use: If E(XIA ) ≥ E(Y IA ), ∀AF, X ≥ 0, Y ≥ 0


Then E(X | F) ≥ E(Y | F) a.s.

p.f. : Let A = {E(X | F) < E(Y | F)}


E(XIA − Y IA ) = E{[E(X | F) − E(Y | F)]IA }
= E(XIA ) − E(Y IA )

⇒ P (A) = 0.
(ii) By a conditional version of B-G-D inequality take p=4.
 4 
X n
E  εj | Fo 


j=1
( )2 
X n
≤ kE  (ε2i | Fi−1 ) | Fo 
i=1
 
4
+kE max | εi | | Fo
1≤i≤n

≤ kc21 + kM 4 = k(c21 + M 4 )

86
By Lemma 1,
( n ) ( n )
X X
P εi > 0 | Fo ∧P εi < 0 | Fo
i=1 i=1
≥ c22 /4k(c21 + M 4 ) = Bc22 /(c21 + M 4 )
1
where B = 4k
n
!
X
use (i) E εi | Fo =0
i=1
 !2  !
n
X n
X
(ii) E εi | Fo  = E ε2i | Fo ≥ c2 .
i=1 i=1

Similarly,
( n ) ( n )
X X
P εi (−λ, 0) | Fo ∧P εi (0, λ) | Fo
i=1 i=1
≥ Bc22 /(c21 4
+ M ) − c1 /λ 2

(By Markov-inequality)
n
X
Let Sn = Xi
i=1
Assumptions:
(i) {Xi , Fi } is a martingale difference sequence.
(ii) P {| Xi |≤ d} = 1, ∀ 1 ≤ i ≤ n
Notations:
i
X
σi2 = E(Xi2 | Fi−1 ), s2i = σj2
j=1
−1 −2
g1 (x) = x (e − 1), g(x) = x (ex − 1 − x)
x

Conditional Exponential Centering:


idea : P {A | Fo } = E(E(· · · E(E(IA | Fn−1 ) | Fn−2 ) · · · | Fo ))
ϕi (t) = E[etXi | Fi−1 ], ψi (t) = logϕi (t)
Definition
(t)
Fi (x) = E[I[Xi ≤x] etXi | Fi−1 ]/ψi (t)

87
So that P {Sn > λ | Fo }
n
X
Z Z "Y
n
# −t xi
(t)
= ··· [ϕi (t)] e i=1 dFn(t) · · · dF1
[Sn >λ] i=1
n
X n
X
Z Z [ψi (t)] −t xi
(t)
= ··· e i=1 e i=1 dFn(t) · · · dF1
[Sn >λ]
n
X n
X
Z Z [ψi (t) − tψi0 (t)] −t (xi − ψi0 (t))
(t)
= ··· e i=1 e i=1 dFn(t) · · · dF1
[Sn >λ]

Under new measure,

E[Xi | Fi−1 ] = ψi0 (t).


V ar(Xi | Fi−1 ) = ψi00 (t).

Goal : Compute P {Sn > λ | Fo }


= E(I[Sn >λ] | Fo )
= E(E · · · E(I[Sn >λ] | Fn−1 ) | Fn−2 ) · · · | Fo )
(t) etXi
by dFi = dP [Xi ≤ x | Fi−1 ].
ϕi (t)

88
Now, if s2n ≤ M, g(−td) − t2 d2 g 2 (−td) − g1 (td) ≤ 0,
then P {Sn > λ | Fo }
Xn
Z Z Y" n
# −t xi
(t)
= ··· ϕi (t) e i=1 dFn(t) · · · dF1 , ∀t > 0
[Sn >λ] i=1
n
X n
X
Z Z ψi (t) − t xi
(t)
= ··· e i=1 i=1 dFn(t) · · · dF1
[Sn >λ]
n
X n
X
Z Z (ψi (t) − tψi0 (t)) −t (xi − ψi0 (t))
(t)
(∗∗) = ··· e i=1 e i=1 dFn(t) · · · dF1
[Sn >λ]
(t) (t)
under dFn , · · · dF1 ,

Z
(t)
E[Yi | Fi−1 ] = ydFi (y) = E[Xi etXi | Fi−1 ]
= [logϕi (t)]0 = ψi0 (t), and
V ar(Yi | Fi−1 ) = ψi00 (t).

•ψi0 (t) = E(Xi etXi | Fi−1 )


= E(Xi (etXi − 1) | Fi−1 )
etXi − 1
= E[tXi2 | Fi−1 ]
tXi
= tE[Xi2 g1 (tXi ) | Fi−1 ], where g1 (x) ↑ as x ↑ .

≤ tE[Xi2 g1 (td) | Fi−1 ] ≤ tσi2 g1 (td).
≥ tE[Xi2 g1 (−td) | Fi−1 ] ≥ tσi2 g1 (−td).
 x 
e −1
Since g1 (x) > 0, ≥ 0, ∀x
x
ϕ0i (t) ≥ 0, There ϕi (t) ≥ ϕi (0) = 1

89
• • ϕi (t) = E[etXi | Fi−1 ]
= E[1 + tXi + t2 Xi2 g(tXi ) | Fi−1 ]

≤ 1 + t2 σi2 g(td)
≥ 1 + t2 σi2 g(−td)

0
ϕi (t)
• • •ψi0 (t) =
ϕ (t)
(i
≤ tg1 (td)σi2
tg (−td)σi2
≥ 1+t1 2 σ2 g(td)
i

So, ψi (t) − tψi0 (t) ≥ logϕi (t) − t2 σi2 g1 (td).


≥ log[1 + t2 σi2 g(−td)] − t2 σi2 g1 (td)
2 2
(1 + u ≥ eu−u , u ≥ 0; eu (1 + u) ≥ eu )
≥ t2 σi2 g(−td) − t4 σi4 g 2 (−td) − t2 g1 (td)σi2
≥ t2 σi2 {g(−td) − t2 d2 g 2 (−td) − g1 (td)}
Because σi2 = E(Xi2 | Fi−1 ) ≤ d2

, and
n
X
(ψi (t) − tψi0 (t))
i=1
≥ t2 s2n {g(−td) − t2 d2 g 2 (−td) − g1 (td)}
Xn
Because σi2 = s2n .
i=1

90
Thus,
2 M [g(−td)−t2 d2 g 2 (−td)−g (td)]
(∗∗) ≥ et 1

n
X
Z Z −t (xi − ψi0 (t))
(t)
· ··· e dFn(t) · · · dF1 .
i=1
[Sn >λ]
n n
!
X X
[Sn > λ] = [Sn − ψi0 (t) > λ − ψi0 (t)].
i=1 i=1
 n 
X
Z Z −t

(xi − ψi0 (t))
≥ ··· h i e i=1 dFn(t) · · · dFn(t)
tmg(−td)
Sn − n 0
P
i=1 ψi (t)≥λ− 1+t2 d2 g(td)
2 2 2 2
·et M [g(−td)−t d g (−td)−g1 (td)]
2 2 d2 g 2 (−td)−g (td)]
≥Z et M [g(−td)−t
Z
1

(t)
· ··· h i 1dFn(t) · · · dF1 .
tmg(−td)
0≥Sn − n 0
P
i=1 ψi (t)≥λ− 1+t2 d2 g(td)

• • • • ϕ00i (t) = E(Xi2 etXi | Fi−1 )


≤ E(Xi2 etd | Fi−1 ) = etd σi2
≥ e−td σi2
ϕ00i (t)
ψi00 (t) = − (ψi0 (t))2
ϕi (t)
≤ ϕ00i (t)/ϕi (t) ≤ ϕ00i (t) ≤ etd σi2
e−td σi2
≥ − t2 g12 (td)σi4
1 + t2 σi2 g(td)
2 σ 2 g(td)
≥ σi2 e−td e−t i − t2 g12 (td)σi4
2 d2 g(td)
≥ σi2 [e−td−t − t2 d2 g12 (td)].

n
X
So, ψi00 (t)
i=1
≤ s2n etd

= 2 2
≥ s2n {e−td−t d g(td) − t2 d2 g12 (td)}

91
√ √
Replace t by t/ M and λ by (1 − r) M t.

(∗ ∗ ∗) P {Sn > (1 − r) M t | Fo }
√ √
t2 {g(−td/ M )−(td/ M )2 g 2 (− √td )−g1 ( √td )}
≥ eZ M M
Z √ √
(t/ M )
· ··· " Pn    √ m √t

g(−td/ M )
# dFn(t/ M)
· · · dF1
0 √t M
0≥ i=1 xi −ψi ≥(1−r) M t− 2 √
M 1+ t d2 g 2 (td/ M )
M
√ 2d
Let εi = Xi − ψi0 (t/ M ), | εi |≤ √
M
n 2 √ √
X s
(1) E(ε2i | Fi−1 ) ≤ n e(td/ M ) ≤ etd/ M = c1
i=1
M
" n # " n #
X X √
(2) E ε2i | Fo = E (Xi − ψi0 (t/ M ))2 | Fo
i=1 i=1
t2 d2 2 td
 
M1 td td m t td
≥ − 2 √ g1 √ + √ g1 (− √ )/(1 + g ( √ ))
M M M M M M M M
Thus,
2
√ √ 2 2
√ √
(∗ ∗ ∗) ≥ et [g(td/ M )−(td/ M ) g (−td/ M )−g1 (td/ M )]
h √ √ √
 M M
1
− (2td/ M )g 1 (td/ M ) + m √t
M M
g(−td/ M) · B
· √ √
 e2td/ M + (2d/ M )4
√ )
etd/ M
− m
√ 2 2

t2 [(1 − r) − M tg(−td/ M )/(1 + tMd g 2 (td/ M ))]2

Let t → ∞, and td/ M → 0
n
X
Assume that E(Xi2 | Fo ) ≥ M1 > 0, and let M1 /M → 1, m/M → 1,
i=1
n
X
m≤ E(Xi2 | Fi−1 ) ≤ M , and 1 − (m/M ) < r
i=1
Then √
P {Sn > (1 − r) M t | Fo }
t2
≥ e− 2 (1+0(1)) · B(1 + 0(1))
In summary:

92
For each n, {Xn,i , Fu,i , i = 1, 2, · · · , n} is a martingale difference such that

(1) sup | Xn,i |≤ dn , dn increasing.


n
n
X n
X
2 2
(2) mn ≤ E(Xn,i | Fn,(i−1) ) ≤ Mn , E(Xn,i | Fn,o ) ≥ Mn,1 ,
i=1 i=1
where mn /Mn → 1, Mn,1 /Mn → 1

If tn → ∞, and tn dn / Mn → 0
then
( n )
X p
P Xn,i > (1 − r) Mn tn | Fno
i=1
t2
n
≥ e− 2 (1+0(1)) · C(1 + 0(1))
( n
)
X
Theorem: Assume that Sn = Xi , Fn is a martingale
i=1
such that sup | Xn |≤ d < ∞ a.s.
1≤n<∞
n
X
Let σi2 = E(Xi2 | Fi−1 ) and s2n = σi2
i=1
If s2n → ∞ a.s., then

lim sup Sn /(2s2n loglogs2n )1/2 = 1 a.s.


n→∞

proof: (i) “≤ ” is already shown.


(ii) To show “≥ 1”.
we only have to show that ∀ ε > 0, ∃ nk 3

P {Snk > (1 − ε)(2s2nk loglogs2nk )1/2 i.o.} = 1


Given c > 1, let τk = {n : s2n+1 ≥ ck }

τk is a stopping time, since


n+1
X
s2n+1 = E(Xi2 | Fi−1 ) is Fn measurable.
i=1

Note that

s2τk < ck , s2τk+1 ≥ ck

93
(1) s2τk+1 − s2τk ≤ ck+1 − s2τk +1 − στ2k +1
 

≤ ck+1 − ck + d2
 
(2) s2τk+1 − s2τk ≥ s2τk+1 +1 − στ2k+1 +1 − ck
≥ ck+1 − d2 − ck

By in summary,
τk+1 ∞
X X
Sτk+1 − Sτk = Xi = Xi I[τk <i≤τk+1 ]
i=τk +1 i=1

P {Sτk+1 − Sτk > (1 − δ)(2s2τk+1 loglogs2τk+1 )1/2 | Fτk }


≥ P {Sτk+1 − Sτk > (1 − δ)(2ck+1 loglogck+1 )1/2 | Fτk }
1−δ
(∗) = P {Sτk+1 − Sτk > (1 − r)( )(2ck+1 loglogck+1 )1/2 | Fτk }
1−r
let r = δ/2 and choose c so that
r
1−δ √
r
1−δ 1 d2
< 1 − , implies ≤ 1 − c−1 ≤ 1 − c−1 + k+1
1 − δ/2 c 1−r c
k+1 k 2 k+1 2 k
Mk = c − c + d , mk = c −d −c
k+1 1/2
(2loglogc ) 1−δ
tk = (
d2 1/2 1 − r
)
−1
(1 − c + ck+1 )
< α(2loglogck+1 )1/2 , 0 < α < 1.

p
(∗) = P {Sτk+1 − Sτk > (1 − r) Mk tk | Fτk }
t2
k
≥ e− 2 (1+0(1)) B(1 + 0(1))
2 loglogck+1 (1+0(1))
≥ B(1 + 0(1))e−α
2
≥ B(1 + 0(1))((k + 1)α (1+0(1)) )−1

X p
So that, P {Sτk+1 − Sτk > (1 − r) Mk tk | Fτk } = ∞ a.s.
k=1

So that, P {Sτk+1 − Sτk > (1 − δ)(2s2τk+1 loglogs2τk+1 )1/2 i.o. | Fτk }


=1

94
But

Sτk Sτk (s2τk loglog × s2τk )1/2


 1/2 = 1/2  1/2
2s2τk+1 loglogs2τk+1 2s2τk loglogs2τk sτk+1 loglog × s2τk+1
(s2τk loglogs2τk )1/2 (ck loglogck )1/2
1/2 ≤
((ck+1 − d2 )loglogck+1 )1/2

s2τk+1 loglogs2τk+1
≤ (1/(c − d2 /ck ))1/2 → 0, as c → ∞
≤ δ (choose c so that)

So that, with c choosen,

lim sup Sτk+1 /(2s2τk+1 loglogs2τk+1 )


k→∞
Sτk+1 − Sτk
≥ lim sup 2
k→∞ (2sτk+1 loglog s2τk+1 )1/2
+ lim sup Sτk /(2s2τk+1 loglogs2τk+1 )1/2
k→∞
≥ (1 − δ) + (−1)δ = 1 − 2δ
By lim sup(an + bn )
n→∞
≥ lim sup an + lim inf bn .
n→∞ n→∞

History of L.I.L.:
Step 1:

{Xi } i.i.d. P {Xi = 1} = P {Xi = −1} = 1/2


X n
Sn = Xi
i=1
s2n = n
1
(1913) Hausdorff: Sn = O(n 2 +ε ) a.s.
(By moment and chebyshev0 s inequality).
(1914) Hardy-Littlewood:

Sn = O((n log n)1/2 )


x2
(By e− 2 or e−x/2 )

95
(1922) Steinhauss:

lim sup Sn /(2nlogn)1/2 ≤ 1 a.s.


n→∞

(1923) Khinchine:

Sn = O((n loglogn)1/2 )

(1924) Khinchine:

lim sup Sn /(2n loglog n)1/2 = 1 a.s.


n→∞

step 2:
(1929) Kolmogorov:
n
X
0
Xi indep. r.v .s EXi = 0, s2n = EXi2
i=1

sn
(i) sup | Xk |≤ kn
1≤k≤n (loglogs2n )1/2
(ii) kn → 0, s2n → ∞.

Then

lim sup Sn /(2s2n loglogs2n )1/2 = 1 a.s.


n→∞

(1937) Marcinkewicz and Zygmund:


Given an example:
n
X 1
Sn = ci εi , P (εi = −1) = P (εi = 1) = ,
i=1
2
{εn } i.i.d.
kn sn
{cn } is choosen, so that kn → k > 0, | cn |≤ .
(2 loglogs2n )1/2
They showed that

lim sup Sn /(2s2n loglogs2n )1/2 < 1 a.s.


n→∞

(1941) Hartman and Witter


Xi i.i.d. EXi = 0, V ar(Xi ) = σ 2 .

96
Step 3:
(196?) Strassen:
Xi i.i.d, EXi = 0, V ar(Xi ) = 1.
limit of Sn /(2loglogn) is {-1, 1}.
Wn is a Brownian Motion
1 1
| Sn − Wn |= 0 (n 2 (loglogn) 2 ).
Construct a Brownian Motion W (t) and stopping time τ1 , τ2 , · · · so that
( n
)
D
X
Sn = W ( τi ), n = 1, 2, · · ·
i=1
n
!
X
| Sn − Wn |=| W τi − Wn |
i=1

(1965) Strassen:
Xi independent case and special martingale.
(1970) W.F. Stout:
Martingale
( Version)of Kolmogorov0 s Law of Iterated Logarithm. Z.W.V.G. 15, 279∼290.
Xn
Xn = Yi , Fn is a martingale.
i=1
n
X
s2n = E[Yi2 | Fi−1 ]
i=1
1
If s2n → ∞ a.s. and | Yn |≤ kn sn /(2log2 s2n ) 2 a.s.
where kn is Fn−1 -measurable and lim kn = 0
n→∞

Then lim sup Xn /(sn un ) = 1 a.s.


n→∞
1 1
un = (2log2 s2n ) 2 ≡ (2loglogs2n ) 2 .
(1979) H. Teicher:
Z.W.V.G. 48, p.293-307.
Indepent Xi , P {| Xn |≤ dn } = 1.
1
dn (log2 s2n ) 2
lim =a≥0
n→∞ sn
(a=0, Kolmogorov0 s condition)
n
2 12
√ o
P lim Sn /sn (2log2 sn ) = c/ 2 = 1
n→∞

97
1
where 0.3533/a ≤ c ≤ min[ + bg(a, b)]
b>0 b
(1986) E. Fisher:
Sankhyea, Series A, 48, p.267∼ 272.
Martingale Version:

lim sup kn < k. a.s.


n→∞

n
1
X
implies lim sup Yi /sn (2log2 s2n ) 2 ≤ 1 + ε(k).
n→∞
i=1

k/4, if 0 < k ≤ 1
where ε(k) =
(3 + 2k 2 )/4k − 1, if k > 1.

This bound is not as good as Teicher0 s bounds.


Problems:

1. Do we have a martingale version of Teicher0 s result?



2. M-Z. implies c/ 2 < 1.
Teicher0 s result does not imply this.
How to interprate M-Z phenomenon?

3. Can we extend martingale0 s result to the double arrays of martingale differences


Sn ?

X
Sn = ani εi .
−∞
Lai and Wei (1982). Annals prob. 10, 320∼ 335.

Papers:
D. Freedman (1973). Annals probability, 1, 910∼925.
Basic assumptions:

(i) Fo ⊂ F1 ⊂ · · · ⊂ Fn · · · (σ −fields)
(ii) Xn is Fn −measurable, n ≥ 1.
(iii) 0 ≤ Xn ≤ 1 a.s.
Xn n
X
Sn = Xi , Mi = E[Xi | Fi−1 ], Tn = Mi .
i=1 i=1

98
Theorem: Let τ be a stopping time

(i) If 0 ≤ a ≤ b, then
( τ τ
)
X X
P Xi ≤ a and Mi ≥ b
i=1
i=1
(a − b)2

a a−b
≤ (b/a) e ≤ exp −
2c
, where c = a ∨ b = max{a, b}.
(ii) If 0 ≤ b ≤ a, then
( τ τ
)
X X
P Xi ≥ a and Mn ≤ b
i=1
i=1
(b − a)2

≤ (b/a)a ea−b ≤ exp − , where c = aV b.
2c

Lemma:
P 0 ≤ X ≤ 1 is a r.v. on (Ω , F, P).
Let be a sub-σ-field
P of F.
Let M = E{X | } and h be a real number.
Then
X
E{exp(hX) | } ≤ exp[M (eh − 1)]

proof: f (x) = exp(hx), f 00 (x) = h2 ehx ≥ 0


So f (x) is convex.

ehX = f (x) ≤ f (0)(1 − x) + f (1)x.


= (1 − x) + eh x

X X
E[ehX | ] ≤ E[(1 − X) + eh X | ]
= (1 − M ) + eh M
h −1)M
= 1 + (eh − 1)M ≤ e(e .

(Because 1 − x ≤ ex , ∀x).
Corollary : For each h, define Rn (m, x) = exp[hx − (eh − 1)m].
Then
Rh (Tn , Sn ) is a super-martingale.

99
proof:

Rh (Tn , Sn ) = Rh (Tn−1 , Sn−1 ) exp[hXn − (eh − 1)Mn ]


So E[Rh (Tn , Sn ) | Fn−1 ]
≤ Rh (Tn−1 , Sn−1 )E[exp hXn | Fn−1 ] exp[−(eh − 1)Mn ]
≤ Rh (Tn−1 , Sn−1 ) (By lemma).

In the following, we use exp(∞) = ∞, exp(−∞) = 0, then


Rh (m, x) is a continuous function on [0, ∞]2 − (∞, ∞).
Lemma: Let τ be a stopping time.

G = {Tτ < ∞ or Sτ < ∞}


Z
Then Rh (Tτ , Sτ )dP ≤ 1.
G

proof: By the super-martingale property,


∀ n, ERh (Tτ ∧n , Sτ ∧n ) ≤ 1.

So that, 1 ≥ lim inf E[Rh (Tτ ∧n , Sτ ∧n )]


n→∞
h i
≥ E lim inf Rh (Tτ ∧n , Sτ ∧n )
n→∞
(Fautou0 s Lemma).
Z
≥ lim inf Rh (Tτ ∧n , Sτ ∧n )
G n→∞
Z
= Rh (Tτ , Sτ )dP.
G

proof of the theorem:



1, if m ≥ b and x ≤ a, ∀ (m, x)[0, ∞]2
Let u(m, x) =
0, o.w.

Qh (m, x) = exp[ha − (1 − e−h )b] R−h (m, x),


∀ (m, x)[0, ∞]2 − (∞, ∞), ∀ h ≥ 0

100
Then
P {S ≤ a and Tτ ≥ b}
Z τ
= u(Tτ , Sτ )dP
Z
= (Tτ , Sτ )dP, G = {Tτ < ∞ or Sτ < ∞}
ZG

≤ Qh (Tτ , Sτ )dP ( Qh ≥ u)
G
(Qh (m, x) = exp[−h(x − a) + (1 − e−h )(m − b)] ≥ 1, if m ≥ b and x < a)
Z
= Qh (0, 0) R−h (Tτ , Sτ )dP
G
≤ Qh (0, 0)

So P {Sτ ≤ a and Tτ ≥ b} ≤ inf Qh (0, 0)


h≥0

= inf exp[ha − (1 − e−h )b]


 
−h
= exp inf [ha − (1 − e )b .
h≥0
−h −h
d/dh[ha − (1 − e )b] = a − e b
minimum point ho satisfies eho = b/a
So that, min Qh (0, 0) = exp(ho a) · exp[be−ho − b]
h≥0

= (eho )a exp[(eho )−1 b − b]


a
= (b/a)a exp[ · b − b]
b

So, P {Sτ ≤ a and Tτ ≥ b}


≤ (b/a)a e(a−b) .
Another one: Let

1, if m ≤ b and x ≥ a
u(m, x) =
0, o.w.

h ≥ 0, Qh (m, x) = exp[−ha + (eh − 1)b]Rh (m, x)


G = {Tτ < ∞ and Sτ < ∞}, a > 0.

101
Lemma1 : a ≥ 0, b ≥ 0, c = a ∨ b
Then (b/a)a ea−b ≤ exp[−(a − b)2 /2c]

0 1 1−ε −ε
Lemma1 : 0 < ε < 1, f (ε) = ( ) e , g(ε) = (1 − ε)eε .
1−ε
we have
f (ε) < exp[−ε2 /2] < 1 and
g(ε) < exp[−ε2 /2] < 1.

proof : log f (ε) = −(1 − ε) log(1 − ε) − ε


x2 x3
(Because − log(1 − x) = x + + + · · · , 0 < x < 1).
2 3
ε2 ε3
= (1 − ε)[ε + + + · · · ] − ε
2 3
ε2 ε3 ε2
= [ε + + + · · · ] − ε2 − − · · · − ε
2 3 2
2
≤ −ε /2
log g(ε) = log(1 − ε) + ε
ε2 ε3
= −(ε + + + · · · ) + ε
2 3
2
≤ −ε /2

proof of Lemma 1:
(i) a=b (trivial).
(ii) case 1: 0 < a < b, let ε = (b − a)/b = 1 − a/b.

(b/a)a ea−b = [(1 − ε)−1 ](1−ε)b eb (−ε)


= [(1/(1 − ε))1−ε e−ε ]b
ε2
= f b (ε) ≤ exp[−b ]
2
(b − a)2

= exp −b = exp[−(b − a)2 /2b]
2b2

102
case 2: 0 < b < a.
ε = (a − b)/a = 1 − b/a
(b/a)a ea−b = (1 − ε)a eaε
ε2
 
a
= g (ε) ≤ exp −a
2
(a − b)2
  
= exp −a ·
2a2
(a − b)2
 
= exp −
2a
If 0 ≤ a ≤ b then
( τ τ
)
X X
P Xi ≤ a and Mi ≥ b ≤ exp[−(a − b)2 /2(a ∨ b)]
i=1 i=1

Application:
Let Xn = ρXn−1 + εn , n = 1, 2, · · · , | ρ |< 1.
{εn , Fn } is a martingale difference sequence such that E[ε2n | Fn−1 ] = σ 2 , and
sup E[(ε2n )p | Fn−1 ] ≤ c < ∞
n

where p > 1 , c is a constant.


we know that
n
X
2
(i) 1/n Xi−1 → c2 a.s.
i=1
n
! 12
D
X
2
(ii) Xi−1 (ρ̂n − ρ) → N (0, σ 2 )
i=1

where ρ̂n is the L.S.E. of ρ.


Question: when Xi is random variable
n
!
n→∞
X
E Xi−1 (ρ̂n − ρ)2 −→ σ 2 ?
2

i=1
n
!−1 n
!
X X
ρ̂n − ρ = Xi2 Xi−1 εi

n
!i=1 i=1
2
( ni=1 Xi−1 εi )
X P
2 2
Xi−1 (ρ̂n − ρ) = Pn 2
i=1 i=1 Xi−1

103
n
X
2
difficult: Xi−1 is a random variable.
i=1
This problem how to calculate.
The corresponding χ2 -statistic is
n n
!2  n
X X X
2 2 2
Qn = Xi−1 (ρ̂n − ρ) = Xi−1 εi Xi−1 (Cauchy − Schwarz inequality)
i=1 i=1 i=1

n
!1/2 n
!1/2 2
X X
2
 Xi−1 ε2i 
n
i=1 i=1 X
≤ n = ε2i
X
2 i=1
Xi−1
i=1
?
E(Qpn ) → σ 2p E | N (0, 1) |2p .

A sufficient condition is to show {Qpn } is uniformly integrable. It is sufficient to


show that
0
∃ p0 > p 3 sup E[Qpn ] < ∞
n

Assume that ∃ q > p 3 E | Qn |2q < ∞.

104
Ideas:

(i) ε2i = (Xi − ρXi−1 )2 ≤ 2(Xi2 + ρ2 Xi−1


2
)
n n n
!
X X X
ε2i ≤ 2 Xi2 + ρ2 2
Xi−1
i=1 i=1 i=1
n+1
!
X
≤ 2(1 + ρ2 ) 2
Xi−1
i=1
n−1 n
!
X X
So that ε2i ≤ 2(1 + ρ2 ) 2
Xi−1
i=1 i=1
n
X
(ii) Qn ≤ (Xi − ρXi−1 )2
i=1
n
X
= ε2i
i=1
n
X
implies Qn ≤ ε2i
i=1
n
X n
X
Since ε2i = (Xi − ρ̂n Xi−1 )2 + Qn
i=1 i=1
Pn 2
2(1 + ρ2 ) ( ni=1 Xi−1 εi )
P
i=1 Xi−1 εi
implies Pn 2
≤ Pn−1 2
i=1 Xi−1 i=1 εi
n
!
2
2(1 + ρ2 ) ( ni=1 Xi−1 εi )
X P
2
(iii) Qn ≤ εi IAn + Pn−1 2 IAn c
i=1 i=1 εi
↑ ↑
(By (ii)) (By (i))

(iv) Let 0 < τ < σ 2 , choose k so that


  h i
E ε2i I[ε2i ≤k] = σ 2 − E ε2i I[ε2i >k]
E | εi |2q E | εi |2q
≥ σ2 − , let α = σ 2

k 2q−2 k 2q−2
> τ.

105
( n )
X
Then P ε2i ≤ nτ
i=1
( n )
X
≤P ε2i I[ε2i ≤k] ≤ nτ
( i=1
n
)
X
≤P (ε2i /k)I[ε2i ≤k] ≤ n τ /k
("i=1n # " n
#)
ε2i
 
X X n
=P (ε2i /k)I[ε2i /k≤1] ≤ n τ /k , E I 2 | Fi−1 ≥ α
i=1 i=1
k [εi /k<1] k
 h n i n 
nE ε2i /k
I[εi /k≤1] ≥ α > τ
2
k  k
((n/k) α − nk τ )2

≤ exp −
2( nk α)
= exp[−n(α − τ )2 /2kα]
n
(α − τ )2
 
= exp − = r−n
2kα
(α − τ )2
 
r = exp > 1.
2kα

106
" n−1 #
X
(v) Let An = ε2i ≤ (n − 1)τ , and q > p0 > p ≥ 1.
i=1
 !p0 1/p0
 n
X 
E ε2i IAn
 
i=1
n  1/p0
0
X
≤ E(ε2i )p IAn
i=1
n
X p0 1 1 1 1
≤ ((E[ε2i ]q ) q (EIAn
s
) s ) p0 , + = 1.
i=1
q s
(Hölder inequality)
1 1
≤ E(ε2i )q q · n{p(An)} sp0

1
−n sp
≤c·n·r → 0.
0

Pn 2p0
0 0E | Xi−1 ε i |
(vi) EQpn IAc n ≤c i=1
(n − 1)p0
Recall : (1987) Wei, Ann. Stat. 1667∼ 1687.
X n
Xn = ui εi , ui − Fi−1 measurable.
i=1
{εi , Fi } is a martingale difference sequence.
p≥2
sup E{| εn |p | Fn−1 } ≤ c a.s.
n
  n
! p2
X
Then E sup | Xi |p ≤kE u2i
1≤i≤n
i=1
, k depends only on p, c.
2p0 !p0
Xn n
X
2
So, E Xi−1 εi ≤ k E Xi−1


i=1 i=1
n
X 0
≤kk 2
Xi−1 kpp0
i=1
n
!p0
X
≤k k Xi kp0
i=1

107
Now, Xn = ρXn−1 + εn = εn + ρεn−1 + · · · + ρn−1 ε1 + ρn Xo
= Yn + ρn Xo .

0
E | Yn + ρn Xo |2p
0 0 0
≤ 22p [E | Yn |2p +(| ρ |n | Xo |)2p ]
0
It is sufficient to show that sup E | Yn |2p < ∞
n
Since this implies
2p
Xn
0
E Xi−1 εi = O(np ) and


i=1
0
E[Qpn IAcn ] = O(1)

By the same inequality again,


0 0
E | Yn |2p = E | εn + ρεn−1 + · · · + ρn−1 ε1 |2p
0
≤ k E(12 + ρ2 + · · · + ρ2n−2 )p
p0
1 − ρ2n

= k
1 − ρ2
0
≤ k[1/(1 − ρ2 )p ] < ∞

108
Chapter 2

Stochastic Regression Theory

2.1 Introduction:
Model yn = β1 xn,1 + · · · + βn xn,p + εn
where {εn , Fn } is a martingale difference sequence and ~x = (xn,1 , · · · , xn,p ) is Fn−1
-measurable.
~
Issue: Based on the observations {~x1 , y1 , · · · , ~xn , yn }, make inference on β.
Examples:
(i) Classical Regression Model
(Fixed Design, i.e. ~x0i s are constant vectors).
(ii) Time series: AR(p) model
yn = β1 yn−1 + β2 yn−2 + · · · βp yn−p + εn
where εn are i.i.d. N (0, σ 2 ).
~xn = (yn−1 , · · · , yn−p )0 .
(iii) Input-Output Dynamic System.
(1) System Identification (Economic of Control)

yn = α1 yn−1 + · · · + αp yn−p + β1 un−1 + · · · + βq un−q + εn


~xn = (yn−1 , · · · , yn−p , un−1 , · · · , un−q )0
~un = (un−1 , · · · , un−q )0 ∼ exogeneous variable

(2) Control:
~u Fn−1 -measurable.
Example:
yn = αyn−1 + βun−1 + εn
Goal: yn ≡ T, T fixed constant.
If α, β are known.

109
After observing {u1 , y1 , · · · , un−1 , yn−1 }
Define un−1 so that
T − αyn−1
T = αyn−1 + βun−1 , i.e. un−1 = , (β 6= 0)
β
 Fn−1 −measurable.

If α, β unknown:
Based on {u1 , y1 , · · · , un−1 , yn−1 }
Let α and β (say by α̂n−1 , β̂n−1 ).
Define un−1 = T −α̂β̂n−1 yn−1
n−1
Question:
Is the system under control?
Xm
Is m1
(yn − εn − T )2 small?
n=1
(iv) Transformed Model:
Xn
X
Branching Pocess with Immigration: Xn+1 = Yn+1,i + In+1
i=1
Xn : the population size of n-th generation.
Yn+1,i : the size of the decends of i-th number in n-th generation.
In+1,i : the size of the immigration in (n+1)th generation.
Assumptions:

(i) {Yn,i , 1 ≤ n < ∞, 1 ≤ i < ∞} are i.i.d. random variables.


with m = EYn,i , σ 2 = EYn,i2

(ii) {In } i.i.d. r.v. with b = EIn , V ar(In ) = σI2


(iii) {In } is independent of {Yn,i }

110
Xn
X
E(Xn+1 | Fn ) = E[Yn+1,i | Fn ] + E[In+1 | Fn ]
i=1
= mXn + b
Xn
X
V ar(Xn+1 | Fn ) = (E((Yn+1,i − m)2 | Fn ))
i=1
+E((In+1 − b)2 | Fn )
= Xn σ 2 + σI2
Xn
X
(Yn+1,i − m) + (In+1 − b)
i=1
Let εn+1 = p
σ 2 Xn + σI2

Then {εn , Fn } is a martingale difference sequence with E[ε2n | Fn−1 ] = 1.


The model becomes
q
Xn+1 = mXn + b + ( σ 2 Xn + σI2 )εn+1

If σ 2 and σI2 are known,


1 Xn 1
Yn+1 = Xn+1 /(σ 2 Xn + σI2 ) 2 = m p + bp + εn+1
σ 2 Xn + σI2 σ 2 Xi + σI2

In general we may use


1 Xn 1
Yn+1 = Xn+1 /(1 + Xn ) 2 = m √ + b√ + ε0n+1
1 + Xn 1 + Xn
s
0 σ 2 Xn + σI2
where εn+1 = εn+1 ,
1 + Xn
σ 2 Xn + σI2
V ar(ε0n | Fn−1 ) = ≤ c.
1 + Xn
In both cases, the inference on m and b can be handed by the Stochastic Regres-
sion Theory.
Reference:
Least Squares Estimation Stochastic Regression Models with Applications to Identi-
fication and Control of Dynamic Systems.

111
T.L. Lai and C.Z. Wei (1982).
Ann. Stat., 10, 154 ∼ 166.
Model: yi = β~ 0~xi + εi
{εi , Fi } is a sequence of martingale difference and ~xi is Fi−1 -measurable.
Bassic Issue : Make inference on β~ , based on observations {~x1 , y1 , · · · , ~xn , yn }
Estimation:
(a) εi ∼ i.i.d. N (0, σ 2 )
~x1 fixed, ~xi σ(y1 , · · · , yi−1 ), i = 2, 3, · · ·
MLE of β~ :
~ = L(β,
L(β) ~ y 1 , · · · , yn )
~ y1 , · · · , yn−1 )L(β,
= L(β, ~ yn | y1 , · · · , yn−1 )
~ y1 , · · · , yn−1 ) √ 1 e−(yn −β~ 0 ~xn )2 /2σ2
= L(β,
2πσ
..
.
X n

√ − (yi − β~ 0~xi )2 /2σ 2


= (1/ 2πσ)n e i=1 .

n
!−1 n
ˆ X X
So, M.L.E. β~n = ~xi~x0i ~xi yi
i=1 i=1
n
~ˆxi )2
X
σ̂n2 = 1/n (yi − β~
i=1

(b) Least squares:


n
X
~ =
minimum h(β) (yi − β~ 0~xi )2 over β.
~
i=1
n
X
~ β~ =
∂h(β)/∂ (yi − β~ 0~xi )~xi
i=1
n
! n
!
X X
= yi~xi − ~xi~x0i β~
i=1 i=1

ˆ
Solve the equation, we obtain β~n .
Computation Aspect:

112
• Recursive Formula
ˆ ˆ ˆ
β~n+1 = β~n + {(yn+1 − β~n0 ~x0n+1 )/(1 + ~x0n+1 Vn~xn+1 )}Vn ~xn+1
Vn+1 = Vn − Vn~xn+1~x0n+1 Vn /(1 + ~xn+1 Vn~xn+1 )
n
!−1
X
Vn = ~xi~x0i
i=1

Kalman filter type estimator:


! ! !
ˆ
~ ˆ
βn+1 =f β~n , ~xn+1 , n + 1
Vn+1 Vn

f : hardware or program.
!
ˆ
~
βn : stored in the memory.
Vn
~xn+1 : new data

Real Time Calculation:


• automatic
• large data set.
what is filter?

yi = β~ 0~xi + εi (state process.)


Oi = yi + δi (Observation process)

Filter Theory : Estimation state.


Predict state.
State History : F Y
Observation History : F O
Global History : F = F Y ∪ F O .
h is F-measurable
ĥ = E[h | F O ].
Author:
P. Brémaud : Point Process and Queues : Martingale Dynamic., Spring-Verlag, Ch.
IV : Filtering.
Matrix Lemma:

113
(1) If A, m× m matrix, is nonsingular υ, V  <m
Then
0 −1 −1 (A−1 υ)(V 0 A−1 )
(A + υV ) = A −
1 + V 0 A−1 υ

(A−1 υ)(V 0 A−1 )


 
−1
proof : A − 0 −1
[A + υV 0 ]
1+V A υ
(A υ)(V 0 A−1 )
−1
−1 0 (A−1 υ)(V 0 A−1 )υV 0
= I− A + A υV −
1 + V 0 A−1 υ 1 + V 0 A−1 υ
−1 0
A υV −1 0 (A υ)(V 0 A−1 υ)V 0
−1
= I− + A υV −
1 + V 0 A−1 υ 1 + V 0 A−1 υ
1
= I− 0 −1
{A−1 υV 0 − A−1 υV 0
1+V A υ
−V 0 A−1 υA−1 υV 0 + (V 0 Aυ)AυV 0 }υV 0 } = I
Corollary:
n+1
!−1
X
−1
Pn+1 = ~xi~x0i
i=1
n
!−1
X
= ~xi~x0i + ~xn+1~x0n+1
i=1
(Pn−1~xn+1 )(~x0n+1 Pn−1 )
= Pn−1 −
1 + ~x0n+1 Pn−1~xn+1
n+1
!−1 n+1
ˆ X X
β~n+1 = ~xi~x0i ~xi yi
i=1 i=1
n+1
!−1 n
X X
= ~xi~x0i −1
~xi yi + Pn+1 ~xn+1 yn+1 .
i=1 i=1
n
(Pn−1~xn+1 )(~x0n+1 Pn−1 )
 X
= Pn−1 − −1
~xi yi + Pn+1 ~xn+1 yn+1
1 + ~x0n+1 Pn−1~xn+1 i=1
Pn−1~xn+1~x0n+1 ~ˆ (Pn−1~xn+1 )(x0n+1 Pn−1~xn+1 )
 
ˆ
~ −1
= βn − βn + Pn ~xn+1 − yn+1
1 + ~x0n+1 Pn−1~xn+1 1 + ~x0n+1 Pn−1~xn+1
ˆ Pn−1~xn+1 ~ˆ0 ~xn+1 ) + Pn−1~xn+1
= β~n − (β n yn+1
1 + ~x0n+1 Pn−1~xn+1 1 + ~x0n+1 Pn−1~xn+1
ˆ ˆ
= β~n + (yn+1 − β~n~xn+1 )Pn−1~xn+1 )/(1 + ~x0n+1 Pn−1~xn+1 )

114
Po
!−1
X ˆ
If we set VPn = ~xi~x0i , β~Po =Least square estimator. Then Vn+1 =
i=1
n+1
!−1
X
~xi~x0n and
i=1
ˆ
β~n are least square estimator of β. ~
Engineer : Set initial value
Vo = CI, C is very small.
ˆ
β~o : guess.
(2) If A = B + w ~w~ 0 is nonsingular
|A|−|B|
Then w ~ 0 Aw
~ = |A|
Notice:
N
an−1
X an − an−1
as an ↑ ∞, an → 1, ∼ log aN
i=1
an
n+1 n
!
X X
Special Case : x2i = x2i + x2n+1 .
i=1 i=1

A w
~ 0
proof : | B |=| A − w
~w~ |= 0
(∗)
w
~ 1
Lemma : If A is nonsingular,

A C −1
Then B D =| A || D − BA C |

  
I O A C
proof : det
−BA−1 I B D
 
A C
= det
0 −BA−1 C + D
~ 0 A−1 w
So, (∗) = | A || 1 − w ~|
2. Strong Consistency:
Conditional Fisher0 s information matrix:
L(β,~ yi | y1 , · · · , yi−1 )
n
Y
= ~ yi | y1 , · · · , yi−1 ), implies
L(β,
i=1
n
X
~ y 1 , y2 , · · · , yn ) =
log L(β, ~ yi , | y1 , · · · , yi−1 )
log L(β,
i=1

115
Definition:
( )
~ yi | y1 , · · · , yi−1 ) [∂ log L(β,
∂ log L(β, ~ yi | y1 , · · · , yi−1 )]0
Ji = E y1 , · · · , yi−1
∂ β~ ∂ β~
Conditional Fisher0 s information matrix is
Xn
In = Ji
i=1

Model : yn = β~ 0~xn + εn
εn i.i.d. ∼ N (0, σ 2 )
~xn  σ{y1 , · · · , yn−1 } = Fn−1

~0 ~
 
x )2
~ yi | y1 , · · · , yi−1 ) = log √ 1 e− i 2σ2 i
(y −β
log L(β,
2πσ
√ (yi − β~ ~xi )
0 2
= − log 2πσ −
( 2σ 2
(yi − β~ 0~xi ) 0 (yi − β~ 0~xi )
Ji = E ~xi~xi |Fi−1 }
σ2 σ2
= E{ε2i ~xi~x0i | Fi−1 }/σ 4 = ~xi~x0i E{ε2i | Fi−1 }/σ 4
= ~xi~x0i /σ 2 ,
Xn
In = ~xi~x0i /σ 2
i=1

Recall that when ~xi are constant vectors,


 !−1 n 
n
ˆ X X
cov(β~n ) = cov  ~xi~x0i ~xi εi 
i=1 i=1

n
!−1
X
= ~xi~x0i σ 2 = In−1
i=1

Therefore, for any unit vector ~e ,


n
!−1
ˆ X
V ar(~e0 β~n ) = ~e0 ~xi~x0i ~e σ 2
i=1
= ~e0 In−1~e

116
ˆ
Let δn (~e∗ ) be the minimum eigenvalue (eigenvector) of In . Then Var(~e0∗ β~n ) =
~e0∗ In−1~e∗ = 1/δn ≥ ~e0 In−1~e, ∀ ~e.

So, the data set {~x1 , y1 , ~x2 , y2 , · · · , ~xn , yn } provides least information for estimat-
ing β~ along the direction ~e∗ , we can interpretate the maximum-eigenvaluce similarly.
ˆ
When the L.S.E. β~n is (strongly) consistent? Heuristically, if the most difficult direc-
tion has “infinite” information, we should be able to estimate β~ consistently. More
precisely, if
ˆ
λmin (In ) → ∞, we expect β~n → β~ a.s.

Weak consistently is trivial when ~xi are constants, since


ˆ 1
cov(β~n ) = In−1 and k In−1 k= → 0.
λmin (In )
For strong consistency, this is shown by Lai, Robbins and Wei(1979), Journal
Multivariate Analysis, 9, 340 ∼ 361. !
Xn
Theorem : In the fixed design case if lim λmin ~xi~x0i → ∞
n→∞
i=1
ˆ
Then β~n → β~ a.s. if {εi } is a convergence system.
Definition : {εn } is a convergence system if
n
X n
X
ci εi converges a.s. for all c2i < ∞.
i=1 i=1

Example:
εi ∼i.i.d. Eεi = 0, V ar(εi ) < ∞.
More general, {εn , Fn } is a martingale difference sequence such that

sup E[ε2i | Fi−1 ] < ∞ and


i
sup E[ε2i ] < ∞.
i

Stochastie Case:
< 1 > First Attempt : (Reduce to 1-dimension case).
n
!−1 n
ˆ X X
β~n − β~ = ~xi~x0 ~xi εi
i
i=1 i=1

117
Recall that : {εi , Fi } martingale difference sequence ui Fi−1 .
n
( P∞ 2
X converges a.s. on { i=1 ui < ∞}
ui εi 1+δ
P
1/2
0 ( ni=1 u2i ) [log ( ni=1 u2i )] 2
P
a.s. ∀ δ > 0
i=1

p = dim(β)~ = 1.
ˆ
Conclusion: β~n converges a.s.
n
X
The limit is β~ on the set {In = x2i → ∞}. In fact on this set
i=1

n
! 1+δ
2 n
!1/2 
ˆ X X
β~n − β~ = 0  log x2i / x2i  a.s. ∀ δ > 0.
i=1 i=1
n
!
X
Let Pn = ~xi~x0i , Vn = Pn−1 , Dn = diag(Pn ).
i=1
n
ˆ X
β~n − β~ = (Pn−1 Dn )(Dn−1 ~xi εi )
i=1
 Xn Xn 
 xi1 , εi / x2i1 
= Pn−1 Dn  i=1 i=1
 
.. 

Pn . P 
n 2
i=1 ip i /
x ε i=1 xip

So
n
! 1+δ
2
X
log x2ij
ˆ
k β~n − β~ k ≤ k Pn−1 kk Dn k max Pn
i=1
1/2
1≤j≤P
i=1 x2ij
1+δ
!
(log λ∗n ) 2
= O 1/λn · λ∗n · 1/2
, λ∗n : max. eigen.
λn

118
since
0
 
 0 
 
 0 
 .. 
.
 
 
(0, · · · , 0, 1, 0, · · · , 0)Pn  0  ≥ λn
 
1
 
 
0
 
 
 .. 
 . 
0
1+δ 3
= O(λ∗n (log λ∗n ) 2 /λn ). (∗)
2

ˆ
Conclusion: β~n → β~ a.s. on the set
n 3 o
lim λ∗n (log λ∗n )(1+δ)/2 /λn2 = 0, for some δ > 0 = C
n→∞
n 3 o
Remark: C ⊂ lim λ∗n /λn2 =0
n→∞

If λn ∼ n, then the order of λ∗n should be smaller than n3/2 .

λ∗n λn
 
det Pn
λn /2 ≤ = ∗ ≤ λn
tr(Pn ) λn + λn

119
Example 1 : yi = β1 + β2 i + εi
i = 1, 2, 3, · · · , n.
 
1
~xi =
i
 n
X 
n
X  n i 
Pn = ~xi~x0i = X i=1
 
n Xn 
 2 
i=1 i i
i=1 i=1
n
X
implies tr(Pn ) = n + i2 ∼ n 3 .
i=1
n n
!2
X X
det(Pn ) = n i2 − i
i=1 i=1
2
n2 n4 n4 n4

3
∼ n n /3 − = − = .
2 3 4 12

implies λ∗n ∼ n3
λn ∼ n

implies (∗) is not satisfy.


Example 2 : AR (2)
zn = β1 zn−1 + β2 zn−2 + εn
Characteristic polynomial
P (λ) = λ2 − β1 λ − β2
The roots of P (λ) determine the behavior of zn , assume that

P (λ) = (λ − ρ1 )(λ − ρ2 )
= λ2 − (ρ1 + ρ2 )λ + ρ1 ρ2
β1 = ρ1 + ρ2 , β2 = −ρ1 ρ2
 
zn−1
yn = zn , ~xn =
zn−2

Depcomposition:
      
vn 1 −ρ1 zn zn − ρ1 zn−1
= =
wn 1 −ρ2 zn−1 zn − ρ2 zn−1

120
Claim : vn = ρ2 vn−1 = εn
wn = ρ1 wn−1 = εn

vn − ρ2 Vn−1 = (zn − ρ1 zn−1 ) − ρ2 (zn−1 − ρ1 zn−2 )


= zn − (ρ1 + ρ2 )zn−1 + ρ1 ρ2 zn−2
= zn − β1 zn−1 − β2 zn−2 = εn

ρ2 = 1, ρ1 = 0, then vn − vn−1 = εn
Xn
= εi + v o
i=1
and wn = εn

n  
X zi−1
Pn = (zi−1 , zi−2 )
zi−2
i=1
   
1 − ρ1 1 − ρ1
Pn
1 − ρ2 1 − ρ2
n 
X vi−1 
= (vi−1 , wi−1 )
wi−1
i=1
 X n Xn 
2
 vi vi wi 
 i=1 i=1
=  X

n Xn 
 2 
vi wi wi
i=1 i=1

n
X
vo = 0 implies vn = εi , w i = εi
i=1
εi i.i.d. Eεi = 0, and V ar(εi ) < ∞.

121
n
! n
!
X X
tr(Pn ) on order vi2 + ε2i
i=1 i=1
n
! n
! n
!2
X X X
det(Pn ) = vi2 ε2i − v i εi
i=1 i=1 i=1
n
! n
! n n
!2
X X X X
= vi2 ε2i − ε2i + vi−1 εi
i=1 i=1 i=1 i=1
Because vi = vi−1 + εi .

n
X
lim sup vi2 /n(2n log log n) < ∞ a.s. (Donsker Theorem)
n→∞
i=1
n
X
inf(log log n) vi2
i=1
lim > 0 a.s.
n→∞ n2
n
X
implies tr(Pn ) ∼ vi2
i=1
n

n
! " n
#! 1+δ
2

X X X
2 2
Because vi−1 εi = 0  vi−1 log vi−1 .
i=1 i=1 i=1


n
! n
! 1+δ
2

X X
det(Pn ) = −O n2 + 2
vi−1 log 2
vi−1 
i=1 i=1
n
! n
!
X X
+ vi2 ε2i .
i=1 i=1
 
n
! n
!1+δ 
X X
 n2 + 2 2
n
! n
!
  vi−1 log vi−1 

X X i=1 i=1
= vi2 ε2i 1 − O 
  ! ! 
n n

  X X 
i=1 i=1 2
  vi=1 ε2i 
i=1 i=1

122
n
 X ! n
!
X n
n2 2
vi−1 ε2i ∼ n
!
i=1 i=1
X
2
vi−1
i=1
 
2 log log n
= O(n/(n / log log n)) = O
n
" n
!#1+δ  n
(log n)1+δ
X X  
2
log vi−1 ε2i = O
i=1 i=1
n
= o(1)

implies
n
X
tr(Pn ) ∼ vi2
i=1
n
!
X
det(Pn ) ∼ vi2 ·n
i=1

Not application I
< 2 > Second Approach
Energy function, ε-Liapounov0 s function.
dε(x(t))/dt < 0
Roughly speaking, construct a constant function.

V : <P → <
V (~x) > 0, if ~x 6= ~0
V (~0) = 0
inf V (~x) > 0
|~
x|>M

~ n is a sequence of vectors in <n


If w
s.t.

~ n+1 ) ≤ V (w
V (w ~ n ) and lim V (~ωn ) = 0
n→∞

then ~ n = ~0.
lim w
n→

123
Two essential ideas:
(1) decreasing
(2) never ending unless it reaches zero.
What are the probability analogous ?
Decreasing → supermartingale .
→ almost supermartingle.
Recall the following theorem (Robbins and Siegmund) 1971, Optimization Methods
in stat. ed. by Rustgi, 233∼.
Lemma : (Important Theorem )
Let an , bn , cn , dn , be Fn -measurable nonnegative
( ∞ random varaibles ) E[an+1 | Fn ] ≤
s.t.
X ∞
X
an (1 + bn ) + cn − dn . Then on the event bi < ∞, ci < ∞
i=1 i=1
n
X
lim an exists and finite a.s. and di < ∞ a.s.
n→∞
i=1
What is the supermartingale in above ?
Ans: bn = 0, cn = 0, dn = 0.
We start with the residual sum of squares.
n n
X ˆ X
(yi − β~n0 ~xi )2 = ε2i − Qn
i=1 i=1

n
X ˆ
where Qn = (β~n~xi − β~ 0~xi )2
i=1
n
!
ˆ X ˆ
= (β~n − β)
~ 0 ~xi~x0i (β~n − β)
~
i=1

Heuristic : If the least squares functions is good, one would expect


n n
(yi − β~i0~xi )2 ∼
X X
= ε2i .
i=1 i=1

n
X
That is, relative to ε2i , Qn should be smaller. Therefore, Qn /a∗n may be a
i=1
right consideration for the “energying function ”. Another aspect of Qn is that it is
ˆ ~ which reaches zero only when β~ˆn = β.
a quadratic function of (β~n − β), ~

124
How to choose a∗n ?
ˆ
Qn ≥k β~n − β~ k2 ·λn
ˆ
or Qn /λn ≥k β~n − β~ k, choose : a∗n = λn .
Theorem : In the stochastic regression model.
yn = β~ 0~xi + εi
if sup E[ε2n | Fn−1 ] < ∞ a.s.
n

then on the event


 !−1 
X ∞ n
X 
~x0 ~xi~x0i ~xn /λn < ∞, lim λn = ∞
 n=p n n→∞ 
i=1

proof : an = Qn /λn , bn = 0.
n
!0 n
!−1 n
!
X X X
0
Qn = ~xi εi ~xi~xi ~xi εi
i=1 i=1 i=1

n−1
!0 n−1
!
X X
E[an | Fn−1 ] = ~xi εi Vn ~xi εi /λn
i=1 i=1
n−1
!
X
+2E[~x0n εn Vn ~xi εi | Fn−1 ]/λn
i=1
+E(~x0n Vn εn | Fn−1 )/λn .
n
!0 n
!
X X
= ~xi εi Vn ~xi εi /λn
i=1 i=1
+~x0n Vn~xn E[ε2n | Fn−1 ]/λn
n−1
!0 n−1
!
X X
≤ ~xi εi Vn−1 ~xi εi /λn + cn−1
i=1 i=1
= Qn−1 /λn + cn−1
 
1 1
= Qn−1 /λn−1 − Qn−1 − + cn−1
λn−1 λn
= an−1 − an−1 (1 − λn−1 /λn ) + cn−1

125
By the almost supermartingale theorem.
X  
λn − λn−1
lim an < ∞ and an−1 <∞
n→∞ λn

X 0 
X ~xn Vn~xn 2
a.s. on { cn−1 < ∞} = E[εn | Fn−1 ] < ∞
λn
X 0 
~xn Vn~xn
⊃ <∞
λn
If lim an = a > 0
n→∞
Then ∃N s.t. an ≥ a/2, ∀ n > N
∞ ∞
!
X λi − λi−1 a X λi − λi−1
So ai−1 ≥
i=1
λi 2 i=N
λi
∞ Z λi
a X dx
≥ · λn /λn−1
2 i=N λi−1 x
 Z ∞
a 1
≥ inf λn /λn−1 dx = ∞
2 n≥N λn−1 x

Note 1: If λn−1 /λn has limit point λ < 1 then there exists
λnj − λnj−1
nj 3 lim λnj−1 /λnj = λ, lim = 1 − λ.
j→∞ j→∞ λnj

This contradicts.
X λi − λi−1
Note 2 : If <∞
i
λi
λn −λn−1
Then λn
→0
λn−1 /λn → 1.
Therefore, on the event
X 
~xn Vn~xn
< ∞, λn → ∞ ,
λ
an → 0 a.s.
ˆ
since an ≥k β~n − β~ k2
ˆ
β~n → β~ a.s. on the same event.

126
Corollary : On the event

{λn → ∞, (log λ∗n )1+δ = O(λn ) for some δ > 0}


ˆ
Then lim β~n = β~ a.s.
n→∞
∞ n
!−1
X X
proof : ~x0n ~xi~x0i ~xn /λn < ∞
n=p i=1

∞ ∞
X ~x0 Vn~xn X | Pn | − | Pn−1 |
= n
≤ (By Pn = Pn−1 + ~xn~x0n )
n=p
λn n=p
| P n | λ n


!
X | Pn | − | Pn−1 |
= O
n=p
| Pn | (log λ∗n )1+δ

!
X | Pn | − | Pn−1 |
= O
n=p
| Pn | (log | Pn |)1+δ
= O(1)

Since | Pn |= λ∗n · · · λn → ∞.
implies log | Pn |≤ p log(λ∗n ).
•• Knopp : Sequence and Series.

as an ↑
X an − an−1
implies <∞
an (log an )1+δ
Z ∞
1
dx < ∞
2 x(log x)1+δ
Because ~x0n Vn = ~x0n Vn−1 /(1 + ~x0n Vn−1~xn )
~x0 Vn−1~xn~x0n Vn−1
~x0n Vn = ~x0n Vn−1 − n
1 + ~x0n Vn−1~xn

127
< 3 > Third Approach:
k
!0 k
!
X X
Qk = ~xi εi Vk ~xi εi
i=1 i=1
k−1
!0 k−1
!
X X
= ~xi εi Vk ~xi εi
i=1 i=1
k−1
X
+~x0k Vk ~xk ε2k + 2(~x0k ~xk ~xi εi )εk
i=1
k−1
X
= Qk−1 − (~x0k Vk−1 ~xi εi )2 /(1 + ~x0k Vk−1~xk )
i=1
k−1
X
+~x0k Vk ~xk ε2k + 2(~x0k Vk ~xi εi )εk .
i=1

n
X
Qn − QN = (Qj − Qj−1 )
j=N +1
n k−1
!2 
X X
= − ~x0k Vk−1 (~xi εi ) 2
(1 + ~x0k Vk−1~xk )
k=N +1 i=1
n n k−1
!
X X X
+ ~x0k Vk ~xk ε2k + 2 ~x0k Vk ~xi εi εk
k=N +1 k=N +1 i=1

128
n k−1
!2
X X
implies Qn − QN + ~x0k Vk−1 ~xi εi /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
k−1
!
X
n n
~x0k Vk−1 ~xi εi
X X k=1
(1) = ~x0k Vk ~xk ε2k +2 εk
k=N +1 k=N +1
1 + ~x0k Vk−1~xk
n k−1
!2
X X
(2) = ~x0k Vk−1 ~xi εi /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
k−1
!
X
n
~x0k Vk−1 ~xi εi
X i=1
= εk
i=N +1
1 + ~x0k Vk−1~xk

(1) finite if and only if (2) finite.


Theorem: If sup E[ε2n | Fn−1 ] < ∞ a.s.
n
Then
k−1
!2
X
n
~x0k Vk−1 ~xi εi
X i=1
−QN + Qn +
k=N +1
1 + ~x0k Vk−1~xk
n
X
∼ ~x0k Vk ~xk ε2k a.s.
k=N +1

on the set where one of it approaches ∞.


proof: Let
k−1
X
~x0k Vk−1 ~xi εi
i=1
Uk =
1 + ~x0k Vk−1~xk

Then Uk is Fk−1 -measurable.

129
Therefore
 n
! " n
#
X X
Uk2 on Uk2 < ∞

 O


n
X 
k=N +1 " k=N +1
Uk εk = n
! ∞
#
 X X
k=N +1 2 2
 o Uk on Uk = ∞



k=N +1 k=N +1

n
X n
X
But Uk2 ≤ Uk2 (1 + ~x0k Vk−1~xk )
N +1 N +1
n k−1
!2
X X
= ~x0k Vk−1 ~xi εi /(1 + ~x0k Vk−1~xk )
N +1 i=1

Special case ~xi = 1, Pn = n.


 2
k−1
X
n
!2  n
 εi 
  
X X 
 i=1  1
εi n+ 1+
k − 1 k−1
 
i=1 k=N +1  

n
X ε2k

k=N +1
k
 k−1 2
X
 n εi     n   !
 X 1 1 X k − 1
(εk−1 )2

  1+ =

k=N +1 k − 1 
 k − 1 k=N +1
k

130
(εk )2 ∼ (log n)σ 2 .
P
Because
n
X k−1
X
Qn + (~x0k Vk+1 ~xi εi )2 /(1 + ~x0k Vk−1~xk )
k=N +1 i=1
Xn
∼ ~x0k Vk ~xk ε2k , if one of it → ∞, where
k=N +1
n
!
ˆ X ˆ
Qn = (β~n − β)
~ ~xi~x0i (β~n − β)
~
i=1
n
!0 n
!
X X
= ~xi εi Vn ~xi εi .
i=1 i=1

Lemma : Assume that {εk , Fk } is a martingale difference sequence and Vk is Fk−1 -


measurable all k.
(i) Assume that sup E[ε2n | Fn−1 ] < ∞ a.s.
n


(∞ )
X X
Then | uk | ε2k < ∞ a.s. on | uk |< ∞
k=1 k=1
 !1+δ 
∞ n
! n
X X X
and | uk | ε2k = o  | uk | log | uk | .
k=1 k=1 k=1
(∞ )
X
on the set | uk |= ∞ , for all δ > 0.
k=1

(ii) Assume that sup E[| εn |α | Fn−1 ] < ∞, for some α > 2. Then
n

n
X n
X
| uk | ε2k − | uk | E[ε2k | Fk−1 ]
k=1 k=1
n
! (∞ )
X X
=o | uk | a.s. on | uk |= ∞, sup | un |< ∞ .
n
k=1 k=1

131
Therefore, if lim E[ε2k | Fk−1 ] = σ 2 a.s.
k→∞
n
X n
X
Then lim | uk | ε2k / | uk |= σ 2 a.s.
n→∞
( ∞ k=1 k=1
)
X
on | uk |= ∞, sup | un |< ∞
n
k=1

Note:
Basic idea is to ask : zi ≥ 0, the relation of
n
X n
X
zi and E[zi | Fi−1 ]
i=1 i=1
Xn n
X
Because E(| uk | ε2k | Fk−1 ) = | uk | E(ε2k | Fk−1 )
k=1 k=1

(Freedman. D. (1973). Ann. Prob. 1, 910∼925.).


proof: (i) Take an large enough so that

X
P [| uk |> ak ] < ∞
k=1
Let u∗k = uk I[|uk |≤ak ]
Then P {uk = u∗k eventually }=1.
If we can show our results for {u∗k } then the results also hold for {uk }.
Therefore, we can assume that each uk is a bounded random variables.
∀ M > 0, define
vk = uk I[E(ε2k |Fk−1 )≤M ] I k

X
| ui |≤ M 
 


i=1

then vk is Fk−1 -measurable.



! ∞
!
X X
Then E | vi | ε2i = E(| vi | ε2i | Fi−1 )
i=1 i=1

!
X
= E | vi | E[ε2i | Fi−1 ]
i=1

132
 

∞
X 

≤E
 | u i | I
i
X
 · M

 i=1 
| u |≤ M
 

 j 

j=1

≤ M2 < ∞

X
So | vi | ε2i < ∞ a.s.
i=1
( ∞
)
X
Observe that vk = uk , ∀ k on sup E[ε2n | Fn−1 ] ≤ M, | un |≤ M = ΩM .
n
n=1

X
So | ui | ε2i < ∞ a.s. on ΩM , ∀ M .
i=1


( ∞
)
[ X
But ΩM = sup E[ε2n | Fn−1 ] < ∞, | un |< ∞ .
n
M =1 n=1
(∞ )
X
= | un |< ∞
n=1

The proof is first part.


n
X
Let sn = | ui |
i=1
n
X | uk | ε2k
consider
k=1
sk (log sk )1+δ

X | un |
Since < ∞ a.s.
s (log sn )1+δ
n=1 n
∞ Z sn
X dx

n=1 sn−1
x(log x)1+δ

X | uk |
implies ε2 < ∞ a.s.
k=1
sk (log sk )1+δ k

133
( n
)
X
By Kronecker0 s Lemma, on sn = | ui |→ ∞
i=1
n
X
| uk | ε2k
k=1
lim = 0 a.s.
n→∞ sn (log sn )1+δ
(ii) (Chow (1965), local convergence theorem).
For a martingale difference sequence {δk , Fk }
X n
εk converges a.s. on
k=1
(∞ )
X
E(| δk |r | Fk−1 ) < ∞ .
k=1
where 1 ≤ r ≤ 2.
Set δk = u2k [ε2k − E(ε2k | Fk−1 )]
Then {δk , Fk } is a martingale difference sequence without loss of generality,
1 1
we can assume that 2 < α ≤ 4. If α ≥ 4, then E 4 (ε4i | Fi−1 ) ≤ E α (| εi |α | Fi−1 ).
Set r = α/2.
Let tn = ni=1 | ui |2r .
P

E[| δk |r | Fk−1 ]
=| uk |2r E{| ε2k − E[ε2k | Fk−1 ] |r | Fk−1 }
≤ | uk |2r E{[max(| ε2k |, E[ε2k | Fk−1 ]r | Fk−1 }
k
≤ | uk | E{| εk |2r +E r [ε2k | Fk−1 ] | Fk−1 }
2r

= | uk |2r {E[| εk |2r | Fk−1 ] + E r [ε2k | Fk−1 ]}


≤ 2 | uk |2r E[| εk |2r | Fk−1 ]
Xn
So E(| δk /tk |r | Fk−1 )
k=1
n
!
X | uk |2r
≤ 2 sup E[| εn |α | Fn−1 ] < ∞ a.s.
k=1
trk n

n
X
So δk = o(tn ) a.s. on {tn → ∞}
k=1
n
(∞ )
X X
But δk converges a.s. on | ui |2r = lim tn < ∞ .
n→∞
k=1 i=1

134
n
X
0
by Chow s Theorem on δi .
i=1
Observe that on {supn | un |< ∞}.
n
! 
X
2r−1
tn ≤ | ui | sup | un |
n
i=1

Combining all those results


n n
! (∞ )
X X X
δi = o | ui | a.s. on | ui |= ∞, sup | un |< ∞ .
n
i=1 i=1 i=1

It is not difficult to see that


n n
!  
X X
| uk | ε2k =O | uk | a.s. on sup | un |< ∞
n
k=1 k=1

This is because
( n 
X
(a) On | ui | < ∞, sup | un |< ∞ ,
n
i=1
n n
!
X X
| uk | ε2k = O(1) = O | uk | (by (i))
k=1 k=1
( n
X
(b) On | uk | = ∞, sup | un |< ∞} ,
n
k=1
n n n
!
X X X
| uk | ε2k = | uk | E(ε2k | Fk−1 ) + o | uk |
k=1 k=1 k=1
n
! n
!
X X
≤ | ui | sup E(ε2n | Fn−1 ) + o | uk |
n
i=1 k=1
n
! 
X
= | ui | sup E(ε2n | Fn−1 ) + o(1)
n
i=1
n
!
X
= O | ui | .
i=1

135
Now, if lim E[ε2n | Fn−1 ] = σ 2 .
n→∞
n n
(∞ )
X X X
Then | uk | E[ε2k | Fk−1 ]/ | uk |→ σ 2 a.s. on | uk |= ∞ .
k=1 k=1 k=1
n
X
By an ≥ 0, bn ≥ 0, bn → b, ai → ∞
i=1
n
X n
X
Then ai b i / ai → b.
i=1 i=1

n
X n
X
So | uk | ε2k / | uk |
k=1 k=1
n
X
| uk | E[ε2k | Fk−1 ]
k=1
= n + o(1)
X
| uk |
k=1
( ∞
)
X
→ σ 2 , a.s. on sup | un |< ∞, | uk |= ∞ .
n
k=1

Lemma 2: Let {wn } be a p × 1 vectors and


Xn
An = w ~ i0 . Assume that AN is nonsingular for some N . Let λ∗n and | An | denote
~ iw
i=1
the maximum evgenvalue and determinant of An .
Then (i) λ∗n ↑ .

X
(ii) lim λ∗n < ∞ implies ~ i0 Ai w
w ~ i < ∞.
n→∞
i=N
n
X
(iii) lim λ∗n = ∞, implies ~ i0 A−1
w i w~ i = O(log λ∗n ).
n→∞
i=N
0 −1
(iv) lim λ∗n = ∞, w ~i →
~ i Ai w 0, implies
n→∞
Xn
~ i0 A−1
w i w~ i ∼ log | An | .
i=N

136
proof : (i) trivial.
| An | − | An−1 |
~ n0 A−1
(ii) w n w~n =
| An |
(λn ) ≥| An | and | An |≥ λ∗n λp−1
∗ p
n

Where λn is the minimum eigenvalue of An .

If λ∗n < ∞, then lim | An |< ∞.


n→∞
∞ ∞
X X | Ai | − | Ai−1 |
So ~ i0 A−1
w i w~i =
i=N i=N
| Ai |
X∞
(| Ai | − | Ai−1 |)
lim | An | − | AN −1 |
i=N n→∞
≤ = < ∞.
| Ai | | AN |

n n
X X | Ai | − | Ai−1 |
(iii) Note that ~ i0 A−1
w i w~i =
i=N i=N
| Ai |
n Z |Ai |
X 1
≤ dx + 1
i=N +1 |A i−1 | x
= 1 + log | An | − log | AN |
= O(log | An |) = O(log λ∗ ).

(iv) Note that λ∗n → ∞, | An |→ ∞.

| An | − | An−1 |
Now → 0 implies
| An |
n
X | Ai | − | Ai−1 |
∼ log | An |
i=N
| A i |

Corollary: (1) If $\sup_n E[\varepsilon_n^2 \mid \mathcal{F}_{n-1}] < \infty$ a.s., then
$$\sum_{k=N+1}^{n} \vec{x}_k' V_k \vec{x}_k\, \varepsilon_k^2 = O((\log \lambda_n^*)^{1+\delta}) \quad \text{a.s.}$$

(2) If $\sup_n E[|\varepsilon_n|^{2+\delta} \mid \mathcal{F}_{n-1}] < \infty$ for some $\delta > 0$, then

(i) $\sum_{k=N+1}^{n} \vec{x}_k' V_k \vec{x}_k\, \varepsilon_k^2 = O(\log \lambda_n^*)$ a.s.;

(ii) if moreover $\lim_{n\to\infty} E[\varepsilon_n^2 \mid \mathcal{F}_{n-1}] = \sigma^2$, then
$$\sum_{k=N+1}^{n} \vec{x}_k' V_k \vec{x}_k\, \varepsilon_k^2 \sim \sigma^2 \log \det\Big(\sum_{k=1}^{n} \vec{x}_k \vec{x}_k'\Big) \ \text{on} \ \Big\{\lim_{n\to\infty} \vec{x}_n' V_n \vec{x}_n = 0,\ \lambda_n^* \to \infty\Big\}.$$

proof: With $P_k = \sum_{i=1}^k \vec{x}_i\vec{x}_i'$ and $V_k = P_k^{-1}$ as before, Lemma 2 gives
$$0 \le u_k = \vec{x}_k' V_k \vec{x}_k = \frac{|P_k| - |P_{k-1}|}{|P_k|} \le 1.$$

(1) If $\lim_{n\to\infty} \lambda_n^* < \infty$, then $\sum_{k=1}^{\infty} u_k < \infty$ (Lemma 2(ii)), and therefore $\sum_{k=1}^{\infty} u_k \varepsilon_k^2 < \infty$ (by Lemma 1(i)). So it remains to prove $\sum_{k=1}^n u_k \varepsilon_k^2 = O((\log \lambda_n^*)^{1+\delta})$ on $\{\lambda_n^* \to \infty\}$. If $\lambda_n^* \to \infty$, then $\sum_{i=1}^n u_i = O(\log \lambda_n^*)$ by Lemma 2(iii), and
$$\sum_{i=1}^n u_i \varepsilon_i^2 = O\Big(\Big(\sum_{i=1}^n u_i\Big)\Big[\log \sum_{i=1}^n u_i\Big]^{1+\delta}\Big) = O(\log \lambda_n^* (\log\log \lambda_n^*)^{1+\delta}) = O((\log \lambda_n^*)^{1+\delta}).$$

(2) Note that $0 \le u_i \le 1$, so part (i) follows from Lemma 1 and $\sum_{i=1}^n u_i = O(\log \lambda_n^*)$. For part (ii), on $\Omega_o = \{\lim_{n\to\infty} \vec{x}_n' V_n \vec{x}_n = 0,\ \lambda_n^* \to \infty\}$ we have $u_n \to 0$ and $\sum_{i=1}^n u_i \to \infty$. By Lemma 1(ii),
$$\sum_{i=1}^n u_i \varepsilon_i^2 \Big/ \sum_{i=1}^n u_i \to \sigma^2 \ \text{a.s. on } \Omega_o,$$
so, by Lemma 2(iv),
$$\sum_{i=1}^n u_i \varepsilon_i^2 \sim \sigma^2 \log|P_n| \ \text{a.s. on } \Omega_o.$$
Remark:

1° With
$$R_n = Q_n + \sum_{k=N+1}^{n} \Big(\vec{x}_k' V_k \sum_{i=1}^{k-1} \vec{x}_i \varepsilon_i\Big)^2 (1 + \vec{x}_k' V_{k-1} \vec{x}_k),$$
we have $R_n \sim \sum_{k=N+1}^{n} \vec{x}_k' V_k \vec{x}_k\, \varepsilon_k^2$ if either side $\to \infty$.

2° (i) Assume that $\sup_n E[\varepsilon_n^2 \mid \mathcal{F}_{n-1}] < \infty$ a.s. Then $R_n = O((\log \lambda_n^*)^{1+\delta})$ a.s. for every $\delta > 0$.
(ii) If $\sup_n E[|\varepsilon_n|^{\alpha} \mid \mathcal{F}_{n-1}] < \infty$ a.s. for some $\alpha > 2$, then $R_n = O(\log \lambda_n^*)$.

3° If $\sup_n E[|\varepsilon_n|^{\alpha} \mid \mathcal{F}_{n-1}] < \infty$ a.s. for some $\alpha > 2$ and $\lim_{n\to\infty} E[\varepsilon_n^2 \mid \mathcal{F}_{n-1}] = \sigma^2$ a.s., then on $\{\vec{x}_n' V_n \vec{x}_n \to 0,\ \lambda_n^* \to \infty\}$,
$$R_n \sim \sigma^2 \log \det\Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big) \ \text{a.s.}$$

Corollary 1: (i) If $\sup_n E[\varepsilon_n^2 \mid \mathcal{F}_{n-1}] < \infty$ a.s., then
$$Q_n = \Big\| \Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big)^{-1/2} \sum_{i=1}^{n} \vec{x}_i \varepsilon_i \Big\|^2 \qquad (*)$$
$$= O((\log \lambda_n^*)^{1+\delta}) \ \text{a.s.} \qquad (**)$$
and $\|\vec{b}_n - \vec{\beta}\|^2 = O((\log \lambda_n^*)^{1+\delta}/\lambda_n)$ a.s., for all $\delta > 0$.

(ii) If $\sup_n E[|\varepsilon_n|^{\alpha} \mid \mathcal{F}_{n-1}] < \infty$ a.s. for some $\alpha > 2$, then $(*)$ and $(**)$ hold with $\delta = 0$.

proof: $Q_n \le R_n$, so $(**)$ follows from Remark 2°. Moreover,
$$Q_n = \Big(\sum_{i=1}^{n} \vec{x}_i \varepsilon_i\Big)' \Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big)^{-1} \Big(\sum_{i=1}^{n} \vec{x}_i \varepsilon_i\Big) = (\vec{b}_n - \vec{\beta})' \Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big) (\vec{b}_n - \vec{\beta}) \ge \lambda_n \|\vec{b}_n - \vec{\beta}\|^2.$$
So the bound on $\|\vec{b}_n - \vec{\beta}\|^2$ follows from $(**)$.
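A simulation sketch of Corollary 1 (the design below, a random-walk regressor driven by a source independent of the regression errors, is my own choice): the normalized quantity $\lambda_n \|\vec{b}_n - \vec{\beta}\|^2 / \log \lambda_n^*$ should remain bounded as $n$ grows.

```python
# Sketch: lambda_min * ||b_n - beta||^2 = O(log lambda_max) for the LSE.
import numpy as np

rng = np.random.default_rng(2)
n, beta = 100_000, np.array([1.0, -0.5])
z = np.cumsum(rng.normal(size=n))          # random-walk regressor
X = np.column_stack([np.ones(n), z])
y = X @ beta + rng.normal(size=n)          # regression errors, an m.d.s.
for m in (10**3, 10**4, n):
    G = X[:m].T @ X[:m]
    b = np.linalg.solve(G, X[:m].T @ y[:m])
    lam = np.linalg.eigvalsh(G)            # ascending eigenvalues
    print(m, lam[0] * np.sum((b - beta)**2) / np.log(lam[-1]))
```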


Corollary 2: (Adaptive prediction) If $\lim_{n\to\infty} E[\varepsilon_n^2 \mid \mathcal{F}_{n-1}] = \sigma^2$ a.s. and $\sup_n E[|\varepsilon_n|^{\alpha} \mid \mathcal{F}_{n-1}] < \infty$ for some $\alpha > 2$, then on the set $\{\vec{x}_n' V_n \vec{x}_n \to 0,\ \lambda_n^* \to \infty\}$ we have
$$Q_n + \sum_{k=N+1}^{n} \{(\vec{b}_{k-1} - \vec{\beta})'\vec{x}_k\}^2 \sim \sigma^2 \log \det\Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big) \ \text{a.s.}$$
Therefore, if $Q_n = o(\log \lambda_n^*)$, then
$$\sum_{k=N+1}^{n} (y_k - \vec{b}_{k-1}'\vec{x}_k - \varepsilon_k)^2 \sim \sigma^2 \log \det\Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big) \ \text{a.s.}$$

proof: By Remarks 1° and 3°,
$$Q_n + \sum_{k=N+1}^{n} \Big(\vec{x}_k' V_k \sum_{i=1}^{k-1} \vec{x}_i \varepsilon_i\Big)^2 (1 + \vec{x}_k' V_{k-1} \vec{x}_k) = Q_n + \sum_{k=N+1}^{n} \frac{[\vec{x}_k'(\vec{b}_{k-1} - \vec{\beta})]^2}{1 + \vec{x}_k' V_{k-1} \vec{x}_k} \sim \sigma^2 \log \det\Big(\sum_{i=1}^{n} \vec{x}_i \vec{x}_i'\Big) \ \text{a.s.},$$
using $\vec{x}_k' V_k = \vec{x}_k' V_{k-1}/(1 + \vec{x}_k' V_{k-1} \vec{x}_k)$ and $\vec{b}_{k-1} - \vec{\beta} = V_{k-1}\sum_{i=1}^{k-1}\vec{x}_i\varepsilon_i$. Moreover,
$$\sum_{k=N+1}^{n} \frac{[\vec{x}_k'(\vec{b}_{k-1} - \vec{\beta})]^2}{1 + \vec{x}_k' V_{k-1} \vec{x}_k} \sim \sum_{k=N+1}^{n} [\vec{x}_k'(\vec{b}_{k-1} - \vec{\beta})]^2 \quad \text{if it} \to \infty \text{ and } \vec{x}_k' V_{k-1} \vec{x}_k \to 0,$$
since
$$1 + \vec{x}_k' V_{k-1} \vec{x}_k = \frac{1}{1 - \vec{x}_k' V_k \vec{x}_k} \to 1,$$
and $\sum_{i=1}^n a_i b_i \sim \sum_{i=1}^n a_i$ ($a_i b_i > 0$) if $b_i \to 1$ and $\sum_{i=1}^n a_i \to \infty$. (Because $y_k = \vec{\beta}'\vec{x}_k + \varepsilon_k$, the second display of the corollary is just the first one rewritten.)

Prediction:
At stage $n$ we have already observed $\{y_1, \vec{x}_1, \cdots, y_n, \vec{x}_n\}$. Since we cannot foresee the future, we have to use the observed data to predict $y_{n+1}$; i.e., the predictor $\hat{y}_{n+1}$ is $\mathcal{F}_n$-measurable.

If we are only interested in a single-period prediction, we may use $(y_{n+1} - \hat{y}_{n+1})^2$ as a measure of performance. In the adaptive prediction case, it may be more appropriate to use the accumulated prediction errors
$$L_n = \sum_{k=1}^{n} (y_{k+1} - \hat{y}_{k+1})^2.$$

In the stochastic regression model,
$$L_n = \sum_{k=1}^{n} (\vec{\beta}'\vec{x}_{k+1} - \hat{y}_{k+1})^2 + 2\sum_{k=1}^{n} (\vec{\beta}'\vec{x}_{k+1} - \hat{y}_{k+1})\varepsilon_{k+1} + \sum_{k=1}^{n} \varepsilon_{k+1}^2.$$
By Chow's local convergence theorem,
$$L_n \sim \sum_{k=1}^{n} (\vec{\beta}'\vec{x}_{k+1} - \hat{y}_{k+1})^2 + \sum_{k=1}^{n} \varepsilon_{k+1}^2 \ \text{a.s., if either side} \to \infty.$$
Therefore, to compare different predictors, it is sufficient to compare
$$C_n = \sum_{k=1}^{n} (\vec{\beta}'\vec{x}_{k+1} - \hat{y}_{k+1})^2.$$
The least squares predictor is $\hat{y}_{k+1} = \vec{b}_k'\vec{x}_{k+1}$.

Note:
$$\sum_{i=p+1}^{n} (y_i - \vec{b}_{i-1}'\vec{x}_i)^2 (1 - \vec{x}_i' V_i \vec{x}_i) = \sum_{i=1}^{n} \hat{\varepsilon}_i^2(n) - \sum_{i=1}^{p} \hat{\varepsilon}_i^2(p), \quad \text{where } \hat{\varepsilon}_i(n) = y_i - \vec{b}_n'\vec{x}_i.$$
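The identity in the Note (as reconstructed above) is easy to check numerically; the following sketch (design, dimensions and seed my own) compares the left side with the difference of the two residual sums of squares.

```python
# Sketch: recursive-residual identity for least squares.
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.3, 1.2]) + rng.normal(size=n)

def rss(m):                        # sum_{i<=m} (y_i - b_m'x_i)^2
    b = np.linalg.lstsq(X[:m], y[:m], rcond=None)[0]
    r = y[:m] - X[:m] @ b
    return r @ r

lhs = 0.0
for k in range(p + 1, n + 1):      # k is 1-based
    Vk = np.linalg.inv(X[:k].T @ X[:k])
    b_prev = np.linalg.lstsq(X[:k-1], y[:k-1], rcond=None)[0]
    xk, yk = X[k-1], y[k-1]
    lhs += (yk - b_prev @ xk)**2 * (1.0 - xk @ Vk @ xk)
print(lhs, rss(n) - rss(p))        # the two numbers agree
```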

Example: AR(1).
$$x_k = \rho x_{k-1} + \varepsilon_k, \quad \varepsilon_k \ \text{i.i.d.}, \ E[\varepsilon_i] = 0, \ \mathrm{Var}(\varepsilon_i) = \sigma^2, \ E|\varepsilon_i|^3 < \infty.$$

(i) If $|\rho| < 1$, then $\sum_{i=1}^{n} x_i^2/n \to \sigma^2/(1-\rho^2)$ a.s.

(ii) If $|\rho| = 1$, then $\sum_{i=1}^{n} x_i^2 = O(n^2 \log\log n)$ and
$$\liminf_{n\to\infty} \frac{(\log\log n)\sum_{i=1}^{n} x_i^2}{n^2} > 0 \ \text{a.s.}$$

In either case ($|\rho| \le 1$), $\lambda_n^* = O(n^3)$ a.s. and $\liminf_{n\to\infty} \lambda_n/n > 0$, so
$$\hat{\rho}_n - \rho = O\Big(\Big(\frac{\log n}{n}\Big)^{1/2}\Big) \quad \text{(by Corollary 1)}.$$
To apply Corollary 2 we check that $x_n^2/\sum_{i=1}^{n} x_i^2 \to 0$:

(i) $|\rho| < 1$:
$$x_n^2 \Big/ \sum_{i=1}^{n} x_i^2 = \frac{\sum_{i=1}^{n} x_i^2/n - \sum_{i=1}^{n-1} x_i^2/n}{\sum_{i=1}^{n} x_i^2/n} \to \frac{0}{\sigma^2/(1-\rho^2)} = 0.$$

(ii) $|\rho| = 1$:
$$x_n^2 \Big/ \sum_{i=1}^{n} x_i^2 = O\Big(\Big(\sum_{i=1}^{n} \varepsilon_i\Big)^2 \Big/ \big(n^2/\log\log n\big)\Big) = O\Big(\frac{\log\log n}{n^2}\big(\sqrt{2n\log\log n}\,\big)^2\Big) = O\Big(\frac{(\log\log n)^2}{n}\Big) = o(1).$$
Also,
$$Q_n = \Big(\sum_{i=2}^{n} x_{i-1}\varepsilon_i\Big)^2 \Big/ \sum_{i=2}^{n} x_{i-1}^2 = \Bigg[\frac{1}{\big(\sum x_{i-1}^2\big)^{1/2}}\, O\Big(\Big(\sum x_{i-1}^2\Big)^{1/2}\Big(\log \sum x_{i-1}^2\Big)^{1/3}\Big)\Bigg]^2 = O\Big(\Big(\log \sum_{i=1}^{n} x_i^2\Big)^{2/3}\Big) = O((\log n)^{2/3}) = o(\log \lambda_n^*).$$
By Corollary 2,
$$\sum_{k=2}^{n} (\hat{\rho}_{k-1} - \rho)^2 x_{k-1}^2 \sim \sigma^2 \log\Big(\sum_{i=1}^{n} x_i^2\Big) \ \text{a.s.}$$
$$\sim \sigma^2 \log n \ \text{a.s. if } |\rho| < 1, \qquad \sim 2\sigma^2 \log n \ \text{a.s. if } |\rho| = 1,$$
since, for $|\rho| = 1$, $\sum_{i=1}^n x_i^2$ lies between constant multiples of $n^2/\log\log n$ and $n^2 \log\log n$, and
$$\log[n^2 \log\log n] = 2\log n + \log(\log\log n), \qquad \log[n^2/\log\log n] = 2\log n - \log(\log\log n).$$
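The $\sigma^2 \log n$ versus $2\sigma^2 \log n$ dichotomy can be seen in simulation. The sketch below (indexing conventions and burn-in are mine; convergence is very slow because the normalizer is only $\log n$) accumulates the one-step prediction cost $(\hat\rho_k - \rho)^2 x_k^2$ of the least squares predictor.

```python
# Sketch: accumulated adaptive-prediction cost over sigma^2*log(n)
# drifts toward 1 for |rho| < 1 and toward 2 for rho = 1.
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 100_000, 1.0
for rho in (0.5, 1.0):
    eps = rng.normal(size=n + 1)
    x = np.empty(n + 1)
    x[0] = 1.0                       # start at 1 to avoid a 0/0 below
    for k in range(1, n + 1):
        x[k] = rho * x[k - 1] + eps[k]
    sxx = np.cumsum(x[:-1] ** 2)     # sum_{i<=k} x_{i-1}^2
    sxy = np.cumsum(x[:-1] * x[1:])  # sum_{i<=k} x_{i-1} x_i
    rho_hat = sxy / sxx              # LSE after k observations
    terms = (rho_hat[:-1] - rho) ** 2 * x[1:-1] ** 2
    print(rho, terms[20:].sum() / (sigma2 * np.log(n)))
```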

To control the eigenvalues (maximum and minimum) of $\sum \vec{y}_i\vec{y}_i'$ for nonnegative definite matrices $B_n$, one must handle $\inf_{\|\vec{x}\|=1} \vec{x}' B_n \vec{x}$ with care:

1° In general,
$$\liminf_{n\to\infty}\, \inf_{\|\vec{x}\|=1} \vec{x}' B_n \vec{x} \ \ne\ \inf_{\|\vec{x}\|=1}\, \lim_{n\to\infty} \vec{x}' B_n \vec{x} \qquad \text{(the source of the difficulty).}$$

2° Lemma: Assume that $\{\mathcal{F}_n\}$ is a sequence of increasing $\sigma$-fields and $\vec{y}_n = \vec{x}_n + \vec{\varepsilon}_n$, where $\vec{x}_n$ is $\mathcal{F}_{n-\ell}$-measurable, $\vec{\varepsilon}_n = \sum_{j=1}^{\ell} \vec{\varepsilon}_n(j)$, $E\{\vec{\varepsilon}_n(j) \mid \mathcal{F}_{n-j-1}\} = 0$, and
$$\sup_n E[\|\vec{\varepsilon}_n(j)\|^{\alpha} \mid \mathcal{F}_{n-j-1}] < \infty \ \text{a.s. for some } \alpha > 2.$$
Also assume that
$$\lambda_n = \lambda\Big(\sum_{i=1}^{n} \vec{x}_i\vec{x}_i' + \sum_{i=1}^{n} \vec{\varepsilon}_i\vec{\varepsilon}_i'\Big) \to \infty \ \text{a.s.} \quad \text{and} \quad \log \lambda^*\Big(\sum_{i=1}^{n} \vec{x}_i\vec{x}_i'\Big) = o(\lambda_n) \ \text{a.s.},$$
where $\lambda(\cdot)$ and $\lambda^*(\cdot)$ denote the minimum and maximum eigenvalues. Then
$$\lim_{n\to\infty} \lambda\Big(\sum_{i=1}^{n} \vec{y}_i\vec{y}_i'\Big)\Big/\lambda_n = 1 \ \text{a.s.}$$
X
proof : Let Rn = ~xi~x0i and Gn = ~εi ~ε0i
i=1 i=1
n
X n
X n
X
Then ~yi ~yi0 = Rn + ~xi ~ε0i + ~εi~xi + Gn
i=1 i=1 i=1

We can assume that Rn is nonsingular.


 
1
 0 
Otherwise, add ~yo =   = ~xo
 
..
 . 
0
 
0

 1 

~y−1 = ~x−1 =
 0 

 .. 
 . 
0
..
.  
0
 
  0
~y1−P = ~x1−P =  
  ..
  .
 0 
1
εo = ε−1 = · · · = ε−P +1 = 0

n
−1
X
kRn 2 ~xi ~ε0i (j)k2 = O(log(λ∗n )), (By Corollary 1.)
1
= o(λn )

145
n
−1
X
Therefore kRn 2 ~xi ~εi k2 = O(log λ∗n )
1

Given any unit vector ~u ,


n
!
X
0
~u ~xi ~εi ~u
1
n
!
1
−1
X
0
= ~u Rn Rn 2
2
~xi ~ε0i ~u
1
n
!
1
− 12
X
≤ k~u0 Rn kkRn 2
~xi ~ε0i k
1
1 1
= k~u0 Rn kO((log λ∗n ) 2 )
2

1 1
= (~u0 Rn~u) 2 O((log λ∗ ) 2 )
1 1
≤ (~u0 (Rn + Gn )~u) 2 O(log 2 λ∗n )
1 1
≤ (~u0 (Rn + Gn )~u/λn2 ) O(log 2 λ∗n )
(Because 1 ≤ ~u0 (Rn + Gn )~u)/λn )
1
= ~u0 (Rn + Gn )~u O((log λ∗n /λn ) 2 )
= (~u0 (Rn + Gn )~u)o(1)
n
!
X
So ~u0 ~yi ~yi0 ~u = ~u0 (Rn + Gn )~u(1 + o(1))
1

Since o(1) does not depend on ~u, we complete this proof.
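A toy check of the lemma (the deterministic design and Gaussian disturbances below are my own choices, with $\ell = 1$): the minimum eigenvalue of $\sum \vec{y}_i\vec{y}_i'$ is compared with $\lambda_n$.

```python
# Sketch: lambda_min(sum y y') / lambda_min(sum x x' + sum e e') -> 1.
import numpy as np

rng = np.random.default_rng(5)
p, n = 2, 200_000
i = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), np.sin(0.01 * i)])  # deterministic x_i
E = rng.normal(size=(n, p))                          # disturbances
Y = X + E
lam_min = lambda M: np.linalg.eigvalsh(M)[0]
lam_n = lam_min(X.T @ X + E.T @ E)
print(lam_min(Y.T @ Y) / lam_n)                      # close to 1
```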


Example: AR(p).
$$y_i = \beta_1 y_{i-1} + \cdots + \beta_p y_{i-p} + \varepsilon_i, \qquad \psi(z) = z^p - \beta_1 z^{p-1} - \cdots - \beta_p.$$
All the roots of $\psi$ have magnitudes less than or equal to $1$. Let
$$\vec{y}_n = (y_n, y_{n-1}, \ldots, y_{n-p+1})'.$$
Then the L.S.E. satisfies
$$\vec{b}_n = \Big(\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\Big)^{-1}\Big(\sum_{i=1}^{n} \vec{y}_{i-1}\varepsilon_i\Big) + \vec{\beta}.$$
Assume that the $\varepsilon_i$ are i.i.d. with $E[\varepsilon_i] = 0$, $E[|\varepsilon_i|^{2+\delta}] < \infty$ for some $\delta > 0$, and $E\varepsilon_i^2 = \sigma^2 > 0$. Let
$$B = \begin{pmatrix} \beta_1 & \cdots & \beta_p \\ I_{p-1} & & O \end{pmatrix}$$
be the companion matrix, so that
$$\vec{y}_n = \begin{pmatrix} y_n \\ y_{n-1} \\ \vdots \\ y_{n-p+1} \end{pmatrix} = \begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_p \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} y_{n-1} \\ y_{n-2} \\ \vdots \\ y_{n-p} \end{pmatrix} + \begin{pmatrix} \varepsilon_n \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
i.e. $\vec{y}_n = B\vec{y}_{n-1} + \vec{e}\,\varepsilon_n$, where $\vec{e} = (1, 0, \ldots, 0)'$. Iterating,
$$\vec{y}_n = B^n \vec{y}_0 + B^{n-1}\vec{e}\,\varepsilon_1 + \cdots + B^0 \vec{e}\,\varepsilon_n.$$
$B$ can be written as $B = C^{-1} D C$, where $D = \mathrm{diag}[D_1, \cdots, D_q]$,
$$D_j = \begin{pmatrix} \lambda_j & 1 & 0 & \cdots & 0 \\ 0 & \lambda_j & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & & \lambda_j \end{pmatrix}$$
is an $m_j \times m_j$ Jordan block, and $m_j$ is the multiplicity of $\lambda_j$.
Here $\sum_{j=1}^{q} m_j = p$, the $\lambda_j$ are the roots of $\psi$, and $C$ is a nonsingular matrix. Also,
$$D_j^k = \begin{pmatrix} \lambda_j^k & \binom{k}{1}\lambda_j^{k-1} & \binom{k}{2}\lambda_j^{k-2} & \cdots & \binom{k}{m_j-1}\lambda_j^{k-m_j+1} \\ 0 & \lambda_j^k & & & \vdots \\ \vdots & & \ddots & & \\ 0 & 0 & \cdots & & \lambda_j^k \end{pmatrix},$$
and $B^n = C^{-1} D^n C = C^{-1}\mathrm{diag}[D_1^n, \cdots, D_q^n]\, C$, so that, since $|\lambda_j| \le 1$,
$$\|B^n\| \le \|C^{-1}\|\,\|C\| \max\{\|D_1^n\|, \cdots, \|D_q^n\|\} \le k\, n^p \qquad \Big(\text{by } \binom{n}{p} = \frac{n!}{p!(n-p)!} \sim \frac{n^p}{p!}\Big)$$
for some constant $k$. Hence
$$\|\vec{y}_n\| \le \|B^n\|\,\|\vec{y}_0\| + \cdots + \|B^0\vec{e}\|\,|\varepsilon_n| \le k\, n^p\{\|\vec{y}_0\| + |\varepsilon_1| + \cdots + |\varepsilon_n|\} = O(n^{p+1}) \ \text{a.s.},$$
using $\sum_{i=1}^{n} |\varepsilon_i| = O(n)$ a.s. Consequently,
$$\lambda_{\max}\Big(\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\Big) \le \Big\|\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\Big\| \le \sum_{i=1}^{n} \|\vec{y}_{i-1}\|^2 = O\Big(\sum_{i=1}^{n} i^{2(p+1)}\Big) = O(n^{2p+3}) \ \text{a.s.},$$
which implies $\lambda_{\max} = O(n^{2p+3})$; in particular, $\log \lambda_{\max} = O(\log n)$, which is all that is needed below. Now write
$$\vec{y}_n = B^2 \vec{y}_{n-2} + B\vec{e}\,\varepsilon_{n-1} + \vec{e}\,\varepsilon_n = \cdots = B^p \vec{y}_{n-p} + B^{p-1}\vec{e}\,\varepsilon_{n-p+1} + \cdots + \vec{e}\,\varepsilon_n = \vec{x}_n + \vec{\varepsilon}_n,$$
where $\vec{x}_n = B^p \vec{y}_{n-p}$, $\vec{\varepsilon}_n = B^{p-1}\vec{e}\,\varepsilon_{n-p+1} + \cdots + \vec{e}\,\varepsilon_n$, and $\ell = p$.
Claim:
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \vec{\varepsilon}_i\vec{\varepsilon}_i' = \sigma^2 \sum_{j=0}^{p-1} B^j \vec{e}\,\vec{e}'(B')^j \equiv \Gamma \ \text{a.s.},$$
where $\Gamma$ is positive definite. Therefore
$$\lambda_{\min}\Big(\sum_{i=1}^{n} \vec{\varepsilon}_i\vec{\varepsilon}_i'\Big)\Big/n \to \lambda_{\min}(\Gamma) > 0 \ \text{a.s.}$$
Indeed,
$$\vec{\varepsilon}_i\vec{\varepsilon}_i' = \sum_{j=0}^{p-1} B^j \vec{e}\,\vec{e}'(B')^j \varepsilon_{i-j}^2 + \sum_{j \ne \ell} B^j \vec{e}\,\vec{e}'(B')^{\ell} \varepsilon_{i-j}\varepsilon_{i-\ell}.$$
Using the properties that
$$\frac{1}{n}\sum_{i=1}^{n} \varepsilon_{i-j}^2 \to \sigma^2 \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} \varepsilon_{i-\ell}\,\varepsilon_{i-j} \to 0 \ \text{a.s.} \ \forall\, \ell \ne j$$
(from the martingale form and Chow's theorem), we have $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \vec{\varepsilon}_i\vec{\varepsilon}_i' = \Gamma$ a.s.

Observe that
$$\Gamma = \sigma^2\, (\vec{e}, B\vec{e}, \cdots, B^{p-1}\vec{e}) \begin{pmatrix} \vec{e}' \\ \vec{e}'B' \\ \vdots \\ \vec{e}'(B')^{p-1} \end{pmatrix}.$$
To show $\Gamma$ is nonsingular, it is sufficient to show that $(\vec{e}, B\vec{e}, \cdots, B^{p-1}\vec{e})$ is nonsingular. But
$$(\vec{e}, B\vec{e}, \cdots, B^{p-1}\vec{e}) = \begin{pmatrix} 1 & \beta_1 & * & * & \cdots & * \\ 0 & 1 & \beta_1 & * & \cdots & * \\ 0 & 0 & 1 & & & \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \cdots & 1 \end{pmatrix}$$
is triangular with unit diagonal, hence nonsingular.
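This triangular structure is easy to verify directly; the sketch below (AR(3) coefficients chosen arbitrarily, $\sigma^2 = 1$) builds the companion matrix and checks that $\Gamma$ is positive definite.

```python
# Sketch: (e, Be, ..., B^{p-1}e) is unit triangular, so Gamma is p.d.
import numpy as np

beta = np.array([0.4, -0.3, 0.9])      # hypothetical AR(3) coefficients
p = len(beta)
B = np.zeros((p, p))
B[0] = beta
B[1:, :-1] = np.eye(p - 1)             # companion matrix
e = np.zeros(p); e[0] = 1.0

M = np.column_stack([np.linalg.matrix_power(B, j) @ e for j in range(p)])
Gamma = M @ M.T                        # sigma^2 = 1 here
print(M)                               # unit upper triangular
print(np.linalg.eigvalsh(Gamma)[0])    # strictly positive
```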

Since $\vec{x}_n = B^p \vec{y}_{n-p}$,
$$\lambda^*\Big(\sum_{i=p}^{n} \vec{x}_i\vec{x}_i'\Big) \le \|B^p\|^2\, \Big\|\sum_{i=p}^{n} \vec{y}_{i-p}\vec{y}_{i-p}'\Big\| = O(n^{2p+3}) \ \text{a.s.}$$
But
$$\lambda_n \ge \lambda\Big(\sum_{i=1}^{n} \vec{\varepsilon}_i\vec{\varepsilon}_i'\Big) \sim n\,\lambda_{\min}(\Gamma),$$
so
$$\log \lambda^*\Big(\sum_{i=1}^{n} \vec{x}_i\vec{x}_i'\Big) = O(\log n) = o(\lambda_n) \ \text{a.s.}$$
By the previous lemma,
$$\lim_{n\to\infty} \lambda\Big(\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\Big)\Big/\lambda_n = 1 \ \text{a.s.}$$
Therefore $\liminf_{n\to\infty} \lambda\big(\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\big)/n > 0$ a.s., so
$$\log \lambda^*\Big(\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\Big) = o\Big(\lambda\Big(\sum_{i=1}^{n} \vec{y}_{i-1}\vec{y}_{i-1}'\Big)\Big)$$
and
$$\lim_{n\to\infty} \vec{b}_n = \vec{\beta} \ \text{a.s.}$$

3. Limiting Distribution:

Consider $y_{n,i} = \vec{\beta}'\vec{x}_{n,i} + \varepsilon_{n,i}$, $i = 1, 2, \cdots, n$. Assume that for every $n$ there exist increasing $\sigma$-fields $\{\mathcal{F}_{n,j};\ j = 0, 1, 2, \cdots, n\}$ such that $\{\varepsilon_{n,j}, \mathcal{F}_{n,j}\}$ is a martingale difference sequence and $\vec{x}_{n,j}$ is $\mathcal{F}_{n,j-1}$-measurable. Assume that:

(i) $E[\varepsilon_{n,j}^2 \mid \mathcal{F}_{n,j-1}] = \sigma^2$ a.s. $\forall\, n, j$.

(ii) $\sup_{1\le j\le n} E[|\varepsilon_{n,j}|^{\alpha} \mid \mathcal{F}_{n,j-1}] = O_P(1)$ for some $\alpha > 2$.

(iii) There exist nonsingular matrices $A_n$ such that
$$A_n\Big(\sum_{i=1}^{n} \vec{x}_{n,i}\vec{x}_{n,i}'\Big)A_n' \overset{D}{\to} \Gamma, \ \text{where } \Gamma \text{ is p.d.}$$

(iv) $\sup_{1\le i\le n} \|A_n \vec{x}_{n,i}\| \overset{D}{\to} 0$.

Then, with $\vec{b}_n = \big(\sum_{i=1}^{n} \vec{x}_{n,i}\vec{x}_{n,i}'\big)^{-1}\sum_{i=1}^{n} \vec{x}_{n,i} y_{n,i}$, we have
$$(A_n')^{-1}(\vec{b}_n - \vec{\beta}) \overset{D}{\to} N(0, \sigma^2 \Gamma^{-1}).$$
(More generally, one may take $i = 1, 2, \cdots, k_n$.)

Note: If $\{X_{n,j}, \mathcal{F}_{n,j},\ 1 \le j \le k_n\}$ is a martingale difference sequence such that

(i) $\sum_{j=1}^{k_n} E[X_{n,j}^2 \mid \mathcal{F}_{n,j-1}] \overset{D}{\to} C$, a constant, and

(ii) $\sum_{j=1}^{k_n} E[X_{n,j}^2\, I_{\{X_{n,j}^2 > \varepsilon\}} \mid \mathcal{F}_{n,j-1}] \overset{D}{\to} 0$ for every $\varepsilon > 0$,

then $\sum_{j=1}^{k_n} X_{n,j} \overset{D}{\to} N(0, C)$.
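As a sanity check of the martingale CLT in the Note, the sketch below (a Rademacher array, my choice) uses $X_{n,j} = \pm 1/\sqrt{n}$, for which the conditional variances sum to exactly $C = 1$ and the Lindeberg sums vanish for large $n$.

```python
# Sketch: row sums of a Rademacher m.d.s. array are approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(6)
n, reps = 2_000, 5_000
S = rng.choice([-1.0, 1.0], size=(reps, n)).sum(axis=1) / np.sqrt(n)
print(S.mean(), S.var())   # near 0 and 1
```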

proof: W.l.o.g. we can assume that $\vec{x}_{n,i}$ is bounded for all $n, i$. Since
$$(A_n')^{-1}(\vec{b}_n - \vec{\beta}) = \Big(A_n \sum_{i=1}^{k_n} \vec{x}_{n,i}\vec{x}_{n,i}' A_n'\Big)^{-1}\Big(A_n \sum_{i=1}^{k_n} \vec{x}_{n,i}\varepsilon_{n,i}\Big),$$
it is sufficient to show that
$$A_n \sum_{i=1}^{k_n} \vec{x}_{n,i}\varepsilon_{n,i} \overset{D}{\to} N(\vec{0}, \sigma^2 \Gamma).$$
By the Cramér–Wold device, it is sufficient to show that for all $\vec{t} \ne \vec{0}$,
$$\vec{t}' A_n \sum_{i=1}^{k_n} \vec{x}_{n,i}\varepsilon_{n,i} \overset{D}{\to} N(0, \sigma^2 \vec{t}'\Gamma\vec{t}).$$
Let $u_{n,i} = \vec{t}' A_n \vec{x}_{n,i}\varepsilon_{n,i}$. Then $\{u_{n,i}, \mathcal{F}_{n,i}\}$ is a martingale difference sequence such that
$$\sum_{i=1}^{k_n} E(u_{n,i}^2 \mid \mathcal{F}_{n,i-1}) = \sum_{i=1}^{k_n} (\vec{t}' A_n \vec{x}_{n,i})^2 E[\varepsilon_{n,i}^2 \mid \mathcal{F}_{n,i-1}] = \sigma^2\, \vec{t}' A_n \Big(\sum_{i=1}^{k_n} \vec{x}_{n,i}\vec{x}_{n,i}'\Big) A_n' \vec{t} \overset{D}{\to} \sigma^2 \vec{t}'\Gamma\vec{t} = C, \ \text{say},$$
and
$$\sum_{i=1}^{k_n} E\big[u_{n,i}^2\, I_{\{u_{n,i}^2 > \varepsilon\}} \mid \mathcal{F}_{n,i-1}\big] \le \varepsilon^{-(\alpha-2)/2} \sum_{i=1}^{k_n} E[|u_{n,i}|^{\alpha} \mid \mathcal{F}_{n,i-1}]$$
$$= \varepsilon^{-(\alpha-2)/2} \sum_{i=1}^{k_n} |\vec{t}' A_n \vec{x}_{n,i}|^{\alpha}\, E[|\varepsilon_{n,i}|^{\alpha} \mid \mathcal{F}_{n,i-1}]$$
$$\le \varepsilon^{-(\alpha-2)/2} \Big(\sup_{1\le i\le k_n} E[|\varepsilon_{n,i}|^{\alpha} \mid \mathcal{F}_{n,i-1}]\Big)\Big(\vec{t}' A_n \Big(\sum_{i=1}^{k_n} \vec{x}_{n,i}\vec{x}_{n,i}'\Big) A_n' \vec{t}\Big) \cdot \|\vec{t}\|^{\alpha-2} \sup_{1\le i\le k_n} \|A_n \vec{x}_{n,i}\|^{\alpha-2} \overset{D}{\to} 0.$$

Example: Let $y_0 = 0$ and
$$y_n = \alpha + \beta y_{n-1} + \varepsilon_n, \quad \text{where } |\beta| < 1,\ \varepsilon_n \ \text{i.i.d.},\ E[\varepsilon_n] = 0,\ \mathrm{Var}[\varepsilon_n] = \sigma^2,\ E[|\varepsilon_n|^{\alpha}] < \infty \ \text{for some } \alpha > 2.$$
Iterating,
$$y_n = \alpha + \beta[\alpha + \beta y_{n-2} + \varepsilon_{n-1}] + \varepsilon_n = \alpha + \beta\alpha + \beta^2 y_{n-2} + \beta\varepsilon_{n-1} + \varepsilon_n = \alpha(1 + \beta + \cdots + \beta^{n-1}) + \beta^{n-1}\varepsilon_1 + \cdots + \varepsilon_n.$$
Note that $\alpha(1 + \beta + \beta^2 + \cdots + \beta^{n-1}) \to \alpha/(1 - \beta)$.

It follows that
$$\frac{1}{n}\sum_{i=1}^{n} y_i^2 \to \Big(\frac{\alpha}{1-\beta}\Big)^2 + \frac{\sigma^2}{1-\beta^2} \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} y_i \to \frac{\alpha}{1-\beta} \ \text{a.s.}$$
Writing
$$y_n = (\alpha, \beta)\begin{pmatrix} 1 \\ y_{n-1} \end{pmatrix} + \varepsilon_n = \vec{\beta}'\vec{x}_n + \varepsilon_n,$$
we get
$$\frac{1}{n}\sum_{i=1}^{n} \begin{pmatrix} 1 \\ y_{i-1} \end{pmatrix}(1,\ y_{i-1}) = \frac{1}{n}\begin{pmatrix} n & \sum_{i=1}^{n} y_{i-1} \\ \sum_{i=1}^{n} y_{i-1} & \sum_{i=1}^{n} y_{i-1}^2 \end{pmatrix} \to \begin{pmatrix} 1 & \alpha/(1-\beta) \\ \alpha/(1-\beta) & \big(\frac{\alpha}{1-\beta}\big)^2 + \frac{\sigma^2}{1-\beta^2} \end{pmatrix} \equiv \Gamma.$$
Now take $k_n = n$ and $A_n = (\sqrt{n})^{-1} I$ (note also $A_n/A_{n-1} \to 1$). Then
$$\sup_{1\le i\le n} \Big\| \frac{1}{\sqrt{n}}\begin{pmatrix} 1 \\ y_{i-1} \end{pmatrix} \Big\| \le \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{n}}\sup_{1\le i\le n} |y_{i-1}|,$$
so it is sufficient to show that $y_{n-1}^2/n \to 0$ a.s.; but
$$y_{n-1}^2/n = \frac{\sum_{i=1}^{n-1} y_i^2 - \sum_{i=1}^{n-2} y_i^2}{n} \to 0 \ \text{a.s.}$$
Thus
$$A_n\Big(\sum_{i=1}^{k_n} \vec{x}_{n,i}\vec{x}_{n,i}'\Big)A_n' \to \Gamma \ \text{a.s.} \quad \text{and} \quad \sup_{1\le i\le k_n} \|A_n \vec{x}_{n,i}\| \to 0 \ \text{a.s.},$$
so all the conditions of the theorem hold, and $\sqrt{n}(\vec{b}_n - \vec{\beta}) \overset{D}{\to} N(0, \sigma^2 \Gamma^{-1})$.
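An end-to-end simulation sketch of this example (replication counts and seed are mine): the sample covariance of $\sqrt{n}(\vec{b}_n - \vec{\beta})$ across replications is compared with $\sigma^2 \Gamma^{-1}$.

```python
# Sketch: sqrt(n)*(b_n - beta) has covariance close to sigma^2*Gamma^{-1}.
import numpy as np

rng = np.random.default_rng(7)
alpha, beta, sigma2, n, reps = 1.0, 0.5, 1.0, 2_000, 1_000
est = np.empty((reps, 2))
for r in range(reps):
    eps = rng.normal(size=n)
    y = np.empty(n + 1)
    y[0] = 0.0
    for k in range(1, n + 1):
        y[k] = alpha + beta * y[k - 1] + eps[k - 1]
    X = np.column_stack([np.ones(n), y[:-1]])
    est[r] = np.linalg.lstsq(X, y[1:], rcond=None)[0]
mu = alpha / (1 - beta)
Gamma = np.array([[1.0, mu], [mu, mu**2 + sigma2 / (1 - beta**2)]])
print(np.cov(np.sqrt(n) * (est - [alpha, beta]), rowvar=False))
print(sigma2 * np.linalg.inv(Gamma))   # the two matrices roughly agree
```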
