
ST 302 – Stochastic Processes

Thorsten Rheinlander
London School of Economics and Political Science
August 17, 2006

1 Information and Conditioning


1.1 Sigma-Algebras and Filtrations

Let us recall that a $\sigma$-algebra is a collection $\mathcal{F}$ of subsets of the sample space $\Omega$ such that

(i) $\emptyset, \Omega \in \mathcal{F}$;
(ii) if $A \in \mathcal{F}$ then also $A^c \in \mathcal{F}$ (here $A^c$ denotes the complementary set to $A$);
(iii) if we have a sequence $A_1, A_2, \ldots$ of sets in $\mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.

We say that a collection of sets generates a $\sigma$-algebra $\mathcal{F}$ if $\mathcal{F}$ is the smallest $\sigma$-algebra which contains all the sets. The trivial $\sigma$-algebra is the one containing only $\emptyset$ and $\Omega$.

Interpretation: We perform a random experiment. The realized outcome is an element $\omega$ of the sample space $\Omega$. Assume that we are given some partial information in the form of a $\sigma$-algebra $\mathcal{F}$. If $\mathcal{F}$ does not contain all subsets of $\Omega$, we might not know the precise $\omega$, but we may narrow down the possibilities. We know with certainty whether a set $A \in \mathcal{F}$ contains $\omega$ or does not contain it – these sets are resolved by our given information.

Example 1.1 We toss a coin three times, with the result of each toss being H (head) or T (tail). Our time set consists of $t = 0, 1, 2, 3$. The sample space $\Omega$ is the set of all eight possible outcomes. At time 0 (just before the first coin toss), we only know that the true $\omega$ does not belong to $\emptyset$ and does belong to $\Omega$, hence we set
$$\mathcal{F}_0 = \{\emptyset, \Omega\}.$$
At time 1, after the first coin toss, in addition the following two sets are resolved:
$$A_H = \{HHH, HHT, HTH, HTT\},$$
$$A_T = \{THH, THT, TTH, TTT\}.$$
We can say with certainty whether $\omega$ belongs to $A_H$ or $A_T$. In contrast, the information about the first coin toss only is not enough to determine with certainty whether $\omega$ is contained e.g. in $\{HHH, HHT\}$ – for this we would have to wait until the second coin toss. As the complement of $A_H$ is $A_T$, and the union of $A_H$ and $A_T$ equals $\Omega$, we set
$$\mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}.$$
At time 2, after the second coin toss, in addition to the sets already contained in $\mathcal{F}_1$, the sets
$$A_{HH} = \{HHH, HHT\}, \quad A_{HT} = \{HTH, HTT\},$$
$$A_{TH} = \{THH, THT\}, \quad A_{TT} = \{TTH, TTT\}$$
get resolved, together with their complements and unions. Altogether, we get
$$\mathcal{F}_2 = \left\{ \begin{array}{l} \emptyset, \Omega, A_H, A_T, A_{HH}, A_{HT}, A_{TH}, A_{TT}, A_{HH}^c, A_{HT}^c, A_{TH}^c, A_{TT}^c, \\ A_{HH} \cup A_{TH}, \; A_{HH} \cup A_{TT}, \; A_{HT} \cup A_{TH}, \; A_{HT} \cup A_{TT} \end{array} \right\}.$$
At time 3, after the third coin toss, we know the true realization of $\omega$, and can therefore tell for each subset of $\Omega$ whether $\omega$ is a member or not. Hence
$$\mathcal{F}_3 = \text{the set of all subsets of } \Omega.$$
As we have $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3$, these four $\sigma$-algebras are said to form a filtration.

The random variable

X = number of tails among the first two coin tosses

is said to be $\mathcal{F}_2$-measurable, but not $\mathcal{F}_1$-measurable (as we cannot determine the value of X after only one coin toss).
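The measurability statement above can be checked mechanically. The following is a small sketch of my own (not part of the original notes): each $\mathcal{F}_n$ in this example is generated by the atoms "outcomes agreeing in the first $n$ tosses", and a random variable is $\mathcal{F}_n$-measurable exactly when it is constant on every atom.

```python
# Sketch: measurability in the three-coin-toss example via atoms of F_n.
from itertools import product

omega = ["".join(w) for w in product("HT", repeat=3)]  # sample space, 8 outcomes

def atoms(n):
    """Atoms of F_n: outcomes grouped by the first n coin tosses."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:n], []).append(w)
    return list(groups.values())

def is_measurable(X, n):
    """X is F_n-measurable iff X is constant on every atom of F_n."""
    return all(len({X(w) for w in atom}) == 1 for atom in atoms(n))

X = lambda w: w[:2].count("T")  # tails among the first two tosses

print(is_measurable(X, 2))  # True:  X is F_2-measurable
print(is_measurable(X, 1))  # False: X is not F_1-measurable
```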
Consider now a discrete time stochastic process $X = (X_0, X_1, X_2, \ldots)$, i.e. a collection of random variables indexed by time. Hence $X$ is a function of a chance parameter $\omega$ and a time parameter $n$. We would write $X_n(\omega)$ for a function value, but typically suppress the chance parameter, otherwise the notation gets too heavy. Moreover, the two variables play quite different roles, since the chance parameter $\omega$ comes from the sample space $\Omega$ (which can be a very large set, with no natural ordering) whereas the time parameter $n$ is an element of the ordered set $\mathbb{N}_+$.

We denote by $\mathcal{F}_n$ a family of sets containing all information about $X$ up to time $n$. In more detail, $\mathcal{F}_n$ is the $\sigma$-algebra generated by sets of the form (we assume that the starting value $X_0$ is just a deterministic number)
$$\{X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n\}$$
if the state space is discrete. If the state space is $\mathbb{R}$ then $\mathcal{F}_n$ is the $\sigma$-algebra generated by sets of the form
$$\{X_1 \in (a_1, b_1), X_2 \in (a_2, b_2), \ldots, X_n \in (a_n, b_n)\}$$
with intervals $(a_1, b_1)$, $(a_2, b_2)$, ... We have

$\mathcal{F}_0 = \{\emptyset, \Omega\}$ (at time zero, we know nothing),

$\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots \subset \mathcal{F}_n$ (the more time evolves, the more we know).

$(\mathcal{F}_n) = (\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots)$ is called a filtration. Sometimes we write $\mathcal{F}_n^X$ to specify that we are collecting information about the stochastic process $X$.

We say a random variable $H$ is $\mathcal{F}_n$-measurable if it depends on information about $X$ up to time $n$ only. For example, if $f$ is some function, then $f(X_n)$ is $\mathcal{F}_n$-measurable; and so is $\max_{1 \le i \le n} X_i$. Or, in other words, $H$ is $\mathcal{F}_n$-measurable if it is a function of $X_1, \ldots, X_n$.

We say that a stochastic process $Y = (Y_n)_{n \ge 0}$ is adapted to $(\mathcal{F}_n)$ if each $Y_n$ is $\mathcal{F}_n$-measurable. $(\mathcal{F}_n^Y)$ is called the filtration generated by $Y$ if it is the smallest filtration to which $Y$ is adapted. Summing up:

The random variable $H$ is measurable with respect to a $\sigma$-algebra $\mathcal{F}$ if the information in $\mathcal{F}$ is sufficient to determine $H$; $\mathcal{F}$ can even contain more information than just that about $H$.

The stochastic process $X$ is adapted to a filtration $(\mathcal{F}_n)$ if the information in $(\mathcal{F}_n)$ is sufficient to determine all the $X_n$'s, and $(\mathcal{F}_n)$ can even contain more information. $(\mathcal{F}_n^X)$ contains just the information about the $X_n$'s: more precisely, the $\sigma$-algebra $\mathcal{F}_n^X$ contains exactly all the information about $X_0, X_1, \ldots, X_n$.
1.2 Conditional Expectation

We consider a $\sigma$-algebra $\mathcal{F}$ and want to make a prediction about some random variable $H$ based on the information contained in $\mathcal{F}$. If $H$ is $\mathcal{F}$-measurable, then we can precisely determine $H$ from the given information in $\mathcal{F}$. If $H$ is independent of $\mathcal{F}$, then the information in $\mathcal{F}$ is of no help in predicting the outcome of $H$. In the intermediate case, we can try to use the information in $\mathcal{F}$ to make an 'educated guess' about $H$, without being able to completely evaluate $H$.

You cannot avoid this section. Its material is absolutely essential for everything that follows. To quote Thomas Mikosch, "The notion of conditional expectation is one of the most difficult ones in probability theory, but it is also one of the most powerful tools... For its complete theoretical understanding, measure-theoretic probability theory is unavoidable." However, it is perfectly possible, and not even difficult, to get an operational understanding of conditional expectation. You just have to grasp the following intuitive properties; for the aforementioned reason, they will be given without proof.

Definition 1.2 If $E[H^2] < \infty$, then the conditional expectation of $H$ given $\mathcal{F}$, written $E[H \mid \mathcal{F}]$, is the least-squares-best $\mathcal{F}$-measurable predictor of $H$: amongst all $\mathcal{F}$-measurable random variables $\hat{H}$ (i.e. amongst all predictors which can be computed from available information), it minimizes
$$E\left[\left(H - \hat{H}\right)^2\right].$$

It turns out that one can define (by the partial averaging property below) the conditional expectation also in the more general case where we only assume that $E[|H|] < \infty$ (so $E[H^2] < \infty$ need not be satisfied). We give a list of properties of $E[H \mid \mathcal{F}]$. Here $\mathbf{1}_A$ denotes the indicator function of the set $A$: $\mathbf{1}_A(\omega) = 1$ if $\omega \in A$, $\mathbf{1}_A(\omega) = 0$ if $\omega \notin A$.
(a) (Measurability) $E[H \mid \mathcal{F}]$ is $\mathcal{F}$-measurable
(the conditional expectation is a predictor based on available information).

(b) (Partial averaging) For every set $A \in \mathcal{F}$,
$$E[\mathbf{1}_A \, E[H \mid \mathcal{F}]] = E[\mathbf{1}_A H]$$
(on every set $A \in \mathcal{F}$, the conditional expectation of $H$ given $\mathcal{F}$ has the same expectation as $H$ itself). In particular (take $A = \Omega$),
$$E[E[H \mid \mathcal{F}]] = E[H].$$
Properties (a) and (b) characterize the random variable $E[H \mid \mathcal{F}]$ uniquely (modulo null sets).

(c) (Linearity) $E[a_1 H_1 + a_2 H_2 \mid \mathcal{F}] = a_1 E[H_1 \mid \mathcal{F}] + a_2 E[H_2 \mid \mathcal{F}]$.

(d) (Positivity) If $H \ge 0$, then $E[H \mid \mathcal{F}] \ge 0$.

(e) (Jensen's inequality) If $f$ is a convex function such that $E[|f(H)|] < \infty$, then
$$f(E[H \mid \mathcal{F}]) \le E[f(H) \mid \mathcal{F}].$$

(f) (Tower property) If $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$ (contains less information than $\mathcal{F}$), then
$$E[E[H \mid \mathcal{F}] \mid \mathcal{G}] = E[H \mid \mathcal{G}].$$
In case we deal with a filtration, this property has the following form: for two time points $s \le t$ (hence $\mathcal{F}_s \subset \mathcal{F}_t$) we have
$$E[E[H \mid \mathcal{F}_t] \mid \mathcal{F}_s] = E[H \mid \mathcal{F}_s]$$
(nested conditional expectations are evaluated by taking only one conditional expectation with respect to the earliest time point).

(g) (Taking out what is known) If $G$ is a random variable which is $\mathcal{F}$-measurable, and such that $E[|GH|] < \infty$, then
$$E[GH \mid \mathcal{F}] = G \, E[H \mid \mathcal{F}].$$
In particular (choose $H = 1$),
$$E[G \mid \mathcal{F}] = G$$
(if $G$ depends only on available information, then we can predict $G$ fully).

(h) (Role of independence) If $H$ is independent of $\mathcal{F}$, then
$$E[H \mid \mathcal{F}] = E[H]$$
(in that case, the information in $\mathcal{F}$ is useless in predicting $H$).

Let $X$ be any random variable. We will sometimes use the notation $E[H \mid X]$ if we predict $H$ by using the information about $X$ only (more formally, we condition with respect to the $\sigma$-algebra generated by $X$). Moreover, if $A \subset \Omega$ we write $P(A \mid \mathcal{F})$ for $E[\mathbf{1}_A \mid \mathcal{F}]$. An important special case of (h) is the trivial $\sigma$-algebra $\mathcal{F}_0 = \{\emptyset, \Omega\}$. In that case, the conditional expectation with respect to $\mathcal{F}_0$ is just the common expectation (so the result is a number):
$$E[H \mid \mathcal{F}_0] = E[H].$$
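On a finite sample space the conditional expectation can be computed by averaging over atoms, which also makes the partial averaging property concrete. The following is a small sketch of my own (assuming a fair coin, so all eight outcomes of the earlier example are equally likely); it is not part of the original notes.

```python
# Sketch: E[H | F_n] on the coin-toss space, by averaging over atoms of F_n.
from itertools import product

omega = ["".join(w) for w in product("HT", repeat=3)]
H = lambda w: w.count("T")  # total number of tails in three tosses

def cond_exp(H, n):
    """E[H | F_n] as a function on omega: average H over the atom containing w."""
    values = {}
    for w in omega:
        values.setdefault(w[:n], []).append(H(w))
    return lambda w: sum(values[w[:n]]) / len(values[w[:n]])

E_H_F1 = cond_exp(H, 1)
print(E_H_F1("HTT"))  # 1.0: given first toss H, expect one tail in the last two
# Partial averaging with A = Omega: E[E[H | F_1]] equals E[H] = 1.5.
print(sum(E_H_F1(w) for w in omega) / 8, sum(H(w) for w in omega) / 8)
```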
2 Martingales in Discrete Time

2.1 Definition

Let $X = (X_n)_{n \ge 0}$ be a discrete time process with $E[|X_n|] < \infty$ for all $n$, and let $(\mathcal{F}_n)$ be some filtration. We assume that $X$ is adapted to $(\mathcal{F}_n)$, that is, each $X_n$ is $\mathcal{F}_n$-measurable.

Definition 2.1 $X$ is called a martingale (relative to $(\mathcal{F}_n)$, $P$) if for all $n \ge 1$
$$E[X_n \mid \mathcal{F}_{n-1}] = X_{n-1},$$
a submartingale if
$$E[X_n \mid \mathcal{F}_{n-1}] \ge X_{n-1},$$
and a supermartingale if
$$E[X_n \mid \mathcal{F}_{n-1}] \le X_{n-1}.$$

Interpretation: let $X_n$ be the wealth at time $n$ of a player engaged in some casino game. The martingale property, rewritten as
$$E[X_n - X_{n-1} \mid \mathcal{F}_{n-1}] = 0$$
(here we have used properties (c), linearity, and (g), taking out what is known, of the conditional expectation), tells us that, based on all the information about the game's progress so far, one can predict that our increase in wealth will be on average zero: we are playing a fair game. In reality, our wealth will in most games be described by a supermartingale: there is a tendency to lose on average. There are some games like blackjack, however, where you can make very clever use of past events (like memorizing the cards which have already been taken out of the deck) to achieve that your wealth process evolves like a submartingale: you are engaged in a favourable game.

We will later on introduce the notion of Markov processes: these are stochastic processes where future predictions depend on the past only via the present state; they are memoryless. A martingale need not be a Markov process (because future outcomes may depend on the whole past), and a Markov process need not be a martingale (since it may be related to an unfair game).

Exercise 2.2 Prove the following statements by applying the properties of conditional expectation. State in each step which property you use. All random variables are assumed to be sufficiently integrable.

- Fix some time horizon $T \in \mathbb{N}_0$, and let $H$ be an $\mathcal{F}_T$-measurable random variable. Define a stochastic process $X$ for $n = 0, 1, 2, \ldots, T$ as
$$X_n = E[H \mid \mathcal{F}_n].$$
Show that $X$ is a martingale: check that $X$ is adapted and that it fulfills the martingale property.

- Let $f$ be a convex function and $X$ a martingale. Show that $f(X)$ is a submartingale and $-f(X)$ a supermartingale.

- A stochastic process $X$ is called predictable if for each $n$ we have that $X_n$ is $\mathcal{F}_{n-1}$-measurable. Show that every predictable process is $(\mathcal{F}_n)$-adapted, and that every predictable martingale is constant.

- Let $Y$ be a predictable process, and $X$ be a martingale. Show that then the martingale transform $Y \cdot X$, defined as ($(Y \cdot X)_0 = 0$ and)
$$(Y \cdot X)_n = \sum_{i=1}^{n} Y_i (X_i - X_{i-1}),$$
is a martingale as well.

2.2 Discrete-Time Examples

Example 2.3 Sums of independent zero-mean RVs (random variables). Let $X_1, X_2, \ldots$ be a sequence of independent RVs with $E[|X_k|] < \infty$ for all $k$, and
$$E[X_k] = 0 \quad \forall k.$$
Define ($S_0 := 0$ and)
$$S_n := X_1 + X_2 + \ldots + X_n.$$
Let $(\mathcal{F}_n) = (\mathcal{F}_n^X)$, the filtration which is generated by $X$. Note that $S_n$ is $\mathcal{F}_n$-measurable (it depends only on $(X_1, X_2, \ldots, X_n)$). Then we have for $n \ge 1$
$$E[S_n \mid \mathcal{F}_{n-1}] = E[S_{n-1} \mid \mathcal{F}_{n-1}] + E[X_n \mid \mathcal{F}_{n-1}] \quad \text{by linearity (c)}$$
$$= S_{n-1} + E[X_n \mid \mathcal{F}_{n-1}] \quad \text{taking out what is known (g)}$$
$$= S_{n-1} + E[X_n] \quad \text{by independence (h)}$$
$$= S_{n-1}.$$
This proves that $S = (S_n)$ is a martingale.

Example 2.4 Products of non-negative independent RVs of mean 1. Let $Y_1, Y_2, \ldots$ be a sequence of independent non-negative RVs with
$$E[Y_n] = 1 \quad \forall n.$$
Define ($M_0 := 1$ and)
$$M_n := Y_1 Y_2 \cdots Y_n.$$
Let $(\mathcal{F}_n) = (\mathcal{F}_n^Y)$, the filtration which is generated by $Y$. Then, for $n \ge 1$, we have
$$E[M_n \mid \mathcal{F}_{n-1}] = E[M_{n-1} Y_n \mid \mathcal{F}_{n-1}]$$
$$= M_{n-1} E[Y_n \mid \mathcal{F}_{n-1}] \quad \text{taking out what is known (g)}$$
$$= M_{n-1} E[Y_n] \quad \text{by independence (h)}$$
$$= M_{n-1}.$$
This proves that $M$ is a martingale.

Example 2.5 Exponential martingale. Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed RVs with moment-generating function
$$\varphi(\lambda) = E[\exp(\lambda X_i)] < \infty.$$
We set ($S_0 := 0$ and) $S_n = X_1 + \ldots + X_n$. Let $(\mathcal{F}_n) = (\mathcal{F}_n^X)$, the filtration which is generated by $X$. Then
$$M_n = \frac{\exp(\lambda S_n)}{\varphi(\lambda)^n}$$
is a martingale. This follows from the previous example; just put
$$Y_k = \frac{\exp(\lambda X_k)}{\varphi(\lambda)}.$$

Example 2.6 Exponential martingale for random walk. In the setting of the previous example, let the distribution of the $X_n$'s be that of $X$ where
$$P(X = +1) = P(X = -1) = \frac{1}{2}.$$
The stochastic process $S = (S_n)_{n \ge 0}$ is called a symmetric random walk. Then
$$\varphi(\lambda) = E[\exp(\lambda X)] = \frac{1}{2}\left(e^{\lambda} + e^{-\lambda}\right) = \cosh \lambda.$$
We get that
$$M_n = \exp(\lambda S_n) \left(\frac{1}{\cosh \lambda}\right)^n$$
is a martingale. In case the walk is not symmetric, we have
$$\varphi(\lambda) = p e^{\lambda} + (1-p) e^{-\lambda}.$$
Choose now $\lambda$ such that $e^{\lambda} = (1-p)/p$; then $\varphi(\lambda) = (1-p) + p = 1$ and the exponential martingale simplifies to
$$M_n = \exp(\lambda S_n) = ((1-p)/p)^{S_n}.$$
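A martingale has constant expectation, so $E[M_n] = M_0 = 1$ for all $n$ in Example 2.6. The following Monte Carlo check is my own illustration (parameter choices are mine, not from the notes):

```python
# Sketch: for the asymmetric walk, M_n = ((1-p)/p)^{S_n} has E[M_n] = 1.
import random

def simulate_M(p, n_steps, n_paths=100_000, seed=1):
    random.seed(seed)
    ratio = (1 - p) / p
    total = 0.0
    for _ in range(n_paths):
        s = sum(1 if random.random() < p else -1 for _ in range(n_steps))
        total += ratio ** s
    return total / n_paths

for n in (1, 5, 20):
    print(n, round(simulate_M(0.4, n), 3))  # all estimates close to 1.0
```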

2.3 Optional Stopping Theorem

A martingale has constant expected value (why?). However, in practice one could try to get on average a better gain by using a more sophisticated approach. For example, let $X$ be a stochastic process modelling the level of some stock price, which we assume to be a martingale. Traders often base their decisions on technical analysis (e.g. they buy after some particular candlestick pattern has occurred, or after the stock has broken upwards through the 200-day moving average). Their trading decision is thus triggered by the occurrence of some random event. However, unless they have visionary abilities, this occurrence at time $n$ can be determined by looking at the values $X_1, \ldots, X_n$ of the price process up to that time. This leads to the following

Definition 2.7 Consider a discrete-time filtration $(\mathcal{F}_n)$. A stopping time is a mapping $T: \Omega \to \{0, 1, 2, \ldots, \infty\}$ such that the set $\{\omega : T(\omega) \le n\}$ is contained in $\mathcal{F}_n$ for all $n \ge 1$ (equivalently, the set $\{\omega : T(\omega) = n\}$ is contained in $\mathcal{F}_n$ for all $n \ge 1$). Note that $T$ can take the value $\infty$.

To get an example of a stopping time, we consider a stochastic process $X_0, X_1, \ldots$ with $X_0 = 0$. Then the smallest time $n$ such that $X_n$ surpasses a given level $a > 0$ is a stopping time $T_a$: if $T_a(\omega) = n$, then all of $X_0, X_1, \ldots, X_{n-1}$ are smaller than $a$ whereas $X_n$ is larger than $a$.

Not all random times are stopping times, though. Let $X_n$ be the stock price of BMW at time $n$. We now set $U$ to be the last time at which the stock price of BMW was $100 or lower. $U$ cannot be a stopping time, since you must know all the future stock prices of BMW to determine whether $U$ has occurred or not. If you really knew that you were at time $U$, you could make quite a fortune in a completely riskless manner!

Of more practical relevance is therefore the question whether, by using a clever stopping time-based strategy, you can turn a game which is fair on a turn-by-turn basis (your wealth from this game being modelled as a martingale) into something favourable for you. In fact, at first sight even a rather dull strategy would seem to serve this purpose, as the following example suggests:

Example 2.8 Riskless gain based on a stopping time strategy. Assume our wealth from gambling evolves like a symmetric random walk $S$ where we start with $0 (we are allowed to go into minus). We play this game until the stopping time
$$T_1 = \min\{n \ge 1 : S_n = 1\},$$
that is, we stop once we have gained one pound and walk happily away. As you can imagine, life is not that easy. The next result tells us that this phenomenon cannot happen when our bank agent offers us only a finite credit line.

Theorem 2.9 (Optional stopping theorem) Let $T$ be a stopping time. Let $M$ be a martingale. Then $M_T$ is integrable and
$$E[M_T] = E[M_0]$$
in each of the following situations:
(i) $T$ is bounded (for some $N > 0$, $T(\omega) \le N$ for all $\omega$);
(ii) $E[T] < \infty$, and, for some $K \ge 0$, $|M_n(\omega) - M_{n-1}(\omega)| \le K$ for all $(n, \omega)$.

Based on the optional stopping theorem, we are able to give many interesting applications.

Example 2.10 Symmetric random walk. Imagine a game based on fair coin tossing, that is, the gambler's fortune is modelled by a symmetric random walk $S_n = x + X_1 + \ldots + X_n$ with $S_0 = x$. Let $a < b$ and $x \in (a, b)$. We are interested in the probability that we reach the higher level $b$ first. To calculate this, we consider the stopping time
$$T = \min\{n : S_n \notin (a, b)\},$$
the first time the random walk exits from the interval $(a, b)$. One can show that $E[T] < \infty$, so condition (ii) of the optional stopping theorem is fulfilled. We therefore get
$$x = E[S_T] = a P(S_T = a) + b P(S_T = b).$$
Since $P(S_T = a) = 1 - P(S_T = b)$, we get that $x = a + (b - a) P(S_T = b)$, hence the probability that starting in $x$ we reach the level $b$ before $a$ is $(x - a)/(b - a)$.

Example 2.11 Gambling game. Same setting as in the last example, but here we have (asymmetric walk)
$$P(X_i = +1) = p, \quad P(X_i = -1) = 1 - p =: q$$
(typically $p < q$). Using the martingale $M_n = (q/p)^{S_n}$ of Example 2.6, we get
$$\left(\frac{q}{p}\right)^x = \left(\frac{q}{p}\right)^a P(S_T = a) + \left(\frac{q}{p}\right)^b P(S_T = b).$$
Since $P(S_T = a) = 1 - P(S_T = b)$, we get
$$\left(\frac{q}{p}\right)^x = \left(\frac{q}{p}\right)^a + \left\{\left(\frac{q}{p}\right)^b - \left(\frac{q}{p}\right)^a\right\} P(S_T = b).$$
We solve this to get that the probability that starting in $x$ we reach the level $b$ first is
$$P(S_T = b) = \frac{(q/p)^x - (q/p)^a}{(q/p)^b - (q/p)^a}.$$

Example 2.12 Duration of fair games. Consider a symmetric simple random walk $S_n = X_1 + \ldots + X_n$ with $S_0 = 0$. For $a < 0 < b$, let
$$T = \min\{n : S_n \notin (a, b)\}.$$
Then we have
$$E[T] = -ab.$$
Proof: We have that $M_n = S_n^2 - n$ is a martingale. Indeed,
$$M_{n+1} - M_n = (S_n + X_{n+1})^2 - (n+1) - S_n^2 + n = 2 S_n X_{n+1} + X_{n+1}^2 - 1.$$
Now, $S_n$ is $\mathcal{F}_n$-measurable, and $X_{n+1}$ is independent of $\mathcal{F}_n$, with $E[X_{n+1}] = 0$ and $E[X_{n+1}^2] = 1$. This yields
$$E[M_{n+1} - M_n \mid \mathcal{F}_n] = E[2 S_n X_{n+1} + X_{n+1}^2 - 1 \mid \mathcal{F}_n]$$
$$= 2 S_n E[X_{n+1} \mid \mathcal{F}_n] + E[X_{n+1}^2 \mid \mathcal{F}_n] - 1$$
$$= 2 S_n E[X_{n+1}] + E[X_{n+1}^2] - 1 = 0.$$
One can show that $E[T] < \infty$, so condition (ii) of the optional stopping theorem is fulfilled. Applying it to the martingale $S_n^2 - n$ yields
$$E[T] = E[S_T^2] = a^2 \frac{b}{b-a} + b^2 \frac{-a}{b-a} = \frac{a^2 b - a b^2}{b-a} = -ab.$$
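Both formulas are easy to verify by simulation. The following sketch is my own illustration (barrier values are my choice):

```python
# Sketch: symmetric walk from 0 with barriers a < 0 < b; check that
# P(hit b first) = -a/(b-a) (Example 2.10) and E[T] = -a*b (Example 2.12).
import random

def exit_stats(a, b, n_paths=20_000, seed=2):
    random.seed(seed)
    hits_b, total_T = 0, 0
    for _ in range(n_paths):
        s, t = 0, 0
        while a < s < b:
            s += 1 if random.random() < 0.5 else -1
            t += 1
        hits_b += (s == b)
        total_T += t
    return hits_b / n_paths, total_T / n_paths

p_b, mean_T = exit_stats(a=-3, b=5)
print(p_b, 3 / 8)   # estimate vs exact -a/(b-a) = 0.375
print(mean_T, 15)   # estimate vs exact -a*b = 15
```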

3 Discrete-Time Markov Chains

3.1 Basic Definitions

Example 3.1 (Gambling game) Consider a gambling game in which on any turn you win $1 with probability $p = 0.4$ or lose $1 with probability $1 - p = 0.6$. Let $X_n$ be the amount of money you have after $n$ turns. You stop the game if you are either bankrupt ($X_n = 0$) or if you have reached a prespecified gain of $N. Then $X_0, X_1, X_2, \ldots$ is a stochastic process which has the following 'Markov property': given the present state $X_n$, any other information about the past is irrelevant in predicting the next state $X_{n+1}$.

In the following, we fix a finite set $S$, called the state space. The stochastic processes considered in this chapter are assumed to have values in $S$ unless otherwise stated.

Notation 3.2 Let $Y$ be a random variable. If we condition with respect to the $\sigma$-algebra $\sigma(Y)$ generated by $Y$, i.e. the smallest $\sigma$-algebra with respect to which $Y$ is measurable, then we often write $E[X \mid Y]$ instead of $E[X \mid \sigma(Y)]$. If $Y$ takes values in $S$, then $\sigma(Y)$ is generated by sets of the form $\{\omega : Y(\omega) = i\}$ where $i \in S$.

Definition 3.3 A discrete time stochastic process $X = (X_n)$ is called a Markov chain with respect to a given filtration $(\mathcal{F}_n)$ if for any function $h: S \to \mathbb{R}$ we have
$$E[h(X_{t+s}) \mid \mathcal{F}_t] = E[h(X_{t+s}) \mid X_t]$$
for any $s, t \in \mathbb{N}_+$. If this conditional expectation depends only on the time increment $s$, but not on $t$, then the chain is said to be homogeneous.

We shall from now on focus on homogeneous chains. Let $i, j \in S$. We can then form the quantities
$$p(i, j) = P(X_{t+1} = j \mid X_t = i),$$
which do not depend on $t$ since the chain is homogeneous. $p(i, j)$ is called the one-step transition probability from state $i$ to state $j$. We can form the transition matrix $(p(i, j))_{i,j}$.

In the gambling game example, the transition probabilities for $0 < i < N$ are
$$p(i, i+1) = 0.4, \quad p(i, i-1) = 0.6.$$

Moreover, we have $p(0, 0) = 1$ and $p(N, N) = 1$. The transition matrix for $N = 4$ is (here the state space is $S = \{0, 1, 2, 3, 4\}$)

        0     1     2     3     4
  0   1.0     0     0     0     0
  1   0.6     0   0.4     0     0
  2     0   0.6     0   0.4     0
  3     0     0   0.6     0   0.4
  4     0     0     0     0   1.0

Properties of the transition matrix:
(1) we have $0 \le p(i, j) \le 1$;
(2) all rows sum up to one: $\sum_j p(i, j) = 1$.
Any matrix with these properties can be a transition matrix of some Markov chain and is also called a stochastic matrix.

3.2 Multistep Transition Probabilities

We want to compute the probability of going from $i$ to $j$ in $m > 0$ steps (the transition probability gives us this for one step). This probability is denoted by $p^m(i, j)$.

Key idea: to go from $i$ to $j$ in $m + n$ steps, we have to go from $i$ to some intermediary state $k$ in $m$ steps and then from $k$ to $j$ in $n$ steps. The Markov property implies that the two parts of our journey are independent. This observation leads to the Chapman-Kolmogorov equation:
$$p^{m+n}(i, j) = \sum_k p^m(i, k) \, p^n(k, j).$$
For example, for $m = n = 1$ we get
$$p^2(i, j) = \sum_k p(i, k) \, p(k, j).$$
That is, the two-step transition matrix is the matrix product of the one-step transition matrix with itself. It follows by induction:

Theorem 3.4 The $m$-step transition matrix $p^m = (p^m(i, j))_{i,j}$ is the $m$th power of the one-step transition matrix.

Example 3.5 (Coin tossing) Each turn, a fair coin is tossed. We have 3 classes, hence $S = \{1, 2, 3\}$. If you score head, you move up one class unless you are in 1st class. If you score tail, you move down unless you are in 3rd class. The transition matrix is

        1     2     3
  1   0.5   0.5     0
  2   0.5     0   0.5
  3     0   0.5   0.5

The two-step transition matrix is obtained by matrix multiplication:
$$\begin{pmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 0.5 & 0.5 \end{pmatrix} = \begin{pmatrix} 0.5 & 0.25 & 0.25 \\ 0.25 & 0.5 & 0.25 \\ 0.25 & 0.25 & 0.5 \end{pmatrix}$$
For the 3-step transition matrix we have to multiply
$$\begin{pmatrix} 0.5 & 0.25 & 0.25 \\ 0.25 & 0.5 & 0.25 \\ 0.25 & 0.25 & 0.5 \end{pmatrix} \begin{pmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 0.5 & 0.5 \end{pmatrix} = \begin{pmatrix} 0.375 & 0.375 & 0.25 \\ 0.375 & 0.25 & 0.375 \\ 0.25 & 0.375 & 0.375 \end{pmatrix}$$
and so on.
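In practice such powers are computed numerically. A minimal sketch, assuming numpy is available (my own illustration of Theorem 3.4, not part of the notes):

```python
# Sketch: m-step transition matrices of Example 3.5 as matrix powers.
import numpy as np

p = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])

print(np.linalg.matrix_power(p, 2))  # two-step matrix, as computed above
print(np.linalg.matrix_power(p, 3))  # three-step matrix
```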

3.3 Classification of States

Definition 3.6 A state $i$ is called recurrent if
$$P(X_n = i \text{ for some future } n > 0 \mid X_0 = i) = 1;$$
otherwise it is called transient.

Example 3.7 $S = \{1, 2, 3\}$. The transition matrix is

        1     2     3
  1   1/3   1/3   1/3
  2     0   1/2   1/2
  3     0   3/4   1/4

Here 2 and 3 are recurrent, while 1 is transient (once you have gone to 2 or 3, there is no way back to 1).

Definition 3.8 We say that state $i$ communicates with state $j$ (write $i \to j$) if one can get with positive probability from $i$ to $j$, that is, $p^n(i, j) > 0$ for some $n > 0$. If $i \to j$ and $j \to i$ then $i$ and $j$ intercommunicate. If all states in a chain intercommunicate with each other, then the chain is called irreducible.

In the previous example, state 1 communicates with states 2 and 3, but does not intercommunicate with them. States 2 and 3 intercommunicate. So the example chain is not irreducible.

Theorem 3.9 If $i$ communicates with $j$, but $j$ does not communicate with $i$, then $i$ is transient.

The explanation is that with positive probability we get from $i$ to $j$ but can never go back.

Definition 3.10 A set $C \subset S$ of states is called closed if it is impossible to get out. That is, if $i \in C$ and $j \notin C$ then $p(i, j) = 0$. A chain is irreducible if there exists no closed set other than the set of all states. So it is irreducible if every state can be reached from every other state.

Exercise 3.11 A seven-state chain: consider the transition matrix

        1    2    3    4    5    6    7
  1   0.3    0    0    0  0.7    0    0
  2   0.1  0.2  0.3  0.4    0    0    0
  3     0    0  0.5  0.5    0    0    0
  4     0    0    0  0.5    0  0.5    0
  5   0.6    0    0    0  0.4    0    0
  6     0    0    0    0    0  0.2  0.8
  7     0    0    0    1    0    0    0

Give a graphical representation of the chain by drawing arrows from $i$ to $j$ if $i$ communicates with $j$. Identify the closed sets. Which states are transient? Is the chain irreducible?

Theorem 3.12 If $C$ is a finite, closed, and irreducible set, then all states in $C$ are recurrent.

The explanation consists of two steps: 1) In a finite closed set $C$ there has to be at least one recurrent state: since $C$ is closed, the chain spends an infinite amount of time in $C$; if it spent only a finite amount of time at each of the finitely many states of $C$, we would have a contradiction. 2) If $i$ is recurrent and intercommunicates with $j$, then $j$ is also recurrent.

Theorem 3.13 (Decomposition Theorem) If the state space $S$ is finite, then $S$ can be uniquely written as a disjoint union
$$S = T \cup R_1 \cup \ldots \cup R_k,$$
where $T$ is a set of transient states and the $R_i$'s are closed sets of recurrent states.

Definition 3.14 It is a theoretical possibility that the probability of coming back is non-zero only after every $n$th turn and zero otherwise, i.e. $p^n(i, i) > 0$ and $p^m(i, i) = 0$ for $m = 1, 2, \ldots, n - 1$. The period of a state $i$ is the largest number that divides all $n$ for which $p^n(i, i) > 0$. If all states have period 1, then the chain is called aperiodic. To exclude this technical nuisance, our chains will always be assumed to be aperiodic. This phenomenon does not occur in continuous-time Markov chains.

3.4 Limit Behaviour

Our goal in this section is to identify the long-time behaviour of a Markov chain: we are interested in $\pi(j) := \lim_{n \to \infty} p^n(i, j)$ in case that limit exists. The following concept is the key to this problem.

Definition 3.15 A stationary distribution is a vector $\pi$ with $0 \le \pi(j) \le 1$, $\sum_j \pi(j) = 1$, which solves
$$\sum_i \pi(i) \, p(i, j) = \pi(j),$$
or, shorter, in matrix notation $\pi p = \pi$.

Theorem 3.16 If $X_0$ has distribution $\pi$, i.e. $P(X_0 = j) = \pi(j)$, then so does $X_n$ for all $n \ge 1$.

To see this, we denote by $\pi_m$ the distribution after $m$ steps. As a consequence of the Chapman-Kolmogorov equation we have
$$\pi_m = \pi p^m,$$
where $p^m$ is the $m$th power of the transition matrix. By definition of the stationary distribution we have $\pi_1 = \pi p = \pi$, $\pi_2 = \pi_1 p = \pi p = \pi$, and so on.

Theorem 3.17 (Convergence theorem) Suppose we have an irreducible, aperiodic chain which has a stationary distribution $\pi$ (it follows from irreducibility that $\pi$ must be unique). Then as $n \to \infty$, $p^n(i, j) \to \pi(j)$. If the state space $S$ is finite, then such a $\pi$ exists. Moreover, if $\mu_i$ is the expected number of steps it takes to go from $i$ back to $i$, then $\pi(i) = 1/\mu_i$.

Example 3.18 Consider a chain with the following transition matrix $p$:

        1     2     3
  1   0.4   0.6     0
  2   0.2   0.5   0.3
  3   0.1   0.7   0.2

To determine the stationary distribution, we have to solve the matrix equation $\pi p = \pi$, which is in fact three equations for the three unknowns $\pi(1)$, $\pi(2)$, $\pi(3)$:
$$0.4\,\pi(1) + 0.2\,\pi(2) + 0.1\,\pi(3) = \pi(1)$$
$$0.6\,\pi(1) + 0.5\,\pi(2) + 0.7\,\pi(3) = \pi(2)$$
$$0.3\,\pi(2) + 0.2\,\pi(3) = \pi(3)$$
In addition, we have $\pi(1) + \pi(2) + \pi(3) = 1$, so one of the equations is redundant. The solution is
$$\pi(1) = 0.22353, \quad \pi(2) = 0.56471, \quad \pi(3) = 0.21176.$$
The number $n$ of turns does not have to be very large for $p^n(i, j)$ to be close to its limiting value $\pi(j)$. For example, already after 4 steps we have
$$p^4 = \begin{pmatrix} 0.2278 & 0.5634 & 0.2088 \\ 0.2226 & 0.5653 & 0.2121 \\ 0.2215 & 0.5645 & 0.2140 \end{pmatrix}$$
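The linear system above can be solved by replacing one redundant equation with the normalization constraint. A minimal sketch, assuming numpy is available (my own illustration of Example 3.18):

```python
# Sketch: stationary distribution of Example 3.18 via a linear solve.
import numpy as np

p = np.array([[0.4, 0.6, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.7, 0.2]])

# pi p = pi  <=>  (p^T - I) pi = 0; overwrite the last (redundant) row
# with the normalization condition sum(pi) = 1.
A = p.T - np.eye(3)
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)                             # [0.22353 0.56471 0.21176]
print(np.linalg.matrix_power(p, 4))   # already close to the limit
```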

3.5 Infinite State Space

We study here the example of the reflecting random walk. This is a Markov chain with state space $S = \{0, 1, 2, \ldots\}$. We have as transition probabilities (for some $p \in (0, 1)$)
$$p(i, i+1) = p \quad \text{when } i \ge 0,$$
$$p(i, i-1) = 1 - p \quad \text{when } i \ge 1,$$
$$p(0, 0) = 1 - p.$$
To find a stationary distribution we solve the equation
$$\sum_i \pi(i) \, p(i, j) = \pi(j) \quad \text{with} \quad \sum_j \pi(j) = 1.$$
In this case $p(i, j) = 0$ if $|i - j| > 1$. So we end up with
$$\pi(0)(1-p) + \pi(1)(1-p) = \pi(0)$$
$$\pi(0)\, p + \pi(2)(1-p) = \pi(1)$$
$$\pi(1)\, p + \pi(3)(1-p) = \pi(2)$$
$$\ldots$$
$$\pi(i-1)\, p + \pi(i+1)(1-p) = \pi(i)$$
and so on. We set $\pi(0) = c$ for some $c$ to be determined later. By induction, the solution to our equation is
$$\pi(i) = c \left(\frac{p}{1-p}\right)^i.$$
We have three cases:

1. $p > 1/2$: then $p/(1-p) > 1$ and $\pi(i)$ increases exponentially fast. Hence $\sum_i \pi(i) = 1$ cannot be satisfied, and consequently there does not exist a stationary distribution $\pi$.

2. $p = 1/2$: $p/(1-p) = 1$, so $\pi(i) = c$ and $\sum_i \pi(i) = \infty$. Again, there is no stationary distribution.

3. $p < 1/2$: $\theta := p/(1-p) < 1$, so $\sum_i \pi(i) = c \sum_i \theta^i$ is a convergent geometric series. Recall that we have for such a series
$$\sum_{i=0}^{\infty} \theta^i = \frac{1}{1 - \theta} = \frac{1-p}{1-2p},$$
so taking $c = (1-2p)/(1-p)$ we get that $\pi$ is a probability distribution.

One can show that the reflecting random walk is aperiodic. It is clearly irreducible. Therefore the convergence theorem implies that for $p < 1/2$, the probability that the walk is in state $i$ after $n$ steps converges, as $n \to \infty$, to $\pi(i)$. In this case, the chain is recurrent. One can show that it is even positively recurrent: the expected return time $\mu_i$ to state $i$ (i.e. the expected number of steps it takes to go from $i$ back to $i$) equals $\mu_i = 1/\pi(i) < \infty$.

On the other hand, if $p > 1/2$ then the chain is transient: once we leave a state, with positive probability we will never come back (the chain 'explodes').

The case $p = 1/2$ is a borderline case between recurrence and transience: the state 0 is recurrent, but $\mu_0 = \infty$. In such a situation one says that 0 is null-recurrent.
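For $p < 1/2$ the geometric stationary distribution can be seen empirically in the long-run occupation frequencies of a single path. The following sketch is my own illustration (the choice $p = 0.3$ is mine):

```python
# Sketch: reflecting random walk with p < 1/2; occupation frequencies
# approach the stationary distribution pi(i) = c * (p/(1-p))^i.
import random

def occupation(p, n_steps=500_000, seed=8):
    rng = random.Random(seed)
    counts, state = {}, 0
    for _ in range(n_steps):
        if rng.random() < p:
            state += 1          # step up with probability p
        elif state > 0:
            state -= 1          # step down, except reflected at 0
        counts[state] = counts.get(state, 0) + 1
    return counts

p = 0.3
theta, c = p / (1 - p), (1 - 2 * p) / (1 - p)
counts = occupation(p)
for i in range(4):
    print(i, counts[i] / 500_000, c * theta ** i)  # empirical vs pi(i)
```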

4 Poisson Processes

From now on, we will study continuous-time stochastic processes. The time set is now $\mathbb{R}_+$, so that a stochastic process is a collection of random variables $X = (X_t)_{t \ge 0}$. The information about $X$ is contained in a filtration $(\mathcal{F}_t) = (\mathcal{F}_t)_{t \ge 0}$ where $\mathcal{F}_s \subset \mathcal{F}_t$ for $s < t$. Sometimes we consider filtrations which contain not only information about the process $X$ but also about some other events or processes. To express that an arbitrary filtration $(\mathcal{F}_t)$ contains all information about $X$ (and possibly more) we use the expression '$X$ is adapted to $(\mathcal{F}_t)$', which means that for each $t \ge 0$, $X_t$ is $\mathcal{F}_t$-measurable.

We say that a stochastic process $M$ is a martingale relative to $(\mathcal{F}_t)$ if $E[|M_t|] < \infty$ for all $t$ and
(i) $M$ is adapted to $(\mathcal{F}_t)$;
(ii) we have for all $(s, t)$ with $s < t$ that
$$E[M_t \mid \mathcal{F}_s] = M_s.$$

Let $T$ be a random time. As the event $\{T = t\}$ often has probability zero, we formulate the stopping time property now as follows: $T$ is a stopping time if for all $t \ge 0$
$$\{T \le t\} \in \mathcal{F}_t.$$
There is also an optional stopping theorem in continuous time. However, it is sometimes a bit technical to check whether its assumptions are satisfied. We will tell you when that is the case.

Suppose $f$ is a (deterministic) function of time. We often want to approximate $f(t)$ for small values of $t$ and collect some remainder terms which are negligible for small $t$ in a 'bin' which we shall denote by the symbol $o(t)$. For example, consider the series expansion of the exponential function,
$$e^t = 1 + t + \frac{1}{2} t^2 + \frac{1}{6} t^3 + \ldots = 1 + t + o(t),$$
reflecting that for very small $t$ we have
$$e^t \approx 1 + t.$$
More formally, a function $r(t)$ is said to be $o(t)$ (read: 'small oh of $t$') if
$$\lim_{t \downarrow 0} \frac{r(t)}{t} = 0,$$
that is, $r(t)$ goes to zero faster as $t \downarrow 0$ than the function $f(t) = t$.

4.1 Standard Poisson Process

Imagine you are running a car insurance company. Claims after accidents occur at random times $T_1, T_2, T_3, \ldots$ By time $t$ we know whether the $i$th claim has arrived or not, so the $T_n$ are all stopping times (relative to the filtration $(\mathcal{F}_t)$ which reflects all the info your company's agents and secret agents have gathered). We always set $T_0 = 0$.

Take now the process $N = (N_t)_{t \ge 0}$ which counts the number of claims (at this stage we are not yet looking at whether it was a Rolls Royce or a Beetle which was involved in the accident):
$$N_t = \text{number of claims which have arrived up to time } t.$$
More formally, we can define $N$ as follows: recall first the indicator function
$$\mathbf{1}_{\{T_n \le t\}} = \begin{cases} 1 & \text{if } t \ge T_n(\omega) \\ 0 & \text{if } t < T_n(\omega). \end{cases}$$
Let now $N$ be given as
$$N_t = \sum_{n \ge 1} \mathbf{1}_{\{T_n \le t\}};$$
$N$ is called the counting process associated to the stopping time sequence $(T_n)$. As the $T_n$ are stopping times, we have $\{T_n \le t\} \in \mathcal{F}_t$ for all $n$. Therefore $N_t$ is $\mathcal{F}_t$-measurable for all $t$, and hence $N$ is adapted to the filtration $(\mathcal{F}_t)$.

Definition 4.1 A counting process $N$ is a Poisson process with intensity $\lambda$ if
(i) its increments are independent of the past;
(ii) it has stationary increments;
(iii) $E[N_t] = \lambda t$ for all $t \ge 0$.

Here (i) means that for any $s < t$ we have that $N_t - N_s$ is independent of $\mathcal{F}_s$, and (ii) that for all $s, t$ and $u, v$ such that $t - s = v - u$ we have that the probability distribution of $N_t - N_s$ is the same as that of $N_v - N_u$ (and hence the same as that of $N_{t-s}$, since $N_0 = 0$).

Theorem 4.2 Let $N$ be a Poisson process with intensity $\lambda$. Then
(i) $N_t - \lambda t$ is a martingale (the so-called compensated Poisson process);
(ii) $(N_t - \lambda t)^2 - \lambda t$ is a martingale.

Proof. (i) We have already seen that $N$ is adapted, hence this is also true for the compensated Poisson process (we are subtracting a deterministic quantity from $N$). As for the martingale property, we have for all $s < t$
$$E[(N_t - \lambda t) - (N_s - \lambda s) \mid \mathcal{F}_s] = E[N_t - N_s \mid \mathcal{F}_s] - \lambda t + \lambda s$$
$$= E[N_t - N_s] - \lambda t + \lambda s \quad \text{by independence}$$
$$= \lambda t - \lambda s - \lambda t + \lambda s \quad \text{by (iii)}$$
$$= 0.$$
As $N_s - \lambda s$ is $\mathcal{F}_s$-measurable, we get $E[N_s - \lambda s \mid \mathcal{F}_s] = N_s - \lambda s$ (taking out what is known), so we can conclude that
$$E[N_t - \lambda t \mid \mathcal{F}_s] = N_s - \lambda s.$$
(ii) This is an exercise.


We now want to relate the Poisson process to the Poisson distribution. For this we need one result from analysis: let the function $f$ be a right-continuous solution to the functional equation
$$f(t + s) = f(t) f(s), \quad \text{for all } 0 \le s < t.$$
Then either $f(t) = \exp(ct)$ for some constant $c$, or $f(t) = 0$.

Theorem 4.3 Let $N$ be a Poisson process with intensity $\lambda$. Then for $n \ge 0$
$$P(N_t = n) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}.$$
That is, $N_t$ has the Poisson distribution with parameter $\lambda t$.

Proof. Step 1. For all $t \ge 0$, $P(N_t = 0) = e^{-\lambda t}$.
Since $\{N_t = 0\} = \{N_s = 0\} \cap \{N_t - N_s = 0\}$ for $0 \le s < t$, we get by the independence of increments that
$$P(N_t = 0) = P(N_s = 0) P(N_t - N_s = 0) = P(N_s = 0) P(N_{t-s} = 0)$$
by the stationarity of increments. Let $f(t) = P(N_t = 0)$. We have proven that $f(t) = f(s) f(t - s)$ for all $0 \le s < t$. As we can exclude the case $f(t) = 0$ (otherwise $N_t = \infty$ for all $t$, which is absurd), it follows that $P(N_t = 0) = e^{-ct}$ for a constant $c$, which at the end of the proof can easily be identified with $\lambda$.

Step 2. $P(N_t \ge 2)$ is $o(t)$. That is, for very small $t$ the probability that the Poisson process has more than one jump is negligibly small. (We are not going into details here.)

Step 3.
$$\lim_{t \downarrow 0} \frac{P(N_t = 1)}{t} = \lim_{t \downarrow 0} \frac{1 - P(N_t = 0) - P(N_t \ge 2)}{t} = \lim_{t \downarrow 0} \frac{1 - e^{-\lambda t} + o(t)}{t} = \lambda.$$

Step 4. Conclusion. For $0 < \alpha < 1$, set
$$\varphi(t) = E[\alpha^{N_t}].$$
We have by the independence and stationarity of the increments
$$\varphi(t + s) = E[\alpha^{N_{t+s}}] = E[\alpha^{(N_{t+s} - N_t) + N_t}] = E[\alpha^{N_{t+s} - N_t}] E[\alpha^{N_t}] = E[\alpha^{N_s}] E[\alpha^{N_t}] = \varphi(s) \varphi(t).$$
Again by our functional equation, this implies that there exists some $\beta(\alpha)$, depending on $\alpha$, such that $\varphi(t) = e^{\beta(\alpha) t}$. On the other hand, since $\varphi(t) = E[\alpha^{N_t}]$,
$$\varphi(t) = \sum_{n \ge 0} \alpha^n P(N_t = n) = P(N_t = 0) + \alpha P(N_t = 1) + \sum_{n \ge 2} \alpha^n P(N_t = n).$$
Moreover, $\beta(\alpha) = \varphi'(0)$, the derivative of $\varphi$ at zero. Therefore
$$\beta(\alpha) = \lim_{t \downarrow 0} \frac{\varphi(t) - 1}{t} = \lim_{t \downarrow 0} \left( \frac{P(N_t = 0) - 1}{t} + \alpha \frac{P(N_t = 1)}{t} + \frac{o(t)}{t} \right) = -\lambda + \alpha \lambda.$$
We get that
$$\varphi(t) = e^{\beta(\alpha) t} = e^{-\lambda t + \alpha \lambda t},$$
hence by the exponential series
$$\varphi(t) = \sum_{n \ge 0} \alpha^n P(N_t = n) = e^{-\lambda t} \sum_{n \ge 0} \frac{(\lambda t)^n \alpha^n}{n!}.$$
Equating coefficients of the two infinite series yields
$$P(N_t = n) = e^{-\lambda t} \frac{(\lambda t)^n}{n!} \quad \text{for } n \ge 0,$$
as desired.

Recall that the claims in our model arrive at the times $T_1, T_2, T_3, \ldots$ If the Poisson process $N$ is the counting process associated to this sequence of stopping times, then we get by Step 1 of the previous proof
$$P(T_1 \le t) = P(N_t \ge 1) = 1 - P(N_t = 0) = 1 - e^{-\lambda t}.$$
By the stationarity of increments of a Poisson process, we also get that for each $n \ge 0$,
$$P(T_{n+1} - T_n \le t) = 1 - e^{-\lambda t}.$$
That is, the waiting time up to the next arrival of a claim is exponentially distributed. Let us recall the density function $f$ of the exponential distribution:
$$f(t) = \begin{cases} \lambda e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0. \end{cases}$$
It is often useful to think of $f(t)\,dt$ as '$P(T = t)\,dt$'. Using the density, we can calculate the expected value of $T = T_{n+1} - T_n$, the waiting time for the arrival of the next claim, as follows (use integration by parts):
$$E[T] = \int t \, P(T = t) \, dt = \int_0^{\infty} t \, \lambda e^{-\lambda t} \, dt = \left[ -t e^{-\lambda t} \right]_0^{\infty} + \int_0^{\infty} e^{-\lambda t} \, dt = \frac{1}{\lambda}.$$
Similarly,
$$E[T^2] = \frac{2}{\lambda^2},$$
so we get for the variance
$$\operatorname{var}(T) = E[T^2] - (E[T])^2 = \frac{1}{\lambda^2}.$$
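The exponential waiting-time description gives a direct way to simulate a Poisson process. A minimal sketch of my own (parameter choices mine, not from the notes):

```python
# Sketch: build N_t from i.i.d. Exp(lambda) waiting times and check
# that E[N_t] = lambda * t, consistent with Theorem 4.3.
import random

def count_arrivals(lam, t, rng):
    """N_t: number of arrival times T_1 < T_2 < ... falling in [0, t]."""
    total, n = 0.0, 0
    while True:
        total += rng.expovariate(lam)  # next exponential waiting time
        if total > t:
            return n
        n += 1

rng = random.Random(3)
lam, t = 2.0, 3.0
samples = [count_arrivals(lam, t, rng) for _ in range(50_000)]
print(sum(samples) / len(samples))  # approx lambda * t = 6.0
```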

4.2 Compound Poisson Process

Coming back to our car insurance model, so far we only have a model of how many claims arrive up to a given time (as specified by $N_t$), but we know nothing about the size of the claims (did the accident involve a Rolls Royce or a Beetle?). To take that size into account as well, we will associate to each claim an independent and identically distributed random variable $Y_i$. By independent we mean that the $Y_i$ are independent of each other and of the Poisson process of arrivals. The $Y_i$ model the amount of money we have to pay for each claim. Naturally, we are interested in the total amount of money we have to pay up to time $t$.

Definition 4.4 Let $N$ be a standard Poisson process, and $Y_1, Y_2, \ldots$ as above. The compound Poisson process $S$ is then defined as ($S_t = 0$ if $N_t = 0$ and)
$$S_t = Y_1 + \ldots + Y_{N_t}.$$

We calculate (if $E[N_t] < \infty$) that (with $\lambda$ being the intensity of $N$)
$$E[S_t] = \sum_{n \ge 0} E[S_t \mid N_t = n] P(N_t = n) = \sum_{n \ge 0} n E[Y_i] P(N_t = n) = E[Y_i] \sum_{n \ge 0} n P(N_t = n) = E[Y_i] E[N_t] = \lambda t \, E[Y_i].$$

By a similar computation one can get (if $E[N_t^2] < \infty$) that
$$\operatorname{var}(S_t) = \operatorname{var}(Y_i) E[N_t] + \operatorname{var}(N_t) (E[Y_i])^2 = \lambda t \left( \operatorname{var}(Y_i) + (E[Y_i])^2 \right) = \lambda t \, E[Y_i^2].$$
Let us recall that the moment-generating function for the Poisson distribution (here for the random variable $N_{t-s}$) is given as
$$E[e^{c N_{t-s}}] = \exp((e^c - 1) \lambda (t - s)). \tag{4.1}$$
We denote by $M_Y(r)$ the moment-generating function of the random variable $Y_i$ (which does not depend on $i$),
$$M_Y(r) = E[e^{r Y_i}].$$

Theorem 4.5 We have
$$E\left[\exp\left(r \sum_{i=N_s+1}^{N_t} Y_i\right)\right] = e^{(M_Y(r) - 1) \lambda (t - s)}.$$

Proof. Recall that we can write $a^x = \exp(x \log(a))$. By the stationarity and independence of the increments of $N$, as well as the fact that the $Y_i$'s are independent of $N$, we get
$$E\left[\exp\left(r \sum_{i=N_s+1}^{N_t} Y_i\right)\right] = E\left[\exp\left(r \sum_{i=1}^{N_{t-s}} Y_i\right)\right]$$
$$= E\left[\prod_{i=1}^{N_{t-s}} \exp(r Y_i)\right] \quad \text{(multiplicative property of exp)}$$
$$= E\left[\prod_{i=1}^{N_{t-s}} E[\exp(r Y_i)]\right] \quad \text{($Y_i$ are independent of $N$)}$$
$$= E\left[\prod_{i=1}^{N_{t-s}} M_Y(r)\right] = E\left[(M_Y(r))^{N_{t-s}}\right] = E[\exp(N_{t-s} \log(M_Y(r)))].$$
We now substitute $c = \log(M_Y(r))$ into the expression (4.1) for the moment-generating function of $N_{t-s}$ to end up with
$$E[\exp(N_{t-s} \log(M_Y(r)))] = e^{(M_Y(r) - 1) \lambda (t - s)}.$$
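Theorem 4.5 is also easy to test numerically. The following Monte Carlo check, with $s = 0$ and exponential claim sizes, is my own illustration (all parameter values are mine):

```python
# Sketch: check E[exp(r * S_t)] = exp((M_Y(r) - 1) * lambda * t) for
# Y_i ~ Exp(beta), where M_Y(r) = beta / (beta - r) for r < beta.
import math, random

rng = random.Random(9)
lam, t, r, beta = 1.0, 2.0, 0.5, 2.0

def compound_sum():
    n, total = 0, rng.expovariate(lam)
    while total <= t:            # count Poisson arrivals in [0, t]
        n += 1
        total += rng.expovariate(lam)
    return sum(rng.expovariate(beta) for _ in range(n))

est = sum(math.exp(r * compound_sum()) for _ in range(100_000)) / 100_000
mgf = beta / (beta - r)          # M_Y(r) for the Exp(beta) distribution
print(est, math.exp((mgf - 1) * lam * t))  # both approx e^{2/3} = 1.948
```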

4.3 Ruin Probabilities for the Classical Risk Process

Let us imagine an insurance company which starts with an initial capital $u$, collects premiums at a rate $c$, and has payments modelled by a compound Poisson process, as discussed above. Then we get the following model for the surplus of the insurance portfolio:
$$C_t = u + ct - \sum_{i=1}^{N_t} Y_i.$$
$C$ is called the classical risk process. We assume that $Y_i \ge 0$ and that the $Y_i$'s are independent and identically distributed as in the last subsection. Let
$$\mu = E[Y_i] < \infty.$$
A somewhat unpleasant event for the company happens at the ruin time
$$\tau = \inf\{t > 0 : C_t < 0\},$$
where we follow the convention that $\inf \emptyset = \infty$. Therefore, the board of the company as well as regulators might want to have a close look at the (ultimate) ruin probability
$$\psi(u) = P(\tau < \infty).$$
The ruin probability is a function of the initial capital $u$. If $\psi(u)$ is large, then regulators might force the company to start business with more initial reserves, that is, to work with a larger $u$. Obviously, $\psi(u)$ is decreasing in $u$. We are interested in obtaining some description of the dependence of $\psi$ on $u$. Let us impose one more assumption, the so-called net profit condition:
$$c > \lambda \mu.$$
We can interpret this condition by recalling from the last section that the expected value of a compound Poisson process is given as
$$E\left[\sum_{i=1}^{N_t} Y_i\right] = \lambda \mu t.$$
Therefore, the net profit condition just says that the premium rate (our income) is higher than the expected rate of our claim payments. It is a necessary requirement for an insurance company to be interested in this business at all. Moreover, we assume that the claims are not too heavy-tailed by requiring the moment-generating function of the claim distribution to be finite,
$$M_Y(r) = E[\exp(r Y_i)] < \infty.$$
The crucial observation which will lead to the description of the ruin probability will be obtained by solving the following problem:

Find $r > 0$ such that $\exp(-r C_t)$ is a martingale!

In other words, we are looking for an exponential martingale related to the risk process $C$. We compute, using the independence and stationarity of increments, the definition of the risk process $C$, and the result of Theorem 4.5, that
$$E[e^{-r C_t} \mid \mathcal{F}_s] = E[e^{-r(C_t - C_s)} e^{-r C_s} \mid \mathcal{F}_s]$$
$$= e^{-r C_s} E[e^{-r(C_t - C_s)}] \quad \text{(taking out + independence)}$$
$$= e^{-r C_s} e^{-rc(t-s)} E\left[\exp\left(r \sum_{i=N_s+1}^{N_t} Y_i\right)\right] \quad \text{(definition of $C$)}$$
$$= e^{-r C_s} e^{-rc(t-s)} \exp(\lambda (M_Y(r) - 1)(t - s)) \quad \text{(stationarity and Theorem 4.5)}.$$
We now define a quantity $\theta(r)$ as
$$\theta(r) = \lambda (M_Y(r) - 1) - rc,$$
such that we get
$$E[e^{-r C_t} \mid \mathcal{F}_s] = e^{-r C_s} \exp(\theta(r)(t - s)).$$
Therefore, $\exp(-r C_t)$ is a martingale provided that
$$\theta(r) = 0.$$
One solution to this equation is $r = 0$, which is useless since $\exp(-r C_t)$ is just a constant for $r = 0$. Some analysis, however, shows that there might be one additional solution $R > 0$ to this equation. In case this solution exists we call it the adjustment coefficient $R$.

Example 4.6 Let the claims $Y_i$ be $\mathrm{Exp}(\alpha)$-distributed. One computes
$$M_Y(r) = E[\exp(r Y_i)] = \frac{\alpha}{\alpha - r}.$$
For $\theta(R) = 0$ we have to solve
$$\lambda \left( \frac{\alpha}{\alpha - R} - 1 \right) - Rc = 0.$$
Here a non-zero solution $R$ exists:
$$R = \alpha - \frac{\lambda}{c};$$
we have $R > 0$ (equivalent to $c > \lambda/\alpha$) because of the net profit condition (recall that the exponential distribution has a mean of $1/\alpha$). However, often the exponential distribution is a rather poor approximation to the 'real' claim size distribution, so we need to consider a more general situation.

Let us reformulate the result which we obtained above as follows.

Theorem 4.7 Assume that the adjustment coefficient $R$ exists. Then $\exp(-R C_t)$ is a martingale.

Before we proceed further, let us collect one more result from martingale theory (without proof).

Theorem 4.8 (Stopped martingales are martingales) Let $(M_t)$ be a martingale, $\tau$ a stopping time, $\tau \wedge t = \min(\tau, t)$. Then $(M_{\tau \wedge t})$ is a martingale as well (it is equal to $(M_t)$ up to time $\tau$; afterwards it stays constant at level $M_\tau$).

Having found in Theorem 4.7 an exponential martingale related to the risk process $C$, the proof of the following celebrated theorem is surprisingly easy.

Theorem 4.9 Assume that the adjustment coefficient $R$ exists. Then
$$\psi(u) < e^{-Ru}.$$

Proof. Recall that $\psi(u) = P(\tau < \infty)$ where $\tau$ is the time of ruin. By the optional stopping theorem, applied to the stopped martingale $\exp(-R C_{\tau \wedge t})$, we get
$$e^{-Ru} = e^{-R C_0} = E\left[e^{-R C_{\tau \wedge t}}\right] \ge E\left[e^{-R C_{\tau \wedge t}} \mathbf{1}_{\{\tau \le t\}}\right] = E\left[e^{-R C_\tau} \mathbf{1}_{\{\tau \le t\}}\right].$$
Letting now $t$ tend to infinity, we get (note that $C_\tau \le 0$, hence $e^{-R C_\tau} \ge 1$)
$$e^{-Ru} \ge E\left[e^{-R C_\tau} \mathbf{1}_{\{\tau < \infty\}}\right] > E\left[\mathbf{1}_{\{\tau < \infty\}}\right] = P(\tau < \infty) = \psi(u).$$

It follows that the ruin probability $\psi(u)$ decreases rather quickly, namely exponentially, with increasing initial reserve $u$. So running an insurance company is not too risky a business, provided the company is equipped with plenty of initial capital.
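The Lundberg bound of Theorem 4.9 can be compared with a simulated ruin probability over a long but finite horizon. The following sketch is my own illustration for the exponential-claims setting of Example 4.6 (all parameter values and the finite horizon are my assumptions; since ruin can only occur at claim times, it suffices to check the surplus after each claim):

```python
# Sketch: classical risk process with Exp(alpha) claims; estimate the
# ruin probability and compare with e^{-Ru}, R = alpha - lambda/c.
import math, random

def ruin_prob(u, c, lam, alpha, horizon=500.0, n_paths=10_000, seed=4):
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        t, capital = 0.0, u
        while t < horizon:
            w = rng.expovariate(lam)           # waiting time to next claim
            t += w
            capital += c * w                    # premiums collected meanwhile
            capital -= rng.expovariate(alpha)   # claim payment
            if capital < 0:
                ruined += 1
                break
    return ruined / n_paths

u, c, lam, alpha = 5.0, 1.5, 1.0, 1.0   # net profit condition: c > lam/alpha
R = alpha - lam / c
print(ruin_prob(u, c, lam, alpha), math.exp(-R * u))  # estimate < bound
```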

5 Stochastic Calculus for Piecewise Differentiable Processes

5.1 Stochastic Integration

Consider a risky asset whose price is modelled by a continuous-time stochastic process $(S_t)$. A portfolio $(\phi_t)$ is a stochastic process describing the amount $\phi_t$ of the asset held by the investor at time $t$. If the investor follows a dynamic trading strategy, she would buy or sell at certain random transaction dates modelled as stopping times $0 = T_0 < T_1 < T_2 < \ldots < T_n < T_{n+1} = T$. Between two transaction dates $T_i$ and $T_{i+1}$ the portfolio remains unchanged, and we will denote its composition by $\phi_i$. The portfolio may then be expressed as
$$\phi_t = \phi_0 \mathbf{1}_{\{t=0\}} + \sum_{i=0}^{n} \phi_i \mathbf{1}_{(T_i, T_{i+1}]}(t).$$
The new portfolio $\phi_i$ is chosen based on the information available at time $T_i$; it is thus $\mathcal{F}_{T_i}$-measurable. On the other hand, when the market maker sets the new price at time $T_i$, the portfolio is still described as $\phi_{i-1}$; it takes its new value $\phi_i$ right after $T_i$. Therefore, the indicator function must be of the form $\mathbf{1}_{(T_i, T_{i+1}]}$ (left-continuous) as opposed to $\mathbf{1}_{[T_i, T_{i+1})}$ (right-continuous). We say that $\phi$ is a simple predictable process.

Between $T_i$ and $T_{i+1}$ we hold $\phi_i$ shares of the asset, with the asset price moving by $S_{T_{i+1}} - S_{T_i}$. Hence the outcome of our trading strategy is given by
$$\int_0^t \phi_u \, dS_u := \phi_0 S_0 + \sum_{i=0}^{n} \phi_i \left( S_{T_{i+1} \wedge t} - S_{T_i \wedge t} \right).$$

We call this expression the stochastic integral of $\phi$ with respect to $S$. Thus, the stochastic integral $\int_0^t \phi \, dS$ represents the capital accumulated between 0 and $t$ by following the trading strategy $\phi$.

Example 5.1 (Trading strategies have to be predictable)
Let $S_t = t - N_t$ where $N_t$ denotes a Poisson process with intensity $\lambda = 1$; it is a martingale. Denote by $T_1$ the time of its first jump. Consider now the strategy which consists in buying one unit of the asset $S$ at $t = 0$ for the price of zero and selling it right before the price falls, i.e. $\phi_t = \mathbf{1}_{[0, T_1)}(t)$. This strategy is right-continuous, not left-continuous, and thus not simple predictable. The capital gain associated to this strategy is given by
$$\int_0^t \phi_u \, dS_u = t \quad \text{for } t < T_1, \qquad \int_0^t \phi_u \, dS_u = T_1 \quad \text{for } t \ge T_1.$$
This strategy is a so-called arbitrage opportunity: it requires zero initial capital, but leads to a strictly positive outcome without any downside risk. Such strategies are extremely unrealistic and should be ruled out in any reasonable model. In particular, this strategy would allow one to sell right before the crash (which in reality could not be foreseen).

Proposition 5.2 (Stochastic integration preserves the martingale property)
If $S$ is a martingale, then for any simple predictable process $\phi$ the stochastic integral $\int \phi \, dS$ is a martingale as well.

Proof. We will just take a look at the special case where we trade only at one stopping time $s < T_1 < t$. Then we have by definition of the stochastic integral
$$E\left[\left. \int_0^t \phi_u \, dS_u \right| \mathcal{F}_s\right] = \phi_0 S_0 + E[\phi_0 (S_{T_1} - S_0) + \phi_1 (S_t - S_{T_1}) \mid \mathcal{F}_s].$$
For the first term we use that $S$ is a martingale (and the optional stopping theorem):
$$E[\phi_0 (S_{T_1} - S_0) \mid \mathcal{F}_s] = \phi_0 (E[S_{T_1} \mid \mathcal{F}_s] - S_0) = \phi_0 (S_s - S_0).$$
We evaluate the second term by using the tower property together with 'taking out what is known' and the martingale property of $S$:
$$E[\phi_1 (S_t - S_{T_1}) \mid \mathcal{F}_s] = E[E[\phi_1 (S_t - S_{T_1}) \mid \mathcal{F}_{T_1}] \mid \mathcal{F}_s] = E[\phi_1 E[S_t - S_{T_1} \mid \mathcal{F}_{T_1}] \mid \mathcal{F}_s] = 0.$$
Summing up, we get
$$E\left[\left. \int_0^t \phi_u \, dS_u \right| \mathcal{F}_s\right] = \phi_0 S_0 + \phi_0 (S_s - S_0) = \int_0^s \phi_u \, dS_u.$$

Consider now a right- or left-continuous adapted stochastic process $\phi$. We partition our interval $[0, t]$ by using a 'time grid', i.e. random partition points $0 = \tau_0 < \tau_1 < \ldots < \tau_n = t$, where the $\tau_i$ are stopping times. If we now consider finer and finer partitions, then one can show that
$$\phi_0 S_0 + \sum_{i=0}^{n} \phi_{\tau_i -} \left( S_{\tau_{i+1} \wedge t} - S_{\tau_i \wedge t} \right) \to \int_0^t \phi_{u-} \, dS_u, \tag{5.1}$$
where $\phi_{u-} = \lim_{\varepsilon \downarrow 0} \phi_{u - \varepsilon}$. More precisely, one can define a stochastic integral $\int_0^t \phi_{u-} \, dS_u$ as the limit of the approximating sums in (5.1). Important here is again that the integrand has to be left-continuous, not right-continuous – the reason why we work with $\phi_{u-}$ instead of $\phi_u$. This guarantees in particular that if $S$ is a martingale, then the stochastic integral $\int \phi_- \, dS$ is a martingale as well (the last statement is not completely rigorous, but essentially true).

5.2 The Chain Rule and Integration by Parts

Let us consider a stochastic process $X$ which is right-continuous, i.e.
$$X_t = X_{t+} := \lim_{\varepsilon \downarrow 0} X_{t+\varepsilon},$$
and has left limits, i.e.
$$X_{t-} = \lim_{\varepsilon \downarrow 0} X_{t-\varepsilon}$$
exists. One also abbreviates this by saying that $X$ is an RCLL process. In case $X_t \ne X_{t-}$ the process exhibits a jump of size
$$\Delta X_t = X_t - X_{t-}.$$
We furthermore assume that every path of $X$ is piecewise differentiable (PD), that is, it is differentiable with the possible exception of a countable set of points. We may then write
$$X_t = X_0 + \int_0^t x_u \, du + \sum_{0 < u \le t} \Delta X_u,$$
or in differential notation
$$dX_t = x_t \, dt + \Delta X_t.$$
Let now $f$ be a real-valued function with continuous derivative. We can form the composed function $f(X_t)$. At the exceptional points $f(X_t)$ changes only at jump points, and then it jumps by $f(X_t) - f(X_{t-})$. In the open intervals where $X$ is continuously differentiable we have the usual chain rule at our disposal. In total we get
$$df(X_t) = f'(X_t) \, x_t \, dt + f(X_t) - f(X_{t-}),$$
or in integral form:

Theorem 5.3 (Itô's formula for piecewise differentiable processes)
$$f(X_t) = f(X_s) + \int_s^t f'(X_u) \, x_u \, du + \sum_{s < u \le t} \{f(X_u) - f(X_{u-})\}.$$

Given two stochastic processes $X, Y$ and a time grid on $[0, t]$ we can define the 'realized covariance'
$$\sum_i \left( X_{\tau_{i+1}} - X_{\tau_i} \right) \left( Y_{\tau_{i+1}} - Y_{\tau_i} \right)$$
$$= \sum_i \left( X_{\tau_{i+1}} Y_{\tau_{i+1}} - X_{\tau_i} Y_{\tau_i} \right) - \sum_i \left\{ Y_{\tau_i} \left( X_{\tau_{i+1}} - X_{\tau_i} \right) + X_{\tau_i} \left( Y_{\tau_{i+1}} - Y_{\tau_i} \right) \right\}$$
$$= X_t Y_t - X_0 Y_0 - \sum_i \left\{ Y_{\tau_i} \left( X_{\tau_{i+1}} - X_{\tau_i} \right) + X_{\tau_i} \left( Y_{\tau_{i+1}} - Y_{\tau_i} \right) \right\}$$
$$\to X_t Y_t - X_0 Y_0 - \int_0^t Y_{u-} \, dX_u - \int_0^t X_{u-} \, dY_u.$$
Considering finer and finer grid sizes, we call the limit the quadratic covariation of $X$ and $Y$, denoted by $[X, Y]$. Therefore,
$$\sum_i \left( X_{\tau_{i+1}} - X_{\tau_i} \right) \left( Y_{\tau_{i+1}} - Y_{\tau_i} \right) \to [X, Y]_t,$$
and we have proved

Theorem 5.4 (Integration by parts formula)
$$X_t Y_t - X_0 Y_0 = \int_0^t X_{u-} \, dY_u + \int_0^t Y_{u-} \, dX_u + [X, Y]_t.$$

Note that, in contrast to the version of the Itô formula we have proved so far, $X$ and $Y$ do not have to be PD for the integration by parts formula to be valid. However, if $X$ and $Y$ are PD, then we get from the approximation
$$[X, Y]_t = \sum_{0 \le s \le t} \Delta X_s \, \Delta Y_s.$$
In particular, since we have for a Poisson process $N$ (jump size one!) that
$$N_t = \sum_{0 \le s \le t} \Delta N_s = \sum_{0 \le s \le t} (\Delta N_s)^2,$$
we calculate that $[N, N]_t = N_t$. Moreover, we have for a compensated Poisson process that $[N - \lambda t, N - \lambda t]_t = N_t$ as well.

5.3 The Stochastic Exponential

We consider an RCLL and PD stochastic process $X$. The stochastic exponential of $X$, denoted by $Z = \mathcal{E}(X)$, is defined as
$$Z_t = \exp(X_t^c) \prod_{0 < s \le t} (1 + \Delta X_s),$$
where $X_t^c = X_t - \sum_{s \le t} \Delta X_s$.

Theorem 5.5 The stochastic exponential $Z$ is the unique solution to the equation
$$Z_t = 1 + \int_0^t Z_{s-} \, dX_s,$$
or, written in differential form,
$$dZ_t = Z_{t-} \, dX_t.$$

Proof. Let us assume that everywhere $\Delta X_t = 0$. Then we have $Z_t = \exp(X_t)$, and the statement is a well-known result from classical analysis. If, on the other hand, $dX_t = 0$ with the possible exception of countably many jump points where $dX_t = \Delta X_t$, then $Z_t / Z_{t-} = 1 + \Delta X_t$, and we get
$$\frac{dZ_t}{Z_{t-}} = \frac{Z_t - Z_{t-}}{Z_{t-}} = (1 + \Delta X_t) - 1 = \Delta X_t = dX_t.$$
The general case is a combination of the two aforementioned special cases.

As an application we calculate the moment-generating function of a Poisson distribution:

Proposition 5.6 Let $N$ be a Poisson process with intensity $\lambda$. Then
$$E[e^{c N_t}] = \exp((e^c - 1) \lambda t).$$

Proof. Let $\tilde{N}_t = N_t - \lambda t$ denote the compensated Poisson process; it is a martingale. For $\alpha > -1$ we denote $X_t = \alpha \tilde{N}_t$. Clearly, $X_t^c = -\alpha \lambda t$. Consider now the differential equation
$$dZ_t = \alpha Z_{t-} \, d\tilde{N}_t = Z_{t-} \, dX_t, \quad Z_0 = 1.$$
Its solution is the stochastic exponential $Z = \mathcal{E}(X)$ where
$$Z_t = \exp(X_t^c) \prod_{0 < s \le t} (1 + \Delta X_s)$$
$$= e^{-\alpha \lambda t} \prod_{s \le t} (1 + \alpha \Delta N_s)$$
$$= e^{-\alpha \lambda t} (1 + \alpha)^{N_t}$$
$$= \exp(N_t \ln(1 + \alpha) - \alpha \lambda t)$$
$$= \exp(c N_t - (e^c - 1) \lambda t)$$
if we set $c = \ln(1 + \alpha)$, hence $\alpha = e^c - 1$. Now $Z$ is a martingale since it is the stochastic integral of a compensated Poisson process (which is a martingale), $Z_t = 1 + \alpha \int_0^t Z_{s-} \, d\tilde{N}_s$, with $Z_0 = 1$. Therefore $E[Z_t] = E[Z_0] = 1$, hence
$$E[e^{c N_t}] = \exp((e^c - 1) \lambda t).$$
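Proposition 5.6 is straightforward to verify by simulation. The following sketch is my own illustration (parameter values are mine):

```python
# Sketch: estimate E[exp(c * N_t)] for N_t ~ Poisson(lambda * t) and
# compare with exp((e^c - 1) * lambda * t) from Proposition 5.6.
import math, random

rng = random.Random(5)

def poisson_sample(mean):
    """Sample N ~ Poisson(mean) by counting Exp(1) arrivals in [0, mean]."""
    n, total = 0, rng.expovariate(1.0)
    while total <= mean:
        n += 1
        total += rng.expovariate(1.0)
    return n

lam, t, c = 1.5, 2.0, 0.3
est = sum(math.exp(c * poisson_sample(lam * t)) for _ in range(100_000)) / 100_000
print(est, math.exp((math.exp(c) - 1) * lam * t))  # both approx 2.856
```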

5.4 Jump Times and Hazard Rates

We are given a stochastic process $X$ of state variables which influence the economy via stochastic interest rates $r(X_t)$, and via a process $\gamma(X_t)$ which will later be interpreted as a hazard rate process. The filtration generated by $X$ will be denoted by $(\mathcal{G}_t)$. Let $E_1$ be a random variable, independent of $(\mathcal{G}_t)$, which is exponentially distributed. Let $(\mathcal{F}_t)$ be the larger filtration which encompasses $(\mathcal{G}_t)$ and the information about $E_1$. Then we can construct an $(\mathcal{F}_t)$-stopping time as follows: define
$$\tau = \inf\left\{ t : \int_0^t \gamma(X_s) \, ds \ge E_1 \right\}.$$
In case $\gamma(X_s) = \lambda$ (constant $\lambda$), this would give us the first jump time of a Poisson process with intensity $\lambda$. In this sense the definition of $\tau$ is a generalization to the case of stochastic intensity. Recall now that, since $E_1 \sim \mathrm{Exp}(1)$, we have for an arbitrary time $T$ that
$$P(T < E_1) = \exp(-T).$$
Therefore, we get by independence and by the fact that $\gamma(X_s)$ is $\mathcal{G}_t$-measurable for all $s < t$ that
$$P(\tau > t \mid \mathcal{G}_t) = P\left( \left. \int_0^t \gamma(X_s) \, ds < E_1 \right| \mathcal{G}_t \right) = \exp\left( -\int_0^t \gamma(X_s) \, ds \right).$$
This identity gives us, in analogy with the deterministic case studied in survival analysis, the interpretation of $\gamma(X_t)$ as a hazard rate process. Our goal is now to calculate the expected discounted value $p(0, t)$ of a survivor bond, which pays out one unit if the policyholder has survived up to time $t$ (actuarial interpretation), or of a defaultable bond, which pays out one unit if $t$ is smaller than the default time (credit risk interpretation). In both cases the payout function of the claim is $\mathbf{1}_{\{\tau > t\}}$. We get (recall that $E[\mathbf{1}_{\{\tau > t\}} \mid \mathcal{G}_t]$ is just another notation for $P(\tau > t \mid \mathcal{G}_t)$)
$$p(0, t) = E\left[ \exp\left( -\int_0^t r(X_s) \, ds \right) \mathbf{1}_{\{\tau > t\}} \right]$$
$$= E\left[ E\left[ \left. \exp\left( -\int_0^t r(X_s) \, ds \right) \mathbf{1}_{\{\tau > t\}} \right| \mathcal{G}_t \right] \right]$$
$$= E\left[ \exp\left( -\int_0^t r(X_s) \, ds \right) E\left[ \mathbf{1}_{\{\tau > t\}} \mid \mathcal{G}_t \right] \right]$$
$$= E\left[ \exp\left( -\int_0^t r(X_s) \, ds \right) \exp\left( -\int_0^t \gamma(X_s) \, ds \right) \right]$$
$$= E\left[ \exp\left( -\int_0^t (r + \gamma)(X_s) \, ds \right) \right].$$
Here the interest rate $r(X_s)$ has been replaced by the actuarial interest rate $(r + \gamma)(X_s)$. Let now $N_t = \mathbf{1}_{\{\tau \le t\}}$ be the survival process associated to $\tau$. It is possible (but rather tedious) to show that
$$M_t = N_t - \int_0^t \gamma_u \mathbf{1}_{\{\tau > u\}} \, du = N_t - \int_0^{t \wedge \tau} \gamma_u \, du$$
is a martingale with respect to $(\mathcal{F}_t)$. We call $\int_0^{t \wedge \tau} \gamma_u \, du$ the compensator of $N$.

6 Brownian Motion

6.1 Definition and First Properties

Definition 6.1 $B$ is a standard (one-dimensional) Brownian motion if it satisfies the following conditions ($B_0 = 0$ and):
(i) its increments are independent of the past;
(ii) it has stationary increments;
(iii) the mapping $t \to B_t$ is continuous;
(iv) $B_t$ is $N(0, t)$ distributed (normal distribution with mean 0 and variance $t$).

Theorem 6.2 Brownian motion exists.

Definition 6.3 $(B^1, \ldots, B^n)$ is a standard $n$-dimensional Brownian motion if each of its components is an independent one-dimensional Brownian motion.

Let us recall that a normal distribution is characterized by its mean $\mu$ and its variance $\sigma^2$. Multidimensional normal distributions $(Z^1, \ldots, Z^n)$ are characterized by giving the vector of means $\mu_i = E[Z^i]$ and the covariance matrix
$$\Sigma_{ij} = E[(Z^i - \mu_i)(Z^j - \mu_j)].$$
Let us now list some properties.

Self-similarity. The processes $(B_{ct})$ and $(\sqrt{c}\, B_t)$ have the same distribution ($c > 0$): speeding up time by a factor of $c$ does not change properties (i)-(iii), but $B_{ct}$ has variance $ct$. This can also be achieved by multiplying a standard Brownian motion by $\sqrt{c}$.

Covariance formula. $E[B_s B_t] = \min(s, t)$: let $s < t$. Writing $B_t = B_s + (B_t - B_s)$ and using the fact that $B_s$ and $(B_t - B_s)$ are independent with mean 0, we have
$$E[B_s B_t] = E[B_s^2] + E[B_s (B_t - B_s)] = s = \min(s, t).$$

Strong Markov property. If $T$ is a stopping time, then $B_{T+t} - B_T$, $t \ge 0$, is a Brownian motion independent of $B_r$, $r \le T$.

Quadratic variation. Note that we had defined the quadratic covariation $[X, Y]$ of two processes $X, Y$ as the limit (the $t_i$'s form a partition $\Delta_n: 0 = t_0 < t_1 < \ldots < t_n$ of $[0, t]$)
$$\sum_i \left( X_{t_{i+1}} - X_{t_i} \right) \left( Y_{t_{i+1}} - Y_{t_i} \right) \to [X, Y]_t.$$
Our goal is now to determine $[B, B]_t$. We set $\operatorname{mesh}(\Delta_n) = \max_i |t_{i+1} - t_i|$. Let us now consider finer and finer partitions $\Delta_m \subset \Delta_n$ for $m < n$ such that $\operatorname{mesh}(\Delta_n) \to 0$ for $n \to \infty$. We have to evaluate the limit of
$$Q_n(t) = \sum_i \left( B_{t_{i+1}} - B_{t_i} \right)^2.$$
Note first that
$$E[Q_n(t)] = \sum_i E\left[ \left( B_{t_{i+1}} - B_{t_i} \right)^2 \right] = \sum_i (t_{i+1} - t_i) = t.$$
Furthermore, recall that $\operatorname{var}(X) = E[X^2] - (E[X])^2$. By the independence of increments of a Brownian motion we thus get
$$\operatorname{var}(Q_n(t)) = \sum_i \operatorname{var}\left( \left( B_{t_{i+1}} - B_{t_i} \right)^2 \right) = \sum_i \left\{ E\left[ \left( B_{t_{i+1}} - B_{t_i} \right)^4 \right] - (t_{i+1} - t_i)^2 \right\}.$$
By self-similarity, we know that $B_{t_{i+1}} - B_{t_i}$ is distributed like $\sqrt{t_{i+1} - t_i}\, B_1$ where $B_1 \sim N(0, 1)$. Recall that for the $N(0, 1)$-RV $B_1$ we have $E[(B_1)^4] = 3$. Hence
$$E\left[ \left( B_{t_{i+1}} - B_{t_i} \right)^4 \right] = E\left[ (t_{i+1} - t_i)^2 (B_1)^4 \right] = 3 (t_{i+1} - t_i)^2,$$
and consequently
$$\operatorname{var}(Q_n(t)) = \sum_i \left\{ 3 (t_{i+1} - t_i)^2 - (t_{i+1} - t_i)^2 \right\} = 2 \sum_i (t_{i+1} - t_i)^2.$$
Letting now $\operatorname{mesh}(\Delta_n) \to 0$ we obtain
$$\operatorname{var}(Q_n(t)) \le 2 \operatorname{mesh}(\Delta_n) \sum_i (t_{i+1} - t_i) = 2 \operatorname{mesh}(\Delta_n)\, t \to 0.$$
Since $\operatorname{var}(Q_n(t)) = E[(Q_n(t) - t)^2]$, we have proved that $Q_n(t)$ converges to $t$ in mean-square. One can show that this implies that $[B, B]_t = t$, since $[B, B]_t$ is the limit of $Q_n(t)$ (in an appropriate sense). This result is remarkable: we have seen earlier on that for a PD stochastic process $X$,
$$[X, X]_t = \sum_{0 \le s \le t} (\Delta X_s)^2.$$
As Brownian motion has continuous paths, $\Delta B = 0$ everywhere, and hence the paths of a Brownian motion cannot be piecewise differentiable.
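The convergence of the realized quadratic variation can be seen on a simulated path. A minimal sketch of my own (grid size is my choice):

```python
# Sketch: realized quadratic variation of a Brownian path on [0, t]
# over a fine grid is close to t, with variance 2 * mesh * t.
import math, random

def realized_qv(t=1.0, n=100_000, seed=6):
    rng = random.Random(seed)
    dt = t / n
    qv = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        qv += db * db
    return qv

print(realized_qv())  # approx 1.0 = t
```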

6.2 Martingales related to Brownian motion

Let $(\mathcal{F}_t)$ be the filtration generated by the Brownian motion $B$, i.e. $\mathcal{F}_t$ contains all information about $B$ up to time $t$.

Theorem 6.4 $B$ is a martingale.

Proof. Given $\mathcal{F}_s$, the value of $B_s$ is known while $B_t - B_s$ is independent and has mean 0:
$$E[B_t \mid \mathcal{F}_s] = E[(B_t - B_s) + B_s \mid \mathcal{F}_s] = E[B_t - B_s] + B_s = B_s.$$

Let now
$$T_a = \min\{t \ge 0 : B_t = a\}$$
be the first time the Brownian motion hits $a$. One can show that
$$P(T_a < \infty) = 1.$$
In the following, we will apply the following version of the optional stopping theorem:

Theorem 6.5 Let $M$ be a martingale with continuous paths. Suppose $T$ is a stopping time with
$$P(T < \infty) = 1,$$
and there is a constant $K$ such that $|M_{T \wedge t}| \le K$ for all $t$. Then
$$E[M_T] = E[M_0].$$

Example 6.6 Exit distribution from an interval. Define the exit time by ($a < 0 < b$)
$$\tau = \min\{t : B_t \notin (a, b)\}.$$
Then we have
$$P(B_\tau = b) = \frac{-a}{b - a} \quad \text{and} \quad P(B_\tau = a) = \frac{b}{b - a}.$$
Proof. Let us first check that $\tau$ is a stopping time. We have
$$\{\tau \le s\}^c = \{\tau > s\} = \{B_r \in (a, b) \text{ for all } r \le s\},$$
and the last event is certainly $\mathcal{F}_s$-measurable. Since $\mathcal{F}_s$ is a $\sigma$-algebra, it contains with $\{\tau \le s\}^c$ also $\{\tau \le s\}$, hence $\tau$ is indeed a stopping time. The fact that $P(\tau < \infty) = 1$ can be shown by noting that $\tau \le T_a$ while $P(T_a < \infty) = 1$. Moreover, $|B_{\tau \wedge t}| \le |a| + b$, so we may use the optional stopping theorem to get
$$0 = E[B_\tau] = a P(B_\tau = a) + b P(B_\tau = b).$$
Our result follows, since $P(B_\tau = a) = 1 - P(B_\tau = b)$, by solving the linear equation.

Theorem 6.7 (B_t^2 - t) is a martingale.

Proof. Given F_s, the value of B_s is known, while B_t - B_s is independent of F_s with mean 0 and variance t - s. We get

    E[B_t^2 | F_s] = E[(B_s + (B_t - B_s))^2 | F_s]
                   = B_s^2 + 2 B_s E[B_t - B_s | F_s] + E[(B_t - B_s)^2 | F_s]
                   = B_s^2 + 0 + (t - s).

Subtracting t from both sides yields the result.

Example 6.8 Mean exit time from an interval. For a < 0 < b, define the exit time by

    \tau = \min{t : B_t \notin (a, b)}.

Then

    E[\tau] = -ab.

Proof. It is possible (but more difficult to show than in the last example) to apply the conclusion of the optional stopping theorem. Together with the formula for the exit distribution from the last example, this gives

    0 = E[B_\tau^2 - \tau] = b^2 (-a)/(b - a) + a^2 b/(b - a) - E[\tau].

Rearranging now gives

    E[\tau] = (-b^2 a + b a^2)/(b - a) = -ba.

We can derive one surprising conclusion from this example. Recall that T_a = \min{t \geq 0 : B_t = a} and let \tau_{a,N} = \min{t : B_t \notin (a, N)}, so T_a \geq \tau_{a,N} and therefore

    E[T_a] \geq E[\tau_{a,N}] = -aN \to \infty

as N \to \infty. The same conclusion holds for a > 0 by symmetry, so we have for all a \neq 0 that

    E[T_a] = \infty.
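Both the exit probabilities of Example 6.6 and the mean exit time of Example 6.8 can be checked by simulating a discretised Brownian motion. A rough sketch (the time step and path count are our choices; the finite step introduces a small overshoot bias):

```python
import numpy as np

rng = np.random.default_rng(1)

def exit_interval(a, b, dt=1e-3, n_paths=1000):
    """Run discretised Brownian paths until they leave (a, b)."""
    values, times = [], []
    sdt = np.sqrt(dt)
    for _ in range(n_paths):
        B, t = 0.0, 0.0
        while a < B < b:
            B += sdt * rng.standard_normal()   # increment ~ N(0, dt)
            t += dt
        values.append(b if B >= b else a)
        times.append(t)
    return np.array(values), np.array(times)

a, b = -1.0, 2.0
vals, taus = exit_interval(a, b)
print("P(B_tau = b):", np.mean(vals == b), " theory:", -a / (b - a))
print("E[tau]      :", taus.mean(),        " theory:", -a * b)
```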

7 Stochastic Calculus for Brownian Motion
7.1 Stochastic Integral of Elementary Strategies
Let B_t denote the price of some stock at time t; we assume that B is a Brownian motion (historically, this was one of the oldest models for stock prices). Assume we buy a random amount \phi of shares of this stock at time s for the price B_s, wait until time t > s, and sell them then for the price B_t. It is possible that \phi is negative (this amounts to short-selling the stock). We assume that we have no insider information about future stock prices. Therefore \phi should be an F_s-measurable random variable. Our gain (or loss) from this strategy is given as

    \int_s^t \phi dB_u := \phi (B_t - B_s).

We are interested in calculating the conditional mean and variance of our gain in this case. We have, by the tower property, the fact that \phi is F_s-measurable, and the independence of B_t - B_s from F_s,

    E[\phi (B_t - B_s) | F_s] = \phi E[(B_t - B_s) | F_s]
                              = \phi E[B_t - B_s]
                              = 0.

Moreover,

    E[\phi^2 (B_t - B_s)^2 | F_s] = \phi^2 E[(B_t - B_s)^2 | F_s]
                                  = \phi^2 E[(B_t - B_s)^2]
                                  = \phi^2 (t - s).

We now assume that we can trade at random time points 0 = T_1 \leq ... \leq T_{n+1} = T. At each time T_i we hold an amount of \phi_i shares, where \phi_i is F_{T_i}-measurable. Our gain (or loss) over the whole trading period is then given as

    \int_0^T \phi_u dB_u := \sum_{i=1}^n \phi_i (B_{T_{i+1}} - B_{T_i}).

We call this the stochastic integral of the strategy \phi with respect to the Brownian motion B. In this definition it is essential that \phi_i is F_{T_i}-measurable. Otherwise we could predict future increments of the stock price and make a riskless profit.

For instance, if we could foresee the future, then we could follow the strategy \vartheta_i = sign(B_{T_{i+1}} - B_{T_i}) to get

    \int_0^T \vartheta_u dB_u = \sum_{i=1}^n sign(B_{T_{i+1}} - B_{T_i}) (B_{T_{i+1}} - B_{T_i})
                              = \sum_{i=1}^n |B_{T_{i+1}} - B_{T_i}| \geq 0.

The stochastic integral \int_0^T \phi_u dB_u is a random variable. We can define a stochastic integral process as follows for any t \in [0, T]:

    \int_0^t \phi_u dB_u := \sum_{i=1}^n \phi_i (B_{T_{i+1} \wedge t} - B_{T_i \wedge t}).

Note that if t < T_i < T_{i+1}, then B_{T_{i+1} \wedge t} - B_{T_i \wedge t} = 0. If T_i < t < T_{i+1}, then B_{T_{i+1} \wedge t} - B_{T_i \wedge t} = B_t - B_{T_i}, and if T_i < T_{i+1} < t, then B_{T_{i+1} \wedge t} - B_{T_i \wedge t} = B_{T_{i+1}} - B_{T_i}. We always assume that our strategy is not too 'wild' in the sense that

    E[\sum_{i=1}^n \phi_i^2 (T_{i+1} - T_i)] < \infty.

Generalizing our considerations above (trading at one time point only), one can prove the following result.

Theorem 7.1 The stochastic integral process \int_0^t \phi_u dB_u is a martingale with continuous paths. Moreover,

    E[(\int_s^t \phi_u dB_u)^2 | F_s] = E[\int_s^t \phi_u^2 du | F_s].

Since we are only trading at n discrete time points, this formula (for s = 0) can also be written as

    E[(\sum_{i=1}^n \phi_i (B_{T_{i+1} \wedge t} - B_{T_i \wedge t}))^2] = E[\sum_{i=1}^n \phi_i^2 (T_{i+1} \wedge t - T_i \wedge t)].
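This isometry can be illustrated numerically. In the sketch below (the strategy is our choice for illustration), we take a deterministic grid and \phi_i = B_{T_i}, which is F_{T_i}-measurable since it uses the path only up to T_i; both sides of the formula then equal \int_0^1 u du = 1/2:

```python
import numpy as np

rng = np.random.default_rng(2)

n_steps, n_paths, T = 200, 20000, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1) - dB       # B at the *left* endpoint of each interval, B_0 = 0

phi = B                               # phi_i = B_{T_i}, measurable at the trading time
integral = np.sum(phi * dB, axis=1)   # sum_i phi_i (B_{T_{i+1}} - B_{T_i})

lhs = np.mean(integral**2)                    # E[(int phi dB)^2]
rhs = np.mean(np.sum(phi**2 * dt, axis=1))    # E[int phi^2 du]
print(lhs, rhs)                               # both approximately 1/2
```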

7.2 Stochastic Integral for Left-Continuous Integrands. Itô’s
Formula.
Let us now consider a left-continuous trading strategy \phi which has right-limits and which is adapted to (F_t), hence for all t we have that \phi_t is F_t-measurable. It is possible to approximate such a strategy by the elementary strategies considered in the last section. One can show that the corresponding elementary stochastic integrals then also approximate a random variable which we call the stochastic integral of \phi with respect to B and denote by \int_0^t \phi_u dB_u. If we consider it as a function of t, then we get a stochastic process \int \phi dB. It has the same two properties as in the last section.

Theorem 7.2 Let \phi be left-continuous with right-limits, and such that E[\int_0^T \phi_u^2 du] < \infty. Then

(i) \int \phi dB is a martingale with continuous paths; in particular (for 0 < s < t < T)

    E[\int_s^t \phi_u dB_u | F_s] = 0.

(ii) We have

    E[(\int_s^t \phi_u dB_u)^2 | F_s] = E[\int_s^t \phi_u^2 du | F_s].

Please note that since we assume that F_0 is the trivial \sigma-algebra (containing only \Omega and \emptyset), in case s = 0 the two formulas read as

    E[\int_0^t \phi_u dB_u] = 0,

    E[(\int_0^t \phi_u dB_u)^2] = E[\int_0^t \phi_u^2 du].

Quadratic variation. Regarding the quadratic variation for stochastic integrals, one can show that

    [\int \phi dB, \int \vartheta dB]_t = \int_0^t \phi_u \vartheta_u d[B, B]_u = \int_0^t \phi_u \vartheta_u du.

On the other hand, if one of the integrals is PD (i.e., an integral with respect to du), then we get

    [\int \phi dB, \int \vartheta du]_t = 0.

We would like to develop a calculus for the stochastic integral with respect to Brownian motion. In classical calculus, the key for doing computations is the fundamental theorem relating integration and differentiation, and we want to generalize this. Let f now be a two times continuously differentiable function, and consider the following heuristic motivation. Taylor's theorem tells us that we have

    f(y) - f(x) = f'(x) (y - x) + (1/2) f''(x) (y - x)^2 + R,

where R is some remainder term. We now partition the interval [0, t] as 0 = t_0 < t_1 < ... < t_n = t and write as a telescoping sum

    f(B_t) - f(B_0) = \sum_{i=0}^{n-1} [f(B_{t_{i+1}}) - f(B_{t_i})]
                    = \sum_{i=0}^{n-1} f'(B_{t_i}) (B_{t_{i+1}} - B_{t_i})
                      + (1/2) \sum_{i=0}^{n-1} f''(B_{t_i}) (B_{t_{i+1}} - B_{t_i})^2 + R.

If we now make the partition finer and finer, the first term on the right-hand side is approximately \int_0^t f'(B_u) dB_u. What about the second term? We have by the tower property

    E[\sum_i f''(B_{t_i}) (B_{t_{i+1}} - B_{t_i})^2]
        = E[\sum_i E[f''(B_{t_i}) (B_{t_{i+1}} - B_{t_i})^2 | F_{t_i}]]
        = E[\sum_i f''(B_{t_i}) E[(B_{t_{i+1}} - B_{t_i})^2 | F_{t_i}]]
        = E[\sum_i f''(B_{t_i}) E[(B_{t_{i+1}} - B_{t_i})^2]]
        = E[\sum_i f''(B_{t_i}) (t_{i+1} - t_i)],

and the last sum inside of the expectation tends with finer and finer partitions to \int_0^t f''(B_u) du. But one can get even more: it is possible to prove that on a set with probability one we have (at least for certain subsequences of partitions) that

    \sum_i f''(B_{t_i}) (B_{t_{i+1}} - B_{t_i})^2 \to \int_0^t f''(B_u) du.

Moreover, the remainder term R goes to zero. We have motivated the most important result in stochastic calculus.

Theorem 7.3 (Itô's formula) Let f be a two times continuously differentiable function, B a Brownian motion. Then

    f(B_t) - f(B_0) = \int_0^t f'(B_u) dB_u + (1/2) \int_0^t f''(B_u) du.

The structure of this formula looks like the fundamental theorem of classical calculus,

    f(t) - f(0) = \int_0^t f'(u) du;

however, here we have an additional term involving the second derivative of f. Therefore, the formulas in stochastic calculus are different from those in classical calculus. Let us now show how to apply Itô's formula to evaluate stochastic integrals.

Example 7.4 Take f(x) = x^2. We have f'(x) = 2x, f''(x) = 2. Itô's formula yields (assuming B_0 = 0)

    B_t^2 = 2 \int_0^t B_u dB_u + (1/2) \int_0^t 2 du
          = 2 \int_0^t B_u dB_u + t.

Rearranging gives us

    \int_0^t B_u dB_u = (1/2) B_t^2 - (1/2) t,

which is indeed surprising if we compare it with

    \int_0^t u du = (1/2) t^2.
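The identity can be confirmed pathwise, since the Itô integral is the limit of left-point Riemann sums. A short numerical sketch (the discretisation is ours):

```python
import numpy as np

rng = np.random.default_rng(3)

n, t = 100000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(dB)))   # B_0, B_{t_1}, ..., B_{t_n}

# Left-point sum approximating the Ito integral int_0^t B_u dB_u.
ito_sum = np.sum(B[:-1] * dB)
print(ito_sum, 0.5 * B[-1]**2 - 0.5 * t)     # the two agree up to O(n^{-1/2})
```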
There is also a multidimensional, time-dependent generalization of Itô’s formula.
However, for the purpose of this lecture, we will only need the following corollaries
from this.

Theorem 7.5 (Time-dependent Itô's formula) Let f(t, x) be continuously differentiable: once in t, twice in x. Then

    f(t, B_t) - f(0, B_0) = \int_0^t (\partial f/\partial t)(u, B_u) du + \int_0^t (\partial f/\partial x)(u, B_u) dB_u + (1/2) \int_0^t (\partial^2 f/\partial x^2)(u, B_u) du.

Example 7.6 Let f(t, x) = exp(x - t/2). Then (\partial f/\partial t)(t, x) = -(1/2) exp(x - t/2), (\partial f/\partial x)(t, x) = exp(x - t/2), and (\partial^2 f/\partial x^2)(t, x) = exp(x - t/2). It follows that

    exp(B_t - t/2) = 1 - (1/2) \int_0^t exp(B_u - u/2) du + \int_0^t exp(B_u - u/2) dB_u
                       + (1/2) \int_0^t exp(B_u - u/2) du
                   = 1 + \int_0^t exp(B_u - u/2) dB_u.

Theorem 7.7 Let h(t, x) be a space-time harmonic function, that is, it fulfills the partial differential equation

    (\partial h/\partial t)(t, x) + (1/2) (\partial^2 h/\partial x^2)(t, x) = 0.

Let moreover E[\int_0^t ((\partial h/\partial x)(u, B_u))^2 du] < \infty. Then h(t, B_t) is a martingale, and

    h(0, 0) = E[h(t, B_t)].

Proof. By the time-dependent Itô's formula, we have

    h(t, B_t) = h(0, B_0) + \int_0^t (\partial h/\partial x)(u, B_u) dB_u,

and by property (i) of the stochastic integral, \int_0^t (\partial h/\partial x)(u, B_u) dB_u is a martingale with

    E[\int_0^t (\partial h/\partial x)(u, B_u) dB_u] = 0.

Theorem 7.8 (Integration by parts) We recall the formula from the previous chapter:

    X_t Y_t - X_0 Y_0 = \int_0^t X_u dY_u + \int_0^t Y_u dX_u + [X, Y]_t.

Let \phi, \vartheta be strategies such that the stochastic integrals \int \phi dB, \int \vartheta dB are well-defined. The derivation of the integration by parts formula in the previous chapter carries over to the Brownian case as well. Noting that all processes involved here are left-continuous, and that the stochastic integrals have zero initial value, we get (setting X = \int \phi dB, Y = \int \vartheta dB)

    \int_0^t \phi_u dB_u \int_0^t \vartheta_u dB_u = \int_0^t (\int_0^u \phi_v dB_v) \vartheta_u dB_u + \int_0^t (\int_0^u \vartheta_v dB_v) \phi_u dB_u + \int_0^t \phi_u \vartheta_u du.
Example 7.9 We calculate the iterated integral \int_0^t \int_0^u B_v dB_v dB_u by using the integration by parts formula for \phi = 1, \vartheta = B:

    \int_0^t \int_0^u B_v dB_v dB_u = B_t \int_0^t B_u dB_u - \int_0^t B_u^2 dB_u - \int_0^t B_u du.   (7.1)

We have already computed

    \int_0^t B_u dB_u = (1/2) B_t^2 - (1/2) t.   (7.2)

Moreover, applying Itô's formula to f(x) = x^3 yields

    B_t^3 = 3 \int_0^t B_u^2 dB_u + 3 \int_0^t B_u du,

hence

    \int_0^t B_u^2 dB_u = (1/3) B_t^3 - \int_0^t B_u du.   (7.3)

Plugging (7.2), (7.3) into (7.1), we get

    \int_0^t \int_0^u B_v dB_v dB_u = (1/2) B_t^3 - (1/2) t B_t - (1/3) B_t^3
                                    = (1/6) B_t^3 - (1/2) t B_t.
Example 7.10 Ornstein-Uhlenbeck process. Let

    X_t = e^{-t} \int_0^t e^s dB_s.

We have by integration by parts (using e^{-t} = 1 - \int_0^t e^{-s} ds)

    e^{-t} \int_0^t e^s dB_s = \int_0^t e^{-s} e^s dB_s - \int_0^t (\int_0^s e^u dB_u) e^{-s} ds
                             = B_t - \int_0^t X_s ds.

Hence X satisfies the stochastic integral equation

    X_t = -\int_0^t X_s ds + B_t.

Example 7.11 Brownian bridge. We consider now the stochastic integral equation (for t < 1)

    X_t = -\int_0^t X_s/(1 - s) ds + B_t,

and show by integration by parts that a solution is given as

    X_t = (1 - t) \int_0^t 1/(1 - s) dB_s.

Indeed:

    (1 - t) \int_0^t 1/(1 - s) dB_s = \int_0^t (1 - s)/(1 - s) dB_s - \int_0^t (\int_0^s 1/(1 - u) dB_u) ds
                                    = B_t - \int_0^t X_s/(1 - s) ds.

We have X_0 = 0. One can show that lim_{t \to 1} X_t = 0 as well. To motivate this, we calculate (for t < 1)

    E[X_t] = (1 - t) E[\int_0^t 1/(1 - s) dB_s] = 0

and

    E[X_t^2] = (1 - t)^2 E[(\int_0^t 1/(1 - s) dB_s)^2]
             = (1 - t)^2 \int_0^t 1/(1 - s)^2 ds
             = (1 - t)^2 (1/(1 - t) - 1) \to 0   as t \to 1.
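To see the pinning at t = 1 on a computer, one can discretise the integral equation with a simple Euler scheme (a sketch under our discretisation; the final value is set by hand because the drift blows up as t \to 1):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 1000
t = np.linspace(0.0, 1.0, n + 1)
dt = 1.0 / n
X = np.zeros(n + 1)
for i in range(n - 1):                # stop one step before t = 1
    dB = np.sqrt(dt) * rng.standard_normal()
    X[i + 1] = X[i] - X[i] / (1.0 - t[i]) * dt + dB
X[n] = 0.0                            # lim_{t->1} X_t = 0, as computed above
print("X near t = 1:", X[n - 5:])     # values shrink towards 0
# Note E[X_t^2] = (1-t) - (1-t)^2 = t(1-t): the variance is largest in the middle.
```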

7.3 Itô-Processes and Stochastic Differential Equations
Itô-processes: these are stochastic processes of the form

    X_t = X_0 + \int_0^t \mu_s ds + \int_0^t \sigma_s dB_s,

where \mu and \sigma are adapted stochastic processes (which includes the special case when they are deterministic). We sometimes write this formula shorter in differential notation as

    dX_t = \mu_t dt + \sigma_t dB_t.

We can calculate the quadratic variation of an Itô-process as

    [X]_t = [X, X]_t = \int_0^t \sigma_s^2 ds,

or in differential notation

    d[X]_t = \sigma_t^2 dt.

We have an Itô formula for this kind of process as follows:

    f(X_t) - f(X_0) = \int_0^t f'(X_s) dX_s + (1/2) \int_0^t f''(X_s) d[X]_s
                    = \int_0^t f'(X_s) \mu_s ds + \int_0^t f'(X_s) \sigma_s dB_s + (1/2) \int_0^t f''(X_s) \sigma_s^2 ds.

Stochastic differential equations (SDEs): Let two functions \mu(t, x) and \sigma(t, x) be given. An SDE is an equation of the form

    X_t = X_0 + \int_0^t \mu(s, X_s) ds + \int_0^t \sigma(s, X_s) dB_s,

or in differential notation

    dX_t = \mu(t, X_t) dt + \sigma(t, X_t) dB_t.

One can state certain conditions on the functions \mu and \sigma which guarantee the existence of a unique solution, and there is a vast body of knowledge of efficient numerical algorithms to approximate this solution.

Examples:

The SDE

    dX_t = -X_t dt + dB_t

has as solution the Ornstein-Uhlenbeck process

    X_t = e^{-t} (X_0 + \int_0^t e^s dB_s).

The SDE

    dX_t = \mu X_t dt + \sigma X_t dB_t

has as solution geometric Brownian motion

    X_t = X_0 exp((\mu - (1/2) \sigma^2) t + \sigma B_t).
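The simplest of the numerical algorithms mentioned above is the Euler-Maruyama scheme, which replaces dt and dB_t by small increments. A generic sketch (function names and parameter values are ours), applied to the two examples just given:

```python
import numpy as np

rng = np.random.default_rng(5)

def euler_maruyama(mu, sigma, x0, T, n_steps):
    """Approximate dX_t = mu(t, X_t) dt + sigma(t, X_t) dB_t on [0, T]."""
    dt = T / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x0
    t = 0.0
    for i in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal()
        X[i + 1] = X[i] + mu(t, X[i]) * dt + sigma(t, X[i]) * dB
        t += dt
    return X

# Ornstein-Uhlenbeck: dX = -X dt + dB
ou = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, x0=0.0, T=5.0, n_steps=5000)
# Geometric Brownian motion: dX = mu X dt + sigma X dB (mu = 0.05, sigma = 0.2)
gbm = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x, x0=1.0, T=1.0, n_steps=1000)
print(ou[-1], gbm[-1])
```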

7.4 The Formula of Feynman and Kac


Theorem 7.12 (Feynman-Kac) Consider the stochastic differential equation

    dX_t = \mu X_t dt + \sigma X_t dB_t.

Fix T > 0, and let 0 \leq t \leq T. For any function h, we define a function F(t, x) by considering the conditional expectation

    F(t, X_t) = E[e^{-r(T-t)} h(X_T) | F_t].   (7.4)

Then F(t, x) satisfies the partial differential equation

    (\partial F/\partial t)(t, x) + \mu x (\partial F/\partial x)(t, x) + (1/2) \sigma^2 x^2 (\partial^2 F/\partial x^2)(t, x) = r F(t, x),

with the terminal condition (for all x)

    F(T, x) = h(x).

Proof. We have

    e^{-rt} F(t, X_t) = E[e^{-rT} h(X_T) | F_t].

By the tower property, and the definition of F,

    E[e^{-rt} F(t, X_t) | F_s] = E[E[e^{-rT} h(X_T) | F_t] | F_s]
                               = E[e^{-rT} h(X_T) | F_s]
                               = e^{-rs} F(s, X_s).

It follows that e^{-rt} F(t, X_t) is a martingale. By Itô's formula, the differential of this martingale is

    d(e^{-rt} F(t, X_t)) = e^{-rt} (-rF + \partial F/\partial t + \mu X_t \partial F/\partial x + (1/2) \sigma^2 X_t^2 \partial^2 F/\partial x^2) dt + e^{-rt} \sigma X_t (\partial F/\partial x) dB_t.

Because we have a martingale, the dt-term must be equal to zero, and the partial differential equation follows.

Remark 7.13 Often the converse of this statement is used: given a solution to the aforementioned PDE, it can be shown under certain conditions that it equals the conditional expectation (7.4).
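The representation (7.4) also suggests a Monte Carlo method: simulate X_T many times and average the discounted payoff. A sketch for the SDE of Theorem 7.12, using its explicit geometric Brownian motion solution (the parameter values and the payoff h are our choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

mu, sigma, r, T, x0 = 0.05, 0.2, 0.03, 1.0, 1.0
h = lambda x: np.maximum(x - 1.0, 0.0)   # an example payoff (our choice)

# X_T = x0 * exp((mu - sigma^2/2) T + sigma B_T) solves dX = mu X dt + sigma X dB.
n_paths = 200000
BT = np.sqrt(T) * rng.standard_normal(n_paths)
XT = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * BT)

F0 = np.exp(-r * T) * np.mean(h(XT))     # F(0, x0) = E[e^{-rT} h(X_T)]
print("Monte Carlo estimate of F(0, x0):", F0)
```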

8 Continuous-Time Markov Chains


8.1 Jump Rates and Transition Probabilities
Let (X_t)_{t \geq 0} be a collection of random variables taking values in a discrete set S.

Example. Health-Sickness Model. The following model is often used in life insurance: people can be in three states, H (healthy), S (sick), or D (dead). If a person is in state H or S, then he/she stays there for an exponentially distributed amount of time with rate \lambda_H resp. \lambda_S, then goes to either of the two other states with a certain transition probability. Once you have reached state D, you stay there forever. D is called an absorbing or cemetery state (the latter expression fits well for this example).

Example. The previous example generalizes easily in the following way: take a discrete-time Markov chain and assign to each state a rate \lambda_i. If X_t is in a state i with \lambda_i = 0, then X_t stays there forever. If \lambda_i > 0, X_t stays at i for an exponentially distributed amount of time with rate \lambda_i, then goes to state j with transition probability p(i, j). The lack-of-memory property of the exponential distribution implies that, given the present state, the rest of the past is irrelevant for predicting the future.

Definition 8.1 A continuous-time stochastic process X = (X_t) is called a Markov chain with respect to a given filtration (F_t) if for any function h : S \to R (here S denotes the discrete state space) we have

    E[h(X_{t+s}) | F_t] = E[h(X_{t+s}) | X_t]

for any s, t \in R_+. If this conditional expectation depends only on the time increment s, but not on t, then the chain is said to be homogeneous. The transition probabilities are given as p_t(i, j) = P(X_t = j | X_0 = i). Let now P_t be a matrix that has the p_t(i, j)'s as elements.

Properties of the family of matrices (P_t):

(i) P_0 = I (the identity matrix)
(ii) P_t is a stochastic matrix - rows add up to 1 and elements are all between 0 and 1
(iii) P_{s+t} = P_s P_t for any s, t; this is called the Chapman-Kolmogorov equation

These three properties tell us that (P_t) is a stochastic semigroup. Consider now a small time interval of length h, and assume that the probability of two or more transitions in this interval is o(h). We now write for i \neq j

    p_h(i, j) = q(i, j) h + o(h)

for some q(i, j) \geq 0, and for i = j

    p_h(i, i) = 1 - \lambda_i h + o(h)

with \lambda_i = \sum_{j \neq i} q(i, j) \geq 0. We call q(i, j) the jump rate from i to j.

Example 8.2 For a Poisson process with intensity \lambda, the transition probability to go from n to n + 1 is given for all n \geq 0 as

    p_h(n, n + 1) = 1 - e^{-\lambda h} + o(h) = \lambda h + o(h),

so we get

    q(n, n + 1) = \lambda.

We introduce a new matrix Q with entries

    Q(i, j) = q(i, j)  if j \neq i,
    Q(i, i) = -\lambda_i.

Note that the off-diagonal elements q(i, j) with i \neq j are nonnegative, while the diagonal entry -\lambda_i is a nonpositive number chosen to make the row sum equal to 0. A computation based on the Chapman-Kolmogorov equation now reveals the connection between the matrices P and Q (you can look up the details in the book by Durrett, Ch. 4.2):

Theorem 8.3 We have

    (d/dt) P_t = Q P_t   (Kolmogorov's backward equation)
    (d/dt) P_t = P_t Q   (Kolmogorov's forward equation),

subject to P_0 = I.
If Q were a number instead of a matrix, we could solve these equations easily: just set P_t = exp(Qt) and check by differentiating that the equation holds. Inspired by this observation, we define the matrix

    exp(Qt) = \sum_{n=0}^{\infty} (Qt)^n / n! = \sum_{n=0}^{\infty} Q^n t^n / n!

and check by differentiating (we may interchange summation and differentiation) that

    (d/dt) exp(Qt) = \sum_{n=1}^{\infty} Q^n t^{n-1} / (n-1)!
                   = Q \sum_{n=1}^{\infty} Q^{n-1} t^{n-1} / (n-1)!
                   = Q exp(Qt).

Comparing the backward with the forward equation, we note the remarkable fact that

    Q P_t = P_t Q.

In general, matrices do not commute: AB \neq BA; but the explanation is that here P_t = e^{Qt} is made up of powers of Q:

    Q exp(Qt) = Q \sum_{n=0}^{\infty} (Qt)^n / n! = (\sum_{n=0}^{\infty} (Qt)^n / n!) Q = exp(Qt) Q.
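For a finite state space, P_t = exp(Qt) and the relations above can be checked numerically, either by truncating the series or with a library routine. A sketch (assuming SciPy is available; the Q-matrix is an arbitrary example of ours with rows summing to zero):

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -1.0,  0.5],
              [ 1.0,  1.0, -2.0]])

t = 0.7
Pt = expm(Q * t)
print(np.allclose(Pt.sum(axis=1), 1.0))   # P_t is a stochastic matrix
print(np.allclose(Q @ Pt, Pt @ Q))        # Q and P_t commute

# Backward equation dP_t/dt = Q P_t, checked by a central finite difference.
eps = 1e-6
dPt = (expm(Q * (t + eps)) - expm(Q * (t - eps))) / (2 * eps)
print(np.allclose(dPt, Q @ Pt, atol=1e-5))
```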

Let us now write the backward equation more explicitly as

    p_t'(i, j) = \sum_{k \neq i} q(i, k) p_t(k, j) - \lambda_i p_t(i, j).   (8.1)

Example 8.4 Poisson process. Let N_t be the number of arrivals up to time t in a Poisson process with rate \lambda. In order to go from i arrivals at time s to j arrivals at time t + s we must have j \geq i and exactly j - i arrivals in t units of time, so

    p_t(i, j) = e^{-\lambda t} (\lambda t)^{j-i} / (j - i)!.

The backward equation as in (8.1) tells us that in our situation

    p_t'(i, j) = \lambda p_t(i + 1, j) - \lambda p_t(i, j).

To check this, we differentiate the formula for p_t(i, j) as above to get for j > i

    -\lambda e^{-\lambda t} (\lambda t)^{j-i} / (j - i)! + \lambda e^{-\lambda t} (\lambda t)^{j-i-1} / (j - i - 1)! = -\lambda p_t(i, j) + \lambda p_t(i + 1, j).

When j = i, p_t(i, i) = e^{-\lambda t}, so the derivative is

    -\lambda e^{-\lambda t} = -\lambda p_t(i, i) = -\lambda p_t(i, i) + \lambda p_t(i + 1, i),

since p_t(i + 1, i) = 0.

8.2 Limit Behaviour


In discrete time a stationary distribution is a solution of \pi^T p = \pi^T. In continuous time we need the following condition:

Definition 8.5 \pi is said to be a stationary distribution if \pi^T P_t = \pi^T for all t > 0.

This is difficult to check since it involves all of the P_t, which in addition are usually not easy to compute. It would be better to have a condition which only involves the single matrix Q, which we usually obtain easily from the problem under consideration.

Theorem 8.6 \pi is a stationary distribution if and only if \pi^T Q = 0.

Proof. We just prove the 'if' part. We have, since P_t = exp(Qt),

    \pi^T P_t = \pi^T exp(Qt)
              = \pi^T \sum_{n=0}^{\infty} Q^n t^n / n!
              = \pi^T (I + \sum_{n=1}^{\infty} Q^n t^n / n!)
              = \pi^T + \sum_{n=1}^{\infty} (\pi^T Q) Q^{n-1} t^n / n!
              = \pi^T.

We now want to relate the concept of a stationary distribution to the limiting behaviour of a Markov chain.

Definition 8.7 A chain is called irreducible if for any two states i and j it is possible to get from i to j in a finite number of steps.

Theorem 8.8 If a continuous-time Markov chain is irreducible and has a stationary distribution \pi, then

    lim_{t \to \infty} p_t(i, j) = \pi(j).

Example. Weather chain. It can be Sunny, Cloudy, or Rainy. The weather stays sunny for an exponentially distributed number of days with mean 3, then becomes cloudy. It stays cloudy for an exponentially distributed number of days with mean 4, then rain comes. The rain lasts an exponentially distributed number of days with mean 1 until sunshine returns. This verbal description translates into the following Q-matrix:

           S      C      R
    S    -1/3    1/3     0
    C      0    -1/4    1/4
    R      1      0     -1

The relation \pi^T Q = 0 now leads to three equations, which we solve to get \pi(S) = 3/8, \pi(C) = 4/8, \pi(R) = 1/8.
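Together with the normalisation \sum_i \pi(i) = 1, the relation \pi^T Q = 0 is a linear system that is easy to solve numerically. A sketch for the weather chain (the state ordering S, C, R is ours):

```python
import numpy as np

# Q-matrix of the weather chain, states ordered S, C, R.
Q = np.array([[-1/3, 1/3,  0.0],
              [ 0.0, -1/4, 1/4],
              [ 1.0,  0.0, -1.0]])

# Solve pi^T Q = 0 with sum(pi) = 1: replace one balance equation
# by the normalisation condition.
A = np.vstack([Q.T[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)   # [0.375, 0.5, 0.125], i.e. (3/8, 4/8, 1/8)
```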

Definition 8.9 A birth and death chain has state space S = {0, 1, ..., N} and jump rates

    q(n, n + 1) = \lambda_n   for 0 \leq n < N,
    q(n, n - 1) = \mu_n       for 0 < n \leq N

(and q = 0 otherwise).

For a birth and death chain we can give an easy sufficient condition for \pi to be a stationary distribution:

    \pi(k) q(k, j) = \pi(j) q(j, k)   for all j, k.

If this detailed balance condition holds, then \pi is a stationary distribution. This follows by summing over all k \neq j and recalling the definition of \lambda_j,

    \sum_{k \neq j} \pi(k) q(k, j) = \pi(j) \sum_{k \neq j} q(j, k) = \pi(j) \lambda_j,

since then

    (\pi^T Q)_j = \sum_{k \neq j} \pi(k) q(k, j) - \pi(j) \lambda_j = 0.

Note: the detailed balance condition is not a necessary condition for \pi being stationary. Indeed, in the weather example it is not satisfied. However, it always holds for birth and death chains.

Example. Barbershop. A barber can cut hair at rate 3, where the units are people per hour, i.e. each haircut requires an exponentially distributed amount of time with mean 20 minutes. Suppose customers arrive at the times of a Poisson process with intensity rate 2, but will leave if both chairs in the waiting room are full. What fraction of time will both chairs be full? In the long run, how many customers does the barber serve per hour?

Solution: we define our state to be the number of customers in the system, so S = {0, 1, 2, 3}. From the problem description it is clear that

    q(i, i - 1) = 3   for i = 1, 2, 3,
    q(i, i + 1) = 2   for i = 0, 1, 2.

The detailed balance conditions say

    2 \pi(0) = 3 \pi(1),   2 \pi(1) = 3 \pi(2),   2 \pi(2) = 3 \pi(3).

This together with \pi(0) + \pi(1) + \pi(2) + \pi(3) = 1 yields

    \pi(0) = 27/65,   \pi(1) = 18/65,   \pi(2) = 12/65,   \pi(3) = 8/65.

From this we see that 8/65 of the time both chairs are full, so that fraction of the arrivals is lost, and hence 57/65, or 87.7%, of the customers enter service. Since the original arrival rate is 2, this means that the barber serves an average of 2 \cdot 57/65 = 114/65 \approx 1.754 customers per hour.
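For birth and death chains with constant rates, the detailed balance recursion \pi(n + 1) = (\lambda/\mu) \pi(n) reduces the stationary distribution to a geometric product. A sketch with the barbershop rates, as a sanity check of the numbers above:

```python
import numpy as np

lam, mu, N = 2.0, 3.0, 3                 # arrival rate, service rate, capacity
# Detailed balance: lam * pi(n) = mu * pi(n+1), hence pi(n) proportional to (lam/mu)^n.
pi = (lam / mu) ** np.arange(N + 1)
pi /= pi.sum()
print(pi)                                 # [27/65, 18/65, 12/65, 8/65]
print("fraction of lost arrivals:", pi[-1])              # 8/65
print("customers served per hour:", lam * (1 - pi[-1]))  # 114/65 ~ 1.754
```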
Example. Queuing chain. A bank has one teller to serve customers, who need an exponential amount of service with rate \mu and queue in a line if the server is busy. Customers arrive at the times of a Poisson process with rate \lambda, but only join the queue with probability a_n if there are n customers in line. We have

    q(n, n + 1) = \lambda a_n,
    q(n, n - 1) = \mu.

The next result gives a natural condition to prevent the queue length from growing out of control.

Theorem 8.10 If a_n \to 0 as n \to \infty, then there is a stationary distribution.

Proof. It follows from the detailed balance conditions that

    \mu \pi(n + 1) = \lambda a_n \pi(n).

If N is large enough and n \geq N, then \lambda a_n / \mu \leq 1/2, and it follows that

    \pi(n + 1) \leq (1/2) \pi(n) \leq ... \leq (1/2)^{n+1-N} \pi(N).

This implies that \sum_n \pi(n) < \infty, so we can pick \pi(0) to make the sum equal to 1.

If we now take a_n = 1/(n + 1), then we get from the detailed balance conditions

    \pi(n) = (\lambda/\mu) (1/n) \pi(n - 1)
           = (\lambda/\mu)^2 (1/(n(n - 1))) \pi(n - 2)
           = ...
           = ((\lambda/\mu)^n / n!) \pi(0).
To find the stationary distribution we want to find c = \pi(0) such that

    c \sum_{n=0}^{\infty} (\lambda/\mu)^n / n! = 1.

As the sum is the series for the exponential function, we get c = exp(-\lambda/\mu), and the stationary distribution is Poisson with parameter \lambda/\mu.
