
Stochastic Processes – Report on Course Contents

Akash Chhabria – 13174

Week 1

Lecture 1

Our first lecture dealt with a revision of the concept of random variables from probability theory
and an introduction to stochastic processes.

We were introduced to a sequence of random variables 𝑋1, 𝑋2, 𝑋3, …, 𝑋𝑛, written {𝑋𝑛}𝑛∈𝑇, where T is either ℝ+ or ℤ+. These two index sets correspond to the two general types of stochastic processes: continuous-time and discrete-time. We also revised the idea of a probability space (𝛺, 𝑓, 𝑃), where 𝛺 is the sample space, 𝑓 is the collection of all events and 𝑃 is the probability measure. We related this to the coin-flip example, where 𝛺 = {H, T}, with H and T (𝜔1 and 𝜔2, respectively) as our sample points; 𝑓 = {{H}, {T}, {H, T}, {}}, where {H, T} is 𝛺 itself; and the probability 𝑃 takes values in the interval [0, 1]. We also revised the concept of cardinality: for flipping a coin twice the sample space has 4 points, so |𝑓| = 2^4 = 16 or, more generally, |𝑓| = 2^|𝛺|. This discussion led towards X being a real-valued random variable, X: 𝛺 -> E, where E is the state space and can be either discrete (ℤ) or continuous (ℝ). For further exemplification, we compared it with the Bernoulli random variable, where X takes the value 1 with probability p and 0 with probability 1 − p. That is, extending the coin-flip example, X(H) = 1 and X(T) = 0. Generally, a random variable is written X(𝜔).

We concluded on the definition of stochastic process:

A collection or sequence of random variables indexed by time. Formally,

{𝑋𝑡 }𝑡∈𝑇 : (𝛺, 𝑓, 𝑃) -> E or ℝ𝑛

A brief introduction of the sample path (𝑋(𝑡, 𝜔1 )) of the process followed.
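
As a small illustration of a sample path (a sketch of my own, not from the lecture; Python/NumPy is assumed and the values p = 0.5, n = 10 are arbitrary), one realization corresponds to fixing a single 𝜔, i.e., one particular sequence of coin flips:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# One realization (sample path) of the coin-flip process X_1, ..., X_n:
# X_n(omega) = 1 for heads (probability p) and 0 for tails.
p, n = 0.5, 10
sample_path = rng.binomial(1, p, size=n)  # fixes one omega and traces t -> X_t(omega)
print(sample_path)                        # a 0/1 sequence of length n
```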

Lecture 2

We revised the formal definition of stochastic process, replacing state space E with the notation 𝑆.
We then discussed “why stochastic processes?”. Under this heading:

i. How to control (predict) the future?


ii. What will be the future value of the process? (* This is considered in all stochastic processes)
iii. What are the dependencies? (For instance, if we are dealing with 𝑖𝑖𝑑 or independent and
identically distributed random variables, the future is easy to predict because there is no
dependency, however, there are cases where 𝑋1 can depend on 𝑋2 and so on.)
iv. What is the long-term behavior of the process? (We compared this to how, as the number
of trials increases, the sum of Bernoulli trials becomes binomial and the binomial approaches
the standard normal. Here, in the context of a stochastic process, there is convergence
towards a new model with its own definition and distribution.)

This was then followed by looking at the types of stochastic processes:

i. Discrete Stochastic Process


a. Bernoulli Process
b. Random Walk
c. Markov Chains
d. Martingales
e. Poisson Process
ii. Continuous Stochastic Process
a. Brownian Motion
b. Martingale

Next, as introduced in the previous lecture, we delved into a formal definition of the path of the
process:

A realization of the stochastic process is a path of the stochastic process. Formally,

t -> 𝑋𝑡 (𝜔)

To solidify an understanding of this definition, the coin-flip example from the previous
lecture was extended: a head corresponds to a step to the right and a tail to a step to the
left. Similarly, the moves can be taken as up and down, which is easy to capture as a
visualization of the path.

Finite Dimensional Distribution followed. That is,

If {𝑋𝑡 }𝑡∈𝑇 ,

We write:

𝐹𝑋𝑡1 ,𝑋𝑡2 ,…,𝑋𝑡𝑛 , the joint distribution function of random variables 𝑋𝑡1 , 𝑋𝑡2 ,…, 𝑋𝑡𝑛 .

This is the finite dimensional distribution function. We related it to how in the case of a random
variable X,

𝐹𝑋 (𝑥) = 𝑃(𝑋 ≤ 𝑥).

For a random vector, it follows that

𝐹𝑋1 ,𝑋2 ,…,𝑋𝑛 = 𝑃(𝑋1 ≤ 𝑥1 , 𝑋2 ≤ 𝑥2 , … , 𝑋𝑛 ≤ 𝑥𝑛 )

Using independence, the joint distribution function can be written as a product of the individual or
marginal distribution functions:

𝐹𝑋𝑡1 ,𝑋𝑡2 ,…,𝑋𝑡𝑛 = 𝐹𝑋𝑡1 · 𝐹𝑋𝑡2 ⋯ 𝐹𝑋𝑡𝑛 = ∏_{𝑖=1}^{𝑛} 𝐹𝑋𝑡𝑖

We can relate this to the various dependencies discussed at the start of the lecture. We
can now appreciate how much easier things become when dealing with 𝑖𝑖𝑑 variables.

We concluded upon the definitions of stationary processes, and trend function.

Stationary processes: A stochastic process {𝑋𝑡 }𝑡∈𝑇 is said to be stationary if the joint distribution
functions of (𝑋𝑡1 , 𝑋𝑡2 , … , 𝑋𝑡𝑛 ) and (𝑋𝑡1 +ℎ , 𝑋𝑡2 +ℎ , … , 𝑋𝑡𝑛 +ℎ ) are the same for every h > 0.

Trend function: E(X(t)), for all t𝜖T, provided it exists, is called the trend function. It does not matter if
we write 𝑋𝑡 as X(t), the meaning remains the same.

Lecture 3

As introduced in the previous lecture, we start with discrete stochastic processes. Our first topic of
discussion is the Bernoulli stochastic process.
After an initial discussion, we delved into the formal definition:

{𝑋𝑛 }𝑛∈ℕ0 is called a Bernoulli stochastic process if

a) 𝑋1 , 𝑋2 , … are independent
b) 𝑃{𝑋𝑛 = 1} = 𝑝,
𝑃{𝑋𝑛 = 0} = 𝑞 = 1 − 𝑝,
∀𝑛, where 𝑝 = probability of success

The properties of this process are then summarized in the following theorem:

{𝑋𝑛 }𝑛∈ℕ0 , a Bernoulli stochastic process with 𝑝 as the probability of success has:

i. 𝐸(𝑋𝑛 ) = 𝐸(𝑋𝑛^2) = 𝐸(𝑋𝑛^3) = ⋯ = 𝑝 (Mean)
ii. 𝑉𝑎𝑟(𝑋𝑛 ) = 𝐸(𝑋𝑛^2) − (𝐸(𝑋𝑛 ))^2 = 𝑝 − 𝑝^2 = 𝑝𝑞 (Variance)
iii. 𝐸(𝛼^𝑋𝑛 ) = 𝑞 + 𝛼𝑝 (Probability Generating Function)

As ii follows immediately once i is proved, we avoid redundancy: we prove i and then skip to iii.

𝐸(𝑋𝑛 ) = (1)(𝑝) + (0)(𝑞) = 𝑝 – (i)

𝐸(𝛼^𝑋𝑛 ) = 𝛼^1 𝑝 + 𝛼^0 𝑞 = 𝑞 + 𝛼𝑝 – (iii)

Next, we define the number of successes, under Bernoulli stochastic process. This is a stochastic
process generated from the Bernoulli stochastic process. Defined as:

𝑁𝑛 (𝜔) = 0 if 𝑛 = 0
𝑁𝑛 (𝜔) = 𝑋1 (𝜔) + 𝑋2 (𝜔) + ⋯ + 𝑋𝑛 (𝜔) if 𝑛 = 1, 2, …
We are interested in

𝑁𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 and 𝑁𝑛+1 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 + 𝑋𝑛+1

where 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛−1 represents the past, 𝑋𝑛 the present and 𝑋𝑛+1 the future. Recall under
“why stochastic processes”, we were interested in predicting (controlling) the future.

Therefore,

𝑁𝑛+1 = 𝑁𝑛 + 𝑋𝑛+1
Rewritten as:

𝑁𝑛+1 − 𝑁𝑛 = 𝑋𝑛+1 , which is equation 1

If n=0,

𝑁1 − 𝑁0 = 𝑋1
If n=1,

𝑁2 − 𝑁1 = 𝑋2
These results show that 𝑁𝑛 also determines 𝑋𝑛 just as 𝑋𝑛 generates 𝑁𝑛 ; the two sequences therefore
carry exactly the same information.

From equation 1, we can write:

𝑃{𝑁𝑛+1 = 𝑘|𝑁𝑛 = 𝑗} = 𝑃{𝑋𝑛+1 = 𝑘 − 𝑗}


= 𝑝 𝑖𝑓 𝑗 = 𝑘 − 1
= 𝑞 𝑖𝑓 𝑗 = 𝑘
= 0 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
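
These transition probabilities are easy to check empirically. The sketch below is my own (not from the lectures); it assumes Python/NumPy and the arbitrary values p = 0.3, n = 10, j = 3, simulating many Bernoulli paths and their running number of successes 𝑁𝑛:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_steps, n_paths = 0.3, 50, 100_000

# Bernoulli process X_1, X_2, ... and its running sum N_n (number of successes).
X = rng.binomial(1, p, size=(n_paths, n_steps))
N = X.cumsum(axis=1)

# Empirical check of P{N_{n+1} = k | N_n = j}: it should be p if k = j + 1 and q if k = j.
n, j = 10, 3
cond = N[:, n - 1] == j                   # paths with N_n = j (column n-1 holds N_n)
up = (N[cond, n] == j + 1).mean()         # fraction that moved to k = j + 1
stay = (N[cond, n] == j).mean()           # fraction that stayed at k = j
print(up, stay)                           # approximately p and q = 1 - p
```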
Week 2

Lecture 4

Continuing from where we left off in the previous lecture, we formally discuss the number-of-
successes stochastic process. We say:

We have a process {𝑁𝑛 }𝑛∈ℕ0 , where

𝑁𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 and 𝑁𝑛+1 = 𝑁𝑛 + 𝑋𝑛+1 or 𝑋𝑛+1 = 𝑁𝑛+1 − 𝑁𝑛

That is,

𝑋1 = 𝑁1 − 𝑁0 = 𝑁1 , since 𝑁0 = 0
𝑋2 = 𝑁2 − 𝑁1
.

That is the random variables 𝑁1 , 𝑁2 , … , 𝑁𝑚 are completely determined by 𝑋1 , 𝑋2 , … , 𝑋𝑚 and


conversely, 𝑋1 , 𝑋2 , … , 𝑋𝑚 are completely determined by 𝑁1 , 𝑁2 , … , 𝑁𝑚 . Formally,

𝑃{𝑁𝑚+𝑛 − 𝑁𝑛 = 𝑘|𝑁1 , 𝑁2 , … , 𝑁𝑚 } = 𝑃{𝑁𝑚+𝑛 − 𝑁𝑛 = 𝑘|𝑋1 , 𝑋2 , … , 𝑋𝑚 } = 𝑃{𝑁𝑚+𝑛 − 𝑁𝑛 = 𝑘}


This result borrows from the memory-less property of the Markov process. In other words, the past is
irrelevant.

Theorem: For 𝑛 ∈ ℕ

a) 𝑃{𝑁𝑛 = 𝑘} = C(𝑛, 𝑘) 𝑝^𝑘 𝑞^(𝑛−𝑘), 𝑘 = 0, 1, 2, …, where C(𝑛, 𝑘) denotes the binomial coefficient

b) 𝑃{𝑁𝑚+𝑛 − 𝑁𝑚 = 𝑘} = C(𝑛, 𝑘) 𝑝^𝑘 𝑞^(𝑛−𝑘)

Knowing how to compute these probabilities will serve us well in our solved examples. We also noted
that:

𝑁𝑚+𝑛 − 𝑁𝑚 = 𝑋𝑚+1 + 𝑋𝑚+2 + ⋯ + 𝑋𝑚+𝑛 , which is a trivial result to find.

Next, as done in the Bernoulli stochastic process in the previous lecture, we find the expectation, the
variance of the number of successes process.

We have:

𝑁𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛
Then,

𝐸(𝑁𝑛 ) = 𝐸(𝑋1 ) + 𝐸(𝑋2 ) + ⋯ + 𝐸(𝑋𝑛 )


= 𝑝 +𝑝 + ⋯+𝑝
𝐸(𝑁𝑛 ) = 𝑛𝑝
This is an easy enough result to obtain as the expected value of a Bernoulli random variable is the
probability of success 𝑝.

Similarly, we take the variance on both sides of

𝑁𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛
𝑉𝑎𝑟(𝑁𝑛 ) = 𝑉𝑎𝑟(𝑋1 ) + 𝑉𝑎𝑟(𝑋2 ) + ⋯ + 𝑉𝑎𝑟(𝑋𝑛 )
𝑉𝑎𝑟(𝑁𝑛 ) = 𝑝𝑞 + 𝑝𝑞 + ⋯ + 𝑝𝑞
𝑉𝑎𝑟(𝑁𝑛 ) = 𝑛𝑝𝑞

Moreover, using these two results, we can find 𝐸(𝑁𝑛^2):

𝐸(𝑁𝑛^2) = 𝑉𝑎𝑟(𝑁𝑛 ) + (𝐸(𝑁𝑛 ))^2

𝐸(𝑁𝑛^2) = 𝑛𝑝𝑞 + (𝑛𝑝)^2

𝐸(𝑁𝑛^2) = 𝑛𝑝𝑞 + 𝑛^2 𝑝^2
We also recalled a theorem of conditional expectation to further aid us in solving examples:

𝐸[𝐸(𝑌|𝑋1 , … , 𝑋𝑛 )] = 𝐸(𝑌)
Lecture 5

Extending the idea of number of successes from the previous lecture, we are introduced with the idea
of times of successes.

Consider {𝑋𝑛 }𝑛∈ℕ0 a Bernoulli stochastic process with probability of success 𝑝 for a fixed 𝜔 ∈ 𝛺. Then,
we can write:

𝑋1 (𝜔), 𝑋2 (𝜔), …
Each of which can take values 1 or 0.

For example:

𝑋1 (𝜔) = 0, 𝑋2 (𝜔) = 1, 𝑋3 (𝜔) = 0, 𝑋4 (𝜔) = 1, 𝑋5 (𝜔) = 1


Then:

𝑇1 (𝜔) = 2, 𝑇2 (𝜔) = 4, 𝑇3 (𝜔) = 5


Generally,

𝑇𝑘 (𝜔) =trial number at which kth success occurs

Therefore, we have a new process: {𝑇𝑘 }𝑘≥1

While we know {𝑁𝑛 }𝑛∈ℕ is the number of successes, and {𝑇𝑘 }𝑘≥1 , the times of success, what is their
relation?

We consider 2 cases.

Case 1: For 𝜔 ∈ 𝛺, recall 𝑁𝑛 (𝜔) = 𝑋1 (𝜔) + 𝑋2 (𝜔) + ⋯ + 𝑋𝑛 (𝜔).

Suppose 𝑇𝑘 (𝜔) ≤ 𝑛, that is, the kth success has occurred at or before the nth trial.
Then the number of successes in the first n trials must be at least k, that is, 𝑁𝑛 (𝜔) ≥ 𝑘.

Therefore, if 𝑇𝑘 (𝜔) ≤ 𝑛 then 𝑁𝑛 (𝜔) ≥ 𝑘. Conversely, if 𝑁𝑛 (𝜔) ≥ 𝑘, then 𝑇𝑘 (𝜔) ≤ 𝑛.

Case 2: 𝜔 ∈ 𝛺 and 𝑇𝑘 (𝜔) = 𝑛. That is, there are exactly k−1 successes in the first (n−1) trials and one
success at the nth trial. Therefore, 𝑁𝑛−1 (𝜔) = 𝑘 − 1 and 𝑋𝑛 (𝜔) = 1. Conversely, if 𝑁𝑛−1 (𝜔) = 𝑘 − 1 and
𝑋𝑛 (𝜔) = 1, then 𝑇𝑘 (𝜔) = 𝑛.

These 2 cases form part of a lemma with 𝑖𝑓𝑓 statements.

Our next topic of interest is the distribution of the process, that is times of success.

Theorem: For 𝑘 ∈ {1, 2, … }, 𝑛 ≥ 𝑘, using the previous lemma:

a) 𝑃{𝑇𝑘 ≤ 𝑛} = ∑_{𝑗=𝑘}^{𝑛} C(𝑛, 𝑗) 𝑝^𝑗 𝑞^(𝑛−𝑗), 𝑛 = 𝑘, 𝑘 + 1, …

b) 𝑃{𝑇𝑘 = 𝑛} = C(𝑛−1, 𝑘−1) 𝑝^𝑘 𝑞^(𝑛−𝑘), 𝑛 = 𝑘, 𝑘 + 1, …

What followed is a proof of these two statements, using the results obtained from our lecture on
number of successes:

a) We use case 1:
{𝑇𝑘 ≤ 𝑛} = {𝑁𝑛 ≥ 𝑘}
𝑃{𝑇𝑘 ≤ 𝑛} = 𝑃{𝑁𝑛 ≥ 𝑘}
𝑃{𝑇𝑘 ≤ 𝑛} = ∑_{𝑗=𝑘}^{𝑛} 𝑃{𝑁𝑛 = 𝑗}
𝑃{𝑇𝑘 ≤ 𝑛} = ∑_{𝑗=𝑘}^{𝑛} C(𝑛, 𝑗) 𝑝^𝑗 𝑞^(𝑛−𝑗)

b) We use case 2:
{𝑇𝑘 = 𝑛} = {𝑁𝑛−1 = 𝑘 − 1, 𝑋𝑛 = 1}
𝑃{𝑇𝑘 = 𝑛} = 𝑃{𝑁𝑛−1 = 𝑘 − 1, 𝑋𝑛 = 1}
𝑃{𝑇𝑘 = 𝑛} = 𝑃{𝑁𝑛−1 = 𝑘 − 1} 𝑃{𝑋𝑛 = 1}, since 𝑁𝑛−1 and 𝑋𝑛 are independent
𝑃{𝑇𝑘 = 𝑛} = C(𝑛−1, 𝑘−1) 𝑝^(𝑘−1) 𝑞^((𝑛−1)−(𝑘−1)) · 𝑝
𝑃{𝑇𝑘 = 𝑛} = C(𝑛−1, 𝑘−1) 𝑝^𝑘 𝑞^(𝑛−𝑘)
We can recognize that the result in part b shares resemblance with the PMF of negative binomial.

One last theorem was discussed:

For 𝑘 ∈ ℕ, 𝑛 ≥ 𝑘

a) 𝑃{𝑇𝑘+1 = 𝑛|𝑇0 , 𝑇1 , … , 𝑇𝑘 } = 𝑃{𝑇𝑘+1 = 𝑛|𝑇𝑘 }


This is easily deduced from the Markov memoryless property. That is 𝑇0 , 𝑇1 , … , 𝑇𝑘−1 form the
past, 𝑇𝑘 the present, and 𝑇𝑘+1 the future for {𝑇𝑘 }𝑘≥1 .
b) 𝑃{𝑇𝑘+1 − 𝑇𝑘 = 𝑚} = 𝑝𝑞^(𝑚−1)
That is the time between two successes is independent of the time of previous successes and
follows a geometric distribution.
In other words,
𝑇1 , 𝑇2 − 𝑇1 , … are independent and identically distributed variables.

Solved Examples of Week 1 and 2:

1. Suppose we flip a coin twice, and we want the probability of 2 heads: (Online Statbook)
We use the theorem:
𝑃{𝑁𝑛 = 𝑘} = C(𝑛, 𝑘) 𝑝^𝑘 𝑞^(𝑛−𝑘)
And substitute the values for k as 2, n as 2, and p and q as ½. We obtain: ¼. A rather simple example
but it illustrates how the number of successes behaves exactly as a binomial distribution.

2.

Example 3.16 (Cinlar, 2013): Consider 𝑃{𝑇1 = 3, 𝑇5 = 9, 𝑇7 = 17}:

This event is the same as the event {𝑇1 = 3, 𝑇5 − 𝑇1 = 6, 𝑇7 − 𝑇5 = 8}, so

𝑃{𝑇1 = 3, 𝑇5 = 9, 𝑇7 = 17} = 𝑃{𝑇1 = 3, 𝑇4 = 6, 𝑇2 = 8} = 𝑃{𝑇1 = 3}𝑃{𝑇4 = 6}𝑃{𝑇2 = 8}
= C(3−1, 1−1) 𝑝^1 𝑞^2 · C(6−1, 4−1) 𝑝^4 𝑞^2 · C(8−1, 2−1) 𝑝^2 𝑞^6 = 70 𝑝^7 𝑞^10.

3.

Example 2.16 (Cinlar, 2013): Compute 𝐸[𝑁11 |𝑁5 ] = 𝐸[𝑁5 + (𝑁11 − 𝑁5 )|𝑁5 ] = 𝐸[𝑁5 |𝑁5 ] + 𝐸[𝑁11 −
𝑁5 |𝑁5 ] = 𝑁5 + 𝐸[𝑁6 ] = 𝑁5 + 6𝑝
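
The result of Example 2.16 can also be checked by simulation. The following sketch is my own (not from the lectures); it assumes Python/NumPy and an arbitrary p = 0.4, and compares the empirical conditional mean of 𝑁11 given 𝑁5 = 𝑗 with 𝑗 + 6𝑝:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_paths = 0.4, 200_000

X = rng.binomial(1, p, size=(n_paths, 11))   # 11 Bernoulli trials per path
N = X.cumsum(axis=1)
N5, N11 = N[:, 4], N[:, 10]                  # N_5 and N_11

# E[N_11 | N_5 = j] should be close to j + 6p (Example 2.16).
for j in range(3):
    est = N11[N5 == j].mean()
    print(j, round(est, 3), j + 6 * p)
```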
Week 3

Lecture 6

A quick revision on times of successes began our discussion:

{𝑇𝑘 }𝑘∈ℕ0

𝑇𝑘 = (𝑇1 − 𝑇0 ) + (𝑇2 − 𝑇1 ) + (𝑇3 − 𝑇2 ) + ⋯ + (𝑇𝑘 −𝑇𝑘−1 )


We know:

𝑃{𝑇𝑘+1 − 𝑇𝑘 = 𝑚} = 𝑝𝑞^(𝑚−1), that is, a geometric distribution

This revision ended on the last result proved in our previous lecture. This result was to then be taken
to find the expected value and variance of 𝑇𝑘 .

i) 𝐸(𝑇𝑘 ) = 𝐸(𝑇1 − 𝑇0 ) + 𝐸(𝑇2 − 𝑇1 ) + ⋯ + 𝐸(𝑇𝑘 − 𝑇𝑘−1 )
   𝐸(𝑇𝑘 ) = 1/𝑝 + 1/𝑝 + ⋯ + 1/𝑝
   𝐸(𝑇𝑘 ) = 𝑘/𝑝

ii) 𝑉𝑎𝑟(𝑇𝑘 ) = 𝑉𝑎𝑟(𝑇1 − 𝑇0 ) + 𝑉𝑎𝑟(𝑇2 − 𝑇1 ) + ⋯ + 𝑉𝑎𝑟(𝑇𝑘 − 𝑇𝑘−1 )
    𝑉𝑎𝑟(𝑇𝑘 ) = 𝑞/𝑝^2 + 𝑞/𝑝^2 + ⋯ + 𝑞/𝑝^2
    𝑉𝑎𝑟(𝑇𝑘 ) = 𝑘𝑞/𝑝^2
This then led into an exploration of the limiting behavior of {𝑁𝑛 }𝑛∈ℕ0 and {𝑇𝑘 }𝑘∈ℕ0 .

We know that:
𝑃{𝑇𝑘 = 𝑛} = C(𝑛−1, 𝑘−1) 𝑝^𝑘 𝑞^(𝑛−𝑘), 𝑘 = 1, 2, … and 𝑛 = 𝑘, 𝑘 + 1, …
This above negative binomial distribution can be approximated with the normal distribution.

𝑇𝑘 = (𝑇1 − 𝑇0 ) + (𝑇2 − 𝑇1 ) + ⋯ + (𝑇𝑘 −𝑇𝑘−1 )


𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛

𝐸(𝑆𝑛 ) = 𝑛𝜇, 𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛𝜎^2, since the 𝑋𝑖 are 𝑖𝑖𝑑

𝑍𝑛 = (𝑆𝑛 − 𝑛𝜇)/(𝜎√𝑛) = ((𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 ) − 𝑛𝜇)/(𝜎√𝑛)

In place of the 𝑋𝑖 we take the increments of 𝑇𝑘 . The mean is 𝑘𝜇 and the standard deviation is 𝜎√𝑘
because the sum now consists of k random variables.

Therefore, we can write:

𝑍𝑘 = (𝑇𝑘 − 𝑘𝜇)/(𝜎√𝑘)

𝑍𝑘 = (𝑇𝑘 − 𝑘/𝑝) / √(𝑘𝑞/𝑝^2)

𝑍𝑘 = [(𝑝𝑇𝑘 − 𝑘)/𝑝] / [(1/𝑝)√(𝑘(1 − 𝑝))]

𝑍𝑘 = (𝑝𝑇𝑘 − 𝑘)/√(𝑘(1 − 𝑝))

In other words:

lim_{𝑘→∞} 𝑃{(𝑇𝑘 − 𝑘𝜇)/(𝜎√𝑘) ≤ 𝑡} = (1/√(2𝜋)) ∫_{−∞}^{𝑡} 𝑒^(−𝑥^2/2) 𝑑𝑥, that is, 𝑁(0, 1)
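
This normal approximation can be illustrated numerically. The sketch below is my own (not from the lectures); it assumes Python/NumPy and arbitrary values p = 0.3 and k = 200, building 𝑇𝑘 as a sum of k independent Geometric(p) inter-success times and standardizing exactly as above:

```python
import numpy as np

rng = np.random.default_rng(3)
p, k, n_paths = 0.3, 200, 20_000

# T_k = sum of k iid Geometric(p) inter-success times (support 1, 2, ...).
T_k = rng.geometric(p, size=(n_paths, k)).sum(axis=1)

# Standardize as in the lecture: Z_k = (p*T_k - k) / sqrt(k*(1 - p)).
Z_k = (p * T_k - k) / np.sqrt(k * (1 - p))
print(Z_k.mean(), Z_k.std())    # approximately 0 and 1
print((Z_k <= 1.96).mean())     # approximately Phi(1.96) ~ 0.975
```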

Lecture 7

This lecture we started our discussion of Random Walk, another discrete stochastic process. We define
it as:

Consider 𝑌𝑖 , which are 𝑖𝑖𝑑 random variables and


𝑌𝑖 ~ (1, −1 with probabilities 𝑝, 1 − 𝑝), or the coin-toss distribution.

That is, 𝑌𝑖 takes the value 1 with probability 𝑝 and −1 with probability 1 − 𝑝.

And

𝑆𝑛 = ∑𝑛𝑖=1 𝑌𝑖

Given 𝑆0 = 0, then

𝑆1 = 𝑌1
𝑆2 = 𝑌1 + 𝑌2
.

𝑆𝑛 = 𝑌1 + 𝑌2 + ⋯ + 𝑌𝑛
Then,

𝑆1 , 𝑆2 ,…, 𝑆𝑛 is called a simple random walk process.

We discussed an example with fixed number of trials or barriers, which led into our discussion of its
properties:

i) 𝐸(𝑆𝑛 ) = 0 for the symmetric walk; that is, the mean net change of all the coin flips is zero (derived below).
ii) For 𝑛0 < 𝑛1 < 𝑛2 < ⋯, the increments 𝑆𝑛1 − 𝑆𝑛0 , 𝑆𝑛2 − 𝑆𝑛1 , … are mutually independent.
iii) Stationarity
iv) {𝑆𝑛 }𝑛∈ℕ0 : (𝛺, 𝑓, 𝑃) → 𝑆. When state space 𝑆 = ℤ = {0, ±1, ±2, … }, then random walk is
unrestricted (without barriers). But if 𝑆 ⊆ ℤ, then random walk is called restricted.
v) Random walk with absorbing barrier. That is, if your random walk enters into a state 𝑖 and
there is no way to leave that state, meaning you forever stay in that state, then the
random walk is called a random walk with an absorbing barrier.
vi) The random walk model can be used to approximate the Brownian motion process.

We then try to prove the first property, as well as find the variance of this sum.

i) 𝐸(𝑆𝑛 ) = 𝐸(𝑌1 ) + 𝐸(𝑌2 ) + ⋯ + 𝐸(𝑌𝑛 ), 𝑌𝑖 ~𝑖𝑖𝑑 and 𝐸(𝑌1 ) = 𝐸(𝑌2 ) = ⋯ = 𝐸(𝑌𝑛 )


𝐸(𝑆𝑛 ) = 𝑛𝐸(𝑌1 )
𝐸(𝑆𝑛 ) = 𝑛[(1)𝑝 + (−1)(1 − 𝑝)] = 𝑛[2𝑝 − 1]
If 𝐸(𝑆𝑛 ) = 0, then
𝑛[2𝑝 − 1] = 0
𝑝 = 1/2, that is, 1 − 𝑝 = 1/2 = 𝑞
If 𝑝 = 𝑞, it is called a symmetric random walk. However, if 𝐸(𝑆𝑛 ) > 0, then
𝑛[2𝑝 − 1] > 0
𝑝 > 1/2, that is, 1 − 𝑝 < 1/2
The situation is reversed if 𝐸(𝑆𝑛 ) < 0.
ii) Similarly,
𝑉𝑎𝑟(𝑆𝑛 ) = 𝑉𝑎𝑟(𝑌1 ) + 𝑉𝑎𝑟(𝑌2 ) + ⋯ + 𝑉𝑎𝑟(𝑌𝑛 ), since the 𝑌𝑖 are 𝑖𝑖𝑑 and
𝑉𝑎𝑟(𝑌1 ) = 𝑉𝑎𝑟(𝑌2 ) = ⋯ = 𝑉𝑎𝑟(𝑌𝑛 )
𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛 𝑉𝑎𝑟(𝑌1 )
𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛[𝐸(𝑌1^2) − (𝐸(𝑌1 ))^2]
𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛[1^2 𝑝 + (−1)^2 (1 − 𝑝) − (2𝑝 − 1)^2]
𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛[1 − 4𝑝^2 + 4𝑝 − 1]
𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛[4𝑝(1 − 𝑝)]
𝑉𝑎𝑟(𝑆𝑛 ) = 4𝑛𝑝𝑞
If 𝑝 = 𝑞, 𝑉𝑎𝑟(𝑆𝑛 ) = 𝑛 and the standard deviation of 𝑆𝑛 equals √𝑛.

We next delve deeper into a simple random walk model. That is, we have:
𝑆𝑛 = ∑_{𝑖=1}^{𝑛} 𝑌𝑖 , 𝑌𝑖 ~ coin-toss distribution

We then consider a random walk consisting of a total of n steps (that is, there is a barrier or restriction).
Suppose we are now interested in the total number of right steps.
𝑃(the number of right steps equals 𝑟) = C(𝑛, 𝑟) 𝑝^𝑟 𝑞^(𝑛−𝑟)

This is equation 1. We also know that net displacement = total right steps − total left steps. That is,

𝑘 = 𝑟 − (𝑛 − 𝑟)
𝑘 = 2𝑟 − 𝑛
𝑟 = (𝑘 + 𝑛)/2
We can then rewrite equation 1 to obtain equation 2:

𝑃(𝑆𝑛 = (𝑘 + 𝑛)/2) = C(𝑛, (𝑘 + 𝑛)/2) 𝑝^((𝑘+𝑛)/2) 𝑞^((𝑛−𝑘)/2)

that is, the probability of making (𝑘 + 𝑛)/2 right steps, which is exactly the event that the net displacement after 𝑛 steps is 𝑘.
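
Equation 2 can be checked against a Monte Carlo estimate. The following sketch is my own (not from the lectures); it assumes Python/NumPy, with arbitrary p = 0.5, n = 20 and displacement k = 4 (k and n must have the same parity):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(4)
p, n, n_paths = 0.5, 20, 200_000

# Simple random walk: each step is +1 with probability p, -1 with probability q.
steps = np.where(rng.random((n_paths, n)) < p, 1, -1)
displacement = steps.sum(axis=1)

k = 4
r = (k + n) // 2                         # number of right steps needed for displacement k
formula = comb(n, r) * p**r * (1 - p)**(n - r)
print((displacement == k).mean(), formula)   # the two values should be close
```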
Week 4

Lecture 8

In this lecture, we take our accumulated knowledge of the random walk and apply it to the Gambler’s
Ruin problem.

Before we begin, we understand that different probabilities can be found using the final equation
derived in the previous lectures. For instance, for finding the probability of hitting the origin, we can
set total displacement 𝑘 = 0, and similar reasoning can allow us to find the probability of never hitting
the origin.

We begin by defining the Gambler’s Ruin problem. Here, we suppose that there are two gamblers –
gambler A and gambler B.

A has initial capital = $𝑛, while B has initial capital = $(𝑎 − 𝑛), where 𝑎 = total capital (constant).

Both gamblers A and B are playing a coin-flip game; the probability of A winning a turn is 𝑝, while the
probability that B wins the turn is 𝑞, where 𝑝 + 𝑞 = 1. At the end of each turn, $1 is transferred
accordingly.

We suppose 𝑈𝑛 = probability that A wins starting from 𝑛 and 𝑉𝑛 = probability that B wins starting from
𝑛.

We know that 𝑈𝑛 =(probability of the first turn landing on 𝑛 + 1)(probability of winning from step 𝑛 +
1) +(probability of first landing on 𝑛 − 1)(probability of winning from 𝑛 − 1)
Therefore, we write the difference equation 𝑈𝑛 = 𝑝𝑈𝑛+1 + 𝑞𝑈𝑛−1 , and using the boundary conditions
𝑈0 = 0 and 𝑈𝑎 = 1 gives us the final result

𝑈𝑛 = ((𝑞/𝑝)^𝑛 − 1) / ((𝑞/𝑝)^𝑎 − 1), if 𝑝 ≠ 𝑞.

Then, there are two cases:

i) 𝑝 > 𝑞
ii) 𝑝 < 𝑞

We find the limiting behavior, as 𝑛 → ∞, to show that in the first case 𝑈𝑛 → 1, and in the second
case 𝑈𝑛 → 0.
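
The formula for 𝑈𝑛 and its two cases are easy to explore numerically. The helper below is my own sketch (Python/NumPy assumed, parameters arbitrary); the fair-game value 𝑛/𝑎 used when 𝑝 = 𝑞 is the result derived in the next lecture:

```python
import numpy as np

def win_probability(n, a, p):
    """U_n: probability that gambler A, starting with $n out of total capital $a, wins."""
    q = 1 - p
    if np.isclose(p, q):
        return n / a                      # fair-game case (Lecture 9)
    r = q / p
    return (r**n - 1) / (r**a - 1)

# Explore U_n for an advantaged (p > q) and a disadvantaged (p < q) gambler, a = 100.
for p in (0.55, 0.45):
    print(p, [round(win_probability(n, 100, p), 4) for n in (10, 50, 90)])
# With p > q, U_n climbs towards 1 as n grows; with p < q it stays small unless n is close to a.
```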

Lecture 9

We revise that

𝑈𝑛 = ((𝑞/𝑝)^𝑛 − 1) / ((𝑞/𝑝)^𝑎 − 1) and 𝑉𝑛 = ((𝑞/𝑝)^𝑛 − (𝑞/𝑝)^𝑎) / (1 − (𝑞/𝑝)^𝑎)

and that their sum is equal to 1. These apply to the case of an unfair game, that is, 𝑝 ≠ 𝑞. If instead
𝑝 = 𝑞, then we cannot use the equation 𝑈𝑛 = 𝐴1 (1)^𝑛 + 𝐴2 (𝑞/𝑝)^𝑛, which was obtained after supposing
𝑈𝑛 = 𝐴𝑊^𝑛, where 𝐴 is a constant. This is because 𝑝 = 𝑞 implies 𝑈𝑛 = 𝐴1 + 𝐴2 . Instead, we suppose
that 𝑈𝑛 = (𝐴1 + 𝐴2 𝑛)𝑊^𝑛 and, using the same two roots 𝑊1 and 𝑊2 , that is, 1 and 𝑞/𝑝 (also equal to 1
here), we find that 𝑈𝑛 = 𝑛/𝑎. Similarly, for 𝑉𝑛 , with a difference in boundary conditions, that is, 𝑉0 = 1
and 𝑉𝑎 = 0, we obtain 𝑉𝑛 = 1 − 𝑛/𝑎.

We are now interested in the expected duration of the gambler's ruin game.

Assuming it to be finite, we set 𝑑𝑛 = the expected duration of the game starting from 𝑛. Then the
expected number of turns starting from 𝑛 equals one (the very first turn) plus 𝑑𝑛+1 if the first turn is
won, or 𝑑𝑛−1 if it is lost, weighted by their probabilities. This is akin to the reasoning used to find the
difference equation for 𝑈𝑛 .

𝑑𝑛 = 1 + 𝑝𝑑𝑛+1 + 𝑞𝑑𝑛−1 , which can be rewritten as 𝑝𝑑𝑛+1 − 𝑑𝑛 + 𝑞𝑑𝑛−1 = −1. This non-homogeneous
equation requires us to solve the homogeneous equation, where instead of −1 we put 0 (the same way
as we solved 𝑈𝑛 ), and then find a particular solution 𝑓(𝑛). We suppose 𝑓(𝑛) = 𝑐𝑛 and find 𝑐 using the
difference equation. Once this particular solution is found, the same boundary conditions give us the
solution:

𝑑𝑛 = 𝑛/(𝑞 − 𝑝) − [𝑎/(𝑞 − 𝑝)] · (1 − (𝑞/𝑝)^𝑛)/(1 − (𝑞/𝑝)^𝑎)

We recall this is for the unfair game. For a fair game, we instead suppose a particular solution of the
form 𝑓(𝑛) = 𝑐𝑛^2. The final solution then is 𝑑𝑛 = 𝑎𝑛 − 𝑛^2 = 𝑛(𝑎 − 𝑛).

Solved Examples of Week 3 and 4:

1.

Compute 𝐸(𝑇5 |𝑇1 , 𝑇2 , 𝑇3 ).

𝐸(𝑇5 |𝑇1 , 𝑇2 , 𝑇3 ) = 𝐸(𝑇5 |𝑇3 ) = 𝐸(𝑇3 + (𝑇5 − 𝑇3 )|𝑇3 ) = 𝐸(𝑇3 |𝑇3 ) + 𝐸(𝑇5 − 𝑇3 |𝑇3 ) = 𝑇3 + 𝐸(𝑇2 )
= 𝑇3 + 2/𝑝
2.

{𝑋𝑛 }𝑛∈ℕ is a random walk process. What is 𝑃{𝑋5 = 3, 𝑋10 = 6, 𝑋20 = 10}?

𝑃{𝑋5 = 3, 𝑋10 = 6, 𝑋20 = 10} = 𝑃{𝑋5 = 3, 𝑋10 − 𝑋5 = 3, 𝑋20 − 𝑋10 = 4}
= 𝑃{𝑋5 = 3}𝑃{𝑋5 = 3}𝑃{𝑋10 = 4} = 𝑃{𝑆5 = 4}𝑃{𝑆5 = 4}𝑃{𝑆10 = 7}

Using the formula

𝑃(𝑆𝑛 = (𝑘 + 𝑛)/2) = C(𝑛, (𝑘 + 𝑛)/2) 𝑝^((𝑘+𝑛)/2) 𝑞^((𝑛−𝑘)/2)

we obtain C(5, 4) 𝑝^4 𝑞^1 · C(5, 4) 𝑝^4 𝑞^1 · C(10, 7) 𝑝^7 𝑞^3 = 3000 𝑝^15 𝑞^5.
3.

We want to show that

𝑃(𝑋4 = 2) = C(4, 3) 𝑝^3 𝑞^1 = C(4, 1) 𝑝^3 𝑞^1, where {𝑋𝑛 }𝑛∈ℕ is a random walk process.

𝑃(𝑋4 = 2) = 𝑃(𝑆4 = 3) = C(4, 3) 𝑝^3 𝑞^1, using the same formula listed above. We also know that
C(4, 3) = C(4, 1). Hence:

𝑃(𝑋4 = 2) = C(4, 3) 𝑝^3 𝑞^1 = C(4, 1) 𝑝^3 𝑞^1

Week 5

Lecture 10

Having finished our discussion of the random walk process, we now delve into another discrete
stochastic process, namely Markov Chain.

We begin by first defining the process:

A Markov chain {𝑋𝑛 }𝑛∈ℕ0 is a stochastic process with countably many states such that for all states
𝑖0 , 𝑖1 , 𝑖2 , … , 𝑖𝑛 , 𝑗

𝑃(𝑋𝑛+1 = 𝑗|𝑋0 = 𝑖0 , 𝑋1 = 𝑖1 , … , 𝑋𝑛 = 𝑖𝑛 ) = 𝑃(𝑋𝑛+1 = 𝑗|𝑋𝑛 = 𝑖𝑛 ) = 𝑝(𝑖𝑛 , 𝑗).

The quantity 𝑝(𝑖𝑛 , 𝑗) is known as the transition probability. We also observe that this is a case
of the memory-less property of Markov processes encountered and applied in previous lectures. For each
transition, there is a corresponding transition probability. These are best captured in a Markov Matrix or a
Transition Matrix, which is our next definition:

Let 𝑃 be a square matrix of entries 𝑝(𝑖, 𝑗) for all 𝑖, 𝑗 ∈ 𝑆, where 𝑆 is the state space. Then 𝑃 is called
the Markov Matrix if:

a) For any 𝑖, 𝑗 ∈ 𝑆, 𝑝(𝑖, 𝑗) ≥ 0


b) For each 𝑖 ∈ 𝑆, ∑𝑗∈𝑆 𝑝(𝑖, 𝑗) = 1

If we write 𝑝^𝑘 = 𝑝^𝑘(𝑖, 𝑗), we are referring to a k-step transition probability.

Next, we discuss the Chapman-Kolmogorov Equations:

i) 𝑃(𝑋0 = 𝑖0 , 𝑋1 = 𝑖1 , … , 𝑋𝑛−1 = 𝑖𝑛−1 , 𝑋𝑛 = 𝑖𝑛 ) = 𝜋(𝑖0 )𝑝(𝑖0 , 𝑖1 )𝑝(𝑖1 , 𝑖2 ) … 𝑝(𝑖𝑛−1 , 𝑖𝑛 ),
where 𝜋(𝑖0 ) = 𝑃(𝑋0 = 𝑖0 ), that is, the initial distribution. This is easily proved using the
definition of Markov chain and the Markov memory-less property.
ii) 𝑝^(𝑛+𝑚)(𝑖, 𝑗) = ∑_{𝑘=0}^{∞} 𝑝^𝑛(𝑖, 𝑘) 𝑝^𝑚(𝑘, 𝑗)
iii) 𝑃(𝑋𝑛 = 𝑗) = ∑_{𝑖=0}^{∞} 𝜋(𝑖) 𝑝^𝑛(𝑖, 𝑗)
iv) 𝑃^(𝑛+𝑚)_{𝑖,𝑗} = ∑_{𝑘=0}^{∞} 𝑃^𝑛_{𝑖,𝑘} 𝑃^𝑚_{𝑘,𝑗}
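
As a numerical illustration (my own sketch, not from the lectures; Python/NumPy and the small 3-state matrix are arbitrary choices), equations ii) and iv) say that the (n+m)-step transition matrix is the matrix product of the n-step and m-step matrices, and iii) is a vector-matrix product:

```python
import numpy as np

# A small illustrative Markov matrix (rows sum to 1); states are 0, 1, 2.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.3, 0.7]])

# Chapman-Kolmogorov: P^(n+m) = P^n @ P^m.
n, m = 2, 3
lhs = np.linalg.matrix_power(P, n + m)
rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)
print(np.allclose(lhs, rhs))                     # True

# Distribution at time n from an initial distribution pi_0: P(X_n = j) = (pi_0 P^n)_j.
pi_0 = np.array([1.0, 0.0, 0.0])
print(pi_0 @ np.linalg.matrix_power(P, 5))
```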

Lecture 11

This lecture addressed the remaining 3 equations. The second was similarly proved using the Markov
property. The third used the law of total probability and the concept of disjoint sets. The last and
fourth equation’s proof used the law of total probability as well. Solving two example problems
showed us how to draw a transition diagram and how to best make use of the transition matrix.

Week 6

Lecture 12

We begin by again discussing Markov Chain, where the process is defined by {𝑋𝑛 }𝑛∈ℕ0 . We revise our
understanding of the transition matrix and move towards stationary distributions.

Again,

{𝑋𝑛 }𝑛∈ℕ0 and 𝜋𝑛 denotes the distribution of the chain at time 𝑛. That is, 𝜋𝑛 (𝑖) is the probability that
the chain is in state 𝑖 at time 𝑛:

𝜋𝑛 (𝑖) = 𝑃{ 𝑋𝑛 = 𝑖}
𝜋𝑛+1 (𝑗) = 𝑃{ 𝑋𝑛+1 = 𝑗}
Then, using the law of total probability (from the Chapman-Kolmogorov equations, which we
discussed in our previous lecture):

𝜋𝑛+1 (𝑗) = 𝑃{ 𝑋𝑛+1 = 𝑗} = ∑_{𝑖∈𝑆} 𝑃{ 𝑋𝑛+1 = 𝑗 | 𝑋𝑛 = 𝑖} 𝑃{ 𝑋𝑛 = 𝑖}

𝜋𝑛+1 (𝑗) = ∑_{𝑖∈𝑆} 𝜋𝑛 (𝑖)𝑝(𝑖, 𝑗)

In vector form, 𝜋𝑛+1 = 𝜋𝑛 · 𝑃, where 𝜋𝑛 is the current distribution and 𝑃 is the Markov matrix.
For example, with 𝜋𝑛 = [𝜋1 𝜋2 ] and 𝑃 = [𝑎1 𝑎2 ; 𝑎3 𝑎4 ], we get 𝜋𝑛+1 = [𝜋1 𝑎1 + 𝜋2 𝑎3 , 𝜋1 𝑎2 + 𝜋2 𝑎4 ].
This result is helpful in understanding how the initial distribution of states (that is, the probability
vector 𝜋0 ) transforms in the future (through multiplication by the transition matrix 𝑃). That is, we are
predicting future behavior.

We have a proposition then:

𝜋𝑛+1 = 𝜋𝑛 · 𝑃

If the distribution does not change over time, starting from time zero, then

𝜋0 = 𝜋1 = 𝜋2 = ⋯ = 𝜋𝑛 = 𝜋𝑛+1 . Therefore, the above equation transforms into:

𝜋0 = 𝜋0 · 𝑃. If we let 𝑣 = 𝜋0 , then 𝑣 = 𝑣 · 𝑃, where 𝑣 is a row vector. In general we write

𝜋 = 𝜋 · 𝑃
We are interested in these ‘𝜋’s. We can carry out a number of transitions using the same matrix 𝑃,
using its powers. That is 𝑃 itself is a one-step transition, while 𝑃2 is a two-step, and 𝑃𝑛 is a n-step
transition.

How does this help us? It helps us determine the equilibrium state, which we find by using

𝜋 = 𝜋. 𝑃. The use of this is best explained through an example.

Lecture 13

Revising what was taught in the previous lecture regarding the equilibrium state, after having proved
that 𝑣𝑃 = 𝑣 , we find:

𝜋𝑛+1 = 𝜋𝑛 . 𝑃
If 𝑛 = 0,

𝜋1 = 𝜋0 . 𝑃
If 𝑛 = 1,

𝜋2 = 𝜋1 . 𝑃
𝜋2 = 𝜋0 . 𝑃. 𝑃
𝜋2 = 𝜋0 . 𝑃2 -> 2-step transition

If 𝑛 = 2,

𝜋3 = 𝜋2 · 𝑃

𝜋3 = 𝜋0 · 𝑃^2 · 𝑃

𝜋3 = 𝜋0 · 𝑃^3

Generally,

𝜋𝑛 = 𝜋0 · 𝑃^𝑛
Therefore, we multiply the initial distribution with the matrix, and if we don’t have the initial
distribution, we take powers of the matrix.

This lecture concludes on defining the regular transition matrix:

If its transition matrix is a regular matrix, then it is a regular Markov chain. A regular matrix is such
that some power of it contains all positive entries.

Solved Examples of Week 5 and Week 6:

1.
Write the above transition diagram as a transition matrix:
0.9 0.1 0
(0.1 0.6 0.3)
0 0.3 0.7
2.
0 1 0
(0.4 0.2 0.4)
1 0 0
Is the above matrix regular?

Yes. The 3rd power of this matrix has all positive entries and is equal to
0.48 0.44 0.08
(0.256 0.568 0.176)
0.4 0.2 0.4
3.
A transition matrix is given:

𝑃 = (2/5  3/5
      1    0 )

We wish to find the equilibrium probability vector 𝑝, that is, (𝑥 𝑦) such that

(𝑥  𝑦) 𝑃 = (𝑥  𝑦)

The product yields the following equations:

(2/5)𝑥 + 𝑦 = 𝑥
(3/5)𝑥 + 0𝑦 = 𝑦

We subtract 𝑥 and 𝑦 from equations 1 and 2 above, respectively:

−(3/5)𝑥 + 𝑦 = 0
(3/5)𝑥 − 𝑦 = 0

We take the first equation together with another derived from the fact that the entries of the probability
vector sum to 1:

−(3/5)𝑥 + 𝑦 = 0
𝑥 + 𝑦 = 1

Gaussian elimination yields:

𝑥 = 5/8
𝑦 = 3/8

Therefore, 𝑝 = (5/8  3/8).
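
The same equilibrium vector can be recovered numerically. The sketch below is my own (not from the lectures; Python/NumPy assumed); it finds the left eigenvector of 𝑃 for eigenvalue 1 and also shows that powers of 𝑃 converge to the same distribution:

```python
import numpy as np

P = np.array([[2/5, 3/5],
              [1.0, 0.0]])

# Solve v P = v with v summing to 1 (left eigenvector of P for eigenvalue 1).
eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
v = v / v.sum()
print(v)                                    # [0.625 0.375] = (5/8, 3/8)

# Taking powers of P gives the same answer: every row of P^n approaches the equilibrium vector.
print(np.linalg.matrix_power(P, 50))
```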
Week 7

Lecture 14

Our previous week’s discussion was on stationary distributions. Our new topic in these 2 weeks is
classification of Markov states.

Under classifications of these states, we have:

i) Recurrent state
a. Positive recurrent
b. Null recurrent
ii) Transient state
iii) Examples of i) and ii)

Before diving into i) and ii), we studied some basic definitions. One was on communication and
accessibility:

We suppose {𝑋𝑛 }𝑛∈𝑇 is a Markov chain process; then a state 𝑗 is said to be accessible from state 𝑖 if

𝑝^𝑛(𝑖, 𝑗) > 0 for some 𝑛 ≥ 0.

That is the probability of transition in n steps is positive, meaning something does happen and we do
transition from one state to another. Communication is only possible when this probability is positive.
If it is equal to 0, no communication occurs between the two states.

Using the Chapman-Kolmogorov equations:

𝑝^𝑛(𝑖, 𝑗) = 𝑃{𝑋𝑛 = 𝑗|𝑋0 = 𝑖} > 0. If this is possible, state 𝑗 is accessible from state 𝑖. In notation,

𝑖 ↔ 𝑗 means 𝑖 communicates with 𝑗. If it is one-sided, 𝑖 → 𝑗, this means that 𝑗 is accessible from 𝑖.


Communication, therefore, is a two-way accessibility.

Further, ↔ is a binary relation and is an equivalence relation, such that:

i) For each 𝑖 ∈ 𝑆, 𝑖 ↔ 𝑖 (which is also known as reflexivity)


ii) For each 𝑖, 𝑗 ∈ 𝑆, 𝑖 ↔ 𝑗 𝑖𝑓𝑓 𝑗 ↔ 𝑖 (which is also known as symmetry)
iii) For each 𝑖, 𝑗, 𝑘 ∈ 𝑆, if 𝑖 ↔ 𝑗 and 𝑗 ↔ 𝑘 then 𝑖 ↔ 𝑘 (which is also known as transitivity)

These properties must be considered when finding an equivalence class. Using the basic definitions of
communication and accessibility, we can define the classification of states as either recurrent or
transient.

This discussion now moves towards defining these classifications:

First, we define recurrent:

We define formally,

𝑃{𝑋𝑛 = 𝑖, 𝑋1 ≠ 𝑖, 𝑋2 ≠ 𝑖, … , 𝑋𝑛−1 ≠ 𝑖|𝑋0 = 𝑖}


This probability is denoted by 𝑓𝑖𝑖^𝑛. Also, we write 𝑓𝑖 = 𝑓𝑖𝑖 = ∑_{𝑛=1}^{∞} 𝑓𝑖𝑖^𝑛, which represents the
probability of ever returning to 𝑖.

a) 𝑖 is recurrent if 𝑓𝑖 = 1. That is, starting from state 𝑖, the chain will return to state 𝑖 within
a finite random time with probability 1.
Formally,
𝑓𝑖 = 𝑃{𝜏𝑖 < ∞|𝑋0 = 𝑖} = 𝑃{𝑋𝑛 = 𝑖 for some 𝑛 ≥ 1|𝑋0 = 𝑖} = 1
Here, 𝜏𝑖 represents the time taken to return to state 𝑖.

Next, we define transient:

State 𝑖 is called transient if 𝑓𝑖 < 1. Thus, if 𝑓𝑖 is the probability of returning to 𝑖, then we can say 1-𝑓𝑖 is
the probability of never returning to 𝑖. Formally, 1-𝑓𝑖 = 𝑃{𝜏𝑖 = ∞|𝑋0 = 𝑖}.

Moreover, if there is communication between 2 states and one state is recurrent, then the other will
also be recurrent.

Lecture 15

We now consider counting the total number of visits to state 𝑖 ∈ 𝑆. Given that 𝑋0 = 𝑖, we define:

𝑁𝑖 = ∑_{𝑛=0}^{∞} 𝐼{𝑋𝑛 = 𝑖}, where we have used an indicator function to count the total number
of visits. Then, we say 𝑖 is recurrent (𝑓𝑖 = 1) 𝑖𝑓𝑓 𝐸(𝑁𝑖 ) = ∞ and 𝑖 is transient (𝑓𝑖 < 1) 𝑖𝑓𝑓 𝐸(𝑁𝑖 ) < ∞.

This is defined as a theorem:

𝐸(𝑁𝑖 |𝑋0 = 𝑖) = ∑_{𝑛=0}^{∞} 𝑝^𝑛(𝑖, 𝑖)

This is easily proved using an indicator function 𝐼𝑛 , which is equal to 1 if 𝑋𝑛 = 𝑖 and 0 otherwise.
Letting 𝑁𝑖 equal the infinite sum of this indicator function, we can find the expected value of 𝑁𝑖 with
help from the Chapman-Kolmogorov equations.

Next, we define positive recurrent:

A state 𝑖 is positive recurrent if 𝑓𝑖 = 1 and 𝜇𝑖 < ∞, where 𝜇𝑖 = 𝐸(𝜏𝑖 |𝑋0 = 𝑖) is the expected return
time. A null recurrent state, on the other hand, is one with 𝑓𝑖 = 1 and 𝜇𝑖 = ∞.

We then considered the relationship of the transient state and the geometric distribution:

Supposing 𝑖 is transient, that is 𝑓𝑖 < 1 and 𝑁𝑖 is equal to the number of visits to state 𝑖, then:
𝑃{𝑁𝑖 = 𝑗|𝑋0 = 𝑖} = 𝑓𝑖^(𝑗−1) (1 − 𝑓𝑖 ), where 𝑗 = 1, 2, 3, …

What we see is the probability mass function of the geometric distribution. After proving this, we
move towards another classification, that is, periodic and aperiodic states:

A state in a Markov chain is periodic if the chain can return to the state only at multiples of some
integer greater than 1.

This is formally defined as:

𝛿(𝑖), 𝑖 ∈ 𝑆, is the period of the state and 𝛿(𝑖) = gcd {𝑛 ∈ ℕ: 𝑝𝑛 (𝑖, 𝑖) > 0}

i) 𝑖 is aperiodic if 𝛿(𝑖) = 1
ii) 𝑖 is periodic if 𝛿(𝑖) > 1
Moreover, if 𝑖 ↔ 𝑗, then 𝛿(𝑖) = 𝛿(𝑗).

Moving towards defining the class of states:

A class consists of elements which communicate between themselves and with no element from
outside.

Our final definition for this lecture is the irreducible Markov chain:

A Markov chain is called irreducible if there is only one (equivalence) class, that is, 𝑖𝑓𝑓 all states
communicate with each other, otherwise the Markov chain is reducible.

Week 8

Lecture 16

We enhance our understanding of irreducible chains through the following definitions:

i) A set of states is said to be closed if no state outside it can be reached from any state in it.
ii) A state forming a closed set by itself is called the absorbing state.
iii) A closed set is irreducible if no proper subset of it is closed.
iv) A Markov chain is called irreducible if its only closed set is the set of all states.

We further explore these definitions:

For instance, we consider the definition ii) and iv):

a) State 𝑗 ∈ 𝑆 is absorbing 𝑖𝑓𝑓 𝑝(𝑗, 𝑗) = 1.


b) A Markov chain is irreducible 𝑖𝑓𝑓 all states can be reached from each other.

We note that if we have a closed set C, then deleting from the transition matrix those rows and
columns which correspond to the states not in C produces another transition/Markov matrix. This is
captured in the following proposition:

Let 𝐶 = {𝑐1 , 𝑐2 , … } ⊂ S be a closed set and define 𝑄(𝑖, 𝑗) = 𝑃(𝑐𝑖 , 𝑐𝑗 ), 𝑐𝑖 , 𝑐𝑗 ∈ 𝐶. Then 𝑄 is a Markov
matrix.

What follows then is a lemma:

For each recurrent state 𝑗, ∃ an irreducible closed set 𝐶 which includes 𝑗.

Next is a corollary:

Let 𝐶 be an irreducible closed set of finitely many states. Then:

i) no state in 𝐶 is null recurrent;
ii) 𝐶 has no transient states.

The new definitions learned are being related to definitions learned in the previous lecture. This helps
to build our understanding of these core concepts.

We conclude on the definitions of absorbing states and absorbing Markov chains:

i) A state 𝑖 ∈ 𝑆 of {𝑋𝑛 }𝑛∈ℕ is absorbing if 𝑝(𝑖, 𝑖) = 1


ii) A Markov chain is absorbing 𝑖𝑓𝑓
a. The chain has at least one absorbing state
b. It is possible to go from any non-absorbing state to any absorbing state. This may be
in more than one step.

Knowing this definition, we can finally delve into the applications of the absorbing Markov chain:

If {𝑋𝑛 }𝑛≥0 is a discrete time Markov chain, then (listing the absorbing states first) we can decompose the
Markov matrix 𝑃 as:

𝑃 = [𝐼  𝑂
     𝑅  𝑄]

where 𝐼 is the identity matrix consisting of the transition probabilities from absorbing states to
themselves, 𝑂 is the null matrix consisting of the transition probabilities from absorbing to
transient states, 𝑅 is the matrix consisting of the transition probabilities from transient to absorbing
states, and 𝑄 is the matrix consisting of transition probabilities from transient states to themselves.

Lecture 17

We continue this idea of absorbing Markov chain, using the same application defined above. We now
accompany it with a theorem:

For an absorbing Markov chain, the matrix 𝐼 − 𝑄 has an inverse 𝐹, where 𝐹 = 𝐼 + 𝑄 + 𝑄 2 + ⋯

That is, 𝐹 = (𝐼 − 𝑄)−1 and 𝐹 is called the fundamental matrix corresponding to 𝑄.

This theorem is easily proved, and exploring powers of the transition matrix using the result gives us:

𝑃^𝑘 = [𝐼𝑚  𝑂
       𝐹𝑅  𝑄^𝑘]
The product 𝐹𝑅 gives us the matrix of probabilities that a particular initial non-absorbing state will
lead to a particular absorbing state.

The fundamental matrix 𝐹 gives us the expected number of visits to each state before absorption
occurs.

Similar to our application of the random walk process to the gambler’s ruin problem, we now apply
the Markov chain in the same capacity. We write the problem as:

Consider a gambler who at each play has probability 𝑝 of winning one unit and probability 𝑞 = 1 − 𝑝
of losing one unit. Assuming that successive plays of the game are independent (following the
definition of random walk), what is the probability that, starting with 𝑖 units, the gambler’s fortune
will reach 𝑁 (the maximum value) before reaching 0.

This is solved by letting {𝑋𝑛 }𝑛∈{0,1,2,… } denote the player’s fortune. It is a Markov chain with transition
probability 𝑝00 = 𝑝𝑁𝑁 = 1, whereas 𝑝𝑖,𝑖+1 = 𝑝 = 1 − 𝑝𝑖,𝑖−1 , where 𝑖 = 1,2,3, … , 𝑁 − 1. Then,
{1,2, … , 𝑁 − 1} are transient states, and {0} and {𝑁} are absorbing. Letting 𝑃𝑖 denote the probability
that starting with 𝑖 the gambler’s fortune will eventually reach 𝑁, where 𝑖 = 0,1, … , 𝑁. Constructing
a similar difference equation as that in random walk, but where the random variables are written in
terms of transition probabilities, we get:
𝑃𝑖+1 − 𝑃𝑖 = (𝑞/𝑝)(𝑃𝑖 − 𝑃𝑖−1 ). This is equation 1. Knowing that 𝑃0 = 0, and putting 𝑖 = 1 into the
equation, and doing similarly for 𝑖 = 2, and so on, leads us to form:

𝑃𝑖 − 𝑃𝑖−1 = (𝑞/𝑝)^(𝑖−1) 𝑃1 . Adding the first 𝑖 − 1 terms leads us to equations similar to those in the
random walk:

𝑃𝑖 = [(1 − (𝑞/𝑝)^𝑖) / (1 − 𝑞/𝑝)] · 𝑃1 , if 𝑞/𝑝 ≠ 1, and 𝑃𝑖 = 𝑖𝑃1 , if 𝑞/𝑝 = 1. Using the fact that at 𝑖 = 𝑁,
𝑃𝑁 = 1, putting 𝑁 into both results gives us the value of 𝑃1 , which can subsequently be substituted.

Firstly,

𝑃𝑖 = (1 − (𝑞/𝑝)^𝑖) / (1 − (𝑞/𝑝)^𝑁), if 𝑝 ≠ 1/2

and 𝑃𝑖 = 𝑖/𝑁, if 𝑝 = 1/2.

We see that the first result, where 𝑝 ≠ 1/2, is comparable to the result obtained for 𝑈𝑛 in the application
of the random walk.

Moreover, if 𝑁 → ∞:

𝑃𝑖 = 1 − (𝑞/𝑝)^𝑖 if 𝑝 > 1/2, and 𝑃𝑖 = 0 if 𝑝 ≤ 1/2.
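
The decomposition and fundamental matrix of the previous lecture can be used to reproduce these probabilities numerically. The sketch below is my own (not from the lectures); it assumes Python/NumPy with arbitrary 𝑝 = 0.4 and 𝑁 = 5, builds the gambler's-ruin transition matrix, extracts 𝑄 and 𝑅, forms 𝐹 = (𝐼 − 𝑄)^(−1), and compares the column of 𝐹𝑅 for absorption at 𝑁 with the closed-form 𝑃𝑖:

```python
import numpy as np

p, N = 0.4, 5
q = 1 - p

# Gambler's ruin chain on states 0..N; states 0 and N are absorbing.
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0
for i in range(1, N):
    P[i, i + 1] = p
    P[i, i - 1] = q

transient = list(range(1, N))                    # states 1..N-1
absorbing = [0, N]
Q = P[np.ix_(transient, transient)]              # transient -> transient
R = P[np.ix_(transient, absorbing)]              # transient -> absorbing

F = np.linalg.inv(np.eye(len(transient)) - Q)    # fundamental matrix
FR = F @ R                                       # absorption probabilities

# Column 1 of FR (absorption at N) should match P_i = (1 - (q/p)**i) / (1 - (q/p)**N).
r = q / p
P_i = [(1 - r**i) / (1 - r**N) for i in transient]
print(FR[:, 1])
print(np.array(P_i))
```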

Solved Examples of Week 7 and 8:

1.

Consider the chain on states 1, 2, 3 with

𝑃 = (1/2  1/2  0
     1/2  1/4  1/4
     0    1/3  2/3)

Is this an irreducible chain?

We know that 1 → 2 and 2 → 1, therefore 1 ↔ 2. Similarly, 2 → 3 and 3 → 2, therefore 2 ↔ 3. It
is an irreducible chain.

2.
0.15 0.05 0.8
( 0 1 0)
0.4 0.6 0
Find the absorbing states and check whether the chain is absorbing. We can see that state 2 is absorbing
since its transition probability to itself is 1; it is also accessible from both 1 and 3. Having at least one
absorbing state, and that absorbing state being accessible from the non-absorbing states, implies that the
chain is absorbing.

3.
Considering the above transition diagram, identify recurrent and transient states:

The recurrent states are 5, 6, 7, and 8. The transient states are 1, 2, 3, 4. We can see that in the case
of recurrent states j the p(j,j)>0.

Week 9

Lecture 18

As the previous lecture concluded with the application of the Markov chain to the gambler's ruin problem,
in Martingales we begin right away by discussing betting strategies.

This is a new topic in discrete time. Here the idea is betting strategies for:

i) Fair game
ii) Unfair game

Let 𝑀0 , 𝑀1 , 𝑀2 , … , 𝑀𝑛 denote a process which shows the amount of money at time n for a gambler
betting on a fair game. Then:

𝐸[𝑀𝑛+1 |𝑀0 = 𝑚0 , 𝑀1 = 𝑚1 , … , 𝑀𝑛 = 𝑚𝑛 ] = 𝑀𝑛 = 𝑚𝑛 . This is equation 1. It is similar to what we


have done before, where 𝑀0 , 𝑀1 , 𝑀2 , … , 𝑀𝑛−1 denotes the past, 𝑀𝑛 denotes the present, and 𝑀𝑛+1
future time.

In other words, 𝑀0 , 𝑀1 , 𝑀2 , … , 𝑀𝑛 form a martingale if equation 1 holds and, in addition, 𝐸(|𝑀𝑛 |) < ∞.

𝐸[𝑀𝑛+1 − 𝑀𝑛 |𝑀0 , 𝑀1 , … , 𝑀𝑛 ] = 0. This is equation 2. Equation 1 tells us that the expected value
of the future value is the present value. Equation 2, on the other hand, tells us that the expected change
between the present and the future value is zero.

𝐸[𝑀𝑛+1 |𝑀0 , 𝑀1 , … , 𝑀𝑛 ] can be rewritten as 𝐸[𝑀𝑛+1 |𝐹𝑛 ] = 𝑀𝑛 , where 𝐹𝑛 = 𝜎(𝑀0 , 𝑀1 , … , 𝑀𝑛 ) is an


information set, which is sigma generated by the random variables 𝑀0 , 𝑀1 , … , 𝑀𝑛 .

We also note that 𝐹𝑛 ⊂ 𝐹𝑛+1 ⊂ 𝐹𝑛+2 ⊂ ⋯


This is easily extended to show, using nested expectations that:

i) 𝐸[𝑀𝑛+1 |𝐹𝑛 ] = 𝑀𝑛
ii) 𝐸[𝑀𝑛+2 |𝐹𝑛 ] = 𝑀𝑛
iii) 𝐸[𝑀𝑛+3 |𝐹𝑛 ] = 𝑀𝑛
A coin toss problem introduces us to the concepts of sub-martingale and super-martingale.
This is a consideration of a series of games based on a coin toss experiment, given that:

𝑃{𝑋𝑖 = 1} = 𝑝 and 𝑃{𝑋𝑖 = −1} = 𝑞. Also, we recall from the random walk that 𝐸(𝑋𝑖 ) = 𝑝 − 𝑞. The total
gain is 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 .

Here, 𝐹𝑛 = 𝜎(𝑋1 , 𝑋2 , … , 𝑋𝑛 ). If we are interested in computing the average gain after the (𝑛 + 1)th
step, given the information up to time n:

𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛
𝑆𝑛+1 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 + 𝑋𝑛+1
𝑆𝑛+1 = 𝑆𝑛 + 𝑋𝑛+1 , where 𝑋𝑛+1 is independent of 𝐹𝑛 . Therefore:

𝐸[𝑆𝑛+1|𝐹𝑛 ] = 𝐸[𝑆𝑛 + 𝑋𝑛+1 |𝐹𝑛 ] = 𝐸[𝑆𝑛 |𝐹𝑛 ] + 𝐸[𝑋𝑛+1 |𝐹𝑛 ] = 𝑆𝑛 + 𝐸[𝑋𝑛+1 ] = 𝑆𝑛 + 𝑝 − 𝑞


The information 𝑆𝑛 is known, therefore is a constant, and 𝑋𝑛+1 is independent. The final result tells
us that if 𝑝 = 𝑞, then 𝐸[𝑆𝑛+1 |𝐹𝑛 ] = 𝑆𝑛 and is therefore a martingale. But when 𝑝 ≠ 𝑞, two cases
follow. The first is when 𝑝 > 𝑞, and is a sub-martingale, such that 𝐸[𝑆𝑛+1 |𝐹𝑛 ] > 𝑆𝑛 . The second is
when 𝑝 < 𝑞, and is a super-martingale, such that 𝐸[𝑆𝑛+1 |𝐹𝑛 ] < 𝑆𝑛 .

Lecture 19

The idea in the previous lecture is now formally defined. Moreover, we prove 4 different problems as
being martingales.

First, the definition:

A sequence of random variables 𝑀0 , 𝑀1 , 𝑀2 , … , 𝑀𝑛 is said to be a martingale (sub-martingale or super-


martingale) with respect to an increasing sequence 𝐹0 ⊂ 𝐹1 ⊂ 𝐹2 ⊂ ⋯ of 𝜎-fields (information sets) if

i) 𝑀𝑛 is determined by 𝐹𝑛
ii) 𝐸(|𝑀𝑛 |) < ∞
iii) 𝐸[𝑀𝑛+1 |𝐹𝑛 ] = 𝑀𝑛 (martingale)
iv) 𝐸[𝑀𝑛+1 |𝐹𝑛 ] ≥ 𝑀𝑛 (sub-martingale)
v) 𝐸[𝑀𝑛+1 |𝐹𝑛 ] ≤ 𝑀𝑛 (super-martingale)

These are all ideas discussed in the previous lecture; they have now been collected into one formal
definition. The lecture then had 4 problems where we had to show equality iii) above. All 4 cases were
proved to be true. Two of the better-known ones were the square and exponential martingales.

Week 10

Lecture 20

We begin with a discussion of Doob’s Martingale. We now know the concept of martingale, sub/super
martingale. In Doob’s martingale, we can generalize the concept of martingale with respect to another
sequence of random variables.

Let us consider the definition:

Let {𝑋0 , 𝑋1 , … } and {𝑌0 , 𝑌1 , … } be two discrete-time stochastic processes with 𝐸(|𝑋𝑛 |) < ∞, ∀𝑛 = 0, 1, …

Then the random sequence {𝑋0 , 𝑋1 , … } is a martingale with respect to {𝑌0 , 𝑌1 , … }, or a Doob
martingale, if formally:

𝐸(𝑋𝑛+1 − 𝑋𝑛 |𝑌0 = 𝑦0 , 𝑌1 = 𝑦1 , … , 𝑌𝑛 = 𝑦𝑛 ) = 0 implies martingale.


𝐸(𝑋𝑛+1 − 𝑋𝑛 |𝑌0 = 𝑦0 , 𝑌1 = 𝑦1 , … , 𝑌𝑛 = 𝑦𝑛 ) ≤ 0 implies super-martingale.

𝐸(𝑋𝑛+1 − 𝑋𝑛 |𝑌0 = 𝑦0 , 𝑌1 = 𝑦1 , … , 𝑌𝑛 = 𝑦𝑛 ) ≥ 0 implies sub-martingale.

We prove a theorem that states Doob’s process is a martingale, using similar methods as done in the
previous lecture’s examples. Several connected examples are discussed, including variance martingale
and doubling strategy. Some important remarks at the end of this discussion are:

i) After every win, the game starts anew and the 𝑆𝑖 are adjusted accordingly.
ii) If the gambler loses at time N +1, his total winnings become 0.
iii) Since N is random, 𝑆𝑖 are random also.
iv) If the gambler decides to stop the game at a fixed time, say “n”, then he cannot
expect to have made any winnings if {𝑌1 , 𝑌2 , … } is a super-martingale, that is, if 𝑝 ≤ 1/2.
v) Therefore, the gambler should not stop the game (doubling strategy) at any fixed time. That is,
the gambler must have an unlimited amount of initial capital.

Lecture 21

Now, we discuss stopping time/Markov time/optimal time and optimal stopping theorem. We define
it as:

Let {𝐹𝑛 , 𝑛 ≥ 1} be an increasing sequence of 𝜎-fields (information sets). A random variable 𝑇(𝜔) taking
values in the set {0, 1, 2, … , ∞} is called a stopping time with respect to {𝐹𝑛 , 𝑛 ≥ 1} if for every n

{𝑇 ≤ 𝑛} ∈ 𝐹𝑛 and 𝑃{𝑇 < ∞} = 1.

Optimal Stopping Theorem (Doob’s optional stopping theorem):

Let 𝑀0 , 𝑀1 , 𝑀2 , … be a martingale process with respect to 𝐹𝑛 and 𝑇 be a bounded stopping time, then:

𝐸(𝑀𝑇 ) = 𝐸(𝑀0 )
The proof of this follows using the fact that 𝑇 is bounded and 𝑀𝑇 can be written as a sum of
increments, as in the Bernoulli process. An indicator function 𝐼{𝑇≥𝑘} is introduced, where
𝑘 is the index in the summation of the increments. This indicator takes the value 1 when 𝜔 ∈ {𝑇 ≥ 𝑘}
and 0 otherwise. Applying expectation on both sides and manipulating the indicator function
allows the result to be proven.

It is of interest, however, that to properly conclude this topic of martingales, an application on the
gambler’s ruin problem is necessary, just as done in random walk and Markov chain. The problem is
worded similar to how it is done in Markov chain. It is important to recognize that this is an application
of Doob’s optimal stopping theorem, and the case being discussed is that of a symmetric random walk,
where initial capital for gamblers 𝐺1 and 𝐺2 is a and b dollars respectively. The same condition of them
continuing the game till one of them is ruined is applied. Our goal is to determine the probability of
𝐺1 being ruined, and to determine the expected number of games played.

We let 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 and 𝑆𝑛* = 𝑎 + 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 , which is the fortune of gambler 1
after the nth game, with 𝑋𝑖 ~𝑖𝑖𝑑. As the game is only stopped when either gambler 1 or gambler 2
is ruined, we have:

𝑇 = min {𝑛: 𝑆𝑛* = 0 or 𝑆𝑛* = 𝑎 + 𝑏}

or, equivalently, in terms of 𝑆𝑛 ,

𝑇 = min {𝑛: 𝑆𝑛 = −𝑎 or 𝑆𝑛 = 𝑏}
We know that {𝑆𝑛 , 𝑛 ≥ 1} is a martingale process, and at time T, 𝑆𝑇 = −𝑎 or 𝑆𝑇 = 𝑏. Then,
𝑃(𝑆𝑇 = −𝑎) + 𝑃(𝑆𝑇 = 𝑏) = 1 -> Equation a, where {𝑆𝑇 = −𝑎} is the event that gambler 1 will be ruined.

Also,

𝐸(𝑆𝑇 ) = (−𝑎)𝑃(𝑆𝑇 = −𝑎) + (𝑏)𝑃(𝑆𝑇 = 𝑏) = 0 -> Equation b.

Solving equations a and b allows us to obtain:


𝑃(𝑆𝑇 = −𝑎) = 𝑏/(𝑎 + 𝑏), when gambler 1 is ruined
𝑃(𝑆𝑇 = 𝑏) = 𝑎/(𝑎 + 𝑏), when gambler 2 is ruined.

We also find the expected number of games, that is 𝐸(𝑇), by using the fact that 𝑌𝑛 = 𝑆𝑛^2 − 𝑛 is a
martingale process, so that by the optional stopping theorem 𝐸(𝑌𝑇 ) = 𝐸(𝑌0 ) = 0. Hence

𝐸(𝑆𝑇^2) = 𝐸(𝑇), that is, 𝐸(𝑇) = 𝐸(𝑆𝑇^2) = (−𝑎)^2 𝑃(𝑆𝑇 = −𝑎) + 𝑏^2 𝑃(𝑆𝑇 = 𝑏), which gives us:

𝐸(𝑇) = 𝑎𝑏.
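
Both conclusions can be checked by simulation. The sketch below is my own (not from the lectures); it assumes Python/NumPy with arbitrary capitals a = 4 and b = 6, plays the symmetric game repeatedly, and estimates the ruin probability and the expected duration:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b, n_paths = 4, 6, 20_000

ruined, durations = 0, []
for _ in range(n_paths):
    fortune, t = a, 0
    # Symmetric game: gambler 1 starts with a, gambler 2 with b; play until one is ruined.
    while 0 < fortune < a + b:
        fortune += 1 if rng.random() < 0.5 else -1
        t += 1
    ruined += fortune == 0
    durations.append(t)

print(ruined / n_paths, b / (a + b))      # P(gambler 1 ruined) ~ b/(a+b)
print(np.mean(durations), a * b)          # E(T) ~ ab
```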

We also apply it to a case of an unfair game. We use the exponential martingale, which is linked with
the unfair game and obtain two results using Doob’s theorem to find when the random walk of the
gambler hits 0 before hitting N:

𝑃(𝑆𝑇 = 0) = ((𝑞/𝑝)^𝑘 − (𝑞/𝑝)^𝑁) / (1 − (𝑞/𝑝)^𝑁)

Similarly,

𝑃(𝑆𝑇 = 𝑁) = (1 − (𝑞/𝑝)^𝑘) / (1 − (𝑞/𝑝)^𝑁)

We recognize that as we move from process to process, applying the process to retrieve the final result
in the gambler’s ruin problem becomes easier. This will also be the case in the next and final topic
(Brownian Motion).

Solved Examples of Week 9 and Week 10:

1. Suppose that 𝑋1 , 𝑋2 , … are 𝑖𝑖𝑑 random variables with expected value 𝜇 = 𝐸(𝑋𝑖 ). We want to
show that

𝑀𝑛 = (𝑋1 · 𝑋2 ⋯ 𝑋𝑛 )/𝜇^𝑛

is a martingale with respect to 𝑋1 , 𝑋2 , …

We know:

𝑀𝑛+1 = 𝑀𝑛 · 𝑋𝑛+1 /𝜇

Therefore,

𝐸(𝑀𝑛+1 |𝐹𝑛 ) = 𝐸(𝑀𝑛 𝑋𝑛+1 /𝜇 | 𝐹𝑛 ) = (𝑀𝑛 /𝜇) 𝐸(𝑋𝑛+1 |𝐹𝑛 ) = (𝑀𝑛 /𝜇) 𝐸(𝑋𝑛+1 ) = (𝑀𝑛 /𝜇) 𝜇 = 𝑀𝑛
2. Suppose 𝑋𝑖 = −1 with probability 2/3 and 𝑋𝑖 = 2 with probability 1/3, with the 𝑋𝑖 independent. We
want to show that 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 is a martingale with respect to 𝑋1 , 𝑋2 , …

We can see that the expected value of 𝑋𝑖 is 0. We use this information in our proof.

𝐸(𝑆𝑛+1 |𝐹𝑛 ) = 𝐸(𝑆𝑛 + 𝑋𝑛+1 |𝐹𝑛 ) = 𝑆𝑛 + 𝐸(𝑋𝑛+1 |𝐹𝑛 ) = 𝑆𝑛 + 𝐸(𝑋𝑛+1 ) = 𝑆𝑛 + 0 = 𝑆𝑛
Week 11

Lecture 22

Here, we begin discussing Brownian Motion which is a continuous stochastic process. We define it as:

A Brownian motion (BM) is a stochastic process {𝐵𝑡 }𝑡≥0, which satisfies:

i) The process starts at 𝐵0 = 0


ii) 𝐵𝑡 has stationary independent increments, that is 𝐵𝑡1 − 𝐵𝑡0 , 𝐵𝑡2 − 𝐵𝑡1 , … . , 𝐵𝑡𝑘 −
𝐵𝑡𝑘−1 are independent for every 𝑡1 , 𝑡2 , 𝑡3 , … , 𝑡𝑘 . This is a result that comes from the
Bernoulli stochastic process.
iii) 𝑇ℎ𝑒 𝑝𝑟𝑜𝑐𝑒𝑠𝑠 {𝐵𝑡 }𝑡≥0 is continuous in t.
iv) {𝐵𝑡 }𝑡≥0 is distributed as 𝑁(0, 𝜎^2 𝑡), 𝑡 ≥ 0, where 𝜎^2 > 0 is a constant; that is (taking 𝜎^2 = 1),
𝐵𝑡 − 𝐵𝑠 ~ 𝑁(0, |𝑡 − 𝑠|), where 𝑠 < 𝑡

Proposition: If {𝐵𝑡 }𝑡≥0 is normally distributed as above, then

i) 𝐸(𝐵𝑡 ) = 0
ii) 𝐸(𝐵𝑡^2) = 𝑡

These two are proved using the probability density function and cumulative distribution function of the
normal distribution; knowledge of integration methods is required. Obtaining these results also allows us
to reach the conclusion that 𝑉𝑎𝑟(𝐵𝑡 ) = 𝑡.

We then move towards defining the Wiener Process:

The BM is called the Wiener process if

i) 𝑊𝑡 starts at 𝑡 = 0 and 𝑊0 = 0
ii) 𝑊𝑡 is 𝐹𝑡 -martingale with 𝐸(𝑊𝑡2 ) < ∞, ∀𝑡 ≥ 0
iii) The process 𝑊𝑡 is continuous in 𝑡

Our extra claim is for Wt to be a martingale. This leads us towards the proposition that the Brownian
Motion is a martingale:

This is proved by considering 𝐹𝑡 to be the information set and letting 𝑠 < 𝑡. We then write:

𝑊𝑡 = 𝑊𝑠 + (𝑊𝑡 − 𝑊𝑠 )
That is, we are now dealing with independent increments. Our goal is to show that

𝐸(𝑊𝑡 |𝐹𝑠 ) = 𝑊𝑆 . This is easily done.

Our next proposition then follows:

{𝑊𝑡 }𝑡≥0 is a Wiener process; then, for 𝑠 < 𝑡:

i) 𝐸(𝑊𝑡 𝑊𝑠 ) = min(𝑠, 𝑡) = 𝑠
ii) 𝐶𝑜𝑣(𝑊𝑡 , 𝑊𝑠 ) = 𝑠
iii) 𝐶𝑜𝑟𝑟(𝑊𝑡 , 𝑊𝑠 ) = √(𝑠/𝑡)

This is easily done knowing that it is a martingale. The next proposition: 𝐸(𝑊𝑡 − 𝑊𝑠 )^2 = 𝑡 − 𝑠. This is
also equal to 𝑉𝑎𝑟(𝑊𝑡 − 𝑊𝑠 ). It is proved by expanding the square.
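
These moment identities are easy to verify numerically. The sketch below is my own (not from the lectures); it assumes Python/NumPy with arbitrary times s = 0.3 and t = 0.8, sampling 𝑊𝑠 and the independent increment 𝑊𝑡 − 𝑊𝑠 directly from their normal distributions:

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths = 200_000
s, t = 0.3, 0.8

# W_s ~ N(0, s) and the increment W_t - W_s ~ N(0, t - s), independent of W_s.
W_s = rng.normal(0.0, np.sqrt(s), n_paths)
W_t = W_s + rng.normal(0.0, np.sqrt(t - s), n_paths)

print(W_t.mean())                         # E(W_t) ~ 0
print((W_t**2).mean(), t)                 # E(W_t^2) ~ t
print((W_s * W_t).mean(), min(s, t))      # E(W_s W_t) ~ min(s, t) = s
```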

Lecture 23

We now define Brownian motion with drift:

{𝑋𝑡 }𝑡≥0 is a Brownian motion process with drift coefficient “𝜇” and variance parameter 𝜎 2 if

i) 𝑋0 = 0
ii) 𝑋𝑡 , 𝑡 ≥ 0 has stationary independent increments
iii) 𝑋𝑡 is normally distributed with mean 𝜇𝑡 and variance 𝜎 2 𝑡

Equivalently,

𝑋𝑡 = 𝜇𝑡 + 𝜎𝐵𝑡 , where {𝐵𝑡 }𝑡≥0 is a standard BM process.

Applying expectation on both sides

𝐸(𝑋𝑡 ) = 𝜇𝑡 + 𝜎𝐸(𝐵𝑡 ) = 𝜇𝑡. We recall the previous lecture to obtain this result.

Also,

𝑉𝑎𝑟(𝑋𝑡 ) = 𝑉𝑎𝑟(𝜇𝑡 + 𝜎𝐵𝑡 ) = 𝑉𝑎𝑟(𝜎𝐵𝑡 ) = 𝜎 2 𝑉𝑎𝑟(𝐵𝑡 ) = 𝜎 2 𝑡. Similarly, what we studied in the last
class has helped us arrive at this result.

Note: 𝑋𝑡 − 𝑋𝑠 ~𝑁(𝜇(𝑡 − 𝑠), 𝜎 2 (𝑡 − 𝑠)), 𝑠 < 𝑡

For example, if 𝑋0 = 𝑎, then:

𝑃(𝑋𝑡 ≤ 𝑥|𝑋0 = 𝑎) = 𝑃(𝜇𝑡 + 𝜎𝐵𝑡 ≤ 𝑥|𝜎𝐵0 = 𝑎) = 𝑃(𝐵𝑡 ≤ (𝑥 − 𝜇𝑡)/𝜎 | 𝐵0 = 𝑎/𝜎)
= 𝑃(𝐵𝑡 ≤ ((𝑥 − 𝑎) − 𝜇𝑡)/𝜎)
A new theorem:

If {𝐵𝑡 }𝑡≥0 is a BM process with respect to 𝐹𝑡 , then:

𝑌𝑡 = 𝐵𝑡2 − 𝑡 is a martingale. This is also called variance martingale, which we briefly mentioned last
week.

i) 𝐸(|𝑌𝑡 |) < ∞. This can be proved by using the equation in the above statement and the
results obtained from a proposition in the last lecture, namely the expected value of 𝐵𝑡^2.
ii) 𝐸(𝐵𝑡^2|𝐹𝑠 ), 𝑠 < 𝑡, is computed; it equals 𝐵𝑠^2 + (𝑡 − 𝑠), so that 𝐸(𝑌𝑡 |𝐹𝑠 ) = 𝐵𝑠^2 − 𝑠 = 𝑌𝑠 .
Next, we discuss the geometric Brownian motion (GBM):

If {𝑋𝑡 }𝑡≥0 is a BM process with drift coefficient 𝜇 and variance 𝜎^2, then {𝑌𝑡 }𝑡≥0 is called a GBM if
𝑌𝑡 = 𝑒^𝑋𝑡. Then we find 𝐸(𝑌𝑡 |𝐹𝑠 ) = 𝑌𝑠 𝑒^((𝑡−𝑠)(𝜇 + 𝜎^2/2)). The GBM is a useful process in modelling
stock prices.

What follows is another theorem:

𝑒^(𝑎𝐵𝑡 − 𝑎^2𝑡/2) is a martingale process, where {𝐵𝑡 }𝑡≥0 is a BM process. This is shown by proving that
the expected value of the above with respect to the information set 𝐹𝑠 , where 𝑠 < 𝑡, gives us
𝑒^(𝑎𝐵𝑠 − 𝑎^2𝑠/2).
A proposition: If 𝐸(𝑒^(𝑎𝐵𝑡)) = 𝑒^(𝑎^2𝑡/2), then, where 𝑌𝑡 = 𝑒^(𝑎𝐵𝑡), we write:

i) 𝐸(𝑌𝑡 ) = 𝑒^(𝑡/2)
ii) 𝐸(𝑌𝑡^2) = 𝑒^(2𝑡)
iii) 𝑉𝑎𝑟(𝑌𝑡 ) = 𝑒^(2𝑡) − 𝑒^𝑡

Supposing 𝑎 to equal 1, we show these to be true and remark that 𝑌𝑡 = 𝑒^𝐵𝑡 is log-normally distributed
with mean 𝑒^(𝑡/2) and variance 𝑒^(2𝑡) − 𝑒^𝑡.
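
These log-normal moments can also be checked by simulation. The sketch below is my own (not from the lectures); it assumes Python/NumPy with t = 1 chosen arbitrarily, samples 𝐵𝑡 ~ 𝑁(0, 𝑡), and compares the sample moments of 𝑌𝑡 = 𝑒^𝐵𝑡 with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(7)
t, n_paths = 1.0, 500_000

B_t = rng.normal(0.0, np.sqrt(t), n_paths)
Y_t = np.exp(B_t)                            # Y_t = e^{B_t} is log-normal

print(Y_t.mean(), np.exp(t / 2))             # E(Y_t) = e^{t/2}
print((Y_t**2).mean(), np.exp(2 * t))        # E(Y_t^2) = e^{2t}
print(Y_t.var(), np.exp(2 * t) - np.exp(t))  # Var(Y_t) = e^{2t} - e^{t}
```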

Bibliography

Cinlar, E. (2013). Introduction to Stochastic Processes. Mineola, NY: Dover.

http://onlinestatbook.com/2/probability/binomial.html

https://users.math.msu.edu/users/drachman/math106spring01/lec18_long_range_predictions_ans
wers.pdf

Addison-Wesley: Markov Chains PDF.

https://www.math.ucdavis.edu/~gravner/MAT135B/materials/ch13.pdf

https://www.youtube.com/watch?v=wSQaYn2h-e8

http://people.brandeis.edu/~igusa/Math56aS08/newHW_5.pdf
