
Module B: Random Processes

A random process is a family/collection of random variables indexed by a set T, denoted {Xt }t∈T .

The set T is often interpreted as "time."


 
• When T = {1, 2, . . . , n}, then {Xt }t∈T = (X1 , X2 , . . . , Xn )^⊤ is a random vector.
• When T = {1, 2, 3, . . .} = N, then {Xt }t∈T = (X1 , X2 , X3 , . . .) is called a discrete-time random process.
• When T = R, {Xt }t∈T is an uncountable collection of random variables and is called a continuous-time random process.

Recall that Xt : Ω → R. Fixing ω, the map t ↦ Xt (ω) is a function of t called the sample path.


Example: Xt = cos(2πωt), where the random outcome ω takes the value 1, 2, or 3, each with probability 1/3.

How do we specify a random process {Xt }t∈T ? To fully specify a random process, the joint distribution of (Xt1 , Xt2 , . . . , Xtn ) must be provided for every finite collection of indices (t1 , t2 , . . . , tn ).
Deterministic vs Stochastic Dynamical Systems

• Deterministic: starting from x0 ∈ Rn, for all t ≥ 0,

xt+1 = f (t, xt ).

More generally: xt+1 = f (t, xt , . . . , xt−m ), where m is the memory of the system.

Example: n = 1, starting at x0 > 0, a simple (deterministic) population growth model:

xt+1 = r0 xt .

Note that xt = r0^t x0 .

• Random process: starting from x0 ∈ Rn, for all t ≥ 0,

xt+1 = f (t, xt , wt ).

More generally: xt+1 = f (t, xt , . . . , xt−m , wt ), where m is the memory of the system and wt is a random variable/vector.

Example: beginning phase of a pandemic. For some initially infected population x0 > 0, the population of infected people at the beginning phase of a pandemic can be modeled by

xt+1 = rt xt ,

where rt is a non-negative random variable, independent of rk for k < t, with E[rt ] = r0 .
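A minimal simulation sketch of this growth model in Python (the lognormal choice for rt and all parameter values are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: i.i.d. lognormal growth factors r_t with E[r_t] = r0.
r0, s, T, x0 = 1.2, 0.1, 50, 100.0
mu = np.log(r0) - s**2 / 2          # E[lognormal(mu, s)] = exp(mu + s^2/2) = r0

x = np.empty(T + 1)
x[0] = x0
for t in range(T):
    r_t = rng.lognormal(mu, s)      # non-negative, independent across t
    x[t + 1] = r_t * x[t]

# Compare one random trajectory against the deterministic model x_t = r0^t * x0.
print(x[-1], x0 * r0**T)
```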

Examples of Random Processes

• Averaging: suppose that {wt } is an independent and identically distributed random process with E[wk ] = µ.

How does the running average xt = (w1 + . . . + wt )/t behave as t → ∞? In this case:

t xt = (t − 1)xt−1 + wt
xt = (1 − 1/t)xt−1 + (1/t)wt
xt = ft (xt−1 , wt ), where ft (x, w) = (1 − 1/t)x + (1/t)w.

• What happens if we use other weights, such as xt = (w1 + . . . + wt )/√t?
• What if we don't have any weights at all, i.e., xt = w1 + . . . + wt ? What happens?
• What can we say about the asymptotic behavior of such processes in general? (See the simulation sketch below.)
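The three questions above can be explored numerically. A small sketch, assuming Gaussian wk for concreteness:

```python
import numpy as np

rng = np.random.default_rng(1)
T, mu = 100_000, 2.0
w = rng.normal(mu, 1.0, size=T)        # i.i.d. with E[w_k] = mu, unit variance
S = np.cumsum(w)                       # S_t = w_1 + ... + w_t
t = np.arange(1, T + 1)

x_avg = S / t                          # running average: converges to mu (LLN)
z = (S - mu * t) / np.sqrt(t)          # centered sqrt(t) scaling: N(0,1) in
                                       # distribution (CLT); fluctuates forever
# S itself (no weights) is a random walk with drift mu and diverges.

print(x_avg[-1])                       # close to 2.0
```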

Terminology

For a random process X = {Xt }t∈T


(a) Mean function:

µX (t) := E[Xt ]

(b) Autocorrelation function:

RX (t1 , t2 ) := E[Xt1 Xt2 ]

(c) Autocovariance function:

CX (t1 , t2 ) := RX (t1 , t2 ) − µX (t1 )µX (t2 )

For an i.i.d. process

If the random process {Xt } is i.i.d., then

(a) For the mean function:

µX (t) = E[Xt ] = E[X0 ].

Therefore, we have a constant mean function.

(b) For the autocorrelation function:

RX (t1 , t2 ) = E[Xt1 Xt2 ] = E[Xt1²] = E[X1²] if t1 = t2 , and
RX (t1 , t2 ) = E[Xt1 ]E[Xt2 ] = µX (t1 )µX (t2 ) = µX (0)² if t1 ≠ t2 .

(c) For the autocovariance function:

CX (t1 , t2 ) = RX (t1 , t2 ) − µX (t1 )µX (t2 ) = var(X1 ) if t1 = t2 , and 0 if t1 ≠ t2 .

That is, the random process is uncorrelated in time. Many other properties hold as well.

Stationary Processes

A random process is Strict Sense Stationary (SSS) if the (finite) joint probability distributions (CDFs) are invariant under shifts, i.e., for all t1 < t2 < · · · < tk and all α1 , . . . , αk ∈ R:

FXt1 ,...,Xtk (α1 , . . . , αk ) = FXt1+s ,...,Xtk+s (α1 , . . . , αk )

for all s ≥ −t1 .

Example: i.i.d. processes, since

FXt1 ,...,Xtk (α1 , . . . , αk ) = FXt1 (α1 ) · · · FXtk (αk ) = FX (α1 ) · · · FX (αk ).

A random process is Wide Sense Stationary (WSS) if

1. the mean function µX (t) does not depend on time t, and
2. RX (t1 , t2 ) = f (t1 − t2 ), i.e., the autocorrelation function is a function of t1 − t2 only.

Example: i.i.d. processes.

For two random processes {Xt }t∈T and {Yt }t∈T :

• Cross-correlation: RXY (t1 , t2 ) = E[X(t1 )Y (t2 )], which in general differs from E[Y (t1 )X(t2 )].
• Cross-covariance: CXY (t1 , t2 ) = cov[X(t1 ), Y (t2 )] = RXY (t1 , t2 ) − µX (t1 )µY (t2 ).

{Xt }t∈T and {Yt }t∈T are jointly WSS if
• both {Xt }t∈T and {Yt }t∈T are individually WSS, and
• RXY (t1 , t2 ) = RXY (t1 − t2 ).

Example: Random Walk

Let {Xk } be a random walk, given by Xk+1 = Xk + Zk , where {Zk } is i.i.d. with zero mean and variance σ² and X0 = 0 a.s.

(a) For the mean function:

µX (k) = E[Xk−1 + Zk−1 ] = E[Xk−1 ].

Therefore, µX (k) = µX (k − 1) = . . . = µX (0) = 0.

(b) For the autocorrelation function, let k1 ≤ k2 :

RX (k1 , k2 ) = E[Xk1 Xk2 ] = E[Xk1 (Xk2 − Xk1 + Xk1 )]
= E[Xk1 (Xk2 − Xk1 )] + E[Xk1²]
= E[Xk1²] = k1 σ²,

where the cross term vanishes because the increment Xk2 − Xk1 is independent of Xk1 and has zero mean. Therefore, RX (k1 , k2 ) = min(k1 , k2 )σ². Thus, such a process is not WSS and hence not SSS.

(c) Autocovariance function: since the process is zero mean, CX = RX .
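A quick Monte Carlo check of RX (k1 , k2 ) = min(k1 , k2 )σ², assuming Gaussian increments for concreteness:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, sigma = 20_000, 50, 1.0              # M independent paths of length K

Z = rng.normal(0.0, sigma, size=(M, K))    # i.i.d. zero-mean increments
X = np.hstack([np.zeros((M, 1)), np.cumsum(Z, axis=1)])   # X_0 = 0 on every path

k1, k2 = 10, 30
R_hat = np.mean(X[:, k1] * X[:, k2])       # ensemble estimate of R_X(k1, k2)
print(R_hat, min(k1, k2) * sigma**2)       # both close to 10.0
```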

Continuous Time Random Processes

• Example: for a deterministic α > 0 and frequency ω, let Xt = α cos(ωt + θ), where θ ∼ U([0, 2π]).

– The mean function:

µX (t) = E[α cos(ωt + θ)] = (1/2π) ∫₀^2π α cos(ωt + θ) dθ = 0.

– The correlation function (using cos A cos B = (1/2)[cos(A + B) + cos(A − B)]):

RX (t1 , t2 ) = E[α cos(ωt1 + θ) α cos(ωt2 + θ)]
= (α²/2π) ∫₀^2π cos(ωt1 + θ) cos(ωt2 + θ) dθ
= (α²/4π) ∫₀^2π cos(ω(t1 + t2 ) + 2θ) dθ + (α²/4π) ∫₀^2π cos(ω(t1 − t2 )) dθ
= (α²/4π) ∫₀^2π cos(ω(t1 − t2 )) dθ
= (α²/2) cos(ω(t1 − t2 )).

Hence the mean is constant and RX depends only on t1 − t2 , so the process is WSS.
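A sketch verifying the closed form by averaging over independent draws of the random phase (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, omega = 2.0, 2 * np.pi * 0.5        # amplitude and frequency (rad/s)
M = 100_000
theta = rng.uniform(0.0, 2 * np.pi, M)     # theta ~ U([0, 2*pi])

t1, t2 = 0.3, 1.1
x1 = alpha * np.cos(omega * t1 + theta)
x2 = alpha * np.cos(omega * t2 + theta)

R_hat = np.mean(x1 * x2)                           # ensemble estimate of R_X(t1, t2)
R_true = alpha**2 / 2 * np.cos(omega * (t1 - t2))  # closed form derived above
print(R_hat, R_true)
```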

Properties of WSS Processes

• Some properties of a WSS process {Xt }:

1. RX (τ ) = E[X(t)X(t + τ )] is an even function, i.e., RX (τ ) = RX (−τ ).
2. RX (0) ≥ RX (τ ) for all τ .
3. For independent processes {X(t)} and {Y (t)} with zero mean, RX+Y (τ ) = RX (τ ) + RY (τ ).

Proof on board in class.

Ergodic Behavior

• Statistical mean: µX (t) = E[X(t)] = ∫ω∈Ω Xt (ω) dP(ω).

• If we have M samples of Xt1 , denoted (x̂¹t1 , x̂²t1 , . . . , x̂ᴹt1 ), drawn from PXt1 , then we can estimate the statistical average as µ̂X (t1 ) = (1/M) Σ_{i=1}^{M} x̂ⁱt1 .

• However, suppose we have a single sample path of the random process, given by x1 (ω0 ), x2 (ω0 ), . . .. Then we can find the temporal mean and autocorrelation as

X̄(ω0 ) = (1/T) ∫₀^T xt (ω0 ) dt,
R̄X (τ ) = (1/T) ∫₀^T xt+τ (ω0 ) xt (ω0 ) dt.

• Do the temporal and statistical averages coincide? Yes, when the process is ergodic. The random process {Xt }t∈T is ergodic when

E[Xt ] = lim_{T→∞} (1/T) ∫₀^T Xt (ω0 ) dt.

It is implicit that for an ergodic process, E[Xt ] = µX (t) = µX for all t.

• For a discrete-time process, we replace the integral by a summation to compute temporal averages.

Mean-Square Ergodic Theorem: Let {Xt }t∈T be a wide sense stationary process with E[Xt ] = µX and autocorrelation RX (τ ), and let the Fourier transform of RX (τ ) exist. Let X̄T (ω) = (1/2T) ∫₋T^T Xt (ω) dt. Then

lim_{T→∞} E[(X̄T − µX )²] = 0.

In other words, X̄T converges to µX in the mean-square sense.

The implication of the above theorem is that we can approximate the mean/correlation by temporal averages computed from a single sample path.
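A discrete-time sketch of this idea: the time average over one long sample path approximates the statistical mean. The AR(1) process below is an illustrative choice, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(4)
a, T = 0.9, 200_000

# AR(1): x_{k+1} = a x_k + w_k with zero-mean noise; after a burn-in it is
# approximately stationary with statistical mean E[X_k] = 0.
x = np.zeros(T)
for k in range(T - 1):
    x[k + 1] = a * x[k] + rng.normal()

temporal_mean = x[1000:].mean()    # time average of a single path (burn-in dropped)
print(temporal_mean)               # close to the statistical mean 0
```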

Random Process and LTI System

• Suppose we have an LTI system with impulse response h(t). If we apply an input signal x(t) to this system, the output signal y(t) is given by

y(t) = ∫₋∞^∞ h(τ )x(t − τ ) dτ =: x(t) ⊛ h(t).

• Now suppose the input X(t) is a random process with mean µX (t) and autocorrelation RX (t1 , t2 ). Determine the mean and autocorrelation of Y .

• If X(t) is WSS, is Y (t) also WSS?

Yes. Derivation in class.

• Are X(t) and Y (t) jointly WSS?

Yes. Derivation in class. We can show that

RY X (τ ) = h(τ ) ⊛ RX (τ ).

Power Spectral Density (PSD)

• From the above discussion, we have RY X (τ ) = ∫₋∞^∞ h(s)RX (τ − s) ds.

• For a CT WSS process X(t) (that is integrable), we can find the "power spectral density" at frequency ω (rad/s):

SX (ω) := FT[RX (τ )] = ∫₋∞^∞ RX (τ ) e^{−jωτ} dτ.

• Thus, SY X (ω) = H(ω)SX (ω), where H(ω) is the Fourier transform of the impulse response h(t).

• We can further show that

RY (τ ) = h(τ ) ⊛ RXY (τ )
⟹ SY (ω) = H(ω) × SXY (ω).
In addition, SY X (ω) = H(ω) × SX (ω).
Since RXY (τ ) = RY X (−τ ), we have SXY (ω) = SY X (ω)*.
⟹ SY (ω) = H(ω)SY X (ω)* = H(ω)H(ω)* SX (ω) = |H(ω)|² SX (ω),

using that SX (ω) is real.
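A numerical sanity check of SY (ω) = |H(ω)|² SX (ω), assuming a short FIR impulse response and white-noise input (so SX is flat); an averaged periodogram stands in for the true PSD, so the match holds only up to small transient and estimation errors:

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([0.5, 0.3, 0.2])            # impulse response of a small FIR (LTI) system
sigma2 = 1.0                             # white-noise input: S_X(omega) = sigma^2

M, N = 2000, 256
Sy = np.zeros(N)
for _ in range(M):                       # average periodograms over many realizations
    x = rng.normal(0.0, np.sqrt(sigma2), N)
    y = np.convolve(x, h)[:N]            # pass the input through the LTI system
    Sy += np.abs(np.fft.fft(y))**2 / N
Sy /= M

H = np.fft.fft(h, N)                     # frequency response on the same grid
print(np.max(np.abs(Sy - np.abs(H)**2 * sigma2)))   # small: S_Y ~= |H|^2 S_X
```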

Discrete-time WSS Processes

• A discrete-time random process (Xn )n∈N is a collection of random variables (X1 , X2 , . . . , Xn , . . .).

• Mean function: µX [n] = E[Xn ].

• Autocorrelation function: RX [n1 , n2 ] = E[Xn1 Xn2 ].

• Autocovariance function: CX [n1 , n2 ] = cov(Xn1 , Xn2 ).

• Cross-correlation function: RXY [n1 , n2 ] = E[Xn1 Yn2 ].

• For X to be WSS, the following properties need to be satisfied:

1. µX [n] = µ, independent of n.
2. RX [n1 , n2 ] = RX [n2 − n1 ].

• Properties such as ergodicity and the response of an LTI system to a WSS input continue to hold in an analogous manner.

Module B.2: Markov Chains

• Markov process: a random process whose probability distribution at time k + 1, given the past, depends only on its value at time k. Specifically,

Pr(Xk+1 ∈ A | Xk , . . . , X1 ) = Pr(Xk+1 ∈ A | Xk ).

More generally,

Pr(Xk+1 ∈ A | Xki , . . . , Xk1 ) = Pr(Xk+1 ∈ A | Xki ),

for any k1 < k2 < . . . < ki ≤ k.

• If the (time) index set is continuous, the corresponding random process is called a continuous-time Markov process.
• In this course, we focus on discrete-time Markov processes where each random variable Xk is a discrete random variable taking values in a finite set.
• Example: infectious disease with reinfection, where an individual can be in one of two possible states: susceptible (S) and infected (I).

Formal Definition

• Definition: We say that a (DT) random process {Xk } is a Markov chain over a discrete space if

1. the Xk are all discrete random variables with common support S, i.e., Pr(Xk ∈ S) = 1 for all k, where S is countable, and
2. for all i ≥ 1, all 1 ≤ k1 < k2 < . . . < ki ≤ k, and all s1 , . . . , si , s ∈ S:

Pr(Xk+1 = s | Xki = si , . . . , Xk1 = s1 ) = Pr(Xk+1 = s | Xki = si ).   (1)

• S is called the state space and each s ∈ S is called a state. Relation (1) is called the Markov property.
• If S is finite, {Xk } is called a finite-state Markov chain.

Transition Probabilities

• From this point on, assume S is a finite set with n elements, S = {1, . . . , n}. Many of the following discussions hold for n = ∞, but for convenience we assume that n is finite.
• For any k, let πk be the (marginal) probability mass function of Xk , i.e.,

πk (i) = Pr(Xk = i).

Note that the vector πk is non-negative and Σ_{i=1}^{n} πk (i) = 1. Such a vector is called a stochastic (sometimes probability) vector. It is convenient to treat πk as a row vector.
• For any 1 ≤ k1 < k2 , define the matrix (array)

Pk1,k2 (i, j) = Pr(Xk2 = j | Xk1 = i).

• Pk1,k2 ∈ Rn×n is called the transition matrix of the MC from time k1 to time k2 . It satisfies

πk2 = πk1 Pk1,k2 .

• We also (naturally) define Pk,k := I, where I is the n × n identity matrix.

Properties of Transition Matrices

• Definition: We say that an n × n matrix A is a row-stochastic matrix if (i) A is non-negative, and (ii) A1 = 1 (i.e., each row sums to one).
• Properties of the transition matrices:

– Row-stochastic: for any k ≤ m, Pk,m is a row-stochastic matrix. Non-negativity follows from the definition. Also, each row adds up to one:

Σ_{j=1}^{n} Pk,m (i, j) = Σ_{j=1}^{n} Pr(Xm = j | Xk = i) = 1.

– For any k ≤ m, we have:

πm = πk Pk,m .

This follows from the fact that

πm (j) = Pr(Xm = j) = Σ_{i=1}^{n} Pr(Xm = j, Xk = i)
= Σ_{i=1}^{n} Pr(Xm = j | Xk = i) Pr(Xk = i)
= [πk Pk,m ]j .

Properties of Transition Matrices cont.

• Properties of the transition matrices (cont.):

– Semigroup property: for any k ≤ m ≤ q, we have:

Pk,q = Pk,m Pm,q .

To show this, let i, j be fixed. Then we have

Pk,q (i, j) = Pr(Xq = j | Xk = i)
= Σ_{ℓ=1}^{n} Pr(Xq = j, Xm = ℓ | Xk = i)
= Σ_{ℓ=1}^{n} Pr(Xq = j | Xm = ℓ, Xk = i) Pr(Xm = ℓ | Xk = i)
= Σ_{ℓ=1}^{n} Pr(Xq = j | Xm = ℓ) Pr(Xm = ℓ | Xk = i)   (by the Markov property)
= Σ_{ℓ=1}^{n} Pk,m (i, ℓ) Pm,q (ℓ, j)
= [Pk,m Pm,q ]i,j .

This property is widely known as the Chapman-Kolmogorov equation.

– For DT Markov chains, the second property and the Chapman-Kolmogorov property imply:

πk = π1 P1,k = π1 P1,2 P2,k = · · · = π1 P1,2 P2,3 · · · Pk−1,k .

Homogeneous Markov Chains

Definition: We say that a Markov chain {Xk } is (time-)homogeneous if Pm,m+1 does not depend on m (so Pm,m+1 = P1,2 for all m).

• Denote P := Pm,m+1 . P is called the one-step transition matrix of the underlying homogeneous Markov chain.

• P is a row-stochastic matrix.

• For homogeneous Markov chains, we have Pm,n = P^{n−m}.

• The distribution of Xk is given by πk = πk−1 P = π0 P^k.

• With a slight abuse of notation, for a homogeneous Markov chain, P is also called the (one-step) transition probability matrix (TPM).

• For homogeneous Markov chains, the initial distribution and the one-step TPM completely specify the random process.

Graph-Theoretic Interpretation

• Consider a homogeneous MC on state space S with TPM P .

• Consider a directed weighted graph G = (V, E, P ) where
– V = S = {1, . . . , n},
– E = {(i, j) | Pij > 0}, and
– Pij is the weight of edge (i, j).
• Then the MC can be viewed as a random walk on this weighted graph.
• Example: infectious disease model. Determine the TPM, and simulate the MC.

[Figure: state-transition diagram over the states Susceptible, Infected, and Recovered, with edge labels 0.9, 0.95, 0.05, 0.2, 0.1, and 0.8.]
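A simulation sketch of this chain. The TPM below is one reading of the diagram, consistent with each row summing to one; treat the exact entries as assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
states = ["S", "I", "R"]

# Assumed reading of the diagram: S stays S w.p. 0.9, S -> I w.p. 0.1;
# I stays I w.p. 0.95, I -> R w.p. 0.05; R -> S w.p. 0.2, R stays R w.p. 0.8.
P = np.array([[0.90, 0.10, 0.00],
              [0.00, 0.95, 0.05],
              [0.20, 0.00, 0.80]])

T = 100_000
x, visits = 0, np.zeros(3)               # start in state S
for _ in range(T):
    x = rng.choice(3, p=P[x])            # one step of the Markov chain
    visits[x] += 1

print(dict(zip(states, visits / T)))     # long-run fraction of time in each state
```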

Classification of States

We introduce a few basic definitions.

• An m-step walk on a graph G = (V, E) is an ordered string of nodes i0 , i1 , . . . , im such that (ik−1 , ik ) ∈ E for all k ∈ {1, 2, . . . , m}.

• A path is a walk in which no node is repeated. A cycle is a walk in which the first and last nodes are identical and no other node is repeated.

• Let G = (V, E, P ) be the graph associated with a MC with TPM P . A state j is accessible from state i, denoted i → j, if there is a walk in the graph from node i to node j.

• In other words, there exist nodes i1 , i2 , . . . , ik such that (i, i1 ) ∈ E, (i1 , i2 ) ∈ E, . . . , (ik , j) ∈ E. The length of this walk is k + 1.

• Equivalently, Pi,i1 > 0, Pi1,i2 > 0, . . . , Pik,j > 0. Thus, [P^{k+1}]i,j > 0.

• Two states i and j communicate if i → j and j → i. This is denoted by i ↔ j.

• Naturally, if i ↔ j and j ↔ k, then i ↔ k.

• A subset of states C ⊆ V is a communicating class if

1. i ∈ C, j ∈ C ⟹ i ↔ j, and
2. i ∈ C, j ∉ C ⟹ i ↮ j.

The set of states can be partitioned into distinct communicating classes. Each state belongs to exactly one communicating class.

Definition: A state i is called recurrent if i → j ⟹ j → i. A state is transient if it is not recurrent.

If a state is recurrent, there is no path to a state from which there is no return.

Classification of States Cont.

Theorem: In a given communicating class, either all states are recurrent or all states are transient. Furthermore, in a finite-state MC, there is at least one recurrent communicating class.

• A matrix P is irreducible if for every pair i, j there exists kij ≥ 1 such that [P^{kij}]ij > 0. In other words, i ↔ j for every pair of states i, j.
• Graph-theoretic interpretation: P is irreducible if there is a directed path between any two nodes of the graph.
• In this case, there is a single communicating class, which is recurrent.

Definition: The period γi of a state i is the greatest common divisor γi := gcd{k ≥ 1 | [P^k]ii > 0}.

• Graph-theoretic interpretation: the gcd of the lengths of all walks from i to itself.
• Example: for P = [ 0 1 ; 1 0 ], determine the period of its states.
• All states in the same communicating class have the same period.
• We say that a non-negative matrix P is aperiodic if γi = 1 for all i.
• A (homogeneous) Markov chain with transition matrix P is said to be irreducible (aperiodic) if P is irreducible (aperiodic).

Stationary and Limiting Distribution of a Markov Chain

• Let P ∈ Rn×n be the single-step transition probability matrix of a homogeneous Markov chain.

• Let π0 be the distribution of the initial state X0 . It follows that πn = π0 P^n.

A vector π⋆ ∈ R1×n is called an invariant/stationary/steady-state distribution of the Markov chain with TPM P if

• π⋆ is a probability vector, i.e., π⋆(i) ≥ 0 and Σ_{i=1}^{n} π⋆(i) = 1, and
• π⋆ = π⋆ P .

If πk = π⋆ for some k, then πm = π⋆ for all m ≥ k.

Fundamental questions in the theory of (homogeneous) Markov chains:

• Existence and uniqueness: when does π⋆ exist? Is it unique?
• Ergodicity: when unique, under what conditions does πk → π⋆?
• Mixing time: how fast does it converge to π⋆?
• Occupation probability: what fraction of time do we spend in a given state?

We know that the TPM P satisfies the following properties:

• P is non-negative.
• P is row-stochastic, which implies that all eigenvalues reside on or within the unit circle, and 1 is an eigenvalue.
• Note: π⋆ is nothing but a left eigenvector of P for the eigenvalue 1, normalized to be a probability vector.
• Thus, existence and uniqueness of a stationary distribution is equivalent to existence and uniqueness of a non-negative left eigenvector of the TPM.

Example

• Let P = [ 1/2 1/2 ; 1/3 2/3 ].

• Solving (u, v)P = (u, v) with v = 1 − u, we get u = 2/5 and v = 3/5. Is this unique?
• What about P = I?
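A numerical check: π⋆ is the left eigenvector of P for the eigenvalue 1, normalized to sum to one. For this P the answer is unique; for P = I, by contrast, every probability vector is stationary:

```python
import numpy as np

P = np.array([[1/2, 1/2],
              [1/3, 2/3]])

# Left eigenvectors of P are right eigenvectors of P^T.
vals, vecs = np.linalg.eig(P.T)
i = int(np.argmin(np.abs(vals - 1.0)))   # pick the eigenvalue closest to 1
pi = np.real(vecs[:, i])
pi /= pi.sum()                           # normalize to a probability vector
print(pi)                                # [0.4, 0.6] = (2/5, 3/5)
```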

Linear Algebra Viewpoint

• Given a matrix A ∈ Rn×n, we define its spectral radius as

ρ(A) := max{|λ| : λ is an eigenvalue of A}.

• An eigenvalue of A is called semi-simple if its algebraic multiplicity equals its geometric multiplicity.
– Algebraic multiplicity: the number of times the eigenvalue appears as a root of the characteristic equation.
– Geometric multiplicity: the number of linearly independent eigenvectors associated with the eigenvalue.
It is called simple when both multiplicities are equal to 1.

• The matrix A is called
– semi-convergent if lim_{k→∞} A^k exists, and
– convergent if it is semi-convergent and lim_{k→∞} A^k = 0n×n .

Theorem 1. A matrix A ∈ Rn×n is
• convergent if and only if ρ(A) < 1, and
• semi-convergent if and only if either (i) ρ(A) < 1, or (ii) 1 is a semi-simple eigenvalue and all other eigenvalues have magnitude strictly less than 1.

Perron-Frobenius Theorem

A matrix A ∈ Rn×n is
• non-negative if Aij ≥ 0 for all i, j;
• irreducible if Σ_{k=0}^{n−1} A^k is positive, i.e., all entries are strictly larger than 0;
• primitive if there exists some k̄ such that A^k̄ > 0;
• positive if Aij > 0 for all i, j.

Theorem 2. Let A ∈ Rn×n be a non-negative matrix.
• Then there exists a real eigenvalue λ ≥ |µ| ≥ 0, where µ is any other eigenvalue. The left and right eigenvectors associated with λ can be chosen non-negative.
• If A is irreducible, λ is strictly positive and simple, with λ ≥ |µ|. The left and right eigenvectors associated with λ are unique (up to scaling) and positive.
• If A is primitive, then in addition λ > |µ|. The left and right eigenvectors associated with λ are unique (up to scaling) and positive.

Let P ∈ Rn×n be the single-step transition probability matrix of a homogeneous Markov chain.
• Is P non-negative?
• When is P irreducible? Does it imply P is semi-convergent?
• When is P primitive? Does it imply P is semi-convergent?
• When is P positive? Does it imply P is semi-convergent?

Case 1: MC with Single Recurrent Class

• In this case, the TPM P is irreducible. (Why?)

• From the Perron-Frobenius theorem, the largest eigenvalue 1 is simple, and the left eigenvector is unique and positive. In other words, π⋆ exists and is unique.

• However, if the states have period d > 1, then there are d equally spaced eigenvalues on the unit circle. Such a matrix is not primitive, and hence not semi-convergent.

• When the states are aperiodic (i.e., the period is d = 1), P is primitive and hence semi-convergent.

• A MC that is both irreducible and aperiodic is called ergodic.

• We can show that

lim_{k→∞} P^k = v w = [1, 1, . . . , 1]^⊤ [w1 , w2 , . . . , wn ] = the matrix whose every row equals (w1 , w2 , . . . , wn ) =: P∞ ,

where v = (1, . . . , 1)^⊤ is the right eigenvector and w is the left eigenvector of the eigenvalue 1, normalized so that wv = 1. Note that w = π⋆.

• In addition, for any initial distribution π0 , we have

lim_{k→∞} πk = lim_{k→∞} π0 P^k = π0 P∞ = π⋆.
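A quick check of this limit on the two-state example from before (irreducible and aperiodic, hence ergodic):

```python
import numpy as np

P = np.array([[1/2, 1/2],
              [1/3, 2/3]])

P_inf = np.linalg.matrix_power(P, 50)    # P^k for large k
print(P_inf)                             # every row is approximately pi* = (0.4, 0.6)
```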

Case 2: MC with one Recurrent Class and some Transient States

• Such a Markov chain is called a unichain. The TPM P is no longer irreducible and can be partitioned as

P = [ PRR  0 ;
      PTR  PTT ],

where the first m1 states belong to the recurrent class (so PRR is m1 × m1 ) and the remaining n − m1 states are transient.

• Though P is not irreducible, the submatrix PRR is irreducible and has a unique stationary distribution πR⋆ ∈ R1×m1 .

• Then the vector π⋆ = [πR⋆  0_{1×(n−m1)}] is the unique stationary distribution of P .

• If the states in the recurrent class are aperiodic, then P is semi-convergent. Such a MC is called an ergodic unichain.

The following result characterizes the uniqueness and limiting behavior of the stationary distribution.

Theorem 3. Consider a finite-state homogeneous MC.
• The MC has a unique stationary distribution π⋆ if and only if it is a unichain (i.e., it has a single recurrent class).
• Let lim_{k→∞} P^k = P∞ . Each row of P∞ is identical and equal to π⋆ if and only if the MC is an ergodic unichain (a unichain with an aperiodic recurrent class).

Case 3: MC with Multiple Recurrent Classes

• With three recurrent classes (say), the TPM P can be partitioned as

P = [ PR1   0     0     0  ;
      0     PR2   0     0  ;
      0     0     PR3   0  ;
      PTR1  PTR2  PTR3  PTT ],

where the first m1 states belong to the first recurrent class, the next m2 to the second, and so on; the last n − Σi mi states are transient.

• For each recurrent class, the corresponding submatrix PRi is irreducible and has a unique stationary distribution πi⋆ ∈ R1×mi .

• Then the vector [0 . . . πi⋆ . . . 0] is a stationary distribution of P . Thus, the stationary distribution is not unique.

• Every recurrent class adds one to the multiplicity of the eigenvalue 1.

• P is semi-convergent only when every recurrent class is aperiodic. In this case, lim_{k→∞} P^k = P∞ , but P∞ has non-identical rows. However, rows corresponding to states in the same recurrent class are identical.

Ergodic Property

• Let the initial state be X0 = i.

• Ti := inf{k ≥ 1 | Xk = i} (first passage time): the smallest time index at which the state returns to i.

• fi := P(Ti < ∞): return probability.

• mi := E[Ti ]: mean return time.

• νi := Σ_{k=0}^{∞} 1{Xk = i}: number of visits to i starting from i.

• State i is recurrent if and only if fi = 1. State i is transient if and only if fi < 1.

Theorem: If state i is recurrent, then E[νi ] = ∞. If state i is transient, then E[νi ] < ∞.

Theorem: Suppose the TPM is irreducible and let π⋆ be the unique stationary distribution. Then mi = 1/π⋆(i) for all states i.

Theorem: Suppose the TPM is irreducible and aperiodic (i.e., ergodic) with stationary distribution π⋆. Then

lim_{n→∞} (1/n) Σ_{k=1}^{n} 1{Xk = i} = π⋆(i)   almost surely.

Application: Page-Rank Algorithm

• The original idea of Google search ranking: model a browsing person as a random walker over the graph of the internet!
• Let G = (V, E), where d is the number of webpages and there is a node for each webpage.
• (i, j) ∈ E if page i has a link to page j.
• Then a person can be modeled as a random walker on G, where

Pij = 1/di if j ∈ Ni , and 0 otherwise,

with di the out-degree of node i and Ni its set of out-neighbors.

• Problem with this? The corresponding Markov chain is, in general, not irreducible.

• Now let us add a small reset probability, i.e., consider a Markov chain with one-step transition matrix

P̂ = (1 − a)P + aJ,

where a ∈ (0, 1) is a small reset parameter and J is the d × d matrix with all elements equal to 1/d.
• Then the Markov chain with transition matrix P̂ is irreducible and aperiodic (why?).
• Therefore, it is ergodic, has a unique stationary distribution π⋆, and πk → π⋆ as k → ∞.
• More importantly, the average fraction of visits to state (webpage) i by time k converges to π⋆(i)!
• Therefore, webpage i is superior to webpage j if π⋆(i) > π⋆(j).
• How does Google find π⋆? (A power-iteration sketch follows.)
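A power-iteration sketch on a toy four-page link graph; the adjacency matrix and the reset parameter a = 0.15 are illustrative assumptions:

```python
import numpy as np

# Toy link graph: A[i, j] = 1 if page i links to page j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1, keepdims=True)         # out-degrees (all positive here)
P = A / d                                # random-surfer transition matrix
a, n = 0.15, A.shape[0]
P_hat = (1 - a) * P + a * np.ones((n, n)) / n   # add the reset probability

pi = np.full(n, 1.0 / n)                 # power iteration: pi_{k+1} = pi_k P_hat
for _ in range(100):
    pi = pi @ P_hat
print(pi)                                # PageRank scores; larger = more important
```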

Vector-valued Random Process

A random process X = {Xt }t∈T may be such that each Xt is a random vector
taking values in Rn . Then,
(a) Mean function:

µX (t) := E[Xt ] ∈ Rn

(b) Autocorrelation function:

RX (t1 , t2 ) := E[Xt1 Xt2^⊤] ∈ Rn×n

(c) Autocovariance function:

CX (t1 , t2 ) := cov(Xt1 , Xt2 ) ∈ Rn×n .

For WSS, every element of CX (t1 , t2 ) should only depend on t2 − t1 .

Other Classes of Processes

• A stochastic process {Xt }t∈T is called a Gaussian process if for every finite set of indices t1 , t2 , . . . , tk , the collection of random variables Xt1 , Xt2 , . . . , Xtk is jointly Gaussian.

• A stochastic process that is both Gaussian and Markov is called a Gauss-Markov process.

• A stochastic process {Xt }t∈T is said to have independent increments if for every finite set of indices t1 < t2 < . . . < tk , the random variables Xt2 − Xt1 , Xt3 − Xt2 , . . . , Xtk − Xtk−1 are mutually independent.

• The increments are stationary if Xt2 − Xt1 and Xt2+s − Xt1+s have the same distribution irrespective of the value of s.

• Brownian motion/Wiener process: a stochastic process {Xt }t∈T is a Wiener process if

1. X0 = 0,
2. the process has stationary and independent increments,
3. Xt − Xs ∼ N (0, σ²(t − s)) for t ≥ s,
4. the sample paths are continuous with probability 1.

For a Wiener process, one can show that the sample paths are not differentiable by showing

lim_{∆→0} var[(X(t + ∆) − X(t))/∆] = lim_{∆→0} σ²/∆ = ∞.
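A sketch simulating a Wiener path from its independent Gaussian increments, and illustrating the blow-up of the difference quotient (σ² and the time grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, T, N = 1.0, 1.0, 10_000
dt = T / N

# Stationary independent increments: X_{t+dt} - X_t ~ N(0, sigma^2 * dt).
dX = rng.normal(0.0, np.sqrt(sigma2 * dt), N)
X = np.concatenate([[0.0], np.cumsum(dX)])   # X_0 = 0; a discretized sample path

# The difference quotient (X(t+dt) - X(t))/dt has standard deviation
# sigma/sqrt(dt), which blows up as dt -> 0.
print(np.std(dX / dt), np.sqrt(sigma2 / dt))
```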

Dynamical System

• A deterministic discrete-time dynamical system in state-space form is given by

xk+1 = fk (xk , uk ),   k = 0, 1, . . . ,

where xk ∈ Rn is the state at time k and uk ∈ Rm is the input at time k.
• State variable: summarizes past information; if we know the state at time k and the input at all times ≥ k, then we can completely determine the future states.
• In other words, if we know the current state, we do not need to store past states and inputs to predict the future.
• If fk = f for all k, the system is time-invariant.

Stochastic Dynamical System

• Stochastic model: the future state is uncertain even if the current state and input are known. There are two ways of representing such a system; both are equivalent under reasonable assumptions.

• State-space form:

xk+1 = fk (xk , uk , wk ),   k = 0, 1, . . . ,

where wk ∈ Rw is a random variable (noise/disturbance) that is not under our control (unlike the input u).

• Note that {w1 , w2 , . . .} is a discrete-time random process, as is {x1 , x2 , . . .}.

• Example: xk+1 = a xk + wk , where wk ∼ N (c, 1) and x0 = 5. What will the trajectories look like for different values of a and c? What is the distribution of xk as k → ∞? Is this process Markovian?

Stochastic Linear System

A stochastic linear system is formally defined as

xk+1 = Ak xk + Bk uk + wk .

Problem: recursively determine the mean and variance of xk given that E[wk ] = 0,
var(wk ) = Q and x0 is known.
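A sketch of the resulting recursions, assuming wk is zero-mean with covariance Q, independent across time and of x0 ; the mean evolves as mk+1 = Ak mk + Bk uk and the covariance as Σk+1 = Ak Σk Ak^⊤ + Q (the matrices and input below are illustrative):

```python
import numpy as np

# Illustrative time-invariant system matrices and input.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = 0.01 * np.eye(2)
u = np.array([1.0])                 # constant input for simplicity

m = np.array([1.0, 0.0])            # x_0 known: mean = x_0 ...
S = np.zeros((2, 2))                # ... and covariance = 0
for k in range(100):
    m = A @ m + B @ u               # mean recursion:       m_{k+1} = A m_k + B u_k
    S = A @ S @ A.T + Q             # covariance recursion: S_{k+1} = A S_k A^T + Q

print(m)
print(S)
```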

Representation via Transition Kernel

• Recall the state-space form: xk+1 = fk (xk , uk , wk ), k = 0, 1, . . . .

• Here, the distribution of xk+1 can be found in terms of the function fk and, indirectly, as a function of the basic random variables (x0 , w0 , . . . , wk ).

• The alternative approach is to directly specify the distribution of xk+1 instead of relying on the function fk . In particular, the conditional distribution of Xk+1 given xk and uk is specified for all values of xk and uk .

• For the dynamical system to be Markovian, we need to show that for every Borel subset A and for all k,

P(Xk+1 ∈ A | x0 , u0 , x1 , u1 , . . . , xk , uk ) = P(Xk+1 ∈ A | xk , uk ).

• Is the above property always true?

Observation Model

• In many instances, the states cannot be directly measured.

• Instead, we observe "output" quantities that depend on the state:

yk = gk (xk , vk ),

where vk is a random variable termed "measurement noise."

• Alternatively, the conditional distribution of yk given xk is specified.

• In the case of a linear system, yk = Ck xk + vk .

• One problem of significant interest is to infer or estimate the state xk from the measured output quantities yk in an online and recursive manner.

• Module C will tackle this issue.

