ModuleB 1

The document discusses random processes and their properties. A random process is a collection of random variables indexed over time. The document defines discrete-time and continuous-time random processes and covers topics like stationarity, autocorrelation, and examples of random processes.

Uploaded by

Ankur Mondal
Copyright
© © All Rights Reserved

Module B: Random Processes

A random process is a family (collection) of random variables indexed by a set $T$, denoted by $\{X_t\}_{t \in T}$.

The set T is often interpreted as “time.”


 
• When $T = \{1, 2, \dots, n\}$, then $\{X_t\}_{t \in T} = (X_1, X_2, \dots, X_n)^\top$ is a random vector.
• When $T = \{1, 2, 3, \dots\} = \mathbb{N}$, then $\{X_t\}_{t \in T} = (X_1, X_2, X_3, \dots)$ is called a discrete-time random process.
• When $T = \mathbb{R}$, $\{X_t\}_{t \in T}$ is an uncountable collection of random variables and is called a continuous-time random process.

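Recall: each $X_t : \Omega \to \mathbb{R}$ is a random variable. For a fixed outcome $\omega$, the map $t \mapsto X_t(\omega)$ is a function of $t$ called the sample path.

Example: $X_t = \cos(2\pi \omega t)$, where the random outcome $\omega$ is
$$\omega = \begin{cases} 1 & \text{w.p. } \frac{1}{3} \\ 2 & \text{w.p. } \frac{1}{3} \\ 3 & \text{w.p. } \frac{1}{3}. \end{cases}$$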

How do we specify a random process $\{X_t\}_{t \in T}$? To fully specify a random process, the joint distribution of $(X_{t_1}, X_{t_2}, \dots, X_{t_n})$ must be provided for every finite collection of indices $(t_1, t_2, \dots, t_n)$.

Deterministic vs Stochastic Dynamical Systems

• Deterministic: starting from $x_0 \in \mathbb{R}^n$, for all $t \ge 0$,

$$x_{t+1} = f(t, x_t).$$

More generally, $x_{t+1} = f(t, x_t, \dots, x_{t-m})$, where $m$ is the memory of the system.
Example: $n = 1$, starting at $x_0 > 0$, a simple (deterministic) population growth model:
$$x_{t+1} = r_0 x_t.$$

Note that $x_t = r_0^t x_0$.

• Random process: starting from $x_0 \in \mathbb{R}^n$, for all $t \ge 0$,

$$x_{t+1} = f(t, x_t, w_t).$$

More generally, $x_{t+1} = f(t, x_t, \dots, x_{t-m}, w_t)$, where $m$ is the memory of the system and $w_t$ is a random variable/vector.
Example: Beginning phase of a pandemic: for some initially infected population $x_0 > 0$, the number of infected people during the beginning phase of a pandemic can be modeled by

$$x_{t+1} = r_t x_t,$$

where $r_t$ is a non-negative random variable, independent of $r_k$ for $k < t$, with $E[r_t] = r_0$.

Examples of Random Processes

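• Averaging: suppose that $\{w_t\}$ is an independently and identically distributed random process with $E[w_k] = \mu$.
How does the running average $x_t = \frac{w_1 + \dots + w_t}{t}$ behave as $t \to \infty$?
In this case:

$$\begin{aligned}
t x_t &= (t - 1) x_{t-1} + w_t \\
x_t &= \left(1 - \tfrac{1}{t}\right) x_{t-1} + \tfrac{1}{t} w_t \\
x_t &= f_t(x_{t-1}, w_t) \\
f_t(x, w) &= \left(1 - \tfrac{1}{t}\right) x + \tfrac{1}{t} w.
\end{aligned}$$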
• What happens if we use other weights, such as $x_t = \frac{w_1 + \dots + w_t}{\sqrt{t}}$?
ˆ What if we don’t have any weights at all, i.e., xt = w1 + . . . + wt ? What
happens?
ˆ What can we say about asymptotic behavior of such processes in general?
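The recursive form of the running average can be tried out numerically. The sketch below is only an illustration: the Gaussian samples, the mean `mu = 2.0`, the seed and the helper name `running_average` are all assumptions, not part of the notes. It checks that the recursion reproduces the direct average.

```python
import random

def running_average(samples):
    """Yield x_1, x_2, ... using the recursion x_t = (1 - 1/t) x_{t-1} + w_t / t."""
    x = 0.0
    for t, w in enumerate(samples, start=1):
        x = (1.0 - 1.0 / t) * x + w / t
        yield x

rng = random.Random(0)      # fixed seed (illustrative choice)
mu = 2.0                    # assumed mean of the i.i.d. samples
samples = [rng.gauss(mu, 1.0) for _ in range(10_000)]

averages = list(running_average(samples))
direct = sum(samples) / len(samples)   # direct computation of (w_1 + ... + w_t) / t
print(averages[-1], direct)
```

The recursion only needs the previous average and the new sample, so the full history never has to be stored.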

Terminology

For a random process X = {Xt }t∈T


(a) Mean function:

µX (t) := E[Xt ]

(b) Autocorrelation function:

RX (t1 , t2 ) := E[Xt1 Xt2 ]

(c) Autocovariance function:

CX (t1 , t2 ) := RX (t1 , t2 ) − µX (t1 )µX (t2 )

For an i.i.d. process

If the random process {Xt } is i.i.d., then


(a) For the mean function:

µX (t) = E[Xt ] = E[X0 ].

Therefore, we have a constant mean function.


(b) For the Autocorrelation function:
$$R_X(t_1, t_2) = E[X_{t_1} X_{t_2}] = \begin{cases} E[X_{t_1}^2] = E[X_1^2] & t_1 = t_2 \\ E[X_{t_1}] E[X_{t_2}] = \mu_X(t_1)\mu_X(t_2) = \mu_X(0)^2 & t_1 \neq t_2. \end{cases}$$

(c) Autocovariance function:

$$C_X(t_1, t_2) = R_X(t_1, t_2) - \mu_X(t_1)\mu_X(t_2) = \begin{cases} \operatorname{var}(X_1) & t_1 = t_2 \\ 0 & t_1 \neq t_2. \end{cases}$$
That is, the random process is uncorrelated in time.

Plus many other properties are true.

Stationary Processes

A random process is Strict Sense Stationary (SSS) if its finite-dimensional joint probability distributions (CDFs) are invariant under time shifts, i.e., for all $t_1 < t_2 < \dots < t_k$ and all $\alpha_1, \dots, \alpha_k \in \mathbb{R}$:

$$F_{X_{t_1}, \dots, X_{t_k}}(\alpha_1, \dots, \alpha_k) = F_{X_{t_1 + s}, \dots, X_{t_k + s}}(\alpha_1, \dots, \alpha_k)$$

for all shifts $s \ge -t_1$.

Example: i.i.d. processes are SSS, since

$$F_{X_{t_1}, \dots, X_{t_k}}(\alpha_1, \dots, \alpha_k) = F_{X_{t_1}}(\alpha_1) \cdots F_{X_{t_k}}(\alpha_k) = F_X(\alpha_1) \cdots F_X(\alpha_k).$$

A random process is Wide Sense Stationary (WSS) if

1. the mean function does not depend on time $t$, and
2. the autocorrelation function depends only on the difference $t_1 - t_2$, i.e., $R_X(t_1, t_2) = f(t_1 - t_2)$.
Example: i.i.d. processes.

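For two random processes $\{X_t\}_{t \in T}$ and $\{Y_t\}_{t \in T}$,

• Cross-correlation: $R_{XY}(t_1, t_2) = E[X(t_1) Y(t_2)]$, which in general differs from $E[Y(t_1) X(t_2)]$.
• Cross-covariance: $C_{XY}(t_1, t_2) = \operatorname{cov}[X(t_1), Y(t_2)] = R_{XY}(t_1, t_2) - \mu_X(t_1)\mu_Y(t_2)$.
$\{X_t\}_{t \in T}$ and $\{Y_t\}_{t \in T}$ are jointly WSS if
• both $\{X_t\}_{t \in T}$ and $\{Y_t\}_{t \in T}$ are individually WSS, and
• $R_{XY}(t_1, t_2)$ depends only on $t_1 - t_2$, i.e., $R_{XY}(t_1, t_2) = R_{XY}(t_1 - t_2)$.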

Example: Random Walk

Let {Xk } be a random walk, given by Xk+1 = Xk + Zk where {Zk } is i.i.d. with
zero mean and variance σ 2 and X0 = 0 a.s.
(a) For the mean function:

µX (k) = E[Xk−1 + Zk−1 ] = E[Xk−1 ].

Therefore, µX (k) = µX (k − 1) = . . . = µX (0) = 0.


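(b) For the Autocorrelation function: let $k_1 \le k_2$. Then

$$\begin{aligned}
R_X(k_1, k_2) = E[X_{k_1} X_{k_2}] &= E[X_{k_1}(X_{k_2} - X_{k_1} + X_{k_1})] \\
&= E[X_{k_1}(X_{k_2} - X_{k_1})] + E[X_{k_1}^2] \\
&= E[X_{k_1}^2] = k_1 \sigma^2.
\end{aligned}$$

The cross term vanishes because the increment $X_{k_2} - X_{k_1} = Z_{k_1} + \dots + Z_{k_2 - 1}$ has zero mean and is independent of $X_{k_1}$, and $E[X_{k_1}^2] = k_1 \sigma^2$ because $X_{k_1}$ is the sum of the $k_1$ i.i.d. terms $Z_0, \dots, Z_{k_1 - 1}$.

Therefore, $R_X(k_1, k_2) = \min(k_1, k_2)\sigma^2$. Since this is not a function of $k_1 - k_2$ alone, the process is not WSS and hence not SSS.
(c) Autocovariance function: since the process is zero mean, $C_X = R_X$.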
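The formula $R_X(k_1, k_2) = \min(k_1, k_2)\sigma^2$ can be checked exactly for a small case. The sketch below assumes steps $Z_k = \pm 1$ with probability $1/2$ each (so $\sigma^2 = 1$) and enumerates all step sequences; the step distribution and the function name `exact_autocorrelation` are illustrative choices, not from the notes.

```python
from itertools import product

def exact_autocorrelation(k1, k2):
    """Exact E[X_k1 X_k2] for X_{k+1} = X_k + Z_k, X_0 = 0, Z_k = +/-1 w.p. 1/2."""
    n = max(k1, k2)
    total = 0
    for steps in product((-1, 1), repeat=n):   # all 2^n equally likely step sequences
        x_k1 = sum(steps[:k1])                 # X_k1 = Z_0 + ... + Z_{k1-1}
        x_k2 = sum(steps[:k2])
        total += x_k1 * x_k2
    return total / 2 ** n

for k1, k2 in [(3, 7), (5, 5), (6, 2)]:
    print(k1, k2, exact_autocorrelation(k1, k2), min(k1, k2))
```

Each printed autocorrelation agrees with $\min(k_1, k_2)$, and it clearly changes with $k_1$ even when $k_1 - k_2$ is held fixed.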

Continuous Time Random Processes

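• Example: for a deterministic $\alpha > 0$ and frequency $\omega$, let $X_t = \alpha \cos(\omega t + \theta)$, where $\theta \sim U([0, 2\pi])$.
– The mean function:
$$\mu_X(t) = E[\alpha \cos(\omega t + \theta)] = \frac{1}{2\pi} \int_0^{2\pi} \alpha \cos(\omega t + \theta)\, d\theta = 0.$$

– The correlation function:

$$\begin{aligned}
R_X(t_1, t_2) &= E[\alpha \cos(\omega t_1 + \theta)\, \alpha \cos(\omega t_2 + \theta)] \\
&= \frac{1}{2\pi} \int_0^{2\pi} \alpha^2 \cos(\omega t_1 + \theta) \cos(\omega t_2 + \theta)\, d\theta \\
&= \frac{\alpha^2}{2\pi} \int_0^{2\pi} \cos(\omega t_1 + \theta) \cos(\omega t_2 + \theta)\, d\theta \\
&= \frac{\alpha^2}{4\pi} \int_0^{2\pi} \cos(\omega (t_1 + t_2) + 2\theta)\, d\theta + \frac{\alpha^2}{4\pi} \int_0^{2\pi} \cos(\omega (t_1 - t_2))\, d\theta \\
&= \frac{\alpha^2}{4\pi} \int_0^{2\pi} \cos(\omega (t_1 - t_2))\, d\theta \\
&= \frac{\alpha^2}{2} \cos(\omega (t_1 - t_2)).
\end{aligned}$$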
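The two integrals above can be sanity-checked numerically. The following sketch approximates the expectation over $\theta$ with a midpoint Riemann sum; the values of `alpha`, `omega`, `t1`, `t2` and the helper `ensemble_average` are illustrative assumptions.

```python
import math

def ensemble_average(g, num_points=10_000):
    """Approximate E[g(theta)] for theta ~ U[0, 2*pi] with a midpoint Riemann sum."""
    h = 2 * math.pi / num_points
    return sum(g((i + 0.5) * h) for i in range(num_points)) / num_points

alpha, omega = 1.5, 2.0     # illustrative values, not from the notes
t1, t2 = 0.7, 0.2

mean = ensemble_average(lambda th: alpha * math.cos(omega * t1 + th))
corr = ensemble_average(
    lambda th: alpha ** 2 * math.cos(omega * t1 + th) * math.cos(omega * t2 + th)
)
closed_form = alpha ** 2 / 2 * math.cos(omega * (t1 - t2))
print(mean, corr, closed_form)
```

The numerical mean is zero up to rounding and the numerical correlation matches the closed form, which depends on $t_1$ and $t_2$ only through $t_1 - t_2$.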

Properties of WSS Processes

ˆ Some properties of a WSS process {Xt }:


1. RX (τ ) = E[X(t)X(t + τ )] is an even function, i.e., RX (τ ) = RX (−τ ).
2. RX (0) ≥ RX (τ ) for all τ .
3. For independent processes {X(t)} and {Y (t)} with zero mean, RX+Y (τ ) =
RX (τ ) + RY (τ ).

Proof on board in class.

Ergodic Behavior

• Statistical mean: $\mu_X(t) = E[X(t)] = \int_{\omega \in \Omega} X_t(\omega)\, dP_{X_t}(\omega)$.
• If we have $M$ samples of $X_{t_1}$, denoted $(\hat{x}^1_{t_1}, \hat{x}^2_{t_1}, \dots, \hat{x}^M_{t_1})$, drawn from $P_{X_{t_1}}$, then we can estimate the statistical average as $\hat{\mu}_X(t_1) = \frac{1}{M} \sum_{i=1}^{M} \hat{x}^i_{t_1}$.

• However, suppose we have a single sample path of the random process given by $x_1(\omega_0), x_2(\omega_0), \dots$. Then, we can find the temporal mean and autocorrelation as
$$\bar{X}(\omega_0) = \frac{1}{T} \int_0^T x_t(\omega_0)\, dt, \qquad \bar{R}_X(\tau) = \frac{1}{T} \int_0^T x_{t+\tau}(\omega_0)\, x_t(\omega_0)\, dt.$$

• Do the temporal and statistical averages coincide? Yes, when the process is ergodic. The random process $\{X_t\}_{t \in T}$ is ergodic when
$$E[X_t] = \lim_{T \to \infty} \frac{1}{T} \int_0^T X_t(\omega_0)\, dt.$$

It is implicit that for an ergodic process, $E[X_t] = \mu_X(t) = \mu_X$ for all $t$.


ˆ For a discrete-time process, we replace the integral by summation to compute
temporal averages.

Mean-Square Ergodic Theorem: Let $\{X_t\}_{t \in T}$ be a wide sense stationary process with $E[X_t] = \mu_X$ and autocorrelation $R_X(\tau)$, and suppose the Fourier transform of $R_X(\tau)$ exists. Let $\bar{X}_T(\omega) = \frac{1}{2T} \int_{-T}^{T} X_t(\omega)\, dt$. Then,

$$\lim_{T \to \infty} E[(\bar{X}_T - \mu_X)^2] = 0.$$

In other words, $\bar{X}_T$ converges to $\mu_X$ in the mean-square sense.

The implication of the above theorem is that we can approximate the mean and correlation by temporal averages computed from a single sample path.

Random Process and LTI System

• Suppose we have an LTI system with impulse response $h(t)$. If we apply an input signal $x(t)$ to this system, the output signal $y(t)$ is given by
$$y(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau =: x(t) \circledast h(t).$$

ˆ Now, suppose the input X(t) is a random process with mean µX (t) and
autocorrelation RX (t1 , t2 ). Determine the mean and autocorrelation of Y .

ˆ If X(t) is WSS, is Y (t) also WSS?

Yes. Derivation in class.

ˆ Are X(t) and Y (t) jointly WSS?

Yes. Derivation in class. We can show that

RY X (τ ) = h(τ ) ⊛ RX (τ )

Power Spectral Density (PSD)

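• From the above discussion, we have $R_{YX}(\tau) = \int_{-\infty}^{\infty} h(s)\, R_X(\tau - s)\, ds$.

• For a CT WSS process $X(t)$ (that is integrable), we can find the "power spectral density" at frequency $\omega$ (rad/s):
$$S_X(\omega) := \mathcal{F}[R_X(\tau)] = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}\, d\tau.$$

• Thus, $S_{YX}(\omega) = H(\omega) S_X(\omega)$, where $H(\omega)$ is the Fourier transform of the impulse response $h(t)$.

• We can further show that

$$R_Y(\tau) = h(\tau) \circledast R_{XY}(\tau) \implies S_Y(\omega) = H(\omega)\, S_{XY}(\omega).$$
In addition, $S_{YX}(\omega) = H(\omega)\, S_X(\omega)$. Since $R_{XY}(\tau) = R_{YX}(-\tau)$, we have $S_{XY}(\omega) = S_{YX}(\omega)^*$, and therefore
$$S_Y(\omega) = H(\omega) H(\omega)^* S_X(\omega) = |H(\omega)|^2 S_X(\omega).$$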
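As a worked illustration of $S_Y(\omega) = |H(\omega)|^2 S_X(\omega)$, consider an assumed example that is not in the notes: a first-order low-pass filter with parameter $a > 0$, driven by idealized white noise of intensity $\sigma^2$ (its autocorrelation is a Dirac delta, so this is a formal calculation).

```latex
% Assumed example: h(t) = e^{-at} u(t) with a > 0, and R_X(\tau) = \sigma^2 \delta(\tau).
\begin{align*}
H(\omega) &= \int_0^{\infty} e^{-at} e^{-j\omega t}\, dt = \frac{1}{a + j\omega},
  & |H(\omega)|^2 &= \frac{1}{a^2 + \omega^2}, \\
S_X(\omega) &= \sigma^2,
  & S_Y(\omega) &= |H(\omega)|^2 S_X(\omega) = \frac{\sigma^2}{a^2 + \omega^2}, \\
R_Y(\tau) &= \mathcal{F}^{-1}[S_Y](\tau) = \frac{\sigma^2}{2a}\, e^{-a|\tau|},
  & R_Y(0) &= \frac{\sigma^2}{2a}.
\end{align*}
```

The last line uses the transform pair $e^{-a|\tau|} \leftrightarrow \frac{2a}{a^2 + \omega^2}$. The output autocorrelation depends only on $\tau$, consistent with $Y(t)$ being WSS.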

Discrete-time WSS Processes

• A discrete-time random process $(X_n)_{n \in \mathbb{N}}$ is a collection of random variables $(X_1, X_2, \dots, X_n, \dots)$.

ˆ Mean function µX [n] = E[Xn ].

ˆ Autocorrelation function RX [n1 , n2 ] = E[Xn1 Xn2 ].

ˆ Autocovariance function CX [n1 , n2 ] = cov(Xn1 , Xn2 ).

ˆ Cross-correlation function RXY [n1 , n2 ] = E[Xn1 Yn2 ].

• For $X$ to be WSS, the following properties need to be satisfied:

1. $\mu_X[n] = \mu$, independent of $n$.
2. $R_X[n_1, n_2] = R_X[n_2 - n_1]$.

• Properties such as ergodicity and the response of an LTI system to a WSS input continue to hold in an analogous manner.

Module B.2: Markov Chains

• Markov Process: A random process whose probability distribution at time $t + 1$, given the past, depends only on its value at time $t$. Specifically,

$$\Pr(X_{k+1} \in A \mid X_k, \dots, X_1) = \Pr(X_{k+1} \in A \mid X_k).$$

More generally,

$$\Pr(X_{k+1} \in A \mid X_{k_i}, \dots, X_{k_1}) = \Pr(X_{k+1} \in A \mid X_{k_i}),$$

for any $k_1 < k_2 < \dots < k_i \le k$.

• If the (time) index set is continuous, the corresponding random process is called a Markov process.
ˆ In this course: we focus on discrete-time Markov process where each random
variable Xk is a discrete random variable that takes values from a finite set.
ˆ Example: Infectious disease with reinfection where an individual can be in
one of two possible states: susceptible (S) and infected (I).

Formal Definition

• Definition: We say that a (DT) random process $\{X_k\}$ is a Markov chain over a discrete state space if
1. the $X_k$'s are all discrete random variables with common support $S$, i.e., $\Pr(X_k \in S) = 1$ for all $k$, where $S$ is countable, and
2. for all $i \ge 1$, all $1 \le k_1 < k_2 < \dots < k_i \le k$, and all $s_1, \dots, s_i, s \in S$:

$$\Pr(X_{k+1} = s \mid X_{k_i} = s_i, \dots, X_{k_1} = s_1) = \Pr(X_{k+1} = s \mid X_{k_i} = s_i). \tag{1}$$

• $S$ is called the state space and each $s \in S$ is called a state. Relation (1) is called the Markov property.
ˆ If S is finite, {Xk } is called a finite state Markov chain.

Transition Probabilities

ˆ From this point on assume S is a finite set with elements, S = {1, . . . , n}.
Unless otherwise stated, many of the following discussions hold for n = ∞
but for convenience we assume that n is finite.
• For any $k$, let $\pi_k$ be the (marginal) probability mass function of $X_k$, i.e.,

$$\pi_k(i) = \Pr(X_k = i).$$

Note that the vector $\pi_k$ is non-negative and $\sum_{i=1}^{n} \pi_k(i) = 1$. Such a vector is called a stochastic (sometimes probability) vector. It is convenient to assume that $\pi_k$ is a row vector.
• For any $1 \le k_1 < k_2$, define the matrix (array)

$$P_{k_1, k_2}(i, j) = \Pr(X_{k_2} = j \mid X_{k_1} = i).$$

• $P_{k_1, k_2} \in \mathbb{R}^{n \times n}$ is called the transition matrix of the MC from time $k_1$ to time $k_2$. In particular,
$$\pi_{k_2} = \pi_{k_1} P_{k_1, k_2}.$$

ˆ We also (naturally) define Pk,k := I, where I is the n × n identity matrix.

Properties of Transition Matrices

• Definition: We say that an $n \times n$ matrix $A$ is a row-stochastic matrix if (i) $A$ is non-negative, and (ii) $A\mathbf{1} = \mathbf{1}$ (i.e., each row sums to one).
• Properties of the transition matrices:
– Row-stochastic: For any $k \le m$, $P_{k,m}$ is a row-stochastic matrix. Non-negativity follows from the definition. Also, each row adds up to one:
$$\sum_{j=1}^{n} P_{k,m}(i, j) = \sum_{j=1}^{n} \Pr(X_m = j \mid X_k = i) = 1.$$

– For any $k \le m$, we have:

$$\pi_m = \pi_k P_{k,m}.$$

This follows from the law of total probability:

$$\begin{aligned}
\pi_m(j) = \Pr(X_m = j) &= \sum_{i=1}^{n} \Pr(X_m = j, X_k = i) \\
&= \sum_{i=1}^{n} \Pr(X_m = j \mid X_k = i) \Pr(X_k = i) \\
&= [\pi_k P_{k,m}]_j.
\end{aligned}$$

Properties of Transition Matrices cont.

ˆ Properties of the transition matrices cont.:


– Semigroup property: For any $k \le m \le q$, we have:

$$P_{k,q} = P_{k,m} P_{m,q}.$$

To show this, fix $i, j$. Then, we have

$$\begin{aligned}
P_{k,q}(i, j) &= \Pr(X_q = j \mid X_k = i) \\
&= \sum_{\ell=1}^{n} \Pr(X_q = j, X_m = \ell \mid X_k = i) \\
&= \sum_{\ell=1}^{n} \Pr(X_q = j \mid X_m = \ell, X_k = i) \Pr(X_m = \ell \mid X_k = i) \\
&= \sum_{\ell=1}^{n} \Pr(X_q = j \mid X_m = \ell) \Pr(X_m = \ell \mid X_k = i) \quad \text{(by the Markov property)} \\
&= \sum_{\ell=1}^{n} P_{k,m}(i, \ell) P_{m,q}(\ell, j) \\
&= [P_{k,m} P_{m,q}]_{i,j}.
\end{aligned}$$

This property is widely known as the Chapman-Kolmogorov equation.

– For DS Markov chains, the second property and the Chapman-Kolmogorov equation imply:

$$\pi_k = \pi_1 P_{1,k} = \pi_1 P_{1,2} P_{2,k} = \dots = \pi_1 P_{1,2} P_{2,3} \cdots P_{k-1,k}.$$
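A small numerical check of the semigroup property may help. The two one-step matrices and the initial distribution below are hypothetical values for a time-inhomogeneous two-state chain, and `matmul` is a plain helper written for this sketch.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical one-step transition matrices (each row sums to one).
P12 = [[0.9, 0.1], [0.4, 0.6]]
P23 = [[0.5, 0.5], [0.2, 0.8]]

P13 = matmul(P12, P23)                         # Chapman-Kolmogorov: P_{1,3} = P_{1,2} P_{2,3}
pi1 = [[0.3, 0.7]]                             # row vector pi_1 (hypothetical)
pi3_direct = matmul(pi1, P13)                  # pi_1 P_{1,3}
pi3_stepwise = matmul(matmul(pi1, P12), P23)   # (pi_1 P_{1,2}) P_{2,3}

print(P13)
print(pi3_direct, pi3_stepwise)
```

Both routes give the same distribution at time 3, and the product `P13` is again row-stochastic.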

Homogeneous Markov Chains

Definition: We say that a Markov chain $\{X_k\}$ is (time-)homogeneous if $P_{m,m+1}$ does not depend on $m$, i.e., $P_{m,m+1} = P_{1,2}$ for all $m$.

• Denote $P := P_{m,m+1}$. $P$ is called the one-step transition matrix of the underlying homogeneous Markov chain.

• $P$ is a row-stochastic matrix.

• For homogeneous Markov chains, we have $P_{m,n} = P^{n-m}$.

• The distribution of $X_k$ is given by $\pi_k = \pi_{k-1} P = \pi_0 P^k$.

• With a slight abuse of terminology, for a homogeneous Markov chain $P$ is also called the (one-step) transition probability matrix (TPM).

• For homogeneous Markov chains, the initial distribution and the one-step TPM completely specify the random process.

Graph-Theoretic Interpretation

• Consider a homogeneous MC on state space $S$ with TPM $P$.

• Consider a directed weighted graph $G = (V, E, P)$ where
– $V = S = \{1, \dots, n\}$,
– $E = \{(i, j) \mid P_{ij} > 0\}$, and
– $P_{ij}$ is the weight of edge $(i, j)$.
• Then, the MC can be viewed as a random walk on this weighted graph.
• Example: infectious disease model. Determine the TPM, and simulate the MC.

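[Figure: state-transition diagram of the infectious disease model with states Susceptible, Infected and Recovered; the edge weights shown are 0.9, 0.95, 0.05, 0.2, 0.1 and 0.8.]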
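A minimal simulation sketch for a three-state chain of this kind follows. The assignment of weights to edges in the matrix `P` is an assumption made here so that every row sums to one; it should be replaced by the TPM read off the actual diagram. The helper `simulate` and the seed are likewise illustrative.

```python
import random

STATES = ["Susceptible", "Infected", "Recovered"]

# Hypothetical TPM: the edge-to-weight assignment is an assumption,
# chosen only so that each row sums to one.
P = [
    [0.90, 0.10, 0.00],   # Susceptible: stay, or become Infected
    [0.00, 0.20, 0.80],   # Infected: stay, or become Recovered
    [0.05, 0.00, 0.95],   # Recovered: become Susceptible again, or stay
]

def simulate(P, start, num_steps, rng):
    """Sample X_0, ..., X_num_steps of a homogeneous Markov chain with TPM P."""
    path = [start]
    for _ in range(num_steps):
        current = path[-1]
        # Draw the next state according to row `current` of P.
        nxt = rng.choices(range(len(P)), weights=P[current])[0]
        path.append(nxt)
    return path

rng = random.Random(1)   # fixed seed for reproducibility (illustrative)
path = simulate(P, start=0, num_steps=20, rng=rng)
print([STATES[s][0] for s in path])
```

Each simulated transition follows an edge of the graph, which is exactly the random-walk view of the chain.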

Classification of States

We introduce a few basic definitions.


ˆ An m-step walk on a graph G = (V, E) is an ordered string of nodes
i0 , i1 , . . . , im such that (ik−1 , ik ) ∈ E for all k ∈ {1, 2, . . . , m}.

• A path is a walk in which no node is repeated. A cycle is a walk in which the first and last nodes are identical and no other node is repeated.

• Let $G = (V, E, P)$ be the graph associated with an MC with TPM $P$. A state $j$ is accessible from state $i$, denoted $i \to j$, if there is a walk in the graph from node $i$ to node $j$.

• In other words, there exist nodes $i_1, i_2, \dots, i_k$ such that $(i, i_1) \in E, (i_1, i_2) \in E, \dots, (i_k, j) \in E$. The length of this walk is $k + 1$.

• Equivalently, $P_{i, i_1} > 0, P_{i_1, i_2} > 0, \dots, P_{i_k, j} > 0$. Thus, $[P^{k+1}]_{i,j} > 0$.

• Two states $i$ and $j$ communicate if $i \to j$ and $j \to i$. This is denoted by $i \leftrightarrow j$.

ˆ Naturally, if i ↔ j, j ↔ k, then i ↔ k.

• A subset of states $C \subseteq V$ is a communicating class if
1. $i \in C, j \in C \implies i \leftrightarrow j$, and
2. $i \in C, j \notin C \implies i \not\leftrightarrow j$.
The set of states can be partitioned into distinct communicating classes. Each state belongs to exactly one communicating class.

Definition: A state $i$ is called recurrent if, for every state $j$, $i \to j \implies j \to i$. A state is transient if it is not recurrent.

In other words, from a recurrent state there is no walk to a state from which there is no return.
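These definitions translate directly into a small graph computation. The sketch below is one possible implementation, not the course's: it treats every state as accessible from itself (an assumed convention, so that each state lands in exactly one class), and the 4-state TPM is hypothetical.

```python
def accessibility(P):
    """reach[i][j] is True when j is accessible from i (walks of length >= 0)."""
    n = len(P)
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                      # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

def classify(P):
    """Return (list of communicating classes, set of recurrent states)."""
    n = len(P)
    reach = accessibility(P)
    classes = []
    for i in range(n):
        cls = frozenset(j for j in range(n) if reach[i][j] and reach[j][i])
        if cls not in classes:
            classes.append(cls)
    # i is recurrent when every state accessible from i can reach i again.
    recurrent = {i for i in range(n)
                 if all(reach[j][i] for j in range(n) if reach[i][j])}
    return classes, recurrent

# Hypothetical TPM: {0, 1} is a closed class, 2 is transient, 3 is absorbing.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.3, 0.7, 0.0, 0.0],
     [0.2, 0.0, 0.5, 0.3],
     [0.0, 0.0, 0.0, 1.0]]
classes, recurrent = classify(P)
print(classes, recurrent)
```

Only the sign pattern of `P` matters here, so the same functions work on any adjacency-style matrix.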

Classification of States Cont.

Theorem: In a given communicating class, either all states are recurrent


or all states are transient. Furthermore, in a finite-state MC, there is at least
one recurrent communicating class.

• A matrix $P$ is irreducible if for any $i, j$, $[P^{k_{ij}}]_{ij} > 0$ for some $k_{ij} \ge 1$. In other words, $i \leftrightarrow j$ for every pair of states $i, j$.
• Graph theoretic interpretation: $P$ is irreducible if there is a directed path between any two nodes on the graph.
• In this case, there is a single communicating class which is recurrent.

Definition: The period $\gamma_i$ of a state $i$ is the greatest common divisor (gcd) of all possible return times, i.e., $\gamma_i := \gcd\{k \ge 1 \mid [P^k]_{ii} > 0\}$.

• Graph theoretic interpretation: gcd of the lengths of all closed walks from $i$ back to itself.
• Example: for $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, determine the period of its states.
• All states in the same communicating class have the same period.
• We say that a non-negative matrix $P$ is aperiodic if $\gamma_i = 1$ for all $i$.
• A (homogeneous) Markov chain with the transition matrix $P$ is said to be irreducible (aperiodic) if $P$ is irreducible (aperiodic).
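The period can be computed straight from the definition by scanning powers of $P$. In the sketch below the search horizon of $3n$ steps is an assumption that is adequate for small examples, and the second matrix `lazy` is a hypothetical aperiodic chain added for contrast.

```python
from math import gcd

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def period(P, i, max_steps=None):
    """gcd of all k <= max_steps with [P^k]_{ii} > 0 (0 if state i is never revisited)."""
    n = len(P)
    if max_steps is None:
        max_steps = 3 * n        # assumed search horizon, fine for small examples
    g = 0
    Pk = [row[:] for row in P]   # Pk holds P^k, starting with k = 1
    for k in range(1, max_steps + 1):
        if Pk[i][i] > 0:
            g = gcd(g, k)
        Pk = matmul(Pk, P)
    return g

flip = [[0, 1], [1, 0]]              # the example matrix from the notes
lazy = [[0.5, 0.5], [1.0, 0.0]]      # hypothetical chain with a self-loop at state 0
print(period(flip, 0), period(flip, 1), period(lazy, 0), period(lazy, 1))
```

For the example matrix every return takes an even number of steps, so both states have period 2, while the self-loop makes the second chain aperiodic.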

Stationary and Limiting Distribution of a Markov Chain

• Let $P \in \mathbb{R}^{n \times n}$ be the single-step transition probability matrix of a homogeneous Markov chain.

• Let $\pi_0$ be the distribution of the initial state $X_0$. It follows that $\pi_k = \pi_0 P^k$.

A vector $\pi^\star \in \mathbb{R}^{1 \times n}$ is called an invariant/stationary/steady-state distribution of the Markov chain with TPM $P$ if
• $\pi^\star$ is a probability vector, i.e., $\pi^\star(i) \ge 0$ and $\sum_{i=1}^{n} \pi^\star(i) = 1$, and
• $\pi^\star = \pi^\star P$.

If $\pi_k = \pi^\star$ for some $k$, then $\pi_m = \pi^\star$ for all $m \ge k$.

Fundamental questions in the theory of (homogeneous) Markov chains:


• Existence and Uniqueness: When does $\pi^\star$ exist? Is it unique?
• Ergodicity: When $\pi^\star$ is unique, under what conditions does $\pi_k \to \pi^\star$?
• Mixing time: How fast does $\pi_k$ converge to $\pi^\star$?
• Occupation Probability: What fraction of time does the chain spend in a given state?

We know that the TPM $P$ satisfies the following properties.

• $P$ is non-negative.
• $P$ is row-stochastic, which implies that all eigenvalues reside on or within the unit circle, and 1 is an eigenvalue.
• Note: $\pi^\star$ is a left eigenvector of $P$ associated with eigenvalue 1, normalized so that its entries sum to one.
• Thus, establishing existence and uniqueness of the stationary distribution amounts to establishing existence and uniqueness (up to scaling) of a non-negative left eigenvector of the TPM for eigenvalue 1.

Example

• Let $P = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{3} & \frac{2}{3} \end{pmatrix}$.

• Solving $(u, v) P = (u, v)$ with $v = 1 - u$, we get $u = \frac{2}{5}$ and $v = \frac{3}{5}$. Is this unique?
ˆ What about P = I?
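Both questions can be explored numerically by iterating $\pi \mapsto \pi P$. The sketch below uses the matrix with rows $(1/2, 1/2)$ and $(1/3, 2/3)$, for which $(2/5, 3/5)$ is stationary; the starting distributions and the helper `step` are arbitrary illustrative choices.

```python
def step(pi, P):
    """One update pi -> pi P, with pi a row vector stored as a list."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[1 / 2, 1 / 2],
     [1 / 3, 2 / 3]]

pi = [1.0, 0.0]                  # arbitrary initial distribution
for _ in range(100):
    pi = step(pi, P)
print(pi)                        # approaches (2/5, 3/5)

identity = [[1.0, 0.0], [0.0, 1.0]]
print(step([0.3, 0.7], identity))   # with P = I every distribution is left unchanged
```

With $P = I$ every probability vector satisfies $\pi = \pi P$, so a stationary distribution exists but is far from unique.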

Linear Algebra Viewpoint

• Given a matrix $A \in \mathbb{R}^{n \times n}$, we define its spectral radius as

$$\rho(A) := \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}.$$

• An eigenvalue of $A$ is called semi-simple if its algebraic multiplicity equals its geometric multiplicity.
– algebraic multiplicity: number of times the eigenvalue appears as a root of the characteristic equation
– geometric multiplicity: number of linearly independent eigenvectors associated with this eigenvalue
It is called simple when both multiplicities are equal to 1.

• The matrix $A$ is called
– semi-convergent if $\lim_{k \to \infty} A^k$ exists, and
– convergent if it is semi-convergent and $\lim_{k \to \infty} A^k = 0_{n \times n}$.

Theorem 1. A matrix $A \in \mathbb{R}^{n \times n}$ is
• convergent if and only if $\rho(A) < 1$, and
• semi-convergent if and only if either (i) $\rho(A) < 1$ or (ii) 1 is a semi-simple eigenvalue and all other eigenvalues have magnitude strictly less than 1.

Perron-Frobenius Theorem

A matrix $A \in \mathbb{R}^{n \times n}$ is
• non-negative if $A_{ij} \ge 0$ for all $i, j$.
• irreducible if $\sum_{k=0}^{n-1} A^k$ is positive, i.e., all entries are strictly larger than 0.
• primitive if there exists some $\bar{k}$ such that $A^{\bar{k}} > 0$.
• positive if $A_{ij} > 0$ for all $i, j$.

Theorem 2. Let $A \in \mathbb{R}^{n \times n}$ be a non-negative matrix.
• Then, there exists a real eigenvalue $\lambda \ge |\mu| \ge 0$, where $\mu$ is any other eigenvalue. The left and right eigenvectors associated with $\lambda$ can be chosen non-negative.
• If $A$ is irreducible, $\lambda$ is strictly positive and simple. The left and right eigenvectors associated with $\lambda$ are unique (up to scaling) and positive.
• If $A$ is primitive, $\lambda > |\mu|$. The left and right eigenvectors associated with $\lambda$ are unique (up to scaling) and positive.

Let $P \in \mathbb{R}^{n \times n}$ be the single-step transition probability matrix of a homogeneous Markov chain.
• Is $P$ non-negative?
• When is $P$ irreducible? Does it imply $P$ is semi-convergent?
• When is $P$ primitive? Does it imply $P$ is semi-convergent?
• When is $P$ positive? Does it imply $P$ is semi-convergent?
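The definitions of irreducible and primitive matrices can be tested directly on small examples. In the sketch below the helper names are made up for illustration, and the primitivity search stops at Wielandt's bound $n^2 - 2n + 2$ on the smallest exponent that makes a primitive matrix positive.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def is_positive(A):
    return all(x > 0 for row in A for x in row)

def is_irreducible(A):
    """Check that I + A + ... + A^(n-1) has strictly positive entries."""
    n = len(A)
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    total = [row[:] for row in power]
    for _ in range(n - 1):
        power = matmul(power, A)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    return is_positive(total)

def is_primitive(A):
    """Look for k with A^k > 0, for k = 1, ..., n^2 - 2n + 2 (Wielandt's bound)."""
    n = len(A)
    power = [row[:] for row in A]
    for _ in range(n * n - 2 * n + 2):
        if is_positive(power):
            return True
        power = matmul(power, A)
    return False

flip = [[0.0, 1.0], [1.0, 0.0]]          # periodic two-state chain
mixing = [[0.5, 0.5], [1 / 3, 2 / 3]]    # strictly positive TPM
identity = [[1.0, 0.0], [0.0, 1.0]]
for name, A in [("flip", flip), ("mixing", mixing), ("identity", identity)]:
    print(name, is_irreducible(A), is_primitive(A))
```

The periodic chain is irreducible but not primitive, the positive matrix is both, and the identity is neither.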

Case 1: MC with Single Recurrent Class

ˆ In this case, TPM P is irreducible. (why?)

ˆ From PF Theorem, largest eigenvalue 1 is simple, and left eigenvector is


unique and positive. In other words, π ⋆ exists and is unique.

ˆ However, if the states have period d > 1, then there are d eigenvalues on
the unit circle that are equally spaced. Such a matrix is not primitive, and
hence not semi-convergent.

ˆ When the states are aperiodic (i.e., period d = 1), then it is primitive, and
P is semi-convergent.

ˆ A MC which is both irreducible and aperiodic is called ergodic.

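• We can show that

$$\lim_{k \to \infty} P^k = (1)^k v w^\top = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} w_1 & w_2 & \dots & w_n \end{pmatrix} = \begin{pmatrix} w_1 & w_2 & \dots & w_n \\ w_1 & w_2 & \dots & w_n \\ \vdots & \vdots & & \vdots \\ w_1 & w_2 & \dots & w_n \end{pmatrix} =: P_\infty,$$

where $v$ is the right eigenvector and $w$ is the left eigenvector associated with eigenvalue 1. Note that $w = \pi^\star$.

• In addition, for any initial distribution $\pi_0$, we have

$$\lim_{k \to \infty} \pi_k = \lim_{k \to \infty} \pi_0 P^k = \pi_0 P_\infty = \pi^\star.$$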

Case 2: MC with one Recurrent Class and some Transient
States

• Such a Markov chain is called a unichain. The TPM $P$ is no longer irreducible and can be partitioned as
$$P = \begin{pmatrix} P_{RR} & 0 \\ P_{TR} & P_{TT} \end{pmatrix}, \qquad P_{RR} \in \mathbb{R}^{m_1 \times m_1}, \quad P_{TT} \in \mathbb{R}^{(n - m_1) \times (n - m_1)},$$

where the first $m_1$ states belong to the recurrent class and the remaining $n - m_1$ states are transient.

• Though $P$ is not irreducible, the submatrix $P_{RR}$ is irreducible and has a unique stationary distribution $\pi_R^\star \in \mathbb{R}^{1 \times m_1}$.

• Then, the vector $\pi^\star = [\pi_R^\star \;\; 0_{1 \times (n - m_1)}]$ is the unique stationary distribution of $P$.

• If the states in the recurrent class are aperiodic, then $P$ is semi-convergent. Such an MC is called an ergodic unichain.

The following result characterizes the uniqueness and limiting behavior of the
stationary distribution.

Theorem 3. Consider a finite-state homogeneous MC.


• An MC has a unique stationary distribution $\pi^\star$ if and only if it is a unichain (i.e., it has a single recurrent class).
• The limit $P_\infty = \lim_{k \to \infty} P^k$ exists with every row equal to $\pi^\star$ if and only if the MC is an ergodic unichain (a unichain with an aperiodic recurrent class).

Case 3: MC with Multiple Recurrent Classes

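• The TPM $P$ can be partitioned as

$$P = \begin{pmatrix} P_{R_1} & 0 & 0 & 0 \\ 0 & P_{R_2} & 0 & 0 \\ 0 & 0 & P_{R_3} & 0 \\ P_{TR_1} & P_{TR_2} & P_{TR_3} & P_{TT} \end{pmatrix},$$
with diagonal blocks of sizes $m_1$, $m_2$, $m_3$ and $n - \sum_i m_i$, where the first $m_1$ states belong to the first recurrent class, and so on.

• For each recurrent class, the corresponding submatrix $P_{R_i}$ is irreducible and has a unique stationary distribution $\pi_i^\star \in \mathbb{R}^{1 \times m_i}$.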

ˆ Then, the vector [0 πi⋆ . . . 0] is a stationary distribution of P . Thus,


stationary distribution is not unique.

ˆ Every recurrent class adds one multiplicity to the eigenvalue 1.

ˆ P is semi-convergent only when every recurrent class is aperiodic. In this


case, limk→∞ P k = P∞ , but P∞ has non-identical rows. However, rows
corresponding to states in the same recurrent class are identical.

Ergodic Property

ˆ Let the initial state X0 = i.

ˆ Ti := inf{k ≥ 1 | Xk = i} (first passage time): smallest time index at


which the state takes value i

ˆ fi := P(Ti < ∞): return probability

ˆ mi := E[Ti ]: mean return time


• $\nu_i := \sum_{k=0}^{\infty} \mathbf{1}_{\{X_k = i\}}$: number of visits to $i$ starting from $i$.

ˆ State i is recurrent if and only if fi = 1. State i is transient if and only if


fi < 1.

Theorem: If state i is recurrent, then E[νi ] = ∞. If state i is transient,


then E[νi ] < ∞.

Theorem: Suppose the TPM is irreducible and let $\pi^\star$ be the unique stationary distribution. Then, $m_i = \frac{1}{\pi^\star(i)}$ for all states $i$.

Theorem: Suppose the TPM is irreducible and aperiodic (i.e., ergodic) with the stationary distribution $\pi^\star$. Then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}_{\{X_k = i\}} = \pi^\star(i) \quad \text{almost surely.}$$

Application: Page-Rank Algorithm

• Original idea of Google search ranking: model a person browsing the web as a random walker over the graph of the internet!
• Let $G = (V, E)$, where $d$ is the number of webpages and there is a node for each webpage.
• $(i, j) \in E$ if page $i$ has a link to page $j$.
• Then a person can be modeled as a random walker on $G$ where
$$P_{ij} = \begin{cases} \frac{1}{d_i} & j \in N_i \\ 0 & \text{otherwise,} \end{cases}$$
with $N_i$ the set of pages that page $i$ links to and $d_i = |N_i|$.

• Problem with this? The corresponding Markov chain is in general not irreducible.


• Now let us add a small reset probability, i.e., consider a Markov chain with one-step transition matrix

$$\hat{P} = (1 - a) P + a J,$$

where $a \in (0, 1)$ is a small reset parameter and $J$ is the $d \times d$ matrix with all elements equal to $1/d$.
• Then a Markov chain with the transition matrix $\hat{P}$ is irreducible and aperiodic (why?).
• Therefore, it is ergodic, has a unique stationary distribution $\pi^*$, and $\pi_k \to \pi^*$ as $k \to \infty$.
• More importantly, the fraction of time spent at state (webpage) $i$ up to time $k$ converges to $\pi_i^*$ as $k \to \infty$!
• Therefore, webpage $i$ is superior to $j$ if $\pi_i^* > \pi_j^*$.
• How does Google find $\pi^*$?
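One natural way to approximate $\pi^*$ is power iteration, i.e., repeatedly applying $\pi \leftarrow \pi \hat{P}$. The sketch below is only an illustration of that idea and makes several assumptions: the reset value `a = 0.15`, the four-page link structure, and that every page has at least one outgoing link. It is not a description of Google's actual implementation.

```python
def pagerank(links, a=0.15, num_iters=200):
    """Power iteration pi <- pi * P_hat with P_hat = (1 - a) P + a J.

    links[i] lists the pages that page i links to; every page is assumed
    to have at least one outgoing link (no dangling pages).
    """
    d = len(links)
    pi = [1.0 / d] * d
    for _ in range(num_iters):
        new = [a / d] * d                        # reset term: (pi * a J)_j = a / d
        for i, out in enumerate(links):
            share = (1 - a) * pi[i] / len(out)   # (1 - a) * pi_i * P_ij with P_ij = 1 / d_i
            for j in out:
                new[j] += share
        pi = new
    return pi

# Hypothetical 4-page web.
links = [[1, 3], [2, 3], [0, 3], [0]]
pi = pagerank(links)
print(pi)
```

The matrix $\hat{P}$ is never formed explicitly: the reset term contributes the same amount $a/d$ to every page, so each iteration costs time proportional to the number of links.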

Vector-valued Random Process

A random process X = {Xt }t∈T may be such that each Xt is a random vector
taking values in Rn . Then,
(a) Mean function:

µX (t) := E[Xt ] ∈ Rn

(b) Autocorrelation function:

$$R_X(t_1, t_2) := E[X_{t_1} X_{t_2}^\top] \in \mathbb{R}^{n \times n}$$

(c) Autocovariance function:

CX (t1 , t2 ) := cov(Xt1 , Xt2 ) ∈ Rn×n .

For WSS, the mean function must be constant and every element of $C_X(t_1, t_2)$ should depend only on $t_2 - t_1$.

Other Class of Processes

• A stochastic process $\{X_t\}_{t \in T}$ is called a Gaussian Process if for every finite set of indices $t_1, t_2, \dots, t_k$, the collection of random variables $X_{t_1}, X_{t_2}, \dots, X_{t_k}$ is jointly Gaussian.

• A stochastic process which is both Gaussian and Markov is called a Gauss-Markov Process.

• A stochastic process $\{X_t\}_{t \in T}$ is said to have independent increments if for every finite set of indices $t_1 < t_2 < \dots < t_k$, the random variables $X_{t_2} - X_{t_1}, X_{t_3} - X_{t_2}, \dots, X_{t_k} - X_{t_{k-1}}$ are mutually independent.

ˆ The increments are stationary if Xt2 − Xt1 and Xt2 +s − Xt1 +s have the same
distribution irrespective of the value of s.

• Brownian Motion/Wiener Process: A stochastic process $\{X_t\}_{t \in T}$ is a Wiener Process if
1. $X_0 = 0$,
2. the process has stationary and independent increments,
3. $X_t - X_s \sim N(0, \sigma^2 (t - s))$ for $t > s$,
4. the sample paths are continuous with probability 1.
For a Wiener process, one can show that the sample paths are not differentiable by showing
$$\lim_{\Delta \to 0} \operatorname{var}\left[\frac{X(t + \Delta) - X(t)}{\Delta}\right] = \lim_{\Delta \to 0} \frac{\sigma^2}{\Delta} = \infty.$$
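The variance of the difference quotient follows in one line from the Gaussian increment property of the Wiener process (property 3), taking $\Delta > 0$:

```latex
\operatorname{var}\!\left[\frac{X(t+\Delta) - X(t)}{\Delta}\right]
  = \frac{1}{\Delta^2}\,\operatorname{var}\big[X(t+\Delta) - X(t)\big]
  = \frac{\sigma^2 \Delta}{\Delta^2}
  = \frac{\sigma^2}{\Delta}, \qquad \Delta > 0.
```

The increment shrinks like $\sqrt{\Delta}$ in standard deviation, which is too slow for the ratio to settle as $\Delta \to 0$.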

Dynamical System

ˆ Deterministic discrete-time dynamical system in state-space form is given


by:
xk+1 = fk (xk , uk ), k = 0, 1, . . . ,
where xk ∈ Rn is the state at time k and uk ∈ Rm is the input at time k.
ˆ State variable: summarizes past information such that if we know the state
at time k and the input for all t ≥ k, then we can completely determine the
future states.
ˆ In other words, if we know the current state, we do not need to store past
states and inputs to predict the future.
ˆ If fk = f for all k, the system is time-invariant.

Stochastic Dynamical System

ˆ Stochastic Model: the future state is uncertain even if the current state
and input are known. There are two ways of representing such a system.
Both are equivalent under reasonable assumptions.

ˆ State-space form:

xk+1 = fk (xk , uk , wk ), k = 0, 1, . . . ,

where $w_k \in \mathbb{R}^w$ is a random variable/noise/disturbance which is not under our control (unlike the input $u$).

• Note that $\{w_1, w_2, \dots\}$ is a discrete-time random process, as is $\{x_1, x_2, \dots\}$.

• Example: $x_{k+1} = a x_k + w_k$ where $w_k \sim N(c, 1)$ and $x_0 = 5$. What will the trajectories look like for different values of $a$ and $c$? What is the distribution of $x_k$ as $k \to \infty$? Is this process Markovian?

Stochastic Linear System

A stochastic linear system is formally defined as

xk+1 = Ak xk + Bk uk + wk .

Problem: recursively determine the mean and variance of xk given that E[wk ] = 0,
var(wk ) = Q and x0 is known.
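One way to approach this problem is sketched below, under assumptions that go beyond the statement: the input $u_k$ is deterministic, the noise $w_k$ is uncorrelated with $x_k$, and $A$, $B$ are time-invariant. Writing $m_k = E[x_k]$ and $\Sigma_k = \operatorname{var}(x_k)$, this gives $m_{k+1} = A m_k + B u_k$ and $\Sigma_{k+1} = A \Sigma_k A^\top + Q$. The scalar system and the helper names are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def propagate(mean, cov, A, B, u, Q):
    """One step of the moment recursion for x_{k+1} = A x_k + B u_k + w_k."""
    new_mean = add(matmul(A, mean), matmul(B, u))            # uses E[w_k] = 0
    new_cov = add(matmul(matmul(A, cov), transpose(A)), Q)   # w_k uncorrelated with x_k
    return new_mean, new_cov

# Hypothetical scalar system x_{k+1} = 0.5 x_k + u_k + w_k with u_k = 1 and Q = 1.
A, B, Q = [[0.5]], [[1.0]], [[1.0]]
mean, cov = [[5.0]], [[0.0]]        # x_0 = 5 is known, so its variance is zero
for k in range(50):
    mean, cov = propagate(mean, cov, A, B, [[1.0]], Q)
print(mean, cov)
```

For this stable scalar example the mean settles near $1 / (1 - 0.5) = 2$ and the variance near $1 / (1 - 0.25) = 4/3$.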

Representation via Transition Kernel

ˆ Recall the state-space form: xk+1 = fk (xk , uk , wk ), k = 0, 1, . . . .

ˆ Here, the distribution of xk+1 can be found in terms of the function fk and
indirectly, as a function of basic random variables (x0 , w0 , . . . , wk ).

ˆ The alternative approach is to directly specify the distribution of xk+1 instead


of relying on the function fk . In particular, the conditional distribution
of Xk+1 given xk and uk is specified for all values of xk and uk .

ˆ For the dynamical system to be Markovian, we need to show that for every
Borel subset A and for all k,

P(Xk+1 ∈ A|x0 , u0 , x1 , u1 , . . . , xk , uk ) = P(Xk+1 ∈ A|xk , uk ).

ˆ Is the above property always true?

Observation Model

• In many instances, the states cannot be directly measured.

ˆ Instead, we observe “output” quantities that depend on the state as

yk = gk (xk , vk ),

where vk is a random variable termed “measurement noise.”

ˆ Alternatively, the conditional distribution of yk given xk is specified.

ˆ In case of a linear system, yk = Ck xk + vk .

ˆ One problem of significant interest is to infer or estimate the state xk given


the measured / output quantities yk in an online and recursive manner.

ˆ Module C will tackle this issue.
