
Module B: Random Processes

A random process is a family/collection of random variables indexed by a set T, denoted {Xt }t∈T .

The set T is often interpreted as "time."


 
• When T = {1, 2, . . . , n}, then {Xt }t∈T = (X1 , X2 , . . . , Xn )^⊤ is a random vector.
• When T = {1, 2, 3, . . .} = N, then {Xt }t∈T = (X1 , X2 , X3 , . . .) is called a discrete-time random process.
• When T = R, {Xt }t∈T is an uncountable collection of random variables and is called a continuous-time random process.

Recall that Xt : Ω → R. Fixing ω, the map t ↦ Xt (ω) is a function of t called the sample path.


Example: Xt = cos(2πωt), where the random outcome ω takes the value 1, 2, or 3, each with probability 1/3.

How do we specify a random process {Xt }t∈T ? To fully specify a random process, the joint distribution of (Xt1 , Xt2 , . . . , Xtn ) must be provided for every finite collection of indices (t1 , t2 , . . . , tn ).
Deterministic vs Stochastic Dynamical Systems

• Deterministic: starting from x0 ∈ Rn, for all t ≥ 0,

xt+1 = f (t, xt ).

More generally: xt+1 = f (t, xt , . . . , xt−m ), where m is the memory of the system.

Example: n = 1, starting at x0 > 0, a simple (deterministic) population growth model:

xt+1 = r0 xt .

Note that xt = r0^t x0 .

• Random process: starting from x0 ∈ Rn, for all t ≥ 0,

xt+1 = f (t, xt , wt ).

More generally: xt+1 = f (t, xt , . . . , xt−m , wt ), where m is the memory of the system and wt is a random variable/vector.

Example: beginning phase of a pandemic. For some initially infected population x0 > 0, the population of infected people at the beginning phase of a pandemic can be modeled by

xt+1 = rt xt ,

where rt is a non-negative random variable, independent of rk for k < t, with E[rt ] = r0 .
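A minimal simulation sketch of this growth model in Python (the lognormal choice for rt and all parameter values are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: i.i.d. lognormal growth factors r_t with E[r_t] = r0.
r0, s, T, x0 = 1.2, 0.1, 50, 100.0
mu = np.log(r0) - s**2 / 2          # E[lognormal(mu, s)] = exp(mu + s^2/2) = r0

x = np.empty(T + 1)
x[0] = x0
for t in range(T):
    r_t = rng.lognormal(mu, s)      # non-negative, independent across t
    x[t + 1] = r_t * x[t]

# Compare one random trajectory against the deterministic model x_t = r0^t * x0.
print(x[-1], x0 * r0**T)
```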

Examples of Random Processes

• Averaging: suppose that {wt } is an independent and identically distributed random process with E[wk ] = µ.

How does the running average xt = (w1 + . . . + wt )/t behave as t → ∞? In this case:

t xt = (t − 1)xt−1 + wt
xt = (1 − 1/t)xt−1 + (1/t)wt
xt = ft (xt−1 , wt ), where ft (x, w) = (1 − 1/t)x + (1/t)w.

• What happens if we use other weights, such as xt = (w1 + . . . + wt )/√t?
• What if we don't have any weights at all, i.e., xt = w1 + . . . + wt ? What happens?
• What can we say about the asymptotic behavior of such processes in general? (See the simulation sketch below.)
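The three questions above can be explored numerically. A small sketch, assuming Gaussian wk for concreteness:

```python
import numpy as np

rng = np.random.default_rng(1)
T, mu = 100_000, 2.0
w = rng.normal(mu, 1.0, size=T)        # i.i.d. with E[w_k] = mu, unit variance
S = np.cumsum(w)                       # S_t = w_1 + ... + w_t
t = np.arange(1, T + 1)

x_avg = S / t                          # running average: converges to mu (LLN)
z = (S - mu * t) / np.sqrt(t)          # centered sqrt(t) scaling: N(0,1) in
                                       # distribution (CLT); fluctuates forever
# S itself (no weights) is a random walk with drift mu and diverges.

print(x_avg[-1])                       # close to 2.0
```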

Terminology

For a random process X = {Xt }t∈T


(a) Mean function:

µX (t) := E[Xt ]

(b) Autocorrelation function:

RX (t1 , t2 ) := E[Xt1 Xt2 ]

(c) Autocovariance function:

CX (t1 , t2 ) := RX (t1 , t2 ) − µX (t1 )µX (t2 )

For an i.i.d. process

If the random process {Xt } is i.i.d., then

(a) For the mean function:

µX (t) = E[Xt ] = E[X0 ].

Therefore, we have a constant mean function.

(b) For the autocorrelation function:

RX (t1 , t2 ) = E[Xt1 Xt2 ] = E[Xt1²] = E[X1²] if t1 = t2 , and
RX (t1 , t2 ) = E[Xt1 ]E[Xt2 ] = µX (t1 )µX (t2 ) = µX (0)² if t1 ≠ t2 .

(c) For the autocovariance function:

CX (t1 , t2 ) = RX (t1 , t2 ) − µX (t1 )µX (t2 ) = var(X1 ) if t1 = t2 , and 0 if t1 ≠ t2 .

That is, the random process is uncorrelated in time. Many other properties hold as well.

Stationary Processes

A random process is Strict Sense Stationary (SSS) if the (finite) joint probability distributions (CDFs) are invariant under shifts, i.e., for all t1 < t2 < · · · < tk and all α1 , . . . , αk ∈ R:

FXt1 ,...,Xtk (α1 , . . . , αk ) = FXt1+s ,...,Xtk+s (α1 , . . . , αk )

for all s ≥ −t1 .

Example: i.i.d. processes, since

FXt1 ,...,Xtk (α1 , . . . , αk ) = FXt1 (α1 ) · · · FXtk (αk ) = FX (α1 ) · · · FX (αk ).

A random process is Wide Sense Stationary (WSS) if

1. the mean function µX (t) does not depend on time t, and
2. RX (t1 , t2 ) = f (t1 − t2 ), i.e., the autocorrelation function is a function of t1 − t2 only.

Example: i.i.d. processes.

For two random processes {Xt }t∈T and {Yt }t∈T :

• Cross-correlation: RXY (t1 , t2 ) = E[X(t1 )Y (t2 )], which in general differs from E[Y (t1 )X(t2 )].
• Cross-covariance: CXY (t1 , t2 ) = cov[X(t1 ), Y (t2 )] = RXY (t1 , t2 ) − µX (t1 )µY (t2 ).

{Xt }t∈T and {Yt }t∈T are jointly WSS if
• both {Xt }t∈T and {Yt }t∈T are individually WSS, and
• RXY (t1 , t2 ) = RXY (t1 − t2 ).

Example: Random Walk

Let {Xk } be a random walk, given by Xk+1 = Xk + Zk , where {Zk } is i.i.d. with zero mean and variance σ² and X0 = 0 a.s.

(a) For the mean function:

µX (k) = E[Xk−1 + Zk−1 ] = E[Xk−1 ].

Therefore, µX (k) = µX (k − 1) = . . . = µX (0) = 0.

(b) For the autocorrelation function, let k1 ≤ k2 :

RX (k1 , k2 ) = E[Xk1 Xk2 ] = E[Xk1 (Xk2 − Xk1 + Xk1 )]
= E[Xk1 (Xk2 − Xk1 )] + E[Xk1²]
= E[Xk1²] = k1 σ²,

where the cross term vanishes because the increment Xk2 − Xk1 is independent of Xk1 and has zero mean. Therefore, RX (k1 , k2 ) = min(k1 , k2 )σ². Thus, such a process is not WSS and hence not SSS.

(c) Autocovariance function: since the process is zero mean, CX = RX .
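A quick Monte Carlo check of RX (k1 , k2 ) = min(k1 , k2 )σ², assuming Gaussian increments for concreteness:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, sigma = 20_000, 50, 1.0              # M independent paths of length K

Z = rng.normal(0.0, sigma, size=(M, K))    # i.i.d. zero-mean increments
X = np.hstack([np.zeros((M, 1)), np.cumsum(Z, axis=1)])   # X_0 = 0 on every path

k1, k2 = 10, 30
R_hat = np.mean(X[:, k1] * X[:, k2])       # ensemble estimate of R_X(k1, k2)
print(R_hat, min(k1, k2) * sigma**2)       # both close to 10.0
```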

Continuous Time Random Processes

• Example: for a deterministic α > 0 and frequency ω, let Xt = α cos(ωt + θ), where θ ∼ U([0, 2π]).

– The mean function:

µX (t) = E[α cos(ωt + θ)] = (1/2π) ∫₀^2π α cos(ωt + θ) dθ = 0.

– The correlation function (using cos A cos B = (1/2)[cos(A + B) + cos(A − B)]):

RX (t1 , t2 ) = E[α cos(ωt1 + θ) α cos(ωt2 + θ)]
= (α²/2π) ∫₀^2π cos(ωt1 + θ) cos(ωt2 + θ) dθ
= (α²/4π) ∫₀^2π cos(ω(t1 + t2 ) + 2θ) dθ + (α²/4π) ∫₀^2π cos(ω(t1 − t2 )) dθ
= (α²/4π) ∫₀^2π cos(ω(t1 − t2 )) dθ
= (α²/2) cos(ω(t1 − t2 )).

Hence the mean is constant and RX depends only on t1 − t2 , so the process is WSS.
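A sketch verifying the closed form by averaging over independent draws of the random phase (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, omega = 2.0, 2 * np.pi * 0.5        # amplitude and frequency (rad/s)
M = 100_000
theta = rng.uniform(0.0, 2 * np.pi, M)     # theta ~ U([0, 2*pi])

t1, t2 = 0.3, 1.1
x1 = alpha * np.cos(omega * t1 + theta)
x2 = alpha * np.cos(omega * t2 + theta)

R_hat = np.mean(x1 * x2)                           # ensemble estimate of R_X(t1, t2)
R_true = alpha**2 / 2 * np.cos(omega * (t1 - t2))  # closed form derived above
print(R_hat, R_true)
```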

Properties of WSS Processes

• Some properties of a WSS process {Xt }:

1. RX (τ ) = E[X(t)X(t + τ )] is an even function, i.e., RX (τ ) = RX (−τ ).
2. RX (0) ≥ RX (τ ) for all τ .
3. For independent processes {X(t)} and {Y (t)} with zero mean, RX+Y (τ ) = RX (τ ) + RY (τ ).

Proof on board in class.

Ergodic Behavior

• Statistical mean: µX (t) = E[X(t)] = ∫ω∈Ω Xt (ω) dP(ω).

• If we have M samples of Xt1 , denoted (x̂¹t1 , x̂²t1 , . . . , x̂ᴹt1 ), drawn from PXt1 , then we can estimate the statistical average as µ̂X (t1 ) = (1/M) Σ_{i=1}^{M} x̂ⁱt1 .

• However, suppose we have a single sample path of the random process, given by x1 (ω0 ), x2 (ω0 ), . . .. Then we can find the temporal mean and autocorrelation as

X̄(ω0 ) = (1/T) ∫₀^T xt (ω0 ) dt,
R̄X (τ ) = (1/T) ∫₀^T xt+τ (ω0 ) xt (ω0 ) dt.

• Do the temporal and statistical averages coincide? Yes, when the process is ergodic. The random process {Xt }t∈T is ergodic when

E[Xt ] = lim_{T→∞} (1/T) ∫₀^T Xt (ω0 ) dt.

It is implicit that for an ergodic process, E[Xt ] = µX (t) = µX for all t.

• For a discrete-time process, we replace the integral by a summation to compute temporal averages.

Mean-Square Ergodic Theorem: Let {Xt }t∈T be a wide sense stationary process with E[Xt ] = µX and autocorrelation RX (τ ), and let the Fourier transform of RX (τ ) exist. Let X̄T (ω) = (1/2T) ∫₋T^T Xt (ω) dt. Then

lim_{T→∞} E[(X̄T − µX )²] = 0.

In other words, X̄T converges to µX in the mean-square sense.

The implication of the above theorem is that we can approximate the mean/correlation by temporal averages computed from a single sample path.
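A discrete-time sketch of this idea: the time average over one long sample path approximates the statistical mean. The AR(1) process below is an illustrative choice, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(4)
a, T = 0.9, 200_000

# AR(1): x_{k+1} = a x_k + w_k with zero-mean noise; after a burn-in it is
# approximately stationary with statistical mean E[X_k] = 0.
x = np.zeros(T)
for k in range(T - 1):
    x[k + 1] = a * x[k] + rng.normal()

temporal_mean = x[1000:].mean()    # time average of a single path (burn-in dropped)
print(temporal_mean)               # close to the statistical mean 0
```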

Random Process and LTI System

• Suppose we have an LTI system with impulse response h(t). If we apply an input signal x(t) to this system, the output signal y(t) is given by

y(t) = ∫₋∞^∞ h(τ )x(t − τ ) dτ =: x(t) ⊛ h(t).

• Now suppose the input X(t) is a random process with mean µX (t) and autocorrelation RX (t1 , t2 ). Determine the mean and autocorrelation of Y .

• If X(t) is WSS, is Y (t) also WSS?

Yes. Derivation in class.

• Are X(t) and Y (t) jointly WSS?

Yes. Derivation in class. We can show that

RY X (τ ) = h(τ ) ⊛ RX (τ ).

Power Spectral Density (PSD)

• From the above discussion, we have RY X (τ ) = ∫₋∞^∞ h(s)RX (τ − s) ds.

• For a CT WSS process X(t) (that is integrable), we can find the "power spectral density" at frequency ω (rad/s):

SX (ω) := FT[RX (τ )] = ∫₋∞^∞ RX (τ ) e^{−jωτ} dτ.

• Thus, SY X (ω) = H(ω)SX (ω), where H(ω) is the Fourier transform of the impulse response h(t).

• We can further show that

RY (τ ) = h(τ ) ⊛ RXY (τ )
⟹ SY (ω) = H(ω) × SXY (ω).
In addition, SY X (ω) = H(ω) × SX (ω).
Since RXY (τ ) = RY X (−τ ), we have SXY (ω) = SY X (ω)*.
⟹ SY (ω) = H(ω)SY X (ω)* = H(ω)H(ω)* SX (ω) = |H(ω)|² SX (ω),

using that SX (ω) is real.
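A numerical sanity check of SY (ω) = |H(ω)|² SX (ω), assuming a short FIR impulse response and white-noise input (so SX is flat); an averaged periodogram stands in for the true PSD, so the match holds only up to small transient and estimation errors:

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([0.5, 0.3, 0.2])            # impulse response of a small FIR (LTI) system
sigma2 = 1.0                             # white-noise input: S_X(omega) = sigma^2

M, N = 2000, 256
Sy = np.zeros(N)
for _ in range(M):                       # average periodograms over many realizations
    x = rng.normal(0.0, np.sqrt(sigma2), N)
    y = np.convolve(x, h)[:N]            # pass the input through the LTI system
    Sy += np.abs(np.fft.fft(y))**2 / N
Sy /= M

H = np.fft.fft(h, N)                     # frequency response on the same grid
print(np.max(np.abs(Sy - np.abs(H)**2 * sigma2)))   # small: S_Y ~= |H|^2 S_X
```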

Discrete-time WSS Processes

• A discrete-time random process (Xn )n∈N is a collection of random variables (X1 , X2 , . . . , Xn , . . .).

• Mean function: µX [n] = E[Xn ].

• Autocorrelation function: RX [n1 , n2 ] = E[Xn1 Xn2 ].

• Autocovariance function: CX [n1 , n2 ] = cov(Xn1 , Xn2 ).

• Cross-correlation function: RXY [n1 , n2 ] = E[Xn1 Yn2 ].

• For X to be WSS, the following properties need to be satisfied:

1. µX [n] = µ, independent of n.
2. RX [n1 , n2 ] = RX [n2 − n1 ].

• Properties such as ergodicity and the response of an LTI system to a WSS input continue to hold in an analogous manner.

Module B.2: Markov Chains

• Markov process: a random process whose probability distribution at time k + 1, given the past, depends only on its value at time k. Specifically,

Pr(Xk+1 ∈ A | Xk , . . . , X1 ) = Pr(Xk+1 ∈ A | Xk ).

More generally,

Pr(Xk+1 ∈ A | Xki , . . . , Xk1 ) = Pr(Xk+1 ∈ A | Xki ),

for any k1 < k2 < . . . < ki ≤ k.

• If the (time) index set is continuous, the corresponding random process is called a continuous-time Markov process.
• In this course, we focus on discrete-time Markov processes where each random variable Xk is a discrete random variable taking values in a finite set.
• Example: infectious disease with reinfection, where an individual can be in one of two possible states: susceptible (S) and infected (I).

Formal Definition

• Definition: We say that a (DT) random process {Xk } is a Markov chain over a discrete space if

1. the Xk are all discrete random variables with common support S, i.e., Pr(Xk ∈ S) = 1 for all k, where S is countable, and
2. for all i ≥ 1, all 1 ≤ k1 < k2 < . . . < ki ≤ k, and all s1 , . . . , si , s ∈ S:

Pr(Xk+1 = s | Xki = si , . . . , Xk1 = s1 ) = Pr(Xk+1 = s | Xki = si ).   (1)

• S is called the state space and each s ∈ S is called a state. Relation (1) is called the Markov property.
• If S is finite, {Xk } is called a finite-state Markov chain.

Transition Probabilities

• From this point on, assume S is a finite set with n elements, S = {1, . . . , n}. Many of the following discussions hold for n = ∞, but for convenience we assume that n is finite.
• For any k, let πk be the (marginal) probability mass function of Xk , i.e.,

πk (i) = Pr(Xk = i).

Note that the vector πk is non-negative and Σ_{i=1}^{n} πk (i) = 1. Such a vector is called a stochastic (sometimes probability) vector. It is convenient to treat πk as a row vector.
• For any 1 ≤ k1 < k2 , define the matrix (array)

Pk1,k2 (i, j) = Pr(Xk2 = j | Xk1 = i).

• Pk1,k2 ∈ Rn×n is called the transition matrix of the MC from time k1 to time k2 . It satisfies

πk2 = πk1 Pk1,k2 .

• We also (naturally) define Pk,k := I, where I is the n × n identity matrix.

Properties of Transition Matrices

• Definition: We say that an n × n matrix A is a row-stochastic matrix if (i) A is non-negative, and (ii) A1 = 1 (i.e., each row sums to one).
• Properties of the transition matrices:

– Row-stochastic: for any k ≤ m, Pk,m is a row-stochastic matrix. Non-negativity follows from the definition. Also, each row adds up to one:

Σ_{j=1}^{n} Pk,m (i, j) = Σ_{j=1}^{n} Pr(Xm = j | Xk = i) = 1.

– For any k ≤ m, we have:

πm = πk Pk,m .

This follows from the fact that

πm (j) = Pr(Xm = j) = Σ_{i=1}^{n} Pr(Xm = j, Xk = i)
= Σ_{i=1}^{n} Pr(Xm = j | Xk = i) Pr(Xk = i)
= [πk Pk,m ]j .

Properties of Transition Matrices cont.

• Properties of the transition matrices (cont.):

– Semigroup property: for any k ≤ m ≤ q, we have:

Pk,q = Pk,m Pm,q .

To show this, let i, j be fixed. Then we have

Pk,q (i, j) = Pr(Xq = j | Xk = i)
= Σ_{ℓ=1}^{n} Pr(Xq = j, Xm = ℓ | Xk = i)
= Σ_{ℓ=1}^{n} Pr(Xq = j | Xm = ℓ, Xk = i) Pr(Xm = ℓ | Xk = i)
= Σ_{ℓ=1}^{n} Pr(Xq = j | Xm = ℓ) Pr(Xm = ℓ | Xk = i)   (by the Markov property)
= Σ_{ℓ=1}^{n} Pk,m (i, ℓ) Pm,q (ℓ, j)
= [Pk,m Pm,q ]i,j .

This property is widely known as the Chapman-Kolmogorov equation.

– For DT Markov chains, the second property and the Chapman-Kolmogorov property imply:

πk = π1 P1,k = π1 P1,2 P2,k = · · · = π1 P1,2 P2,3 · · · Pk−1,k .

Homogeneous Markov Chains

Definition: We say that a Markov chain {Xk } is (time-)homogeneous if Pm,m+1 does not depend on m (so Pm,m+1 = P1,2 for all m).

• Denote P := Pm,m+1 . P is called the one-step transition matrix of the underlying homogeneous Markov chain.

• P is a row-stochastic matrix.

• For homogeneous Markov chains, we have Pm,n = P^{n−m}.

• The distribution of Xk is given by πk = πk−1 P = π0 P^k.

• With a slight abuse of notation, for a homogeneous Markov chain, P is also called the (one-step) transition probability matrix (TPM).

• For homogeneous Markov chains, the initial distribution and the one-step TPM completely specify the random process.

Graph-Theoretic Interpretation

• Consider a homogeneous MC on state space S with TPM P .

• Consider a directed weighted graph G = (V, E, P ) where
– V = S = {1, . . . , n},
– E = {(i, j) | Pij > 0}, and
– Pij is the weight of edge (i, j).
• Then the MC can be viewed as a random walk on this weighted graph.
• Example: infectious disease model. Determine the TPM, and simulate the MC.

[Figure: state-transition diagram over the states Susceptible, Infected, and Recovered, with edge labels 0.9, 0.95, 0.05, 0.2, 0.1, and 0.8.]
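A simulation sketch of this chain. The TPM below is one reading of the diagram, consistent with each row summing to one; treat the exact entries as assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
states = ["S", "I", "R"]

# Assumed reading of the diagram: S stays S w.p. 0.9, S -> I w.p. 0.1;
# I stays I w.p. 0.95, I -> R w.p. 0.05; R -> S w.p. 0.2, R stays R w.p. 0.8.
P = np.array([[0.90, 0.10, 0.00],
              [0.00, 0.95, 0.05],
              [0.20, 0.00, 0.80]])

T = 100_000
x, visits = 0, np.zeros(3)               # start in state S
for _ in range(T):
    x = rng.choice(3, p=P[x])            # one step of the Markov chain
    visits[x] += 1

print(dict(zip(states, visits / T)))     # long-run fraction of time in each state
```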

Classification of States

We introduce a few basic definitions.

• An m-step walk on a graph G = (V, E) is an ordered string of nodes i0 , i1 , . . . , im such that (ik−1 , ik ) ∈ E for all k ∈ {1, 2, . . . , m}.

• A path is a walk in which no node is repeated. A cycle is a walk in which the first and last nodes are identical and no other node is repeated.

• Let G = (V, E, P ) be the graph associated with a MC with TPM P . A state j is accessible from state i, denoted i → j, if there is a walk in the graph from node i to node j.

• In other words, there exist nodes i1 , i2 , . . . , ik such that (i, i1 ) ∈ E, (i1 , i2 ) ∈ E, . . . , (ik , j) ∈ E. The length of this walk is k + 1.

• Equivalently, Pi,i1 > 0, Pi1,i2 > 0, . . . , Pik,j > 0. Thus, [P^{k+1}]i,j > 0.

• Two states i and j communicate if i → j and j → i. This is denoted by i ↔ j.

• Naturally, if i ↔ j and j ↔ k, then i ↔ k.

• A subset of states C ⊆ V is a communicating class if

1. i ∈ C, j ∈ C ⟹ i ↔ j, and
2. i ∈ C, j ∉ C ⟹ i ↮ j.

The set of states can be partitioned into distinct communicating classes. Each state belongs to exactly one communicating class.

Definition: A state i is called recurrent if i → j ⟹ j → i. A state is transient if it is not recurrent.

If a state is recurrent, there is no path to a state from which there is no return.

Classification of States Cont.

Theorem: In a given communicating class, either all states are recurrent or all states are transient. Furthermore, in a finite-state MC, there is at least one recurrent communicating class.

• A matrix P is irreducible if for every pair i, j there exists kij ≥ 1 such that [P^{kij}]ij > 0. In other words, i ↔ j for every pair of states i, j.
• Graph-theoretic interpretation: P is irreducible if there is a directed path between any two nodes of the graph.
• In this case, there is a single communicating class, which is recurrent.

Definition: The period γi of a state i is the greatest common divisor γi := gcd{k ≥ 1 | [P^k]ii > 0}.

• Graph-theoretic interpretation: the gcd of the lengths of all walks from i to itself.
• Example: for P = [ 0 1 ; 1 0 ], determine the period of its states.
• All states in the same communicating class have the same period.
• We say that a non-negative matrix P is aperiodic if γi = 1 for all i.
• A (homogeneous) Markov chain with transition matrix P is said to be irreducible (aperiodic) if P is irreducible (aperiodic).

Stationary and Limiting Distribution of a Markov Chain

• Let P ∈ Rn×n be the single-step transition probability matrix of a homogeneous Markov chain.

• Let π0 be the distribution of the initial state X0 . It follows that πn = π0 P^n.

A vector π⋆ ∈ R1×n is called an invariant/stationary/steady-state distribution of the Markov chain with TPM P if

• π⋆ is a probability vector, i.e., π⋆(i) ≥ 0 and Σ_{i=1}^{n} π⋆(i) = 1, and
• π⋆ = π⋆ P .

If πk = π⋆ for some k, then πm = π⋆ for all m ≥ k.

Fundamental questions in the theory of (homogeneous) Markov chains:

• Existence and uniqueness: when does π⋆ exist? Is it unique?
• Ergodicity: when unique, under what conditions does πk → π⋆?
• Mixing time: how fast does it converge to π⋆?
• Occupation probability: what fraction of time do we spend in a given state?

We know that the TPM P satisfies the following properties:

• P is non-negative.
• P is row-stochastic, which implies that all eigenvalues reside on or within the unit circle, and 1 is an eigenvalue.
• Note: π⋆ is nothing but a left eigenvector of P for the eigenvalue 1, normalized to be a probability vector.
• Thus, existence and uniqueness of a stationary distribution is equivalent to existence and uniqueness of a non-negative left eigenvector of the TPM.

Example

• Let P = [ 1/2 1/2 ; 1/3 2/3 ].

• Solving (u, v)P = (u, v) with v = 1 − u, we get u = 2/5 and v = 3/5. Is this unique?
• What about P = I?
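A numerical check: π⋆ is the left eigenvector of P for the eigenvalue 1, normalized to sum to one. For this P the answer is unique; for P = I, by contrast, every probability vector is stationary:

```python
import numpy as np

P = np.array([[1/2, 1/2],
              [1/3, 2/3]])

# Left eigenvectors of P are right eigenvectors of P^T.
vals, vecs = np.linalg.eig(P.T)
i = int(np.argmin(np.abs(vals - 1.0)))   # pick the eigenvalue closest to 1
pi = np.real(vecs[:, i])
pi /= pi.sum()                           # normalize to a probability vector
print(pi)                                # [0.4, 0.6] = (2/5, 3/5)
```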

Linear Algebra Viewpoint

• Given a matrix A ∈ Rn×n, we define its spectral radius as

ρ(A) := max{|λ| : λ is an eigenvalue of A}.

• An eigenvalue of A is called semi-simple if its algebraic multiplicity equals its geometric multiplicity.
– Algebraic multiplicity: the number of times the eigenvalue appears as a root of the characteristic equation.
– Geometric multiplicity: the number of linearly independent eigenvectors associated with the eigenvalue.
It is called simple when both multiplicities are equal to 1.

• The matrix A is called
– semi-convergent if lim_{k→∞} A^k exists, and
– convergent if it is semi-convergent and lim_{k→∞} A^k = 0n×n .

Theorem 1. A matrix A ∈ Rn×n is
• convergent if and only if ρ(A) < 1, and
• semi-convergent if and only if either (i) ρ(A) < 1, or (ii) 1 is a semi-simple eigenvalue and all other eigenvalues have magnitude strictly less than 1.

Perron-Frobenius Theorem

A matrix A ∈ Rn×n is
• non-negative if Aij ≥ 0 for all i, j;
• irreducible if Σ_{k=0}^{n−1} A^k is positive, i.e., all entries are strictly larger than 0;
• primitive if there exists some k̄ such that A^k̄ > 0;
• positive if Aij > 0 for all i, j.

Theorem 2. Let A ∈ Rn×n be a non-negative matrix.
• Then there exists a real eigenvalue λ ≥ |µ| ≥ 0, where µ is any other eigenvalue. The left and right eigenvectors associated with λ can be chosen non-negative.
• If A is irreducible, λ is strictly positive and simple, with λ ≥ |µ|. The left and right eigenvectors associated with λ are unique (up to scaling) and positive.
• If A is primitive, then in addition λ > |µ|. The left and right eigenvectors associated with λ are unique (up to scaling) and positive.

Let P ∈ Rn×n be the single-step transition probability matrix of a homogeneous Markov chain.
• Is P non-negative?
• When is P irreducible? Does it imply P is semi-convergent?
• When is P primitive? Does it imply P is semi-convergent?
• When is P positive? Does it imply P is semi-convergent?

Case 1: MC with Single Recurrent Class

• In this case, the TPM P is irreducible. (Why?)

• From the Perron-Frobenius theorem, the largest eigenvalue 1 is simple, and the left eigenvector is unique and positive. In other words, π⋆ exists and is unique.

• However, if the states have period d > 1, then there are d equally spaced eigenvalues on the unit circle. Such a matrix is not primitive, and hence not semi-convergent.

• When the states are aperiodic (i.e., the period is d = 1), P is primitive and hence semi-convergent.

• A MC that is both irreducible and aperiodic is called ergodic.

• We can show that

lim_{k→∞} P^k = v w = [1, 1, . . . , 1]^⊤ [w1 , w2 , . . . , wn ] = the matrix whose every row equals (w1 , w2 , . . . , wn ) =: P∞ ,

where v = (1, . . . , 1)^⊤ is the right eigenvector and w is the left eigenvector of the eigenvalue 1, normalized so that wv = 1. Note that w = π⋆.

• In addition, for any initial distribution π0 , we have

lim_{k→∞} πk = lim_{k→∞} π0 P^k = π0 P∞ = π⋆.
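A quick check of this limit on the two-state example from before (irreducible and aperiodic, hence ergodic):

```python
import numpy as np

P = np.array([[1/2, 1/2],
              [1/3, 2/3]])

P_inf = np.linalg.matrix_power(P, 50)    # P^k for large k
print(P_inf)                             # every row is approximately pi* = (0.4, 0.6)
```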

Case 2: MC with one Recurrent Class and some Transient States

• Such a Markov chain is called a unichain. The TPM P is no longer irreducible and can be partitioned as

P = [ PRR  0 ;
      PTR  PTT ],

where the first m1 states belong to the recurrent class (so PRR is m1 × m1 ) and the remaining n − m1 states are transient.

• Though P is not irreducible, the submatrix PRR is irreducible and has a unique stationary distribution πR⋆ ∈ R1×m1 .

• Then the vector π⋆ = [πR⋆  0_{1×(n−m1)}] is the unique stationary distribution of P .

• If the states in the recurrent class are aperiodic, then P is semi-convergent. Such a MC is called an ergodic unichain.

The following result characterizes the uniqueness and limiting behavior of the stationary distribution.

Theorem 3. Consider a finite-state homogeneous MC.
• The MC has a unique stationary distribution π⋆ if and only if it is a unichain (i.e., it has a single recurrent class).
• Let lim_{k→∞} P^k = P∞ . Each row of P∞ is identical and equal to π⋆ if and only if the MC is an ergodic unichain (a unichain with an aperiodic recurrent class).

Case 3: MC with Multiple Recurrent Classes

• With three recurrent classes (say), the TPM P can be partitioned as

P = [ PR1   0     0     0  ;
      0     PR2   0     0  ;
      0     0     PR3   0  ;
      PTR1  PTR2  PTR3  PTT ],

where the first m1 states belong to the first recurrent class, the next m2 to the second, and so on; the last n − Σi mi states are transient.

• For each recurrent class, the corresponding submatrix PRi is irreducible and has a unique stationary distribution πi⋆ ∈ R1×mi .

• Then the vector [0 . . . πi⋆ . . . 0] is a stationary distribution of P . Thus, the stationary distribution is not unique.

• Every recurrent class adds one to the multiplicity of the eigenvalue 1.

• P is semi-convergent only when every recurrent class is aperiodic. In this case, lim_{k→∞} P^k = P∞ , but P∞ has non-identical rows. However, rows corresponding to states in the same recurrent class are identical.

Ergodic Property

• Let the initial state be X0 = i.

• Ti := inf{k ≥ 1 | Xk = i} (first passage time): the smallest time index at which the state returns to i.

• fi := P(Ti < ∞): return probability.

• mi := E[Ti ]: mean return time.

• νi := Σ_{k=0}^{∞} 1{Xk = i}: number of visits to i starting from i.

• State i is recurrent if and only if fi = 1. State i is transient if and only if fi < 1.

Theorem: If state i is recurrent, then E[νi ] = ∞. If state i is transient, then E[νi ] < ∞.

Theorem: Suppose the TPM is irreducible and let π⋆ be the unique stationary distribution. Then mi = 1/π⋆(i) for all states i.

Theorem: Suppose the TPM is irreducible and aperiodic (i.e., ergodic) with stationary distribution π⋆. Then

lim_{n→∞} (1/n) Σ_{k=1}^{n} 1{Xk = i} = π⋆(i)   almost surely.

Application: Page-Rank Algorithm

• The original idea of Google search ranking: model a browsing person as a random walker over the graph of the internet!
• Let G = (V, E), where d is the number of webpages and there is a node for each webpage.
• (i, j) ∈ E if page i has a link to page j.
• Then a person can be modeled as a random walker on G, where

Pij = 1/di if j ∈ Ni , and 0 otherwise,

with di the out-degree of node i and Ni its set of out-neighbors.

• Problem with this? The corresponding Markov chain is, in general, not irreducible.

• Now let us add a small reset probability, i.e., consider a Markov chain with one-step transition matrix

P̂ = (1 − a)P + aJ,

where a ∈ (0, 1) is a small reset parameter and J is the d × d matrix with all elements equal to 1/d.
• Then the Markov chain with transition matrix P̂ is irreducible and aperiodic (why?).
• Therefore, it is ergodic, has a unique stationary distribution π⋆, and πk → π⋆ as k → ∞.
• More importantly, the average fraction of visits to state (webpage) i by time k converges to π⋆(i)!
• Therefore, webpage i is superior to webpage j if π⋆(i) > π⋆(j).
• How does Google find π⋆? (A power-iteration sketch follows.)
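A power-iteration sketch on a toy four-page link graph; the adjacency matrix and the reset parameter a = 0.15 are illustrative assumptions:

```python
import numpy as np

# Toy link graph: A[i, j] = 1 if page i links to page j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1, keepdims=True)         # out-degrees (all positive here)
P = A / d                                # random-surfer transition matrix
a, n = 0.15, A.shape[0]
P_hat = (1 - a) * P + a * np.ones((n, n)) / n   # add the reset probability

pi = np.full(n, 1.0 / n)                 # power iteration: pi_{k+1} = pi_k P_hat
for _ in range(100):
    pi = pi @ P_hat
print(pi)                                # PageRank scores; larger = more important
```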

Vector-valued Random Process

A random process X = {Xt }t∈T may be such that each Xt is a random vector
taking values in Rn . Then,
(a) Mean function:

µX (t) := E[Xt ] ∈ Rn

(b) Autocorrelation function:

RX (t1 , t2 ) := E[Xt1 Xt2^⊤] ∈ Rn×n

(c) Autocovariance function:

CX (t1 , t2 ) := cov(Xt1 , Xt2 ) ∈ Rn×n .

For WSS, every element of CX (t1 , t2 ) should only depend on t2 − t1 .

Other Classes of Processes

• A stochastic process {Xt }t∈T is called a Gaussian process if for every finite set of indices t1 , t2 , . . . , tk , the collection of random variables Xt1 , Xt2 , . . . , Xtk is jointly Gaussian.

• A stochastic process that is both Gaussian and Markov is called a Gauss-Markov process.

• A stochastic process {Xt }t∈T is said to have independent increments if for every finite set of indices t1 < t2 < . . . < tk , the random variables Xt2 − Xt1 , Xt3 − Xt2 , . . . , Xtk − Xtk−1 are mutually independent.

• The increments are stationary if Xt2 − Xt1 and Xt2+s − Xt1+s have the same distribution irrespective of the value of s.

• Brownian motion/Wiener process: a stochastic process {Xt }t∈T is a Wiener process if

1. X0 = 0,
2. the process has stationary and independent increments,
3. Xt − Xs ∼ N (0, σ²(t − s)) for t ≥ s,
4. the sample paths are continuous with probability 1.

For a Wiener process, one can show that the sample paths are not differentiable by showing

lim_{∆→0} var[(X(t + ∆) − X(t))/∆] = lim_{∆→0} σ²/∆ = ∞.
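A sketch simulating a Wiener path from its independent Gaussian increments, and illustrating the blow-up of the difference quotient (σ² and the time grid are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, T, N = 1.0, 1.0, 10_000
dt = T / N

# Stationary independent increments: X_{t+dt} - X_t ~ N(0, sigma^2 * dt).
dX = rng.normal(0.0, np.sqrt(sigma2 * dt), N)
X = np.concatenate([[0.0], np.cumsum(dX)])   # X_0 = 0; a discretized sample path

# The difference quotient (X(t+dt) - X(t))/dt has standard deviation
# sigma/sqrt(dt), which blows up as dt -> 0.
print(np.std(dX / dt), np.sqrt(sigma2 / dt))
```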

Dynamical System

• A deterministic discrete-time dynamical system in state-space form is given by

xk+1 = fk (xk , uk ),   k = 0, 1, . . . ,

where xk ∈ Rn is the state at time k and uk ∈ Rm is the input at time k.
• State variable: summarizes past information; if we know the state at time k and the input at all times ≥ k, then we can completely determine the future states.
• In other words, if we know the current state, we do not need to store past states and inputs to predict the future.
• If fk = f for all k, the system is time-invariant.

Stochastic Dynamical System

• Stochastic model: the future state is uncertain even if the current state and input are known. There are two ways of representing such a system; both are equivalent under reasonable assumptions.

• State-space form:

xk+1 = fk (xk , uk , wk ),   k = 0, 1, . . . ,

where wk ∈ Rw is a random variable (noise/disturbance) that is not under our control (unlike the input u).

• Note that {w1 , w2 , . . .} is a discrete-time random process, as is {x1 , x2 , . . .}.

• Example: xk+1 = a xk + wk , where wk ∼ N (c, 1) and x0 = 5. What will the trajectories look like for different values of a and c? What is the distribution of xk as k → ∞? Is this process Markovian?

Stochastic Linear System

A stochastic linear system is formally defined as

xk+1 = Ak xk + Bk uk + wk .

Problem: recursively determine the mean and variance of xk given that E[wk ] = 0,
var(wk ) = Q and x0 is known.
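A sketch of the resulting recursions, assuming wk is zero-mean with covariance Q, independent across time and of x0 ; the mean evolves as mk+1 = Ak mk + Bk uk and the covariance as Σk+1 = Ak Σk Ak^⊤ + Q (the matrices and input below are illustrative):

```python
import numpy as np

# Illustrative time-invariant system matrices and input.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = 0.01 * np.eye(2)
u = np.array([1.0])                 # constant input for simplicity

m = np.array([1.0, 0.0])            # x_0 known: mean = x_0 ...
S = np.zeros((2, 2))                # ... and covariance = 0
for k in range(100):
    m = A @ m + B @ u               # mean recursion:       m_{k+1} = A m_k + B u_k
    S = A @ S @ A.T + Q             # covariance recursion: S_{k+1} = A S_k A^T + Q

print(m)
print(S)
```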

Representation via Transition Kernel

• Recall the state-space form: xk+1 = fk (xk , uk , wk ), k = 0, 1, . . . .

• Here, the distribution of xk+1 can be found in terms of the function fk and, indirectly, as a function of the basic random variables (x0 , w0 , . . . , wk ).

• The alternative approach is to directly specify the distribution of xk+1 instead of relying on the function fk . In particular, the conditional distribution of Xk+1 given xk and uk is specified for all values of xk and uk .

• For the dynamical system to be Markovian, we need to show that for every Borel subset A and for all k,

P(Xk+1 ∈ A | x0 , u0 , x1 , u1 , . . . , xk , uk ) = P(Xk+1 ∈ A | xk , uk ).

• Is the above property always true?

Observation Model

• In many instances, the states cannot be directly measured.

• Instead, we observe "output" quantities that depend on the state:

yk = gk (xk , vk ),

where vk is a random variable termed "measurement noise."

• Alternatively, the conditional distribution of yk given xk is specified.

• In the case of a linear system, yk = Ck xk + vk .

• One problem of significant interest is to infer or estimate the state xk from the measured output quantities yk in an online and recursive manner.

• Module C will tackle this issue.

