Module B

Example: Xt = cos(2πωt), where ω is the random outcome taking value
ω = 1 w.p. 1/3, ω = 2 w.p. 1/3, and ω = 3 w.p. 1/3.
Deterministic vs Stochastic Dynamical Systems
xt+1 = f (t, xt ).
xt+1 = f (t, xt , wt ).
xt+1 = rt xt ,
Examples of Random Processes
t xt = (t − 1) xt−1 + wt
xt = (1 − 1/t) xt−1 + (1/t) wt
xt = ft (xt−1 , wt ) where ft (x, w) = (1 − 1/t) x + (1/t) w.
What happens if we use other weights, such as xt = (w1 + . . . + wt )/√t ?
What if we don’t have any weights at all, i.e., xt = w1 + . . . + wt ? What
happens?
What can we say about asymptotic behavior of such processes in general?
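As a quick sanity check on these questions, the sketch below simulates a single noise path, assuming i.i.d. zero-mean Gaussian wt with variance σ² (the distribution and all numbers are illustrative): the 1/t-weighted average converges to 0, while the 1/√t scaling keeps the fluctuations at the order of σ.

```python
import math
import random

random.seed(0)
sigma = 1.0
T = 100_000

# One sample path of i.i.d. zero-mean noise w_1, ..., w_T (Gaussian here,
# but any zero-mean distribution with variance sigma^2 behaves similarly).
s = sum(random.gauss(0.0, sigma) for _ in range(T))

# Weight 1/t: x_T = (w_1 + ... + w_T)/T converges to E[w] = 0 (LLN).
avg = s / T

# Weight 1/sqrt(t): x_T = (w_1 + ... + w_T)/sqrt(T) does not die out; by the
# CLT its distribution approaches N(0, sigma^2), so it stays of order sigma.
scaled = s / math.sqrt(T)

print(abs(avg) < 0.05, abs(scaled) < 6 * sigma)
```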
Terminology
µX (t) := E[Xt ]
For an i.i.d. process
Stationary Processes
Example: Random Walk
Let {Xk } be a random walk, given by Xk+1 = Xk + Zk where {Zk } is i.i.d. with
zero mean and variance σ 2 and X0 = 0 a.s.
(a) For the mean function: µX (k) = E[Xk ] = E[X0 ] + ∑_{i=0}^{k−1} E[Zi ] = 0
for all k, since X0 = 0 a.s. and each Zi has zero mean.
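A minimal simulation of this random walk (taking Zk Gaussian for concreteness; any zero-mean, variance-σ² distribution works) confirms that the empirical mean stays near 0 and that E[Xk²] grows like kσ²:

```python
import random

random.seed(1)
sigma = 1.0
k = 50
n_paths = 20_000

# Simulate n_paths independent walks X_{j+1} = X_j + Z_j with X_0 = 0 and
# Z_j i.i.d. N(0, sigma^2); record the position at time k.
finals = []
for _ in range(n_paths):
    x = 0.0
    for _ in range(k):
        x += random.gauss(0.0, sigma)
    finals.append(x)

mean_k = sum(finals) / n_paths                 # estimates E[X_k] = 0
var_k = sum(v * v for v in finals) / n_paths   # estimates E[X_k^2] = k*sigma^2

print(round(mean_k, 3), round(var_k / (k * sigma**2), 3))
```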
Continuous Time Random Processes
Properties of WSS Processes
Ergodic Behavior
However, suppose we have a single sample path of the random process given
by x1 (ω0 ), x2 (ω0 ), . . .. Then, we can find the temporal mean and
autocorrelation as
X̄(ω0 ) = (1/T) ∫_0^T xt (ω0 ) dt
R̄X (τ ) = (1/T) ∫_0^T xt+τ (ω0 ) xt (ω0 ) dt
Do the temporal and statistical averages coincide? Yes, when the process is
ergodic. The random process {Xt }t∈T is ergodic when
E[Xt ] = lim_{T →∞} (1/T) ∫_0^T Xt (ω0 ) dt.
Mean-Square Ergodic Theorem: Let {Xt }t∈T be a wide-sense stationary
process with E[Xt ] = µX and auto-correlation RX (τ ), and let the Fourier
transform of RX (τ ) exist. Let XT (ω) = (1/2T) ∫_{−T}^{T} Xt (ω) dt. Then,
lim_{T →∞} E[(XT − µX )2 ] = 0.
The implication of the above theorem is that we can approximate the mean/
correlation by a temporal average computed from a single sample path.
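The discrete-time sketch below illustrates this: a single long sample path of a stationary AR(1) process (a standard ergodic WSS example, chosen here for illustration) has a temporal mean close to the ensemble mean µX = 0.

```python
import random

random.seed(2)
a, sigma = 0.9, 1.0
mu_X = 0.0           # ensemble mean of the stationary AR(1) process
T = 200_000

# Single sample path of X_t = a*X_{t-1} + W_t with W_t i.i.d. N(0, sigma^2).
# After transients this process is WSS and ergodic, so the temporal mean
# (1/T) * sum_t X_t computed from ONE path approaches E[X_t] = mu_X.
x, total = 0.0, 0.0
for _ in range(T):
    x = a * x + random.gauss(0.0, sigma)
    total += x

temporal_mean = total / T
print(round(temporal_mean, 3))
```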
Random Process and LTI System
Suppose we have an LTI system with impulse response h(t). If we apply input
signal x(t) to this system, the output signal y(t) is given as
y(t) = ∫_{−∞}^{∞} h(τ )x(t − τ )dτ =: x(t) ⊛ h(t).
Now, suppose the input X(t) is a random process with mean µX (t) and
autocorrelation RX (t1 , t2 ). Determine the mean and autocorrelation of Y .
RY X (τ ) = h(τ ) ⊛ RX (τ )
Power Spectral Density (PSD)
From the above discussion, we have RY X (τ ) = ∫_{−∞}^{∞} h(s)RX (τ − s)ds.
For a CT WSS process X(t) (that is integrable), we can find the “power
spectral density” at frequency ω (rad/s):
SX (ω) := F T [RX (τ )] = ∫_{−∞}^{∞} RX (τ )e−jωτ dτ.
Thus, SY X (ω) = H(ω)SX (ω), where H(ω) is the Fourier transform of the
impulse response h(t).
Similarly, RY (τ ) = h(τ ) ⊛ RXY (τ ) =⇒ SY (ω) = H(ω) × SXY (ω).
Since RXY (τ ) = RY X (−τ ), we have SXY (ω) = SY X (ω)∗ = H(ω)∗ SX (ω),
where SX (ω) is real because X(t) is real-valued. Therefore,
SY (ω) = H(ω)H(ω)∗ SX (ω) = |H(ω)|2 SX (ω).
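A discrete-time numerical check of SY (ω) = |H(ω)|2 SX (ω), stated equivalently in the time domain: for white-noise input (flat PSD) through a hypothetical FIR filter h = [0.5, 0.5], the output autocorrelation should satisfy RY (0) = 0.5σ² and RY (1) = 0.25σ².

```python
import random

random.seed(3)
sigma = 1.0
T = 200_000

# White-noise input: R_X(tau) = sigma^2 * delta(tau), i.e. a flat PSD.
x = [random.gauss(0.0, sigma) for _ in range(T)]

# Hypothetical FIR filter h = [0.5, 0.5]: y_t = 0.5*x_t + 0.5*x_{t-1}.
y = [0.5 * x[t] + 0.5 * x[t - 1] for t in range(1, T)]

# S_Y = |H|^2 S_X in frequency corresponds to R_Y = h * h(-.) * R_X in time;
# for this filter, R_Y(0) = 0.5*sigma^2 and R_Y(1) = 0.25*sigma^2.
n = len(y)
R0 = sum(v * v for v in y) / n
R1 = sum(y[t] * y[t - 1] for t in range(1, n)) / (n - 1)
print(round(R0, 2), round(R1, 2))
```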
Discrete-time WSS Processes
Module B.2: Markov Chains
Pr(Xk+1 ∈ A | Xk , . . . , X1 ) = Pr(Xk+1 ∈ A | Xk ).
More generally
Formal Definition
S is called the state space and each s ∈ S is called a state. Relation (1) is
called the Markov property.
If S is finite, {Xk } is called a finite state Markov chain.
Transition Probabilities
From this point on, assume S is a finite set with n elements, S = {1, . . . , n}.
Unless otherwise stated, many of the following discussions hold for n = ∞,
but for convenience we assume that n is finite.
For any k, let πk be the (marginal) probability mass function of Xk , i.e.,
πk (i) = Pr(Xk = i).
Note that the vector πk is non-negative and ∑_{i=1}^{n} πk (i) = 1. Such a
vector is called a stochastic (sometimes probability) vector. It is convenient
to assume that πk is a row vector.
For any 1 ≤ k1 < k2 , define the matrix (array) with entries
[Pk1 ,k2 ]ij = Pr(Xk2 = j | Xk1 = i). Pk1 ,k2 ∈ Rn×n is called the transition
matrix of the MC from time k1 to time k2 . In other words,
πk2 = πk1 Pk1 ,k2 .
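A small sketch of propagating the marginal pmf by this relation for a homogeneous chain, where πk+1 = πk P (the 3-state P below is a made-up example):

```python
# The marginal pmf, kept as a row vector, is propagated by right-multiplying
# with the row-stochastic transition matrix: pi_{k+1} = pi_k P.
P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]

def step(pi, P):
    # (pi P)_j = sum_i pi(i) * P_ij
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [1.0, 0.0, 0.0]   # X_0 = state 1 with probability 1
for _ in range(5):
    pi = step(pi, P)   # after the loop, pi is the pmf of X_5

print([round(p, 3) for p in pi])  # still a stochastic vector
```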
Properties of Transition Matrices
πm = πk Pk,m .
Properties of Transition Matrices cont.
Homogeneous Markov Chains
P is a row-stochastic matrix.
For homogeneous Markov chains, the initial distribution and the one-step
TPM completely specify the random process.
Graph-Theoretic Interpretation
[Figure: state-transition graph of a Markov chain, with edge probabilities
0.9, 0.95, 0.05, and 0.8.]
Classification of States
A path is a walk where no two nodes are repeated. A cycle is a walk where
the first and last nodes are identical and no other node is repeated.
Equivalently, Pi,i1 > 0, Pi1 ,i2 > 0, . . . , Pik ,j > 0. Thus, [P k+1 ]i,j > 0.
Naturally, if i ↔ j, j ↔ k, then i ↔ k.
Classification of States Cont.
A matrix P is irreducible if for any i, j, [P kij ]ij > 0 for some kij ≥ 1. In
other words, i ↔ j for every pair of states i, j.
Graph theoretic interpretation: P is irreducible if there is a directed path
between any two nodes on the graph.
In this case, there is a single communicating class which is recurrent.
Stationary and Limiting Distribution of a Markov Chain
Example
Let P = [1/2 1/2; 1/3 2/3].
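For this P, solving π ⋆ = π ⋆ P with π ⋆ summing to 1 gives π ⋆ = (2/5, 3/5). A short power-iteration sketch recovers this numerically:

```python
# Power iteration for the stationary distribution of the example chain:
# iterate pi <- pi P; the fixed point pi* satisfies pi* = pi* P.
P = [[1 / 2, 1 / 2],
     [1 / 3, 2 / 3]]

pi = [1.0, 0.0]
for _ in range(100):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]

print([round(p, 4) for p in pi])  # -> [0.4, 0.6], i.e. pi* = (2/5, 3/5)
```

Convergence is geometric because the second eigenvalue of P is 1/6, well inside the unit circle.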
Linear Algebra Viewpoint
Perron-Frobenius Theorem
A matrix A ∈ Rn×n is
non-negative if Aij ≥ 0 for all i, j.
irreducible if ∑_{k=0}^{n−1} Ak is positive, i.e., all entries are strictly
larger than 0.
primitive if there exists some k̄ such that Ak̄ > 0.
positive if Aij > 0 for all i, j.
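These definitions can be checked numerically. The sketch below (with made-up 2 × 2 examples) tests whether some power Ak is entrywise positive: the 2-cycle permutation matrix is irreducible but not primitive (period 2), while a "lazy" variant with a self-loop is primitive.

```python
# Numerical check of primitivity: A is primitive if A^k > 0 entrywise for
# some k.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_is_positive(A, k):
    M = A
    for _ in range(k - 1):
        M = matmul(M, A)
    return all(e > 0 for row in M for e in row)

cycle = [[0.0, 1.0],
         [1.0, 0.0]]       # powers alternate between A and I: never positive
lazy = [[0.5, 0.5],
        [1.0, 0.0]]        # A^2 is already entrywise positive

print(any(power_is_positive(cycle, k) for k in range(1, 9)),
      any(power_is_positive(lazy, k) for k in range(1, 9)))  # -> False True
```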
Case 1: MC with Single Recurrent Class
However, if the states have period d > 1, then there are d eigenvalues on
the unit circle that are equally spaced. Such a matrix is not primitive, and
hence not semi-convergent.
When the states are aperiodic (i.e., period d = 1), then it is primitive, and
P is semi-convergent.
where v is the right eigenvector and w is the left eigenvector of P associated
with eigenvalue 1. Note that w = π ⋆ .
lim_{k→∞} πk = lim_{k→∞} π0 P k = π0 P∞ = π ⋆ .
Case 2: MC with one Recurrent Class and some Transient
States
where the first m1 states belong to the recurrent class, and the remaining
states are transient.
The following result characterizes the uniqueness and limiting behavior of the
stationary distribution.
Case 3: MC with Multiple Recurrent Classes
For each recurrent class, the corresponding submatrix PRi is irreducible and
hence has a unique stationary distribution πi⋆ ∈ R1×mi .
Ergodic Property
Theorem: Suppose the TPM is irreducible and let π ⋆ be the unique stationary
distribution. Then, the mean return time of state i satisfies mi = 1/π ⋆ (i)
for all states i.
Application: Page-Rank Algorithm
P̂ = (1 − a)P + aJ,
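A sketch of the resulting power iteration on P̂ (the 3-page link matrix and a = 0.15 are illustrative assumptions; J is the n × n matrix with all entries 1/n):

```python
# PageRank via power iteration on P_hat = (1-a)*P + a*J.
n, a = 3, 0.15
P = [
    [0.0, 0.5, 0.5],   # page 1 links to pages 2 and 3
    [1.0, 0.0, 0.0],   # page 2 links to page 1
    [0.0, 1.0, 0.0],   # page 3 links to page 2
]
P_hat = [[(1 - a) * P[i][j] + a / n for j in range(n)] for i in range(n)]

# P_hat is entrywise positive, hence primitive, so pi <- pi P_hat converges
# to the unique stationary distribution: the PageRank vector.
pi = [1.0 / n] * n
for _ in range(200):
    pi = [sum(pi[i] * P_hat[i][j] for i in range(n)) for j in range(n)]

print([round(p, 3) for p in pi])  # ranks form a stochastic vector
```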
Vector-valued Random Process
A random process X = {Xt }t∈T may be such that each Xt is a random vector
taking values in Rn . Then,
(a) Mean function:
µX (t) := E[Xt ] ∈ Rn
Other Class of Processes
A stochastic process {Xt }t∈T is called a Gaussian Process if for every finite
set of indices t1 , t2 , . . . , tk , the collection of random variables
Xt1 , Xt2 , . . . , Xtk is jointly Gaussian.
The increments are stationary if Xt2 − Xt1 and Xt2 +s − Xt1 +s have the same
distribution irrespective of the value of s.
Dynamical System
Stochastic Dynamical System
Stochastic Model: the future state is uncertain even if the current state
and input are known. There are two ways of representing such a system.
Both are equivalent under reasonable assumptions.
State-space form:
xk+1 = fk (xk , uk , wk ), k = 0, 1, . . . ,
Stochastic Linear System
xk+1 = Ak xk + Bk uk + wk .
Problem: recursively determine the mean and variance of xk given that E[wk ] = 0,
var(wk ) = Q and x0 is known.
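The standard recursions are mk+1 = Ak mk + Bk uk and Σk+1 = Ak Σk Ak⊤ + Q, using that wk is zero-mean and independent of xk . A scalar sketch with hypothetical numbers:

```python
# Scalar sketch of the mean/variance recursions for x_{k+1} = A x_k + B u_k + w_k:
#   m_{k+1}   = A*m_k + B*u_k          (mean)
#   var_{k+1} = A*var_k*A + Q          (A Sigma A^T + Q in the vector case)
A, B, Q = 0.8, 1.0, 0.5
m, var = 2.0, 0.0      # x_0 known exactly, so the initial variance is 0
u = 0.1                # constant input, for illustration

for k in range(50):
    m = A * m + B * u
    var = A * var * A + Q

# Limits: m -> B*u/(1 - A) = 0.5 and var -> Q/(1 - A^2) = 0.5/0.36
print(round(m, 3), round(var, 3))
```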
Representation via Transition Kernel
Here, the distribution of xk+1 can be found in terms of the function fk and,
indirectly, as a function of the basic random variables (x0 , w0 , . . . , wk ).
For the dynamical system to be Markovian, we need to show that for every
Borel subset A and for all k,
Observation Model
yk = gk (xk , vk ),