Subspace Identification
Rewriting this as
\[
\begin{bmatrix} x(t+1) \\ y(t) \end{bmatrix} =
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} x(t) \\ u(t) \end{bmatrix} \tag{1}
\]
where Y(t) = [x(t+1); y(t)] can be taken as the measurement, Θ = [A B; C D] is the unknown to be determined, and φ(t) = [x(t); u(t)] is the regressor. Then we can find Θ by LS.
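If the state sequence is measured, (1) is a linear regression in Θ and can be solved by ordinary least squares. A minimal numpy sketch under that assumption (all names are illustrative, not from the notes):

import numpy as np

def estimate_theta(x, u, y):
    """LS estimate of Theta = [A B; C D] from measured states x ((N+1) x n),
    inputs u (N x m) and outputs y (N x p), stacked row-wise in time."""
    N = u.shape[0]
    Phi = np.hstack([x[:N], u])            # regressors phi(t)^T = [x(t)^T, u(t)^T]
    Tgt = np.hstack([x[1:N + 1], y])       # measurements Y(t)^T = [x(t+1)^T, y(t)^T]
    ThetaT, *_ = np.linalg.lstsq(Phi, Tgt, rcond=None)   # solves Phi Theta^T = Tgt
    return ThetaT.T                        # Theta = [A B; C D]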
Consider an LTI system at rest with u(t) = 0 for t ≠ 0 (impulse input). Then,
\[
y(t) = \begin{cases} D u(0), & \text{if } t = 0 \\ C A^{t-1} B u(0), & \text{if } t > 0. \end{cases} \tag{2}
\]
Let
\[
C_k = \begin{bmatrix} B & AB & \cdots & A^{k-1}B \end{bmatrix}, \qquad
O_k = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-1} \end{bmatrix} \tag{4}
\]
be the controllability and the observability matrices for time index k > 0 (usually k > n since we do not know the system order). Define a block Hankel matrix
\[
H = \begin{bmatrix}
G(1) & G(2) & \cdots & G(k) \\
G(2) & G(3) & \cdots & G(k+1) \\
\vdots & \vdots & & \vdots \\
G(k) & G(k+1) & \cdots & G(2k-1)
\end{bmatrix}
= \begin{bmatrix}
CB & CAB & \cdots & CA^{k-1}B \\
CAB & CA^{2}B & \cdots & CA^{k}B \\
\vdots & \vdots & & \vdots \\
CA^{k-1}B & CA^{k}B & \cdots & CA^{2k-2}B
\end{bmatrix}
= O_k C_k. \tag{5}
\]
We assume throughout that (A, B) is reachable/controllable and (C, A) is observable, i.e., (A, B, C) is minimal.
Definition 0.1. A deterministic sequence u of length N is PE (persistently exciting) of order k if
\[
\operatorname{rank}(U_{N-1}^T) = \operatorname{rank}\begin{bmatrix}
u(0) & u(1) & \cdots & u(N-k) \\
u(1) & u(2) & \cdots & u(N-k+1) \\
\vdots & \vdots & & \vdots \\
u(k-1) & u(k) & \cdots & u(N-1)
\end{bmatrix} = k. \tag{6}
\]
If the input is a vector process u ∈ R^m, then this condition becomes rank(U_{N-1}) = km.
The PE condition on input signals is crucial in system identification. Consider the FIR model y(t) = \sum_{i=0}^{k-1} g_i u(t-i) + e(t). We want to find the impulse response θ := (g_{k-1}, . . . , g_1, g_0)^T based on {u(t), y(t), t = 0, 1, . . . , N − 1}. Let
\[
y_{N-1} = \begin{bmatrix} y(k-1) \\ y(k) \\ \vdots \\ y(N-1) \end{bmatrix}, \qquad
e_{N-1} = \begin{bmatrix} e(k-1) \\ e(k) \\ \vdots \\ e(N-1) \end{bmatrix}.
\]
Then,
\[
y_{N-1} = U_{N-1} \theta + e_{N-1} \tag{7}
\]
and the impulse response coefficients can be found by solving the LS problem \min_\theta \|y_{N-1} - U_{N-1}\theta\|, which requires U_{N-1} to be of full column rank, i.e., the input sequence u to be PE.
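As a concrete illustration, a minimal numpy sketch of this FIR least-squares fit for a scalar input (function and variable names are illustrative):

import numpy as np

def fir_least_squares(u, y, k):
    """Estimate theta = (g_{k-1}, ..., g_1, g_0) from scalar input/output data."""
    u, y = np.asarray(u, float), np.asarray(y, float)
    N = len(u)
    # Row t of U is [u(t-k+1), ..., u(t)], i.e., U = U_{N-1} from Definition 0.1 (transposed Hankel)
    U = np.array([u[t - k + 1:t + 1] for t in range(k - 1, N)])
    theta, *_ = np.linalg.lstsq(U, y[k - 1:N], rcond=None)
    return theta      # theta[i] = g_{k-1-i}; a PE input of order k makes U full column rank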
For a zero mean stationary process u ∈ R^m, define the covariance matrix as
\[
\bar{\Lambda}_{uu}(k) = \lim_{N \to \infty} \frac{1}{N} U_N^T U_N =
\begin{bmatrix}
\Lambda_{uu}(0) & \Lambda_{uu}^T(1) & \cdots & \Lambda_{uu}^T(k-1) \\
\Lambda_{uu}(1) & \Lambda_{uu}(0) & \cdots & \Lambda_{uu}^T(k-2) \\
\vdots & \vdots & & \vdots \\
\Lambda_{uu}(k-1) & \Lambda_{uu}(k-2) & \cdots & \Lambda_{uu}(0)
\end{bmatrix}. \tag{8}
\]
Ho-Kalman method
• We give an impulse input to the system and obtain the impulse response coefficients G(t).
• Let H = UΣV^T be the SVD of H and let H = U_1Σ_1V_1^T be the reduced SVD, where Σ_1 is nonsingular with n singular values λ_1 ≥ λ_2 ≥ · · · ≥ λ_n > 0. The rank of the Hankel matrix H determines the dimension of the state space.
• Consider Σ_1^{1/2} and construct O_k = U_1Σ_1^{1/2} and C_k = Σ_1^{1/2}V_1^T. Clearly, O_k C_k = H. We can also define O_k = U_1Σ_1^{1/2}T and C_k = T^{-1}Σ_1^{1/2}V_1^T for any invertible matrix T.
• D = y(0) = G(0).
• From the definition of C_k, B = C_k(:, 1 : m), where m is the number of inputs given at a single time instant.
• Observe that
\[
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{k-2} \end{bmatrix} A =
\begin{bmatrix} CA \\ CA^{2} \\ \vdots \\ CA^{k-1} \end{bmatrix}
= O_k(p+1 : pk, :) =: \tilde{O}_k
\;\Longrightarrow\; O_{k-1} A = \tilde{O}_k
\]
\[
\Longrightarrow\; O_{k-1}^T O_{k-1} A = O_{k-1}^T \tilde{O}_k
\;\Longrightarrow\; A = (O_{k-1}^T O_{k-1})^{-1} O_{k-1}^T \tilde{O}_k.
\]
A can also be found from Ck−1 and Ck using LS.
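A compact numpy sketch of these Ho-Kalman steps for a SISO system, assuming g contains the impulse response coefficients G(0), . . . , G(2k−1) and the order n has been read off from the singular values of H (names are illustrative):

import numpy as np

def ho_kalman_siso(g, k, n):
    """Realize (A, B, C, D) up to similarity from impulse response g[0..2k-1]."""
    # Block Hankel matrix: H[i, j] = G(i + j + 1)
    H = np.array([[g[i + j + 1] for j in range(k)] for i in range(k)])
    U, s, Vt = np.linalg.svd(H)
    U1, S1h, V1t = U[:, :n], np.diag(np.sqrt(s[:n])), Vt[:n, :]
    Ok = U1 @ S1h                   # extended observability matrix O_k
    Ck = S1h @ V1t                  # extended controllability matrix C_k
    A, *_ = np.linalg.lstsq(Ok[:-1, :], Ok[1:, :], rcond=None)   # O_{k-1} A = shifted O_k
    B = Ck[:, :1]                   # first column of C_k (m = 1)
    C = Ok[:1, :]                   # first row of O_k (p = 1)
    D = np.atleast_2d(g[0])
    return A, B, C, D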
The method of obtaining the (A, B, C, D) matrices up to a similarity transform from the impulse response sequence is also called the realization problem. (The impulse response and the transfer function can be found from each other.) A limitation of this method is that it needs impulse response coefficients rather than general input-output data. The problem of finding the dimension n and the system matrices (A, B, C, D) up to a similarity transform is called the subspace identification problem for deterministic LTI systems.
Data matrices
Consider a discrete LTI system at rest. Suppose the input-output data u = (u(0), . . . , u(N − 1)) and
y = (y(0), . . . , y(N − 1)) are given (N sufficiently large). Then for k > 0,
\[
\begin{bmatrix} y(0) & \cdots & y(N-1) \end{bmatrix} =
\begin{bmatrix} g(k-1) & \cdots & g(0) \end{bmatrix}
\begin{bmatrix}
0 & \cdots & 0 & u(0) & \cdots & u(N-k) \\
\vdots & & \vdots & u(1) & \cdots & u(N-k+1) \\
0 & & & \vdots & & \vdots \\
u(0) & u(1) & \cdots & u(k-1) & \cdots & u(N-1)
\end{bmatrix}.
\]
If the input matrix on the right hand side has full row rank, then the impulse response can be found by linear LS.
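A short numpy/scipy sketch of this deconvolution, assuming scalar signals and a system at rest before t = 0 (names are illustrative):

import numpy as np
from scipy.linalg import toeplitz

def impulse_response_ls(u, y, k):
    """Estimate g(0), ..., g(k-1) from general input-output data of a system at rest."""
    u, y = np.asarray(u, float), np.asarray(y, float)
    # Row t of T is [u(t), u(t-1), ..., u(t-k+1)] with u(s) = 0 for s < 0,
    # i.e., the transpose of the input matrix displayed above.
    T = toeplitz(u, np.r_[u[0], np.zeros(k - 1)])
    g, *_ = np.linalg.lstsq(T, y, rcond=None)
    return g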
Consider block Hankel matrices formed by inputs and outputs
\[
U_{0|k-1} = \begin{bmatrix}
u(0) & u(1) & \cdots & u(N-1) \\
u(1) & u(2) & \cdots & u(N) \\
\vdots & \vdots & & \vdots \\
u(k-1) & u(k) & \cdots & u(k+N-2)
\end{bmatrix} \in \mathbb{R}^{km \times N} \tag{9}
\]
\[
Y_{0|k-1} = \begin{bmatrix}
y(0) & y(1) & \cdots & y(N-1) \\
y(1) & y(2) & \cdots & y(N) \\
\vdots & \vdots & & \vdots \\
y(k-1) & y(k) & \cdots & y(k+N-2)
\end{bmatrix} \in \mathbb{R}^{kp \times N}. \tag{10}
\]
Define
\[
y_k(t) = \begin{bmatrix} y(t) \\ y(t+1) \\ \vdots \\ y(t+k-1) \end{bmatrix} \in \mathbb{R}^{kp}, \qquad
u_k(t) = \begin{bmatrix} u(t) \\ u(t+1) \\ \vdots \\ u(t+k-1) \end{bmatrix} \in \mathbb{R}^{km}.
\]
Then,
\[
y_k(t) = O_k x(t) + \Psi_k u_k(t), \qquad t = 0, 1, \ldots, \tag{12}
\]
where Ψ_k is the block lower-triangular Toeplitz matrix of impulse response coefficients that appears in (30).
Let
\[
X_0 = \begin{bmatrix} x(0) & x(1) & \cdots & x(N-1) \end{bmatrix} \in \mathbb{R}^{n \times N}.
\]
Then,
\[
Y_{0|k-1} = O_k X_0 + \Psi_k U_{0|k-1}. \tag{13}
\]
Similarly,
\[
Y_{k|2k-1} = O_k X_k + \Psi_k U_{k|2k-1}, \tag{14}
\]
where
\[
X_k = \begin{bmatrix} x(k) & x(k+1) & \cdots & x(N+k-1) \end{bmatrix}
\]
and U_{k|2k-1}, Y_{k|2k-1} are the block Hankel matrices built as in (9)-(10) but starting at time k, i.e., with first block rows \begin{bmatrix} u(k) & u(k+1) & \cdots & u(N+k-1) \end{bmatrix} and \begin{bmatrix} y(k) & y(k+1) & \cdots & y(N+k-1) \end{bmatrix} respectively.
Assumption 1. We assume that the following conditions are satisfied by the exogenous inputs and the initial state matrix X_0.
• rank(U_{0|k-1}) = km, i.e., the input is PE of order k.
• rank(X_0) = n.
• span(X_0) ∩ span(U_{0|k-1}) = {0}, where span(·) denotes the space spanned by the row vectors of a matrix.
Lemma 0.2. Suppose the above conditions are satisfied and rank(O_k) = n. Then,
\[
\operatorname{rank}\begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix} = km + n. \tag{15}
\]
Lemma 0.3 (Persistence of excitation). Consider the data matrix W_{0|k-1} = \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix} from (17). Then, under the assumptions of Lemma 0.2, any input-output pair
\[
\tilde{u}_k(t) = \begin{bmatrix} \tilde{u}(t) \\ \tilde{u}(t+1) \\ \vdots \\ \tilde{u}(t+k-1) \end{bmatrix}, \qquad
\tilde{y}_k(t) = \begin{bmatrix} \tilde{y}(t) \\ \tilde{y}(t+1) \\ \vdots \\ \tilde{y}(t+k-1) \end{bmatrix}
\]
generated by the system lies in the column span of W_{0|k-1}.
Note that the persistence of excitation ensures that any input-output pair belongs to the column span of
W0|k−1 . This ensures that the data matrix W0|k−1 is informative for system identification.
LQ decomposition: Consider the following LQ decomposition of the data matrix
\[
\begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix} =
\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} \tag{19}
\]
where L_{11} ∈ R^{km×km}, L_{22} ∈ R^{kp×kp} are lower triangular, L_{21} ∈ R^{kp×km}, and Q_1 ∈ R^{N×km}, Q_2 ∈ R^{N×kp} have orthonormal columns with Q_1^T Q_2 = 0. Note that the LQ decomposition is obtained by taking the transpose of the well-known QR decomposition (Gram-Schmidt algorithm).
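Numerically, the LQ factors can be computed from a thin QR decomposition of the transposed data matrix. A minimal scipy sketch (names are illustrative):

import numpy as np
from scipy.linalg import qr

def lq(W):
    """LQ decomposition W = L Q^T with L lower triangular and Q having orthonormal columns."""
    Q, R = qr(W.T, mode='economic')   # W^T = Q R  =>  W = R^T Q^T
    return R.T, Q

# Example partition for W = np.vstack([U, Y]) with U having km rows:
#   L, Q = lq(W); L11 = L[:km, :km]; L21 = L[km:, :km]; L22 = L[km:, km:]
#   Q1 = Q[:, :km]; Q2 = Q[:, km:]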
Lemma 0.4. Under Assumption 1, each column of the matrix L is an input-output pair; in particular, each column of L_{22} contains a zero-input response of the system. Moreover, rank(L_{22}) = n, the dimension of the system.
We need to compute the SVD of L22 to recover information about Ok and to recover A and C (using
the method used in Ho-Kalman).
MOESP
From Lemma 0.4 and Assumption 1, L_{11} is nonsingular (since U_{0|k-1} has full row rank, L_{11} must have full row rank, i.e., be square and nonsingular) and Q_1^T = L_{11}^{-1} U_{0|k-1}. Therefore,
\[
Y_{0|k-1} = L_{21} Q_1^T + L_{22} Q_2^T = L_{21} L_{11}^{-1} U_{0|k-1} + L_{22} Q_2^T. \tag{20}
\]
Since Q_1, Q_2 are mutually orthogonal, the two terms on the RHS above are orthogonal to each other. Let P_{Y_{0|k-1}|U_{0|k-1}} be the orthogonal projection of the row space of Y_{0|k-1} onto the row space of U_{0|k-1}. Therefore,
\[
P_{Y_{0|k-1}|U_{0|k-1}} = L_{21} Q_1^T = L_{21} L_{11}^{-1} U_{0|k-1}. \tag{21}
\]
Similarly, the orthogonal projection of the row space of Y_{0|k-1} onto the orthogonal complement of the row space of U_{0|k-1} is
\[
P_{Y_{0|k-1}|U_{0|k-1}^{\perp}} = L_{22} Q_2^T. \tag{22}
\]
Combining (13) and (19),
\[
O_k X_0 + \Psi_k L_{11} Q_1^T = L_{21} Q_1^T + L_{22} Q_2^T. \tag{23}
\]
The RHS is an orthogonal direct sum, whereas the LHS is only a direct sum. Therefore, post-multiplying both sides by Q_2 and using orthogonality,
\[
O_k X_0 Q_2 = L_{22}. \tag{24}
\]
Under Assumption 1, X_0 Q_2 has full row rank because X_0 ∈ R^{n×N} has full row rank and Q_2 ∈ R^{N×kp} has full column rank equal to kp > n. Moreover, rank(O_k) = n since we assume that k is sufficiently large and the system to be identified is controllable and observable. Therefore, rank(L_{22}) = n; this also follows from Lemma 0.4.
Let
\[
L_{22} = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} \tag{25}
\]
and let L_{22} = U_1 \Sigma_1 V_1^T be the reduced SVD. Then, O_k X_0 Q_2 = U_1 \Sigma_1 V_1^T. Define the extended observability matrix as
\[
O_k = U_1 \Sigma_1^{1/2} \tag{26}
\]
and n = dim(\Sigma_1). The matrix C is given by
\[
C = O_k(1 : p, 1 : n). \tag{27}
\]
Estimation of B, D requires more work. Observe that U_2^T L_{22} = 0 and U_2^T O_k = 0 (follows from (25) and (26)).
From (23), it follows that
\[
U_2^T \Psi_k L_{11} Q_1^T = U_2^T L_{21} Q_1^T. \tag{29}
\]
Post-multiplying by Q_1 and simplifying,
\[
U_2^T \begin{bmatrix}
D & 0 & \cdots & 0 & 0 \\
CB & D & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
CA^{k-2}B & CA^{k-3}B & \cdots & CB & D
\end{bmatrix}
= U_2^T L_{21} L_{11}^{-1}. \tag{30}
\]
Partitioning U_2^T =: \begin{bmatrix} L_1 & L_2 & \cdots & L_k \end{bmatrix} into blocks of p columns and U_2^T L_{21} L_{11}^{-1} =: \begin{bmatrix} M_1 & M_2 & \cdots & M_k \end{bmatrix} into blocks of m columns, (30) reads
\begin{align*}
L_1 D + L_2 CB + \cdots + L_k CA^{k-2}B &= M_1 \\
L_2 D + L_3 CB + \cdots + L_k CA^{k-3}B &= M_2 \\
&\;\;\vdots \\
L_{k-1} D + L_k CB &= M_{k-1} \\
L_k D &= M_k.
\end{align*}
With \bar{L}_i := \begin{bmatrix} L_i & L_{i+1} & \cdots & L_k \end{bmatrix}, these equations become
\[
\begin{bmatrix}
L_1 & \bar{L}_2 O_{k-1} \\
L_2 & \bar{L}_3 O_{k-2} \\
\vdots & \vdots \\
L_{k-1} & \bar{L}_k O_1 \\
L_k & 0
\end{bmatrix}
\begin{bmatrix} D \\ B \end{bmatrix}
=
\begin{bmatrix} M_1 \\ M_2 \\ \vdots \\ M_{k-1} \\ M_k \end{bmatrix}. \tag{31}
\]
Solving the above LS problem gives the matrices B, D.
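A numpy/scipy sketch of the full MOESP computation described above, assuming noise-free data, that U and Y are the block Hankel matrices U_{0|k-1} and Y_{0|k-1}, and that the order n has been chosen from the singular values of L_{22} (all names are illustrative):

import numpy as np
from scipy.linalg import qr, svd

def moesp(U, Y, k, m, p, n):
    """MOESP sketch: recover (A, B, C, D) from U = U_{0|k-1} (km x N), Y = Y_{0|k-1} (kp x N).
    Deterministic (noise-free) data assumed."""
    km = k * m
    _, R = qr(np.vstack([U, Y]).T, mode='economic')      # LQ: [U; Y] = L Q^T with L = R^T
    L = R.T
    L11, L21, L22 = L[:km, :km], L[km:, :km], L[km:, km:]
    Uall, s, _ = svd(L22)
    U1, U2 = Uall[:, :n], Uall[:, n:]
    Ok = U1 @ np.diag(np.sqrt(s[:n]))                    # extended observability matrix, (26)
    C = Ok[:p, :]
    A, *_ = np.linalg.lstsq(Ok[:-p, :], Ok[p:, :], rcond=None)   # shift-invariance LS
    # Assemble the LS problem (31) for B and D
    M = U2.T @ L21 @ np.linalg.inv(L11)                  # right-hand side of (30)
    Ls = [U2.T[:, j * p:(j + 1) * p] for j in range(k)]  # blocks L_1, ..., L_k
    Ms = [M[:, j * m:(j + 1) * m] for j in range(k)]     # blocks M_1, ..., M_k
    rows, rhs = [], []
    for j in range(k):
        if j + 1 < k:
            right = np.hstack(Ls[j + 1:]) @ Ok[:(k - 1 - j) * p, :]   # Lbar_{i+1} O_{k-i}, i = j+1
        else:
            right = np.zeros((U2.shape[1], n))
        rows.append(np.hstack([Ls[j], right]))
        rhs.append(Ms[j])
    DB, *_ = np.linalg.lstsq(np.vstack(rows), np.vstack(rhs), rcond=None)
    D, B = DB[:p, :], DB[p:, :]
    return A, B, C, D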
N4SID
Let k > n and define
\[
U_p := U_{0|k-1}, \quad Y_p := Y_{0|k-1}, \quad X_p := X_0, \qquad
U_f := U_{k|2k-1}, \quad Y_f := Y_{k|2k-1}, \quad X_f := X_k, \tag{32}
\]
where the subscripts p and f denote the past and the future. Recall that
Yp = Ok Xp + Ψk Up , Yf = Ok Xf + Ψk Uf . (33)
Define
\[
W_p := \begin{bmatrix} U_p \\ Y_p \end{bmatrix} = \begin{bmatrix} U_{0|k-1} \\ Y_{0|k-1} \end{bmatrix}, \qquad
W_f := \begin{bmatrix} U_f \\ Y_f \end{bmatrix} = \begin{bmatrix} U_{k|2k-1} \\ Y_{k|2k-1} \end{bmatrix}. \tag{34}
\]
Assume that Assumption 1 holds with k replaced by 2k and the third condition of Assumption 1 is satisfied
for Xf and Uf . This implies that Yf = Ok Xf + Ψk Uf in (33) is a direct sum decomposition.
The N4SID algorithm [2]:
1. Compute the LQ decomposition of the stacked data matrix:
\[
\begin{bmatrix} U_f \\ W_p \\ Y_f \end{bmatrix} =
\begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33} \end{bmatrix}
\begin{bmatrix} \bar{Q}_1^T \\ \bar{Q}_2^T \\ \bar{Q}_3^T \end{bmatrix}
\;\Longrightarrow\;
\begin{bmatrix} U_f \\ W_p \\ Y_f \end{bmatrix}
\begin{bmatrix} \bar{Q}_1 & \bar{Q}_2 & \bar{Q}_3 \end{bmatrix} =
\begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33} \end{bmatrix}.
\]
Recall Equation (18) and Lemma 0.4. Therefore, the columns of R_{33} correspond to zero-input response vectors of Y_f with U_f = 0 and U_p = 0, Y_p = 0 (by the block structure of the lower triangular factor in the LQ decomposition). Thus, R_{33} must be zero.
Therefore,
\[
\begin{bmatrix} U_f \\ W_p \\ Y_f \end{bmatrix} =
\begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & 0 \end{bmatrix}
\begin{bmatrix} \bar{Q}_1^T \\ \bar{Q}_2^T \\ \bar{Q}_3^T \end{bmatrix}, \tag{35}
\]
i.e.,
\[
U_f = R_{11} \bar{Q}_1^T \;\Longrightarrow\; \bar{Q}_1^T = R_{11}^{-1} U_f \tag{36}
\]
\[
W_p = R_{21} R_{11}^{-1} U_f + R_{22} \bar{Q}_2^T \tag{37}
\]
\[
Y_f = R_{31} \bar{Q}_1^T + R_{32} \bar{Q}_2^T = R_{31} R_{11}^{-1} U_f + R_{32} \bar{Q}_2^T. \tag{38}
\]
3. Let ξ = O_k X_f (constructed in step 2) and let ξ = U_1 \Sigma_1 V_1^T be its reduced SVD. Then, n = rank(\Sigma_1) ([2, Theorem 6.3]).¹ Set
\[
O_k := U_1 \Sigma_1^{1/2} T, \quad |T| \neq 0, \qquad X_f = T^{-1} \Sigma_1^{1/2} V_1^T. \tag{41}
\]
Recall from (32) that X_f = X_k = \begin{bmatrix} x(k) & x(k+1) & \cdots & x(N+k-1) \end{bmatrix}.
¹ This is a delicate point; I'm not entirely convinced by the proof presented in [2].
4. Define
\[
\bar{X}_{k+1} := \begin{bmatrix} x(k+1) & \cdots & x(k+N-1) \end{bmatrix}, \qquad
\bar{X}_k := \begin{bmatrix} x(k) & \cdots & x(k+N-2) \end{bmatrix} \tag{42}
\]
\[
\bar{U}_{k|k} := \begin{bmatrix} u(k) & \cdots & u(k+N-2) \end{bmatrix}, \qquad
\bar{Y}_{k|k} := \begin{bmatrix} y(k) & \cdots & y(k+N-2) \end{bmatrix}. \tag{43}
\]
Therefore,
\[
\begin{bmatrix} \bar{X}_{k+1} \\ \bar{Y}_{k|k} \end{bmatrix} =
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} \bar{X}_k \\ \bar{U}_{k|k} \end{bmatrix}. \tag{44}
\]
Note that in the N4SID method, one constructs the state estimates X_f along with the system parameters (A, B, C, D), which are obtained from (44) by LS. So a separate Kalman filter is not required for state estimation.
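A minimal numpy sketch of this last step, assuming Xf holds the estimated states x(k), . . . , x(N+k−1) as columns and Uk, Yk hold the corresponding inputs and outputs (names are illustrative):

import numpy as np

def n4sid_step4(Xf, Uk, Yk):
    """LS solution of (44): [Xbar_{k+1}; Ybar] = [A B; C D] [Xbar_k; Ubar]."""
    n = Xf.shape[0]
    Phi = np.vstack([Xf[:, :-1], Uk[:, :-1]])       # regressors  [x(t); u(t)]
    Tgt = np.vstack([Xf[:, 1:], Yk[:, :-1]])        # targets     [x(t+1); y(t)]
    Theta, *_ = np.linalg.lstsq(Phi.T, Tgt.T, rcond=None)
    Theta = Theta.T                                 # Theta = [A B; C D]
    A, B = Theta[:n, :n], Theta[:n, n:]
    C, D = Theta[n:, :n], Theta[n:, n:]
    return A, B, C, D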
Modified N4SID:
From (33) and (38),
\[
R_{31} \bar{Q}_1^T + R_{32} \bar{Q}_2^T = O_k X_f + \Psi_k U_f = O_k X_f + \Psi_k R_{11} \bar{Q}_1^T.
\]
Post-multiplying by \bar{Q}_2 and using \bar{Q}_1^T \bar{Q}_2 = 0 and \bar{Q}_2^T \bar{Q}_2 = I,
\[
O_k X_f \bar{Q}_2 = R_{32}. \tag{45}
\]
Steps 3 and 4 of the algorithm are modified accordingly:
3. Let R_{32} = U_1 \Sigma_1 V_1^T be the reduced SVD of R_{32}. Then n = rank(\Sigma_1), the system order, since rank(O_k) = rank(X_f) = n and \bar{Q}_2 has orthonormal columns, hence full column rank.
4. Let O_k = U_1 \Sigma_1^{1/2} T, |T| \neq 0, and proceed as in MOESP to identify (A, B, C, D). Alternatively, substituting R_{32} in place of ξ in the previous N4SID algorithm, we can obtain state estimates \tilde{X}_f along with the (A, B, C, D) parameters.
Notice that we don’t construct state estimates when using MOESP approach. One needs to build a Kalman
filter from (A, B, C, D) parameters to estimate states. The upshot is that N4SID gives unknown system
parameters as well as the state estimates. MOESP on the other hand gives only system parameters but is
simpler to understand as it involves only orthogonal projections and SVDs.
Appendix
Suppose R^n is given by a direct sum of V and W, i.e., R^n = V ⊕ W where V ∩ W = {0}. Then any x ∈ R^n can be written as x = v + w, where v = P_{V||W}(x) and w = P_{W||V}(x) are the projections of x onto V along W and onto W along V respectively, i.e.,
\[
x = P_{V||W}(x) + P_{W||V}(x).
\]
Let \mathcal{A}, \mathcal{B}, \mathcal{C} be the row spaces generated by the rows of matrices A, B, C respectively. Then, for x ∈ \mathcal{A},
\[
P_{\mathcal{B}+\mathcal{C}}(x) = P_{\mathcal{B}||\mathcal{C}}(x) + P_{\mathcal{C}||\mathcal{B}}(x). \tag{48}
\]
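A small numpy illustration of this decomposition, assuming the rows of B and C are bases of \mathcal{B} and \mathcal{C} with \mathcal{B} ∩ \mathcal{C} = {0} and x ∈ \mathcal{B} + \mathcal{C} (names are illustrative):

import numpy as np

def oblique_projections(x, B, C):
    """Return (P_{B||C}(x), P_{C||B}(x)) for a row vector x in rowspan(B) + rowspan(C)."""
    nb = B.shape[0]
    # Express x in the combined basis formed by the rows of B and C
    coeffs, *_ = np.linalg.lstsq(np.vstack([B, C]).T, x, rcond=None)
    return coeffs[:nb] @ B, coeffs[nb:] @ C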
References
[1] H. Asada, Identification, Estimation and Learning, MIT, Lecture notes and video lectures, 2021.
[3] L. Ljung, System Identification: Theory for the User, PHI, 2nd Edition, 1999.