A Framework For Subspace Identification Methods
Viberg (1995) gave an overview of SIMs and classified them into realization-based and direct types, and also pointed out the different ways to obtain the system matrices via estimated states or via the extended observability matrix. Van Overschee and De Moor (1995) gave a unifying theorem based on a lower order approximation of an oblique projection, in which different methods are viewed as different choices of row and column weighting matrices for the reduced rank oblique projection. The basic structure and idea of their theorem is based on casting these methods into the N4SID algorithm. The stacked vectors of past inputs and past outputs at time k are defined as

$$U_p = \begin{bmatrix} u_{k-p} \\ u_{k-p+1} \\ \vdots \\ u_{k-2} \\ u_{k-1} \end{bmatrix}, \qquad Y_p = \begin{bmatrix} y_{k-p} \\ y_{k-p+1} \\ \vdots \\ y_{k-2} \\ y_{k-1} \end{bmatrix}$$
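Throughout this section, $U_p$, $Y_p$, $U_f$, $Y_f$ denote these stacked past and future data matrices, with one column per time instant k. A minimal numpy sketch of their construction for the SISO case follows; the helper names and the explicit horizon arguments p and f are illustrative, not from the paper.

```python
import numpy as np

def block_rows(x, start, rows, N):
    """Stack rows x[start+i : start+i+N] for i = 0..rows-1 (Hankel-style)."""
    return np.vstack([x[start + i : start + i + N] for i in range(rows)])

def past_future(u, y, p, f):
    """Past/future data matrices; column j corresponds to time k = p + j."""
    N = len(u) - p - f + 1           # number of usable columns
    Up = block_rows(u, 0, p, N)      # past inputs  u_{k-p} ... u_{k-1}
    Yp = block_rows(y, 0, p, N)      # past outputs y_{k-p} ... y_{k-1}
    Uf = block_rows(u, p, f, N)      # future inputs  u_k ... u_{k+f-1}
    Yf = block_rows(y, p, f, N)      # future outputs y_k ... y_{k+f-1}
    return Up, Yp, Uf, Yf
```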
iii) Then fit the estimated states to a state space model (a least squares sketch of this step is given below).

The major differences among SIMs lie in the first two steps; the third step is the same for all methods. The original MOESP algorithm extracts $\Gamma_f$ from the estimated subspace. Here MOESP is analyzed based on estimated states that come from exactly the same subspace as $\Gamma_f$ (also refer to Van Overschee and De Moor, 1995).
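Step iii) amounts to two linear regressions once state estimates are available. The following is a minimal numpy sketch under a SISO assumption; the function name, the data layout, and the omission of noise covariance estimation are simplifications for illustration.

```python
import numpy as np

def fit_state_space(Xk, u, y):
    """Estimate (A, B, C, D) by LS from state estimates Xk (n x N) and aligned data."""
    X_now, X_next = Xk[:, :-1], Xk[:, 1:]
    Z = np.vstack([X_now, u[None, :-1]]).T                    # regressors [x_k; u_k]
    AB = np.linalg.lstsq(Z, X_next.T, rcond=None)[0].T        # x_{k+1} ~ A x_k + B u_k
    CD = np.linalg.lstsq(Z, y[None, :-1].T, rcond=None)[0].T  # y_k ~ C x_k + D u_k
    n = Xk.shape[0]
    return AB[:, :n], AB[:, n:], CD[:, :n], CD[:, n:]
```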
3. Estimation of the Predictable Subspace

3.1 Linear Regression for $H_f$ to Estimate $\Gamma_f X_k$

In SIMs, the predictable subspace $\Gamma_f X_k$ should be estimated first in order to have a basis for the estimation of the states $x_k$ or of the $\Gamma_f$ matrix. The central problem is how to remove the future input effects $H_f U_f$ from $Y_f$ in (5) in order to obtain a better estimate of the predictable subspace $\Gamma_f X_k$. The coefficient matrix $H_f$ is unknown and needs to be estimated.

$H_f$ describes the effects of $U_f$ on $Y_f$, and consists of the first f impulse weights on the lower diagonals (SISO) or impulse weight blocks on the block lower diagonals (MIMO). The true $H_f$ is a lower block triangular matrix. These features (or requirements on $H_f$) are very informative; however, most algorithms do not make full use of them. Different algorithms use different methods to estimate $H_f$ from the input and output data sets. There are quite a few ways to do this; however, they all belong to the linear regression method.
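The block Toeplitz structure of $H_f$ is easy to state in code. The following is a minimal numpy sketch for the SISO case; the function name and the example impulse weights are illustrative only.

```python
import numpy as np

def build_Hf(h, f):
    """Lower-triangular Toeplitz H_f built from impulse weights h = [h_0, h_1, ...].

    Entry (i, j) is h_{i-j} for j <= i and zero above the diagonal, so row i
    gives the contribution of u_k, ..., u_{k+i} to y_{k+i}.
    """
    Hf = np.zeros((f, f))
    for i in range(f):
        for j in range(i + 1):
            Hf[i, j] = h[i - j]
    return Hf

# Example: h_0 = D (zero for a strictly proper system), h_i = C A^{i-1} B for i >= 1
print(build_Hf(np.array([0.0, 0.2, 0.14, 0.098]), 4))
```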
Once $H_f$ is estimated, say as $\hat{H}_f$, then $Y_f - \hat{H}_f U_f$ is an estimate of the predictable subspace. This estimate includes the effects of the estimation errors in $\hat{H}_f$ and the effects of the future stochastic signals, which can be removed by projection onto the past data. This projection procedure may induce some error; however, in most cases this error is less than the unpredictable future noise. Some subspace identification methods, such as N4SID, perform the estimation of $H_f$ and the projection onto the past data in one step.

3.2 Methods Used to Estimate $H_f$

1. Regression of $Y_f$ against $U_f$ (MOESP)
Since $H_f$ is the coefficient matrix relating $U_f$ to $Y_f$, it is natural to try to obtain $H_f$ by directly performing LS regression of $Y_f$ against $U_f$ as in (5). A basic assumption for an unbiased result is that the future inputs are uncorrelated with the noise terms in (7), which in this case also include the effect of the state variables. This method therefore gives an unbiased result only when the inputs are white noise signals. Once $H_f$ is estimated, the predictable subspace is estimated as $Y_f - \hat{H}_f U_f$. The original MOESP uses this method to estimate $H_f$ implicitly, together with the predictable subspace, via a QR decomposition on $[U_f; Y_f]$. If the input sequences are auto-correlated, this method regresses part of the state effect away and gives a biased result for the predictable subspace. SVD on this subspace will still give an asymptotically unbiased estimate of $\Gamma_f$; however, the estimate of $X_k$ will be biased.
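A numpy sketch of this direct regression, using the data-matrix conventions of the earlier sketches; the original MOESP obtains the same quantities implicitly through a QR (LQ) decomposition rather than an explicit least squares solve.

```python
import numpy as np

def moesp_hf(Yf, Uf):
    """Direct LS regression of Yf on Uf; unbiased only for white-noise inputs."""
    # Solve Yf ~ Hf @ Uf in least squares, i.e. Uf.T @ Hf.T ~ Yf.T
    Hf = np.linalg.lstsq(Uf.T, Yf.T, rcond=None)[0].T
    Zf = Yf - Hf @ Uf   # estimate of the predictable subspace (plus future noise)
    return Hf, Zf
```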
2. Regression of $Y_f$ against $[Y_p; U_p; U_f]$ (N4SID)
Based on (6), we know that $\Gamma_f X_k$ in (5) can be estimated by a linear combination of the past inputs $U_p$ and past outputs $Y_p$. It is therefore a natural choice to regress $Y_f$ against $[Y_p; U_p; U_f]$. Here the regression coefficient for $U_f$ is an estimate of $H_f$, and the part corresponding to the past data is an estimate of the predictable subspace, which is equivalent to projecting $Y_f - \hat{H}_f U_f$ onto the past data. This estimate will have a slight bias if the input signals are auto-correlated; the bias occurs because of the correlation between the past outputs and the past noise terms in (7). This is the method used in N4SID to estimate $H_f$ and the predictable subspace, and it is realized by a QR decomposition of $[U_f; P_p; Y_f]$, where $P_p = [Y_p; U_p]$. The PO-MOESP (past output MOESP; Verhaegen, 1994) gives similar results.
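A numpy sketch of the joint regression; the block splitting assumes the regressor ordering $[Y_p; U_p; U_f]$ used above.

```python
import numpy as np

def n4sid_regression(Yf, Yp, Up, Uf):
    """Regress Yf on [Yp; Up; Uf]: the Uf block is Hf_hat, and the fitted
    past-data part is the estimate of the predictable subspace."""
    Z = np.vstack([Yp, Up, Uf])
    Theta = np.linalg.lstsq(Z.T, Yf.T, rcond=None)[0].T   # Yf ~ Theta @ Z
    n_past = Yp.shape[0] + Up.shape[0]
    Hf_hat = Theta[:, n_past:]                  # coefficients of Uf
    Z_pred = Theta[:, :n_past] @ Z[:n_past, :]  # predictable subspace estimate
    return Hf_hat, Z_pred
```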
3. Constructing $H_f$ from impulse weights of an ARX model (CVA)
The structure of $H_f$ implies that it can be constructed from the first f impulse weight blocks. These impulse weight blocks can be estimated from a simple model, such as an ARX or FIR model, which can be obtained by regressing $y_k$ against $u_k$ (if $D \neq 0$), the past inputs ($U_p$) and the past outputs ($Y_p$). The predictable subspace is then estimated as $Y_f - \hat{H}_f U_f$; it includes all the future noise. This is the method some CVA algorithms use to estimate $H_f$ and the predictable subspace.
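A sketch of the conversion from ARX coefficients to impulse weights, which can then be assembled into $\hat{H}_f$ with a Toeplitz constructor such as the build_Hf sketch above; the recursion assumes the regression form $y_k = \sum_i a_i y_{k-i} + \sum_j b_j u_{k-j} + e_k$.

```python
import numpy as np

def arx_impulse_weights(a, b, f):
    """First f impulse weights of the ARX model
    y_k = a[0]*y_{k-1} + ... + a[na-1]*y_{k-na} + b[0]*u_k + ... + e_k."""
    g = np.zeros(f)
    for j in range(f):
        g[j] = b[j] if j < len(b) else 0.0   # moving-average (B polynomial) term
        for i in range(min(j, len(a))):      # recursion through the AR part
            g[j] += a[i] * g[j - 1 - i]
    return g
```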
4. Regression-out method
$U_f$ can be regressed out of both sides of (7) by projecting onto the orthogonal complement of $U_f$, i.e., by post-multiplying both sides by $\Pi_{U_f}^{\perp} = I - U_f^T (U_f U_f^T)^{-1} U_f$. This removes the $U_f$ term from the equation, and the coefficient matrices for the past data in (7) can then be obtained by regressing $Y_f \Pi_{U_f}^{\perp}$ against $P_p \Pi_{U_f}^{\perp}$. This result is equivalent to that from N4SID, and was implied in Van Overschee and De Moor (1995). The method has also been applied to the CVA method (refer to Van Overschee and De Moor, 1995; Carette, 2000); see the next section for more discussion.

Another similar approach is to regress the past data $P_p$ out of both sides of (7) (projecting onto the orthogonal complement of $P_p$ by post-multiplying by $\Pi_{P_p}^{\perp} = I - P_p^T (P_p P_p^T)^{-1} P_p$) for the estimation of $H_f$. This turns out to be equivalent to the approach of N4SID.
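The orthogonal complement projector is a one-liner in numpy; the pseudo-inverse in this sketch is a numerical safeguard, not part of the formula above.

```python
import numpy as np

def proj_orth(M):
    """Projector onto the orthogonal complement of the row space of M:
    Pi = I - M.T @ (M @ M.T)^{-1} @ M, applied by post-multiplication."""
    return np.eye(M.shape[1]) - M.T @ np.linalg.pinv(M @ M.T) @ M

# Regressing Uf out of both sides of (7):
#   Yf_perp = Yf @ proj_orth(Uf);  Pp_perp = Pp @ proj_orth(Uf)
```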
5. Instrumental Variable Method
If there is a variable that is correlated with $U_f$ but has no correlation with $X_k$ and the future noise, an unbiased $H_f$ can be estimated by the instrumental variable (IV) method based on (5). For auto-correlated inputs, $U_f$ correlates with $x_k$ through its correlation with $U_p$; therefore the part of $U_f$ which has no correlation with the past data also has no correlation with $X_k$. This part of $U_f$ can be constructed by regressing $U_p$ out of $U_f$ and taking the residual as the IV. Once $H_f$ is estimated, the predictable subspace can be easily estimated.

All these estimation methods are variants of the linear regression method; they differ only in their choices of independent and dependent variables and regression sequences, and in the degree to which they utilize the known features of $H_f$. The key problem comes from the correlation between $U_f$ and $x_k$, which arises from auto-correlation of the input sequence in the open loop case. The estimation accuracy (bias and variance) of each method depends on the input signal, the true model structure, and the signal to noise ratio (SNR).
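A numpy sketch of the IV estimate; the argument Past can be taken as $U_p$ (the variant described above) or as the full past data $[Y_p; U_p]$.

```python
import numpy as np

def iv_hf(Yf, Uf, Past):
    """IV estimate of Hf from (5): the instrument is the residual of Uf
    after regressing the past data out, so it is uncorrelated with X_k."""
    B = np.linalg.lstsq(Past.T, Uf.T, rcond=None)[0].T
    Z = Uf - B @ Past               # instrument: part of Uf orthogonal to the past
    # Normal equations of the IV method: Hf = (Yf Z')(Uf Z')^{-1}
    return (Yf @ Z.T) @ np.linalg.pinv(Uf @ Z.T)
```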
4. Estimation of State Variables

4.1 Latent Variable Methods for State Estimation

The predictable subspace estimated by the linear regression methods of the last section is a high dimensional space (its dimension is much larger than the system order n) consisting of highly correlated variables. If there were no estimation error, this subspace would be of rank n only, and any n independent variables in the data set, or their linear combinations, could be taken as state variables. However, the estimation error generally makes the space full rank. A direct choice of any n variables would have a large estimation error and would lose the useful information in all the other variables, which are highly correlated with the true states. Extracting only n linear combinations from this highly correlated high-dimensional space while keeping as much information as possible is therefore the most desirable approach. This is exactly the general goal, and the situation, for which latent variable methods were developed. Latent variable methods are therefore employed in all SIMs as the methodology for estimating the state variables from the predictable subspace.

Latent variables (LVs) are linear combinations of the original (manifest) variables that optimize a specific objective. There are a variety of latent variable methods based on different optimization objectives. In general terms, Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Reduced Rank Analysis (RRA) are latent variable methods that maximize variance, covariance, correlation and predictable variance, respectively (for details refer to Burnham et al., 1996). Different SIMs employ different LVMs, or use them in different ways, to estimate the state variables.
4.2 Methods Used for State Estimation

1. PCA (MOESP and N4SID)
Both N4SID and MOESP extract $X_k$ by performing PCA on the estimated predictable subspace, which is essentially an SVD procedure. This implies the assumptions that $\Gamma_f X_k$ has a larger variation than the estimation error and that the two parts are uncorrelated. The first assumption is well satisfied if the signal to noise ratio is large, and it ensures that the first n PCs are the estimated state variables. The second assumption is essentially required for unbiasedness of the estimation, and it is not satisfied in the case of auto-correlated inputs.

The state-based MOESP (the original algorithm) directly uses PCA (SVD) on the estimated predictable subspace $Y_f - \hat{H}_f U_f$, where $\hat{H}_f$ is obtained by directly regressing $Y_f$ onto $U_f$, and the PCs are taken as the estimated states. This estimated predictable subspace includes all the future noise and is not predicted by the past data; therefore the PCA results have large estimation errors and there is no guarantee of predictability. The PO-MOESP applies PCA to the projection of the estimated predictable subspace onto part of the past data space, so the result is generally improved. This method gives unbiased results for white noise inputs; if the inputs are auto-correlated, the result will be biased.

N4SID applies PCA (SVD) to the part of the projection of $Y_f$ onto $[Y_p; U_p; U_f]$ corresponding to the past data. As mentioned above, this result can be viewed as the projection of the predictable subspace $Y_f - \hat{H}_f U_f$ onto the past data, where $\hat{H}_f$ is the third block of the regression coefficient matrix. In fact, it can be shown that this method is equivalent to performing RRA on the past data and $Y_f - \hat{H}_f U_f$ (for a proof, see Shi, 2001). It is clear that the best predictability in N4SID is in the sense of the total predictable variance of $Y_f - \hat{H}_f U_f$ based on the past data; this is assured by the projection onto the past data, which at the same time removes the future noise. Therefore the estimation error and the bias of $X_k$ from N4SID are in general very small.
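A numpy sketch of the PCA/SVD extraction step, applicable to whichever estimate of the predictable subspace a given method produces.

```python
import numpy as np

def pca_states(Z_pred, n):
    """Take the first n principal components of the estimated predictable
    subspace as the states; the left singular vectors span Gamma_f."""
    U, s, Vt = np.linalg.svd(Z_pred, full_matrices=False)
    Xk = np.diag(s[:n]) @ Vt[:n, :]   # n x N state estimates
    Gamma_f = U[:, :n]                # basis for the extended observability matrix
    return Xk, Gamma_f
```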
2. CCA (CVA)
CVA applies CCA to $P_p = [Y_p; U_p]$ and $Y_w = Y_f - \hat{H}_f U_f$, and the first n latent variables (canonical variates, CVs) from the past data set are the estimates of $x_k$. By selecting the canonical variates with the largest correlations as states, one maximizes the relative variation in $Y_w$ in each dimension rather than the absolute variation.

CCA can also be applied to the result of projecting the future inputs $U_f$ out of both the past data and the future outputs, i.e. to $P_p \Pi_{U_f}^{\perp}$ and $Y_f \Pi_{U_f}^{\perp}$; however, the resulting canonical variates $J_1 P_p \Pi_{U_f}^{\perp}$ are obviously biased estimates of $X_k$. Here the coefficient matrix $J_1$ should instead be applied to the original past data to get the state estimates $J_1 P_p$; these estimates are no longer orthogonal. This result is proven to give unbiased state variable estimates (for a proof, see Shi, 2001). However, since part of the state signal is removed by regressing $U_f$ out while the noise is kept intact, the data set $Y_f \Pi_{U_f}^{\perp}$ in general has a worse SNR than $Y_f - \hat{H}_f U_f$.
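A self-contained numpy sketch of the CCA step; the small ridge term is an added numerical safeguard and not part of the method as described.

```python
import numpy as np

def cca_states(Pp, Yw, n, ridge=1e-8):
    """First n canonical variates of the past data Pp against Yw = Yf - Hf_hat @ Uf;
    J1 @ Pp gives the CVA state estimates."""
    N = Pp.shape[1]
    Rpp = Pp @ Pp.T / N + ridge * np.eye(Pp.shape[0])
    Rww = Yw @ Yw.T / N + ridge * np.eye(Yw.shape[0])
    Rpw = Pp @ Yw.T / N

    def inv_sqrt(R):                  # inverse matrix square root via eigendecomposition
        lam, V = np.linalg.eigh(R)
        return V @ np.diag(1.0 / np.sqrt(lam)) @ V.T

    Sp, Sw = inv_sqrt(Rpp), inv_sqrt(Rww)
    U, s, Vt = np.linalg.svd(Sp @ Rpw @ Sw)
    J1 = U[:, :n].T @ Sp              # canonical weights for the past data
    return J1 @ Pp, J1                # n x N state estimates and the weights
```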
3. Other Possible Methods
Other LVMs, such as PLS and RRA, are possible choices for state extraction. For example, RRA should provide estimates of the states based on the same objective as N4SID, that is, states which maximize the variance explained in the predictable subspace. However, it should give numerically improved estimates, since it directly obtains the n LVs which explain the greatest variance in the predictable subspace, rather than using the two-step procedure of first performing an ill-conditioned least squares (oblique projection) followed by PCA/SVD (see Shi, 2001).

Since the objective of PLS is to model the variance in the past data set and in the predictable subspace as well as their correlation, it will generally not provide minimal order state models. In effect, it will try to provide state vectors for the joint input/output space (refer to Shi and MacGregor, 2000).

Combinations of the methods used for predictable subspace estimation and the methods used for state variable estimation can lead to a whole set of different subspace identification methods.
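For concreteness, a numpy sketch of the RRA alternative discussed above, written as an unweighted reduced rank regression of the predictable subspace on the past data; this is a sketch of the idea, not a reproduction of the algorithm analyzed in Shi (2001).

```python
import numpy as np

def rra_states(Pp, Yw, n):
    """Rank-n regression of Yw = Yf - Hf_hat @ Uf on the past data Pp;
    the n LVs of the fitted part maximize the predictable variance."""
    B = np.linalg.lstsq(Pp.T, Yw.T, rcond=None)[0].T   # OLS fit of Yw on Pp
    Y_fit = B @ Pp                                     # projection onto the past data
    U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    return np.diag(s[:n]) @ Vt[:n, :]                  # n x N state estimates
```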
5. Simulation Example

In this section, a simulation example is used to illustrate the points discussed above. The example is a simple 1st order SISO process with AR(1) noise, modeled as

$$y_k = \frac{0.2\,z^{-1}}{1 - a\,z^{-1}}\,u_k + \frac{1}{1 - c\,z^{-1}}\,e_k$$

The input signal is a PRBS signal with switching time period $T_s = 5$ and magnitude 4.0. 1000 data points are collected with $\mathrm{var}(e_k) = 1.0$, and the SNR is about 0.93 (in variance). Both the past and future lag steps are taken as 7 for every method.
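A numpy sketch that generates data of this type. The pole values a and c below are assumed stand-ins for illustration (the original coefficients are not reproduced here); the numerator 0.2, the PRBS design, the sample size, and the noise variance follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pts, Ts, mag = 1000, 5, 4.0

# PRBS input: random +/- mag levels, each held for Ts samples
levels = rng.choice([-mag, mag], size=n_pts // Ts + 1)
u = np.repeat(levels, Ts)[:n_pts]

a, c = 0.7, 0.9                   # assumed process and noise poles (illustrative only)
e = rng.standard_normal(n_pts)    # var(e_k) = 1.0

x = np.zeros(n_pts)               # deterministic part: x_k = a*x_{k-1} + 0.2*u_{k-1}
v = np.zeros(n_pts)               # AR(1) noise part:   v_k = c*v_{k-1} + e_k
for k in range(1, n_pts):
    x[k] = a * x[k - 1] + 0.2 * u[k - 1]
    v[k] = c * v[k - 1] + e[k]
y = x + v
```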
Different methods for the estimation of the $H_f$ matrix are applied to the simulation example and compared with the true result. A rough comparison is to take the mean of the elements on each lower diagonal as the estimated impulse weight. The results and the total absolute errors are listed in Table 1. The results obtained by regressing $Y_f$ directly onto $U_f$ are clearly the farthest from the true values because of the bias resulting from the strong auto-correlation in the input PRBS signal. The other methods give results close to the true values.

There are many indexes for comparing the estimated states with the true states. One quick index is the set of canonical correlation coefficients between the estimated states and the true states (see Table 2). This gives a clear idea of how consistent the two spaces are with each other. MOESP using the direct regression of $Y_f$ on $U_f$ clearly gives poor results. State estimation by PLS gives relatively degraded results compared with CCA or RRA based on the same predictable subspace estimated by ARX. The states estimated by the other methods are much closer to the true states. Similar conclusions are indicated by the squared multiple correlation $R^2$, which shows how much of the total sum of squares of the true states can be explained by the estimated states (the two states are scaled to unit variance).

CCA results based on $P_p \Pi_{U_f}^{\perp}$ and $Y_f \Pi_{U_f}^{\perp}$, and those based on $P_p$ and $Y_f - H_f U_f$ (using the true $H_f$), are also compared. The coefficient matrix $J_1$ from the former is different from that of the latter (too large to show). The estimated states from the former method are not linear combinations of those estimated by the latter method, but they are very close.

If the estimated states are used for fitting the state space model, each method can be compared by plotting the estimated impulse responses (Fig. 1) and their errors (Fig. 2). MOESP gives a poor result for this example. The result from SIM-ARX-PLS has a large error, but it can be improved to match the others by using more LVs (refer to Shi and MacGregor, 2000). The results from the other SIMs are very close to the true values. The somewhat irregular response from the ARX model is also shown for comparison. All SIMs give smooth responses by fitting the LVs to the state equation.

6. Conclusions

Although SIMs are quite different in their concepts and algorithms, they follow the same statistical framework set up here: (1) use of a linear regression method to estimate $H_f$ and the predictable subspace; (2) use of a latent variable method for estimation of a minimal set of state variables; and (3) fitting of the state space model. By discussing the SIMs in this framework, their similarities and differences can be clearly seen. The framework also reveals possible new methods and new combinations of existing approaches, such as the use of the IV method for the estimation of $H_f$, and the use of other latent variable methods such as RRA and PLS for state estimation.

References

Åström, K.J., Introduction to Stochastic Control Theory, Academic Press, 1970.

Burnham, A.J., R. Viveros and J.F. MacGregor, Frameworks for Latent Variable Multivariate Regression, Journal of Chemometrics, V10, pp. 31-45, 1996.

Carette, P., personal communication, notes for CVA, 2000.

Larimore, W.E., Canonical Variate Analysis in Identification, Filtering and Adaptive Control, Proc. 29th IEEE Conference on Decision and Control, Vol. 1, Honolulu, Hawaii, 1990.

Larimore, W.E., Optimal Reduced Rank Modeling, Prediction, Monitoring and Control Using Canonical Variate Analysis, Preprints of ADCHEM, Banff, 1997.
Ljung, L. and McKelvey, T., Subspace Identification from Closed Loop Data, Signal Processing, V52, 1996.

Shi, R. and J. MacGregor, Modeling of Dynamic Systems Using Latent Variable and Subspace Methods, Journal of Chemometrics, V14, pp. 423-439, 2000.

Shi, R., Ph.D. Thesis, McMaster University, ON, Canada, 2001.

Van Overschee, P. and De Moor, B., N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems, Automatica, Vol. 30, No. 1, pp. 75-93, 1994.

Van Overschee, P. and De Moor, B., A Unifying Theorem for Three Subspace System Identification Algorithms, Automatica, Vol. 31, No. 12, pp. 1853-1864, 1995.

Verhaegen, M. and Dewilde, P., Subspace Model Identification, Part I: The Output-error State-space Model Identification Class of Algorithms, International Journal of Control, V56, pp. 1187-1210, 1992.

Verhaegen, M., Identification of the Deterministic Part of MIMO State Space Models Given in Innovations Form from Input-output Data, Automatica, V30, pp. 61-74, 1994.

Viberg, M., Subspace-based Methods for the Identification of Linear Time-invariant Systems, Automatica, V31, pp. 1835-1851, 1995.