
Proceedings of the American Control Conference

Arlington, VA, June 25-27, 2001

A Framework for Subspace Identification Methods


Ruijie Shi and John F. MacGregor
Dept. of Chemical Engineering, McMaster University, Hamilton, ON L8S 4L7, Canada
Email: [email protected], [email protected]

Abstract

Similarities and differences among various subspace identification methods (MOESP, N4SID and CVA) are examined by putting them in a general regression framework. Subspace identification methods consist of three steps: estimating the predictable subspace for multiple future steps, then extracting state variables from this subspace, and finally fitting the estimated states to a state space model. The major differences among these subspace identification methods lie in the regression or projection methods used in the first step to remove the effect of the future inputs on the future outputs and thereby estimate the predictable subspace, and in the latent variable methods used in the second step to extract estimates of the states. This paper compares the existing methods and proposes some new variations by examining them in a common framework involving linear regression and latent variable estimation. Limitations of the various methods become apparent when examined in this manner. Simulations are included to illustrate the ideas discussed.

1. Introduction

Subspace identification methods (SIMs) have become quite popular in recent years. The key idea in SIMs is to estimate the state variables or the extended observability matrix directly from the input and output data. The most influential methods are CVA (Canonical Variate Analysis; Larimore, 1990), MOESP (Multivariable Output Error State space; Verhaegen and Dewilde, 1992) and N4SID (Numerical Subspace State-Space System IDentification; Van Overschee and De Moor, 1994). These methods are so different in their algorithms that it is hard to bring them together and gain more insight into the essential ideas and the connections among them. However, some effort has been made to contrast these methods.

Viberg (1995) gave an overview of SIMs and classified them into realization-based and direct types, and also pointed out the different ways to obtain the system matrices via estimated states or the extended observability matrix. Van Overschee and De Moor (1995) gave a unifying theorem based on lower order approximation of an oblique projection. Here different methods are viewed as different choices of row and column weighting matrices for the reduced rank oblique projection. The basic structure and idea of their theorem is based on casting these methods into the N4SID algorithm. It focuses on the algorithms instead of the concepts and ideas behind these methods.

In this paper, SIMs are compared by casting them into a general statistical regression framework. The fundamental similarities and differences among these SIMs are clearly shown in this statistical framework. All the discussion in this paper is limited to the open loop case of linear time-invariant (LTI) systems.

In the next section, a general framework for SIMs is set up. The following two sections then discuss its major parts and how these methods fit into the framework. A simulation example follows to illustrate the key points. The last section provides conclusions and some application guidelines.

2. General Statistical Framework for SIMs

2.1 Data Relationships in Multi-step State-space Representation

A linear deterministic-stochastic combined system can be represented in the following state space form:

x_{k+1} = A x_k + B u_k + w_k   (1)
y_k = C x_k + D u_k + N w_k + v_k   (2)

where the outputs y_k, inputs u_k and state variables x_k are of dimension l, m and n respectively, and the stochastic variables w_k and v_k are of proper dimensions and uncorrelated with each other.

In order to capture the dynamics, SIMs use multiple steps of past data to relate to multiple steps of future data. For an arbitrary time point k taken as the current time point, the past p steps of the input form a vector u_p, and the current and future f-1 steps of the input form a vector u_f; similar symbols are used for the output and noise variables (some algorithms assume D = 0):

u_p = [u_{k-p}; u_{k-p+1}; ...; u_{k-2}; u_{k-1}],   y_p = [y_{k-p}; y_{k-p+1}; ...; y_{k-2}; y_{k-1}]
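As a concrete illustration of the data-generating structure in (1)-(2), the model can be simulated directly. This is a sketch only: the matrices A, B, C, D, N below are hypothetical placeholders, not the system used in the paper's simulation example.

```python
import numpy as np

# Hypothetical system matrices for illustration only (n=2 states, m=1 input,
# l=1 output); placeholders, not the paper's example process.
A = np.array([[0.8, 0.1], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
N = np.array([[0.1, 0.0]])       # noise feed-through into the output

def simulate(u, w, v):
    """Simulate x_{k+1} = A x_k + B u_k + w_k and y_k = C x_k + D u_k + N w_k + v_k."""
    x = np.zeros(A.shape[0])
    ys = []
    for k in range(u.shape[0]):
        ys.append(C @ x + D @ u[k] + N @ w[k] + v[k])
        x = A @ x + B @ u[k] + w[k]
    return np.array(ys)

rng = np.random.default_rng(0)
T = 1000
u = rng.choice([-1.0, 1.0], size=(T, 1))   # crude binary input sequence
w = 0.1 * rng.standard_normal((T, 2))      # process noise w_k
v = 0.1 * rng.standard_normal((T, 1))      # measurement noise v_k
y = simulate(u, w, v)
print(y.shape)   # (1000, 1)
```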

0-7803-6495-3/01/$10.00 © 2001 AACC
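Collecting the stacked vectors u_p and u_f (and likewise y_p, y_f) over all admissible time points k gives the past/future data matrices U_p, U_f, Y_p, Y_f used below. A minimal sketch of that construction, using random placeholder records and a hypothetical helper name block_hankel:

```python
import numpy as np

def block_hankel(z, p, f):
    """Split a record z (T x dim) into past/future stacked data matrices.
    Column j of Zp is [z_j; ...; z_{j+p-1}] (the p past steps); the matching
    column of Zf is [z_{j+p}; ...; z_{j+p+f-1}] (current and f-1 future steps)."""
    T = z.shape[0]
    ncols = T - p - f + 1
    Zp = np.column_stack([z[j:j + p].reshape(-1) for j in range(ncols)])
    Zf = np.column_stack([z[j + p:j + p + f].reshape(-1) for j in range(ncols)])
    return Zp, Zf

# Random placeholder input/output records; p = f = 7 as in the paper's example.
rng = np.random.default_rng(1)
u = rng.standard_normal((100, 1))
y = rng.standard_normal((100, 1))
Up, Uf = block_hankel(u, p=7, f=7)
Yp, Yf = block_hankel(y, p=7, f=7)
print(Up.shape, Yf.shape)   # (7, 87) (7, 87)
```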


For convenience, all the possible u_p for different k are collected in the columns of U_p, which is the past input data set; similar notations are used for Y_p, U_f, Y_f, W_p, V_p, W_f and V_f. All the possible x_k for different k are collected in the columns of Xk. The relationships between these data sets and the state variables are analyzed in the following multi-step state-space representation as a general setting for discussing SIMs and their framework.

Based on equations (1), (2) and the above notations, the following multi-step state-space model for the current states and the past and future output data can be obtained (X_{k-p} is the initial state sequence):

Xk = A^p X_{k-p} + Δp Up + Δs,p Wp   (3)
Yp = Γp X_{k-p} + Hp Up + Hs,p Wp + Vp   (4)
Yf = Γf Xk + Hf Uf + Hs,f Wf + Vf   (5)

where the extended controllability matrices are Δp = [A^{p-1}B, ..., AB, B] and Δs,p = [A^{p-1}, A^{p-2}, ..., A, I], the extended observability matrix is Γf = [C', (CA)', (CA^2)', ..., (CA^{f-1})']', and Hf and Hs,f are lower block triangular Toeplitz matrices with first block columns [D; CB; CAB; ...; CA^{f-2}B] and [N; C; CA; ...; CA^{f-2}] respectively.

Γf and Hf show the effect of the current states and the future inputs on the future outputs respectively.

The result of substituting X_{k-p} from (4) into (3) is (Γp+ is the pseudo-inverse of Γp):

Xk = A^p Γp+ Yp + (Δp - A^p Γp+ Hp) Up + (Δs,p - A^p Γp+ Hs,p) Wp - A^p Γp+ Vp   (6)

That is, the current state sequence Xk (and therefore Γf Xk) is a linear combination of the past data. Γf Xk is the free evolution of the current outputs (with no future inputs) and independent of the system matrices. It is the part of the future output space in (5) that can be estimated from the data relationships.

System states can be defined as "the minimum amount of information about the past history of a system which is required to predict the future motion" (Åström, 1970). Here the linear combination of the terms on the right hand side of equation (6) summarizes the necessary information in the past history to predict the future outputs in (5). By substituting (6) into (5), a linear relationship between the future outputs and the past data as well as the future inputs is obtained:

Yf = Γf A^p Γp+ Yp + Γf (Δp - A^p Γp+ Hp) Up + Hf Uf + Γf (Δs,p - A^p Γp+ Hs,p) Wp - Γf A^p Γp+ Vp + Hs,f Wf + Vf   (7)

All the terms involving the past form the basis for Γf Xk; Hf Uf is the effect of the future inputs and can be removed if Hf is known or estimated; and the future noise terms are unpredictable. Only Γf Xk is predictable from the past data set, and this predictable subspace is the fundamental basis for SIMs to estimate the state sequence xk or the observability matrix Γf.

With auto-correlated inputs, Hf Uf is correlated with the past data, and therefore part of it can be calculated from the past data if the input auto-correlation remains unchanged. However, it is not part of the causal effect to be modeled in system identification, and therefore should not be taken into account in the prediction of Yf based on the past data. The input auto-correlation may cause difficulty in the estimation of the predictable subspace and the state variables.

2.2 General Statistical Framework for SIMs

Each SIM looks quite different from the others in concept, computational tools and interpretation. The original MOESP does a QR decomposition on [Uf; Yf] and then an SVD on part of the R matrix. Part of the singular vector matrix is taken as Γf, based on which the A and C matrices are estimated; B and D are then estimated through LS fitting. N4SID projects Yf onto [Yp; Up; Uf] and does an SVD on the part corresponding to the past data; the right singular vectors are taken as estimates of the state variables and fit to the state space model. N4SID is interpreted in terms of non-stationary Kalman filters. CVA uses CCA (Canonical Correlation Analysis) to estimate the state variables (called memory) and fits them to the state space model; it is interpreted via the maximum likelihood principle. From the detailed algorithms (refer to the papers), the differences between these SIMs seem so large that it is hard to find the similarities between them.

In fact, if the basic ideas behind these methods are scrutinized from the viewpoint of statistical regression, and the computational methods are analyzed in regression terms and related to each other, these methods are found to be very similar and to follow the same framework. The framework consists of three steps:
i) Estimate the predictable subspace Γf Xk by a linear regression method
ii) Extract state variables from the estimated subspace by a latent variable method

iii) Then fit the estimated states to a state space model.

The major differences among SIMs are in the first two steps; the third step is the same. The original MOESP algorithm extracts Γf from the estimated subspace. Here MOESP is analyzed based on estimated states that come from exactly the same subspace as Γf (also refer to Van Overschee and De Moor, 1995).

3. Estimation of the Predictable Subspace

3.1 Linear Regression for Hf to Estimate Γf Xk

In SIMs, the predictable subspace Γf Xk should be estimated first in order to have a basis for estimation of the states xk or the Γf matrix. The central problem is how to remove the future input effects Hf Uf from Yf in (5) in order to obtain a better estimate of the predictable subspace Γf Xk. The coefficient matrix Hf is unknown and needs to be estimated.

Hf shows the effects of Uf on Yf, and consists of the first f steps of impulse weights on the lower diagonals (SISO) or block weights on the block lower diagonals (MIMO). The true Hf is a lower block triangular matrix. These features (or requirements on Hf) are very informative; however, most algorithms do not make full use of them. Different algorithms use different methods to estimate Hf from the input and output data sets. There are quite a few ways to do this; however, they all belong to the linear regression method.

Once Hf is estimated, say as Ĥf, Yf - Ĥf Uf is an estimate of the predictable subspace. This estimate includes the effects of the estimation errors in Ĥf and the effects of future stochastic signals, which can be removed by projection onto the past data. This projection procedure may induce some error; however, in most cases it is less than the unpredictable future noise. Some subspace identification methods, such as N4SID, do the estimation of Hf and the projection onto the past data sets in one step.

3.2 Methods Used to Estimate Hf

1. Regression of Yf against Uf (MOESP)
Since Hf is the coefficient matrix relating Uf to Yf, it is natural to try to get Hf by directly performing LS regression of Yf against Uf as in (5). A basic assumption for an unbiased result is that the future inputs are uncorrelated with the noise terms in (7), which here also include the effect of the state variables. This method gives an unbiased result only when the inputs are white noise signals. Once Hf is estimated, the predictable subspace is estimated as Yf - Ĥf Uf. The original MOESP uses this method to estimate Hf implicitly, and the predictable subspace via QR decomposition on [Uf; Yf].

If the input sequences are auto-correlated, this method regresses part of the state effect away and gives a biased result for the predictable subspace. SVD on this subspace will give an asymptotically unbiased estimate of Γf; however, the estimation of Xk will be biased.

2. Regression of Yf against [Yp; Up; Uf] (N4SID)
Based on (6), we know that Γf Xk in (5) can be estimated by a linear combination of the past inputs Up and the past outputs Yp. It is a natural choice to regress Yf against [Yp; Up; Uf]. Here the regression coefficient for Uf is an estimate of Hf (Ĥf), and the part corresponding to the past data is an estimate of the predictable subspace, which is equivalent to projecting Yf - Ĥf Uf onto the past data. This estimate will have a slight bias if the input signals are auto-correlated; the bias occurs because of the correlation between the past outputs and the past noise terms in (7).

This is the method used in N4SID to estimate Hf and the predictable subspace. It is realized by QR decomposition of [Uf; P1o; Yf] (P1o = [Yp; Up]). The PO-MOESP (past output MOESP; Verhaegen, 1994) gives similar results.

3. Constructing Hf from impulse weights of an ARX model (CVA)
The nature of Hf implies that it can be constructed from the first f impulse block weights. These impulse weight blocks can be estimated from a simple model, such as an ARX or FIR model, which can be obtained by regressing yk against uk (if D ≠ 0), the past inputs (Up) and the past outputs (Yp). The predictable subspace is then estimated as Yf - Ĥf Uf; it includes all the future noise. This is the method some CVA algorithms use to estimate Hf and the predictable subspace.

4. Regression-out method
Uf can be regressed out of both sides of (7) by projecting onto the orthogonal space of Uf, i.e., by post-multiplying both sides by Puf⊥ = I - Uf'(Uf Uf')⁻¹Uf. This removes the Uf term from the equation, and the coefficient matrices for the past data in (7) can be obtained by regressing Yf Puf⊥ against P1o Puf⊥. This result is equivalent to that from N4SID, and was implied in Van Overschee and De Moor (1995). The method has also been applied to the CVA method (refer to Van Overschee and De Moor, 1995; Carette, 2000). See the next section for more discussion.

Another similar approach is to regress the past data P1o out of both sides of (7) (projecting onto the orthogonal space of P1o by post-multiplying by Pp⊥ = I - P1o'(P1o P1o')⁻¹P1o) for the estimation of Hf. This turns out to be equivalent to the approach of N4SID.

5. Instrumental Variable Method
If there is a variable that is correlated with Uf but has no correlation with Xk and the future noise, an unbiased Hf can be estimated by the instrumental variable (IV) method based on (5). For auto-correlated inputs, Uf correlates with Xk through its correlation with Up; therefore the part of Uf which has no correlation with the past data has no correlation with Xk. This part of Uf can be constructed by regressing Up out of Uf and taking the residual as the IV. Once Hf is estimated, the predictable subspace can be easily estimated.
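The first two estimation routes above (direct LS and the joint regression) can be sketched with plain least squares. These are unconstrained estimates on random placeholder data: unlike the true Hf, the sketched estimates are not forced to be lower block triangular.

```python
import numpy as np

def hf_direct(Yf, Uf):
    """Route 1 (MOESP-style): LS regression of Yf on Uf alone.
    Unbiased only when the inputs are white."""
    return Yf @ Uf.T @ np.linalg.pinv(Uf @ Uf.T)

def hf_joint(Yf, Yp, Up, Uf):
    """Route 2 (N4SID-style): regress Yf on [Yp; Up; Uf]. The block of the
    coefficient matrix multiplying Uf estimates Hf; the past-data part times
    the past data estimates the predictable subspace Gamma_f * Xk."""
    Z = np.vstack([Yp, Up, Uf])
    coef = Yf @ Z.T @ np.linalg.pinv(Z @ Z.T)
    n_past = Yp.shape[0] + Up.shape[0]
    Hf_hat = coef[:, n_past:]
    pred = coef[:, :n_past] @ Z[:n_past]
    return Hf_hat, pred

# Random placeholder data matrices (7 block rows, 87 time columns).
rng = np.random.default_rng(2)
Yp = rng.standard_normal((7, 87)); Up = rng.standard_normal((7, 87))
Yf = rng.standard_normal((7, 87)); Uf = rng.standard_normal((7, 87))
Hf1 = hf_direct(Yf, Uf)
Hf2, Pred = hf_joint(Yf, Yp, Up, Uf)
print(Hf1.shape, Hf2.shape, Pred.shape)   # (7, 7) (7, 7) (7, 87)
```

On real identification data the blocks would come from the input/output records rather than independent random draws; the random data here only exercise the shapes and the regression mechanics.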

All these estimation methods are variants of the linear regression method; they differ only in their choice of independent and dependent variables, of regression sequences, and in the degree to which they use the known features of Hf. The key problem comes from the correlation between Uf and Xk, which arises from auto-correlation of the input sequence in the open loop case. The estimation accuracy (bias and variance) of each method depends on the input signal, the true model structure, and the signal to noise ratio (SNR).

4. Estimation of State Variables

4.1 Latent Variable Methods for State Estimation

The predictable subspace estimated by the linear regression methods of the last section is a high dimensional space (much larger than the system order n) consisting of highly correlated variables. If there were no estimation error, this subspace would be only of rank n, and any n independent variables in the data set, or their linear combinations, could be taken as state variables. However, the estimation error generally makes the space full rank. Direct choice of any n variables will have large estimation error and lose the useful information in all the other variables, which are highly correlated with the true states. Extracting only n linear combinations from this highly correlated high-dimensional space while keeping as much information as possible is therefore the most desirable approach. This is exactly the general goal and the situation for which latent variable methods were developed. Latent variable methods are therefore employed in all SIMs as the methodology for estimating the state variables from the predictable subspace.

Latent variables (LVs) are linear combinations of the original (manifest) variables that optimize a specific objective. There are a variety of latent variable methods based on different optimization objectives. In general terms, Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Reduced Rank Analysis (RRA) are latent variable methods that maximize variance, covariance, correlation and predictable variance respectively (for details refer to Burnham et al., 1996). Different SIMs employ different LVMs, or use them in different ways, to estimate the state variables.

4.2 Methods Used for State Estimation

1. PCA (MOESP and N4SID)
Both N4SID and MOESP extract Xk by doing PCA on the estimated predictable subspace, which is essentially an SVD procedure. This implies the assumptions that Γf Xk has a larger variation than that of the estimation error, and that the two parts are uncorrelated. The first assumption is well satisfied if the signal to noise ratio is large, and this ensures that the first n PCs are the estimated state variables. The second assumption is essentially for the unbiasedness of the estimation, and it is not satisfied in the case of auto-correlated inputs.

The state-based MOESP (in the original algorithm) directly uses PCA (SVD) on the estimated predictable subspace Yf - Ĥf Uf, where Ĥf is obtained by directly regressing Yf onto Uf, and the PCs are taken as estimated states. This estimated predictable subspace includes all the future noise and is not predicted by the past data; therefore the PCA results have large estimation errors and no guarantee of predictability. The PO-MOESP applies PCA to the projection of the estimated predictable subspace onto part of the past data space, so the result is generally improved. This method gives unbiased results for white noise inputs; if the inputs are auto-correlated, the result will be biased.

N4SID applies PCA (SVD) to the part of the projection Yf/[Yp; Up; Uf] corresponding to the past data. As mentioned above, this result can be viewed as the projection of the predictable subspace Yf - Ĥf Uf onto the past data, where Ĥf is the third block of the regression coefficient matrix. In fact, it can be shown that this method is equivalent to performing RRA on the past data and Yf - Ĥf Uf (for proof, see Shi, 2001). It is clear that the best predictability in N4SID is in the sense of the total predictable variance of Yf - Ĥf Uf based on the past data; this is assured by the projection onto the past data, and at the same time the future noise is removed. Therefore the estimation error and bias of Xk from N4SID are very small in general.

2. CCA (CVA)
CVA applies CCA to P1o = [Yp; Up] and Yv = Yf - Ĥf Uf, and the first n latent variables (canonical variates, CVs) from the past data set are estimates of xk. By selecting the canonical variates with the largest correlation as states, one is maximizing the relative variation in Yv in each dimension rather than the absolute variation.

CCA can also be applied to the results of projecting the future inputs Uf out of both the past data and the future outputs, i.e., P1o Puf⊥ and Yf Puf⊥; however, the direct canonical variates J1 P1o Puf⊥ are obviously biased estimates of Xk. Here the coefficient matrix J1 should be applied to the original past data to get the state estimates: J1 P1o. These estimates are no longer orthogonal. This result is proven to give unbiased state variable estimates (for proof, see Shi, 2001). However, since part of the state signal is removed by regressing Uf out while the noise is kept intact, the data set Yf Puf⊥ has a worse SNR than Yf - Ĥf Uf in general.
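The PCA and CCA extraction steps above can be sketched as follows. The CCA routine is one standard SVD-based implementation (whiten each block, then SVD of the whitened cross-product); it is an illustration of the idea, not the exact CVA algorithm, and the input matrices are random placeholders.

```python
import numpy as np

def states_by_pca(pred, n):
    """PCA via SVD on the estimated predictable subspace (MOESP/N4SID style):
    the first n right singular vectors, scaled, are taken as the states."""
    U, s, Vt = np.linalg.svd(pred, full_matrices=False)
    return np.diag(s[:n]) @ Vt[:n]

def states_by_cca(Past, Yv, n):
    """One standard SVD-based CCA (CVA style): whiten each data block, take
    the SVD of the whitened cross-product, and apply the first n canonical
    weight vectors of the past block to the past data."""
    def whiten(Z):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return Vt, U @ np.diag(1.0 / s)      # whitened rows, back-transform
    Pw, Pback = whiten(Past)
    Yw, _ = whiten(Yv)
    Uc, s, Vtc = np.linalg.svd(Pw @ Yw.T)
    J = Uc[:, :n].T @ Pback.T                # weights (J1) for the past block
    return J @ Past

# Random placeholders: past block P1o (14 rows), Yv = Yf - Hf_hat*Uf (7 rows).
rng = np.random.default_rng(3)
Past = rng.standard_normal((14, 87))
Yv = rng.standard_normal((7, 87))
Xpca = states_by_pca(Yv, n=2)
Xcca = states_by_cca(Past, Yv, n=2)
print(Xpca.shape, Xcca.shape)   # (2, 87) (2, 87)
```

Because the canonical variates are computed from whitened data, the CCA state estimates come out orthonormal, matching the orthogonality property the text attributes to the direct canonical variates.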

3. Other Possible Methods
Other LVMs, such as PLS and RRA, are possible choices for state extraction. For example, RRA should provide estimates of the states based on the same objective as N4SID, that is, states which maximize the variance explained in the predictable subspace. However, it should give numerically improved estimates, since it directly obtains the n LVs which explain the greatest variance in the predictable subspace, rather than the two-step procedure of first performing an ill-conditioned least squares (oblique projection) followed by PCA/SVD (see Shi, 2001).

Since the objective of PLS is to model the variance in the past data set and in the predictable subspace as well as their correlation, it will generally not provide minimal order state models. In effect, it will try to provide state vectors for the joint input/output space (refer to Shi and MacGregor, 2000).

Combinations of the methods used for predictable subspace estimation and the methods used for state variable estimation can lead to a whole set of different subspace identification methods.

5. Simulation Example

In this section, a simulation example is used to illustrate the points discussed. The example is a simple 1st order SISO process with AR(1) noise, modeled as:

y_k = [0.2 z⁻¹ / (1 - 0.8 z⁻¹)] u_k + [1 / (1 - a z⁻¹)] e_k

The input signal is a PRBS signal with switching time period Ts = 5 and magnitude of 4.0. 1000 data points are collected with var(e_k) = 1.0, and the SNR is about 0.93 (in variance). Both the past and future lag steps are taken as 7 for every method.

Different methods for estimating the Hf matrix are applied to the simulation example and compared to the true result. A rough comparison is to take the mean of the elements on each lower diagonal as the estimated impulse weight. The results and total absolute errors are listed in Table 1. The results from regressing Yf directly onto Uf are clearly the farthest from the true values, because of the bias resulting from the strong auto-correlation in the input PRBS signal. The other methods give results close to the true values.

There are many indexes for comparing the estimated states to the true states. One quick index is the set of canonical correlation coefficients between the estimated states and the true states (see Table 2), which gives a clear idea of how consistent the two spaces are with each other. MOESP using the direct regression Yf/Uf clearly gives poor results. State estimation by PLS gives relatively degraded results compared with CCA or RRA based on the same predictable subspace estimated by ARX. The estimated states from the other methods are much closer to the true states. Similar conclusions are indicated by the squared multiple correlation R2, which shows how much of the total sum of squares of the true states can be explained by the estimated states (the two states are scaled to unit variance).

CCA results based on P1o Puf⊥ and Yf Puf⊥, and those based on P1o and Yf - Hf Uf (using the true Hf), are also compared. The coefficient matrix J1 from the former is different from that of the latter (too large to show). The estimated states from the former method are not linear combinations of those estimated by the latter method, but they are very close.

If the estimated states are used for fitting the state space model, each method can be compared by plotting the estimated impulse responses (Fig. 1) and their errors (Fig. 2). MOESP gives a poor result for this example. The result from SIM-ARX-PLS has a large error, but can be improved to match the others by using more LVs (refer to Shi and MacGregor, 2000). The results from the other SIMs are very close to the true values. The somewhat irregular response from the ARX model is also shown for comparison. All SIMs give smooth responses by fitting the LVs to the state equation.

6. Conclusions

Although SIMs are quite different in their concepts and algorithms, they follow the same statistical framework set up here: (1) use of a linear regression method to estimate Hf and the predictable subspace; (2) use of a latent variable method for estimation of a minimal set of state variables; and (3) fitting of the estimated states to the state space model. By discussing the SIMs in this framework, their similarities and differences can be clearly seen. The framework also reveals possible new methods and new combinations of existing approaches, such as use of the IV method for the estimation of Hf, and use of other latent variable methods such as RRA and PLS for state estimation.
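Step (3) of the framework, fitting the state space model once a state sequence has been estimated, reduces to two ordinary LS regressions. A minimal sketch; the matrices A0, B0, C0, D0 in the sanity check are hypothetical, and the check uses noise-free data so the fit should recover them up to numerical precision.

```python
import numpy as np

def fit_state_space(X, u, y):
    """Step (3): given an estimated state sequence X (n x N), aligned inputs
    u (N x m) and outputs y (N x l), fit A, B by LS regression of x_{k+1} on
    [x_k; u_k], and C, D by LS regression of y_k on [x_k; u_k]."""
    n, N = X.shape
    Z = np.vstack([X[:, :N - 1], u[:N - 1].T])   # regressors [x_k; u_k]
    AB = X[:, 1:] @ np.linalg.pinv(Z)            # [A  B]
    CD = y[:N - 1].T @ np.linalg.pinv(Z)         # [C  D]
    return AB[:, :n], AB[:, n:], CD[:, :n], CD[:, n:]

# Sanity check on noise-free data from a known hypothetical system: with
# exact states, the LS fit should recover the matrices.
rng = np.random.default_rng(4)
A0 = np.array([[0.9, 0.2], [0.0, 0.6]])
B0 = np.array([[1.0], [0.5]])
C0 = np.array([[1.0, 1.0]])
D0 = np.array([[0.1]])
N = 200
u = rng.standard_normal((N, 1))
X = np.zeros((2, N))
for k in range(N - 1):
    X[:, k + 1] = A0 @ X[:, k] + B0 @ u[k]
y = (C0 @ X).T + u @ D0.T
A, B, C, D = fit_state_space(X, u, y)
print(A.shape, B.shape, C.shape, D.shape)   # (2, 2) (2, 1) (1, 2) (1, 1)
```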

References

Åström, K. J., Introduction to Stochastic Control Theory, Academic Press, 1970.
Burnham, A. J., R. Viveros and J. F. MacGregor, Frameworks for Latent Variable Multivariate Regression, Journal of Chemometrics, V10, pp. 31-45, 1996.
Carette, P., personal communication, notes for CVA, 2000.
Larimore, W. E., Canonical Variate Analysis in Identification, Filtering and Adaptive Control, Proc. 29th IEEE Conference on Decision and Control, Vol. 1, Honolulu, Hawaii, 1990.
Larimore, W. E., Optimal Reduced Rank Modeling, Prediction, Monitoring and Control Using Canonical Variate Analysis, Preprints of ADCHEM, Banff, 1997.
Ljung, L. and McKelvey, T., Subspace Identification from Closed Loop Data, Signal Processing, V52, 1996.
Shi, R. and J. MacGregor, Modeling of Dynamic Systems Using Latent Variable and Subspace Methods, Journal of Chemometrics, V14, pp. 423-439, 2000.
Shi, R., Ph.D. Thesis, McMaster University, ON, Canada, 2001.
Van Overschee, P. and De Moor, B., N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems, Automatica, Vol. 30, No. 1, pp. 75-93, 1994.
Van Overschee, P. and De Moor, B., A Unifying Theorem for Three Subspace System Identification Algorithms, Automatica, Vol. 31, No. 12, pp. 1853-1864, 1995.
Verhaegen, M. and Dewilde, P., Subspace Model Identification, Part I: The Output-Error State-Space Model Identification Class of Algorithms, International Journal of Control, V56, pp. 1187-1210, 1992.
Verhaegen, M., Identification of the Deterministic Part of MIMO State Space Models Given in Innovations Form from Input-Output Data, Automatica, V30, pp. 61-74, 1994.
Viberg, M., Subspace-based Methods for the Identification of Linear Time-invariant Systems, Automatica, V31, pp. 1835-1851, 1995.

Table 1  The Impulse Weights in the Estimated Hf

Method        | True   | Yf/Uf  | Yf/[P1o;Uf] | ARX    | IV
W1            | 0      | 0.1003 | 0.0159      | 0      | …
W2            | 0.2    | 0.2716 | 0.1951      | 0.1997 | 0.1953
W3            | 0.16   | 0.2203 | 0.1585      | 0.1546 | …
W4            | 0.128  | 0.1770 | 0.1035      | 0.1086 | …
W5            | 0.1024 | 0.1626 | 0.0728      | 0.0863 | …
W6            | 0.0819 | 0.1613 | 0.0668      | 0.0602 | …
W7            | 0.0655 | 0.2135 | 0.0510      | 0.0610 | 0.0559
Sum Abs. Err. | 0      | 0.5688 | 0.1061      | 0.0675 | …

Table 2  Canonical Correlations between the Estimated and the True States

Method               | MOESP  | N4SID       | CVA    | SIM-IV-CCA | SIM-ARX-PLS | SIM-ARX-RRA
Predictable subspace | Yf/Uf  | Yf/[P1o;Uf] | ARX    | IV         | ARX         | ARX
Estimated states     | PCA    | PCA         | CCA    | CCA        | PLS         | RRA
1st CC               | 0.8680 | 0.9993      | 0.9997 | 0.9995     | 0.9500      | 0.9993
2nd CC               | 0.2599 | 0.9623      | 0.9613 | 0.9600     | 0.9122      | 0.9618
R2                   | 0.4031 | 0.9612      | 0.9605 | 0.9590     | 0.8667      | 0.9606

Fig. 2  Error on the impulse responses

