
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 34, NO. 10, OCTOBER 1989

Optimal Control of Unknown Parameter Systems

FRANCISCO CASIELLO AND KENNETH A. LOPARO

Manuscript received April 18, 1988; revised September 22, 1988.
The authors are with the Department of Systems Engineering, Case Western Reserve University, Cleveland, OH 44106.
IEEE Log Number 8930296.

Abstract-This note addresses the problem of finding a cost functional for which an adaptive control law is optimal. The system under consideration is a partially observed linear stochastic system with unknown parameters. It is well known that an optimal finite-dimensional filter for this problem can be derived when the parameters belong to a finite set. Since the optimal filter involves the evaluation of a finite set of a posteriori probabilities for each of the parameter values given the observations, a natural adaptive control scheme is: i) develop the optimal linear feedback law given each parameter; ii) use the a posteriori probabilities to form the weighted average (convex combination) of the individual control policies; iii) use the weighted average as the control law. This note devises a quadratic cost functional for which this strategy is optimal, in a general case, and shows that the probing effect identified with dual control problems is inherent in the standard LQG problem with parameter uncertainty.

PROBLEM FORMULATION

Consider a linear, partially observed stochastic system defined on the probability space (Ω, 𝔉, P) described by

  dx(t) = A_θ x(t) dt + B_θ u(t) dt
  dy(t) = C_θ x(t) dt + dw(t),    x(0) ~ N(x_0, P(0))

where

  x(t) is the state vector ∈ R^n
  u(t) is the control vector ∈ R^m
  y(t) is the observation vector ∈ R^p

and w(t) is a Brownian motion process with respect to the probability measure P defined on (Ω, 𝔉, P), 𝔉_t is a nondecreasing family of sub-σ-algebras of 𝔉, and θ is an unknown parameter taking values in F = {0, 1, ..., f}.

A control u(·) is said to be admissible if it is progressively measurable with respect to Y_t^u = σ{y^u(τ), 0 ≤ τ ≤ t} with

  x^u(t) = x(0) + ∫_0^t [A_θ x^u(τ) + B_θ u(τ)] dτ    a.s.

Assume that the a priori probabilities π_i(0) = Prob{θ = i}, i ∈ F, are known. Define the a posteriori probabilities

  π_i(t) = Prob{θ = i | Y_t^u}.

Now, following [5], let l_i(t) be the likelihood that the system is operating under θ = i, given the observations y^u(τ), 0 ≤ τ ≤ t; then

  dl_i(t) = l_i(t) (C_i x̂_i(t))^T dy(t),   i ∈ F   (^T denotes transpose)

where x̂_i(t) = E[x(t) | θ = i, Y_t^u] is the state estimate given mode i, which can be computed by the mode-conditioned Kalman filter with error covariance P_i(t) satisfying

  (d/dt) P_i(t) = A_i P_i(t) + P_i(t) A_i^T - P_i(t) C_i^T (Q^w)^{-1} C_i P_i(t)
  x̂_i(0) = x̂(0) = E[x(0)] = x_0
  P_i(0) = P(0) = E[(x(0) - x̂(0))(x(0) - x̂(0))^T]
  Q^w = E[w(t) w(t)^T].

Also

  π_i(t) = π_i(0) l_i(t) / Σ_{j∈F} π_j(0) l_j(t).

We want to find a cost of the following form:

  J^u(x_0) = E[∫_0^∞ (x^T(t) Q x(t) + u^T(t) u(t)) dt] - I(∞).    (C)

Here I(·) is interpreted as a "learning cost" which is to be determined so that the following control law is optimal:

  u(t) = -Σ_{i∈F} π_i(t) B_i^T K_i x̂_i(t).

Here K_i is the unique positive definite solution of the algebraic Riccati equation

  A_i^T K_i + K_i A_i - K_i B_i B_i^T K_i + Q = 0,   i ∈ F.    (ARE)

This is the continuous-time version of the Deshpande, Upadhyay, Lainiotis (DUL) law [1]. Here the control law is computed as the average of the individual optimal regulator policies, assuming the parameters of each model are known.

MAIN RESULT

Assume that (A_i, B_i, C_i) is a minimal triple for all i ∈ F. Let K_i be the unique positive definite solution of the Riccati equation (ARE) above; then the DUL law is optimal with respect to the cost (C) with the learning cost I(·) given by

  I(∞) = E[Σ_{(i,j)∈S_F} ∫_0^∞ π_i(t) π_j(t) ||B_i^T K_i x̂_i(t) - B_j^T K_j x̂_j(t)||² dt]
       + E[Σ_{i∈F} ∫_0^∞ π_i(t) tr(B_i^T K_i P_i(t) K_i B_i) dt]

where S_F = the set of all pairwise distinct combinations of elements of F.
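The DUL law in the theorem can be sketched numerically in the scalar case. The sketch below uses hypothetical model data, and care_scalar solves only the scalar specialization of (ARE); it is an illustration of the weighted-average construction, not the general matrix case.

```python
import math

def care_scalar(a, b, q):
    # Scalar algebraic Riccati equation 2*a*k - (b*k)**2 + q = 0;
    # return the positive root (the positive definite solution for q > 0).
    return (a + math.sqrt(a * a + b * b * q)) / (b * b)

def dul_control(posteriors, models, gains, estimates):
    # DUL law: u = -sum_i pi_i * b_i * K_i * xhat_i, the posterior-weighted
    # convex combination of the per-model LQR feedbacks.
    return -sum(p * b * k * xh
                for p, (a, b), k, xh in zip(posteriors, models, gains, estimates))

# Two candidate scalar models (hypothetical (a_i, b_i) pairs), Q = 1 for both.
models = [(-1.0, 1.0), (0.5, 1.0)]
K = [care_scalar(a, b, 1.0) for a, b in models]

# With hypothetical posteriors and mode estimates, form the DUL control.
u = dul_control([0.3, 0.7], models, K, [1.0, 1.0])
```

Each K[i] satisfies its scalar ARE exactly, and u is simply the convex combination of the two individual feedbacks, weighted by the assumed posteriors.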


Proof: We need the following two lemmas.

Lemma 1: Let α_i ≥ 0, i ∈ F, Σ_{i∈F} α_i = 1. Let x_i ∈ R^n, i ∈ F, and let P be an n × n matrix; then

  Σ_{i∈F} α_i x_i^T P x_i = (Σ_{i∈F} α_i x_i)^T P (Σ_{j∈F} α_j x_j) + Σ_{(i,j)∈S_F} α_i α_j (x_i - x_j)^T P (x_i - x_j)

where S_F is defined as above.

Lemma 2: Let T ≥ 0 and define J_T^u(x_0) by

  J_T^u(x_0) = ∫_0^T (x(t)^T Q x(t) + u(t)^T u(t)) dt.

Let K be a solution to A^T K + K A = K B B^T K - Q; then for any T ≥ 0

  J_T^u(x_0) = -x(T)^T K x(T) + x(0)^T K x(0) + ∫_0^T ||u(t) + B^T K x(t)||² dt.

The proof of Lemma 1 is straightforward and not included. Lemma 2 is from [2, p. 29]. Now, in the context of our problem, we are interested in


the asymptotic behavior (T → ∞) of the cost with

  J_T^u(x_0) = E[∫_0^T (x(t)^T Q x(t) + u(t)^T u(t)) dt] - I(T).

Let θ be a random variable taking values in F and let E_θ[·] and E_{Y_T^u}[·] denote the expectation with respect to the random variable θ and the sigma field Y_T^u, respectively. Then, for any s, 0 ≤ s ≤ T, the cost can be evaluated by first conditioning on the sigma field Y_s^u and then averaging over θ. Computing the conditional expectation and using Lemmas 1 and 2, we obtain a decomposition of J_T^u(x_0) into quadratic terms in the initial estimates, cross terms generated by the convex combination of the individual feedback laws, and filtering-error terms. Define the dual or learning cost I(T) for the infinite time horizon problem as

  I(∞) = lim_{T→∞} E[Σ_{(i,j)∈S_F} ∫_0^T π_i(t) π_j(t) ||B_i^T K_i x̂_i(t) - B_j^T K_j x̂_j(t)||² dt
       + Σ_{i∈F} ∫_0^T π_i(t) tr(B_i^T K_i P_i(t) K_i B_i) dt].

The indicated limit exists since the integrals are almost surely absolutely convergent as T → ∞. Then, since K_i stabilizes (A_i, B_i, C_i), that is, u(·) is a stabilizing feedback control, when T → ∞ we obtain

  J^u(x_0) = Σ_{i∈F} π_i(0) x̂_i(0)^T K_i x̂_i(0) + Σ_{i∈F} π_i(0) tr K_i P(0)

with the optimal control u(t)* = -Σ_{i∈F} π_i(t) B_i^T K_i x̂_i(t) a.s. (the continuous-time version of the DUL law). This proves the desired result.

Note:
1) For finite time horizon problems, the dual cost I(T) is defined in the same way with K_i = K_i(t), the time-dependent solution of the differential Riccati equation associated with the optimal control problem.
2) Let θ_0 ∈ {0, 1, ..., f} be the correct parameter value. Then, if the consistency condition given in [2] is satisfied, it follows that π_{θ_0}(T) → 1 as T → ∞ a.s., and the continuous-time version of the DUL controller converges almost surely to the optimal LQG controller associated with the parameter θ_0.
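The consistency behavior in Note 2 can be illustrated with a discrete-time Bayesian analogue rather than the continuous-time likelihood equation of the note. The sketch below assumes hypothetical scalar observation models y_n = θ + w_n, θ ∈ {0, 1}, with unit-variance Gaussian noise and a fixed, hypothetical data sample near the true mode θ = 1; the posterior mass on the true mode grows toward one as observations accumulate.

```python
import math

def posterior_theta1(data, prior1=0.5):
    # Bayes update for y_n = theta + w_n, theta in {0, 1}, w_n ~ N(0, 1).
    # Per-sample log-likelihood ratio log N(y; 1, 1) - log N(y; 0, 1)
    # simplifies to y - 1/2.
    llr = sum(y - 0.5 for y in data)
    odds = (prior1 / (1.0 - prior1)) * math.exp(llr)
    return odds / (1.0 + odds)          # Prob(theta = 1 | data)

data = [1.1, 0.9, 1.05, 0.95, 1.0]      # hypothetical sample, true theta = 1
p1 = posterior_theta1(data)             # posterior mass on the true mode
p1_more = posterior_theta1(data * 4)    # more data: posterior still closer to 1
```

With this fixed sample the cumulative log-likelihood ratio is 2.5, so the posterior is 1/(1 + e^{-2.5}), already above 0.9; quadrupling the sample pushes it closer to one, mirroring π_{θ_0}(T) → 1.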
Remark 1: A special case of this result is when C_i = F_i = B_i^T K_i (assumption (A2) in [2, p. 95]). Following the development in [2, p. 79], we have

  z(t) = C_θ x(t)

and

  ẑ_i(t) = ŷ_i(t) = C_i x̂_i(t),   i ∈ F
  Σ_i(t) = E[||z(t) - ẑ_i(t)||² | θ = i, Y_t^u].

The dual or learning cost is then proportional to the Shannon information of the pair θ and y(t), 0 ≤ t ≤ T. Also, under this assumption, uniform stabilization, as defined in [2], is easily achieved with F = I since each model is stabilizable, and Ā_i = A_i - B_i F C_i = A_i - B_i F_i; adaptive stabilization is also guaranteed. In effect, the state feedback law

  u(t) = -Σ_{i∈F} π_i(t) B_i^T K_i x̂_i(t)   (a.s.)

is also an output feedback law; that is, C_i = B_i^T K_i yields, with F = I and ŷ(t) = Σ_{i∈F} π_i(t) ŷ_i(t),

  u(t) = -F ŷ(t)   a.s.

This implies that (A_i, B_i, C_i) is adaptively stabilizable, since u(t) is a stabilizing control. However, the more stringent condition of uniform stabilization is not required in the general case (just model stabilization). Assumption (A2) of [2, p. 95] guarantees the equivalence of uniform and adaptive stabilization and an interpretation of the dual cost as a Shannon information term. However, the following remark should be taken into consideration.

Remark 2: If A_i, i ∈ F, are completely stable or unstable matrices (that is, all eigenvalues have negative or positive real parts, respectively), the optimal control problem formulated in [2] with assumption (A2) of [2, p. 95] is not well posed. In effect, if the A_i are as above, then the only solution K_i of the algebraic Riccati equation

  A_i^T K_i + K_i A_i - K_i B_i B_i^T K_i + Q_i = 0

is the trivial one. The key point here is to recognize that under the assumption given in [2] that C_i = B_i^T K_i, Q_i = C_i^T C_i, the algebraic Riccati equation simplifies to the following Lyapunov equation:

  A_i^T K_i + K_i A_i = 0.

If K_i is the trivial solution, then the output equation is y(t) = w(t) and the optimal control law is u(t) = 0. The minimality condition of the triple (A_i, B_i, C_i) is not satisfied in this case.

To show that in the cases above the only solution is the trivial one, we use the following result from [3]: if the matrices A and -A* do not have common eigenvalues, then K = 0 is the only solution to -A*K = KA. It is well known that A has eigenvalues satisfying λ_j* = -λ_k iff A* and -A have common eigenvalues.

Now assume that A is completely unstable and A* and -A have common eigenvalues. Let λ_k = σ_k + iω_k, σ_k > 0, be an eigenvalue of A; then for some j, λ_j = -λ_k* = -σ_k + iω_k is also an eigenvalue of A, but then A is not completely unstable. A similar calculation holds when A is completely stable.

Remark 3: According to Remark 2, the class of matrices that admit a nontrivial solution to A^T K + K A = 0 are those having at least one pair of eigenvalues satisfying λ_j* = -λ_k; that is, their spectra are symmetric with respect to the imaginary axis in the complex plane. Also, since the equations which determine K_i are homogeneous, the set of solutions is a linear subspace of R^{n²}, the set of n × n matrices over the real field. Thus, assumptions (A1) and (A2) of [2] together with the condition C_i = B_i^T K_i reduce the optimal control problem to a parametric optimization problem over the set of matrices which stabilize the individual systems (A_i, B_i), i ∈ F. The set of matrices A_i which satisfy λ_j* = -λ_k for some j and k must have at least one real stable pole and a symmetric unstable pole, or possibly a complex conjugate pair of eigenvalues on the imaginary axis. We conjecture that, in general, the optimal control problem does not admit a meaningful solution, if any, within this class. The following example is illustrative.

Example: Assume that θ belongs to F = {0, 1} and

  A_0 = [1 0; 0 -1],   A_1 = [0 1; -1 0].

Take B_0 = B_1 = I; then (A_i, B_i) are controllable, i = 0, 1. Then

  K_0 = [0 k_0; k_0 0],   k_0 ∈ R,

is the general solution of A_0^T K_0 + K_0 A_0 = 0, and K_0 has eigenvalues λ_1 = k_0 and λ_2 = -k_0, so that K_0 is an indefinite matrix.
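This claim can be checked with plain 2 × 2 arithmetic. The sketch below assumes the forms A_0 = [1 0; 0 -1] and K_0 = [0 k; k 0] used above, with a hypothetical value k = 0.75: the Lyapunov residual is zero, and since K_0 has trace 0 and determinant -k², its eigenvalues are ±k, indefinite for every nonzero k.

```python
import math

k = 0.75                                   # hypothetical nonzero sample value
A0 = [[1.0, 0.0], [0.0, -1.0]]             # assumed A_0 (eigenvalues +1, -1)
K0 = [[0.0, k], [k, 0.0]]                  # candidate general solution

def mul(A, B):
    # plain 2x2 matrix product
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

# residual of the Lyapunov equation A0^T K0 + K0 A0 = 0
At = [[A0[j][i] for j in range(2)] for i in range(2)]
AtK, KA = mul(At, K0), mul(K0, A0)
residual = max(abs(AtK[i][j] + KA[i][j]) for i in range(2) for j in range(2))

# eigenvalues of the symmetric K0 from its trace and determinant
tr = K0[0][0] + K0[1][1]
det = K0[0][0] * K0[1][1] - K0[0][1] * K0[1][0]
eig_hi = tr / 2.0 + math.sqrt(tr * tr / 4.0 - det)   # = +k
eig_lo = tr / 2.0 - math.sqrt(tr * tr / 4.0 - det)   # = -k
```

Because the eigenvalues have opposite signs for any nonzero k, no choice of k_0 makes K_0 positive definite, which is the source of the ill-posedness discussed in Remark 2.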
Also

  K_1 = [k_1 0; 0 k_1],   k_1 ∈ R,

is the general solution of A_1^T K_1 + K_1 A_1 = 0, and K_1 has eigenvalue λ = k_1 with multiplicity two, so that K_1 is positive or negative semidefinite for k_1 ≥ 0 or k_1 ≤ 0, respectively.

The triples (A_i, B_i, C_i), i = 0, 1, are minimal with C_i = F_i = B_i^T K_i if and only if the pairs (A_0, C_0) and (A_1, C_1) are observable. Since A_0 and A_1 are nonsingular matrices, we require that K_0 and K_1 each be of rank 2, which is true iff k_0 ≠ 0 and k_1 ≠ 0.

The triples (A_0, B_0, C_0) and (A_1, B_1, C_1) are uniformly stabilizable with F = I iff Ā_0 = A_0 - B_0 F C_0 and Ā_1 = A_1 - B_1 F C_1 are stable. Now

  Ā_0 = A_0 - K_0 = [1 -k_0; -k_0 -1],

which has real eigenvalues located at ±(1 + k_0²)^{1/2}. Thus, the system is not uniformly nor adaptively stabilizable. We also note that

  Ā_1 = A_1 - K_1 = [-k_1 1; -1 -k_1],

which has eigenvalues λ_{11} = -k_1 + i and λ_{12} = -k_1 - i, so that K_1 stabilizes Ā_1 if k_1 > 0.

Take π_1(0) = 1, so that the problem becomes a standard LQG problem. Now, since K_1 stabilizes Ā_1, we obtain

  J^u(x_0) = inf {x_0^T K_1 x_0 + tr K_1 P(0)},   K_1 = diag{k_1},   s.t. k_1 > 0,

which does not have a unique positive definite "minimizing" solution.
CONCLUSIONS

It has been shown that the continuous-time version of the DUL controller is optimal for a cost functional that includes a quadratic term and a nonquadratic term referred to as the "dual or learning cost." Since the DUL controller is a "passively learning" controller, it becomes clear that there must be a probing effect induced by the original quadratic cost functional that is removed by subtracting the "dual cost." We conclude that no extra terms need to be added to the cost functional for active probing of the system if a "standard" quadratic cost functional is used. The dual cost has been defined in [4]. A general theory for discrete-time problems will be given in a companion paper which will appear at a later time.

REFERENCES

[1] J. G. Deshpande, T. N. Upadhyay, and D. G. Lainiotis, "Adaptive control of linear stochastic systems," Automatica, vol. 9, pp. 107-115, 1973.
[2] O. Hijab, Stabilization of Control Systems. New York: Springer-Verlag, 1986.
[3] F. R. Gantmacher, The Theory of Matrices, Vols. 1 and 2. New York: Chelsea, 1959.
[4] F. Casiello and K. Loparo, "A dual controller for linear systems with random jump parameters," in Proc. 24th IEEE Conf. Decision Contr., Ft. Lauderdale, FL, Dec. 1985, pp. 911-915.
[5] O. Hijab, "The adaptive LQG problem - Part I," IEEE Trans. Automat. Contr., vol. AC-28, pp. 171-178, Feb. 1983.
Least-Squares Identification for ARMAX Models without the Positive Real Condition

LEI GUO AND DAWEI HUANG

Manuscript received June 7, 1988; revised October 25, 1988.
L. Guo was with the Department of Systems Engineering, Research School of Physical Sciences, Australian National University, Canberra, ACT 2601, Australia. He is now with the Institute of Systems Science, Academia Sinica, Beijing, China.
D. Huang is with the Department of Statistics, Institute of Advanced Studies, Australian National University, Canberra, ACT 2601, Australia, on leave from the Department of Probability and Statistics, Peking University, Peking, China.
IEEE Log Number 8930298.

Abstract-The main purpose of this note is to study recursive identification problems of linear stochastic feedback control systems described by ARMAX models, without imposing the strictly positive real (SPR) condition on the noise model. The key ingredient in the present method is the introduction of increasing lag regressors to formulate the least-squares estimates for the noise process, while the main techniques in the convergence analysis are limit theorems for double array martingales.

I. INTRODUCTION

Let us consider the following linear stochastic control systems described by the ARMAX model:

  A(z) y_n = B(z) u_n + C(z) w_n,   n ≥ 0    (1)

where y_n, u_n, and w_n are the m-, l-, and m-dimensional system output, input, and noise sequences, respectively, and A(z), B(z), and C(z) are matrix polynomials in the backwards-shift operator z:

  A(z) = I + A_1 z + ... + A_p z^p,   p ≥ 0,    (2)
  B(z) = B_1 z + B_2 z² + ... + B_q z^q,   q ≥ 0,    (3)
  C(z) = I + C_1 z + ... + C_r z^r,   r ≥ 0    (4)

with unknown coefficients A_i, B_j, C_k (i = 1, ..., p; j = 1, ..., q; k = 1, ..., r) and known upper bounds p, q, and r for the true orders. Let us organize the unknown coefficients into a parameter θ:

  θ = [-A_1 ... -A_p  B_1 ... B_q  C_1 ... C_r]^T.    (5)

We assume that the innovation process {w_n} is a martingale difference sequence with respect to a family {F_n} of nondecreasing σ-algebras, and that the input u_n is any F_n-measurable vector for n ≥ 0, i.e.,

  E[w_{n+1} | F_n] = 0,   u_n ∈ F_n,   ∀n ≥ 0.    (6)

Thus, the input sequence may include any feedback control signal. Furthermore, we assume that

  sup_n E[||w_{n+1}||⁴ | F_n] < ∞   a.s.    (7)

and

  ||w_n|| = O(d(n)),   a.s.,    (8)

where {d(n)} is a positive nondecreasing deterministic sequence. Here and hereafter the norm of a real matrix X is defined as ||X|| = {λ_max(X X^T)}^{1/2}, and the maximum [minimum] eigenvalue of a square matrix X is denoted by λ_max(X) [λ_min(X)].

Note that (7) implies that ||w_n|| = O(n^c) a.s. for every c > 1/4, by the conditional Borel-Cantelli lemma [1]. Better bounds are also obtainable under further assumptions; for example, ||w_n|| = O({log n}^{1/2}) when {w_n} is Gaussian and white.

Since {w_n} is the innovation sequence, it is natural to assume that the noise model C(z) is stable (e.g., [2]), i.e.,

  det C(z) ≠ 0,   ∀z: |z| ≤ 1;    (9)

however, further a priori information on C(z) is generally unavailable [3].

Estimation and its related adaptive control problems for system (1) have been extensively studied over the last decade in the engineering literature. Many identification algorithms have been proposed and analyzed (e.g., [2], [4], [5]). However, most of the existing recursive identification and adaptive control algorithms need the noise model to be strictly positive real (SPR) in the convergence analysis. Specifically, for the standard extended least-squares (ELS) algorithm, it is required that

  C^{-1}(e^{iλ}) + C^{-T}(e^{-iλ}) - I > 0,   ∀λ ∈ (0, 2π]   (i² = -1).

This condition is obviously not verifiable a priori, and it necessarily implies that ||[C_1 ... C_r]|| < 1. Hence, it is a much stronger condition than (9). Qualitatively, it means that the noise process C(z)w_n is not "too colored."

It is also known that if the SPR condition fails, counterexamples can be constructed such that the ELS algorithm does not converge [6].
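In the scalar (m = 1) case, the SPR condition above can be checked numerically on a frequency grid, since C^{-1}(e^{iλ}) + C^{-1}(e^{-iλ}) - 1 > 0 is equivalent to 2 Re C(e^{iλ}) > |C(e^{iλ})|². The coefficients below are hypothetical: C(z) = 1 + 0.5z passes everywhere, while C(z) = 1 + 1.5z + 0.75z² has both roots outside the unit disk (so (9) holds) yet fails the SPR test at λ = 0, illustrating that SPR is strictly stronger than stability.

```python
import math

def spr_margin(c, grid=2048):
    # Scalar SPR check for C(z) = 1 + c[0]*z + ... + c[r-1]*z^r.
    # Returns the minimum over the unit circle of 2*Re C(e^{i*lam}) - |C(e^{i*lam})|^2,
    # which is positive iff C^{-1}(e^{i*lam}) + C^{-1}(e^{-i*lam}) - 1 > 0 for all lam.
    margin = float("inf")
    for n in range(grid):
        lam = 2.0 * math.pi * n / grid
        z = complex(math.cos(lam), math.sin(lam))
        C = 1.0 + sum(ck * z ** (k + 1) for k, ck in enumerate(c))
        margin = min(margin, 2.0 * C.real - abs(C) ** 2)
    return margin

ok = spr_margin([0.5])          # SPR holds: the margin is +0.75 at every frequency
bad = spr_margin([1.5, 0.75])   # stable noise model, but SPR fails at lam = 0
```

For the second polynomial the margin works out to -1.8125 - 2.25 cos λ, so the worst violation (-4.0625) occurs at λ = 0 even though both roots lie outside the unit circle.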
