16 - Optimal Control of Unknown Parameter Systems
FRANCISCO CASIELLO AND KENNETH A. LOPARO

Abstract: This note addresses the problem of finding a cost functional for which an adaptive control law is optimal. The system under consideration is a partially observed linear stochastic system with an unknown parameter.

Consider the partially observed linear stochastic system

    dx(t) = A_θ x(t) dt + B_θ u(t) dt
    dy(t) = C_θ x(t) dt + dw(t)

where w(t) is a Brownian motion process with respect to the probability measure P defined on (Ω, ℱ, P), {ℱ_t} is a nondecreasing family of sub-σ-algebras of ℱ, and θ is an unknown parameter taking values in F = {0, 1, ..., f}.

A control u(·) is said to be admissible if it is progressively measurable with respect to ℱ_t^y = σ{y^u(τ), 0 ≤ τ ≤ t} and

    x^u(t) = x(0) + ∫_0^t [A_θ x^u(τ) + B_θ u(τ)] dτ   a.s.

Let

    π_i(t) = Prob{θ = i | ℱ_t^y}.
Now, following [5], let l_i(t) be the likelihood that the system is operating under θ = i, given the observations y^u(τ), 0 ≤ τ ≤ t. Then

    dl_i(t) = l_i(t) (C_i x̂_i(t))' dy(t),   i ∈ F,

where x̂_i(t) = E[x(t) | θ = i, ℱ_t^y] is the state estimate given mode i and can be computed by the mode-conditional filter, whose error covariance P_i(t) satisfies

    dP_i(t)/dt = A_i P_i(t) + P_i(t) A_i' - P_i(t) C_i' C_i P_i(t)
    x̂_i(0) = x̂(0) = E[x(0)] = x_0
    P_i(0) = P(0) = E[(x(0) - x̂(0)) (x(0) - x̂(0))']

and whose predicted output is

    ŷ_i(t) = C_i x̂_i(t),   i ∈ F.
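In discrete time, the likelihood recursion, the mode-conditional filters, and the posterior probabilities π_i can be propagated together. The sketch below is not from the note: it assumes an Euler-Maruyama discretization, unit observation-noise intensity, and the standard Kalman-Bucy form for the mode-i estimate update (the estimate equation itself is not reproduced above); the function and variable names are illustrative.

    import numpy as np

    def mode_conditional_step(models, xhat, P, logl, u, dy, dt):
        """One Euler step of the mode-conditional filters, likelihoods, and posteriors.

        models: list of (A_i, B_i, C_i); xhat, P: per-mode estimates and covariances;
        logl:   numpy array of per-mode log-likelihoods, initialized to log pi_i(0);
        u:      control applied over the step; dy: observation increment; dt: step size.
        """
        for i, (A, B, C) in enumerate(models):
            yhat = C @ xhat[i]                                  # predicted output C_i x_i
            innov = dy - yhat * dt                              # innovation increment
            # d log l_i = (C_i x_i)' dy - 0.5 ||C_i x_i||^2 dt  (Ito form of dl_i = l_i (C_i x_i)' dy)
            logl[i] += float(yhat @ dy) - 0.5 * float(yhat @ yhat) * dt
            # assumed standard Kalman-Bucy update for the mode-i estimate
            xhat[i] = xhat[i] + (A @ xhat[i] + B @ u) * dt + P[i] @ C.T @ innov
            # filter Riccati equation for P_i
            P[i] = P[i] + (A @ P[i] + P[i] @ A.T - P[i] @ C.T @ C @ P[i]) * dt
        w = np.exp(logl - np.max(logl))
        pi = w / w.sum()                                        # posterior probabilities pi_i(t)
        return xhat, P, logl, pi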
Let θ be a random variable taking values in F and let E_θ[·] and E_{ℱ_t^y}[·] denote the expectation with respect to the random variable θ and the sigma field ℱ_t^y, respectively.
MAIN RESULT

Assume that (A_i, B_i, C_i) is a minimal triple for every i ∈ F, and let K_i be the unique positive definite solution of the algebraic Riccati equation (ARE) above. Then the DUL law is optimal with respect to the cost (C) with the learning cost l(·), which is given by

    l(T) = ⋯ + ∑_{i∈F} ∫_0^T π_i(t) tr[B_i' K_i P_i(t) K_i B_i] dt

where ℱ_t^y is defined as above.

Lemma 2: Let T ≥ 0 and define J_T^u(x_0) by

    J_T^u(x_0) = E[ ∫_0^T (x(t)' Q x(t) + u(t)' u(t)) dt ] - l(T)

where ' denotes transpose and

    Σ_i(t) = E[ ||x(t) - x̂_i(t)||² | ℱ_t^y ].
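For a numerical reading of the main result, K_i can be obtained as the stabilizing solution of the algebraic Riccati equation A_i'K_i + K_iA_i - K_iB_iB_i'K_i + Q = 0 (unit control weighting, matching the cost E∫(x'Qx + u'u)dt), and the recoverable term of the learning cost can then be evaluated along filter trajectories. This is an illustrative sketch, not part of the note; scipy's ARE solver is used and a single Q shared by all modes is assumed.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    def riccati_gains(models, Q):
        """K_i solving A_i'K_i + K_i A_i - K_i B_i B_i' K_i + Q = 0 (R = I)."""
        return [solve_continuous_are(A, B, Q, np.eye(B.shape[1]))
                for (A, B, _C) in models]

    def learning_cost_rate(pi, K, B, P):
        """Integrand of the learning cost: sum_i pi_i(t) tr[B_i' K_i P_i(t) K_i B_i]."""
        return sum(pi[i] * np.trace(B[i].T @ K[i] @ P[i] @ K[i] @ B[i])
                   for i in range(len(pi)))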
Then, for any s, 0 ≤ s ≤ T,

    J^u(x_0) = E_{ℱ_s^y}[ ⋯ ].

Computing the conditional expectation and using Lemmas 1 and 2, we obtain

    J^u(x_0) = ⋯ + ∑_{i∈F} ∫_0^T π_i(t) tr[B_i' K_i P_i(t) K_i B_i] dt.

The indicated limit exists since the integrals are almost surely absolutely convergent as T → ∞. Then, since K_i stabilizes (A_i, B_i, C_i), that is, u(·) is a stabilizing feedback control, when T → ∞ we obtain

    J^u(x_0) = ∑_{i∈F} π_i(0) x̂_i(0)' K_i x̂_i(0) + ∑_{i∈F} π_i(0) tr[K_i P(0)]

with the optimal control

    u*(t) = - ∑_{i∈F} π_i(t) B_i' K_i x̂_i(t)   a.s.

(the continuous-time version of the DUL law). This proves the desired result.
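The optimal control above is a π-weighted blend of the per-mode certainty-equivalence feedbacks. A minimal sketch of the continuous-time DUL law, reusing K_i from riccati_gains and x̂_i, π_i from the filter sketch (names come from the earlier illustrative snippets, not from the note):

    import numpy as np

    def dul_control(pi, K, B, xhat):
        """u(t) = - sum_i pi_i(t) B_i' K_i xhat_i(t), the continuous-time DUL law."""
        u = np.zeros(B[0].shape[1])
        for i in range(len(pi)):
            u -= pi[i] * (B[i].T @ K[i] @ xhat[i])
        return u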
Note:

1) For finite time horizon problems, the dual cost l(T) is defined in the same way with K_i = K_i(t), the time-dependent solution of the differential Riccati equation associated with the optimal control problem.

2) Let θ_0 ∈ {0, 1, ..., f} be the correct parameter value. Then, if the consistency condition given in [2] is satisfied, it follows that ⋯.

The dual or learning cost is then proportional to the Shannon information of the pair θ and y(t), 0 ≤ t ≤ T.

Also, under this assumption, uniform stabilization, as defined in [2], is easily achieved with F = I since each model is stabilizable and Ā_i = A_i - B_i F C_i = A_i - B_i F_i; adaptive stabilization is also guaranteed. In effect, the state feedback law

    u(t) = - ∑_{i∈F} π_i(t) B_i' K_i x̂_i(t)   a.s.

achieves the desired stabilization.

The optimal control problem formulated in [2] with assumption (A2) of [2, p. 95] is not well posed. In effect, if the A_i are as above, then the only solution K_i of the algebraic Riccati equation

    A_i' K_i + K_i A_i - K_i B_i B_i' K_i + Q_i = 0

is the trivial one. The key point here is to recognize that under the assumption given in [2] that C_i = B_i' K_i and Q_i = C_i' C_i, the algebraic Riccati equation simplifies to the following Lyapunov equation:

    A_i' K_i + K_i A_i = 0.

If K_i is the trivial solution, then the output equation is y(t) = w(t) and the optimal control law is u(t) = 0; the minimality condition on the triple (A_i, B_i, C_i) is not satisfied in this case.

To show that in the cases above the only solution is the trivial one, we use the following result from [3]: if the matrices A and -A* do not have common eigenvalues, then K = 0 is the only solution of -A*K = KA. It is well known that A has eigenvalues satisfying λ̄_k = -λ_j iff A* and -A have common eigenvalues. Now assume that A is completely unstable and that A* and -A have common eigenvalues. Let λ_k = σ_k + iω_k, σ_k > 0, be an eigenvalue of A; then for some j, λ_j = -λ̄_k = -σ_k + iω_k is also an eigenvalue of A, but Re λ_j = -σ_k < 0, which contradicts the complete instability of A.
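The uniqueness argument can also be checked numerically: A'K + KA = 0 has only the trivial solution exactly when no two eigenvalues of A sum to zero, i.e., when the Lyapunov operator K ↦ A'K + KA is nonsingular. The sketch below (test matrices chosen purely for illustration) builds that operator with Kronecker products:

    import numpy as np

    def lyapunov_operator(A):
        """Matrix of K -> A'K + KA acting on vec(K) (column-stacking convention)."""
        n = A.shape[0]
        return np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))

    # Completely unstable A: all eigenvalue sums are nonzero, so K = 0 is the only solution.
    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    print(np.linalg.matrix_rank(lyapunov_operator(A)))    # 4 (full rank)

    # A with eigenvalues +1 and -1: A and -A' share eigenvalues, so nontrivial K exist.
    A2 = np.diag([1.0, -1.0])
    print(np.linalg.matrix_rank(lyapunov_operator(A2)))   # 2 (two-dimensional null space)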
Example: Assume that θ belongs to F = {0, 1}, with A_0 and A_1 a fixed pair of nonsingular 2×2 matrices, and take B_0 = B_1 = I; then (A_i, B_i) is controllable, i = 0, 1. For this A_0, the general solution K_0 of A_0' K_0 + K_0 A_0 = 0 is parameterized by k_0 ∈ R and has eigenvalues λ_1 = k_0 and λ_2 = -k_0, so that K_0 is an indefinite matrix. Likewise, the general solution K_1 of A_1' K_1 + K_1 A_1 = 0 is parameterized by k_1 ∈ R and has the eigenvalue λ_{1,2} = k_1 with multiplicity two, so that K_1 is positive or negative semidefinite for k_1 ≥ 0 or k_1 ≤ 0, respectively.

The triples (A_i, B_i, C_i), i = 0, 1, are minimal with C_i = F_i = B_i' K_i if and only if the pairs (A_0, C_0) and (A_1, C_1) are observable. Since A_0 and A_1 are nonsingular matrices, we require that K_0 and K_1 each be of rank 2, which is true iff k_0 ≠ 0 and k_1 ≠ 0.

The triples (A_0, B_0, C_0) and (A_1, B_1, C_1) are uniformly stabilizable with F = I iff Ā_0 = A_0 - B_0 F C_0 and Ā_1 = A_1 - B_1 F C_1 are stable.
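The example can be reproduced numerically once explicit matrices are fixed. The entries of A_0 and A_1 are not given here, so the choices below are only assumptions consistent with the stated conclusions (K_0 indefinite with eigenvalues ±k_0; K_1 semidefinite with a double eigenvalue k_1):

    import numpy as np

    # Hypothetical matrices consistent with the example's conclusions (not from the source).
    A0 = np.diag([1.0, -1.0])
    A1 = np.array([[0.0, 1.0],
                   [-1.0, 0.0]])

    k0, k1 = 2.0, 3.0                        # free parameters k_0, k_1
    K0 = k0 * np.array([[0.0, 1.0],
                        [1.0, 0.0]])         # candidate general solution of A0'K0 + K0 A0 = 0
    K1 = k1 * np.eye(2)                      # candidate general solution of A1'K1 + K1 A1 = 0

    print(np.allclose(A0.T @ K0 + K0 @ A0, 0))   # True: K0 solves the Lyapunov equation
    print(np.allclose(A1.T @ K1 + K1 @ A1, 0))   # True: K1 solves the Lyapunov equation
    print(np.linalg.eigvalsh(K0))                # [-k0, k0]: indefinite
    print(np.linalg.eigvalsh(K1))                # [k1, k1]: semidefinite (sign of k1)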
(SPR) condition on the noise model. The key ingredient in the present method is the introduction of increasing lag regressors to formulate the least-squares estimates for the noise process, while the main techniques in the convergence analysis are limit theorems for double array martingales.

I. INTRODUCTION

Let us consider the following linear stochastic control system described by the ARMAX model:

    A(z) y_n = B(z) u_n + C(z) w_n,   n ≥ 0   (1)

where y_n, u_n, and w_n are the m-, l-, and m-dimensional system output, input, and noise sequences, respectively, and A(z), B(z), and C(z) are matrix polynomials in the backwards-shift operator z: