
Learning-Based Control of Continuous-Time Systems Using Output Feedback




Leilei Cui and Zhong-Ping Jiang

Abstract

This paper presents an adaptive optimal control approach for continuous-time linear systems with output feedback. The method fills a gap in the literature on reinforcement learning and adaptive dynamic programming, which has focused exclusively on either discrete-time systems or continuous-time systems with full-state information. The approach utilizes the historical continuous-time input-output trajectory to reconstruct the current state, without discretizing the system dynamics or using a state observer. By exploiting the policy iteration (PI) method, suboptimal output-feedback controllers can be obtained directly from collected input-output trajectory data in the absence of an accurate dynamic model. The effectiveness of the proposed learning-based PI algorithm is demonstrated through a practical example of F-16 aircraft control.

1 Introduction

Dynamic programming is a powerful tool for solving sequential decision-making and optimal control problems. However, traditional dynamic programming is not applicable to real-world systems due to the "curse of dimensionality" and the "curse of modeling" [2, 29]. To address these limitations, approximate dynamic programming and reinforcement learning have been proposed for systems described by Markov decision processes with discrete time, state, and input [2, 33, 14]. In the physical world, dynamical systems are mathematically modeled as differential equations and evolve in continuous time, state, and input spaces. Stability is essential for the safe operation of these real-world engineering systems. To ensure stability, adaptive dynamic programming (ADP) has been introduced for adaptive optimal control of systems with continuous input and state spaces. Learning-based control algorithms have been developed based on ADP for various classes of linear, nonlinear, periodic, and time-delay dynamical systems, and for optimal stabilization and output regulation problems [16, 18, 11, 3, 27, 6, 8]. The applications of ADP span various fields such as autonomous driving [7, 25], human motor control [26], and wheel-legged robots [9]. However, these methods require full-state information for controller design, which is costly and sometimes impossible to measure in practice. Hence, designing a learning-based control method that uses input-output measurements instead of input-state measurements is a significant yet challenging research topic. A central challenge in developing learning-based control for systems without full-state measurement is simultaneously estimating the state and optimizing the control policy.

For learning-based output-feedback control of discrete-time systems, several approaches have been proposed in the literature. Lewis et al. [23] reconstruct the current state from a finite segment of input-output trajectory data, followed by learning-based PI and VI algorithms. Kiumarsi et al. [20] propose a learning-based control approach for linear quadratic optimal tracking of discrete-time systems. The authors of [10] discretize continuous-time systems and propose an output-feedback robust ADP algorithm to handle dynamic uncertainty. In [36], the authors propose learning-based methods for event-triggered adaptive optimal control with output feedback, while in [12], an adaptive optimal output regulation approach is proposed for discrete-time linear systems with output feedback and input delay.

Another approach to learning-based output-feedback control is to estimate the current state using a state observer, and then parameterize the control policy and value function in terms of the state observer. For example, the adaptive observer from [35, 13] is adopted for the learning-based output-feedback control of continuous- and discrete-time systems [15]. In [37], the authors adopt the adaptive observer in [1] to solve the Bellman equation by data-driven methods. Based on the observer in [34, Chapter 5], learning-based output-feedback control algorithms are developed for optimal stabilization [31], zero-sum differential games [30], and optimal output regulation [4]. However, to offset the influence of the initial estimation error of the observer and obtain an accurate state estimate, the system should run for a sufficiently long time before data collection [4, Lemma 8].

∗ This work has been supported in part by the NSF grants EPCN-1903781 and ECCS-2210320.
† L. Cui and Z. P. Jiang are with the Control and Networks Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, 370 Jay Street, Brooklyn, NY 11201 (email: l.cui, [email protected]).

In this paper, we propose a new learning-based output-feedback control approach for continuous-time linear systems. Inspired by [23], we reconstruct the current state from the continuous-time input-output trajectory, enabling the expression of the control policy and value function as functionals of the continuous-time input-output trajectory. By combining the model-based PI [21] with RL techniques, a learning-based PI algorithm is proposed such that, given an initial admissible controller, the optimal control policy is iteratively approximated using input-output data in the absence of the system model. Our contributions are as follows: 1) we extend the state reconstruction technique from discrete-time systems [23] to continuous-time systems without discretization; 2) we propose a learning-based output-feedback control approach directly designed for continuous-time systems without discretization; and 3) we analyze the theoretical convergence of the proposed learning-based PI algorithm.

The remaining contents of the paper are organized as follows. In Section 2, we review the preliminaries of optimal control. Section 3 presents the proposed state reconstruction technique for continuous-time systems. Next, in Section 4, we develop a learning-based PI algorithm and provide a theoretical analysis of its convergence. We demonstrate the effectiveness of the proposed algorithm using an example of linearized F-16 aircraft dynamics in Section 5. Finally, we conclude the paper with Section 6, where we summarize our contributions and discuss potential future directions.

Notations: S^n denotes the set of n-dimensional real symmetric matrices. |·| denotes the Euclidean norm of a vector or the Frobenius norm of a matrix. ∥·∥_∞ denotes the supremum norm of a function. C^∞(X, Y) denotes the class of smooth functions from the linear space X to the linear space Y. For a matrix A ∈ R^{m×n}, vec(A) := [a_1^T, ..., a_n^T]^T, where a_i is the ith column of A. For a symmetric matrix P ∈ S^n, vecs(P) = [p_11, 2p_12, ..., 2p_1n, p_22, 2p_23, ..., 2p_(n−1)n, p_nn]^T, vecu(P) = [2p_12, ..., 2p_1n, 2p_23, ..., 2p_(n−1)n]^T, and diag(P) = [p_11, p_22, ..., p_nn]^T. For two arbitrary vectors ν, µ ∈ R^n, vecd(ν, µ) = [ν_1µ_1, ..., ν_nµ_n]^T, vecv(ν) = [ν_1^2, ν_1ν_2, ..., ν_1ν_n, ν_2^2, ..., ν_{n−1}ν_n, ν_n^2]^T, and vecp(ν, µ) = [ν_1µ_2, ..., ν_1µ_n, ν_2µ_3, ..., ν_{n−1}µ_n]^T. [X]_{i,j} denotes the submatrix of the matrix X comprised of the rows between the ith and jth rows of X. I_n denotes the n-dimensional identity matrix. A† denotes the Moore–Penrose pseudo-inverse of A.

2 Preliminaries and Problem Formulation

In this section, the linear quadratic regulator (LQR) and model-based PI are reviewed.

2.1 System Description We consider the following continuous-time linear time-invariant system with output measurement

(2.1)  ẋ = Ax + Bu,  x(0) = x_0,
       y = Cx,

where x ∈ R^n, u ∈ R^m, and y ∈ R^q are the state, input, and output of the system; x_0 is the initial state; and A, B, and C are constant matrices with compatible dimensions. We assume that (A, B) is controllable and (C, A) is observable. Under this assumption, LQR is the problem of minimizing the following quadratic cost:

(2.2)  J(x_0, u) = ∫_0^∞ (y^T Q y + u^T R u) dt,

with Q = Q^T ⪰ 0, R = R^T ≻ 0, and (√Q C, A) being observable. If the state x can be measured directly, it is well known that the state-feedback optimal controller is u*(t) = −K* x(t), where

(2.3)  K* = R^{−1} B^T P*,

and P* is the unique positive-definite solution of the following algebraic Riccati equation (ARE)

(2.4)  A^T P + P A − P B R^{−1} B^T P + C^T Q C = 0.

In addition, (A − BK*) is Hurwitz [24, Section 6.2].

2.2 Model-based Policy Iteration The celebrated model-based PI is reviewed as the foundation of the learning-based PI approach. Given an initial stabilizing controller K_1, PI first evaluates the control policy K_i at the ith iteration (i = 1, 2, ...) by calculating the corresponding cost J(x_0, −K_i x) = x_0^T P_i x_0, and then the control gain is improved [21]. In detail, given an initial stabilizing controller, PI is presented as

1. Policy evaluation

(2.5)  A_i^T P_i + P_i A_i + C^T Q C + K_i^T R K_i = 0,  A_i = A − B K_i.

2. Policy improvement

(2.6)  K_{i+1} = R^{−1} B^T P_i.

The following lemma shows that the cost matrix P_i is monotonically decreasing and that K_i updated at each iteration maintains stability.

Lemma 2.1. [21] Starting from an initial stabilizing control gain K_1, PI has the following properties:
1) A − BK_i is Hurwitz for any i ∈ Z_+;
2) P_1 ⪰ P_2 ⪰ ··· ⪰ P_i ⪰ ··· ⪰ P*;
3) lim_{i→∞} P_i = P* and lim_{i→∞} K_i = K*.
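For readers who want to experiment with this baseline, the following minimal Python/NumPy sketch (not from the paper) implements the policy-evaluation/policy-improvement loop (2.5)-(2.6) and checks it against the ARE solution (2.4); the toy matrices A, B, C below are placeholders chosen so that the open-loop system is stable and K_1 = 0 is therefore an admissible initial gain.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def model_based_pi(A, B, C, Q, R, K1, num_iter=15):
    """Kleinman policy iteration (2.5)-(2.6) for the output-weighted LQR cost (2.2)."""
    K = K1
    for _ in range(num_iter):
        Ai = A - B @ K
        # Policy evaluation (2.5): Ai^T Pi + Pi Ai + C^T Q C + K^T R K = 0
        Pi = solve_continuous_lyapunov(Ai.T, -(C.T @ Q @ C + K.T @ R @ K))
        # Policy improvement (2.6): K_{i+1} = R^{-1} B^T Pi
        K = np.linalg.solve(R, B.T @ Pi)
    return Pi, K

# Placeholder system (chosen here only for illustration): A is Hurwitz, so K1 = 0 is stabilizing.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(1), np.eye(1)

P_pi, K_pi = model_based_pi(A, B, C, Q, R, K1=np.zeros((1, 2)))
P_are = solve_continuous_are(A, B, C.T @ Q @ C, R)   # P* from the ARE (2.4)
print(np.linalg.norm(P_pi - P_are))                  # close to machine precision
```

Lemma 2.1 is visible numerically in such a run: the iterates P_i decrease monotonically toward P*, which is why a moderate fixed number of iterations suffices here.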

Given the model-based PI for solving the LQR optimal control problem, the problem to be studied in this paper is formulated as follows.

Problem 1. In the absence of the accurate system matrices (A, B, and C), develop a learning-based PI algorithm to iteratively approximate the optimal output-feedback controller for minimizing (2.2) using the input-output trajectories of system (2.1).

3 State Reconstruction

Since the optimal controller (2.3) requires full state measurement, in this section we will reconstruct the state x(t) using a segment of the historical input-output trajectory within the interval [t − T, t]. Let u_t(θ) := u(t + θ) and y_t(θ) := y(t + θ), ∀θ ∈ [−T, 0], denote the segments of the input and output trajectories, respectively. For any θ ∈ [−T, 0], according to [5, Equation (4.5)], x(t) can be expressed as

(3.7)  x(t) = e^{−Aθ} x(t + θ) + ∫_{t+θ}^{t} e^{A(t−τ)} B u(τ) dτ.

Pre-multiplying (3.7) with e^{A^T θ} C^T C e^{Aθ} and integrating both sides of the equation from −T to 0 with respect to θ, we have

(3.8)  ∫_{−T}^{0} e^{A^T θ} C^T y_t(θ) dθ = G x(t) − ∫_{−T}^{0} e^{A^T θ} C^T ∫_{t+θ}^{t} C e^{A(t+θ−τ)} B u(τ) dτ dθ,

where G is defined as

(3.9)  G := ∫_{−T}^{0} e^{A^T θ} C^T C e^{Aθ} dθ.

Lemma 3.1. G is nonsingular for any T > 0.

Proof. Via integration by substitution with µ = −θ, it is obtained that

(3.10)  G = ∫_{0}^{T} e^{−A^T µ} C^T C e^{−Aµ} dµ.

It is observed that G is the observability Gramian for the pair (C, −A). Under the assumption that (C, A) is observable, G is nonsingular for any T > 0 [5, Theorem 6.4].

With Lemma 3.1 and by (3.8), the state x(t) is reconstructed as

(3.11)  x(t) = G^{−1} ( ∫_{−T}^{0} e^{A^T θ} C^T y_t(θ) dθ + ∫_{−T}^{0} e^{A^T θ} C^T ∫_{t+θ}^{t} C e^{A(t+θ−τ)} B u(τ) dτ dθ ).

Via integration by substitution with ν = τ − t, and changing the order of integration between ν and θ, it follows that

(3.12)  x(t) = ∫_{−T}^{0} G^{−1} e^{A^T θ} C^T y_t(θ) dθ + ∫_{−T}^{0} G^{−1} ( ∫_{−T}^{ν} e^{A^T θ} C^T C e^{Aθ} dθ ) e^{−Aν} B u_t(ν) dν.

Define M(θ) := [M_u(θ), M_y(θ)] and z(t) := [u^T(t), y^T(t)]^T, where

(3.13)  M_u(θ) := G^{−1} ( ∫_{−T}^{θ} e^{A^T τ} C^T C e^{Aτ} dτ ) e^{−Aθ} B,
        M_y(θ) := G^{−1} e^{A^T θ} C^T.

Following (3.12), x(t) is reconstructed as

(3.14)  x(t) = ∫_{−T}^{0} M(θ) z_t(θ) dθ.

It is seen from (3.14) that x(t) is expressed in terms of a segment of the input-output trajectory within [t − T, t].

By (3.14), the optimal controller and the minimal value function can be rewritten as

(3.15)  u*(t) = −K* x(t) = −∫_{−T}^{0} K̄*(θ) z_t(θ) dθ,
        J(x(t), u*) = ∫_{−T}^{0} ∫_{−T}^{0} z_t^T(ξ) P̄*(ξ, θ) z_t(θ) dξ dθ,

where

(3.16)  K̄*(θ) = K* M(θ),
        P̄*(ξ, θ) = M^T(ξ) P* M(θ).

Remark 1. It is noticed that in (3.14), a segment of the input-output trajectory is applied to reconstruct the current state. Therefore, (3.14) can be considered as an extension of the state-reconstruction method for discrete-time systems [23, Lemma 1] to continuous-time systems. The difference is that in [23, Lemma 1], M is a finite-dimensional matrix, while in (3.14), M(θ) is a matrix-valued function.

Remark 2. It is seen from (3.15) that the optimal controller and value function are expressed as functionals of the input-output trajectory within [t − T, t]. The kernels of the optimal controller and value function are K̄*(θ) and P̄*(ξ, θ), respectively.

The computation of the kernel matrices K̄*(θ) and P̄*(ξ, θ) depends on the system matrices (A, B, and C). In the next section, we will design a learning-based PI algorithm to approximate K̄*(θ) and P̄*(ξ, θ) without requiring the accurate system matrices.
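When the model is available, the kernels in (3.13) and the reconstruction formula (3.14) can be checked numerically. The sketch below (ours, not the paper's code) approximates the integrals by a simple Riemann sum on a grid over [−T, 0]; the system matrices, window length T, grid size, test input, and simulation horizon are arbitrary choices for this illustration.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Placeholder observable pair (C, A); only observability of (C, A) is needed for Lemma 3.1.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

T = 0.5
theta = np.linspace(-T, 0.0, 501)          # grid on [-T, 0]
h = theta[1] - theta[0]                    # quadrature step

# G in (3.9): after the substitution (3.10), the observability Gramian of (C, -A) over [0, T]
Phi = [expm(A * th) for th in theta]       # e^{A theta}
G = h * sum(P.T @ C.T @ C @ P for P in Phi)
Ginv = np.linalg.inv(G)

# Kernels (3.13): M_y(theta) and M_u(theta) on the grid
My = [Ginv @ P.T @ C.T for P in Phi]
Mu, cum = [], np.zeros_like(A)
for P in Phi:
    cum = cum + h * P.T @ C.T @ C @ P                 # int_{-T}^{theta} e^{A^T tau} C^T C e^{A tau} d tau
    Mu.append(Ginv @ cum @ np.linalg.inv(P) @ B)      # times e^{-A theta} B

# Simulate with a known input, then reconstruct x(t) from (u, y) on [t-T, t] via (3.14)
u = lambda t: np.array([np.sin(3.0 * t)])
t_end = 2.0
sol = solve_ivp(lambda t, x: A @ x + B @ u(t), (0.0, t_end), [1.0, -1.0],
                t_eval=t_end + theta, rtol=1e-9, atol=1e-12)
y_seg = (C @ sol.y).T                      # y(t+theta) for theta in [-T, 0]
u_seg = np.array([u(t) for t in sol.t])    # u(t+theta)
x_hat = h * sum(Mu[k] @ u_seg[k] + My[k] @ y_seg[k] for k in range(len(theta)))
print(x_hat, sol.y[:, -1])                 # reconstruction vs. simulated x(t), equal up to quadrature error
```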

4 Learning-Based Policy Iteration

In this section, we will develop a learning-based PI algorithm using the input-output data collected from system (2.1).

4.1 Algorithm Development Recalling that A_i = A − BK_i, where K_i is the updated feedback gain at the ith iteration of PI, system (2.1) can be rewritten as

(4.1)  ẋ = A_i x + B(K_i x + u),
       y = Cx.

Taking the derivative of x^T P_i x along the trajectories of (4.1) yields

(4.2)  d/dt [x^T(t) P_i x(t)] = x^T(t)(A_i^T P_i + P_i A_i) x(t) + 2u^T(t) B^T P_i x(t) + 2x^T(t) P_i B K_i x(t).

Plugging (2.5) into (4.2), we have

(4.3)  d/dt [x^T(t) P_i x(t)] = −x^T(t)(C^T Q C + K_i^T R K_i) x(t) + 2u^T(t) B^T P_i x(t) + 2x^T(t) K_i^T B^T P_i x(t).

To simplify the notation, define

(4.4)  K̄_i(θ) := K_i M(θ),
       P̄_i(ξ, θ) := M^T(ξ) P_i M(θ).

Integrating (4.3) from t_l to t_{l+1} and considering (2.6) and (3.14), we have

(4.5)  ∫_{−T}^{0} ∫_{−T}^{0} z_t^T(ξ) P̄_i(ξ, θ) z_t(θ) dξ dθ |_{t_l}^{t_{l+1}} = −∫_{t_l}^{t_{l+1}} y^T Q y dt
       − ∫_{t_l}^{t_{l+1}} ∫_{−T}^{0} ∫_{−T}^{0} z_t^T(ξ) K̄_i^T(ξ) R K̄_i(θ) z_t(θ) dξ dθ dt
       + ∫_{t_l}^{t_{l+1}} ∫_{−T}^{0} 2u^T(t) R K̄_{i+1}(θ) z_t(θ) dθ dt
       + ∫_{t_l}^{t_{l+1}} ∫_{−T}^{0} ∫_{−T}^{0} 2 z_t^T(ξ) K̄_i^T(ξ) R K̄_{i+1}(θ) z_t(θ) dξ dθ dt.

Since P̄_i(ξ, θ) ∈ C^∞([−T, 0]^2, S^{q+m}) and K̄_i(θ) ∈ C^∞([−T, 0], R^{m×(q+m)}), we will use linear combinations of basis functions to parameterize these functions. Let Φ(θ), Ψ(ξ, θ), and Λ(ξ, θ) denote N-dimensional linearly-independent basis functions. By approximation theory [28], we have

(4.6)  diag(P̄_i(ξ, θ)) = W_i^N Ψ(ξ, θ) + e^N_{Ψ,i}(ξ, θ),
       vecu(P̄_i(ξ, θ)) = V_i^N Λ(ξ, θ) + e^N_{Λ,i}(ξ, θ),
       vec(K̄_i(θ)) = U_i^N Φ(θ) + e^N_{Φ,i}(θ),

where W_i^N ∈ R^{(q+m)×N}, V_i^N ∈ R^{n_1×N} with n_1 = (q+m−1)(q+m)/2, and U_i^N ∈ R^{(q+m)m×N}; e^N_{Ψ,i} ∈ C^∞([−T, 0]^2, R^{q+m}), e^N_{Λ,i} ∈ C^∞([−T, 0]^2, R^{n_1}), and e^N_{Φ,i} ∈ C^∞([−T, 0], R^{(q+m)m}) are truncation errors and they uniformly converge to zero, i.e.

(4.7)  lim_{N→∞} ∥e^N_{Ψ,i}(ξ, θ)∥_∞ = 0,  lim_{N→∞} ∥e^N_{Λ,i}(ξ, θ)∥_∞ = 0,  lim_{N→∞} ∥e^N_{Φ,i}(θ)∥_∞ = 0.

The main idea of learning-based PI is to approximate the weighting matrices W_i^N, V_i^N, and U_i^N using input-output data of system (4.1). Define Θ_i^N as

(4.8)  Θ_i^N = [vec^T(W_i^N), vec^T(V_i^N), vec^T(U_{i+1}^N)]^T.

Let Θ̂_i^N denote the approximation of Θ_i^N. By the definition of Θ_i^N and (4.6), P̄̂_i and K̄̂_i, the approximations of P̄_i and K̄_i respectively, can be reconstructed from Θ̂_i^N, i.e.

(4.9)  Ŵ_i^N = vec^{−1}([Θ̂_i^N]_{1,n_2}),
       diag(P̄̂_i(ξ, θ)) = Ŵ_i^N Ψ(ξ, θ),
       V̂_i^N = vec^{−1}([Θ̂_i^N]_{n_2+1,n_3}),
       vecu(P̄̂_i(ξ, θ)) = V̂_i^N Λ(ξ, θ),
       Û_{i+1}^N = vec^{−1}([Θ̂_i^N]_{n_3+1,n_4}),
       K̄̂_{i+1}(θ) = vec^{−1}(Û_{i+1}^N Φ(θ)),

where n_2 = (q+m)N, n_3 = n_2 + n_1 N, and n_4 = n_3 + (q+m)mN.

Based on the parameterization in (4.6), we can transform (4.5) into a linear equation with respect to the weighting matrices encoded in Θ_i^N. In detail, define the data-dependent matrices of the form

(4.10)  Γ_{Ψzz}(t) := ∫_{−T}^{0} ∫_{−T}^{0} Ψ^T(ξ, θ) ⊗ vecd^T(z_t(ξ), z_t(θ)) dξ dθ,
        Γ_{Λzz}(t) := ∫_{−T}^{0} ∫_{−T}^{0} Λ^T(ξ, θ) ⊗ vecp^T(z_t(ξ), z_t(θ)) dξ dθ,
        Γ_{Φzu}(t) := ∫_{−T}^{0} Φ^T(θ) ⊗ z_t^T(θ) ⊗ (u^T(t) R) dθ,
        Γ_{ΦzK̄̂_i}(t) := ∫_{−T}^{0} Φ^T(θ) ⊗ z_t^T(θ) ⊗ (û_i^T(t) R) dθ,

where

(4.11)  û_i(t) = −∫_{−T}^{0} K̄̂_i(θ) z_t(θ) dθ.
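To indicate how the regressors in (4.10) might be assembled from sampled data, here is a rough Python sketch under simplifying assumptions: the basis functions, grid, and random data below are placeholders rather than the paper's choices, plain 1-D arrays stand in for the row regressors, and the Γ_{ΦzK̄̂_i} term is omitted because it has the same structure as Γ_{Φzu} with û_i(t) in place of u(t).

```python
import numpy as np

def vecd(nu, mu):
    """[nu_1 mu_1, ..., nu_n mu_n]^T (see Notations)."""
    return nu * mu

def vecp(nu, mu):
    """[nu_1 mu_2, ..., nu_1 mu_n, nu_2 mu_3, ..., nu_{n-1} mu_n]^T (see Notations)."""
    n = len(nu)
    return np.array([nu[i] * mu[j] for i in range(n) for j in range(i + 1, n)])

# Placeholder basis functions (low-order polynomials); Psi is symmetric in (xi, theta) as Remark 4 requires.
Phi_basis = lambda th: np.array([1.0, th, th**2, th**3, th**4])
Psi_basis = lambda xi, th: np.array([1.0, xi + th, xi**2 + th**2, xi * th])
Lam_basis = lambda xi, th: np.kron(Phi_basis(xi), Phi_basis(th))

def data_matrices(theta, h, z_seg, u_now, R):
    """Riemann-sum approximations of Gamma_Psi_zz(t), Gamma_Lam_zz(t), Gamma_Phi_zu(t) in (4.10)."""
    g_psi, g_lam = 0.0, 0.0
    for k, xi in enumerate(theta):
        for l, th in enumerate(theta):
            g_psi = g_psi + h * h * np.kron(Psi_basis(xi, th), vecd(z_seg[k], z_seg[l]))
            g_lam = g_lam + h * h * np.kron(Lam_basis(xi, th), vecp(z_seg[k], z_seg[l]))
    g_phi = sum(h * np.kron(np.kron(Phi_basis(th), z_seg[l]), u_now @ R) for l, th in enumerate(theta))
    return g_psi, g_lam, g_phi

# Smoke test with random data; m = q = 1, so z_t(theta) = [u(t+theta), y(t+theta)]^T has dimension 2.
rng = np.random.default_rng(1)
T = 0.1
theta = np.linspace(-T, 0.0, 6)
h = theta[1] - theta[0]
z_seg = rng.standard_normal((len(theta), 2))
u_now, R = rng.standard_normal(1), np.eye(1)
g_psi, g_lam, g_phi = data_matrices(theta, h, z_seg, u_now, R)
print(g_psi.shape, g_lam.shape, g_phi.shape)   # N_Psi*(q+m), N_Lam*n_1, N_Phi*(q+m)*m
```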

With (4.10), each term in (4.5) is expressed as

(4.12)  ∫_{−T}^{0} ∫_{−T}^{0} z_t^T(ξ) P̄_i(ξ, θ) z_t(θ) dξ dθ = ϵ^N_{1,i}(t) + Γ_{Ψzz}(t) vec(W_i^N) + Γ_{Λzz}(t) vec(V_i^N),
        ∫_{−T}^{0} 2u^T(t) R K̄_{i+1}(θ) z_t(θ) dθ = 2Γ_{Φzu}(t) vec(U_{i+1}^N) + ϵ^N_{2,i}(t),
        ∫_{−T}^{0} ∫_{−T}^{0} 2 z_t^T(ξ) K̄_i^T(ξ) R K̄_{i+1}(θ) z_t(θ) dξ dθ = 2Γ_{ΦzK̄̂_i}(t) vec(U_{i+1}^N) + ϵ^N_{3,i}(t) + ρ^N_{1,i}(t),

where ϵ^N_{1,i}(t), ϵ^N_{2,i}(t), ϵ^N_{3,i}(t), and ρ^N_{1,i}(t) are induced by truncation errors and expressed as

(4.13)  ϵ^N_{1,i}(t) = ∫_{−T}^{0} ∫_{−T}^{0} vecd^T(z_t(ξ), z_t(θ)) e^N_{Ψ,i}(ξ, θ) + vecp^T(z_t(ξ), z_t(θ)) e^N_{Λ,i}(ξ, θ) dξ dθ,
        ϵ^N_{2,i}(t) = ∫_{−T}^{0} [2z_t^T(θ) ⊗ (u^T(t) R)] e^N_{Φ,i+1}(θ) dθ,
        ϵ^N_{3,i}(t) = ∫_{−T}^{0} ∫_{−T}^{0} [2z_t^T(θ) ⊗ (z_t^T(ξ) K̄̂_i^T(ξ) R)] e^N_{Φ,i+1}(θ) dξ dθ,
        ρ^N_{1,i}(t) = ∫_{−T}^{0} ∫_{−T}^{0} 2 z_t^T(ξ) (K̄_i(ξ) − K̄̂_i(ξ))^T R K̄_{i+1}(θ) z_t(θ) dξ dθ.

Plugging (4.12) into (4.5), we have

(4.14)  H_{i,l} Θ_i^N + E^N_{i,l} = Y_{i,l},

where

(4.15)  H_{i,l} = [ Γ_{Ψzz}(t)|_{t_l}^{t_{l+1}}, Γ_{Λzz}(t)|_{t_l}^{t_{l+1}}, −2 ∫_{t_l}^{t_{l+1}} ( Γ_{Φzu}(t) + Γ_{ΦzK̄̂_i}(t) ) dt ],
        Y_{i,l} = −∫_{t_l}^{t_{l+1}} ( y^T Q y + ∫_{−T}^{0} ∫_{−T}^{0} z_t^T(ξ) K̄̂_i^T(ξ) R K̄̂_i(θ) z_t(θ) dξ dθ ) dt,
        E^N_{i,l} = ∫_{t_l}^{t_{l+1}} ( ϵ^N_{1,i}(t) − ϵ^N_{2,i}(t) − ϵ^N_{3,i}(t) − ρ^N_{1,i}(t) + ρ^N_{2,i}(t) ) dt,

and ρ^N_{2,i}(t), induced by the truncation errors, is expressed as

(4.16)  ρ^N_{2,i}(t) = ∫_{−T}^{0} ∫_{−T}^{0} z_t^T(ξ) [ (K̄_i(ξ) − K̄̂_i(ξ))^T R K̄_i(θ) + K̄̂_i^T(ξ) R (K̄_i(θ) − K̄̂_i(θ)) ] z_t(θ) dξ dθ.

Stacking (4.14) for l = 1, ..., L yields

(4.17)  H_i Θ_i^N + E_i^N = Y_i,
        H_i = [H_{i,1}^T, ..., H_{i,L}^T]^T,
        E_i^N = [E^N_{i,1}, ..., E^N_{i,L}]^T,
        Y_i = [Y_{i,1}, ..., Y_{i,L}]^T.

Assumption 1. Given N ∈ Z_+, there exist L* ∈ Z_+ and α > 0, such that for all L > L* and i ∈ Z_+, the following inequality holds:

(4.18)  (1/L) H_i^T H_i ⪰ αI.

Remark 3. Assumption 1 is reminiscent of the persistent excitation (PE) condition in adaptive control [19, 38]. It guarantees the uniqueness of the least-squares solution to (4.17). As in the literature of ADP-based learning control [17, 22], one can fulfill it by adding exploration noise, such as sinusoidal signals and random noise.

Under Assumption 1 and according to (4.17), Θ_i^N is approximated by the least-squares method as

(4.19)  Θ̂_i^N = H_i† Y_i.

Now, we are ready to present the learning-based PI algorithm for optimal output-feedback controller design in Algorithm 1.

Algorithm 1 Learning-based Policy Iteration
1: Select the basis functions Φ(θ), Ψ(ξ, θ), and Λ(ξ, θ).
2: Select the sampling instants t_k ∈ [t_1, t_{L+1}].
3: Select a stabilizing gain K̄̂_1(θ), and use u(t) = −∫_{−T}^{0} K̄̂_1(θ) z_t(θ) dθ + e(t), where e is an exploration signal, to explore system (2.1) and collect the input-output data u(t), y(t), t ∈ [0, t_{L+1}].
4: Set the threshold δ > 0 and i = 1.
5: repeat
6:   Construct H_i and Y_i by (4.15) and (4.17).
7:   Get Θ̂_i^N by solving (4.19).
8:   Get K̄̂_{i+1}(θ) by (4.9).
9:   i ← i + 1
10: until |Θ̂_i^N − Θ̂_{i−1}^N| < δ.
11: Use û_i(z_t) = −∫_{−T}^{0} K̄̂_i(θ) z_t(θ) dθ as the approximated optimal controller.

Remark 4. Since P̄_i^T(ξ, θ) = P̄_i(θ, ξ), the diagonal elements of P̄_i(ξ, θ) satisfy diag(P̄_i(ξ, θ)) = diag(P̄_i(θ, ξ)). Hence, the vector of basis functions Ψ should be chosen to satisfy Ψ(ξ, θ) = Ψ(θ, ξ).

4.2 Convergence Analysis The convergence of the proposed learning-based PI algorithm is rigorously studied. The following lemma shows that at each iteration, the value functional P̄_i(ξ, θ) and the feedback gain K̄_{i+1}(θ) are well approximated as long as the number of basis functions N is large enough.

Lemma 4.1. For any i ∈ Z_+ and η > 0, there exists N*(i, η) > 0, such that if N ≥ N*(i, η),

(4.20)  ∥P̄_i(ξ, θ) − P̄̂_i(ξ, θ)∥_∞ ≤ η,
        ∥K̄_{i+1}(θ) − K̄̂_{i+1}(θ)∥_∞ ≤ η.

Proof. Let Θ̃_i^N = Θ_i^N − Θ̂_i^N and

(4.21)  Ê_i^N := Y_i − H_i Θ̂_i^N.
Subtracting (4.21) from (4.17) yields

(4.22)  Ê_i^N = H_i Θ̃_i^N + E_i^N.

Since Θ̂_i^N is the least-squares solution to (4.17), it is obtained that

(4.23)  (1/L) (Ê_i^N)^T Ê_i^N ≤ (1/L) (E_i^N)^T E_i^N.

It follows from (4.22) and (4.23) that

(4.24)  (1/L) (Θ̃_i^N)^T H_i^T H_i Θ̃_i^N = (1/L) (Ê_i^N − E_i^N)^T (Ê_i^N − E_i^N) ≤ (4/L) (E_i^N)^T E_i^N.

Under Assumption 1, we have

(4.25)  (Θ̃_i^N)^T Θ̃_i^N ≤ (4/(αL)) (E_i^N)^T E_i^N ≤ (4/α) max_{1≤l≤L} (E^N_{i,l})^2.

Then, the lemma is demonstrated by induction. When i = 1, K̄_1(θ) = K̄̂_1(θ), and ρ^N_{1,1} = ρ^N_{2,1} = 0 is obtained from (4.13) and (4.16). In addition, it follows from (4.7) and (4.13) that lim_{N→∞} ϵ^N_{1,1}(t) = lim_{N→∞} ϵ^N_{2,1}(t) = lim_{N→∞} ϵ^N_{3,1}(t) = 0. Consequently, lim_{N→∞} (E^N_{1,l})^2 = 0, and lim_{N→∞} (Θ̃_1^N)^T Θ̃_1^N = 0 is obtained from (4.25). Comparing (4.6) with (4.9), it is obtained that for any η > 0, there exists N*(1, η) > 0, such that if N ≥ N*(1, η), (4.20) holds for i = 1.

Suppose that for some i > 1 and any η > 0, there exists N*(i, η) > 0, such that if N ≥ N*(i, η), (4.20) holds. Then, it follows from (4.13) and (4.16) that lim_{N→∞} ρ^N_{1,i+1} = lim_{N→∞} ρ^N_{2,i+1} = 0. In addition, from (4.7) and (4.13), we have lim_{N→∞} ϵ^N_{1,i+1}(t) = lim_{N→∞} ϵ^N_{2,i+1}(t) = lim_{N→∞} ϵ^N_{3,i+1}(t) = 0. Consequently, lim_{N→∞} (E^N_{i+1,l})^2 = 0, and lim_{N→∞} (Θ̃_{i+1}^N)^T Θ̃_{i+1}^N = 0 is obtained from (4.25). Comparing (4.6) with (4.9), it is obtained that for any η > 0, there exists N*(i + 1, η) > 0, such that if N ≥ N*(i + 1, η), (4.20) holds for i + 1. The proof is thus completed by induction.

Theorem 4.1. For any η > 0, there exist i* ∈ Z_+ and N** > 0, such that if N ≥ N**,

(4.26)  ∥P̄*(ξ, θ) − P̄̂_{i*}(ξ, θ)∥_∞ ≤ η,
        ∥K̄*(θ) − K̄̂_{i*}(θ)∥_∞ ≤ η.

Proof. According to Lemma 2.1, there exists i* ∈ Z_+ such that

(4.27)  ∥P̄*(ξ, θ) − P̄_{i*}(ξ, θ)∥_∞ ≤ η/2,
        ∥K̄*(θ) − K̄_{i*}(θ)∥_∞ ≤ η/2.

Furthermore, by Lemma 4.1, if N ≥ N*(i*, η/2),

(4.28)  ∥P̄_{i*}(ξ, θ) − P̄̂_{i*}(ξ, θ)∥_∞ ≤ η/2,
        ∥K̄_{i*}(θ) − K̄̂_{i*}(θ)∥_∞ ≤ η/2.

Therefore, (4.26) is obtained by the triangle inequality.

5 Application to F-16 Aircraft Control

In this section, we demonstrate the efficacy of the proposed learning-based PI algorithm by the example of the linearized F-16 aircraft model [32]. The state of the system is x = [ζ, q, δ_e]^T, where ζ is the angle of attack, q is the pitch rate, and δ_e is the elevator deflection angle. Then, the linearized F-16 dynamics is

(5.29)  ẋ = [ −1.01887   0.90506  −0.00215 ]     [  0   ]
            [  0.82225  −1.07741  −0.17555 ] x + [  0   ] u,
            [  0          0       −20.2    ]     [ 20.2 ]
        y = [ 0  57.2958  0 ] x.

The factor 57.2958 is used to convert the unit from radians to degrees. The length of the trajectory used to reconstruct the state is T = 0.1. The basis functions are chosen as Ψ(ξ, θ) = [1, ξ + θ, ξ^2 + θ^2, ξθ, ξ^3 + θ^3, ξ^2θ + ξθ^2, ξ^4 + θ^4, ξ^3θ + ξθ^3, ξ^2θ^2, ξ^4θ + ξθ^4, ξ^3θ^2 + ξ^2θ^3, ξ^4θ^2 + ξ^2θ^4, ξ^3θ^3, ξ^4θ^3 + ξ^3θ^4, ξ^4θ^4]^T, Λ = [1, ξ, ξ^2, ξ^3, ξ^4]^T ⊗ [1, θ, θ^2, θ^3, θ^4]^T, and Φ = [1, θ, θ^2, θ^3, θ^4]^T. The cost weighting matrices are Q = 1 and R = 1. The initial state is x(0) = [0.5, 0.5, 0.5]^T.
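Before describing the data-collection phase, we note for reference how the model-based benchmark K* behind Fig. 1 can be obtained from (2.3)-(2.4) for the model (5.29). The short sketch below assumes SciPy is available and is only the known-model baseline, not the learning-based algorithm; by (3.16), the corresponding kernel is K̄*(θ) = K* M(θ).

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized F-16 dynamics (5.29)
A = np.array([[-1.01887,  0.90506, -0.00215],
              [ 0.82225, -1.07741, -0.17555],
              [ 0.0,      0.0,     -20.2   ]])
B = np.array([[0.0], [0.0], [20.2]])
C = np.array([[0.0, 57.2958, 0.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

# Model-based benchmark: P* solves the ARE (2.4), and K* = R^{-1} B^T P* as in (2.3).
P_star = solve_continuous_are(A, B, C.T @ Q @ C, R)
K_star = np.linalg.solve(R, B.T @ P_star)
print(K_star)   # state-feedback gain; the output-feedback kernel is Kbar*(theta) = K_star @ M(theta)
```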

From t = 0 s to t = 10 s, the exploratory/behavior policy is u(t) = 0.2 Σ_{i=1}^{500} sin(w_i t), where the w_i (i = 1, ..., 500) are randomly sampled from the uniform distribution over [−250, 250]. The integration interval is t_{k+1} − t_k = 0.01. For the PI algorithm, the input-output data is collected from t = 0 s to t = 10 s and Algorithm 1 starts at t = 10 s. The algorithm converges after eleven iterations, when the stopping criterion |Θ̂_i^N − Θ̂_{i−1}^N| < 0.01 is satisfied. The relative approximation error ∥K̄̂_i(θ) − K̄*(θ)∥_∞ / ∥K̄*(θ)∥_∞ is plotted in Fig. 1. It is seen that at the eleventh iteration, the approximation error is ∥K̄̂_11(θ) − K̄*(θ)∥_∞ / ∥K̄*(θ)∥_∞ = 3.20%. The state trajectory is shown in Fig. 2. It is observed that the state quickly converges to zero after the controller is updated by Algorithm 1.

[Figure 1: Convergence of K̄̂_i(θ) to the optimal value K̄*(θ) by the learning-based PI algorithm.]

[Figure 2: Trajectory of the state. The markers indicate when data collection ends and when the controller is updated.]

6 Conclusion

This paper has proposed a novel learning-based output-feedback control approach for adaptive optimal control of continuous-time linear systems. The first major contribution of the paper is extending the state reconstruction technique in [23] to continuous-time systems without discretizing the system dynamics. By integrating RL techniques, a learning-based PI algorithm is developed and its convergence is theoretically analyzed. The efficacy of the proposed algorithm is validated by a practical example arising from F-16 flight control. Our future work will be directed at extending the proposed learning-based control methodology to robust ADP and to multi-agent systems with output feedback.

References

[1] F. Abdollahi, H. Talebi, and R. Patel, A stable neural network-based observer with application to flexible-joint manipulators, IEEE Transactions on Neural Networks, 17 (2006), pp. 118–129.
[2] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
[3] T. Bian and Z. P. Jiang, Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design, Automatica, 71 (2016), pp. 348–360.
[4] C. Chen, L. Xie, K. Xie, F. L. Lewis, and S. Xie, Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning, Automatica, 146 (2022), p. 110581.
[5] C.-T. Chen, Linear System Theory and Design, Oxford University Press, New York, NY, 3rd ed., 1999.
[6] L. Cui and Z. P. Jiang, A reinforcement learning look at risk-sensitive linear quadratic Gaussian control, arXiv preprint arXiv:2212.02072, (2022).
[7] L. Cui, K. Ozbay, and Z. P. Jiang, Combined longitudinal and lateral control of autonomous vehicles based on reinforcement learning, in 2021 American Control Conference (ACC), 2021, pp. 1929–1934.
[8] L. Cui, B. Pang, and Z. P. Jiang, Learning-based adaptive optimal control of linear time-delay systems: A policy iteration approach, arXiv preprint arXiv:2210.00204, (2022).
[9] L. Cui, S. Wang, J. Zhang, D. Zhang, J. Lai, Y. Zheng, Z. Zhang, and Z. P. Jiang, Learning-based balance control of wheel-legged robots, IEEE Robotics and Automation Letters, 6 (2021), pp. 7667–7674.

[10] W. Gao, Y. Jiang, Z. P. Jiang, and T. Chai, Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming, Automatica, 72 (2016), pp. 37–45.
[11] W. Gao and Z. P. Jiang, Adaptive dynamic programming and adaptive optimal output regulation of linear systems, IEEE Transactions on Automatic Control, 61 (2016), pp. 4164–4169.
[12] W. Gao and Z. P. Jiang, Adaptive optimal output regulation of time-delay systems via measurement feedback, IEEE Transactions on Neural Networks and Learning Systems, 30 (2019), pp. 938–945.
[13] A. Guyader and Q. Zhang, Adaptive observer for discrete time linear time varying systems, IFAC Proceedings Volumes, 36 (2003), pp. 1705–1710. 13th IFAC Symposium on System Identification, Rotterdam, The Netherlands, 27-29 August, 2003.
[14] R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960.
[15] Y. Jiang and Z. P. Jiang, Approximate dynamic programming for output feedback control, in Proceedings of the 29th Chinese Control Conference, 2010, pp. 5815–5820.
[16] Y. Jiang and Z. P. Jiang, Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, Automatica, 48 (2012), pp. 2699–2704.
[17] Y. Jiang and Z. P. Jiang, Robust Adaptive Dynamic Programming, Wiley-IEEE Press, NJ, USA, 2017.
[18] Z. P. Jiang, T. Bian, and W. Gao, Learning-based control: A tutorial and some recent results, Foundations and Trends in Systems and Control, 8 (2020), pp. 176–284.
[19] Z. P. Jiang, C. Prieur, and A. Astolfi (Editors), Trends in Nonlinear and Adaptive Control: A Tribute to Laurent Praly for His 65th Birthday, Springer Nature, NY, USA, 2021.
[20] B. Kiumarsi, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, Optimal tracking control of unknown discrete-time linear systems using input-output measured data, IEEE Transactions on Cybernetics, 45 (2015), pp. 2770–2779.
[21] D. Kleinman, On an iterative technique for Riccati equation computations, IEEE Transactions on Automatic Control, 13 (1968), pp. 114–115.
[22] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley-IEEE Press, NJ, USA, 2013.
[23] F. L. Lewis and K. G. Vamvoudakis, Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41 (2011), pp. 14–25.
[24] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction, Princeton University Press, NJ, USA, 2012.
[25] T. Liu, L. Cui, B. Pang, and Z. P. Jiang, Data-driven adaptive optimal control of mixed-traffic connected vehicles in a ring road, in 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 77–82.
[26] B. Pang, L. Cui, and Z. P. Jiang, Human motor learning is robust to control-dependent noise, Biological Cybernetics, 116 (2022), pp. 307–325.
[27] B. Pang and Z. P. Jiang, Adaptive optimal control of linear periodic systems: An off-policy value iteration approach, IEEE Transactions on Automatic Control, 66 (2021), pp. 888–894.
[28] M. J. D. Powell, Approximation Theory and Methods, Cambridge University Press, New York, NY, 1981.
[29] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, Wiley Series in Probability and Statistics, Wiley, Hoboken, NJ, USA, 2nd ed., 2011.
[30] S. A. A. Rizvi and Z. Lin, Output feedback adaptive dynamic programming for linear differential zero-sum games, Automatica, 122 (2020), p. 109272.
[31] S. A. A. Rizvi and Z. Lin, Adaptive dynamic programming for model-free global stabilization of control constrained continuous-time systems, IEEE Transactions on Cybernetics, 52 (2022), pp. 1048–1060.
[32] B. L. Stevens, F. L. Lewis, and E. N. Johnson, Aircraft Control and Simulation: Dynamics, Controls Design, and Autonomous Systems, John Wiley & Sons, Hoboken, NJ, 2015.
[33] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, Cambridge, MA, 2018.
[34] G. Tao, Adaptive Control Design and Analysis, John Wiley & Sons, Hoboken, NJ, 2003.
[35] Q. Zhang, Adaptive observer for multiple-input-multiple-output (MIMO) linear time-varying systems, IEEE Transactions on Automatic Control, 47 (2002), pp. 525–529.
[36] F. Zhao, W. Gao, T. Liu, and Z. P. Jiang, Adaptive optimal output regulation of linear discrete-time systems based on event-triggered output-feedback, Automatica, 137 (2022), p. 110103.
[37] L. M. Zhu, H. Modares, G. O. Peen, F. L. Lewis, and B. Yue, Adaptive suboptimal output-feedback control for linear systems using integral reinforcement learning, IEEE Transactions on Control Systems Technology, 23 (2015), pp. 264–273.
[38] K. J. Åström and B. Wittenmark, Adaptive Control, 2nd Edition, Addison-Wesley, MA, USA, 1997.
