H∞ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning

Hamidreza Modares, Frank L. Lewis, and Zhong-Ping Jiang

Abstract— This paper deals with the design of an H∞ tracking controller for nonlinear continuous-time systems with completely unknown dynamics. A general bounded L2-gain tracking problem with a discounted performance function is introduced for the H∞ tracking. A tracking Hamilton–Jacobi–Isaacs (HJI) equation is then developed that gives a Nash equilibrium solution to the associated min–max optimization problem. A rigorous analysis of bounded L2-gain and stability of the control solution obtained by solving the tracking HJI equation is provided. An upper bound is found for the discount factor to assure local asymptotic stability of the tracking error dynamics. An off-policy reinforcement learning algorithm is used to learn the solution to the tracking HJI equation online without requiring any knowledge of the system dynamics. Convergence of the proposed algorithm to the solution to the tracking HJI equation is shown. Simulation examples are provided to verify the effectiveness of the proposed method.

Index Terms— Bounded L2-gain, H∞ tracking controller, reinforcement learning (RL), tracking Hamilton–Jacobi–Isaacs (HJI) equation.

I. INTRODUCTION

THE H∞ optimal control has been extensively used in the effort to attenuate the effect of disturbances on the system performance. The H∞ control theory has mostly concentrated on designing regulators to drive the states of the system to zero in the presence of disturbance [1]–[5]. In practice, however, it is often required to force the states or outputs of the system to track a reference trajectory. Existing solutions to the H∞ tracking problem are composed of two steps [6]–[9]. First, a feedforward control input is designed to guarantee perfect tracking. Second, a feedback control input is designed by solving a Hamilton–Jacobi–Isaacs (HJI) equation to stabilize the tracking error dynamics. These methods are suboptimal, as they ignore the cost of the feedforward control input in the performance function. Moreover, in these methods, the procedures for computing the feedback and feedforward terms are based on offline solution methods that require complete knowledge of the system dynamics.

During the last few years, reinforcement learning (RL) [10]–[13] has been extensively used to solve the optimal H2 [14]–[25] and H∞ [26]–[37] regulation problems, and has been successfully applied to several real-world applications [38]–[43]. Offline iterative RL algorithms [26], [27], online synchronous RL algorithms [28]–[31], and simultaneous RL algorithms [32]–[34] were proposed to approximate the solution to the HJI equation arising in the H∞ regulation problem. These methods require complete knowledge of the system dynamics. Vrabie and Lewis [35] and Li et al. [36] used an integral RL (IRL) algorithm [15], [16] to learn the solution to the HJI equation for systems with unknown dynamics. Although efficient, these methods require the disturbance to be adjustable. However, this is not practical in most systems, because the disturbance is independent and cannot be specified. Luo et al. [37], inspired by [21] and [22], proposed an efficient off-policy RL algorithm to learn the solution to the HJI equation. In the off-policy RL algorithm, the system data, which are used to learn the HJI solution, can be generated with arbitrary policies rather than the evaluating policy. Their method does not require an adjustable disturbance input. However, it requires partial knowledge of the system dynamics.

While significant progress has been achieved by the use of RL algorithms for the design of H∞ optimal controllers, these algorithms are limited to the case of the regulation problem. In practice, however, it is desired to make the system follow a reference trajectory. Therefore, H∞ optimal tracking controllers are required. Although RL algorithms have recently been presented for solving the H2 optimal tracking problem [44]–[50], only Liu et al. [51] proposed an RL solution to the H∞ tracking problem. However, their solution is suboptimal, as the cost of the feedforward control input is ignored in the performance function, and it requires complete knowledge of the system dynamics.

In this paper, an online off-policy RL algorithm is developed to find the solution to the H∞ optimal tracking problem for systems with completely unknown dynamics.
in Definition 1 gives an optimal solution, in contrast to the standard definition that results in a suboptimal solution, as stated in [6].

Remark 2: The performance function on the left-hand side of the disturbance attenuation condition (7) represents a meaningful cost in the sense that it includes a positive penalty on the tracking error and a positive penalty on the control effort. The use of the discount factor is essential. This is because the feedforward part of the control input does not converge to zero in general, and thus penalizing the control input in the performance function without a discount factor makes the performance function unbounded.

Remark 3: Previous work on the H∞ optimal tracking divides the control input into feedback and feedforward parts. First, the feedforward part is obtained separately without considering any optimality criterion. Then, the problem of optimal design of the feedback part is reduced to an H∞ optimal regulation problem. In contrast, in the new formulation, both the feedback and feedforward parts of the control input are obtained simultaneously and optimally as a result of the L2-gain defined with a discount factor in (7).

The control solution to the H∞ tracking problem with the proposed attenuation condition (7) is provided in the subsequent Sections III and IV. We shall see in the subsequent sections that this general disturbance attenuation condition enables us to find both the feedback and feedforward parts of the control input simultaneously, and therefore extends the method of off-policy RL to solve the problem at hand without requiring any knowledge of the system dynamics.

III. HJI EQUATION FOR H∞ TRACKING

In this section, it is first shown that the H∞ tracking problem can be transformed into a min–max optimization problem subject to an augmented system composed of the tracking error dynamics and the command generator dynamics. A tracking HJI equation is then developed, which gives the solution to the min–max optimization problem. The stability and L2-gain boundedness of the tracking HJI control solution are discussed.

A. Tracking HJI Equation

In this section, a tracking HJI equation is formulated, which gives the solution to the H∞ tracking problem stated in Definition 2.

Define the augmented system state

X(t) = [e_d(t)^T  r(t)^T]^T ∈ R^{2n}    (8)

where e_d(t) is the tracking error defined in (3) and r(t) is the reference trajectory.

Putting (2) and (4) together yields the augmented system

Ẋ(t) = F(X(t)) + G(X(t)) u(t) + K(X(t)) d(t)    (9)

where u(t) = u(X(t)) and

F(X) = [ f(e_d + r) − h_d(r) ; h_d(r) ],   G(X) = [ g(e_d + r) ; 0 ],   K(X) = [ k(e_d + r) ; 0 ].    (10)

Using the augmented system (9), the disturbance attenuation condition (7) becomes

∫_t^∞ e^{−α(τ−t)} (X^T Q_T X + u^T R u) dτ ≤ γ² ∫_t^∞ e^{−α(τ−t)} (d^T d) dτ    (11)

where

Q_T = [ Q  0 ; 0  0 ].    (12)

Based on (11), define the performance function

J(u, d) = ∫_t^∞ e^{−α(τ−t)} (X^T Q_T X + u^T R u − γ² d^T d) dτ.    (13)

Remark 4: Note that the problem of finding a control policy that satisfies the bounded L2-gain condition for the tracking problem is equivalent to minimizing the discounted performance function (13) subject to the augmented system (9).

It is well known that the H∞ control problem is closely related to two-player zero-sum differential game theory [5]. In fact, solvability of the H∞ control problem is equivalent to solvability of the following zero-sum game [5]:

V*(X(t)) = J(u*, d*) = min_u max_d J(u, d)    (14)

where J is defined in (13) and V*(X(t)) is defined as the optimal value function. This two-player zero-sum game control problem has a unique solution if a game theoretic saddle point exists, i.e., if the following Nash condition holds:

V*(X(t)) = min_u max_d J(u, d) = max_d min_u J(u, d).    (15)

Note that differentiating (13) and noting that V(X(t)) = J(u(t), d(t)) give the following Bellman equation:

H(V, u, d) = X^T Q_T X + u^T R u − γ² d^T d − αV + V_X^T (F + G u + K d) = 0    (16)

where F = F(X), G = G(X), K = K(X), and V_X = ∂V/∂X. Applying the stationarity conditions ∂H(V*, u, d)/∂u = 0 and ∂H(V*, u, d)/∂d = 0 [52] gives the optimal control and disturbance inputs as

u* = −(1/2) R^{−1} G^T V_X*    (17)

d* = (1/(2γ²)) K^T V_X*    (18)

where V* is the optimal value function defined in (14). Substituting the control input u (17) and the disturbance d (18) into (16), the following tracking HJI equation is obtained:

H(V*, u*, d*) = X^T Q_T X + V_X*^T F − αV* − (1/4) V_X*^T G R^{−1} G^T V_X* + (1/(4γ²)) V_X*^T K K^T V_X* = 0.    (19)

In the following, it is shown that the control solution (17), which is found by solving the HJI equation (19), solves the H∞ tracking problem formulated in Definition 2.
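For linear augmented dynamics Ẋ = AX + Bu + Dd and a quadratic value function V*(X) = X^T P X, the policies (17) and (18) become linear state feedbacks and the HJI equation (19) reduces to an algebraic Riccati-type equation in P. The short Python sketch below illustrates this specialization; it is not from the paper, and the scalar numbers (including the choices γ = 2 and α = 0.1) are illustrative assumptions.

import numpy as np

def hji_policies(P, B, D, R, gamma):
    # Quadratic value V(X) = X' P X gives V_X = 2 P X, so (17) and (18) become
    # u* = -(R^-1 B' P) X and d* = (1/gamma^2) (D' P) X.
    Ku = np.linalg.solve(R, B.T @ P)        # u* = -Ku @ X
    Kd = (D.T @ P) / gamma**2               # d* = +Kd @ X
    return Ku, Kd

def hji_residual(P, A, B, D, QT, R, gamma, alpha):
    # Tracking HJI (19) specialized to linear dynamics and quadratic V.
    return (QT + A.T @ P + P @ A - alpha * P
            - P @ B @ np.linalg.solve(R, B.T) @ P
            + P @ D @ D.T @ P / gamma**2)

# Scalar illustration (assumed numbers): Xdot = X + u + d, Q_T = R = 1, gamma = 2.
A = B = D = QT = R = np.array([[1.0]])
gamma, alpha = 2.0, 0.1
# Here (19) is a quadratic in the scalar P; keep the root giving P >= 0.
c2 = (-B @ np.linalg.solve(R, B.T) + D @ D.T / gamma**2).item()
c1 = (2 * A - alpha * np.eye(1)).item()
p = max(np.roots([c2, c1, QT.item()]).real)
P = np.array([[p]])
print(np.allclose(hji_residual(P, A, B, D, QT, R, gamma, alpha), 0.0))  # True
print(hji_policies(P, B, D, R, gamma))  # feedback gains for u* and d*

For general nonlinear dynamics, V_X* has no closed form, which is what motivates the RL algorithms of Section IV.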
B. Disturbance Attenuation and Stability of the Solution to the Tracking HJI Equation

In this section, first, it is shown that the control solution (17) satisfies the disturbance attenuation condition (11) [part 1) of Definition 2]. Then, the stability of the tracking error dynamics (4) without the disturbance is discussed [part 2) of Definition 2]. It is shown that there exists an upper bound α* such that if the discount factor is smaller than α*, the control solution (17) makes the system locally asymptotically stable.

Theorem 1 (Saddle Point Solution): Consider the H∞ tracking control problem as a two-player zero-sum game problem with the performance function (13). Then, the pair of strategies (u*, d*) defined in (17) and (18) provides a saddle point solution to the game.

Proof: See [26] for the same proof.

Theorem 2 (L2-Gain of the System for the Solution to the HJI Equation): Assume that there exists a continuous positive-semidefinite solution V*(X) to the tracking HJI equation (19). Then, u* in (17) makes the closed-loop system (9) have L2-gain less than or equal to γ.

Proof: The Hamiltonian (16) for the optimal value function V*, and any control policy u and disturbance policy d, becomes

H(V*, u, d) = X^T Q_T X + u^T R u − γ² d^T d − αV* + V_X*^T (F + G u + K d).    (20)

On the other hand, using (17)–(19), one has

H(V*, u, d) = H(V*, u*, d*) + (u − u*)^T R (u − u*) − γ² (d − d*)^T (d − d*).    (21)

Based on the HJI equation (19), we have H(V*, u*, d*) = 0. Therefore, (20) and (21) give

X^T Q_T X + u^T R u − γ² d^T d − αV* + V_X*^T (F + G u + K d) = (u − u*)^T R (u − u*) − γ² (d − d*)^T (d − d*).    (22)

Substituting the optimal control policy u = u* in the above equation yields

X^T Q_T X + u*^T R u* − γ² d^T d − αV* + V_X*^T (F + G u* + K d) = −γ² (d − d*)^T (d − d*) ≤ 0.    (23)

Multiplying both sides of this equation by e^{−αt}, and defining V̇* = V_X*^T (F + G u* + K d) as the derivative of V* along the trajectories of the closed-loop system, gives

(d/dt)(e^{−αt} V*(X)) ≤ e^{−αt} (−X^T Q_T X − u*^T R u* + γ² d^T d).    (24)

Integrating both sides of this equation yields

e^{−αT} V*(X(T)) − V*(X(0)) ≤ ∫_0^T e^{−ατ} (−X^T Q_T X − u*^T R u* + γ² d^T d) dτ    (25)

for every T > 0 and every d ∈ L2[0, ∞). Since V*(·) ≥ 0, the above equation yields

∫_0^T e^{−ατ} (X^T Q_T X + u*^T R u*) dτ ≤ ∫_0^T e^{−ατ} (γ² d^T d) dτ + V*(X(0)).    (26)

This completes the proof.

Theorem 2 solves part 1) of the state-feedback H∞ tracking control problem given in Definition 2. In the following, we consider the problem of stability of the closed-loop system without disturbance, which is part 2) of Definition 2.

Theorem 3 (Stability of the Optimal Solution for α → 0): Suppose that V*(X) is a smooth, positive-semidefinite, and locally quadratic solution to the tracking HJI equation. Then, the control input given by (17) makes the error dynamics (4) with d = 0 asymptotically stable in the limit as the discount factor goes to zero.

Proof: Differentiating V* along the trajectories of the closed-loop system with d = 0, and using the tracking HJI equation, gives

V_X*^T (F + G u*) = αV* − X^T Q_T X − u*^T R u* − γ² d*^T d*    (27)

or equivalently

(d/dt)(e^{−αt} V*(X)) = e^{−αt} (−X^T Q_T X − u*^T R u* − γ² d*^T d*) ≤ 0.    (28)

If the discount factor goes to zero, then LaSalle's extension can be used to show that the tracking error is locally asymptotically stable. More specifically, if α → 0, based on LaSalle's extension, X(t) = [e_d(t)^T r(t)^T]^T goes to a region wherein V̇ = 0. Since X^T Q_T X = e_d(t)^T Q e_d(t), where Q is positive definite, V̇ = 0 only if e_d(t) = 0 and u = 0 when d = 0. On the other hand, u = 0 also requires that e_d(t) = 0; therefore, as α → 0, the tracking error is locally asymptotically stable.

Theorem 3 shows that if the discount factor goes to zero, then the optimal control solution found by solving the tracking HJI equation makes the system locally asymptotically stable. However, if the discount factor is nonzero, the local asymptotic stability of the optimal control solution cannot be guaranteed by Theorem 3. In Theorem 4, it is shown that the local asymptotic stability of the optimal solution is guaranteed as long as the discount factor is smaller than an upper bound. Before presenting the proof of local asymptotic stability, the following example shows that if the discount factor is not small, the control solution obtained by solving the tracking HJI equation can make the system unstable.

Example 1: Consider the scalar dynamical system

Ẋ = X + u + d.    (29)

Assume that in the HJI equation (19), we have Q_T = R = 1 and the attenuation level is γ = 2. For this linear system with quadratic performance, the value function is quadratic. That is, V(X) = pX², and therefore the HJI equation reduces to

(2 − α) p − (3/4) p² + 1 = 0    (30)
and the optimal control solution becomes

u = −pX.    (31)

Solving this equation gives the optimal solution as

u = −( (4/3)(1 − 0.5α) + (2/3) √( 4(1 − 0.5α)² + 3 ) ) X.    (32)

However, this optimal solution does not make the system stable for all values of the discount factor α. In fact, if α > α* = 27/12, then the system is unstable. The next theorem shows how to find an upper bound α* for the discount factor to assure the stability of the system without disturbance.

Before presenting the stability theorem, note that the augmented system dynamics (9) can be written as

μ = e^{αt} ρ.    (40)

Based on (40), define the modified Hamiltonian function as

H^m = e^{−αt} H = (X^T Q_T X + u*^T R u* − γ² d*^T d*) + μ^T (F + G u* + K d*).    (41)

Then, conditions (38) and (39) become

Ẋ = H^m_μ(X, μ)    (42)
μ̇ = α μ − H^m_X(X, μ).    (43)

Equation (42) gives the augmented system dynamics (9), and (43) is equivalent to the HJI equation (19) with μ = V_X*. In order to prove the local stability of the closed-loop system, the stability of the closed-loop linearized system is investigated. Using (33) for the system dynamics, (41) becomes

H^m = (X^T Q_T X + u*^T R u* − γ² d*^T d*) + μ^T (A X + B u* + D d* + F̄(X)).    (44)

Then, the costate can be written as the sum of a linear and a nonlinear term as

μ = 2P X + φ₀(X) ≡ μ₁ + φ₀(X).    (45)

Using ∂H^m/∂u = 0, ∂H^m/∂d = 0 and (45), one has, for some nonlinear function F_f(X) with F_f = [F_f1^T, F_f2^T]^T, the following tracking error dynamics:

ė_d = (A_l1 − B_l R^{−1} B_l^T P_11) e_d + F_f1 = A_c e_d + F_f1.    (52)

The GARE (50) based on the closed-loop error dynamics A_c becomes

Q + A_c^T P_11 + P_11 A_c − α P_11 + P_11 B_l R^{−1} B_l^T P_11 + (1/γ²) P_11 D_l D_l^T P_11 = 0.    (53)
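The dependence of the closed-loop pole on the discount factor in Example 1 can be checked numerically with a few lines (an illustrative sketch, not part of the paper): the root of (30) gives the gain p in (31), and with d = 0 the closed-loop pole of (29) is 1 − p, which crosses into the right half-plane exactly at α* = 27/12.

import numpy as np

def p_of_alpha(alpha):
    # Nonnegative root of (30): (2 - alpha) p - (3/4) p^2 + 1 = 0.
    return max(np.roots([-0.75, 2.0 - alpha, 1.0]).real)

# Closed-loop pole of (29) with d = 0 and u = -p X from (31): Xdot = (1 - p) X.
for alpha in (0.1, 1.0, 2.0, 27.0 / 12.0, 2.4):
    print(f"alpha = {alpha:5.3f}   closed-loop pole = {1.0 - p_of_alpha(alpha):+.4f}")
# The pole reaches zero at alpha = 27/12 = 2.25 and is unstable beyond it.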
To find a condition on the discount factor to assure stability of the linearized error dynamics, assume that λ is an eigenvalue of the closed-loop error dynamics A_c. That is, A_c x = λx with x the eigenvector corresponding to λ. Then, multiplying the left- and right-hand sides of the GARE (53) by x^T and x, respectively, one has

2 (Re(λ) − 0.5α) x^T P_11 x = −x^T Q x − x^T P_11 (B_l R^{−1} B_l^T + (1/γ²) D_l D_l^T) P_11 x.    (54)

Using the inequality a² + b² ≥ 2ab and since P_11 > 0, (54) becomes

Re(λ) − 0.5α ≤ −‖Q P_11^{−1}‖^{1/2} ‖L_l P_11‖^{1/2}    (55)

or equivalently

Re(λ) ≤ −‖Q P_11^{−1}‖^{1/2} ‖L_l P_11‖^{1/2} + 0.5α    (56)

where L_l is defined in (35). Using the fact that ‖A‖ ‖B‖ ≥ ‖AB‖ gives

Re(λ) ≤ −‖L_l Q‖^{1/2} + 0.5α.    (57)

Therefore, the linear error dynamics in (52) is stable if condition (36) is satisfied, and this completes the proof.

Remark 5: Note that the GARE (49) can be written as

Q_T + (A − 0.5α I)^T P + P (A − 0.5α I) − P B R^{−1} B^T P + (1/γ²) P D D^T P = 0.

This amounts to a GARE without the discount factor and with the system dynamics given by A − 0.5α I, B, and D. Therefore, existence of a unique solution to the GARE (49) requires that (A − 0.5α I, B) be stabilizable. Based on the definition of A and B in (34), this requires that (A_l1 − 0.5α I, B_l) be stabilizable and that (A_l2 − 0.5α I) be stable. Since (A_l1, B_l) is stabilizable, as the system dynamics in (1) are assumed stabilizable, (A_l1 − 0.5α I, B_l) is also stabilizable for any α > 0. Moreover, since the reference trajectory is assumed bounded, the linearized model of the command generator dynamics in (2), i.e., A_l2, is marginally stable, and thus (A_l2 − 0.5α I) is stable. Therefore, the discount factor does not affect the existence of the solution to the GARE.

Remark 6: Theorem 4 shows that the asymptotic stability of only the first n variables of X is guaranteed, which are the tracking error states. This is reasonable, as the last n variables of X are the reference command generator variables, which are not under our control.

Remark 7: For Example 1, condition (35) gives the bound α < √80/12 to assure stability. This bound is very close to the actual bound obtained in Example 1. However, it is obvious that condition (35) gives a conservative bound for the discount factor to assure stability.

Remark 8: Theorem 4 confirms the existence of an upper bound for the discount factor to assure the stability of the solution to the tracking HJI equation, and relates this bound to the input and disturbance dynamics and the weighting matrices in the performance function. Condition (36) is not a restrictive condition even if the system dynamics are unknown. In fact, one can always pick a very small discount factor and/or a large weighting matrix Q (which is a design matrix) to assure that condition (36) is satisfied.

IV. OFF-POLICY IRL FOR LEARNING THE TRACKING HJI EQUATION

In this section, an offline RL algorithm is first given to solve the problem of H∞ optimal tracking by learning the solution to the tracking HJI equation. An off-policy IRL algorithm is then developed to learn the solution to the HJI equation online and without requiring any knowledge of the system dynamics. Three neural networks (NNs) in an actor–critic–disturbance structure are used to implement the proposed off-policy IRL algorithm.

A. Off-Policy Reinforcement Learning Algorithm

The Bellman equation (16) is linear in the cost function V, while the HJI equation (19) is nonlinear in the value function V*. Therefore, solving the Bellman equation for V is easier than solving the HJI equation for V*. Instead of directly solving for V*, a policy iteration (PI) algorithm iterates on both the control and disturbance players to break the HJI equation into a sequence of differential equations linear in the cost. An offline PI algorithm for solving the H∞ optimal tracking problem is given in Algorithm 1.

Algorithm 1 Offline RL Algorithm
Initialization: Start with an admissible stabilizing control policy u^0.
1) For a control input u_i and a disturbance policy d_i, find V_i using the following Bellman equation:
   H(V_i, u_i, d_i) = X^T Q_T X + V_Xi^T (F + G u_i + K d_i) − αV_i + u_i^T R u_i − γ² d_i^T d_i = 0.    (58)
2) Update the disturbance using
   d_{i+1} = arg max_d [H(V_i, u_i, d)] = (1/(2γ²)) K^T V_Xi    (59)
   and the control policy using
   u_{i+1} = arg min_u [H(V_i, u, d)] = −(1/2) R^{−1} G^T V_Xi.    (60)
3) Go to 1.

Algorithm 1 extends the results of the simultaneous RL algorithm in [33] to the tracking problem. The convergence of this algorithm to the minimal nonnegative solution of the HJI equation was shown in [33]. In fact, similar to [33], the convergence of Algorithm 1 can be established by proving that iteration on (58) is essentially Newton's iterative sequence, which converges to the unique solution of the HJI equation (19).

Algorithm 1 requires complete knowledge of the system dynamics. In the following, the off-policy IRL algorithm, which was presented in [21] and [22] for solving the H2 optimal regulation problem, is extended here to solve the H∞ optimal tracking problem for systems with completely unknown dynamics.
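As a concrete illustration of Algorithm 1 (a sketch under the assumption of linear dynamics and a quadratic value V_i(X) = X^T P_i X, not the paper's implementation), the Bellman equation (58) becomes a Lyapunov equation in P_i, and the updates (59) and (60) become gain updates:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def offline_pi(A, B, D, QT, R, gamma, alpha, Ku0, iters=30):
    """Algorithm 1 for linear dynamics and quadratic V_i(X) = X' P_i X (a sketch)."""
    n = A.shape[0]
    Ku = Ku0.copy()                  # u_0 = -Ku0 @ X must be admissible (stabilizing)
    Kd = np.zeros((D.shape[1], n))   # d_0 = 0
    for _ in range(iters):
        # Bellman equation (58): the -alpha*V_i term is absorbed by shifting A by -alpha/2.
        Acl = A - B @ Ku + D @ Kd - 0.5 * alpha * np.eye(n)
        M = QT + Ku.T @ R @ Ku - gamma**2 * Kd.T @ Kd
        P = solve_continuous_lyapunov(Acl.T, -M)   # solves Acl' P + P Acl = -M
        Ku = np.linalg.solve(R, B.T @ P)           # (60): u_{i+1} = -R^{-1} B' P X
        Kd = (D.T @ P) / gamma**2                  # (59): d_{i+1} = (1/gamma^2) D' P X
    return P, Ku, Kd

# Scalar sanity check against Example 1: xdot = x + u + d, Q_T = R = 1, gamma = 2.
A, B, D = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])
P, Ku, Kd = offline_pi(A, B, D, np.eye(1), np.eye(1), gamma=2.0, alpha=0.1,
                       Ku0=np.array([[2.0]]))
print(P, Ku, Kd)   # P is the positive root of (30) for alpha = 0.1 (about 2.98)

The scalar check recovers the positive root of (30) from Example 1; note that the initial gain Ku0 must correspond to an admissible (stabilizing) policy u^0, as required by the initialization step of Algorithm 1.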
To this end, the system dynamics (9) is first written as

Ẋ = F + G u_i + K d_i + G (u − u_i) + K (d − d_i)    (61)

where u_i = [u_{i,1}, ..., u_{i,m}] ∈ R^m and d_i = [d_{i,1}, ..., d_{i,q}] ∈ R^q are the policies to be updated. Differentiating V_i(X) along the system dynamics (61) and using (58)–(60) give

V̇_i = V_Xi^T (F + G u_i + K d_i) + V_Xi^T G (u − u_i) + V_Xi^T K (d − d_i)
    = αV_i − X^T Q_T X − u_i^T R u_i + γ² d_i^T d_i − 2 u_{i+1}^T R (u − u_i) + 2γ² d_{i+1}^T (d − d_i).    (62)

Multiplying both sides of (62) by e^{−α(τ−t)} and integrating both sides yield the following off-policy IRL Bellman equation.

One has

lim_{T→0} (1/T) ∫_t^{t+T} e^{−α(τ−t)} (X^T Q_T X + u_i^T R u_i − γ² d_i^T d_i) dτ = X^T Q_T X + u_i^T R u_i − γ² d_i^T d_i    (66)

lim_{T→0} (1/T) ∫_t^{t+T} e^{−α(τ−t)} (2 u_{i+1}^T R (u − u_i) − 2γ² d_{i+1}^T (d − d_i)) dτ = 2 u_{i+1}^T R (u − u_i) − 2γ² d_{i+1}^T (d − d_i).    (67)

Substituting (65)–(67) in (64) yields

−αV_i + V_Xi^T (F + G u_i + K d_i + G (u − u_i) + K (d − d_i)) + X^T Q_T X + u_i^T R u_i − γ² d_i^T d_i + 2 u_{i+1}^T R (u − u_i) − 2γ² d_{i+1}^T (d − d_i) = 0.    (68)
Algorithm 2 Online Off-Policy RL Algorithm for Solving the Tracking HJI Equation
Phase 1 (data collection using a fixed control policy): Apply a fixed control policy u to the system and collect the required system information about the state, control input, and disturbance at N different sampling intervals of length T.
Phase 2 (reuse of the collected data sequentially to find an optimal policy iteratively): Given u_i and d_i, use the information collected in Phase 1 to solve the following Bellman equation for V_i, u_{i+1}, and d_{i+1} simultaneously:

e^{−αT} V_i(X(t + T)) − V_i(X(t)) = ∫_t^{t+T} e^{−α(τ−t)} (−X^T Q_T X − u_i^T R u_i + γ² d_i^T d_i) dτ + ∫_t^{t+T} e^{−α(τ−t)} (−2 u_{i+1}^T R (u − u_i) + 2γ² d_{i+1}^T (d − d_i)) dτ.    (69)

Stop if a stopping criterion is met; otherwise, set i = i + 1 and go to Phase 2.

solutions given by (17) and (18), where the value function satisfies the tracking HJI equation (19).

Proof: It was shown in Lemma 1 that the off-policy tracking Bellman equation (69) gives the same value function as the Bellman equation (58) and the same updated policies as (59) and (60). Therefore, both Algorithms 1 and 2 have the same convergence properties. Convergence of Algorithm 1 is proved in [33]. This confirms that Algorithm 2 converges to the optimal solution.

Remark 11: Although both Algorithms 1 and 2 have the same convergence properties, Algorithm 2 is a model-free algorithm, which finds an optimal control policy without requiring any knowledge of the system dynamics. This is in contrast to Algorithm 1, which requires full knowledge of the system dynamics. Moreover, Algorithm 1 is an on-policy RL algorithm, which requires the disturbance input to be specified and adjustable. On the other hand, Algorithm 2 is an off-policy RL algorithm, which obviates this requirement.

B. Implementing Algorithm 2 Using Neural Networks

In order to implement the off-policy RL Algorithm 2, it is required to reuse the collected information found by applying a fixed control policy u to the system to solve (69) for V_i, u_{i+1}, and d_{i+1} iteratively. Three NNs, i.e., the actor NN, the critic NN, and the disturber NN, are used here to approximate the value function and the updated control and disturbance policies in the Bellman equation (69). That is, the solution V_i, u_{i+1}, and d_{i+1} of the Bellman equation (69) is approximated by three NNs as

V̂_i(X) = Ŵ_1^T σ(X)    (70)
û_{i+1}(X) = Ŵ_2^T φ(X)    (71)
d̂_{i+1}(X) = Ŵ_3^T ϕ(X)    (72)

where σ = [σ_1, ..., σ_l1] ∈ R^{l1}, φ = [φ_1, ..., φ_l2] ∈ R^{l2}, and ϕ = [ϕ_1, ..., ϕ_l3] ∈ R^{l3} provide suitable basis function vectors, Ŵ_1 ∈ R^{l1}, Ŵ_2 ∈ R^{l2×m}, and Ŵ_3 ∈ R^{l3×q} are constant weight matrices, and l1, l2, and l3 are the numbers of neurons. Define v^1 = [v_1^1, ..., v_m^1]^T = u − u_i, v^2 = [v_1^2, ..., v_q^2]^T = d − d_i, and assume R = diag(r_1, ..., r_m). Then, substituting (70)–(72) in (69) yields

e(t) = Ŵ_1^T (e^{−αT} σ(X(t + T)) − σ(X(t)))
       − ∫_t^{t+T} e^{−α(τ−t)} (−X^T Q_T X − u_i^T R u_i + γ² d_i^T d_i) dτ
       + 2 Σ_{l=1}^m r_l ∫_t^{t+T} e^{−α(τ−t)} Ŵ_{2,l}^T φ(X(τ)) v_l^1 dτ
       − 2γ² Σ_{k=1}^q ∫_t^{t+T} e^{−α(τ−t)} Ŵ_{3,k}^T ϕ(X(τ)) v_k^2 dτ    (73)

where e(t) is the Bellman approximation error, Ŵ_{2,l} is the lth column of Ŵ_2, and Ŵ_{3,k} is the kth column of Ŵ_3. The Bellman approximation error is the continuous-time counterpart of the temporal difference (TD) error [10]. In order to bring the TD error to its minimum value, a least-squares method is used. To this end, rewrite (73) as

y(t) + e(t) = Ŵ^T h(t)    (74)

where

Ŵ = [Ŵ_1^T, Ŵ_{2,1}^T, ..., Ŵ_{2,m}^T, Ŵ_{3,1}^T, ..., Ŵ_{3,q}^T]^T ∈ R^{l1 + m·l2 + q·l3}    (75)

h(t) = [ e^{−αT} σ(X(t + T)) − σ(X(t)) ;
         2r_1 ∫_t^{t+T} e^{−α(τ−t)} φ(X(τ)) v_1^1 dτ ;
         ... ;
         2r_m ∫_t^{t+T} e^{−α(τ−t)} φ(X(τ)) v_m^1 dτ ;
         −2γ² ∫_t^{t+T} e^{−α(τ−t)} ϕ(X(τ)) v_1^2 dτ ;
         ... ;
         −2γ² ∫_t^{t+T} e^{−α(τ−t)} ϕ(X(τ)) v_q^2 dτ ]    (76)

y(t) = ∫_t^{t+T} e^{−α(τ−t)} (−X^T Q_T X − u_i^T R u_i + γ² d_i^T d_i) dτ.    (77)

The parameter vector Ŵ, which gives the approximated value function, actor, and disturbance (70)–(72), is found by minimizing, in the least-squares sense, the Bellman error (74). Assume that the system state, input, and disturbance information are collected at N ≥ l1 + m·l2 + q·l3 (the number of independent elements in Ŵ) points t_1 to t_N in the state space, over the same time interval T as in Phase 1. Then, for given u_i and d_i, one can use this information to evaluate (76) and (77) at the N points to form

H = [h(t_1), ..., h(t_N)]    (78)
Y = [y(t_1), ..., y(t_N)]^T.    (79)

The least-squares solution to (74) is then equal to

Ŵ = (H H^T)^{−1} H Y    (80)

which gives V_i, u_{i+1}, and d_{i+1}.
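The batch least-squares step (74)–(80) can be realized compactly as sketched below (an illustration, not the authors' code; the function and variable names are assumptions). Given the per-sample basis evaluations and the discounted integrals appearing in (76) and (77), the feature matrix H and target vector Y are stacked and Ŵ follows from the normal equations (80).

import numpy as np

def solve_off_policy_ls(sigma_t, sigma_tT, int_phi_v1, int_phi_v2, y, alpha, T,
                        r_diag, gamma):
    """Batch least-squares step of Algorithm 2, eqs. (74)-(80) (illustrative sketch).

    Per data point t_k (k = 1..N):
      sigma_t, sigma_tT : critic basis sigma(X) at t_k and t_k + T, shape (N, l1)
      int_phi_v1[l]     : integral of e^{-alpha(tau-t)} phi(X) v_l^1 dtau, shape (N, l2)
      int_phi_v2[k]     : integral of e^{-alpha(tau-t)} varphi(X) v_k^2 dtau, shape (N, l3)
      y                 : the target integral (77), shape (N,)
    Returns the stacked weight vector W_hat of (75).
    """
    cols = [np.exp(-alpha * T) * sigma_tT - sigma_t]               # first block of (76)
    cols += [2.0 * r_l * F for r_l, F in zip(r_diag, int_phi_v1)]  # actor blocks
    cols += [-2.0 * gamma**2 * F for F in int_phi_v2]              # disturber blocks
    H = np.hstack(cols).T                       # (78): h(t_k) are the columns of H
    Y = np.asarray(y).reshape(-1, 1)            # (79)
    return np.linalg.solve(H @ H.T, H @ Y).ravel()   # (80): W = (H H')^{-1} H Y

# Shape check with synthetic data: N = 20, l1 = 3, l2 = l3 = 2, m = q = 1.
rng = np.random.default_rng(0)
W = solve_off_policy_ls(rng.normal(size=(20, 3)), rng.normal(size=(20, 3)),
                        [rng.normal(size=(20, 2))], [rng.normal(size=(20, 2))],
                        rng.normal(size=20), alpha=0.1, T=0.05, r_diag=[1.0], gamma=10.0)
print(W.shape)   # (7,) = l1 + m*l2 + q*l3

In practice, the per-sample integrals are accumulated with a quadrature rule along the measured trajectory, and np.linalg.lstsq(H.T, Y) is a numerically safer equivalent of forming (H H^T)^{-1} H Y.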
V. SIMULATION RESULTS

In this section, the proposed off-policy IRL method is first applied to a linear system to show that it converges to the optimal solution. Then, it is tested on a nonlinear system.

A. Linear System: F-16 Aircraft System

Consider the F-16 aircraft system described by ẋ = Ax + Bu + Dd with the following dynamics:

A = [ −1.01887   0.90506  −0.00215
       0.82225  −1.07741  −0.17555
       0          0        −1      ]

B = [ 0 ; 0 ; 5 ],   D = [ 1 ; 0 ; 0 ].    (81)

The system state vector is x = [x_1  x_2  x_3] = [α  q  δ_e], where α denotes the angle of attack, q is the pitch rate, and δ_e is the elevator deflection angle. The control input is the elevator actuator voltage, and the disturbance is a wind gust on the angle of attack. It is assumed that the output is y = α, and the desired value is constant. Thus, the command generator dynamics become ṙ = 0. Therefore, the augmented dynamics (9) become

Ẋ = [ −1.01887   0.90506  −0.00215  −1.01887   0.90506  −0.00215
       0.82225  −1.07741  −0.17555   0.82225  −1.07741  −0.17555
       0          0        −1         0          0        −1
       0          0         0         0          0         0
       0          0         0         0          0         0
       0          0         0         0          0         0      ] X
    + [ 0 ; 0 ; 5 ; 0 ; 0 ; 0 ] u + [ 1 ; 0 ; 0 ; 0 ; 0 ; 0 ] d.    (82)

Since only e_1 = x_1 − r_1 is of concern as the tracking error, the first element of the matrix Q_T in (12) is set to 20, and all other elements are zero. It is also assumed here that R = 1 and γ = 10. The offline solution to the GARE (49) and, consequently, the optimal control policy (46) are given by

P* = [ 12.677   5.420  −0.432  −7.474   5.420  −0.432
        5.420   3.405  −0.332  −4.980   3.405  −0.332
       −0.432  −0.332   0.040   0.544  −0.332   0.040
       −7.474  −4.980   0.544  201.451 −4.980   0.544
        5.420   3.405  −0.332  −4.980   3.405  −0.332
       −0.432  −0.332  −0.205   0.040  −0.332   0.040 ]

u* = −[−2.1620, −1.6623, 0.2005, 2.7198, −1.6623, 0.2005] X.    (83)

We now implement the off-policy IRL Algorithm 2. The reinforcement interval is chosen as T = 0.05. The initial control gain is chosen as zero. Figs. 2 and 3 show the convergence of the kernel matrix P and the control gain to their optimal values.

Fig. 2. Convergence of the kernel matrix P to its optimal value for the F-16 example.

Fig. 3. Convergence of the control gain to its optimal value for the F-16 example.

In fact, P converges to

P = [ 12.675   5.418  −0.432  −7.481   5.424  −0.439
       5.420   3.412  −0.330  −4.985   3.404  −0.329
      −0.427  −0.323   0.042   0.546  −0.333   0.046
      −7.495  −4.973   0.545  201.408 −4.985   0.527
       5.419   3.406  −0.328  −4.968   3.405  −0.339
      −0.421  −0.347  −0.201   0.036  −0.333   0.046 ]

which is very close to its optimal value given in (83). These results and Figs. 2 and 3 confirm that the proposed method converges to the optimal tracking solution without requiring knowledge of the system dynamics. The optimal control solution found in (83) is now applied to the system to test its performance. To this end, it is assumed that the desired value for the output is r_1 = 2 for 0–30 s, and changes to r_1 = 3 at 30 s.
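The setup of this example can be assembled as in the sketch below (an illustration under the linear-quadratic specialization used earlier; the discount factor is not fixed here and would have to be chosen).

import numpy as np

# Assemble the augmented F-16 tracking model (82) and the weights of this example.
A = np.array([[-1.01887,  0.90506, -0.00215],
              [ 0.82225, -1.07741, -0.17555],
              [ 0.0,      0.0,     -1.0    ]])
B = np.array([[0.0], [0.0], [5.0]])
D = np.array([[1.0], [0.0], [0.0]])
n = A.shape[0]

# X = [e_d; r] with command generator r_dot = 0, so e_d_dot = A e_d + A r + B u + D d.
A_aug = np.block([[A,                A               ],
                  [np.zeros((n, n)), np.zeros((n, n))]])
B_aug = np.vstack([B, np.zeros((n, 1))])
D_aug = np.vstack([D, np.zeros((n, 1))])

# Q_T of (12): only e_1 = x_1 - r_1 is penalized, with weight 20; R = 1, gamma = 10.
QT = np.zeros((2 * n, 2 * n))
QT[0, 0] = 20.0
R, gamma = np.array([[1.0]]), 10.0
print(A_aug.shape, B_aug.shape, D_aug.shape)   # (6, 6) (6, 1) (6, 1)

With a quadratic value function, the gain in (83) is −R^{−1} B_aug^T P*, i.e., −5 times the third row of P*, which matches the numbers shown above.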
The disturbance is assumed to be d = 0.1 e^{−0.1t} sin(0.1t). Fig. 4 shows how the output converges to its desired values after the control solution (83) is applied to the system, and confirms that the proposed optimal control solution achieves suitable results.

Fig. 4. Reference trajectory versus output for F-16 systems using the proposed control method.

Fig. 5. Reference trajectory versus the first state of the robot manipulator systems during and after learning.

Fig. 6. Reference trajectory versus the second state of the robot manipulator systems during and after learning.

Fig. 7. Reference trajectory versus the third state of the robot manipulator systems during and after learning.
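The closed-loop test can be reproduced approximately with the short simulation below (a sketch, not the authors' code; it assumes forward-Euler integration and that the constant reference vector is r = [r_1, 0, 0], which is not spelled out above).

import numpy as np

A = np.array([[-1.01887,  0.90506, -0.00215],
              [ 0.82225, -1.07741, -0.17555],
              [ 0.0,      0.0,     -1.0    ]])
B = np.array([0.0, 0.0, 5.0])
Dw = np.array([1.0, 0.0, 0.0])
K = np.array([-2.1620, -1.6623, 0.2005, 2.7198, -1.6623, 0.2005])   # from (83), u = -K @ X

dt, t_end = 1e-3, 60.0
x = np.zeros(3)
for k in range(int(t_end / dt)):
    t = k * dt
    r = np.array([2.0 if t < 30.0 else 3.0, 0.0, 0.0])   # output reference step at 30 s
    d = 0.1 * np.exp(-0.1 * t) * np.sin(0.1 * t)         # disturbance of this example
    X = np.concatenate([x - r, r])                       # augmented state (8)
    u = -K @ X
    x = x + dt * (A @ x + B * u + Dw * d)                # forward-Euler step
print(x[0])   # the output x_1 settles near the commanded value 3, as in Fig. 4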
[29] K. G. Vamvoudakis and F. L. Lewis, "Online gaming: Real time solution of nonlinear two-player zero-sum games using synchronous policy iteration," in Advances in Reinforcement Learning, A. Mellouk, Ed. Delhi, India: Intech, 2011.
[30] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, "Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems," Int. J. Adapt. Control Signal Process., vol. 28, nos. 3–5, pp. 232–254, 2014.
[31] H. Zhang, C. Qin, B. Jiang, and Y. Luo, "Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706–2718, Dec. 2014.
[32] H.-N. Wu and B. Luo, "Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control," Inf. Sci., vol. 222, pp. 472–485, Feb. 2013.
[33] H.-N. Wu and B. Luo, "Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 12, pp. 1884–1895, Dec. 2012.
[34] B. Luo and H.-N. Wu, "Computationally efficient simultaneous policy update algorithm for nonlinear H∞ state feedback control with Galerkin's method," Int. J. Robust Nonlinear Control, vol. 23, no. 9, pp. 991–1012, 2013.
[35] D. Vrabie and F. L. Lewis, "Adaptive dynamic programming for online solution of a zero-sum differential game," J. Control Theory Appl., vol. 9, no. 3, pp. 353–360, 2011.
[36] H. Li, D. Liu, and D. Wang, "Integral reinforcement learning for linear continuous-time zero-sum games with completely unknown dynamics," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 3, pp. 706–714, Jul. 2014.
[37] B. Luo, H.-N. Wu, and T. Huang, "Off-policy reinforcement learning for H∞ control design," IEEE Trans. Cybern., vol. 45, no. 1, pp. 65–76, Jan. 2015.
[38] Q. Wei, D. Liu, G. Shi, and Y. Liu, "Multibattery optimal coordination control for home energy management systems via distributed iterative adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 62, no. 7, pp. 4203–4214, Jul. 2015.
[39] S. Jagannathan and G. Galan, "Adaptive critic neural network-based object grasping control using a three-finger gripper," IEEE Trans. Neural Netw., vol. 15, no. 2, pp. 395–407, Mar. 2004.
[40] Q. Wei, D. Liu, and G. Shi, "A novel dual iterative Q-learning method for optimal battery management in smart residential environments," IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2509–2518, Apr. 2015.
[41] Q. Wei and D. Liu, "Data-driven neuro-optimal temperature control of water–gas shift reaction using stable iterative adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 61, no. 11, pp. 6399–6408, Nov. 2014.
[42] H. Modares, I. Ranatunga, F. L. Lewis, and D. O. Popa, "Optimized assistive human–robot interaction using reinforcement learning," IEEE Trans. Cybern., to be published.
[43] Q. Wei and D. Liu, "Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1020–1036, Oct. 2014.
[44] B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M.-B. Naghibi-Sistani, "Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics," Automatica, vol. 50, no. 4, pp. 1167–1175, 2014.
[45] H. Modares and F. L. Lewis, "Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning," IEEE Trans. Autom. Control, vol. 59, no. 11, pp. 3051–3065, Nov. 2014.
[46] R. Kamalapurkar, H. Dinh, S. Bhasin, and W. E. Dixon, "Approximate optimal trajectory tracking for continuous-time nonlinear systems," Automatica, vol. 51, pp. 40–48, Jan. 2015.
[47] H. Modares and F. L. Lewis, "Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning," Automatica, vol. 50, no. 7, pp. 1780–1792, 2014.
[48] H. Zhang, R. Song, Q. Wei, and T. Zhang, "Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 1851–1862, Dec. 2011.
[49] B. Kiumarsi and F. L. Lewis, "Actor–critic-based optimal tracking for partially unknown nonlinear discrete-time systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 1, pp. 140–151, Jan. 2015.
[50] C. Qin, H. Zhang, and Y. Luo, "Online optimal tracking control of continuous-time linear systems with unknown dynamics by using adaptive dynamic programming," Int. J. Control, vol. 87, no. 5, pp. 1000–1009, 2014.
[51] D. Liu, Y. Huang, and Q. Wei, "Neural network H∞ tracking control of nonlinear systems using GHJI method," in Advances in Neural Networks, C. Guo, Z.-G. Huo, and Z. Zeng, Eds. Dalian, China: Springer-Verlag, 2013.
[52] F. L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, 3rd ed. New York, NY, USA: Wiley, 2012.
[53] F. L. Lewis, D. M. Dawson, and C. T. Abdallah, Robot Manipulator Control: Theory and Practice, 2nd ed. New York, NY, USA: CRC Press, 2003.

Hamidreza Modares received the B.S. degree from the University of Tehran, Tehran, Iran, in 2004, the M.S. degree from the Shahrood University of Technology, Shahrood, Iran, in 2006, and the Ph.D. degree from The University of Texas at Arlington, Arlington, TX, USA, in 2015.
He was a Senior Lecturer with the Shahrood University of Technology from 2006 to 2009. His current research interests include cyber-physical systems, reinforcement learning, distributed control, robotics, and pattern recognition.

Frank L. Lewis (S'70–M'81–SM'86–F'94) received the bachelor's degree in physics/electrical engineering and the M.S. degree in electrical engineering from Rice University, Houston, TX, USA, the M.S. degree in aeronautical engineering from the University of West Florida, Pensacola, FL, USA, and the Ph.D. degree from the Georgia Institute of Technology, Atlanta, GA, USA.
He is currently a U.K. Chartered Engineer, an IEEE Control Systems Society Distinguished Lecturer, a University of Texas at Arlington Distinguished Scholar Professor, a UTA Distinguished Teaching Professor, and the Moncrief-O'Donnell Chair with the University of Texas at Arlington Research Institute, Fort Worth, TX, USA. He is a Qian Ren Thousand Talents Consulting Professor with Northeastern University, Shenyang, China. He is involved in feedback control, reinforcement learning, intelligent systems, and distributed control systems. He has authored six U.S. patents, 301 journal papers, 396 conference papers, 20 books, 44 chapters, and 11 journal special issues.
Dr. Lewis is a member of the National Academy of Inventors. He is a fellow of the International Federation of Automatic Control and the U.K. Institute of Measurement and Control, and a Professional Engineer in Texas. He was a Founding Member of the Board of Governors of the Mediterranean Control Association. He received the Fulbright Research Award, the NSF Research Initiation Grant, the ASEE Terman Award, the International Neural Network Society Gabor Award in 2009, and the U.K. Institute of Measurement and Control Honeywell Field Engineering Medal in 2009. He was a recipient of the IEEE Computational Intelligence Society Neural Networks Pioneer Award in 2012, the Distinguished Foreign Scholar Award from the Nanjing University of Science and Technology, the 111 Project Professorship at Northeastern University, China, the Outstanding Service Award from the Dallas IEEE Section, and Engineer of the Year from the Fort Worth IEEE Section. He was listed in the Fort Worth Business Press Top 200 Leaders in Manufacturing. He was also a recipient of the 2010 IEEE Region Five Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean's Excellence in Doctoral Mentoring Award, was elected to the UTA Academy of Distinguished Teachers in 2012, and received the Texas Regents Outstanding Teaching Award in 2013. He served on the NAE Committee on Space Station in 1995. He also received the IEEE Control Systems Society Best Chapter Award (as a Founding Chairman of the DFW Chapter), the National Sigma Xi Award for Outstanding Chapter (as a President of the UTA Chapter), and the U.S. SBA Tibbets Award in 1996 (as the Director of ARRI's SBIR Program).
Zhong-Ping Jiang (M'94–SM'02–F'08) received the B.Sc. degree in mathematics from the University of Wuhan, Wuhan, China, in 1988, the M.Sc. degree in statistics from the University of Paris XI, Paris, France, in 1989, and the Ph.D. degree in automatic control and mathematics from the Ecole des Mines de Paris, Paris, France, in 1993.
He is currently a Professor of Electrical and Computer Engineering with the Polytechnic School of Engineering, New York University, New York, NY, USA. His current research interests include stability theory, robust, adaptive and distributed nonlinear control, adaptive dynamic programming, and their applications to information, mechanical and biological systems. He has co-authored the books Stability and Stabilization of Nonlinear Systems (Springer, 2011) with Dr. I. Karafyllis, and Nonlinear Control of Dynamic Networks (Taylor & Francis, 2014) with Dr. T. Liu and Dr. D. J. Hill.
Prof. Jiang is a fellow of the International Federation of Automatic Control. He was a recipient of the prestigious Queen Elizabeth II Fellowship Award from the Australian Research Council, the CAREER Award from the U.S. National Science Foundation (NSF), and the Distinguished Overseas Chinese Scholar Award from the NSF of China. He was also a recipient of recent awards recognizing his research work, including the Best Theory Paper Award, with Y. Wang, at the 2008 World Congress on Intelligent Control and Automation, the Guan Zhao Zhi Best Paper Award, with T. Liu and D. Hill, at the 2011 Chinese Control Conference, and the Shimemura Young Author Prize, with his student Y. Jiang, at the 2013 Asian Control Conference in Istanbul, Turkey. He is a Deputy Co-Editor-in-Chief of the Journal of Control and Decision, an Editor for the International Journal of Robust and Nonlinear Control, and has served as an Associate Editor for several journals, including Mathematics of Control, Signals and Systems, Systems and Control Letters, the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, the European Journal of Control, and Science China: Information Sciences.