Distributed_Optimal_Consensus_Control_for_Multiagent_Systems_With_Input_Delay

This paper presents a method for distributed optimal consensus control in heterogeneous linear multiagent systems (MASs) affected by time-varying input delays. It transforms the continuous-time input-delayed system into a discrete-time delay-free system, allowing the application of Hamilton–Jacobi–Bellman equations to design optimal control policies. The proposed adaptive dynamic programming algorithm, implemented via a critic-action neural network, ensures that local consensus and weight estimation errors remain bounded while achieving consensus with minimal predefined performance.


IEEE TRANSACTIONS ON CYBERNETICS, VOL. 48, NO. 6, JUNE 2018 1747

Distributed Optimal Consensus Control for Multiagent Systems With Input Delay
Huaipin Zhang, Dong Yue, Senior Member, IEEE, Wei Zhao, Songlin Hu, and Chunxia Dou

Abstract—This paper addresses the problem of distributed optimal consensus control for a continuous-time heterogeneous linear multiagent system subject to time-varying input delays. First, by discretization and model transformation, the continuous-time input-delayed system is converted into a discrete-time delay-free system. Two delicate performance index functions are defined for these two systems. It is shown that the performance index functions are equivalent and the optimal consensus control problem of the input-delayed system can be cast into that of the delay-free system. Second, by virtue of the Hamilton–Jacobi–Bellman (HJB) equations, an optimal control policy for each agent is designed based on the delay-free system and a novel value iteration algorithm is proposed to learn the solutions to the HJB equations online. The proposed adaptive dynamic programming algorithm is implemented on the basis of a critic-action neural network (NN) structure. Third, it is proved that local consensus errors of the two systems and weight estimation errors of the critic-action NNs are uniformly ultimately bounded while the approximated control policies converge to their target values. Finally, two simulation examples are presented to illustrate the effectiveness of the developed method.

Index Terms—Adaptive dynamic programming (ADP), distributed consensus control, input delays, multiagent systems (MASs), optimal control.

Manuscript received December 19, 2016; revised March 31, 2017 and June 5, 2017; accepted June 6, 2017. Date of publication June 27, 2017; date of current version May 15, 2018. This work was supported in part by the Natural Science Foundation of China under Grant 61533010, Grant 61374055, and Grant 61673223, and in part by the Natural Science Foundation of Jiangsu Province of China under Grant BK20131381 and Grant BK20151510. This paper was recommended by Associate Editor Q.-L. Han. (Corresponding author: Dong Yue.)
H. Zhang is with the School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China.
D. Yue, S. Hu, and C. Dou are with the Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (e-mail: [email protected]).
W. Zhao is with the School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2017.2714173
2168-2267 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

With the rapid development of computation and communication, coordination control of networked multiagent systems (MASs) has received considerable attention due to their broad applications in various fields, such as unmanned aerial vehicles [1], automated highway systems [2], satellite clusters [3], multiple rigid systems [4], mobile robots [5], and so on. As one of the fundamental issues of coordination control, consensus control [6]–[11] aims at designing a control protocol based on local information that enables all the agents to reach an agreement regarding a certain quantity of interest. We refer to the surveys [12]–[14] on the recent developments of consensus control for MASs.

In the context of multiagent consensus control, it is desirable to develop a distributed control policy which not only ensures consensus but also guarantees a certain level of performance. This stimulates the study of guaranteed cost consensus control for MASs [6], [15], [16], in which all the agents can reach consensus under some performance constraints. Guaranteed cost consensus control is regarded as an effective strategy to achieve a tradeoff between regulation performance and system stability, which is deemed a suboptimal problem. A key concern lies in how to design a distributed optimal control policy such that all the agents reach consensus with the minimal predefined performance. However, the optimal consensus control policies generally depend on the Hamilton–Jacobi–Bellman (HJB) equations [17]–[19], rendering the optimal consensus control problem difficult to solve analytically. It is thus of great significance to develop an effective method to tackle this issue.

Adaptive dynamic programming (ADP) [20] is regarded as a promising method to solve the HJB equations forward-in-time. By means of a value iteration (VI) or policy iteration algorithm, the value function can be obtained and the optimal control policy can be calculated by solving the HJB equations [21], [22]. Recently, intensive research attention has been paid to the investigation of ADP methods from both theoretical research and practical applications. For example, in [23] and [24], some research trends as well as the developments and applications of ADP methods were reviewed. Moreover, ADP methods show their merits in improving the capacity of control and optimization for several practical engineering systems, such as power systems [25], navigation systems [26], aircraft [27], and so forth. Using ADP methods to deal with optimal coordination control problems for MASs has also attracted research attention [17]–[19]. In [17] and [18], some online adaptive learning solutions were obtained for multiagent graphical games. In [19], an online scheme to settle the optimal coordination control problem for MASs was designed by employing fuzzy ADP. However, it should be pointed out that all the aforementioned works for MASs only consider an idealized network environment, which means that information exchanges among the agents are required to be performed instantaneously.

Due to the limited transmission rate of communication channels and the limited bandwidth in some shared communication networks [28], [29], time delay is an inescapable factor when dealing with the consensus problem for MASs. It is well known that time delay can degrade system performance and even jeopardize system stability [30]. Thus, it is essential to address the consensus problem for MASs in the presence of time delays. For example, in [31], an adaptive neural network (NN) consensus control protocol was proposed for a class of nonlinear MASs with state time delay, in which radial basis function NNs were used to compensate for the uncertain nonlinearity and the time delay of the agents' dynamics. In [32], the problem of robust consensus control for an input-delayed uncertain MAS was studied. Using the Artstein model reduction, a state transformation was performed to deal with the input delay, and the input-dependent integral term in the transformed system was carefully analyzed. Then, based on the Lyapunov functional approach, sufficient conditions on global consensus were derived. To the best of the authors' knowledge, the optimal consensus control problem of MASs in the presence of input delays has not been adequately addressed, which motivates this paper.

In this paper, we will deal with the problem of distributed optimal consensus control for a continuous-time heterogeneous linear MAS subject to time-varying input delays. The main contributions are summarized as follows.
1) A proper model transformation based on discretization will be developed to incorporate the impact of the time-varying input delay such that the optimal consensus problem of the original continuous-time input-delayed MAS is cast into the optimal consensus problem of a discrete-time delay-free MAS. Moreover, conditions on the equivalence of the performance index functions for the two MASs will be provided.
2) A novel VI algorithm will be proposed to learn the solutions to the coupled HJB equations online. Such an ADP algorithm will be implemented within critic-action NN networks.
3) A delicate criterion, which guarantees that local consensus errors of the two MASs and weight estimation errors of the critic-action NNs are uniformly ultimately bounded (UUB), and the approximated control policies converge to their target values, will be presented.

The rest of this paper is organized as follows. Section II formulates the main problem to be addressed. Section III presents the main result for designing an optimal consensus control policy for each agent. A novel VI algorithm is also outlined in Section III. In Section IV, critic-action NNs are used to approximate the value functions and control policies online based on the gradient descent method. Section V provides simulation results validating the proposed method and Section VI summarizes this paper.

II. PRELIMINARIES AND PROBLEM FORMULATION

A. Algebraic Graph Theory

Some notations of algebraic graph theory are recalled. Denote $G = \{V, E, A\}$ as a weighted graph composed of a set of agents $V = \{1, \dots, N\}$, a set of edges $E = \{(i, j) : i, j \in V\} \subseteq V \times V$, and a weighted adjacency matrix $A = [a_{ij}]_{N \times N}$. If there is a communication link between agent $i$ and agent $j$, i.e., $(i, j) \in E$, then $a_{ij} > 0$; otherwise, $a_{ij} = 0$. It is assumed that $a_{ii} = 0$. The set of neighbors of agent $i$ is denoted by $N_i = \{ j : (i, j) \in E\}$. A path from $i$ to $j$ is an edge sequence of the form $(i, i_1), \dots, (i_k, j)$ starting from $i$ and ending with $j$. Two agents $i$ and $j$ are connected if there is a path from $i$ to $j$. If all pairs of agents in $G$ are connected, then $G$ is called a connected graph. The degree matrix of $G$ is given by $D = \mathrm{diag}\{d_1, d_2, \dots, d_N\}$, where $d_i = \sum_{j \in N_i} a_{ij}$. The Laplacian matrix $H = D - A$ of $G$ is a symmetric positive semi-definite matrix.

B. Problem Formulation

Consider a group of $N$ agents subject to heterogeneous dynamics. For any $i \in V$, the dynamics of the $i$th agent are described by the following continuous-time linear system with a time-varying input delay:

\[ \dot{x}_i(t) = A_i x_i(t) + B_i u_i(t - \tau_i(t)) \tag{1} \]

where $x_i(t) \in R^n$ and $u_i(t) \in R^m$ are the state and the control input of the $i$th agent, respectively; $\tau_i(t)$ is the time-varying input delay satisfying $0 \le \tau_i(t) \le dT_s$ with $d \in N$ being a positive integer and $T_s > 0$ representing the sampling interval; $A_i$ and $B_i$ are constant matrices of appropriate dimensions; and the initial condition of the system (1) is given by $x_i(\theta)$, $\theta \in [-dT_s, 0]$.

In the following, we consider a consensus control framework for MAS (1) in the sampled-data setting, where agents' states and control inputs are sampled at discrete instants $\{kT_s\}_{k \in N}$ and digital controllers only use the latest sampled data to update their inputs. Sensors are time-driven while the controllers and actuators are event-driven [33]. It is assumed that there exist at most $d + 1$ current and previous control inputs arriving at the actuator during any sampling interval $[kT_s, (k + 1)T_s)$ for each agent. If several control inputs arrive at the actuator at the same time, the latest control input is adopted to act on the controlled agent while the others are discarded. For agent $i$, the sampled data are delivered to controller $i$ at instants $(k - l)T_s + \tau_i^l(k) = kT_s + t_i^l(k)$, $l = 0, 1, \dots, d$, where $\tau_i^l(k)$ denotes the input delay at sampling instant $(k - l)T_s$ and $t_i^l(k) = \tau_i^l(k) - lT_s$ with $t_i^l(k) < t_i^{l-1}(k)$.

Integrating (1) over a sampling interval $[kT_s, (k + 1)T_s)$, we have the following discretized system:

\[ x_i(k + 1) = \bar{A}_i x_i(k) + B_i^0(k) u_i(k) + \cdots + B_i^l(k) u_i(k - l) + \cdots + B_i^d(k) u_i(k - d) \tag{2} \]

where $x_i(k) = x_i(kT_s)$, $u_i(k) = u_i(kT_s)$, $\bar{A}_i = e^{A_i T_s}$ and

\[ B_i^0(k) = \int_{t_i^0(k)}^{T_s} e^{A_i(T_s - t)}\,dt\, B_i\, \psi_i\big((k + 1)T_s - \tau_i^0(k)\big) \]

\[ B_i^l(k) = \int_{t_i^l(k)}^{t_i^{l-1}(k)} e^{A_i(T_s - t)}\,dt\, B_i\, \psi_i\big(T_s + \tau_i^{l-1}(k) - \tau_i^l(k)\big) \times \psi_i\big(\tau_i^l(k) - lT_s\big), \quad l = 1, \dots, d - 1 \]


\[ B_i^d(k) = \int_{t_i^d(k)}^{t_i^{d-1}(k)} e^{A_i(T_s - t)}\,dt\, B_i\, \psi_i\big(T_s + \tau_i^{d-1}(k) - \tau_i^d(k)\big) \times \psi_i\big(\tau_i^d(k) - dT_s\big) \]

\[ \psi_i(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0. \end{cases} \]

Define a new augmented state variable $z_i(k) = [x_i^T(k), u_i^T(k - 1), \dots, u_i^T(k - d)]^T \in R^{n + dm}$; then system (2) can be represented as

\[ z_i(k + 1) = F_i(k) z_i(k) + G_i(k) u_i(k) \tag{3} \]

where

\[ F_i(k) = \begin{bmatrix} \bar{A}_i & B_i^1(k) & \cdots & B_i^{d-1}(k) & B_i^d(k) \\ 0 & 0 & \cdots & 0 & 0 \\ 0 & I_m & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I_m & 0 \end{bmatrix}, \qquad G_i(k) = \big[ (B_i^0(k))^T,\ I_m^T,\ 0,\ \dots,\ 0 \big]^T. \]

Remark 1: In general, the number of multiplications for a control scheme is used as the estimate of its computational complexity. For the newly augmented state variable $z_i(k)$, the computation cost is described by $N[((n + dm)^2 + m(n + dm)) n_1]$, where $n_1$ denotes the number of iterations. Thus, the computational complexity will increase as the scale of MAS (1) (i.e., the number of agents $N$) increases.

Remark 2: According to [34], if there exists a control law $U(k) = KZ(k)$, where $U = [u_1^T, \dots, u_N^T]^T$ and $Z = [z_1^T, \dots, z_N^T]^T$, then MAS (1) achieves synchronization when MAS (3) reaches consensus.

To study the consensus problems of MASs (1) and (3), we define the local neighbor tracking errors at sampling instants $\{kT_s\}$ as

\[ \delta_i(k) = \sum_{j \in N_i} a_{ij}\big(x_i(k) - x_j(k)\big) \tag{4} \]

and

\[ e_i(k) = \sum_{j \in N_i} a_{ij}\big(z_i(k) - z_j(k)\big) \tag{5} \]

for all $i \in V$, respectively.

Based on (3) and (5), the dynamics of the local neighbor tracking errors $e_i(k)$ can be represented as

\[ e_i(k + 1) = \sum_{j \in N_i} a_{ij}\big(z_i(k + 1) - z_j(k + 1)\big) = \sum_{j \in N_i} a_{ij}\big(F_i(k) z_i(k) - F_j(k) z_j(k)\big) + \sum_{j \in N_i} a_{ij}\big(G_i(k) u_i(k) - G_j(k) u_j(k)\big) \tag{6} \]

where $e_i(k)$ depends on the states and control inputs of agent $i$ itself and of all its neighbors.

Based on the above analysis, it can be seen that the consensus problem of the input-delayed MAS (1) is converted into the consensus problem of the delay-free MAS (3).

III. DISTRIBUTED OPTIMAL CONSENSUS CONTROLLER DESIGN

To further tackle an optimal consensus problem of (1), the relationship between the performance index of MAS (1) and the performance index of MAS (3) should be properly established such that one only needs to deal with the optimal consensus control problem of system (3). In this section, we will first propose the local performance index functions for both MAS (1) and MAS (3). Then, the equivalence relation between MAS (1) and MAS (3) will be built with regard to the optimal performance index. Furthermore, based on the HJB equations, an optimal consensus controller for each agent will be designed. Before we end this section, we will present a VI algorithm to learn the solutions to the coupled HJB equations online.

A. Coupled HJB Equations

Motivated by [35], the following local performance index of agent $i$ for system (1) is defined:

\[ J_{xi}(t) = \int_0^{\infty} r_{xi}\big(\delta_i(\tau), U_i(\tau), U_{-i}(\tau)\big)\,d\tau = \sum_{k=0}^{\infty} r_{xi}\big(\delta_i(k), U_i(k), U_{-i}(k)\big) \]
\[ = \sum_{k=0}^{\infty} \Big[ \delta_i^T(k) P_{ii} \delta_i(k) + \Big(d_i U_i(k) - \sum_{j \in N_i} a_{ij} U_j(k)\Big)^T T_i \Big(d_i U_i(k) - \sum_{j \in N_i} a_{ij} U_j(k)\Big) \Big] \tag{7} \]

where $r_{xi}(\delta_i(k), U_i(k), U_{-i}(k))$ is the utility function; $U_i(k) = [u_i^T(k), u_i^T(k - 1), \dots, u_i^T(k - d)]^T$ is the overall delayed control input vector for agent $i$ during the time interval $[kT_s, (k + 1)T_s)$; $U_{-i} = \{U_j : j \in N_i\}$ denotes the overall control input vector from the neighbors of agent $i$; and all weighting matrices are constant and satisfy $P_{ii} > 0$ and $T_i \ge 0$.

Similar to [36], we define the local performance index for system (3) as

\[ J_{zi}(k) = \sum_{k=0}^{\infty} r_{zi}\big(e_i(k), u_i(k)\big) = \sum_{k=0}^{\infty} \big[ e_i^T(k) M_{ii} e_i(k) + u_i^T(k) L_{ii} u_i(k) \big] \tag{8} \]

where all weighting matrices are constant and satisfy

\[ M_{ii} = \begin{bmatrix} M_{ii1} & \\ & M_{ii2} \end{bmatrix} \in R^{(n + dm) \times (n + dm)} > 0, \qquad L_{ii} > 0. \]

Now, the following theorem, which establishes the equivalence relation between the performance index (7) and the performance index (8), is presented.

Theorem 1: If the following conditions are satisfied:

Pii = Mii1

Ti = (1/di^2) [  T Lii    0 ;  0    T Mii2  ]


where   = [1m, 0m, ..., 0m] ∈ R^{m×(d+1)m}, 1m = [1, ..., 1] ∈ R^m and

\[ \;\; = \begin{bmatrix} 0 & I_m & 0 & \cdots & 0 \\ 0 & 0 & I_m & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I_m \end{bmatrix} \in R^{dm \times (d+1)m} \]

then the performance index (7) of agent $i$ for the input-delayed MAS (1) is equivalent to the performance index (8) of agent $i$ for the delay-free MAS (3), i.e., $J_{xi} = J_{zi}$, $\forall\, i \in V$.

Proof: See Appendix A.

Theorem 1 indicates that the optimal consensus control problem of MAS (1) can be converted into that of MAS (3). Therefore, the following problem is proposed.

Problem 1: How can one design suitable distributed optimal consensus control policies such that all the agents in (3) reach consensus while minimizing the performance indexes (8)?

Definition 1 [18] (Admissible Consensus Control): The control policies $u_i(k)$ for all $i \in V$ are said to be admissible if system (6) can be stabilized with finite performance index (8).

Given the admissible control policies $u_i$, the local value function for agent $i$ is described by

\[ V_i(e_i(k)) = \sum_{l=k}^{\infty} \big[ e_i^T(l) M_{ii} e_i(l) + u_i^T(l) L_{ii} u_i(l) \big] = \sum_{l=k}^{\infty} r_{zi}\big(e_i(l), u_i(l)\big) = r_{zi}\big(e_i(k), u_i(k)\big) + V_i(e_i(k + 1)) \tag{9} \]

and the coupled Hamilton–Jacobi equation has the form of

\[ H_i\big(e_i(k), \nabla V_i(e_i(k + 1)), u_i(k)\big) = r_{zi}\big(e_i(k), u_i(k)\big) + \nabla V_i^T(e_i(k + 1))\, e_i(k + 1) \tag{10} \]

where $\nabla V_i(e_i(k + 1)) = \partial V_i(e_i(k + 1)) / \partial e_i(k + 1)$ is the gradient vector of the value function $V_i(e_i(k + 1))$ with respect to $e_i(k + 1)$.

According to the necessary conditions for optimality, we have

\[ u_i(k) = -\tfrac{1}{2} d_i L_{ii}^{-1} G_i^T(k) \nabla V_i(e_i(k + 1)). \tag{11} \]

Based on the Bellman optimality principle, the local optimal value functions $V_i^*(e_i(k))$ satisfy the coupled HJB equations

\[ V_i^*(e_i(k)) = \min_{u_i(k)} \big\{ r_{zi}\big(e_i(k), u_i(k)\big) + V_i^*(e_i(k + 1)) \big\} \tag{12} \]

or

\[ \min_{u_i(k)} H_i\big(e_i(k), \nabla V_i^*(e_i(k + 1)), u_i(k)\big) = 0. \tag{13} \]

Thus, the local optimal consensus control policy for agent $i$ can be designed as

\[ u_i^*(k) = -\tfrac{1}{2} d_i L_{ii}^{-1} G_i^T(k) \nabla V_i^*(e_i(k + 1)). \tag{14} \]

Remark 3: As indicated in [18], there exists a positive definite matrix $P \in R^{nN \times nN}$ such that $\nabla V(e) = Pe = PHZ$, where $e = [e_1^T, \dots, e_N^T]^T$ and $\nabla V(e) = [(\partial V_1 / \partial e_1)^T, \dots, (\partial V_N / \partial e_N)^T]^T$. Based on (11), the overall control input can be represented as $U = -(1/2) D L^{-1} G(k)^T P e = KZ$, where $K = -(1/2) D L^{-1} G(k)^T P H$, $L = \mathrm{diag}\{L_{ii}\}$ and $G(k) = \mathrm{diag}\{G_i(k)\}$.

Theorem 2: Let the optimal value functions $V_i^*(e_i(k))$ for all $i \in V$ be the solutions to the coupled HJB equations (13) and let the optimal consensus control policies be given by (14). Then the local neighborhood consensus errors $e_i(k)$ for all $i \in V$ asymptotically converge to zero and all agents can reach consensus.

Proof: From (12), we have that

\[ V_i^*(e_i(k + 1)) - V_i^*(e_i(k)) = -r_{zi}^*\big(e_i(k), u_i^*(k)\big) < 0. \]

Choosing $V_i^*(e_i(k))$ as the Lyapunov function candidate for system (6), the error system (6) is thus asymptotically stable, i.e., $e_i(k) \to 0$ as $k \to \infty$, which implies that $z_i(k) = z_j(k)$ as $k \to \infty$. Therefore, all agents can reach consensus. This completes the proof.

B. Value Iteration Algorithm for Discrete-Time HJB Equations

In this section, Algorithm 1 is proposed to solve the coupled discrete-time HJB equations online and the proof of its convergence is also presented.

Algorithm 1 VI
Step 1: Initialization. Start with arbitrary initial admissible control policies $u_i^{[0]}(k)$ and value functions $V_i^{[0]}(e_i(k))$.
Step 2: Value Evaluation. Given the iterative control policies $u_i^{[l]}(k)$, solve $V_i^{[l]}(e_i(k))$ using Bellman equations
\[ V_i^{[l+1]}(e_i(k)) = r_{zi}\big(e_i(k), u_i^{[l]}(k)\big) + V_i^{[l]}(e_i(k + 1)). \tag{15} \]
Step 3: Value Improvement. Update the iterative control policies using
\[ u_i^{[l+1]}(k) = -\tfrac{1}{2} d_i L_{ii}^{-1} G_i^T(k) \nabla V_i^{[l+1]}(e_i(k + 1)). \tag{16} \]
Step 4: If convergent, exit. Else set $l = l + 1$ and go back to Step 2.

In the following, we will prove that the VI algorithm is convergent.

The optimal control policy sequence $\{v_i^{[l]}\}_{l=0}^{\infty}$ for agent $i$ at each iteration step in the VI algorithm is chosen as

\[ v_i^{[l]} = \arg\min_{u_i(k)} \big\{ e_i^T(k) M_{ii} e_i(k) + u_i^T(k) L_{ii} u_i(k) + V_i^{[l]}(e_i(k + 1)) \big\} \tag{17} \]

and the associated value function sequence is

\[ V_i^{[l+1]}(e_i(k)) = \min_{u_i(k)} \big\{ e_i^T(k) M_{ii} e_i(k) + v_i^{[l]T}(k) L_{ii} v_i^{[l]}(k) + V_i^{[l]}(e_i(k + 1)) \big\}. \tag{18} \]

Lemma 1 [35]: Let $\mu_i^{[l]}$ be an arbitrary sequence of control laws and $v_i^{[l]}$ be the control policy sequence described in (17).

Define Vi[l] as in (18) and [l]

i as that there exists an admissible control policy ui and let
[l+1] Vi[l] (ei (k)) and u[l]
i (k) be computed based on VI algorithm.
(ei (k)) = eTi (k)Mii ei (k) + μi[l]T (k)Lii μ[l]
i i Then, the solution sequence Vi[l] (ei (k)) and control policy
[l]
+ i (ei (k + 1)). (19) sequence u[l] ∗
i (k) converge to the optimal values Vi (ei (k)) and
∗
ui (k), respectively.
If 0 ≤ Vi[0]≤ [0]
i ,
then ≤ Vi[l] [l]
i .
Lemma 2: Suppose that the neighbors of agent i have fixed Proof: For convenience of analysis, define a new sequence
control policies uj , j ∈ Ni . Let the value function sequence Vi[l] { il } as follows:
be defined as in (18). Then there exists an upper bounded Y [l+1]
(ei (k)) = eTi (k)Mi ei (k) + v[l+1]T (k)Li vi[l+1]
i i
such that 0 ≤ Vi[l] ≤ Y. [l]
+ i (ei (k + 1)) (24)
Proof: Let {μ[l]
i } be a sequence of admissible control laws
and Vi[0] (·) = [0]i (·) = 0, where Vi and
[l] [l]
i is updated where [0]
i (·) = Vi[0] (·) = 0. In the following, we will prove:
by (18) and (19), respectively. [l]
From (19), we have i (ei (k)) ≤ Vi[l+1] (ei (k))
[l]
i (ei (k + 1)) = eTi (k + 1)Mii ei (k + 1) + μi[l−1]T (k)Lii μi[l−1] by mathematical induction.
[l−1] For l = 0, one has
+ (ei (k + 2))
i Vi[1] (ei (k)) − [0]
= rzi ei (k + 1), μi[l−1] + i[l−1] (ei (k + 2)) i (ei (k))
= eTi (k)Mi ei (k) + vi[0]T (k)Li v[0]
i ≥0 (25)
(20)
where rzi (ei (k + 1), μi[l−1] ) = eTi (k + 1)Mii ei (k + 1) + which straightforwardly implies that Vi[1] (ei (k)) ≥ i[0] (ei (k)).
μi[l−1]T (k)Lii μi[l−1] . Suppose that Vi[l] (ei (k)) ≥ i[l−1] (ei (k)) for any ei (k). Then,
Rearranging (20) yields for step l, according to (18) and (24), we have that

[l+1] [l] Vi[l+1] (ei (k)) − [l]
i (ei (k))
i (ei (k)) = r i
z ei (k), μi + [l] i (ei (k + 1))
= Vi[l] (ei (k + 1)) − [l−1]
i (ei (k + 1)) ≥ 0 (26)
= rzi ei (k), μ[l]
i + r i
z ei (k + 1), μ [l−1]
i
which leads to i[l] (ei (k)) ≤ Vi[l+1] (ei (k)).
+ i[l−1] (ei (k + 2)) Furthermore, it can be seen from Lemmas 1 and 2 that

= rzi ei (k), μ[l] + rzi ei (k + 1), μi[l−1] 0 ≤ Vi[l] (ei (k)) ≤ [l]
≤ Vi[l+1] (ei (k)) ≤ Y.
i i (ei (k))

+ rzi ei (k + 2), μi[l−2] + [l−2]
i (ei (k + 3)) Therefore, the sequence Vi[l] is increasing and has an upper
.. bound, which means that Vi[l] is convergent. And we define
. Vi∗ = liml→∞ Vi[l] . Meanwhile, by (14), we can also obtain
= rzi ei (k), μ[l] + rzi ei (k + 1), μi[l−1]
lim ui[l] = u∗i .
i

l→∞
+ rzi ei (k + 2), μi[l−2] + · · ·
This completes the proof.
+ rzi ei (k + l), μ[0]
i + [0]i (ei (k + l + 1))
IV. O NLINE A DAPTIVE S OLUTIONS

l
= rzi ei (k + h), μi[l−h] (21) BY C RITIC -ACTION NN S
h=0 In the section, a critic-action NN framework based on
[0] VI algorithm will be established to solve the coupled HJB
where i (ei (k
+ l + 1)) = 0.
equations online, in which the critic NNs will be used to
Since μ[l]
i is
an admissible control policy sequence, then approximate the value functions and the action NNs will be
there exists an upper bound such that employed to perform the control policies.

l
[l+1]
i (ei (k)) ≤ lim rzi ei (k + h), μi[l−h] A. Critic NNs Design
l→∞
h=0
Based on the universal approximation property of NNs [37],
= Y, ∀l ∈ N (22)
l the value function for agent i can be approximated by critic
where Y liml→∞ h=0 rzi (ei (k + h), μi[l−h] ). NN as follows:
Combining with Lemma 1, we obtain
Vi (k) = WciT φ(yi (k)) + i (k) (27)
Vi[l+1] (ei (k)) ≤ [l+1]
i (ei (k)) ≤ Y ∀l ∈ N+ . (23)
where yi (k) is an information vector for agent i from locally
Thus, there is an upper bounded Y such that 0 ≤ Vi[l] ≤ Y. available measurements, e.g., ei (k) and {ej (k), j ∈ Ni }; Wci
This completes the proof. represents the constant critic NN weight; φ(·) is the activation
Theorem 3 (Convergence of VI Algorithm): Let the neigh- function vector for the critic NN; and i is the approximation
bors of agent i have fixed control policies uj , j ∈ Ni . Suppose error for the critic NN.
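The value functions that the critic NN approximates are the iterates produced by the VI recursion (15)-(16), and Theorem 3 states that these iterates form a bounded, nondecreasing sequence. As an illustration only, the following minimal sketch runs the same kind of recursion on a single scalar error system e(k+1) = a e(k) + b u(k) with stage cost m e^2 + r u^2. This is a simplified single-agent analogue, not the coupled multiagent algorithm of the paper; the function name, the parameters `a`, `b`, `m`, `r`, and the quadratic parameterization V(e) = p e^2 are all assumptions made for this sketch.

```python
def value_iteration(a, b, m, r, iterations=60):
    """Scalar value iteration for e(k+1) = a*e(k) + b*u(k).

    Assumes V^[l](e) = p_l * e**2 and stage cost m*e**2 + r*u**2.
    Minimizing m*e**2 + r*u**2 + p_l*(a*e + b*u)**2 over u gives
    u = -gain*e with gain = a*b*p_l / (r + b*b*p_l), and the minimum
    value is (m + a*a*p_l*r / (r + b*b*p_l)) * e**2.
    """
    p = 0.0                      # V^[0] = 0, as in the convergence proof
    history = [p]
    for _ in range(iterations):
        p = m + a * a * p * r / (r + b * b * p)   # value update
        history.append(p)
    gain = a * b * p / (r + b * b * p)            # greedy policy for the final value
    return history, gain


if __name__ == "__main__":
    values, feedback_gain = value_iteration(a=1.2, b=1.0, m=1.0, r=1.0)
    print("last value coefficient:", values[-1])
    print("feedback gain:", feedback_gain)
```

Starting from zero, the coefficients p_l increase toward the fixed point of the recursion, which mirrors the monotone, bounded behavior used in the proof of Theorem 3. The open-loop system chosen in the usage example is unstable (a = 1.2), and the resulting gain places the closed-loop pole a - b*gain inside the unit circle.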


Suppose that the current weight estimation of each critic NN The tuning law of action NN weight is chosen as
is Ŵci (k) and the critic NN approximation of (27) is given by
ϕ(yi (k))
Ŵai (k + 1) = Ŵai (k) − αai eT (k)
V̂i (k) = ŴciT (k)φ(yi (k)). (28) ϕ (yi (k))ϕ(yi (k)) + 1 ai
T
(38)
Define the critic NN error as
eci (k) = rzi (ei (k), ui (k)) + V̂i (ei (k + 1)) − V̂i (ei (k)) where 0 < αai < 1 is the learning rate of the action NN.
In order to select the control policy ui (k) to make the desired
= eTi (k)Mii ei (k) + uTi (k)Lii ui (k) + ŴciT (k) φ(yi (k)) value function (27) minimum, we have
(29)
1
T
Wai ϕ(yi (k)) + εi (k) = − di Lii−1 GTi (k)
where φ(yi (k)) = φ(yi (k + 1)) − φ(yi (k)). 2
It is desired to select Ŵci (k) to minimize the square residual × ∇φ T (yi (k))Ŵci (k) + ∇ i (k) (39)
error
1 where ∇ i (k) = (∂ i (k)/∂yi (k)).
Eci (Ŵci (k)) = eTci (k)eci (k). (30)
2 Combined (37) with (39), the action NN approximation
Based on gradient descent algorithm, the weight adaptive error can be rewritten as
updating law of the critic NN is given by 1
eai (k) = −W̃ai
T
(k)ϕ(yi (k)) − di Lii−1 GTi (k)∇φ(yi (k))T
φ(yi (k)) 2
Ŵci (k + 1) = Ŵci (k) − αci 1 −1 T 1
φ T (yi (k)) φ(yi (k)) + 1 × Ŵci (k) − di Lii Gi (k)∇ i (k) − ζi (k) (40)
T 2 2
× rzi (ei (k), ui (k)) + ŴciT (k) φ(yi (k)) (31)
where ζi (k) = −εi (k) + (1/2)di Lii−1 GTi (k)∇ i (k) and it need
where αci > 0 is the learning rate of the critic NN. to satisfy ζi (k) ≤ ζM with ζM being a positive constant, and
Substituting (27) into (9), we obtain ∇ i (k) ≤ M .
Then, the action NN weight estimation error can be repre-
WciT (φ(yi (k + 1)) − φ(yi (k))) + i (k + 1) − i (k)
sented as
= −rzi (ei (k), ui (k)). (32)
W̃ai (k + 1)
In other words, we have ϕ(yi (k))
= W̃ai (k) − αai
WciT φ(yi (k)) + i (k) = −rzi (ei (k), ui (k)) (33) ϕ(yi (k))T ϕ(yi (k)) + 1

1
where i (k) = i (k + 1) − i (k). × W̃ai T
(k)ϕ(yi (k)) + di Lii−1 GTi (k)∇φ T (yi (k))W̃ci (k)
Let the weight estimation error of the critic NN be W̃ci (k) = 2

Wci − Ŵci (k). Then according to (33), we can obtain 1 −1 1 T
+ di Lii × GTi (k)∇ i (k) + ζi . (41)
φ(yi (k)) φ T (yi (k)) 2 2
W̃ci (k + 1) = W̃ci (k) − αci
φ T (yi (k)) φ(yi (k)) + 1
C. Stability Analysis
φ(yi (k)) i (k)
× W̃ci (k) − αci . (34) In the section, a theorem will be given to analyze the sta-
φ (yi (k)) φ(yi (k)) + 1
T
bility of our proposed method. Before giving the theorem, we
B. Action NNs Design first present a definition as follows.
Definition 2 [19]: The local neighborhood consensus error
Using the universal approximation property of NNs [37], an
ei (t) is UUB if there exists a compact set i ∈ Rn such
action NN is used to approximate the control policy of agent
that for any ei (t0 ) ∈ i , there exists a bound Bi and a
i as follows:
time tfi (Bi , ei (t0 )), both independent of t0 ≥ 0, such that
ui (k) = Wai
T
ϕ(yi (k)) + εi (k) (35) ei (t) ≤ Bi , ∀ t ≥ t0 + tfi .
Theorem 4: Consider that the local neighbor consensus
where Wai denotes the constant action NN weight; ϕ(·) is the
error dynamics are given by (6) and the update laws
activation function for the action NN; and εi is the action NN
of the critic-action NN weight estimations are defined
approximation error.
as (31) and (38), respectively. Then, the local consensus errors
Let Ŵai (k) be the estimated value of Wai , then the estimated
δi , ei , and the critic-action NN weight estimation errors W̃ci
control policy of the action NN for agent i is represented as
and W̃ai are UUB, respectively. Moreover, the approximated
ûi (k) = Ŵai
T
(k)ϕ(yi (k)). (36) optimal consensus control policies ûi converge to its target
values ui .
The action NN error is expressed as
Proof: See Appendix B.
1
eai (k) = Ŵai
T
(k)ϕ(yi (k)) + di Lii−1 GTi (k)
2 V. S IMULATION
× ∇φ T (yi (k))Ŵci (k) (37)
In this section, we will illustrate the effectiveness of the
where ∇φ(yi (k)) = (∂φ(yi (k))/∂yi (k)). proposed approach through two simulation examples.
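Before turning to the examples, the critic tuning law (31) of Section IV can be made concrete with a small sketch. It applies the same normalized gradient-descent step to a hypothetical scalar problem: a closed-loop error e(k+1) = c e(k), a stage cost proportional to e^2, and a single quadratic feature φ(e) = e^2, so that the exact weight is known in closed form. The function names, the probing samples, and the learning rate 0.5 are assumptions chosen for illustration; they are not the settings used in the paper.

```python
def critic_update(w_hat, delta_phi, utility, alpha):
    """One normalized gradient-descent step on the critic residual.

    residual = utility + w_hat * delta_phi, where delta_phi is the feature
    difference phi(y(k+1)) - phi(y(k)); scalar weight for simplicity.
    """
    residual = utility + w_hat * delta_phi
    return w_hat - alpha * delta_phi / (delta_phi * delta_phi + 1.0) * residual


def train_critic(c=0.5, cost=2.0, alpha=0.5, steps=200):
    """Learn w such that V(e) = w * e**2 for e(k+1) = c*e(k).

    With stage cost cost*e**2 the exact weight is cost / (1 - c**2).
    """
    samples = [2.0, -1.5, 1.0, -0.5]       # hypothetical probing errors, cycled
    w_hat = 0.0
    for k in range(steps):
        e = samples[k % len(samples)]
        e_next = c * e
        delta_phi = e_next ** 2 - e ** 2   # phi(e) = e**2
        w_hat = critic_update(w_hat, delta_phi, cost * e ** 2, alpha)
    return w_hat


if __name__ == "__main__":
    print("learned weight:", train_critic())
    print("exact weight:  ", 2.0 / (1.0 - 0.5 ** 2))
```

In this idealized setting there is no approximation error, so the weight error shrinks by the factor 1 - alpha * delta_phi^2 / (delta_phi^2 + 1) at every step. With a nonzero approximation error, as in (34), only boundedness of the weight error can be expected, which is what Theorem 4 asserts.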

Fig. 1. (a) Undirected communication topology G1. (b) Distribution of input delay.

Fig. 2. Convergence of the NN weight estimations. (a) Critic NN. (b) Action NN.

Fig. 3. Evolution of the states for the transformed delay-free MAS.

Example 1: The communication topology $G_1$ of the input-delayed MAS (1) is shown in Fig. 1(a) and the system parameter matrices of MAS (1) are given by

\[ A_i = \begin{bmatrix} -0.8 & -30 \\ -50 & -0.5 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0 \\ 3 \end{bmatrix}, \quad B_3 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \quad \text{and} \quad B_4 = \begin{bmatrix} 0 \\ 5 \end{bmatrix}. \]

The input delays are characterized by $\tau_i(t) = 0.2 \times \mathrm{rand}(t)$ for any $i \in V$ with their distributions shown in Fig. 1(b), where $\mathrm{rand}(t)$ is a random function with its value varying on $(0, 1)$. The sampling time is chosen as $T_s = 0.1$ s and the bound of the input delays is 2, i.e., $d = 2$. The learning rates are selected as $\alpha_{ai} = 3 \times 10^{-4}$ and $\alpha_{ci} = 7 \times 10^{-4}$. The weighting matrices in the performance index functions of (7) are chosen as $P_{ii} = I_2$ and $T_i = \mathrm{diag}\{1, 0, 1, 1\}$, thus we have $M_{ii} = I_4$, $L_{ii} = 1$. Furthermore, one has that $z_i(k) = [x_{i1}(k)\ x_{i2}(k)\ u_i(k - 1)\ u_i(k - 2)]^T$. For each agent, the local neighbor consensus error is $e_i = [e_{i1}\ e_{i2}\ e_{i3}\ e_{i4}]^T$. Choose the fuzzy hyperbolic function as the critic and action NN activation functions: $\phi_i(k) = \tanh^T(e_i^2(k))$ and $\varphi_i(k) = \tanh^T(e_i(k))$.

Fig. 2 shows the convergence curves of the critic and action NN weight estimations, which indicates that their estimation errors are UUB. Fig. 3 depicts the evolutions of the states for the transformed delay-free MAS (3), which show that the transformed MAS can reach consensus. Fig. 4 shows that the optimal consensus controllers based on ADP are convergent. The local neighbor consensus errors of the original input-delayed MAS are shown in Fig. 5. Moreover, the 3-D phase plane plot of the states for the original input-delayed MAS is displayed in Fig. 6. From Figs. 5 and 6, it can be observed that the original input-delayed MAS can achieve consensus based on the proposed optimal consensus controllers using the ADP method. Fig. 7 compares the performance index functions between the proposed optimal consensus controllers and the guaranteed cost controllers presented in [16] and [38], from which it can be clearly seen that the proposed optimal consensus controllers outperform the guaranteed cost controllers in minimizing the performance index (7). This verifies the effectiveness of the optimal consensus controller design based on the ADP method.

Example 2: In this example, we test our proposed approach on an MAS consisting of four batch reactors with a directed


Fig. 4. Evolution of the optimal consensus controllers.

Fig. 5. Consensus of the local neighbor consensus errors for original input-delayed MAS.

Fig. 6. 3-D phase plane plot of the states for the original input-delayed MAS.

Fig. 7. Comparison of performance index functions.

Fig. 8. Directed communication topology G2.

communication topology $G_2$ shown in Fig. 8 and the system parameter matrices given by [39] and [40]

$$A_i = \begin{bmatrix} 1.38 & -0.2077 & 6.715 & -5.676 \\ -0.5814 & -4.29 & 0 & 0.675 \\ 1.067 & 4.273 & -6.654 & 5.893 \\ 0.048 & 4.273 & 1.343 & -2.104 \end{bmatrix},\quad B_i = \begin{bmatrix} 0 & 0 \\ 5.679 & 0 \\ 1.136 & -3.146 \\ 1.136 & 0 \end{bmatrix}.$$

The input delays are simulated as $\tau_i(t) = 0.2\cos^2(t)$. The weighting matrices in the performance index functions of (7) are given by $P_{ii} = I_4$ and $T_i = \mathrm{diag}\{2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1\}$, then we can obtain $M_{ii} = I_8$ and $L_{ii} = I_2$. Moreover, we select the critic-action activation functions as $\phi_i(k) = e_i^2(k)^T$ and $\varphi_i(k) = \tanh(e_i(k))^T$.

Fig. 9 shows the evolution curves of the critic and action NN weight estimations for four agents, respectively. Clearly, these weight estimations are convergent, implying that the estimation errors are UUB. Fig. 10 depicts the evolutions of the states for the original input-delayed MAS. Fig. 11 shows that the input-delayed MAS can reach consensus based on the proposed optimal consensus controllers. Fig. 12 illustrates the performance index functions of the proposed optimal consensus controllers.

Fig. 9. The evolution curves of the NN weight estimations. (a) Critic NN. (b) Action NN.

Fig. 10. Consensus of state trajectories of the original input-delayed MAS.

Fig. 11. Optimal control policies for our proposed approach.

Fig. 12. Evolution of performance index functions.

VI. CONCLUSION

The distributed optimal consensus control problem for linear continuous-time MASs subject to time-varying input delays has been investigated. It has been shown that the optimal consensus control problem of the original input-delayed MAS can be converted into that of a delay-free MAS by model transformation and performance index equivalence. Then, an optimal control policy for each agent dependent on current step of state feedback and linear combination of some former steps of control inputs has been proposed. The critic-action NN structures have been utilized to implement the proposed ADP method, in which the critic NNs and the action NNs have been adopted to approximate optimal value functions and optimal control policies, respectively. Finally, two numerical simulations have been presented to illustrate the effectiveness of the proposed ADP method. Future research directions include model-free systems, communication delays, packet losses, and disturbances.

APPENDIX A
PROOF OF THEOREM 1

In order to analyze the equivalence between $J_{zi}$ and $J_{xi}$, we shall first build the relationship between $e_i(k)$ and $\delta_i(k)$. Rewrite (5) as

$$\begin{aligned}
e_i(k) &= \sum_{j\in N_i} a_{ij}\big(z_i(k) - z_j(k)\big) \\
&= \sum_{j\in N_i} a_{ij} \begin{bmatrix} x_i(k) - x_j(k) \\ u_i(k-1) - u_j(k-1) \\ \vdots \\ u_i(k-d) - u_j(k-d) \end{bmatrix} \\
&= \begin{bmatrix} \delta_i(k) \\ \bar{U}_i(k) \end{bmatrix}
\end{aligned} \tag{42}$$

where

$$\bar{U}_i(k) = \begin{bmatrix} \sum_{j\in N_i} a_{ij}\big(u_i(k-1) - u_j(k-1)\big) \\ \vdots \\ \sum_{j\in N_i} a_{ij}\big(u_i(k-d) - u_j(k-d)\big) \end{bmatrix} = d_i U_i(k) - \sum_{j\in N_i} a_{ij} U_j(k).$$

Then, we have

$$\begin{aligned}
e_i^T(k) M_{ii} e_i(k) &= \delta_i^T(k) M_{ii1} \delta_i(k) + \bar{U}_i^T(k) M_{ii2} \bar{U}_i(k) \\
&= \delta_i^T(k) M_{ii1} \delta_i(k) + \Big(d_i U_i(k) - \sum_{j\in N_i} a_{ij} U_j(k)\Big)^T M_{ii2} \Big(d_i U_i(k) - \sum_{j\in N_i} a_{ij} U_j(k)\Big) \\
&= \delta_i^T(k) M_{ii1} \delta_i(k) + U_i^T(k)\, d_i^2 T M_{ii2} U_i(k) - 2 U_i^T(k)\, d_i T M_{ii2} \sum_{j\in N_i} a_{ij} U_j(k) \\
&\quad + \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big)^T M_{ii2} \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big).
\end{aligned} \tag{43}$$

Thus, the utility function of (3) is given by

$$r_{zi} = \delta_i^T(k) M_{ii1} \delta_i(k) + U_i^T(k)\, d_i^2 T M_{ii2} U_i(k) + \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big)^T T M_{ii2} \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big)$$


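$$\begin{aligned}
&\quad - 2 U_i^T(k)\, d_i T M_{ii2} \sum_{j\in N_i} a_{ij} U_j(k) + u_i^T(k) L_{ii} u_i(k) \\
&= \delta_i^T(k) M_{ii1} \delta_i(k) + U_i^T(k) \big(d_i^2 T M_{ii2} + T L_{ii}\big) U_i(k) - 2 U_i^T(k)\, d_i T M_{ii2} \sum_{j\in N_i} a_{ij} U_j(k) \\
&\quad + \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big)^T T M_{ii2} \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big).
\end{aligned} \tag{44}$$

From (7), one has

$$r_{xi} = \delta_i^T(k) P_{ii} \delta_i(k) + U_i^T(k)\, d_i T_{i,11} d_i U_i(k) - 2 U_i^T(k)\, d_i T_{i,12} \sum_{j\in N_i} a_{ij} U_j(k) + \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big)^T T_{i,22} \Big(\sum_{j\in N_i} a_{ij} U_j(k)\Big).$$

When the following conditions:

$$P_{ii} = M_{ii1},\qquad T_{i,11} = T M_{ii2} + \frac{1}{d_i^2} T L_{ii},\qquad T_{i,12} = T M_{ii2},\qquad T_{i,22} = T M_{ii2}$$

are satisfied, we have

$$T_i = \begin{bmatrix} \dfrac{1}{d_i^2} I_1^T L_{ii} I_1 & 0 \\ 0 & T M_{ii2} \end{bmatrix} \ge 0.$$

Thus, it is straightforward to obtain $J_{xi} = J_{zi}$. This completes the proof.

where $\Delta\delta_i(k) = \delta_i(k+1) - \delta_i(k)$ and $\Delta\bar{U}_i(k) = \bar{U}_i(k+1) - \bar{U}_i(k)$.

According to (2), we have

$$\begin{aligned}
\Delta\delta_i(k) &= \delta_i(k+1) - \delta_i(k) \\
&= \sum_{j\in N_i} a_{ij}\big(x_i(k+1) - x_j(k+1)\big) - \delta_i(k) \\
&= \sum_{j\in N_i} a_{ij}\Big(\bar{A}_i x_i(k) + f_i(B_i, T) u_i - \big(\bar{A}_i x_j(k) + f_j(B_j, T) u_j\big)\Big) - \delta_i(k) \\
&= \big(\bar{A}_i - I\big)\delta_i(k) + d_i f_i(B_i, T) u_i - \sum_{j\in N_i} a_{ij} f_j(B_j, T) u_j
\end{aligned} \tag{47}$$

where $f_i(B_i, T) = B_i \int_{kT}^{(k+1)T} e^{A_i((k+1)T - t)}\,dt$ is bounded since $e^{A_i((k+1)T - t)}$ is continuous.

Then, one obtains

$$\begin{aligned}
&\Delta\delta_i(k)^T \big(\delta_i(k+1) + \delta_i(k)\big) \\
&= \Big(\big(\bar{A}_i - I\big)\delta_i(k) + d_i f_i(B_i, T) u_i - \sum_{j\in N_i} a_{ij} f_j(B_j, T) u_j\Big)^T \big(\delta_i(k+1) + \delta_i(k)\big) \\
&= \delta_i(k)^T \big(\bar{A}_i - I\big)^T \big(\delta_i(k+1) + \delta_i(k)\big) + d_i u_i^T f_i(B_i, T)^T \big(\delta_i(k+1) + \delta_i(k)\big) \\
&\quad - \sum_{j\in N_i} a_{ij} u_j^T f_j(B_j, T)^T \big(\delta_i(k+1) + \delta_i(k)\big) \\
&\le \Big(\tfrac{3}{2}\lambda_{\max}\big(\bar{A}_i - I\big) + d_i\Big) \|\delta_i(k)\|^2 + \Big(\tfrac{1}{2}\lambda_{\max}\big(\bar{A}_i - I\big) + d_i\Big) \|\delta_i(k+1)\|^2 + \frac{\sum_{j\in \bar{N}_i} a_{ij}}{2} \|f_j\|^2 \|u_j\|^2.
\end{aligned} \tag{48}$$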
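As a numerical illustration of the quantities appearing in (47), the following Python sketch discretizes one agent under a zero-order hold. It assumes the standard delay-free forms $\bar{A}_i = e^{A_i T}$ and $\int_0^T e^{A_i s}\,ds\,B_i$ for the input matrix, and it borrows $A_i$, $B_1$, and the 0.1 s sampling period from Example 1 only as sample data. The helper names `mat_mul`, `mat_exp`, and `zoh_discretize` are hypothetical and do not come from the paper, and the splitting of the input matrix caused by the delay $\tau_i(t)$ is not modeled here.

```python
# Zero-order-hold discretization sketch (pure Python, no external packages).
# Assumption: delay-free forms Ad = exp(A*T), Bd = (integral_0^T exp(A*s) ds) * B.


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_exp(M, squarings=12, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(M)
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for p in range(1, terms + 1):
        # term now holds scaled**p / p!
        term = [[v / p for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def zoh_discretize(A, B, T):
    """Return (Ad, Bd) read off from exp([[A, B], [0, 0]] * T) = [[Ad, Bd], [0, I]]."""
    n, m = len(A), len(B[0])
    aug = [[A[i][j] * T for j in range(n)] + [B[i][j] * T for j in range(m)]
           for i in range(n)]
    aug += [[0.0] * (n + m) for _ in range(m)]
    E = mat_exp(aug)
    return [row[:n] for row in E[:n]], [row[n:] for row in E[:n]]


# Sample data taken from Example 1 as printed (agent 1, sampling period 0.1 s).
A = [[-0.8, -30.0], [-50.0, -0.5]]
B1 = [[0.0], [1.0]]
Ad, Bd = zoh_discretize(A, B1, 0.1)

# Consistency check: A * Bd should equal (Ad - I) * B1.
lhs = mat_mul(A, Bd)
Ad_minus_I = [[Ad[i][j] - float(i == j) for j in range(2)] for i in range(2)]
rhs = mat_mul(Ad_minus_I, B1)
residual = max(abs(lhs[i][0] - rhs[i][0]) for i in range(2))
```

The closing check relies on the identity $A \int_0^T e^{As}\,ds = e^{AT} - I$, which holds for any square $A$, so `residual` should be at the level of floating-point round-off.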
A PPENDIX B
P ROOF OF T HEOREM 4 Thus, the first difference of Lie (k) can be rewritten as
Choose the local Lyapunov function candidate as
Lie (k)
Li (k) = Lie (k) + Lic (k) + Lia (k) (45) = ei (k + 1)T ei (k + 1) − ei (k)T ei (k)

where + 2i Vxi (δi (k)) + Vxi (δi (k + 1))

= δi (k)T (δi (k + 1) + δi (k)) + Ūi (k)T Ūi (k + 1) + Ūi (k)
Lie (k) = ei (k) ei (k) + i Vxi (δi (k)) + Vxi (δi (k + 1))
T
+ i −rxi (δi (k), ui (k), u−i (k))
Lic (k) = tr W̃ci (k)T W̃ci (k)
− ri (δi (k + 1), ui (k + 1), u−i (k + 1))
Lia (k) = tr W̃ai (k)T W̃ai (k) . x
3
≤ λmax Āi − I + di − i λmin (Pi ) δi (k) 2
The first difference of Lie (k) is given by 2

1
ei (k + 1)T ei (k + 1) − ei (k)T ei (k) + λmax Āi − I + di − i λmin (Pi ) δi (k + 1) 2
2
! δi (k + 1)
= δi (k + 1) T
Ūi (k + 1) T j∈N̄i aij − 2i λmin Qij
Ūi (k + 1) + fj 2 uj 2 . (49)
2
! δi (k)
− δi (k)T Ūi (k)T Select i such that
Ūi (k)
= δi (k + 1)T δi (k + 1) − δi (k)T δi (k) # $
2 λmax Āi − I + di
3
aij
+ Ūi (k + 1)T Ūi (k + 1) − Ūi (k)T Ūi (k) i > max , . (50)
λmin (Pi ) 2λmin Qij
= δi (k)T (δi (k + 1) + δi (k))

+ Ūi (k)T Ūi (k + 1) + Ūi (k) (46) Then, we have Lie (k) ≤ 0.
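The last equality in (46) is a difference-of-squares identity applied separately to the $\delta_i$ and $\bar{U}_i$ blocks of $e_i$. The short Python check below illustrates it on arbitrary random vectors; the dimensions, the seed, and the helper names `dot` and `rand_vec` are illustrative assumptions and are not taken from the paper.

```python
import random

random.seed(0)


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def rand_vec(n):
    """Random vector with entries in [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Arbitrary block sizes for delta_i and U_bar_i (illustrative only).
n_delta, n_u = 2, 2
delta_k, delta_k1 = rand_vec(n_delta), rand_vec(n_delta)
Ubar_k, Ubar_k1 = rand_vec(n_u), rand_vec(n_u)

# e_i stacks delta_i on top of U_bar_i, as in (42); list concatenation does the stacking.
e_k = delta_k + Ubar_k
e_k1 = delta_k1 + Ubar_k1

# Left-hand side of (46): e(k+1)^T e(k+1) - e(k)^T e(k).
lhs = dot(e_k1, e_k1) - dot(e_k, e_k)

# Right-hand side of (46): first differences times sums, block by block.
d_delta = [a - b for a, b in zip(delta_k1, delta_k)]
d_Ubar = [a - b for a, b in zip(Ubar_k1, Ubar_k)]
s_delta = [a + b for a, b in zip(delta_k1, delta_k)]
s_Ubar = [a + b for a, b in zip(Ubar_k1, Ubar_k)]
rhs = dot(d_delta, s_delta) + dot(d_Ubar, s_Ubar)

gap = abs(lhs - rhs)
```

Because $(a - b)^T(a + b) = a^T a - b^T b$ for any real vectors, `gap` should vanish up to round-off for any choice of inputs.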

1
Next, the first difference of Lic (k) can be represented as × W̃ai (k)T ϕ(yi (k)) + di Lii−1 × GTi (k)∇φ T (yi (k))W̃ci
2

Lic (k) 1 −1 T 1
+ di Lii Gi (k)∇ i (k) + ζi
= tr W̃ci (k + 1)T W̃ci (k + 1) − tr W̃ci (k)T W̃ci (k) 2 2
%% 3αai − 2di αai − 5αai
2
φ(yi (k)) φ(yi (k))T ≤− 2 ϕM
2
W̃ai 2
= tr W̃ci (k) − αci W̃ci (k) 2 ϕM +1
φ(yi (k))T φ(yi (k)) + 1
&&T 2di αai + 4αai
2 + 2d α 2 + 2d 2 α 2
Lii−1 Gi ∇φM
i ai
φ(yi (k)) i (k) + 2 i ai 2
W̃ci 2

− αci W̃ci (k) 4 ϕM +1

φ(yi (k))T φ(yi (k)) + 1 4di αai + 4αai
2 + d α 2 + 4d 2 α 2
Lii−1 Gi
i ai
+ 2 i ai 2
∇ M
2
φ(yi (k)) φ(yi (k)) T
8 ϕM + 1
− αci W̃ci (k) − αci
φ(yi (k))T φ(yi (k)) + 1 2αai + 3αai
2 + d α2
+ 2
i
ai ζM
2
. (53)
φ(yi (k)) i (k) 4 ϕM + 1
× − tr W̃ci (k)T W̃ci (k)
φ(yi (k)) φ(yi (k)) + 1
T
% & Thus, Lia (k) ≤ 0 if the following inequality holds:
2αci W̃ci (k)T φ(yi (k)) φ(yi (k))T W̃ci (k) '
= −tr εi
φ(yi (k))T φ(yi (k)) + 1 W̃ai (k) ≥ bai (54)
% & 2 ϕ2
4 3αai − 2di αai − 5αai
2αci W̃ci (k)T φ(yi (k)) i (k) M
− tr
φ(yi (k))T φ(yi (k)) + 1 where
% 2 &
αci
2 W̃ (k)T
ci φ(yi (k)) φ(yi (k))T W̃ci (k) εi = 4di αai + 4αai
2
+ di αai
2
+ 4di2 αai2
Lii−1 Gi 2 ∇ M 2
+ tr 2
φ(yi (k))T φ(yi (k)) + 1
% & + 2 2αai + 3αai 2
+ di αai
2
ζM
2
2 W̃ (k)T φ(y (k)) φ(y (k))T φ(y (k))
2αci ci i i i i (k)
− tr 2 + 4 di αai + 2αai2
+ di αai
2
+ di2 αai
2
Lii−1 Gi ∇φM 2 bci .
φ(yi (k))T φ(yi (k)) + 1
% T &
αci
2
i (k) φ(yi (k))T φ(yi (k)) i (k) Combining (50), (52), and (54) yield
+ tr 2
φ(yi (k))T φ(yi (k)) + 1 Li (k) = Lie (k) + Lic (k) + Lia (k) ≤ 0 (55)
αci (3 φm − 2αci φM )2 W̃ci (k) 2
≤− which means that the local neighbor consensus error
φM
2 +1
system (6) is asymptotically stable.
αci (2αci − 1) 2
From (42), one obtains
+ M
(51)
φM
2 +1
δi (k) ≤ ei (k) . (56)
where φm ≤ φi (yi (k)) ≤ φM and i (k) ≤ M is
ensured by the PE condition. Based on Lyapunov extension theorem, the local consensus
If W̃ci (k) satisfies error ei , δi and the weights estimation errors of the critic and
' actor NNs are UUB, respectively.
(2αci − 1) M
2
Next, we will prove ui (k) − ûi (k) ≤ ςi as k → ∞.
W̃ci (k) ≥ bci . (52)
3 φm − 2αci φM 2 Using (35) and (36), we have
Then, we get Lic (k) < 0. ui (k) − ûi (k) = W̃ci ϕ(yi (k)) + εi (k)
Subsequently, the first difference of Lia (k) is expressed as ≤ bci ϕM + ζM ςi . (57)
Lia (k) Thus, we have the approximated optimal consensus control

= tr W̃ai (k + 1)T W̃ai (k + 1) − tr W̃ai (k)T W̃ai (k) policies ûi converge to its target values ui . This completes the
%
2αai W̃ai (k)T ϕ(yi (k)) proof.
= tr − W̃ai (k)T ϕ(yi (k))
ϕ(yi (k))T ϕ(yi (k)) + 1
1 1 R EFERENCES
+ di Lii−1 GTi (k)∇φ T (yi (k))W̃ci + di Lii−1 GTi (k)∇ i (k)
2 2
&
[1] A. Abdessameud and A. Tayebi, “Formation control of VTOL unmanned
1 T aerial vehicles with communication delays,” Automatica, vol. 47, no. 11,
+ ζi pp. 2383–2394, 2011.
2 [2] L. Alvarez, R. Horowitz, and C. V. Toy, “Multi-destination traffic flow
% control in automated highway systems,” Transp. Res. C Emerg. Technol.,
α 2 ϕ(yi (k))T ϕ(yi (k)) vol. 11, no. 1, pp. 1–28, 2003.
+ tr ai 2
ϕ(yi (k))T ϕ(yi (k)) + 1 [3] Q.-Y. Yu, W.-X. Meng, M.-C. Yang, L.-M. Zheng, and Z.-Z. Zhang,
“Virtual multi-beamforming for distributed satellite clusters in space
1
× W̃ai (k)T × ϕ(yi (k)) + di Lii−1 GTi (k)∇φ T (yi (k))W̃ci
information networks,” IEEE Wireless Commun., vol. 23, no. 1,
2 pp. 95–101, Feb. 2016.
[4] S. Weng, D. Yue, Z. Sun, and L. Xiao, “Distributed robust finite-time
1 1 T
+ di Lii−1 × GTi (k)∇ i (k) + ζi attitude containment control for multiple rigid bodies with uncertainties,”
2 2 Int. J. Robust Nonlin. Control, vol. 25, no. 15, pp. 2561–2581, 2014.


[5] X. Ge and Q.-L. Han, “Distributed formation control of networked multi-agent systems using a dynamic event-triggered communication mechanism,” IEEE Trans. Ind. Electron., to be published, doi: 10.1109/TIE.2017.2701778.
[6] J. Xi, Y. Yu, G. Liu, and Y. Zhong, “Guaranteed-cost consensus for singular multi-agent systems with switching topologies,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 5, pp. 1531–1542, May 2014.
[7] X. Ge and Q.-L. Han, “Consensus of multiagent systems subject to partially accessible and overlapping Markovian network topologies,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2016.2570860.
[8] G. Wen, C. L. P. Chen, Y.-J. Liu, and Z. Liu, “Neural network-based adaptive leader-following consensus control for a class of nonlinear multiagent state-delay systems,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2016.2608499.
[9] W. He et al., “Leader-following consensus of nonlinear multiagent systems with stochastic sampling,” IEEE Trans. Cybern., vol. 47, no. 2, pp. 327–338, Feb. 2017.
[10] G. Guo, L. Ding, and Q.-L. Han, “A distributed event-triggered transmission strategy for sampled-data consensus of multi-agent systems,” Automatica, vol. 50, no. 5, pp. 1489–1496, 2014.
[11] C. Hua, Y. Li, and X. Guan, “Leader-following consensus for high-order nonlinear stochastic multiagent systems,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2017.2651019.
[12] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proc. IEEE, vol. 95, no. 1, pp. 215–233, Jan. 2007.
[13] Y. Cao, W. Yu, W. Ren, and G. Chen, “An overview of recent progress in the study of distributed multi-agent coordination,” IEEE Trans. Ind. Informat., vol. 9, no. 1, pp. 427–438, Feb. 2013.
[14] X. Ge, F. Yang, and Q.-L. Han, “Distributed networked control systems: A brief overview,” Inf. Sci., vol. 380, pp. 117–131, Feb. 2017.
[15] Z.-H. Guan, B. Hu, M. Chi, D.-X. He, and X.-M. Cheng, “Guaranteed performance consensus in second-order multi-agent systems with hybrid impulsive control,” Automatica, vol. 50, no. 9, pp. 2415–2418, 2014.
[16] Z. Wang, J. Xi, Z. Yu, and G. Liu, “Guaranteed cost consensus for multi-agent systems with time delays,” J. Frankl. Inst., vol. 352, no. 9, pp. 3612–3627, 2014.
[17] K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, “Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality,” Automatica, vol. 48, no. 8, pp. 1598–1611, 2012.
[18] M. I. Abouheaf, F. L. Lewis, K. G. Vamvoudakis, S. Haesaert, and R. Babuska, “Multi-agent discrete-time graphical games and reinforcement learning solutions,” Automatica, vol. 50, no. 12, pp. 3038–3053, 2014.
[19] H. Zhang, J. Zhang, G.-H. Yang, and Y. Luo, “Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming,” IEEE Trans. Fuzzy Syst., vol. 23, no. 1, pp. 152–163, Feb. 2015.
[20] P. J. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” in Handbook of Intelligent Control. New York, NY, USA: Van Nostrand Reinhold, 1992.
[21] A. Al-Altamimi, F. L. Lewis, and M. Abu-khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[22] K. G. Vamvoudakis and F. L. Lewis, “Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, no. 5, pp. 878–888, 2010.
[23] F.-Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: An introduction,” IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39–47, May 2009.
[24] H.-G. Zhang, X. Zhang, Y.-H. Luo, and J. Yang, “An overview of research on adaptive dynamic programming,” Acta Autom. Sinica, vol. 39, no. 4, pp. 303–311, 2013.
[25] W. Qiao, R. G. Harley, and G. K. Venayagamoorthy, “Coordinated reactive power control of a large wind farm and a statcom using heuristic dynamic programming,” IEEE Trans. Energy Convers., vol. 24, no. 2, pp. 493–503, Jun. 2009.
[26] D. P. Bertsekas, M. L. Homer, D. A. Logan, S. D. Patek, and N. R. Sandell, “Missile defense and interceptor allocation by neuro-dynamic programming,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 30, no. 1, pp. 42–51, Jan. 2000.
[27] D. Liu, H. Javaherian, O. Kovalenko, and T. Huang, “Adaptive critic learning techniques for engine torque and air–fuel ratio control,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 988–993, Aug. 2008.
[28] X. Ge, Q.-L. Han, and F. Yang, “Event-based set-membership leader-following consensus of networked multi-agent systems subject to limited communication resources and unknown-but-bounded noise,” IEEE Trans. Ind. Electron., vol. 64, no. 6, pp. 5045–5054, Jun. 2017.
[29] C. Peng, D. Yue, and M.-R. Fei, “A higher energy-efficient sampling scheme for networked control systems over IEEE 802.15.4 wireless networks,” IEEE Trans. Ind. Informat., vol. 12, no. 5, pp. 1766–1774, Oct. 2016.
[30] X.-M. Zhang, Q.-L. Han, and X. Yu, “Survey on recent advances in networked control systems,” IEEE Trans. Ind. Informat., vol. 12, no. 5, pp. 1740–1752, Oct. 2016.
[31] C. L. P. Chen, G.-X. Wen, Y.-J. Liu, and F.-Y. Wang, “Adaptive consensus control for a class of nonlinear multiagent time-delay systems using neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 6, pp. 1217–1226, Jun. 2014.
[32] Z. Zuo, C. Wang, and Z. Ding, “Robust consensus control of uncertain multi-agent systems with input delay: A model reduction method,” Int. J. Robust Nonlin. Control, vol. 27, no. 11, pp. 1874–1894, Jul. 2017, doi: 10.1002/rnc.3642.
[33] X.-M. Zhang, Q.-L. Han, and B.-L. Zhang, “An overview and deep investigation on sampled-data-based event-triggered control and filtering for networked systems,” IEEE Trans. Ind. Informat., vol. 13, no. 1, pp. 4–16, Feb. 2017.
[34] M. B. G. Cloosterman, N. V. D. Wouw, W. P. M. H. Heemels, and H. Nijmeijer, “Stability of networked control systems with uncertain time-varying delays,” IEEE Trans. Autom. Control, vol. 54, no. 7, pp. 1575–1580, Jul. 2009.
[35] H. Zhang, D. Liu, Y. Luo, and D. Wang, “Adaptive dynamic programming for control,” Commun. Control Eng., vol. 54, no. 45, pp. 6019–6022, 2013.
[36] T. L. Nguyen, “Adaptive dynamic programming-based design of integrated neural network structure for cooperative control of multiple MIMO nonlinear systems,” Neurocomputing, vol. 237, pp. 12–24, May 2017.
[37] S. Jagannathan, Neural Network Control of Nonlinear Discrete-Time Systems. Boca Raton, FL, USA: CRC Press, 2006.
[38] X. Zhou, P. Shi, C.-C. Lim, C. Yang, and W. Gui, “Event based guaranteed cost consensus for distributed multi-agent systems,” J. Frankl. Inst., vol. 352, no. 9, pp. 3546–3563, 2015.
[39] H. Xu, S. Jagannathan, and F. L. Lewis, “Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses,” Automatica, vol. 48, no. 6, pp. 1017–1030, 2012.
[40] G. C. Walsh, Y. Hong, and L. G. Bushnell, “Stability analysis of networked control systems,” IEEE Trans. Control Syst. Technol., vol. 10, no. 3, pp. 438–446, May 2002.

Huaipin Zhang received the B.S. degree from the School of Mathematical Sciences, Chuzhou University, Chuzhou, China, in 2011. He is currently pursuing the Ph.D. degree in control science and engineering with the Huazhong University of Science and Technology, Wuhan, China.
His current research interests include nonlinear systems, cooperative control of multiagent systems, adaptive dynamic programming, and event-triggered control scheme.

Dong Yue (SM’08) received the Ph.D. degree from the South China University of Technology, Guangzhou, China, in 1995.
He is currently a Professor and the Dean of the Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, China, and also a Changjiang Professor with the Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, China. He has published over 100 papers in international journals, domestic journals, and international conferences. His current research interests include analysis and synthesis of networked control systems, multiagent systems, optimal control of power systems, and Internet of Things.
Dr. Yue is currently an Associate Editor of the IEEE Control Systems Society Conference Editorial Board and the International Journal of Systems Science.

Wei Zhao received the B.S. degree in mathematics and applied mathematics from Henan University, Kaifeng, China, in 2011. She is currently pursuing the Ph.D. degree in mechanical and electronic engineering from the Huazhong University of Science and Technology, Wuhan, China.
Her current research interests include multiagent systems, neural-network-based control, adaptive dynamic programming, and event-triggered control and optimal control.

Songlin Hu received the Ph.D. degree from the Huazhong University of Science and Technology, Wuhan, China, in 2012.
Since 2013, he has been with the College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China, where he is currently an Associate Professor with the Institute of Advanced Technology. His current research interests include networked/event-triggered control, T–S fuzzy systems, and time delay systems.

Chunxia Dou received the B.S. and M.S. degrees in automation from Northeast Heavy Machinery Institute, Qiqihaer, China, in 1989 and 1994, respectively, and the Ph.D. degree in measurement technology and instrumentation from the Institute of Electrical Engineering, Yanshan University, Qinhuangdao, China, in 2005.
In 2010, she joined the Department of Engineering, Peking University, Beijing, China, where she was a Post-Doctoral Fellow for two years. Since 2005, she has been a Professor with the Institute of Electrical Engineering, Yanshan University. Her current research interests include multiagent-based control, event-triggered hybrid control, distributed coordinated control, and multimode switching control and their applications in power systems, microgrids, and smart grids.
