
2020 IEEE 9th Data Driven Control and Learning Systems Conference

November 20-22, 2020, Liuzhou, China

Distributed Optimal Coordination Control for Continuous-Time Nonlinear Multi-Agent Systems With Input Constraints

Yunhong Deng1, Jun Xiao1, Qinglai Wei1,2
1. The authors are with the University of Chinese Academy of Sciences, Beijing 100049, P. R. China
E-mail: [email protected], [email protected], [email protected]
2. This work was supported in part by the National Natural Science Foundation of China under Grants 61722312, 61533017, and by the European Union's Horizon 2020 research and innovation programme under grant agreement 739551 (KIOS CoE)

Abstract: This paper is concerned with the optimal coordination control problem for nonlinear multi-agent systems (MASs) with constraints on the control inputs. The adaptive dynamic programming (ADP) algorithm uses policy iteration to solve the coupled Hamilton-Jacobi equations. First, a suitable non-quadratic functional is introduced into the cost function to transform the constrained problem into an optimization problem. Second, a distributed control law is designed for each agent so that the cost functions of the MAS converge to a Nash equilibrium. Next, a convergence analysis shows that the iterative cost functions of the nonlinear multi-agent system are convergent. Neural networks (NNs) are used to approximate the cost functions for the calculation of the control laws. Finally, simulation results show the effectiveness of the coordination control algorithm.
Key Words: Adaptive dynamic programming (ADP), multi-agent systems, Nash equilibrium, reinforcement learning, optimal control

1 Introduction

Distributed cooperative optimal control of multi-agent systems (MASs) has made great progress in recent years [2-4], benefiting from extensive applications such as unmanned aerial vehicles (UAVs) [5] and Euler-Lagrange systems [6]. The application of ADP is attractive because it not only drives each member of the coordinated group to consensus, but also minimizes the performance index function [3, 7, 8], which opens up further research on a variety of complex MASs. ADP has been extensively utilized for solving the consensus problem of multi-agent graphical games over the past decade [9-13]. Each member receives limited information from its neighbors, and the final state is determined by the whole communicating group of agents.

As an intelligent control algorithm, adaptive dynamic programming has clear advantages in solving the optimal control problem of nonlinear systems. In [14], the authors proved the convergence of the HDP algorithm based on value iteration for nonlinear systems, providing a theoretical basis for the widespread application of ADP algorithms. To solve the infinite-horizon adaptive dynamic programming problem, the authors in [15] proposed a dynamic programming method for discrete-time systems in which ADP can reach any expected accuracy by choosing appropriate error bounds. In [16], a dual iterative ADP was developed for nonlinear systems with time delays, where the performance index function and the system state are updated in every iteration. ADP has also been used for energy management, to manage or decrease energy losses [17-19]. The traditional three-network structure has also been adapted to different application scenarios [20, 21]. In [21], a critic-only structure, different from the previous three-network architecture, is adopted for solving zero-sum games, with least-squares updating laws used to obtain the final weights. ADP made a major breakthrough in solving coupled Hamilton-Jacobi-Bellman (HJB) equations by combining reinforcement learning, neural networks and dynamic programming, and it has become a hot topic in the current development of automation technology. ADP algorithms include two types, value iteration and policy iteration [22-28]: value iteration needs a large amount of data to train the model, while policy iteration needs an admissible initial control.

In practical engineering applications, saturation and nonlinearity are hardly avoidable. The state of a nonlinear system is time-varying, so it is necessary to monitor the state continuously and change the control strategy according to it. With a distributed control law, each agent communicates only with neighboring agents and only needs to store and process a small amount of information; such a system is easy to implement, has low equipment requirements and strong stability. As the application fields of distributed control grow, its advantages become evident: it reduces system complexity and control cost, and improves robustness and disturbance rejection. As an intelligent control algorithm, adaptive dynamic programming enriches this theoretical research and contributes to the field of artificial intelligence.

If the limit of the controller is not considered, the closed-loop system may become unstable. Considering this problem, an optimal cooperative control method for input-constrained continuous-time nonlinear multi-agent systems is developed in this paper. To overcome the problem brought by input saturation, an appropriate non-quadratic functional is introduced into the value function. The performance index functions are approximated by critic NNs, from which the control policies are computed. The convergence of the iterative algorithm is proved, and simulation results are provided to illustrate the effectiveness of the developed algorithm.

This work is supported by the National Natural Science Foundation (NNSF) of China under Grant 00000000.

2 Problem Formulation

2.1 Notations

A communication topology is used to represent the information flow of the multi-agent system. The directed communication topology is defined as $G=\{V,\varepsilon\}$, composed of nodes $V=\{v_0,v_1,\dots,v_N\}$ and edges $\varepsilon=\{(v_i,v_j):v_i,v_j\in V\}\subseteq V\times V$. $N_i=\{j\,|\,(v_i,v_j)\in G\}$ is defined as the neighbour set of agent $i$, $A=[a_{ij}]$ is the weighted adjacency matrix, and $\bar N_i=\{N_i,i\}$. If $a_{ij}>0$, there is a path from node $j$ to node $i$ and $j$ is called a neighbour of $i$; otherwise $a_{ij}=0$. The in-degree matrix is $D=\mathrm{diag}\{d_i\}\in\mathbb{R}^{N\times N}$, $i=1,2,\dots,N$, where $d_i=\sum_{j=1}^{N}a_{ij}$ is the in-degree of agent $i$. Graph $G$'s Laplacian matrix is given by $L=D-A$, and $L\ge 0$.

2.2 Dynamic systems and coordination error

Consider a multi-agent system with $N+1$ nodes. The dynamic equation of the $i$th follower is expressed as
$$\dot x_i=f(x_i)+g_i(x_i)u_i,\quad i\in\Omega=\{1,\dots,N\} \qquad (1)$$
where $x_i\in\mathbb{R}^n$ represents the state of the $i$th follower and $u_i\in\mathbb{R}^{m_i}$ is the control input of the $i$th follower with $\alpha_i\le u_i\le\beta_i$, $f(x_i)\in\mathbb{R}^n$, $g_i(x_i)\in\mathbb{R}^{n\times m_i}$, $f(0)=0$, $f+g_iu_i$ is Lipschitz continuous and contains the origin, and $\|g_i(x_i)\|\le\beta_i$.

The leader's dynamic system is represented by
$$\dot x_0=f(x_0) \qquad (2)$$
with $x_0\in\mathbb{R}^n$. The leader issues commands that the other agents follow, and the leader does not receive information from the other agents.

In order to realize the synchronization of all agents, the local neighborhood cooperative error [4, 12] of each follower can be defined as
$$\delta_i=\sum_{j\in N_i}a_{ij}(x_i-x_j)+c_i(x_i-x_0) \qquad (3)$$
where $c_i\ge 0$ is a constant. According to [4], taking the derivative of (3) gives
$$\dot\delta_i=\sum_{j\in N_i}a_{ij}(\dot x_i-\dot x_j)+c_i(\dot x_i-\dot x_0)=\sum_{j\in\bar N_i}(l_{ij}+c_j)\big(\tilde f_j+g_ju_j\big) \qquad (4)$$
where $\tilde f_j=f(x_j)-f(x_0)$.

3 Coordination controller design

In this section we give the control policy of the MASs. The performance index function of a saturated nonlinear system is defined in [29]; the monotonically increasing function $\varphi^{-1}(\cdot)$ is used as one part of the performance index function. Referring to the analysis of performance index functions for nonlinear systems in [29-31], the local performance index function of node $i$ can be defined as
$$J_i\big(\delta_i(0),u_i,u_{-i}\big)=\int_0^{\infty}U_i(\delta_i,u_i,u_{-i})\,dt=\int_0^{\infty}\Big(\delta_i^TQ_{ii}\delta_i+2\sum_{j\in\bar N_i}\int_0^{u_j}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv\Big)dt \qquad (5)$$
where $\varphi(\cdot)$ is a monotonically increasing, continuously differentiable, bounded odd function defining a one-to-one mapping with $\varphi(0)=0$, and $Q_{ii}>0$, $R_{ii}>0$, $R_{ij}\ge 0$ are constant matrices.

Remark 1. Any control strategy needs to make the system satisfy the following conditions:
(1) the closed-loop system (6) is stable;
(2) the performance index function is finite.

The local value function of agent $i$ can be defined as
$$V_i\big(\delta_i(t),u_i,u_{-i}\big)=\int_t^{\infty}U_i(\delta_i,u_i,u_{-i})\,d\tau=\int_t^{\infty}\Big(\delta_i^TQ_{ii}\delta_i+2\sum_{j\in\bar N_i}\int_0^{u_j}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv\Big)d\tau \qquad (6)$$
while the initial control law is admissible. According to (6), we can obtain the Hamilton-Jacobi (HJ) equation
$$H_i(\delta_i,V_{\delta_i},u_i,u_{-i})=U_i(\delta_i,u_i,u_{-i})+V_{\delta_i}^T\dot\delta_i=\delta_i^TQ_{ii}\delta_i+2\sum_{j\in\bar N_i}\int_0^{u_j}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv+V_{\delta_i}^T\Big(\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_ju_j\big)\Big)=0 \qquad (7)$$
where $V_{\delta_i}=\partial V_i/\partial\delta_i$ is the derivative of the local value function of agent $i$ with respect to $\delta_i$.

The set of control strategies $\{u_1^*,u_2^*,\dots,u_N^*\}$ constitutes a Nash equilibrium only when $J_i^*=J_i\{u_1^*,u_2^*,\dots,u_i^*,\dots,u_N^*\}\le J_i\{u_1^*,u_2^*,\dots,u_i,\dots,u_N^*\}$ for all $i\in N$.

$V_i^*(\delta_i)$ is the local optimal performance index function, and $V_i^*(\delta_i)$ satisfies the HJ equation
$$\min_{u_i}H_i(\delta_i,V_{\delta_i}^*,u_i,u_{-i})=0 \qquad (8)$$
Taking the derivative of $H_i$ with respect to $u_i$ gives
$$2\big(\varphi^{-1}(u_i)\big)^TR_{ii}+V_{\delta_i}^T(l_{ii}+b_{ij})g_i=0 \qquad (9)$$
Then we obtain
$$u_i^*=\arg\min_{u_i}H_i(\delta_i,V_{\delta_i}^*,u_i,u_{-i})=-\varphi\Big(\tfrac12(d_i+b_i)R_{ii}^{-1}g_i^TV_{\delta_i}^*\Big) \qquad (10)$$
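As an illustration of the notation in Section 2, the following short Python sketch (not from the paper; the graph weights and states are made-up placeholders) builds the weighted adjacency matrix, the in-degree matrix and the Laplacian $L=D-A$, and evaluates the local neighborhood cooperative error $\delta_i$ of (3) for one follower.

```python
import numpy as np

# Hypothetical 3-follower graph: A[i, j] > 0 means follower i receives information from follower j.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
c = np.array([1.0, 0.0, 0.0])      # pinning gains c_i to the leader (only follower 1 sees the leader)

D = np.diag(A.sum(axis=1))         # in-degree matrix D = diag{d_i}
L = D - A                          # graph Laplacian L = D - A

# Made-up states: x[i] is follower i's state, x0 is the leader's state (n = 2 here).
x = np.array([[0.4, -0.4], [0.8, -0.8], [0.65, -0.65]])
x0 = np.array([-0.6, 0.6])

def local_error(i, x, x0, A, c):
    """Local neighborhood cooperative error delta_i of Eq. (3)."""
    delta = c[i] * (x[i] - x0)
    for j in range(len(x)):
        delta += A[i, j] * (x[i] - x[j])
    return delta

print(L)
print(local_error(0, x, x0, A, c))
```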
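To make the role of the non-quadratic integrand in (5) and of the constrained control law (10) concrete, here is a minimal numerical sketch. It assumes, as in the simulation section, $\varphi(\cdot)=0.5\tanh(\cdot)$ (so $\varphi^{-1}(v)=\mathrm{atanh}(v/0.5)$), a scalar $R_{ii}$, and a made-up value-function gradient; the quadrature step is an implementation choice, not part of the paper.

```python
import numpy as np

SAT = 0.5                                   # saturation level: phi(.) = 0.5 * tanh(.)

def phi(s):
    return SAT * np.tanh(s)

def phi_inv(v):
    return np.arctanh(np.clip(v / SAT, -0.999999, 0.999999))

def nonquadratic_penalty(u, R=1.0, steps=200):
    """2 * integral_0^u phi^{-1}(v) R dv, the input penalty inside Eq. (5) (scalar case)."""
    v = np.linspace(0.0, u, steps)
    return 2.0 * np.trapz(phi_inv(v) * R, v)

def constrained_policy(grad_V, g_i, d_i, b_i, R_ii=1.0):
    """u* = -phi(0.5 * (d_i + b_i) * R_ii^{-1} * g_i^T * grad_V), Eq. (10), scalar input."""
    return -phi(0.5 * (d_i + b_i) / R_ii * float(g_i @ grad_V))

# Made-up values just to exercise the formulas.
grad_V = np.array([0.3, -0.2])              # placeholder for V_{delta_i}
g_i = np.array([0.0, 1.0])                  # input channel of follower i
u_star = constrained_policy(grad_V, g_i, d_i=1.0, b_i=1.0)
print(u_star, nonquadratic_penalty(u_star))
```

Because $\varphi$ is bounded by the saturation level, the policy returned by (10) automatically satisfies the input constraint, which is exactly why the non-quadratic penalty is introduced.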
4 Policy iteration and its convergence

In this section, a policy iteration is proposed to solve the coupled HJB equations, and its convergence is proved. Each iteration contains two parts, policy evaluation and policy improvement, and the two steps are repeated until the policy remains unchanged.

Algorithm 1. Policy iteration for constrained nonlinear multi-agent systems
Step 1: Give an admissible initial control policy $u_1^{(0)},u_2^{(0)},\dots,u_N^{(0)}$, $i\in\Omega$.
Step 2: Given the control policies of the $N$ agents $u_1^{(l)},u_2^{(l)},\dots,u_N^{(l)}$, calculate $V_i^{(l)}$, the value function of agent $i$ at the $l$th policy iteration, from
$$H_i\big(\delta_i,V_{\delta_i}^{(l)},u_i^{(l)},u_{-i}^{(l)}\big)=0 \qquad (11)$$
Step 3: Update the control policy according to (10):
$$u_i^{(l+1)}=-\varphi\Big(\tfrac12(d_i+b_i)R_{ii}^{-1}g_i^TV_{\delta_i}^{(l)}\Big) \qquad (12)$$
Go to Step 2.

In the following, the convergence property is analyzed.

Theorem 1. In the multi-agent distributed optimal control problem with input saturation, suppose agent $i$ updates its control strategy while the other agents update to the optimal control in the iteration. Then the control policy and value function of agent $i$ converge to their respective optimal values: $V_i^{(l)}(\delta_i)\to V_i^*(\delta_i)$, $u_i^{(l)}\to u_i^*$.

Proof. According to (6),
$$\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)=\big(V_{\delta_i}^{(l)}\big)^T\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\tilde f_j+\big(V_{\delta_i}^{(l)}\big)^T\sum_{j\in\bar N_i}(l_{ij}+b_{ij})g_ju_j^{(l+1)} \qquad (13)$$
Then, referring to (7) and (13), we can easily get
$$\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)=-\delta_i^TQ_{ii}\delta_i-2\sum_{j\in\bar N_i}\int_0^{u_j^{(l)}}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv+\big(V_{\delta_i}^{(l)}\big)^T\sum_{j\in\bar N_i}(l_{ij}+b_{ij})g_j\big(u_j^{(l+1)}-u_j^{(l)}\big) \qquad (14)$$
According to (14) and (12), we can get
$$\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)=-\delta_i^TQ_{ii}\delta_i+2\,\varphi^{-T}\big(u_i^{(l+1)}\big)R_{ii}\big(u_i^{(l)}-u_i^{(l+1)}\big)-2\int_0^{u_i^{(l)}}\varphi^{-T}(v)R_{ii}\,dv-2\sum_{j\in N_i}\int_0^{u_j^{(l)}}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv \qquad (15)$$
$R_{ii}$ and $R_{ij}$ are symmetric positive definite matrices. Referring to [29], they can be represented as $R_{ii}=\Lambda_i\Sigma_i\Lambda_i$ and $R_{ij}=\Lambda_j\Sigma_j\Lambda_j$, where $\Sigma_i$ and $\Sigma_j$ are diagonal matrices containing the singular values of $R_{ii}$ and $R_{ij}$, and $\Lambda_i$, $\Lambda_j$ are orthogonal symmetric matrices. Substituting into (15) gives
$$\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)=-\delta_i^TQ_{ii}\delta_i+2\Big[\varphi^{-T}\big(u_i^{(l+1)}\big)\Lambda_i\Sigma_i\Lambda_i\big(u_i^{(l)}-u_i^{(l+1)}\big)\Big]-2\int_0^{u_i^{(l)}}\big(\varphi^{-1}(v)\big)^T\Lambda_i\Sigma_i\Lambda_i\,dv-2\sum_{j\in N_i}\int_0^{u_j^{(l)}}\big(\varphi^{-1}(v)\big)^T\Lambda_j\Sigma_j\Lambda_j\,dv \qquad (16)$$
Utilizing the coordinate transformations $u_i=\Lambda_i^{-1}z_i$ and $u_j=\Lambda_j^{-1}z_j$ and substituting into (16),
$$\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)=-\delta_i^TQ_{ii}\delta_i+2\Big[\varphi^{-T}\big(\Lambda_i^{-1}z_i^{(l+1)}\big)\Lambda_i\Sigma_i\Lambda_i\big(\Lambda_i^{-1}z_i^{(l)}-\Lambda_i^{-1}z_i^{(l+1)}\big)\Big]-2\int_0^{z_i^{(l)}}\big(\varphi^{-1}(\Lambda_i^{-1}\zeta)\big)^T\Lambda_i\Sigma_i\Lambda_i\Lambda_i^{-1}\,d\zeta-2\sum_{j\in N_i}\int_0^{z_j^{(l)}}\big(\varphi^{-1}(\Lambda_j^{-1}\zeta)\big)^T\Lambda_j\Sigma_j\Lambda_j\Lambda_j^{-1}\,d\zeta$$
$$=-\delta_i^TQ_{ii}\delta_i+2\,\xi^T\big(z_i^{(l+1)}\big)\Sigma_i\big(z_i^{(l)}-z_i^{(l+1)}\big)-2\int_0^{z_i^{(l)}}\xi^T(\zeta)\Sigma_i\,d\zeta-2\sum_{j\in N_i}\int_0^{z_j^{(l)}}\xi^T(\zeta)\Sigma_j\,d\zeta \qquad (17)$$
where $\xi^T(z_i)=\varphi^{-1}\big(\Lambda_i^{-1}z_i\big)^T\Lambda_i$ and $\xi^T(z_j)=\varphi^{-1}\big(\Lambda_j^{-1}z_j\big)^T\Lambda_j$. Written componentwise,
$$\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)=-\delta_i^TQ_{ii}\delta_i+2\sum_{k=1}^{m}\Sigma_{i(kk)}\Big[\xi^T\big(z_{i(k)}^{(l+1)}\big)\big(z_{i(k)}^{(l)}-z_{i(k)}^{(l+1)}\big)-\int_0^{z_{i(k)}^{(l)}}\xi^T(\zeta_k)\,d\zeta_k\Big]-2\sum_{k=1}^{m}\sum_{j\in N_i}\Sigma_{j(kk)}\int_0^{z_{j(k)}^{(l)}}\xi^T(\zeta_k)\,d\zeta_k \qquad (18)$$
Since $R_{ii}$ and $R_{ij}$ are symmetric positive definite matrices, their singular values $\Sigma_{i(kk)}$ and $\Sigma_{j(kk)}$ are always positive. From the geometric picture, if $\xi(z_k)$ is monotone, then
$$\xi^T\big(z_{i(k)}^{(l+1)}\big)\big(z_{i(k)}^{(l)}-z_{i(k)}^{(l+1)}\big)-\int_0^{z_{i(k)}^{(l)}}\xi^T(\zeta_k)\,d\zeta_k<0 \qquad (19)$$

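The policy-evaluation / policy-improvement loop of Algorithm 1 can be sketched as follows. This is an illustrative skeleton, not the authors' implementation: `evaluate_value` stands in for solving (11) (in the paper this is done with the critic NN of Section 5), `policy_gradient` stands in for $V_{\delta_i}$, the agent dictionary fields are assumptions, and the stopping tolerance is an arbitrary choice.

```python
import numpy as np

SAT = 0.5

def phi(s):
    return SAT * np.tanh(s)          # assumed saturation function, as in the simulation section

def policy_iteration(agents, evaluate_value, policy_gradient, u_init, tol=1e-4, max_iter=100):
    """Skeleton of Algorithm 1.

    agents          : list of dicts with keys 'd' (in-degree), 'b' (pinning gain),
                      'R_inv' (R_ii^{-1} as a matrix) and 'g' (input matrix g_i) -- hypothetical structure.
    evaluate_value  : callable (i, u_all) -> value-function object, i.e. the solution of Eq. (11).
    policy_gradient : callable (i, V_i, delta_i) -> dV_i / d delta_i.
    u_init          : list of admissible initial policies u_i^(0), one callable per agent.
    """
    u = list(u_init)
    for _ in range(max_iter):
        # Policy evaluation: V_i^(l) from H_i(delta_i, V_i^(l), u^(l)) = 0, Eq. (11).
        V = [evaluate_value(i, u) for i in range(len(agents))]

        # Policy improvement: Eq. (12), the constrained update through phi(.).
        def make_policy(i, V_i):
            ag = agents[i]
            return lambda delta: -phi(0.5 * (ag['d'] + ag['b'])
                                      * ag['R_inv'] @ ag['g'].T @ policy_gradient(i, V_i, delta))

        u_new = [make_policy(i, V[i]) for i in range(len(agents))]

        # Stop when the policies no longer change (checked on a probe point, an arbitrary choice).
        probe = np.zeros(agents[0]['g'].shape[0])
        if max(np.linalg.norm(u_new[i](probe) - u[i](probe)) for i in range(len(agents))) < tol:
            return V, u_new
        u = u_new
    return V, u
```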
Because $\xi^T\big(z_i^{(l)}\big)=\varphi^{-1}\big(\Lambda_i^{-1}z_i^{(l)}\big)^T\Lambda_i$ and $\varphi^{-1}(\cdot)$ is a monotone odd function, it follows that $\dot V_i^{(l)}\big(\delta_i,u_i^{(l+1)}\big)<0$, so $u_i^{(l+1)}$ is an admissible control policy. Moreover,
$$V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)=\int_0^{\infty}Q\big(\delta(\tau,\delta_0,u_i^{(l+1)})\big)\,d\tau+2\int_0^{\infty}\sum_{j\in\bar N_i}\int_0^{u_j^{(l+1)}(\delta(\tau,\delta_0,u_j^{(l+1)}))}\varphi^{-T}(v)R_{ij}\,dv\,d\tau-\int_0^{\infty}Q\big(\delta(\tau,\delta_0,u_i^{(l+1)})\big)\,d\tau-2\int_0^{\infty}\sum_{j\in\bar N_i}\int_0^{u_j^{(l)}(\delta(\tau,\delta_0,u_j^{(l+1)}))}\varphi^{-T}(v)R_{ij}\,dv\,d\tau$$
$$=-\int_0^{\infty}\frac{dV_i^{(l+1)}(\delta_0)}{d\delta}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_ju_j\big)\,d\tau+\int_0^{\infty}\frac{dV_i^{(l)}(\delta_0)}{d\delta}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_ju_j\big)\,d\tau \qquad (20)$$
From (7),
$$\big(V_{\delta_i}^{(l)}\big)^T\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\tilde f_j=-\big(V_{\delta_i}^{(l)}\big)^T\sum_{j\in\bar N_i}(l_{ij}+b_{ij})g_ju_j^{(l)}-\delta_i^TQ_{ii}\delta_i-2\int_0^{u_i^{(l)}}\big(\varphi^{-1}(v)\big)^TR_{ii}\,dv-2\sum_{j\in N_i}\int_0^{u_j^{(l)}}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv \qquad (21)$$
When agent $i$ updates its control policy, the other agents' control policies remain invariant, so
$$\sum_{j\in N_i}\int_0^{u_j^{(l)}}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv=\sum_{j\in N_i}\int_0^{u_j^{(l+1)}}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv \qquad (22)$$
Combining the above,
$$V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)=-2\int_0^{\infty}\varphi^{-T}\big(u_i^{(l+1)}\big)R_{ii}\big(u_i^{(l+1)}-u_i^{(l)}\big)\,d\tau+\int_0^{\infty}\int_{u_i^{(l)}}^{u_i^{(l+1)}}\varphi^{-T}(v)R_{ii}\,dv\,d\tau \qquad (23)$$
We can then conclude that $V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)\le 0$. When $l\to\infty$, $V_i^{\infty}(\delta_0)\ge V_i^*(\delta_0)$; and since $V_i^{(l+1)}(\delta_0)\le V_i^{(l)}(\delta_0)$, also $V_i^{\infty}(\delta_0)\le V_i^*(\delta_0)$. Therefore $\lim_{l\to\infty}V_i^{(l)}(\delta_0)=V_i^*(\delta_0)$ and, at the same time, $\lim_{l\to\infty}u_i^{(l)}=u_i^*$.

5 Online NN controller design

In this section, NNs are utilized to calculate the optimal control policies by approximating the critic network. Adaptive dynamic programming with a single NN structure is used to solve the coupled HJB equations; it greatly simplifies the algorithm structure and reduces the computational burden. The performance index function of agent $i$ is represented as
$$V_i^*(\delta_i)=W_i^T\Phi(\delta_i)+\varepsilon_i(\delta_i) \qquad (24)$$
so that
$$V_{\delta_i}^*=W_i^T\frac{\partial\Phi(\delta_i)}{\partial\delta_i}+\frac{\partial\varepsilon_i(\delta_i)}{\partial\delta_i} \qquad (25)$$
where $W_i=[W_{i1},\dots,W_{il}]^T\in\mathbb{R}^l$ is the desired weight vector of agent $i$, $\Phi(\delta_i)$ is the activation function vector of a linearly independent NN, and $\varepsilon_i(\delta_i)$ is the estimation error of the NN with respect to the local value function. Let $\hat W_i$ denote the estimate of $W_i$; then the critic network output is
$$\hat V_i(\delta_i)=\hat W_i^T\Phi(\delta_i) \qquad (26)$$
The HJB residual can be defined as
$$e_i=H_i(\delta_i,\hat V_{\delta_i},u_i,u_{-i})=\delta_i^TQ_{ii}\delta_i+2\sum_{j\in\bar N_i}\int_0^{u_j}\big(\varphi^{-1}(v)\big)^TR_{ij}\,dv+\hat W_i^T\frac{\partial\Phi(\delta_i)}{\partial\delta_i}\Big(\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_ju_j\big)\Big) \qquad (27)$$
An appropriate weight should be chosen to minimize the squared residual $E_i(\hat W_i)$,
$$E_i(\hat W_i)=\frac12 e_i^Te_i \qquad (28)$$
The gradient-descent algorithm can be utilized to calculate the adaptive updating law of the weight,
$$\dot{\hat W}_i=-a_i\Big[\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}\Big] \qquad (29)$$
where $\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}=\frac{\partial E_i}{\partial V_i}\frac{\partial V_i}{\partial\hat W_i}$. Then we can derive
$$\dot{\hat W}_i=-\frac{a_i}{\big(1+\sigma_i^T\sigma_i\big)^2}\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}=-\frac{a_i\,\sigma_i}{\big(1+\sigma_i^T\sigma_i\big)^2}\,e_i \qquad (30)$$
where $a_i$ is the learning rate of the critic network, $\big(1+\sigma_i^T\sigma_i\big)^2$ is utilized for normalization, and $\sigma_i=\frac{\partial\Phi(\delta_i)}{\partial\delta_i}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f+g_j(x_j)\hat u_j\big)$.

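The normalized gradient update (27)-(30) for one agent's critic weights can be written compactly as below. This is a sketch under assumptions: $\varphi(\cdot)=0.5\tanh(\cdot)$, the input-penalty integral is evaluated numerically, the caller supplies $\sigma_i$, and Euler integration with step `dt` replaces the continuous-time law (30); none of these implementation details come from the paper.

```python
import numpy as np

SAT = 0.5

def phi_inv(v):
    return np.arctanh(np.clip(v / SAT, -0.999999, 0.999999))

def input_penalty(u, R):
    """2 * integral_0^u (phi^{-1}(v))^T R dv for a scalar input, as in Eq. (5)."""
    v = np.linspace(0.0, float(u), 100)
    return 2.0 * np.trapz(phi_inv(v) * R, v)

def critic_update(W_hat, delta_i, sigma_i, Q_ii, penalty_sum, a_i, dt):
    """One Euler step of the normalized gradient law (30).

    sigma_i     : dPhi/d delta_i @ sum_j (l_ij + b_ij)(f~_j + g_j u_j), built by the caller.
    penalty_sum : the non-quadratic input-penalty term of Eq. (27), already summed over j.
    """
    e_i = delta_i @ Q_ii @ delta_i + penalty_sum + W_hat @ sigma_i     # HJB residual, Eq. (27)
    W_dot = -a_i * sigma_i * e_i / (1.0 + sigma_i @ sigma_i) ** 2      # Eq. (30)
    return W_hat + dt * W_dot

# Tiny made-up example with a 3-element weight vector.
W_hat = np.zeros(3)
delta_i = np.array([0.2, -0.1])
sigma_i = np.array([0.05, -0.02, 0.01])
penalty = input_penalty(0.1, R=1.0) + input_penalty(-0.05, R=0.1)     # own input plus one neighbour
W_hat = critic_update(W_hat, delta_i, sigma_i, np.eye(2), penalty, a_i=1.0, dt=0.01)
print(W_hat)
```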
Fig. 1: Structure of communication topology

6 Simulation

In this section, an example is given to show the validity of the algorithm, and consistency of the multi-agent system is achieved. The communication structure of the MAS is given in Fig. 1. For the simulation, referring to [4], the dynamics of the followers are given as
$$\dot x_{i1}=x_{i2}-x_{i1}^2x_{i2},\qquad \dot x_{i2}=-(x_{i1}+x_{i2})(1-x_{i1})^2+x_{i2}u_i,\quad i=1,2,3 \qquad (31)$$
under the condition $|u_i|\le 0.5$. The dynamic function of the leader is given as
$$\dot x_{01}=x_{02}-x_{01}^2x_{02},\qquad \dot x_{02}=-(x_{01}+x_{02})(1-x_{01})^2 \qquad (32)$$
The parameters are designed as $Q_{ii}=I_2$, $R_{ii}=1$, $R_{ij}=0.1$, $j\in N_i$, $b_1=1$, and the NN activation functions are
$$\Phi_1(\delta_1)=[\delta_{11}^2,\delta_{11}\delta_{12},\delta_{12}^2,0,0,0,0,0,0]^T$$
$$\Phi_2(\delta_2)=[\delta_{11}^2,\delta_{11}\delta_{12},\delta_{12}^2,\delta_{21}^2,\delta_{21}\delta_{22},\delta_{22}^2,0,0,0]^T$$
$$\Phi_3(\delta_3)=[\delta_{11}^2,\delta_{11}\delta_{12},\delta_{12}^2,\delta_{21}^2,\delta_{21}\delta_{22},\delta_{22}^2,\delta_{31}^2,\delta_{31}\delta_{32},\delta_{32}^2]^T \qquad (33)$$
with the initial weights set to the zero vector. The initial states of the nodes are given as $x_0(0)=[-0.6;0.6]$, $x_1(0)=[0.4;-0.4]$, $x_2(0)=[0.8;-0.8]$, $x_3(0)=[0.65;-0.65]$. Setting $\varphi(\cdot)=0.5\tanh(\cdot)$, the system states are shown in Fig. 2 and Fig. 3; in these two figures, the followers reach a consensus with the leader after about 15 seconds.

Fig. 2: State of agents(1)

Fig. 3: State of agents(2)

Fig. 4 shows that the NN weights converge to the optimum in about 5 seconds. The control policies of the followers are shown in Fig. 5. Under the condition of input saturation, the state of each agent reaches a consensus and the control inputs also converge to the optimum.

Fig. 4: Convergence of critic NN weight estimations

Fig. 5: Control inputs
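As a rough reproduction of the simulation setup (not the authors' code), the following sketch integrates the leader (32) and the three followers (31) with forward Euler, using the initial states listed above, an assumed topology consistent with the pinning gain $b_1=1$, and a placeholder error-feedback law saturated at $|u_i|\le 0.5$; the paper instead closes the loop with the ADP controller of Sections 3-5.

```python
import numpy as np

def f(x):
    """Common drift term of Eqs. (31)-(32), x = [x1, x2]."""
    return np.array([x[1] - x[0] ** 2 * x[1],
                     -(x[0] + x[1]) * (1.0 - x[0]) ** 2])

def g(x):
    """Input vector of the followers in Eq. (31): the input enters through x_{i2}."""
    return np.array([0.0, x[1]])

# Assumed topology: each follower listens to one neighbour, follower 1 is pinned to the leader.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
c = np.array([1.0, 0.0, 0.0])

x0 = np.array([-0.6, 0.6])                                    # leader
x = np.array([[0.4, -0.4], [0.8, -0.8], [0.65, -0.65]])        # followers 1..3

dt, T = 0.01, 30.0
K = 0.8                                                        # placeholder feedback gain (not from the paper)
for _ in range(int(T / dt)):
    delta = np.array([c[i] * (x[i] - x0) + sum(A[i, j] * (x[i] - x[j]) for j in range(3))
                      for i in range(3)])
    # Placeholder saturated policy |u_i| <= 0.5; the paper uses u_i = -phi(...) from Eq. (10).
    u = np.clip(-K * delta[:, 1], -0.5, 0.5)
    x0 = x0 + dt * f(x0)
    x = x + dt * np.array([f(x[i]) + g(x[i]) * u[i] for i in range(3)])

print(np.round(x - x0, 3))   # coordination errors after 30 s
```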

References
[1] D. Cheng, On logic-based intelligent systems, in Proceedings of 5th International Conference on Control and Automation, 2005: 71-75.
[2] S. Weng, D. Yue and J. Shi, Distributed cooperative control for multiple photovoltaic generators in distribution power system under event-triggered mechanism, Journal of the Franklin Institute, 353(14): 3407-3427, 2016.
[3] H. Zhang, D. Yue, W. Zhao and S. Hu, Distributed optimal consensus control for multiagent systems with input delay, IEEE Transactions on Cybernetics, 48(6): 1747-1759, 2017.

[4] W. Zhao and H. Zhang, Distributed optimal coordination control for nonlinear multi-agent systems using event-triggered adaptive dynamic programming method, ISA Transactions, 91: 184-195, 2019.
[5] R. V. Mihai and M. M. Bivolaru, Cooperative distributed trajectory optimization for a heterogeneous UAV formation, AIP Conference Proceedings, 2018.
[6] A. R. Mehrabian and K. Khorasani, Constrained distributed cooperative synchronization and reconfigurable control of heterogeneous networked Euler-Lagrange multi-agent systems, Information Sciences, 370: 578-597, 2016.
[7] Y. Zhu, D. Zhao, H. He and J. Ji, Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming, IEEE Transactions on Industrial Electronics, 64(5): 4101-4109, 2016.
[8] H. Zhang, L. Cui and Y. Luo, Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP, IEEE Transactions on Cybernetics, 43(1): 206-216, 2012.
[9] K. G. Vamvoudakis, F. L. Lewis and G. R. Hudas, Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality, Automatica, 48(8): 1598-1611, 2012.
[10] M. I. Abouheaf and F. L. Lewis, Multi-agent differential graphical games: Nash online adaptive learning solutions, in 52nd IEEE Conference on Decision and Control, 5803-5809, 2013.
[11] H. Zhang, J. Zhang, G. H. Yang and Y. Luo, Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming, IEEE Transactions on Fuzzy Systems, 23(1): 152-163, 2014.
[12] Q. Wei, D. Liu and F. L. Lewis, Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games, Information Sciences, 317: 96-113, 2015.
[13] Q. Jiao, H. Modares, S. Xu and F. L. Lewis, Multi-agent zero-sum differential graphical games for disturbance rejection in distributed control, Automatica, 69: 24-34, 2016.
[14] A. Al-Tamimi, F. Lewis and M. Abu-Khalaf, Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 943-949, 2008.
[15] D. Liu and N. Jin, Adaptive dynamic programming for discrete-time systems with infinite horizon and ε-error bound in the performance cost, in 2009 International Joint Conference on Neural Networks, 1849-1854, 2009.
[16] Q. Wei, D. Wang and D. Zhang, Dual iterative adaptive dynamic programming for a class of discrete-time nonlinear systems with time-delays, Neural Computing and Applications, 23(7-8): 1851-1863, 2013.
[17] Q. Wei, F. L. Lewis, G. Shi and R. Song, Error-tolerant iterative adaptive dynamic programming for optimal renewable home energy scheduling and battery management, IEEE Transactions on Industrial Electronics, 64(12): 9527-9537, 2017.
[18] Q. Wei and D. Liu, Mixed iterative adaptive dynamic programming for optimal battery energy control in smart residential microgrids, IEEE Transactions on Industrial Electronics, 64(5): 4110-4120, 2017.
[19] Q. Wei, G. Shi, R. Song and Y. Liu, Adaptive dynamic programming-based optimal control scheme for energy storage systems with solar renewable energy, IEEE Transactions on Industrial Electronics, 64(7): 5468-5478, 2017.
[20] Q. Wei, L. Wang, Y. Liu and M. M. Polycarpou, Optimal elevator group control via deep asynchronous actor-critic learning, IEEE Transactions on Neural Networks and Learning Systems, (99): 1-12, in press, 2020. DOI: 10.1109/TNNLS.2020.2965208
[21] H. Jiang, H. Zhang, J. Han and K. Zhang, Iterative adaptive dynamic programming methods with neural network implementation for multi-player zero-sum games, Neurocomputing, 307: 54-60, 2018.
[22] Q. Wei, Z. Liao, Z. Yang, B. Li and D. Liu, Continuous-time time-varying policy iteration, IEEE Transactions on Cybernetics, in press, 2019. DOI: 10.1109/TCYB.2019.2926631
[23] Q. Wei, D. Liu, Q. Lin and R. Song, Adaptive dynamic programming for discrete-time zero-sum games, IEEE Transactions on Neural Networks and Learning Systems, 29(4): 957-969, 2018.
[24] Q. Wei, F. L. Lewis, D. Liu, R. Song and H. Lin, Discrete-time local value iteration adaptive dynamic programming: Convergence analysis, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 48(6): 875-891, 2018.
[25] Q. Wei, D. Liu and Q. Lin, Discrete-time local iterative adaptive dynamic programming: Terminations and admissibility analysis, IEEE Transactions on Neural Networks and Learning Systems, 28(11): 2490-2502, 2017.
[26] Q. Wei, D. Liu, Q. Lin and R. Song, Discrete-time optimal control via local policy iteration adaptive dynamic programming, IEEE Transactions on Cybernetics, 47(10): 3367-3379, 2017.
[27] Q. Wei, B. Li and R. Song, Discrete-time stable generalized self-learning optimal control with approximation errors, IEEE Transactions on Neural Networks and Learning Systems, 29(4): 1226-1238, 2018.
[28] Q. Wei, R. Song, Z. Liao, B. Li and F. L. Lewis, Discrete-time impulsive adaptive dynamic programming, IEEE Transactions on Cybernetics, in press, 2019. DOI: 10.1109/TCYB.2019.2906694
[29] M. Abu-Khalaf and F. L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41(5): 779-791, 2005.
[30] X. Lin, Y. Huang, N. Cao and Y. Lin, Optimal control scheme for nonlinear systems with saturating actuator using iterative adaptive dynamic programming, in Proceedings of 2012 UKACC International Conference on Control, 58-63, 2012.
[31] L. Cui, X. Xie, X. Wang, Y. Luo and J. Liu, Event-triggered single-network ADP method for constrained optimal tracking control of continuous-time non-linear systems, Applied Mathematics and Computation, 352: 220-234, 2019.
