Distributed Optimal Coordination Control for Continuous-Time Nonlinear Multi-Agent Systems With Input Constraints
Abstract: This paper is concerned with an optimal coordination control problem for nonlinear multi-agent systems (MASs) with constraints on the control inputs. The idea of the adaptive dynamic programming (ADP) algorithm is to use policy iteration to solve the coupled Hamilton-Jacobi equations. First, a suitable non-quadratic functional is introduced into the cost function to handle the input constraints and transform the constrained problem into an optimization problem. Second, a distributed control law is designed for each agent such that the cost functions of the MAS converge to a Nash equilibrium. Next, a convergence analysis shows that the iterative cost functions of the nonlinear multi-agent system are convergent. Neural networks (NNs) are used to approximate the cost functions for the calculation of the control laws. Finally, simulation results show the effectiveness of the coordination control algorithm.

Key Words: Adaptive dynamic programming (ADP), multi-agent systems, Nash equilibrium, reinforcement learning, optimal control.
where $\Lambda_i$ and $\Sigma_i$ are diagonal matrices, and

$$
\xi^{T}\!\big(z_j^{(k)}\big)\big(z_j^{(k)}-z_i^{(k)}\big)-\int_{0}^{z_j^{(k)}}\xi^{T}(\zeta_k)\,d\zeta_k<0.
\tag{19}
$$
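For context, the non-quadratic integrand $\phi^{-1}(\cdot)$ used throughout this argument encodes the input constraint directly in the cost. As a worked scalar illustration (the specific saturation model here is an assumption for this example, not a formula stated in the paper): taking $\phi(v)=\lambda\tanh(v/\lambda)$ with saturation bound $\lambda>0$ and a scalar weight $R>0$, the constrained cost term admits the closed form

$$
2\int_{0}^{u}\phi^{-1}(v)\,R\,dv
= 2\int_{0}^{u}\lambda\tanh^{-1}\!\left(\frac{v}{\lambda}\right)R\,dv
= 2\lambda R\,u\tanh^{-1}\!\left(\frac{u}{\lambda}\right)
+ \lambda^{2}R\,\ln\!\left(1-\frac{u^{2}}{\lambda^{2}}\right).
$$

Since the integrand is only defined for $|v|<\lambda$, penalizing the control through this term keeps the resulting policy inside the constraint set $|u|\le\lambda$.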
Since $\xi^{T}(z_i^{(l)})=\phi^{-1}(\Lambda_i^{-1}z_i^{(l)})^{T}\Lambda_i$ and $\phi^{-1}(\cdot)$ is a monotonic odd function, it follows that $\dot V_i^{(l)}(\delta_i,u_i^{(l+1)})<0$, so $u_i^{(l+1)}$ is an admissible control policy. Furthermore,

$$
\begin{aligned}
V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)
&=\int_0^{\infty} Q\big(\delta(\tau,\delta_0,u_i^{(l+1)})\big)\,d\tau
+2\sum_{j\in\bar N_i}\int_0^{\infty}\Big(\int_0^{u_j^{(l+1)}(\delta(\tau,\delta_0,u_j^{(l+1)}))}\phi^{-T}(v)R_{ij}\,dv\Big)\,d\tau\\
&\quad-\int_0^{\infty} Q\big(\delta(\tau,\delta_0,u_i^{(l+1)})\big)\,d\tau
-2\sum_{j\in\bar N_i}\int_0^{\infty}\Big(\int_0^{u_j^{(l)}(\delta(\tau,\delta_0,u_j^{(l+1)}))}\phi^{-T}(v)R_{ij}\,dv\Big)\,d\tau\\
&=-\int_0^{\infty}\frac{dV_i^{(l+1)}(\delta_0)}{d\delta}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_j u_j\big)\,d\tau
+\int_0^{\infty}\frac{dV_i^{(l)}(\delta_0)}{d\delta}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_j u_j\big)\,d\tau
\end{aligned}
\tag{20}
$$

which can be rewritten as

$$
V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)
=-2\int_0^{\infty}\phi^{-T}(u_i^{(l+1)})R_{ii}\big(u_i^{(l+1)}-u_i^{(l)}\big)\,d\tau
+\int_0^{\infty}\int_{u_i^{(l)}}^{u_i^{(l+1)}}\phi^{-T}(v)R_{ii}\,dv\,d\tau.
\tag{23}
$$

From (23) we conclude that $V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)\le 0$. When $l\to\infty$, $V_i^{\infty}(\delta_0)\ge V_i^{*}(\delta_0)$; and since $V_i^{(l+1)}(\delta_0)\le V_i^{(l)}(\delta_0)$, we also have $V_i^{\infty}(\delta_0)\le V_i^{*}(\delta_0)$. Therefore $\lim_{l\to\infty}V_i^{(l)}(\delta_0)=V_i^{*}(\delta_0)$ and, at the same time, $\lim_{l\to\infty}u_i^{(l)}=u_i^{*}$.

5 Online NN controller design

In this section, NNs are utilized to compute the optimal control policies by approximating the critic network. Adaptive dynamic programming with a single-NN structure is used to solve the coupled HJB equations, which greatly simplifies the algorithm structure and reduces the computational burden. The performance index function of agent $i$ is represented as

$$
V_i^{*}(\delta_i)=W_i^{T}\Phi(\delta_i)+\varepsilon_i(\delta_i)
\tag{24}
$$

and then

$$
V_{\delta_i}^{*}=W_i^{T}\frac{\partial\Phi(\delta_i)}{\partial\delta_i}+\frac{\partial\varepsilon_i(\delta_i)}{\partial\delta_i}
\tag{25}
$$

where $W_i=[W_{i1},\dots,W_{il}]^{T}\in\mathbb{R}^{l}$ is the desired weight vector of agent $i$, $\Phi(\delta_i)$ is the activation function vector of the linearly independent NN, and $\varepsilon_i(\delta_i)$ is the approximation error of the NN with respect to the local value function. Let $\hat W_i$ denote the estimate of $W_i$; then the critic network output is

$$
\hat V_i(\delta_i)=\hat W_i^{T}\Phi(\delta_i).
\tag{26}
$$

To minimize the residual error $E_i(\hat W_i)$ of the approximate HJB equation, the critic weights are tuned by the gradient-descent law

$$
\dot{\hat W}_i=-a_i\left[\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}\right]
\tag{29}
$$

where $\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}=\frac{\partial E_i}{\partial V_i}\frac{\partial V_i}{\partial\hat W_i}$. Then we can derive

$$
\dot{\hat W}_i=-\frac{a_i}{(1+\sigma_i^{T}\sigma_i)^{2}}\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}
=-\frac{a_i\sigma_i}{(1+\sigma_i^{T}\sigma_i)^{2}}\,e_i
\tag{30}
$$

where $a_i$ is the learning rate of the critic network, $(1+\sigma_i^{T}\sigma_i)^{2}$ is utilized for normalization, and $\sigma_i=\frac{\partial\Phi(\delta_i)}{\partial\delta_i}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f+g_j(x_j)\hat u_j\big)$.
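To make the update law (30) concrete, the following is a minimal Python sketch of one Euler integration step of the normalized critic weight adaptation for a single agent. It is an illustrative sketch only: the quantities phi_grad, consensus_error_rate, and bellman_residual are hypothetical inputs the caller must supply (corresponding to $\partial\Phi(\delta_i)/\partial\delta_i$, the weighted neighborhood error rate inside $\sigma_i$, and the residual $e_i$), since their exact definitions are not reproduced in this excerpt.

```python
import numpy as np

def critic_weight_step(w_hat, phi_grad, consensus_error_rate, bellman_residual,
                       learning_rate=0.1, dt=0.01):
    """One Euler step of the normalized gradient update (30) for agent i.

    Hypothetical inputs (shapes assumed, not taken from the paper):
      w_hat                 -- current critic weight estimate, shape (l,)
      phi_grad              -- dPhi/d(delta_i), shape (l, n)
      consensus_error_rate  -- sum_j (l_ij + b_ij)(f_tilde + g_j(x_j) u_hat_j), shape (n,)
      bellman_residual      -- scalar HJB residual e_i
    """
    sigma = phi_grad @ consensus_error_rate            # sigma_i in (30)
    norm = (1.0 + sigma @ sigma) ** 2                  # (1 + sigma^T sigma)^2 normalization
    w_hat_dot = -learning_rate * sigma * bellman_residual / norm
    return w_hat + dt * w_hat_dot                      # explicit Euler integration of (30)
```

In an online implementation this step would be evaluated along the measured trajectory of $\delta_i$ at every sampling instant, with the residual recomputed from the current critic estimate.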
6 Simulation

In this section, an example is given to show the validity of the proposed algorithm under the condition $|u_i|\le 0.5$. The initial NN weights are set to zero vectors, and the initial states of the nodes are given as $x_0(0)=[-0.6;\,0.6]$, $x_1(0)=[0.4;\,-0.4]$, $x_2(0)=[0.8;\,-0.8]$, $x_3(0)=[0.65;\,-0.65]$. By setting $\phi(\cdot)=0.5\tanh(\cdot)$, the system states are shown in Fig. 2 and Fig. 3. In these two figures, the followers reach a consensus with the leader after about 15 seconds. Fig. 4 shows that the norms of the NN weights $\|w_1\|$, $\|w_2\|$, $\|w_3\|$ converge, and Fig. 5 shows the control inputs $u_1$, $u_2$, $u_3$, which remain within the constraint bound $|u_i|\le 0.5$.

[Fig. 2: State of agents (1): $x_0(1)$, $x_1(1)$, $x_2(1)$, $x_3(1)$]

[Fig. 3: State of agents (2): $x_0(2)$, $x_1(2)$, $x_2(2)$, $x_3(2)$]

[Fig. 4: Norms of the critic NN weights $\|w_1\|$, $\|w_2\|$, $\|w_3\|$]

[Fig. 5: Control inputs $u_1$, $u_2$, $u_3$]
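The follower dynamics, the communication graph, and the critic-based control law used in this example are not reproduced in this excerpt, so the following Python sketch only illustrates the structure of such a simulation under placeholder assumptions: single-integrator follower dynamics, an assumed leader-pinned graph, and a simple proportional feedback passed through the saturation $\phi(\cdot)=0.5\tanh(\cdot)$ as a stand-in for the learned optimal policy, so that every input satisfies $|u_i|\le 0.5$.

```python
import numpy as np

# Placeholder assumptions (not taken from the paper): single-integrator followers,
# an assumed leader-pinned graph, and a proportional feedback saturated through
# phi(.) = 0.5*tanh(.) as a stand-in for the critic-based optimal control.
dt, T = 0.01, 30.0

x = np.array([[-0.6,  0.6],    # x_0(0): leader
              [ 0.4, -0.4],    # x_1(0)
              [ 0.8, -0.8],    # x_2(0)
              [ 0.65, -0.65]]) # x_3(0)

# Assumed (l_ij + b_ij)-style coupling: row i lists which nodes agent i follows.
A = np.array([[0, 0, 0, 0],    # leader follows nobody
              [1, 0, 0, 0],    # follower 1 pinned to the leader
              [0, 1, 0, 0],    # follower 2 follows follower 1
              [1, 0, 1, 0]])   # follower 3 pinned to the leader, follows follower 2

def phi(v, bound=0.5):
    """Saturation phi(.) = 0.5*tanh(.): every input component stays in [-0.5, 0.5]."""
    return bound * np.tanh(v)

K = 2.0  # placeholder feedback gain

for _ in range(int(T / dt)):
    # local neighborhood consensus error delta_i for every node
    delta = np.array([sum(A[i, j] * (x[i] - x[j]) for j in range(4)) for i in range(4)])
    u = phi(-K * delta)        # constrained control; the leader's input stays zero
    x = x + dt * u             # placeholder dynamics x_dot = u

print("final states:\n", np.round(x, 3))  # followers end up near the leader state
```

Under these placeholder dynamics the followers' states approach the leader's state while every control component stays inside the bound, mirroring the qualitative behavior reported in Figs. 2, 3, and 5.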