Distributed Optimal Coordination Control for Continuous-Time Nonlinear Multi-Agent Systems With Input Constraints
Abstract: This paper is concerned with an optimal coordination control problem for nonlinear multi-agent systems (MASs) with constraints on the control inputs. The idea of the adaptive dynamic programming (ADP) algorithm is to use policy iteration to solve the coupled Hamilton-Jacobi equations. First, a suitable non-quadratic functional is introduced into the cost function to handle the input constraints and transform the constrained problem into an optimization problem. Second, a distributed control law is designed for each agent such that the cost functions of the MAS converge to a Nash equilibrium. Next, a convergence analysis shows that the iterative cost functions of the nonlinear multi-agent system are convergent. Neural networks (NNs) are used to approximate the cost functions for the calculation of the control laws. Finally, simulation results show the effectiveness of the coordination control algorithm.

Key Words: Adaptive dynamic programming (ADP), multi-agent systems, Nash equilibrium, reinforcement learning, optimal control.
where $\Lambda_i$ and $\Sigma_i$ are diagonal matrices, and

$$
\xi^{T}\!\big(z_j^{(k)}\big)\big(z_j^{(k)}-z_i^{(k)}\big)-\int_{0}^{z_j^{(k)}}\xi^{T}(\zeta_k)\,d\zeta_k<0.
\tag{19}
$$
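For context, the non-quadratic integrand $\phi^{-1}(\cdot)$ used throughout this argument encodes the input constraint directly in the cost. As a worked scalar illustration (the specific saturation model here is an assumption for this example, not a formula stated in the paper): taking $\phi(v)=\lambda\tanh(v/\lambda)$ with saturation bound $\lambda>0$ and a scalar weight $R>0$, the constrained cost term admits the closed form

$$
2\int_{0}^{u}\phi^{-1}(v)\,R\,dv
= 2\int_{0}^{u}\lambda\tanh^{-1}\!\left(\frac{v}{\lambda}\right)R\,dv
= 2\lambda R\,u\tanh^{-1}\!\left(\frac{u}{\lambda}\right)
+ \lambda^{2}R\,\ln\!\left(1-\frac{u^{2}}{\lambda^{2}}\right).
$$

Since the integrand is only defined for $|v|<\lambda$, penalizing the control through this term keeps the resulting policy inside the constraint set $|u|\le\lambda$.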
Since $\xi^{T}(z_i^{(l)})=\phi^{-1}(\Lambda_i^{-1}z_i^{(l)})^{T}\Lambda_i$ and $\phi^{-1}(\cdot)$ is a monotonic odd function, it follows that $\dot V_i^{(l)}(\delta_i,u_i^{(l+1)})<0$, so $u_i^{(l+1)}$ is an admissible control policy. Furthermore,

$$
\begin{aligned}
V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)
&=\int_0^{\infty} Q\big(\delta(\tau,\delta_0,u_i^{(l+1)})\big)\,d\tau
+2\sum_{j\in\bar N_i}\int_0^{\infty}\Big(\int_0^{u_j^{(l+1)}(\delta(\tau,\delta_0,u_j^{(l+1)}))}\phi^{-T}(v)R_{ij}\,dv\Big)\,d\tau\\
&\quad-\int_0^{\infty} Q\big(\delta(\tau,\delta_0,u_i^{(l+1)})\big)\,d\tau
-2\sum_{j\in\bar N_i}\int_0^{\infty}\Big(\int_0^{u_j^{(l)}(\delta(\tau,\delta_0,u_j^{(l+1)}))}\phi^{-T}(v)R_{ij}\,dv\Big)\,d\tau\\
&=-\int_0^{\infty}\frac{dV_i^{(l+1)}(\delta_0)}{d\delta}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_j u_j\big)\,d\tau
+\int_0^{\infty}\frac{dV_i^{(l)}(\delta_0)}{d\delta}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f_j+g_j u_j\big)\,d\tau
\end{aligned}
\tag{20}
$$

which can be rewritten as

$$
V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)
=-2\int_0^{\infty}\phi^{-T}(u_i^{(l+1)})R_{ii}\big(u_i^{(l+1)}-u_i^{(l)}\big)\,d\tau
+\int_0^{\infty}\int_{u_i^{(l)}}^{u_i^{(l+1)}}\phi^{-T}(v)R_{ii}\,dv\,d\tau.
\tag{23}
$$

From (23) we conclude that $V_i^{(l+1)}(\delta_0)-V_i^{(l)}(\delta_0)\le 0$. When $l\to\infty$, $V_i^{\infty}(\delta_0)\ge V_i^{*}(\delta_0)$; and since $V_i^{(l+1)}(\delta_0)\le V_i^{(l)}(\delta_0)$, we also have $V_i^{\infty}(\delta_0)\le V_i^{*}(\delta_0)$. Therefore $\lim_{l\to\infty}V_i^{(l)}(\delta_0)=V_i^{*}(\delta_0)$ and, at the same time, $\lim_{l\to\infty}u_i^{(l)}=u_i^{*}$.

5 Online NN controller design

In this section, NNs are utilized to compute the optimal control policies by approximating the critic network. Adaptive dynamic programming with a single-NN structure is used to solve the coupled HJB equations, which greatly simplifies the algorithm structure and reduces the computational burden. The performance index function of agent $i$ is represented as

$$
V_i^{*}(\delta_i)=W_i^{T}\Phi(\delta_i)+\varepsilon_i(\delta_i)
\tag{24}
$$

and then

$$
V_{\delta_i}^{*}=W_i^{T}\frac{\partial\Phi(\delta_i)}{\partial\delta_i}+\frac{\partial\varepsilon_i(\delta_i)}{\partial\delta_i}
\tag{25}
$$

where $W_i=[W_{i1},\dots,W_{il}]^{T}\in\mathbb{R}^{l}$ is the desired weight vector of agent $i$, $\Phi(\delta_i)$ is the activation function vector of the linearly independent NN, and $\varepsilon_i(\delta_i)$ is the approximation error of the NN with respect to the local value function. Let $\hat W_i$ denote the estimate of $W_i$; then the critic network output is

$$
\hat V_i(\delta_i)=\hat W_i^{T}\Phi(\delta_i).
\tag{26}
$$

To minimize the residual error $E_i(\hat W_i)$ of the approximate HJB equation, the critic weights are tuned by the gradient-descent law

$$
\dot{\hat W}_i=-a_i\left[\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}\right]
\tag{29}
$$

where $\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}=\frac{\partial E_i}{\partial V_i}\frac{\partial V_i}{\partial\hat W_i}$. Then we can derive

$$
\dot{\hat W}_i=-\frac{a_i}{(1+\sigma_i^{T}\sigma_i)^{2}}\frac{\partial E_i(\hat W_i)}{\partial\hat W_i}
=-\frac{a_i\sigma_i}{(1+\sigma_i^{T}\sigma_i)^{2}}\,e_i
\tag{30}
$$

where $a_i$ is the learning rate of the critic network, $(1+\sigma_i^{T}\sigma_i)^{2}$ is utilized for normalization, and $\sigma_i=\frac{\partial\Phi(\delta_i)}{\partial\delta_i}\sum_{j\in\bar N_i}(l_{ij}+b_{ij})\big(\tilde f+g_j(x_j)\hat u_j\big)$.
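To make the update law (30) concrete, the following is a minimal Python sketch of one Euler integration step of the normalized critic weight adaptation for a single agent. It is an illustrative sketch only: the quantities phi_grad, consensus_error_rate, and bellman_residual are hypothetical inputs the caller must supply (corresponding to $\partial\Phi(\delta_i)/\partial\delta_i$, the weighted neighborhood error rate inside $\sigma_i$, and the residual $e_i$), since their exact definitions are not reproduced in this excerpt.

```python
import numpy as np

def critic_weight_step(w_hat, phi_grad, consensus_error_rate, bellman_residual,
                       learning_rate=0.1, dt=0.01):
    """One Euler step of the normalized gradient update (30) for agent i.

    Hypothetical inputs (shapes assumed, not taken from the paper):
      w_hat                 -- current critic weight estimate, shape (l,)
      phi_grad              -- dPhi/d(delta_i), shape (l, n)
      consensus_error_rate  -- sum_j (l_ij + b_ij)(f_tilde + g_j(x_j) u_hat_j), shape (n,)
      bellman_residual      -- scalar HJB residual e_i
    """
    sigma = phi_grad @ consensus_error_rate            # sigma_i in (30)
    norm = (1.0 + sigma @ sigma) ** 2                  # (1 + sigma^T sigma)^2 normalization
    w_hat_dot = -learning_rate * sigma * bellman_residual / norm
    return w_hat + dt * w_hat_dot                      # explicit Euler integration of (30)
```

In an online implementation this step would be evaluated along the measured trajectory of $\delta_i$ at every sampling instant, with the residual recomputed from the current critic estimate.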
6 Simulation

In this section, an example is given to show the validity of the proposed algorithm under the condition $|u_i|\le 0.5$. The initial NN weights are set to zero vectors, and the initial states of the nodes are given as $x_0(0)=[-0.6;\,0.6]$, $x_1(0)=[0.4;\,-0.4]$, $x_2(0)=[0.8;\,-0.8]$, $x_3(0)=[0.65;\,-0.65]$. By setting $\phi(\cdot)=0.5\tanh(\cdot)$, the system states are shown in Fig. 2 and Fig. 3. In these two figures, the followers reach a consensus with the leader after about 15 seconds. Fig. 4 shows that the norms of the NN weights $\|w_1\|$, $\|w_2\|$, $\|w_3\|$ converge, and Fig. 5 shows the control inputs $u_1$, $u_2$, $u_3$, which remain within the constraint bound $|u_i|\le 0.5$.

[Fig. 2: State of agents (1): $x_0(1)$, $x_1(1)$, $x_2(1)$, $x_3(1)$]

[Fig. 3: State of agents (2): $x_0(2)$, $x_1(2)$, $x_2(2)$, $x_3(2)$]

[Fig. 4: Norms of the critic NN weights $\|w_1\|$, $\|w_2\|$, $\|w_3\|$]

[Fig. 5: Control inputs $u_1$, $u_2$, $u_3$]
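The follower dynamics, the communication graph, and the critic-based control law used in this example are not reproduced in this excerpt, so the following Python sketch only illustrates the structure of such a simulation under placeholder assumptions: single-integrator follower dynamics, an assumed leader-pinned graph, and a simple proportional feedback passed through the saturation $\phi(\cdot)=0.5\tanh(\cdot)$ as a stand-in for the learned optimal policy, so that every input satisfies $|u_i|\le 0.5$.

```python
import numpy as np

# Placeholder assumptions (not taken from the paper): single-integrator followers,
# an assumed leader-pinned graph, and a proportional feedback saturated through
# phi(.) = 0.5*tanh(.) as a stand-in for the critic-based optimal control.
dt, T = 0.01, 30.0

x = np.array([[-0.6,  0.6],    # x_0(0): leader
              [ 0.4, -0.4],    # x_1(0)
              [ 0.8, -0.8],    # x_2(0)
              [ 0.65, -0.65]]) # x_3(0)

# Assumed (l_ij + b_ij)-style coupling: row i lists which nodes agent i follows.
A = np.array([[0, 0, 0, 0],    # leader follows nobody
              [1, 0, 0, 0],    # follower 1 pinned to the leader
              [0, 1, 0, 0],    # follower 2 follows follower 1
              [1, 0, 1, 0]])   # follower 3 pinned to the leader, follows follower 2

def phi(v, bound=0.5):
    """Saturation phi(.) = 0.5*tanh(.): every input component stays in [-0.5, 0.5]."""
    return bound * np.tanh(v)

K = 2.0  # placeholder feedback gain

for _ in range(int(T / dt)):
    # local neighborhood consensus error delta_i for every node
    delta = np.array([sum(A[i, j] * (x[i] - x[j]) for j in range(4)) for i in range(4)])
    u = phi(-K * delta)        # constrained control; the leader's input stays zero
    x = x + dt * u             # placeholder dynamics x_dot = u

print("final states:\n", np.round(x, 3))  # followers end up near the leader state
```

Under these placeholder dynamics the followers' states approach the leader's state while every control component stays inside the bound, mirroring the qualitative behavior reported in Figs. 2, 3, and 5.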