
Neurocomputing 564 (2024) 126965


Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Optimal bipartite consensus for discrete-time multi-agent systems with event-triggered mechanism based on adaptive dynamic programming✩

Wanli Jin a, Huaguang Zhang a,b,∗,1, Zhongyang Ming a

a School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, 110004, China
b State Key Laboratory of Synthetical Automation for Process Industries (Northeastern University), Shenyang, Liaoning, 110004, China

ARTICLE INFO

Communicated by B. Zhao

Keywords:
Optimal bipartite consensus
Discrete-time multi-agent system (DTMAS)
Quantized event-triggered control (QETC)
Adaptive dynamic programming (ADP)
Neural networks (NNs)

ABSTRACT

This paper addresses the optimal bipartite consensus problem for discrete-time multi-agent systems (DTMASs) using event-triggered control (ETC) and quantized event-triggered control (QETC). First, a novel event-triggered control strategy is introduced, together with conditions that ensure the stability of the DTMAS. Because the Hamilton–Jacobi–Bellman (HJB) equation is intractable, the value function and the optimal control are approximated with two neural networks (NNs) through the adaptive dynamic programming (ADP) method. The weight matrix of the actor neural network is updated only at triggering instants, and the quantization effect on the transmitted information is also taken into account. The boundedness of the weight errors and the convergence of the system dynamics are established using Lyapunov stability theory. Finally, the practicality of the proposed method is confirmed by a numerical example.

1. Introduction

Over the last several decades, the cooperative control of multi-agent systems (MASs) has found extensive application in numerous practical domains; for instance, it has been utilized in the formation of unmanned aerial vehicles (UAVs) [1,2], distributed sensor networks [3,4] and industrial electronics [5,6]. The consensus problem, the most fundamental issue in cooperative control, can be divided into two categories, leader-following and leaderless consensus, depending on whether a (virtual) leader is present. Since the 1990s, this problem has gained significant momentum, and an increasing number of researchers have devoted considerable attention to it [7,8].

It is worth noting that the above works require continuous communication. However, in practical situations, agents have limited microprocessor storage space and network bandwidth, making continuous control and communication impractical. As a result, it is essential to develop control strategies that conserve resources efficiently, such as sampling control, impulsive control and intermittent control. Compared to traditional sampled-data control, event-triggered sampled-data control samples and transmits information to neighboring agents only when a specified threshold is exceeded. This control strategy has been extensively studied [9–16]. In [9], conditions under event-triggered control (ETC) were derived to ensure the convergence of nonlinear MASs while avoiding the consumption of communication bandwidth among neighboring agents. For MASs with nonlinearities such as exponential and tangent functions, an ETC strategy was utilized in [10] to achieve average consensus. For second-order MASs with unknown dynamics, adaptive ETC strategies without global information were presented in [11,12]. Similarly, in [13], adaptive ETC was employed for output regulation consensus of heterogeneous linear MASs. On the other hand, in most consensus problems the information exchange among agents is naturally assumed to be precise, which is unrealistic in control engineering practice. Considering communication channels with limited capacity, the quantized control approach has been extensively studied for consensus problems with communication constraints on digital platforms [17–20]. In [17], novel impulsive control schemes and a reasonable quantized controller were put forward to address the fixed-time consensus issue of MASs. In [18], Dong et al. constructed distributed fuzzy cooperative controllers with quantization to ensure output consensus for heterogeneous MASs under directed fixed and switching communication graphs. Hence, taking into account

✩ This work was supported by the National Key R&D Program of China under grant 2018YFA0702200, the National Natural Science Foundation of China (61627809), the Liaoning Revitalization Talents Program (XLYC1801005), and the Natural Science Foundation of Liaoning Province of China (2022JH25/10100008).
∗ Corresponding author at: State Key Laboratory of Synthetical Automation for Process Industries (Northeastern University), Shenyang, Liaoning, 110004, China.
E-mail addresses: [email protected] (W. Jin), [email protected] (H. Zhang), [email protected] (Z. Ming).
1 Fellow, IEEE.

https://doi.org/10.1016/j.neucom.2023.126965
Received 3 May 2023; Received in revised form 19 August 2023; Accepted 22 October 2023
Available online 30 October 2023
0925-2312/© 2023 Published by Elsevier B.V.

the advantages of both control strategies, this paper further integrates quantized control with an event-triggered strategy to solve resource-constrained problems.

Furthermore, the energy carried by power grids, aircraft and UAVs is often limited. Hence, the system cost needs to be considered while ensuring the completion of the task. Given this demand, the optimal control problem of minimizing the control cost has attracted extensive attention. As an optimal control method, ADP has the advantage of solving HJB equations, which are difficult to solve analytically. So far, there are fruitful theoretical results on ADP algorithms for both linear and nonlinear systems [21–24]. For discrete-time (DT) nonlinear systems, under an upper-bound constraint on the uncertain parameters, the robust control problem was converted into an optimal robust control problem in [21]. In [24], the optimal cooperative control of continuous-time MASs with unknown dynamics was obtained by ADP technology: when the system dynamics were difficult to analyze, a compensator was constructed to obtain an augmented error system, and the approximation of the value function was then proved by the policy iteration strategy. Furthermore, results combining ADP with event-triggered mechanisms have also attracted scholars' attention [25–28]. A dynamic event-triggered controller was given by introducing an adaptive variable [25]. Luo et al. [28] established an event-triggered function with regard to a performance index and realized a balance between performance and energy by adjusting a parameter. At present, scholars have rarely addressed the optimal control problem of MASs via ETC.

Another problem encountered in control applications is that there are not only cooperative but also competitive relationships among agents, for example in social networks and UAV systems. In this case, a signed graph, whose connection weights can be positive or negative, is used to depict the network topology. Due to the coexistence of cooperative and competitive interactions, scholars introduced the concept of bipartite consensus, in which the agents converge to values with the same modulus but opposite signs. A lot of outstanding work on this topic has emerged [29–33]. Altafini [29] first exploited bipartite consensus of single-integrator MASs, and the bipartite consensus problem was then extended to more general MASs. In [31], by utilizing a matrix-valued gauge transformation and stability theory, cluster bipartite consensus was achieved for second-order MASs with matrix coupling. In [32,33], second-order and high-order MAS bipartite problems were stabilized in finite time. However, most research results on the bipartite consensus of MASs are not optimal. Therefore, we study the optimal bipartite consensus control of MASs with signed graphs.

Inspired by the above discussion, the optimal bipartite consensus with an ETC strategy for DTMASs is analyzed in this paper. Compared with related works, this paper has the following innovations:

(1) A novel ETC strategy for the DTMAS is designed to solve optimal consensus using ADP theory, where the weight matrices of the actor NN are updated only at the triggering instants.
(2) Some existing results on the consensus of MASs have been explored in [9–16]. However, these works only focus on cooperative communication among agents. In this paper, we not only take into account cooperation and competition among agents, but also explore the optimal consensus through the ADP method.
(3) A logarithmic quantizer with ETC is applied in the developed optimal control strategy, which has the advantages of overcoming bandwidth limitations and attenuating quantization error compared with existing control strategies [21,22,24,28,34]. In particular, [28,34] only consider ETC without quantized control, whereas we simultaneously study event triggering, logarithmic quantizers and antagonistic interactions, which further saves resources.

The rest of this paper is organized as follows. Signed graph knowledge and the optimal bipartite consensus problem are described in Section 2. In Section 3, the ETC is introduced and the stability of the MAS is analyzed. In Section 4, the critic–actor NNs with quantized event-triggered control are designed and the weight estimation errors are proven to be uniformly ultimately bounded (UUB). A numerical simulation example then demonstrates the feasibility of the results in Section 5. Finally, Section 6 ends the paper with a brief conclusion.

2. Preliminary knowledge

2.1. Signed graph theory

Let 𝒢_s = (𝒱, ℰ, 𝒜) be a signed graph, where 𝒱 = {1, … , N}, ℰ ⊆ {𝒱 × 𝒱} and 𝒜 = [a_ij]_{N×N} denote the node set, the edge set and the weighted adjacency matrix, respectively. e_ij = {i, j} is an edge of 𝒢_s. The neighbor set of node i is denoted by N_i = {j ∈ 𝒱 : (j, i) ∈ ℰ}. Furthermore, in the matrix 𝒜 = [a_ij]_{N×N}, a_ij ≠ 0 if e_ij ∈ ℰ and a_ij = 0 otherwise. The degree matrix is D = diag{d_i} with d_i = Σ_{j∈N_i} |a_ij|. The Laplacian matrix L = [l_ij]_{N×N} of 𝒢_s is defined by l_ii = Σ_{j∈N_i} |a_ij| and l_ij = −a_ij for i ≠ j. If a_ij > 0 (a_ij < 0), the connection between node i and node j is cooperative (competitive). Let 𝒢̄_s be the graph composed of one leader and N followers, and let R = diag{a_10, a_20, … , a_N0}, where a_i0 > 0 if there is a connection between the leader and follower i, and a_i0 = 0 otherwise. Define H = L + R.

The signed topology graph 𝒢_s is structurally balanced (SB) if there is a bipartition of the node set into V_1 and V_2 satisfying V_1 ∪ V_2 = V and V_1 ∩ V_2 = ∅, such that a_ij > 0 for any i, j ∈ V_q (q ∈ {1, 2}), and a_ij < 0 for any i ∈ V_{3−q}, j ∈ V_q.
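As a concrete illustration of these graph quantities, the matrices 𝒜, D, L, R and H can be assembled with NumPy; this sketch and its four-node topology are our own additions, not the paper's.

import numpy as np

# Hypothetical signed adjacency matrix for N = 4 agents, structurally
# balanced with V1 = {1, 2} and V2 = {3, 4}: positive entries are
# cooperative edges, negative entries competitive ones.
Adj = np.array([[0.0,  1.0, 0.0,  0.0],
                [1.0,  0.0, 0.0, -1.0],
                [0.0,  0.0, 0.0,  1.0],
                [0.0, -1.0, 1.0,  0.0]])

D = np.diag(np.abs(Adj).sum(axis=1))  # degree matrix, d_i = sum_j |a_ij|
L = D - Adj                           # signed Laplacian: l_ii = d_i, l_ij = -a_ij
R = np.diag([1.0, 0.0, 0.0, 0.0])     # assumed pinning gains a_i0 (only agent 1 pinned)
H = L + R                             # the matrix used in Section 2.2

With this convention, L = D − 𝒜 reproduces l_ii = Σ_{j∈N_i} |a_ij| and l_ij = −a_ij.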
cooperative communication among agents. In this paper, we
not only take into account cooperation and competition among Our aim is to enable agent 𝑖 to track leader’s state under the
agents, but also explore the optimal consensus through the ADP influence of control 𝑢𝑖 and its neighbor nodes. Since there is information
method. exchange between the agents, we build a local error to describe the
(3) A logarithmic quantizer with ETC is applied in the developed information communication between them:
optimal control strategy, which has advantages of overcom- ∑
ing bandwidth limitations and eliminating quantization error 𝜁𝑖 (𝜏) = |𝑎𝑖𝑗 |(𝑥𝑖 (𝜏) − 𝑠𝑖𝑔𝑛(𝑎𝑖𝑗 )𝑥𝑗 (𝜏))
𝑗∈𝑁𝑖
compared with existing control strategies [21,22,24,28,34]. Spe-
cially, [28,34] only consider ETC, not consider quantized con- + 𝑎𝑖0 (𝑥𝑖 (𝜏) − 𝑠𝑖 𝑥0 (𝜏)). (4)
trol, but we simultaneously study elements including event
The compacting form of (4) is:
triggered, logarithmic quantizers, and antagonistic interactions,
which can further save resources. 𝜁(𝜏) = (𝐻 ⊗ 𝐼𝑛 )𝜀(𝜏), (5)


where ζ(τ) = (ζ_1^T(τ), ζ_2^T(τ), … , ζ_N^T(τ))^T, ε_i(τ) = x_i(τ) − s_i x_0(τ) and ε(τ) = (ε_1^T(τ), ε_2^T(τ), … , ε_N^T(τ))^T. Evaluating (4) at τ + 1 and substituting (1) and (2) yields

ζ_i(τ + 1) = Aζ_i(τ) + (d_i + a_i0)B_i u_i(τ) − Σ_{j∈N_i} a_ij B_j u_j(τ)
           = g_i(ζ_i(τ), u_i(τ)), (6)

where sign(a_ij) is the sign function.

Remark 1. Since H is a non-singular matrix, it follows from (5) that the stability of ε(τ) can be proved by analyzing the stability of ζ(τ). Moreover, when the communication topology graph 𝒢_s is directed, H may not be non-singular, so this paper considers an undirected communication topology 𝒢_s.

In order to minimize the following local performance index, it is necessary to seek an optimal control as our ultimate goal:

J_i(ζ_i(τ), u_i(τ), u_{−i}(τ)) = Σ_{τ=0}^∞ r_i(ζ_i(τ), u_i(τ), u_{−i}(τ))
                              = Σ_{τ=0}^∞ (ζ_i^T(τ)Q_ii ζ_i(τ) + u_i^T(τ)R_ii u_i(τ) + u_{−i}^T(τ)R_ij u_{−i}(τ)), (7)

where u_{−i}(τ) stands for the control inputs of the neighbors of agent i, and Q_ii > 0, R_ii > 0, R_ij > 0 are symmetric time-invariant matrices.

Definition 2. For any initial state x_i, if the MAS (1) and (2) not only satisfies Definition 1 but also minimizes the local performance index, then the MAS achieves optimal bipartite consensus.
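For concreteness, the local stage cost r_i inside (7) is a direct quadratic form; a small helper (our illustration; the weighting matrices are assumed given):

import numpy as np

def stage_cost(zeta_i, u_i, u_neighbors, Q_ii, R_ii, R_ij):
    # r_i(zeta_i, u_i, u_{-i}) from (7); u_neighbors stacks the control
    # inputs of agent i's neighbors.
    return (zeta_i @ Q_ii @ zeta_i
            + u_i @ R_ii @ u_i
            + u_neighbors @ R_ij @ u_neighbors)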

Remark 2. In the previous literature [21–24], only the optimal control of a single system is considered. In this paper, each agent communicates with other agents through the graph; apart from the agent's own state, the optimal control strategy of each agent is also related to the controls of its neighboring agents. Therefore, we design the local performance index to contain ζ_i(τ), which describes the information exchange among agents, rather than the information of a single individual.

Remark 3. In Definition 1, s_i = −1 if the leader and the follower are in competition, and s_i = 1 if they are in cooperation. Moreover, compared with the previous literature [29–33], this paper not only considers the bipartite consensus problem, but also proposes the local performance index to ensure optimal control.
3. Optimal ETC for DTMAS

3.1. Design of event-triggered optimal controller

In order to avoid the resource waste caused by continuous control, the controller samples the system state, and its design form is

ζ_i(τ) = ζ_i(τ_a^i), τ ∈ [τ_a^i, τ_{a+1}^i). (8)

The monotonically increasing sequence {τ_0^i, … , τ_a^i, τ_{a+1}^i, …} serves as the sampling instants of the ETC. Control updates occur only at the discrete triggering instants τ_0^i, τ_1^i, … , τ_a^i, τ_{a+1}^i, …, which indicates that the actuator keeps the control strategy unchanged during each triggering interval. Therefore, the control policy is given as:

u_i(τ) = u_i(ζ_i(τ_a^i)) = u_i(τ_a^i), τ ∈ [τ_a^i, τ_{a+1}^i). (9)

After defining the ETC, in order to design a triggering condition, we need to define a local error between the triggered instant and the current instant:

η_i(τ) = ζ_i(τ_a^i) − ζ_i(τ); (10)

once triggered, the error η_i(τ) is reset to zero.

The corresponding local value function of (6) is defined by:

V_i(ζ_i(τ)) = Σ_{t=τ}^∞ r_i(ζ_i(t), u_i(t), u_{−i}(t)). (11)

The DT HJB equation is written as:

0 = H_i(ζ_i, u_i, u_{−i}, V_i(ζ_i))
  = ζ_i^T Q_ii ζ_i + u_i^T R_ii u_i + u_{−i}^T R_ij u_{−i} + (∂V_i(ζ_i(τ + 1))/∂ζ_i(τ + 1))^T ζ_i(τ + 1). (12)

From the local value function (11), the local optimal value V_i*(ζ_i(τ)) is:

V_i*(ζ_i(τ)) = min_{u_i(τ_a^i)} {r_i(ζ_i(τ), u_i(τ_a^i)) + V_i*(ζ_i(τ + 1))}, (13)

which satisfies the DT HJB equation.

Therefore, it can be inferred from Bellman's optimality principle that:

u_i*(τ) = arg min_{u_i(τ_a^i)} {r_i(ζ_i(τ), u_i(τ_a^i)) + V_i*(ζ_i(τ + 1))}. (14)

By the stationarity condition, it is obtained that

u_i*(τ) = −(1/2)(d_i + a_i0) R_ii^{−1} B_i^T ΔV_i*(ζ_i(τ + 1)), (15)

where ΔV_i*(ζ_i(τ + 1)) = ∂V_i*(ζ_i(τ + 1))/∂ζ_i(τ + 1).
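For completeness, the stationarity step that produces (15) can be spelled out; this short derivation is our addition (it holds u_{−i} fixed and uses ∂ζ_i(τ + 1)/∂u_i(τ) = (d_i + a_i0)B_i from (6)):

0 = ∂/∂u_i(τ) [ζ_i^T(τ)Q_ii ζ_i(τ) + u_i^T(τ)R_ii u_i(τ) + u_{−i}^T(τ)R_ij u_{−i}(τ) + V_i*(ζ_i(τ + 1))]
  = 2R_ii u_i(τ) + (d_i + a_i0)B_i^T ΔV_i*(ζ_i(τ + 1)),

and solving for u_i(τ) gives exactly (15).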
An event-triggered function with reasonable conditions is designed as:

φ_i(τ) = ‖η_i(τ)‖² − ((1 − 2θ²)/(2θ²))‖ζ_i(τ)‖², (16)

and the latest event-triggered moment is defined as:

τ_{a+1}^i = inf{τ > τ_a^i | φ_i(τ) ≥ 0}; (17)

once the triggering error exceeds the current disagreement error, the messages are sampled and updated to obtain the next event-triggered moment.

Assumption 2 ([25]). There exists a constant θ > 0 such that the function ‖g_i(ζ_i(τ), u_i(τ_a^i))‖ satisfies the following inequality:

‖g_i(ζ_i(τ), u_i(τ_a^i))‖ ≤ θ‖ζ_i(τ)‖ + θ‖η_i(τ)‖. (18)

3.2. Stability analysis

The stability of the local error (4) is ensured using the proposed ETC strategy, which is proved in Theorem 1 below.

Theorem 1. Under Assumption 1, consider the MAS (1) and (2) with the ETC (9) and the event-triggered function (16). If θ ∈ (0, √2/2), then (4) is stable, which implies that the MAS (1) and (2) achieves bipartite consensus.

Proof. (1) When τ ∈ (τ_a^i, τ_{a+1}^i), the control u_i(τ) keeps constant. Let the Lyapunov function be Υ_i(τ) = ζ_i^T(τ)ζ_i(τ) + u_i^T(τ)u_i(τ). For Υ_i(τ), one has

ΔΥ_i(τ) = ζ_i^T(τ + 1)ζ_i(τ + 1) + u_i^T(τ + 1)u_i(τ + 1) − (ζ_i^T(τ)ζ_i(τ) + u_i^T(τ)u_i(τ))
        = ζ_i^T(τ + 1)ζ_i(τ + 1) − ζ_i^T(τ)ζ_i(τ); (19)

then, according to Assumption 2 and the triggering condition (17), we have

ΔΥ_i(τ) ≤ (θ‖ζ_i(τ)‖ + θ‖η_i(τ)‖)² − ‖ζ_i(τ)‖²
        ≤ 2θ²‖ζ_i(τ)‖² + 2θ²‖η_i(τ)‖² − ‖ζ_i(τ)‖²
        < 0. (20)

(2) When τ = τ_a^i, let V_i(ζ_i(τ)) be the Lyapunov function. According to (11), the difference of V_i(ζ_i(τ)) is ΔV_i(ζ_i(τ)) = V_i(ζ_i(τ + 1)) − V_i(ζ_i(τ)) = −r_i(ζ_i(τ), u_i(τ), u_{−i}(τ)) < 0.

Therefore, the asymptotic stability of the error system (4) is demonstrated. □
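Operationally, (8)-(10) and (16)-(17) amount to a sample-and-hold loop with a threshold test. A minimal sketch (our illustration; policy stands in for any state-feedback law, such as the actor NN output introduced in Section 4):

import numpy as np

class EventTrigger:
    def __init__(self, policy, theta=0.5):
        self.policy = policy        # callable: zeta_i -> u_i
        self.theta = theta          # theta in (0, sqrt(2)/2), Assumption 2
        self.zeta_held = None       # zeta_i(tau_a^i)
        self.u_held = None          # u_i(tau_a^i), held between events (9)

    def step(self, zeta_now):
        if self.zeta_held is None:
            fire = True             # the first sample always triggers
        else:
            eta = self.zeta_held - zeta_now                      # gap error (10)
            c = (1.0 - 2.0 * self.theta**2) / (2.0 * self.theta**2)
            fire = eta @ eta - c * (zeta_now @ zeta_now) >= 0.0  # rule (16)-(17)
        if fire:
            self.zeta_held = zeta_now.copy()   # resets eta_i(tau) to zero
            self.u_held = self.policy(zeta_now)
        return self.u_held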


Remark 4. From (15), the optimal control requires the information of the input matrix B_i and ΔV_i*(ζ_i(τ + 1)). However, when the system information is difficult to obtain accurately, or when the HJB equation contains too many coupled terms to be solved, the optimal control u_i*(τ) cannot be obtained directly from (15), so we use a model-free reinforcement learning method to approximate it.

4. Implementation with actor–critic NNs

Due to the difficulty of solving the HJB equation, approximating its solution with actor–critic NNs is an effective method. Therefore, two types of neural networks are designed in this section.

4.1. Actor–critic NNs approximation

Firstly, an approximation of the value (cost) function is constructed using the approximation capability of neural networks:

V_i*(τ) = W_ci^T φ_ci(τ) + ε_ci, (21)

where W_ci represents the ideal weight of the critic NN, and φ_ci(τ) and ε_ci denote the activation function of the critic NN and the reconstruction error, respectively.

The corresponding value function V̂_i is estimated as:

V̂_i(τ) = Ŵ_ci,τ^T φ_ci(τ), (22)

where Ŵ_ci,τ is the estimated weight of the critic NN.

The ideal input of the actor NN is expressed by:

u_i*(τ) = W_ai^T φ_ai(τ) + ε_ai, (23)

where W_ai is the ideal weight of the actor NN, φ_ai(τ) denotes the actor NN activation function, and ε_ai is the reconstruction error.

The output of the actor NN is:

û_i(τ) = Ŵ_ai,τ^T φ_ai(τ), (24)

where Ŵ_ai,τ is the estimated weight of the actor NN.
Next, the tuning law of Ŵ_ci,τ is given. Since there is an error in the estimation of the NN weights, let the estimation error of the critic NN be:

σ_ci,τ = V̂_i(τ) − r_i(τ) − V̂_i(τ + 1), (25)

and let the squared error function be E_ci,τ = (1/2)σ_ci,τ^T σ_ci,τ, which is minimized by adjusting Ŵ_ci,τ. Based on the gradient-descent update law, the tuning law of the critic NN weights is:

Ŵ_ci,τ+1 = Ŵ_ci,τ − α_ci (∂E_ci,τ/∂σ_ci,τ)(∂σ_ci,τ/∂V̂_i(τ))(∂V̂_i(τ)/∂Ŵ_ci,τ)
         = Ŵ_ci,τ − α_ci φ_ci(τ)σ_ci,τ, (26)

where α_ci > 0 is the critic NN learning parameter.

Similarly to the update of Ŵ_ci,τ, the squared error function E_ai,τ is selected as follows:

E_ai,τ = (1/2)σ_ai,τ^T σ_ai,τ, σ_ai,τ = V̂_i(τ) − χ_i,

where χ_i = 0 stands for the desired cost-to-go. Our objective is to minimize E_ai,τ by adjusting Ŵ_ai,τ; thus, the tuning law of Ŵ_ai,τ is developed as:

Ŵ_ai,τ+1 = Ŵ_ai,τ − α_ai (∂E_ai,τ/∂σ_ai,τ)(∂σ_ai,τ/∂V̂_i(τ))(∂V̂_i(τ)/∂φ_ci(τ))(∂φ_ci(τ)/∂û_i(τ))(∂û_i(τ)/∂Ŵ_ai,τ), (27)

where α_ai > 0 is the learning rate of the actor NN. Under the event-triggered mechanism, (27) is rewritten as:

Ŵ_ai,τ+1 = Ŵ_ai,τ, τ ∈ [τ_a^i, τ_{a+1}^i) (28)

and

Ŵ_ai,τ+1 = Ŵ_ai,τ − α_ai φ_ai(τ) Ŵ_ci,τ^T (∂φ_ci(τ)/∂û_i(τ)) V̂_i^T(τ), τ = τ_{a+1}^i, (29)

i.e., Ŵ_ai,τ is updated only when the specified threshold is exceeded.
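A compact single-agent sketch of the tuning laws (25), (26), (28) and (29); this is our illustration, not the authors' code. It assumes a scalar control input, and the feature maps phi_c, its control-gradient dphi_c_du and phi_a are problem-specific callables supplied by the user (the zero critic and random (0, 1) actor initializations follow the simulation section later).

import numpy as np

class CriticActorADP:
    def __init__(self, n_c, n_a, phi_c, dphi_c_du, phi_a, a_c=0.1, a_a=0.1):
        self.Wc = np.zeros(n_c)                     # critic weights, zero init
        self.Wa = np.random.uniform(0.0, 1.0, n_a)  # actor weights in (0, 1)
        self.phi_c, self.dphi_c_du, self.phi_a = phi_c, dphi_c_du, phi_a
        self.a_c, self.a_a = a_c, a_a               # learning rates alpha_ci, alpha_ai

    def critic_update(self, feat_now, feat_next, r):
        # TD-like residual (25) and gradient step (26), applied at every instant.
        sigma = self.Wc @ feat_now - r - self.Wc @ feat_next
        self.Wc = self.Wc - self.a_c * feat_now * sigma

    def actor_update(self, zeta, u, triggered):
        # (28): weights are held between events; (29): one step at a trigger.
        if triggered:
            v_hat = self.Wc @ self.phi_c(zeta, u)    # critic output (22)
            g = self.Wc @ self.dphi_c_du(zeta, u)    # Wc^T dphi_c/du
            self.Wa = self.Wa - self.a_a * self.phi_a(zeta) * g * v_hat

    def act(self, zeta):
        return self.Wa @ self.phi_a(zeta)            # actor output (24)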
4.2. The stability of the weight errors and the local error

A reasonable assumption is provided before proving the theorem:

Assumption 3. There exist positive constants W_cm, W_am, φ_cm, φ_am, ε_cim, ε_aim that bound the neural network quantities, i.e.:

‖W_ci‖ ≤ W_cm, ‖W_ai‖ ≤ W_am,
‖φ_ci‖ ≤ φ_cm, ‖φ_ai‖ ≤ φ_am,
‖ε_ci‖ ≤ ε_cim, ‖ε_ai‖ ≤ ε_aim.

Theorem 2. Assume that Assumptions 1–3 hold; consider the MAS (1) and (2) with the control law (24), where the critic NN weights Ŵ_ci are updated by (26) and the actor NN weights Ŵ_ai are updated by (28) and (29), respectively. If the event-triggered function (16) and the event-triggered condition (17) are used, and α_ci ≤ 1/‖φ_cm‖², α_ai ≤ 1/‖φ_am‖² are satisfied, then the local error system (4) and the NN weight estimation errors are uniformly ultimately bounded (UUB).

Proof. The Lyapunov function L(τ) is chosen as:

L(τ) = L_1(τ) + L_2(τ) + L_3(τ), (30)

with

L_1(τ) = (1/α_ci) W̃_ci,τ^T W̃_ci,τ,
L_2(τ) = (1/α_ai) W̃_ai,τ^T W̃_ai,τ,
L_3(τ) = V_i(ζ_i(τ)).

Case 1: events are triggered, i.e., τ = τ_a^i.

In Case 1, we have ΔL(τ) = ΔL_1(τ) + ΔL_2(τ) + ΔL_3(τ). First, taking the difference of L_1(τ) gives

ΔL_1(τ) = (1/α_ci)(W̃_ci,τ+1^T W̃_ci,τ+1 − W̃_ci,τ^T W̃_ci,τ). (31)

Let W̃_ci,τ = Ŵ_ci,τ − W_ci and W̃_ai,τ = Ŵ_ai,τ − W_ai. We can derive from (26) that

W̃_ci,τ+1 = Ŵ_ci,τ+1 − W_ci
         = Ŵ_ci,τ − α_ci φ_ci(τ)σ_ci,τ − W_ci
         = W̃_ci,τ − α_ci φ_ci(τ)(Ŵ_ci,τ^T φ_ci(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1) − r_i(τ))
         = (I − α_ci φ_ci^T(τ)φ_ci(τ))W̃_ci,τ − α_ci φ_ci(τ)(W_ci^T φ_ci(τ) − r_i(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1)). (32)

Let (I − α_ci φ_ci^T(τ)φ_ci(τ))W̃_ci,τ ≜ Π_1 and α_ci φ_ci(τ)(−r_i(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1) + W_ci^T φ_ci(τ)) ≜ Π_2; then ΔL_1(τ) = (1/α_ci)(Π_1² − 2Π_1Π_2 + Π_2² − W̃_ci,τ^T W̃_ci,τ). For the purpose of the following analysis, the three terms Π_1², −2Π_1Π_2 and Π_2² are treated and simplified as follows:

Π_1² = W̃_ci,τ^T (I − α_ci φ_ci^T(τ)φ_ci(τ))² W̃_ci,τ
     = W̃_ci,τ^T W̃_ci,τ − 2α_ci W̃_ci,τ^T φ_ci^T(τ)φ_ci(τ)W̃_ci,τ + α_ci²(φ_ci^T(τ)φ_ci(τ))² W̃_ci,τ^T W̃_ci,τ
     = ‖W̃_ci,τ‖² − 2α_ci‖ψ_ci(τ)‖² + α_ci²‖φ_ci(τ)‖²‖ψ_ci(τ)‖², (33)

where ψ_ci(τ) = W̃_ci,τ^T φ_ci(τ);

−2Π_1Π_2 = −2W̃_ci,τ^T (I − α_ci φ_ci^T(τ)φ_ci(τ)) α_ci φ_ci(τ)(−r_i(τ) + W_ci^T φ_ci(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1))
         = −2α_ci ψ_ci(τ)(W_ci^T φ_ci(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1) − r_i(τ))
           + 2α_ci²‖φ_ci(τ)‖² ψ_ci(τ)(W_ci^T φ_ci(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1) − r_i(τ))
         = 2α_ci(α_ci‖φ_ci(τ)‖² − 1)ψ_ci(τ)ρ_i(τ)
         = α_ci(α_ci‖φ_ci(τ)‖² − 1)‖ψ_ci(τ) + ρ_i(τ)‖² − α_ci(α_ci‖φ_ci(τ)‖² − 1)(‖ψ_ci(τ)‖² + ‖ρ_i(τ)‖²), (34)

where ρ_i(τ) = W_ci^T φ_ci(τ) − r_i(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1); and

Π_2² = α_ci²‖φ_ci(τ)‖² ‖W_ci^T φ_ci(τ) − r_i(τ) − Ŵ_ci,τ+1^T φ_ci(τ + 1)‖². (35)

Combining (31), (33), (34) and (35), we obtain

ΔL_1(τ) = −2‖ψ_ci(τ)‖² + α_ci‖φ_ci(τ)‖²‖ψ_ci(τ)‖²
          − (1 − α_ci‖φ_ci(τ)‖²)‖ψ_ci(τ) + ρ_i(τ)‖²
          − (α_ci‖φ_ci(τ)‖² − 1)(‖ψ_ci(τ)‖² + ‖ρ_i(τ)‖²)
          + α_ci‖φ_ci(τ)‖²‖ρ_i(τ)‖²
        = −‖ψ_ci(τ)‖² + ‖ρ_i(τ)‖² − (1 − α_ci‖φ_ci(τ)‖²)‖ψ_ci(τ) + ρ_i(τ)‖². (36)

For the function L_2(τ), taking the difference gives:

ΔL_2(τ) = (1/α_ai)(W̃_ai,τ+1^T W̃_ai,τ+1 − W̃_ai,τ^T W̃_ai,τ). (37)

It follows from (29) that:

W̃_ai,τ+1 = Ŵ_ai,τ+1 − W_ai
         = Ŵ_ai,τ − α_ai φ_ai(τ) Ŵ_ci,τ^T (∂φ_ci(τ)/∂û_i(τ)) V̂_i(τ) − W_ai
         = W̃_ai,τ − α_ai φ_ai(τ)Γ_1Γ_2, (38)

where Γ_1 = Ŵ_ci,τ^T ∂φ_ci(τ)/∂û_i(τ) and Γ_2 = V̂_i^T(τ). According to (37) and (38), it is known that:

ΔL_2(τ) = −2W̃_ai,τ^T φ_ai(τ)Γ_1Γ_2 + α_ai‖φ_ai(τ)‖²‖Γ_1‖²‖Γ_2‖²
        = ‖Γ_1Γ_2 − ψ_ai(τ)‖² − (1 − α_ai‖φ_ai(τ)‖²)‖Γ_1‖²‖Γ_2‖² − ‖ψ_ai(τ)‖², (39)

with ψ_ai(τ) = W̃_ai,τ^T φ_ai(τ). Hence,

ΔL(τ) = ΔL_1(τ) + ΔL_2(τ) + ΔL_3(τ)
      = −‖ψ_ci(τ)‖² + ‖ρ_i(τ)‖² − (1 − α_ci‖φ_ci(τ)‖²)‖ψ_ci(τ) + ρ_i(τ)‖²
        + ‖Γ_1Γ_2 − ψ_ai(τ)‖² − ‖ψ_ai(τ)‖² − (1 − α_ai‖φ_ai(τ)‖²)‖Γ_1Γ_2‖²
        + V_i(ζ_i(τ + 1)) − V_i(ζ_i(τ))
      ≤ −‖ψ_ci(τ)‖² + ρ_m² + 2φ_1m²φ_2m² + 2W_am²φ_am²
        − (1 − α_ci‖φ_ci(τ)‖²)‖ψ_ci(τ) + ρ_i(τ)‖²
        − (1 − α_ai‖φ_ai(τ)‖²)‖Γ_1‖²‖Γ_2‖², (40)

where ‖ρ_i(τ)‖ ≤ ρ_m, ‖Γ_1‖ ≤ φ_1m and ‖Γ_2‖ ≤ φ_2m. Therefore, if the following inequality conditions hold:

α_ci ≤ 1/‖φ_cm‖², α_ai ≤ 1/‖φ_am‖², ‖ψ_ci(τ)‖ > √(ρ_m² + 2φ_1m²φ_2m² + 2W_am²φ_am²),

then

ΔL(τ) ≤ 0. (41)

Thus, according to Lyapunov stability theory, W̃_ci,τ and W̃_ai,τ are UUB.

Case 2: the triggering condition is not met, that is, ∀τ ∈ (τ_a^i, τ_{a+1}^i). Then L(τ) is written as follows:

L(τ) = L_1(τ) + L_2(τ) + L_4(τ), (42)

where

L_1(τ) = (1/α_ci) W̃_ci,τ^T W̃_ci,τ,
L_2(τ) = (1/α_ai) W̃_ai,τ^T W̃_ai,τ,
L_4(τ) = Υ_i(τ).

From (28),

ΔL_2(τ) = (1/α_ai)(W̃_ai,τ+1^T W̃_ai,τ+1 − W̃_ai,τ^T W̃_ai,τ)
        = (1/α_ai)(‖Ŵ_ai,τ − W_ai‖² − ‖Ŵ_ai,τ − W_ai‖²)
        = (1/α_ai)(‖W̃_ai,τ‖² − ‖W̃_ai,τ‖²)
        = 0. (43)

ΔL(τ) is then written as follows:

ΔL(τ) = −‖ψ_ci(τ)‖² + ‖ρ_i(τ)‖²
        − (1 − α_ci‖φ_ci(τ)‖²)‖ψ_ci(τ) + ρ_i(τ)‖²
        + ‖Υ_i(τ + 1)‖² − ‖Υ_i(τ)‖²
      ≤ −‖ψ_ci(τ)‖² + ‖ρ_i(τ)‖²
        − (1 − α_ci‖φ_ci(τ)‖²)‖ψ_ci(τ) + ρ_i(τ)‖². (44)

Therefore, if the inequality conditions

α_ci ≤ 1/‖φ_cm‖², α_ai ≤ 1/‖φ_am‖², ‖ψ_ci(τ)‖ > ρ_m

are met, then we get ΔL(τ) ≤ 0. Thus, according to the Lyapunov stability theorem, W̃_ci,τ and W̃_ai,τ are UUB. □

Considering network constraints, introducing quantized control into the actor NN further saves resources. This is discussed as follows.

4.3. The design of input quantization

The input quantization process is considered in the proposed control strategy. The logarithmic quantizer is given as follows:

Q(a) = ξ_i, if (1/(1 + κ))ξ_i ≤ a ≤ (1/(1 − κ))ξ_i, a > 0;
Q(a) = 0, if a = 0;
Q(a) = −Q(−a), if a < 0, (45)

where ξ_i is the quantization level and 0 < κ < 1 represents the quantizer's accuracy parameter. The set of quantization levels can be represented as:

ξ̄ = {±ξ_i : ξ_i = ((1 − κ)/(1 + κ))^i ξ_0, i = 0, ±1, ±2, …} ∪ {0}.

According to the quantizer definition, |Q(a) − a| ≤ κ|a| holds for all a ∈ R. Then, for any z = (z_1, z_2, … , z_n)^T ∈ R^n, defining Q(z) = (Q(z_1), Q(z_2), … , Q(z_n))^T ∈ R^n, one obtains Q(z) − z = Λz, where Λ = diag{∧_1, ∧_2, … , ∧_N} and ∧_i ∈ [−κ, +κ].

Then, this paper introduces an event-triggered strategy with quantization of the input control signal, so that the quantized control is obtained as:

û_i^q(τ) = Q(û_i(τ)) = (1 + Λ)û_i(τ). (46)

The effect of the quantized control on the stability of the system is discussed in the following analysis of the error dynamics (4).
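A minimal implementation sketch of (45) (ours; the default parameter values are hypothetical):

import math

def log_quantize(a, xi0=1.0, kappa=0.2):
    # Returns the level xi_i = rho**i * xi0, rho = (1 - kappa)/(1 + kappa),
    # whose sector xi_i/(1 + kappa) <= a <= xi_i/(1 - kappa) contains a,
    # which guarantees the bound |Q(a) - a| <= kappa * |a|.
    if a == 0.0:
        return 0.0
    if a < 0.0:
        return -log_quantize(-a, xi0, kappa)
    rho = (1.0 - kappa) / (1.0 + kappa)
    i = math.floor(math.log(a * (1.0 - kappa) / xi0) / math.log(rho))
    return xi0 * rho ** i

Applied elementwise to û_i(τ), this realizes (46), since the sector bound gives Q(û_i) = (1 + Λ)û_i with |∧_j| ≤ κ.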

Due to the quantizer, the error dynamics (6) become

ζ_i(τ + 1) = Aζ_i(τ) + (d_i + a_i0)B_i û_i^q(τ) − Σ_{j∈N_i} a_ij B_j û_j^q(τ)
           = g_i(ζ_i(τ), û_i^q(τ)). (47)

According to Assumption 2, ‖g_i(ζ_i(τ), û_i^q(τ))‖ ≤ θ‖ζ_i(τ)‖ + θ‖(1 + Λ)η_i(τ)‖. Similarly to the design of the previous event-triggered function, an ETC function with reasonable conditions is designed:

φ_i(τ) = ‖η_i(τ)‖² − ((1 − 2θ²)/(2(1 + κ)²θ²))‖ζ_i(τ)‖², (48)

and the latest event-triggered moment is defined as

τ_{a+1}^i = inf{τ > τ_a^i | φ_i(τ) ≥ 0}; (49)

once the triggering error exceeds the current disagreement error, the messages are sampled and updated to obtain the next event-triggered moment.

The stability of the local error system (4) is ensured using the proposed quantized event-triggered control strategy, which is proved in Theorem 3 below.

Theorem 3. Under Assumptions 1–3, consider the MAS (1) and (2) with the QETC law (46) and the event-triggered function (48). If θ ∈ (0, √2/2), then the error system (4) is asymptotically stable, which means the MAS (1) and (2) can achieve optimal bipartite consensus.

Proof. (1) When τ ∈ (τ_a^i, τ_{a+1}^i), the control û_i^q(τ) keeps constant. Let the Lyapunov function be Υ_i(τ) = ζ_i^T(τ)ζ_i(τ) + û_i^q(τ)^T û_i^q(τ). For Υ_i(τ), one has

ΔΥ_i(τ) = ζ_i^T(τ + 1)ζ_i(τ + 1) + û_i^q(τ + 1)^T û_i^q(τ + 1) − (ζ_i^T(τ)ζ_i(τ) + û_i^q(τ)^T û_i^q(τ))
        = ζ_i^T(τ + 1)ζ_i(τ + 1) − ζ_i^T(τ)ζ_i(τ); (50)

then, according to Assumption 2 and the triggering condition (48), we have

ΔΥ_i(τ) ≤ (θ‖ζ_i(τ)‖ + θ‖(1 + Λ)η_i(τ)‖)² − ‖ζ_i(τ)‖²
        ≤ 2θ²‖ζ_i(τ)‖² + 2θ²(1 + κ)²‖η_i(τ)‖² − ‖ζ_i(τ)‖²
        < 0. (51)

(2) When τ = τ_a^i, let V_i(ζ_i(τ)) be the Lyapunov function. According to (11), the difference along V_i(ζ_i(τ)) is ΔV_i(ζ_i(τ)) = V_i(ζ_i(τ + 1)) − V_i(ζ_i(τ)) = −r_i(ζ_i(τ), û_i^q(τ), û_{−i}^q(τ)) < 0.

Therefore, the error system (4) is asymptotically stable. □

Remark 5. Compared with [28], which only studies a continuous-time single system under ETC, this paper not only considers the interaction between multiple agents, but also applies the ADP method to DT QETC.

5. Simulation example

Consider the MAS (1), (2) with one leader and four followers, whose dynamics are given by:

A = [1, 0.1; −0.1, 1],
B_1 = [0.2; 0.1], B_2 = [0.1; 0.1], B_3 = [0.5; 0.4], B_4 = [0.2; 0.1].

The network communication topology of the MAS (1)–(2) is shown in Fig. 1. According to the analysis in the theorems, the corresponding weight matrices and learning rates can be selected as Q = I_2, R_ii = I_2 (i = 1, 2, 3, 4), R_12 = R_21 = R_34 = R_43 = R_24 = R_42 = I_2, α_ci = 0.1, α_ai = 0.1; the activation functions are φ_c1(τ) = [ζ_1²(τ), u_1²(τ_a^i), u_2²(τ_a^i)]^T, φ_a1(τ) = ζ_1²(τ), φ_a2(τ) = ζ_2²(τ), φ_c2(τ) = [ζ_2²(τ), u_1²(τ_a^i), u_2²(τ_a^i), u_4²(τ_a^i)]^T, φ_a3(τ) = ζ_3²(τ), φ_c3(τ) = [ζ_3²(τ), u_3²(τ_a^i), u_4²(τ_a^i)]^T, φ_a4(τ) = ζ_4²(τ), φ_c4(τ) = [ζ_4²(τ), u_4²(τ_a^i), u_3²(τ_a^i), u_2²(τ_a^i)]^T; the initial parameters of the critic NNs are chosen as zero, and the initial parameters of the actor NNs are random numbers belonging to (0, 1).
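As an indication of how these choices wire together, the following sketch runs the QETC loop on the system above. It is our illustration, not the authors' code: the signed topology and pinning (Fig. 1 itself is not recoverable from the text), the gauge signs s_i, the initial states, and the simple linear feedback used in place of the trained actor NN are all assumptions, and log_quantize is the helper sketched in Section 4.3.

import numpy as np

A = np.array([[1.0, 0.1], [-0.1, 1.0]])
B = [np.array([0.2, 0.1]), np.array([0.1, 0.1]),
     np.array([0.5, 0.4]), np.array([0.2, 0.1])]

# Assumed structurally balanced topology: V1 = {1, 2}, V2 = {3, 4}; agent 1 pinned.
Adj = np.array([[0.0,  1.0, 0.0,  0.0],
                [1.0,  0.0, 0.0, -1.0],
                [0.0,  0.0, 0.0,  1.0],
                [0.0, -1.0, 1.0,  0.0]])
a0 = np.array([1.0, 0.0, 0.0, 0.0])
s = np.array([1.0, 1.0, -1.0, -1.0])
N, theta, kappa = 4, 0.5, 0.2

def zeta_of(i, x, x0):                      # local error (4)
    z = a0[i] * (x[i] - s[i] * x0)
    for j in range(N):
        if Adj[i, j] != 0.0:
            z += abs(Adj[i, j]) * (x[i] - np.sign(Adj[i, j]) * x[j])
    return z

rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.0])                   # leader state (assumed)
x = rng.uniform(-1.0, 1.0, (N, 2))          # follower states (assumed)
z_held = [zeta_of(i, x, x0) for i in range(N)]
u_held = np.zeros(N)
gain = 0.3                                  # placeholder for the trained actor NN

c = (1.0 - 2.0 * theta**2) / (2.0 * (1.0 + kappa)**2 * theta**2)
for tau in range(100):
    for i in range(N):
        z = zeta_of(i, x, x0)
        eta = z_held[i] - z
        if tau == 0 or eta @ eta - c * (z @ z) >= 0.0:               # rule (48)-(49)
            z_held[i] = z
            u_held[i] = log_quantize(-gain * (B[i] @ z), kappa=kappa)  # quantized input (46)
    x0 = A @ x0                                                      # leader (2)
    x = np.array([A @ x[i] + B[i] * u_held[i] for i in range(N)])    # followers (1)

print(np.linalg.norm(x - np.outer(s, x0)))   # bipartite tracking error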
Fig. 1. Topology graph of the multi-agent system.

Fig. 2. The dynamics of x_i (i = 0, 1, 2, 3, 4).

Fig. 2 indicates that the local errors of the agents reach consensus. Figs. 3 and 4 reveal that the weights of the critic–actor NNs eventually reach fixed convergence values after the networks have been trained continuously. Fig. 5 shows that the agents eventually achieve bipartite consensus. Fig. 6 depicts the triggering moments of the agents i (i = 1, 2, 3, 4). Fig. 7 clearly reveals that the frequency of ETC is much smaller than that of time triggering, which shows that ETC has the advantage of saving resources and reducing communication. Fig. 8 indicates that the MAS reaches optimal bipartite consensus under QETC. Fig. 9 shows the triggering numbers under QETC; compared with Fig. 7, Fig. 9 shows that the triggering frequency of each agent has decreased, indicating that QETC can further save resources. Fig. 10 displays that the triggering errors of each of the four agents do not exceed the triggering threshold, which demonstrates the rationality of the ETC design.

6. Conclusion

This paper is concerned with the optimal bipartite consensus control of DTMASs via quantized control with ETC strategies. Firstly, a novel ETC strategy is presented. Secondly, based on the ADP approach, we train an actor NN and a critic NN to approximate the value function and the optimal control through appropriate weight adjustment laws. In addition, we quantize the control, taking into account the influence of information transfer. Finally, a numerical simulation demonstrates the effectiveness of the proposed approach. Practical factors such as the convergence rate and delays of DTMASs will be considered in future work.

Fig. 3. Training trajectories of the critic NN weights.

Fig. 4. Training trajectories of the actor NN weights.

Fig. 5. Trajectory development curves of ε_i.

Fig. 6. The event-triggered instants.

Fig. 7. Comparison of event-triggered times and time-triggered times.

Fig. 8. Trajectory development curves of ε_i under quantized event-triggered control.

Fig. 9. Comparison of quantized event-triggered times and time-triggered times.

Fig. 10. The dynamics of the triggering errors.


CRediT authorship contribution statement

Wanli Jin: Data curation, Writing – original draft, Software, Writing – review & editing. Huaguang Zhang: Conceptualization, Methodology, Supervision, Visualization, Investigation. Zhongyang Ming: Software, Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

[1] H. Xiao, C. Chen, D. Yu, Two-level structure swarm formation system with
self-organized topology network, Neurocomputing 384 (2020) 356–367.
[2] K. Guo, X. Li, L. Xie, Ultra-wideband and odometry-based cooperative relative
localization with application to multi-UAV formation control, IEEE Trans. Cybern.
50 (6) (2020) 2590–2603.
[3] J. Qin, W. Fu, H. Gao, W.X. Zheng, Distributed 𝑘-means algorithm and fuzzy
𝑐-means algorithm for sensor networks based on multiagent consensus theory,
IEEE Trans. Cybern. 47 (3) (2017) 772–783.
[4] M.A.B. Brasil, B. Bösch, F.R. Wagner, E.P. de Freitas, Performance comparison
of multi-agent middleware platforms for wireless sensor networks, IEEE Sens. J.
18 (7) (2018) 3039–3049.
[5] Q. Sun, R. Han, H. Zhang, J. Zhou, J.M. Guerrero, A multiagent-based consensus
algorithm for distributed coordinated control of distributed generators in the
energy internet, IEEE Trans. Smart Grid 6 (6) (2015) 3006–3019.
[6] R. Lu, Y. Xu, A. Xue, J. Zheng, Networked control with state reset and quantized
5206–5213.
[7] Y. Cao, W. Ren, M. Egerstedt, Distributed containment control with multiple sta-
tionary or dynamic leaders in fixed and switching directed networks, Automatica
48 (8) (2012) 1586–1597.
[8] S. Zhang, Z. Li, X. Wang, Robust 𝐻2 consensus for multi-agent systems with
parametric uncertainties, IEEE Trans. Circuits Syst. II 68 (7) (2021) 2473–2477.
[9] J. Shi, Cooperative control for nonlinear multi-agent systems based on
event-triggered scheme, IEEE Trans. Circuits Syst. II 68 (6) (2021) 1977–1981.
[10] Q. Jia, W.K.S. Tang, Event-triggered protocol for the consensus of multi-agent
systems with state-dependent nonlinear coupling, IEEE Trans. Circuits Syst. I.
Regul. Pap. 65 (2) (2018) 723–732.
[11] Z. Li, J. Yan, W. Yu, J. Qiu, Adaptive event-triggered control for unknown
second-order nonlinear multiagent systems, IEEE Trans. Cybern. 51 (12) (2021)
6131–6140.
[12] Y. Yang, Y. Li, D. Yue, W. Yue, Adaptive event-triggered consensus control of
a class of second-order nonlinear multiagent systems, IEEE Trans. Cybern. 50
(12) (2020) 5010–5020.
[13] X. Shi, Y. Li, Q. Liu, K. Lin, S. Chen, A fully distributed adaptive event-triggered
control for output regulation of multi-agent systems with directed network,
Inform. Sci. 626 (2023) 60–74.
[14] Z.-G. Wu, Y. Xu, Y.-J. Pan, P. Shi, Q. Wang, Event-triggered pinning control for
consensus of multiagent systems with quantized information, IEEE Trans. Syst.
Man Cybern. 48 (11) (2018) 1929–1938.
[15] N. Lin, Q. Ling, Bit-rate conditions for the consensus of quantized multiagent
systems based on event triggering, IEEE Trans. Cybern. 52 (1) (2022) 116–127.
[16] Q. Wang, S. Li, W. He, W. Zhong, Fully distributed event-triggered bipartite
consensus of linear multi-agent systems with quantized communication, IEEE
Trans. Circuits Syst. II 69 (7) (2022) 3234–3238.
[17] Z. Xu, C. Li, Y. Han, Leader-following fixed-time quantized consensus of
multi-agent systems via impulsive control, J. Franklin Inst. B 356 (1) (2019)
[18] S. Dong, L. Liu, G. Feng, M. Liu, Z.-G. Wu, Quantized fuzzy cooperative output regulation for heterogeneous nonlinear multiagent systems with directed fixed/switching topologies, IEEE Trans. Cybern. 52 (11) (2022) 12393–12402.
[19] X. Li, M.Z.Q. Chen, H. Su, Quantized consensus of multi-agent networks with sampled data and Markovian interaction links, IEEE Trans. Cybern. 49 (5) (2019) 1816–1825.
[20] S. Huo, L. Ge, F. Li, Robust H∞ consensus for Markov jump multiagent systems under mode-dependent observer and quantizer, IEEE Syst. J. 15 (2) (2021) 2443–2450.
[21] D. Wang, D. Liu, H. Li, B. Luo, H. Ma, An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties, IEEE Trans. Syst. Man Cybern. 46 (5) (2016) 713–717.
[22] Q. Wei, R. Song, P. Yan, Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP, IEEE Trans. Neural Netw. Learn. Syst. 27 (2) (2016) 444–458.
[23] H. Zhang, H. Jiang, Y. Luo, G. Xiao, Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method, IEEE Trans. Ind. Electron. 64 (5) (2017) 4091–4100.
[24] J. Zhang, H. Zhang, T. Feng, Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic, IEEE Trans. Neural Netw. Learn. Syst. 29 (8) (2018) 3339–3348.
[25] M. Ha, D. Wang, D. Liu, Event-triggered adaptive critic control design for discrete-time constrained nonlinear systems, IEEE Trans. Syst. Man Cybern. 50 (9) (2020) 3158–3168.


[26] X. Yang, Q. Wei, Adaptive critic learning for constrained optimal event-triggered control with discounted cost, IEEE Trans. Neural Netw. Learn. Syst. 32 (1) (2021) 91–104.
[27] H. Su, H. Zhang, H. Jiang, Y. Wen, Decentralized event-triggered adaptive control of discrete-time nonzero-sum games over wireless sensor-actuator networks with input constraints, IEEE Trans. Neural Netw. Learn. Syst. 31 (10) (2020) 4254–4266.
[28] B. Luo, Y. Yang, D. Liu, H.-N. Wu, Event-triggered optimal control with performance guarantees using adaptive dynamic programming, IEEE Trans. Neural Netw. Learn. Syst. 31 (1) (2020) 76–88.
[29] C. Altafini, Consensus problems on networks with antagonistic interactions, IEEE Trans. Automat. Control 58 (4) (2013) 935–946.
[30] H. Zhang, J. Duan, Y. Wang, Z. Gao, Bipartite fixed-time output consensus of heterogeneous linear multiagent systems, IEEE Trans. Cybern. 51 (2) (2021) 548–557.
[31] S. Miao, H. Su, Bipartite consensus for second-order multiagent systems with matrix-weighted signed network, IEEE Trans. Cybern. 52 (12) (2022) 13038–13047.
[32] H. Wang, W. Yu, G. Wen, G. Chen, Finite-time bipartite consensus for multi-agent systems on directed signed networks, IEEE Trans. Circuits Syst. I. Regul. Pap. 65 (12) (2018) 4336–4348.
[33] Y. Wu, Y. Pan, M. Chen, H. Li, Quantized adaptive finite-time bipartite NN tracking control for stochastic multiagent systems, IEEE Trans. Cybern. 51 (6) (2021) 2870–2881.
[34] Z. Peng, R. Luo, J. Hu, K. Shi, B.K. Ghosh, Distributed optimal tracking control of discrete-time multiagent systems via event-triggered reinforcement learning, IEEE Trans. Circuits Syst. I. Regul. Pap. 69 (9) (2022) 3689–3700.

Wanli Jin received the B.S. and M.S. degrees in mathematics and applied mathematics from Xinjiang University, Urumqi, China, in 2019 and 2022, respectively. She is currently pursuing the Ph.D. degree in control science and engineering at Northeastern University, Shenyang, China. Her current research interests include reinforcement learning, optimal control, event-triggered control, multi-agent systems, stochastic systems, and adaptive dynamic programming.

Huaguang Zhang (F'15) received the B.S. and M.S. degrees in control engineering from Northeast Dianli University of China, Jilin City, China, in 1982 and 1985, respectively, and the Ph.D. degree in thermal power engineering and automation from Southeast University, Nanjing, China, in 1991. He joined the Department of Automatic Control, Northeastern University, Shenyang, China, in 1992, as a Post-Doctoral Fellow for two years, where he has been a Professor and the Head of the Institute of Electric Automation, School of Information Science and Engineering, since 1994. He has authored and coauthored over 200 journal and conference papers and 4 monographs, and co-invented 20 patents. His current research interests include fuzzy control, stochastic system control, neural networks-based control, nonlinear control, and their applications. Dr. Zhang was a recipient of the Outstanding Youth Science Foundation Award from the National Natural Science Foundation Committee of China in 2003, the IEEE TRANSACTIONS ON NEURAL NETWORKS 2012 Outstanding Paper Award, and the Andrew P. Sage Best Transactions Paper Award 2015. He was named the Cheung Kong Scholar by the Education Ministry of China in 2005. He is the E-Letter Chair of the IEEE CIS Society and the Former Chair of the Adaptive Dynamic Programming and Reinforcement Learning Technical Committee of the IEEE Computational Intelligence Society. He was an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS from 2008 to 2013. He is an Associate Editor of Automatica, the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, IEEE TRANSACTIONS ON CYBERNETICS, and Neurocomputing.

Zhongyang Ming received the B.S. degree in mathematics and applied mathematics from Shenyang Normal University, Shenyang, China, in 2018, and the M.S. degree in basic mathematics from Northeastern University, Shenyang, China, in 2020, where he is currently pursuing the Ph.D. degree in control science and engineering. His current research interests include reinforcement learning, optimal control, fuzzy control, multi-agent systems, stochastic systems, adaptive dynamic programming, and their applications in power systems.
