21-IEEE-NNLS-Adaptive Resilient Event-Triggered Control Design of Autonomous Vehicles With An Iterative Single Critic Learning Framework

This document discusses the development of an adaptive resilient event-triggered control design for autonomous vehicles using an iterative single critic learning framework. It introduces the problem of controlling rear-wheel-drive autonomous vehicles and building an error system model. It then covers combining event-triggered sampling with an iterative single critic learning framework to develop a new adaptive resilient control algorithm.

Uploaded by

duodaji262

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views10 pages

21-IEEE-NNLS-Adaptive Resilient Event-Triggered Control Design of Autonomous Vehicles With An Iterative Single Critic Learning Framework

Uploaded by

duodaji262

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

5502 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 32, NO.

12, DECEMBER 2021

Adaptive Resilient Event-Triggered Control Design

of Autonomous Vehicles With an Iterative Single
Critic Learning Framework
Kun Zhang , Member, IEEE, Rong Su , Senior Member, IEEE, Huaguang Zhang , Fellow, IEEE,
and Yunlin Tian

Abstract— This article investigates the adaptive resilient event- human beings in man–machine games, the AI technique,
triggered control for rear-wheel-drive autonomous (RWDA) vehi- which is based on reinforcement learning (RL) or adaptive
cles based on an iterative single critic learning framework, which dynamic programming (ADP) methods, has recently drawn
can effectively balance the frequency/changes in adjusting the
vehicle’s control during the running process. According to the much attention of engineers and scholars from the realm of
kinematic equation of RWDA vehicles and the desired trajectory, science and technology [1]–[4]. That is because RL and ADP
the tracking error system during the autonomous driving process methods have the capability to solve the optimal control,
is first built, where the denial-of-service (DoS) attacking signals which can maximize or minimize the cost function while
are injected into the networked communication and transmission. stabilizing the system dynamics. Partially due to the feasibility
Combining the event-triggered sampling mechanism and iterative
single critic learning framework, a new event-triggered condition of solving optimal control problems, RL and ADP meth-
is developed for the adaptive resilient control algorithm, and ods have presented good applicability in autonomous vehicle
the novel utility function design is considered for driving the driving [5], [6]. However, it is still in the infancy stage, and
autonomous vehicle, where the control input can be guaranteed many pivotal control issues on autonomous vehicles are still
into an applicable saturated bound. Finally, we apply the new far from perfect.
adaptive resilient control scheme to a case of driving the RWDA
vehicles, and the simulation results illustrate the effectiveness and Generally, the energy expenditure in the driving process
practicality successfully. plays a key role in determining the main performance of an
autonomous vehicle. One way is the optimal control method,
Index Terms— Adaptive dynamic programming (ADP),
autonomous vehicle, event-triggered control, optimal control, which gives the maximum or minimum of the performance
resilient control, tracking control. index. Based on the principle of optimality, the optimal control
satisfies a mathematical partial differential equation [7]–[10],
I. I NTRODUCTION which is difficult to be solved by an analytical method directly.
To overcome this challenge, many scholars try to find an
W ITH the rapid development of the Internet of Things,
artificial intelligence (AI), and new energy technolo-
gies, a new era of automobile intelligence has come quietly.
approximated solution by iterative learning methods [11]–[13],
which depends on RL and ADP architecture. In particular,
After the “Alpha-Go” algorithm defeated the champion of for the traditional optimal tracking control problem with
asymptotically stable trajectories, the control input must be
Manuscript received June 27, 2020; revised October 22, 2020; accepted divided into steady-state part and feedback part to guarantee
January 16, 2021. Date of publication February 3, 2021; date of cur-
rent version December 1, 2021. This work was supported in part by the the finiteness of the cost function [14]–[16]. To better measure
National Postdoctoral Program for Innovative Talents of China under Grant the control input, the discounted cost function was designed by
BX20200357, in part by the China Postdoctoral Science Foundation under some researchers [17]–[19] for an augmented tracking system
Grant 2020M680718, in part by the Singapore National Research Foundation
Delta-NTU Corporate Lab Program (DELTA-NTU CORP-SMA-RP2), and in reconstructed by combining system and reference dynamics.
part by the Singapore Ministry of Education Tier 1 Academic Research Grant Another way is the event-triggering control method that can
2013-T1-002-177. (Corresponding author: Kun Zhang.) reduce the cost of control operation in the driving process.
Kun Zhang is with the Key Laboratory of Systems and Control, Institute
of Systems Science, Academy of Mathematics and Systems Science, Chinese Distinct from the time-driven methods, the event-triggering
Academy of Sciences, Beijing 100049, China (e-mail: [email protected]). mechanism can optimize the control cost by decreasing data
Rong Su is with the School of Electrical and Electronic Engi- sampling frequency. With this characteristic, many event-
neering, Nanyang Technological University, Singapore 639798 (e-mail:
[email protected]). driven structures and methods were developed [20]–[22], and
Huaguang Zhang is with the State Key Laboratory of Synthetical Automa- Wang et al. [23] discussed a self-learning optimal regulation,
tion for Process Industries, Northeastern University, Shenyang 110819, China, where an event-based adaptive critic algorithm was developed
and also with the School of Information Science and Engineering, Northeast-
ern University, Shenyang 110819, China (e-mail: [email protected]). with an event-driven formulation. However, the optimal cost
Yunlin Tian is with the Faculty of Business, University of Wollongong, functions were not further analyzed according to the event
Wollongong, NSW 2522, Australia (e-mail: [email protected]). sampling processes at this stage.
Color versions of one or more figures in this article are available at
https://doi.org/10.1109/TNNLS.2021.3053269. It should be mentioned that the communication net-
Digital Object Identifier 10.1109/TNNLS.2021.3053269 works in an autonomous vehicle system, which may suffer
2162-237X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 07,2023 at 06:32:05 UTC from IEEE Xplore. Restrictions apply.
ZHANG et al.: ADAPTIVE RESILIENT EVENT-TRIGGERED CONTROL DESIGN OF AUTONOMOUS VEHICLES 5503

denial-of-service (DoS) attacks, affect the driving path processes. A new adaptive tracking control algorithm is pro-
and feedback control policy by transmitting the driving posed in Section IV for the autonomous driving system of
information. To overcome the DoS attacks in networked vehicles, and the single critic learning framework is developed.
cyber–physical systems, some resilient control methods were The novel adaptive resilient event-triggered control algorithm
developed to resist their negative effects [24], [25]. Among is implemented in a simulated case shown in Section V, and
the complex systems in cyberenvironment, resilient dis- conclusions are drawn in Section VI.
tributed control schemes were developed for multiagent
systems [26]–[28], and Tang et al. [29] researched the forma- II. P ROBLEM F ORMULATION AND P RELIMINARIES
tion control problem of nonlinear multiagent systems under
A. Dynamic Modeling of an Autonomous Vehicle System
DoS attacks and achieved the contribution with hybrid event-
triggering strategies. Considering the communication protocol In this article, we consider the typical rear-wheel-drive
of networked systems, the state-saturated resilient strategies autonomous (RWDA) vehicle [39], and the normal kinematic
were investigated in [30] and [31] to against the deception equation of the RWDA vehicle is
⎧
attacks, and Xue et al. [32] designed a switching-dependent ⎨
⎨ ẋ(t) = v x (t) cos(θ (t)) − dr wx (t) sin(θ (t))
controller for the asynchronously switched systems with adap-
ẏ(t) = v x (t) sin(θ (t)) + dr wx (t) cos(θ (t)) (1)
tion to fast-switching perturbation. The resilient stabilization ⎨
⎩
of switched discrete-time systems was studied in [33] against θ̇(t) = wx (t)
adversarial switching, and Dong et al. [34] proposed a class of where x(t) is the horizontal position of the mass center of the
new convex reliable stabilization conditions for the influence vehicle in the inertial reference frame, y(t) is the vehicle’s
of sensor faults. Besides, according to the optimal control vertical position, θ (t) is its orientation, v x (t) represents the
and game theory, some resilient methods were developed to longitudinal velocity of the mass center in the body fixed
mitigate impacts of the attacks and perturbations in [35]–[37], frame, wx (t) denotes the yaw angular velocity (around Z axis,
and Lu and Yang [38] designed an event-triggering mechanism and perpendicular to the X − Y plane), and dr is the direct
to make a tradeoff between the transmission efficiency and distance from the mass center to the rear axle in the vehicle.
tolerable attack intensity, which indicates the potential advan- To make an RWDA vehicle follow a desired trajectory,
tages of event-triggering resilient control methods. we assume that the desired reference trajectory is generated
To solve the control scheme with less energy expenditure by fixed longitudinal velocity vr (t) and yaw angular velocity
in the vehicle’s driving process, and motivated by the ADP wr (t) of the vehicle, and the reference dynamics can be
techniques, a new resilient event-triggered control scheme for presented as
the autonomous driving system of vehicles is proposed by ⎧
using the iteration single critic learning framework. The main ⎨
⎨ ẋr (t) = vr (t) cos(θr (t)) − dr wr (t) sin(θr (t))
contributions in this article can be summarized as follows. ẏr (t) = vr (t) sin(θr (t)) + dr wr (t) cos(θr (t)) (2)
⎨
⎩
1) A novel autonomous driving system is structured with θ̇r (t) = wr (t)
considering the DoS attacks, and by combining event- where xr (t), yr (t), and θr (t) indicate the desired horizon-
triggered sampling mechanism and iterative single critic tal position, vertical position, and the vehicle’s orientation,
learning framework, the driving control process involves respectively.
less energy expenditure. In the local coordinate system of a vehicle, we define that
2) Different from some existing event-triggering mecha- the horizontal position error is x e (t), the vertical position error
nisms in ADP methods, such as [20]–[23], the control is ye (t), and the heading direction error is θ (t). Then, the
scheme is developed with a specific sampling mech- tracking error vector of the vehicle during the autonomous
anism, where the cost function under event-triggered driving becomes
processes is analyzed by sampling intervals, which is ⎡ ⎤ ⎡ ⎤⎡ ⎤
the first of their kinds. x e (t) cos(θ (t)) sin(θ (t)) 0 xr (t) − x(t)
3) The adaptive resilient event-triggered control algo- ⎣ ye (t)⎦ = ⎣ − sin(θ (t)) cos(θ (t)) 0⎦⎣ yr (t) − y(t)⎦.
rithm is first developed for the autonomous vehi- θe (t) 0 0 1 θr (t) − θ (t)
cles, which effectively balances the frequency/changes (3)
in adjusting the vehicle’s control during the running
The RWDA vehicle’s free-body diagram and tracking tra-
process.
jectory are presented in Fig. 1, where the reaction forces on
The rest of this article is organized as follows. The main each individual wheel are summed up at their mid-axles, and
design is inspired by how to drive the autonomous vehicle run- the reference path is given.
ning with less energy expenditure, as presented in Section I, The control objective of this article is to find a pair of con-
where the event-triggering mechanism and resilient control trol inputs v x (t) and wx (t) of RWDA vehicles, which stabilizes
problem are considered. The system model of an RWDA the tracking errors and resists the attacking signals from the
vehicle is built based on the kinematic equation in Section II, vehicle’s communication system. Generally, there exist a lot
where DoS attacks occur in the dynamic system. Section III of DoS attacks in networked systems, and the autonomous
gives the event-triggered sampling mechanism and derives the vehicle is particularly vulnerable to sensor and actuator attacks
cost function and control policy under the event-triggering as it heavily relies on proper data communication. The tracking

Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 07,2023 at 06:32:05 UTC from IEEE Xplore. Restrictions apply.
5504 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 32, NO. 12, DECEMBER 2021

error dynamics with uncertain attacking signals are presented

in the following.

B. Resilient Control Problem in Autonomous Driving

According to the RWDA vehicle’s free body diagram and
tracking error equation (3), we compute the tracking error
dynamics of an RWDA vehicle.
Theorem 1: For a nominal RWDA vehicle system, the
tracking error dynamic can be derived as
ṡ(t) = f (s(t)) + g(s(t))u(t) (4)
wx (t)
where s(t) = [ xe (t), ye (t), θe (t) ]T , u(t) = v x (t) , f (t) = Fig. 1. RWDA vehicle’s free-body diagram and tracking trajectory.
vr (t) cos(θe (t)) ye (t) −1
vr (t) sin(θe (t)) , and g(t) = −xe (t)−dr 0 .
wr (t) −1 0
Proof: First, based on the tracking error equation (3),
using the vehicle’s dynamic (1) and reference dynamic (2),
the error dynamic ẋ e yields
ẋ e = cos(θ )[ẋr − ẋ] + sin(θ )[ ẏr − ẏ]
− sin(θ )θ̇[xr − x] + cos(θ )θ̇ [yr − y]
= cos(θr − θe )ẋr − cos(θ )[v x cos(θ ) − dr wx sin(θ )]
+ sin(θr − θe ) ẏr − sin(θ )[v x sin(θ ) + dr wx cos(θ )]
Fig. 2. Autonomous driving system in the suffering of uncertain DoS
− sin(θ )wx [xr − x] + cos(θ )wx [yr − y] attacking signals.
= ẋr [cos(θr ) cos(θe ) + sin(θr ) sin(θe )] + ye wx
+ ẏr [sin(θr ) cos(θe ) − cos(θr ) sin(θe )] − v x . (5) In the autonomous driving and operation process, the track-
ing error system (4) of RWDA vehicles generally suffers
From the RWDA vehicle’s free body diagram and modeling uncertain attacking signals, where the DoS attacks are injected
process, one has ẋr sin(θr ) = ẏr cos(θr ); thus, it becomes into the tracking dynamic via communication networks as the
ẋ e = ye wx + [vr cos(θr ) − dr wr sin(θr )] cos(θr ) cos(θe ) signal flow shown in Fig. 2.
Then, the general autonomous driving system of RWDA
− v x + [vr sin(θr ) + dr wr cos(θr )] sin(θr ) cos(θe )
vehicles can be rewritten as
= ye wx − v x + vr cos(θe ). (6)
ṡ(t) = f (s(t)) + α(t) + g(s(t))u(t) (10)
Second, for the tracking error ye , there is
ẏe = − sin(θ )[ẋr − ẋ] + cos(θ )[ ẏr − ẏ] where ||g(s(t))|| = 0 for all s(t), u(t) = μ(t)+u r (t)+β(t) is
the actual system input working on the vehicles, μ(t) indicates
− cos(θ )θ̇[xr − x] − sin(θ )θ̇ [yr − y] the resilient tracking control policy, u r (t) = [ wr (t),vr (t) ]T is
= − sin(θr − θe )ẋr + sin(θ )[v x cos(θ ) − dr wx sin(θ )] the reference policy, and α(t) = [ αx (t), α y (t), αθ (t) ]T and β(t) =
+ cos(θr − θe ) ẏr − cos(θ )[v x sin(θ ) + dr wx cos(θ )] [ β1 (t), β2 (t) ]T denote the uncertain sensor and actuator attacking
− cos(θ )wx [xr − x] − sin(θ )wx [yr − y] signals in the tracking dynamic functions, respectively.
In the tracking driving process, the reference policy u r (t) is
= −dr wx − ẋr [sin(θr ) cos(θe ) − cos(θr ) sin(θe )] used to maintain the tracking trajectory, which is a given ref-
−x e wx + ẏr [cos(θr ) cos(θe ) + sin(θr ) sin(θe )] erence in the vehicles, and the resilient tracking control policy
= −dr wx − x e wx + vr sin(θe ). (7) μ(t) is used to decrease the tracking errors and overcome the
uncertain signals.
Finally, based on the dynamics (1) and (2), the tracking Assumption 1: In this article, the sensor and actuator attack-
error dynamic θe (t) can be directly derived by ing signals are assumed to be state dependent, which can be
θ̇e (t) = θ̇r (t) − θ̇ (t) = wr (t) − wx (t). (8) parameterized as α(t) = (t)s(t) and β(t) = W (t)ϕ(s(t)) for
all t ≥ 0. Similar to [41], (t) and W (t) are unknown time-
Thus, we have varying matrices satisfying || (t)||2 ≤ b1 and ||W (t)||2 ≤ b2
⎡ ⎤ ⎡ ⎤ with constants b1 , b2 > 0, and ϕ(·) is an unknown and
ẋ e (t) vr (t) cos(θe (t)) + ye (t)wx (t) − v x (t)
⎣ ẏe (t)⎦ = ⎣ vr (t) sin(θe (t)) − x e (t)wx (t) − dr wx (t)⎦ (9) bounded nonlinear function.
To stabilize the tracking error dynamic (10) under the DoS
θ̇e (t) wr (t) − wx (t)
attacks, we design the resilient control scheme for RWDA
which gives the tracking error dynamic (4) and completes the vehicles by using the iterative single critic learning framework,
proof. which is developed in Section III.

III. E VENT-T RIGGERED R ESILIENT C ONTROL AND is designed by an integrand positive definite function as [1]
S TABILITY A NALYSIS OF S AMPLING M ECHANISM D ESIGN μ1 (t,zi )

U (μ(t, z i )) = 2λ1 tanh−1 λ−11 υ1 R1 dυ1
A. Event-Triggered Resilient Optimal Control With 0
Saturating Bound μ2 (t,zi )

+ 2λ2 tanh−1 λ−1
2 υ2 R2 dυ2
For the tracking error dynamics of RWDA vehicles, the 0
μ(t,zi )
uncertain attacking signals are considered in the communica-
tion networks. By this way, we further rewrite the autonomous = 2λ tanh−T λ−1 υ Rdυ (15)
0
driving system (10) as
where μ(t, z i ) = [ μ1 (t,zi ), μ2 (t,zi ) ]T is the control vector in an
ṡ(t) = f (s(t)) + g(s(t))(μ(t) + u r (t)) + σ (t) RWDA vehicle, λ = diag(λ1 , λ2 ) > 0 is the saturating bound
= f¯(s(t)) + g(s(t))μ(t) + σ (t) (11) matrix of the control vector, tanh−T (·) = [tanh−1 (·)]T , υ =
[ υ1 , υ2 ]T , and R = diag(R1 , R2 ) > 0 is a positive definite
where f¯(s(t)) = f (s(t))+g(s(t))u r (t) is the desired dynamic matrix.
part, and σ (t) = α(t) + g(t)β(t) is the overall attacking Definition 1: For the system (11), a state feedback control
signal. Based on the aforementioned assumption, it can be policy μ(t, z i ) is called an admissible event-triggered resilient
obtained that ||σ (t)|| ≤ γ ||g(t)|| with a constant γ > 1. control if the control μ(t, z i ) is updated based on an event-
As we have pointed out above, the reference policy u r (t) is triggered mechanism and can make the infinite horizon integral
given according to the desired reference, which is generally set performance index J (s(0)) finite with any initial state s(0)
in the driving system as the desired reference. Besides, in a when stabilizing the system dynamics to its origins.
conventional tracking control problem, the resilient tracking By using the admissible event-triggered resilient control, the
control policy μ(t) is designed as a time-driven feedback cost function in [t, +∞) can be obtained as
control policy, and here, we will give a new event-triggered
V (s(t))
tracking control scheme. zi+1
− (τ −t)

Considering the process of information transmission in com- = γ ||g(τ )||2 + Q(s(τ )) + U (μ(τ, z i )) dτ
munication networks, the event-triggered sampling mechanism t
+∞
z j +1
is developed with a monotonically increasing time sequence
as {z i }+∞ + γ− (τ −t)
||g(τ )||2 +Q(s(τ ))+U (μ(τ,z j )) dτ,
i=0 = {z 0 , z 1 , z 2 , . . . , z i , . . .}, i ∈ N , where z 0 = 0, zj
j =i+1
and the system state in the control policy is updated at each
triggering instant. By this way, an event-triggered feedback (16)
control is designed as μ(t, z i ) = μ(t, s(z i )) for all the time where time t ∈ [z i , z i+1 ).
t ∈ [z i , z i+1 ), where s(z i ) is the state at time instant z i , and To compute the optimal control, one can obtain the Hamil-
the zero-order hold can keep the control in continuous at every tonian function as
time instant z i .
Let the system state s(t) be the event-triggered state; thus, H (V, μ, s) = ∇V T (s(t)) f¯(t) + g(t)μ(t, z i ) + σ (t)
based on the event-triggering mechanism, we define the event- − ||δg || + ||g(t)||2 + Q(s(t)) + U (μ(t, z i ))
triggered condition as follows:
(17)
F (s(t)) ≤ sT (t, z i ) (12) ∞ − (τ −t)
where ||δg || = t γ ||g(τ )||2 dτ is a constant, and
where z i is the latest triggering instant at time t, and sT (t, z i ) ∇V (s(t)) denotes the partial derivative of V (s(t)) with respect
is the triggering threshold. The event-triggered state satisfies to the state s(t).
Under the event-triggered sampling mechanism, the optimal
s(t) = s(z i ) − e(t) (13) cost function at every triggering instant t = z i , i ∈ N , yields

t = zi , i ∈ N V (s(z i ))
0,
where e(t) = denotes the trig- z i+1
s(z i ) − s(t), t ∈ (z i , z i+1 ) = min γ− (τ −z i )
||g(τ )||2 +Q(s(τ ))+U (μ(τ, z i )) dτ
μ(t,z i )
gering state error. Then, the event-triggered control policy zi
+∞

becomes μ(t, z i ) = μ(s(t) + e(t)). z j +1
Define the infinite horizon integral performance index for + min γ− (τ−z i )
||g(τ )||2 +Q(s(τ ))+U(μ(τ,z j )) dτ
μ(·)
j =i+1 z j
the system dynamic (11) by
(18)
+∞
z i+1
J (s(0)) = γ− τ
||g(τ )||2 +Q(s(τ ))+U (μ(τ, z i )) dτ which satisfies the following HJB equation:
i=0 zi
(14) ∇V T (s(z i )) f¯(t) + g(t)μ∗ (t, z i ) + σ (t)
− ||δg || + ||g(t)||2 + Q(s(t)) + U (μ∗ (t, z i )) = 0. (19)
where > 0 is a discount factor, the utility function Q(s(τ ))
is selected with a quadratic form as Q(s(t)) = s T (t)Qs(t), Note that, in the general time-driven optimal control, the
Q ∈ R3 is positive definite, and the utility function U (μ(τ, z i )) HJB equation needs to hold for all time instants, and the

Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 07,2023 at 06:32:05 UTC from IEEE Xplore. Restrictions apply.
5506 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 32, NO. 12, DECEMBER 2021

control policy μ(t) is tuned along with the time evolution,

as well as the system state s(t) synchronous. In the event-
triggered control design, the control policy is only updated at
some triggering instants, but the system states and dynamics
are updated in real time. Then, the HJB equation is only
required to hold at every triggering instant under a specific
event-triggered sampling mechanism.
Lemma 1: By using the integrand utility function
U (μ(τ, z i )) in the cost function, the event-triggered resilient
optimal control for any time t ∈ [z i , z i+1 ) can be derived by
the form

1
μ (t, z i ) = −λ tanh (λR)−1 g T (z i )∇V (s(z i )) (20)
2
Fig. 3. Architecture of the event-triggered resilient optimal control mecha-
where μ (t, z i ) = [ μ1 (t,zi ), μ2 (t,zi ) ]T , and the control policy is nism.
bounded in the sense that ||μ1 (t, z i )|| ≤ λ1 and ||μ2 (t, z i )|| ≤
λ2 .
Proof: Based on Bellman’s optimality principle, we take in driving the RWDA vehicle to track the desired reference
the derivative of HJB equation (19) with respect to control trajectory. The architecture of event-triggered resilient optimal
μ(t, z i ) at time t = z i and have control is proposed in Fig. 3, and the event-triggered condition
⎛ ⎞ will be designed as follows.
∇V T (s(z i )) f¯(t) + g(t)μ(t, z i ) + σ (t) Lemma 2: Let μ (s(t)) be the optimal state-feedback con-
∂⎝ ⎠
trol policy at time t with the form
− ||δg || + ||g(t)||2 + Q(s(t)) + U (μ(t, z i ))

∂μ(t, z i ) 1
μ (s(t)) = −λ tanh (λR)−1 g T (s(t))∇V (s(t)) . (24)
T
= ∇V (s(z i ))g(t) + 2λ tanh−T (λ−1 μ(t, z i ))R 2
=0 (21) Then, one has ||μ (s(t))−μ∗ (s(z i ))|| ≤ λ̄||s(t)−s(z i )||, where
z i is an arbitrary triggering instant, and λ̄ > 0 is a constant.
which indicates
Proof: Consider the hyperbolic tangent function tanh(·),
2λ tanh−1 (λ−1 μ(t, z i )) = −R −1 g T (t)∇V (s(z i )). which is continuous, monotone, and bounded, and its deriv-
ative is tanh (·) = 1 − tanh2 (·). Thus, according to the
Thus, the event-triggered resilient optimal control can be hyperbolic tangent function’s properties, there is
computed by
zi+1 ||μ∗ (s(t)) − μ∗ (s(z i ))||
− (τ −z ) ≤ λ̄ (25)
μ (t, z i ) = arg min γ i
||g(τ )||2 ||s(t) − s(z i )||
μ(t,z i ) zi
for some positive constant λ̄, which indicates that ||μ (s(t)) −
+ Q(s(τ )) + U (μ(τ, z i )) dτ + V (s(z i+1 ))
μ∗ (s(z i ))|| ≤ λ̄||s(t) − s(z i )||.
1
= −λ tanh (λR)−1 g T (z i )∇V (s(z i )) (22) This gives the proof.
2 Theorem 2: Consider the autonomous driving system
for any time t ∈ [z i , z i+1 ), which is constrained as and the event-triggered sampling mechanism, the tracking
||μ1 (t, z i )|| ≤ λ1 and ||μ2 (t, z i )|| ≤ λ2 . Thus, the proof is dynamic (11) is asymptotically stable with the event-triggered
completed. resilient optimal control μ (t, z i ) when the event-triggered
Then, the optimal cost function at any time t can be further condition for any time t ∈ [z i , z i+1 ), i ∈ N , is given by
presented as ρmin (Q)
F (s(t)) = ||s(t)||2 + E(t)
V (s(t)) λ̄||L 1 (t)||
zi+1
− (τ −t)
≤ ||s(t) − s(z i )|| = sT (t, z i ) (26)
= γ ||g(τ )||2 + Q(s(τ )) + U (μ (τ, z i )) dτ
t
+∞
where E(t) = ((1/2)||g(t)||2 + U (μ (t)))/λ̄||L 1 (t)||, L 1 (t) =
z j +1 ∇V T (s(t))g(t), and ρmin (·) means the minimal eigenvalue of
+ min γ − (τ−t) ||g(τ )||2 +Q(s(τ ))+U(μ(τ, z j )) dτ.
μ(·) a matrix. Besides, the Zeno behavior is strictly ruled out for
j =i+1 z j
the vehicle’s control process.
(23)
Proof: First, we consider the tracking error dynamics of
the vehicle with the event-triggering mechanism (26).
Let us select the cost function V (s(t)) with the event-
B. Event-Triggered Optimal Control Design and Stability
triggered resilient optimal control μ (t, z i ) as a Lyapunov
Analysis
function and derive it with respect to time; thus, there is
Based on the event-triggered sampling mechanism, we will
develop the event-triggered optimal control policy μ (t, z i ) V̇ (s(t)) = ∇V T (s(t)) f¯(t) + g(t)μ (t, z i ) + σ (t) . (27)

As we pointed out above, the HJB equation can hold for Remark 1: In this article, the proposed event-triggered
every time with the time-driven optimal control μ (t); thus, mechanism (26) can effectively balance the frequency/changes
it becomes in adjusting the vehicle’s control during the running process.
1) At the vehicle’s urgent tracking stage, the error state s(t)
∇V T (s(t)) f¯(t) + g(t)μ (t) + σ (t)
is very large; then, the condition (26) is triggered with a
− ||δg || + ||g(t)||2 + Q(s(t)) + U (μ (t)) = 0. (28) shorter time interval, and the event-triggered control can
make the vehicle tracking the desired reference faster.
Note that the discount factor > 0 can be selected as 2) At the vehicle’s steady tracking stage, the error state s(t)
= ||g(t)||
2

2||δg || , and based on the vehicle system (11), there is can be in some small ranges; then, the condition (26) is
min(||g(t)||2 ) ≥ 1. triggered with a longer time interval, where the designed
Inserting the HJB equation (28) into (27), and based on the control can be unchanged, and no operation needs to be
Lemma 2, we have changed in the vehicle system.
Compared with the conventional time-based driving
V̇ (s(t))
approaches, the event-triggered mechanism can reduce much
= ∇V T (s(t)) f¯(t) + g(t)μ (t) + σ (t) energy and resources in the vehicle’s physical operation.
− L 1 (t)μ (t) + L 1 (t)μ (t, z i ) The new event-triggered control can achieve a good balance
between tracking and convenience for the autonomous vehicle.
≤ ||δg || − ||g(t)||2 − Q(s(t)) − U (μ (t))

− L 1 (t) μ (t) − μ (t, z i ) IV. S INGLE C RITIC N ETWORK -BASED E VENT-T RIGGERED
1 C ONTROL A LGORITHM
≤ − ||g(t)||2 − U (μ (t)) − ρmin (Q)||s(t)||2
2 In this section, we design the adaptive resilient event-
+ ||L 1 (t)||λ̄ ||s(t) − s(z i )|| . (29) triggered control algorithm for RWDA vehicles by combining
the event-triggered sampling mechanism and the ADP method,
Then, it can be found that V̇ (s(t)) ≤ 0 for all time
where a single critic network structure is proposed.
t ∈ [z i , z i+1 ), i ∈ N , when the event-triggering condition (26)
First, to obtain the control parameters, we design the single
holds.
critic network to approximate the optimal cost function based
Second, we prove that the Zeno behavior is strictly ruled
on the higher order approximation theorem [42] by
out for the vehicle’s control process.
The proof is given by contradiction. Suppose that the V (s(t)) = W T (s(t)) + ε(t) (32)
Zeno behavior occurs for the vehicle. Furthermore, without
where W = [ ω1 ,ω2 ,...,ωn ]T ∈ Rn×1 is the weight vector, n
loss of generality, suppose that the tracking system (11)
is the number of activation functions, (s(t)) = [φ1 (s(t)),
exhibits the Zeno behavior. Then, there is a finite time
φ2 (s(t)), . . . , φn (s(t))]T ∈ Rn×1 is the independent activation
T > 0 such that z i < T and liml>i,l→∞ z l = T . To
function vector, ε(t) is the approximated error, which satisfies
make the contradiction, we will first find that, for any i ∈
||ε(t)|| ≤ εc , and εc can be an arbitrarily small positive
N , there is a positive integer εi > 0 such that z i +
constant, when the number of activation functions n is large
εi ≤ z i+1 . Consider the function (29); when the event is
enough. It can be guaranteed that ε(t) → 0 as n → ∞, based
triggered as
on the approximation theorem [42]. Besides, in the single
F (s(t)) ≤ sT (t, z i ) (30) network framework, we also use the network for tuning the
control policy.
at the time instant is t = z i , then it yields ||s(t) − s(z i )|| = 0 Let V (k) (s(t)) = W (k)T (s(t)) be the kth approximated
(the term sT (t, z i ) is forced to zero), and the error state s(t) value of V (s(t)). Then, we design the novel adaptive resilient
will change and drop as event-triggered control, as presented in Algorithm 1, for the
1 autonomous driving system of vehicles.
V̇ (s(t)) = − ||g(t)||2 − Q(s(t)) − U (μ (t)) < 0. (31) Note that, in Algorithm 1, the learning procedure is devel-
2
oped based on the policy iteration learning process, and the
After that, along with the change of error s(t), when the time well-known least-squares method is applied to update the
t ∈ [z i , z i+1 ), there is weight parameters for policy evaluation as
Y
F (s(t)) > sT (t, z i ). X
!" !
"
It means that the term ||L 1 (t)||λ̄[||s(t) − s(z i )||] in the time W (k)T · ∇(s(t)) f¯(t) + g(t)μ(k) (t) + σ (t)
interval t ∈ [z i , z i+1 ) will increase from 0 to 12 ||g(t)||2 + Z
! "
U (μ (t)) + ρmin (Q)||s(t)||2 > 0 to trigger the condition (26)
= ||δg || − ||g(t)||2 − Q(s(t)) − U (μ(k) (t)) (35)
again. As we known, the term ||g(t)|| ≥ 1 > 0 in the vehicle
for all time; thus, there must exist a time interval εi > 0 where W (k)T can be solved by X = ZY T (YY T )−1 . Besides,
such that z i + εi ≤ z i+1 . Thus, z i < T , T = liml>i,l→∞ z l = it has X = ZY −1 if the generated data make Y a full-rank
∞
z i + l=i εl = +∞, which contradicts that T is a finite time. square matrix. Then, we insert the event-triggered sampling
The proof is completed. mechanism into the controlling procedure of the autonomous

Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 07,2023 at 06:32:05 UTC from IEEE Xplore. Restrictions apply.
5508 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 32, NO. 12, DECEMBER 2021

Algorithm 1 Adaptive Resilient Event-Triggered Control Moreover, we have

Algorithm for the Autonomous Driving System of Vehicles
Initialization V (k) (s(t)) → V (s(t))
(36)
Select an initial admissible control μ(1) (t), set k = 1, > μ(k) (s(t)) → μ (s(t))
0 is a small number and propagate the tracking procedure
along with time t ∈ [0, +∞) when k → +∞ for any state s(t) ∈ (s).
Learning Procedure Proof: First, let us consider the difference between cost
1: while V (k+1) − V (k) > do
functions V (k+1) (s(t)) and V (k) (s(t)) along with the system
2: (Policy Evaluation) Solve V (k) (s) = W (k)T (s) from the
dynamic under control μ(k+1) (s(t)) ∀s(t); we have
Bellman’s equation:
T V (k+1) (s(t)) − V (k) (s(t))
+∞
∇V (k) (s(t)) f¯(t) + g(t)μ(k) (t) + σ (t) T
= ∇V (k+1) (s(τ )) f¯(τ ) + g(τ )μ(k+1) (τ ) + σ (τ ) dτ
− ||δg || + ||g(t)||2 + Q(s(t)) + U (μ(k) (t)) = 0 t
+∞
T
(33) − ∇V (k) (s(τ )) f¯(τ ) + g(τ )μ(k+1) (τ ) + σ (τ ) dτ.
t
3: (Policy Improvement) Tune the control μ(k) (t) by using: (37)

(k+1) 1 −1 T (k)
μ (t) = −λ tanh (λR) g (t)∇V (s(t)) (34) According to the policy evaluation equation (33), there are
2
(k+1) T
4: Set k = k + 1 V (s(t)) f¯(t) + σ (t)
5: end while T
Controlling Procedure = − V (k+1) (s(t))g(t)μ(k+1) (t) + ||δg || − ||g(t)||2
1: for i ∈ N do − Q(s(t)) − U (μ(k+1) (t))
2: Run the autonomous driving system (11) and use the
event-triggered resilient optimal control policy μ (t, z i ) and
for the time t ∈ [z i , z i+1 ), which has the same parameters
T
with the feedback control μ(k) (t) obtained from the V (k) (s(t)) f¯(t) + σ (t)
iteration learning processes (33) and (34) T
3: if the system state s(t) satisfies the designed triggering = − V (k) (s(t))g(t)μ(k) (t) + ||δg || − ||g(t)||2
condition (26): − Q(s(t)) − U (μ(k) (t)).
F (s(t)) ≤ sT (t, z i )
which makes (37) to be
then
4: Update the control policy for driving the autonomous V (k+1) (s(t)) − V (k) (s(t))
+∞
vehicle as μ (t, z i+1 ) for the time t ∈ [z i+1 , z i+2 ) T
5: Set i = i + 1 = ∇V (k+1) (s(τ ))g(τ )μ(k+1) (τ )dτ
6: end if
t
+∞
T
7: end for − ∇V (k+1) (s(τ ))g(τ )μ(k+1) (τ ) − ||δg ||
End Procedures
t

+ ||g(t)||2 + Q(s(τ )) + U μ(k+1) (τ ) dτ
+∞
T
− ∇V (k) (s(τ ))g(τ )μ(k+1) (τ )dτ
t +∞
T
driving system and achieve a new control scheme from + ∇V (k) (s(τ ))g(τ )μ(k) (τ ) − ||δg ||
Algorithm 1.
t

+||g(t)||2 + Q(s(τ )) + U μ(k) (τ ) dτ. (38)
Let us suppose that the number of activation functions n is
selected large enough, and we will provide the convergence
Based on the definition (15) and using (34), it will become
analysis of the learning procedure shown in Algorithm 1 in
the following theorem.
V (k+1) (s(t)) − V (k) (s(t))
Theorem 3: Suppose that the control policy μ(k) (t) is +∞

updated according to (34), and the cost function V (k) (s(t)) =− − ||δg ||+||g(τ )||2 +Q(s(τ ))+U μ(k+1) (τ ) dτ
is obtained by solving (33) with the boundary condition t +∞
V (k) (0) = 0, where 0 is the zero vector, k = 0, 1, . . ., T
+ − ∇V (k) (s(τ ))g(τ )μ(k+1) (τ )
and the optimal cost function V (s(t)) ≥ 0 is smooth on t
a compact domain of validity (s). Then, the sequence T
+ ∇V (k) (s(τ ))g(τ )μ(k) (τ )
{V (k) (s(t)), k = 0, 1, . . .} is monotonically nonincreasing, i.e.,
V (s(t)) ≤ V (k+1) (s(t)) ≤ V (k) (s(t)) for all s(t) ∈ (s). − ||δg || + ||g(τ )||2 + Q(s(τ )) + U μ(k) (τ ) dτ (39)

which can be further derived as

V (k+1) (s(t)) − V (k) (s(t))
+∞

=− U μ(k+1) (τ ) − U μ(k) (τ ) dτ
t +∞

+ 2λ tanh−T λ−1 μ(k) (τ ) R(μ(k+1) (τ )−μ(k) (τ )) dτ
t
+∞ μ(k+1)
=− [2λ tanh−T λ−1 υ ]Rdυ
t μ(k)

−2λ tanh−T λ−1 μ(k) (τ ) R μ(k+1) (τ ) − μ(k) (τ ) dτ. (40)

According to the first integration mid-value theorem,

we have
μ(k+1)
Fig. 4. Evolution of parameters in weight vector W.
2λ tanh−T (λ−1 υ) Rdυ
μ(k)

= 2λ tanh−T (λ−1 μ̄)R μ(k+1) − μ(k) (41)
where the elements in μ̄ are between the corresponding
elements in μ(k) and μ(k+1) . As the function tanh−1 (·) is a
monotone increasing function, it yields

ϕ(λ−1 μ̄) i ≥ ϕ(λ−1 μ(k) ) i , μ(k+1) i ≥ μ(k) i
(42)
ϕ(λ−1 μ̄) i ≤ ϕ(λ−1 μ(k) ) i , μ(k+1) i ≤ μ(k) i

where the function ϕ(·) = tanh−T (·) and [·]i means the i th
element of a vector. Thus, there is
V (k+1) (s(t)) − V (k) (s(t)) ≤ 0 (43)
and based on the definition of optimal cost function, we have
Fig. 5. X-Y diagram of driving trajectories from two autonomous vehicles.
V (s(t)) ≤ V (k+1) (s(t)) ≤ V (k) (s(t)) (44)
for any state s(t) ∈ (s), which indicates that the sequence λ1 = λ2 = 2, and we adopt the uncertain signals as shown
{V (k) (s(t)), k = 0, 1, . . .} is monotonic nonincreasing. in [41] with sensor attacks α(t) = −(0.75 + 0.15 sin(2.5t)),
Second, since (s) is a compact set, according to t ≥ 0, and actuator attacks β(t) = [ 1, 1 ]T 0.005 cos(2.5t) +
Dini’s theorem [43], the monotonic nonincreasing sequence [ 0.1 cos(2t), 0.5 sin(t) ]T 0.2 sin(x e (t)) cos(ye (t)), t ≥ 0.
{V (k) (s(t)), k = 0, 1, . . .} will uniform pointwise converge to The initial system state of a desired reference is
the optimal cost function V (s(t)) as V (k) (s(t)) → V (s(t)), selected as [ xr (0), yr (0), θr (0) ]T = [ 0, 0, 0 ]T , and the desired
along with k → +∞. longitudinal velocity and yaw angular velocity are selected
Finally, from the optimal cost function, the uniform con- as vr (t) = 0.5 (m/s) and wr (t) = 0 (rad/s) in the reference
vergence of the control sequence {μ(k) (t), k = 0, 1, . . .} policy u r (t). The parameters in the cost function (23) are
is also achieved during the iteration learning procedure as chosen as γ = 2, = 0.5||g(t)||2 /||δg ||, and positive definite
μ(k) (s(t)) → μ (s(t)), along with k → +∞. matrices in utility functions are Q = diag(20, 20, 20) and
The proof is, thus, completed. R = diag(1, 1). The activation function vector of optimal
cost function is selected as (s) = [s14 , s24 , s34 , s12 s22 , s12 s32 , s22 s32 ,
V. S IMULATION C ASE S TUDIES s12 s2 s3 , s1 s22 s3 , s1 s2 s32 , s13 s2 , s13 s3 , s1 s23 , s1 s33 , s2 s33 , s23 s3 , sin(s1 ),
sin(s2 ), sin(s3 ), cos(s1 ), cos(s2 ), cos(s3 )]T , and the initial
We now apply our approach to a simulated autonomous
weight vector is W (0) = rands(21, 1). Then, the evolution of
driving system (10) of vehicles, where the modified RWDA
the weight vector is presented in Fig. 4, where the parameters
vehicle’s dynamic function [39] becomes
get converged during the learning procedure, as shown in
ṡ(t) = f (s(t)) + g(s(t))(μ(t) + u r (t)) + σ (t) (45) Algorithm 1.
To verify the availability of the designed resilient event-
wx (t)
where s(t) = [ xe (t), ye (t), θe (t) ]T , μ(t) = v x (t) , f (t) = triggered control scheme, we simulate two autonomous driving
vr (t) cos(θe (t)) ye (t) −1 systems of RWDA vehicles at different initial system states,
vr (t) sin(θe (t)) , g(t) = −xe (t)−dr 0 , σ (t) = α(t) + g(t)β(t), which are selected as [ −1.2, 1.2, 0.5 ]T for the first vehicle, and
wr (t) −1 0
dr = 1.2 (m) is the direct distance from the mass cen- [ 1.2, −1.2, −0.5 ]T for the second vehicle. The X-Y diagram of
ter to the rear axle in the vehicle, λ = diag(λ1 , λ2 ) > driving trajectories is shown in Fig. 5, where we can find
0 is the saturating bound matrix of the control vector, that the two autonomous vehicles are driven to approach the

Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 07,2023 at 06:32:05 UTC from IEEE Xplore. Restrictions apply.
5510 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 32, NO. 12, DECEMBER 2021

Fig. 8. Tracking error trajectory under uncertain attacking signals of the

autonomous driving system.

obtained vehicle’s control inputs are constrained and sat-

Fig. 6. Evolution of the triggering condition and sampling period. isfy the saturating bound condition u 1 (t, z i ) ∈ [ −2, 2 ] and
u 2 (t, z i ) ∈ [ −1.5, 2.5 ] for all time instants. Moreover, the
comparison between event-triggered and time-driven control
trajectories is also presented in Fig. 7 clearly, in which the
changing/adjusting times of the vehicle’s control with the
designed event-triggered method are very smaller than
the time-driven method during the vehicle’s running process.
In addition, the tracking error trajectories of the first vehicle
are presented in Fig. 8, where the tracking errors from the
autonomous driving system can be found approaching a region
around zero under the proposed event-triggered resilient track-
ing control as predicted by theorems, and although the uncer-
tain attacking signals occur on actuator and sensor, it further
demonstrates the effectiveness of the designed method.

VI. C ONCLUSION
In this article, we have designed an adaptive resilient event-
triggered control method for an autonomous driving system
of RWDA vehicles. Based on the kinematic equation, the
Fig. 7. Event- and time-driven control inputs for vehicles. tracking control objective was proposed, and DoS attacks
are considered in the driving dynamic system. An event-
reference under the proposed event-triggered control method, triggering condition was presented, which leads to a spe-
although the uncertain attacking signals occur on the actuator cific event-triggering mechanism that determines when the
and the sensor. control policy was updated and can effectively balance the
Besides, Fig. 5 also shows the advantage of designed frequency/changes in adjusting the vehicle’s control during the
event-triggered control, as we have pointed out in Remark 1. running process. To overcome the challenge of DoS attacks
Then, the corresponding event-triggering evolution processes injected in the tracking system dynamics, the new adaptive
of the sampling mechanism are depicted in Fig. 6, where the resilient control algorithm was designed based on an iterative
triggering functions F (s(t)) and sT (t, z i ) are given by the single critic learning framework, which can also reduce the
condition (26). The sampling period of the designed algorithm control operations of autonomous vehicles. Specific parameter
is also shown in the figure, which displays every sampling updating strategy was introduced to determine the control
instant, and when the event (26) is triggered, the triggering policy. Finally, the simulation result has clearly shown the
function sT (t, z i ) will be forced to zero as desired. effectiveness of our proposed method.
Along with the procedures in Algorithm 1, the developed
event-triggering control policy is obtained, which is deter- R EFERENCES
mined according to the event-triggering condition (26) and
[1] M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for
the achieved parameters of the weight vector. Fig. 7 displays nonlinear systems with saturating actuators using a neural network HJB
the event-driven control input that the autonomous vehicle approach,” Automatica, vol. 41, no. 5, pp. 779–791, May 2005.
receives in the driving process, which contains the event-driven [2] D. Liu, D. Wang, and H. Li, “Decentralized stabilization for a class of
continuous-time nonlinear interconnected systems using online learning
resilient tracking control policy μ(t, z i ) and the reference optimal control approach,” IEEE Trans. Neural Netw. Learn. Syst.,
policy u r (t). As the reference policy u r (t) = [ 0, 0.5 ]T , the vol. 25, no. 2, pp. 418–428, Feb. 2014.

[3] H. He, S. Chen, K. Li, and X. Xu, “Incremental learning from stream [24] C. Zhao, J. He, and Q.-G. Wang, “Resilient distributed optimization
data,” IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 1901–1914, algorithm against adversarial attacks,” IEEE Trans. Autom. Control,
Dec. 2011. vol. 65, no. 10, pp. 4308–4315, Oct. 2020, doi: 10.1109/TAC.2019.
[4] H. Zhang, K. Zhang, Y. Cai, and J. Han, “Adaptive fuzzy fault-tolerant 2954363.
tracking control for partially unknown systems with actuator faults [25] H. Modares, B. Kiumarsi, F. L. Lewis, F. Ferrese, and A. Davoudi,
via integral reinforcement learning method,” IEEE Trans. Fuzzy Syst., “Resilient and robust synchronization of multiagent systems under
vol. 27, no. 10, pp. 1986–1998, Oct. 2019. attacks on sensors and actuators,” IEEE Trans. Cybern., vol. 50, no. 3,
[5] C. Mu, Z. Ni, C. Sun, and H. He, “Air-breathing hypersonic vehicle pp. 1240–1250, Mar. 2020.
tracking control based on adaptive dynamic programming,” IEEE Trans. [26] Z. Wang, Y. Wu, L. Liu, and H. Zhang, “Adaptive fault-tolerant
Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 584–598, Mar. 2017. consensus protocols for multiagent systems with directed graphs,” IEEE
[6] J. Biggs and W. Holderbaum, “Optimal kinematic control of an Trans. Cybern., vol. 50, no. 1, pp. 25–35, Jan. 2020.
autonomous underwater vehicle,” IEEE Trans. Autom. Control, vol. 54, [27] C. Chen et al., “Resilient adaptive and H∞ controls of multi-agent sys-
no. 7, pp. 1623–1626, Jul. 2009. tems under sensor and actuator faults,” Automatica, vol. 102, pp. 19–26,
[7] B. Kiumarsi, F. L. Lewis, and Z.-P. Jiang, “H∞ control of linear discrete- Apr. 2019.
time systems: Off-policy reinforcement learning,” Automatica, vol. 78, [28] D. Meng and K. L. Moore, “Studies on resilient control
pp. 144–152, Apr. 2017. through multiagent consensus networks subject to disturbances,”
[8] T. Bian, Y. Jiang, and Z.-P. Jiang, “Adaptive dynamic programming for IEEE Trans. Cybern., vol. 44, no. 11, pp. 2050–2064,
stochastic systems with state and control dependent noise,” IEEE Trans. Nov. 2014.
Autom. Control, vol. 61, no. 12, pp. 4170–4175, Dec. 2016. [29] Y. Tang, D. Zhang, P. Shi, W. Zhang, and F. Qian, “Event-based
[9] Q. Wei, D. Liu, Q. Lin, and R. Song, “Discrete-time optimal control formation control for nonlinear multiagent systems under DoS attacks,”
via local policy iteration adaptive dynamic programming,” IEEE Trans. IEEE Trans. Autom. Control, vol. 66, no. 1, pp. 452–459, Jan. 2021,
Cybern., vol. 47, no. 10, pp. 3367–3379, Oct. 2017. doi: 10.1109/TAC.2020.2979936.
[10] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for [30] Y. Yuan, Z. Wang, P. Zhang, and H. Liu, “Near-optimal resilient
partially observable dynamic processes: Adaptive dynamic programming control strategy design for state-saturated networked systems under
using measured output data,” IEEE Trans. Syst., Man, Cybern. B, stochastic communication protocol,” IEEE Trans. Cybern., vol. 49, no. 8,
Cybern., vol. 41, no. 1, pp. 14–25, Feb. 2011. pp. 3155–3167, Aug. 2019.
[11] H. He and X. Zhong, “Learning without external reward,” IEEE Comput. [31] E. Mousavinejad, X. Ge, Q.-L. Han, F. Yang, and L. Vlacic, “Resilient
Intell. Mag., vol. 13, no. 3, pp. 48–54, Aug. 2018. tracking control of networked control systems under cyber attacks,”
[12] Z. Ni, H. He, X. Zhong, and D. V. Prokhorov, “Model-free dual heuristic IEEE Trans. Cybern., early access, Nov. 8, 2019, doi: 10.1109/TCYB.
dynamic programming,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, 2019.2948427.
no. 8, pp. 1834–1839, Aug. 2015. [32] M. Xue, Y. Tang, W. Ren, and F. Qian, “Practical output synchro-
[13] K. Zhang, H.-G. Zhang, Y. Cai, and R. Su, “Parallel optimal tracking nization for asynchronously switched multi-agent systems with adap-
control schemes for mode-dependent control of coupled Markov jump tion to fast-switching perturbations,” Automatica, vol. 116, Jun. 2020,
systems via integral RL method,” IEEE Trans. Autom. Sci. Eng., vol. 17, Art. no. 108917.
no. 3, pp. 1332–1342, Jul. 2020, doi: 10.1109/TASE.2019.2948431. [33] J. Hu, J. Shen, and D. Lee, “Resilient stabilization of
[14] E. B. Kosmatopoulos and A. Kouvelas, “Large scale nonlinear control switched linear control systems against adversarial switching,”
system fine-tuning through learning,” IEEE Trans. Neural Netw., vol. 20, IEEE Trans. Autom. Control, vol. 62, no. 8, pp. 3820–3834,
no. 6, pp. 1009–1023, Jun. 2009. Aug. 2017.
[15] H. Zhang, L. Cui, X. Zhang, and Y. Luo, “Data-driven robust approx- [34] J. Dong and G.-H. Yang, “Reliable state feedback control of T–S fuzzy
imate optimal tracking control for unknown general nonlinear systems systems with sensor faults,” IEEE Trans. Fuzzy Syst., vol. 23, no. 2,
using adaptive dynamic programming method,” IEEE Trans. Neural pp. 421–433, Apr. 2015.
Netw., vol. 22, no. 12, pp. 2226–2236, Dec. 2011. [35] M. Zhu and S. Martinez, “On the performance analysis of resilient
[16] Y.-M. Park, M.-S. Choi, and K. Y. Lee, “An optimal tracking neuro- networked control systems under replay attacks,” IEEE Trans. Autom.
controller for nonlinear dynamic systems,” IEEE Trans. Neural Netw., Control, vol. 59, no. 3, pp. 804–808, Mar. 2014.
vol. 7, no. 5, pp. 1099–1110, Sep. 1996. [36] J. Dong, Y. Wu, and G.-H. Yang, “A new sensor fault isolation
[17] H. Modares and F. L. Lewis, “Optimal tracking control of nonlinear method for T–S fuzzy systems,” IEEE Trans. Cybern., vol. 47, no. 9,
partially-unknown constrained-input systems using integral reinforce- pp. 2437–2447, Sep. 2017.
ment learning,” Automatica, vol. 50, no. 7, pp. 1780–1792, Jul. 2014. [37] Y. Yuan, P. Zhang, Z. Wang, and L. Guo, “On resilient strategy
[18] K. Zhang, H. Zhang, Y. Mu, and C. Liu, “Decentralized tracking design of multi-tasking optimal control for state-saturated systems with
optimization control for partially unknown fuzzy interconnected systems nonlinear disturbances: The time-varying case,” Automatica, vol. 107,
via reinforcement learning method,” IEEE Trans. Fuzzy Syst., early pp. 138–145, Sep. 2019.
access, Jan. 13, 2020, doi: 10.1109/TFUZZ.2020.2966418. [38] A.-Y. Lu and G.-H. Yang, “Observer-based control for cyber-physical
[19] B. Luo, D. Liu, T. Huang, and D. Wang, “Model-free optimal tracking systems under denial-of-service with a decentralized event-triggered
control via critic-only Q-learning,” IEEE Trans. Neural Netw. Learn. scheme,” IEEE Trans. Cybern., vol. 50, no. 12, pp. 4886–4895,
Syst., vol. 27, no. 10, pp. 2134–2144, Oct. 2016. Dec. 2020, doi: 10.1109/TCYB.2019.2944956.
[20] D. Wang, C. Mu, D. Liu, and H. Ma, “On mixed data and event driven [39] G. Ma, M. Ghasemi, and X. Song, “Integrated powertrain energy man-
design for adaptive-critic-based nonlinear H∞ control,” IEEE Trans. agement and vehicle coordination for multiple connected hybrid electric
Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 993–1005, Apr. 2018, vehicles,” IEEE Trans. Veh. Technol., vol. 67, no. 4, pp. 2893–2899,
doi: 10.1109/TNNLS.2016.2642128. Apr. 2018.
[21] X. Yang and H. He, “Adaptive critic designs for event-triggered robust [40] Z. P. Jiang and H. Nijmeijer, “Tracking control of mobile robots: A case
control of nonlinear systems with unknown dynamics,” IEEE Trans. study in backstepping,” Automatica, vol. 33, no. 7, pp. 1393–1399,
Cybern., vol. 49, no. 6, pp. 2255–2267, Jun. 2019. Jul. 1997.
[22] L. Dong, X. Zhong, C. Sun, and H. He, “Event-triggered adaptive [41] X. Jin, W. M. Haddad, and T. Yucelen, “An adaptive control architecture
dynamic programming for continuous-time systems with control con- for mitigating sensor and actuator attacks in cyber-physical systems,”
straints,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 8, IEEE Trans. Autom. Control, vol. 62, no. 11, pp. 6058–6064, Nov. 2017.
pp. 1941–1952, Aug. 2017. [42] B. A. Finlayson, The Method of Weighted Residuals and Variational
[23] D. Wang, M. Ha, and J. Qiao, “Self-learning optimal regulation for Principles. New York, NY, USA: Academic, 1990.
discrete-time nonlinear systems under event-driven formulation,” IEEE [43] R. G. Bartle and D. R. Sherbert, Introduction to Real Analysis, 3rd ed.
Trans. Autom. Control, vol. 65, no. 3, pp. 1272–1279, Mar. 2020. New York, NY, USA: Wiley, 2000.

Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 07,2023 at 06:32:05 UTC from IEEE Xplore. Restrictions apply.