
IEEE/ASME TRANSACTIONS ON MECHATRONICS, VOL. 29, NO. 2, APRIL 2024

A New Reinforcement Learning Fault-Tolerant Tracking Control Method With Application to Baxter Robot
Jun-Wei Zhu, Member, IEEE, Zi-Yuan Dong, Zhi-Jun Yang, and Xin Wang, Member, IEEE

Abstract—The fault-tolerant control problem of the flexible multijoint manipulator is a difficult issue due to its strong nonlinearity and coupling. This article proposes a reinforcement learning (RL) based model-free adaptive fault-tolerant control (MFAFTC) algorithm for the multijoint manipulator. First, a parameter estimation mode switching mechanism is designed based on the two dimensions of the time axis and the sampling period, where an iterative estimation structure is introduced to identify some key parameters online accurately. Meanwhile, the radial basis function neural network is used to identify the spring interference as well as the actuator fault, and a compensation fault-tolerant control strategy is proposed. Moreover, the computation complexity is optimized via designing the critic–actor mechanism with an event-trigger parameter selection strategy. Finally, the superiority and effectiveness of the proposed method are verified by the application to the Baxter robot.

Index Terms—Baxter robot, flexible multijoint manipulator, model-free fault-tolerant control (FTC), reinforcement learning (RL).

Manuscript received 23 February 2023; revised 8 May 2023; accepted 24 August 2023. Date of publication 13 September 2023; date of current version 18 April 2024. Recommended by Technical Editor D. Dong and Senior Editor K. J. Kyriakopoulos. This work was supported in part by the Key Research and Development Program of Zhejiang under Grant 2022C01018, in part by the National Natural Science Foundation of China under Grant U21A20122, in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LZ21F030004, and in part by the National Natural Science Foundation of China under Grant U21B2001. (Corresponding author: Jun-Wei Zhu.)

Jun-Wei Zhu and Zi-Yuan Dong are with the Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Zhi-Jun Yang is with Jack Technology, Company Ltd., Taizhou 318000, China.

Xin Wang is with the School of Mathematical Science, Heilongjiang University, Harbin 150080, China, and also with the Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Heilongjiang University, Harbin 150080, China.

Digital Object Identifier 10.1109/TMECH.2023.3309888

I. INTRODUCTION

During the past decade, the application of robotics has gained increasing attention owing to their wide application in reducing production costs, improving production efficiency, and other fields [1], [2], [3]. The multijoint robotic arm has greatly promoted the upgrading of the traditional equipment manufacturing industry [4]. However, the multijoint robot is a typical nonlinear, strongly coupled, time-varying multiple-input–multiple-output system. These features can easily result in heavy chattering under traditional PID controllers; hence, it brings a challenge to the design of a reliable controller [5], [6], [7]. Furthermore, if the fault occurs in some components of the robots, the control performance will get worse. Thus, the multijoint robot fault-tolerant control (FTC) issue is becoming a research focus.

To date, many works about FTC have been reported; see [8], [9], and [10]. Some researchers attempt to achieve a good FTC performance by building fault estimators. Various types of observers can be designed based on system model parameters. Common fault reconstruction methods in robotic systems include the robust observer [11], adaptive observer [12], sliding mode observer [13], neural network observer [14], etc. Due to the inherent robustness of the sliding mode observer to system uncertainties and external disturbances, a sliding mode observer-based FTC is presented in [15]. However, the design of the controller demands a priori knowledge of bounded faults, which may be difficult to obtain in practical terms. To relax the sliding mode observer requirement for bounded faults, in [16], an observer strategy based on a neural sliding model is designed to perform robust fault diagnosis. In [17], an online fault estimation algorithm based on time-delay estimation (TDE) is proposed to approach the real fault's magnitude. Similarly, in [18], a set of intermediate estimators is proposed to estimate the actuator fault, based on which a compensating FTC strategy is designed. Since fault is an event, in [19], the problem of event-triggered estimation-based FTC is also discussed. In most of the existing research works, the aforementioned fault reconstruction techniques have been successfully applied to the FTC issues of robots. Although these methods can achieve acceptable control performance, it is worth noting that most of the FTC techniques are in the framework of model-based methods [20]. However, model-based methods require prior knowledge of the robot dynamics model, which could be hard to acquire and very time consuming in practical terms. In addition, the conventional control strategy directly linearizes the nonlinear system around its equilibrium point, and this control design concept may bring a big model approximation error.


In recent years, data-driven control has attracted more and more attention [21], [22], [23]. The subspace identification method is a common data-driven method [24], [25]. The basic idea of the scheme is to use the online or offline I/O data of the controlled system to design the controller without the explicit mathematical models of the system. In [26] and [27], an improved method is revised based on standard principal component analysis to achieve a consistent estimation of the system matrix. To achieve better robustness, Ding et al. [28] propose that all stabilization controllers can be implemented with the observer equivalently. In [28], the estimated output error is introduced into the controller as a residual signal, thus enhancing the stability and robustness issue in the design of feedback controllers. However, such methods are only suitable for time-invariant linear systems. For FTC problems of nonlinear time-varying systems, a model-free adaptive control (MFAC) method is proposed by Hou et al. [29], [30]. The primary approach is to model an equivalent dynamic linearization at the current working point of the controlled system rather than building a general discrete-time nonlinear system. In [31], the compact form dynamic linearization is used to build the data model. In [32], the partial form dynamic linearization (PFDL) considers the effects of output change and input change on the system at the same time. Fault-tolerance performance is guaranteed by the neural network algorithm used to recompense the fault function. However, these methods ignore that the current output change of the system may also be related to a series of historical control inputs.

Reinforcement learning (RL) is a class of machine learning algorithms that can effectively improve the estimation performance of control parameters and permit the design of adaptive controllers with real-time online learning capacities [33], [34], [35]. RL is characterized by intelligence and learning ability. It interacts with the environment with the goal of finding the optimal strategy to maximize the value function. Common learning algorithms include policy iteration, value iteration, integral RL (IRL), Q-learning, and so on [36], [37]. Most RL algorithms include two parts: policy evaluation (PE) and policy improvement (PI) [38]. In [39], the RL controller is designed to get the optimal value function and the optimal control policy. The value function is used to evaluate whether the current action is close to the optimal solution in this method. Note that the value function can be designed with respect to the optimality objective, but depends to some extent on the system model, so the RL algorithms in these methods cannot be applied to systems with partially unknown or completely unknown models. In [35], the controller is based on an actor/critic (A/C) framework and takes advantage of neural networks to parametrically represent the control strategy and the performance of the control system. With the coming of more and more advanced RL methods, in [36], a new adaptive Q-learning structure is proposed for ensuring the presented performance. In the iterative process, the optimal gain is solved based on the Bellman equation. Through reducing the number of repetitions, the algorithm can converge to the optimal solution faster, thus improving the performance. However, most of the proposed methods are for linear systems or the systems after linearization, and still cannot deal with relatively complex nonlinear robot systems. In addition, there is a strong coupling between joints in the real flexible manipulator system, so it is difficult to apply RL directly to a system with a completely unknown model.

In light of the aforementioned analysis, a model-free reinforcement learning FTC scheme (RLFTC) for multijoint robots is proposed in this work. The major contributions are shown as follows.

1) In this article, the FTC problem of the flexible multijoint robot under the actuator fault is considered, and the policy iteration based on the A/C structure is applied to solve the inherent heavy chattering problem of the Baxter robot compared with its own PID controller. Compared with most existing FTC results, this method does not involve any information about the robot's dynamic model, even including the system order.

2) The computation complexity is optimized by designing a critic–actor mechanism with a cost function in terms of control performance. Moreover, compared with the rigid parameter updating mechanism [31], an event-trigger parameter selection strategy is introduced to avoid unnecessary loop iterations.

3) Compared with the existing MFAC methods, a parameter estimation mode switching mechanism based on two dimensions of the time axis and the sampling period is designed. Compared with the single-dimensional parameter estimation approach [29], the RLFTC is shown to have better robustness against the disturbance, including spring interference.

The rest of this article is organized as follows. The specific studied issues are outlined in Section II; Section III presents the design of the RLFTC method; Section IV is the experimental results on the Baxter robot. Finally, Section V concludes this article.

II. PRELIMINARY KNOWLEDGE AND PROBLEM STATEMENT

A. System Description

Considering an n-degrees-of-freedom robot in the joint space, the continuous-time dynamic model [40] is given by

Mj(q)q̈ + Cj(q, q̇)q̇ + Gj(q) = u − τd  (1)

where q, q̇, and q̈, respectively, represent the joint positions, velocities, and accelerations, Mj(q) is the definite inertia matrix, Cj(q, q̇) is the centrifugal force matrix of the manipulator, Gj(q) is the gravitational torque, u is the control torque vector, and τd represents the unknown disturbance such as the interference of springs in flexible joints.

Considering the fault that occurs either in the driving motors or in the corresponding gear trains due to aging of actuators [40], the actual control torque uF of the manipulator can be expressed as

uF = α1 u + α2  (2)

where u is the nominal input, α1 ∈ (0, 1) is the multiplicative fault factor, α2 < α2 max < ∞ is the additive fault factor, and α2 max is the unknown upper bound. If the actuator is not faulty, α1 = 1, α2 = 0.
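For simulation studies, the fault model (2) can be emulated by passing the nominal command through a fault channel. The following minimal Python sketch illustrates this; the fault factors, the onset step, and the torque limit used here are illustrative assumptions, not values taken from this article.

```python
import numpy as np

def apply_actuator_fault(u_nominal, k, k_fault=200, alpha1=0.7, alpha2=0.5, u_max=15.0):
    """Emulate the actuator fault model u_F = alpha1 * u + alpha2 of (2).

    Before the onset step k_fault the actuator is healthy (alpha1 = 1,
    alpha2 = 0); afterwards a multiplicative factor alpha1 in (0, 1) and a
    bounded additive factor alpha2 act on the nominal torque. The result is
    clipped to an assumed actuator limit u_max.
    """
    a1, a2 = (1.0, 0.0) if k < k_fault else (alpha1, alpha2)
    return float(np.clip(a1 * u_nominal + a2, -u_max, u_max))
```

Multiplicative, additive, and mixed fault cases are then obtained simply by the choice of (alpha1, alpha2).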


Significantly, the disturbance is difficult to estimate accurately, and the actuator fault factors α1 and α2 are unpredictable. Moreover, the manipulator's joints are highly coupled, and the motion is complicated. Therefore, it is difficult to determine the inertia matrix Mj(q), the Coriolis moment vector Cj(q, q̇), etc., of the robot system model. However, a large amount of I/O data is generated during the motion of the manipulator. Based on this motion information, the dynamic model can be represented as a nonlinear nonaffine discrete system.

Considering a robot arm with an actuator fault, the generic dynamics model can be represented as follows:

y(k+1) = f(y(k), ..., y(k−m), u(k), ..., u(k−n), fs(k))  (3)

where m and n are nonnegative factors, f(·) is the unknown nonlinear function, y(k) represents the joint position information at the kth moment, and fs(k) ∈ R is a fault function containing information such as disturbances.

Define UL(k) = [u(k), ..., u(k−L+1)]^T as a vector consisting of the system control input within a sliding time window [k−L+1, k]. Fs(k) = [fs(k), ..., fs(k−L+1)]^T is a vector consisting of the fault function fs(k) within a time window of a certain length [k−L+1, k].

Similar to [41], the following three assumptions are introduced:
1) f(·) has partial derivatives with respect to u(k) and fs(k);
2) the system satisfies the generalized Lipschitz condition;
3) there is a known bound fs max ∈ R such that ||fs(k)|| ≤ fs max.

Then, when ||ΔFc(k)|| ≠ 0, there must exist time-varying parameter vectors Φ1(k) ∈ R^L, called the pseudo partial derivative (PPD), and Φ2(k) ∈ R^L, such that the system (3) can be converted into the dynamic equation

Δy(k+1) = Φ1^T(k)ΔUL(k) + Φ2^T(k)ΔFs(k)  (4)

where Φ1(k) = [φ1(k), φ2(k), ..., φL(k)]^T, Φ2(k) = [φ2,1(k), φ2,2(k), ..., φ2,L(k)]^T, ΔUL(k) = UL(k) − UL(k−1), and ΔFs(k) = Fs(k) − Fs(k−1).

B. Problem Statement

This article aims to design the RLFTC algorithm for multijoint robots, where a switching mechanism of the parameter estimation mode is designed based on the two dimensions of the time axis and the sampling period to update Φ̂1(k), and a radial basis function neural network (RBFNN) [41] is built to compensate the effect of disturbance and actuator fault.

Problem 1: For the considered system (4), design the RLFTC with its estimates Φ̂1(k) and e(k) = y(k) − y*(k) that satisfy

||Φ̂1(k)|| ≤ χ1(fs max, ω̂1m),  lim_{k→∞} |e(k)| ≤ χ2(ω̂1m)  (5)

where χ1(fs max, ω̂1m) and χ2(ω̂1m) are bounded functions of ω̂1m, and e(k) is the position tracking error.

The existing output feedback methods [20], [42], [43] on FTC of the robotic arm mostly require a priori knowledge of the robot dynamics, which is hard to acquire and very time consuming in practice. In addition, it is worth mentioning that traditional model-free robust control methods [29], [30] generally show poor control performance and low flexibility in handling the FTC of flexible robotic arms. For instance, a typical problem is the intrinsic heavy chattering problem of multijoint robots under traditional PID controllers. Motivated by these reasons, this article investigates the FTC issue for the considered flexible robotic arm.

III. MAIN RESULTS

A. Design of the RLFTC Strategy

Here, an RLFTC algorithm is presented. First, define the following value function:

QL(k) = |y*(k+1) − y(k+1)|² + μ(||Φ1(k) − Φ1(k−1)||² + ||UL(k) − UL(k−1)||²)  (6)

where μ > 0 is the weight factor.

QL(k) is first proposed to establish a parameter estimation mode switching mechanism. Then, the online optimal control problem is to find the optimal update control law u*(k) that minimizes the aforementioned cost function, i.e., u*(k) = arg min_{u(k)} QL. As we know, the robot dynamics only evolves with time k. However, in the RLFTC, in addition to the update along the time domain, an L-step update process in another dimension is considered at each sampling cycle to obtain superior performance in parameter estimation. The parameter L has the following two new functions:
1) the adopted L comprehensively considers the relationship between the output change at the next moment and the input change in a sliding time window;
2) L is a mode selection parameter in each control cycle, where it is used as an indicator for the online estimation of the PPD parameter Φ̂1(k).

The optimal Φ̂1*(k) can be determined by QL(k). Introduce the criterion function of Φ1(k) as

J(Φ1(k)) = |y*(k) − y(k−1) − Φ1^T(k)ΔUL(k) − Φ2^T(k−1)ΔFs(k−1)|² + μ||Φ1(k) − Φ1(k−1)||².  (7)

Conduct ∂J(Φ1(k))/∂Φ1(k) = 0 by the optimal criteria, and the new PPD estimator for the Lth node is proposed as

Φ̂1(k) = Φ̂1(k−1) + ηΔUL(k−1)(y*(k) − y(k−1) − Φ2^T(k−1)ΔFs(k−1) − Φ̂1^T(k−1)ΔUL(k−1))/(μ + ||ΔUL(k−1)||²)  (8)

where the step factor η ∈ (0, 1] can increase the flexibility of the controller.

It can be seen from the dynamic model (4) that the control performance of robot manipulators will be affected by unknown fault functions, including the spring disturbance. The RBFNN has the ability to approximate the nonlinear function infinitely, so the


output ξ(k) can be used to compensate for the fault function Φ2^T(k)ΔFs(k).

Define R(k) = [y(k) ε(k)]^T as the input of the RBFNN, where ε(k) = y*(k) − y(k). The output ξ(k) is expressed as

ξ(k) = ω*^T ψ(k) + σ(k)  (9)

where ω* = [ω1*(k) ω2*(k) ... ωL*(k)]^T is the desired weight vector, σ(k) is the network system error, ω*(k) = arg min_{ω ∈ R^L} {sup |ω^T(k)ψ(k) − ξ(k)|}, and ψ(k) is the Gaussian function vector, which is chosen as

Ψi(k) = exp(−||R(k) − Ci||²/(2Γi²))  (10)

where Ci and Γi represent the center and width of the hidden layer neurons, respectively.

The online update method is considered as follows:

Δω̂(k) = ∂(y*(k) − y(k))Ψi(k)  (11)

where ∂ is a freely chosen factor and Δω̂(k) = ω̂(k) − ω̂(k−1). Then, the estimation of ξ(k) is denoted as

ξ̂(k) = ω̂^T(k)ψ(k).  (12)

The data model is updated with the pseudopartial derivatives φ1(k), φ2(k), ..., φL(k). Notably, the network output ξ(k) will also update as the parameter ω̂^T(k) changes. In contrast to the general RBFNN algorithm, since the A/C structure based on QL(k) is introduced, the output of the network will also be updated in the process of policy improvement.

By introducing the input criterion J(u(k)) = |y*(k+1) − y(k+1)|² + λ|u(k) − u(k−1)|², λ > 0, and combining (4), the basic steps of the RLFTC are designed as

u(k) = u(k−1) + ρ1φ1(k)(y*(k+1) − y(k) − ξ̂(k−1))/(λ + |φ1(k)|²) − φ1(k) Σ_{i=2}^{L} ρiφi(k)Δu(k−i+1)/(λ + |φ1(k)|²)  (13)

Φ̂1(k) = Φ̂1(k−1) + ηΔUL(k−1)(y*(k) − y(k−1) − ξ̂(k−1) − Φ̂1^T(k−1)ΔUL(k−1))/(μ + ||ΔUL(k−1)||²)  (14)

Φ̂1(k) = Φ̂1(1), if ||Φ̂1(k)|| ≤ ϕ or ||ΔUL(k)|| ≤ ϕ  (15)

where ϕ > 0 is a sufficiently small positive scalar, Φ̂1(1) is the initial value of Φ̂1(k), and ξ̂(k−1) in (14) replaces the fault term Φ2^T(k−1)ΔFs(k−1) of (8).

Based on the aforementioned analysis, an RLFTC algorithm is given as Algorithm 1. L is a mode selection parameter in each control cycle, where it is used as an indicator for the online estimation of the PPD parameter Φ̂1(k). The choice of Lmax should not be too large, otherwise it will increase the computational complexity of the algorithm. In Algorithm 1, ρ and η are the step-size constants for the controller, and μ is a weighting factor. To make the control algorithm more flexible, ρi ∈ (0, 1] [44]. λ is used to limit the change of the input volume, which guarantees the smoothness of the input signal to a certain extent, so it should not be set too large. In addition, in the real robot environment of torque control, there is an upper limit to the torque issued by the joints, so the system input u(k) needs to be limited. The compatibility with the actual system parameters needs to be taken into account when selecting the aforementioned parameters.

Algorithm 1: The Design of the RLFTC Strategy.
1: Given initial ρ, λ, η, Φ̂1(1), ∂, τ, ϕ, adaptive mode length Lmax, and calculate the value function QL(1);
2: Get the I/O information of the manipulator robot through the current subscription node;
3: Determine the number of neurons and Gaussian vectors of the RBFNN, and then train the weights ω̂(k) online by Δω̂(k) = ∂(y*(k) − y(k))Ψ(k);
4: Mode Switching Mechanism:
5: Get the pseudo partial derivative vectors φ1(k), φ2(k), ..., φL(k) according to iterating different numbers of data points from one to L;
6: Compute Φ̂1(k) based on the pseudo partial derivative vectors φ1(k), φ2(k), ..., φL(k) and then compute Q1(k), Q2(k), ..., QL(k);
7: Choose an optimal Φ̂1*(k) through comparing Q1(k), Q2(k), ..., QL(k) and then compute the control law u(k);
8: Policy Evaluation:
9: If |QL(k) − QL(k−1)| ≤ τ, then
10:   if ||Φ̂1(k)|| ≤ ϕ or ||ΔUL(k)|| ≤ ϕ
11:     Update Φ̂1(k) by (14);
12:   else
13:     Φ̂1(k) = Φ̂1(1);
14: else
15:   Running step 3;
16: Policy Improvement:
17: Update the fault-tolerance function ξ̂(k) by ξ̂(k) = ω̂^T(k)ψ(k);
18: Compute the controller u(k).
19: k = k + 1; Go to step 2;

Remark 1: Different from traditional MFAC algorithms, an adaptive switching mechanism is proposed in this article to increase the online estimation performance of Φ̂1(k). According to (4), since Φ̂1(k) is time varying, its estimation modes are designed according to different choices of L in each control cycle. According to the aforementioned estimation strategy, steps 6 and 7 of Algorithm 1 need to be executed in each sampling cycle to obtain the optimal value of Φ̂1(k). Subsequently, the critical parameter Φ̂1(k) will be updated in both the time t and step length L directions, where L is the step length parameter and satisfies L ≤ Lmax, and Lmax > 0 is the predetermined maximum historical data length. To evaluate the estimation results for different L, the value function QL(k) is used to describe the performance of Φ̂1(k) at different L. However, in the existing MFAC algorithms, the choice of parameter L cannot change


adaptively [41]. Compared with the single-dimensional parameter estimation approach [29], the RLFTC is shown to have better robustness against the disturbance, including spring interference. In the RLFTC, L parameter estimation modes are designed according to different choices of L in each control cycle. Tuning L will improve the robustness of the closed-loop system. Based on this advantage, the RLFTC is applied to solve the intrinsic heavy chattering problem of multijoint robots.

Remark 2: Most data-driven control methods are time-triggered control methods, which update control signals periodically [45], [46]; this may result in the transmission of a large amount of unnecessary information, thereby causing waste of limited network resources. Compared with the existing rigid parameter updating mechanism [31], an event-trigger parameter selection strategy is introduced to avoid unnecessary loop iterations in Algorithm 1, where the computation complexity is optimized via designing the Policy Evaluation and Policy Improvement of the A/C mechanism. By designing the A/C mechanism and the parameter selection strategy, the number of loop iterations is significantly reduced and the computational complexity problem is solved without sacrificing the performance of parameter estimation. Thus, the real-time performance of the algorithm is guaranteed.

B. Stability Analysis of the Closed-Loop System

Here, the stability analysis on the uniformly ultimate boundedness of the states of the closed-loop system under the designed RLFTC is presented.

Theorem 1: For the considered system (4), given the parameters λ > 0, μ > 0, η ∈ (0, 1], ρ1 ∈ (0, 1], and the RLFTC strategy (13)–(15), there exist λmin > 0 and λ > λmin such that the position tracking error e(k) of the manipulator joint satisfies lim_{k→∞} |e(k)| ≤ χ2(ω̂1m).

Proof: First, the boundedness of Φ̂1(k) is proved in [41]; for more details, please refer to [41]. In addition, ω̂1^T(k) is bounded, and we can obtain that |ω̂1^T(k)ψ(k)| ≤ ω̂1m and ||Φ2(k)ΔFs(k)|| ≤ 2p fs max.

Then, we need to analyze the boundedness of the tracking error e(k). Considering λ > λmin > 0, since Φ̃1(k) and Φ̂1(k) are bounded, there exist constants M1, M2, M3, M4, and M5 satisfying

|φ1(k)|/(λ + |φ1(k)|²) ≤ |φ1(k)|/(2√λ |φ1(k)|) < 1/(2√λmin) ≜ M1 < 0.5/p  (16)

0 < M2 ≤ |φ1(k)φi(k)|/(λ + |φ1(k)|²) ≤ p|φ1(k)|/(2√λ |φ1(k)|) < p/(2√λmin) < 0.5  (17)

M1||Φ1(k)|| ≤ M3 < 0.5,  M2 + M3 < 1  (18)

(Σ_{i=2}^{L} |φ1(k)φi(k)/(λ + |φ1(k)|²)|)^{1/(L−1)} ≤ M4.  (19)

Choose max ρi, i = 1, 2, ..., L, which satisfies

Σ_{i=2}^{L} ρi|φ1(k)φi(k)/(λ + |φ1(k)|²)| ≤ max_{i=1,2,...,L} ρi Σ_{i=2}^{L} |φ1(k)φi(k)/(λ + |φ1(k)|²)| ≤ (max_{i=1,2,...,L} ρi) M4^{L−1} ≜ M5 < 1.  (20)

Define the tracking error e(k) = y* − y(k), and define

A(k) = [ −ρ2φ1(k)φ2(k)/(λ+|φ1(k)|²)  −ρ3φ1(k)φ3(k)/(λ+|φ1(k)|²)  ···  −ρLφ1(k)φL(k)/(λ+|φ1(k)|²)  0
          1                           0                           ···  0                           0
          0                           1                           ···  0                           0
          ⋮                           ⋮                           ⋱    ⋮                           0
          0                           0                           ···  1                           0 ]_{L×L}

ΔUL(k) = [Δu(k), ..., Δu(k−L+1)]^T,  C = [1, 0, ..., 0]^T.  (21)

Equation (13) can be converted to

ΔUL(k) = A(k)[Δu(k−1), ..., Δu(k−L)]^T + ρ1φ1(k)C(e(k) − ω̂^T(k)ψ(k))/(λ + |φ1(k)|²)
       = A(k)ΔUL(k−1) + ρ1φ1(k)C(e(k) − ω̂^T(k)ψ(k))/(λ + |φ1(k)|²).  (22)

From the characteristic equation of A(k) and (21), we know |z|^{L−1} ≤ Σ_{i=2}^{L} ρi|φ1(k)φi(k)/(λ + |φ1(k)|²)||z|^{L−i} ≤ Σ_{i=2}^{L} ρi|φ1(k)φi(k)/(λ + |φ1(k)|²)| ≤ (max_{i=1,2,...,L} ρi) M4^{L−1} < 1, so that |z| < 1. Thus, there exists a positive number ε1 that satisfies ||A(k)|| ≤ s(A(k)) + ε1 ≤ (max_{i=1,2,...,L} ρi)^{1/(L−1)} M4 + ε1 < 1.

Define d2 = (max_{i=1,2,...,L} ρi)^{1/(L−1)} M4 + ε1. Then, we can get

e(k+1) = y* − y(k+1)
       = y* − y(k) − Φ1^T(k)ΔUL(k) − Φ2^T(k)ΔFs(k)
       = (1 − ρ1φ1(k)φ1(k)/(λ + |φ1(k)|²)) e(k) − Φ1^T(k)A(k)ΔUL(k−1) + (ρ1φ1(k)φ1(k)/(λ + |φ1(k)|²)) ω̂^T(k)ψ(k).  (23)


From (17), choose 0 < ρ1 ≤ 1 such that

|1 − ρ1φ̂1(k)φ1(k)/(λ + |φ̂1(k)|²)| = 1 − ρ1φ̂1(k)φ1(k)/(λ + |φ̂1(k)|²) ≤ 1 − ρ1M2 < 1.

Let d3 = 1 − ρ1M2 and take the norm of (23)

|e(k+1)| ≤ |1 − ρ1φ1(k)φ1(k)/(λ + |φ1(k)|²)||e(k)| + ||Φ1^T(k)|| ||A(k)|| ||ΔUL(k−1)|| + (ρ1φ1(k)φ1(k)/(λ + |φ1(k)|²))|ω̂^T(k)ψ(k)|
         < d3|e(k)| + d2||Φ1^T(k)|| ||ΔUL(k−1)|| + 2ω̂1m
         < ··· < d3^k|e(1)| + d2 Σ_{i=1}^{k−1} d3^{k−1−i}||Φ1^T(i+1)|| ||ΔUL(i)|| + 2ω̂1m
         < d3^k|e(1)| + d2 Σ_{i=1}^{k−1} d3^{k−1−i}||Φ1^T(i+1)|| ρ1M1 Σ_{j=1}^{i} d2^{i−j}|e(j)| + 2ω̂1m.  (24)

Define d4 = ρ1M3 and g(k+1) = d3^k|e(1)| + d2d4 Σ_{i=1}^{k−1} d3^{k−1−i} Σ_{j=1}^{i} d2^{i−j}|e(j)|. From (17) and (24), we can get

|e(k+1)| < g(k+1) + 2ω̂1m  (25)

where g(2) = d3|e(1)|.

It can be found that the convergence of e(k+1) is related to g(k+1) as

g(k+2) = d3^{k+1}|e(1)| + d2d4 Σ_{i=1}^{k} d3^{k−i} Σ_{j=1}^{i} d2^{i−j}|e(j)|
       = d3 g(k+1) + d4d2^k|e(1)| + ··· + d4d2²|e(k−1)| + d4d2|e(k)|
       < d3 g(k+1) + d4d2^k|e(1)| + ··· + d4d2²|e(k−1)| + d4d2|g(k)|
       = d3 g(k+1) + h(k)  (26)

where h(k) ≜ d4d2^k|e(1)| + ··· + d4d2²|e(k−1)| + d4d2|g(k)|.

From (18), d3 = 1 − ρ1M2 > ρ1(M2 + M3) − ρ1M2 = ρ1M3 = d4, so h(k) satisfies h(k) < d2 g(k+1); details can be found in [41]. We can get

g(k+2) < d3 g(k+1) + h(k) < (d3 + d2) g(k+1).  (27)

Choose 0 < ρ1 < 1 and 0 < ρL ≤ 1 that satisfy 0 < (max{ρi})^{1/(L−1)} M4 < ρ1M2 < 1; then

0 < 1 − ρ1M2 + (max{ρi})^{1/(L−1)} M4 < 1.  (28)

Substituting (27) into (26), we can get lim_{k→∞} g(k+2) < lim_{k→∞} (d3 + d2)g(k+1) < ··· < lim_{k→∞} (d3 + d2)^k g(2) = 0, and g(k+1) + 2ω̂1m > 0. This indicates that the tracking error e(k) converges from (25) and (26). Thus, y(k) is also bounded. This completes the proof. □

IV. EXPERIMENTS

The proposed RLFTC strategy is applied to two nonlinear systems with actuator fault to validate its effectiveness in different application scenarios: one is the numerical model of a single-link robotic arm system, and the other is the Baxter robot with spring interference.

A. Simulation for the Single-Link Robotic Arm System

In this subsection, the mathematical modeling is based on the single-link robotic arm system proposed in [47] as

J1 q̈ + N1 q̇ + B sin(q) = ka I
L İ + R I + Kb q̇ = V
J1 = J + mD²/3 + m0D² + 2m0r²/5
N1 = mDg/2 + m0Dg.  (29)

Define x1 = q, x2 = q̇, x3 = I, V = u, and the kinetic equation of the system is given as follows:

ẋ1 = x2
ẋ2 = −(N1/J1)x2 − (B/J1)sin(x1) + (ka/J1)x3
ẋ3 = −(R/L)x3 − (Kb/L)x2 + (1/L)u  (30)

where J is the rotor inertia, J1 represents the motor rotational inertia, N1, m, D, and q are the potential energy, mass, length, and rotation angle of the connecting rod, respectively, m0 and r are the mass and radius of the load, respectively, g is the gravity constant, B represents the viscous friction coefficient at the joint, I, L, and R are the current, inductance, and resistance of the motor armature, respectively, ka represents the electromechanical conversion coefficient of armature current to torque, Kb is the back electromotive force constant, and V represents the input control voltage.

The system parameter settings are shown in Table I.

TABLE I
PARAMETER SETTINGS FOR THE SYSTEM

Fig. 1. Tracking control performances of the three methods.
Fig. 2. Tracking error of the three methods.

The discretization of the system dynamics equation can be obtained as

x1,k = x1,k−1 + T x2,k−1
x2,k = x2,k−1 + T(−(N1/J1)x2,k−1 − (B/J1)sin(x1,k−1) + (ka/J1)x3,k−1)
x3,k = x3,k−1 + T(−(R/L)x3,k−1 − (Kb/L)x2,k−1 + (1/L)(uk−1 + f(k−1)))
yk = x1,k.  (31)

Set the initialization parameters, where the sampling time T = 0.001 s, λ = 0.6, ρ1 = ρ2 = ρ3 = 0.8, and μ = 0.1. The desired trajectory is set as yd(k) = sin(Tk) + cos(Tk), and the fault function is set as f(k) = 2 sin(k/50) cos(k/20) + 3 at time = 10 s. The RBFNN weights are initialized to 0, and the number of hidden layer neurons is set to 6. The size of the width σ of the Gaussian function and the choice of the center c are given as follows:

σ = [0.42, 0.41, 0.33, 0.38, 0.91, 0.15]^T  (32)

c = [10.8 11.35 18.2 16.4 23.1 34.2
      0.2  0.2   0.43 0.21 0.4  0.74].  (33)

Finally, in order to quantitatively analyze the control effect, the absolute value range of the position tracking error (ARPE), the integral of the absolute position error (IAPE), and the integral of the absolute input variation (IAIV) [41] are used as evaluation indicators to quantify the tracking control effect:

IAPE = Σ_{k=1}^{T} |y*(k) − y(k)|  (34)

IAIV = Σ_{k=2}^{T} |u(k) − u(k−1)|.  (35)

Since some model parameters of the Baxter robot experimental platform in our lab are unknown, a numerical simulation on a single-link robotic arm system under actuator fault is considered. The comparison study is conducted between the proposed model-free FTC method in this article and the model-based active FTC method (AFTC) in [17]. Furthermore, to highlight the advantages of the proposed method, the robust controller (RC) [29] is also introduced. The simulation results are depicted in Figs. 1 and 2.

TABLE II
PERFORMANCE INDICATORS OF THREE CONTROLLERS

Fig. 1 shows the tracking performances of the proposed RLFTC scheme, the AFTC method, and the model-free RC. It can be observed that in the transient period (0–1 s), AFTC has a faster convergence speed than both RLFTC and RC. Then, in the fault-free period (1–10 s), Figs. 1 and 2 show that the three methods all achieve good tracking performances. When the fault occurs (after 10 s), it can be seen from Figs. 1 and 2 that both RLFTC and AFTC have a fast responsiveness and accurate tracking ability. Moreover, it can be found that the RLFTC converges slightly slower than the model-based method AFTC, and its steady-state error is also larger than that of AFTC. It should be pointed out that the RLFTC is a model-free FTC algorithm, which is only based on I/O data. Despite having less information available, the tracking performance of the RLFTC is still satisfactory, and the performance index of the steady-state error in Table II shows that the FTC effect of the proposed method is very close to that of AFTC. In contrast, Fig. 2 shows that RC has poor tracking performance, and the tracking error of RC is much larger than those of the RLFTC and AFTC.

B. Application to the Baxter Robot

The Baxter robot is shown in Fig. 3. It is a dual-arm robot, and its single manipulator is a redundant flexible one with seven degrees of freedom. The robot arm is connected with a rigid link by a rotating joint, and the joint is connected by an elastic brake, that is, a motor and a reducer are connected in series with a spring to drive the load, so as to protect the human or robot body under human–robot cooperation. Since the spring has a fixed stiffness, the momentum at the joint can be detected by Hooke's law. There is a torque sensor at Baxter's joint. The front and rear of the arm are driven by 26-W and 63-W servo motors, and the 14-bit encoder can realize the reading of the joint angle.
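Before describing the experimental setup in detail, the core of Algorithm 1 can be condensed into a single-joint sketch. The code below keeps one PPD estimate per candidate window length L, updates each with a projection law in the spirit of (8), scores the candidates with a value-function-style criterion as in (6), and applies the chosen estimate in the control law (13) together with the reset rule (15). It is a simplified illustration under several assumptions: the gains and the reset value are arbitrary, the RBFNN compensation term enters only as an external argument xi_hat, and the measured output increment is used in the projection step.

```python
import numpy as np

class RLFTCSketch:
    """Condensed single-joint sketch of Algorithm 1 (illustrative only)."""

    def __init__(self, Lmax=3, eta=0.5, mu=0.1, rho=0.8, lam=0.6,
                 eps=1e-5, u_max=15.0, phi0=0.5):
        self.Lmax, self.eta, self.mu = Lmax, eta, mu
        self.rho, self.lam, self.eps, self.u_max = rho, lam, eps, u_max
        self.phi0 = phi0
        # One PPD vector (length L) per estimation mode L = 1..Lmax.
        self.phi = [np.full(L, phi0) for L in range(1, Lmax + 1)]
        self.du_hist = np.zeros(Lmax)      # [du(k-1), du(k-2), ..., du(k-Lmax)]
        self.u_prev, self.y_prev = 0.0, 0.0

    def step(self, y, y_ref_next, xi_hat=0.0):
        """One control update: returns u(k) from y(k), y*(k+1) and xi_hat."""
        dy = y - self.y_prev
        scores, cands = [], []
        for idx, L in enumerate(range(1, self.Lmax + 1)):
            dU = self.du_hist[:L]
            phi = self.phi[idx]
            # Projection update in the spirit of (8) (measured dy; fault term handled via xi_hat).
            phi = phi + self.eta * dU * (dy - phi @ dU) / (self.mu + dU @ dU)
            # Reset rule (15) when the estimate or the excitation is too small.
            if np.linalg.norm(phi) <= self.eps or np.linalg.norm(dU) <= self.eps:
                phi = np.full(L, self.phi0)
            self.phi[idx] = phi
            # Value-function-style score, cf. (6): one-step prediction error.
            scores.append((dy - phi @ dU) ** 2)
            cands.append(phi)
        phi = cands[int(np.argmin(scores))]      # mode switching over L
        # Control law (13) with the selected PPD vector (all rho_i equal here).
        tail = self.rho * phi[0] * np.dot(phi[1:], self.du_hist[:len(phi) - 1])
        du = (self.rho * phi[0] * (y_ref_next - y - xi_hat) - tail) / (self.lam + phi[0] ** 2)
        u = float(np.clip(self.u_prev + du, -self.u_max, self.u_max))
        # Roll the input-increment history forward.
        self.du_hist = np.roll(self.du_hist, 1)
        self.du_hist[0] = u - self.u_prev
        self.u_prev, self.y_prev = u, y
        return u
```

A full implementation would additionally run the RBFNN update (11)–(12) at every cycle to supply xi_hat, and would wrap such a controller in the 120 Hz torque-control loop described below.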

TABLE III
ANGLE LIMIT OF JOINT OF THE BAXTER ROBOT

TABLE IV
ACTUATOR FAULT FACTOR TABLE

TABLE V
PERFORMANCE INDICATORS OF THREE CONTROLLERS (CASE III)

Fig. 4. FTC scheme for the Baxter robot based on the RLFTC.

Fig. 3. Baxter robot experiment platform.

In the experiment, the Baxter robot runs on the Linux platform based on the ROS operating system. We can connect to the internal computer of the robot through the network to read information or send instructions, or remotely control the operation on the internal computer through the secure shell (SSH). The computer is connected to the robot through a network cable and utilizes the software development kit (SDK) of the Baxter robot operating system under the Ubuntu 16.04 LTS platform, which realizes the programming and control of the Baxter robot. Table III shows the limited range of the joint angles of the Baxter robot. The sampling frequency is set to 120 Hz.

Fig. 5. Case I.

To verify whether the RLFTC improves the fault-tolerance performance of the system, the results of the RLFTC and the RC [29] are compared in Case I. Besides, a comparative study is conducted to illustrate the superiority of the RLFTC over existing methods, including the FTC algorithm PFDL-MFAFTC [32] as well as the classic PID algorithm [48]; the RLFTC scheme is shown in Fig. 4. Figs. 5 and 6 compare the three methods in terms of tracking control performance and real-time performance for the W2 joint under different fault sizes, respectively. To verify the FTC performance of the presented RLFTC algorithm, both multiplicative and additive faults are considered in Cases I to III. The fault is given when k = 200, and its type and size are shown in Table IV. It is worth noting that the maximum acceptable torque for the W2 joint of the Baxter robot is 15 Nm, so the additional fault of 5 Nm is very significant.

The RBFNN weights are initialized to 0, and the number of hidden layer neurons is set to 6. The size of the width σ of the Gaussian function and the choice of the center c are given as


follows:

σ = [0.12, 0.41, 0.33, 0.75, 0.91, 0.43]^T  (36)

c = [15.3 26.5 8.2  16.2 33.1 44.2
      0.6  0.2  0.51 0.21 0.8  0.11].  (37)

The value of Lmax is set to 3, and three parameter estimation modes are designed for the system. For comparison, the same controller parameters should be selected as far as possible in the experiment. The other parameters in the controller are reselected as shown in Table VI.

TABLE VI
ACTUATOR FAULT FACTOR TABLE

Fig. 6. Case II. (a) Tracking control performances of the three methods. (b) Execution time of the three methods.

Fig. 7. Case III. (a) Tracking control performances of the three methods. (b) Execution time of the three methods.

From Fig. 5, the RLFTC remarkably improves the fault-tolerance performance of the system. It can be seen from Figs. 6(a) and 7(a) that during the fault-free period in time 0–200, the presented RLFTC has the best tracking control performance, whereas the largest tracking error is obtained under the PID controller. When the actuator fault occurs, big fluctuations appear under PID and PFDL, but such a phenomenon does not occur in the result of the RLFTC; the RLFTC can converge within 10–20 sampling instants and the response curve is very smooth in the whole transient period. Then, we can see that the steady-state tracking control performance of the RLFTC is also much better than those of the other two methods. On the other hand, from Figs. 6(b) and 7(b), we can see that all three methods have acceptable real-time performance. However, the maximum execution time for both PID and PFDL has approached the sampling period. Among the three methods, PFDL-MFAFTC has the largest tracking error, reaching 1.135 rad, while the RLFTC has only 0.236 rad. In Table V, the IAPE and IAIV indicators of the RLFTC are the most advantageous, its FTC capability is optimal, and the control performance is significantly improved. Therefore, we can conclude that the presented RLFTC has higher reliability and feasibility for flexible multijoint manipulators, including the Baxter robot.
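As a closing note on the evaluation, the indices (34) and (35) used in the performance tables can be computed directly from the logged trajectories; the short sketch below is illustrative, with the array names assumed.

```python
import numpy as np

def tracking_indices(y_ref, y, u):
    """Compute the evaluation indices of (34)-(35) from logged data.

    IAPE: sum of the absolute position error; IAIV: sum of the absolute
    input variation. The maximum absolute error is also returned as one
    possible reading of the ARPE indicator.
    """
    err = np.abs(np.asarray(y_ref) - np.asarray(y))
    return {
        "ARPE": float(err.max()),
        "IAPE": float(err.sum()),
        "IAIV": float(np.abs(np.diff(np.asarray(u))).sum()),
    }
```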


V. CONCLUSION

This article is dedicated to the issue of providing the RLFTC scheme as well as its application in the control system of the Baxter robot. By applying the iterative learning algorithm of RL to the design of pseudopartial derivatives according to the time-varying characteristics of the system parameters, and switching the parameter update mode in each control cycle, the proposed RLFTC has a better parameter estimation effect. On this basis, a fault approach mechanism is established to compensate for the fault function. Our further work will focus on fault detection and the location of the multijoint robotic arms.

REFERENCES

[1] W. Zhang and F. Li, "Review of on-orbit robotic arm active debris capture removal methods," Aerospace, vol. 10, no. 1, 2023, Art. no. 13.
[2] M. Van and D. Ceglarek, "Robust fault tolerant control of robot manipulators with global fixed-time convergence," J. Franklin Inst., vol. 358, no. 1, pp. 699–722, 2021.
[3] G. Zong, D. Yang, J. Lam, and X. Song, "Fault-tolerant control of switched LPV systems: A bumpless transfer approach," IEEE/ASME Trans. Mechatron., vol. 27, no. 3, pp. 1436–1446, Jun. 2021.
[4] W. He, Z. Li, and C. Chen, "A survey of human-centered intelligent robots: Issues and challenges," IEEE/CAA J. Automatica Sinica, vol. 4, no. 4, pp. 602–609, Sep. 2017.
[5] B. Jiang and C. Gao, "Observer-based continuous adaptive sliding mode control for soft actuators," Nonlinear Dyn., vol. 105, no. 1, pp. 371–386, 2021.
[6] B. Xu, Z. Shi, C. Yang, and F. Sun, "Composite neural dynamic surface control of a class of uncertain nonlinear systems in strict-feedback form," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2626–2634, Dec. 2014.
[7] L. Tie, K. Cai, and Y. Lin, "A survey on the controllability of bilinear systems," Acta Automatica Sinica, vol. 37, no. 9, pp. 1040–1049, 2011.
[8] M. Van, S. Ge, and H. Ren, "Robust fault-tolerant control for a class of second-order nonlinear systems using an adaptive third-order sliding mode control," IEEE Trans. Syst., Man, Cybern. Syst., vol. 47, no. 2, pp. 221–228, Feb. 2017.
[9] X. Liu, J. Han, X. Wei, H. Zhang, and X. Hu, "Distributed fault detection for non-linear multi-agent systems: An adjustable dimension observer design method," IET Control Theory Appl., vol. 13, no. 15, pp. 2407–2415, Oct. 2019.
[10] H. Zhu and M. Chen, "Inverse reinforcement learning-based fire-control command calculation of an unmanned autonomous helicopter using swarm intelligence demonstration," Aerospace, vol. 10, no. 3, 2023, Art. no. 309.
[11] B. Sabbaghian and M. Farrokhi, "Robust fuzzy observer-based fault-tolerant control: A homogeneous polynomial Lyapunov function approach," IET Control Theory Appl., vol. 17, no. 1, pp. 74–91, 2023.
[12] M. Van, M. Mavrovouniotis, and S. Ge, "An adaptive backstepping nonsingular fast terminal sliding mode control for robust fault tolerant control of robot manipulators," IEEE Trans. Syst., Man, Cybern. Syst., vol. 49, no. 7, pp. 1448–1458, Jul. 2019.
[13] E. Shahzad and A. Khan, "Sensor fault-tolerant control of microgrid using robust sliding-mode observer," Sensors, vol. 22, no. 7, 2022, Art. no. 2524.
[14] M. Chen and S. Ge, "Robust adaptive neural network control for a class of uncertain MIMO nonlinear systems with input nonlinearities," IEEE Trans. Neural Netw., vol. 21, no. 5, pp. 796–812, May 2010.
[15] P. Shi and M. Liu, "Fault-tolerant sliding-mode-observer synthesis of Markovian jump systems using quantized measurements," IEEE Trans. Ind. Electron., vol. 62, no. 9, pp. 5910–5918, Sep. 2015.
[16] M. Van and H. Kang, "A robust fault diagnosis and accommodation scheme for robot manipulators," Int. J. Control, Automat. Syst., vol. 11, pp. 377–388, 2013.
[17] M. Van and S. Ge, "Finite time fault tolerant control for robot manipulators using time delay estimation and continuous nonsingular fast terminal sliding mode control," IEEE Trans. Cybern., vol. 47, no. 7, pp. 1681–1693, Jul. 2017.
[18] J. Zhu, C. Gu, S. Ding, and W. Zhang, "A new observer-based cooperative fault-tolerant tracking control method with application to networked multiaxis motion control system," IEEE Trans. Ind. Electron., vol. 68, no. 8, pp. 7422–7432, Aug. 2021.
[19] M. Zhong and X. Zhu, "An overview of recent advances in model-based event-triggered fault detection and estimation," Int. J. Syst. Sci., vol. 54, no. 4, pp. 1–15, 2022.
[20] B. Cangan, S. Navarro, and B. Yang, "Model-based disturbance estimation for a fiber-reinforced soft manipulator using orientation sensing," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Kyoto, Japan, 2022, pp. 9424–9430.
[21] S. Gao, H. Dong, B. Ning, Y. Chen, and X. Sun, "Adaptive fault-tolerant automatic train operation using RBF neural networks," Neural Comput. Appl., vol. 26, no. 1, pp. 141–149, 2015.
[22] Y. Ma and W. Che, "Distributed model-free adaptive control for learning nonlinear MASs under DoS attacks," IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 3, pp. 1146–1155, Mar. 2023.
[23] B. Zhang and J. Cheng, "Path tracking of autonomous tractor based on model-free adaptive control," Proc. Inst. Mech. Engineers D, J. Automobile Eng., to be published, 2022.
[24] G. Liu and Z. Hou, "RBFNN-based adaptive iterative learning fault-tolerant control for subway trains with actuator faults and speed constraint," IEEE Trans. Syst., Man, Cybern. Syst., vol. 51, no. 9, pp. 5785–5799, Sep. 2021.
[25] G. Liu and Z. Hou, "Cooperative adaptive iterative learning fault-tolerant control scheme for multiple subway trains," IEEE Trans. Cybern., vol. 52, no. 2, pp. 1098–1111, Feb. 2022.
[26] J. Wang and S. Qin, "A new subspace identification approach based on principal component analysis," J. Process Control, vol. 12, no. 8, pp. 841–855, 2002.
[27] H. Chen, B. Jiang, and S. Ding, "A broad learning aided data-driven framework of fast fault diagnosis for high-speed trains," IEEE Intell. Transp. Syst. Mag., vol. 13, no. 3, pp. 83–88, Fall 2021.
[28] S. Ding, "Data-driven design of monitoring and diagnosis systems for dynamic processes: A review of subspace technique based schemes and some recent results," J. Process Control, vol. 24, no. 2, pp. 431–449, 2014.
[29] Z. Hou and S. Jin, "Data-driven model-free adaptive control for a class of MIMO nonlinear discrete-time systems," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2173–2188, Dec. 2011.
[30] Z. Hou and S. Jin, "A novel data-driven control approach for a class of discrete-time nonlinear systems," IEEE Trans. Control Syst. Technol., vol. 19, no. 6, pp. 1549–1558, Nov. 2011.
[31] Z. Hou and Y. Zhu, "Controller-dynamic-linearization-based model free adaptive control for discrete-time nonlinear systems," IEEE Trans. Ind. Inform., vol. 9, no. 4, pp. 2301–2309, Nov. 2013.
[32] H. Wang and G. Liu, "Data-driven model-free adaptive fault tolerant control for high-speed trains," Control Decis., vol. 1543, pp. 1–10, 2021.
[33] H. Roland and R. Martin, "Reinforcement learning in feedback control," Mach. Learn., vol. 84, no. 1, pp. 137–169, 2011.
[34] X. Bu and Q. Qi, "Fuzzy optimal tracking control of hypersonic flight vehicles via single-network adaptive critic design," IEEE Trans. Fuzzy Syst., vol. 30, no. 1, pp. 270–278, Jan. 2022.
[35] X. Bu and Y. Xiao, "An adaptive critic design-based fuzzy neural controller for hypersonic vehicles: Predefined behavioral nonaffine control," IEEE/ASME Trans. Mechatron., vol. 24, no. 4, pp. 1871–1881, Aug. 2019.
[36] C. Hua, S. Ding, and Y. Shardt, "A new method for fault tolerant control through Q-learning," IFAC-PapersOnLine, vol. 51, no. 24, pp. 38–45, 2018.
[37] X. Bu and B. Jiang, "Non-fragile quantitative prescribed performance control of waverider vehicles with actuator saturation," IEEE Trans. Aerosp. Electron. Syst., vol. 58, no. 4, pp. 3538–3548, Aug. 2022.
[38] Z. Wang and J. Liu, "Reinforcement learning based-adaptive tracking control for a class of semi-Markov non-Lipschitz uncertain system with unmatched disturbances," Inf. Sci., vol. 626, pp. 407–407, 2023.
[39] L. Frank and V. Draguna, "Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers," IEEE Control Syst. Mag., vol. 32, no. 6, pp. 76–105, Dec. 2012.
[40] F. Caccavale, P. Cilibrizzi, F. Pierri, and L. Villani, "Actuators fault diagnosis for robot manipulators with uncertain model," Control Eng. Pract., vol. 17, no. 1, pp. 146–157, 2009.
[41] H. Wang and Z. Hou, "Model-free adaptive fault-tolerant control for subway trains with speed and traction/braking force constraints," IET Control Theory Appl., vol. 14, no. 12, pp. 1557–1566, 2020.
[42] S. Gadsden, Y. Song, and S. Habibi, "Novel model-based estimators for the purposes of fault detection and diagnosis," IEEE/ASME Trans. Mechatron., vol. 18, no. 4, pp. 1237–1249, Aug. 2013.


[43] F. Shen, Z. Cao, D. Xu, and C. Zhou, "A dynamic model of robotic dolphin based on Kane method and its speed optimization method," Acta Automatica Sinica, vol. 38, no. 8, pp. 1247–1256, 2012.
[44] Z. Hou, R. Chi, and H. Gao, "An overview of dynamic-linearization-based data-driven control and applications," IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4076–4090, May 2017.
[45] X. Wang, Z. Fei, Z. Wang, and X. Liu, "Event-triggered fault estimation and fault-tolerant control for networked control systems," J. Franklin Inst., vol. 356, no. 8, pp. 4420–4441, 2019.
[46] X. Chu and M. Li, "Event-triggered fault estimation and sliding mode fault-tolerant control for a class of nonlinear networked control systems," J. Franklin Inst., vol. 355, no. 13, pp. 5475–5502, 2018.
[47] J. Zhang and G. Yang, "Low-computation adaptive fuzzy tracking control of unknown nonlinear systems with unmatched disturbances," IEEE Trans. Fuzzy Syst., vol. 28, no. 2, pp. 321–332, Feb. 2020.
[48] X. Yu, W. He, C. Xue, B. Li, and L. Cheng, "Adaptive neural admittance control for collision avoidance in human-robot collaborative tasks," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2019, pp. 7574–7579.

Jun-Wei Zhu (Member, IEEE) received the B.S. degree in control theory and engineering from Northeastern University, Shenyang, China, in 2008, the M.S. degree in control theory and engineering from Shenyang University, Shenyang, in 2011, and the Ph.D. degree in control theory and engineering from Northeastern University, in 2016. He was a visiting Professor with the Institute of Automatic Control and Complex Systems (AKS), University of Duisburg-Essen, Germany, from September 2019 to September 2020. He is currently a special-termed Associate Professor with the Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, China. His research interests include cyber-physical systems, fault diagnosis, and fault-tolerant control.

Zi-Yuan Dong is currently working toward the Ph.D. degree in control engineering with the Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, China. His research interests include data-driven control, robot-physical systems, and fault diagnosis.

Zhi-Jun Yang received the B.Eng. degree in radio technology from the Huazhong University of Science and Technology, Wuhan, China, in 1989, the M.Sc. degree in biomedical electronics from Nanjing University, Nanjing, China, in 1992, and the Ph.D. degree in computer science from the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, in 1999. He has been affiliated with Heriot-Watt University, the University of Stirling, and the University of Edinburgh as a Research Fellow. From 2013 until recently, he served as a Senior Lecturer with Middlesex University London. He is currently the Senior AI Scientist with Jack Technology, Co. Ltd., Taizhou, China. His research interests include machine learning, neuromorphic VLSI designs, and bioinspired system development.

Xin Wang (Member, IEEE) received the B.S. degree in information and computing science and the M.S. degree in operational research and cybernetics from Heilongjiang University, Harbin, China, in 2008 and 2011, respectively, and the Ph.D. degree in navigation guidance and control from Northeastern University, Shenyang, China, in 2016. From 2017 to 2018, he was a Visiting Professor with the Department of Mechanical Engineering, University of Victoria, Victoria, BC, Canada. He is currently a Lecturer with the School of Mathematical Science, Heilongjiang University, and also a Postdoctoral Fellow with the Department of Electrical Engineering, Yeungnam University, Gyeongsan, South Korea. His research interests include fault diagnosis, fault-tolerant control, multiagent coordination, and time-delay systems.

