Adaptive Dynamic Programming-Based Fixed-Time Optimal Control For Wheeled Mobile Robot

This document presents an adaptive dynamic programming (ADP)-based fixed-time optimal control strategy for trajectory tracking in wheeled mobile robots. The proposed controller utilizes a critic neural network to estimate the cost function and ensures fixed-time convergence of tracking errors, addressing challenges such as slippage and disturbances. Simulations and experiments demonstrate that this method achieves faster convergence compared to traditional control approaches, highlighting its effectiveness in real-world applications.


176 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 10, NO. 1, JANUARY 2025

Adaptive Dynamic Programming-Based Fixed-Time Optimal Control for Wheeled Mobile Robot

Chen Wang, Haoran Zhan, Qing Guo, Senior Member, IEEE, and Tieshan Li, Senior Member, IEEE

Abstract—In this study, the adaptive dynamic programming (ADP)-based fixed-time optimal trajectory tracking control is investigated for wheeled mobile robots. An ADP-based fixed-time optimal tracking controller is developed based on the critic-only neural network ADP technique, which guarantees that the robot tracks the desired trajectory in fixed time. Firstly, to address the solution difficulty of the Hamilton-Jacobi-Bellman (HJB) equation, a critic neural network is used to estimate the cost function. Meanwhile, a weight update law is designed by using the adaptive control technique, which not only removes the persistent or finite excitation condition, but also enables the fixed-time convergence of the weight estimation error. By using the proposed controller, all error variables can converge to a neighborhood of zero in fixed time. Finally, both simulations and physical experiments indicate that the proposed ADP-based fixed-time optimal controller has a faster convergence rate compared to the two comparison controllers.

Index Terms—Wheeled mobile robots, adaptive dynamic programming (ADP), fixed-time optimal control, critic neural networks.

Received 30 July 2024; accepted 10 November 2024. Date of publication 21 November 2024; date of current version 29 November 2024. This article was recommended for publication by Associate Editor K. Tahara and Editor C. Gosselin upon evaluation of the reviewers’ comments. This work was supported in part by the National Natural Science Foundation of China under Grant 52175046 and Grant 51939001 and in part by Sichuan Science and Technology Program under Grant 2024YFFK0037 and Grant 2024ZYD0165. (Corresponding author: Qing Guo.)
Chen Wang, Haoran Zhan, and Qing Guo are with the School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]; [email protected]; [email protected]).
Tieshan Li is with the School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]).
This letter has supplementary downloadable material available at https://doi.org/10.1109/LRA.2024.3504314, provided by the authors.
Digital Object Identifier 10.1109/LRA.2024.3504314

I. INTRODUCTION

THE wheeled mobile robot, with its superior mobility and flexibility, is gradually becoming an indispensable tool in various fields of modern society. Its efficient fulfillment of tasks requires the precise and stable support of a powerful control algorithm. However, the underactuated and nonlinear characteristics of the wheeled mobile robot introduce numerous complexities and challenges to the development of the control algorithms [1].

Trajectory tracking control, aimed at guiding wheeled mobile robots to follow a preset reference signal, has been a significant focus in the control community. Over the past decade, many control algorithms have been developed to address the trajectory tracking problem of wheeled mobile robots [2], [3], [4], [5], [6], [7]. Considering the possibility of slippage when the robot moves in slopes or uneven terrain, the authors in [2] and [3] proposed adaptive control algorithms with an adaptive compensation mechanism. To solve the difficulties posed by lateral and longitudinal slippage, Chen et al. [4] proposed an improved linear active disturbance rejection control algorithm for a wheeled robot with six wheels. For unicycle mobile robots with perturbation parameters, two sliding-mode control algorithms were proposed in [5] and [6] to achieve trajectory tracking. Additionally, considering wheeled mobile robots with unknown disturbances and unmeasurable velocities, a robust tracking control algorithm based on an extended observer was introduced in [7]. It is important to note that the above research mainly focuses on handling unknown disturbances, slippage, unmeasurable velocity information, and model parameter perturbations. However, these approaches only lead to asymptotic convergence of the system, meaning that the tracking error theoretically takes an infinite amount of time to converge to the equilibrium point or a neighborhood near it.

In recent years, a new control method, finite-time control, which ensures system convergence within a finite time, has gained popularity among scholars [8], [9], [10]. In [8], a fuzzy finite-time control method was proposed for nonlinear systems with actuator failures. In [9], a fast finite-time control scheme was proposed for nonlinear systems with full-state constraints to improve the convergence speed when the energy function value is relatively small. Later, this approach was extended to consensus control of nonlinear multiagent systems [10]. In addition, the finite-time control methods for wheeled mobile robots can be found in [11] and [12]. Li et al. [11] proposed a finite-time control method for time-varying systems and applied it to the trajectory tracking control of wheeled mobile robots. In contrast to [11] which employs full-state feedback, Wu et al. [12] developed a finite-time output feedback tracking control method. It has been demonstrated in numerous studies that finite-time control exhibits better robustness, faster convergence, and superior anti-interference capability compared to non-finite-time control methods [13], [14], [15]. However, there exists an undeniable fact that the convergence time is highly dependent on the initial system states. To address this, fixed-time control was proposed in [16], where the convergence time has an upper bound independent of the system’s initial states. This control method has since been rapidly developed in the study of nonlinear systems [17], [18], [19], [20], [21]. For instance, Wang et al. [17] proposed a fuzzy fixed-time control scheme for higher-order nonlinear systems with sensor and actuator faults. The scheme was later extended to handle time-varying full-state constraints and actuator hysteresis [18]. Fixed-time control has

2377-3766 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Macquarie University. Downloaded on November 30,2024 at 03:12:42 UTC from IEEE Xplore. Restrictions apply.
WANG et al.: ADP-BASED FIXED-TIME OPTIMAL CONTROL FOR WHEELED MOBILE ROBOT 177

also been applied to various practical systems, such as robotic arms [22], Euler-Lagrange systems [23], and quadrotor systems [24]. Despite these advances, there are still few fixed-time control schemes for wheeled mobile robots, which motivates this study.
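
To make the difference between finite-time and fixed-time convergence concrete, the following is a minimal numerical sketch on a scalar toy system, not the robot model or the controller of this letter. It assumes the dynamics x' = -alpha*sign(x)|x|^(p/q) - beta*x^3 with illustrative gains alpha = beta = 1 and p/q = 17/19. For this toy system the settling time is bounded by 1/(alpha(1 - p/q)) + 1/(2 beta) for every initial state. The function names, the tolerance, and the step-size rule are hypothetical choices made only for this illustration.

```python
import math


def settling_time(x0, alpha=1.0, beta=1.0, p=17, q=19, tol=1e-3, dt_max=1e-3):
    """Euler-integrate x' = -alpha*sign(x)|x|^(p/q) - beta*x^3 until |x| <= tol.

    Toy scalar system chosen for illustration only (assumption, not the paper's model).
    """
    a = p / q
    x, t = float(x0), 0.0
    while abs(x) > tol:
        rate = alpha * abs(x) ** a + beta * abs(x) ** 3  # magnitude of dx/dt
        # Limit each step so that x changes by at most 5 percent (keeps Euler stable
        # even when the cubic term is very large).
        dt = min(dt_max, 0.05 * abs(x) / rate)
        x -= dt * math.copysign(rate, x)
        t += dt
    return t


def fixed_time_bound(alpha=1.0, beta=1.0, p=17, q=19):
    """Upper bound on the settling time that does not depend on the initial state."""
    a = p / q
    return 1.0 / (alpha * (1.0 - a)) + 1.0 / (beta * (3.0 - 1.0))


if __name__ == "__main__":
    bound = fixed_time_bound()
    for x0 in (1.0, 10.0, 1000.0):
        print(f"x0 = {x0:8.1f}  settling time = {settling_time(x0):.3f}  bound = {bound:.3f}")
```

Running the script for initial states spanning three orders of magnitude should give settling times that all stay below the same bound, which is the property that distinguishes fixed-time from finite-time convergence.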
Notice that the above control schemes for wheeled mobile robots focus on improving tracking accuracy and response speed under complex external conditions, while ignoring the control cost. In practical engineering applications, wheeled mobile robots have limited energy, so it is important to balance control performance against control cost [25].
To achieve this goal, a cost function can be specified and then
the optimal control law can be found by solving the Hamilton-
Jacobi-Bellman (HJB) equation. However, solving the HJB
equation is difficult because it is a nonlinear partial differential
equation. Fortunately, dynamic programming [26] provides a
feasible way to find the optimal control solution. However, this
approach suffers from the dimension disaster problem as the
variable dimension increases. To solve this problem, adaptive
dynamic programming (ADP) has been widely studied by schol-
ars [27], [28], [29], [30], [31]. For example, Liu et al. [27]
proposed an optimal robust guaranteed cost control method for
uncertain nonlinear systems, using ADP to solve the optimal
stabilization control problem. Later, Wang et al. [28] extended this method to address the optimal tracking control problem. In [30] and [31], the authors introduced ADP into backstepping control, proposing two different optimal control schemes for strict-feedback systems. Recently, to achieve optimal stability, some scholars have applied ADP to optimal control problems in real physical systems. For instance, Dong et al. [32] studied ADP-based optimal control for Euler-Lagrange systems. Xiao et al. [33] studied the attitude stabilization control problem of spacecraft using ADP and proposed an optimal fault-tolerant control method. Unfortunately, there are still relatively few studies on optimal trajectory tracking control for wheeled mobile robots, and even fewer that consider fixed-time stability. Motivated by the aforementioned studies, ADP-based fixed-time optimal trajectory tracking control for wheeled mobile robots is investigated, which has the following contributions.

• Many traditional ADP-based optimal control methods use the gradient descent algorithm to design neural network weight update laws, which require the persistent or finite excitation condition [32], [34], a requirement that is difficult to satisfy in practical applications. In this study, the adaptive control technique is employed to design the weight update law of the critic neural network, which not only eliminates the need for the excitation condition but also ensures fixed-time convergence of the weight estimation error.
• Most of the existing ADP-based optimal control methods can only achieve uniformly ultimately bounded stability [28], [29], [30], which may not satisfy the need for rapid stabilization in real-world applications. In this study, to ensure the fixed-time convergence of the trajectory tracking error in a wheeled mobile robot, a fixed-time optimal control strategy is proposed.

II. MOTION DESCRIPTION OF WHEELED MOBILE ROBOT

Fig. 1. Schematic diagram of wheeled mobile robot.

As shown in Fig. 1, the position and attitude information of a differential drive mobile robot can be described by an earth-fixed frame $\{F_i\}$ and a body frame $\{F_b\}$, and the kinematic model of the robot is

$$\begin{cases} \dot{x} = v\cos\theta \\ \dot{y} = v\sin\theta \\ \dot{\theta} = \omega \end{cases} \tag{1}$$

where $v$ denotes the linear velocity of the center of mass of the robot, and $\omega$ denotes the angular velocity; $(x, y)$ and $(\dot{x}, \dot{y})$ represent the position and velocity in the earth-fixed frame, respectively; $\theta$ represents the heading direction; $u = [v, \omega]^T \in \mathbb{R}^{2\times 1}$ is the control input of the robotic system which will be designed later. Furthermore, to characterize the trajectory tracking control problem, the following user-defined reference trajectory is introduced:

$$\begin{cases} \dot{x}_r = v_r\cos\theta_r \\ \dot{y}_r = v_r\sin\theta_r \\ \dot{\theta}_r = \omega_r \end{cases} \tag{2}$$

where $x_r$, $y_r$ and $\theta_r$ are the desired system states; $v_r$ and $\omega_r$ are the desired linear velocity and angular velocity, respectively. Therefore, one can get a description of the trajectory tracking error of the robot as follows [35]:

$$z = \begin{bmatrix} z_x \\ z_y \\ z_\theta \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_r - x \\ y_r - y \\ \theta_r - \theta \end{bmatrix} \tag{3}$$

where $z = [z_x, z_y, z_\theta]^T \in \mathbb{R}^{3\times 1}$ represents the tracking error vector. Then, one can get the error dynamics of (3) as follows:

$$\begin{cases} \dot{z}_x = v_r\cos z_\theta - v + z_y\omega \\ \dot{z}_y = v_r\sin z_\theta - z_x\omega \\ \dot{z}_\theta = \omega_r - \omega \end{cases} \tag{4}$$

The control input is designed as $u = u_f + u_o$, where $u_f = [v_r\cos z_\theta, \omega_r]^T \in \mathbb{R}^{2\times 1}$ is the feedforward control input, and $u_o = [v_o, \omega_o]^T \in \mathbb{R}^{2\times 1}$ is the feedback control input to be designed. Hence, the trajectory tracking control problem is transformed into an optimal regulation problem, and the error system


(4) is rewritten as

$$\dot{z} = A(z) + B(z)u_o \tag{5a}$$

$$A(z) = [\omega_r z_y,\; v_r\sin z_\theta - \omega_r z_x,\; 0]^T \tag{5b}$$

$$B(z) = \begin{bmatrix} -1 & 0 & 0 \\ z_y & -z_x & -1 \end{bmatrix}^T \tag{5c}$$

where $A(z)$ is locally Lipschitz with respect to $z$, and it is assumed that there exists a positive constant $\psi_B$ such that $\|B(z)\| < \psi_B$ [28], [33]. For convenience, $A(z)$ and $B(z)$ will be abbreviated to $A$ and $B$, respectively.

III. ADP-BASED FIXED-TIME OPTIMAL CONTROL DESIGN

The design of controller and system stability analysis requires the support of some mathematical lemmas, which are given as follows.

Lemma 1 [20]: For $\forall \varpi \in \mathbb{R}$, $\forall \bar{\varpi} \in \mathbb{R}$, $o > 0$, $\mu > 0$, and $1 - \mu > 0$, one has

$$|\bar{\varpi}|^{\mu}|\varpi|^{1-\mu} \le (1-\mu)\,o^{-\frac{\mu}{1-\mu}}|\varpi| + \mu o|\bar{\varpi}|. \tag{6}$$

Lemma 2 [20]: For $b_i > 0$, $i = 1, 2, \ldots, n$, $0 < a_1 < 1$, and $a_2 \ge 1$, one has

$$\left(\sum_{i=1}^{n} b_i\right)^{a_1} \le \sum_{i=1}^{n} b_i^{a_1}, \qquad n^{1-a_2}\left(\sum_{i=1}^{n} b_i\right)^{a_2} \le \sum_{i=1}^{n} b_i^{a_2}. \tag{7}$$

Lemma 3 [36]: For $\forall a_1 \in \mathbb{R}$ and $a_2 > 0$, one has

$$0 \le |a_1| - a_1\tanh\left(\frac{a_1}{a_2}\right) \le 0.2785\,a_2. \tag{8}$$

A. Ideal Optimal Controller

The control objective is to design the optimal control law $u_o$ that effectively minimizes the prescribed cost function

$$J(z) = \int_t^{\infty}\left[u_o^T(z)Ru_o(z) + z^T(\tau)Qz(\tau)\right]d\tau \tag{9}$$

where $R \in \mathbb{R}^{2\times 2}$ and $Q \in \mathbb{R}^{3\times 3}$ are positive definite matrices. Assuming that the optimal control law is $u_o^*$ which is $C^1$, one can obtain the optimal cost function

$$J^*(z) = \int_t^{\infty}\left[u_o^{*T}(z)Ru_o^*(z) + z^T(\tau)Qz(\tau)\right]d\tau. \tag{10}$$

Then, the HJB equation is obtained as follows:

$$H(z, u_o^*, \nabla J^*(z)) = \nabla J^{*T}(z)(A + Bu_o^*) + z^TQz + u_o^{*T}Ru_o^* = 0 \tag{11}$$

where $\nabla J^{*T}(z) = [\partial J^*(z)/\partial z]^T \in \mathbb{R}^{1\times 3}$, $\partial J^*(z)/\partial z = [\partial J^*(z)/\partial z_x, \partial J^*(z)/\partial z_y, \partial J^*(z)/\partial z_\theta]^T \in \mathbb{R}^{3\times 1}$.

According to $\partial H(z, u_o^*, \nabla J^*(z))/\partial u_o^* = 0_{2\times 1}$, the ideal optimal control law is derived as

$$u_o^* = -\frac{1}{2}R^{-1}B^T\nabla J^*(z). \tag{12}$$

Finally, applying (12) to (11) yields

$$H(z, u_o^*, \nabla J^*(z)) = -\frac{1}{4}\nabla J^{*T}(z)BR^{-1}B^T\nabla J^*(z) + \nabla J^{*T}(z)A + z^TQz. \tag{13}$$

Remark 1: The analytical solution of the optimal control law (12) requires solving the partial derivative function $\nabla J^*(z)$ through the HJB equation (13). Unfortunately, solving (13) directly can be quite challenging, thereby rendering it impractical to directly utilize the optimal control law (12) to achieve optimal performance for the error system (5).

B. ADP-Based Fixed-Time Optimal Tracking Controller Design

To overcome the difficulty of solving the HJB equation, a critic neural network is used to estimate the optimal cost function $J^*(z)$ defined on a compact set $z \in \Theta \subset \mathbb{R}^{3\times 1}$. It can be described as

$$J^*(z) = \phi^T(z)W^* + \sigma(z) \tag{14}$$

where $W^* = [W_1^*, W_2^*, \ldots, W_l^*]^T \in \mathbb{R}^{l\times 1}$ is the ideal weight vector; $\sigma(z) \in \mathbb{R}$ is the estimation error; $\phi(z) = [\phi_1(z), \phi_2(z), \ldots, \phi_l(z)]^T \in \mathbb{R}^{l\times 1}$ is the basis function vector.

Hence, the optimal control law (12) can be reformulated as

$$u_o^* = -\frac{1}{2}R^{-1}B^T\left[\nabla\phi^T(z)W^* + \nabla\sigma(z)\right] \tag{15}$$

where $\nabla\phi^T(z) = [\partial\phi(z)/\partial z]^T \in \mathbb{R}^{3\times l}$, $\partial\phi(z)/\partial z = [\partial\phi_1(z)/\partial z, \partial\phi_2(z)/\partial z, \ldots, \partial\phi_l(z)/\partial z]^T \in \mathbb{R}^{l\times 3}$, $\partial\phi_i(z)/\partial z = [\partial\phi_i(z)/\partial z_x, \partial\phi_i(z)/\partial z_y, \partial\phi_i(z)/\partial z_\theta]^T \in \mathbb{R}^{3\times 1}$, $i = 1, 2, \ldots, l$, and $\nabla\sigma(z) = \partial\sigma(z)/\partial z \in \mathbb{R}^{3\times 1}$.

Since $W^*$ is unknown, the corresponding estimation $\hat{W} = [\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_l]^T \in \mathbb{R}^{l\times 1}$ is introduced. Thus, the estimation of $u_o^*$ is

$$\hat{u}_o = -\frac{1}{2}R^{-1}B^T\nabla\phi^T(z)\hat{W}. \tag{16}$$

Meanwhile, $\hat{W}$ is updated by the adaptive update law

$$\dot{\hat{W}} = \Gamma\left[\frac{1}{2}\nabla\phi(z)BR^{-1}B^Tz - \kappa_1\hat{W} - \kappa_2\left(\hat{W}^T\hat{W}\right)\hat{W}\right] \tag{17}$$

where $\Gamma \in \mathbb{R}^{l\times l}$ is a positive definite matrix, and $\kappa_1$ and $\kappa_2$ are positive constants.

Based on the above design, the ADP-based fixed-time optimal tracking controller is constructed as follows:

$$u_o = \hat{u}_o - B^{\dagger}\left[\lambda\tanh\left(\frac{z}{\rho}\right) + \mu z + \alpha z^{\frac{p}{q}} + \beta z^3\right] \tag{18}$$

where $B^{\dagger}$ is the pseudo-inverse of $B$; $\lambda$, $\mu$, $\alpha$, and $\beta$ are positive definite matrices with appropriate dimensions; $\tanh(z/\rho) = [\tanh(z_x/\rho), \tanh(z_y/\rho), \tanh(z_\theta/\rho)]^T$, $z^3 = [z_x^3, z_y^3, z_\theta^3]^T$, $z^{p/q} = [z_x^{p/q}, z_y^{p/q}, z_\theta^{p/q}]^T$; $p$ and $q$ are positive odd integers satisfying $p < q$.

Up to now, the fixed-time optimal controller has been designed. To gain a deeper understanding of the controller’s structure, its primary framework is presented in Fig. 2.

C. System Stability Analysis

Theorem 1: Consider the tracking error system described by (5a)–(5c) for the nonholonomic mobile robot. The proposed ADP-based fixed-time optimal tracking controller (18) can stabilize the origin of the error system in fixed time independent of the initial condition. Specifically, the tracking error and the


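weight estimation error of the neural network can converge to a neighborhood of zero in fixed time $T_{\max}$.

Fig. 2. Controller structure diagram.

Proof: A Lyapunov function is chosen as

$$V(z, \tilde{W}) = \frac{1}{2}z^Tz + \frac{1}{2}\tilde{W}^T\Gamma^{-1}\tilde{W} \tag{19}$$

where $\tilde{W} = W^* - \hat{W} = [\tilde{W}_1, \tilde{W}_2, \ldots, \tilde{W}_l]^T \in \mathbb{R}^{l\times 1}$ is the weight estimation error; $\Gamma^{-1}$ is the inverse of $\Gamma$.

Calculating the time derivative of (19) yields

$$\begin{aligned} \dot{V} &= z^T\left\{B\left[\frac{1}{2}R^{-1}B^T\nabla\phi^T(z)\left(W^* - \hat{W}\right) + u_o^* + \frac{1}{2}R^{-1}B^T\nabla\sigma(z)\right] + A - \mu z - \alpha z^{\frac{p}{q}} - \beta z^3 - \lambda\tanh\left(\frac{z}{\rho}\right)\right\} - \tilde{W}^T\Gamma^{-1}\dot{\hat{W}} \\ &= \tilde{W}^T\left[\frac{1}{2}\nabla\phi(z)BR^{-1}B^Tz - \Gamma^{-1}\dot{\hat{W}}\right] + z^T\left[Bu_o^* + A + \frac{1}{2}BR^{-1}B^T\nabla\sigma(z) - \mu z - \lambda\tanh\left(\frac{z}{\rho}\right) - \alpha z^{\frac{p}{q}} - \beta z^3\right]. \end{aligned} \tag{20}$$

Then, substituting (17) into (20) yields

$$\dot{V} = \kappa_1\tilde{W}^T\hat{W} + \kappa_2\tilde{W}^T\left(\hat{W}^T\hat{W}\right)\hat{W} + z^T\left[Bu_o^* + A + \frac{1}{2}BR^{-1}B^T\nabla\sigma(z) - \mu z - \lambda\tanh\left(\frac{z}{\rho}\right) - \alpha z^{\frac{p}{q}} - \beta z^3\right]. \tag{21}$$

Based on the inequalities (39) and (42) obtained in appendix, we can get

$$\dot{V} \le -\delta_1\left(\frac{1}{2}\tilde{W}^T\Gamma^{-1}\tilde{W}\right)^{\frac{p+q}{2q}} - \phi_1\left(\frac{1}{2}\tilde{W}^T\Gamma^{-1}\tilde{W}\right)^2 + z^T\left[Bu_o^* + \frac{1}{2}BR^{-1}B^T\nabla\sigma(z) + A - \mu z - \lambda\tanh\left(\frac{z}{\rho}\right) - \alpha z^{\frac{p}{q}} - \beta z^3\right] + \varsigma \tag{22}$$

with

$$\varsigma = \frac{1}{2}\kappa_1W^{*T}W^* + \frac{q-p}{2q}\left(\frac{p+q}{2q}\right)^{\frac{p+q}{q-p}} + \sum_{i=1}^{l}\left(\frac{\kappa_2W_i^{*4}}{12} + \frac{3\kappa_2W_i^{*4}}{4\hbar_i^4}\right). \tag{23}$$

Since $A(z)$ is locally Lipschitz, there exists $\psi_A$ such that $\|A\| < \psi_A\|z\|$ holds, where $\|\cdot\|$ stands for the Euclidean norm. Therefore, if a suitable $\mu$ is chosen to satisfy the condition $\vartheta_{\min}(\mu) \ge \psi_A$, the inequality $z^T(A - \mu z) \le \psi_A\|z\|^2 - \vartheta_{\min}(\mu)\|z\|^2 \le 0$ holds, where $\vartheta_{\min}(\mu)$ represents the minimum eigenvalue of $\mu$. By Lemma 3, it can be obtained that $-z^T\lambda\tanh(z/\rho) \le -|z|^T\lambda\iota + 0.2785\rho\iota^T\lambda\iota$ with $\iota = [1, 1, 1]^T$ and $|z| = [|z_x|, |z_y|, |z_\theta|]^T$. Theoretically, the optimal control law $u_o^*$ exists and can be approximated using critic neural networks. Therefore, it is reasonable to assume that $u_o^*$ and $\nabla\sigma(z)$ are bounded [33]. Further, it can be obtained that $z^T[Bu_o^* + \frac{1}{2}BR^{-1}B^T\nabla\sigma(z)] \le |z|^T\tau$ since $B$ is bounded, where $\tau = [\tau_1, \tau_2, \tau_3]^T$, $\tau_i > 0$, and $i = 1, 2, 3$. Thus, a suitable $\lambda$ can make $|z|^T\tau - |z|^T\lambda\iota \le 0$ hold. Based on the above discussion, if suitable parameters $\mu$ and $\lambda$ are defined, the following inequality holds:

$$z^T\left[Bu_o^* + \frac{1}{2}BR^{-1}B^T\nabla\sigma(z) + A - \mu z - \lambda\tanh\left(\frac{z}{\rho}\right)\right] \le 0.2785\rho\iota^T\lambda\iota. \tag{24}$$

Then, substituting (24) into (22) yields

$$\begin{aligned} \dot{V} &\le -\delta_1\left(\frac{1}{2}\tilde{W}^T\Gamma^{-1}\tilde{W}\right)^{\frac{p+q}{2q}} - \phi_1\left(\frac{1}{2}\tilde{W}^T\Gamma^{-1}\tilde{W}\right)^2 - \delta_2\left(\frac{1}{2}z^Tz\right)^{\frac{p+q}{2q}} - \phi_2\left(\frac{1}{2}z^Tz\right)^2 + \bar{\varsigma} \\ &\le -\delta V^{\frac{p+q}{2q}} - \phi V^2 + \bar{\varsigma} \end{aligned} \tag{25}$$

where $\delta_2 = 2^{(p+q)/2q}\vartheta_{\min}(\alpha)$, $\phi_2 = 4\vartheta_{\min}(\beta)$, $\bar{\varsigma} = \varsigma + 0.2785\rho\iota^T\lambda\iota$, $\delta = \min\{\delta_1, \delta_2\}$, $\phi = 2\min\{\phi_1, \phi_2\}$, and $\vartheta_{\min}(\alpha)$ and $\vartheta_{\min}(\beta)$ are the minimum eigenvalues of $\alpha$ and $\beta$, respectively.

According to the Lemma 2.2 in [37], the Lyapunov function $V$ can converge to the region described as

$$V \le \min\left\{\left(\frac{\bar{\varsigma}}{(1-\varepsilon)\delta}\right)^{\frac{2q}{p+q}}, \left(\frac{\bar{\varsigma}}{(1-\varepsilon)\phi}\right)^{\frac{1}{2}}\right\} \tag{26}$$

and the convergence time $T(z(0), \tilde{W}(0))$ satisfies

$$T(z(0), \tilde{W}(0)) \le T_{\max} := \frac{2q}{\delta\varepsilon(q-p)} + \frac{1}{\phi\varepsilon} \tag{27}$$

where $\varepsilon \in (0, 1)$, $z(0)$ and $\tilde{W}(0)$ are the initial errors.

According to (19) and (27), as $t \to T_{\max}$, the robot’s tracking error and neural network weight estimation error satisfy

$$\|z\| \le \min\left\{\sqrt{2}\left(\frac{\bar{\varsigma}}{(1-\varepsilon)\delta}\right)^{\frac{q}{p+q}}, \left(\frac{4\bar{\varsigma}}{(1-\varepsilon)\phi}\right)^{\frac{1}{4}}\right\} \tag{28}$$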

$$\|\tilde{W}\| \le \frac{\min\left\{\sqrt{2}\left(\frac{\bar{\varsigma}}{(1-\varepsilon)\delta}\right)^{\frac{q}{p+q}}, \left(\frac{4\bar{\varsigma}}{(1-\varepsilon)\phi}\right)^{\frac{1}{4}}\right\}}{\sqrt{\vartheta_{\min}(\Gamma^{-1})}} \tag{29}$$

where $\vartheta_{\min}(\Gamma^{-1})$ represents the minimum eigenvalue of $\Gamma^{-1}$.

Then, the error between the ideal optimal controller (15) and the ADP-based fixed-time optimal controller (18) needs to be analyzed. According to (16) and (18), one has

$$u_o - u_o^* = \frac{1}{2}R^{-1}B^T\left[\nabla\phi^T(z)\tilde{W} + \nabla\sigma(z)\right] - B^{\dagger}\left[\lambda\tanh\left(\frac{z}{\rho}\right) + \mu z + \alpha z^{\frac{p}{q}} + \beta z^3\right]. \tag{30}$$

Notice that $B$, $B^{\dagger}$, $\nabla\phi^T(z)$, and $\nabla\sigma(z)$ are function vectors with respect to $z$. Moreover, according to (28) and (29), $z$ and $\tilde{W}$ are bounded. Hence, it follows that $\|u_o - u_o^*\|$ is bounded, which implies that the fixed-time optimal controller $u_o$ designed using critic neural networks can fully approximate the ideal optimal controller $u_o^*$ by adjusting the relevant customized parameters. The proof of Theorem 1 is complete. ∎

Remark 2: In some existing studies on ADP-based optimal control [32], [34], the neural network weight update law is designed based on the gradient descent algorithm, which requires the persistent/finite excitation condition. Although [38] removed the above excitation condition by introducing a positive definite function, it still requires neural networks with the same actor-critic structure as [34]. This complex structure is not favorable for engineering applications. In contrast, only a critic neural network is used in this study, which not only removes the excitation condition, but also simplifies the controller structure.

Remark 3: Although the optimal control performance can be obtained by some existing ADP-based optimal control methods [28], [29], [30], the convergence speed of the system has not been considered. Specifically, they can only guarantee uniformly ultimately bounded stability, so the system takes an infinitely long time to enter the final stable state. In this study, we concurrently consider both the convergence speed and the optimal control performance of the system, ensuring that both aspects are optimized for improved overall performance.

Remark 4: Utilizing the Lyapunov stability theory, it has been shown that both the tracking error and the weight estimation error of the system can converge to the compact sets described by (28) and (29), respectively. According to (27), the convergence time of the system is bounded, and the bound is independent of the system’s initial state. This is in contrast to the uniformly ultimately bounded stability and the finite-time stability [30], [31], [32]. From (27)–(29), one can choose the relevant parameters to achieve larger values of $\delta$ and $\phi$ and smaller values of $\bar{\varsigma}$, thereby reducing the tracking and estimation errors. On the other hand, convergence can be accelerated by increasing the values of $\delta$ and $\phi$. For instance, $\delta$ and $\phi$ can be increased by raising $\vartheta_{\min}(\alpha)$, $\vartheta_{\min}(\beta)$, $\kappa_1$, and $\kappa_2$, and vice versa. However, excessively large values of $\delta$ and $\phi$ may result in a large control input, potentially causing system chatter or even instability. Therefore, all parameters must be selected carefully to optimize the control performance.

IV. SIMULATIONS AND EXPERIMENTS

In order to verify the feasibility of the proposed optimal controller, simulations and physical experiments are conducted. We compare the proposed controller with two other controllers. One of them is the classical backstepping controller (marked as BC) which is designed as follows [35]:

$$\begin{bmatrix} v \\ \omega \end{bmatrix} = \begin{bmatrix} \xi_1z_x + v_r\cos z_\theta \\ \omega_r + \xi_2v_rz_y + \xi_3v_r\sin z_\theta \end{bmatrix} \tag{31}$$

where $\xi_1$, $\xi_2$, and $\xi_3$ are positive constants; $z_\theta$, $z_x$, and $z_y$ are given in (3). The other one is the ADP-based optimal controller (marked as ADP-UUB) with uniformly ultimately bounded stability, which is obtained by modifying the proposed controller. To be specific, replace (17) and (18) with

$$\dot{\hat{W}} = \Gamma\left[\frac{1}{2}\nabla\phi(z)BR^{-1}B^Tz - \kappa_1\hat{W}\right] \tag{32}$$

and

$$u_o = \hat{u}_o - B^{\dagger}\left[\lambda\tanh\left(\frac{z}{\rho}\right) + \mu z + \alpha z\right] \tag{33}$$

where the relevant parameter definitions are the same as (17) and (18). In addition, all the remaining designs are consistent with the proposed controller. To ensure the fairness of the comparison, the control parameters of ADP-UUB and the proposed controller are chosen consistently in the follow-up simulations and experiments.

Remark 5: Similar to the proof of Theorem 1, the ADP-UUB allows the Lyapunov function (19) to satisfy the inequality $\dot{V} \le -\delta_1V + \delta_2$ with $\delta_1 > 0$ and $\delta_2 > 0$, which implies that the ADP-UUB could make the error system (4) uniformly ultimately bounded. Moreover, since the ADP-UUB is designed in the framework of optimal control, the system can also obtain optimal control performance. It is worth mentioning that ADP-UUB allows the robot’s tracking error to converge exponentially [30]. Theoretically, as time tends to infinity, the tracking error converges to a neighborhood of the origin. In contrast, the control method proposed in this study can achieve tracking error convergence in fixed time.

A. Simulation Results

The control task is to drive the mobile robot to move following the reference trajectory defined by $[x_r(0), y_r(0), \theta_r(0)]^T = [-2, -1.5, 0]^T$, $v_r = 0.3$ m/s, and $\omega_r$ is given as

$$\omega_r = \begin{cases} 0 \text{ rad/s}, & 0 \text{ s} \le t \le 15 \text{ s} \\ 0.2 \text{ rad/s}, & 15 \text{ s} < t \le 5\pi + 15 \text{ s} \\ 0 \text{ rad/s}, & 5\pi + 15 \text{ s} < t \le 5\pi + 30 \text{ s} \\ 0.2 \text{ rad/s}, & 5\pi + 30 \text{ s} < t \le 60 \text{ s} \end{cases} \tag{34}$$

The initial position of the mobile robot is specified as $[x(0), y(0), \theta(0)]^T = [-2.1, -1.6, \pi/20]^T$. The basis function vector is designed as $\phi(z) = [z_x^2, z_y^2, z_\theta^2, z_xz_y, z_yz_\theta, z_xz_\theta]^T$. Hence, the corresponding partial derivative term is $\nabla\phi^T(z) = [2z_x, 0, 0, z_y, 0, z_\theta;\ 0, 2z_y, 0, z_x, z_\theta, 0;\ 0, 0, 2z_\theta, 0, z_y, z_x]$. The initial value of the weight estimation vector is $\hat{W}(0) = [0, 0, 0, 0, 0, 0]^T$. The parameters of the proposed optimal controller are chosen as follows: $p = 17$, $q = 19$, $R = 2I_3$, $Q = 10I_3$, $\lambda = \mu = \alpha = \mathrm{diag}(1, 1, 0.4)$, $\beta = \mathrm{diag}(1\mathrm{e}5, 1\mathrm{e}5, 1)$, $\rho = 10$, $\Gamma = 2I_6$, $\kappa_1 = \kappa_2 = 0.1$, where $I_N \in \mathbb{R}^{N\times N}$ is an


identity matrix. The parameters of BC are selected as $\xi_1 = 3$ and $\xi_2 = \xi_3 = 8$.

The simulation outcomes pertaining to the dynamic performance of the mobile robot are presented in Fig. 3. The motion trajectory of the center of mass in the $x$-$y$ plane is shown in Fig. 3(a). Meanwhile, the position tracking error and heading angle tracking error are depicted in Fig. 3(b). The results show that the proposed ADP-based fixed-time optimal tracking controller exhibits faster convergence and higher steady-state accuracy compared to the BC and ADP-UUB controllers. The learning process of neural network weight estimation is shown in Fig. 3(c), and it shows that all the weight estimations converge rapidly to different constants. The control input responses for the three controllers are shown in Fig. 3(d). To facilitate a more accurate comparison of the control performance between the three controllers, we define two specific performance indexes. Specifically, one is the control cost $J_c = \int_{t_0}^{t_f}u^T(\tau)Ru(\tau)d\tau$ and the other is the error cost $J_e = \int_{t_0}^{t_f}z^T(\tau)Qz(\tau)d\tau$, where $t_0$ and $t_f$ represent the initial and terminal moments, respectively. The performance indexes of the three controllers at different time periods are shown in Table I. The results show that compared to the other two controllers, the proposed controller requires more control cost in the first 30 s to achieve fast convergence of the tracking error. However, the error cost of the proposed controller is minimized for both time periods.

Fig. 3. Simulation results: (a) tracking performance; (b) tracking error; (c) neural network weight estimation; (d) control input.

TABLE I
COMPARISON OF COST INDEXES IN SIMULATIONS

B. Experiment Results

To test the performance of the proposed method in a practical application, an experimental platform as shown in Fig. 4 is built. The OptiTrack system consists of multiple cameras, which can measure the position and attitude information of a robot in real time and transmit it to the host computer. The robot is QBot 3 manufactured by Quanser and is connected to the host computer via a wireless network. The controller is constructed in the host computer and downloaded to the QBot 3 via the wireless network. The QBot 3 can receive real-time state information from the host computer, thus realizing closed-loop control.

The desired trajectory of the robot is defined as $[x_r(0), y_r(0), \theta_r(0)]^T = [1.5, 0, \pi/2]^T$, $v_r = 0.3$ m/s, $\omega_r = 0.4$ rad/s. The initial position of the robot is specified as $[x(0), y(0), \theta(0)]^T = [0.5, 0, \pi/2]^T$. The neural network basis functions are chosen in the same way as in the simulation. The parameters of the proposed optimal controller are chosen as follows: $\hat{W}(0) = [0, 0, 0, 0, 0, 0]^T$, $p = 17$, $q = 19$, $R = 2I_3$, $Q = 10I_3$, $\lambda = \mu = \alpha = \mathrm{diag}(0.1, 0.1, 0.2)$, $\beta = \mathrm{diag}(200, 200, 0.3)$, $\rho = 1$, $\Gamma = 2I_6$, and $\kappa_1 = \kappa_2 = 0.1$. The parameters of BC are selected as $\xi_1 = \xi_2 = \xi_3 = 1.5$.

Fig. 4. Mobile robot experiment platform.

TABLE II
COMPARISON OF COST INDEXES IN EXPERIMENTS

The experimental outcomes are depicted in Table II and Fig. 5. It is obvious that all closed-loop system signals remain bounded. According to the robot motion trajectory depicted in Fig. 5(a), the proposed controller and ADP-UUB can enable the robot to follow the desired trajectory rapidly, while the conventional BC cannot. Moreover, owing to the advantage of fixed-time stability, the proposed controller exhibits a faster convergence speed than that of the ADP-UUB. The tracking error is depicted by Fig. 5(b). Fig. 5(c) shows that the weight estimates of the neural network converge to a stable state after a period of learning process. The control signals are depicted by Fig. 5(d). Additionally, a comparison of the cost indexes of the three controllers is shown in Table II. According to the table, the error indexes of the


Fig. 5. Experimental results: (a) tracking performance; (b) tracking error; (c) neural network weight estimation; (d) control input.

TABLE III
COST INDEXES FOR DIFFERENT CONTROL GAINS IN EXPERIMENTS
proposed controller and ADP-UUB are better than those of the traditional BC for both time periods, which verifies the advantages of optimal control. On the other hand, compared with the ADP-UUB, the proposed controller requires a larger control cost to achieve fast tracking of the reference trajectory, which is consistent with practical expectations. Finally, we have also conducted the experiment with the proposed controller under different control gains λ, μ, and α (the remaining parameters are consistent with previous experiments). The corresponding cost indexes are shown in Table III. It is clear that the control gains have a significant effect on the system's control performance, and suitable values need to be chosen to obtain the desired control performance.

V. CONCLUSION

By using the critic-only neural network ADP technique, an ADP-based fixed-time optimal tracking controller has been developed for wheeled mobile robots. The cost function is reconstructed using a critic neural network. Meanwhile, the adaptive control technique is utilized to design the weight update law without persistent or finite excitation conditions. Rigorous theoretical derivations demonstrate that the weight estimation error and the robot tracking error can converge to a neighborhood of zero in fixed time. Both the simulations and the experiments reveal that the proposed ADP-based fixed-time optimal tracking controller exhibits a superior convergence rate compared to the BC and ADP-UUB controllers. Based on the results of this study, we will study the optimal formation control for wheeled mobile robots in the future [39].

APPENDIX

This section presents two proofs for the inequalities used in the derivation of (22).

By the definition of $\hat{W}$, one has

$$
\tilde{W}_i \hat{W}_i^3 = \tilde{W}_i \left( W_i^3 + 3 W_i \tilde{W}_i^2 - 3 W_i^2 \tilde{W}_i - \tilde{W}_i^3 \right). \tag{35}
$$

From Young's inequality (Lemma 7 in [17]), one gets

$$
\tilde{W}_i W_i^3 \le \frac{W_i^4}{12} + 3 W_i^2 \tilde{W}_i^2 \tag{36}
$$

$$
3 \tilde{W}_i^3 W_i \le \frac{9 \epsilon_i^{4/3} \tilde{W}_i^4}{4} + \frac{3 W_i^4}{4 \epsilon_i^4} \tag{37}
$$

where $\epsilon_i > 0$.

According to (35)–(37), the following inequality holds:

$$
\tilde{W}_i \hat{W}_i^3 \le -\left(4 - 9 \epsilon_i^{4/3}\right) \left(\frac{1}{2} \tilde{W}_i^2\right)^2 + \frac{W_i^4}{12} + \frac{3 W_i^4}{4 \epsilon_i^4}. \tag{38}
$$

Hence, by using Lemma 2, it can be obtained that

$$
\begin{aligned}
\kappa_2 \tilde{W}^T \hat{W} \hat{W}^T \hat{W}
&\le -\sum_{i=1}^{l} \left(4\kappa_2 - 9\kappa_2 \epsilon_i^{4/3}\right) \left(\frac{1}{2} \tilde{W}_i^2\right)^2 + \sum_{i=1}^{l} \left(\frac{\kappa_2 W_i^4}{12} + \frac{3\kappa_2 W_i^4}{4 \epsilon_i^4}\right) \\
&\le -\varphi_1 \left(\frac{1}{2} \tilde{W}^T \Gamma^{-1} \tilde{W}\right)^2 + \sum_{i=1}^{l} \left(\frac{\kappa_2 W_i^4}{12} + \frac{3\kappa_2 W_i^4}{4 \epsilon_i^4}\right)
\end{aligned}
\tag{39}
$$

where $\varphi_1 = \min\{(4\kappa_2 - 9\kappa_2 \epsilon_1^{4/3})/[l(\vartheta_{\min}(\Gamma^{-1}))^2], \ldots, (4\kappa_2 - 9\kappa_2 \epsilon_l^{4/3})/[l(\vartheta_{\min}(\Gamma^{-1}))^2]\}$.

According to Young's inequality (Lemma 7 in [17]), one gets

$$
\kappa_1 \tilde{W}^T \hat{W} \le \frac{1}{2} \kappa_1 W^T W - \frac{1}{2} \kappa_1 \tilde{W}^T \tilde{W}. \tag{40}
$$

By Lemma 1, setting one argument to $1$ and the other to $(1/2)\kappa_1 \tilde{W}^T \tilde{W}$, with $\mu = (p+q)/(2q)$ and $o = 1/\mu$, one has

$$
\left(\frac{1}{2} \kappa_1 \tilde{W}^T \tilde{W}\right)^{\frac{p+q}{2q}} \le \frac{1}{2} \kappa_1 \tilde{W}^T \tilde{W} + \frac{q-p}{2q} \left(\frac{p+q}{2q}\right)^{\frac{p+q}{q-p}}. \tag{41}
$$
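The scalar inequalities (36)–(38) and (41) can be spot-checked numerically. The sketch below is not part of the original proof; it only samples random values and confirms that each left-hand side stays below its right-hand side. The name `eps` stands for the positive constant in (37), the sampling ranges are arbitrary, and the values p = 3, q = 5 in the usage lines are hypothetical choices made for illustration. The check of (38) uses the relation $\hat{W}_i = W_i - \tilde{W}_i$ implied by the expansion in (35).

```python
import random


def young_36(W, Wt):
    """Return (lhs, rhs) of (36): Wt*W^3 <= W^4/12 + 3*W^2*Wt^2."""
    return Wt * W**3, W**4 / 12 + 3 * W**2 * Wt**2


def young_37(W, Wt, eps):
    """Return (lhs, rhs) of (37) for a positive constant eps."""
    rhs = 9 * eps ** (4 / 3) * Wt**4 / 4 + 3 * W**4 / (4 * eps**4)
    return 3 * Wt**3 * W, rhs


def bound_38(W, Wt, eps):
    """Return (lhs, rhs) of (38), using W_hat = W - Wt as implied by (35)."""
    W_hat = W - Wt
    lhs = Wt * W_hat**3
    rhs = (-(4 - 9 * eps ** (4 / 3)) * (0.5 * Wt**2) ** 2
           + W**4 / 12 + 3 * W**4 / (4 * eps**4))
    return lhs, rhs


def power_bound_41(x, mu):
    """Return (lhs, rhs) of (41): x^mu <= x + (1-mu)*mu^(mu/(1-mu)), x >= 0, 0 < mu < 1."""
    return x**mu, x + (1 - mu) * mu ** (mu / (1 - mu))


def check_inequalities(trials=10000, seed=0, tol=1e-9):
    """Sample random inputs and return True if no inequality is violated."""
    rng = random.Random(seed)
    for _ in range(trials):
        W = rng.uniform(-5.0, 5.0)     # ideal weight component (arbitrary range)
        Wt = rng.uniform(-5.0, 5.0)    # weight estimation error component
        eps = rng.uniform(0.1, 3.0)    # positive constant in (37)
        x = rng.uniform(0.0, 10.0)     # plays the role of (1/2)*kappa_1*Wt^T*Wt
        mu = rng.uniform(0.05, 0.95)   # plays the role of (p+q)/(2q)
        pairs = (young_36(W, Wt), young_37(W, Wt, eps),
                 bound_38(W, Wt, eps), power_bound_41(x, mu))
        for lhs, rhs in pairs:
            # Relative tolerance guards against round-off near equality.
            if lhs > rhs + tol * (1 + abs(rhs)):
                return False
    return True


if __name__ == "__main__":
    p, q = 3, 5  # hypothetical exponents with p < q
    print(check_inequalities())
    print(power_bound_41(2.0, (p + q) / (2 * q)))
```

A passing run does not prove the inequalities; it only makes a transcription error in the constants (for example the factor 1/12 in (36) or the exponent 4/3 in (37)) easy to catch.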


Hence, one can obtain the inequality

$$
\kappa_1 \tilde{W}^T \hat{W} \le \frac{1}{2} \kappa_1 W^T W - \delta_1 \left(\frac{1}{2} \tilde{W}^T \Gamma^{-1} \tilde{W}\right)^{\frac{p+q}{2q}} + \frac{q-p}{2q} \left(\frac{p+q}{2q}\right)^{\frac{p+q}{q-p}} \tag{42}
$$

where $\delta_1 = (\kappa_1/\vartheta_{\min}(\Gamma^{-1}))^{(p+q)/(2q)}$, and $\vartheta_{\min}(\Gamma^{-1})$ represents the maximum eigenvalue of $\Gamma^{-1}$.

REFERENCES

[1] Z. Li, W. Gao, C. Goh, M. Yuan, E. K. Teoh, and Q. Ren, “Asymptotic stabilization of nonholonomic robots leveraging singularity,” IEEE Robot. Autom. Lett., vol. 4, no. 1, pp. 41–48, Jan. 2019.
[2] S. J. Yoo, “Adaptive tracking control for a class of wheeled mobile robots with unknown skidding and slipping,” IET Control Theory Appl., vol. 4, no. 10, pp. 2109–2119, Oct. 2010.
[3] H. Gao, X. Song, L. Ding, K. Xia, N. Li, and Z. Deng, “Adaptive motion control of wheeled mobile robot with unknown slippage,” Int. J. Control, vol. 87, no. 8, pp. 1513–1522, Feb. 2014.
[4] C. Chen, H. Gao, L. Ding, W. Li, H. Yu, and Z. Deng, “Trajectory tracking control of WMRs with lateral and longitudinal slippage based on active disturbance rejection control,” Robot. Auton. Syst., vol. 107, pp. 236–245, Sep. 2018.
[5] M. Mera, H. Ríos, and E. A. Martínez, “A sliding-mode based controller for trajectory tracking of perturbed unicycle mobile robots,” Control Eng. Pract., vol. 102, Sep. 2020, Art. no. 104548.
[6] P. Rochel, H. Ríos, M. Mera, and A. Dzul, “Trajectory tracking for uncertain unicycle mobile robots: A super-twisting approach,” Control Eng. Pract., vol. 122, May 2022, Art. no. 105078.
[7] H. Yang, X. Fan, Y. Xia, and C. Hua, “Robust tracking control for wheeled mobile robot based on extended state observer,” Adv. Robot., vol. 30, no. 1, pp. 68–78, Jan. 2016.
[8] H. Wang, P. X. Liu, X. Zhao, and X. Liu, “Adaptive fuzzy finite-time control of nonlinear systems with actuator faults,” IEEE Trans. Cybern., vol. 50, no. 5, pp. 1786–1797, May 2020.
[9] H. Wang, W. Liu, and M. Tong, “Adaptive fuzzy fast finite-time output-feedback tracking control for switched nonlinear systems with full-state constraints,” IEEE Trans. Fuzzy Syst., vol. 32, no. 3, pp. 958–968, Mar. 2024.
[10] J. Wang, C. Wang, C. L. P. Chen, Z. Liu, and C. Zhang, “Fast finite-time event-triggered consensus control for uncertain nonlinear multiagent systems with full-state constraints,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 70, no. 3, pp. 1361–1370, Mar. 2023.
[11] S. Li and Y.-P. Tian, “Finite-time stability of cascaded time-varying systems,” Int. J. Control, vol. 80, no. 4, pp. 646–657, Apr. 2007.
[12] D. Wu, Y. Cheng, H. Du, W. Zhu, and M. Zhu, “Finite-time output feedback tracking control for a nonholonomic wheeled mobile robot,” Aerosp. Sci. Technol., vol. 78, pp. 574–579, Jul. 2018.
[13] H. Wang, K. Xu, and H. Zhang, “Adaptive finite-time tracking control of nonlinear systems with dynamics uncertainties,” IEEE Trans. Autom. Control, vol. 68, no. 9, pp. 5737–5744, Sep. 2023.
[14] Y.-X. Li, Z. Hou, W.-W. Che, and Z.-G. Wu, “Event-based design of finite-time adaptive control of uncertain nonlinear systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 8, pp. 3804–3813, Aug. 2022.
[15] C. P. Vo and K. K. Ahn, “An adaptive finite-time force-sensorless tracking control scheme for pneumatic muscle actuators by an optimal force estimation,” IEEE Robot. Autom. Lett., vol. 7, no. 2, pp. 1542–1549, Apr. 2022.
[16] A. Polyakov, “Nonlinear feedback design for fixed-time stabilization of linear control systems,” IEEE Trans. Autom. Control, vol. 57, no. 8, pp. 2106–2110, Aug. 2012.
[17] H. Wang, J. Ma, X. Zhao, B. Niu, M. Chen, and W. Wang, “Adaptive fuzzy fixed-time control for high-order nonlinear systems with sensor and actuator faults,” IEEE Trans. Fuzzy Syst., vol. 31, no. 8, pp. 2658–2668, Aug. 2023.
[18] H. Wang and L. Shen, “Adaptive fuzzy fixed-time tracking control for nonlinear systems with time-varying full-state constraints and actuator hysteresis,” IEEE Trans. Fuzzy Syst., vol. 31, no. 4, pp. 1352–1361, Apr. 2023.
[19] C. Wang, Q. Guo, J. Wang, Z. Liu, and C. L. P. Chen, “Fixed-time fuzzy control for uncertain nonlinear systems with prescribed performance and event-triggered communication,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 71, no. 5, pp. 2362–2371, May 7, 2024.
[20] M. Chen, H. Wang, and X. Liu, “Adaptive practical fixed-time tracking control with prescribed boundary constraints,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 4, pp. 1716–1726, Apr. 2021.
[21] Y. Zhang and F. Wang, “Observer-based fixed-time neural control for a class of nonlinear systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 7, pp. 2892–2902, Jul. 2022.
[22] J. Zhang, F. Xu, X. Liu, S. Gu, and H. Geng, “Fixed-time dynamic surface control for pneumatic manipulator system with unknown disturbances,” IEEE Robot. Autom. Lett., vol. 7, no. 4, pp. 10890–10897, Oct. 2022.
[23] C. Wang, H. Zhan, Q. Guo, and T. Li, “Distributed neural fixed-time consensus control of uncertain multiple Euler-Lagrange systems with event-triggered mechanism,” IEEE/ASME Trans. Mechatron., early access, Jun. 25, 2024, doi: 10.1109/TMECH.2024.3410299.
[24] Z. Lv, Y. Wu, X.-M. Sun, and Q.-G. Wang, “Fixed-time control for a quadrotor with a cable-suspended load,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 11, pp. 21932–21943, Nov. 2022.
[25] H. Dong, X. Zhao, Q. Hu, H. Yang, and P. Qi, “Learning-based attitude tracking control with high-performance parameter estimation,” IEEE Trans. Aerosp. Electron. Syst., vol. 58, no. 3, pp. 2218–2230, Jun. 2022.
[26] R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
[27] D. Liu, D. Wang, F.-Y. Wang, H. Li, and X. Yang, “Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems,” IEEE Trans. Cybern., vol. 44, no. 12, pp. 2834–2847, Dec. 2014.
[28] D. Wang and C. Mu, “Adaptive-critic-based robust trajectory tracking of uncertain dynamics and its application to a spring-mass-damper system,” IEEE Trans. Ind. Electron., vol. 65, no. 1, pp. 654–663, Jan. 2018.
[29] C. Mu, Y. Zhang, Z. Gao, and C. Sun, “ADP-based robust tracking control for a class of nonlinear systems with unmatched uncertainties,” IEEE Trans. Syst. Man Cybern. Syst., vol. 50, no. 11, pp. 4056–4067, Nov. 2020.
[30] G. Wen, S. S. Ge, and F. Tu, “Optimized backstepping for tracking control of strict-feedback systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 8, pp. 3850–3862, Aug. 2018.
[31] Y. Li, K. Li, and S. Tong, “Reinforcement learning-based adaptive finite-time performance constraint control for nonlinear systems,” IEEE Trans. Syst. Man Cybern. Syst., vol. 54, no. 2, pp. 1335–1344, Feb. 2024.
[32] H. Dong, X. Zhao, and B. Luo, “Optimal tracking control for uncertain nonlinear systems with prescribed performance via critic-only ADP,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 1, pp. 561–573, Jan. 2022.
[33] B. Xiao, H. Zhang, Z. Chen, and L. Cao, “Fixed-time fault-tolerant optimal attitude control of spacecraft with performance constraint via reinforcement learning,” IEEE Trans. Aerosp. Electron. Syst., vol. 59, no. 6, pp. 7715–7724, Dec. 2023.
[34] R. Kamalapurkar, H. S. D. Bhasin, and W. E. Dixon, “Approximate optimal trajectory tracking for continuous-time nonlinear systems,” Automatica, vol. 51, pp. 40–48, Jan. 2015.
[35] C.-Y. Chen, T.-H. S. Li, Y.-C. Yeh, and C.-C. Chang, “Design and implementation of an adaptive sliding-mode dynamic controller for wheeled mobile robots,” Mechatron., vol. 19, no. 2, pp. 156–166, Mar. 2009.
[36] J. Zhang, S. Li, and Z. Xiang, “Adaptive fuzzy output feedback event-triggered control for a class of switched nonlinear systems with sensor failures,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 12, pp. 5336–5346, Dec. 2020.
[37] D. Ba, Y. Li, and S. Tong, “Fixed-time adaptive neural tracking control for a class of uncertain nonstrict nonlinear systems,” Neurocomputing, vol. 363, pp. 273–280, Oct. 2019.
[38] S. Yang, F. Yu, H. Liu, H. Ma, and H. Zhang, “Adaptive-dynamic-programming-based robust control for a quadrotor UAV with external disturbances and parameter uncertainties,” Appl. Sci., vol. 13, no. 23, Nov. 2023, Art. no. 12672.
[39] W. Liu, X. Wang, and S. Li, “Formation control for leader-follower wheeled mobile robots based on embedded control technique,” IEEE Trans. Control Syst. Technol., vol. 31, no. 1, pp. 265–280, Jan. 2023.
