#1formation Controll-Multiple Robots

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

ISA Transactions 138 (2023) 63–73

Contents lists available at ScienceDirect

ISA Transactions
journal homepage: www.elsevier.com/locate/isatrans

Research article

Adaptive formation control of leader–follower mobile robots using


reinforcement learning and the Fourier series expansion

Gholamreza Khodamipour, Saeed Khorashadizadeh , Mohsen Farshad
Faculty of Electrical and Computer Engineering, University of Birjand, 97175/615, Birjand, Iran

article info a b s t r a c t

Article history: In this paper, a formation controller for leader–follower mobile robots is presented based on reinforce-
Received 15 September 2022 ment learning and the Fourier series expansion. The controller is designed based on the dynamical
Received in revised form 9 March 2023 model in which permanent magnet direct-current (DC) motors are included as actuator. Thus, motor
Accepted 9 March 2023
voltages are the control signals and are designed based on the actor–critic strategy which is a well-
Available online 23 March 2023
known approach in the field of reinforcement learning. Stability analysis of formation control of
Keywords: leader–follower mobile robots using the proposed controller verifies that the closed-loop system is
Formation control globally asymptotically stable. Due to the existence of sinusoidal terms in the model of mobile robots,
Reinforcement learning the Fourier series expansion has been selected to construct the actor and critic, while previous related
Actor–critic strategy works utilized neural networks in actor and critic. In comparison with neural networks, the Fourier
The Fourier series expansion
series expansion is simpler and involves the designer in fewer tuning parameters. In simulation studies,
Leader–follower mobile robots
it has been assumed that some follower robots can play the role of leader for the other follower robots
behind it. Simulation results show that there is no need to use large number of the sinusoidal terms
in the Fourier series expansion and just the first three terms can overcome uncertainties. In addition,
the proposed controller reduced the performance index of tracking errors considerably in comparison
with radial basis function neural networks (RBFNN).
© 2023 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction Other applications of formation control can be easily observed


in spacecrafts. Several small spacecrafts can accurately play the
Using mobile robots instead of the human is one of the most role of sensor arrays or place sensors in fixed geometric shape.
important achievements in these last decades in unstructured Also, box pushing using mobile robots is a well-known problem
environments which require autonomy. Due to the fast decision in cooperative robotics.
making and high accuracy of the robotic systems, mobile robots Various controllers for formation control have been presented
are widely used in many different applications such as patrolling in recent years, such as the behavior-based method [3], the
surveillance, emergency rescue operations planetary exploration, leader-following strategy [3,4], and the consensus-based
industrial automation, personal services, entertainment, recon- approach [5]. In this paper, the formation control using leader–
naissance, medical care, and so on [1]. In some applications, follower strategy is studied in which the separation-bearing
several mobile robots must cooperate in a fixed form to increase parameters will be tuned in which the followers stay at a spec-
safety and efficiency. Thus, formation control is of great impor- ified separation and bearing from its designated leader. In [6],
tance. Formation control is not confined to mobile robots. For a velocity control signal is designed by back-stepping kinematic
example, in order to take the advantages of aerodynamics facts, equations of separation and bearing into dynamics. This paper is
aircraft flying in a V-shaped formation is usual, since it will based on the strategy presented in [6].
minimize fuel loss. This formation is inspired by flocks of birds Most of presented controllers for mobile robots and leader–
flying in the sky that use this shape to increase their efficiency follower formation control are based on the torque control strat-
[2]. Moreover, in automated formation control, several aircrafts egy in which the control law calculates the applied torques on
can be flown by one pilot simultaneously that results in reducing the mobile robot [7–9]. However, in practical applications, this
fatigue in long-distance flight. Moreover, only one piloted aircraft torque cannot be produced directly by the controller located on
can control a large group of semi-autonomous pilotless aircraft. the processors. In fact, this torque is generated by actuators such
as permanent magnet DC motors. Thus, this question arises that
∗ Corresponding author. how the motor must be excited to generate that torque calculated
E-mail address: [email protected] (S. Khorashadizadeh). by the torque-based controller. To solve this problem, voltage

https://doi.org/10.1016/j.isatra.2023.03.009
0019-0578/© 2023 ISA. Published by Elsevier Ltd. All rights reserved.
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

control strategy has been introduced in which the control input


is the applied voltages on the motors’ terminals [10]. This paper
utilizes this strategy and consequently a voltage-based controller
will be proposed.
Generally, the adaptive controllers presented for nonlinear
systems can be classified into 2 main categories: Model-based
and model-free controllers. Model-based controllers are faced by
some fundamental challenges such as uncertainties and sensing
requirements [11]. Therefore, model-free controllers are more
preferable. Model-free approaches can be classified into 2 main
groups: intelligent controllers and function approximation-based
controllers. Among intelligent control approaches, neuro-fuzzy
controllers have been extensively studied [12,13]. The universal
approximation property is the main common feature of these
researches. Neural networks such as radial-basis function neural
networks (RBFNN) are also intelligent tools that have been used
in formation control [14,15].
Reinforcement learning (RL) controllers are also considered as
Fig. 1. A non-holonomic mobile robot.
intelligent controllers in which the agents (controllers) learn their
strategy and calculate the control action by trial-and-error explo-
ration mixed with reward signals originated by the environment
(the system under control) [16]. Recently, deep reinforcement The rest of this paper is organized as follow. In Section 2, the
learning has been presented and tested on formation control modeling of a mobile robot with its DC motors are explained. In
of mobile robots in which a convolutional neural network is Section 3, the Fourier series expansion is introduced. In Section 4,
utilized for image processing [17]. Actor–critic is one of impor- the proposed RL-based controller is designed and the adaptive
tant strategies in RL. Usually, neural networks or adaptive fuzzy laws in actor and critic parts are derived. Stability analysis is dis-
systems are used to approximate the actors and critics [18–20]. cussed in Section 5. In Section 6, simulations and comparisons are
However, neural networks and fuzzy systems are involved in presented. In order to highlight the superiority of the designed
a large number of tuning parameters [21]. Therefore, function controller, a formation scenario for control of mobile robots is
approximation-based approaches using the Taylor series [22], simulated and the performance of the Fourier series expansion
Bernstein polynomials [23], Szász–Mirakyan operator [24], the and neural networks are compared. Finally, some concluding
Fourier series expansion and Legendre polynomials [25] have remarks are presented in Section 7.
been presented in which the universal approximation property is
satisfied and the number of tuning parameters is fewer in com-
2. Modeling
parison with neural networks and fuzzy systems. This paper uses
these techniques and proposes a novel actor–critic RL controller.
The proposed controller in this paper can be considered Let us consider the dynamics of a mobile robot as [26]
as a combination of intelligent controllers and function
M(q)q̈ + Vm (q, q̇)q̇ + F(q, q̇) + G(q) +τ d (t) = Bτ (t) − A(q)T λ(t) (1)
approximation-based techniques. It is intelligent due to utilizing
reinforcement learning (actor–critic strategy). Also, it is a function where M ∈ ℜn×n is a positive definite symmetric inertia matrix,
approximation-based controller due to using the Fourier series Vm (q, q̇)q̇ ∈ ℜn×n denotes the Coriolis and centrifugal torques,
expansion in the actor and critic. As mentioned above, previous F ∈ ℜn×1 is the friction torques and G ∈ ℜn×1 is the vector of
related works [18–20] utilize neural networks and fuzzy systems gravitational torques. Also, τ d is the bounded unknown distur-
in the actor and critic part and consequently involve the designer bances and un-modeled dynamics. The matrix B ∈ ℜn×r is the
in a large number of tuning parameters. However, there are much input coefficients and τ ∈ ℜn×1 is a control input. Also, A ∈ ℜm×n
fewer parameters in the Fourier series expansion. The proposed
is the weight matrix of constraints and λ ∈ ℜm×1 is a constant
controller in this paper uses just the first three terms of the
force vector and m is the number of constraints. It considered that
Fourier series expansion in the actor and critic.
all constraints equalities are time independence, so
The innovations of this paper can be summarized as follow.
Firstly, a model-free voltage-based controller has been proposed. Aq̇ = 0 (2)
Although the controller presented in [10] is a voltage-based con-
troller for mobile robots, it needs current sensors and some Let S(q) be a full rank matrix with the size of (n − m) formed by
nominal parameters of the motors, while the proposed controller a set of smooth and linearly independent vector fields spanning
does not need current sensors. Secondly, an actor–critic RL con- the null space of A, i.e.,
troller for formation control of leader–follower mobile robots
with guaranteed asymptotic stability has been presented. Some ST AT = 0 (3)
RL-based controllers for formation control have been presented According to [26], it is possible to use (2) and (3) to find a vector
in the literature [17] without stability analysis. Although the RL- v(t) ∈ ℜn−m , that
based formation controller presented in [18] is accompanied by
a rigorous stability analysis, it has been tested on some sim- q̇ = Sv(t) (4)
ple multi-agents, while the proposed controller is applied on
leader–follower mobile robots that are more complicated due the Fig. 1 shows a non-holonomic mobile robot which has two driving
separation and bearing dynamics between leaders and follow- wheels in the back and a free wheel in the front of robot. Two DC
ers. Thirdly, it is a combination of two categories of model-free motors as its actuators produce the torques of rear wheels. The
controllers (intelligent controllers and function-approximation- vector q = [xc , yc , θ]T shows the robot position and orientation in
based controllers) due to using the Fourier series expansion in the Cartesian frame, which denotes the actual Cartesian position
an actor–critic-based structure. of vehicle mass center and orientation of the physical robot.
64
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

The mobile robot, due to the nonholonomic constraints, can where Jm is the motor shaft inertia diagonal matrix, θm ∈ ℜn
only move in directions normal to the driving wheel axis [27]. is the motors angular position vector (θm = r−1 θw ), Bm is the
Therefore, damping, r is the reduction gear, τ is the electromagnetics torque,
km denotes the diagonal matrix of torque constant and Ia is the
ẏc cos θ − ẋc sin θ − dθ̇ = 0 (5) motor current vector. Also, by (8) it can be shown that
The kinematic equation for the front of the mobile robot can be θ̇w =J(q)† q̇ (16)
obtained as:
where
ẋ cos θ −d sin θ
⎡ ⎤ ⎡ ⎤
cos θ sin θ
[ ]
⎥ v v v1
[ ] [ ] [ ]
1 Rw
† T −1 T
q̇ = ⎣ẏ ⎦ = ⎣sin θ d cos θ ⎦ = Sv , v = = J(q) = [J(q) J(q)] J(q) = (17)
⎢ ⎥ ⎢
ω ω v2 rw cos θ sin θ −Rw
θ̇ 0 1
The derivative of (16) can be computed as
(6)
θ̈w =J̇(q)† q̇ + J(q)† q̈ (18)
In the rest of this paper, the notations v1 and v2 are used to
show the linear (v ) and angular velocity (ω) of the mobile robot, where
respectively. It is be assumed that − sin θ cos θ
[ ]
† θ̇ 0
J̇(q) = (19)
|v1 | ≤ Vmax rw − sin θ cos θ 0
(7)
|v2 | ≤ Wmax The electrical equation of permanent magnet DC motors is given
The relationship between motor angular speeds and Cartesian by
speeds is given by: RIa + La İa + kb θ̇m = u (t ) (20)
q̇=J(q)θ̇w (8) n
in which u ∈ ℜ denotes the vector of motor voltages, θm ∈
where θ̇w = θ̇wr
[ ]T
θ̇wl is the angular speeds of right and left ℜn is angular positions and we have θm = r−1 q, and R, La , kb
motors and the Jacobian matrix J(q) is in the form of are diagonal matrices of resistance, inductance and back electro-
motive force (EMF), respectively. Substituting Ia from (15) into
cos θ cos θ
⎡ ⎤
(20) and using (16) and (18) result in
rw ⎢ −1
J(q) = ⎣ sin θ sin θ ⎦ (9) u(t) = Rkm −1 (Jm r−1 J(q)† S + r B M)v̇

2

1/Rw −1/Rw +Rkm −1
(Jm r −1
J(q) Ṡ + Jm r −1
J̇(q)† S
−1 (21)
In this paper, due to the horizontal frame and mobile trajectory, +Bm r−1 J(q)† S + rB Vm )v
the gravitational vector is zeros, i.e., G = 0, and the matrices in −1
dynamical model of mobile robot in (1) are as follows +Rkm −1 r B (F + τ d ) + La İa + kb r−1 J(q)† Sv
m 0 md sin θ
⎡ ⎤
mdθ̇ 2 cos θ Eq. (21) can be simplified to
⎡ ⎤
M=⎣ 0 m −md cos θ ⎦ , V = ⎣ mdθ̇ 2 sin θ ⎦ D (q) v̇ + N (q, q̇) v + H (q) + vL = u
⎢ ⎥
(22)
0
md sin θ −md cos θ I where
cos θ cos θ − sin θ
⎡ ⎤ ⎡ ⎤
−1
τ1 D (q) = Rkm −1 (Jm r−1 J(q)† S + r B
[ ]
1 ⎢ M) (23)
⎣sin θ sin θ ⎦ , τ = , A = ⎣cos θ ⎦ ,
T
B=
⎥ ⎢ ⎥
† †
rw τ2 N (q, q̇) = Rkm −1
(Jm r −1
J(q) Ṡ + Jm r −1
J̇(q) S
Rw −Rw −d −1
−1 †
+ Bm r J(q) S + rB Vm ) + kb r J(q)† S −1
(24)
λ = −m(ẋc cos θ + ẏc sin θ )θ̇ −1
H (q) = Rkm −1
rB (F + τ d ) (25)
(10)
In order to find a proper velocity controller, the dynamic Eq. (1) in which vL = La İa is considered as unmodeled dynamic.

is rewritten as
3. The fourier series expansion
ST MSv̇ + ST (MṠ + Vm S)v + F + τ d = ST Bτ (11)
According to [28], the non-periodic function f (t ) defined on
in which the transformation (4) has been used. According to (11), [t1 , t2 ] can be expressed as

2nπ t 2nπ t
∑ ( ) ( )
Mv̇ + Vm v + F + τ d = Bτ (12) f (t ) = a0 + an cos + bn sin (26)
T T
n=1
in which
where T = t2 − t1 . Considering the first m + 1 harmonics, (26)
M = ST MS, Vm = ST (MṠ + Vm S) (13) can be rewritten as
τ = Bτ, B = ST B (14) m
2nπ t 2nπ t
∑ ( ) ( )
f (t ) = a0 + an cos + bn sin + εm
Now, to have a complete model of mobile robot including the DC T T
n=1
motors as its actuators, the dynamic equations of DC motors are
= ϕf T ωf + εm (27)
introduced. For the torques in permanent magnet DC motors, we
have in which

Jm θ̈m + Bm θ̇m + r τ = km Ia (15) ωf = [a0 a1 b1 ... am b m ]T (28)


65
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Now, the reference position of the follower ‘‘j’’ can be computed


due to the desired separation Lijd and bearing ψijd as [6]

xjr = xi − di cos θi + Lijd cos(ψijd + θi ) (32)


yjr = yi − di sin θi + Lijd sin(ψijd + θi ) (33)

The actual measured position of the follower ‘‘j’’ is defined as

xj = xi − di cos θi + Lij cos(ψij + θi ) (34)


yj = yi − di sin θi + Lij sin(ψij + θi ) (35)

Substitution of (32)–(35) into (31), and applying the trigonomet-


ric identities, the kinematic error of formation control problem is
achieved as
ej1 Lijd cos(ψijd + θij ) − Lij cos(ψij + θij )
⎡ ⎤ ⎡ ⎤

ej = ⎣ej2 ⎦ = ⎣ Lijd sin(ψijd + θij ) − Lij sin(ψij + θij ) ⎦ (36)


⎢ ⎥ ⎢ ⎥

ej3 θjr − θj
where θij = θi − θj . Due to the constraint of nonholonomic
mobile robots [26], the orientation of each robot in the formation
problem will not be equal so it is not θjr = θi for each robot.
However, the reference of the follower ‘‘j’’ is chosen such that
1
Fig. 2. Separation bearing formation control problem. θ̇jr = (v2i Lijd cos(ψijd + θij ) + v1i sin θij + kj2 ej2 ) (37)
dj

)]T where v1j , v2j , v1i and v2i are the linear and angular velocities
2π t 2π t 2mπ t 2mπ t
[ ( ) ( ) ( ) (
ϕf (t) = 1 cos sin ... cos sin of follower and leader robot, respectively. Using (6) it can be
T T T T computed that [30]
(29)
L̇ij = v1j cos(ψij + θij ) − v1i cos ψij + dj v2j sin(ψij + θij )
where εm denotes approximation error, ωf is unknown weight
1
vector, and ϕf is the regressor vector. ψ̇ij = (v1i sin ψij − v1j sin(ψijd + θij )
Lij

4. Designing the controller


+ dj v2j cos(ψij + θij ) − Lij v2i ) (38)

It is assumed that Lijd and ψijd are held constant. By the deriva-
4.1. Back stepping control design tion of (36) and using (38), the error dynamics can be written
as
In this section an RL controller is proposed to solve the leader– ⎡ ⎤
ėj1

−v1j + v1i cos θij + v2j ej2 − v2i Lijd sin(ψijd + θij )

follower formation control of mobile robots. In the rest of this
⎣ėj2 ⎦ = ⎣ −v2j ej1 + v1i sin θij − dj v2j + v2i Lijd cos(ψijd + θij ) ⎦
⎢ ⎥ ⎢ ⎥
paper, indexes ‘‘i’’ denotes the leader robot and ‘‘j’’ denotes the ėj = ⎢ ⎥ ⎢ ⎥
follower one. The introduced controller, due to the voltage control
ėj3 (v2i Lijd cos(ψijd + θij ) + v1i sin θij + kj2 ej2 )/dj − v2j
strategy [29], produced voltage of DC motors to achieve
(39)
lim (Lijd − Lij ) = 0
t →∞
(30) Following velocity control inputs is used to stabilize the kine-
lim (ψijd − ψij ) = 0 matic of follower to achieve the desired separation and bearing
t →∞
[6].
where Lij and Lijd are the measured and desired separation of
the follower ‘‘j’’ with respect to leader ‘‘i’’, ψij and ψijd are the v1i cos θij + kj1 ej1 − v2i Lijd sin(ψijd + θij )
⎡ ⎤
vjc1
[ ]
measured and desired bearing of the follower ‘‘j’’ with respect to vjc = =⎣1
(v2i Lijd cos(ψijd + θij ) + v1i sin θij + kj2 ej2 + kj3 ej3 )

leader ‘‘i’’. Eq. (30) shows that the separation and bearing errors vjc2 dj
of the formation control will be zero as t → ∞. Fig. 2 shows
(40)
an example of leader–follower mobile robot in formation control
problem. where kj1 , kj2 and kj3 are controller positive design constant which
Define the tracking controller error for the follower ‘‘j’’ as: will be chosen by Lyapunov methods in stability section. Now, the
error of velocity tracking can be defined as
ej1 cos θj sin θj 0 xjr − xj
⎡ ⎤ ⎡ ⎤⎡ ⎤
vjc1 v1j
[ ] [ ] [ ]
ej4
ej = Tej (qjr − qj ) = ⎣ej2 ⎦ = ⎣− sin θj cos θj 0⎦ ⎣yjr − yj ⎦ = vjc − vj =
⎢ ⎥ ⎢ ⎥⎢ ⎥
ejc = − (41)
ej5 vjc2 v2j
ej3 0 0 1 θjr − θj
where vj = vjc − ejc . Therefore, Eq. (39) can be rewritten as
(31)
ėj1 −kj1 ej1 + v2j ej2 + ej4
⎡ ⎤ ⎡ ⎤
]T
θ̇jr , ẏjr = v1jr sin θjr , ẋjr = v1jr cos θjr ,
[
q̇jr = ẋjr ẏjr ėj = ⎣ėj2 ⎦ = ⎣−kj2 ej2 − kj3 ej3 − v2j ej1 + dj ej5 ⎦
⎢ ⎥ ⎢ ⎥
(42)
θ̇jr = v2jr
ėj3 −kj3 ej3 /dj + ej5
66
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Considering the tracking error for leader and its derivative, the where ĝj and ω̂aj are the Fourier series approximation of the gj
velocity control inputs for leaders are found as below: and ω∗ aj , respectively.
The control voltage for the follower robot is designed as:
ei1 cos θi sin θi 0 xr − xi
⎡ ⎤ ⎡ ⎤⎡ ⎤
uj = Kj4 ejc + ĝj = Kj4 ejc + ϕj ω̂aj + uraj (54)
ei = ⎣ei2 ⎦ = ⎣− sin θi cos θi 0⎦ ⎣yr − yi ⎦ (43)
⎢ ⎥ ⎢ ⎥⎢ ⎥
where uraj is the robust term and index aj denotes the actor part
ei3 0 0 1 θr − θi
of RL controller of jth follower. Substituting the proposed control
Therefore, we have voltage (54) into dynamic (51), the close loop dynamic of robot
ėi1
⎡ ⎤ ⎡
−v1i + vir cos ei3 + v2i ei2
⎤ could be written as
Dj ėjc = −Kj4 ejc + g̃j = −Kj4 ejc + ϕj ω̃aj + εaj − uraj (55)
ėi = ⎣ėi2 ⎦ = ⎣ −v2i ei1 + v1ir sin ei3 ⎦ (44)
⎢ ⎥ ⎢ ⎥
where g̃j = gj − ĝj is the Fourier series approximation error
ėi3 v2ir − v2i
and similarly ω̃j = ωj − ω̂j . In critic part of RL controller, the
The control velocity for the leader ‘‘i’’ can be defined as reinforcement signal (long-term cost function) is produced. This
critic network signal is defined as [31]
vic1 v1ir cos ei3 + ki1 ei1
[ ] [ ]
vic = = (45) Jj = ejc + ejc  ω̂cj T ϕj T
 
(56)
vic2 v2ir + ki2 v1ir ei2 + ki3 ki2 sin ei3
Similar to the follower and Eq. (41), the error of velocity tracking in which ω̂cj is estimation of ωcj ∈ ℜ ∗ 2m+1
. The long-term strategic
utility function is usually defined as Q} (k) = min α N p(k + 1)
{
for the leader robot can be defined as
+α N −1 p(k + 2) + · · · + α k+1 p(N) + · · · where 0 < α < 1 is
vic1 v1i
[ ] [ ] [ ]
ei4
eic = = vic − v i = − (46) the discount factor and N is the horizon index and p(k) is the
ei5 vic2 v2i utility function [32]. It should be noted that Q (k) is not available
where v i = vic − eic . Therefore, Eq. (44) can be rewritten as and should be approximated [32]. The term ϕj ω̂cj in (34) repre-
sents this estimation. For the ith leader mobile robot, similar to
ėi1 −ki1 ei1 + v2i ei2 + ei4
⎡ ⎤ ⎡ ⎤
(51), (54), (56), dynamic equation, control law and critic network
ėi = ⎣ėi2 ⎦ = ⎣
⎢ ⎥ ⎢
−v2i ei1 + v1ir sin ei3

(47) signal can be written similar to (51), (54), (56) in the form of

ėi3 −ki2 v1ir ei2 − ki3 ki2 sin ei3 + ei5 Di ėic = −Ni eic + fi + Hi + vLi − ui (57)
ui = Ki4 eic + ĝi = Ki4 eic + ϕi ω̂ai + urai (58)
4.2. RL controller
Ji = eic + ∥eic ∥ ω̂ci ϕi
T T
(59)
Eq. (22) explains the complete dynamics of mobile robot. The
5. Stability analysis
actor–critic RL controller structure is designed similar to the one
introduced in [11]. By adding and subtracting Dj v̇jc and Nj vjc in
Two theorems are needed to complete the stability analysis.
(22) for jth follower, the Eq. (22) can be rewritten as
The first theorem relates to the follower robot and the second
Dj v̇j − Dj v̇jc + Dj v̇jc + Nj vj − Nj vjc + Nj vjc + Hj + vLj = uj (48) theorem discusses the leader robot and convergence analysis
of the whole closed-loop system of leader–follower formation
Now, (48) can be simplified as control.
−Dj ėjc + Dj v̇jc − Nj ejc + Nj vjc + Hj + vLj = uj (49)
Theorem 1 (Stability Analysis for Followers). The signals in the
And (49) can be written as closed-loop system consisted of the system (51) and controller (54)
are bounded and the tracking error ejc asymptotically converges to
Dj ėjc = Dj v̇jc − Nj ejc + Nj vjc + Hj + vLj − uj (50)
zero if the following adaptation laws in actor and critic parts are
If fj = Dj v̇jc + Nj vjc , then (50) changed to hold
T
ω̂˙ aj = Kcj JjT ϕj − ηj Kcj ejc  ω̂aj T
 
Dj ėjc = −Nj ejc + fj + Hj + vLj − uj (51) (60)
T
ω̂˙ cj = −Kωj ejc  ω̂aj T ϕj T ϕj − ηj Kωj ejc  ω̂cj T
   
where uj denotes the vector of motor voltages as the input of jth (61)
follower mobile robot.
in which Kcj , Kωj and ηj are designed positive constant scalars. In fact,
it is assumed that ωaj ∗
and ωcj∗ are unknown. Consider the following
Assumption 1. lets the Fourier series approximation property
robust term.
hold for the function gj = −Nj ejc + fj + Hj + vLj with accuracy εaj
for all followers j, in the compact set Sj . (δj + βj )(δj + βj )ejc
uraj =  (62)
ejc T (δj + βj )

Assumption 2. The Fourier series reconstruction error


 for all The follower robots will be stable if the controller gains are selected
followers j is bounded by positive values such that εaj  < δj . d d
as kj1 > 12 , kj2 > 2j and kj3 > 2j . The proof of Theorem 1 is
(the index a shows the actor part of controller)
presented in the Appendix.
Using Fourier series expansion, there exist gj = −Nj ejc + fj +
Hj + vLj = ϕj ω∗ aj + εaj where ω∗ aj are actual weights and εaj is the Assumption 3. Assume that aj and cj are upper  ∗ bounds of the
weight matrix ωaj and ωcj∗ , respectively, i.e., ωaj  ≤ aj , ω∗ 


approximation error. This approximation is done in actor part of cj
RL controller to find a proper voltage control. The function gj and ≤ cj .
its estimation are given by
Theorem 2 (Stability Analysis for Leaders). The signals in the closed-
gj = ϕj ωaj

+ εaj (52)
loop system consisted of the system (57) and controller (58) are
ĝj = ϕj ω̂aj + uraj (53) bounded and the tracking error eic asymptotically converges to zero
67
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Table 1
Robot parameters.
d (m) r (m) R (m) I (kg2 ) m (kg)
0.1 0.02 0.1 0.1 1

Table 2
Motor parameters.
kb km rg Ra La Bm Jm
0.5 0.5 0.5 1 0.00021 0.001 0.002

Table 3
Initial position of mobile robots.
θ Y X Positions
0 0 0 Position of leader
0 −0.86 −0.5 Position of first follower
0 −1.72 −1 Position of second follower
0 0.86 −0.5 Position of third follower

Fig. 3. Desired formation structure. Table 4


NN performance compared with the desired controller.
Integral absolute Descriptions Formation
if the following adaptation laws are hold error (IAE) controller

T
10.4116 The proposed method: RL FF-RL
ω̂˙ ai = Kci JiT ϕi − ηi Kci ∥eic ∥ ω̂aiT (63) based on Fourier series
15.2798 Neural networks NN
ω̂˙ ci T = −Kωi ∥eic ∥ ω̂aiT ϕi T ϕi − ηi Kωi ∥eic ∥ ω̂ci T (64)
in which Kci , Kωi and ηi are positive constant scalars.
the voltage of DC motors to achieve the desired formation. There-
Assumption 4. Assume that ai and ci areupper  bounds of the fore, the dynamics of DC motors is considered. The proposed
weight matrix ωai

and ωci∗ , respectively, i.e., ωai
∗
≤ ai , ωci∗  ≤ ci . controller is designed using actor–critic approach based on the
Consider the following robust term. Fourier series. In this simulation, four robots by the formation
(δi + βi )(δi + βi )eic structure of Fig. 3 and first non-zero error of tracking are con-
urai = (65) sidered.
eic T (δi + βi )
 
Fig. 4 shows the tracking performance and mobile robots in
The leader robots will be stable by following conditions, if the xy direction. Fig. 5 shows the error of formation control for all
controller gains are selected as ki1 > 12 , ki3 > 2k1 . Due to three follower robots. As it is obvious, the formation error will
    i2
the boundedness of eij  and ėij , it can be shown that all the be bounded and converges to zero after a short time. The motor
errors will be zero as t → ∞. The proof of Theorem 2 is given voltages as the inputs of the mobile robots are shown in Fig. 6.
in Appendix. These voltages are bounded and smooth. The reason of high
voltages in initial times of simulation is large initial errors in the
6. Simulation results tracking errors presented in Fig. 6. In order to reduce the tracking
errors as fast as possible, the control gains have been selected
In this section, the performance of the proposed controller is large that results in large initial control signals. In the rest of this
considered. Formation control problem of leader–follower mobile section, to show the superiority of the proposed method, another
robot using the RL controller with actor critic approach based controller is simulated. In the second simulation, the parameters
on the Fourier series expansion is simulated as the introduced of mobile robots and their motors will be the same.
method. In addition, this method is based on the voltage control NN controller:
strategy. Therefore, DC motors as the actuators of mobile robot In this subsection, an NN controller is simulated to control
are considered. The desired formation structure of the mobile the formation of the leader–follower mobile robots. To have a
robots in the simulation is chosen as Fig. 3. This structure is fare comparison with the proposed controller, the initial positions
organized by four robots including one leader and three follow- of robots are chosen the same. The formation errors of follower
ers. The leader robot must track the desired trajectory given in robots are shown in Fig. 7.
Fig. 4 and other robots must track the leader with the desired Due to the results of the above simulations, the proposed
separation and bearing to have the desired formation of Fig. 3. controller is superior. Table 4 compares these methods. Table 4
Due to the sinusoidal terms in the dynamics of mobile robots, shows that the formation error in proposed method is much less
the Fourier series is utilized in the controller design to decrease than the NN one. Also, the computational complexity due to the
the computational complexity. This simulation is done in two use of Fourier series is very low.
approaches, at the first step, the proposed controller is simulated
and then to have a proper comparison, the Neural Network con-
7. Conclusion
troller is simulated [6]. The mobile robot parameters and motor
parameters are shown in Tables 1 and 2. Also, the initial position
In this paper, a complete model for mobile robots using DC
of each robot is shown in Table 3.
motors is introduced. Also, an RL controller for a formation con-
RL controller: trol problem is designed. Due to the sinusoidal terms in the
In this subsection, the RL controller by the voltage control dynamics of mobile robots, the Fourier series is utilized to design
strategy approach will be simulated. So, the controller produces the proposed controller. The stability of the proposed controller is
68
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Fig. 4. The performance of the proposed controller.

Fig. 5. Tracking error of the proposed controller.

69
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Fig. 6. Motor voltages in the proposed controller.

examined by Lyapunov theorem and it is shown that introduced where


controller is asymptotically stable. Based on the simulation re- kj2 dj kj3
sults, it can be shown that the proposed controller has a smaller Vj1 = (e2j1 + e2j2 ) + e2j3 (67)
2 2
formation error in comparison with the NN controller.
and
Declaration of competing interest 1 1 1
Vj2 = ejc T Dj ejc + ω̃ajT ω̃aj + ω̃cjT ω̃cj (68)
2 2Kcj 2Kωj
The authors declare that they have no known competing finan-
The time derivation of Vj1 is given by
cial interests or personal relationships that could have appeared
to influence the work reported in this paper. V̇j1 = kj2 ej1 ėj1 + kj2 ej2 ėj2 + dj kj3 ej3 ėj3 (69)

Using (42), (69) can be written as


Data availability
V̇j1 = −kj2 kj1 e2j1 − k2j2 e2j2 − k2j3 e2j3 − kj2 kj3 ej2 ej3 + kj2 ej1 ej4
The data that support the findings of this study are available
from the corresponding author upon reasonable request. + dj (kj2 ej2 + kj3 ej3 )ej5 (70)

The time derivation of Vj2 is given by


Compliance with ethical standards
1 ˙ T 1 ˙ T
V̇j2 = ejc T Dj ėjc − ω̂aj ω̃aj − ω̂cj ω̃cj (71)
The authors declare that this paper has never been submitted Kcj Kω j
to other journals for simultaneous review processes and it has Substituting (55) into (71) results in
not been published before (partially or completely). Moreover,
this paper has not been divided into different sections in order 1 ˙ T 1 ˙ T
to have more submissions. The data has not been fabricated or V̇j2 = −ejc T Kj4 ejc + ejc T (ϕj ω̃aj + εaj − uraj ) − ω̂aj ω̃aj − ω̂cj ω̃cj
Kcj Kω j
changed in favor of the conclusions. In addition, no theory, data,
or text belonging to other authors and publications was included (72)
as if it was our own and proper acknowledgment to other works Substituting (59), (60)–(61) into (72) yields in
has been provided.
V̇j2 = −ejc T Kj4 ejc + ejc T ϕj ω̃aj + ejc T (εaj − uraj ) − ejc T ϕj ω̃aj
Appendix
− ejc  ω̂cj T ϕj T ϕj ω̃aj
 

+ηj ejc  ω̂aj T ω̃aj + ejc  ω̂aj T ϕj T ϕj ω̃cj + ηj ejc  ω̂cj T ω̃cj
     
Proof of Theorem 1. Consider the Lyapunov function

Vj = kj4 Vj1 + (kj2 + dj (kj2 + kj3 ))Vj2 (66) (73)


70
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Fig. 7. Separation and bearing error of NN compared with the desired controller.

Therefore, we have According to Assumption 3, it can be concluded

V̇j2 = −ejc T Kj4 ejc + ejc T (εaj − uraj ) − ejc  ω̂cj T ϕj T ϕj ω̃aj
 
+ ω̂cj  ϕj T ϕj  aj + ω̂cj  ϕj T ϕj  ω̂aj 
      

+ηj ejc  ω̂aj T ω̃aj


 
⏐α ⏐ ≤ +ηj ω̂aj  aj + ω̂aj  ϕj T ϕj  cj
    
(74)
⏐ ′⏐
j (78)
+ ejc  ω̂aj T ϕj T ϕj ω̃cj + ηj ejc  ω̂cj T ω̃cj
   
+ ω̂aj  ϕj T ϕj  ω̂cj  + ηj ω̂cj  cj
     
  
In other words, βj

V̇j2 ≤ −ejc T Kj4 ejc + ejc T (εaj − uraj ) The time-varying term βj can be simplified to

+ ejc  −ω̂cj T ϕj T ϕj ω̃aj + ηj ω̂aj T ω̃aj + ω̂aj T ϕj T ϕj ω̃cj + ηj ω̂cj T ω̃cj


 [ ]
βj = ϕj T ϕj  aj ω̂cj  + 2 ω̂cj  ω̂aj  + cj ω̂aj 
 (       )
  
αj
+ ηj aj ω̂aj  + cj ω̂cj 
(    )
(79)
(75)
By substituting (78) into (75), (75) can written as
The term αj is a function of time.  Using the definitions ω̃aj =
ωaj∗ − ω̂aj and ω̃cj = ωcj − ω̂cj , ejc  αj can be rewritten as V̇j2 ≤ −ejc T Kj4 ejc + ejc (εaj − uraj ) + ejc  βj − ejc  ηj ω̂aj T ω̂aj
   

− ejc  ηj ω̂cj T ω̂cj


 
ejc  αj = − ejc  ω̂cj T ϕj T ϕj ω∗ + ejc  ω̂cj T ϕj T ϕj ω̂aj (80)
     
aj

+ηj ejc  ω̂aj T ωaj∗ − ηj ejc  ω̂aj T ω̂aj


   
Due to the negative-ness of −ejc T Kj4 ejc , − ⏐Yj ⏐ ηj ω̂aj T ω̂aj , and
⏐ ⏐
(76)
− ⏐Yj ⏐ ηj ω̂cj T ω̂cj , it should be verified that ejc T (εaj −uraj )+ejc  βj ≤
⏐ ⏐  
+ ejc  ω̂aj T ϕj T ϕj ωcj∗ − ejc  ω̂aj T ϕj T ϕj ω̂cj
   

0. In order to achieve this goal, the robust term uraj is chosen as


+ηj ejc  ω̂cj T ωcj∗ − ηj ejc  ω̂cj T ω̂cj
   
follows [24]
It is obvious that − ejc  ηj ω̂aj T ω̂aj and − ejc  ηj ω̂cj T ω̂cj are non-
   
(δj + βj )(δj + βj )ejc
positive. Now, consider the definition: uraj = (81)
ejc T (δj + βj )
 

αj′ = −ω̂cj T ϕj T ϕj ωaj∗ + ω̂cj T ϕj T ϕj ω̂aj + ηj ω̂aj T ωaj∗ + ω̂aj T ϕj T ϕj ωcj∗


Based on εaj  ≤ δj , (86) is satisfied and we have:
 
−ω̂aj ϕj ϕj ω̂cj + ηj ω̂cj ωcj
T T T ∗

(77) V̇j2 ≤ −ejc T Kj4 ejc (82)


71
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

Due to (66), (70) and (82), and complete squares, it can be shown Due to (85), (90) and (93), and complete squares, it can be shown
that: that:
1 dj dj 1 1 1 1 2
V̇j ≤ −kj4 kj2 (kj1 − )e2j1 − kj4 kj2 (kj2 − )e2j2 − kj4 kj3 (kj3 − )e2j3 V̇i ≤ ki4 [−(ki1 − )e2i1 − (ki3 − ) sin2 ei3 − ( + )e
2 2 2 2 ki2 2 ki2 i4
1 1 2 1 1
kj2 2 dj 2 − (1 + )e2 + e − ( √ ei1 + √ ei4 )2 (94)
−kj4 ( + dj (kj2 + kj3 ))ej4 − kj4 (kj2 + (kj2 + kj3 ))ej5 2ki2 i5 2ki2 i5 2 2
2 2 1 1
kj4 kj2 2 dj kj4 kj2 2 − (√ ei5 + √ sin ei3 )2 ]
− (ej1 − ej4 ) − (ej2 − ej5 ) 2ki2 2ki2
2 2
dj kj4 kj3 Since −( √1 ei1 + √1 ei4 )2 and −( √1 ei5 + √1 sin ei3 )2 are non-
− (ej3 − ej5 )2 2 2 2ki2 2ki2
2 positive, Eq. (94) can be rewritten as:
(83)
1 1 1 1
V̇i ≤ ki4 [−(ki1 − )e2i1 − (ki3 − ) sin2 ei3 − ( + )e2i4
Examining (83), it can be concluded that V̇j ≤ 0 if the controller 2 ki2 2 ki2
dj dj
gains are selected as: kj1 > 1
, kj2 > and kj3 > . Therefore, 1 1
2 2 2 − (1 + )e2i5 + e2i5 ] (95)
the position tracking errors ej and the velocity tracking errors ejc 2ki2 2ki2
converge to zero as well as the
 weight
  gain matrix
  is bounded. Examining (95), it can be concluded that V̇i ≤ 0 if the controller
Due to the boundedness of ej , ejc , ėj , and ėjc  for follower gains are selected as: ki1 > 21 , ki3 > 2k1 . Therefore, the position
  i2
j, it can be shown that V̈j  ≤ ∞, therefore, using Barbalat’s tracking errors ei and the velocity tracking errors eic converge to
Lemma, V̇j → 0. Thus, ej → 0 and ejc → 0 as t → ∞. As a zero as well as the weight gain matrix is bounded. Also, it can be
result, the tracking errors of position, orientation and velocity are concluded that V̇ij ≤ 0, so the position, orientation and velocity
asymptotically stable. tracking errors for formation
  control
 problem
 are bounded. Due
This completes the proof of Theorem 1. to the boundedness of eij  and ėij  for the leader i and its
followers, it can be shown that V̈ij  ≤ ∞ and V̇j is uniformly
Proof of Theorem 2. Consider the Lyapunov function continuous, therefore, using Barbalat’s Lemma, V̇ij → 0. Thus,
N
∑ ej → 0, ejc → 0, ei1 → 0, ei3 → 0 and eic → 0 as t → ∞. Using
Vij′ = V ′ j + Vi (84) (47), it can verify that ei2 → 0 as t → ∞. Therefore, asymptotical
1 stability is verified. This completes the proof of Theorem 2.
where Vj is defined by (66) and is as Vi below:
Remark 1. In the case when the follower j is a leader for fol-
Vi = ki4 Vi1 + (1 + 1/ki2 )Vi2 (85) lower j + 1, the global asymptotic stability of this formation
control problem is similar to Theorem 1. In this case, the radially
where unbounded Lyapunov candidate set as:
1 1 j+1
Vi1 = (e2i1 + e2i2 ) + (1 − cos ei3 ) (86) ∑
2 ki2 V ′′ j = Vj (96)
j
and
1 1 1 where Vj is defined by (66). Now, follower j makes a reference
Vi2 = eic T Di eic + ω̃ai T ω̃ai + ω̃ci T ω̃ci (87) for follower j + 1 in formation control problem. Due to using the
2 2Kci 2Kωi
same dynamics for jth follower robot (dynamics of follower j is
The time derivation of Vi1 is given by incorporates the dynamics of leader i), so the proof of stability
1 will be the same when the follower j is being considered as leader
V̇i1 = ei1 ėi1 + ei2 ėi2 + sin ei3 ėi3 (88) for follower j + 1.
ki2
According to (47) References
V̇i1 = − ki1 e2i1 + ei1 ei4 + v2i ei1 ei2 − v2i ei1 ei2 + ei2 v1ir sin ei3
[1] Rubio F, Valero F, Llopis-Albert C. A review of mobile robots: Concepts,
1 (89)
−ki3 sin2 ei3 + ei5 sin ei3 − ei2 v1ir sin ei3 methods, theoretical framework, and applications. Int J Adv Robot Syst
ki2 2019;16(2):1729881419839596.
[2] Lewis MA, Tan KH. High precision formation control of mobile robots using
then virtual structures. Auton Robots 1997;4(4):387–403.
1 [3] Xi J, Yang X, Yu Z, Liu G. Leader–follower guaranteed-cost consensual-
V̇i1 = −ki1 e2i1 − ki3 sin2 ei3 + ei1 ei4 + ei5 sin ei3 (90) ization for high-order linear swarm systems with switching topologies. J
ki2 Franklin Inst 2015;352(4):1343–63.
[4] Wang H, Guo D, Liang X, Chen W, Hu G, Leang KK. Adaptive vision-based
The time derivation of Vi2 is given by leader–follower formation control of mobile robots. IEEE Trans Ind Electron
1 ˙ T 1 ˙ T 2017;64(4):2893–902.
V̇i2 = eic T Di ėic − ω̂ai ω̃ai − ω̂ci ω̃ci (91) [5] Peng Z, Wen G, Rahmani A, Yu Y. Distributed consensus-based formation
Kci Kω i control for multiple nonholonomic mobile robots with a specified reference
trajectory. Internat J Systems Sci 2015;46(8):1447–57.
Applying similar steps to Eq. (71), Based on Assumption 4, |εi | ≤
[6] Dierks T, Jagannathan S. Asymptotic adaptive neural network tracking
δi . Now, consider the following control term: control of nonholonomic mobile robot formations. J Intell Robot Syst
2009;56(1):153–76.
(δi + βi )(δi + βi )eic
urai = (92) [7] Lin S, Jia R, Yue M, Xu Y. On composite leader–follower formation control
eic T (δi + βi )
 
for wheeled mobile robots with adaptive disturbance rejection. Appl Artif
Intell 2019;33(14):1306–26.
Thus, according to (92), we have: [8] Li Y, Zhu L, Guo Y. Observer-based multivariable fixed-time for-
mation control of mobile robots. J Syst Eng Electron 2020;31(2):
V̇i2 ≤ −eic T Ki4 eic (93) 403–14.

72
G. Khodamipour, S. Khorashadizadeh and M. Farshad ISA Transactions 138 (2023) 63–73

[9] Hou R, Cui L, Bu X, Yang J. Distributed formation control for mul- [20] Wen G, Chen CP, Li B. Optimized formation control using simplified
tiple non-holonomic wheeled mobile robots with velocity constraint reinforcement learning for a class of multiagent systems with unknown
by using improved data-driven iterative learning. Appl Math Comput dynamics. IEEE Trans Ind Electron 2019;67(9):7879–88.
2021;395:125829. [21] Izadbakhsh A. Robust control design for rigid-link flexible-joint electrically
[10] Fateh MM, Arab A. Robust control of a wheeled mobile robot by voltage driven robot subjected to constraint: theory and experimental verification.
control strategy. Nonlinear Dynam 2015;79(1):335–48. Nonlinear Dynam 2016;85(2):751–65.
[11] Khodamipour G, Khorashadizadeh S, Farshad M. Observer-based adaptive [22] Ahmadi SM, Fateh MM. Composite direct adaptive Taylor series–fuzzy
control of robot manipulators using reinforcement learning and the Fourier controller for the robust asymptotic tracking control of flexible-joint
series expansion. Trans Inst Meas Control 2021;43(10):2307–20. robots. Trans Inst Meas Control 2019;41(14):4023–34.
[12] Cui Y, Liu X, Deng X, Wang Q. Observer-based adaptive fuzzy formation [23] Izadbakhsh A, Khorashadizadeh S. Robust adaptive control of robot manip-
control of nonlinear multi-agent systems with nonstrict-feedback form. Int ulators using Bernstein polynomials as universal approximator. Internat J
J Fuzzy Syst 2021;23(3):680–91. Robust Nonlinear Control 2020;30(7):2719–35.
[13] Li Y, Zhang J, Tong S. Fuzzy adaptive optimized leader-following formation [24] Izadbakhsh A, Khorashadizadeh S, Ghandali S. Robust adaptive impedance
control for second-order stochastic multi-agent systems. IEEE Trans Ind Inf control of robot manipulators using Szász–Mirakyan operator as universal
2022;18(9):6026–37. approximator. ISA Trans 2020;106:1–11.
[14] Yang S, Cao Y, Peng Z, Wen G, Guo K. Distributed formation control of [25] Zirkohi MM. Direct adaptive function approximation techniques-
nonholonomic autonomous vehicle via RBF neural network. Mech Syst based control of robot manipulators. J Dyn Syst Meas Control
Signal Process 2017;87:81–95. 2018;140(1):011006.
[15] Zhang M, Yu X, Ding P, Ou L, Zhang W. Distributed adaptive three- [26] Fierro R, Lewis FL. Control of a nonholonomic mobile robot: backstepping
dimension formation control based on improved RBF neural network kinematics into dynamics. J Robot Syst 1997;13:149–63.
for non-linear multi-agent time-delay systems. IET Control Theory Appl [27] Lewis FL, Liu K, Yesildirek A. Neural net robot controller with guaranteed
2019;13(17):2758–65. tracking performance. IEEE Trans Neural Netw 1995;6(3):703–15.
[16] Falco P, Attawia A, Saveriano M, et al. On policy learning robust to [28] Khorashadizadeh S, Fateh MM. Uncertainty estimation in robust tracking
irreversible events: An application to robotic in-hand manipulation. IEEE control of robot manipulators using the Fourier series expansion. Robotica
Robot Autom Lett 2018;3(3):1482–9. 2017;35(2):310–36.
[17] Zhou Y, Lu F, Pu G, Ma X, Sun R, Chen HY, Li X. Adaptive leader- [29] Fateh MM. On the voltage-based control of robot manipulators. Int J
follower formation control and obstacle avoidance via deep reinforcement Control Autom Syst 2008;6(5):702–12.
learning. In: 2019 IEEE/RSJ international conference on intelligent robots [30] Martinelli A. The odometry error of a mobile robot with a synchronous
and systems. IROS, IEEE; 2019, p. 4273–80. drive system. IEEE Trans Robot Autom 2002;18(3):399–405.
[18] Wen G, Chen CP, Feng J, Zhou N. Optimized multi-agent formation control [31] Tang L, Liu YJ, Tong S. Adaptive neural control using reinforcement learning
based on an identifier–actor–critic reinforcement learning algorithm. IEEE for a class of robot manipulator. Neural Comput Appl 2014;25(1):135–41.
Trans Fuzzy Syst 2017;26(5):2719–31. [32] Zhang X, Zhang H, Liu D, Kim Y. Neural-network-based reinforcement
[19] Tang L, Liu Y-J, Tong S. Adaptive neural control using reinforce- learning controller for nonlinear systems with non-symmetric dead-zone
ment learning for a class of robot manipulator. Neural Comput Appl inputs. In: 2009 IEEE symposium on adaptive dynamic programming and
2014;25(1):135–41. reinforcement learning. IEEE; 2009, p. 124–9.

73

You might also like