Deep Reinforcement Learning Control of Fully-Constrained Cable-Driven Parallel Robots
Abstract—Cable-driven parallel robots (CDPRs) have complex cable dynamics and working environment uncertainties, which bring challenges to the precise control of CDPRs. This article introduces reinforcement learning to offset the negative effect of these uncertainties on the control performance of CDPRs. The problem of controller design for CDPRs in the framework of deep reinforcement learning is investigated. A learning-based control algorithm is proposed to compensate for uncertainties due to cable elasticity, mechanical friction, etc. A basic control law is given for the nominal model, and a Lyapunov-based deep reinforcement learning control law is designed. Moreover, the stability of the closed-loop tracking system under the reinforcement learning algorithm is proved. Both simulations and experiments validate the effectiveness and advantages of the proposed control algorithm.

Index Terms—Cable-driven parallel robots (CDPRs), deep reinforcement learning, parameter uncertainties.

Manuscript received 3 May 2022; revised 27 July 2022; accepted 17 August 2022. Date of publication 9 September 2022; date of current version 17 February 2023. This work was supported in part by the National Science Foundation of China under Grants 62033005, 62203136, 62022030, 62173107, and 62106062; in part by the Natural Science Foundation of Heilongjiang Province under Grant ZD2021F001; in part by the Sichuan Province Science and Technology Support Program under Grant 2021YFSY0026; in part by the China Postdoctoral Science Foundation under Grants 2021M701007 and 2021TQ0091; in part by the Fundamental Research Funds for the Central Universities under Grant HIT.OCEF.2021005; and in part by the Postdoctoral Science Foundation of Heilongjiang Province under Grant LBH-Z21059. (Corresponding author: Chengwei Wu.) The authors are with the School of Astronautics, Harbin Institute of Technology, Harbin 150001, China. Digital Object Identifier 10.1109/TIE.2022.3203763.
I. INTRODUCTION

CABLE-DRIVEN parallel robots (CDPRs) drive the end-effector to move in a large working space by using cables. The winding mechanisms of CDPRs are fixed on the ground or on worktables, reducing the overall motion load and achieving higher motion speed. Compared with rigid parallel robots and rigid manipulators, cables can effectively decrease damage when accidents happen; for example, the breakage of a rigid manipulator or a rigid parallel robot link will cause huge damage to the human body and property, while cables cause less damage. In view of these advantages, CDPRs are widely used in various scenarios, such as the Skycam system [1], a cable robot simulator for human perception research [2], a 6-DOF CDPR for 3D printing [3], a feed drive device for the large radio telescope [4], and cranes and storage equipment for cargo handling [5], [6]. However, cables lead to uncertain parameters in the dynamic model, which brings challenges to controller design for CDPRs.

Accurate CDPR parameters can be efficiently obtained by using calibration equipment. A laser tracker was chosen as the measurement device to calibrate the geometric parameters of CDPRs in [7]. A vision system consisting of six cameras was used to calibrate the kinematic and dynamic parameters of CDPRs in [8]. A high-speed charge-coupled device camera was used to measure the position of CDPRs in [9]. The abovementioned methods work well in simple scenarios, but their effectiveness depends on the accuracy of the calibration equipment. The calibration accuracy of unknown parameters is also not guaranteed, which limits the application of these methods in some uncertain environments.
In view of these limitations, some control algorithms have been proposed to solve the CDPR control problem caused by model parameters with uncertainties. Adaptive control is an efficient method for handling parameter uncertainties, which adapts to uncertain parameters by designing an adaptive law. The design of the adaptive law is based on the description of the model and the uncertainties. For planar CDPRs, an adaptive controller was proposed to deal with the parameter uncertainties in the dynamic model in [10]. An adaptive dual-space control algorithm has been designed for space CDPRs subject to uncertain parameters in [11]. The asynchrony of multiple cables also leads to model parameter uncertainties, and effective synchronization of cables improves the performance of CDPR controllers. An adaptive synchronous control method was proposed to reduce the synchronization errors in [12], where both kinematic and dynamic uncertainties have been addressed.
To deal with more complex and implicit uncertainties, robust control methods have been proposed. Less information about the model and the uncertainties is required in the design of a robust controller; for example, only the boundary conditions and types of the uncertainties need to be known. Khosravi et al. [13] proposed a robust proportional-integral-derivative (PID) control algorithm for fully-constrained CDPRs.
For the fully-constrained CDPRs, Shang et al. [14] also proposed an adaptive robust control method to suppress unmodeled dynamics and external disturbances, which has better adaptability to uncertainties than the robust PID. Furthermore, Babaghasabha et al. [15] proposed a composite controller that requires neither the lower nor the upper bounds of the uncertainties, which is more general.

To further improve the control performance, sliding mode control methods for CDPRs have been presented. An adaptive sliding mode control method was proposed in [16], where neither the upper bound of the unknown uncertainties nor a linear regression form is required in the design process. Considering the tracking synchronization of multiple cables, a new synchronous control method based on second-order sliding mode was proposed in [17].

The complex cable dynamics and working environment uncertainties of CDPRs bring challenges to the precise control of CDPRs. Some explicit uncertainties are mainly caused by the transmission friction, the transmission ratio of the winding mechanism, the accuracy of the sensors, and the location of the center of gravity of the end-effector. In addition, there are implicit uncertainties mainly caused by cable elasticity, mass change, and sloshing of the end-effector. Limited by the identification accuracy of the system model and model parameters, traditional control methods have difficulties dealing with both the explicit and the implicit uncertainties. In contrast, model-free control based on reinforcement learning (RL) can effectively deal with the abovementioned uncertainties and solve the CDPR control problem with unknown models. Many applications of RL can be found in the literature [18], [19], where the system models were described by differential equations. Compared with the abovementioned RL results, deep RL, where the system dynamics is described by a Markov decision process, provides a more general framework to learn a controller (a.k.a. policy). Deep RL shows great potential in designing robot controllers under uncertain environments, such as robotic arms [20], [21], unmanned surface vehicles, and drones [22], [23].

Motivated by the abovementioned discussions and the advantages of deep RL, this article intends to propose a learning control framework for CDPRs. According to existing results, a nominal model of CDPRs and a basic control law are given. Then, a Markov decision process is constructed for the closed-loop system. A learning control algorithm is proposed by combining a deep RL algorithm and the Lyapunov function. Finally, both simulation and experimental results are provided to demonstrate the effectiveness and advantages of the proposed RL-based control algorithm. The main contributions of this article are summarized as follows.

1) The dynamics of CDPRs are described as a Markov decision process, based on which a learning-based control algorithm is first proposed for CDPRs. Compared with existing results [13], [24], such an algorithm can achieve the desired control performance without identifying exact system parameters.

2) The convergence of the learning algorithm is guaranteed [23], [25], and further the stability of CDPRs under the proposed learning control algorithm is proved by introducing the Lyapunov function in the learning algorithm.
The rest of this article is organized as follows. Section II provides the CDPR model and problem formulation. Section III presents the setup and implementation of the RL algorithm. Section IV presents the results of the simulation and experiments. Section V concludes this article.

II. CDPRS MODEL AND PROBLEM FORMULATION

A. CDPRs Model

For CDPRs, the displacement vector of the end-effector is defined by p_e = [x_p, y_p, z_p]^T, and ψ_e = [α_p, β_p, γ_p]^T is defined as its rotation vector; thus, the motion vector of the end-effector is defined by x = [p_e^T, ψ_e^T]^T. Based on [24], the overall dynamic model of the CDPR is described as

(M(x) + J^T R_T^{-1} I_m R_T^{-1} J) ẍ + (J^T R_T^{-1} I_m R_T^{-1} J̇ + C(x, ẋ) + J^T R_T^{-1} F_v R_T^{-1} J) ẋ + J^T R_T^{-1} F_c sign(l̇) + G = J^T R_T^{-1} u    (1)

where u is the motor torque taken as the input and x is the output. M(x) is the positive definite symmetric inertia matrix, l is the cable length vector, and J and J^T are the Jacobian matrix and its transpose, respectively. R_T is the gear ratio from motor angle to cable length. I_m, F_v, and F_c are the inertia matrix, viscous friction matrix, and Coulomb friction matrix of the winding mechanism, respectively. C(x, ẋ) is the Coriolis and centrifugal matrix, G is the gravity vector, and ẋ and ẍ are the velocity and acceleration vectors of the end-effector, respectively.
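To make the structure of (1) concrete, the sketch below assembles the nominal task-space terms and solves for the motor torque that realizes a desired end-effector acceleration (nominal inverse dynamics). All matrices and vectors passed in are illustrative placeholders rather than identified parameters of the robot studied here, and the J̇ term is omitted for brevity.

```python
import numpy as np

def nominal_torque(M, C, G, J, R_T, I_m, F_v, F_c, x_ddot, x_dot, l_dot):
    """Solve the nominal model (1) for u, given a desired acceleration x_ddot.
    The J_dot term is neglected in this sketch."""
    R_inv = np.linalg.inv(R_T)
    A = M + J.T @ R_inv @ I_m @ R_inv @ J            # coefficient of x_ddot
    B = C + J.T @ R_inv @ F_v @ R_inv @ J            # velocity-dependent coefficient
    lhs = A @ x_ddot + B @ x_dot + J.T @ R_inv @ F_c @ np.sign(l_dot) + G
    # Solve J^T R_T^{-1} u = lhs in the least-squares sense (J^T R_T^{-1} need not be square).
    u, *_ = np.linalg.lstsq(J.T @ R_inv, lhs, rcond=None)
    return u
```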
B. Problem Formulation

The overall dynamic model of CDPRs in (1) considers nonlinear factors, but it ignores the parameter uncertainties. Thus, the model in (1) can be regarded as a nominal counterpart. Considering uncertain parameters in CDPRs, the dynamics can be described as follows:

(M_U + J^T R_TU^{-1} I_mU R_TU^{-1} J) ẍ + (J^T R_TU^{-1} I_mU R_TU^{-1} J̇ + C + J^T R_TU^{-1} F_vU R_TU^{-1} J) ẋ + J^T R_TU^{-1} F_cU sign(l̇) + G_U = J^T R_TU^{-1} u    (2)

where M_U, R_TU, I_mU, F_vU, F_cU, and G_U represent model parameters with uncertainties, which are mainly caused by errors in parameter identification and by variations during the movement process. They are expressed as

M_U = M + ΔM,  R_TU = R_T + ΔR_T,  I_mU = I_m + ΔI_m,  F_vU = F_v + ΔF_v,  F_cU = F_c + ΔF_c,  G_U = G + ΔG    (3)

where M, R_T, I_m, F_v, F_c, and G are the nominal values of the parameters.
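For simulation and robustness studies, the uncertain model (2)–(3) can be generated by perturbing the nominal parameters. The sketch below draws bounded random perturbations ΔM, ΔR_T, and so on; the 10% relative bound is an arbitrary illustrative choice, not a value taken from this article.

```python
import numpy as np

def perturb_parameters(nominal, rel_bound=0.1, rng=None):
    """Return uncertain parameters as in (3): value + Delta, with
    Delta drawn uniformly from [-rel_bound, rel_bound] * |value| elementwise."""
    rng = np.random.default_rng() if rng is None else rng
    uncertain = {}
    for name, value in nominal.items():
        value = np.asarray(value, dtype=float)
        delta = rng.uniform(-rel_bound, rel_bound, size=value.shape) * np.abs(value)
        uncertain[name] = value + delta
    return uncertain
```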
Fig. 1. Control diagram of the RL-based control algorithm.

To solve the problem caused by parameter uncertainties, as shown in (2), this article investigates how to design a learning-based control algorithm to achieve the desired control performance. Fig. 1 shows the RL-based control diagram, from which we can see that the control signal u is obtained by

u = u_a + u_r    (4)

where u_a is a basic controller and u_r denotes a control signal to be learned. For the basic controller, we can choose one from the existing literature; for example, the control method in [26] is chosen in this article:

u_a = R_T T_exp + I_m R_T^{-1} l̈_exp + F_v R_T^{-1} l̇_exp + F_c sign(l̇) + K_p e_l + K_d ė_l    (5)

where l̇_exp and l̈_exp represent the expected cable velocity and acceleration, l̇ represents the actual cable velocity, T_exp is the expected tension of the cable, and e_l and ė_l represent the errors of the cable length and cable speed, respectively.
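A minimal sketch of the composite control law (4)–(5) is given below. The gains K_p and K_d, the feedforward terms, and the rl_policy callable that produces the learned correction u_r are assumed inputs; only the structure of (4) and (5) is taken from the text.

```python
import numpy as np

def basic_controller(R_T, I_m, F_v, F_c, K_p, K_d,
                     T_exp, l_ddot_exp, l_dot_exp, l_dot, e_l, e_l_dot):
    """Basic control law u_a in (5)."""
    R_inv = np.linalg.inv(R_T)
    return (R_T @ T_exp + I_m @ R_inv @ l_ddot_exp + F_v @ R_inv @ l_dot_exp
            + F_c @ np.sign(l_dot) + K_p @ e_l + K_d @ e_l_dot)

def composite_control(u_a, state, rl_policy):
    """Composite law (4): u = u_a + u_r, with u_r produced by the learned policy."""
    return u_a + rl_policy(state)
```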
Remark 1: According to existing results [9], [12], control schemes should be tailored to CDPRs. If uncertain parameters cannot be effectively covered by the designed control scheme (i.e., the basic controller used in this article) or the running environment of the CDPR changes, the performance can be degraded. To compensate for such uncertainties and improve the adaptability, an RL-based controller is provided. In addition, the basic controller is utilized to generate effective training data for improving training efficiency.

The purpose of this article is to design a learning algorithm to learn u_r, using which the performance of CDPRs with uncertain parameters can be preserved. Next, we introduce in detail how to establish the RL framework to learn u_r.

III. RL-BASED CONTROL ALGORITHM

A. Markov Decision Process of CDPRs

The agent and the environment interact with each other in the RL process, and this interaction is generally described by a Markov decision process. A Markov decision process is generally described by a five-tuple (S, U, P, C, γ), where S is the state space, U is the action space, P is the state transition probability, C is the control cost, and γ is the discount factor. Based on the tracking errors of the CDPR, the state of the Markov decision process is chosen as

X(m) = [L_1, L_2, L_3, L_4, L_5, L_6, ..., L_{2n+11}, L_{2n+12}]^T

subject to

L_1 = x_p(m) − x_pe(m),  L_2 = dx_p(m) − dx_pe(m)
L_3 = y_p(m) − y_pe(m),  L_4 = dy_p(m) − dy_pe(m)
L_5 = z_p(m) − z_pe(m),  L_6 = dz_p(m) − dz_pe(m)
L_7 = α_p(m) − α_pe(m),  L_8 = dα_p(m) − dα_pe(m)
L_9 = β_p(m) − β_pe(m),  L_{10} = dβ_p(m) − dβ_pe(m)
L_{11} = γ_p(m) − γ_pe(m),  L_{12} = dγ_p(m) − dγ_pe(m)
L_{13} = l_1(m) − l_1e(m),  L_{14} = dl_1(m) − dl_1e(m)
  ⋮
L_{2n+11} = l_n(m) − l_ne(m),  L_{2n+12} = dl_n(m) − dl_ne(m)

where L_i, i = 1, 2, ..., 2n+12, are the error terms. x_p(m), y_p(m), z_p(m), α_p(m), β_p(m), and γ_p(m) are the discrete variables of the position and rotation of the end-effector, and l_i(m), i = 1, 2, ..., n, are the discrete variables of the cable lengths. d[·] denotes the differential of [·]; for example, dx_p(m) is the differential of x_p(m), standing for its rate of change. [·]_e(m) and [·](m) are, respectively, the expected value and the actual value of the abovementioned variables; for example, x_pe(m) represents the expected value and x_p(m) represents the actual value.

Then, the Markov decision process is described as

X(m+1) ∼ P(X(m+1) | X(m), u_r(m))

where X(m) ∈ S, u_r(m) ∈ U, and the probability of transitioning to state X(m+1) after taking action u_r(m) at state X(m) is P(X(m+1) | X(m), u_r(m)).
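As a concrete illustration of the state defined above, the sketch below stacks the position, orientation, and cable-length tracking errors together with their rates into the (2n+12)-dimensional vector X(m). The argument names are hypothetical and simply mirror the quantities in the text.

```python
import numpy as np

def build_state(pose, pose_exp, pose_rate, pose_rate_exp,
                l, l_exp, l_rate, l_rate_exp):
    """X(m): interleaved errors [L1, L2, ..., L_{2n+12}].
    pose = [x_p, y_p, z_p, alpha_p, beta_p, gamma_p]; l holds the n cable lengths."""
    X = []
    for act, exp, d_act, d_exp in zip(pose, pose_exp, pose_rate, pose_rate_exp):
        X += [act - exp, d_act - d_exp]          # L1 ... L12
    for act, exp, d_act, d_exp in zip(l, l_exp, l_rate, l_rate_exp):
        X += [act - exp, d_act - d_exp]          # L13 ... L_{2n+12}
    return np.asarray(X)
```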
Based on the Markov decision process established previously, a Lyapunov-based soft actor-critic algorithm [27] is proposed as the learning control framework for CDPRs. The detailed derivation is given as follows.

B. Setup of the RL

The control cost C(m) is described as

C(m) = X^T(m) D_r X(m)    (6)

where D_r is a positive definite weight matrix.

Based on (6), the action-value function (Q-function) is expressed as

Q_{π_r}(X(m), u_r(m)) = γ E_{X(m+1)}[V_{π_r}(X(m+1))] + C(m)

where π_r is the policy to be learned and Q_{π_r}(X(m), u_r(m)) represents the value of taking the action u_r(m) at the state X(m) under the policy π_r. V_{π_r}(X(m+1)) is the state-value function, which is the value of reaching the state X(m+1) under the policy π_r. E_{X(m+1)}[V_{π_r}(X(m+1))] is the expectation over the distribution of X(m+1):

E_{X(m+1)}[V_{π_r}(X(m+1))] = ∫_{X(m+1)∈S} P(X(m+1) | X(m), u_r(m)) V_{π_r}(X(m+1)) dX(m+1)
where π_r(u_r(m) | X(m)) represents the probability of taking the action u_r(m) at the state X(m) under the policy π_r.

The goal of the RL algorithm is to find an optimal policy that minimizes the value of the Q-function

π_r^* = arg min_{π_r} Q_{π_r}(X(m), u_r(m))    (7)

where π_r^* represents the optimal policy.

By introducing the entropy term −α_r H_r(π_r(u_r(m+1) | X(m+1))), the control cost is minimized and the entropy of the action space is maximized at the same time, which makes the training process more efficient. Based on this, Q_{π_r}(X(m), u_r(m)) is expressed as

Q_{π_r}(X(m), u_r(m)) = γ E_{X(m+1)}[V_{π_r}(X(m+1)) − α_r H_r(π_r(u_r(m+1) | X(m+1)))] + C(m)    (8)

where α_r is a coefficient reflecting the importance of the entropy H_r in the Q-function. The entropy H_r(π_r(u_r(m+1) | X(m+1))) is the expected negative log-probability of the action under the policy π_r.

Based on the entropy H_r, the optimization problem in (7) is updated as

π_r^* = arg min_{π_r ∈ N_r} ( C(m) + γ E_{X(m+1)}[V_{π_r}(X(m+1)) − α_r H_r(π_r(u_r(m+1) | X(m+1)))] )

where N_r represents the set of all optional policies.

To learn the optimal policy, two steps need to be executed repeatedly until training is done: 1) policy evaluation and 2) policy improvement.

Policy evaluation is realized through the Bellman backup operator T^{π_r}. Thus, Q_{π_r}(X(m), u_r(m)) is calculated by

T^{π_r} Q_{π_r}(X(m), u_r(m)) = C(m) + γ E_{X(m+1)}[V_{π_r}(X(m+1))]

where V_{π_r}(X(m)) is calculated by

V_{π_r}(X(m)) = E_{π_r}[Q_{π_r}(X(m), u_r(m)) + α_r ln(π_r(u_r(m) | X(m)))].

Policy improvement is realized by the following equation:

π_r^{new} = arg min_{π_r ∈ N_r} D_KL( π_r(· | X(m)) ∥ exp(−(1/α_r) Q^{π_r^{old}}(X(m), ·)) / Z^{π_r^{old}} )

where π_r^{old} denotes the policy from the last round, π_r^{new} is the new one, Q^{π_r^{old}} is the Q-value under the policy π_r^{old}, D_KL denotes the Kullback–Leibler divergence, and Z^{π_r^{old}} is the normalization factor.
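For a finite set of candidate actions, the minimizer of the Kullback–Leibler divergence above has the closed form π_r^{new}(u_r | X(m)) ∝ exp(−Q^{π_r^{old}}(X(m), u_r)/α_r). The short sketch below evaluates this projection; the discretization is only for illustration, since the CDPR action space is continuous.

```python
import numpy as np

def improved_policy(q_values, alpha_r=0.2):
    """Closed-form KL projection over discrete candidate actions:
    pi_new(u|X) = exp(-Q(X, u)/alpha_r) / Z  (cost convention: small Q -> large probability)."""
    logits = -np.asarray(q_values, dtype=float) / alpha_r
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: improved_policy([1.0, 0.5, 2.0]) puts the largest probability on the second action.
```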
In the general RL training process, the optimal policy is obtained by building two sets of deep neural networks that estimate Q_{π_r,β_r}(X(m), u_r(m)) and π_{r,φ_r}(u_r(m) | X(m)), where β_r and φ_r are the parameters of the two neural networks, respectively.

The RL algorithm proposed in this article can further guarantee the stability of the closed-loop tracking system by introducing a Lyapunov function. Following the relevant existing results [28], [29], the action-value function Q_{π_r}(X(m), u_r(m)) is selected as the Lyapunov function. If Q_{π_r}(X(m), u_r(m)) satisfies the following inequality during the training process, then the stability of the tracking system can be guaranteed:

Q_{π_r}(X(m+1), u_r(m+1)) − Q_{π_r}(X(m), u_r(m)) < −λ C(m)

where λ is a constant during the training process.

Due to the Lyapunov function, the optimal policy is obtained by solving the following constrained optimization problem:

π_r^* = arg min_{π_r ∈ N_r} E_{π_r}[ Q_{π_r}(X(m), u_r(m)) + α_r ln(π_r(u_r(m) | X(m))) ].

To obtain the abovementioned optimal policy, two sets of deep neural networks are established for estimating Q_{π_r,δ_r}(X(m), u_r(m)) and π_{r,μ_r}(u_r(m) | X(m)), respectively, in accordance with the general RL training process, where δ_r and μ_r are the parameters of the two networks.

C. Updating Rules for Policy Gradient

The parameter δ_r of Q_{π_r,δ_r}(·) is obtained by minimizing the Bellman residual

J_Q(δ_r) = E_{X(m),u_r(m)∼F_r}[ (1/2) ( Q_{π_r,δ_r}(X(m), u_r(m)) − C(m) − γ E_{X(m+1)}[V_{π_r,δ_r}(X(m+1))] )^2 ]

where F_r represents the data set accumulated during training. The gradient estimate for the parameter δ_r is calculated by

∇_{δ_r} J_Q(δ_r) = ∇_{δ_r} Q_{π_r,δ_r}(X(m), u_r(m)) ( Q_{π_r,δ_r}(X(m), u_r(m)) − C(m) − γ Q_{π_r,δ_r}(X(m+1), u_r(m+1)) + γ α_r ln(π_{r,μ_r}(u_r(m+1) | X(m+1))) ).

Based on some mathematical techniques, the parameter μ_r of π_{r,μ_r}(·) is updated by minimizing

J_{π_r}(μ_r) = E_{π_r,μ_r}[ α_r ln(π_{r,μ_r}(f_{μ_r}(ε(m); X(m)) | X(m))) − Q_{π_r,δ_r}(X(m), f_{μ_r}(ε(m); X(m))) + ξ ( Q_{π_r,δ_r}(X(m+1), u_r(m+1)) − Q_{π_r}(X(m), u_r(m)) + λ C(m) ) ]

where f_{μ_r}(ε(m); X(m)) = u_r(m) and ε(m) is the sampled noise of the reparameterization.
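A compact sketch of the resulting critic and actor updates is given below in PyTorch-style code. The network objects, the policy.sample interface (returning a reparameterized action and its log-probability), and the numerical values of γ, α_r, ξ, and λ are illustrative assumptions, not the configuration used in this article; the next-state soft value follows the policy-evaluation step above.

```python
import torch
import torch.nn.functional as F

def critic_step(q_net, q_opt, policy, batch, gamma=0.99, alpha_r=0.2):
    """One gradient step that decreases the Bellman residual J_Q(delta_r) on a batch from F_r."""
    X, u, cost, X_next = batch
    with torch.no_grad():
        u_next, log_pi_next = policy.sample(X_next)
        target = cost + gamma * (q_net(X_next, u_next) + alpha_r * log_pi_next)
    loss = 0.5 * F.mse_loss(q_net(X, u), target)
    q_opt.zero_grad()
    loss.backward()
    q_opt.step()
    return loss.item()

def actor_step(policy, pi_opt, q_net, batch, alpha_r=0.2, xi=1.0, lam=0.5):
    """One gradient step on J_pi(mu_r), including the xi-weighted Lyapunov term."""
    X, u, cost, X_next = batch
    new_u, log_pi = policy.sample(X)                     # f_mu(eps; X) via reparameterization
    new_u_next, _ = policy.sample(X_next)
    lyapunov = q_net(X_next, new_u_next) - q_net(X, u).detach() + lam * cost
    loss = (alpha_r * log_pi - q_net(X, new_u) + xi * lyapunov).mean()
    pi_opt.zero_grad()
    loss.backward()
    pi_opt.step()
    return loss.item()
```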
Algorithm 1 (excerpt): 16: end for; 17: until training is done; 18: Output optimal parameters δ_r, μ_r.

The gradient estimate for the parameter μ_r is calculated by

∇_{μ_r} J_{π_r}(μ_r) = ∇_{μ_r} α_r ln(π_{r,μ_r}(u_r(m) | X(m))) + ( ∇_{u_r(m)} α_r ln(π_{r,μ_r}(u_r(m) | X(m))) − ∇_{u_r(m)} Q_{π_r}(X(m), u_r(m)) ) ∇_{μ_r} f_{μ_r}(ε(m); X(m)) + ξ ∇_{u_r(m+1)} Q_{π_r}(X(m+1), u_r(m+1)) ∇_{μ_r} f_{μ_r}(ε(m); X(m+1)).
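In practice, the gradient above is produced by automatic differentiation once the action is generated by a reparameterized policy network f_{μ_r}(ε(m); X(m)). A minimal Gaussian-tanh parameterization is sketched below; the layer sizes and clamping range are arbitrary placeholders, and its sample method is the interface assumed by the update sketches earlier.

```python
import math
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """u_r(m) = f_mu(eps; X) = tanh(mean(X) + std(X) * eps), with eps ~ N(0, I)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def sample(self, X):
        h = self.body(X)
        mean, log_std = self.mean(h), self.log_std(h).clamp(-5.0, 2.0)
        std = log_std.exp()
        eps = torch.randn_like(std)                      # the noise eps(m)
        pre_tanh = mean + std * eps                      # gradients flow back to mu_r
        u = torch.tanh(pre_tanh)
        log_pi = (-0.5 * ((pre_tanh - mean) / std) ** 2
                  - log_std - 0.5 * math.log(2.0 * math.pi)).sum(-1)
        log_pi = log_pi - torch.log(1.0 - u.pow(2) + 1e-6).sum(-1)   # tanh change of variables
        return u, log_pi
```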
The parameters α_r and ξ are adapted during the training process; their updating rules assign to α_r and ξ the values at which the following objectives reach their maximum:

J(α_r) = E_{π_r}[ α_r ln(π_r(u_r(m) | X(m))) + α_r H̄_r ]

J(ξ) = ξ E[ Q_{π_r,δ_r}(X(m+1), f_{μ_r}(ε(m); X(m+1))) − Q_{π_r}(X(m), u_r(m)) + λ C(m) ]

where u_r(m) = f_{μ_r}(ε(m); X(m)) and H̄_r represents the expected entropy.

Algorithm 1 summarizes how the abovementioned update rules are used to obtain the optimal parameters δ_r and μ_r.

Proof: Suppose that a Markov chain under a policy π_r has a unique stationary probability distribution q_{π_r}(X(m)), which can be represented as q_{π_r}(X(m)) = lim_{m→∞} P(X(m) | ρ, π_r, m). The sequence {(1/N) Σ_{m=0}^{N} P(X(m) | ρ, π_r, m), N ∈ Z_+} can be proved to converge by the Abelian theorem. Then it can be deduced that β_{π_r}(X(m)) = q_{π_r}(X(m)).

Based on the abovementioned conclusions, inequality (10) is rewritten into the following form:

∫_S lim_{N→∞} (1/N) Σ_{m=0}^{N} P(X(m) | ρ, π_r, m) ( E_{P_{π_r}(X(m+1)|X(m))}[Q(m+1)] − Q(m) ) dX(m) ≤ −ζ E_{X(m)∼q_{π_r}}[C(m)]    (12)

where Q(m) is the abbreviation of Q_{π_r}(X(m), u_r(m)) for simplicity of writing.

If the Lyapunov function Q(m) learned by RL is bounded, then two conclusions can be drawn. One is that P(X(m) | ρ, π_r, m) Q(m) is bounded for all X(m) ∈ S; the other is that the sequence {(1/N) Σ_{m=0}^{N} P(X(m) | ρ, π_r, m) Q(m)} converges to the function q_{π_r}(X(m)) Q(m).

According to Lebesgue's dominated convergence theorem, if a sequence f_n(X(m)) converges pointwise to a function f and is bounded by some integrable function h(X(m)), i.e., |f_n(X(m))| ≤ h(X(m)), ∀X(m) ∈ S, ∀n ∈ Z_+, then the following conclusion is drawn:

lim_{n→∞} ∫_S f_n(X(m)) dX(m) = ∫_S lim_{n→∞} f_n(X(m)) dX(m).
TABLE I. HYPERPARAMETERS FOR ALGORITHM 1.

For the simulated CDPR with three cables and three translational degrees of freedom, the state is chosen as X̃(m) = [L̃_1, L̃_2, ..., L̃_12]^T, subject to

L̃_1 = x_p(m) − x_pe(m),  L̃_2 = dx_p(m) − dx_pe(m)
L̃_3 = y_p(m) − y_pe(m),  L̃_4 = dy_p(m) − dy_pe(m)
L̃_5 = z_p(m) − z_pe(m),  L̃_6 = dz_p(m) − dz_pe(m)
L̃_7 = l_1(m) − l_1e(m),  L̃_8 = dl_1(m) − dl_1e(m)
L̃_9 = l_2(m) − l_2e(m),  L̃_10 = dl_2(m) − dl_2e(m)
L̃_11 = l_3(m) − l_3e(m),  L̃_12 = dl_3(m) − dl_3e(m).
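Given a recorded trajectory of this 12-dimensional error vector, the running cost (6) and its discounted accumulation can be evaluated as sketched below; the identity weight matrix and the discount factor are placeholder values, not the entries of Table I.

```python
import numpy as np

def control_cost(X, D_r):
    """Quadratic cost C(m) = X^T D_r X from (6)."""
    return float(X @ D_r @ X)

def discounted_cost(states, D_r=None, gamma=0.99):
    """Accumulate sum_m gamma^m * C(m) over a trajectory of states X(m)."""
    D_r = np.eye(len(states[0])) if D_r is None else D_r
    return sum((gamma ** m) * control_cost(np.asarray(X), D_r) for m, X in enumerate(states))
```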
B. Experiments

To further verify the correctness and effectiveness of the proposed control algorithm, experiments are conducted. The parameters of the basic controller and the control policy of the RL are consistent with those used in the simulation. The RL control policy used in this article does not need online training, and it can be directly deployed to the controller after offline training. In the experiments, the basic controller and the RL-based controller are used to track the four expected trajectories used in the simulation, respectively, and at the same time the mass of the end-effector is changed to verify that the proposed control algorithm has a certain adaptive ability with respect to the mass of the end-effector, that is, it still achieves better tracking performance when the mass of the end-effector changes.
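Because the policy is trained offline and frozen before deployment, the on-line loop only evaluates the learned network on top of the basic control law. A minimal deployment sketch follows; the checkpoint name and the helper functions in the usage comment are hypothetical.

```python
import numpy as np
import torch

def control_step(policy_net, state, u_a):
    """One control cycle of u = u_a + u_r with an offline-trained, frozen policy."""
    with torch.no_grad():
        X = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        u_r = policy_net(X).squeeze(0).numpy()           # deterministic evaluation of the actor
    return u_a + u_r

# policy_net = torch.load("trained_policy.pt")           # hypothetical checkpoint
# u = control_step(policy_net, build_state(...), basic_controller(...))
```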
The trajectory tracking experiments are conducted on the 3-DOF CDPR shown in Fig. 4. The 3-DOF CDPR works in a cuboid with a length, width, and height of 3 m × 3 m × 2 m. The CDPR consists of three winding mechanisms, three cable-outlet devices, an end-effector, and three cables. The winding mechanism is mainly composed of a servo motor, a reducer, a drum, a drive belt, a lead screw, a power supply, a motor driver, and an encoder. The cable-outlet device is composed of a fisheye ceramic bearing, which can effectively reduce the friction of the cable and adapt the cable-outlet direction. The control system adopts a master–slave structure, which is composed of the upper computer (Intel i7-1165G CPU), the lower control card, a photoelectric encoder, and a tension sensor. The upper computer is mainly responsible for the implementation of the trajectory planning and control algorithms, which are programmed with Python in Visual Studio 2019. The lower computer is mainly responsible for information interaction with the motor driver, encoder, and upper computer, and its algorithms are programmed in the C language.

Fig. 5. Tracking comparison diagrams with two kinds of mass. Column (a) shows the tracking comparison diagrams when the mass is 2 kg. Column (b) shows the tracking comparison diagrams when the mass is 3 kg.

The trajectory tracking comparison diagrams of the experiments are shown in Fig. 5, in which the four figures in column (a) represent the tracking comparison diagrams for the four kinds of trajectories under the two controllers when the mass of the end-effector is 2 kg, and the four figures in column (b) represent the tracking comparison diagrams when the mass of the end-effector is 3 kg. In each figure, the black EXP curve represents the expected trajectory, the blue BC curve represents the tracking result under the basic controller, and the red RLC curve represents the tracking result under the RL-based controller. It can be seen from Fig. 5 that, for a given mass of the end-effector, the RL-based controller significantly improves the control performance for every kind of trajectory compared with the basic controller, and at the same time, when the mass of the end-effector changes, it still achieves better tracking performance than the basic controller.

As in the simulation, to make the experimental data more statistically significant, ten groups of experiments at different initial positions are conducted for each experimental situation, that is, a total of 160 experiments. The root-mean-square error (RMSE) of each cable tracking error and of each axis tracking error of the end-effector is calculated, respectively.
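The RMSE values reported in Tables II–V can be reproduced from logged trajectories with a computation such as the one below; the array layout (time steps × channels) is an assumption about the logging format rather than something specified in the article.

```python
import numpy as np

def tracking_rmse(actual, expected):
    """RMSE per channel (per cable length or per end-effector axis).
    actual, expected: arrays of shape (num_steps, num_channels)."""
    err = np.asarray(actual) - np.asarray(expected)
    return np.sqrt(np.mean(err ** 2, axis=0))

def rmse_reduction(rmse_bc, rmse_rlc):
    """Percentage reduction of the RL-based controller (RLC) relative to the basic controller (BC)."""
    rmse_bc = np.asarray(rmse_bc, dtype=float)
    return 100.0 * (rmse_bc - np.asarray(rmse_rlc, dtype=float)) / rmse_bc
```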
TABLE II. RMSE COMPARISON IN OBLIQUE CIRCLE TRAJECTORY.

TABLE III. RMSE COMPARISON IN FLAT CIRCLE TRAJECTORY.

TABLE IV. RMSE COMPARISON IN OBLIQUE EIGHT-TYPE TRAJECTORY.

TABLE V. RMSE COMPARISON IN FLAT EIGHT-TYPE TRAJECTORY.

The results are shown in Tables II–V, in which BC represents the RMSE under the basic controller and RLC represents the RMSE under the RL-based controller. From Tables II–V, it can be seen that, in every experimental situation, the RMSEs of the cable errors and of the end-effector errors under the RL-based controller are all smaller than those under the basic controller.

The RMSEs of the three cable tracking errors and the RMSEs of the position tracking errors of the end-effector are, respectively, calculated based on the abovementioned Tables II–V, as shown in Figs. 6 and 7. Fig. 6 is the comparison diagram of the RMSEs when the end-effector mass is 2 kg, and Fig. 7 is the comparison diagram of the RMSEs when the end-effector mass is 3 kg. Figs. 6(a) and 7(a) are the comparison diagrams of the RMSEs of the cables, and Figs. 6(b) and 7(b) are the comparison diagrams of the RMSEs of the end-effector. The blue BC histogram is the result under the basic controller, and the red RLC histogram is the result under the RL-based controller.
Fig. 7. RMSEs in the four trajectories when the end-effector mass is 3 kg. (a) RMSEs of the cables. (b) RMSEs of the end-effector.

It can be seen from Figs. 6 and 7 that, whether the end-effector mass is 2 kg or 3 kg, the RMSEs of the tracking errors of the three axes and of the cable lengths under the RL-based controller are lower than those under the basic controller alone. The maximum reduction among all the RMSEs above is almost 30%, which further verifies that the proposed control algorithm can effectively solve the problem of control performance degradation caused by the uncertainties of the model parameters and by the variations of relevant parameters in the process of motion. Although certain tracking errors remain, the results still demonstrate the effectiveness and advantages of the proposed control algorithm. The tracking accuracy can be further improved if the neural network parameters are trained further.

V. CONCLUSION

In this article, an RL-based control algorithm has been proposed to suppress the negative impact of model uncertainties on the control performance of the system and, simultaneously, to enhance the adaptability to the mass of the end-effector.
Based on the basic controller, the setup and the implementation of the Lyapunov-based RL algorithm were given. More importantly, it was proved that the Lyapunov-based RL algorithm can ensure the exponential stability of the closed-loop tracking system. Finally, the results of the simulation and experiments confirmed the effectiveness and advantages of the proposed algorithm. In the future, the algorithm will be verified on CDPRs with more degrees of freedom.
REFERENCES

[1] L. L. Cone, "Skycam: An aerial robotic camera system," Byte, vol. 10, no. 10, pp. 122–132, 1985.
[2] P. Miermeister et al., "The cablerobot simulator large scale motion platform based on cable robot technology," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2016, pp. 3024–3029.
[3] B. Zi, N. Wang, S. Qian, and K. Bao, "Design, stiffness analysis and experimental study of a cable-driven parallel 3D printer," Mech. Mach. Theory, vol. 132, pp. 207–222, 2019.
[4] H. Li, J. Sun, G. Pan, and Q. Yang, "Preliminary running and performance test of the huge cable robot of FAST telescope," in Cable-Driven Parallel Robots. Berlin, Germany: Springer, 2018, pp. 402–414.
[5] X. Zhang, Y. Fang, and N. Sun, "Minimum-time trajectory planning for underactuated overhead crane systems with state and control constraints," IEEE Trans. Ind. Electron., vol. 61, no. 12, pp. 6915–6925, Dec. 2014.
[6] N. Sun, Y. Wu, X. Liang, and Y. Fang, "Nonlinear stable transportation control for double-pendulum shipboard cranes with ship-motion-induced disturbances," IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9467–9479, Dec. 2019.
[7] J. A. Dit Sandretto, D. Daney, and M. Gouttefarde, "Calibration of a fully-constrained parallel cable-driven robot," in Romansy 19-Robot Design, Dynamics and Control. Berlin, Germany: Springer, 2013, pp. 77–84.
[8] R. Chellal, E. Laroche, L. Cuvillon, and J. Gangloff, "An identification methodology for 6-DOF cable-driven parallel robots parameters: Application to the INCA 6D robot," in Cable-Driven Parallel Robots. Berlin, Germany: Springer, 2013, pp. 301–317.
[9] H. Bayani, M. T. Masouleh, and A. Kalhor, "An experimental study on the vision-based control and identification of planar cable-driven parallel robots," Robot. Auton. Syst., vol. 75, pp. 187–202, 2016.
[10] J. Lamaury, M. Gouttefarde, A. Chemori, and P.-E. Hervé, "Dual-space adaptive control of redundantly actuated cable-driven parallel robots," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2013, pp. 4879–4886.
[11] R. Babaghasabha, M. A. Khosravi, and H. D. Taghirad, "Adaptive control of KNTU planar cable-driven parallel robot with uncertainties in dynamic and kinematic parameters," in Cable-Driven Parallel Robots. Berlin, Germany: Springer, 2015, pp. 145–159.
[12] H. Ji, W. Shang, and S. Cong, "Adaptive synchronization control of cable-driven parallel robots with uncertain kinematics and dynamics," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8444–8454, Sep. 2021.
[13] M. A. Khosravi and H. D. Taghirad, "Robust PID control of fully-constrained cable driven parallel robots," Mechatronics, vol. 24, no. 2, pp. 87–97, 2014.
[14] W. Shang, F. Xie, B. Zhang, S. Cong, and Z. Li, "Adaptive cross-coupled control of cable-driven parallel robots with model uncertainties," IEEE Robot. Autom. Lett., vol. 5, no. 3, pp. 4110–4117, Jul. 2020.
[15] R. Babaghasabha, M. A. Khosravi, and H. D. Taghirad, "Adaptive robust control of fully-constrained cable driven parallel robots," Mechatronics, vol. 25, pp. 27–36, 2015.
[16] R. Babaghasabha, M. A. Khosravi, and H. D. Taghirad, "Adaptive robust control of fully constrained cable robots: Singular perturbation approach," Nonlinear Dyn., vol. 85, no. 1, pp. 607–620, 2016.
[17] H. Jia, W. Shang, F. Xie, B. Zhang, and S. Cong, "Second-order sliding-mode-based synchronization control of cable-driven parallel robots," IEEE/ASME Trans. Mechatronics, vol. 25, no. 1, pp. 383–394, Feb. 2020.
[18] T. Bian and Z.-P. Jiang, "Reinforcement learning and adaptive optimal control for continuous-time nonlinear systems: A value iteration approach," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 7, pp. 2781–2790, Jul. 2022.
[19] C. Wu, W. Yao, W. Pan, G. Sun, J. Liu, and L. Wu, "Secure control for cyber-physical systems under malicious attacks," IEEE Trans. Control Netw. Syst., vol. 9, no. 2, pp. 775–788, Jun. 2022.
[20] M. Han and B. Zhang, "Control of robotic manipulators using a CMAC-based reinforcement learning system," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., vol. 3, 1994, pp. 2117–2122.
[21] A. Kumar and R. Sharma, "Linguistic Lyapunov reinforcement learning control for robotic manipulators," Neurocomputing, vol. 272, pp. 84–95, 2018.
[22] B. Kim, J. Park, S. Park, and S. Kang, "Impedance learning for robotic contact tasks using natural actor-critic algorithm," IEEE Trans. Syst., Man, Cybern. B, vol. 40, no. 2, pp. 433–443, Apr. 2010.
[23] Q. Zhang, W. Pan, and V. Reppa, "Model-reference reinforcement learning for collision-free tracking control of autonomous surface vehicles," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 8770–8781, Jul. 2022.
[24] B. Zhang, W. Shang, S. Cong, and Z. Li, "Coordinated dynamic control in the task space for redundantly actuated cable-driven parallel robots," IEEE/ASME Trans. Mechatronics, vol. 26, no. 5, pp. 2396–2407, Oct. 2021.
[25] M. Han, L. Zhang, J. Wang, and W. Pan, "Actor-critic reinforcement learning for control with stability guarantee," IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 6217–6224, Oct. 2020.
[26] W. Shang, B. Zhang, B. Zhang, F. Zhang, and S. Cong, "Synchronization control in the cable space for cable-driven parallel robots," IEEE Trans. Ind. Electron., vol. 66, no. 6, pp. 4544–4554, Jun. 2019.
[27] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1861–1870.
[28] L. Hu, C. Wu, and W. Pan, "Lyapunov-based reinforcement learning state estimator," 2020, arXiv:2010.13529.
[29] Y. Tang, L. Hu, Q. Zhang, and W. Pan, "Reinforcement learning compensated extended Kalman filter for attitude estimation," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2021, pp. 6854–6859.

Yanqi Lu received the bachelor's (with honors) degree and the master's degree in control science and engineering from the School of Astronautics, Harbin Institute of Technology (HIT), Harbin, China, in 2020 and 2022, respectively. He is currently working toward the Ph.D. degree in control science and engineering with the School of Astronautics, HIT. His research interests include control and trajectory planning of cable-driven parallel robots and reinforcement learning.

Chengwei Wu received the B.S. degree in management from the Arts and Science College, Bohai University, Jinzhou, China, in 2013, the M.S. degree in software engineering from Bohai University in 2016, and the Ph.D. degree in control science and engineering from the Harbin Institute of Technology, Harbin, China, in 2021. He is currently an Assistant Professor with the Harbin Institute of Technology. From July 2015 to December 2015, he was a Research Assistant with the Department of Mechanical Engineering, The Hong Kong Polytechnic University, Hong Kong. From 2019 to 2021, he was a joint Ph.D. student with the Department of Cognitive Robotics, Delft University of Technology, Delft, The Netherlands. His research interests include sliding mode control, reinforcement learning, and networked control systems.

Weiran Yao (Member, IEEE) received the bachelor's (with honors), master's, and doctor's degrees in aeronautical and astronautical science and technology from the School of Astronautics, Harbin Institute of Technology (HIT), Harbin, China, in 2013, 2015, and 2020, respectively. He is currently an Associate Professor with the School of Astronautics, HIT. From 2017 to 2018, he was a visiting Ph.D. student with the Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada. His research interests include unmanned vehicles, multirobot mission planning, and multiagent control systems.
Guanghui Sun (Senior Member, IEEE) received the B.S. degree in automation and the M.S. and Ph.D. degrees in control science and engineering from the Harbin Institute of Technology, Harbin, China, in 2005, 2007, and 2010, respectively. He is currently a Professor with the Department of Control Science and Engineering, Harbin Institute of Technology. His research interests include fractional-order systems, networked control systems, and sliding mode control.

Jianxing Liu (Senior Member, IEEE) received the B.S. degree in mechanical engineering and the M.E. degree in control science and engineering from the Harbin Institute of Technology, Harbin, China, in 2004 and 2010, respectively, and the Ph.D. degree in automation from the Technical University of Belfort-Montbeliard, Belfort, France, in 2014. He is currently a Professor with the Department of Control Science and Engineering. His current research interests include nonlinear control and observation, industrial electronics, and renewable energy solutions. Dr. Liu is currently an Associate Editor for several journals, including the International Society of Automation Transactions and the IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN INDUSTRIAL ELECTRONICS.

Ligang Wu (Fellow, IEEE) received the B.S. degree in automation, the M.E. degree in navigation guidance and control, and the Ph.D. degree in control theory and control engineering, all from the Harbin Institute of Technology, Harbin, China, in 2001, 2003, and 2006, respectively. From January 2006 to April 2007, he was a Research Associate with the Department of Mechanical Engineering, The University of Hong Kong, Hong Kong. From September 2007 to June 2008, he was a Senior Research Associate with the Department of Mathematics, City University of Hong Kong, Hong Kong. From December 2012 to December 2013, he was a Research Associate with the Department of Electrical and Electronic Engineering, Imperial College London, London, U.K. In 2008, he joined the Harbin Institute of Technology, China, as an Associate Professor, and he was promoted to Full Professor in 2012. He has authored or coauthored seven research monographs and more than 170 research papers in international refereed journals. His current research interests include switched systems, stochastic systems, computational and intelligent systems, sliding mode control, and advanced control techniques for power electronic systems. Prof. Wu was the recipient of the National Science Fund for Distinguished Young Scholars in 2015 and received the China Young Five Four Medal in 2016. He was named Distinguished Professor of Chang Jiang Scholar in 2017 and was named a Highly Cited Researcher in 2015–2019. He currently serves as an Associate Editor for a number of journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, IEEE/ASME TRANSACTIONS ON MECHATRONICS, IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, Information Sciences, Signal Processing, and IET Control Theory and Applications. He is an Associate Editor for the Conference Editorial Board, IEEE Control Systems Society. He is also a Fellow of the IEEE.