Deep Reinforcement Learning Control of Fully-Constrained Cable-Driven Parallel Robots
Abstract—Cable-driven parallel robots (CDPRs) have complex cable dynamics and working environment uncertainties, which bring challenges to the precise control of CDPRs. This article introduces reinforcement learning to offset the negative effect of these uncertainties on the control performance of CDPRs. The problem of controller design for CDPRs in the framework of deep reinforcement learning is investigated. A learning-based control algorithm is proposed to compensate for uncertainties due to cable elasticity, mechanical friction, etc. A basic control law is given for the nominal model, and a Lyapunov-based deep reinforcement learning control law is designed. Moreover, the stability of the closed-loop tracking system under the reinforcement learning algorithm is proved. Both simulations and experiments validate the effectiveness and advantages of the proposed control algorithm.

Index Terms—Cable-driven parallel robots (CDPRs), deep reinforcement learning, parameter uncertainties.

Manuscript received 3 May 2022; revised 27 July 2022; accepted 17 August 2022. Date of publication 9 September 2022; date of current version 17 February 2023. This work was supported in part by the National Science Foundation of China under Grants 62033005, 62203136, 62022030, 62173107, and 62106062; in part by the Natural Science Foundation of Heilongjiang Province under Grant ZD2021F001; in part by the Sichuan Province Science and Technology Support Program under Grant 2021YFSY0026; in part by the China Postdoctoral Science Foundation under Grants 2021M701007 and 2021TQ0091; in part by the Fundamental Research Funds for the Central Universities under Grant HIT.OCEF.2021005; and in part by the Postdoctoral Science Foundation of Heilongjiang Province under Grant LBH-Z21059. (Corresponding author: Chengwei Wu.) The authors are with the School of Astronautics, Harbin Institute of Technology, Harbin 150001, China. Digital Object Identifier 10.1109/TIE.2022.3203763.
I. INTRODUCTION

CABLE-DRIVEN parallel robots (CDPRs) drive the end-effector to move in a large working space by using cables. The winding mechanisms of CDPRs are fixed on the ground or on worktables, reducing the overall motion load and achieving higher motion speed. Compared with rigid parallel robots and rigid manipulators, cables can effectively decrease damage when accidents happen; for example, the breakage of a rigid manipulator or a rigid parallel robot link will cause huge damage to the human body and property, while cables cause less damage. In view of these advantages, CDPRs are widely used in various scenarios, such as the Skycam system [1], a cable robot simulator for human perception research [2], a 6-DOF CDPR for 3D printing [3], a feed drive device for the large radio telescope [4], and cranes and storage equipment for cargo handling [5], [6]. However, cables lead to uncertain parameters in the dynamic model, which brings challenges to controller design for CDPRs.

Accurate CDPR parameters can be efficiently obtained by using calibration equipment. A laser tracker was chosen as the measurement device to calibrate the geometric parameters of CDPRs in [7]. A vision system consisting of six cameras was used to calibrate the kinematic and dynamic parameters of CDPRs in [8]. A high-speed charge-coupled device camera was used to measure the position of CDPRs in [9]. The abovementioned methods work well in simple scenarios, but their effectiveness depends on the accuracy of the calibration equipment. The calibration accuracy of unknown parameters is also not guaranteed, which limits the application of these methods in some uncertain environments.
In view of these limitations, some control algorithms have been proposed to solve the CDPR control problem caused by model parameters with uncertainties. Adaptive control is an efficient method for handling parameter uncertainties, which adapts to uncertain parameters by designing an adaptive law. The design of the adaptive law is based on the description of the model and the uncertainties. For planar CDPRs, an adaptive controller was proposed to deal with the parameter uncertainties in the dynamic model in [10]. An adaptive dual-space control algorithm has been designed for space CDPRs subject to uncertain parameters in [11]. The asynchrony of multiple cables also leads to model parameter uncertainties, and effective synchronization of cables improves the performance of CDPR controllers. An adaptive synchronous control method was proposed to reduce the synchronization errors in [12], where both kinematic and dynamic uncertainties have been addressed.
To deal with more complex and implicit uncertainties, robust control methods have been proposed. Less information about the model and the uncertainties is required in the design of a robust controller; for example, only the boundary conditions and types of the uncertainties need to be known. Khosravi et al. [13] proposed a robust proportional-integral-derivative (PID) control algorithm for fully-constrained CDPRs.
For the fully-constrained CDPRs, Shang et al. [14] also proposed an adaptive robust control method to suppress unmodeled dynamics and external disturbances, which has better adaptability to uncertainties than the robust PID. Furthermore, Babaghasabha et al. [15] proposed a composite controller that requires neither the lower nor the upper bounds of the uncertainties, which is more general.

To further improve the control performance, sliding mode control methods for CDPRs have been presented. An adaptive sliding mode control method was proposed in [16], where neither the upper bound of the unknown uncertainties nor a linear regression form is required in the design process. Considering the tracking synchronization of multiple cables, a new synchronous control method based on second-order sliding mode was proposed in [17].

The complex cable dynamics and working environment uncertainties of CDPRs bring challenges to the precise control of CDPRs. Some explicit uncertainties are mainly caused by the transmission friction, the transmission ratio of the winding mechanism, the accuracy of the sensors, and the location of the center of gravity of the end-effector. In addition, there are implicit uncertainties mainly caused by cable elasticity, mass change, and sloshing of the end-effector. Limited by the identification accuracy of the system model and model parameters, traditional control methods have difficulties dealing with both the explicit and the implicit uncertainties. In contrast, model-free control based on reinforcement learning (RL) can effectively deal with the abovementioned uncertainties and solve the CDPR control problem with unknown models. Many applications of RL can be found in the literature [18], [19], where the system models were described by differential equations. Compared with the abovementioned RL results, deep RL, where the system dynamics is described by a Markov decision process, provides a more general framework to learn a controller (a.k.a. policy). Deep RL shows great potential in designing robot controllers under uncertain environments, such as robotic arms [20], [21], unmanned surface vehicles, and drones [22], [23].

Motivated by the abovementioned discussions and the advantages of deep RL, this article intends to propose a learning control framework for CDPRs. According to existing results, a nominal model of CDPRs and a basic control law are given. Then, a Markov decision process is constructed for the closed-loop system. A learning control algorithm is proposed by combining a deep RL algorithm and the Lyapunov function. Finally, both simulation and experimental results are provided to demonstrate the effectiveness and advantages of the proposed RL-based control algorithm. The main contributions of this article are summarized as follows.

1) The dynamics of CDPRs are described as a Markov decision process, based on which a learning-based control algorithm is first proposed for CDPRs. Compared with existing results [13], [24], such an algorithm can achieve the desired control performance without identifying exact system parameters.

2) The convergence of the learning algorithm is guaranteed [23], [25], and further the stability of CDPRs under the proposed learning control algorithm is proved by introducing the Lyapunov function in the learning algorithm.
The rest of this article is organized as follows. Section II provides the CDPR model and problem formulation. Section III presents the setup and implementation of the RL algorithm. Section IV presents the results of the simulation and experiments. Section V concludes this article.

II. CDPRS MODEL AND PROBLEM FORMULATION

A. CDPRs Model

For CDPRs, the displacement vector of the end-effector is defined by p_e = [x_p, y_p, z_p]^T, and ψ_e = [α_p, β_p, γ_p]^T is defined as its rotation vector; thus, the motion vector of the end-effector is defined by x = [p_e^T, ψ_e^T]^T. Based on [24], the overall dynamic model of the CDPR is described as

(M(x) + J^T R_T^{-1} I_m R_T^{-1} J) ẍ + (J^T R_T^{-1} I_m R_T^{-1} J̇ + C(x, ẋ) + J^T R_T^{-1} F_v R_T^{-1} J) ẋ + J^T R_T^{-1} F_c sign(l̇) + G = J^T R_T^{-1} u    (1)

where u is the motor torque taken as the input and x is the output. M(x) is the positive definite symmetric inertia matrix, l is the cable length vector, and J and J^T are the Jacobian matrix and its transpose, respectively. R_T is the gear ratio from motor angle to cable length. I_m, F_v, and F_c are the inertia matrix, viscous friction matrix, and Coulomb friction matrix of the winding mechanism, respectively. C(x, ẋ) is the Coriolis and centrifugal matrix, G is the gravity vector, and ẋ and ẍ are the velocity and acceleration vectors of the end-effector, respectively.
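To make the structure of (1) concrete, the sketch below assembles the nominal task-space terms and solves for the motor torque that realizes a desired end-effector acceleration (nominal inverse dynamics). All matrices and vectors passed in are illustrative placeholders rather than identified parameters of the robot studied here, and the J̇ term is omitted for brevity.

```python
import numpy as np

def nominal_torque(M, C, G, J, R_T, I_m, F_v, F_c, x_ddot, x_dot, l_dot):
    """Solve the nominal model (1) for u, given a desired acceleration x_ddot.
    The J_dot term is neglected in this sketch."""
    R_inv = np.linalg.inv(R_T)
    A = M + J.T @ R_inv @ I_m @ R_inv @ J            # coefficient of x_ddot
    B = C + J.T @ R_inv @ F_v @ R_inv @ J            # velocity-dependent coefficient
    lhs = A @ x_ddot + B @ x_dot + J.T @ R_inv @ F_c @ np.sign(l_dot) + G
    # Solve J^T R_T^{-1} u = lhs in the least-squares sense (J^T R_T^{-1} need not be square).
    u, *_ = np.linalg.lstsq(J.T @ R_inv, lhs, rcond=None)
    return u
```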
B. Problem Formulation

The overall dynamic model of CDPRs in (1) considers nonlinear factors, but it ignores the parameter uncertainties. Thus, the model in (1) can be regarded as a nominal counterpart. Considering uncertain parameters in CDPRs, the dynamics can be described as follows:

(M_U + J^T R_TU^{-1} I_mU R_TU^{-1} J) ẍ + (J^T R_TU^{-1} I_mU R_TU^{-1} J̇ + C + J^T R_TU^{-1} F_vU R_TU^{-1} J) ẋ + J^T R_TU^{-1} F_cU sign(l̇) + G_U = J^T R_TU^{-1} u    (2)

where M_U, R_TU, I_mU, F_vU, F_cU, and G_U represent model parameters with uncertainties, which are mainly caused by errors in parameter identification and by variations during the movement process. They are expressed as

M_U = M + ΔM,  R_TU = R_T + ΔR_T,  I_mU = I_m + ΔI_m,  F_vU = F_v + ΔF_v,  F_cU = F_c + ΔF_c,  G_U = G + ΔG    (3)

where M, R_T, I_m, F_v, F_c, and G are the nominal values of the parameters.
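For simulation and robustness studies, the uncertain model (2)–(3) can be generated by perturbing the nominal parameters. The sketch below draws bounded random perturbations ΔM, ΔR_T, and so on; the 10% relative bound is an arbitrary illustrative choice, not a value taken from this article.

```python
import numpy as np

def perturb_parameters(nominal, rel_bound=0.1, rng=None):
    """Return uncertain parameters as in (3): value + Delta, with
    Delta drawn uniformly from [-rel_bound, rel_bound] * |value| elementwise."""
    rng = np.random.default_rng() if rng is None else rng
    uncertain = {}
    for name, value in nominal.items():
        value = np.asarray(value, dtype=float)
        delta = rng.uniform(-rel_bound, rel_bound, size=value.shape) * np.abs(value)
        uncertain[name] = value + delta
    return uncertain
```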
Fig. 1. Control diagram of the RL-based control algorithm.

To solve the problem caused by parameter uncertainties, as shown in (2), this article investigates how to design a learning-based control algorithm to achieve the desired control performance. Fig. 1 shows the RL-based control diagram, from which we can see that the control signal u is obtained by

u = u_a + u_r    (4)

where u_a is a basic controller and u_r denotes a control signal to be learned. For the basic controller, we can choose one from the existing literature; for example, the control method in [26] is chosen in this article:

u_a = R_T T_exp + I_m R_T^{-1} l̈_exp + F_v R_T^{-1} l̇_exp + F_c sign(l̇) + K_p e_l + K_d ė_l    (5)

where l̇_exp and l̈_exp represent the expected cable velocity and acceleration, l̇ represents the actual cable velocity, T_exp is the expected tension of the cable, and e_l and ė_l represent the errors of the cable length and cable speed, respectively.
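A minimal sketch of the composite control law (4)–(5) is given below. The gains K_p and K_d, the feedforward terms, and the rl_policy callable that produces the learned correction u_r are assumed inputs; only the structure of (4) and (5) is taken from the text.

```python
import numpy as np

def basic_controller(R_T, I_m, F_v, F_c, K_p, K_d,
                     T_exp, l_ddot_exp, l_dot_exp, l_dot, e_l, e_l_dot):
    """Basic control law u_a in (5)."""
    R_inv = np.linalg.inv(R_T)
    return (R_T @ T_exp + I_m @ R_inv @ l_ddot_exp + F_v @ R_inv @ l_dot_exp
            + F_c @ np.sign(l_dot) + K_p @ e_l + K_d @ e_l_dot)

def composite_control(u_a, state, rl_policy):
    """Composite law (4): u = u_a + u_r, with u_r produced by the learned policy."""
    return u_a + rl_policy(state)
```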
Remark 1: According to existing results [9], [12], control schemes should be tailored to CDPRs. If uncertain parameters cannot be effectively covered by the designed control scheme (i.e., the basic controller used in this article) or the running environment of the CDPR changes, the performance can be degraded. To compensate for such uncertainties and improve the adaptability, an RL-based controller is provided. In addition, the basic controller is utilized to generate effective training data for improving training efficiency.

The purpose of this article is to design a learning algorithm to learn u_r, using which the performance of CDPRs with uncertain parameters can be preserved. Next, we introduce in detail how to establish the RL framework to learn u_r.

III. RL-BASED CONTROL ALGORITHM

A. Markov Decision Process of CDPRs

The agent and the environment interact with each other in the RL process, and this interaction is generally described by a Markov decision process. A Markov decision process is generally described by a five-tuple (S, U, P, C, γ), where S is the state space, U is the action space, P is the state transition probability, C is the control cost, and γ is the discount factor. Based on the tracking errors of the CDPR, the state of the Markov decision process is chosen as

X(m) = [L_1, L_2, L_3, L_4, L_5, L_6, ..., L_{2n+11}, L_{2n+12}]^T

subject to

L_1 = x_p(m) − x_pe(m),  L_2 = dx_p(m) − dx_pe(m)
L_3 = y_p(m) − y_pe(m),  L_4 = dy_p(m) − dy_pe(m)
L_5 = z_p(m) − z_pe(m),  L_6 = dz_p(m) − dz_pe(m)
L_7 = α_p(m) − α_pe(m),  L_8 = dα_p(m) − dα_pe(m)
L_9 = β_p(m) − β_pe(m),  L_{10} = dβ_p(m) − dβ_pe(m)
L_{11} = γ_p(m) − γ_pe(m),  L_{12} = dγ_p(m) − dγ_pe(m)
L_{13} = l_1(m) − l_1e(m),  L_{14} = dl_1(m) − dl_1e(m)
  ⋮
L_{2n+11} = l_n(m) − l_ne(m),  L_{2n+12} = dl_n(m) − dl_ne(m)

where L_i, i = 1, 2, ..., 2n+12, are the error terms. x_p(m), y_p(m), z_p(m), α_p(m), β_p(m), and γ_p(m) are the discrete variables of the position and rotation of the end-effector, and l_i(m), i = 1, 2, ..., n, are the discrete variables of the cable lengths. d[·] denotes the differential of [·]; for example, dx_p(m) is the differential of x_p(m), standing for its rate of change. [·]_e(m) and [·](m) are, respectively, the expected value and the actual value of the abovementioned variables; for example, x_pe(m) represents the expected value and x_p(m) represents the actual value.

Then, the Markov decision process is described as

X(m+1) ∼ P(X(m+1) | X(m), u_r(m))

where X(m) ∈ S, u_r(m) ∈ U, and the probability of transitioning to state X(m+1) after taking action u_r(m) at state X(m) is P(X(m+1) | X(m), u_r(m)).
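As a concrete illustration of the state defined above, the sketch below stacks the position, orientation, and cable-length tracking errors together with their rates into the (2n+12)-dimensional vector X(m). The argument names are hypothetical and simply mirror the quantities in the text.

```python
import numpy as np

def build_state(pose, pose_exp, pose_rate, pose_rate_exp,
                l, l_exp, l_rate, l_rate_exp):
    """X(m): interleaved errors [L1, L2, ..., L_{2n+12}].
    pose = [x_p, y_p, z_p, alpha_p, beta_p, gamma_p]; l holds the n cable lengths."""
    X = []
    for act, exp, d_act, d_exp in zip(pose, pose_exp, pose_rate, pose_rate_exp):
        X += [act - exp, d_act - d_exp]          # L1 ... L12
    for act, exp, d_act, d_exp in zip(l, l_exp, l_rate, l_rate_exp):
        X += [act - exp, d_act - d_exp]          # L13 ... L_{2n+12}
    return np.asarray(X)
```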
Based on the Markov decision process established previously, a Lyapunov-based soft actor-critic algorithm [27] is proposed as the learning control framework for CDPRs. The detailed derivation is given as follows.

B. Setup of the RL

The control cost C(m) is described as

C(m) = X^T(m) D_r X(m)    (6)

where D_r is a positive definite weight matrix.

Based on (6), the action-value function (Q-function) is expressed as

Q_{π_r}(X(m), u_r(m)) = γ E_{X(m+1)}[V_{π_r}(X(m+1))] + C(m)

where π_r is the policy to be learned and Q_{π_r}(X(m), u_r(m)) represents the value of taking the action u_r(m) at the state X(m) under the policy π_r. V_{π_r}(X(m+1)) is the state-value function, which is the value of reaching the state X(m+1) under the policy π_r. E_{X(m+1)}[V_{π_r}(X(m+1))] is the expectation over the distribution of X(m+1):

E_{X(m+1)}[V_{π_r}(X(m+1))] = ∫_{X(m+1)∈S} P(X(m+1) | X(m), u_r(m)) V_{π_r}(X(m+1)) dX(m+1)
where π_r(u_r(m) | X(m)) represents the probability of taking the action u_r(m) at the state X(m) under the policy π_r.

The goal of the RL algorithm is to find an optimal policy that minimizes the value of the Q-function

π_r^* = arg min_{π_r} Q_{π_r}(X(m), u_r(m))    (7)

where π_r^* represents the optimal policy.

By introducing the entropy term −α_r H_r(π_r(u_r(m+1) | X(m+1))), the control cost is minimized and the entropy of the action space is maximized at the same time, which makes the training process more efficient. Based on this, Q_{π_r}(X(m), u_r(m)) is expressed as

Q_{π_r}(X(m), u_r(m)) = γ E_{X(m+1)}[V_{π_r}(X(m+1)) − α_r H_r(π_r(u_r(m+1) | X(m+1)))] + C(m)    (8)

where α_r is a coefficient reflecting the importance of the entropy H_r in the Q-function. The entropy H_r(π_r(u_r(m+1) | X(m+1))) is the expected negative log-probability of the action under the policy π_r.

Based on the entropy H_r, the optimization problem in (7) is updated as

π_r^* = arg min_{π_r ∈ N_r} ( C(m) + γ E_{X(m+1)}[V_{π_r}(X(m+1)) − α_r H_r(π_r(u_r(m+1) | X(m+1)))] )

where N_r represents the set of all optional policies.

To learn the optimal policy, two steps need to be executed repeatedly until training is done: 1) policy evaluation and 2) policy improvement.

Policy evaluation is realized through the Bellman backup operator T^{π_r}. Thus, Q_{π_r}(X(m), u_r(m)) is calculated by

T^{π_r} Q_{π_r}(X(m), u_r(m)) = C(m) + γ E_{X(m+1)}[V_{π_r}(X(m+1))]

where V_{π_r}(X(m)) is calculated by

V_{π_r}(X(m)) = E_{π_r}[Q_{π_r}(X(m), u_r(m)) + α_r ln(π_r(u_r(m) | X(m)))].

Policy improvement is realized by the following equation:

π_r^{new} = arg min_{π_r ∈ N_r} D_KL( π_r(· | X(m)) ∥ exp(−(1/α_r) Q^{π_r^{old}}(X(m), ·)) / Z^{π_r^{old}} )

where π_r^{old} denotes the policy from the last round, π_r^{new} is the new one, Q^{π_r^{old}} is the Q-value under the policy π_r^{old}, D_KL denotes the Kullback–Leibler divergence, and Z^{π_r^{old}} is the normalization factor.
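For a finite set of candidate actions, the minimizer of the Kullback–Leibler divergence above has the closed form π_r^{new}(u_r | X(m)) ∝ exp(−Q^{π_r^{old}}(X(m), u_r)/α_r). The short sketch below evaluates this projection; the discretization is only for illustration, since the CDPR action space is continuous.

```python
import numpy as np

def improved_policy(q_values, alpha_r=0.2):
    """Closed-form KL projection over discrete candidate actions:
    pi_new(u|X) = exp(-Q(X, u)/alpha_r) / Z  (cost convention: small Q -> large probability)."""
    logits = -np.asarray(q_values, dtype=float) / alpha_r
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: improved_policy([1.0, 0.5, 2.0]) puts the largest probability on the second action.
```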
In the general RL training process, the optimal policy is obtained by building two sets of deep neural networks that estimate Q_{π_r,β_r}(X(m), u_r(m)) and π_{r,φ_r}(u_r(m) | X(m)), where β_r and φ_r are the parameters of the two neural networks, respectively.

The RL algorithm proposed in this article can further guarantee the stability of the closed-loop tracking system by introducing a Lyapunov function. Following the relevant existing results [28], [29], the action-value function Q_{π_r}(X(m), u_r(m)) is selected as the Lyapunov function. If Q_{π_r}(X(m), u_r(m)) satisfies the following inequality during the training process, then the stability of the tracking system can be guaranteed:

Q_{π_r}(X(m+1), u_r(m+1)) − Q_{π_r}(X(m), u_r(m)) < −λ C(m)

where λ is a constant during the training process.

Due to the Lyapunov function, the optimal policy is obtained by solving the following constrained optimization problem:

π_r^* = arg min_{π_r ∈ N_r} E_{π_r}[ Q_{π_r}(X(m), u_r(m)) + α_r ln(π_r(u_r(m) | X(m))) ].

To obtain the abovementioned optimal policy, two sets of deep neural networks are established for estimating Q_{π_r,δ_r}(X(m), u_r(m)) and π_{r,μ_r}(u_r(m) | X(m)), respectively, in accordance with the general RL training process, where δ_r and μ_r are the parameters of the two networks.

C. Updating Rules for Policy Gradient

The parameter δ_r of Q_{π_r,δ_r}(·) is obtained by minimizing the Bellman residual

J_Q(δ_r) = E_{X(m),u_r(m)∼F_r}[ (1/2) ( Q_{π_r,δ_r}(X(m), u_r(m)) − C(m) − γ E_{X(m+1)}[V_{π_r,δ_r}(X(m+1))] )^2 ]

where F_r represents the data set accumulated during training. The gradient estimate for the parameter δ_r is calculated by

∇_{δ_r} J_Q(δ_r) = ∇_{δ_r} Q_{π_r,δ_r}(X(m), u_r(m)) ( Q_{π_r,δ_r}(X(m), u_r(m)) − C(m) − γ Q_{π_r,δ_r}(X(m+1), u_r(m+1)) + γ α_r ln(π_{r,μ_r}(u_r(m+1) | X(m+1))) ).

Based on some mathematical techniques, the parameter μ_r of π_{r,μ_r}(·) is updated by minimizing

J_{π_r}(μ_r) = E_{π_r,μ_r}[ α_r ln(π_{r,μ_r}(f_{μ_r}(ε(m); X(m)) | X(m))) − Q_{π_r,δ_r}(X(m), f_{μ_r}(ε(m); X(m))) + ξ ( Q_{π_r,δ_r}(X(m+1), u_r(m+1)) − Q_{π_r}(X(m), u_r(m)) + λ C(m) ) ]

where f_{μ_r}(ε(m); X(m)) = u_r(m) and ε(m) is the sampled noise of the reparameterization.
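A compact sketch of the resulting critic and actor updates is given below in PyTorch-style code. The network objects, the policy.sample interface (returning a reparameterized action and its log-probability), and the numerical values of γ, α_r, ξ, and λ are illustrative assumptions, not the configuration used in this article; the next-state soft value follows the policy-evaluation step above.

```python
import torch
import torch.nn.functional as F

def critic_step(q_net, q_opt, policy, batch, gamma=0.99, alpha_r=0.2):
    """One gradient step that decreases the Bellman residual J_Q(delta_r) on a batch from F_r."""
    X, u, cost, X_next = batch
    with torch.no_grad():
        u_next, log_pi_next = policy.sample(X_next)
        target = cost + gamma * (q_net(X_next, u_next) + alpha_r * log_pi_next)
    loss = 0.5 * F.mse_loss(q_net(X, u), target)
    q_opt.zero_grad()
    loss.backward()
    q_opt.step()
    return loss.item()

def actor_step(policy, pi_opt, q_net, batch, alpha_r=0.2, xi=1.0, lam=0.5):
    """One gradient step on J_pi(mu_r), including the xi-weighted Lyapunov term."""
    X, u, cost, X_next = batch
    new_u, log_pi = policy.sample(X)                     # f_mu(eps; X) via reparameterization
    new_u_next, _ = policy.sample(X_next)
    lyapunov = q_net(X_next, new_u_next) - q_net(X, u).detach() + lam * cost
    loss = (alpha_r * log_pi - q_net(X, new_u) + xi * lyapunov).mean()
    pi_opt.zero_grad()
    loss.backward()
    pi_opt.step()
    return loss.item()
```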
Algorithm 1 (excerpt): 16: end for; 17: until training is done; 18: Output optimal parameters δ_r, μ_r.

The gradient estimate for the parameter μ_r is calculated by

∇_{μ_r} J_{π_r}(μ_r) = ∇_{μ_r} α_r ln(π_{r,μ_r}(u_r(m) | X(m))) + ( ∇_{u_r(m)} α_r ln(π_{r,μ_r}(u_r(m) | X(m))) − ∇_{u_r(m)} Q_{π_r}(X(m), u_r(m)) ) ∇_{μ_r} f_{μ_r}(ε(m); X(m)) + ξ ∇_{u_r(m+1)} Q_{π_r}(X(m+1), u_r(m+1)) ∇_{μ_r} f_{μ_r}(ε(m); X(m+1)).
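In practice, the gradient above is produced by automatic differentiation once the action is generated by a reparameterized policy network f_{μ_r}(ε(m); X(m)). A minimal Gaussian-tanh parameterization is sketched below; the layer sizes and clamping range are arbitrary placeholders, and its sample method is the interface assumed by the update sketches earlier.

```python
import math
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """u_r(m) = f_mu(eps; X) = tanh(mean(X) + std(X) * eps), with eps ~ N(0, I)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def sample(self, X):
        h = self.body(X)
        mean, log_std = self.mean(h), self.log_std(h).clamp(-5.0, 2.0)
        std = log_std.exp()
        eps = torch.randn_like(std)                      # the noise eps(m)
        pre_tanh = mean + std * eps                      # gradients flow back to mu_r
        u = torch.tanh(pre_tanh)
        log_pi = (-0.5 * ((pre_tanh - mean) / std) ** 2
                  - log_std - 0.5 * math.log(2.0 * math.pi)).sum(-1)
        log_pi = log_pi - torch.log(1.0 - u.pow(2) + 1e-6).sum(-1)   # tanh change of variables
        return u, log_pi
```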
The parameters α_r and ξ are adapted during the training process; their updating rules assign to α_r and ξ the values at which the following objectives reach their maximum:

J(α_r) = E_{π_r}[ α_r ln(π_r(u_r(m) | X(m))) + α_r H̄_r ]

J(ξ) = ξ E[ Q_{π_r,δ_r}(X(m+1), f_{μ_r}(ε(m); X(m+1))) − Q_{π_r}(X(m), u_r(m)) + λ C(m) ]

where u_r(m) = f_{μ_r}(ε(m); X(m)) and H̄_r represents the expected entropy.

Algorithm 1 summarizes how the abovementioned update rules are used to obtain the optimal parameters δ_r and μ_r.

Proof: Suppose that a Markov chain under a policy π_r has a unique stationary probability distribution q_{π_r}(X(m)), which can be represented as q_{π_r}(X(m)) = lim_{m→∞} P(X(m) | ρ, π_r, m). The sequence {(1/N) Σ_{m=0}^{N} P(X(m) | ρ, π_r, m), N ∈ Z_+} can be proved to converge by the Abelian theorem. Then it can be deduced that β_{π_r}(X(m)) = q_{π_r}(X(m)).

Based on the abovementioned conclusions, inequality (10) is rewritten into the following form:

∫_S lim_{N→∞} (1/N) Σ_{m=0}^{N} P(X(m) | ρ, π_r, m) ( E_{P_{π_r}(X(m+1)|X(m))}[Q(m+1)] − Q(m) ) dX(m) ≤ −ζ E_{X(m)∼q_{π_r}}[C(m)]    (12)

where Q(m) is the abbreviation of Q_{π_r}(X(m), u_r(m)) for simplicity of writing.

If the Lyapunov function Q(m) learned by RL is bounded, then two conclusions can be drawn. One is that P(X(m) | ρ, π_r, m) Q(m) is bounded for all X(m) ∈ S; the other is that the sequence {(1/N) Σ_{m=0}^{N} P(X(m) | ρ, π_r, m) Q(m)} converges to the function q_{π_r}(X(m)) Q(m).

According to Lebesgue's dominated convergence theorem, if a sequence f_n(X(m)) converges pointwise to a function f and is bounded by some integrable function h(X(m)), i.e., |f_n(X(m))| ≤ h(X(m)), ∀X(m) ∈ S, ∀n ∈ Z_+, then the following conclusion is drawn:

lim_{n→∞} ∫_S f_n(X(m)) dX(m) = ∫_S lim_{n→∞} f_n(X(m)) dX(m).
TABLE I. HYPERPARAMETERS FOR ALGORITHM 1.

For the simulated CDPR with three cables and three translational degrees of freedom, the state is chosen as X̃(m) = [L̃_1, L̃_2, ..., L̃_12]^T, subject to

L̃_1 = x_p(m) − x_pe(m),  L̃_2 = dx_p(m) − dx_pe(m)
L̃_3 = y_p(m) − y_pe(m),  L̃_4 = dy_p(m) − dy_pe(m)
L̃_5 = z_p(m) − z_pe(m),  L̃_6 = dz_p(m) − dz_pe(m)
L̃_7 = l_1(m) − l_1e(m),  L̃_8 = dl_1(m) − dl_1e(m)
L̃_9 = l_2(m) − l_2e(m),  L̃_10 = dl_2(m) − dl_2e(m)
L̃_11 = l_3(m) − l_3e(m),  L̃_12 = dl_3(m) − dl_3e(m).
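Given a recorded trajectory of this 12-dimensional error vector, the running cost (6) and its discounted accumulation can be evaluated as sketched below; the identity weight matrix and the discount factor are placeholder values, not the entries of Table I.

```python
import numpy as np

def control_cost(X, D_r):
    """Quadratic cost C(m) = X^T D_r X from (6)."""
    return float(X @ D_r @ X)

def discounted_cost(states, D_r=None, gamma=0.99):
    """Accumulate sum_m gamma^m * C(m) over a trajectory of states X(m)."""
    D_r = np.eye(len(states[0])) if D_r is None else D_r
    return sum((gamma ** m) * control_cost(np.asarray(X), D_r) for m, X in enumerate(states))
```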
B. Experiments

To further verify the correctness and effectiveness of the proposed control algorithm, experiments are conducted. The parameters of the basic controller and the control policy of the RL are consistent with those used in the simulation. The RL control policy used in this article does not need online training, and it can be directly deployed to the controller after offline training. In the experiments, the basic controller and the RL-based controller are used to track the four expected trajectories used in the simulation, respectively, and at the same time the mass of the end-effector is changed to verify that the proposed control algorithm has a certain adaptive ability with respect to the mass of the end-effector, that is, it still achieves better tracking performance when the mass of the end-effector changes.
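Because the policy is trained offline and frozen before deployment, the on-line loop only evaluates the learned network on top of the basic control law. A minimal deployment sketch follows; the checkpoint name and the helper functions in the usage comment are hypothetical.

```python
import numpy as np
import torch

def control_step(policy_net, state, u_a):
    """One control cycle of u = u_a + u_r with an offline-trained, frozen policy."""
    with torch.no_grad():
        X = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        u_r = policy_net(X).squeeze(0).numpy()           # deterministic evaluation of the actor
    return u_a + u_r

# policy_net = torch.load("trained_policy.pt")           # hypothetical checkpoint
# u = control_step(policy_net, build_state(...), basic_controller(...))
```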
The trajectory tracking experiments are conducted on the 3-DOF CDPR shown in Fig. 4. The 3-DOF CDPR works in a cuboid with a length, width, and height of 3 m × 3 m × 2 m. The CDPR consists of three winding mechanisms, three cable-outlet devices, an end-effector, and three cables. The winding mechanism is mainly composed of a servo motor, a reducer, a drum, a drive belt, a lead screw, a power supply, a motor driver, and an encoder. The cable-outlet device is composed of a fisheye ceramic bearing, which can effectively reduce the friction of the cable and adapt the cable-outlet direction. The control system adopts a master–slave structure, which is composed of the upper computer (Intel i7-1165G CPU), the lower control card, a photoelectric encoder, and a tension sensor. The upper computer is mainly responsible for the implementation of the trajectory planning and control algorithms, which are programmed with Python in Visual Studio 2019. The lower computer is mainly responsible for information interaction with the motor driver, encoder, and upper computer, and its algorithms are programmed in the C language.

Fig. 5. Tracking comparison diagrams with two kinds of mass. Column (a) shows the tracking comparison diagrams when the mass is 2 kg. Column (b) shows the tracking comparison diagrams when the mass is 3 kg.

The trajectory tracking comparison diagrams of the experiments are shown in Fig. 5, in which the four figures in column (a) represent the tracking comparison diagrams for the four kinds of trajectories under the two controllers when the mass of the end-effector is 2 kg, and the four figures in column (b) represent the tracking comparison diagrams when the mass of the end-effector is 3 kg. In each figure, the black EXP curve represents the expected trajectory, the blue BC curve represents the tracking result under the basic controller, and the red RLC curve represents the tracking result under the RL-based controller. It can be seen from Fig. 5 that, for a given mass of the end-effector, the RL-based controller significantly improves the control performance for every kind of trajectory compared with the basic controller, and at the same time, when the mass of the end-effector changes, it still achieves better tracking performance than the basic controller.

As in the simulation, to make the experimental data more statistically significant, ten groups of experiments at different initial positions are conducted for each experimental situation, that is, a total of 160 experiments. The root-mean-square error (RMSE) of each cable tracking error and of each axis tracking error of the end-effector is calculated, respectively.
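The RMSE values reported in Tables II–V can be reproduced from logged trajectories with a computation such as the one below; the array layout (time steps × channels) is an assumption about the logging format rather than something specified in the article.

```python
import numpy as np

def tracking_rmse(actual, expected):
    """RMSE per channel (per cable length or per end-effector axis).
    actual, expected: arrays of shape (num_steps, num_channels)."""
    err = np.asarray(actual) - np.asarray(expected)
    return np.sqrt(np.mean(err ** 2, axis=0))

def rmse_reduction(rmse_bc, rmse_rlc):
    """Percentage reduction of the RL-based controller (RLC) relative to the basic controller (BC)."""
    rmse_bc = np.asarray(rmse_bc, dtype=float)
    return 100.0 * (rmse_bc - np.asarray(rmse_rlc, dtype=float)) / rmse_bc
```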
TABLE II. RMSE COMPARISON IN OBLIQUE CIRCLE TRAJECTORY.

TABLE III. RMSE COMPARISON IN FLAT CIRCLE TRAJECTORY.

TABLE IV. RMSE COMPARISON IN OBLIQUE EIGHT-TYPE TRAJECTORY.

TABLE V. RMSE COMPARISON IN FLAT EIGHT-TYPE TRAJECTORY.

The results are shown in Tables II–V, in which BC represents the RMSE under the basic controller and RLC represents the RMSE under the RL-based controller. From Tables II–V, it can be seen that, in every experimental situation, the RMSEs of the cable errors and of the end-effector errors under the RL-based controller are all smaller than those under the basic controller.

The RMSEs of the three cable tracking errors and the RMSEs of the position tracking errors of the end-effector are, respectively, calculated based on the abovementioned Tables II–V, as shown in Figs. 6 and 7. Fig. 6 is the comparison diagram of the RMSEs when the end-effector mass is 2 kg, and Fig. 7 is the comparison diagram of the RMSEs when the end-effector mass is 3 kg. Figs. 6(a) and 7(a) are the comparison diagrams of the RMSEs of the cables, and Figs. 6(b) and 7(b) are the comparison diagrams of the RMSEs of the end-effector. The blue BC histogram is the result under the basic controller, and the red RLC histogram is the result under the RL-based controller.
Fig. 7. RMSEs in the four trajectories when the end-effector mass is 3 kg. (a) RMSEs of the cables. (b) RMSEs of the end-effector.

It can be seen from Figs. 6 and 7 that, whether the end-effector mass is 2 kg or 3 kg, the RMSEs of the tracking errors of the three axes and of the cable lengths under the RL-based controller are lower than those under the basic controller alone. The maximum reduction among all the RMSEs above is almost 30%, which further verifies that the proposed control algorithm can effectively solve the problem of control performance degradation caused by the uncertainties of the model parameters and by the variations of relevant parameters in the process of motion. Although certain tracking errors remain, the results still demonstrate the effectiveness and advantages of the proposed control algorithm. The tracking accuracy can be further improved if the neural network parameters are trained further.

V. CONCLUSION

In this article, an RL-based control algorithm has been proposed to suppress the negative impact of model uncertainties on the control performance of the system and, simultaneously, to enhance the adaptability to the mass of the end-effector.
Based on the basic controller, the setup and the implementation of the Lyapunov-based RL algorithm were given. More importantly, it was proved that the Lyapunov-based RL algorithm can ensure the exponential stability of the closed-loop tracking system. Finally, the results of the simulation and experiments confirmed the effectiveness and advantages of the proposed algorithm. In the future, the algorithm will be verified on CDPRs with more degrees of freedom.
REFERENCES

[1] L. L. Cone, "Skycam: An aerial robotic camera system," Byte, vol. 10, no. 10, pp. 122–132, 1985.
[2] P. Miermeister et al., "The cablerobot simulator large scale motion platform based on cable robot technology," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2016, pp. 3024–3029.
[3] B. Zi, N. Wang, S. Qian, and K. Bao, "Design, stiffness analysis and experimental study of a cable-driven parallel 3D printer," Mech. Mach. Theory, vol. 132, pp. 207–222, 2019.
[4] H. Li, J. Sun, G. Pan, and Q. Yang, "Preliminary running and performance test of the huge cable robot of FAST telescope," in Cable-Driven Parallel Robots. Berlin, Germany: Springer, 2018, pp. 402–414.
[5] X. Zhang, Y. Fang, and N. Sun, "Minimum-time trajectory planning for underactuated overhead crane systems with state and control constraints," IEEE Trans. Ind. Electron., vol. 61, no. 12, pp. 6915–6925, Dec. 2014.
[6] N. Sun, Y. Wu, X. Liang, and Y. Fang, "Nonlinear stable transportation control for double-pendulum shipboard cranes with ship-motion-induced disturbances," IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9467–9479, Dec. 2019.
[7] J. A. Dit Sandretto, D. Daney, and M. Gouttefarde, "Calibration of a fully-constrained parallel cable-driven robot," in Romansy 19-Robot Design, Dynamics and Control. Berlin, Germany: Springer, 2013, pp. 77–84.
[8] R. Chellal, E. Laroche, L. Cuvillon, and J. Gangloff, "An identification methodology for 6-DOF cable-driven parallel robots parameters: Application to the INCA 6D robot," in Cable-Driven Parallel Robots. Berlin, Germany: Springer, 2013, pp. 301–317.
[9] H. Bayani, M. T. Masouleh, and A. Kalhor, "An experimental study on the vision-based control and identification of planar cable-driven parallel robots," Robot. Auton. Syst., vol. 75, pp. 187–202, 2016.
[10] J. Lamaury, M. Gouttefarde, A. Chemori, and P.-E. Hervé, "Dual-space adaptive control of redundantly actuated cable-driven parallel robots," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2013, pp. 4879–4886.
[11] R. Babaghasabha, M. A. Khosravi, and H. D. Taghirad, "Adaptive control of KNTU planar cable-driven parallel robot with uncertainties in dynamic and kinematic parameters," in Cable-Driven Parallel Robots. Berlin, Germany: Springer, 2015, pp. 145–159.
[12] H. Ji, W. Shang, and S. Cong, "Adaptive synchronization control of cable-driven parallel robots with uncertain kinematics and dynamics," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8444–8454, Sep. 2021.
[13] M. A. Khosravi and H. D. Taghirad, "Robust PID control of fully-constrained cable driven parallel robots," Mechatronics, vol. 24, no. 2, pp. 87–97, 2014.
[14] W. Shang, F. Xie, B. Zhang, S. Cong, and Z. Li, "Adaptive cross-coupled control of cable-driven parallel robots with model uncertainties," IEEE Robot. Autom. Lett., vol. 5, no. 3, pp. 4110–4117, Jul. 2020.
[15] R. Babaghasabha, M. A. Khosravi, and H. D. Taghirad, "Adaptive robust control of fully-constrained cable driven parallel robots," Mechatronics, vol. 25, pp. 27–36, 2015.
[16] R. Babaghasabha, M. A. Khosravi, and H. D. Taghirad, "Adaptive robust control of fully constrained cable robots: Singular perturbation approach," Nonlinear Dyn., vol. 85, no. 1, pp. 607–620, 2016.
[17] H. Jia, W. Shang, F. Xie, B. Zhang, and S. Cong, "Second-order sliding-mode-based synchronization control of cable-driven parallel robots," IEEE/ASME Trans. Mechatronics, vol. 25, no. 1, pp. 383–394, Feb. 2020.
[18] T. Bian and Z.-P. Jiang, "Reinforcement learning and adaptive optimal control for continuous-time nonlinear systems: A value iteration approach," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 7, pp. 2781–2790, Jul. 2022.
[19] C. Wu, W. Yao, W. Pan, G. Sun, J. Liu, and L. Wu, "Secure control for cyber-physical systems under malicious attacks," IEEE Trans. Control Netw. Syst., vol. 9, no. 2, pp. 775–788, Jun. 2022.
[20] M. Han and B. Zhang, "Control of robotic manipulators using a CMAC-based reinforcement learning system," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., vol. 3, 1994, pp. 2117–2122.
[21] A. Kumar and R. Sharma, "Linguistic Lyapunov reinforcement learning control for robotic manipulators," Neurocomputing, vol. 272, pp. 84–95, 2018.
[22] B. Kim, J. Park, S. Park, and S. Kang, "Impedance learning for robotic contact tasks using natural actor-critic algorithm," IEEE Trans. Syst., Man, Cybern. B, vol. 40, no. 2, pp. 433–443, Apr. 2010.
[23] Q. Zhang, W. Pan, and V. Reppa, "Model-reference reinforcement learning for collision-free tracking control of autonomous surface vehicles," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 8770–8781, Jul. 2022.
[24] B. Zhang, W. Shang, S. Cong, and Z. Li, "Coordinated dynamic control in the task space for redundantly actuated cable-driven parallel robots," IEEE/ASME Trans. Mechatronics, vol. 26, no. 5, pp. 2396–2407, Oct. 2021.
[25] M. Han, L. Zhang, J. Wang, and W. Pan, "Actor-critic reinforcement learning for control with stability guarantee," IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 6217–6224, Oct. 2020.
[26] W. Shang, B. Zhang, B. Zhang, F. Zhang, and S. Cong, "Synchronization control in the cable space for cable-driven parallel robots," IEEE Trans. Ind. Electron., vol. 66, no. 6, pp. 4544–4554, Jun. 2019.
[27] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1861–1870.
[28] L. Hu, C. Wu, and W. Pan, "Lyapunov-based reinforcement learning state estimator," 2020, arXiv:2010.13529.
[29] Y. Tang, L. Hu, Q. Zhang, and W. Pan, "Reinforcement learning compensated extended Kalman filter for attitude estimation," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2021, pp. 6854–6859.

Yanqi Lu received the bachelor's (with honors) degree and the master's degree in control science and engineering from the School of Astronautics, Harbin Institute of Technology (HIT), Harbin, China, in 2020 and 2022, respectively. He is currently working toward the Ph.D. degree in control science and engineering with the School of Astronautics, HIT. His research interests include control and trajectory planning of cable-driven parallel robots and reinforcement learning.

Chengwei Wu received the B.S. degree in management from the Arts and Science College, Bohai University, Jinzhou, China, in 2013, the M.S. degree in software engineering from Bohai University in 2016, and the Ph.D. degree in control science and engineering from the Harbin Institute of Technology, Harbin, China, in 2021. He is currently an Assistant Professor with the Harbin Institute of Technology. From July 2015 to December 2015, he was a Research Assistant with the Department of Mechanical Engineering, The Hong Kong Polytechnic University, Hong Kong. From 2019 to 2021, he was a joint Ph.D. student with the Department of Cognitive Robotics, Delft University of Technology, Delft, The Netherlands. His research interests include sliding mode control, reinforcement learning, and networked control systems.

Weiran Yao (Member, IEEE) received the bachelor's (with honors), master's, and doctor's degrees in aeronautical and astronautical science and technology from the School of Astronautics, Harbin Institute of Technology (HIT), Harbin, China, in 2013, 2015, and 2020, respectively. He is currently an Associate Professor with the School of Astronautics, HIT. From 2017 to 2018, he was a visiting Ph.D. student with the Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada. His research interests include unmanned vehicles, multirobot mission planning, and multiagent control systems.
Guanghui Sun (Senior Member, IEEE) received the B.S. degree in automation and the M.S. and Ph.D. degrees in control science and engineering from the Harbin Institute of Technology, Harbin, China, in 2005, 2007, and 2010, respectively. He is currently a Professor with the Department of Control Science and Engineering, Harbin Institute of Technology. His research interests include fractional-order systems, networked control systems, and sliding mode control.

Jianxing Liu (Senior Member, IEEE) received the B.S. degree in mechanical engineering and the M.E. degree in control science and engineering from the Harbin Institute of Technology, Harbin, China, in 2004 and 2010, respectively, and the Ph.D. degree in automation from the Technical University of Belfort-Montbeliard, Belfort, France, in 2014. He is currently a Professor with the Department of Control Science and Engineering. His current research interests include nonlinear control and observation, industrial electronics, and renewable energy solutions. Dr. Liu is currently an Associate Editor for several journals, including the International Society of Automation Transactions and the IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN INDUSTRIAL ELECTRONICS.

Ligang Wu (Fellow, IEEE) received the B.S. degree in automation, the M.E. degree in navigation guidance and control, and the Ph.D. degree in control theory and control engineering, all from the Harbin Institute of Technology, Harbin, China, in 2001, 2003, and 2006, respectively. From January 2006 to April 2007, he was a Research Associate with the Department of Mechanical Engineering, The University of Hong Kong, Hong Kong. From September 2007 to June 2008, he was a Senior Research Associate with the Department of Mathematics, City University of Hong Kong, Hong Kong. From December 2012 to December 2013, he was a Research Associate with the Department of Electrical and Electronic Engineering, Imperial College London, London, U.K. In 2008, he joined the Harbin Institute of Technology, China, as an Associate Professor, and he was promoted to Full Professor in 2012. He has authored or coauthored seven research monographs and more than 170 research papers in international refereed journals. His current research interests include switched systems, stochastic systems, computational and intelligent systems, sliding mode control, and advanced control techniques for power electronic systems. Prof. Wu was the recipient of the National Science Fund for Distinguished Young Scholars in 2015 and received the China Young Five Four Medal in 2016. He was named Distinguished Professor of Chang Jiang Scholar in 2017 and was named a Highly Cited Researcher in 2015–2019. He currently serves as an Associate Editor for a number of journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, IEEE/ASME TRANSACTIONS ON MECHATRONICS, IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, Information Sciences, Signal Processing, and IET Control Theory and Applications. He is an Associate Editor for the Conference Editorial Board, IEEE Control Systems Society. He is also a Fellow of the IEEE.