
Learning a Controller for Soft Robotic Arms and Testing its Generalization to New Observations, Dynamics, and Tasks

Carlo Alessi¹,², Helmut Hauser³, Alessandro Lucantonio⁴, and Egidio Falotico¹,² (Member, IEEE)

Abstract-- Recently, learning-based controllers that leverage mechanical models of soft robots have shown promising results. This paper presents a closed-loop controller for dynamic trajectory tracking with a pneumatic soft robotic arm, learned via Deep Reinforcement Learning using Proximal Policy Optimization. The control policy was trained in simulation leveraging a dynamic Cosserat rod model of the soft robot. The generalization capabilities of learned controllers are vital for successful deployment in the real world, especially when the encountered scenarios differ from the training environment. We assessed the generalization capabilities of the controller in silico with four tests. The first test involved the dynamic tracking of trajectories that differ significantly in shape and velocity profile from the training data. Second, we evaluated the robustness of the controller to perpetual external end-point forces during dynamic tracking. Third, we assessed the generalization to similar materials in tracking tasks. Finally, we transferred the control policy without retraining to intercept a moving object with the end-effector. The learned control policy showed good generalization capabilities in all four tests.

Index Terms-- Modeling, Control, and Learning for Soft Robots; Learning and Adaptive Systems; Soft Robot Applications.

*This work was supported by the European Union's Horizon 2020 Research and Innovation Programme under the Specific Grant Agreement No. 945539 (Human Brain Project SGA3).
¹The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy (email: {c.alessi, e.falotico}@santannapisa.it).
²Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy.
³Department of Engineering Mathematics, University of Bristol, Bristol, UK (email: [email protected]).
⁴Department of Mechanical and Production Engineering, Aarhus University, Aarhus, Denmark (email: [email protected]).

I. INTRODUCTION

The modeling and control of continuum and soft robotic arms are still challenging problems due to hyper-redundancy, complex dynamics, and the non-linear properties of soft materials [1], [2]. Researchers have proposed several modeling techniques [3], which led to the development of a variety of model-based and model-free control strategies [4].

The most widely used approaches for deriving forward dynamics or kinematics models of continuum and soft robots have been geometrical models like the piece-wise constant curvature (PCC) approximation [5]. These models were used within proportional-derivative (PD) control laws for dynamic task-space control of a soft manipulator performing a variety of real-world tasks [6]. As an alternative, [7] described the shape of a synthetic planar soft robot analytically with a polynomial curvature model, which was used within an extended PD regulator to achieve perfect steady-state control in generic curvature conditions. However, the suitability of these model approximations degrades when the robot is subject to non-negligible external forces and unpredictable interactions with the environment.

Data-driven modeling and control are also viable approaches [8]. In the context of supervised learning, reservoir computing was used to emulate the nonlinear dynamics of a pneumatic soft robotic arm and to learn to reproduce trajectories [9]. Moreover, continual learning was proposed to tune the weights of a neural network-based controller to adapt to changes in the dynamics of a soft robotic arm due to loading conditions without catastrophic forgetting [10].

The success of reinforcement learning (RL) for behavior generation in rigid robots prompted interest in its application to continuum and soft robots [11]. The first applications of RL for soft robot control relied on discretized state-action spaces. The well-known Q-Learning algorithm was applied to train a model-free, open-loop, static controller for a multi-segment planar pneumatic soft arm [12]. The same algorithm was used to compare control policies learned in simulation and directly on the real soft robot subject to tip loads [13]. The SARSA algorithm was applied to obtain a static controller of position and stiffness for a hybrid soft robotic arm in a multi-agent setting, which considered the actuators as individual agents cooperating in a shared environment [14]. Following the recent advances in Deep RL, it is now also possible to consider continuous states and actuations. For example, Satheeshbabu et al. [15] learned, via Deep Q-Learning, transitions between way-points in quasi-static conditions with an open-loop position controller for a pneumatic soft robotic arm capable of bending and twisting, trained in simulation leveraging a Cosserat rod model. The same authors extended the work by increasing the dexterity of the soft robotic arm and attaining a closed-loop controller for precise quasi-static positioning via a Deep Deterministic Policy Gradient approach [16]. The authors validated both controllers on unseen payloads. Our work presented here, however, uses a dynamic (not just quasi-static) Cosserat rod model, subject to pressure-induced stretching and bending. In a similar work, Centurelli et al. [17] learned a closed-loop controller for dynamic trajectory tracking with a soft robotic arm via Trust Region Policy Optimization, leveraging an approximation of the robot forward dynamic model obtained by a recurrent neural network. In Naughton et al. [18], the authors applied several deep reinforcement learning algorithms to learn in simulation various control policies using a synthetic soft arm based on Cosserat theory. All these approaches confirm that Deep RL algorithms are suitable candidates for generating control policies for soft robots.

However, these works do not explore common problems for RL-based controllers: the ability to generalize to new observations, environment dynamics, and tasks [19]. In this work, we adopt an approach similar to [17], [18]. We propose a closed-loop controller for dynamic trajectory tracking tasks using a pneumatic soft robotic arm, trained via deep reinforcement learning using Proximal Policy Optimization. The control policy is learned in simulation leveraging a dynamic Cosserat rod model of the soft robot. However, we diversify the validation by producing different target velocity profiles to investigate in silico the generalization capabilities and limitations of the controller in various conditions and tasks. Specifically, we evaluate the generalization capabilities of the controller in silico with four tests: (i) tracking trajectories of different geometries and velocities; (ii) tracking trajectories subject to constant external forces applied to the end-effector; (iii) tracking trajectories using different material properties; and (iv) intercepting an object moving at various velocities towards the workspace. Note that all of them are carried out without retraining. The rest of the paper is organized as follows. Section II describes the soft robotic platform, the Cosserat rod model, and the training process to solve the control problem of dynamic tracking. Section III reports and discusses the results obtained for the four generalization tests. Section IV concludes with an insight into future works.

II. MATERIALS AND METHODS

A. Robotic Platform: The AM I-Support

The AM I-Support is a 3D-printed soft robotic arm with three elliptical pneumatic chambers that can generate large movements by combining stretching and bending [20]. As shown in Fig. 1a, two terminal plates (top and bottom) confine the modules, six rings distributed along the body constrain the chambers, and nuts and bolts assemble the parts. The soft robotic arm has a cross-section of radius 30 mm, an overall length of ∼202 mm, and an overall weight of ∼183 g. The pneumatic chambers are ∼180 mm long, while the top and bottom terminals are ∼20 mm and ∼5 mm long, respectively. The actuators are distributed axially, at a radial distance of 20 mm from the cross-section centroid, and equally spaced by 120° around the centre. The arm was fabricated using the soft thermoplastic polyurethane with 80 Shore A hardness (TPU 80 A LF, by BASF™), which is characterized by a tensile strength of 17 MPa and an elongation at break of 471%.

Fig. 1: Robotic platform. (a) The AM I-Support. (b) Rendering of the computational model.

B. Cosserat Rod Model

The soft robotic arm was modeled as a Cosserat rod with constant cross-section and homogeneous material properties, by extending the Cosserat theory introduced in [21] to account for the pneumatic actuation. A rod is described by a center-line x̄(s, t) ∈ R³ and an orthogonal rotation matrix Q(s, t) = {d̄₁, d̄₂, d̄₃}⁻¹. Here, t is time and s ∈ [0, L] is the material coordinate of a rod of length L. Q transforms vectors from the global frame to the local frame via x = Qx̄, and vice versa x̄ = Q⊺x. If the rod is unsheared, (d̄₁, d̄₂) spans the normal-binormal plane of the cross-section, and d̄₃ points along the center-line tangent (∂s x̄ = x̄s). The deformations that the rod can undergo are expressed by the shear/stretch vector σ(s, t) and the bend/twist vector κ(s, t). Shearing and stretching deviate d₃ from x̄s, with σ = Q(x̄s − d̄₃) = Qx̄s − d₃ in the local frame. The curvature vector κ encodes the rotation rate of Q along the material coordinate, ∂s dj = κ × dj, while the angular velocity ω is defined by ∂t dj = ω × dj. The rod dynamics is then governed by the following set of nonlinear differential equations:

\rho A \, \partial_t^2 \bar{x} = \partial_s\!\left( \frac{Q^\top S \sigma}{e} \right) + e \bar{f},    (1)

\frac{\rho I}{e} \, \partial_t \omega = \partial_s\!\left( \frac{B \kappa}{e^3} \right) + \frac{\kappa \times B \kappa}{e^3} + Q \frac{\bar{x}_s}{e} \times S \sigma + \rho I \, \frac{\omega}{e} \times \omega + \frac{\rho I \omega}{e^2} \, \partial_t e + e c,    (2)

where B is the bend/twist stiffness matrix, S is the shear/stretch stiffness matrix, ρ is the constant material density, A is the cross-sectional area, I is the second area moment of inertia, f̄ is the external force, c is the external couple, and e = |x̄s| is the local stretching.

The pressurization of the pneumatic chambers of the AM I-Support produces an internal force along d₃, normal to the rod cross-section, and a bending moment. To describe the deformations of the robot, we modeled the pressure-induced strains as spontaneous stretching and bending, modifying the rest configuration of the arm dynamically. The rod is subject to gravity, viscous forces, viscous torques, and external forces applied to the free end, which can be integrated into the body dynamics via f̄ and c in (1)-(2). The rest length of the rod and the cross-section radius were directly measured from the physical prototype [20]. We computed the effective cross-sectional area and the second area moment of inertia considering the actuator geometry. The material density was taken from the TPU datasheet, the Young modulus was fitted from experimental stretching data, and the damping coefficients were optimized on dynamic stretching and bending data. This computational environment was used for training and testing the controller.
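To make the kinematic quantities above concrete, the snippet below sketches how the local stretch e and the shear/stretch vector σ could be computed for a discretized rod. It is only an illustration of the definitions, not the simulation environment used in the paper; the array layout (directors stored as rows of Q) and the helper name are assumptions.

```python
import numpy as np

def shear_stretch(xbar, Q, rest_lengths):
    """Discrete Cosserat kinematics for a rod with n elements.

    xbar:         (n+1, 3) centerline node positions (global frame)
    Q:            (n, 3, 3) rotation matrices; rows are the directors d1, d2, d3
    rest_lengths: (n,) undeformed element lengths
    """
    tangents = np.diff(xbar, axis=0)              # element vectors between nodes
    xbar_s = tangents / rest_lengths[:, None]     # discrete derivative d(xbar)/ds
    e = np.linalg.norm(xbar_s, axis=1)            # local stretch e = |xbar_s|
    # sigma = Q xbar_s - d3, where d3 expressed in its own frame is (0, 0, 1)
    sigma = np.einsum("nij,nj->ni", Q, xbar_s) - np.array([0.0, 0.0, 1.0])
    return e, sigma
```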
C. Control Architecture

To solve a trajectory tracking task using this soft robotic arm, we adopt the closed-loop control scheme shown in Fig. 2.

Fig. 2: Control scheme, with z⁻¹ the discrete time-delay operator.

An arbitrary trajectory generator provides the reference position x^tar_{t+1} ∈ R³, i.e., the desired Cartesian position of the robot end-effector for the next time step t+1. The controller is implemented as a feed-forward neural network with two hidden layers, each with 64 neurons and tanh activation functions. The output layer has a linear activation function. The controller takes as input

x_t = \left[\, d_t,\ e_t,\ x^{\mathrm{tip}}_t,\ x^{\mathrm{tip}}_{t-1},\ x^{\mathrm{tip}}_{t-2} \,\right] \in \mathbb{R}^{15},    (3)

where d_t = x^tar_{t+1} − x^tip_t is the distance vector between the next desired position and the current free-end position of the arm x^tip_t, measured before the actuation, e_t = x^tar_t − x^tip_t is the current tracking error, and x^tip_{t−1} and x^tip_{t−2} are the two previous positions of the robot end-effector. The vector d_t therefore provides the controller with a minimal prediction of the future target position. The error vector e_t measures how well the tracking proceeds, in line with standard closed-loop controllers. Finally, x^tip_t, x^tip_{t−1}, and x^tip_{t−2} provide the controller with a simple short-term memory that allows it to infer the velocity and acceleration of the soft robotic arm. The controller outputs three pressure commands for the three chambers, p_t = [p_1, p_2, p_3], limited between 0 and 3.5 bar. The initial values are d_0 = x^tar_1 − x^tip_0, e_0 = 0, and x^tip_0 = x^tip_{−1} = x^tip_{−2} = [0, 0, −L]. The control loop operates at a frequency of 10 Hz, actuating the robot every ∆t = 0.1 s.
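The controller described above can be sketched as follows. This is a minimal PyTorch rendition of the stated architecture (two hidden layers of 64 tanh units, linear output, 15-dimensional input, three pressure outputs in [0, 3.5] bar); the function names and the way raw outputs are squashed into the pressure range are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn

P_MAX = 3.5  # maximum chamber pressure in bar

# Feed-forward policy: two hidden layers of 64 tanh units, linear output head.
policy = nn.Sequential(
    nn.Linear(15, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 3),
)

def observation(x_tar_next, x_tar_now, tip_now, tip_prev, tip_prev2):
    """Assemble x_t = [d_t, e_t, x_tip_t, x_tip_{t-1}, x_tip_{t-2}] (15 values)."""
    d = x_tar_next - tip_now   # look-ahead vector toward the next reference point
    e = x_tar_now - tip_now    # current tracking error vector
    return np.concatenate([d, e, tip_now, tip_prev, tip_prev2]).astype(np.float32)

def pressure_commands(obs):
    """Map an observation to three chamber pressures in [0, P_MAX] bar."""
    with torch.no_grad():
        raw = policy(torch.from_numpy(obs))
    return (0.5 * (torch.tanh(raw) + 1.0) * P_MAX).numpy()
```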
D. Reinforcement Learning Algorithm

We solve the control problem using deep Reinforcement Learning. In general, in RL an agent receives at each time step t an observation o_t from the environment, which is a subset of the full environment state s_t. The agent acts according to a policy π mapping states/observations to actions, which can be deterministic or stochastic. The agent receives a scalar reward r(s, a) indicating the current task performance. Let the return G_t = \sum_{i=t}^{T} \gamma^{i-t} r(s_i, a_i) be the discounted sum of future rewards, with discount factor γ ∈ [0, 1]. The agent aims to maximize the expected return E_π[G_0 | s_0]. The state-value function is defined as V_π(s) = E_π[G_t | s_t], and the action-value function as Q_π(s, a) = E_π[G_t | s_t, a_t]. The advantage function A_π(s, a) = Q_π(s, a) − V_π(s) expresses whether the action a is better or worse than the average action the policy π takes in the state s.

In our setting, the agent is the controller implemented as a neural network, and the environment is the soft robotic arm modeled as a Cosserat rod. Therefore, the agent receives observations s_t = x_t and outputs actions a_t = p_t (see Fig. 2). State and action spaces are each normalized between -1 and 1 to increase the numerical stability of the training process. The reward is defined as

r_t = \begin{cases} -10 & \text{if NaN} \\ -e_t + b(e_t) & \text{otherwise,} \end{cases}    (4)

where the penalty term of −10 is applied to discourage actions that would cause numerical instabilities, as proposed by [18], e_t = ||e_t|| is the norm of the tracking error, and an inductive bias b(·) is provided as an incentive to explore:

b(e) = \begin{cases} 0.05 & 0.03 < e \le 0.05 \\ 0.1 & 0.01 < e \le 0.03 \\ 0.2 & e \le 0.01. \end{cases}    (5)

1) Proximal Policy Optimization: The controller is learned via Proximal Policy Optimization (PPO), a policy-gradient method appropriate for continuous control tasks [22]. In particular, we adopt the reliable implementation provided by [23]. The algorithm jointly optimizes a stochastic policy π(a|s) and a value-function approximator. PPO alternates between sampling data from the policy through interaction with the environment and performing optimization on the sampled data using stochastic gradient descent (SGD) to maximize the objective

\mathbb{E}\left[ \min\!\left( \rho_t(\pi)\,\hat{A}_t,\ \mathrm{clip}\!\left(\rho_t(\pi),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right],    (6)

where \rho_t(\pi) = \frac{\pi(a_t|s_t)}{\pi_{old}(a_t|s_t)} is the ratio between the probability of selecting an action under the current policy π and the probability of taking it under the policy π_old that collected the current batch of data, ϵ = 0.2 is the clipping parameter, and \hat{A} is an estimator of the advantage function. This loss encourages the policy to select actions with a positive advantage while discouraging large policy updates via clipping.

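As a concrete reading of Eqs. (4)-(6), the sketch below implements the reward with its exploration bias and the clipped surrogate objective in plain NumPy. It is illustrative only (no gradients are taken, and b(e) is assumed to be zero outside the listed intervals); it is not the Stable Baselines code used for training.

```python
import numpy as np

def bias(e):
    """Exploration incentive b(e) of Eq. (5); e is the tip error norm in metres.
    Assumed to be zero outside the listed intervals."""
    if e <= 0.01:
        return 0.2
    if e <= 0.03:
        return 0.1
    if e <= 0.05:
        return 0.05
    return 0.0

def reward(error_vec, diverged=False):
    """Reward of Eq. (4): -10 on numerical failure, otherwise -||e_t|| + b(||e_t||)."""
    if diverged:
        return -10.0
    e = float(np.linalg.norm(error_vec))
    return -e + bias(e)

def clipped_surrogate(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Clipped PPO objective of Eq. (6), averaged over a batch of samples."""
    ratio = np.exp(log_prob_new - log_prob_old)              # rho_t(pi)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))
```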
E. Training Process

The control policy was optimized using PPO. For each episode, a random target trajectory was produced. The starting point was the resting tip position x^tip_0, and two additional way-points were uniformly sampled from 512 positions in the workspace. The target trajectory x^tar was then produced by interpolating these three points with a cubic spline (see Fig. 3b). This was redone for each training episode, which ensured that the controller visited different parts of the workspace. The duration of each training episode was fixed at T = 10 s. Note that, since the space traveled in these 10 s depended on how far apart the sampled way-points were, the training trajectories had different velocity profiles. This approach ensured that the controller experienced a wide range of velocities ∆x^tar. Through this process, we obtained learning data points with velocities in the range [0, 0.10] m/s and a mean of 0.025±0.016 m/s (see also the orange curve in Fig. 5a). Each episode started with the robot at rest, facing vertically downward (see Fig. 1). The episode terminated when the entire target trajectory had been covered (i.e., the time limit of 10 s was reached) or when numerical problems occurred.

Fig. 3: Examples of target trajectories. The starting point is the resting tip position x^tip_0. Additional waypoints are sampled uniformly from the workspace. Target trajectories x^tar are generated by interpolating x^tip_0 and the waypoints using a cubic spline. (a) 3D straight lines (1 waypoint, T = 10 s); (b) 3D curves (2 waypoints, T = 10 s); (c) 3D curves (3 waypoints, T = 15 s). The controller was trained on curves with two waypoints and tested on straight lines and curves with three waypoints.
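A sketch of this trajectory-generation step is given below, assuming the spline knots are spread evenly over the episode duration and that the 512 candidate workspace positions are available as an array; both assumptions and the function name are ours, not taken from the paper.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_target_trajectory(tip_rest, workspace_points, n_waypoints=2,
                           T=10.0, dt=0.1, rng=None):
    """Interpolate the resting tip position and random way-points with a cubic
    spline and sample it at the 10 Hz control rate."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(workspace_points), size=n_waypoints, replace=False)
    knots = np.vstack([tip_rest, workspace_points[idx]])   # (n_waypoints + 1, 3)
    t_knots = np.linspace(0.0, T, len(knots))              # assumed even spacing
    spline = CubicSpline(t_knots, knots, axis=0)
    t = np.arange(0.0, T + dt, dt)
    return spline(t)                                       # (T/dt + 1, 3) targets
```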
We trained a stochastic policy to encourage exploration of the environment. After training, we used the deterministic, greedy policy to exploit the best actions learned. After an empirical model selection and hyper-parameter tuning, the learning took place over 1.2 million time steps (i.e., ∼10k episodes), equivalent to about 33 hours of learning experience in silico. The training episodes were collected using N = 8 parallel agents interacting with the environment for M = 64 time steps per policy update. At each iteration, the policy was optimized on the current N·M samples with SGD for ten epochs, using four mini-batches and a learning rate of 0.00025. The training lasted about 14 hours on a standard laptop (Intel i7-7500U processor, 8 GB RAM).

The learning curve in Fig. 4 shows the sum of the rewards the agent received in each training episode. The light blue curve is noisy because of the intrinsic explorative behavior of training a stochastic policy and the fact that each episode generates a new target trajectory x^tar. Nonetheless, the trend of the exponential moving average (dark blue) is increasing.

Fig. 4: Learning curve showing the cumulative episode reward.

Fig. 5: Statistics for the trajectory tracking tasks. (a) Distribution of target velocities for each task. (b) Error distribution for each task. The control policy trained on 3D curves (2 waypoints) generalizes to different trajectories and velocities.
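For reference, the hyper-parameters above map onto a PPO configuration roughly as follows. The sketch uses Stable-Baselines3 rather than the original Stable Baselines release cited in [23], and `SoftArmTrackingEnv` is a hypothetical placeholder for a Gym-style wrapper around the Cosserat rod simulation, so treat this as an approximation of the setup rather than the exact training script.

```python
import torch
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Hypothetical Gym-style wrapper around the Cosserat rod simulation (not shown).
from soft_arm_env import SoftArmTrackingEnv

env = make_vec_env(SoftArmTrackingEnv, n_envs=8)        # N = 8 parallel agents

model = PPO(
    "MlpPolicy",
    env,
    n_steps=64,            # M = 64 steps per worker per update
    batch_size=128,        # 8 * 64 = 512 samples split into 4 mini-batches
    n_epochs=10,           # ten SGD epochs per iteration
    learning_rate=2.5e-4,
    clip_range=0.2,        # epsilon in Eq. (6)
    policy_kwargs=dict(net_arch=[64, 64], activation_fn=torch.nn.Tanh),
)
model.learn(total_timesteps=1_200_000)                  # ~10k episodes
```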
III. RESULTS AND DISCUSSION

After learning the stochastic policy, we evaluated its greedy (deterministic) version. We conducted four tests to investigate its generalization abilities. First, we tested how well the controller could track trajectories with different geometries and velocity profiles. Second, we assessed the robustness of the controller to external forces of various magnitudes and directions applied to the robot's end-effector during trajectory tracking. Third, we evaluated the generalization of the policy to different material stiffnesses for dynamic tracking. As a performance metric, we adopted the mean and standard deviation of the normalized tip error e/L, i.e., the error as a percentage of the robot length L. In addition, we measured the spread of the normalized tip error using the interquartile range (IQR), which is robust to extreme outliers. Finally, we deployed the controller to make the soft robot tip intercept the trajectory of a moving object.
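For clarity, the two metrics can be computed as in the following sketch, where tip and target positions are per-step arrays collected over the test episodes and L is the ~202 mm arm length from Section II-A; the function name is illustrative.

```python
import numpy as np

def tip_error_stats(tip_positions, target_positions, L=0.202):
    """Normalized tip error e/L in percent: mean, standard deviation, and IQR."""
    e = np.linalg.norm(tip_positions - target_positions, axis=-1) / L * 100.0
    q1, q3 = np.percentile(e, [25, 75])
    return e.mean(), e.std(), q3 - q1
```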

Fig. 6: Controller evaluation on three sample trajectories: (left) straight line (1 waypoint); (center) curve (2 waypoints); (right) curve (3 waypoints). The controller outputs different pressure profiles to track geometrically different target trajectories.

A. Generalization to Trajectory Tracking Tasks

We tested the control policy, trained on trajectories generated from two waypoints, on three different sets of 100 generated tracking tasks (see Fig. 3).

As a baseline, we assessed the performance on trajectories sampled from the same distribution as the ones used for training, i.e., 3D curves (two way-points, same starting point, duration T = 10 s). The velocities for this task ranged in [0, 0.10] m/s with a mean of 0.025±0.016 m/s. On this set of trajectories, the controller achieved a mean tip error of 8.86±6.22%L and an IQR of 5.60%L (see Table I).

TABLE I: Tip error e/L (%) on dynamic trajectory tracking.

  Trajectory                 mean ± std (%)   IQR (%)
  3D curves (2 waypoints)    8.86 ± 6.22      5.60
  3D lines (1 waypoint)      6.56 ± 4.03      5.11
  3D curves (3 waypoints)    9.61 ± 6.61      6.56

For the first test of generalization capabilities, we employed trajectories that differed from the training set in geometry and velocity profile (see Fig. 5a). The first group of testing trajectories included 3D straight lines (1 way-point, T = 10 s) with velocities in the range [0, 0.019] m/s and a mean of 0.01±0.004 m/s. The second group included 3D curves (3 way-points, T = 15 s) with velocities in the range [0, 0.178] m/s and a mean of 0.031±0.02 m/s. The mean tip errors attained on these tasks were 6.56±4.03%L and 9.61±6.61%L, respectively (see Table I for a comparison). The controller tracked 3D straight lines better than the baseline (i.e., 3D curves with two way-points) and performed slightly worse on 3D curves with three way-points. Fig. 5b shows the distributions of the tip errors for each task. The controller achieved good results in tracking trajectories that differed not only geometrically (e.g., lines versus curves) but also in the velocity profile required to follow them. The hypothesis is that the learned policy has generalized to different tracking tasks. This was verified quantitatively by conducting a Kolmogorov-Smirnov statistical test [24] on the velocity distributions (see Fig. 5a). The test confirmed that the distributions are pair-wise different. In particular, the p-value was 0 for the three tests, rejecting the null hypothesis that the distributions are identical. Fig. 6 shows examples of actuation, dimension-wise tracking, and tracking error for each trajectory type.
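The pair-wise comparison can be reproduced with SciPy's two-sample Kolmogorov-Smirnov test, as sketched below; the velocity arrays are placeholders for the per-step target speeds of each trajectory set.

```python
from scipy.stats import ks_2samp

def compare_velocity_distributions(vel_reference, vel_other):
    """Two-sample Kolmogorov-Smirnov test between target-velocity samples (m/s).
    A small p-value rejects the hypothesis that the two distributions coincide."""
    statistic, p_value = ks_2samp(vel_reference, vel_other)
    return statistic, p_value
```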
B. Generalization to External Forces

External forces applied to soft robots can significantly alter their dynamics. In this experiment, we evaluated the robustness of the controller to perpetual external forces applied to the end-effector during dynamic trajectory tracking tasks. The force f_ext = [f_x, f_y, f_z] includes the special case of a standard payload, in which f_x = f_y = 0. After sampling the force vector components from a normal distribution, f_ext was scaled to the desired magnitude. We investigated three different magnitudes, i.e., f_ext ∈ {0.1, 0.5, 1.0} N. When the soft arm was at rest, these forces caused an average deflection in the direction of f_ext of 1.16±0.32%L, 5.82±1.56%L, and 11.65±3.06%L, respectively. The controller did not have any explicit information about the perturbations. The controller, evaluated on 100 random trajectories for each of the three magnitudes, each with a different endpoint force, showed performance comparable with the no-force case (see Table II). Despite the wide range of force magnitudes, i.e., between f_ext = 0.1 N and f_ext = 1.0 N, the tip error increased only by around 1%L. Therefore, the control policy trained without disturbances generalized well to perpetual external forces applied to the end-effector post-training.

TABLE II: Tip error e/L (%) for trajectory tracking subject to random perpetual external endpoint forces.

  Trajectory                 fext (N)   mean ± std (%)   IQR (%)
  3D curves (2 waypoints)    0.0        8.25 ± 5.64      5.23
  3D curves (2 waypoints)    0.1        8.31 ± 5.73      5.16
  3D curves (2 waypoints)    0.5        8.75 ± 6.30      5.42
  3D curves (2 waypoints)    1.0        9.89 ± 7.59      5.80
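A possible implementation of this force-sampling step is sketched below; the random-generator handling and the function name are assumptions.

```python
import numpy as np

def random_endpoint_force(magnitude, rng=None):
    """Sample the force components from a standard normal distribution and
    rescale the vector to the desired magnitude (0.1, 0.5 or 1.0 N here)."""
    rng = rng or np.random.default_rng()
    f = rng.normal(size=3)
    return magnitude * f / np.linalg.norm(f)
```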

Fig. 7: Controller generalization to a range of Young moduli.

C. Generalization to Material Properties

Materials play a crucial role in soft robotics. Factors like temperature changes or material degradation could nonlinearly modify the stiffness of soft robots. Similarly, soft robots with equal geometries but different material properties could attain significantly different deformations that affect the reachable workspace. Therefore, it is interesting to evaluate the generalization of a controller to diverse materials. In particular, we tested how the control policy generalized to different Young moduli of the Cosserat rod in tracking trajectories. The calibrated value of the Young modulus of the rod used to train the controller was E = 1.65 MPa. The learned controller was tested on Young moduli ranging from 0.49 MPa to 6.59 MPa in steps of ∆E = 0.1E, i.e., from 0.3 to four times the calibrated value. This range is reasonable for soft robotics applications. For each value of the Young modulus, we averaged the normalized tip errors over 100 target trajectories sampled from two waypoints with T = 10 s. As shown in Fig. 7, the controller generalized fairly well for values that deviated moderately from the calibrated one. However, the mean and standard deviation of the normalized tip error increased with ∆E. Notice how the error curve is asymmetric about the calibrated stiffness: for lower values of the Young modulus, the tracking error increased faster than for higher values. This could be because softer materials undergo larger deformations under the same applied pressure, making them harder to control. Conversely, stiffer materials reduced the reachable workspace; as a result, the controller could not track all the points along the generated trajectories as effectively. In summary, the controller generalized well (e.g., average tip error below 9%L) to similar materials, with Young modulus in the range [0.9E, 1.4E].

D. Generalization to Trajectory Interception Tasks

After assessing that the control policy can generalize to track different trajectories at various speeds, we deployed the controller to intercept a moving object. Again, the control policy was not retrained for the new task. The object was a sphere of radius R_obj = 25 mm, identified by its centroid x^tar, travelling along a path towards the workspace. The object's initial position x^tar_0 was uniformly sampled outside the workspace, from a sphere of radius 0.3 m centered at the resting position of the free end of the robot, i.e., x^tip_0. Then, one or two additional way-points were sampled uniformly from the workspace and interpolated with a cubic spline to generate the object trajectories, i.e., 3D straight lines or 3D curves. The task was considered successful if the end-effector contacted the object, intercepting its trajectory.
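A minimal sketch of the success criterion, under the assumption that contact is declared when the tip-to-centroid distance drops below the object radius at any control step:

```python
import numpy as np

R_OBJ = 0.025  # object radius: a 25 mm sphere

def intercepted(tip_positions, object_positions, r_obj=R_OBJ):
    """True if the end-effector touches the sphere at some control step,
    i.e. the distance between tip and object centroid falls below the radius."""
    distances = np.linalg.norm(tip_positions - object_positions, axis=-1)
    return bool(np.any(distances <= r_obj))
```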
TABLE III: Accuracy of the trajectory interception task.

  T (s)   3D lines   3D curves
  10      85%        95%
  5       74%        90%
  2       46%        64%
  1       33%        52%
  0.5     18%        27%

We evaluated the accuracy of trajectory interception for different object velocities to understand the complexity of the interception task (see Table III). The average velocity of the object ranged from 0.03 m/s (T = 10 s) up to 0.51 m/s (T = 0.5 s) for the linear trajectories, and from 0.05 m/s (T = 10 s) up to 0.63 m/s (T = 0.5 s) for the curvilinear trajectories. As expected, the percentage of successful interceptions decreased for objects moving at higher velocities, for both trajectory types. This was because the average object trajectory in the interception task was up to 25 times faster (for T = 0.5 s) than the average training trajectory; the mechanical limitations of the soft robot also play a role. Interestingly, the success rate for the curvilinear trajectories was higher than for the straight lines, despite the higher velocities of the former. This could be because the curves stay inside the learned workspace for longer, increasing the probability of interception. Overall, the learned controller performed the object interception satisfactorily, suggesting that the knowledge learned for dynamic path following is transferrable to other tasks. Fig. 8 shows a rendering of a successful trajectory interception trial for T = 5 s. From Fig. 9, observe that the object started from a remote position and quickly moved toward the workspace. The controller smoothly tracked the trajectory, contributing with all three chambers and reducing the tracking error.

Fig. 8: Rendering of the object interception with the soft robotic arm at 0/4, 1/4, 2/4, 3/4, and 4/4 of the episode. The episode lasts 3.3 seconds, with a maximum duration T = 5 s.

Fig. 9: Successful trajectory interception example, T = 5 s.

IV. CONCLUSION

In this paper, we leveraged a dynamic Cosserat rod model of a soft robotic arm and trained a control policy for dynamic trajectory tracking using Proximal Policy Optimization, a deep reinforcement learning algorithm. We investigated how well the learned control policy generalized to new observations, including tracking trajectories of different geometry and velocity profiles. Moreover, the controller generalized to new environment dynamics imposed as perpetual end-point forces of different magnitudes in arbitrary directions. We also tested new dynamics in the form of various material stiffnesses in trajectory tracking. Finally, the policy also generalized to the new task of intercepting a moving object with the end-effector (zero-shot transfer). While, as expected, the performance dropped slightly for these new scenarios, the learned model was surprisingly robust. Cosserat rod models of soft robots are therefore promising for learning complex control policies in simulation. Extensions to this work include the sim-to-real transfer of the controller to multi-section soft robotic arms, orientation tracking, the use of recurrent networks, and learning the trajectory interception policy.

REFERENCES

[1] C. Laschi and M. Cianchetti, "Soft robotics: new perspectives for robot bodyware and control," Frontiers in Bioengineering and Biotechnology, vol. 2, p. 3, 2014.
[2] D. Rus and M. T. Tolley, "Design, fabrication and control of soft robots," Nature, vol. 521, no. 7553, pp. 467–475, 2015.
[3] C. Armanini, F. Boyer, A. T. Mathew, C. Duriez, and F. Renda, "Soft robots modeling: A structured overview," IEEE Transactions on Robotics, 2023.
[4] T. George Thuruthel, Y. Ansari, E. Falotico, and C. Laschi, "Control strategies for soft robotic manipulators: A survey," Soft Robotics, vol. 5, no. 2, pp. 149–163, 2018.
[5] R. J. Webster III and B. A. Jones, "Design and kinematic modeling of constant curvature continuum robots: A review," The International Journal of Robotics Research, vol. 29, no. 13, pp. 1661–1683, 2010.
[6] O. Fischer, Y. Toshimitsu, A. Kazemipour, and R. K. Katzschmann, "Dynamic task space control enables soft manipulators to perform real-world tasks," Advanced Intelligent Systems, p. 2200024, 2022.
[7] C. Della Santina and D. Rus, "Control oriented modeling of soft robots: the polynomial curvature case," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 290–298, 2019.
[8] D. Kim, S.-H. Kim, T. Kim, B. B. Kang, M. Lee, W. Park, S. Ku, D. Kim, J. Kwon, H. Lee et al., "Review of machine learning methods in soft robotics," PLOS ONE, vol. 16, no. 2, p. e0246102, 2021.
[9] M. Eder, F. Hisch, and H. Hauser, "Morphological computation-based control of a modular, pneumatically driven, soft robotic arm," Advanced Robotics, vol. 32, no. 7, pp. 375–385, 2018.
[10] F. Piqué, H. T. Kalidindi, L. Fruzzetti, C. Laschi, A. Menciassi, and E. Falotico, "Controlling soft robotic arms using continual learning," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5469–5476, 2022.
[11] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[12] X. You, Y. Zhang, X. Chen, X. Liu, Z. Wang, H. Jiang, and X. Chen, "Model-free control for soft manipulators based on reinforcement learning," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 2909–2915.
[13] H. Zhang, R. Cao, S. Zilberstein, F. Wu, and X. Chen, "Toward effective soft robot control via reinforcement learning," in International Conference on Intelligent Robotics and Applications. Springer, 2017, pp. 173–184.
[14] Y. Ansari, M. Manti, E. Falotico, M. Cianchetti, and C. Laschi, "Multiobjective optimization for stiffness and position control in a soft robot arm module," IEEE Robotics and Automation Letters, vol. 3, no. 1, pp. 108–115, 2017.
[15] S. Satheeshbabu, N. K. Uppalapati, G. Chowdhary, and G. Krishnan, "Open loop position control of soft continuum arm using deep reinforcement learning," in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 5133–5139.
[16] S. Satheeshbabu, N. K. Uppalapati, T. Fu, and G. Krishnan, "Continuous control of a soft continuum arm using deep reinforcement learning," in 2020 3rd IEEE International Conference on Soft Robotics (RoboSoft). IEEE, 2020, pp. 497–503.
[17] A. Centurelli, L. Arleo, A. Rizzo, S. Tolu, C. Laschi, and E. Falotico, "Closed-loop dynamic control of a soft manipulator using deep reinforcement learning," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4741–4748, 2022.
[18] N. Naughton, J. Sun, A. Tekinalp, T. Parthasarathy, G. Chowdhary, and M. Gazzola, "Elastica: A compliant mechanics environment for soft robotic control," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3389–3396, 2021.
[19] R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel, "A survey of generalisation in deep reinforcement learning," arXiv preprint arXiv:2111.09794, 2021.
[20] L. Arleo, G. Stano, G. Percoco, and M. Cianchetti, "I-Support soft arm for assistance tasks: a new manufacturing approach based on 3D printing and characterization," Progress in Additive Manufacturing, vol. 6, no. 2, pp. 243–256, 2021.
[21] M. Gazzola, L. Dudte, A. McCormick, and L. Mahadevan, "Forward and inverse problems in the mechanics of soft filaments," Royal Society Open Science, vol. 5, no. 6, p. 171628, 2018.
[22] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[23] A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu, "Stable Baselines," https://github.com/hill-a/stable-baselines, 2018.
[24] J. L. Hodges, "The significance probability of the Smirnov two-sample test," Arkiv för Matematik, vol. 3, no. 5, pp. 469–486, 1958.
