This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2965393, IEEE Robotics and Automation Letters
IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020

Real-Time Nonlinear Model Predictive Control of Robots Using a Graphics Processing Unit

Phillip Hyatt¹ and Marc D. Killpack¹

Abstract—In past robotics applications, Model Predictive Control (MPC) has often been limited to linear models and relatively short time horizons. In recent years however, research in optimization, optimal control, and simulation has enabled some forms of nonlinear model predictive control which find locally optimal solutions. The limiting factor for applying nonlinear MPC for robotics remains the computation necessary to solve the optimization, especially for complex systems and for long time horizons. This paper presents a new solution method which addresses computational concerns related to nonlinear MPC called nonlinear Evolutionary MPC (NEMPC), and then we compare it to several existing methods. These comparisons include simulations on torque-limited robots performing a swing-up task and demonstrate that NEMPC is able to discover complex behaviors to accomplish the task. Comparisons with state-of-the-art nonlinear MPC algorithms show that NEMPC finds high quality control solutions very quickly using a global, instead of local, optimization method. Finally, an application in hardware (a 24 state pneumatically actuated continuum soft robot) demonstrates that this method is tractable for real-time control of high degree of freedom systems.

Index Terms—Optimization and Optimal Control, Control Architectures and Programming, Deep Learning in Robotics and Automation, Model Learning for Control

Manuscript received: August, 31, 2019; Revised November, 25, 2019; Accepted January, 6, 2020.
This paper was recommended for publication by Editor Dezhen Song upon evaluation of the Associate Editor and Reviewers' comments. This work was supported by a Utah NASA Space Grant Consortium Fellowship.
¹Phillip Hyatt and Marc Killpack are with Brigham Young University, Mechanical Engineering Department.
Digital Object Identifier (DOI): see top of this page.

I. INTRODUCTION

MODEL Predictive Control (MPC) is a well established sub-optimal form of optimal control which performs very well in practice. To summarize, MPC seeks to perform a trajectory optimization over a future time period, then applies only the first input of the optimized trajectory. After applying the input, the process is always repeated which is why it is also called receding horizon control. The fact that the trajectory is re-optimized online with current state information makes MPC behave like a feedback controller, while the ability of MPC to plan inputs based on future state predictions makes it behave somewhat like a planner. The complexity of the model used, as well as the time horizon length and optimization method, influence the speed of the controller.

In this work we develop a new method, Nonlinear Evolutionary MPC (NEMPC), which enables the use of long time horizons and complex nonlinear models. For NEMPC, we approximate the dynamic model of the robotic system using a Deep Neural Network (DNN). Using the learned approximate model, we then use an evolutionary algorithm to find optimal control inputs at the next time step. While the evolutionary algorithm cannot guarantee a global optimum, it is easily parallelizable and can be made to perform a global search. In [1] it is shown that standard gradient-based methods are superior in continuous quadratic-like cost landscapes, however evolutionary algorithms are superior in multi-modal or discontinuous ones.

Many implementations of MPC cast the optimal control problem as a Sequential Quadratic Program (SQP) and then use a very fast solver designed to solve an optimization with linear constraints and convex cost [2], [3], [4], [5]. The speed of these solvers makes real-time implementation of MPC practical for many applications with fast dynamics, including robotics. The restriction of these implementations of MPC is that the model used for optimization must be linear. While this assumption is likely fairly accurate for short time horizons, it becomes less accurate over longer time horizons or with sharp nonlinearities in the dynamics such as when a robot makes contact with the environment.

Ideally we could define a high-level cost function and use MPC to discover the low level behaviors needed to minimize it. However this type of MPC would require a long time horizon. This desire for long-time-horizon MPC has driven the development of several fast nonlinear MPC algorithms [6], [7], [8]. In [9] and [10] a very fast dynamics simulation (MuJoCo [11]) is used to perform Differential Dynamic Programming (DDP) at fast enough rates to be used as part of an MPC scheme for real-time control of a humanoid. We refer to this method as DDP MPC. Exciting steps have even been taken to parallelize DDP MPC using multiple shooting integration [12], [13] and even CPU and GPU parallelization [14].

Parallelized methods for MPC have started to gain more attention recently as evidenced by the survey paper [15]. Since the optimization at the heart of MPC is the bottleneck, there have been several methods proposed to parallelize convex [16], [17], [18] and non-convex [19] optimizations using GPUs in order to speed up MPC.

In [20] a policy improvement method using GPUs is also shown to solve fast enough for real time control of a miniature race car. This method is called Model Predictive Path Integral (MPPI) control. Both MPPI and DDP MPC are based on improvement of an initial control trajectory which must be known a priori in order to employ local optimization methods. We use DDP MPC and MPPI as benchmarks for our experiments because they represent state-of-the-art methods and because our proposed parallelized method most closely resembles MPPI.

2377-3766 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Our work is most similar to [21] and [20] in that we sample
control input trajectories in parallel using a GPU. However
we employ a global optimization method to search the entire
control space instead of improving a trajectory with a local
optimization or policy improvement method. In this sense our
work is similar to [22] and [23] where global sampling-based
GPU methods are used for control of a parafoil and half-car
suspension system. However, our use of DNNs to approximate
the system dynamics allows more complex models and faster
control rates.
Fig. 1: The two simulated robots used for this work. An inverted pendulum with single simulated torque source and a three link robot with simulated motors at each joint.

Because the dimension of the control space is so large (especially for long time horizons) we choose to parameterize the input space using linear piecewise functions. The parameterization of the input space is another distinguishing element of our method which enables global optimization methods to find good solutions quickly. Our choice of parameterization builds on previous work using parameterized control trajectories in a linear Evolutionary Model Predictive Control algorithm [24]. While the preliminary work in [24] uses a cubic spline parameterization, a linear or linearized model, and is developed with heuristics specific to impedance controlled manipulators, the method presented in this work uses a linear piecewise parameterization, a nonlinear model, and can be generalized to any dynamic system.
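The piecewise-linear parameterization described above can be illustrated with a minimal sketch. The function name get_u_from_U matches the name the paper uses for this mapping, but the body is only an assumed implementation: it takes the knot points to be evenly spaced over the horizon, and the example knot values and horizon length are hypothetical.

```python
def get_u_from_U(U, i, T):
    """Linearly interpolate the input at time step i from knot points U.

    U is a list of knot points, each a list with one value per system input.
    Assumption: knots are evenly spaced over the horizon [0, T].
    """
    if len(U) < 2:
        raise ValueError("need at least two knot points")
    if T <= 0 or not 0 <= i <= T:
        raise ValueError("require T > 0 and 0 <= i <= T")
    segments = len(U) - 1
    # Position of step i measured in units of knot segments.
    s = segments * i / T
    k = min(int(s), segments - 1)  # index of the knot at or before step i
    frac = s - k                   # fractional distance toward knot k + 1
    return [a + frac * (b - a) for a, b in zip(U[k], U[k + 1])]


# Hypothetical example: three knot points (start, middle, end), one torque input.
U_example = [[0.0], [1.0], [-1.0]]
T_example = 50
for step in (0, 10, 25, 50):
    print(step, get_u_from_U(U_example, step, T_example))
```

With three knots and a single input, a whole trajectory of T inputs is described by three numbers, which is what keeps the search space small enough for a sampling-based optimizer.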
In order to rapidly sample hundreds or thousands of input trajectories in parallel using the full nonlinear dynamics of the system, the nonlinear model is approximated using a Deep Neural Network (DNN). DNNs have gained popularity in image classification and regression applications, and have more recently become popular tools in robotics for modeling and control (see [25], [26], [27], [28], [29]). While machine learning models are often black boxes, because we are sampling from an analytically derived dynamics model we have a practically infinite data set. This allows us to avoid the problems of overfitting and to clearly define the experience distribution for the DNN. Another benefit to using DNNs as function approximators is that it allows us to use existing APIs for optimized training and evaluation of DNN models on a GPU.

Fig. 2: The pneumatically actuated continuum robot used for hardware experiments.

The use of DNNs in this work builds upon the work in [26] wherein a large DNN which is trained to approximate discrete-time dynamics is linearized for use in a traditional gradient-based MPC solver. In this work a DNN of the same form, but much smaller, is used to simulate nonlinear dynamics in an MPC scheme which uses an evolutionary optimization algorithm.

The main contributions of this paper are as follows:
• A real-time nonlinear model predictive control algorithm based on a global optimization method
• Experimental comparisons of nonlinear evolutionary model predictive control (NEMPC) with dynamic programming (DP), DDP MPC, and MPPI solutions
• Real-time demonstration of NEMPC on a 24 state pneumatically actuated continuum robot in hardware

The rest of the paper is organized as follows: Section II explains the models and methods used for approximating dynamic models with a DNN, as well as the NEMPC algorithm implemented on a GPU. Section III details the experimental setup for the comparison of NEMPC with three state-of-the-art optimal controllers (DP, DDP MPC, and MPPI), as well as an actual hardware experiment. Section IV discusses the experimental results. Section V summarizes our findings.

II. METHOD

A. Dynamic Model Approximation using DNNs

In order to quickly roll out nonlinear simulations of robot dynamics in parallel, the nonlinear dynamics are approximated using a DNN. While the dynamic model of a robot can be learned from real data, in this work we are simply using DNNs as function approximators, and the functions we wish to approximate are the discrete time dynamic equations.

1) Inverted Pendulum Dynamic Equations: The continuous time dynamics of an inverted pendulum such as the one depicted in Figure 1 are

$$ m l^2 \ddot{q} + b \dot{q} + m g l \sin(q) = \tau_{motor} \tag{1} $$

where $m$ is the mass at the end of the link, $l$ is the length of a massless link, $b$ is a viscous damping coefficient, and $g$ is the acceleration of gravity. We choose as our state $x = [\dot{q}, q]^T$.
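As a minimal sketch of how Equation 1 becomes a state derivative for the state ordering [q_dot, q], the function below solves the equation for the angular acceleration. The function name and the numeric parameter values (m, l, b) are hypothetical; the paper does not list the values it used.

```python
import math


def pendulum_state_derivative(x, tau, m, l, b, g=9.81):
    """Return dx/dt for the state x = [q_dot, q], following Equation 1 as printed."""
    q_dot, q = x
    # Solve m*l^2*q_ddot + b*q_dot + m*g*l*sin(q) = tau for q_ddot.
    q_ddot = (tau - b * q_dot - m * g * l * math.sin(q)) / (m * l ** 2)
    return [q_ddot, q_dot]


# Hypothetical parameters: 1 kg point mass on a 1 m link with light damping.
m, l, b = 1.0, 1.0, 0.1

# Holding the link horizontal (q = pi/2) needs a torque of m*g*l, far above 1 Nm.
print("holding torque [Nm]:", m * 9.81 * l)
print("dx/dt with 1 Nm at q = pi/2:",
      pendulum_state_derivative([0.0, math.pi / 2], 1.0, m, l, b))
```

Under these assumed parameters a 1 Nm torque limit cannot hold the link horizontal, which is the kind of limit that makes a swing-up strategy necessary.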


2) Three Link Robot Dynamic Equations: We assume our three link robot is comprised of rigid links and pin joints, so the dynamic equations take the canonical form

$$ M(q)\ddot{q} + C(q, \dot{q}) + b\dot{q} + \tau_{grav} = \tau \tag{2} $$

where $q$ is the vector of generalized coordinates, $M(q)$ is a configuration dependent inertia matrix, $C(q, \dot{q})$ represents torques produced by centrifugal and Coriolis forces, $b$ is a viscous damping coefficient, $\tau_{grav}$ are the torques applied by gravity on the robot and $\tau$ are applied torques from the motors. We choose as our state $x = [\dot{q}, q]^T$.
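Solving Equation 2 for the joint accelerations makes the link to the state derivative explicit. This rearrangement is not stated in the original text; it assumes the inertia matrix $M(q)$ is invertible, which holds for a physically valid rigid-body model:

$$ \ddot{q} = M(q)^{-1}\left(\tau - C(q, \dot{q}) - b\dot{q} - \tau_{grav}\right), \qquad \dot{x} = \begin{bmatrix} \ddot{q} \\ \dot{q} \end{bmatrix} $$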
3) Pneumatically Actuated Continuum Robot Dynamic Equations: We model the dynamics of our continuum joint robot (seen in Figure 2) with equations very similar to those given for the three link robot, but with the addition of pressure dynamics. The full dynamics of the arm are

$$ M(q)\ddot{q} + C(q, \dot{q}) + b\dot{q} + \tau_{grav} - K_{spring}\, q = K_{prs}\, p \tag{3} $$

$$ \dot{p} = \alpha (p_{cmd} - p) \tag{4} $$

where $K_{spring}$ represents the passive elasticity of the joints centered at the zero configuration, $K_{prs}$ is a linear mapping from pressures at each joint to torques, $p$ is a vector of the 12 pressures (four independent pressures in each joint), $p_{cmd}$ is a vector of the commanded pressures, and $\alpha$ is a 1st-order fill coefficient representing the speed of the pressure dynamics. We choose as our state $x = [p, \dot{q}, q]^T$. With six generalized coordinates, their derivatives, and the 12 pressures, this model contains 24 states and 12 inputs. The specific form of the continuum kinematics and dynamics are described in more detail in [26] and [30].

4) Training the DNN: In order to forward simulate our systems we need a discrete time model which is capable of predicting future states. To obtain this we perform first order Euler integration. At timestep $k$ we predict the state at time step $k + 1$ using the equation

$$ x_{k+1} = x_k + \dot{x}_k \Delta t. \tag{5} $$

Using the continuous state space equations (Eqns. 1-3) and Equation 5 we have developed a numerical simulation of our robot which is able to integrate the state after some small $\Delta t$ given the current state and inputs. In order to forward simulate hundreds or thousands of these simulations quickly and in parallel on the GPU we approximate the discrete time simulation using a DNN of the form depicted in Figure 3.

Fig. 3: DNN architecture used to approximate dynamic models.

The architecture used for the DNN in this work is inspired by the Unet architecture developed in [31], however with fully connected instead of convolutional layers. It builds on the work done in [26], however with fewer and smaller layers. Each blue box in Figure 3 represents a fully connected layer of the form

$$ y = \max(0, W x + b) \tag{6} $$

where $x$ and $y$ are the input and output respectively, $W$ and $b$ are a learned weight and bias respectively, and the element-wise maximum operation is commonly referred to as the rectified linear unit (ReLU) nonlinear activation function.

The input (or second) dimension of the weighting matrix for the first layer is equal to the number of states and inputs in the model to be approximated. The output (or first) dimensions of the first three weighting matrices are labeled in Figure 3 as 100, 200, and 400. The final two layers take as inputs the element-wise sum of the previous layer's outputs and the outputs of a more distantly previous layer with the same dimension.

As can be seen in Figure 3, the output of the DNN is only added to the current joint velocities, so essentially it is only predicting the change in velocity over a fixed time step $\Delta t$. The position states are not predicted by the DNN, since we already know that the position states will simply be the integrated velocity as given by Equation 5. Moreover, we can perform angle wrapping on the position states instead of forcing the DNN to learn that sharp nonlinearity. We found that decreasing the burden on the DNN in this way allows it to more accurately represent changes in velocity.

DNNs for both of our simulated robots were implemented and trained using PyTorch [32]. For training, each batch of training data consisted of 1000 randomly selected state-input pairs which were then fed through our analytically derived discrete time simulation. The loss function was simply the mean squared error of the DNN prediction of velocity at the next timestep. The Adam optimizer was used with a learning rate of 0.0001 and training was concluded after about 30 minutes using an Nvidia GeForce 750 Ti GPU.

The sizes of the DNNs for the inverted pendulum, three link robot, and pneumatically actuated continuum robot are 200,400 parameters, 201,200 parameters, and 204,200 parameters respectively. These sizes are similar because the same architecture was used for all three models while only the dimension of the first and last layers changed to match the number of system states, inputs, and velocities. While it is surprising that the same size DNN serves to model systems of such varying size and complexity, because we train on a practically infinite dataset derived from an analytically derived model, we drastically reduce the danger of overfitting. The size and structure of DNNs needed to model dynamic systems remains an interesting and open area of research.
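The reported network sizes can be checked with a short calculation. This is a sketch under two assumptions that the text does not state: the hidden widths after the 400-wide layer are taken to be 200 and 100 (so that the summed skip connections have matching dimensions), and only weight-matrix entries are counted, with biases excluded. Under those assumptions the totals match the three figures quoted above. The function name is hypothetical.

```python
def count_weights(n_in, n_out, hidden=(100, 200, 400, 200, 100)):
    """Count weight-matrix entries of a fully connected chain n_in -> hidden -> n_out.

    Assumption: biases are not counted, and the element-wise skip
    connections add no parameters of their own.
    """
    widths = [n_in, *hidden, n_out]
    return sum(a * b for a, b in zip(widths, widths[1:]))


# (states + inputs, predicted velocities) for each system described in the text.
systems = {
    "inverted pendulum": (2 + 1, 1),
    "three link robot": (6 + 3, 3),
    "continuum robot": (24 + 12, 6),
}
for name, (n_in, n_out) in systems.items():
    print(f"{name}: {count_weights(n_in, n_out):,} weights")
```

The 200,000 weights in the interior layers are shared by all three networks, so the totals differ only through the first and last layers.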


Fig. 4: In order to decrease the search space for our optimization, input trajectories are represented as knot points linearly connected over the horizon of length T time steps.

B. Input Trajectory Parameterization

We parameterize our input trajectory as a piecewise linear function connecting knot points as seen in Figure 4. The parameterization for this work consisted of three points: one at the beginning of the time horizon, one in the middle, and one at the end.

While this is a drastic simplification and severely limits the control trajectory possibilities, we find that NEMPC is able to find surprisingly complex behaviors. Higher order parameterizations can be used, but in our preliminary experiments we found no significant improvement in MPC performance by increasing the number of knot points. In fact, if the number of knot points is increased beyond a certain point, the performance degrades. We believe this is at least partially due to the optimization being forced to search a much higher dimensional space.

For this work, the math required to calculate an input $u_i$ given knot points $U$, the time step $i$, and horizon length $T$ is a simple linear interpolation. However, any parameterization can be used with this method and so we choose to represent the mapping from $U$ to $u$ in Algorithm 1 as get_u_from_U(U, i, T).

C. Nonlinear Evolutionary Algorithm for Trajectory Optimization

Instead of relying on a local optimization method to improve upon an existing policy or trajectory, NEMPC uses an evolutionary algorithm to explore the entire space of parameterized control trajectories. Because the dynamics of robotic systems are often non-convex, the optimization of a control trajectory contains many local minima. Gradient-based optimization methods such as QP, SQP, or DDP guarantee convergence to a local minimum, but not a global minimum. Heuristic global optimization methods such as evolutionary optimization provide no guarantees whatsoever, however they can be made to perform a global search as opposed to a local one. This is why we refer to our evolutionary algorithm as a global optimization method. The effectiveness of this approach will be shown in Section III.

The evolutionary algorithm implemented for this work contains all of the traditional elements of evolution-based algorithms: mating and crossover, mutation, and selection. It is made simpler by the fact that we have parameterized the input trajectories in our population, so they can be represented simply by their knot points which we denote $U$.

Algorithm 1 Nonlinear Evolutionary Model Predictive Control Algorithm
for every simulation in parallel do
    if Cold Start then
        U_child = U(u_min, u_max)
    else if Warm Start then
        Randomly select two parents from U_parents
        U_child = crossover(U_parent1, U_parent2)
        U_child = U_child + N(µ = 0, σ_noise)
    end if
    J = 0
    for i = 0 to T do
        u_i = get_u_from_U(U_child, i, T)
        x_{i+1} = DNN(x_i, u_i)
        J = J + cost_function(x_i, u_i, i)
    end for
end for
U_parents = U_child's with lowest J's
U* = U_child with lowest J
u* = get_u_from_U(U*, 0, T)
Apply u* to the robot

1) Mating and Crossover: In the case of a cold start for the optimization, there is no prior population of knot points, so it is necessary to create one. This is done by randomly sampling knot points from a uniform distribution bounded by the minimum and maximum inputs.

In the case of a warm start, at least one generation of knot points already exists where each individual has been evaluated and assigned a cost. Those individuals with lowest costs are designated as parents. Each parent is paired randomly with another parent for crossover. During crossover, for each system input, the child trajectory inherits the knot points for that input from either parent with 50% probability.

2) Mutation: While mating and crossover help to converge trajectories towards minima, mutation is used to explore more of the control input space. In our implementation, every child is subject to mutation, which is simply Gaussian noise added to the knot points of the input trajectory. The standard deviation of this noise $\sigma_{noise}$ is a tuning parameter. We found experimentally that making the noise proportional to the distance from a goal seems to work well because as the system approaches its goal state, the changes to the trajectory should probably be minor adjustments as opposed to major changes.

Once the mutation phase is complete there is a whole new generation of knot points ready to be evaluated using the selection phase.

3) Selection: Selection is the process in which the fitness of each member of a population of potential control trajectories is evaluated, and only the most fit members of the population survive to be the parents of the next generation. Fitness is evaluated using a fitness function, which for our case is the


cost function for our optimal control problem. In order to perform a more direct comparison with prior solution methods, we use the typical quadratic cost used in optimal control

$$ J = \sum_{i=0}^{T-1} \left( \tilde{x}_i^T Q \tilde{x}_i + \tilde{u}_i^T R \tilde{u}_i \right) + \tilde{x}_T^T Q_f \tilde{x}_T \tag{7} $$

where

$$ \tilde{x} = x - x_{goal}, \qquad \tilde{u} = u - u_{goal} \tag{8} $$

However, one of the strengths of our approach to solving the MPC problem is that we can use any form of cost function including those where the gradients are discontinuous or undefined. For this reason, we denote the function used to assign cost in Algorithm 1 as cost_function(x_i, u_i, i).
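A minimal sketch of evaluating Equation 7 for one candidate trajectory follows. The helper names are hypothetical, matrices are plain nested lists, and the example weights (position-only Q, zero R, and Qf set equal to Q) and state values are illustrative choices rather than values confirmed by the paper.

```python
def quad_form(v, W):
    """Return v^T W v for a vector v and a square matrix W given as nested lists."""
    n = len(v)
    return sum(v[r] * W[r][c] * v[c] for r in range(n) for c in range(n))


def trajectory_cost(xs, us, x_goal, u_goal, Q, R, Qf):
    """Evaluate Equation 7 for states xs[0..T] and inputs us[0..T-1]."""
    if len(xs) != len(us) + 1:
        raise ValueError("expected exactly one more state than inputs")
    J = 0.0
    for x, u in zip(xs, us):  # zip stops after the T running-cost terms
        x_err = [a - g for a, g in zip(x, x_goal)]
        u_err = [a - g for a, g in zip(u, u_goal)]
        J += quad_form(x_err, Q) + quad_form(u_err, R)
    x_err_T = [a - g for a, g in zip(xs[-1], x_goal)]
    return J + quad_form(x_err_T, Qf)  # terminal cost on the final state


# State ordering [q_dot, q]; weight only the position error, no input cost.
Q = [[0.0, 0.0], [0.0, 1.0]]
R = [[0.0]]
xs = [[0.0, 2.0], [0.0, 1.0], [0.0, 0.0]]
us = [[1.0], [1.0]]
print(trajectory_cost(xs, us, [0.0, 0.0], [0.0], Q, R, Q))
```

Because the optimizer only needs a scalar per candidate, this function could be swapped for any other callable, including one with discontinuous penalties, without changing the rest of the search.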
During the selection phase, every control trajectory in the population is simulated on the GPU from the current state. At each time step in the horizon, the state is forward simulated using the learned model (represented by DNN(x_i, u_i) in Algorithm 1), then any additive cost is added using cost_function(x_i, u_i, i). By forward simulating the dynamics in this way, dynamics constraints are enforced implicitly. State boundary constraints can be added as a large penalty in the cost function inside an if/else statement or can be saturated during dynamic simulation. After all of the simulations on the GPU have terminated, there is a cost associated with each control trajectory. The members of the population which are selected to be parents of the next generation are those which have the lowest cost. At the end of the selection phase we have finished fitness evaluation, and are left with only the best trajectories and their associated costs.

While most optimizations define termination criteria based on the number of iterations, the cost, or cost derivatives, our NEMPC method returns an approximately optimal input $u^*$ after one iteration. We expect the population to converge to a better solution over time, but this strategy of "doing something good soon rather than something better later" ([2]) leads to good performance as shown in Section III.

III. EXPERIMENTS

Three experiments were designed and carried out in order to test the viability of the NEMPC approach and to evaluate its ability to find optimal control behaviors without trajectory initialization. In order to judge the value of the approach, comparisons were made in simulation to three state-of-the-art optimal control algorithms: Dynamic Programming (DP), Differential Dynamic Programming MPC (DDP MPC), and Model Predictive Path Integral Control (MPPI). The final experiment on hardware was carried out in order to demonstrate that this method scales well to high degree of freedom systems and can still run in real-time. A video of all three experiments can be found here: https://youtu.be/hrqdUXd-xJ4

A. Simulated Inverted Pendulum

In order to first determine if NEMPC allows us to find optimal behaviors for a nonlinear system with input constraints, we choose to test it on a simple case: an inverted pendulum with torque constraints. We compare with DP because DP should converge to the absolute optimal performance, providing a lower bound on the cost possible for this task.

The task is for the controller to swing the pendulum into an upright position where θ = 0, starting from the θ = π position. Because the torque is limited to 1 Nm, this is impossible to do by applying a constant torque in one direction. The simple cost function defined by Equation 7 is used for both NEMPC and DP with R = 0 and Q weighting only position error with a value of 1. NEMPC was run with a horizon (T) of 50 time steps.

For the sake of brevity we do not include all of the details about implementing DP for a continuous system. We simply state that we discretized the state and action spaces and implemented a value iteration algorithm in order to find the optimal policy. Online, we then used interpolation to implement this discrete policy for a continuous system. A good reference for value iteration can be found in [33], while details about DP for continuous systems can be found in [34].

The joint angle trajectories from simulating both NEMPC and DP are depicted in Figure 5 while the commanded torques are in Figure 6. Note that the straight down position can be represented as an angle of π or −π.

Fig. 5: Joint angle history during a swing-up task for the inverted pendulum. Because it does not have enough available torque to lift the link directly, it swings back and forth to gain momentum.

B. Simulated Three Link Robot Arm

This experiment is very similar to the inverted pendulum task, but with a three link robot instead of a single pendulum. Again, the task is to swing the arm from its completely stable equilibrium hanging down to the unstable equilibrium in an upright position with all joints at θ = 0. Again, the available torque (10 Nm) is not enough to swing the arm straight up using a constant torque.

Because DP does not scale easily and may not even be feasible for some higher degree of freedom problems, we compare NEMPC to two nonlinear versions of MPC which


serve as state-of-the-art benchmarks. DDP MPC is an MPC scheme which uses DDP to optimize control inputs each time step. There are several methods used to incorporate constraints into DDP as outlined in [7]. We implement constraints using what is referred to in [7] as the clamping method wherein the inputs are saturated during the forward rollout of dynamics for DDP. We also warm start DDP with its prior solution at each time step, the first solve being cold started with zero torques over the trajectory.

MPPI is a parallelized MPC scheme designed to use the GPU to perform forward rollouts of nonlinear dynamics. Given an initial trajectory, neighboring trajectories are sampled and assigned costs calculated during the forward rollout of the system dynamics. The improved trajectory can be thought of as an inverse cost (or reward) weighted average of the sampled input trajectories. Our implementation of MPPI follows that of [21]. Similar to DDP MPC we warm start MPPI with its prior solution at each time step, the first solve being cold started with zero torques over the trajectory.

The same cost function is used for all three MPC schemes in order to make a fair comparison of how well each algorithm minimized cost over the whole trajectory. The cost function used is of the quadratic form found in Equation 7. The cost weightings on the proximal, middle, and distal joints' position error are 3, 2, and 1 respectively and for the final timestep these weightings are increased to 30, 20, and 10. The cost weighting on each input is 0.0001. The horizon length (T) used for each MPC solver is 25 time steps and both MPPI and NEMPC used 500 samples per solve.

The solve time for each method is recorded as the time to solve for the next input to be applied to the system. These experiments were run with an Intel i7-4770 CPU and an Nvidia GeForce 750 Ti GPU.

The achieved joint angles from both methods are displayed in Figure 7. Again, note that because there are no joint limits, the arms can swing all the way around so that θ = π is the same position as θ = −π.

Fig. 6: Input torque for a swing-up task for the inverted pendulum.

Fig. 7: Joint angles during a swing-up task for the three link robot arm.

C. Pneumatically Actuated Continuum Robot Arm

The last experiment was performed in hardware with the goal of demonstrating that NEMPC can be used in real-time for large degree of freedom systems. In order to demonstrate this, we implemented NEMPC for a 24 state soft continuum robot and controlled it to several joint configurations. NEMPC was run with a horizon (T) of 40 time steps and a model discretized at 0.02 s, running at 50 Hz.

While the first two experiments showed that NEMPC is capable of finding high quality solutions to the MPC problem, even for very nonlinear and input constrained problems, they were performed assuming MPC was given a perfect model. For this experiment, our model of the pressure and continuum joint dynamics is only approximate. Accurate modeling of soft and continuum joint robots is still an active area of research and is outside the scope of this paper. See [35], [36] for an overview of challenges in soft robot modeling and control. We are confident that better modeling, including learned models from data which fit nicely into our control framework, would enable better performance of model-based controllers for soft robots.

In order to decrease steady state error due to modeling error, we implement a simple integrator on joint configuration angle. Figure 8 shows the joint angle response of using NEMPC both with and without an integrator.

IV. RESULTS AND DISCUSSION

Figures 5 and 6 show that for the single link robot, both DP and NEMPC converge to nearly the same solution. Evaluating the cost function for each simulated trajectory shows the expected result that DP accrued a lower cost than NEMPC (1663 vs 1694). In fact, DP should converge to the absolute optimal solution. Because there is no cost on inputs for this experiment, the optimal behavior is to apply maximum torque in one direction or the other until at the goal. At about three seconds in Figure 6 one can see that interpolation between these


TABLE I: Comparison of costs and solve times for three nonlinear MPC methods

Method      Cost    Mean Solve Time
NEMPC       889     0.0295 s
MPPI        1285    0.0276 s
DDP MPC     1345    0.245 s
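
As context for the two sampling-based rows of Table I, the following is a minimal Python sketch of how a batch of sampled input trajectories can be scored by forward rollout and then combined. It is not the implementation used in this paper. The double-integrator dynamics, the noise scale `sigma`, the `temperature` parameter, the exponential weighting, and all function names are assumptions made for illustration. Only the horizon of 25 steps, the 500 samples per solve, and the input weight of 0.0001 are taken from the text. The weighted average stands in for an MPPI-style update, and the best-sample pick shows only an elitist selection step, not the full evolutionary algorithm of NEMPC.

```python
import math
import random

HORIZON = 25      # time steps per solve, as stated in the text
N_SAMPLES = 500   # sampled input trajectories per solve, as stated in the text
DT = 0.02         # assumed integration step for the toy model


def rollout_cost(inputs, goal=1.0, input_weight=1e-4):
    """Forward-simulate a toy double integrator and sum a quadratic cost.

    The dynamics are a hypothetical stand-in for the robot model.
    """
    pos, vel, cost = 0.0, 0.0, 0.0
    for u in inputs:
        vel += DT * u
        pos += DT * vel
        cost += (pos - goal) ** 2 + input_weight * u ** 2
    return cost


def sample_trajectories(nominal, sigma, rng):
    """Draw N_SAMPLES Gaussian perturbations of the nominal input trajectory."""
    return [[u + rng.gauss(0.0, sigma) for u in nominal]
            for _ in range(N_SAMPLES)]


def weighted_average_update(samples, costs, temperature=1.0):
    """MPPI-style update: lower-cost samples receive exponentially larger weights."""
    best = min(costs)
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [sum(w * s[t] for w, s in zip(weights, samples)) / total
            for t in range(HORIZON)]


def best_sample_update(samples, costs):
    """Elitist selection: keep the single lowest-cost sample."""
    return samples[costs.index(min(costs))]


rng = random.Random(0)
nominal = [0.0] * HORIZON
samples = sample_trajectories(nominal, sigma=5.0, rng=rng)
# On a GPU these rollouts would be evaluated in parallel; here they run serially.
costs = [rollout_cost(s) for s in samples]
u_avg = weighted_average_update(samples, costs)
u_best = best_sample_update(samples, costs)
print(rollout_cost(nominal), rollout_cost(u_avg), rollout_cost(u_best))
```

In this sketch every rollout is independent of the others, which is the property that lets both sampling-based methods evaluate hundreds of candidates in a single parallel pass.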

Fig. 8: Joint angles of the pneumatically actuated continuum robot giving step inputs to the NEMPC controller. Commanded joint angles are shown as well as performance of NEMPC and NEMPC with an integrator.

extremes leads to the chattering behavior exhibited by DP. The fact that NEMPC performs so similarly to DP suggests that NEMPC is not finding the actual optimum, but is very close. It is important to recognize that both algorithms discovered control behavior which cannot be discovered without taking into account torque constraints and the nonlinear effect of gravity. The advantage of NEMPC over DP is that DP requires the calculation of a value function for each goal position, which can take hours to compute. This problem is exacerbated with increasing degrees of freedom.

Figure 7 for the three link robot performing a swing-up task shows that NEMPC, DDP MPC, and MPPI each find different solutions. While DDP MPC maintains the second and third joints near the goal position the entire time, NEMPC and MPPI find lower cost solutions by swinging the more distal joints through an entire revolution. It can be seen in Figure 7 that NEMPC finds a more aggressive trajectory than MPPI which involves rotating the second and third links for faster convergence to the goal. Evaluating the cost function over this simulation reveals that NEMPC has found the lowest cost solution of the three (see Table I).

The fact that each solver finds different solutions suggests that there are multiple local optima. DDP MPC is a local method based on gradient descent and therefore converges to a minimum near the initial trajectory. MPPI uses a policy improvement strategy which is able to find a better solution than DDP MPC by searching in areas other than those of immediate gradient descent. By searching the entire control space NEMPC is able to find a solution of lower cost than both DDP MPC and MPPI. The nature of the evolutionary algorithm means that NEMPC can simultaneously improve upon several different local optima with each MPC iteration, returning the lowest cost solution while still improving upon each. This feature makes NEMPC well suited to nonlinear MPC where the cost landscape contains many local minima.

Each iteration of DDP, and therefore each MPC solve of DDP MPC, requires a backward pass through the state-control trajectory calculating dynamics and cost derivatives, then a forward pass calculating an updated control trajectory and forward simulating the dynamics of the system. Often, a line search is done which involves several forward and backward passes in order to find a trajectory update which improves the trajectory. By way of comparison, for each NEMPC or MPPI solve there are hundreds of forward passes done, but these are all completed at once in parallel on the GPU.

Because of the parallel nature of NEMPC and MPPI and their use of the GPU, in our experiments NEMPC and MPPI are able to solve much more quickly than DDP MPC. We should note that by using specialized simulation software, multithreading, and other optimizations, DDP MPC is run at much faster rates in other works [9], [12], [14]. The results presented in this work represent un-optimized Python implementations of all algorithms. It is expected that all implementations could benefit from optimization to be run at faster rates. For example, TensorRT could be used to speed up the DNN evaluation on the GPU for the NEMPC and MPPI methods. However, the massively parallelized nature of NEMPC and MPPI clearly shows a computational benefit, while allowing for similar if not better performance than DDP MPC in our experiments.

Having established that NEMPC is capable of finding high quality solutions to the nonlinear MPC problem, the final experiment demonstrated that it can be used in real-time for high degree of freedom systems. As seen in Figure 8, NEMPC drives the joint angles quickly toward final positions with very little overshoot despite the very underdamped nature of the joints. However, due to model inaccuracy, the final position reached by NEMPC has steady state error which is remedied with a simple integrator. As previously mentioned, a better model would decrease this steady state error and improve performance.

Though these results are impressive on their own, with advances in parallel computing and GPUs, we expect even better performance from this method in the future. Moreover, while most optimal control methods become less tractable for higher degree of freedom systems, the parallelized nature of NEMPC allows it to scale well.

V. CONCLUSION

In this paper we have shown the viability of a nonlinear MPC method based on global optimization techniques with a GPU and learned models. We have shown that it is possible to discover complex behaviors such as those needed to perform swing-up tasks for torque constrained robots, using parameterized control input trajectories.

Our specific implementation has made use of an evolutionary algorithm for optimization and piecewise linear functions for control input trajectories. While this work shows that those are viable options, this paper does not claim that these tools are the best choices. Future research should include


other parallelized global optimization methods such as interval analysis and particle swarm in order to more fully understand the strengths and weaknesses of these methods and how they could be applied.

REFERENCES

[1] K. A. De Jong, “Analysis of the behavior of a class of genetic adaptive systems,” 1975.
[2] Y. Wang and S. Boyd, “Fast Model Predictive Control Using Online Optimization,” IEEE Transactions on Control Systems Technology, vol. 18, no. 2, pp. 267–278, mar 2010.
[3] L. Rupert, P. Hyatt, and M. D. Killpack, “Comparing Model Predictive Control and input shaping for improved response of low-impedance robots,” no. June 2017, 2015.
[4] C. M. Best, M. T. Gillespie, P. Hyatt, L. Rupert, V. Sherrod, and M. D. Killpack, “A New Soft Robot Control Method: Using Model Predictive Control for a Pneumatically Actuated Humanoid,” IEEE Robotics & Automation Magazine, vol. 23, no. 3, pp. 75–84, 2016. [Online]. Available: http://ieeexplore.ieee.org/document/7551190/
[5] J. S. Terry, L. Rupert, and M. D. Killpack, “Effect of Simplified Robot Dynamic Models on Model Predictive Control Performance,” IEEE-RAS International Conference on Humanoid Robots, 2017.
[6] W. Li and E. Todorov, “Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems,” Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, no. January 2004, pp. 222–229, 2004. [Online]. Available: http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0001143902220229
[7] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” Proceedings - IEEE International Conference on Robotics and Automation, pp. 1168–1175, 2014.
[8] E. Todorov and Weiwei Li, “A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems,” Proceedings of the 2005, American Control Conference, 2005., pp. 300–306. [Online]. Available: http://ieeexplore.ieee.org/document/1469949/
[9] J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse, M. Bennewitz, and N. Mansard, “Whole-body model-predictive control applied to the HRP-2 humanoid,” IEEE International Conference on Intelligent Robots and Systems, vol. 2015-Decem, pp. 3346–3351, 2015.
[10] Y. Tassa, T. Erez, and E. Todorov, “Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization,” pp. 4906–4913, 2012.
[11] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” IEEE International Conference on Intelligent Robots and Systems, pp. 5026–5033, 2012.
[12] M. Giftthaler, M. Neunert, M. Stäuble, J. Buchli, and M. Diehl, “A Family of Iterative Gauss-Newton Shooting Methods for Nonlinear Optimal Control,” 2017. [Online]. Available: http://arxiv.org/abs/1711.11006
[13] M. Neunert, M. Stäuble, M. Giftthaler, C. D. Bellicoso, J. Carius, C. Gehring, M. Hutter, and J. Buchli, “Whole-Body Nonlinear Model Predictive Control Through Contacts for Quadrupeds,” 2017. [Online]. Available: http://arxiv.org/abs/1712.02889, http://dx.doi.org/10.1109/LRA.2018.2800124
[14] B. Plancher and S. Kuindersma, “A Performance Analysis of Parallel Differential Dynamic Programming on a GPU,” International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018.
[15] K. M. Abughalieh and S. G. Alawneh, “A survey of parallel implementations for model predictive control,” IEEE Access, vol. 7, pp. 34 348–34 360, 2019.
[16] N. Parikh and S. Boyd, “Block splitting for distributed optimization,” Mathematical Programming Computation, vol. 6, no. 1, pp. 77–102, 2014.
[17] L. Yu, A. Goldsmith, and S. D. Cairano, “Efficient Convex Optimization on GPUs for Embedded Model Predictive Control Categories and Subject Descriptors,” 2017.
[18] M. Maggioni, “Sparse Convex Optimization on GPUs,” Thesis, 2015. [Online]. Available: http://hdl.handle.net/10027/20173
[19] P. Cottle, “Massively Parallel Non-Convex Optimization on the GPU Through the Graphics Pipeline.”
[20] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Aggressive driving with model predictive path integral control,” Proceedings - IEEE International Conference on Robotics and Automation, vol. 2016-June, pp. 1433–1440, 2016.
[21] G. Williams, A. Aldrich, and E. Theodorou, “Model Predictive Path Integral Control using Covariance Variable Importance Sampling,” p. 8, 2015. [Online]. Available: http://arxiv.org/abs/1509.01149
[22] J. Rogers and N. Slegers, “Robust Parafoil Terminal Guidance Using Massively Parallel Processing,” Journal of Guidance, Control, and Dynamics, vol. 36, no. August, pp. 1336–1345, 2013. [Online]. Available: http://arc.aiaa.org/doi/abs/10.2514/1.59782
[23] K. M. M. Rathai, O. Sename, and M. Alamir, “GPU-based parameterized nmpc scheme for control of half car vehicle with semi-Active suspension system,” IEEE Control Systems Letters, vol. 3, no. 3, pp. 631–636, 2019.
[24] P. Hyatt and M. D. Killpack, “Real-Time Evolutionary Model Predictive Control Using a Graphics Processing Unit,” Humanoids 2017, 2017.
[25] M. T. Gillespie, C. M. Best, E. C. Townsend, D. Wingate, and M. D. Killpack, “Learning nonlinear dynamic models of soft robots for model predictive control with neural networks,” 2018 IEEE International Conference on Soft Robotics (RoboSoft), pp. 39–45, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8404894/
[26] P. Hyatt, D. Wingate, and M. D. Killpack, “Model-Based Control of Soft Actuators Using Learned Non-linear Discrete-Time Models,” Frontiers in Robotics and AI, vol. 6, no. April, pp. 1–11, 2019. [Online]. Available: https://www.frontiersin.org/article/10.3389/frobt.2019.00022/full
[27] S. Levine and P. Abbeel, “Learning neural network policies with guided policy search under unknown dynamics,” in Advances in Neural Information Processing Systems, 2014, pp. 1071–1079.
[28] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning deep control policies for autonomous aerial vehicles with mpc-guided policy search,” in 2016 IEEE international conference on robotics and automation (ICRA). IEEE, 2016, pp. 528–535.
[29] I. Lenz, R. Knepper, and A. Saxena, “Deepmpc: Learning deep latent features for model predictive control,” in Robotics Science and Systems (RSS), 2015.
[30] D. M. Bodily, “Design Optimization and Motion Planning For Pneumatically-Actuated Manipulators,” 2017.
[31] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, 2015, pp. 234–241.
[32] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-W, 2017.
[33] S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT press, 2005.
[34] R. Cory and R. Tedrake, “Experiments in fixed-wing uav perching,” in AIAA Guidance, Navigation and Control Conference and Exhibit, 2008, p. 7256.
[35] D. Rus and M. T. Tolley, “Design, fabrication and control of soft robots,” Nature, vol. 521, no. 7553, pp. 467–475, 2015.
[36] P. Hyatt, D. Kraus, V. Sherrod, L. Rupert, N. Day, and M. D. Killpack, “Configuration estimation for accurate position control of large-scale soft robots,” IEEE/ASME Transactions on Mechatronics, vol. 24, no. 1, pp. 88–99, 2018.
