Hyatt 2020
IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JANUARY, 2020
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/LRA.2020.2965393, IEEE Robotics and Automation Letters.
Abstract—In past robotics applications, Model Predictive Control (MPC) has often been limited to linear models and relatively short time horizons. In recent years, however, research in optimization, optimal control, and simulation has enabled some forms of nonlinear model predictive control which find locally optimal solutions. The limiting factor for applying nonlinear MPC in robotics remains the computation necessary to solve the optimization, especially for complex systems and for long time horizons. This paper presents a new solution method which addresses computational concerns related to nonlinear MPC, called nonlinear Evolutionary MPC (NEMPC), and then we compare it to several existing methods. These comparisons include simulations on torque-limited robots performing a swing-up task and demonstrate that NEMPC is able to discover complex behaviors to accomplish the task. Comparisons with state-of-the-art nonlinear MPC algorithms show that NEMPC finds high quality control solutions very quickly using a global, instead of local, optimization method. Finally, an application in hardware (a 24 state pneumatically actuated continuum soft robot) demonstrates that this method is tractable for real-time control of high degree of freedom systems.

Index Terms—Optimization and Optimal Control, Control Architectures and Programming, Deep Learning in Robotics and Automation, Model Learning for Control

Manuscript received: August 31, 2019; Revised November 25, 2019; Accepted January 6, 2020. This paper was recommended for publication by Editor Dezhen Song upon evaluation of the Associate Editor and Reviewers' comments. This work was supported by a Utah NASA Space Grant Consortium Fellowship. Phillip Hyatt and Marc Killpack are with Brigham Young University, Mechanical Engineering Department. Digital Object Identifier (DOI): see top of this page.

I. INTRODUCTION

Model Predictive Control (MPC) is a well established sub-optimal form of optimal control which performs very well in practice. To summarize, MPC seeks to perform a trajectory optimization over a future time period, then applies only the first input of the optimized trajectory. After applying the input, the process is repeated, which is why it is also called receding horizon control. The fact that the trajectory is re-optimized online with current state information makes MPC behave like a feedback controller, while the ability of MPC to plan inputs based on future state predictions makes it behave somewhat like a planner. The complexity of the model used, as well as the time horizon length and optimization method, influence the speed of the controller.

In this work we develop a new method, Nonlinear Evolutionary MPC (NEMPC), which enables the use of long time horizons and complex nonlinear models. For NEMPC, we approximate the dynamic model of the robotic system using a Deep Neural Network (DNN). Using the learned approximate model, we then use an evolutionary algorithm to find optimal control inputs at the next time step. While the evolutionary algorithm cannot guarantee a global optimum, it is easily parallelizable and can be made to perform a global search. In [1] it is shown that standard gradient-based methods are superior in continuous, quadratic-like cost landscapes, while evolutionary algorithms are superior in multi-modal or discontinuous ones.

Many implementations of MPC cast the optimal control problem as a Sequential Quadratic Program (SQP) and then use a very fast solver designed to solve an optimization with linear constraints and convex cost [2], [3], [4], [5]. The speed of these solvers makes real-time implementation of MPC practical for many applications with fast dynamics, including robotics. The restriction of these implementations is that the model used for optimization must be linear. While this assumption is likely fairly accurate for short time horizons, it becomes less accurate over longer time horizons or with sharp nonlinearities in the dynamics, such as when a robot makes contact with the environment.

Ideally we could define a high-level cost function and use MPC to discover the low-level behaviors needed to minimize it. However, this type of MPC would require a long time horizon. This desire for long-time-horizon MPC has driven the development of several fast nonlinear MPC algorithms [6], [7], [8]. In [9] and [10] a very fast dynamics simulation (MuJoCo [11]) is used to perform Differential Dynamic Programming (DDP) at fast enough rates to be used as part of an MPC scheme for real-time control of a humanoid. We refer to this method as DDP MPC. Exciting steps have even been taken to parallelize DDP MPC using multiple shooting integration [12], [13] and even CPU and GPU parallelization [14].

Parallelized methods for MPC have started to gain more attention recently, as evidenced by the survey paper [15]. Since the optimization at the heart of MPC is the bottleneck, several methods have been proposed to parallelize convex [16], [17], [18] and non-convex [19] optimizations using GPUs in order to speed up MPC.

In [20] a policy improvement method using GPUs is also shown to solve fast enough for real-time control of a miniature race car. This method is called Model Predictive Path Integral (MPPI) control. Both MPPI and DDP MPC are based on improvement of an initial control trajectory which must be known a priori in order to employ local optimization methods. We use DDP MPC and MPPI as benchmarks for our experiments because they represent state-of-the-art methods and because our proposed parallelized method most closely resembles MPPI.
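The receding-horizon loop described above can be sketched in a few lines of Python. This is a generic illustration, not the implementation from this paper: the double-integrator dynamics and the helpers `optimize_trajectory` and `plant_step` are hypothetical stand-ins, and the "optimizer" here is a naive random search used only to make the loop runnable.

```python
import numpy as np

def optimize_trajectory(x0, T, n_candidates=256, rng=np.random.default_rng(0)):
    """Hypothetical stand-in for the trajectory optimizer: sample random
    input sequences, roll each out on a toy double-integrator model,
    and keep the lowest-cost sequence. (The shared rng keeps runs deterministic.)"""
    best_u, best_cost = None, np.inf
    for _ in range(n_candidates):
        u_seq = rng.uniform(-1.0, 1.0, size=T)           # candidate input sequence
        x, cost = x0.copy(), 0.0
        for u in u_seq:                                  # forward simulate the model
            x = np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
            cost += x @ x + 0.01 * u * u                 # quadratic stage cost
        if cost < best_cost:
            best_u, best_cost = u_seq, cost
    return best_u

def plant_step(x, u):
    """Stand-in for the real system (here identical to the model)."""
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])

x = np.array([1.0, 0.0])
for _ in range(20):                                      # receding-horizon loop
    u_star = optimize_trajectory(x, T=10)                # optimize over the horizon
    x = plant_step(x, u_star[0])                         # apply only the FIRST input,
                                                         # then re-optimize from the new state
```

The structural point is the final loop: only the first input of each optimized sequence is applied before re-optimizing from the measured state, which is what gives MPC its feedback-like behavior.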
2377-3766 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

HYATT et al.: NONLINEAR MPC USING A GPU
J = \sum_{i=0}^{T-1} \left( \tilde{x}_i^T Q \tilde{x}_i + \tilde{u}_i^T R \tilde{u}_i \right) + \tilde{x}_T^T Q_f \tilde{x}_T    (7)

where

\tilde{x} = x - x_{goal}, \qquad \tilde{u} = u - u_{goal}    (8)
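Written out, the cost in Equations (7)-(8) sums a quadratic stage cost over the horizon and adds a terminal penalty on the final state. The following NumPy sketch is illustrative only; the dimensions, goal values, and weights are made up for the example.

```python
import numpy as np

def quadratic_cost(xs, us, x_goal, u_goal, Q, R, Qf):
    """Evaluate J from Eq. (7): stage costs for i = 0..T-1 on the
    error coordinates of Eq. (8), plus a terminal cost on x_T."""
    J = 0.0
    T = us.shape[0]                       # horizon length
    for i in range(T):
        x_t = xs[i] - x_goal              # x-tilde: state error
        u_t = us[i] - u_goal              # u-tilde: input error
        J += x_t @ Q @ x_t + u_t @ R @ u_t
    x_T = xs[T] - x_goal                  # terminal state error
    return J + x_T @ Qf @ x_T

# Illustrative values: 2-state system, horizon T = 3, goal at the origin
xs = np.zeros((4, 2)); xs[:, 0] = [1.0, 0.5, 0.25, 0.0]   # states x_0..x_T
us = np.zeros((3, 1))                                      # inputs u_0..u_{T-1}
Q, R, Qf = np.eye(2), np.eye(1), 10 * np.eye(2)
J = quadratic_cost(xs, us, np.zeros(2), np.zeros(1), Q, R, Qf)
# stage costs 1.0 + 0.25 + 0.0625; terminal cost 0 since x_T is at the goal
```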
However, one of the strengths of our approach to solving the MPC problem is that we can use any form of cost function, including those where the gradients are discontinuous or undefined. For this reason, we denote the function used to assign cost in Algorithm 1 as cost_function(x_i, u_i, i).
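As an illustration, a cost function with an if/else boundary penalty has a gradient that is discontinuous at the boundary, which would trouble a gradient-based solver but not a sampling-based evolutionary search. This sketch is hypothetical: the bound and penalty magnitude below are illustrative values, not those used in the paper.

```python
import numpy as np

X_MAX = 2.0        # illustrative state bound (hypothetical value)
PENALTY = 1e6      # large penalty for violating it (hypothetical value)

def cost_function(x_i, u_i, i):
    """Stage cost with a discontinuous if/else boundary penalty,
    in the spirit of cost_function(x_i, u_i, i) in Algorithm 1."""
    cost = x_i @ x_i + 0.1 * (u_i @ u_i)     # smooth quadratic part
    if np.any(np.abs(x_i) > X_MAX):          # discontinuous jump at the boundary
        cost += PENALTY
    return cost

safe = cost_function(np.array([1.0, 0.0]), np.array([0.5]), 0)    # inside the bound
unsafe = cost_function(np.array([2.5, 0.0]), np.array([0.5]), 0)  # penalty applied
```

Because the evolutionary search only ranks candidate trajectories by their total cost, the jump at the boundary needs no special treatment.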
During the selection phase, every control trajectory in the population is simulated on the GPU from the current state. At each time step in the horizon, the state is forward simulated using the learned model (represented by DNN(x_i, u_i) in Algorithm 1), then any additive cost is added using cost_function(x_i, u_i, i). By forward simulating the dynamics in this way, dynamics constraints are enforced implicitly. State boundary constraints can be added as a large penalty in the cost function inside an if/else statement, or the states can be saturated during the dynamic simulation. After all of the simulations on the GPU have terminated, there is a cost associated with each control trajectory. The members of the population which are selected to be parents of the next generation are those which have the lowest cost. At the end of the selection phase we have finished fitness evaluation and are left with only the best trajectories and their associated costs.

While most optimizations define a termination criterion based on the number of iterations, the cost, or cost derivatives, our NEMPC method returns an approximately optimal input u* after one iteration. We expect the population to converge to a better solution over time, but this strategy of "doing something good soon rather than something better later" [2] leads to good performance, as shown in Section III.

III. EXPERIMENTS

Three experiments were designed and carried out in order to test the viability of the NEMPC approach and to evaluate its ability to find optimal control behaviors without trajectory initialization. In order to judge the value of the approach, comparisons were made in simulation to three state-of-the-art optimal control algorithms: Dynamic Programming (DP), Differential Dynamic Programming MPC (DDP MPC), and Model Predictive Path Integral control (MPPI). The final experiment, on hardware, was carried out in order to demonstrate that this method scales well to high degree of freedom systems and can still run in real time. A video of all three experiments can be found here: https://youtu.be/hrqdUXd-xJ4

A. Simulated Inverted Pendulum

In order to first determine if NEMPC allows us to find optimal behaviors for a nonlinear system with input constraints, we choose to test it on a simple case: an inverted pendulum with torque constraints. We compare with DP because DP should converge to the absolute optimal performance, providing a lower bound on the cost possible for this task.

Fig. 5: Joint angle history during a swing-up task for the inverted pendulum. Because it does not have enough available torque to lift the link directly, it swings back and forth to gain momentum.

The task is for the controller to swing the pendulum into an upright position where θ = 0, starting from the θ = π position. Because the torque is limited to 1 Nm, this is impossible to do by applying a constant torque in one direction. The simple cost function defined by Equation 7 is used for both NEMPC and DP, with R = 0 and Q weighting only position error with a value of 1. NEMPC was run with a horizon (T) of 50 time steps.

For the sake of brevity we do not include all of the details about implementing DP for a continuous system. We simply state that we discretized the state and action spaces and implemented a value iteration algorithm in order to find the optimal policy. Online, we then used interpolation to implement this discrete policy for a continuous system. A good reference for value iteration can be found in [33], while details about DP for continuous systems can be found in [34].

The joint angle trajectories from simulating both NEMPC and DP are depicted in Figure 5, while the commanded torques are in Figure 6. Note that the straight-down position can be represented as an angle of π or −π.

B. Simulated Three Link Robot Arm

This experiment is very similar to the inverted pendulum task, but with a three link robot instead of a single pendulum. Again, the task is to swing the arm from its completely stable equilibrium hanging down to the unstable equilibrium in an upright position with all joints at θ = 0. Again, the available torque (10 Nm) is not enough to swing the arm straight up using a constant torque.

Because DP does not scale easily and may not even be feasible for some higher degree of freedom problems, we compare NEMPC to two nonlinear versions of MPC which
other parallelized global optimization methods such as interval analysis and particle swarm in order to more fully understand the strengths and weaknesses of these methods and how they could be applied.

REFERENCES

[1] K. A. De Jong, "Analysis of the behavior of a class of genetic adaptive systems," 1975.
[2] Y. Wang and S. Boyd, "Fast Model Predictive Control Using Online Optimization," IEEE Transactions on Control Systems Technology, vol. 18, no. 2, pp. 267–278, Mar. 2010.
[3] L. Rupert, P. Hyatt, and M. D. Killpack, "Comparing Model Predictive Control and input shaping for improved response of low-impedance robots," 2015.
[4] C. M. Best, M. T. Gillespie, P. Hyatt, L. Rupert, V. Sherrod, and M. D. Killpack, "A New Soft Robot Control Method: Using Model Predictive Control for a Pneumatically Actuated Humanoid," IEEE Robotics & Automation Magazine, vol. 23, no. 3, pp. 75–84, 2016. [Online]. Available: http://ieeexplore.ieee.org/document/7551190/
[5] J. S. Terry, L. Rupert, and M. D. Killpack, "Effect of Simplified Robot Dynamic Models on Model Predictive Control Performance," IEEE-RAS International Conference on Humanoid Robots, 2017.
[6] W. Li and E. Todorov, "Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems," Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, pp. 222–229, 2004. [Online]. Available: http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0001143902220229
[7] Y. Tassa, N. Mansard, and E. Todorov, "Control-limited differential dynamic programming," Proceedings - IEEE International Conference on Robotics and Automation, pp. 1168–1175, 2014.
[8] E. Todorov and W. Li, "A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems," Proceedings of the 2005 American Control Conference, pp. 300–306. [Online]. Available: http://ieeexplore.ieee.org/document/1469949/
[9] J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse, M. Bennewitz, and N. Mansard, "Whole-body model-predictive control applied to the HRP-2 humanoid," IEEE International Conference on Intelligent Robots and Systems, pp. 3346–3351, 2015.
[10] Y. Tassa, T. Erez, and E. Todorov, "Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization," pp. 4906–4913, 2012.
[11] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," IEEE International Conference on Intelligent Robots and Systems, pp. 5026–5033, 2012.
[12] M. Giftthaler, M. Neunert, M. Stäuble, J. Buchli, and M. Diehl, "A Family of Iterative Gauss-Newton Shooting Methods for Nonlinear Optimal Control," 2017. [Online]. Available: http://arxiv.org/abs/1711.11006
[13] M. Neunert, M. Stäuble, M. Giftthaler, C. D. Bellicoso, J. Carius, C. Gehring, M. Hutter, and J. Buchli, "Whole-Body Nonlinear Model Predictive Control Through Contacts for Quadrupeds," 2017. [Online]. Available: http://arxiv.org/abs/1712.02889
[14] B. Plancher and S. Kuindersma, "A Performance Analysis of Parallel Differential Dynamic Programming on a GPU," International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018.
[15] K. M. Abughalieh and S. G. Alawneh, "A survey of parallel implementations for model predictive control," IEEE Access, vol. 7, pp. 34348–34360, 2019.
[16] N. Parikh and S. Boyd, "Block splitting for distributed optimization," Mathematical Programming Computation, vol. 6, no. 1, pp. 77–102, 2014.
[17] L. Yu, A. Goldsmith, and S. D. Cairano, "Efficient Convex Optimization on GPUs for Embedded Model Predictive Control," 2017.
[18] M. Maggioni, "Sparse Convex Optimization on GPUs," Thesis, 2015. [Online]. Available: http://hdl.handle.net/10027/20173
[19] P. Cottle, "Massively Parallel Non-Convex Optimization on the GPU Through the Graphics Pipeline."
[20] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, "Aggressive driving with model predictive path integral control," Proceedings - IEEE International Conference on Robotics and Automation, pp. 1433–1440, 2016.
[21] G. Williams, A. Aldrich, and E. Theodorou, "Model Predictive Path Integral Control using Covariance Variable Importance Sampling," 2015. [Online]. Available: http://arxiv.org/abs/1509.01149
[22] J. Rogers and N. Slegers, "Robust Parafoil Terminal Guidance Using Massively Parallel Processing," Journal of Guidance, Control, and Dynamics, vol. 36, pp. 1336–1345, 2013. [Online]. Available: http://arc.aiaa.org/doi/abs/10.2514/1.59782
[23] K. M. M. Rathai, O. Sename, and M. Alamir, "GPU-based parameterized NMPC scheme for control of half car vehicle with semi-active suspension system," IEEE Control Systems Letters, vol. 3, no. 3, pp. 631–636, 2019.
[24] P. Hyatt and M. D. Killpack, "Real-Time Evolutionary Model Predictive Control Using a Graphics Processing Unit," Humanoids 2017, 2017.
[25] M. T. Gillespie, C. M. Best, E. C. Townsend, D. Wingate, and M. D. Killpack, "Learning nonlinear dynamic models of soft robots for model predictive control with neural networks," 2018 IEEE International Conference on Soft Robotics (RoboSoft), pp. 39–45, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8404894/
[26] P. Hyatt, D. Wingate, and M. D. Killpack, "Model-Based Control of Soft Actuators Using Learned Non-linear Discrete-Time Models," Frontiers in Robotics and AI, vol. 6, pp. 1–11, 2019. [Online]. Available: https://www.frontiersin.org/article/10.3389/frobt.2019.00022/full
[27] S. Levine and P. Abbeel, "Learning neural network policies with guided policy search under unknown dynamics," in Advances in Neural Information Processing Systems, 2014, pp. 1071–1079.
[28] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, "Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 528–535.
[29] I. Lenz, R. Knepper, and A. Saxena, "DeepMPC: Learning deep latent features for model predictive control," in Robotics Science and Systems (RSS), 2015.
[30] D. M. Bodily, "Design Optimization and Motion Planning For Pneumatically-Actuated Manipulators," 2017.
[31] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, 2015, pp. 234–241.
[32] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS-W, 2017.
[33] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
[34] R. Cory and R. Tedrake, "Experiments in fixed-wing UAV perching," in AIAA Guidance, Navigation and Control Conference and Exhibit, 2008, p. 7256.
[35] D. Rus and M. T. Tolley, "Design, fabrication and control of soft robots," Nature, vol. 521, no. 7553, pp. 467–475, 2015.
[36] P. Hyatt, D. Kraus, V. Sherrod, L. Rupert, N. Day, and M. D. Killpack, "Configuration estimation for accurate position control of large-scale soft robots," IEEE/ASME Transactions on Mechatronics, vol. 24, no. 1, pp. 88–99, 2018.