Trajectory-Control Using Deep System Identification and Model Predictive Control For Drone Control Under Uncertain Load
Abstract—Machine learning makes it possible to create complex models when provided with enough data, hence challenging more traditional system identification methods. We compare the quality of neural networks and of an ARX model when used in a model predictive controller to command a drone in a simulated environment. The training of neural networks can be challenging when data is scarce or datasets are unbalanced. We propose an adaptation of prioritized replay to system identification in order to mitigate these problems. We illustrate the advantages and limits of this training method on the control task of a simulated drone.

Index Terms—identification, model predictive control, neural networks, learning

I. INTRODUCTION

The use of Unmanned Aerial Vehicles (UAVs) for various inspection tasks is increasingly appealing. Tasks such as precision agriculture or building inspection benefit greatly from the improvement of aerial robotic capabilities. However, most drone applications require an expert to accomplish their mission. This requirement can be a serious limitation in many situations. For example, underground mine inspection does not allow for real-time communication; in such cases, autonomous control of the aerial robot becomes a necessity.

Trajectory planning is an active area of research, and Model Predictive Control (MPC) is a control algorithm widely used in this context [9], [11], [15]. MPC is built on top of a model of the controlled system, and the design of this model can be achieved by different methods. Model identification is one of them, and works well on a wide variety of systems. Recently, machine learning and neural networks have challenged the traditional system identification tools.

We study here the advantages and limitations of using neural networks as the system model in the control algorithm. We discuss the challenge of data generation, as neural networks require a large quantity of data to be trained. We investigate the importance of data quality and explore the possibility of prioritizing data samples during model training to alleviate these difficulties.

After presenting the control algorithm used, we compare three methods for identifying the system: a traditional linear approach using the ARX algorithm, a standard machine learning approach where we train a neural network on a dataset without preprocessing, and the same neural network trained using a new method inspired by [12]. The dynamic model is then used by the MPC to sample trajectories as in [13].

We show that the system model obtained by the neural network, especially when using prioritized sampling, performs better in terms of multistep error. We test these models for the control of a simulated drone in two situations: first with the drone in normal conditions, then with a suspended mass added to the drone to disturb its dynamics. We discuss the gain of the neural network model with regard to its computational cost.

This work is done under the Grande Region rObotique aerienNE (GRoNe) project, funded by a European Union Grant through the FEDER INTERREG VA initiative and the French "Grand Est" Region.

II. RELATED WORK

To allow for more autonomy, control algorithms are continuously developed and improved; in particular, MPC has been used successfully for the control of drones in various settings [3], [10]. MPC is based on the online optimization of cost functions. In our implementation we use a version of that algorithm called Model Predictive Path Integral (MPPI), described in [13], which is able to take complex cost functions into account. The flexibility given by the design of the cost function makes it possible to implement both objectives and constraints, which is very useful in drone control, as mission and safety are often in competition.

Machine learning has been used with great results in recent years in several control settings [4], [15]. Neural networks have shown the capability to model complex systems, which makes them a tool of choice for the implementation of predictive systems. However, training these networks requires a lot of data, which implies an important amount of demonstration of the system in its environment.

A way to lessen this burden is to collect data from the system while the algorithm is running, as demonstrated in [14].
However, when data is collected this way, the action space (the commands sent to the drone) is less explored. Exploration, especially when guided by an expert, can lead to an unbalanced dataset: some dynamics will be very well covered (moving in a given plane) while others will be rare (altitude variation, for instance).

In [14], it is shown that one way to mitigate this problem is to retrain the model as the experiment is being conducted, training it on a growing dataset of samples relevant to the task. However, in that setting, most of the information contained in the newer data consists of samples that the previous model already handles correctly. Going in a straight line is the most common behavior for a drone, and we keep spending time learning to do just that, which is not very efficient.

The unbalanced dataset problem has also been encountered in the Reinforcement Learning (RL) setting by the DQN algorithm [8]. While learning to play Atari games, it trains from a replay buffer; however, the information in that buffer is not evenly distributed among the samples. By prioritizing some samples over others, it was shown in [12] that it is possible to accelerate the training, using less data while keeping the same performance. We propose to use a similar approach here, summarized in Algorithm 2 below.

Algorithm 2 Prioritized Sampling
Require: data, K number of trials, and Task
  trainingData ← data
  sampleWeight ← ∅
  for k = 0 to K do
    F ← Train(trainingData)
    newdata ← Task(controller(F))
    data ← data ∪ newdata
    N ← number of samples in data
    for i = 0 to N do
      δ_i ← Y_i − F(X_i, U_i)
      P(i) ← δ_i^α / Σ_k δ_k^α
      w_i ← (1/(N · P(i)))^β
    end for
    sampleWeight ← {w_i}_{0≤i≤N}
    trainingData ← sample data d_i ∼ P(i)
  end for
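As a reading aid, the following is a minimal Python sketch of the loop in Algorithm 2. The `train`, `run_task`, and `model.predict` helpers are hypothetical stand-ins for the model training, one control episode, and the one-step predictor; the value of α is a placeholder (the paper only reports β = 0.4 later on). P(i) and w_i are defined in equations (3) and (4) below.

```python
import numpy as np

def prioritized_training(data, K, train, run_task, alpha=0.7, beta=0.4, seed=0):
    """Sketch of Algorithm 2: alternate model training and data collection,
    resampling the training set according to per-sample prediction error."""
    rng = np.random.default_rng(seed)
    training_data, sample_weights, model = list(data), None, None
    for _ in range(K):
        model = train(training_data, sample_weights)  # F <- Train(trainingData)
        data = data + run_task(model)                 # run one episode, grow the dataset
        # delta_i = |Y_i - F(X_i, U_i)|: one-step prediction error of each sample
        deltas = np.array([np.linalg.norm(y - model.predict(x, u))
                           for x, u, y in data])
        p = deltas ** alpha
        probs = p / p.sum()                            # eq. (3)
        weights = (1.0 / (len(data) * probs)) ** beta  # eq. (4)
        idx = rng.choice(len(data), size=len(data), p=probs)
        training_data = [data[i] for i in idx]         # draw d_i ~ P(i)
        sample_weights = weights[idx]                  # scales each sample's loss in Train
    return model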
To prioritize samples, a measure of their importance is needed. How much can be learned from a given sample is a rather hard question. In the RL setting, the temporal-difference error seems to be the most logical choice. In our context, we use the distance δ_i between the prediction of our model and the actual observation of sample i, following the idea that our model has the most to learn from the samples where its prediction fails the most. We then use equation (3) to draw a new collection of samples from our original dataset by picking event i with probability P(i).

P(i) = δ_i^α / Σ_k δ_k^α    (3)

The α hyper-parameter allows softening the prioritization. Choosing α = 0 gives the uniform distribution, in which case there is no prioritization at all, while a higher α value encourages learning on edge cases.

One problem is that we are now trying to learn from a different distribution than the one we had before prioritization. We correct the bias induced by the prioritization by using importance-sampling weights [5]:

w_i = (1/(N · P(i)))^β    (4)

With β = 1 the prioritized sampling bias is completely corrected, but this also slows down the learning. The α parameter increases the aggressiveness of the prioritization, while the β parameter increases the correction; there is thus an equilibrium to find between the two.
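To make the interplay of α and β concrete, here is a small numerical illustration of equations (3) and (4). The error values are made up for the example; only β = 0.4 comes from the experiments reported below.

```python
import numpy as np

deltas = np.array([0.1, 0.1, 0.8])   # hypothetical one-step prediction errors
beta = 0.4                           # value used in the experiments below

for alpha in (0.0, 0.7, 2.0):
    p = deltas ** alpha
    P = p / p.sum()                          # eq. (3)
    w = (1.0 / (len(deltas) * P)) ** beta    # eq. (4)
    print(alpha, np.round(P, 3), np.round(w, 3))

# alpha = 0.0 -> P = [0.333 0.333 0.333]: uniform sampling, no prioritization
# alpha = 2.0 -> P = [0.015 0.015 0.970]: training concentrates on the hard sample,
#                whose weight w < 1 then shrinks its loss to correct the induced bias
```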
IV. RESULTS

A. Experimental settings

Our tests are conducted in a simulated environment using the Gazebo simulator (http://gazebosim.org/). The UAV is simulated using the tum_simulator package (http://wiki.ros.org/tum_simulator), which provides the same interface as the real Parrot drone (© 2016 Parrot Drones SA, all rights reserved). The implementation makes extensive use of the Robot Operating System (ROS, http://www.ros.org/) framework, of which we used the Kinetic version. Controllers run at 5 Hz, as this is the rate at which the drone driver publishes the odometry. The experiments are run on Linux 4.13 with the Ubuntu distribution, on an Intel i5-6200U CPU with 8 GB of DDR3 RAM.

The system we are trying to identify is the drone together with the low-level Parrot drivers. Regarding data generation, three different approaches are taken.

First, a simple Proportional Integral Derivative (PID) controller is used to fly the drone in a square defined by four destination points. This provides an easy way to generate data of the system in closed loop, but it does not explore all the dynamics of the drone, as discussed in section IV-B2.

Secondly, in order to generate a dataset that better explores the action space, we design a dummy planner that goes in a random direction at a random speed. For stability, constraints preventing the drone from crashing were included. The planner is designed to sample the action space uniformly; due to these constraints, the vertical dynamics might be slightly over-represented, since the drone is asked to regain altitude whenever its current altitude is too low. This generates data in open loop, as there is no feedback in this controller.

Lastly, we also collect data of the system in closed loop with the MPPI controller while it performs the task. The task consists in following a trajectory that goes around a cube, in order to solicit dynamics along the three dimensions of space.
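As an illustration of the second data-generation approach, a minimal sketch of such a random planner could look as follows. The command bounds, the 0.5 m altitude floor, and the step count are assumptions, not the paper's values; `get_altitude` and `send_command` stand in for the actual ROS odometry subscription and command publisher.

```python
import random

def random_planner_episode(get_altitude, send_command, steps=600, min_altitude=0.5):
    """Open-loop exploration: hold a random velocity command for a random
    duration, overriding it with a climb whenever the altitude gets too low."""
    command, remaining = (0.0, 0.0, 0.0, 0.0), 0    # (ux, uy, uz, urz) in [-1, 1]
    for _ in range(steps):
        if remaining <= 0:                           # draw a new direction and speed
            command = tuple(random.uniform(-1.0, 1.0) for _ in range(4))
            remaining = random.randint(5, 25)        # hold it for 1 to 5 s at 5 Hz
        if get_altitude() < min_altitude:            # safety: regain altitude
            command = (command[0], command[1], 1.0, command[3])
        send_command(command)                        # published on a ROS topic at 5 Hz
        remaining -= 1
```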
B. Model identification

We are modeling the dynamics of the system. The inputs used are the velocities vx, vy, vz, vrz and the commands ux, uy, uz, urz of the two previous timesteps, t and t − dt, if we want to predict the velocity at the next timestep.

[Figure: vx prediction and vx observed; predictions_nn and prediction_arx plotted against observations.]

1) ARX and Neural Network: For the ARX model, we use [...]. Both the ARX model and the neural network model are trained with the same data. Here, we use a balanced dataset for which [...]
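The paper does not spell out its ARX implementation, but a linear model of this form can be fit by ordinary least squares. The sketch below builds the lagged regressor described above (two steps of velocities and commands) and solves for the coefficients with numpy; it is a simplified stand-in, not the authors' exact setup.

```python
import numpy as np

def build_regressors(vel, cmd, lags=2):
    """Row t holds [v_{t-1}, v_{t-2}, u_{t-1}, u_{t-2}] (for lags=2) and is
    paired with the target v_t. vel and cmd have shape (T, 4):
    (vx, vy, vz, vrz) and (ux, uy, uz, urz) at each 200 ms timestep."""
    rows = [np.concatenate([vel[t - 1 :: -1][:lags].ravel(),
                            cmd[t - 1 :: -1][:lags].ravel()])
            for t in range(lags, len(vel))]
    return np.array(rows), vel[lags:]

def fit_arx(X, Y):
    """Least-squares solution of Y = X @ A, one linear map per velocity component."""
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A

# usage: A = fit_arx(*build_regressors(vel, cmd)); next_v = x_row @ A
```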
[Figure 3. Comparison of neural networks with and without prioritized replay after 8 successive training runs, in terms of multistep prediction error.]

[Figure 4. Comparison of neural networks with and without prioritized replay, in terms of maximum and average error (m) over several episodes of the following task.]
For these experiments, β = 0.4. Figure 3 shows that the neural network using prioritized sampling is able to exploit the new data faster and learns a better model in terms of multistep error. The two networks are evaluated after having both been retrained eight times on the growing dataset. ARX is shown for comparison.
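The excerpt does not fully specify how the multistep error is computed. A common way to define it, sketched below under that assumption, is to roll the one-step model forward over a horizon, feeding its own predictions back as inputs, and compare the result with the recorded trajectory; `predict` is the one-step predictor (for example the ARX map or the network's forward pass).

```python
import numpy as np

def multistep_error(predict, vel, cmd, horizon=10, lags=2):
    """Roll the one-step model forward `horizon` steps, feeding its own
    predictions back in, and average the error at the final step."""
    errors = []
    for t0 in range(lags, len(vel) - horizon):
        hist = [vel[t0 - 1 - l] for l in range(lags)]     # [v_{t0-1}, v_{t0-2}]
        for h in range(horizon):
            u_hist = [cmd[t0 + h - 1 - l] for l in range(lags)]
            hist.insert(0, predict(np.concatenate(hist[:lags] + u_hist)))
        errors.append(np.linalg.norm(hist[0] - vel[t0 + horizon - 1]))
    return float(np.mean(errors))
```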
C. Control

To evaluate the quality of the models in terms of control, we use them as part of an MPPI controller. We evaluate the controller on a trajectory following task. The trajectory follows the edges of a cube, such that all dynamics (vx, vy and vz) are solicited. In our experiment, the target moves at constant speed.
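The excerpt does not reproduce the MPPI update itself, so the following is only a compact sketch of the general scheme of [13]: sample perturbed command sequences, roll them through the learned dynamics model, and re-weight them exponentially by trajectory cost. The cost term, the noise scale, and the state layout (position in the first three components) are placeholder assumptions, and `step_model` stands in for the identified one-step dynamics.

```python
import numpy as np

def mppi_step(step_model, state, u_nominal, target, n_samples=500,
              sigma=0.3, lam=1.0, seed=0):
    """One MPPI update: sample noisy command sequences, roll them through the
    learned dynamics, and average them with exponential weights on the cost."""
    rng = np.random.default_rng(seed)
    H, U = u_nominal.shape
    noise = rng.normal(0.0, sigma, size=(n_samples, H, U))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        s = state
        for h in range(H):
            s = step_model(s, u_nominal[h] + noise[k, h])  # learned one-step model
            costs[k] += np.sum((s[:3] - target) ** 2)      # e.g. distance-to-target cost
    w = np.exp(-(costs - costs.min()) / lam)               # exponential weighting
    w /= w.sum()
    return u_nominal + np.einsum("k,khu->hu", w, noise)    # noise-weighted command update
```

Because every rollout calls the model H times for each of the n_samples sequences, the forward-pass cost of the model dominates the controller's runtime, which is the trade-off discussed below.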
First, we take the models we previously evaluated on multistep prediction and evaluate them on this task. To this end, we use the distance between the targeted pose on the trajectory and the effective pose of the drone as the error. The result of this evaluation is shown in table I. Both neural networks perform better than the ARX model. It is important to note that for this experiment all the rates were set to 5 Hz. It is possible to increase the ARX performance by increasing the controller rate. It is harder to do so for the neural networks, because this would require changing the hardware on which they run. Indeed, while the computational cost of the ARX model prediction is negligible compared to the overall MPPI controller, the neural network forward propagation pushes the whole controller computation time just below the 200 ms that are available.

Table I
MAXIMUM AND AVERAGE ERROR

                    ARX     classical NN   prioritized NN
max error [m]       1.789   0.882          0.819
average error [m]   1.272   0.562          0.508

One advantage of the neural networks is their capacity to keep improving as new data becomes available to train on. In order to compare the capabilities of the prioritized training and the normal training, we evaluate the controller on the control task over several episodes. Between episodes, the drone lands and the network is trained again on a dataset combining the data used to train the model initially with the data collected during the previous episodes. The result of that experiment is depicted in figure 4. The neural network using prioritized resampling of the dataset achieves better performance in both average and maximum error. As the dataset grows from the task, it also becomes more unbalanced, as the task presents a lot of straight lines and only occasional changes of direction. This has a negative influence on the classical training of the neural networks, but not on the prioritized version.

The main advantage of using neural networks is their ability to model very complex systems. In the previous experiment, the controller is implemented on top of a low-level driver; thus, the system model needed by the MPC is almost linear. In order to test our implementation on a more challenging system, we add a suspended mass below our drone, which makes the dynamics of the drone much more complex. Our drone's mass is 1.477 kg and we add a mass of 150 g. The distance between the drone and the mass is constant, and we constrain the mass to stay in a cone of π/3 rad below the drone. In order for our system to be able to model the new dynamics, we increase the history it has access to. Until now, only the two previous steps were considered (i.e. 400 ms of history); in order to capture the effect of the mass, we increase this history to ten steps (i.e. 2 s of history).

First, we compare an ARX model and a neural network model (without retraining) on the same task as previously. The neural network performs better if we keep both controllers running at the same rate.

To evaluate the interest of the prioritized retraining we use [...]
Table II
MAXIMUM AND AVERAGE ERROR

                    ARX     Neural Network
max error [m]       2.250   1.536
average error [m]   1.220   0.946

[Figure: error on the following task over several episodes; maximum and average error of the classic and prioritized NN, Error (m) versus episode.]

To take the state history into account, the use of Recurrent Neural Networks (RNN) might be interesting.

Another avenue of investigation concerning the unbalanced drone is the deterministic nature of the problem. Indeed, the perturbation produced by the mass might make the problem stochastic, which would require a different approach.

REFERENCES

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.