
2018 22nd International Conference on System Theory, Control and Computing (ICSTCC)

Trajectory-control using deep system identification and model predictive control for drone control under uncertain load

Antoine Mahé (CentraleSupélec, Université de Lorraine, CNRS, LORIA, F-57000 Metz, France) antoine-robin.mahe@centralsupelec.fr
Cédric Pradalier (UMI2958 GT-CNRS, F-57000 Metz, France) cedric.pradalier@georgiatech-metz.fr
Matthieu Geist (Université de Lorraine, CNRS, LIEC, F-57000 Metz, France) matthieu.geist@univ-lorraine.fr

Abstract—Machine learning makes it possible to create complex models if provided with enough data, hence challenging more traditional system identification methods. We compare the quality of neural networks and of an ARX model when used in a model predictive controller to command a drone in a simulated environment. The training of neural networks can be challenging when data is scarce or datasets are unbalanced. We propose an adaptation of prioritized replay to system identification in order to mitigate these problems. We illustrate the advantages and limits of this training method on the control task of a simulated drone.

Index Terms—identification, model predictive control, neural networks, learning

I. INTRODUCTION

The use of Unmanned Aerial Vehicles (UAV) for various inspection tasks is more and more appealing. Tasks such as precision agriculture or building inspection benefit greatly from the improvement of aerial robotic capabilities. However, most drone applications require an expert to accomplish their mission. This requirement can be a serious limitation in many situations. For example, underground mine inspection does not allow for real-time communications. In such cases, autonomous control of the aerial robot becomes a necessity.

Trajectory planning is an active area of research. Model Predictive Control (MPC) is a control algorithm that is widely used in this context [9], [11], [15]. MPC is built on top of a model of the controlled system. The design of the model can be achieved by different methods. Model identification is one of them, and it works well on a wide variety of systems. Recently, machine learning and neural networks have challenged the traditional system identification tools.

We study here the advantages and limitations of using neural networks as the system model in the control algorithm. We discuss the challenge of data generation, as neural networks require a large quantity of data to be trained. We investigate the importance of data quality and explore the possibility of prioritizing data samples during model training to alleviate these difficulties.

After presenting the control algorithm used, we compare three methods for identifying the system: a traditional linear approach using the ARX algorithm, a standard machine learning approach where we train a neural network on a dataset without preprocessing, and the same neural network trained using a new method inspired from [12]. The dynamic model is then used by the MPC to sample trajectories as in [13].

We show that the system model obtained by the neural network, especially using prioritized sampling, performs better in terms of multistep error. We test such models for the control of a simulated drone in two different situations: first, we use the drone in normal conditions, then we add a suspended mass to the drone to disturb its dynamics. We discuss the gain of the neural network model in regard to its computational cost.

This work is done under the Grande Region rObotique aerienNE (GRoNe) project, funded by a European Union Grant through the FEDER INTERREG VA initiative and the French "Grand Est" Region.

II. RELATED WORK

To allow for more autonomy, control algorithms are continuously developed and improved; in particular, MPC has been used with success for the control of drones in various settings [3], [10]. MPC is based on the online optimization of cost functions. In our implementation we use a version of that algorithm called Model Predictive Path Integral (MPPI), described in [13], which is able to take complex cost functions into account. The flexibility given by the design of the cost function allows both objectives and constraints to be implemented, which is very useful in drone control as mission and security are often in competition.

Machine learning has been used with great results in recent years in several control settings [4], [15]. Neural networks have shown the capability to model complex systems, which makes them a tool of choice for the implementation of predictive systems. However, training these networks requires a lot of data, which implies an important amount of demonstration of the system in its environment.

A way to lessen this burden is to collect data from the system while the algorithm is running, as demonstrated in [14].



The model is first trained with only minimal demonstrations. The accuracy of the neural network model is then improved by collecting more data as the system operates and reusing it in a training session.

The data collection problem is often encountered in learning settings. For example, the Deep Q-Networks (DQN) algorithm that solves Atari games [8] needs millions of examples to be effective. An interesting way to alleviate this problem has been developed in [12], where samples are given importance depending on how much they are expected to help the learning process. This is closely linked to importance sampling, which also uses weighted samples [5].

In our case, we propose an improvement on the retraining done in [14] by using prioritized sampling. This improves sample efficiency and counters the negative impact of unbalanced datasets. To alleviate the bias induced by our prioritization, we combine it with importance sampling as in [12].

III. METHOD

A. Control

MPC uses a model of the system dynamic to predict the behavior of the system:

    X_{t+dt} = F(X_t, U_t).    (1)

Given the state X_t and a command U_t, the model F provides the next state X_{t+dt}. More specifically, in our drone setting the state X_t is composed of a position p_t and a velocity v_t. The position update is done using a simple kinematic update, while the velocity update is done using the identified model f for the dynamic:

    p_{t+1} = p_t + v_t dt
    v_{t+1} = v_t + f(v_t, u_t) dt    (2)

This prediction is used to optimize a cost function over a receding horizon. Only the first command from the optimization is applied before optimizing again from the new step. This lets the controller take future events into account while computing the next command. It is also possible to consider the n first commands instead of only the first one, which allows a more flexible time window for the processing of the next command. In our implementation, we use a command buffer to ensure the rate of the control. The impact on performance of using n-step predictions instead of one-step predictions is far smaller than the impact of an uneven control rate.
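As a minimal sketch, the one-step prediction of equation (2) can be written as follows in Python (the model f is assumed to be any callable returning the estimated rate of change of the velocity, e.g. the identified ARX or neural network model; dt = 0.2 s matches the 5 Hz control rate used later):

```python
import numpy as np

def predict_next_state(p_t, v_t, u_t, f, dt=0.2):
    """One-step prediction following equation (2)."""
    p_next = p_t + v_t * dt          # simple kinematic position update
    v_next = v_t + f(v_t, u_t) * dt  # identified model f for the dynamic
    return p_next, v_next
```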
We use here the MPPI version developed in [14], which calculates the desired control from the evaluation of sampled trajectories. The approach is described in Algorithm 1. The prediction of the next state as described here is done using the last state and command. This can be generalized to using the N last states and commands.

The cost of a trajectory, φ(S_k), is computed as the sum of the costs of the states of the trajectory. In our case, each state cost is calculated as the Euclidean distance between the actual drone pose and the desired pose. It could also be derived from a cost map or be a combination of different factors.

Algorithm 1 Model Predictive Path Integral
Require: F: dynamic model; T: number of timesteps; K: number of sampled trajectories; φ: cost function; u_t: command sent at step t; s_t: state at step t; U = (u_1, u_2, ..., u_T): initial control sequence
  Sample ε^k = (ε^k_1, ε^k_2, ..., ε^k_T) ∼ N(µ, σ²)
  for k = 0 to K − 1 do
      for t = 1 to T do
          u_t = u_t + ε^k_t
          s_{t+dt} ← F(s_t, u_t)
      end for
      S_k = {s_t for t in [0, T]}
      C_k ← φ(S_k)
  end for
  β ← min_k [C_k]
  η ← Σ_{k=0}^{K−1} exp(−(C_k − β))
  for k = 0 to K − 1 do
      w_k ← (1/η) exp(−(C_k − β))
  end for
  for t = 1 to T do
      u_t = u_t + Σ_{k=1}^{K} w_k ε^k_t
  end for
  return U
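A condensed sketch of Algorithm 1 in Python is given below; the dynamic model F, the cost φ, and the values of K and σ are placeholders for illustration, not values from the paper:

```python
import numpy as np

def mppi_step(U, s0, F, phi, K=100, sigma=0.3):
    """One MPPI update. U: (T, dim_u) control sequence, s0: current state,
    F: dynamic model, phi: cost of a state sequence."""
    T = U.shape[0]
    eps = np.random.normal(0.0, sigma, size=(K,) + U.shape)  # sampled perturbations
    costs = np.zeros(K)
    for k in range(K):                  # roll out K perturbed trajectories
        s = s0
        states = [s]
        for t in range(T):
            s = F(s, U[t] + eps[k, t])  # perturbed command through the model
            states.append(s)
        costs[k] = phi(states)          # trajectory cost C_k
    beta = costs.min()                  # baseline, for numerical stability
    weights = np.exp(-(costs - beta))
    weights /= weights.sum()            # trajectory weights w_k (eta normalization)
    return U + np.einsum('k,ktd->td', weights, eps)  # weighted perturbation update
```

At each control step, only the first command of the returned sequence is sent to the drone, and the optimization is repeated from the new state. A cost φ matching the paper's choice would simply sum the Euclidean distances between each predicted pose and the desired pose.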

B. Model Identification

System identification is a central problem of every MPC implementation. It is usually solved using standard algorithms such as ARX [7]. Machine learning algorithms, which are used with a lot of success in vision, are also very powerful for system modeling. By using a neural network in a regression setting, it is possible to model even non-linear systems with relative ease.

One of the main problems of these learning methods for identification is that they are very dependent on the quantity and quality of the data used during the training of the model. In our case, we want a model of the dynamic of the system, the f function in equation (2). To obtain it, we need to gather data of the drone in as many situations as possible. In simulation, data can be gathered by recording the drone flying while sending the right commands to generate any data needed. However, this is not always possible. Indeed, there can be a lot of constraints in experiments, particularly when trying to solve real-world problems. Limited battery charge shortens the time available for data gathering. Security and regulation also have to be taken into consideration. Moreover, the cost of crashing the drone may limit the range of actions available.

All these limitations have two main consequences for dataset generation: first, the data is more scarce; second, the action space (commands sent to the drone) is less explored. Exploration, especially when guided by an expert, can lead to an unbalanced dataset. This means that some dynamics will be very well covered (moving in a given plane) while others will be rare (altitude variation, for instance).

In [14], it is shown that one way to solve the problem is to retrain the model as the experiment is being conducted, training it on a growing dataset of samples relevant to the task. However, in that setting most of the information contained in the newer dataset consists of samples that are already correctly handled by the previous model. Going in a straight line is the most common behavior for a drone, and we keep spending time learning to do that, which is not very efficient.

The unbalanced dataset problem has been encountered in the Reinforcement Learning (RL) setting by the DQN algorithm [8]. While trying to solve Atari games, it uses a buffer for training. However, the information in that buffer is not evenly distributed among the samples. By prioritizing some samples over others, it was shown in [12] that it is possible to accelerate the training, using less data while conserving the same performance. We propose to use a similar approach here.
To prioritize samples, a measure of their importance is needed. How much can be learned from a given sample is a rather hard question. In the RL setting, the temporal difference error seems to be the most logical choice. In our context, we use the distance δ_i between the prediction of our model and the actual observation (of sample i), following the idea that our model has more to learn from samples where its prediction fails the most. We then use equation (3) to draw a new collection of samples from our original dataset by picking event i with probability P(i):

    P(i) = δ_i^α / Σ_k δ_k^α    (3)

The α hyper-parameter allows the prioritization to be softened. Choosing α = 0 gives the uniform distribution, in which case there is no prioritization at all, while a higher α value encourages learning on edge cases.

One problem is that we are now trying to learn from a different distribution than the one we had before prioritization. We correct the bias induced by the prioritization by using importance-sampling [5] weights:

    w_i = ((1/N) · (1/P(i)))^β    (4)

With β = 1 the prioritized sampling bias is completely corrected, but this also slows down the learning. The α parameter increases the aggressiveness of the prioritization while the β parameter increases the correction; thus, there is an equilibrium to find between both.

Algorithm 2 Prioritized Sampling
Require: data; K: number of trials; Task
  trainingData ← data
  sampleWeight ← ∅
  for k = 0 to K do
      F ← Train(trainingData)
      newdata ← Task(controller(F))
      data ← data ∪ newdata
      N: number of samples in data
      for i = 0 to N do
          δ_i ← Y_i − F(X_i, U_i)
          P(i) ← δ_i^α / Σ_k δ_k^α
          w_i ← ((1/N) · (1/P(i)))^β
      end for
      sampleWeight ← {w_i}_{0≤i≤N}
      trainingData ← sample data d_i ∼ P(i)
  end for
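A sketch of the prioritized draw of equations (3) and (4), as used inside Algorithm 2, could look like this in Python (the model interface and the helper name are assumptions; α = 0.6 and β = 0.4 are the values used in section IV):

```python
import numpy as np

def prioritized_draw(X, U, Y, model, n_draw, alpha=0.6, beta=0.4):
    """Draw a prioritized training subset and its importance weights."""
    delta = np.linalg.norm(Y - model(X, U), axis=1)  # prediction error per sample
    P = delta**alpha / np.sum(delta**alpha)          # equation (3)
    N = len(delta)
    w = (1.0 / (N * P))**beta                        # equation (4), bias correction
    idx = np.random.choice(N, size=n_draw, p=P)      # sample events i with P(i)
    return idx, w[idx]                               # indices and their weights
```

The returned weights are then applied to the loss of the drawn samples during the next training session, as in [12].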
IV. RESULTS

A. Experimental settings

Our tests are conducted in a simulated environment using the Gazebo¹ simulator. The UAV is simulated using the tum_simulator² package, which provides the same interface as the real Parrot drone³. The implementation makes extensive use of the Robot Operating System (ROS)⁴ framework, of which we used the Kinetic version. Controllers run at 5 Hz, as it is the rate at which the drone driver publishes the odometry.

¹ http://gazebosim.org/
² http://wiki.ros.org/tum_simulator
³ Copyright ©2016 Parrot Drones SA. All Rights Reserved.
⁴ http://www.ros.org/

The experiments are done on Linux 4.13 with the Ubuntu distribution, on an Intel i5-6200U CPU with 8 GB of DDR3 RAM.

The system we are trying to identify is the drone using the low-level Parrot drivers. Regarding data generation, three different approaches are taken.

First, a simple Proportional Integral Derivative (PID) controller is used to fly the drone in a square defined by four destination points. This provides an easy way to generate data of the system in closed loop, but it does not explore all the dynamics of the drone, as discussed in section IV-B2.

Secondly, in order to generate a dataset that better explores the action space, we design a dummy planner that goes in a random direction with random speed. For stability, constraints preventing the drone from crashing were included. The planner is designed to uniformly sample the action space. Due to these constraints, the vertical dynamics might be a little over-represented: to prevent crashing, the drone is asked to regain altitude if its current one is too low. This generates data in open loop, as there is no feedback in this controller (a sketch of such a command generator is given below).

Lastly, we also collect data of the system in closed loop with the MPPI controller while it performs the task. The task consists in following a trajectory that goes around a cube, in order to solicit dynamics in the three dimensions of space.
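As an illustration of the dummy planner, the following minimal sketch samples the action space uniformly while enforcing the altitude safeguard; the command ranges and the altitude threshold are assumptions, not values reported here:

```python
import random

def dummy_command(altitude, z_min=0.5):
    """Open-loop exploration command: uniform over the action space,
    except that the drone is asked to regain altitude when too low."""
    ux, uy, uz, urz = (random.uniform(-1.0, 1.0) for _ in range(4))
    if altitude < z_min:  # safety constraint: force an upward command
        uz = abs(uz)
    return ux, uy, uz, urz
```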

B. Model identification

We are modeling the dynamic of the system. The inputs used are the velocities v_x, v_y, v_z, v_rz and the commands u_x, u_y, u_z, u_rz of the two previous timesteps, t and t − dt, if we want to predict the t + dt step. Here dt is the timestep. The output of the model F is the velocity at the next timestep:

    [ v_x, v_y, v_z, v_rz (t − dt)
      u_x, u_y, u_z, u_rz (t − dt)      F
      v_x, v_y, v_z, v_rz (t)       ------→  v_x, v_y, v_z, v_rz (t + dt)    (5)
      u_x, u_y, u_z, u_rz (t) ]

1) ARX and Neural Network: For the ARX model, we use a second-order model. The neural network used is composed of a succession of an input dense layer, two hidden dense layers and an output dense layer. The input layer is composed of 16 nodes, corresponding to the input variables described in (5). The hidden layers have 32 nodes each, and the output layer has 4, corresponding to the desired state output. Both hidden layers use Rectified Linear Unit (ReLU) activation. The output layer uses a linear activation. The loss used for the training is the mean square error, and the optimizer is ADAM [6]. The network is implemented using the Keras [2] Python framework on top of the TensorFlow library [1].
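A minimal sketch of such a network in Keras follows; the layer sizes match the description above, while treating the 16-node input layer as a dense ReLU layer is our reading, as its activation is not specified in the text:

```python
from keras.models import Sequential
from keras.layers import Dense

# Inputs: v_x, v_y, v_z, v_rz and u_x, u_y, u_z, u_rz at t and t - dt (16 values);
# outputs: the four velocities at t + dt, as in equation (5).
model = Sequential([
    Dense(16, activation='relu', input_shape=(16,)),  # input dense layer, 16 nodes
    Dense(32, activation='relu'),                     # hidden layer 1
    Dense(32, activation='relu'),                     # hidden layer 2
    Dense(4, activation='linear'),                    # predicted velocities
])
model.compile(optimizer='adam', loss='mse')           # MSE loss, ADAM optimizer
```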
First, we compare our neural network and an ARX model. Both are trained with the same data. Here, we use a balanced dataset for which all the dynamics are explored with similar frequencies. The dataset is obtained using the "dummy planner" previously mentioned. To evaluate their performance, we measure the error between their predictions and the actual observations.

[Figure 1. Comparison of ARX and neural network model on a linear dynamic: predicted (NN, ARX) and observed speed along the x axis (m/s) over 20 s.]

In figure 1, we see that both perform very well on the v_x dynamic. As this dynamic is linear, this is expected. The average error on the test set for the neural network is 0.064 m/s for the velocity along the x axis. This is twice as good as the ARX performance, which has an average error of 0.146 m/s on the same dataset for this axis. This is reasonable, as such an error would translate, for the controller, into a position error of 3 cm for one step.

[Figure 2. Comparison of ARX and neural network model on a non-linear dynamic: predicted (NN, ARX) and observed speed along the z axis (m/s) over 20 s.]

In figure 2, we can see that for a more complex dynamic, such as the vertical one in which the low-level controller has to counteract gravity, the gap between the neural network and the ARX model is more important. Along that axis, the average error of the neural network is 0.045 m/s, where it is 0.107 m/s for the ARX model. Notice that the range of speeds for v_z (between −0.6 m/s and 0.6 m/s) is narrower than for v_x (between −1.5 m/s and 2 m/s).

2) Neural Network with prioritized sampling: In the previous section, neural networks were shown to be able to perform a more precise model identification (in terms of one-step prediction error) than the ARX algorithm. To achieve this, a rich dataset of 60468 samples, equivalent to about 3 hours and 20 minutes of navigation data, was used. This data was generated in order to obtain a balanced dataset containing all the required dynamics. This implies important limitations for the applicability of neural networks to system identification. To alleviate this burden, we use the prioritized retraining method described in III-B.

To highlight the interest of prioritized sampling, we compare the different models in terms of multistep error. Each model is used to simulate the trajectory of the drone over 20 steps (4 s) while a constant command is applied. The applied speed command (u_x = 0.5 m/s, u_y = 0.5 m/s, u_z = 0.1 m/s, u_rz = 1 rad/s) is chosen such that all the dimensions of the dynamics are affected.

We first train the networks on a very unbalanced dataset generated with a PID which does not explore all the action space. Then we progressively add samples with more dynamic features. We construct the dataset such that the first couple of training runs expose the network to only v_x and v_y commands, then progressively add rotation and vertical commands. The choice of hyperparameters is important here as it affects the performance. In this case we obtain the results with α = 0.6 and β = 0.4.

We show in figure 3 that the neural network using prioritized sampling is able to make use of the new data faster and learns a better model in terms of multistep error. The two networks are evaluated after both have been retrained eight times on the growing dataset. ARX is shown for comparison.

[Figure 3. Comparison of neural networks with and without prioritized replay after 8 successive training runs, in terms of multistep prediction error.]
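The multistep evaluation above can be sketched as follows (a hypothetical helper: the model is iterated on its own predictions for 20 steps, i.e. 4 s at 5 Hz, under the constant command given above; for brevity, the sketch ignores that the real model input stacks two timesteps of history):

```python
import numpy as np

def multistep_error(model, v0, u_const, v_observed, n_steps=20):
    """Simulate the velocity over n_steps under a constant command and
    compare against the recorded trajectory."""
    v = v0
    errors = []
    for t in range(n_steps):
        v = model(v, u_const)  # feed the prediction back as the next input
        errors.append(np.linalg.norm(v - v_observed[t]))
    return float(np.mean(errors)), float(np.max(errors))
```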
C. Control

To evaluate the quality of the models in terms of control, we use them as part of an MPPI controller. We evaluate the controller on a trajectory-following task. The trajectory follows the edges of a cube such that all dynamics (v_x, v_y and v_z) are used. In our experiments the target moves at constant speed.

First, we consider the models previously evaluated on multistep prediction and evaluate them on this task. To this end, we use as the error the distance between the targeted pose from the trajectory and the effective pose of the drone. The result of this evaluation is shown in table I. Both neural networks perform better than the ARX model. It is important to note that for this experiment all the rates were set to 5 Hz. It is possible to increase the ARX performance by increasing the controller rate. It is harder to do so for the neural networks, because this would require changing the hardware on which they run. Indeed, while the computational cost of the ARX model prediction is negligible compared to the overall MPPI controller, the neural network forward propagation pushes the whole controller computation time just below the 200 ms that are available.

Table I
MAXIMUM AND AVERAGE ERROR

                   ARX    classical NN   prioritized NN
max error [m]      1.789  0.882          0.819
average error [m]  1.272  0.562          0.508

One advantage of the neural networks is their capacity to keep improving as new data becomes available for them to train on. In order to compare the capability of the prioritized training to the normal training, we evaluate the controller on the control task over several episodes. Between each episode, the drone lands and the network is trained again on a dataset combining the data used to train the model initially with the data collected during the previous episodes. The result of that experiment is depicted in figure 4. The neural network using resampling prioritization on the dataset achieves better performance in both average and maximum error. As the dataset grows from the task, it also becomes more unbalanced, as the task presents a lot of straight lines and only occasional changes of direction. This has a negative influence on the classical training of the neural networks but not on the prioritized version.

[Figure 4. Comparison of neural networks with and without prioritized replay, in terms of maximum and average error on the following task over several episodes.]

The main advantage of using neural networks is their ability to model very complex systems. In the previous experiments, the controller is implemented on top of a low-level driver; thus, the system model needed by the MPC is almost linear. In order to test our implementation on a more challenging system, we add a suspended mass below our drone. This makes the dynamic of the drone much more complex. Our drone's mass is 1.477 kg and we add a mass of 150 g. The distance between the drone and the mass is constant, and we constrain the mass to stay in a cone of π/3 rad below the drone. In order for our system to be able to model this system, we increase the history it has access to. Until now, only the two previous steps were considered (i.e. 400 ms of history). In order to capture the effect of the mass, we increase this history to ten steps (i.e. 2 s of history).
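The extended input can be sketched as follows (a hypothetical helper: with n_history = 2 this reduces to the 16-value input of equation (5), while n_history = 10 covers the 2 s of history used for the loaded drone):

```python
import numpy as np

def build_input(v_hist, u_hist, n_history=10):
    """Stack the n last velocity and command vectors (4 values each)
    into a single model input of length 8 * n_history."""
    pairs = zip(v_hist[-n_history:], u_hist[-n_history:])
    return np.concatenate([np.concatenate((v, u)) for v, u in pairs])
```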

First, we compare an ARX model and a neural network model (without retraining) on the same task as previously. The neural network performs better if we keep both controllers running at the same rate.

Table II
MAXIMUM AND AVERAGE ERROR

                   ARX    Neural Network
max error [m]      2.250  1.536
average error [m]  1.220  0.946

To evaluate the interest of the prioritized retraining, we use the same method as before, retraining the neural network between episodes. The results are shown in figure 5. In this case, neither of the network trainings yields a better result. Several factors might contribute to this. First, the dynamic being much more complex might require an improvement of the architecture of the network. Alternatively, for this experiment the meta-parameters of the prioritized replay, α and β, have been kept from the previous settings, but as the problem differs a new parameter search might be needed. Finally, the low-level controller was calibrated for the normal drone and was not modified when the mass was added; it is possible that the interaction between the mass and the driver makes the dynamic much more stochastic, which is not something system identification will handle easily.

[Figure 5. Comparison of neural networks with and without prioritized replay, in terms of maximum and average error, for a drone with uncertain load.]

V. CONCLUSIONS

We proposed a method to efficiently train neural networks for the purpose of system identification, based on sample prioritization. We compared the results of this modeling with the standard identification method ARX. We then tested those models on a drone in a simulated environment and discussed the limitations of the neural network based approach.

Studying the prioritization meta-parameters is an interesting subject for future work, as they have an influence on the results. However, an exhaustive grid search is too expensive to be practical. An automatic method for choosing these parameters would be of great value and might be helpful for improving the unbalanced drone problem.

Moreover, there is room for improvement in terms of neural network design. For example, in order to better take the state history into account, the use of Recurrent Neural Networks (RNN) might be interesting.

Another avenue of investigation for the unbalanced drone is the deterministic nature of the problem. Indeed, the perturbation produced by the mass might make the problem stochastic, which would require a different approach.

REFERENCES

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[2] François Chollet et al. Keras. https://keras.io, 2015.
[3] Jan Dentler, Somasundar Kannan, Miguel Angel Olivares Mendez, and Holger Voos. A tracking error control approach for model predictive position control of a quadrotor with time varying reference. In Robotics and Biomimetics (ROBIO), 2016 IEEE International Conference on, pages 2051–2056. IEEE, 2016.
[4] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a quadrotor with reinforcement learning. IEEE Robotics and Automation Letters, 2(4):2096–2103, 2017.
[5] Angelos Katharopoulos and François Fleuret. Not all samples are created equal: Deep learning with importance sampling. arXiv preprint arXiv:1803.00942, 2018.
[6] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
[7] Lennart Ljung. System Identification: Theory for the User. Prentice Hall, 1987.
[8] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[9] Mark W. Mueller and Raffaello D'Andrea. A model predictive controller for quadrocopter state interception. In Control Conference (ECC), 2013 European, pages 1383–1389. IEEE, 2013.
[10] T. Naegeli, J. Alonso-Mora, A. Domahidi, D. Rus, and O. Hilliges. Real-time motion planning for aerial videography with dynamic obstacle avoidance and viewpoint optimization. IEEE Robotics and Automation Letters, 2(3):1696–1703, July 2017.
[11] Gabriele Pannocchia. Offset-free tracking MPC: A tutorial review and comparison of different formulations. In Control Conference (ECC), 2015 European, pages 527–532. IEEE, 2015.
[12] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
[13] Grady Williams, Paul Drews, Brian Goldfain, James M. Rehg, and Evangelos A. Theodorou. Aggressive driving with model predictive path integral control. In Robotics and Automation (ICRA), 2016 IEEE International Conference on, pages 1433–1440. IEEE, 2016.
[14] Grady Williams, Nolan Wagener, Brian Goldfain, Paul Drews, James M. Rehg, Byron Boots, and Evangelos A. Theodorou. Information theoretic MPC for model-based reinforcement learning. In Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017.
[15] T. Zhang, G. Kahn, S. Levine, and P. Abbeel. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search. ArXiv e-prints, September 2015.