
Hindawi

Journal of Sensors
Volume 2017, Article ID 3296874, 13 pages
https://doi.org/10.1155/2017/3296874

Review Article
A Review of Deep Learning Methods and Applications for
Unmanned Aerial Vehicles

Adrian Carrio, Carlos Sampedro, Alejandro Rodriguez-Ramos, and Pascual Campoy


Computer Vision and Aerial Robotics Group, Centre for Automation and Robotics (CAR) UPM-CSIC,
Universidad Politécnica de Madrid, Calle José Gutiérrez Abascal 2, 28006 Madrid, Spain

Correspondence should be addressed to Adrian Carrio; [email protected]

Received 28 April 2017; Accepted 18 June 2017; Published 14 August 2017

Academic Editor: Vera Tyrsa

Copyright © 2017 Adrian Carrio et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Deep learning has recently shown outstanding results in solving a wide variety of robotic tasks in the areas of perception, planning,
localization, and control. Its excellent capabilities for learning representations from the complex data acquired in real environments
make it extremely suitable for many kinds of autonomous robotic applications. In parallel, Unmanned Aerial Vehicles (UAVs) are
currently being extensively applied for several types of civilian tasks in applications going from security, surveillance, and disaster
rescue to parcel delivery or warehouse management. In this paper, a thorough review has been performed on recently reported uses
and applications of deep learning for UAVs, including the most relevant developments as well as their performances and limitations.
In addition, a detailed explanation of the main deep learning techniques is provided. We conclude with a description of the main
challenges for the application of deep learning for UAV-based solutions.

1. Introduction

Recent successes of deep learning techniques in solving many complex tasks by learning from raw sensor data have created a lot of excitement in the research community. However, deep learning is not a recent technology. It started being used back in 1971, when Ivakhnenko [1] trained an 8-layer neural network using the Group Method of Data Handling (GMDH) algorithm. The term deep learning began to be used during the 2000s, when Convolutional Neural Networks (CNNs), a computational model originally proposed in the 80s [2] but trained efficiently only in the 90s [3], were able to provide decent results in visual object recognition tasks. At the time, datasets were small and computers were not powerful enough, so the performance was often similar to or worse than that of classical Computer Vision algorithms. The development of CUDA for Nvidia GPUs, which enabled over 1000 GFLOPS, and the publication of the ImageNet dataset, with 1.2 million images classified in 1000 categories [4], were important factors in the popularization of CNNs with several layers ($10^9$ to $10^{10}$ connections and $10^7$ to $10^9$ parameters). These deep models show great performance not only in Computer Vision tasks but also in other tasks such as speech recognition, signal processing, and natural language processing [5]. More details about recent advances in deep learning can be found in [6, 7].

Evidence of the suitability of deep learning for many kinds of autonomous robotic applications is the increasing trend in deep learning robot-related scientific publications over the past decades, which is expected to continue growing [8].

Due to the versatility, automation capabilities, and low cost of Unmanned Aerial Vehicles (UAVs), civilian applications in diverse fields have experienced a drastic increase during the last years. Some examples include power line inspection [9], wildlife conservation [10], building inspection [11], and precision agriculture [12]. However, UAVs have limitations in the size, weight, and power consumption of the payload and limited range and endurance. These limitations cannot be overlooked and are particularly relevant when deep learning algorithms are required to run on board a UAV.

In this survey, we have grouped publications according to the taxonomy proposed in Aerostack [13], which is an aerial robotics architecture consistent with the usual components
[Figure 1 diagram: block diagram of the Aerostack components of each robotic agent, arranged from left to right in the Social, Reflective, Deliberative, Executive, Reactive, and Physical layers.]

Figure 1: Aerostack architecture, consisting of a layered structure, corresponding to the different abstraction levels in an unmanned aerial
robotic system. The architecture has been applied here to systematically classify deep learning-based algorithms available in the state of the
art which have been deployed for applications with Unmanned Aerial Vehicles.

related to perception, guidance, navigation, and control of unmanned rotorcraft systems. The purpose of referring to this architecture, depicted in Figure 1, is to achieve a better understanding of the nature of the components of the aerial robotic systems analyzed. Using this taxonomy also helps identify the components in which deep learning has not been applied yet. According to Aerostack, the components constituting an unmanned aerial robotic system can be classified into the following systems and interfaces:

(i) Hardware interfaces: this category includes interfaces with both sensors and actuators
(ii) Motor system: the components of a motor system are motion controllers, which typically receive commands of desired values for a variable (position, orientation, or speed). These desired values are translated into low-level commands that are sent to actuators
(iii) Feature extraction system: feature extraction here refers to the extraction of useful features or representations from sensor data. The task of most deep learning algorithms is to learn data representations, so feature extraction systems are somewhat inherent to deep learning algorithms
(iv) Situational awareness system: this system includes components that compile sensor information into state variables regarding the robot and its environment, pursuing environment understanding. An example component within the situational awareness system is SLAM algorithms
(v) Executive system: this system receives high-level symbolic actions and generates detailed behaviour sequences
(vi) Planning system: this type of system generates global solutions to complex tasks by means of planning (e.g., path planning and mission planning)
(vii) Supervision system: components in the supervision system simulate self-awareness in the sense of ability to supervise other integrated systems. We can exemplify this type of component with an algorithm that checks whether the robot is actually making progress towards its goal and reacts in the presence of problems (unexpected obstacles, faults, etc.) with recovery actions
(viii) Communication system: the components in the communication system are responsible for establishing an adequate communication with human operators and/or other robots

The remainder of this paper is organized as follows: firstly, Section 2 covers a description of the currently relevant and prominent deep learning algorithms. For the sake of completeness, deep learning algorithms have been included regardless of their direct use in UAV applications. Section 3 presents the state of the art in deep learning for feature extraction in UAV applications. Section 4 surveys UAV applications of deep learning for the development of components of planning and situation awareness systems. Reported applications of deep learning for motion control in UAVs are presented in Section 5. Finally, a discussion of the main challenges for the application of deep learning for UAVs is covered in Section 6.
[Figure 2 diagram: an input image passes through alternating convolution and subsampling stages, each producing feature maps, followed by fully connected layers and the output.]

Figure 2: A generic example of a Convolutional Neural Network model. The usual architecture alternates convolution and subsampling
layers. Fully connected neurons are used in the last layers.
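
As a rough illustration of the alternation described in the caption of Figure 2, the following sketch traces how the side length of a square feature map shrinks through convolution and subsampling layers. The layer sizes (a 32x32 input, 5x5 kernels, 2x2 subsampling) and all function names are hypothetical choices made for this example and are not taken from the reviewed works. It assumes unpadded convolutions with stride 1 and non-overlapping subsampling windows.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def subsample_output_size(size, window):
    """Side length after non-overlapping subsampling (pooling)."""
    return size // window


def trace_shapes(input_size, layers):
    """Return the feature-map side length after each layer, input first."""
    sizes = [input_size]
    for kind, param in layers:
        if kind == "conv":
            sizes.append(conv_output_size(sizes[-1], param))
        elif kind == "subsample":
            sizes.append(subsample_output_size(sizes[-1], param))
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return sizes


# Hypothetical architecture: conv 5x5, subsample 2x2, conv 5x5, subsample 2x2.
layers = [("conv", 5), ("subsample", 2), ("conv", 5), ("subsample", 2)]
sizes = trace_shapes(32, layers)
print(sizes)  # [32, 28, 14, 10, 5]
```

Under these assumptions, the final 5x5 feature maps would be flattened and passed to the fully connected layers mentioned in the caption.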

2. Deep Learning in the Context of Machine Learning

Machine Learning is a capability enabling Artificial Intelligence (AI) systems to learn from data. A good definition of what learning involves is the following: “a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E” [15]. The nature of this experience E is typically used to classify Machine Learning algorithms into the following three categories: supervised, unsupervised, and reinforcement learning:

(i) In supervised learning, algorithms are presented with a dataset containing a collection of features. Additionally, labels or target values are provided for each sample. This mapping of features to labels or target values is where the knowledge is encoded. Once it has learned, the algorithm is expected to find the mapping from the features of unseen samples to their correct labels or target values.
(ii) The purpose of unsupervised learning is to extract meaningful representations and explain key features of the data. No labels or target values are necessary in this case in order to learn from the data.
(iii) In reinforcement learning algorithms, an AI agent interacts with a real or simulated environment. This interaction provides feedback between the learning system and the interaction experience which is useful to improve performance in the task being learned.

Deep learning algorithms are a subset of Machine Learning algorithms that typically involve learning representations at different hierarchy levels to enable building complex concepts out of simpler ones. The following paragraphs cover the most relevant deep learning technologies currently available in supervised, unsupervised, and reinforcement learning.

2.1. Supervised Learning. Supervised learning algorithms learn how to associate an input with some output, given a training set of examples of inputs and outputs [16]. The following paragraphs cover the most relevant algorithms nowadays in supervised learning: Feedforward Neural Networks, a popular variation of these called Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and a variation of RNNs called Long Short-Term Memory (LSTM) models.

Feedforward Neural Networks, also known as Multilayer Perceptrons (MLPs), are the most common supervised learning models. Their purpose is to work as function approximators: given a sample vector $x$ with $n$ features, a trained algorithm is expected to produce an output value or classification category $y$ that is consistent with the mapping of inputs and outputs provided in the training set. The approximated function is usually built by stacking together several hidden layers that are activated in chain to obtain the desired output. The number of hidden layers is usually referred to as the depth of the model, which explains the origin of the term deep learning: learning using models with several layers. These layers are made up of neurons or units whose activation given an input vector $x \in \mathbb{R}^n$ is given by the following equation:

$$a_\theta(x) = g(\theta^{T} x), \tag{1}$$

where $\theta$ is a vector of $n$ weights and $g$ is an activation function that is usually chosen to be nonlinear. The activation of unit $k$ in layer $m$ given its $n$ inputs (outputs of the previous layer $m - 1$) is given by the following equation:

$$a_k^{m} = g\left(\Theta_{k0}^{m-1} a_0^{m-1} + \Theta_{k1}^{m-1} a_1^{m-1} + \cdots + \Theta_{kn}^{m-1} a_n^{m-1}\right). \tag{2}$$

During the process of learning, the weights in each unit are updated using backpropagation in order to optimize a cost function, which generally measures the discrepancy between the desired outputs and the actual ones.

Convolutional Neural Networks (CNNs), depicted in Figure 2, are a specific type of model conceived to accept 2-dimensional input data, such as images or time series data. These models take their name from the mathematical linear operation of convolution which is always present in at least one of the layers of the network. The most typical convolution
operation used in deep learning is 2D convolution of a 2-dimensional image $I$ with a 2-dimensional kernel $K$, given by the following equation:

$$C(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n). \tag{3}$$
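
The double sum in (3) can be sketched directly in code. The following minimal example assumes small integer matrices stored as nested lists and computes the full convolution, in which every output index $(i, j)$ that receives at least one product is kept. The function name `conv2d_full` and the sample values are illustrative only. Note that many deep learning libraries actually implement cross-correlation (no kernel flip) under the name convolution, so this sketch follows (3) rather than any particular library.

```python
def conv2d_full(image, kernel):
    """Full 2D convolution: C(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = [[0] * (cols + k_cols - 1) for _ in range(rows + k_rows - 1)]
    for m in range(rows):
        for n in range(cols):
            for p in range(k_rows):
                for q in range(k_cols):
                    # p = i - m and q = j - n, so the product lands at (m + p, n + q).
                    out[m + p][n + q] += image[m][n] * kernel[p][q]
    return out


image = [[1, 2],
         [3, 4]]
kernel = [[0, 1],
          [2, 3]]
result = conv2d_full(image, kernel)
print(result)  # [[0, 1, 2], [2, 10, 10], [6, 17, 12]]
```

Because (3) is symmetric in its two arguments, swapping the image and the kernel gives the same result, which is a quick sanity check for an implementation like this one.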

[Figure 3 diagram: an LSTM cell in which the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$ act through multiplications on the input $x_t$, the cell state $c_t$, and the hidden state $h_t$.]

Figure 3: A long-short term memory model, adapted from the original figure in [14]. Learned weights control how data enter and leave and are deleted through the use of gates.

The output of the convolution operation is usually run through a nonlinear activation function and then further modified by means of a pooling function, which replaces the output in a certain location with a value obtained from nearby outputs. This pooling function helps make the representation learned invariant to small translations of the input and performs subsampling of the input data. The most common pooling function is max pooling, which replaces the output with the maximum activation within a rectangular neighborhood. Convolution and pooling layers are stacked together to achieve feature learning in a hierarchical way.
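
The max pooling operation and its tolerance to small translations can be illustrated with a short sketch. It assumes non-overlapping square windows and a feature map stored as a nested list. The function name `max_pool2d` and the sample activations are hypothetical.

```python
def max_pool2d(fmap, size=2):
    """Max pooling over non-overlapping size x size windows."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


# Two feature maps with the same activations, shifted by one pixel.
original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 5, 0],
            [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 9, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 5]]

print(max_pool2d(original))  # [[9, 0], [0, 5]]
print(max_pool2d(shifted))   # [[9, 0], [0, 5]]
```

Both inputs pool to the same 2x2 output, which also has half the side length of the input. The invariance is only approximate in general: it holds here because each activation stays inside its original pooling window after the shift.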
For example, when learning from images, layers closer to the input learn low-level feature representations (e.g., edges and corners) and those closer to the output learn higher level representations (e.g., contours and parts of objects). Once the features of interest have been learned, their activations are used in final layers, which are usually made up of fully connected neurons, to classify the input or perform value regression with it.

In contrast to MLPs, Recurrent Neural Networks (RNNs) are models in which the output is a function of not only the current inputs but also of the previous outputs, which are encoded into a hidden state $h$. This means that RNNs have memory of the previous outputs and therefore can encode the information present in the sequence itself, something that MLPs cannot do. As a consequence, this type of model can be very useful to learn from sequential data. The memory is encoded into an internal state and updated as indicated in the following equation:

$$h_t = g\left(W x_t + U h_{t-1}\right), \tag{4}$$

where $h_t$ represents the hidden state at time step $t$. The weight matrices $W$ (input-to-hidden) and $U$ (hidden-to-hidden) determine the importance given to the current input and to the previous state, respectively. The activation is computed with a third weight matrix $V$ (hidden-to-output) as indicated by the following equation:

$$a_t = V h_t. \tag{5}$$

RNNs are usually trained using Backpropagation Through Time (BPTT), an extension of backpropagation which takes into account temporality in order to compute the gradients. Using this method with long temporal sequences can lead to several issues. Gradients accumulated over a long sequence can become immeasurably large or extremely small. These problems are referred to as exploding gradients and vanishing gradients, respectively. Exploding gradients are easier to solve, as they can be truncated or squashed, whereas vanishing gradients can become too small for networks to learn from and even too small to be represented at the numerical resolution of a computer.

Long Short-Term Memory (LSTM) models are a type of RNN architecture proposed in 1997 by Hochreiter and Schmidhuber [17] which successfully overcomes the problem of vanishing gradients by maintaining a more constant error through the use of gated cells, which effectively allow for continuous learning over a larger number of time steps. A typical LSTM cell is depicted in Figure 3. The input, output, and forget gate vector activations in a standard LSTM are given as follows:

$$\begin{aligned} i_t &= g\left(W_i x_t + U_i h_{t-1}\right), \\ o_t &= g\left(W_o x_t + U_o h_{t-1}\right), \\ f_t &= g\left(W_f x_t + U_f h_{t-1}\right). \end{aligned} \tag{6}$$

The cell state vector activation is given by the following equation:

$$c_t = f_t \circ c_{t-1} + i_t \circ g\left(W_c x_t + U_c h_{t-1}\right), \tag{7}$$

where $\circ$ represents the Hadamard product. Finally, the hidden state, which is the output of the cell, is obtained by applying the output gate to the cell state:

$$h_t = o_t \circ g(c_t). \tag{8}$$

As already stated, LSTM gated cells in RNNs have internal recurrence, besides the outer recurrence of RNNs. Cells store an internal state, which can be written to and read from. There are gates controlling how data enter and leave and are deleted from this cell state. Those gates act on the signals they receive, and, similar to a standard neural network, they block or pass on information based on its strength and importance using their own sets of weights. Those weights, like the weights that modulate input and hidden states, are adjusted via the recurrent network’s learning process. The cells learn when to allow data to enter and leave or be deleted through the iterative process of making guesses,
backpropagating error, and adjusting weights via gradient descent. This type of model architecture allows successful learning from long sequences, helping to capture diverse time scales and remote dependencies. Practical aspects on the use of LSTMs and other deep learning architectures can be found in [18].

2.2. Unsupervised Learning. Unsupervised learning aims towards the development of models that are capable of extracting meaningful and high-level representations from high-dimensional sensory unlabeled data. This functionality is inspired by the visual cortex, which requires a very small amount of labeled data.

Deep Generative Models such as Deep Belief Networks (DBNs) [19, 20] allow the learning of several layers of nonlinear features in an unsupervised manner. DBNs are built by stacking several Restricted Boltzmann Machines (RBMs) [21, 22], resulting in a hybrid model in which the top two layers form an RBM and the bottom layers act as a directed graph constituting a Sigmoid Belief Network (SBN). The learning algorithm proposed in [19] is considered one of the first efficient ways of learning DBNs; it introduces a greedy layer-by-layer training in order to obtain a deep hierarchical model. In this greedy learning procedure, the hidden activity patterns obtained in the current layer are used as the “visible” data for training the RBM of the next layer. Once the stacked RBMs have been learned and combined to form a DBN, a fine-tuning procedure using a contrastive version of the wake-sleep algorithm [23] is applied.

For a better understanding, the theoretical details of RBMs are provided in the following equations. The energy of a joint configuration $\{\mathbf{v}, \mathbf{h}\}$ can be calculated as follows:

$$E(\mathbf{v}, \mathbf{h}; \theta) = -\sum_{i \in \mathrm{vis}} v_i b_i - \sum_{j \in \mathrm{hid}} h_j a_j - \sum_{i,j} W_{ij} v_i h_j, \tag{9}$$

where $\theta = \{W, b, a\}$ represents the model parameters. $\mathbf{v} \in \{0, 1\}$ are the “visible” stochastic binary units, which are connected to the “hidden” stochastic binary units $\mathbf{h} \in \{0, 1\}$. The bias terms are denoted by $b_i$ for the visible units and $a_j$ for the hidden units.

The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration and is given by (10), where $Z(\theta)$ represents the partition function (see (11)):

$$P(\mathbf{v}, \mathbf{h}; \theta) = \frac{1}{Z(\theta)} \exp\left(-E(\mathbf{v}, \mathbf{h}; \theta)\right), \tag{10}$$

$$Z(\theta) = \sum_{\mathbf{v}} \sum_{\mathbf{h}} \exp\left(-E(\mathbf{v}, \mathbf{h}; \theta)\right). \tag{11}$$

The probability assigned by the model to a visible vector $\mathbf{v}$ can be computed as expressed in the following equation:

$$P(\mathbf{v}; \theta) = \frac{1}{Z(\theta)} \sum_{\mathbf{h}} \exp\left(-E(\mathbf{v}, \mathbf{h}; \theta)\right). \tag{12}$$

The conditional distributions over hidden variables $\mathbf{h}$ and visible variables $\mathbf{v}$ can be extracted using (13). Once a training sample is presented to the model, the binary states of the hidden variables are set to 1 with probability given by (14). Analogously, once the binary states of the hidden variables are computed, the binary states of the visible units are set to 1 with a probability given by (15).

$$P(\mathbf{h} \mid \mathbf{v}; \theta) = \prod_{j} p(h_j \mid \mathbf{v}), \qquad P(\mathbf{v} \mid \mathbf{h}; \theta) = \prod_{i} p(v_i \mid \mathbf{h}), \tag{13}$$

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\left(\sum_{i} W_{ij} v_i + a_j\right), \tag{14}$$

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\left(\sum_{j} W_{ij} h_j + b_i\right), \tag{15}$$

where $\sigma(z) = 1/(1 + \exp(-z))$ is the logistic function. The RBM model is trained by applying the Contrastive Divergence algorithm [22], in which the update rule applied to the model parameters is given by the following equation:

$$\Delta W_{ij} = \epsilon \left(\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recons}}\right), \tag{16}$$

where $\epsilon$ is the learning rate, $\langle v_i h_j \rangle_{\mathrm{data}}$ represents the expected value of the product of visible and hidden states at thermal equilibrium, when training data is presented to the model, and $\langle v_i h_j \rangle_{\mathrm{recons}}$ is the expected value of the product of visible and hidden states after running a Gibbs chain.

Deep neural networks can also be utilized for dimensionality reduction of the input data. For this purpose, deep “autoencoders” [24, 25] have been shown to provide successful results in a wide variety of applications such as document retrieval [26] and image retrieval [27]. An autoencoder (see Figure 4) is an unsupervised neural network in which the target values are set to be equal to the inputs. Autoencoders are mainly composed of an “encoder” network, which transforms the input data into a low-dimensional code, and a “decoder” network, which reconstructs the data from the code. Training these deep models involves minimizing the error between the original data and its reconstruction. In this process, the weight initialization is critical to avoid reaching a bad local optimum; thus some authors have proposed a pretraining stage based on stacked RBMs and a fine-tuning stage using backpropagation [24, 27]. In addition, the encoder part of the autoencoder can serve as a good unsupervised nonlinear feature extractor. In this field, Stacked Denoising Autoencoders (SDAE) [25] have proven to be effective unsupervised feature extractors in different classification problems. The experiments presented in [25] showed that training denoising autoencoders with higher noise levels forced the model to extract more distinctive and less local features.

2.3. Deep Reinforcement Learning. In reinforcement learning, an agent is defined to interact with an environment, seeking to find the best action for each state at any step in time (see
[Figure 4 diagram: the original input enters the encoder, is compressed into the code layer, and is expanded by the decoder into the reconstructed input.]

Figure 4: Deep autoencoder. An autoencoder consists of an encoder network, which transforms the original input data into a low-
dimensional code, and a decoder network, which reconstructs the data from the code.
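
The encoder and decoder roles named in the caption of Figure 4 can be sketched with a deliberately tiny model. This is a simplification under stated assumptions, not the deep autoencoders of [24, 25]: it uses a single linear encoder layer and a single linear decoder layer without biases, a one-dimensional code, plain batch gradient descent, and hypothetical 2-dimensional data lying on a line. All names, the learning rate, and the number of epochs are illustrative choices.

```python
def reconstruct(x, enc_w, dec_w):
    """Encode x into a single number (the code), then decode it."""
    code = sum(w * xi for w, xi in zip(enc_w, x))
    return [v * code for v in dec_w]


def train(data, enc_w, dec_w, lr=0.05, epochs=300):
    """Batch gradient descent on the mean squared reconstruction error."""
    n = len(data[0])
    m = len(data)
    losses = []
    for _ in range(epochs):
        g_enc, g_dec, loss = [0.0] * n, [0.0] * n, 0.0
        for x in data:
            code = sum(w * xi for w, xi in zip(enc_w, x))
            err = [v * code - xi for v, xi in zip(dec_w, x)]
            loss += sum(e * e for e in err)
            # Error propagated back through the decoder to the code.
            back = sum(e * v for e, v in zip(err, dec_w))
            for k in range(n):
                g_dec[k] += 2 * err[k] * code
                g_enc[k] += 2 * back * x[k]
        losses.append(loss / m)  # loss measured before this epoch's update
        enc_w = [w - lr * g / m for w, g in zip(enc_w, g_enc)]
        dec_w = [v - lr * g / m for v, g in zip(dec_w, g_dec)]
    return enc_w, dec_w, losses


# Hypothetical data: 2D points on the line x2 = 2 * x1, so a 1D code suffices.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
enc_w, dec_w, losses = train(data, [0.1, 0.1], [0.1, 0.1])
print("initial loss:", losses[0], "final loss:", losses[-1])
print("reconstruction of [1.0, 2.0]:", reconstruct([1.0, 2.0], enc_w, dec_w))
```

The targets are the inputs themselves, so no labels are needed. A deep autoencoder would stack several nonlinear layers in both halves, where the initialization concerns discussed in Section 2.2 become relevant.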

Figure 5). The agent must balance exploration and exploitation of the state space in order to find the optimal policy that maximizes the accumulated reward from the interaction with the environment. In this context, an agent modifies its behaviour or policy with the awareness of the states, actions taken, and rewards for every time step. Reinforcement learning constitutes an optimization process throughout the whole state space in order to maximize the accumulated reward. Robotic problems are often task-based with temporal structure. These types of problems are suitable to be solved by means of a reinforcement learning framework [28].

[Figure 5 diagram: agent-environment loop in which the environment provides the state $s_t$ and the reward $r_t$ and the agent returns the action $a_t$; the agent is annotated with value function methods ($Q$/$V$) and policy search methods.]

Figure 5: Generic structure of a reinforcement learning problem. The optimization methods to solve the reinforcement learning problem are mainly categorized into value function and policy search methods.

The standard reinforcement learning theory states that an agent is able to obtain a policy, which maps every state $s \in \mathcal{S}$ to an action $a \in \mathcal{A}$, where $\mathcal{S}$ is the state space (possible states of the agent in the environment) and $\mathcal{A}$ is the finite action space. The inner dynamics of the agent are represented by the transition probability model $p(s_{t+1} \mid s_t, a_t)$ at time $t$. The policy can be stochastic $\pi(a \mid s)$, with a probability associated with each possible action, or deterministic $\pi(s)$. In each time step, the policy determines the action to be chosen and the reward $r(s_t, a_t)$ is observed from the environment. The goal of the agent is to maximize the accumulated discounted reward $R_t = \sum_{i=t}^{T} \gamma^{i-t} r(s_i, a_i)$ from a state at time $t$ to time $T$ ($T = \infty$ for infinite horizon problems) [29]. The discount factor $\gamma$ is defined to allocate different weights for the future rewards.

For a specific policy $\pi$, the value function $V^{\pi}$ in (17) is a representation of the expectation of the accumulated discounted reward $R_t$ for each state $s \in \mathcal{S}$ (assuming a deterministic policy $\pi(s_t)$):

$$V^{\pi}(s_t) = \mathbb{E}\left[R_t \mid s_t, a_t = \pi(s_t)\right]. \tag{17}$$

An equivalent of the value function is represented by the action-value function $Q^{\pi}$ in (18) for every action-state pair $(s_t, a_t)$:

$$Q^{\pi}(s_t, a_t) = r(s_t, a_t) + \gamma \sum_{s_{t+1}} p(s_{t+1} \mid s_t, a_t)\, V^{\pi}(s_{t+1}). \tag{18}$$

The optimal policy $\pi^{*}$ shall be the one that maximizes the value function (or equivalently the action-value function), as in the following equation:

$$\pi^{*} = \arg\max_{\pi} V^{\pi}(s_t). \tag{19}$$

A general problem in real robotic applications is that the state and action spaces are often continuous spaces. A continuous state and/or action space can make the optimization problem intractable, due to the overwhelming set of different states and/or actions. As a general framework for representation, reinforcement learning methods are enhanced through deep learning to aid the design for feature representation, which is known as deep reinforcement learning. Reinforcement learning and optimal control aim at finding the optimal policy $\pi^{*}$ by means of several methods. The optimal solution can be searched in this original primal problem, or the dual formulation $V^{*}$, $Q^{*}$ can be the optimization objective. In this review, deep reinforcement learning methods are divided into two main categories: value function and policy search methods.

2.3.1. Value Function Methods. These methods seek to find optimal $V^{*}$, $Q^{*}$, from which the optimal policy $\pi^{*}$ in (20) is directly derived. $Q$-learning approaches are based on the
optimization of the action-value function $Q$, based on the Bellman Optimality Equation [29] for $Q$ (see (21)):

$$\pi^{*} = \arg\max_{a_t} Q^{*}(s_t, a_t), \tag{20}$$

$$Q^{*}(s_t, a_t) = \mathbb{E}\left[r(s_t, a_t) + \gamma \max_{a_{t+1}} Q^{*}(s_{t+1}, a_{t+1})\right]. \tag{21}$$

The Deep $Q$-Network (DQN) method [30, 31] estimates the action-value function (see (22)) by means of a CNN model with a set of weights $\theta$ as $Q^{*}(s, a) \approx Q(s, a; \theta)$:

$$Q_i^{*}(s_t, a_t) = y_i = \mathbb{E}\left[r(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta_{i-1}) \mid s_t, a_t\right]. \tag{22}$$

The CNN can be trained by minimizing a sequence of loss functions $L_i(\theta_i)$ which are optimized in each iteration $i$ as shown in the following equation:

$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s_t, a_t; \theta_i)\right)^2\right]. \tag{23}$$

The state $s$ of the DQN algorithm is the raw image and it has been widely tested with Atari games [31]. DQN is not designed for continuous tasks; thus this method may find difficulties approaching some robotics problems previously solved by continuous control. Continuous $Q$-learning with Normalized Advantage Functions (NAF) overcomes this issue by the use of a neural network that separately outputs a value function $V(x)$ and an advantage term $A(x, u)$, which is parametrized as a quadratic function of nonlinear features [32]. These two functions compose the final $Q(x, u \mid \theta^{Q})$, given by the following equation:

$$Q(x, u \mid \theta^{Q}) = A(x, u \mid \theta^{A}) + V(x \mid \theta^{V}), \tag{24}$$

with $x$ being the state, $u$ being the action, and $\theta^{Q}$, $\theta^{A}$, and $\theta^{V}$ being the sets of weights of the $Q$, $A$, and $V$ functions, respectively. This representation allows simplifying more standard actor-critic style algorithms, while preserving the benefits of nonlinear value function approximation [32]. NAF is valid for continuous control tasks and takes advantage of trained models to approximate the standard model-free value function.

2.3.2. Policy Search Methods. Policy-based reinforcement learning methods aim towards directly searching for the optimal policy $\pi^{*}$, which provides a feasible framework for continuous control. Deep Deterministic Policy Gradient (DDPG) [33] is based on the actor-critic paradigm [29], with two neural networks to approximate a greedy deterministic policy (actor) and $Q$ function (critic). The actor network is updated by applying the chain rule to the expected return from the start distribution $J$ with respect to the actor parameters (see (25)):

$$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{s_t \sim \rho^{\beta}}\left[\nabla_{\theta^{\mu}} Q(s, a \mid \theta^{Q})\Big|_{s = s_t,\, a = \mu(s_t \mid \theta^{\mu})}\right]. \tag{25}$$

The DDPG method learns using, on average, 20 times fewer experience steps than DQN [33]. Both DDPG and DQN require large sample datasets, since they are model-free algorithms. The DNN-based Guided Policy Search (DNN-based GPS) method [34] learns to map from the tuple of raw visual information and joint states directly to joint torques. Compared to the previous works, it managed to perform high-dimensional control, even from imperfect sensor data. DNN-based GPS has been widely applied to robotic control, from manipulation to navigation tasks [35, 36].

3. Deep Learning for Feature Extraction

The main objective of feature extraction systems is to extract representative features from the raw measurements provided by sensors on board a UAV.

3.1. With Image Sensors. Deep learning techniques for feature extraction using image sensors have been applied over a wide range of applications using different imaging technologies (e.g., monocular RGB camera, RGB-D sensors, infrared, etc.). Despite the wide variety of sensors utilized for image processing, the main deep learning feature extractors are based on CNNs [67]. As explained in Section 2.1, CNN models consist of several stacked convolution and pooling layers. The convolution layers are responsible for extracting features from the data by convolving the input image with learned filters, while pooling layers provide a dimensionality reduction over previous convolution layers.

In the robotics field, feature extraction systems based on CNN models have been mainly applied for object recognition [42–48] and scene classification [51–54]. Concerning the object recognition task, recent advances have integrated object detection solutions by means of bounding box regression and object classification capabilities within the same CNN model [42–44]. Unsupervised feature learning for object recognition was applied in [68], reducing the need for manually labeled training data, which can be extremely time-consuming and costly to obtain. Regarding the scene classification problem, recent advances have focused on learning efficient and global image representations from the convolutional and fully connected layers of pretrained CNNs in order to obtain representative image features [53]. In [52], it was also shown that the learned features obtained from pretrained CNN models were able to generalize properly even in domains substantially different from those in which they were trained, such as the classification of aerial images. Scene classification on board a Parrot AR.Drone quadrotor was also presented in [40], where a 10-layered CNN was utilized for classifying the input image of a forest trail into three classes, each of which represented the action to be taken in order to maintain the aerial robot on the trail (turn left, go straight, and turn right).

Nowadays, object recognition and scene classification from aerial imagery using deep learning techniques have also acquired a relevant role in agriculture applications. In these kinds of applications, UAVs provide a low-cost platform for aerial image acquisition, while deep learned features
are mainly utilized for plant counting and identification. Several applications have used deep learning techniques for this purpose [12, 49, 50, 55, 56], providing robust systems for monitoring the state of the crops in order to maximize their productivity. In [55], a sparse autoencoder was utilized for unsupervised feature learning in order to perform weed classification from images taken by a multirotor UAV. In [56], a hybrid neural network for crop classification amongst 23 classes was proposed. The hybrid network combined a Feedforward Neural Network for histogram information management with a CNN. In [49], the well-known AlexNet CNN architecture proposed in [69] was utilized in combination with a sliding window object proposal technique for palm tree detection and counting. Other similar approaches have focused on weed scouting using a CNN model for weed species classification [12].

Deep learning techniques applied on images taken from UAVs have also gained a lot of importance in monitoring and search and rescue applications, such as jellyfish monitoring [70], road traffic monitoring from UAVs [71], assisting avalanche search and rescue operations with UAV imagery [72], and terrorist identification [73]. In [72, 73], the use of pretrained CNN models for feature extraction is worth noting again. In both cases, the well-known Inception model [74] was used. In [72], the Inception model was utilized with a Support Vector Machine (SVM) classifier for detecting possible survivors, while in [73], a transfer-learning technique was used to fine-tune the Inception network in order to detect possible terrorists.

Most of the presented approaches, especially in the field of object recognition, require the use of GPUs for dealing with real-time constraints. In this sense, the state-of-the-art object recognition systems are based on the approaches presented in [46, 47], in which the object recognizer is able to run at rates from 40 to 90 frames per second on an Nvidia GeForce GTX Titan X.

Despite the good results provided by the aforementioned systems, UAV constraints such as endurance, weight, and payload require the development of specific hardware and software solutions that can be embedded on board a UAV. Taking these limitations into account, only a few systems in the literature have embedded feature extraction algorithms using deep learning processed by GPU technology on board a UAV. In [75], the problem of automatic detection, localization, and classification (ADLC) of plywood targets was addressed. The solution consisted of a cascade of classifiers based on CNN models trained on an Nvidia Titan X and applied over 24 M-pixel RGB images processed by an Nvidia Jetson TK1 mounted on board a fixed-wing UAV. The ADLC algorithm ran the detection stage on the CPU cores, allowing the GPU to focus on the classification tasks.

3.2. With Other Sensors. Most of the work using deep learning presented in the literature has been applied to data captured by image sensors due to the consolidated results obtained using CNN models. However, deep learning techniques cover a wide range of applications and can be used in conjunction with sensors other than cameras, such as acoustic, radar, and laser sensors.

Deep learning techniques for UAVs have been utilized for acoustic data recognition [64, 65]. In [64], a Partially Shared Deep Neural Network (PS-DNN) was proposed to deal with the problem of sound source separation and identification using partially annotated data. For this purpose, the PS-DNN is composed of two partially overlapped subnetworks: one regression network for sound source separation and one classification network responsible for the sound identification. The objective of the regression network for sound source separation is to improve the network training for sound source classification by providing a cleaner sound signal. Results showed that the PS-DNN model worked reasonably well for identifying people's voices in disaster situations. The data was collected using a microphone array on board a Parrot Bebop UAV.

In [65], the problem of UAV identification based on their specific sound was addressed by using a bidirectional LSTM-RNN with 3 layers and 300 LSTM blocks. This model outperformed the 2 other preselected models, namely, Gaussian Mixture Models (GMM) and CNN.

Concerning radar technology, and despite the fact that radar data has not been widely addressed using deep learning techniques for UAVs in the literature, the recent advances presented in [62] are worth mentioning. In this paper, the spectral correlation function (SCF) was captured using a 2.4 GHz Doppler radar sensor in order to detect and classify micro-UAVs amongst 3 predefined classes. The model utilized for this purpose was based on a semisupervised DBN trained with the SCF data.

Regarding laser technology, in [66], a novel strategy for detecting safe landing areas based on the point clouds captured from a LIDAR sensor mounted on a helicopter was proposed. In this paper, subvolumes of 1 m³ from a volumetric density map constructed from the original point cloud were used as input to a 3D CNN which was trained to predict the probability of the evaluated area being a safe landing zone. Several CNN models consisting of one or two convolutional layers were evaluated over synthetic and semisynthetic datasets, showing in both cases good results when using a 3D CNN model with two convolutional layers.

4. Deep Learning for Planning and Situational Awareness

Several deep learning developments have been reported for tasks related to UAV planning and situational awareness. Planning tasks refer to the generation of solutions for complex problems without having to hand-code the environment model or the robot's skills or strategies into a reactive controller. Planning is required in the presence of unstructured, dynamic environments or when there is diversity in the scope and/or the robot's tasks. Typical tasks include path, motion, navigation, or manipulation planning. Situational awareness tasks allow robots to have knowledge about their own state and their environment's state. Some examples of this kind of task are robot state estimation, self-localization, and mapping.
4.1. Planning. Path planning for collaborative search and rescue missions with deep learning-based exploration is presented in [57]. This work, where a UAV explores and maps the environment trying to find a traversable path for a ground robot, focuses on minimizing overall deployment time (i.e., both exploration and path traversal). In order to map the terrain and find a traversable path, a CNN is proposed for terrain classification. Instead of using a pretrained CNN, training is done on the spot, so that the classifier can be trained on demand with the terrain present at the disaster site [58]. However, the model takes around 15 minutes to train.

4.2. Situational Awareness. Cross-view localization of images is achieved with the help of deep learning in [59]. Although the work is presented as a solution for UAV localization, no UAVs were used for image collection and the experiments were based on ground-level images only. The approach is based on mining a library of raw image data to find nearest neighbor visual features (i.e., landmarks) which are then matched with the features extracted from an input query image. A pretrained CNN is used to extract features for matching verification purposes, and although the approach is said to have low computational complexity, the authors do not provide details about retrieval time.

Ground-level query images are matched to a reference database of aerial images in [60]. Deep learning is applied here to reduce the wide baseline and appearance variations between ground-level and aerial images. A pair-based network structure is proposed to learn deep representations from data for distinguishing matched and unmatched cross-view image pairs. Even though the training procedure in the reported experiments took 4 days, the use of fast algorithms such as locality-sensitive hashing allowed for real-time cross-view matching at city scale. The main limitation of their approach is the need to estimate scale, orientation, and dominant depth at test time for ground-level queries.

In [61], a CNN is proposed to generate control actions (the permitted turns for a UAV) given an image captured on board and a global motion plan. This global motion plan indicates the actions to take given a position on the map by means of a potential function. The purpose of the CNN is to learn the mapping from images to position-dependent actions. The process would be equivalent to performing image registration and then generating the control actions given the global motion plan, but this behaviour is here learnt and efficiently encoded in a CNN, demonstrating superior results to classical image registration techniques. However, no tests on a real UAV were carried out and no information is provided about execution time, which might complicate the deployment for a real UAV application.

As seen from the presented works, developments in planning and situational awareness with deep learning for UAVs are still quite rudimentary. The path planning approach presented is limited to small-scale disaster sites and the different localization and mapping approaches are still slow and have little accuracy for real UAV applications.

5. Deep Learning for Motion Control

Deep learning techniques for motion control have recently been the subject of several research efforts. Classic control has solved diverse robotic control problems in a precise and analytic manner, allowing robots to perform complex maneuvers. Nevertheless, standard control theory only solves the problem for a specific case and for an approximated robot model, not being able to easily adapt to changes in the robot model and/or to hostile environments (e.g., a propeller on a UAV gets damaged, wind gusts, and rain). In this context, learning from experience is a matter of importance which can overcome many of the stated limitations.

As a key advantage, deep learning methods are able to properly generalize from certain sets of labelled input data. Deep learning allows inferring a pattern from raw inputs, such as images and LIDAR sensor data, which can lead to proper behaviour even in unknown situations. Concerning the UAV indoor navigation task, recent advances have led to a successful application of CNNs in order to map images to high-level behaviour directives (e.g., turn left, turn right, rotate left, and rotate right) [38, 39]. In [38], the 𝑄 function is estimated through a CNN, which is trained in simulation and successfully tested in real experiments. In [39], actions are directly mapped from raw images. In all stated methods, the learned model is run off board, usually taking advantage of a GPU in an external laptop.

With regard to UAV navigation in unstructured environments, some studies have focused on cluttered natural scenarios, such as dense forests or trails [40]. In [40], a DNN model was trained to map images to action probabilities (turn left, go straight, or turn right) with a final softmax layer and tested on board by means of an ODROID-U3 processor. The performance of two automated methods, SVM and the method proposed in [76], is later compared to that of two human observers.

In [37], navigable areas are predicted from a disparity image in the form of up to three bounding boxes. The center of the biggest bounding box found is selected as the next waypoint. Using this strategy, UAV flights are successfully performed. The main drawback is the requirement to send the disparity images to a host device where all computations are made. The whole pipeline for the UAV horizontal translation, disparity map generation, and waypoint selection takes about 1.3 seconds, which makes navigation still quite slow for real applications. On the other hand, low-level motion control is challenging, since tackling continuous and multivariable action spaces can become an intractable problem. Nevertheless, recent works have proposed novel methods to learn low-level control policies from imperfect sensor data in simulation [41, 63]. In [63], a Model Predictive Controller (MPC) was used to generate data at training time in order to train a DNN policy, which was allowed to access only raw observations from the UAV onboard sensors. At test time, the UAV was able to follow an obstacle-free trajectory even in unknown situations. In [41], the well-known Inception v3 model (pretrained CNN) was adapted in order to enable the final layer to provide six action nodes (three transitions and
Table 1: Deep learning-based UAV applications grouped by learning algorithms and application fields.

Learning type    Algorithm      Tasks                  Field of application       References
Supervised       CNN            Outdoor navigation     Navigation                 [37–39]
Supervised       CNN            Indoor navigation      Navigation                 [40, 41]
Supervised       CNN            Object recognition     Generic                    [42–45], [46–48]
Supervised       CNN            Object recognition     Agriculture                [49, 50]
Supervised       CNN            Scene classification   Generic                    [51–54]
Supervised       CNN            Scene classification   Agriculture                [55, 56]
Supervised       CNN            Path planning          Search & rescue            [57, 58]
Supervised       CNN            Image registration     Localization, Navigation   [59–61]
Unsupervised     Autoencoder    Feature extraction     Agriculture                [55]
Unsupervised     DBN            Feature extraction     UAV identification         [62]
Reinforcement    DQN            —                      —                          —
Reinforcement    DDPG           —                      —                          —
Reinforcement    NAF            —                      —                          —
Reinforcement    GPS            Indoor navigation      Navigation                 [63]

three orientations). After retraining, the UAV managed to cross a room filled with a few obstacles in random locations.

Deep learning techniques for robotic motion control can provide increasing benefits in order to infer complex behaviours from raw observation data. Deep learning approaches have the potential to generalize, although current methods still have to overcome the difficulties of continuous state and action spaces, as well as issues related to sample efficiency. Furthermore, novel deep learning models require the usage of GPUs in order to work in real time. In this context, onboard GPUs, Field Programmable Gate Arrays (FPGAs), or Application-Specific Integrated Circuits (ASICs) are a matter of importance which hardware manufacturers should take into consideration.

6. Discussion

Deep learning has arisen as a promising set of technologies to address the current demands for highly autonomous UAV operations, due to its excellent capabilities for learning high-level representations from raw sensor data. Multiple success cases have been reported (Tables 1 and 2) in a wide variety of applications.

A straightforward conclusion from the surveyed articles is that images acquired from UAVs are currently the prevailing type of information being exploited by deep learning, mainly due to the low cost, low weight, and low power consumption of image sensors. This explains the dominance of CNNs among the deep learning algorithms used in UAV applications, given the excellent capabilities of CNNs in extracting useful information from images.

However, deep learning techniques, UAV technology, and the combined use of both still present several challenges, which are preventing faster and further advances in this field.

Challenges in Deep Learning. Deep learning techniques are still facing several challenges, beginning with their own theoretical understanding. An example of this is the lack of knowledge about the geometry of the objective function in deep neural networks or why certain architectures work better than others. Furthermore, a lot of effort is currently being put into finding efficient ways to do unsupervised learning, since collecting large amounts of unlabeled data is nowadays becoming economically and technologically less expensive. Success in this objective will allow algorithms to learn how the world works by simply observing it, as we humans do.

Additionally, as mentioned in Section 2.3, real-world problems that usually involve high-dimensional continuous state spaces (large number of states and/or actions) can render the problem intractable with current approaches, severely limiting the development of real applications. An efficient way of coping with these types of problems remains an unsolved challenge.

Challenges in UAV Autonomy. UAV autonomous operations, enabling safe navigation with little or no human supervision, are currently key for the development of several civilian and military applications. However, UAV platforms still have important flight endurance limitations, restricting size, weight, and power consumption of the payload. These limitations arise mainly from the current state of sensor and battery technology and limit the required capabilities for autonomous operations. Undoubtedly, we will see developments in these areas in the forthcoming years.

Furthermore, onboard processing is desired for many UAV operations, especially those where communications can compromise performance, such as when large amounts of data have to be transmitted and/or when there is limited bandwidth available. Today, the design of powerful miniaturized computing devices with low-power consumption,
Table 2: Deep learning-based UAV applications grouped by the type of system within an unmanned aerial systems architecture, the sensor technologies, and the type of learning algorithms: supervised (𝑆), unsupervised (𝑈), and reinforcement (𝑅).

Aerial robot systems     Sensing technologies   Learning algorithms   References
Feature extraction       Image                  𝑆                     [42–45], [46–48], [51–54]
Feature extraction       Image                  𝑆, 𝑈                  [55]
Feature extraction       Acoustic               𝑆                     [64, 65]
Feature extraction       Radar                  𝑆, 𝑈                  [62]
Feature extraction       LIDAR                  𝑆                     [66]
Planning                 Image                  𝑆                     [57, 58]
Situational awareness    Image                  𝑆                     [59–61]
Motion control           Image                  𝑆                     [38–41]
Motion control           LIDAR                  𝑅                     [63]

particularly GPUs, is an active working field for embedded hardware developers.

Challenges in Deep Learning-Based UAV Applications. This review reveals that, within the architecture of an unmanned aerial system, feature extraction systems are the type of systems in which deep learning algorithms have been most widely applied. This is reasonable given the excellent abilities of deep learning to learn data representations from raw sensor data. Systems regarding higher-level abstractions, such as UAV supervision and planning systems, have so far received little attention from the research community. These systems implement complex behaviours that have to be learned and where the application of supervised learning (e.g., the generation of labelled datasets) is complex.

Nevertheless, systems operating at lower levels of abstraction, such as feature extraction systems, still demand great computational resources. These resources are still hard to integrate on board UAVs, requiring powerful communication capabilities and off-board processing. Furthermore, available computational resources are in most cases not compatible with online processing, limiting the applications where reactive behaviours are necessary. This again raises the aforementioned challenge of advancing embedded hardware technology but should also encourage researchers to design more efficient deep learning architectures.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Spanish Ministry of Science (Project DPI2014-60139-R). The LAL UPM and the MONCLOA Campus of International Excellence are also acknowledged for funding the predoctoral contract of one of the authors.

References

[1] A. G. Ivakhnenko, “Polynomial theory of complex systems,” IEEE Transactions on Systems, Man and Cybernetics, vol. 1, no. 4, pp. 364–378, 1971.
[2] K. Fukushima, “Neocognitron: a hierarchical neural network capable of visual pattern recognition,” Neural Networks, vol. 1, no. 2, pp. 119–130, 1988.
[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2323, 1998.
[4] J. Deng, W. Dong, R. Socher et al., “ImageNet: a large-scale hierarchical image database,” in Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255, Miami, Fla, USA, June 2009.
[5] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[6] J. Schmidhuber, “Deep learning in neural networks: an overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
[7] J. Gu, Z. Wang, J. Kuen et al., Recent Advances in Convolutional Neural Networks, https://arxiv.org/abs/1512.07108.
[8] L. Tai and M. Liu, “Deep-learning in mobile robotics - from perception to control systems: a survey on why and why not,” CoRR abs/1612.07139. http://arxiv.org/abs/1612.07139.
[9] C. Martinez, C. Sampedro, A. Chauhan, and P. Campoy, “Towards autonomous detection and tracking of electric towers for aerial power line inspection,” in Proceedings of the 2014 International Conference on Unmanned Aircraft Systems, ICUAS 2014, pp. 284–295, May 2014.
[10] M. A. Olivares-Mendez, C. Fu, P. Ludivig et al., “Towards an autonomous vision-based unmanned aerial system against wildlife poachers,” Sensors, vol. 15, no. 12, pp. 31362–31391, 2015.
[11] A. Carrio, J. Pestana, J.-L. Sanchez-Lopez et al., “Ubristes: uav-based building rehabilitation with visible and thermal infrared remote sensing,” in Proceedings of the Robot 2015: Second Iberian Robotics Conference, pp. 245–256, Springer International Publishing, 2016.
[12] L. Li, Y. Fan, X. Huang, and L. Tian, “Real-time uav weed scout for selective weed control by adaptive robust control and
machine learning algorithm,” in Proceedings of the 2016 ASABE Annual International Meeting, American Society of Agricultural and Biological Engineers, p. 1, 2016.
[13] J. L. Sanchez-Lopez, M. Molina, H. Bavle et al., “A multi-layered component-based approach for the development of aerial robotic systems: The aerostack framework,” Journal of Intelligent & Robotic Systems, pp. 1–27, 2017.
[14] A. Graves, “Generating sequences with recurrent neural networks,” arXiv preprint https://arxiv.org/abs/1308.0850.
[15] T. M. Mitchell, Machine Learning, vol. 45 (37), McGraw Hill, Burr Ridge, Ill, USA, 1997.
[16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, Mass, USA, 2016.
[17] S. Hochreiter and J. Schmidhuber, “LSTM can solve hard long time lag problems,” in Proceedings of the 10th Annual Conference on Neural Information Processing Systems, NIPS 1996, pp. 473–479, December 1996.
[18] A. Gibson and J. Patterson, Deep Learning, O’Reilly, 2016.
[19] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[20] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle et al., “Greedy layer-wise training of deep networks,” Advances in Neural Information Processing Systems, vol. 19, pp. 153–160, 2007.
[21] P. Smolensky, “Information processing in dynamical systems: foundations of harmony theory,” Tech. Rep., DTIC Document, 1986.
[22] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
[23] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal, “The “wake-sleep” algorithm for unsupervised neural networks,” Science, vol. 268, no. 5214, pp. 1158–1161, 1995.
[24] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” American Association for the Advancement of Science. Science, vol. 313, no. 5786, pp. 504–507, 2006.
[25] P. Vincent, H. Larochelle, I. Lajoie, and P. Manzagol, “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
[26] R. Salakhutdinov and G. Hinton, “Semantic hashing,” International Journal of Approximate Reasoning, vol. 50, no. 7, pp. 969–978, 2009.
[27] A. Krizhevsky and G. E. Hinton, “Using very deep autoencoders for content-based image retrieval,” in Proceedings of the 19th European Symposium on Artificial Neural Networks (ESANN ’11), Bruges, Belgium, April 2011.
[28] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[29] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, vol. 1, MIT Press, Cambridge, UK, 1998.
[30] V. Mnih, K. Kavukcuoglu, D. Silver et al., “Playing atari with deep reinforcement learning,” arXiv preprint https://arxiv.org/abs/1312.5602.
[31] V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[32] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep q-learning with model-based acceleration,” in Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 2829–2838, New York, NY, USA, June 2016, preprint https://arxiv.org/abs/1603.00748.
[33] T. P. Lillicrap, J. J. Hunt, A. Pritzel et al., “Continuous control with deep reinforcement learning,” preprint https://arxiv.org/abs/1509.02971.
[34] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016, preprint https://arxiv.org/abs/1504.00702.
[35] M. Zhang, Z. McCarthy, C. Finn, S. Levine, and P. Abbeel, “Learning deep neural network policies with continuous memory states,” in Proceedings of the 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, pp. 520–527, May 2016.
[36] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search,” in Proceedings of the 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, pp. 528–535, May 2016.
[37] U. Shah, R. Khawad, and K. M. Krishna, “Deepfly: Towards complete autonomous navigation of mavs with monocular camera,” in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP 16, pp. 59:1–59:8, New York, NY, USA, 2016.
[38] F. Sadeghi and S. Levine, “Real single-image flight without a single real image,” preprint https://arxiv.org/pdf/1611.04201.pdf.
[39] D. K. Kim and T. Chen, “Deep neural network for real-time autonomous indoor navigation,” preprint https://arxiv.org/abs/1511.04668.
[40] A. Giusti, J. Guzzi, D. C. Ciresan et al., “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2016.
[41] K. Kelchtermans and T. Tuytelaars, “How hard is it to cross the room? – training (recurrent) neural networks to steer a uav,” preprint https://arxiv.org/abs/1702.07600.
[42] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’14), pp. 580–587, Columbus, Ohio, USA, June 2014.
[43] R. Girshick, “Fast R-CNN,” in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV ’15), pp. 1440–1448, December 2015.
[44] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, vol. 28, pp. 91–99, 2015.
[45] J. Lee, J. Wang, D. Crandall, S. Šabanovic, and G. Fox, “Real-time, cloud-based object detection for unmanned aerial vehicles,” in Proceedings of the 1st IEEE International Conference on Robotic Computing (IRC), pp. 36–43, Taichung, Taiwan, April 2017.
[46] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016, preprint https://arxiv.org/abs/1506.02640.
[47] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” Military Communications Conference (MILCOM), pp. 924–929,
preprint https://arxiv.org/abs/1612.08242. Baltimore, Md, USA, November 2016.
[48] W. Liu, D. Anguelov, D. Erhan et al., “SSD: Single shot multibox detector,” in Proceedings of the European Conference on Computer Vision, pp. 21–37, Springer, 2016.
[49] W. Li, H. Fu, L. Yu, and A. Cracknell, “Deep learning based oil palm tree detection and counting for high-resolution remote sensing images,” Remote Sensing, vol. 9, no. 1, p. 22, 2017.
[50] S. W. Chen, S. S. Shivakumar, S. Dcunha et al., “Counting apples and oranges with deep learning: a data-driven approach,” IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 781–788, 2017.
[51] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, “Learning deep features for scene recognition using places database,” in Proceedings of the 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014, pp. 487–495, December 2014.
[52] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2015, pp. 44–51, June 2015.
[53] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sensing, vol. 7, no. 11, pp. 14680–14707, 2015.
[54] A. Gangopadhyay, S. M. Tripathi, I. Jindal, and S. Raman, “SA-CNN: dynamic scene classification using convolutional neural networks,” preprint https://arxiv.org/abs/1502.05243.
[55] C. Hung, Z. Xu, and S. Sukkarieh, “Feature learning based approach for weed classification using high resolution aerial images from a digital camera mounted on a UAV,” Remote Sensing, vol. 6, no. 12, pp. 12037–12054, 2014.
[56] J. Rebetez, H. F. Satizábal, M. Mota et al., “Augmenting a convolutional neural network with local histograms-a case study in crop classification from high-resolution UAV imagery,” in Proceedings of the European Symposium on Artificial Neural Networks, 2016.
[57] J. Delmerico, E. Mueggler, J. Nitsch, and D. Scaramuzza, “Active autonomous aerial exploration for ground robot path planning,” IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 664–671, 2017.
[58] J. Delmerico, A. Giusti, E. Mueggler, L. M. Gambardella, and D. Scaramuzza, “‘On-the-spot training’ for terrain classification in autonomous air-ground collaborative teams,” in Proceedings of the International Symposium on Experimental Robotics (ISER), EPFL-CONF-221506, 2016.
[59] T. Taisho, L. Enfu, T. Kanji, and S. Naotoshi, “Mining visual experience for fast cross-view UAV localization,” in Proceedings of the 8th Annual IEEE/SICE International Symposium on System Integration, SII 2015, pp. 375–380, December 2015.
[60] T.-Y. Lin, Y. Cui, S. Belongie, and J. Hays, “Learning deep representations for ground-to-aerial geolocalization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp. 5007–5015, June 2015.
[61] F. Aznar, M. Pujol, and R. Rizo, “Visual Navigation for UAV with Map References Using ConvNets,” in Advances in Artificial Intelligence, vol. 9868 of Lecture Notes in Computer Science, pp. 13–22, Springer, 2016.
[62] G. J. Mendis, T. Randeny, J. Wei, and A. Madanayake, “Deep learning based Doppler radar for micro UAS detection and classification,” in Proceedings of the MILCOM 2016 - 2016 IEEE
[63] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search,” in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 528–535, Stockholm, Sweden, May 2016.
[64] T. Morito, O. Sugiyama, R. Kojima, and K. Nakadai, “Partially shared deep neural network in sound source separation and identification using a UAV-embedded microphone array,” in Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016, pp. 1299–1304, October 2016.
[65] S. Jeon, J.-W. Shin, Y.-J. Lee, W.-H. Kim, Y. Kwon, and H.-Y. Yang, “Empirical study of drone sound detection in real-life environment with deep neural networks,” preprint https://arxiv.org/abs/1701.05779.
[66] D. Maturana and S. Scherer, “3D convolutional neural networks for landing zone detection from LiDAR,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’15), pp. 3471–3478, IEEE, Washington, DC, USA, May 2015.
[67] Y. LeCun, B. E. Boser, J. S. Denker et al., “Handwritten digit recognition with a back-propagation network,” in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed., vol. 2, pp. 396–404, 1990.
[68] A. Ghaderi and V. Athitsos, “Selective unsupervised feature learning with convolutional neural network (S-CNN),” in Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2486–2490, December 2016.
[69] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS ’12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
[70] H. Kim, D. Kim, S. Jung, J. Koo, J.-U. Shin, and H. Myung, “Development of a UAV-type jellyfish monitoring system using deep learning,” in Proceedings of the 12th International Conference on Ubiquitous Robots and Ambient Intelligence, URAI 2015, pp. 495–497, October 2015.
[71] N. V. Kim and M. A. Chervonenkis, “Situation control of unmanned aerial vehicles for road traffic monitoring,” Modern Applied Science, vol. 9, no. 5, pp. 1–13, 2015.
[72] M. Bejiga, A. Zeggada, A. Nouffidj, and F. Melgani, “A convolutional neural network approach for assisting avalanche search and rescue operations with UAV imagery,” Remote Sensing, vol. 9, no. 2, p. 100, 2017.
[73] A. Sawarkar, V. Chaudhari, R. Chavan, V. Zope, A. Budale, and F. Kazi, “HMD vision-based teleoperating UGV and UAV for hostile environment using deep learning,” CoRR abs/1609.04147. URL http://arxiv.org/abs/1609.04147.
[74] C. Szegedy, W. Liu, Y. Jia et al., “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’15), pp. 1–9, Boston, Mass, USA, June 2015.
[75] The Technion – Israel Institute of Technology, “Technion aerial systems 2016,” in Journal Paper for AUVSI Student UAS Competition, 2016.
[76] P. Santana, L. Correia, R. Mendonça, N. Alves, and J. Barata, “Tracking natural trails with swarm-based visual saliency,” Journal of Field Robotics, vol. 30, no. 1, pp. 64–86, 2013.