
2020 IEEE International Conference on Robotics and Automation (ICRA)
31 May - 31 August, 2020. Paris, France

On-board Deep-learning-based Unmanned Aerial Vehicle Fault Cause Detection and Identification

Vidyasagar Sadhu, Saman Zonouz, Dario Pompili
Department of Electrical and Computer Engineering, Rutgers University–New Brunswick, NJ, USA
{hss64, saman.zonouz, pompili}@rutgers.edu

Abstract— With the increasing use of Unmanned Aerial Vehicles (UAVs)/drones, it is important to detect and identify the causes of failure in real time, both for recovery from a potential crash-like scenario and for post-incident forensic analysis. The cause of a crash could be a fault in the sensor/actuator system, physical damage or attack, or a cyber attack on the drone's software. In this paper, we propose novel architectures based on deep Convolutional and Long Short-Term Memory Neural Networks (CNNs and LSTMs) to detect (via an autoencoder) and classify drone mis-operations based on real-time sensor data. The proposed architectures learn high-level features automatically from the raw sensor data and capture the spatial and temporal dynamics in the sensor data. We validate the proposed deep-learning architectures via simulations and real-world experiments on a drone. Empirical results show that our solution is able to detect anomalies with over 90% accuracy and classify various types of drone mis-operations with about 99% accuracy (simulation data) and up to 85% accuracy (experimental data).

Fig. 1: An overview of our proposed on-board deep-learning-based UAV fault detection and identification/classification framework: segmented scalar sensor data is fed to a CNN-BiLSTM autoencoder (encoder/decoder) for anomaly detection, and a CNN-BiLSTM classifier is invoked only if an anomaly is detected; the on-board platform shown is a Parrot Bebop 2 with an Nvidia Jetson TX2 GPU.

I. INTRODUCTION

Overview and Motivation: Advances in Unmanned Aerial Vehicle (UAV)/drone technology, together with safety concerns, are pushing many government and defense organizations to use UAVs for surveillance. E-commerce companies such as Amazon are planning to use UAVs for home delivery of their products. Drones are also being considered as mobile air-policing vehicles in some countries. The fact that drones can replace humans in potentially dangerous situations is the main driver of investment and research in UAVs.

Drones or UAVs are Cyber-Physical Systems (CPS), and as their numbers grow, so do the risks of both physical and cyber attacks on them [1]. Examples of cyber attacks are GPS spoofing [2], signal jamming, control-command attacks, attacks on sensors [3], keylogging viruses, etc. Examples of physical attacks (both unintentional and intentional) are bird strikes, abrupt wind changes, broken propellers, etc. Large drones/UAVs can kill people if they fall from height because of the large potential energy they carry. The current and projected growth in drone/UAV use makes real-time incident analysis a priority. Hobbyists, researchers, and governments alike will want to know what prevented a UAV from reaching its destination or caused it to deviate from its intended path. As such, UAV fault/anomaly detection as well as cause identification are important. First, it is important to detect when the UAV's operation deviates from the normal. Once it is determined that something is anomalous, more resources can be devoted to identifying the cause. Identifying the reason for a failure is important so that appropriate action can be taken to minimize further loss. For example, in the case of a car, knowing that the failure is caused by a flat tire helps the driver avoid actions such as sudden braking, which would further exacerbate the situation (as it results in loss of control). Similarly, for a flying object, gliding may be the best response to certain failures rather than the more obvious choice of landing. Meanwhile, Artificial Intelligence (AI) based data-driven techniques are increasingly being used to solve many complex problems relating to autonomous vehicles [4]–[6], smartphones [7]–[9], etc.

Our Approach: Direct and continuous analysis of sensor data for real-time identification of faults is not advisable for two reasons: (i) it is computationally prohibitive, especially on resource-constrained devices such as UAVs, as it requires processing a huge volume of sensor data; (ii) it requires a significant amount of precious on-board memory to store the real-time stream of sensor data. Hence, considering the resource constraints [10], [11] of these devices, we adopt a two-step approach (Fig. 1) in which the identification/classification step is carried out only if anomalous behavior is detected in the sensor data. Unlike previous works, which are mostly model-based [12], we follow a completely (sensor) data-driven approach (using UAV Inertial Measurement Unit (IMU) sensor data such as accelerometer, gyroscope, etc.) for both the detection and identification steps.

The reasons for choosing a data-driven approach (such as deep learning) over traditional model-based approaches are as follows. Deep learning techniques: (a) can learn complex patterns, especially non-linear functions; the sensor data of a UAV during potential crash events (such as a broken propeller) is highly non-linear and complex in nature; (b) do not require manually designed features; the layers of a deep network learn meaningful features on their own during training, which also means no domain expertise is needed to extract features; (c) can work with unlabeled data in an unsupervised fashion to generate features, which is very beneficial for crash-like scenarios given the scarce availability of labeled data. To this end, we propose a novel Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) based deep autoencoder for the detection of faults/anomalous patterns, followed by a CNN-LSTM deep network for their classification/identification.
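The two-step pipeline of Fig. 1 can be summarized in pseudocode. Below is a minimal Python sketch, not the authors' implementation: the function names, the threshold value, and the returned dictionary structure are illustrative assumptions. It only shows the gating idea, i.e., the lightweight detector scores every window, and the heavier classifier runs only when the score exceeds a threshold.

```python
# Minimal sketch of the detect-then-identify gating in Fig. 1 (illustrative names).
import numpy as np

ANOMALY_THRESHOLD = 10.0  # assumed value; in practice chosen from validation scores


def process_window(window, detector_score_fn, classifier_fn):
    """window: (timesteps, channels) array of IMU samples."""
    # Step 1: cheap anomaly score (e.g., Mahalanobis distance of the
    # autoencoder reconstruction error, as described in Sect. III).
    score = detector_score_fn(window)
    if score < ANOMALY_THRESHOLD:
        return {"anomalous": False, "score": score}
    # Step 2: only now run the heavier classifier (e.g., argmax of the
    # DCLNN softmax output) to identify the likely cause.
    cause = classifier_fn(window)
    return {"anomalous": True, "score": score, "cause": cause}
```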

Our Contributions can be summarized as follows.
• We propose a novel Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) based deep autoencoder network architecture for real-time detection of anomalous patterns in UAV IMU sensor data.
• We propose a novel CNN and LSTM based deep neural network classifier for real-time identification of the cause of a fault/attack/crash based on the UAV IMU sensor data.
• We induce crash scenarios by modifying the firmware internals of both the AirSim drone simulator and a real drone [13].
• We validate the proposed models via both experiments and simulations. According to the results, our solution is able to detect anomalies with over 90% accuracy and can classify drone mis-operations correctly with about 99% (simulation data) and up to 85% (experimental data) accuracy.

Paper Organization: In Sect. II, we review related work. In Sect. III, we describe the proposed deep-learning based UAV fault/crash detection and identification methods. In Sect. IV, we present both experimental and simulation results. Finally, in Sect. V, we conclude the paper and sketch future work.

II. RELATED WORK

We position our work with respect to related work, which can be classified into the following categories.

Fault Detection and Identification (FDI): Much of the existing work on FDI focuses on faults in the sensors/actuators of the UAV. For example, Panitsrisit et al. [14] propose a hardware duplication system consisting of piezoresistive, pressure, and current sensors to detect faults in the elevator of a UAV; any abnormal output from these sensors is flagged as a failure. Rago et al. [15] present an FDI method for sensor/actuator failures based on an Interacting Multiple-Model (IMM) Kalman filter approach, in which actuator/sensor failures are represented by a change in the model describing the dynamics (measurements) of the system. Drozeski et al. [16] present an FDI method using a three-layer feed-forward neural network based on state information. Heredia et al. [17] use an Observer/Kalman Filter Identification (OKID) estimator to estimate the system state from measured input-output data; faults are detected when the output deviates from the expected value beyond an accepted threshold, and a separate estimator is used for each output to keep the identification problem simple. Taking a different approach, Suarez et al. [18] use Kalman filtering in combination with visual techniques, such as 3D projection from two observers, to detect faults in a target UAV in a multi-UAV setting.

Anomaly Detection: Anomaly detection is generally an unsupervised machine learning (ML) technique because sufficient examples of the anomalous class are lacking. Within unsupervised learning, it can be broadly classified into the following categories: statistical/regression, dimensionality-reduction, and distribution-based approaches. In statistical approaches [19], features such as mean, variance, entropy, etc. are generally hand-crafted from the data, and statistical tests or formal rule checks are performed on these features to determine whether the data is anomalous. However, these approaches work only when the anomalous patterns are known a priori so that they can be monitored in the sensor data. In dimensionality reduction, the data is projected onto a low-dimensional representation (such as the principal components in Principal Component Analysis (PCA)); the idea is that this low-dimensional representation captures the most important features of the input data. Clustering techniques such as k-means or Gaussian Mixture Models (GMMs) are then used to cluster these low-dimensional features and identify anomalies. In distribution-based approaches, the training data is fit to a distribution (such as a multivariate Gaussian or a mixture of Gaussians); given a test point, its distance from the fitted distribution (e.g., the Mahalanobis distance) is calculated as the measure of anomaly.

Deep-learning Approaches: Deep-learning techniques have been widely used to solve problems in many domains. For example, deep neural network architectures have been used to predict seizures [20], and deep CNNs have been used extensively in content recommendation [21], speech recognition [22], computer vision [23], etc. On the other hand, RNNs and LSTMs have been used for Model Predictive Control based robotic manipulation [24], language modeling [25], phoneme recognition [26], etc. To the authors' best knowledge, deep learning techniques have not previously been used to detect/identify the cause of UAV crashes based on sensor data. In this paper, we propose novel deep learning architectures to detect and identify the cause of UAV crashes or crash-like scenarios from the drone's IMU data.

III. PROPOSED SOLUTION

In this section, we first explain our CNN Bi-LSTM autoencoder network to detect anomalies, followed by the CNN-LSTM network classifier to identify anomalies/crash scenarios.

CNN and Bi-LSTM based Detector ('AutoEnc'): Anomaly detection using unsupervised learning consists of two steps. In step 1, the system is trained with many normal examples to learn representations of the input data, e.g., via GMM clustering. Because we are dealing with temporal data, a sliding-window approach is adopted to learn these representations. In step 2, given a test data point, we define an anomaly score based on the learned representations, e.g., the distance from the cluster mean. In an LSTM autoencoder, an input time series {x_0, x_1, ..., x_n} of size n+1 (corresponding to one window of data segmented from the full data) is fed to the encoder, which consists of n+1 LSTM cells. The output of the last LSTM cell, called the embedding, is fed as input to a series of n+1 LSTM cells to generate an output {a_0, a_1, ..., a_n}. The autoencoder is trained by minimizing the reconstruction error, |x − a|^2.

Convolutional Bi-LSTM Encoder: The basic encoder of an LSTM autoencoder does not perform sufficiently well because it does not take into account (i) inter-channel/modal correlations and (ii) the directionality of the data. We design an encoder that addresses these issues, as shown in Fig. 3 (top). It consists of a series of 1-dimensional (1D) convolutional layers followed by bi-LSTM layers. The convolutional layers help in capturing inter-channel spatial correlations, while the LSTM layers help in capturing inter- and intra-channel temporal correlations. The number of filters, the filter kernel size, and the type of padding (with a default stride length of 1) are indicated in Fig. 3 (top). For example, in the first convolution step, 48 filters of size 5 × 1 are applied to the input data of size 25 × 1 × 6 (assuming 6-channel input data), resulting in an output of size 25 × 1 × 48. Unidirectional LSTM layers capture temporal patterns in only one direction, while the data might exhibit interesting patterns in both directions. Hence, to capture these patterns, we have a second set of LSTM cells to which the data is fed in reverse order. Further, we stack multiple such bi-LSTM layers to extract more hierarchical information. All the data that has been processed through the multiple convolutional and bi-LSTM layers is available in the cell states of the final LSTM cells. This is the output of the encoder, which is fed as input to our decoder.
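As a concrete illustration, the encoder just described can be sketched in Keras (TensorFlow, the framework used for training in Sect. IV). This is a minimal sketch and not the authors' released code: the ReLU activations, the exact layer stacking, and the use of the final bidirectional LSTM output as the embedding are assumptions based on Fig. 3 and the training description.

```python
# Minimal Keras sketch of the convolutional Bi-LSTM encoder of Fig. 3 (top):
# three 1D convolutions (48/64/96 filters, kernels 5/5/3, "same" padding,
# stride 1) over a 25x6 IMU window, followed by stacked bidirectional LSTMs.
import tensorflow as tf
from tensorflow.keras import layers

WINDOW, CHANNELS = 25, 6  # window length and IMU channels used in the paper


def build_encoder():
    inp = layers.Input(shape=(WINDOW, CHANNELS))
    x = layers.Conv1D(48, 5, padding="same", activation="relu")(inp)
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)
    x = layers.Conv1D(96, 3, padding="same", activation="relu")(x)
    # Two stacked bidirectional LSTM layers (256 units per cell, per Sect. IV);
    # the second returns only its final output, used here as the embedding.
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    embedding = layers.Bidirectional(layers.LSTM(256))(x)
    return tf.keras.Model(inp, embedding, name="conv_bilstm_encoder")


encoder = build_encoder()
encoder.summary()
```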

Convolutional Bi-LSTM Decoder: The decoder performs the encoder operations in reverse order so as to reconstruct the input data (Fig. 3 (bottom)). It first consists of bi-LSTM layers that take the final cell states from the encoder as one of their inputs (the other input being zero). The other input (besides the previous cell state) can be either zero or the output of the previous LSTM cell. The outputs of the LSTM layers are fed as input to a series of 1D de-convolutional layers, which perform the reverse of convolution (also called transposed convolution) to generate data with the same shape as the input to the encoder (6-channel 1D data of length 25).

Fig. 3: Convolutional and Bi-LSTM encoder (top) and decoder (bottom) of the proposed autoencoder. The encoder applies three 1D convolutions (48, 64, and 96 filters of sizes 5x1, 5x1, and 3x1, all with "same" padding) to the 25x1x6 input, followed by two bidirectional LSTM layers; the decoder mirrors this with bidirectional LSTM layers and 1D de-convolutions back to a 6-channel output of length 25.

Though the reconstruction error could be used directly as the measure of anomaly in step 2, we design an enhanced method with further processing to obtain better results (Fig. 2). After the network is trained, the training data is fed once more through the trained network to capture the reconstruction errors. These errors are then fit to a multivariate Gaussian distribution with mean μ and covariance Σ. Given a test data point, the reconstruction error x_t is first calculated using the trained model, and the Mahalanobis distance of this error with respect to the fitted Gaussian, sqrt((x_t − μ)^T Σ^(−1) (x_t − μ)), is then computed, as shown in Fig. 2. These distances, which are treated as anomaly scores, are sorted in decreasing order and analyzed as required, e.g., the top 0.01%.

Fig. 2: (Top) After the model is trained, we fit the reconstruction errors to a multivariate Gaussian model. (Bottom) Given a test data point, we first find the reconstruction error and then compute the Mahalanobis distance of this error w.r.t. the fitted Gaussian distribution; the top scores can be analyzed as needed.
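The scoring step above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes a trained autoencoder `autoenc` with a Keras-style `predict` method and window arrays of shape (N, timesteps, channels), and it summarizes each window's error as a per-channel mean absolute error, which is one possible choice of error vector.

```python
# Sketch of the anomaly-scoring step of Fig. 2: fit a multivariate Gaussian to
# the training reconstruction errors, then score test windows by the
# Mahalanobis distance sqrt((e - mu)^T Sigma^{-1} (e - mu)).
import numpy as np


def reconstruction_errors(autoenc, windows):
    """Per-window mean absolute reconstruction error per channel."""
    recon = autoenc.predict(windows)                 # (N, timesteps, channels)
    return np.mean(np.abs(windows - recon), axis=1)  # (N, channels)


def fit_gaussian(train_errors):
    mu = train_errors.mean(axis=0)
    sigma = np.cov(train_errors, rowvar=False)
    return mu, np.linalg.pinv(sigma)


def mahalanobis_scores(test_errors, mu, sigma_inv):
    d = test_errors - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, sigma_inv, d))


# Usage sketch: the highest scores (e.g., the top 0.01%) are flagged as anomalies.
# mu, s_inv = fit_gaussian(reconstruction_errors(autoenc, train_windows))
# scores = mahalanobis_scores(reconstruction_errors(autoenc, test_windows), mu, s_inv)
```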

CNN and LSTM based Classifier ('DCLNN'): We consider the drone's sensor signatures in crash or crash-like scenarios to be very valuable. These signatures are mostly unique to the events that caused them. As such, we claim that these signatures can be used to identify those events by building a classifier that maps sensor signatures to the events that caused them. For example, the data collected from the 3DR Solo drone [27] after a propeller was broken is shown in Fig. 5. The plots show that the drone was in a stable state when one of its propellers broke; this resulted in large variations in the accelerometer and pitch-roll-yaw data (unique signatures), since the drone accelerates in a particular direction once the thrust on the broken propeller drops to zero. These signatures are zoomed in to show the variation. Other sensor data, such as the gyroscope, show similar variations.

Fig. 5: Sensor data corresponding to the broken-propeller scenario (roll, pitch, and yaw acceleration on the x, y, and z axes vs. time).

We propose a novel CNN and Bi-LSTM based architecture to classify the sensor data in real time and identify any potential crash scenarios. This is useful in two ways: (a) it can be used for recovery planning to stabilize the drone, either using redundant hardware or by designing appropriate controller techniques that work with fewer than the usual number of actuators; (b) in cases where the crash is unavoidable, the logged sensor data can be used to identify the cause of the crash offline. The proposed deep architecture is shown in Fig. 4. It consists of convolutional layers at the beginning, as in 'AutoEnc', to extract important static/spatial features, followed by LSTM layers to capture the dynamic/temporal variations in the sensor data. As such, the encoder layers of Fig. 3 can be reused, as shown in both Fig. 4 and Fig. 1. At every convolutional layer, each channel is processed by multiple kernels/filters, resulting in as many feature maps in the subsequent layer. The output of the convolutional layers is unrolled time-wise and passed through bi-LSTM layers to capture the temporal dependencies in both directions. The concatenated outputs (of the forward and backward LSTM layers) are then passed through a series of fully connected layers ending in a softmax layer (of length equal to the number of classes) that computes the probabilities of the different classes. We do not use pooling layers after the convolutional layers, since our input data has already been segmented into windows and the output of the convolutional layers needs to be passed through the time-series LSTM layers [28]. We also introduce dropout at several layers in our architectures, wherein some randomly chosen activations are set to zero. This acts as regularization and helps prevent the network from depending on idiosyncrasies of the data, learning the general structure instead.

Fig. 4: Proposed deep CNN and Bi-LSTM architecture for fault classification: windows of multi-modal scalar sensor data pass through 1D convolutional layers ("same" padding), Bi-LSTM layers (re-using the encoder layers), a concatenated output, fully connected layers, and a softmax layer over the number of classes. For the 1D Conv. and Bi-LSTM layers, please refer to the encoder in Fig. 3.
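A minimal Keras sketch of such a classifier is given below; it is not the authors' code. The filter counts, kernel sizes, and LSTM width follow the configuration reported later in Sect. IV ([48, 64, 96] filters with kernels [5, 5, 3], two 128-unit bidirectional LSTM layers, a dense softmax over the 9 experimentally induced classes); the dropout rate, activations, and example window size are assumptions.

```python
# Sketch of a DCLNN-style classifier (Fig. 4): convolutional front-end,
# two bidirectional LSTM layers, dropout, and a softmax over crash classes.
import tensorflow as tf
from tensorflow.keras import layers

WINDOW, CHANNELS, NUM_CLASSES = 25, 6, 9  # example values; experiments vary these


def build_dclnn():
    inp = layers.Input(shape=(WINDOW, CHANNELS))
    x = inp
    for filters, kernel in zip([48, 64, 96], [5, 5, 3]):
        x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(128))(x)
    x = layers.Dropout(0.5)(x)  # dropout rate is an assumed value
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inp, out, name="dclnn")
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Training sketch matching Sect. IV:
# model = build_dclnn(); model.fit(x_train, y_train, epochs=15, batch_size=64)
```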
Real-time operation. Once the models are trained offline, detection and (if necessary) identification can be done in real time, as they comprise only segmenting the streaming sensor data and applying a few matrix multiplications to arrive at the result. The identification step is carried out only if the anomaly score from the detection step is beyond a threshold. Once the cause is identified/diagnosed in real time, appropriate actions can be taken to stabilize/safeguard the drone operation.

IV. PERFORMANCE EVALUATION

In this section, we first present our experimental and simulation setup, followed by detection and identification results. For each case, we compare our approach with traditional machine learning classifiers such as SVM.

Experimental Setup. The crash data in Fig. 5 was collected using the 3DR Solo drone. However, we could not use the same drone for our experiments, as it weighs 1500 grams and easily gets damaged when it falls from a height. This is especially relevant for deep learning, where a large amount of data needs to be collected to train the models. For this purpose, we used another, much smaller drone, the Crazyflie 2.0 [13], shown in Fig. 7(a), which weighs just 37 grams and is much more robust to falls. In order to evaluate our approach, we considered a total of 15 crash scenarios (classes): all combinations of one-, two-, three-, and four-propeller breakdowns. We modified the drone firmware by assigning a value of zero to the variables representing the propeller Revolutions Per Minute (RPM) at appropriate levels within the software to induce a crash. However, due to firmware limitations, we could not successfully induce the following cases (all four cases of three-propeller breakdown and two cases of diagonal two-propeller breakdown), resulting in only 9 classes in total. We collected data for 30 runs; in each run, the drone would have a 2-second up-flight time and 8 seconds of hovering time, followed by a crash corresponding to one of the 9 classes. We collected accelerometer, gyroscope, and magnetometer data sampled at 100 Hz (the maximum supported rate) for each crash event.

Training Description. We used 70% of the data for training and 30% for testing in both experiments and simulations. Since our data is a time series, we processed it into windows of 100, 50, and 25 timesteps with a stride length of 10 for suitable analysis. All the data up to the crash is used for training/testing AutoEnc, as it is considered normal operation. The transition data at the time of the crash is used for training/testing DCLNN, along with the corresponding class label (i.e., the crash scenario mentioned above). We used TensorFlow to build, train, and test our models with a minibatch size of 512 windows. The number and size of the filters used for the CNN layers are as shown in Fig. 3. We used two layers of bi-LSTMs with a hidden size of 256 units for each LSTM cell. We trained the overall network for 100−300 epochs using the Adam optimizer [29] with a learning rate of 0.01 and an epsilon of 0.01.

Detection Results. Fig. 6(a) shows the training reconstruction loss of the experimental data for our AutoEnc model over several training epochs for the 6 channels (accelerometer and gyroscope), individually and in combined form. We can notice that as the model learns, the reconstruction loss decreases and converges. We treat detection as a binary problem. Fig. 6(b) shows the Receiver Operating Characteristic (ROC) curve for the experimental data compared with an SVM classifier (with best parameters: Radial Basis Function (RBF) kernel, C = 10, γ = 0.1). We can notice that: (i) accuracy increases as the number of channels or the window size is increased; (ii) AutoEnc performs better than SVM except for the 3-channel, 25-sample-window case. We believe the reason is a lack of sufficient data for training AutoEnc (as AutoEnc is a deep-learning network, more data is required to train the model). For the same reason, this behavior is not observed in the 6-channel case. Fig. 6(c) shows the ROC curve results for our simulation data (we describe the simulator setup and data collection below). We observe accuracy greater than 97% and also note that the problem observed with the experimental data does not appear here.

Fig. 6: (a) Training reconstruction loss across each channel and the combined data (experimental data); Receiver Operating Characteristic (ROC) curves for different numbers of channels and window sizes: (b) experimental data; (c) simulation data.

As can be observed from these experimental results, the detection accuracy can still be improved. We believe the reason for this is the unstable nature of the drone (Crazyflie 2.0) used for the experiments. As the drone is very lightweight (only 37 grams) and hard to control, it exhibits a certain element of randomness. This is also reflected in the crash signatures, making it hard for the network to learn meaningful representations, as we show below for the experimental data.
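The window segmentation used in the Training Description above can be sketched as follows. The helper names are illustrative, and the random shuffling before the 70/30 split is an assumption; the window sizes and stride match those reported (100/50/25 samples, stride 10).

```python
# Sketch of overlapping-window segmentation and a 70/30 train/test split.
import numpy as np


def segment(series, window=100, stride=10):
    """series: (T, channels) array -> (num_windows, window, channels)."""
    starts = range(0, series.shape[0] - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])


def split_train_test(windows, train_frac=0.7, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(windows))
    cut = int(train_frac * len(windows))
    return windows[idx[:cut]], windows[idx[cut:]]
```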

On the other hand, we are also restricted from using larger drones because (i) state laws prohibit flying drones outdoors without an expensive license, and it is very risky to fly these drones indoors; and (ii) these drones get easily damaged when they fall from heights, making them unsuitable for crash experiments. To find a sweet spot between these two extremes, we decided to use a realistic drone simulator to circumvent these problems. We felt the simulation should not be simplistic, ignoring physical aspects such as environmental objects (e.g., trees and poles) and kinematics (e.g., drag, friction, and gravity). For these reasons, we adopted Microsoft's AirSim drone simulator [30], an open-source simulator written in C++. The simulator tries to mimic the real world as closely as possible by including all the effects involved, including collisions (see Fig. 8(a) for a screenshot of the simulator).

Simulator Setup. We first describe the simulator architecture, followed by the data collection method.

Simulator Architecture: The core components of AirSim include the environment model, vehicle model, physics engine, sensor models, rendering interface, public API layer, and an interface layer for the vehicle firmware, as depicted in Fig. 8(b). The simulated environments need to have reasonable detail; for this purpose, AirSim leverages rendering technologies implemented by the Unreal engine [31]. In addition, AirSim utilizes the underlying pipeline in the Unreal engine to detect collisions. The AirSim code base is implemented as a plugin for the Unreal engine. For in-depth details on the different realistic models used in the simulator, please refer to [30].

Inducing Crash: In order to simulate a broken propeller, we force its RPM to zero by supplying zero current to its motor. However, for this to work successfully, it is necessary to constantly provide zero current to the affected motor. A one-time operation would not be sufficient, as the motor currents are generated in a high-frequency update loop using a PID controller, and hence correct current values (which do not induce a crash) would be provided to the motors in subsequent update loops. Hence, we modified the firmware code to set the affected propeller's motor current to zero inside the update loop itself, so that its RPM remains continuously zero, resulting in a crash. Fig. 8(c) shows the accelerometer data (x, y, z axes) after the crash is induced by forcing the RPM of one of the propellers to zero. By comparing with the actual drone crash data (accelerometer) from the 3DR Solo drone in Fig. 5, we can see that the simulation data is more complex and hence more difficult to learn than the former. We successfully simulated all 15 crash scenarios (classes). In all these scenarios, we collected 18-channel data, viz., the linear and angular versions of acceleration, velocity, and position along all three axes, from the start of the crash until the end. We repeated this experiment 300 times to account for noise and gather sufficient data. We have included a video demo of the crash experiments run in the AirSim simulator along with this submission. For the results below, we used only the 3-channel linear acceleration data (instead of all 18 channels), unless otherwise specified, as there is a need to limit the amount of data to process on a real drone.

Identification/Classification Results. We used our DCLNN architecture of Fig. 4 with three convolutional layers ([48, 64, 96] filters with kernel sizes [5, 5, 3]) followed by two bidirectional LSTM layers (128 units) and one dense/softmax layer. The training is performed for 15 epochs in mini-batches of size 64 using the Adam optimizer and categorical cross-entropy as the loss function. We now present identification results using the experimental data.

Comparison with SVM. Fig. 7(b) shows the accuracy of our model (DCLNN) compared against a classical machine learning classifier, the Support Vector Machine (SVM). Accuracy is defined as the percentage of windows classified correctly. We can notice that DCLNN's accuracy increases as the training is carried out over several epochs (the cross-entropy loss also decreases accordingly but is not plotted due to space limitations), finally reaching 70%. We can also notice that it performs better than the SVM classifier (RBF kernel, C = 10, γ = 0.01), which could only achieve 54% accuracy. Due to space limitations, we have shown the comparison only for the magnetometer data; for the other channels too, DCLNN outperforms SVM in a similar manner.

Fig. 7: (a) Crazyflie 2.0 drone used for experiments. (b) Accuracy using magnetometer data. (c) Comparison of test accuracy across different channels.

Variation with Channels. There is a need to limit the amount of data processed during inference, considering the resource scarcity of UAVs. We were curious whether all the channels contributed equally to the learning of the network. For this purpose, we trained our network on different channel data each time and plotted the accuracy on the test data, as shown in Fig. 7(c). We can notice that in the 3-channel scenario, the magnetometer performs best with 70% accuracy, while combining all 9-channel data yields an accuracy of about 85%. We now present the results corresponding to the simulation data.

Comparison with SVM. Fig. 9(a) shows the training and testing loss as our network is trained over several epochs. We notice that the loss decreases and converges to a value. This indicates that the network is learning over time and fits the data reasonably. Fig. 9(b) shows the accuracy of our model (DCLNN) compared against SVM. Interestingly, after only 3 epochs, DCLNN has almost converged, with about 90% accuracy (and a final value around 93%). The test accuracy also finally converges to 93%, whereas the SVM classifier (RBF kernel, C = 100, γ = 0.01) achieves only 85% accuracy.
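For reference, the SVM baseline used in these comparisons can be reproduced roughly with scikit-learn as sketched below. This is not the authors' code: flattening each window into a single feature vector is an assumption about how windows were fed to the SVM, and the hyperparameters shown are the ones reported for the experimental identification comparison.

```python
# Sketch of an RBF-kernel SVM baseline over flattened sensor windows.
import numpy as np
from sklearn.svm import SVC


def fit_svm_baseline(train_windows, train_labels, C=10.0, gamma=0.01):
    x = train_windows.reshape(len(train_windows), -1)  # flatten (window, channels)
    clf = SVC(C=C, gamma=gamma, kernel="rbf")
    clf.fit(x, train_labels)
    return clf


# Usage sketch:
# clf = fit_svm_baseline(train_windows, train_labels)
# acc = clf.score(test_windows.reshape(len(test_windows), -1), test_labels)
```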

We did not plot the F1-score, as our class distribution is balanced.

Fig. 8: (a) A snapshot from AirSim showing an aerial vehicle flying in an urban environment; the inset shows the depth, object segmentation, and front camera streams generated in real time. (b) The architecture of the system, depicting the core components and their interactions. (c) Three-dimensional accelerometer data after one propeller's RPM is forced to zero (simulating a broken-propeller crash).

Fig. 9: (a) Cross-entropy loss vs. number of training epochs for both training and test data. (b) Accuracy vs. number of training epochs, compared between DCLNN (ours) and the SVM classifier, for both training and test data. (c) Accuracy on test data when the channel data fed to the network is varied.

Variation with Channels. Fig. 9(c) shows the variation with channel data. We can see that the 3-channel gyroscope data works better (with about 98% accuracy) than the 3-channel accelerometer data (93% accuracy). Within the 3-channel gyroscope data, we can notice that axes Y and Z give better performance than axis X. Knowing this, we can discard the axis-X data and process only axes Y and Z to limit the amount of computation. We can see that axes Y and Z combined give an accuracy of about 87%, which could be sufficient in some cases.

Raspberry Pi and Nvidia Jetson Profiling. The above results were obtained on a desktop computer with an Intel quad-core i7-2600 3.4 GHz processor and 8 GB of RAM. However, we wanted to know how long inference takes on hardware that can be mounted on a drone and powered by a drone battery. For this purpose, we considered two embedded computing devices, the Raspberry Pi 3 Model B and the Nvidia Jetson TX2 module, both of which can be mounted on a drone to augment its computing capabilities. The former has a quad-core 1.2 GHz Broadcom BCM2837 processor with 1 GB of RAM; the latter has a quad-core 2 GHz ARM processor with 8 GB of CUDA-compatible graphics memory.

TABLE I: Inference times for AutoEnc on different hardware.

No. Chans.   | Desktop (Training) | Desktop (Inference) | Raspberry Pi (Inference) | Jetson TX2 (Inference)
1 (x/y/z)    | 202 ms             | 82 ms               | 312 ms                   | 83 ms
2 (xy/yz/xz) | 386 ms             | 87 ms               | 372 ms                   | 92 ms
3 (xyz)      | 561 ms             | 95 ms               | 484 ms                   | 101 ms

The results are shown in Table I. The numbers indicate the time taken to perform inference on a single window of data (100 samples) with the specified number of channels and hardware. These numbers show that AutoEnc is amenable to real-time inference on a drone, aiding in the detection of potentially dangerous modes. Specifically, we can observe that in the case of the Nvidia Jetson TX2 the results are impressive, around 100 ms or less, which means the drone can run inference on the last one second of sensor data every 100 ms. It is to be noted that, while new sensor data is sampled every 10 ms, it may not be necessary to run inference at the same rate.
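A sketch of how such per-window latencies could be measured is given below; the helper is hypothetical and not the authors' profiling code. It times repeated `predict` calls on a single window (after a warm-up run) and reports the mean latency in milliseconds.

```python
# Sketch of single-window inference timing, as summarized in Table I.
import time
import numpy as np


def time_inference(model, window, repeats=100):
    """window: (timesteps, channels); returns mean latency in milliseconds."""
    batch = window[np.newaxis, ...]   # the model expects a batch dimension
    model.predict(batch)              # warm-up run (graph build / weight load)
    t0 = time.perf_counter()
    for _ in range(repeats):
        model.predict(batch)
    return 1000.0 * (time.perf_counter() - t0) / repeats
```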
V. CONCLUSION AND FUTURE WORK

We proposed novel deep architectures to detect and identify the cause of UAV malfunctions. We have shown that the proposed architecture is able to achieve over 90% accuracy for detection, and up to 85% accuracy for identification on experimental data (all channels combined) and 99% on simulation data (just 3 channels).

As future work, we plan to test our model on more crash/attack scenarios, e.g., partially broken but functional propellers, cyber attacks, etc. We also plan to test our models on heterogeneous UAV platforms to assess their generalizability. Later, we will develop real-time decision-making techniques to safeguard the drone from the identified mis-operation.

Acknowledgements: We thank PhD student Sriharsha Etigowni for his help with the experiments. This work was supported by ONR YIP Grant No. 11028418 and by the National Science Foundation (NSF).

REFERENCES

[1] K. Hartmann and C. Steup, "The vulnerability of UAVs to cyber attacks—an approach to the risk assessment," in Cyber Conflict (CyCon), 2013 5th International Conference on. IEEE, 2013, pp. 1–23.
[2] A. J. Kerns, D. P. Shepard, J. A. Bhatti, and T. E. Humphreys, "Unmanned aircraft capture and control via GPS spoofing," Journal of Field Robotics, vol. 31, no. 4, pp. 617–636, 2014.
[3] Y. Son, H. Shin, D. Kim, Y. Park, J. Noh, K. Choi, J. Choi, Y. Kim et al., "Rocking drones with intentional sound noise on gyroscopic sensors," in Proceedings of the 24th USENIX Conference on Security Symposium. USENIX Association, 2015, pp. 881–896.
[4] V. Sadhu, T. Misu, and D. Pompili, "Deep Multi-Task Learning for Anomalous Driving Detection Using CAN Bus Scalar Sensor Data," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 1–6.
[5] M. Rahmati, M. Nadeem, V. Sadhu, and D. Pompili, "UW-MARL: Multi-Agent Reinforcement Learning for Underwater Adaptive Sampling using Autonomous Vehicles," in ACM International Conference on Underwater Networks and Systems (WUWNet), Atlanta, GA, USA, Oct 2019, pp. 1–6.
[6] W. Chen, M. Rahmati, V. Sadhu, and D. Pompili, "Real-time Image Enhancement for Vision-based Autonomous Underwater Vehicle Navigation in Murky Waters," in ACM International Conference on Underwater Networks and Systems (WUWNet), Atlanta, GA, USA, Oct 2019, pp. 1–8.
[7] V. Sadhu, S. Zonouz, V. Sritapan, and D. Pompili, "HCFContext: Smartphone Context Inference via Sequential History-based Collaborative Filtering," in IEEE International Conference on Pervasive Computing and Communications (PerCom), Mar 2019, pp. 1–10.
[8] V. Sadhu, G. Salles-Loustau, D. Pompili, S. Zonouz, and V. Sritapan, "Argus: Smartphone-enabled human cooperation for disaster situational awareness via MARL," in IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2017.
[9] V. Sadhu, S. Zonouz, V. Sritapan, and D. Pompili, "CollabLoc: Privacy-preserving Multi-modal Collaborative Mobile Phone Localization," IEEE Transactions on Mobile Computing, pp. 1–13, 2019.
[10] X. Zhao, V. Sadhu, and D. Pompili, "Analog Signal Compression and Multiplexing Techniques for Healthcare Internet of Things," in IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS), 2017.
[11] M. Rahmati, V. Sadhu, and D. Pompili, "ECO-UW IoT: Eco-friendly Reliable and Persistent Data Transmission in Underwater Internet of Things," in Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Boston, MA, USA, Jun 2019, pp. 1–9.
[12] A. Hasan, V. Tofterup, and K. Jensen, "Model-based fail-safe module for autonomous multirotor UAVs with parachute systems," in 2019 International Conference on Unmanned Aircraft Systems (ICUAS). IEEE, Jun 2019, pp. 406–412.
[13] Bitcraze, "Crazyflie 2.0," https://www.bitcraze.io/crazyflie-2/.
[14] P. Panitsrisit and A. Ruangwiset, "Sensor system for fault detection identification and accommodation of elevator of UAV," in SICE Annual Conference 2011, Sept 2011, pp. 1035–1040.
[15] C. Rago, R. Prasanth, R. K. Mehra, and R. Fortenbaugh, "Failure detection and identification and fault tolerant control using the IMM-KF with applications to the Eagle-Eye UAV," in Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171), vol. 4, Dec 1998, pp. 4208–4213.
[16] G. R. Drozeski, B. Saha, and G. J. Vachtsevanos, "A fault detection and reconfigurable control architecture for unmanned aerial vehicles," in 2005 IEEE Aerospace Conference. IEEE, 2005, pp. 1–9.
[17] G. Heredia and A. Ollero, "Sensor fault detection in small autonomous helicopters using observer/Kalman filter identification," in 2009 IEEE International Conference on Mechatronics, April 2009, pp. 1–6.
[18] A. Suarez, G. Heredia, and A. Ollero, "Cooperative sensor fault recovery in multi-UAV systems," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1188–1193.
[19] D. B. Araya, K. Grolinger, H. F. ElYamany, M. A. Capretz, and G. Bitsuamlak, "An ensemble learning framework for anomaly detection in building energy consumption," Energy and Buildings, vol. 144, pp. 191–206, Jun 2017.
[20] M.-P. Hosseini, H. Soltanian-Zadeh, K. Elisevich, and D. Pompili, "Cloud-based deep learning of big EEG data for epileptic seizure prediction," in IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2016.
[21] A. van den Oord, S. Dieleman, and B. Schrauwen, "Deep content-based music recommendation," in Proceedings of the 26th International Conference on Neural Information Processing Systems. Curran Associates Inc., 2013, pp. 2643–2651.
[22] T. N. Sainath, B. Kingsbury, G. Saon, H. Soltau, A.-r. Mohamed, G. Dahl, and B. Ramabhadran, "Deep Convolutional Neural Networks for Large-scale Speech Tasks," Neural Networks, vol. 64, pp. 39–48, 2015.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.
[24] I. Lenz, R. Knepper, and A. Saxena, "DeepMPC: Learning deep latent features for model predictive control," in Robotics: Science and Systems (RSS), 2015.
[25] T. Mikolov, S. Kombrink, L. Burget, J. Cernocky, and S. Khudanpur, "Extensions of recurrent neural network language model," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, May 2011, pp. 5528–5531.
[26] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, May 2013, pp. 6645–6649.
[27] 3DR, "Solo Drone," https://3dr.com/solo-drone/.
[28] F. Ordóñez and D. Roggen, "Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition," Sensors, vol. 16, no. 1, p. 115, Jan 2016. [Online]. Available: http://www.mdpi.com/1424-8220/16/1/115
[29] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in International Conference on Learning Representations, San Diego, CA, USA, May 2015.
[30] Microsoft Research, "AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles," https://github.com/Microsoft/AirSim.
[31] B. Karis and Epic Games, "Real Shading in Unreal Engine 4," in Physically Based Shading in Theory and Practice (SIGGRAPH Course), 2013.
