Flight Data Monitoring (FDM) Unknown Hazards Detection During Approach Phase Using Clustering Techniques and AutoEncoders
Abstract—Airline safety departments analyse aircraft data recorded on board (FDM) to inspect safety occurrences. This activity relies on human experts to create a rule-based system that detects known safety issues, based on whether a small set of parameters exceeds some predefined set of thresholds. However, rare events are the hardest to detect manually, as their patterns are not often recognised at a glance. Experts agree that both approach and take-off procedures are more prone to experience a safety incident. In this paper we performed descriptive and predictive analyses to detect anomalies during the approach phase to runway 25R at LEBL airport. From a descriptive point of view, clustering techniques help to find patterns and correlations within the data, and also to identify clusters of similar observations. Moreover, these clusters might reveal certain points as rare events that are isolated from the rest of the observations. Predictive analytics, and more concretely deep learning ANNs and AutoEncoders, can be used to detect these abnormal events. The methodology relies on learning what "normal" observations look like, since they usually constitute the majority of the cases. Afterwards, if we process an abnormal flight, the model will return a high reconstruction error because of its deviation from the training data. This shows how the predictive methodology could be applied as an extremely useful forensic tool for safety experts and FDM analysts.

Keywords—Anomaly detection, hazard identification, safety, clustering, deep learning, LSTM, AutoEncoders, HDBSCAN

I. INTRODUCTION

Flight Data Monitoring (FDM) is an activity carried out by airlines primarily for monitoring and improving the safety and operation of their aircraft [1]. The data recorded by the flight data recorder on board an aircraft is downloaded and analysed through various tools and techniques, with the ultimate objective of using that analysis to improve civil aviation operations, establish maintenance schedules, train pilots and modify operational procedures, amongst others, without compromising safety. The benefits of analysing FDM data are often highlighted by the safety departments that analyse occurrences. This activity relies on human experts to create a rule-based system that detects known safety issues, based on whether a small set of parameters exceeds some predefined set of thresholds. This unveils the need to introduce predictive analysis methodologies to automate the detection of events [2]. In particular, rare events are the hardest to detect manually, as their patterns are not often recognised by an analyst. Furthermore, the identification of anomalous and rare flights is a challenging task in proactive safety management systems.

From the data science point of view, anomaly detection is a machine learning discipline that involves outlier detection, deviation detection, or novelty detection. Abnormal event detection and cause-effect analysis in aviation represent a challenging field that involves the mining of multi-dimensional heterogeneous time-series data, the lack of time-step-wise annotation of events, and the challenge of using scalable mining tools to perform analysis over a large number of events [3]. Machine learning (ML) models that work on datasets under imbalanced constraints (such as a very low count of anomalies) are limited and may not be able to learn patterns that explain the anomalies correctly [4]. This means that the cause-effect analysis might be incomplete or not well supported. Also, the study might need to consider some interesting anomalies that are not "labelled" in the historical data but need to be investigated. In this context, we need to shift the approach to make the model less dependent on labelled data and consider a semi-supervised machine learning methodology [5].

Semi-supervised algorithms can be used to train a model to learn the behaviour of the "normal approaches", which make up the majority of the dataset. We understand a "normal approach" to be an observation whose feature values are bounded within the rest of the distribution. By applying this methodology, we only require labels during the pre-processing phase, as we need to obtain a dataset containing only approaches operated under normal conditions. After this "non-anomalous" dataset has been obtained, it can be used to train an unsupervised machine learning model, i.e. one that learns the latent representation of the data without the need for labels.

With enough data, deep learning algorithms assemble different layers to build a neural network model. In particular, AutoEncoders are a special type of neural network architecture that exploits the presented semi-supervised methodology [6]. To classify the abnormal flights, several approaches might be followed. For instance, the reconstruction error can be calculated by comparing the characteristics and patterns of the abnormal
techniques applied and problem context share similarities with this paper. However, this paper is focused on analysing FDM data, and the anomalies detected will often be safety related.

III. PROBLEM ASSESSMENT

In this paper, we present a diagnostic analytics problem, with the ultimate outcome of finding dependencies and identifying patterns that help to better define the causes of anomalies in the approach phase, or even to find new anomalies undetected by experts.

Normally, airline safety departments manually inspect all their flights individually, looking for concrete operations that might hide safety implications. However, this procedure can be quite tedious, since the majority of the observations usually behave normally. Machine learning can support this labour by learning how normal procedures are operated and automatically singling out the outliers to be further analysed by safety experts. Furthermore, this could reveal possible new safety concerns not previously detected by the airlines.

In order to narrow down the problem, we decided to focus on the approach and landing phases rather than the whole flight trace, since departure and arrival procedures are more likely to experience a safety incident. Additionally, we constrained the scope of the research to a particular procedure, selecting FDM approaches landing on runway 25R of LEBL airport. This aims to decrease the noise introduced into our scenario and to extract particularised conclusions that could be extrapolated to a wider number of airports or runways in the future.

The research process has been carried out by performing a descriptive and a predictive analysis. The descriptive analysis focuses on inspecting the data and creating clusters of flights that landed under similar circumstances. Moreover, we investigate rare-event detection and examine specific flights that differ from the rest of the distribution, analysing the causes, which might range from external influence factors such as weather or traffic congestion, to wrongly calibrated or broken sensors, and even issues during the FDM decoding.

The predictive analysis will take the set of outliers identified in the descriptive analysis and will attempt to automatically classify these abnormal approaches, training an AutoEncoder and measuring the loss obtained as output. A well-trained AutoEncoder should be able to reconstruct normal approaches correctly, as they will have similar patterns following the same distribution. The reconstruction error will be small for these cases; nevertheless, it will increase if we introduce a rare event as input. By catching these high errors and establishing an empirical threshold, we can perform a binary classification of normal and abnormal flights, as sketched below.
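In essence, the classification step reduces to comparing each flight's reconstruction error against an empirically chosen cut-off. A minimal sketch of this idea, assuming the per-flight errors have already been computed by a trained model; both the variable names and the use of a 0.95 quantile as cut-off are illustrative, not values prescribed by this work:

```python
import numpy as np

# Hypothetical reconstruction errors, one value per approach,
# as produced by a trained AutoEncoder (see Section V-C).
reconstruction_errors = np.array([0.011, 0.009, 0.013, 0.087, 0.010])

# Empirical threshold, here a high quantile of the observed error distribution.
threshold = np.quantile(reconstruction_errors, 0.95)

# Binary classification: True means the approach is flagged as abnormal.
is_abnormal = reconstruction_errors > threshold
print(is_abnormal)
```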
IV. DATA

The main data source used to perform the analyses has been FDM data. However, other data sources, such as METAR for weather information during the approach and the final flight plans, have been used to provide an external context to the dynamic information captured by the aircraft sensors (e.g. ETA, ATOT, origin airport, STAR, etc.).

The dataset is composed of 35,000 approach operations to LEBL runway 25R. This means that our dataset contains both successful and failed (e.g. go-arounds, touch-and-goes) landing attempts. FDM data is known for presenting a very high dimensionality, with more than 150 different variables stored as time series at a resolution of up to 8 samples per second. Given that iterative clustering techniques tend to be memory intensive, we decided to reduce the number of features by down-sampling the time series. The new sampling rate was selected to ensure one observation every 0.5 NM between the touchdown (TD) point and 12 NM from TD. In this way, we have a complete overview of the approach context, with one point every 30 seconds, from the beginning of the landing procedure to the touchdown on the runway.

A scalable infrastructure is mandatory to run the complete pipeline described in Figure 2. We have used the DataBeacon platform [18], which is based on the Amazon Web Services stack, to process the 35,000 flights. The cluster used was an EMR cluster composed of 1 master node and 5 workers, deployed on m5.4xlarge instances with 16 vCPUs and 64 GiB of memory each.

V. METHODOLOGY

A. Feature engineering

The feature engineering process, for both the descriptive and the predictive analytics, has been computed in a single multi-step data preparation pipeline, including data cleaning, de-identification of sensitive information, merging of the three datasets (FDM, METAR and flight plans) and time-point extraction (12 NM to TD, every 0.5 NM). The complete pipeline executed from decoded FDM files to the input dataset is described in Figure 2. Increasing the granularity of the sampling rate would lead to less loss of information, but more computational capacity would be required to train the models, so we finally sampled at 0.5 NM, which provides a good trade-off between throughput and granularity of the data.

Figure 2: Data preparation pipeline for FDM data. G/A means go-around; if it happens, a new approach attempt is created.

The features have been grouped into several categories, which are summarized in Table I.
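Below is a minimal sketch of the distance-based time-point extraction step, assuming each decoded flight is available as a pandas DataFrame with a dist_to_td column holding the remaining distance to touchdown in NM (the column and function names are illustrative, not those of the actual pipeline):

```python
import numpy as np
import pandas as pd

def extract_time_points(flight: pd.DataFrame, start_nm: float = 12.0,
                        step_nm: float = 0.5) -> pd.DataFrame:
    """Down-sample one approach to fixed distance-to-touchdown gates.

    One row is kept per 0.5 NM gate between 12 NM and the touchdown point,
    i.e. 25 rows per approach attempt.
    """
    gates = np.arange(start_nm, -step_nm / 2, -step_nm)  # 12.0, 11.5, ..., 0.0
    rows = []
    for gate in gates:
        # Keep the recorded sample closest to the current distance gate.
        idx = (flight["dist_to_td"] - gate).abs().idxmin()
        rows.append(flight.loc[idx])
    sampled = pd.DataFrame(rows).reset_index(drop=True)
    sampled["gate_nm"] = gates
    return sampled
```

Flattening the 25 sampled rows of every attempt into a single vector is what produces the per-approach feature vectors used later; for instance, 33 variables retained per gate would yield the 825 features per attempt reported below, although the exact per-gate variable count is not stated here.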
As a final remark, each landing attempt/approach is dealt with independently. The destination airport for subsequent attempts
might be different, so information about the airport (e.g. the runway exit) cannot be propagated across the whole flight. The same applies to the weather reports taken from METAR, which can refresh between multiple attempts. After running the pipeline we obtained a total of 825 features per approach attempt to feed our models.

B. Descriptive analysis: Clustering

In this section we explore our input dataset, looking at the distribution of the variables and at potential correlations between features. The dimensions of the input dataset are 35,000 samples and 825 features. It is composed of multiple landing attempts with static and dynamic features. As can be noticed, our data has a high number of columns (825), which complicates data visualisation. In order to deal with this problem, dimensionality reduction techniques were applied to the dataset to better represent the features. This data transformation pipeline can be considered a sophisticated feature engineering process, i.e. its output will be the input of the clustering algorithm.

The selected dimensionality reduction algorithm is the t-distributed Stochastic Neighbor Embedding (tSNE) technique [19]. tSNE is a probabilistic algorithm that minimizes the divergence between the pairwise similarities of the input objects and those of their corresponding low-dimensional representations. In other words, it inspects the statistical properties of the input and manages to represent the data using fewer dimensions by matching both distributions as closely as possible. It has been proved to be well suited for the visualisation of high-dimensional datasets [19]. The memory and computational requirements needed to run this algorithm are quite high, since tSNE scales quadratically in the number of objects contained in the input. Therefore, learning the 2D representation of the data is very slow for datasets larger than a few thousand input observations.

In Figure 3 the 2D representation of the data is visualized. By only looking at the picture, it is easy to visually recognize groups of flights that are very close to each other. On the other hand, it can be noticed how some points are completely isolated from the rest of the distribution, or sit in the middle between two clusters. These points can be considered outliers of the given distribution. Nevertheless, even though clusters are recognizable at a glance, we applied a clustering algorithm to assign every sample to a category.

HDBSCAN is a clustering algorithm developed by Campello, Moulavi, and Sander [20]. It extends DBSCAN by converting it into a hierarchical clustering algorithm and then using a technique to extract a flat clustering based on the stability of the clusters. The result of applying the HDBSCAN algorithm to the tSNE representation of the distribution is shown in Figure 4, where it can be observed that the model is able to determine 9 different clusters.
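This descriptive step can be sketched with scikit-learn's TSNE and the hdbscan package; the hyperparameter values below are illustrative, since the settings actually used are not reported here:

```python
import numpy as np
from sklearn.manifold import TSNE
import hdbscan

# X stands in for the (n_attempts, 825) matrix of flattened approach features;
# a random subsample is used here because tSNE scales quadratically.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 825))

# 2D embedding used both for visualisation and as input to the clustering.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(X)

# Density-based clustering on the embedding; samples that do not belong to
# any stable cluster receive the label -1 (noise).
clusterer = hdbscan.HDBSCAN(min_cluster_size=50)
labels = clusterer.fit_predict(embedding)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```

Besides the cluster labels, the fitted HDBSCAN object also exposes per-sample GLOSH outlier scores through its outlier_scores_ attribute, which is the feature discussed next.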
The HDBSCAN algorithm can also flag certain points as noise, instead of assigning them to a specific cluster. Most of these noisy samples are located between two different groups, or isolated away from any identified cluster. In addition to this feature, HDBSCAN supports the GLOSH outlier detection
can forensic analysis of these results produce useful insights. The evolution of the metrics over time is represented in Figure 7.

2) Outlier B - Late flaps deployment: This outlier has three metrics that stand out from the norm. The first one is the autopilot (AP) indicator. During this flight the AP is disengaged well before the final phases of the approach (as far out as 12 NM / 4000 feet). This is interesting because, although it does not represent a hazard, as pilots can disengage the autopilot when safety can be assured, it is not the normal procedure, and most pilots keep the AP engaged until about 1500 feet or 4 NM from the threshold.

The metric causing the anomaly is the flaps position. In this particular flight, the flaps are correctly deployed at about 100 feet, but they are not fully deployed. Similar to the previous metric, this by itself does not represent a hazard, but it is rare, as fully deployed flaps help control the aircraft at low speed and produce drag, helping the aircraft slow down to an adequate landing speed. Finally, during the final approach phase the flight shows two pronounced increases in the descent rate. Neither of the two increases surpasses the 1000 feet-per-minute threshold, but two sudden increases (<150%) in the descent rate could be a symptom of an unstable approach. These two increases could be caused by the flaps not being fully deployed: the aircraft speed decreases and there is not enough lift generated by the wings.

In contrast with Outlier A, this one presents no especially hazardous situation, but it had some metrics and performance figures that were out of the ordinary. This points to another possible benefit of these types of algorithms: they can help detect flights that, without surpassing any defined threshold, can be labelled as anomalous, and thus help detect unknown hazards or risky behaviours at an unprecedented level. The evolution of the metrics over time is represented in Figure 8.

C. Predictive analysis: AutoEncoder

The starting point for the predictive analysis are the outlier scores calculated in the descriptive analysis. These scores will be used to discern between abnormal and normal approaches. The AutoEncoder will be trained using only normal approaches, to learn the behaviour of a non-anomalous approach. Once trained, the AutoEncoder compares each input flight with the learnt approach behaviour, outputting a reconstruction error score.

The objectives for the predictive analysis are:
• Design and implement an Artificial Neural Network (ANN) using an AutoEncoder architecture, composed of multiple LSTM layers to deal with the temporal series.
• Train and test the AutoEncoder, so that it is tuned to recreate normal samples minimizing the loss while, on the other hand, returning a high reconstruction error when an outlier is processed.

As mentioned before, we will design an AutoEncoder based on LSTM layers. The LSTM is a type of recurrent neural network (RNN), very useful for extracting patterns from sequential or time-series data. These kinds of models are capable of automatically extracting features from past events, and LSTMs are specifically known for their ability to capture both long- and short-term dependencies. The LSTM is somewhat more demanding than other models in terms of data preparation. The input to an LSTM model is a 3D array with shape (n_samples, n_timesteps, n_features). Here, n_samples refers to the number of observations fed into the LSTM AutoEncoder, n_timesteps (or look-back) describes the time window (past data) needed by the LSTM, and n_features represents the number of columns selected as potential features for the scenario.
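For example, the flattened per-approach vectors can be rearranged into this 3D layout as follows. The split of the 825 columns into 25 distance gates with 33 variables per gate is an assumption made purely for illustration, not a decomposition stated in this work:

```python
import numpy as np

n_samples, n_timesteps, n_features = 1_000, 25, 33  # 25 * 33 = 825 columns

# Flat matrix as produced by the feature engineering pipeline (toy data here).
X_flat = np.random.default_rng(0).normal(size=(n_samples, n_timesteps * n_features))

# 3D layout expected by recurrent layers: (samples, timesteps, features).
X_seq = X_flat.reshape(n_samples, n_timesteps, n_features)
print(X_seq.shape)  # (1000, 25, 33)
```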
As detailed in Figure 1, AutoEncoders are composed of three main elements: the encoder, the code and the decoder. We have designed an AutoEncoder with 2 layers for the encoding part, one single layer for the code and another 2 layers for the decoder. Therefore, the neural network is composed of a total of 5 LSTM layers. For the activation functions we used the ReLU function, due to its popularity and because it helps avoid the vanishing gradient problem. The AutoEncoder receives a complete landing attempt as input in a (1, 825) vector format, encodes it and decodes it, finally returning a reconstruction that minimizes the loss.

Figure 9: ANN layers for the LSTM AutoEncoder, showing the encoder, code and decoder layer sizes.

To train the AutoEncoder, we followed these steps (a sketch is given after the list):
• Divide the dataset into two parts, negatively labeled (normal approaches) and positively labeled (outlier approaches).
• Ignore the anomalies and train the AutoEncoder only with negatively labeled data. Afterwards, test the model using positively labeled data, to assess whether the model filters anomalies properly.
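A minimal Keras sketch of such a 5-layer LSTM AutoEncoder is shown below. The layer widths, optimizer settings and number of epochs are illustrative choices (Figure 9 is not reproduced here), the RepeatVector and TimeDistributed layers are standard plumbing added to make the sketch runnable, and X_normal stands for the 3D tensor of normal approaches built as in the reshaping example above:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_timesteps, n_features = 25, 33  # assumed decomposition of the 825 columns

model = keras.Sequential([
    # Encoder: two LSTM layers compressing the input sequence.
    layers.LSTM(64, activation="relu", return_sequences=True,
                input_shape=(n_timesteps, n_features)),
    layers.LSTM(32, activation="relu", return_sequences=True),
    # Code: a single LSTM layer producing the latent vector.
    layers.LSTM(16, activation="relu", return_sequences=False),
    layers.RepeatVector(n_timesteps),
    # Decoder: two LSTM layers expanding back towards the original shape.
    layers.LSTM(32, activation="relu", return_sequences=True),
    layers.LSTM(64, activation="relu", return_sequences=True),
    layers.TimeDistributed(layers.Dense(n_features)),
])
model.compile(optimizer="adam", loss="mse")

# Train only on normal approaches: input and target are the same tensor.
X_normal = np.random.default_rng(0).normal(size=(1_000, n_timesteps, n_features))
model.fit(X_normal, X_normal, epochs=10, batch_size=64, validation_split=0.1)

# Per-flight reconstruction error for any set of approaches (normal or not).
X_test = np.random.default_rng(1).normal(size=(100, n_timesteps, n_features))
errors = np.mean((model.predict(X_test) - X_test) ** 2, axis=(1, 2))
```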
Figure 10: Histogram of the outlier scores obtained from the GLOSH algorithm. The red line represents the threshold at the 95% quantile of the distribution; points above the threshold are extreme outliers.

threshold. Furthermore, if we filter out the low-to-medium severity anomalies and only consider the most severe ones, filtering on the upper 95% quantile, the AutoEncoder perfectly distinguishes normal from abnormal observations. This is presented in Figure 12.

Figure 13: Confusion matrix for normal and abnormal (all severities) flights.

The confusion matrix (Figure 13) further exemplifies the main properties of the predictive model. The algorithm is very efficient at detecting very anomalous flights, but fails to set a distinguishable boundary between a normal flight and a flight presenting a low-severity hazard.
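This evaluation step can be sketched as follows, assuming the per-flight reconstruction errors from the AutoEncoder and the binary labels derived from the descriptive analysis are available; the synthetic data and the quantile-based threshold are illustrative:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Stand-ins for the per-flight reconstruction errors and the labels obtained
# from the descriptive analysis (1 = flagged as an outlier, 0 = normal).
errors = np.concatenate([rng.normal(0.01, 0.002, 950), rng.normal(0.05, 0.01, 50)])
y_true = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

# Empirical threshold, e.g. a high quantile of the errors of normal flights.
threshold = np.quantile(errors[y_true == 0], 0.95)
y_pred = (errors > threshold).astype(int)

# Rows: true class, columns: predicted class.
print(confusion_matrix(y_true, y_pred))
```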
VII. DISCUSSION

Given the promising results obtained for the 95th-percentile anomalies, the algorithm provides an extremely useful tool for FDM and safety analysts. One possible implementation could be an automatic FDM data labelling system. At the end of a day of operations, when the FDM data is retrieved by the safety department, the system could flag anomalous flights for inspection. This can enable a tool for analysing and flagging large amounts of FDM data, not just for forensic analysis but also for aircraft maintenance, enabling interoperability with existing predictive maintenance tools.

Another implementation of the methodology could be a real-time monitoring tool, able to "quantify the risk" of the approach being performed. This means that, by using the trained AutoEncoder, the tool could give an outlier score to a given flight. This information could then be used to warn airport staff, ATCOs, the airline or the flight crew.

Overall, the semi-supervised methodology has been successful. By combining two very powerful machine learning algorithms (HDBSCAN clustering and AutoEncoders), we were able to solve a very complex problem from both the data science and the aviation safety perspectives. The main benefit of using a semi-supervised architecture is not only to automate and speed up the detection of known events (e.g. late flaps deployment) but also to support the detection of unknown hazards and rare events.

The first part of the research has proven that, when using adequate dimensionality reduction techniques, the automatic filtering of outliers is feasible. The second part of the research proved once again the efficiency of AutoEncoder architectures when working with time series in aviation.

ACKNOWLEDGMENT

We want to thank the whole Safeclouds.eu consortium for providing the data sources and operational knowledge necessary to perform this research, with special mentions to TUM's Institute of Flight System Dynamics for providing the algorithm for decoding and preparing the FDM data, and to the Iberia Airlines safety department for providing the operational expertise necessary to validate the results.

REFERENCES

[1] R. Fernandes. An analysis of the potential benefits to airlines of flight data monitoring programs (MSc thesis). 2002.
[2] Sameer Jasra, Jason Gauci, Alan Muscat, Gianluca Valentino, David Zammit-Mangion, and Robert Camilleri. Literature review of machine learning techniques to analyse flight data. 2018.
[3] Vijay Manikandan Janakiraman. Explaining aviation safety incidents using deep temporal multiple instance learning, 2017.
[4] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41, 07 2009.
[5] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-supervised learning. 2006.
[6] Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Isabelle Guyon, Gideon Dror, Vincent Lemaire, Graham Taylor, and Daniel Silver, editors, Proceedings of ICML Workshop on Unsupervised and Transfer Learning, volume 27 of Proceedings of Machine Learning Research, pages 37–49, Bellevue, Washington, USA, 02 Jul 2012. PMLR.
[7] Marius Kloft, Ulf Brefeld, Patrick Düssel, Christian Gehl, and Pavel Laskov. Automatic feature selection for anomaly detection. pages 71–76, 01 2008.
[8] Sergios Theodoridis and Konstantinos Koutroumbas. Pattern Recognition, Fourth Edition. Academic Press, Inc., Orlando, FL, USA, 4th edition, 2008.
[9] Bryan Matthews, Santanu Das, Kanishka Bhaduri, Kamalika Das, Rodney Martin, and Nikunj Oza. Discovering anomalous aviation safety events using scalable data mining algorithms. Journal of Aerospace Information Systems, 11:482–482, 07 2014.
[10] Lishuan Li and R. John Hansman. Anomaly detection in airline routine operations using flight data recorder data. Report No. ICAT-2013-4, MIT International Center for Air Transportation (ICAT), 2013.
[11] John Bro. FDM machine learning: An investigation into the utility of neural networks as a predictive analytic tool for go around decision making. Journal of Applied Sciences and Arts: Vol. 1, Iss. 3, Article 3, 2017.
[12] Santanu Das, Bryan Matthews, Ashok Srivastava, and Nikunj Oza. Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study. pages 47–56, 07 2010.
[13] Vijay Manikandan Janakiraman and David Nielsen. Anomaly detection in aviation data using extreme learning machines. pages 1993–2000, 07 2016.
[14] J. Castle, J. Stutz, and D. McIntosh. Automatic discovery of anomalies reported in aerospace systems health and safety documents. 05 2007.
[15] Wei-Han Lee, Jorge Ortiz, Bongjun Ko, and Ruby Lee. Time series segmentation through automatic feature learning, 2018.
[16] Thomas Dubot. Predicting sector configuration transitions with autoencoder-based anomaly detection. 06 2018.
[17] Xavier Olive, Jeremy Grignard, Thomas Dubot, and Julie Saint-Lot. Detecting controllers' actions in past Mode S data by autoencoder-based anomaly detection. 11 2018.
[18] DataBeacon. Remain in control of your data while optimizing your operations through artificial intelligence - databeacon.aero, 2019.
[19] Laurens van der Maaten and Geoffrey E. Hinton. Visualizing data using t-SNE. 2008.
[20] Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. Density-based clustering based on hierarchical density estimates. In Jian Pei, Vincent S. Tseng, Longbing Cao, Hiroshi Motoda, and Guandong Xu, editors, Advances in Knowledge Discovery and Data Mining, pages 160–172, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.