


Flight Data Monitoring (FDM) Unknown Hazards Detection during Approach Phase using Clustering Techniques and AutoEncoders

Antonio Fernández, Darío Martínez, Pablo Hernández and Samuel Cristóbal
Innaxis Research Institute, Madrid, Spain
Email: {af,dm,ph,sc}@innaxis.org

Florian Schwaiger
Institute of Flight System Dynamics, Technische Universität München, Germany
Email: [email protected]

José María Nuñez and José Manuel Ruiz
Iberia airlines, Madrid, Spain
Email: {jmnunez,jmruizn}@iberia.es

Abstract—Airlines' safety departments analyse aircraft data recorded on-board (FDM) to inspect safety occurrences. This activity relies on human experts to create a rule-based system that detects known safety issues, based on whether a small set of parameters exceeds some predefined set of thresholds. However, rare events are the hardest to detect manually, as their patterns are not often recognised at a glance. Experts agree that both approach and take-off procedures are more prone to experience a safety incident. In this paper we perform descriptive and predictive analyses to detect anomalies during the approach phase for runway 25R at LEBL airport. From a descriptive point of view, clustering techniques help to find patterns and correlations within the data, and also to identify clusters of similar observations. Moreover, these clusters might reveal certain points as rare events that are isolated from the rest of the observations. Predictive analytics, and more concretely deep learning ANNs and AutoEncoders, can be used to detect these abnormal events. The methodology relies on learning what "normal" observations look like, since they are usually the majority of the cases. Afterwards, if we process an abnormal flight, the model will return a high reconstruction error because of the deviation from the training data. This shows how the predictive methodology could be applied as an extremely useful forensic tool for safety experts and FDM analysts.

Keywords—Anomaly detection, hazard identification, safety, clustering, deep learning, LSTM, AutoEncoders, HDBSCAN

I. INTRODUCTION

Flight Data Monitoring (FDM) is an activity carried out by airlines primarily for monitoring and improving the safety and operation of their aircraft [1]. The data recorded by the flight data recorder on-board an aircraft is downloaded and analysed through various tools and techniques, with the ultimate objective of using that analysis to improve civil aviation operations, establishing maintenance schedules, training pilots and modifying operational procedures, amongst others, without compromising safety. The benefits of analysing FDM data are often highlighted by the safety departments that analyse occurrences. This activity relies on human experts to create a rule-based system that detects known safety issues, based on whether a small set of parameters exceeds some predefined set of thresholds. This unveils the necessity of introducing predictive analysis methodologies to automate the detection of events [2]. In particular, rare events are the hardest to detect manually, as their patterns are not often recognised by an analyst. Furthermore, the identification of anomalous and rare flights is a challenging task in proactive safety management systems.

From the data science point of view, anomaly detection is a machine learning discipline that involves outlier detection, deviation detection, or novelty detection. Abnormal event detection and cause-effect analysis in aviation represent a challenging field that involves the mining of multi-dimensional heterogeneous time series data, the lack of time-step-wise annotation of events, and the challenge of using scalable mining tools to perform analysis over a large number of events [3]. Machine learning (ML) models that work on datasets under imbalanced constraints (such as a very low count of anomalies) are limited and may not be able to learn patterns that explain the anomalies correctly [4]. This means that the cause-effect analysis might be incomplete or not well supported. Also, the study might need to consider some interesting anomalies that are not "labelled" in the historical data but need to be investigated. In this context, we need to shift the approach to make the model less dependent on labelled data and consider a semi-supervised machine learning methodology [5].

Semi-supervised algorithms can be used to train a model to learn the behaviour of the "normal approaches", which compose the majority of the dataset. We understand as a "normal approach" those observations whose feature values are bounded within the rest of the distribution. By applying this methodology, we only require labels during the pre-processing phase, as we need to obtain a dataset containing only approaches operated under normal conditions. After this "non-anomalous" dataset has been obtained, it can be used to train an unsupervised machine learning model, i.e. without the need for labels, learning the latent representation of the data.

With enough data, deep learning algorithms assemble different layers to build a neural network model. In particular, AutoEncoders are a special type of neural network architecture that exploits the presented semi-supervised methodology [6].


To classify the abnormal flights, several approaches might be followed. For instance, the reconstruction error can be calculated by comparing the characteristics and patterns of the abnormal flight with the learned low-level representation of a normal flight and measuring the loss difference. Flights over an empirically set threshold will be considered "abnormal". However, one of the core features supported by AutoEncoders is the possibility of studying apparently normal flights classified as abnormal, leading to new types of safety hazards.

Figure 1: Anomaly detection using AutoEncoders, measuring the reconstruction error between input and output.

Using supervised methodologies (e.g. binary classification based on LSTMs) is expected to under-perform due to the imbalanced nature of the training data, i.e. the lack of anomalous approaches. Because of this, a semi-supervised architecture makes more sense.

For this study we have analysed a set of approaches landing at LEBL, more precisely on runway 25R. First, we perform a descriptive analysis mainly focused on applying clustering techniques to detect similarities between the observations, quantifying an "outlier score" per sample. Second, we perform a predictive analysis over the outlier samples, training an auto-encoder considering only those observations with a low outlier score, and predicting anomalous approaches by measuring the reconstruction error between input and output. The reconstruction error will be high for those flights that differ a lot from the regular procedure. The output of the analysis can be used to automatically flag anomalous flights to be later analysed by safety experts. The output of recommendations for the crew or the safety analysts is out of the scope of this research.

Therefore, the main objectives of the proposed research are two:

1) Descriptive analysis based on unsupervised clustering.
2) Predictive analysis based on a semi-supervised AutoEncoder neural network.

Section II presents a literature review of the state of the art of ML techniques applied to anomaly detection and flight data monitoring (FDM) analysis. Section III defines the problem, the context of the proposed scenario and the scope of the descriptive and predictive analyses. Section IV describes the studied dataset and then explains how to prepare the data to perform the clustering and train the AutoEncoder. In Section V, we present the methodology with a descriptive data analysis of some arrivals, including a detailed analysis of the context and patterns detected in the outliers; it also presents a predictive analysis to detect anomalous approaches. Section VI puts the results in perspective, showing how the predictive methodology could be applied as a forensic data analysis tool for safety experts. Section VII finally concludes with a discussion about the analysis performed and the results obtained.

II. STATE OF THE ART

Anomaly detection is one of the main topics in data science research, though many challenges still await solution. Machine learning (ML) provides a brand new toolkit for tackling aviation problems and, in particular, there has been an increasing trend of using ML to analyse flight data efficiently, as it offers the most important insights into the operations of an aircraft. The difficulties of applying predictive analytics to FDM data have been discussed in the past, remarking the significance of a reliable feature selection pipeline [7]. Furthermore, the process of automatically selecting attributes from a given dataset by the learning algorithm itself is a well-known capability of ML ensemble methods [8].

The capabilities of automatic feature selection algorithms have not reached their full potential for FDM data; only two major works have tried to tackle the issue [9] [10]. In a recent attempt, Bro (2017) [11] proposed an ANN-based methodology for predicting go-arounds, using more than 2,000 hours of FDM data from training flights. Since the work done in feature extraction for FDM is very limited, we believe that more recent deep learning methodologies such as AutoEncoders can release the full potential of FDM predictive analytics.

Regarding anomaly detection research in aviation, the NASA Ames Research Center applied Multiple Kernel Anomaly Detection (MKAD) for anomalous event detection within American terminal manoeuvring areas, combining both continuous and static features [12]. Due to the complexity of training a kernel-based machine learning algorithm, the study was only performed over a dataset of a few thousand arrivals. In recent work, they have improved the computational time by applying deep learning methods, specifically extreme learning machines based on AutoEncoders and embeddings [13]. The NASA Ames Research Center has also defined and experimented with methodologies for improving the efficiency of the investigation of anomalies [14]. They aim to distinguish between operationally significant anomalies and uninteresting statistical anomalies based on a weakly-supervised classifier. AutoEncoders have also already been used to find breakpoints in time series [15] and to predict realistic transitions in sector configurations [16].

Xavier Olive et al. [17] presented at the SIDs 2018 a very interesting approach with AutoEncoders, combining speech and ADS-B data to detect anomalies in controllers' actions, mainly focusing on deviations of the flight trajectory from the flight plan.


That work is of the highest relevance given that its techniques and problem context share similarities with ours. However, our paper focuses on analysing FDM data, and the anomalies detected will often be safety related.

III. PROBLEM ASSESSMENT

In this paper, we present a diagnostic analytics problem, with the ultimate outcome of finding dependencies and identifying patterns that help better define the causes of anomalies in the approach phase, or even finding new anomalies undetected by experts.

Normally, airlines' safety departments manually inspect all their flights individually, looking at concrete operations that might hide safety implications. However, this procedure can be quite tedious, since the majority of the observations usually behave normally. Machine learning might empower this labour by learning how normal procedures are operated, and automatically discriminating the outliers to be further analysed by safety experts. Furthermore, this could reveal new possible safety concerns not previously detected by the airlines.

In order to narrow down the problem, we decided to focus on the approach and landing phases rather than the whole flight trace, since departure and arrival procedures are more likely to experience a safety incident. Additionally, we constrained the scope of the research to a particular procedure, so we selected FDM approaches landing on runway 25R of LEBL airport. This aims to decrease the noise introduced into our scenario and to extract particularised conclusions that could be extrapolated to a wider number of airports or runways in the future.

The research process has been carried out by performing a descriptive and a predictive analysis. The descriptive analysis focuses on inspecting the data and creating clusters of flights that landed under similar circumstances. Moreover, we research rare event detection and examine specific flights that differ from the rest of the distribution, analysing the causes, which might range from external influence factors such as weather or traffic congestion, to wrongly calibrated or broken sensors, and even issues during the FDM decoding. The predictive analysis will take the set of outliers identified in the descriptive analysis and will attempt to automatically classify these abnormal approaches, training an AutoEncoder and measuring the loss obtained as output. A well-trained AutoEncoder should be able to correctly predict normal approaches, as they will have similar patterns following the same distribution. The reconstruction error will be small for these cases; nevertheless, it will increase if we introduce a rare event as an input. By catching these high errors and establishing an empirical threshold, we can perform a binary classification of normal and abnormal flights.

IV. DATA

The main data source used to perform the analyses has been FDM data. However, other data sources, such as METAR for weather information during the approach and the final flight plans, have been used to provide an external context to the dynamic information captured by aircraft sensors (e.g. ETA, ATOT, origin aircraft, STAR, etc.).

The dataset is composed of 35,000 approach operations at LEBL 25R. This means that our dataset contains successful and failed (e.g. go-arounds, touch-and-go, ...) landing attempts. FDM data is known for presenting a very high variable dimensionality, with more than 150 different variables stored as time series at a resolution of up to 8 samples per second. Given that iterative clustering techniques tend to be memory intensive, we decided to reduce the number of features by down-sampling the temporal series. The new sampling rate was selected by ensuring an observation every 0.5 NM between the touchdown (TD) point and 12 NM from TD. In this way, we have a complete overview of the approach context with one point every 30 seconds, from the beginning of the landing procedure to the touchdown on the runway.

A scalable infrastructure is mandatory to run the complete pipeline described in Figure 2. We have used the DataBeacon [18] platform to process the 35,000 flights, which is based on the Amazon Web Services stack. The cluster used was an EMR cluster composed of 1 master node and 5 workers, deployed on m5.4xlarge instances with 16 vCPUs and 64 GiB of memory.

V. METHODOLOGY

A. Feature engineering

The feature engineering process, both for the descriptive and the predictive analytics, has been computed in a single multi-step data preparation pipeline including data cleaning, de-identification of sensitive information, data merging of the three datasets (FDM, METAR and flight plans) and time-point extraction (12 NM to TD every 0.5 NM). The complete pipeline executed from decoded FDM files to extract the input dataset is described in Figure 2. Increasing the granularity of the sampling rate would lead to less loss of information, but more computational capability would be required to train the models, so we finally sampled every 0.5 NM, which provides a good trade-off between throughput and granularity of the data. The features have been grouped into several categories, which are summarized in Table I.

Figure 2: Data preparation pipeline for FDM data. G/A means go-around; if it happens, then a new approach attempt is created.
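For illustration only, the merging step of the pipeline in Figure 2 could be sketched as below: the closest METAR report issued before each approach is attached with a time-based join, and static flight-plan information is attached by callsign and day of operation. The file names, column names and join keys are assumptions made for the example, not the project's actual pipeline code.

import pandas as pd

# Hypothetical inputs: one row per approach attempt (FDM features already extracted),
# METAR reports for LEBL, and the final flight plans. Column names are illustrative.
fdm = pd.read_parquet("fdm_approaches_25R.parquet")              # includes 'approach_start' (UTC), 'callsign'
metar = pd.read_csv("metar_lebl.csv", parse_dates=["report_time"])
fpl = pd.read_csv("flight_plans.csv", parse_dates=["eobt"])       # includes 'callsign'

# Attach the most recent METAR report issued before each approach start.
fdm = fdm.sort_values("approach_start")
metar = metar.sort_values("report_time")
merged = pd.merge_asof(
    fdm, metar,
    left_on="approach_start", right_on="report_time",
    direction="backward",
)

# Attach static flight-plan information by callsign and day of operation.
merged["op_date"] = merged["approach_start"].dt.date
fpl["op_date"] = fpl["eobt"].dt.date
dataset = merged.merge(fpl, on=["callsign", "op_date"], how="left")

Because each landing attempt is treated as an independent observation (see below), the weather join is performed per attempt rather than per flight.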


TABLE I: Datasets and features

Group                     | Features                                                                                                                        | Data Sources
Operation dynamics        | Pitch, roll and heading positions and rates. Angle of attack. Vertical descent rate. Barometric altitude. Radio altitude.      | FDM
Aircraft energy           | Air speed. Ground speed. Energy level. Aircraft mass.                                                                          | FDM
Adverse weather           | Static pressure. Static temperature. Relative humidity. Dew point. Air density. Wind direction. Wind speed. Wind variation. Ground visibility. 1st cloud layer height and opacity. | METAR (LEBL), FDM (aircraft sensors)
Aircraft configuration    | Flaps orientation.                                                                                                             | FDM
Crew coordination         | Autopilot status. Pilot flying.                                                                                                | FDM
Pilot awareness           | Current time. Distance travelled. Total time flown.                                                                            | FDM
Flight static information | UTC time. Origin airport. Destination airport. Call sign. Aircraft type. Tail number. Year. Week. Runway occupancy time. Time at threshold. STAR. SID. Time exit. Runway. Runway exit. | Flight Plan, FDM
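The time-point extraction that produces the per-attempt input rows (one observation every 0.5 NM from 12 NM to TD) could be implemented roughly as in the following sketch. The column names and the use of linear interpolation are assumptions made for the example; the actual decoding and preparation code was provided by TUM's Institute of Flight System Dynamics.

import numpy as np
import pandas as pd

# Fixed grid: one point every 0.5 NM, from the touchdown point out to 12 NM (25 points).
GRID_NM = np.arange(0.0, 12.0 + 0.25, 0.5)

def approach_to_feature_vector(df, variables):
    """Interpolate selected FDM variables of one approach attempt onto the
    distance-to-touchdown grid and flatten them into a single row.

    `df` is assumed to hold one attempt, with a 'dist_to_td_nm' column plus
    one column per recorded variable (names are illustrative).
    """
    ordered = df.sort_values("dist_to_td_nm")            # np.interp needs increasing x values
    samples = [
        np.interp(GRID_NM, ordered["dist_to_td_nm"], ordered[var])
        for var in variables
    ]
    # len(variables) blocks of 25 values each, concatenated into one feature vector.
    return np.concatenate(samples)

# Example: vec = approach_to_feature_vector(one_attempt_df, ["cas_kt", "gs_kt", "vs_fpm"])

Concatenating the 25 grid points of every dynamic variable with the static features yields the per-attempt rows of the input dataset described below.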

As a final remark, each landing attempt/approach is treated independently. The destination airport for subsequent attempts might be different, so information about the airport (e.g. runway exit) cannot be propagated across the whole flight. The same applies to the weather reports taken from METAR, which can refresh between multiple attempts. After running the pipeline we obtained a total of 825 features per approach attempt to feed our models.

B. Descriptive analysis: Clustering

In this section we explore our input dataset, looking at the variables' distributions and potential correlations between features. The dimensions of the input dataset are 35,000 samples and 825 features. It is composed of multiple landing attempts with static and dynamic features. As can be noticed, our data has a high number of columns (825), which complicates data visualisation. In order to deal with this problem, dimensionality reduction techniques were applied to the dataset to better represent the features. This data transformation pipeline can be considered a sophisticated feature engineering process, i.e. its output will be the input of the clustering algorithm.

The selected dimensionality reduction algorithm is the t-distributed Stochastic Neighbor Embedding (tSNE) technique [19]. tSNE is a probabilistic algorithm that minimizes the divergence between the pairwise similarities of the input objects and those of their corresponding low-dimensional representations. In other words, it inspects the input's statistical properties and manages to represent the data using fewer dimensions by matching both distributions in the best way. It has been proved to be well suited for the visualisation of high-dimensional datasets [19]. The memory and computational requirements needed to run this algorithm are quite high, since tSNE scales quadratically in the number of objects contained in the input. Therefore, learning the 2D representation of the data is very slow for datasets larger than a few thousand input observations.

In Figure 3 the 2D representation of the data is visualized. By only looking at the picture, it is easy to visually recognize groups of flights that are very close to each other. On the other hand, it can be noticed how some points are completely isolated from the rest of the distribution, or lie in the middle between two clusters. These points can be considered outliers for the given distribution. Nevertheless, even if clusters are recognizable at a glance, we ran a clustering algorithm to group all the samples into categories regardless.

Figure 3: Two-dimensional representation of the input dataset using the tSNE probabilistic algorithm.

HDBSCAN is a clustering algorithm developed by Campello, Moulavi, and Sander [20]. It extends DBSCAN by converting it into a hierarchical clustering algorithm, and then using a technique to extract a flat clustering based on the stability of clusters. The results after applying the HDBSCAN algorithm to the tSNE representation of the distribution are shown in Figure 4, where it can be observed how the model is able to determine 9 different clusters.

Figure 4: Clustering using the HDBSCAN algorithm over the tSNE representation.

The HDBSCAN algorithm can flag certain points as noise, avoiding assigning these samples to a specific cluster. Most of these noisy samples are placed between two different groups, or isolated away from any identified cluster. In addition to this feature, HDBSCAN supports the GLOSH outlier detection algorithm and allows combining it with the clustering output.
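As a rough illustration of this descriptive stage, the sketch below chains a tSNE projection, HDBSCAN clustering and the GLOSH outlier scores exposed by the hdbscan Python package. The parameter values (perplexity, min_cluster_size), file name and threshold handling are placeholders chosen for the example, not the settings used in the study.

import numpy as np
from sklearn.manifold import TSNE
import hdbscan

# X: input matrix of shape (n_flights, 825), one row per approach attempt.
X = np.load("approaches_25R.npy")          # hypothetical file name

# 1) Non-linear projection to 2D for visualisation and clustering.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# 2) Hierarchical density-based clustering on the 2D embedding.
clusterer = hdbscan.HDBSCAN(min_cluster_size=50)
labels = clusterer.fit_predict(embedding)   # -1 marks points flagged as noise

# 3) GLOSH outlier scores come with the fitted clusterer (0 ~ normal, close to 1 ~ outlier).
scores = clusterer.outlier_scores_

# Extreme outliers: samples above the 95% quantile of the score distribution.
threshold = np.quantile(scores, 0.95)
extreme_outliers = np.where(scores > threshold)[0]
print(f"{len(extreme_outliers)} flights above the 95% quantile threshold")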


The GLOSH outlier detection algorithm is also related to outlier detection algorithms such as Local Outlier Factor (LOF), where anomalies are detected by measuring the local deviation of an observation with respect to its neighbours. It is a fast and flexible outlier detection system that implements the detection of "local outliers". Local outlier detection implies that the algorithm can detect outliers within a local region or cluster that do not need to be global outliers.

The algorithm outputs an outlier score for each sample of the dataset. Then a threshold is set for classifying outliers, 0 being a normal point and 1 an outlier sample. By calculating the 95% quantile, we can extract those points that present the highest outlier scores. The score distribution obtained after running the GLOSH outlier detection algorithm over the HDBSCAN clusters is shown in Figure 5. In the figure, the dotted red line represents the threshold for the most extreme outliers. After plotting the detected extreme points in our tSNE distribution, the location of the anomalous points can be observed in Figure 6.

Figure 5: Two-dimensional representation of the outlier scores obtained from the GLOSH algorithm in the HDBSCAN clusters. The red dots are extreme outliers above the 95% quantile.

Figure 6: Histogram of outlier scores obtained from the GLOSH algorithm. The threshold is set at the 95% quantile of the distribution. All points above the threshold are extreme outliers.

We have detected a total of 1,750 outliers that exceed the 95% quantile of the distribution. After exploring some of these flights in depth, we found several flights that landed in abnormal conditions, and even some with sensors that were wrongly calibrated or that experienced issues during the decoding process. To show different types of detected anomalies, we selected two examples that were tagged as outliers by the algorithm. In order to better understand the causes, we will analyse the feature evolution from an FDM analyst's point of view. This type of analysis helps us to understand which flights are considered abnormal and why, just by inspecting the FDM time-series values.

1) Outlier A - High ground speed justified by a strong tailwind: In Outlier A, upon further inspection, we can find two metrics that stand out as atypical. The first one is the ground speed. This flight presents a significantly high ground speed, around 150 knots, during the final stages of the approach (0 NM-2 NM from the TD). In contrast, the airspeed (CAS) is not unusually high, approximately 140 knots, which is inside what could be considered normal operation. The second metric that stands out in this flight is the rate of descent. The flight presents a sharp increase in the descent rate at about 2 NM to 4 NM from the threshold. At this distance the aircraft is at approximately 1000 feet AGL (3-degree glide path), which is usually the threshold by which an aircraft needs to be stabilized before continuing the approach. In most common Unstable Approach (UA) criteria, a descent rate above 1000 feet per minute is considered an upper limit. The flight here reaches 1150 feet per minute, well above this threshold. Taking into account both metrics, we can try to make assumptions about the causes of these deviations.

The best suited explanation is that the aircraft was suffering heavy tailwinds. The recorded wind speed at the 2 NM-4 NM distance was about 11 knots. A tailwind of such force could explain why the aircraft had an unusually high ground speed while the airspeed was not especially high (the wind and aircraft trajectory are in the same direction). This hypothetical tailwind would also prompt an increase in the descent rate as the ground speed is increased. This exemplifies how the algorithm is able to detect anomalous flights caused by hazardous situations (unstable approach) and how a safety officer can forensically analyse these results and produce useful insights. The evolution of the metrics over the approach is represented in Figure 7.

Figure 7: Outlier A features compared with the rest of the distribution percentiles: (a) air speed (m/s), (b) ground speed (m/s), (c) wind direction (rad), (d) vertical descent (m/s). The horizontal axis represents the distance, from the TD point to 12 NM from TD.
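A comparison like the one in Figure 7 can be reproduced with a few lines of code: compute fleet-wide percentile bands at each distance point and overlay the flagged flight. The sketch below assumes a matrix of ground-speed profiles already sampled on the 0.5 NM grid; the array names, the flight index and the percentile choices are illustrative, not taken from the paper.

import numpy as np
import matplotlib.pyplot as plt

# gs: ground-speed profiles of all flights, shape (n_flights, 25),
# one column per 0.5 NM step from 12 NM down to touchdown.
gs = np.load("ground_speed_grid.npy")        # hypothetical file name
outlier_profile = gs[1234]                    # the flight flagged by GLOSH (example index)

grid_nm = np.arange(12.0, -0.001, -0.5)       # distance-to-touchdown axis

# Percentile bands of the whole distribution at each distance point.
p05, p50, p95 = np.percentile(gs, [5, 50, 95], axis=0)

plt.fill_between(grid_nm, p05, p95, alpha=0.3, label="5th-95th percentile")
plt.plot(grid_nm, p50, label="median")
plt.plot(grid_nm, outlier_profile, color="red", label="outlier flight")
plt.gca().invert_xaxis()                      # read from 12 NM towards touchdown
plt.xlabel("Distance to touchdown (NM)")
plt.ylabel("Ground speed")
plt.legend()
plt.show()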


2) Outlier B - Late flaps deployment: This outlier has three metrics that stand out from the norm. The first one is the autopilot (AP) indicator. During this flight the AP is disengaged well before the final phases of the approach (as far as 12 NM / 4000 feet). This is interesting because, although it does not represent a hazard, as the pilot can disengage the autopilot when safety can be assured, it is not the normal procedure, and most pilots keep the AP engaged until about 1500 feet or 4 NM from the threshold.

The metric causing the anomaly is the flaps position. In this particular flight, flaps are correctly deployed at about 100 feet, but they are not fully deployed. Similar to the previous metric, this by itself does not represent a hazard, but it is rare, as fully deployed flaps help control the aircraft at low speed and produce drag, helping the aircraft slow down to an adequate landing speed. Finally, during the final approach phase the flight shows two pronounced increases in the descent rate. Neither of the two increases surpasses the 1000 feet-per-minute threshold, but two sudden increases (<150%) in the descent rate could be a symptom of an unstable approach. These two increases could be caused by the flaps not being fully deployed: the aircraft speed decreases and there is not enough lift generated by the wings.

In contrast with Outlier A, this one involved no especially hazardous situation, but some of its metrics and performance were out of the ordinary. This presents another possible benefit of these types of algorithms: they can help detect flights that, without surpassing any defined threshold, can be labelled as anomalous, and help detect unknown hazards or risky behaviours at an unprecedented level. The evolution of the metrics over the approach is represented in Figure 8.

Figure 8: Outlier B features compared with the rest of the distribution percentiles: (a) AP status, (b) flaps (rad). The horizontal axis represents the distance, from the touchdown point to 12 NM from TD.

C. Predictive analysis: AutoEncoder

The starting point for the predictive analysis is the set of outlier scores calculated in the descriptive analysis. These scores will be used to discern between abnormal and normal approaches. The AutoEncoder will be trained only using normal approaches, to learn the behaviour of a non-anomalous approach. Once trained, the AutoEncoder compares each input flight with the learnt approach behaviour, outputting a reconstruction error score. The objectives of the predictive analysis are:

• Design and implement an Artificial Neural Network (ANN) using an AutoEncoder architecture, composed of multiple LSTM layers to deal with the temporal series.
• Train and test the AutoEncoder, so that it is tuned to recreate normal samples minimizing the loss, while returning a high reconstruction error when an outlier is processed.

As commented before, we designed an AutoEncoder based on LSTM layers. LSTM is a type of recurrent neural network (RNN), very useful for extracting patterns from sequential or time-series data. These kinds of models are capable of automatically extracting features from past events, and LSTMs are specifically known for their ability to extract both long- and short-term features. LSTM is a bit more demanding than other models with regard to data preparation. The input data to an LSTM model is a 3D array with shape (n_samples, n_timesteps, n_features). Here n_samples refers to the number of observations fed into the LSTM AutoEncoder; n_timesteps, or look-back, describes the time window (past data) needed by the LSTM; finally, n_features represents the number of columns selected as potential features for the scenario.

As detailed in Figure 1, AutoEncoders are composed of three main elements: encoder, code and decoder. We have designed an AutoEncoder with 2 layers for the encoding part, one single layer for the code and another 2 layers for the decoder. Therefore, the neural network is composed of a total of 5 LSTM layers. For the activation functions we used the ReLU function, due to its popularity and because it avoids and rectifies the vanishing gradient problem. The AutoEncoder receives a complete landing attempt as an input in a (1, 825) vector format, encodes it and decodes it, returning it while minimizing the loss.

Figure 9: ANN layers for the LSTM AutoEncoder, showing the encoder, code and decoder layer sizes.

To train the auto-encoder, we followed these steps (a sketch of the resulting architecture and training call is given after this list):

• Divide the dataset into two parts, negatively labelled (normal approaches) and positively labelled (outlier approaches).
• Ignore the anomalies and train the auto-encoder only with negatively labelled data. Afterwards, test the model using positively labelled data, to assess whether the model filters anomalies properly.
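The following sketch shows one possible Keras implementation of a 5-layer LSTM AutoEncoder matching the description above (2 encoder layers, 1 code layer, 2 decoder layers, ReLU activations), trained only on the normally labelled approaches. The layer widths, the reshaping of each 825-value attempt into 25 time steps of 33 features, and the optimiser settings are illustrative assumptions; the paper does not report the exact sizes, and the RepeatVector/TimeDistributed wrappers are standard glue layers added for the example.

from tensorflow import keras
from tensorflow.keras import layers

# Assumed arrangement for illustration: each 825-value approach vector is
# reshaped into 25 time steps (one per 0.5 NM point) x 33 features.
N_TIMESTEPS, N_FEATURES = 25, 33

def build_lstm_autoencoder(n_timesteps=N_TIMESTEPS, n_features=N_FEATURES):
    """Five LSTM layers: 2 for the encoder, 1 code layer, 2 for the decoder."""
    inputs = keras.Input(shape=(n_timesteps, n_features))
    x = layers.LSTM(64, activation="relu", return_sequences=True)(inputs)   # encoder 1
    x = layers.LSTM(32, activation="relu", return_sequences=True)(x)        # encoder 2
    code = layers.LSTM(16, activation="relu", return_sequences=False)(x)    # code
    x = layers.RepeatVector(n_timesteps)(code)                              # expand code back in time
    x = layers.LSTM(32, activation="relu", return_sequences=True)(x)        # decoder 1
    x = layers.LSTM(64, activation="relu", return_sequences=True)(x)        # decoder 2
    outputs = layers.TimeDistributed(layers.Dense(n_features))(x)           # reconstruct features
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# X_normal holds only the approaches with outlier score == 0, shape (n, 25, 33).
# model = build_lstm_autoencoder()
# model.fit(X_normal, X_normal, epochs=50, batch_size=128, validation_split=0.1)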


The process of filtering which approaches are normal relies on analysing the outlier scores obtained as descriptive results. If we check the outlier score distribution in Figure 5, most of the samples have a score around zero. We established four thresholds to categorize the "normality" of the samples with a score greater than zero. These thresholds are based on the distribution quantiles:

• Threshold 1 (THR1): near zero
• Threshold 2 (THR2): 70% quantile
• Threshold 3 (THR3): 90% quantile
• Threshold 4 (THR4): 95% quantile

These four regions are represented using different colors in Figure 10. This way we can measure how abnormal a sample is compared with those having a zero outlier score. For the input dataset composed of 35,000 flights, the distribution for each region, based on the anomaly severity, is:

• Normal (outlier score = 0): 12,011 flights
• Very low (THR1 < outlier score < THR2): 12,489 flights
• Low (THR2 < outlier score < THR3): 7,000 flights
• Medium (THR3 < outlier score < THR4): 1,750 flights
• High (outlier score > THR4): 1,750 flights

Figure 10: Histogram of outlier scores obtained from the GLOSH algorithm. The red line represents the threshold at the 95% quantile of the distribution. Those points above the threshold are extreme outliers.

With this outlier distribution, we label negatively only those flights having an outlier score equal to zero, and positively all the flights with an outlier score greater than 0, grouping them by how far from "normal" behaviour they are. Thus, our training set has 12,011 flights, using 10% as a validation set, and our testing set has 22,989 flights, formed by anomalies of several degrees.

Once the AutoEncoder has been trained and validated, we should set an empirical threshold to separate normal from abnormal observations. To establish this threshold, we calculate the mean squared error (MSE) for the samples of the training dataset, looking for an optimal limit that contains as many normal samples as possible below the threshold. Figure 11a represents the MSE distribution for the training set. The majority of the training samples return a mean error of 0.005, with some outliers that almost reach a loss of 0.04. Based on the error distribution and on the precision/recall curve, where we want to maximize the recall without losing too much precision, we set the reconstruction error threshold at 0.03. Afterwards, once the reconstruction error threshold has been fixed, it is time to analyse how well the AutoEncoder filters the anomalies contained in the testing dataset.

Figure 11: MSE histogram for the training and testing sets: (a) loss distribution for the train set (normal), (b) loss distribution for the test set (abnormal). Reconstruction error threshold = 0.03.
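A minimal sketch of this classification step, reusing the hypothetical model from the previous listing: reconstruct each approach, compute its per-sample MSE, and flag it as abnormal when the error exceeds the empirically chosen 0.03 threshold. Variable names are again placeholders rather than the study's actual code.

import numpy as np

RECONSTRUCTION_THRESHOLD = 0.03   # empirical value chosen from the training MSE distribution

def reconstruction_errors(model, X):
    """Mean squared reconstruction error per approach; X shaped (n, timesteps, features)."""
    X_hat = model.predict(X)
    return np.mean((X - X_hat) ** 2, axis=(1, 2))

# X_test mixes approaches of all severity levels (outlier score > 0).
# errors = reconstruction_errors(model, X_test)
# is_abnormal = errors > RECONSTRUCTION_THRESHOLD
# print(f"flagged {is_abnormal.sum()} of {len(is_abnormal)} test approaches as abnormal")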


VI. RESULTS

To test the performance of the predictive model, a selected batch of anomalous data was input. We measured the accuracy of correctly classifying an anomaly, finding that the AutoEncoder is able to classify more than 74% of the anomalies, predicting correctly almost all the Medium and High severity anomalies (Figure 12).

Figure 12: Reconstruction error for normal flights and outliers with a score higher than the 95% quantile.

However, some flaws were detected. For example, the model struggles to classify samples with a low outlier score. This is expected, because low-score samples are very similar to the "regular" approaches that we used to train the model. Further research can be carried out to tackle this problem by following several methodologies: for example, adjusting the decision threshold more precisely, training with more samples, including additional meaningful features, or using an even deeper ANN.

The loss distribution is represented in Figure 11b, where we can appreciate that most samples are located above the threshold. Furthermore, if we filter out the low to medium severity anomalies and only consider the most severe ones, keeping those above the 95% quantile, the AutoEncoder perfectly separates normal from abnormal observations. This is presented in Figure 12.

The confusion matrix (Figure 13) exemplifies even further the main properties of the predictive model. The algorithm is very efficient at detecting very anomalous flights, but fails to set a distinguishable boundary between a normal flight and a flight presenting a low severity hazard.

Figure 13: Confusion matrix for normal and abnormal (all severities) flights.

VII. DISCUSSION

Given the promising results obtained for the 95% percentile anomalies, the algorithm provides an extremely useful tool for FDM and safety analysts. One implementation could be an automatic FDM data labelling system. At the end of a day of operations, when FDM data is retrieved by the safety department, the system could flag anomalous flights for inspection. This can enable a tool for analysing and flagging large amounts of FDM data, not just for forensic analysis but also for aircraft maintenance, enabling inter-operability with existing predictive maintenance tools.

Another implementation of the methodology could be a real-time monitoring tool, able to "quantify the risk" of the approach performed. This means that, by using the trained AutoEncoder, the tool could give an outlier score to a given flight. This information could then be used to warn airport crew, ATCOs, the airline or the flight crew.

Overall, the semi-supervised methodology has been successful. By combining two very powerful machine learning algorithms (HDBSCAN clustering and AutoEncoders), we were able to solve a very complex problem from both the data science and the aviation safety perspectives. The main benefit of using a semi-supervised architecture is not only to automate and speed up the detection of known events (e.g. late flaps deployment), but also to support the detection of unknown hazards and rare events.

The first part of the research has proven that, when using adequate dimensionality reduction techniques, the automatic filtering of outliers is feasible. The second part of the research proved once again the efficiency of AutoEncoder architectures when working with time series in aviation.

ACKNOWLEDGMENT

We want to thank the whole Safeclouds.eu consortium for providing the data sources and operational knowledge necessary to perform this research, with special mention to TUM's Institute of Flight System Dynamics for providing the algorithm for decoding and preparing FDM data, and to the Iberia airlines safety department for providing the operational expertise necessary to validate the results.

REFERENCES

[1] R. Fernandes. An analysis of the potential benefits to airlines of flight data monitoring programs (MSc thesis). 2002.
[2] Sameer Jasra, Jason Gauci, Alan Muscat, Gianluca Valentino, David Zammit-Mangion, and Robert Camilleri. Literature review of machine learning techniques to analyse flight data. 2018.
[3] Vijay Manikandan Janakiraman. Explaining aviation safety incidents using deep temporal multiple instance learning, 2017.
[4] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41, 07 2009.
[5] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-supervised learning. 2006.
[6] Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Isabelle Guyon, Gideon Dror, Vincent Lemaire, Graham Taylor, and Daniel Silver, editors, Proceedings of ICML Workshop on Unsupervised and Transfer Learning, volume 27 of Proceedings of Machine Learning Research, pages 37–49, Bellevue, Washington, USA, 02 Jul 2012. PMLR.
[7] Marius Kloft, Ulf Brefeld, Patrick Düssel, Christian Gehl, and Pavel Laskov. Automatic feature selection for anomaly detection. pages 71–76, 01 2008.
[8] Sergios Theodoridis and Konstantinos Koutroumbas. Pattern Recognition, Fourth Edition. Academic Press, Inc., Orlando, FL, USA, 4th edition, 2008.
[9] Bryan Matthews, Santanu Das, Kanishka Bhaduri, Kamalika Das, Rodney Martin, and Nikunj Oza. Discovering anomalous aviation safety events using scalable data mining algorithms. Journal of Aerospace Information Systems, 11:482–482, 07 2014.
[10] Lishuan Li and R. John Hansman. Anomaly detection in airline routine operations using flight data recorder data. Report No. ICAT-2013-4, MIT International Center for Air Transportation (ICAT), 2013.
[11] John Bro. FDM machine learning: An investigation into the utility of neural networks as a predictive analytic tool for go-around decision making. Journal of Applied Sciences and Arts: Vol. 1, Iss. 3, Article 3, 2017.
[12] Santanu Das, Bryan Matthews, Ashok Srivastava, and Nikunj Oza. Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study. pages 47–56, 07 2010.
[13] Vijay Manikandan Janakiraman and David Nielsen. Anomaly detection in aviation data using extreme learning machines. pages 1993–2000, 07 2016.
[14] J. Castle, J. Stutz, and D. McIntosh. Automatic discovery of anomalies reported in aerospace systems health and safety documents. 05 2007.
[15] Wei-Han Lee, Jorge Ortiz, Bongjun Ko, and Ruby Lee. Time series segmentation through automatic feature learning, 2018.
[16] Thomas Dubot. Predicting sector configuration transitions with autoencoder-based anomaly detection. 06 2018.
[17] Xavier Olive, Jeremy Grignard, Thomas Dubot, and Julie Saint-Lot. Detecting controllers' actions in past Mode S data by autoencoder-based anomaly detection. 11 2018.
[18] DataBeacon. Remain in control of your data while optimizing your operations through artificial intelligence - databeacon.aero, 2019.
[19] Laurens van der Maaten and Geoffrey E. Hinton. Visualizing data using t-SNE. 2008.
[20] Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. Density-based clustering based on hierarchical density estimates. In Jian Pei, Vincent S. Tseng, Longbing Cao, Hiroshi Motoda, and Guandong Xu, editors, Advances in Knowledge Discovery and Data Mining, pages 160–172, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.


