Final Version-A Deep Learning Approach For Flight Delay Prediction Through Time-Evolving Graphs

This document proposes a deep learning approach for flight delay prediction using time-evolving graphs. The approach uses a graph convolutional neural network (GCN) that employs a temporal convolutional block to model the time-varying patterns of flight delays through a sequence of airport network graphs. It also includes an adaptive graph convolutional block to handle incomplete graph inputs from occasional new air routes. The goal is to predict flight delays considering the dynamic spatial interactions between connected airports, unlike previous single-airport approaches. Extensive experiments showed the proposed approach outperformed benchmarks with improved accuracy.

A Deep Learning Approach for Flight Delay Prediction through Time-Evolving Graphs


Kaiquan Cai, Yue Li, Yiping Fang, Yanbo Zhu

To cite this version:


Kaiquan Cai, Yue Li, Yiping Fang, Yanbo Zhu. A Deep Learning Approach for Flight Delay Prediction through Time-Evolving Graphs. IEEE Transactions on Intelligent Transportation Systems, In press, pp.1-11. ⟨10.1109/TITS.2021.3103502⟩. ⟨hal-03428046⟩

HAL Id: hal-03428046


https://hal.science/hal-03428046
Submitted on 14 Nov 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A Deep Learning Approach for Flight Delay Prediction through Time-Evolving Graphs
Kaiquan Cai, Member, IEEE, Yue Li, Yi-Ping Fang, Member, IEEE, and Yanbo Zhu, Member, IEEE

K. Cai and Y. Li are with the School of Electronic and Information Engineering, Beihang University, Beijing 100191, China, and also with the National Key Laboratory of CNS/ATM, Beijing 100191, China (e-mail: [email protected]; [email protected]).
Y. Fang is with the Laboratoire Génie Industriel, CentraleSupélec, Université Paris-Saclay, 3 Rue Joliot Curie, 91190 Gif-sur-Yvette, France (e-mail: [email protected]).
Y. Zhu is with the School of Electronic and Information Engineering, Beihang University, Beijing 100191, China, and also with the Aviation Data Communication Corporation, Beijing 100191, China (e-mail: [email protected]).

Abstract—Flight delay prediction has recently gained growing popularity due to the significant role it plays in efficient airline and airport operation. Most previous prediction works consider the single-airport scenario, which overlooks the time-varying spatial interactions hidden in airport networks. In this paper, the flight delay prediction problem is investigated from a network perspective (i.e., multi-airport scenario). To model the time-evolving and periodic graph-structured information in the airport network, a flight delay prediction approach based on the graph convolutional neural network (GCN) is developed. More specifically, since GCN cannot take both delay time-series and time-evolving graph structures as inputs, a temporal convolutional block based on the Markov property is employed to mine the time-varying patterns of flight delays through a sequence of graph snapshots. Moreover, considering that unknown occasional air routes under emergency may result in incomplete graph-structured inputs for GCN, an adaptive graph convolutional block is embedded into the proposed method to expose spatial interactions hidden in airport networks. Through extensive experiments, it has been shown that the proposed approach outperforms benchmark methods with a satisfying accuracy improvement at the cost of acceptable execution time. The obtained results reveal that deep learning approaches based on graph-structured inputs have great potential in the flight delay prediction problem.

Index Terms—Flight delay prediction, time-evolving airport network, graph-structured information, graph convolutional neural network.

I. INTRODUCTION

AIR transportation is of great significance in business and tourism, achieving a record high to accommodate nearly 4.5 billion passengers worldwide in 2019 [1]. According to the statistics in the same year, the average delay per flight was 13.1 minutes in Europe and 12.9 minutes in the United States, and the number in China was around 14 minutes [2]–[4]. These delays have led to inevitable consequences such as unpleasant passenger experiences, followed by economic losses of relevant airspace users. The annual cost of flight delays to the global economy was estimated to be $50 billion in 2019 [5]. Such high loss motivates the analysis of air traffic delays and the development of more advanced flight delay prediction approaches in both industry and academia [6].

In flight delay prediction, mathematical simulations and data-driven methods are currently two types of representative work. The mathematical simulation methods [7]–[9] employ mathematical tools to model air traffic operations and usually require huge computational resources. Moreover, there are typically some impractical assumptions and/or simplifications made in these methods, which make it more challenging for these methods to be used in actual situations and greatly decrease the prediction accuracy [10]. In recent decades, data-driven methods have gained extensive attention owing to the availability of massive air traffic data [11]–[13]. In detail, Hao et al. [14] developed a regression model combined with an econometric method to predict flight delays at three major commercial airports in New York. Chen et al. [15] utilized a multi-label random forest classification method combined with air traffic operation to predict the sequence of future delays for individual flights along their scheduled itineraries. Yu et al. [16] proposed a deep belief network method to mine the inner patterns of flight delays within a set of micro influential factors and presented a practical flight delay prediction at Beijing Capital International Airport.

However, the above-mentioned works focus only on the single-airport scenario and neglect the dynamic spatial interactions hidden in airport networks. Indeed, flight delays can randomly take place within the air transportation system and readily propagate throughout the tightly connected airport network owing to the large number of interconnected resources (e.g., aircraft, flight crews, passengers, and infrastructure) [17]–[19]. Therefore, it is necessary to focus on the dynamic spatial interactions among connected airports, i.e., graph-structured information.

Recent years have witnessed a growing interest in understanding the spatial dependencies in the flight delay prediction problem [20]–[22]. Specifically, Pyrgiotis et al. [23] developed network decomposition models to study the complex phenomenon of the propagation of delays and provided several encouraging consequences. Similarly, Du et al. [17] applied a delay causality network to study the propagation mechanism of flight delays within a large-scale airport network based on the interdependence of delay time-series. Recently, Wu and Law [24] developed a Bayesian network in the delay-tree framework to examine complicated delay propagation effects in an airline network.

In spite of the advances in understanding the spatial dependencies among airports in the flight delay prediction problem, previous works mostly concentrate on analyzing the qualitative
influence of flight delays on airport networks and may not be appropriate for quantitative prediction within dynamic spatial interactions.

Fortunately, timely and accurate traffic prediction that uses deep learning methods to capture spatial and temporal dependencies has gained growing attention in the transportation management area, owing to the significant benefits it might bring to traffic control and guidance [25]–[28]. More specifically, the graph convolutional neural network (GCN), with the capability of extracting complex non-linear relationships in general graphs, brings opportunities in handling complicated traffic forecasting problems with the consideration of graph-structured information [29]–[32]. Recently, based on GCN models, researchers have proposed a series of intelligent methods to provide quantified diagnostics for ground transportation [33]–[35]. Yu et al. [36] utilized a recurrent neural network to model the sequential data and developed a deep neural network based on long short term memory units for traffic forecasting. To model the non-linear temporal dependency of traffic, Pan et al. [37] proposed a meta recurrent neural network combined with an encoder-decoder architecture to predict urban traffic. Zhang et al. [38] improved GCN by applying the attention mechanism to aggregate features from the neighbours of a node and developed a deep learning framework to predict traffic speed.

Inspired by the recent success of GCN in various prediction tasks for ground transportation, this paper addresses the flight delay prediction problem by abstracting it as a time-series analysis task based on graph-structured information of an airport network. An improved GCN is utilized to capture the highly meaningful patterns of flight delays in the multi-airport scenario. Due to the dynamic nature of airport networks, two critical problems need to be overcome:
• The structure of an airport network may evolve over time in a day, e.g., scheduled air routes for rush or slack hours; flight cancellations due to accidents or natural disasters. However, the conventional GCN cannot handle both delay time-series and time-evolving graph structures simultaneously.
• To address urgent requirements (e.g., military activity and mechanical failure) in daily operation, the air traffic management department might develop temporary air routes to mitigate the severe effects of emergencies. These temporary air routes are quite ad hoc and typically unknown to flight delay prediction models, which results in incomplete graph-structured inputs for GCN.

To address the above problems, this paper proposes a novel flight delay prediction method by jointly modeling the spatio-temporal features of an airport network. Specifically, a deep learning architecture with graph-structured inputs, named Multiscale Spatial-Temporal Adaptive Graph Convolutional Neural Network (MSTAGCN), is designed to resolve the flight delay prediction problem for the multi-airport scenario. Considering the time-varying graph-structured inputs for GCN, a temporal convolutional block based on the Markov property is developed to capture the temporal dependency of air traffic through a sequence of graph snapshots. Moreover, to capture the unknown newly-formed air routes under urgent requirements, an adaptive graph convolutional block is embedded into the architecture, which parameterizes two types of graphs (namely, importance and similarity graphs) based on the predefined structure of an airport network. These parameterized graphs are trained and updated jointly with the convolutional parameters of the model. Overall, the proposed flight delay prediction approach can be distinguished from prior works in the following two aspects:
• To model the dynamic spatial interactions hidden in an airport network, the flight delay prediction problem is investigated from a network perspective (i.e., multi-airport scenario).
• Considering unknown newly-formed air routes under urgent requirements, an adaptive graph convolutional network component is developed to learn complete graph-structured inputs for GCN.

This paper is organized as follows. In Section 2, we formulate the flight delay prediction problem and elaborate on the details of the time-evolving nature of airport networks. Section 3 describes the methodology, including the deep learning architecture, the temporal convolutional block, and the adaptive graph convolutional block. In Section 4, the proposed model is compared with several benchmark approaches. Section 5 reports the case study of the Chinese airport network. The final section concludes the paper with some brief remarks.

II. PROBLEM FORMULATION

We define a set of nodes as $V = \{v_1, v_2, \ldots, v_n\}$ to represent all airports in an airport network. At time $t$, the airport network can be described as a directed and weighted network $G^{(t)} = (V, E^{(t)}, W^{(t)})$, where $E^{(t)}$ is a set of edges representing air routes among nodes in $V$. $W^{(t)} \in \mathbb{R}^{n \times n}$ denotes the weighted adjacency matrix of $G^{(t)}$, with its element $w_{ij}^{(t)}$ representing the number of flights between airport $v_i$ and airport $v_j$ during $(t-1, t)$. Moreover, $Y^{(t)} = \{y_1^{(t)}, y_2^{(t)}, \ldots, y_n^{(t)}\} \in \mathbb{R}^{n}$ is employed to denote an observation vector of $n$ airports at time $t$, of which each element $y_i^{(t)}$ records the delay experienced at a single airport $i$ during $(t-1, t)$. For airport $i$, in addition to the actual flight delays, cancellations should be considered as a delay metric to assess the on-time performance of an airport. Therefore, according to the regulations of the Federal Aviation Administration (FAA), the Eurocontrol's Network Manager (NM) Operations Centre, and the Civil Aviation Administration of China (CAAC), $\rho$ is utilized to indicate the equivalent delay time of a cancellation. Then, the weighted value of the actual flight delays and the equivalent delays of cancellations is used to represent the average delay $y_i^{(t)}$ of an airport:

$$ y_i^{(t)} = \frac{m_i^{(t)} + \rho \, c_i^{(t)}}{a_i^{(t)}}, \tag{1} $$

where $m_i^{(t)}$ denotes the total delay of departure flights at airport $i$ during $(t-1, t)$, $c_i^{(t)}$ and $a_i^{(t)}$ are the number of cancelled flights and scheduled departure flights at airport $i$ during $(t-1, t)$, respectively, and $\rho = 180$ represents the equivalent delay of a cancellation [17].
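As a minimal sketch of Eq. (1), the Python snippet below computes the average delay of one airport for one time interval. The function name `average_delay`, its argument names, the zero-departure check, and the sample numbers are illustrative assumptions, not part of the paper; only the formula and ρ = 180 come from the text, and the unit (minutes) is assumed.

```python
RHO = 180.0  # equivalent delay of one cancellation, as stated in the text (unit assumed: minutes)


def average_delay(total_delay, cancelled, scheduled, rho=RHO):
    """Eq. (1): y = (m + rho * c) / a for one airport and one time interval.

    total_delay -- m, total delay of departure flights
    cancelled   -- c, number of cancelled flights
    scheduled   -- a, number of scheduled departure flights
    """
    if scheduled <= 0:
        # Assumption of this sketch: an interval without scheduled departures is rejected.
        raise ValueError("scheduled departures must be positive")
    return (total_delay + rho * cancelled) / scheduled


# Hypothetical numbers for illustration only (not taken from the paper's dataset).
print(average_delay(total_delay=300.0, cancelled=2, scheduled=40))  # 16.5
```

With 300 minutes of total delay, 2 cancellations, and 40 scheduled departures, the sketch gives (300 + 180 · 2) / 40 = 16.5.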
Fig. 1. Illustration of constructing the multiscale historical delay sequences based on graph snapshots. [Figure: a time axis showing the weekly-periodic segment $P_w$ (distant), the daily-periodic segment $P_d$ (near), and the recent segment $P_r$, each drawn as 24 snapshots, followed by the future segment $P_f$.]

We denote the adjacency matrix $W^{(t)}$ and the delay vector $Y^{(t)}$ as a graph snapshot at time $t$ [39]. A sequence of graph snapshots $W = \{W^{(0)}, W^{(1)}, \ldots, W^{(t)}\}$ and corresponding delay observations $Y = \{Y^{(0)}, Y^{(1)}, \ldots, Y^{(t)}\}$ is regarded as a time-evolving graph in this paper. Note that the structure of an airport network can evolve over time in a day, and the evolution presents obvious periodic patterns. In fact, a flight schedule in air transportation is typically composed of two different flight seasons: Winter-Spring season and Summer-Autumn season. In each season, all carriers perform similar daily flight plans except for weekends, thus air traffic and flight delays at each airport show similar daily patterns. Furthermore, due to the regular flight timetables, air traffic and flight delays also exhibit weekly periodicity, e.g., the air traffic pattern on the latest Tuesday has a strong similarity with the air traffic on Tuesdays in history. To take advantage of the long-term periodicity of air traffic, we formulate the multiscale historical delay sequences $X^{(t)} = \{P_w^{(t)}, P_d^{(t)}, P_r^{(t)}\}$ at time $t$ as shown in Fig. 1, where $P_w^{(t)} \in \mathbb{R}^{N \times n}$, $P_d^{(t)} \in \mathbb{R}^{N \times n}$, and $P_r^{(t)} \in \mathbb{R}^{N \times n}$ represent the weekly-periodic segment, the daily-periodic segment and the recent segment, respectively. The details of the three delay segments are described as follows:

$$ P_r^{(t)} = \{Y^{(t-N)}, Y^{(t-N+1)}, \ldots, Y^{(t-1)}\}, \tag{2} $$

$$ P_d^{(t)} = \{Y^{(t-N-24+1)}, Y^{(t-N+1-24+1)}, \ldots, Y^{(t-1-24+1)}\}, \tag{3} $$

$$ P_w^{(t)} = \{Y^{(t-N-7 \cdot 24+1)}, Y^{(t-N+1-7 \cdot 24+1)}, \ldots, Y^{(t-1-7 \cdot 24+1)}\}, \tag{4} $$

where the time $t$ is represented by the red graph snapshot in Fig. 1. The recent delay sequence $P_r^{(t)}$, including $N$ graph snapshots before time $t$ during the same day and the day before if $t \le (N-1)$h, records the neighboring graph snapshots which have inevitable influences on future traffic. The daily-periodic delay sequence $P_d^{(t)}$ consists of $N$ graph snapshots tracing back from (and including) the predicted period in the day before, representing the daily periodicity of air traffic. The weekly-periodic delay sequence $P_w^{(t)}$, including $N$ graph snapshots tracing back from (and including) the predicted period in the same day of the previous week, exposes the weekly periodicity of air traffic. For example, assume that $N = 21$ and that the time $t$ denotes the period from 16:00h to 17:00h on a Wednesday. Then, the recent delay sequence records the historical delays of all airports between 19:00h (Tuesday) and 16:00h (Wednesday); the historical delays of all airports between 20:00h (Monday) and 17:00h (Tuesday) are poured into the daily-periodic delay sequence; and the weekly-periodic delay sequence consists of delays of all airports between 20:00h (the last Tuesday) and 17:00h (the last Wednesday). Through extensive experiments, $N = 21$ is chosen in our case study.

In this paper, the flight delay prediction problem is converted to a time-series analysis task on time-evolving airport networks, and we aim to predict the most likely delays for each airport in the next $M$ time steps given the air traffic observations in the historical $3N$ time steps (Note: the past $3N$ hours are chosen regardless of whether they fall in the same calendar day). This procedure can be formulated as

$$ [X^{(t)}] \rightarrow [Y^{(t)}, \ldots, Y^{(t+M-1)}]. \tag{5} $$

III. METHODOLOGY

A. Overview of the proposed model

The proposed model (i.e., MSTAGCN) is composed of two multiscale spatial-temporal adaptive graph convolutional layers, and a residual connection is added for each layer to stabilize the training procedure, as shown in Fig. 2. Each multiscale spatial-temporal adaptive graph convolutional layer consists of a temporal convolutional block and an adaptive spatial convolutional block.

Fig. 2. System architecture of Multiscale Spatial-Temporal Adaptive Graph Convolutional Network. [Figure: (1) inputs: the weekly-periodic, daily-periodic, and recent segments; (2) training: Layer 1 and Layer 2, each with a temporal convolutional block (recurrent R-GCN), an adaptive graph convolutional block (spatial GCN, temporal GCN, BN, ReLU, dropout), and a residual connection, followed by concatenation; (3) output: the future segment.]

MSTAGCN takes a time-evolving airport network (i.e., a sequence of graph snapshots) and the multiscale historical delay sequences as inputs. After stacking two multiscale spatial-temporal adaptive graph convolutional layers, we apply a fully connected output layer at the final step and the prediction results are generated. Moreover, the Mean Squared Error is employed as the loss function to evaluate the performance of the proposed approach.

B. A temporal convolutional block for mining the time-varying patterns of flight delays

An airport network (a directed graph) can be considered as a multi-relational graph with incoming and outgoing relations,
and the relational GCN (R-GCN) is capable of handling multi-relational graphs. So, we first employ R-GCN to model the spatial interactions of each airport in a single graph snapshot. Then, the temporal dynamics of flight delays between two adjacent graph snapshots is investigated in accordance with the Markov property. Based on the above observations, we generalize R-GCN to process the time-evolving graph.

Considering a single graph snapshot, the adjacency matrix $W^{(t)}$ and the multiscale historical delay sequences $X^{(t)}$ are taken as inputs. We define the notion of graph convolution operator $\star_G$ based on spectral graph convolution, as the multiplication of the multiscale delay sequences $X^{(t)}$ with a kernel $\Theta_F$:

$$ \Theta_F \star_G X^{(t)} = \sigma\Big( \sum_{r \in R} (D_r^{(t)})^{-1} W_r^{(t)} X^{(t)} M_r + X^{(t)} M_0 \Big), \tag{6} $$

where $\Theta_F$ denotes the parameter set used in a single graph snapshot and $\sigma(\cdot)$ represents the activation function. $R = \{in, out\}$ is the set of relation types: $W_{in}^{(t)} = W^{(t)}$ denotes the incoming relation and $W_{out}^{(t)} = (W^{(t)})^{T}$ represents the outgoing relation. $D_r^{(t)}$ is the diagonal degree matrix with $(D_r^{(t)})_{ii} = \sum_j (W_r^{(t)})_{ij}$. $M_{in}$ and $M_{out}$ are the weight matrices for incoming and outgoing relations, respectively, and $M_0$ represents the self-connection weight matrix [40]. Moreover, to further generalize R-GCN and prevent overfitting, a linear combination of incoming and outgoing relations is utilized to simplify the self-connection weight matrix [39]. Eq. (6) can then be rewritten as

$$ \Theta_F \star_G X^{(t)} = \sigma\Big( \sum_{r \in R} \widetilde{W}_r^{(t)} X^{(t)} M_r \Big), \tag{7} $$

where $\widetilde{W}_r^{(t)} = (\widetilde{D}_r^{(t)})^{-1} (W_r^{(t)} + I_n)$, $(\widetilde{D}_r^{(t)})_{ii} = \sum_j (W_r^{(t)} + I_n)_{ij}$, and $I_n$ represents an $n$-dimensional identity matrix.

Considering two adjacent graph snapshots at time $t-1$ and $t$, we take the processed adjacency matrices $\widetilde{W}_r^{(t-1)}$, $\widetilde{W}_r^{(t)}$ and the multiscale historical delay sequences $X^{(t-1)}$, $X^{(t)}$ as inputs. Then, the graph convolution operator $\star_G$ for two adjacent graph snapshots can be generalized as follows:

$$ \Theta_B \star_G [X^{(t-1)}, X^{(t)}] = \sigma\big( \Theta_H \star_G X^{(t-1)} + \Theta_F \star_G X^{(t)} \big), \tag{8} $$

where $\Theta_B$ denotes the parameter set used in two adjacent graph snapshots. $\Theta_H \star_G X^{(t-1)}$ represents the graph convolution operation at time $t-1$, and it takes $\widetilde{W}_r^{(t-1)}$ and $X^{(t-1)}$ as inputs. Note that the parameter set $\Theta_H$ does not change over time. Overall, the graph convolution operation for two adjacent graph snapshots can be viewed as a combination of both current and accumulating previous graph snapshots.

Finally, a hidden state $H^{(t-1)}$ is utilized to memorize the accumulating previous graph snapshots, and a combination of the hidden state and the current input is employed to generate a new hidden state:

$$ H^{(t)} = \sigma\big( \Theta_B \star_G [H^{(t-1)}, X^{(t)}] \big), \tag{9} $$

where $\Theta_B$ includes $\Theta_H$ and $\Theta_F$. In these settings, we can handle a time-evolving graph through applying Eq. (9) sequentially.

Fig. 3. Illustration of the evolution of the graph convolution operator. Blue point: civil airport. Black line: the scheduled air routes. Red line: the unknown temporary air routes.

C. An adaptive graph convolutional block for capturing unknown spatial interactions hidden in airport networks

The adaptive graph convolutional block is composed of a spatial GCN (Convs), a temporal GCN (Convt), a dropout layer, and a residual connection. Each GCN-based layer is followed by a batch normalization layer (BN) and a ReLU layer (ReLU). Here, the temporal GCN is the same as the model proposed by Shi et al. [41], i.e., it applies a $K_t \times 1$ convolution on the $C_t \times N \times n$ feature map to capture temporal information through an airport network. The spatial GCN for the space domain is described as follows.

The adjacency matrix is defined as $A$, corresponding to the last hidden state $H^{(t_{last})}$ of the temporal convolutional block. As shown in Fig. 2, the adaptive graph convolutional block employs a spatial GCN in the middle to concatenate the temporal convolutional block. Therefore, we can take the adjacency matrix $A$ and the last hidden state $H^{(t_{last})}$ as inputs to the spatial GCN. Then, the graph convolution operator $\star_G$ is defined as the multiplication of the last hidden state $H^{(t_{last})}$ with a kernel $\Theta_I$:

$$ \begin{aligned} \Theta_I \star_G H^{(t_{last})} &= \Theta_I(L) H^{(t_{last})} \\ &= \Theta_I(U \Lambda U^{T}) H^{(t_{last})} \\ &= \Theta_I(I_n - Q^{-1} A) H^{(t_{last})}, \end{aligned} \tag{10} $$

where $\Theta_I$ denotes the parameter set used in the spatial GCN modeling. The graph Fourier basis $U \in \mathbb{R}^{n \times n}$ is the eigenmatrix of the random walk Laplacian $L = I_n - Q^{-1} A = U \Lambda U^{T} \in \mathbb{R}^{n \times n}$, where $I_n$ represents an $n$-dimensional identity matrix, $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix, $Q \in \mathbb{R}^{n \times n}$ is the diagonal degree matrix with $Q_{ii} = \sum_j A_{ij}$, $Q^{-1}$ denotes the inverse matrix of $Q$, and $\Lambda \in \mathbb{R}^{n \times n}$ indicates the diagonal matrix of eigenvalues of $L$. To efficiently study flight delays at the system level, the 1st-order approximation method [42] is adopted in this paper due to its simplicity and proven performance. Eq. (10) can then be rewritten as

$$ \begin{aligned} \Theta_I \star_G H^{(t_{last})} &\approx \theta_0 H^{(t_{last})} + \theta_1 \Big( \frac{2}{\lambda_{max}} L - I_n \Big) H^{(t_{last})} \\ &\approx \theta_0 H^{(t_{last})} - \theta_1 (Q^{-1} A) H^{(t_{last})}, \end{aligned} \tag{11} $$

where $\theta_0$, $\theta_1$ represent two shared parameters in the kernel $\Theta_I$. To ensure stable numerical performance [10], here we assume that $\theta = \theta_0 = -\theta_1$. Eq. (11) can then be expressed by

$$ \Theta_I \star_G H^{(t_{last})} = \theta (I_n + Q^{-1} A) H^{(t_{last})}. \tag{12} $$

Note that the graph convolution operator designed by Eq. (12) concentrates on the scheduled air routes of an airport network,
while the unknown temporary air routes for urgent requirements (as shown in Fig. 3) are neglected. To address this problem, an adaptive graph convolutional layer is employed to model the complicated spatial interactions in airport networks under urgent situations. More specifically, the importance and similarity matrices based on the scheduled structure of an airport network are added and parameterized, and these matrices are trained and updated jointly with the convolutional parameters of the model. These matrices are unique for different layers and reveal the critical airports in flight delay propagation. Eq. (12) can then be rewritten as

$$ \Theta_A \star_G H^{(t_{last})} = \theta^{*} \big( I_n + Q^{-1} (A + Z + S) \big) H^{(t_{last})}, \tag{13} $$

where $\Theta_A$ denotes the parameter set used in the spatial GCN and $\theta^{*}$ represents the shared parameter in the kernel $\Theta_A$. $A$ represents the scheduled structure of an airport network, and $Z \in \mathbb{R}^{n \times n}$ shows the importance of each airport in an airport network and plays the same role as the attention mechanism [43]. Moreover, elements in $Z$ are updated and trained together with other parameters based on the training inputs. Therefore, there is no constraint on the value of each element, which allows the generation of temporary connections not existing in the scheduled structure of an airport network. To determine whether there is a connection between two airports and how strong the connection is, the normalized embedded Gaussian function (i.e., Eq. (14)) is employed to measure the similarity of two airports (note that the similarity function is not symmetric because the airport network is described as a directed network in this paper):

$$ f(v_i, v_j) = \frac{\exp\big( \varphi(v_i)^{T} \psi(v_j) \big)}{\sum_{j=1}^{n} \exp\big( \varphi(v_i)^{T} \psi(v_j) \big)}, \tag{14} $$

where $\varphi(\cdot)$ and $\psi(\cdot)$ represent two embedding functions designed to map the original feature map $H^{(t_{last})}$ into two embedded feature maps $S^{1} \in \mathbb{R}^{n \times C_e N}$, $S^{2} \in \mathbb{R}^{C_e N \times n}$. Specifically, we first embed the original feature map $H^{(t_{last})}$ into $H^{(t_{last})} \in \mathbb{R}^{N \times n \times C_e}$ with the two embedding functions. Here, we employ a $1 \times 1$ convolutional layer as the embedding function based on extensive experiments. Then, the two embedded feature maps are rearranged to $S^{1}$ and $S^{2}$, so the similarity matrix $S$ can be defined as the multiplication of $S^{1}$ with $S^{2}$. Since the normalized embedded Gaussian is equipped with a softmax operation, we can obtain the similarity matrix $S$ based on Eq. (14) as follows:

$$ S = \mathrm{softmax}\big( (H^{(t_{last})})^{T} \Theta_\varphi^{T} \Theta_\psi H^{(t_{last})} \big), \tag{15} $$

where $\Theta_\varphi$, $\Theta_\psi$ are the parameter sets of the embedding functions $\varphi(\cdot)$ and $\psi(\cdot)$, respectively.

IV. EXPERIMENTAL RESULTS OF THE CHINESE AIRPORT NETWORK

A. Dataset

The data used in this paper are provided by CAAC, comprising all domestic flights from April 1, 2018 to October 31, 2018 (i.e., Summer-Autumn flight season). Considering the sparse flight schedules at spoke airports, 224 civil airports are ranked according to the handling capacity and the top 74 busy airports are poured into the experimental dataset. The dataset contains 2.19 million scheduled flights connecting 74 critical airports, which serve more than 90% of the air traffic in China. All raw data are normalized by the Z-Score method. 70% of the dataset is used for training, 15% for testing, and the remaining 15% for validation. All experiments are run on a Linux cluster (i.e., CPU: Intel (R) Xeon (R) Gold 6126 CPU @ 2.60GHz, GPU: NVIDIA TITAN RTX).

B. Baselines

We compare our model with two types of baseline models: single-airport scenario models and multi-airport scenario models.

(1) Single-airport scenario models
• ARIMA [44]: Auto-Regressive Integrated Moving Average, which is one of the most popular methods in time-series prediction tasks.
• SVR [45]: Support Vector Regression with Radial Basis Function Kernel; the penalty term is set to 0.1 and the number of historical observations is 21.

(2) Multi-airport scenario models
• DCRNN [46]: Diffusion Convolutional Recurrent Neural Network. Both encoder and decoder contain two recurrent layers and there are 64 units in each recurrent layer. The initial learning rate is 1e−2 with a decay rate of 0.1 after every 10 epochs and early stopping is utilized on the validation dataset. The maximum step of random walk is set to 3. We train our models by minimizing the mean square error using Adam (Adaptive Moment Estimation) optimizer for 50 epochs with batch size 64.
• STGCN [10]: Spatial-temporal Graph Convolutional Network. The channels of three layers in ST-Conv block are 64, 16, 64 respectively. Both the graph convolution kernel size and temporal convolution kernel size are set to 3. The initial learning rate is 1e−4 with a decay rate of 0.6 after every 5 epochs. We train our models by minimizing the mean square error using Adam (Adaptive Moment Estimation) optimizer for 50 epochs with batch size 64.

C. Evaluation Metrics

The Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are employed to evaluate the performance of the models:

$$ RMSE = \sqrt{ \frac{1}{k} \sum_{i=1}^{k} (\hat{x}_i - x_i)^2 }, \tag{16} $$

$$ MAE = \frac{1}{k} \sum_{i=1}^{k} |\hat{x}_i - x_i|, \tag{17} $$

$$ MAPE = \frac{100\%}{k} \sum_{i=1}^{k} \left| \frac{\hat{x}_i - x_i}{x_i} \right|, \tag{18} $$

where $k$ is the number of testing samples, and $\hat{x}_i$ and $x_i$ denote the predicted and the observed flight delay, respectively.
TABLE I
PERFORMANCE COMPARISON OF DIFFERENT METHODS ON THE CHINESE AIRPORT NETWORK.

M        Metric  ARIMA    SVR            DCRNN          STGCN          MSTAGCN
1 hour   RMSE    18.89    12.201±0.471   11.313±0.187   11.288±0.133   10.371±0.168
1 hour   MAE     12.79    7.154±0.128    6.605±0.095    6.564±0.049    5.884±0.072
1 hour   MAPE    32.72%   19.31±0.95%    16.68±0.58%    16.42±0.27%    13.39±0.45%
2 hours  RMSE    21.841   12.601±0.489   11.554±0.193   11.506±0.151   10.613±0.174
2 hours  MAE     15.33    7.421±0.143    6.824±0.094    6.773±0.047    6.123±0.089
2 hours  MAPE    41.51%   20.19±0.98%    17.57±0.61%    17.24±0.32%    14.90±0.52%
3 hours  RMSE    24.547   12.761±0.514   11.684±0.191   11.641±0.159   10.787±0.181
3 hours  MAE     17.811   7.493±0.142    6.963±0.095    6.924±0.054    6.252±0.093
3 hours  MAPE    47.62%   20.92±0.97%    18.28±0.62%    17.97±0.41%    15.11±0.56%

Note: we run each experiment 10 times independently and report the mean and standard deviation.

TABLE II
RESULTS OF COMPARISON AMONG DIFFERENT VARIANTS IN MSTAGCN ON THE CHINESE AIRPORT NETWORK.

M        Metric  MSTAGCN-NS     MSTAGCN-NE     MSTAGCN-NA     MSTAGCN
1 hour   RMSE    11.981±0.195   11.182±0.176   10.817±0.124   10.371±0.168
1 hour   MAE     6.927±0.098    6.467±0.069    6.142±0.041    5.884±0.072
1 hour   MAPE    18.02±0.52%    16.03±0.47%    14.46±0.22%    13.39±0.45%
2 hours  RMSE    12.205±0.198   11.379±0.184   11.006±0.136   10.613±0.174
2 hours  MAE     7.294±0.095    6.692±0.076    6.331±0.047    6.123±0.089
2 hours  MAPE    19.17±0.57%    16.92±0.50%    15.57±0.28%    14.90±0.52%
3 hours  RMSE    12.563±0.194   11.604±0.197   11.289±0.129   10.787±0.181
3 hours  MAE     7.581±0.096    6.823±0.087    6.506±0.051    6.252±0.093
3 hours  MAPE    19.98±0.58%    17.87±0.54%    16.30±0.29%    15.11±0.56%

Note: we run each experiment 10 times independently and report the mean and standard deviation.

D. Experimental Setting

MSTAGCN is composed of two multiscale spatial-temporal adaptive graph convolutional layers and a residual connection is added for each layer. Both the graph convolution kernel size and temporal convolution kernel size are set to 5. The time-window is set to 24. The initial learning rate is 1e−4 with a decay rate of 0.6 after every 5 epochs. We train our models by minimizing the mean square error using the Adam (Adaptive Moment Estimation) optimizer for 50 epochs with batch size 64.

Fig. 4. Impacts of the graph convolution kernel size and time-window size on the prediction performance (MAE) for three different future time periods.
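The training hyperparameters stated above can be collected into a small configuration, together with a sketch of the learning-rate schedule. This is only an illustration: the dictionary keys and the function name are hypothetical, and the staircase interpretation of "a decay rate of 0.6 after every 5 epochs" (epochs counted from 0) is an assumption, since the exact scheduler is not specified.

```python
# Hyperparameters as stated in the text; the key names are hypothetical.
CONFIG = {
    "graph_kernel_size": 5,
    "temporal_kernel_size": 5,
    "time_window": 24,
    "initial_lr": 1e-4,
    "lr_decay_rate": 0.6,
    "lr_decay_every": 5,  # epochs
    "epochs": 50,
    "batch_size": 64,
}


def learning_rate(epoch, cfg=CONFIG):
    """Staircase decay (assumed): multiply the rate by 0.6 once every 5 epochs."""
    steps = epoch // cfg["lr_decay_every"]
    return cfg["initial_lr"] * cfg["lr_decay_rate"] ** steps


# Learning rate used in each of the 50 training epochs.
schedule = [learning_rate(epoch) for epoch in range(CONFIG["epochs"])]
print(schedule[0], schedule[5], schedule[-1])
```

Under this reading the rate stays at 1e-4 for epochs 0 to 4, drops to 6e-5 for epochs 5 to 9, and ends near 1e-6 in the final epochs.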
Here, we vary the size of the graph convolution kernel from 1 to 7 and test the impact on the prediction performance (MAE) for three different future time periods, as shown in the left panel of Fig. 4. When the size of the graph convolution kernel grows from 1 to 5, MAE decreases slightly on the three future time periods. However, as the kernel size increases from 5 to 7, MAE increases dramatically on all time periods. Therefore, we set the graph convolution kernel size to 5, with which MSTAGCN achieves better and more stable performance.

In addition, we also vary the size of the time-window from 12 to 24 and test the effect on the prediction performance (MAE) for three different future time periods, as shown in the right panel of Fig. 4. When the size of the time-window grows from 12 to 24, MAE declines gradually on the three future time periods. Therefore, we set the time-window to 24, with which MSTAGCN achieves better and more stable performance.

E. Experimental Results

(1) Model Comparison

Table I reports the performance results of the proposed approach and benchmark methods on the testing dataset. We run each experiment 10 times independently and report the mean and standard deviation. Specifically, the first row (i.e., M = 1 hour) in Table I shows the short-term prediction results and the long-term forecasting is illustrated by the second and third rows (i.e., M = 2, 3 hours). It is shown that our model achieves the best performance in terms of all the statistical metrics both in short-term and long-term predictions. Additionally, as validated by the analysis of variance (ANOVA) [47], there is no significant difference among the predicted results for three different future time periods. Such superior performance of the proposed approach is mainly attributed to the following two points. Firstly, considering the time-evolving structure of the airport network, this approach employs the Markov property to capture temporal dependency of air traffic through a sequence of graph snapshots. Furthermore, by parameterizing the additional two graphs based on the scheduled structure of an airport network, the proposed method can effectively explore the latent spatial relationships among airports even if there is no scheduled air route connection.

(2) Variant Comparison

Because the proposed MSTAGCN contains multiple key components, we additionally compare variants of MSTAGCN with respect to the following perspectives to demonstrate the performance of MSTAGCN: 1) the effect of the spatial interactions, 2) the effect of the time-evolving component, and 3) the impact of the adaptive component. The following MSTAGCN variants are designed for comparison.

• MSTAGCN-NS: A variant of MSTAGCN with the spatial interactions component being removed.
• MSTAGCN-NE: A variant of MSTAGCN with the time-evolving component being removed.
• MSTAGCN-NA: A variant of MSTAGCN with the adaptive spatial interactions component being removed.

The ablation study results are shown in Table II and the training MSE and validation MSE of MSTAGCN variants during the training procedure are shown in Fig. 5.

Fig. 5. Training MSE and validation MSE of MSTAGCN variants during the training procedure.

1) Effects of the spatial interactions component: We compare the performance of MSTAGCN with MSTAGCN-NS on a real dataset (described in Section IV.A) to investigate the effectiveness of the spatial interactions component. From the result, we observe that the proposed MSTAGCN performs better than MSTAGCN-NS, which confirms the superiority of introducing the spatial interaction to our model.

2) Effects of the time-evolving component: We compare the performance of MSTAGCN with MSTAGCN-NE on a real dataset (described in Section IV.A) to investigate the effectiveness of the time-evolving component. According to the result, it is easy to observe that the proposed MSTAGCN achieves better performance in terms of all the evaluation metrics and it also accomplishes much faster training and easier convergence on the dataset. Thanks to the presence of the time-evolving component, the proposed architecture achieves a better performance in the long-term predictions.

3) Effects of the adaptive spatial interactions component: We compare the performance of MSTAGCN with MSTAGCN-NA on a real dataset (described in Section IV.A) to investigate the effectiveness of the adaptive spatial interactions component. According to the result, it is easy to observe that the proposed MSTAGCN outperforms MSTAGCN-NA, which indicates that the adaptive spatial interaction can consistently provide supplementary information to benefit our model.

V. CASE STUDY OF THE CHINESE AIRPORT NETWORK

A. Case Description

We select four representative airports from the Chinese airport network and show their prediction results for three consecutive days, as shown in Fig. 6. More specifically, the SVR model (the gray curve) represents the traditional data-driven methods that overlook the spatial information in an airport network. The STGCN model (the green curve) denotes the deep learning architectures based on static graph-structured inputs. And the proposed MSTAGCN model (the orange curve) represents the deep learning architectures based on evolving graph-structured inputs. In addition, four typical airports are examined in detail, including: 1) Beijing Capital International Airport (ZBAA): an aviation hub in northern China which served a record high of nearly 100.98 million people in 2018, ranking first in the Chinese mainland; 2) Shanghai Pudong International Airport (ZSPD): an aviation hub in eastern China. The handling capacity of ZSPD reached 74.05 million passengers in 2018 and it ranked second in the Chinese mainland; 3) Xian Xianyang International Airport (ZLXY): an aviation hub in western China. It handled 329,700 flights connecting 161 domestic cities and 50 international cities in 2018, and the accessibility of its air route network ranked second in China; 4) Guangzhou Baiyun International Airport (ZGGG): an aviation hub in southern China. The handling capacity of ZGGG reached 69.73 million in 2018 and it ranked third in the Chinese mainland.

[Figure 6: four panels (ZBAA, ZSPD, ZLXY, ZGGG); y-axis: delay (min); x-axis: time of day, 06:00 to 22:00, for the first, second, and third day; curves: Actual delay data, SVR, STGCN, MSTAGCN (ours).]

Fig. 6. Predictions of Beijing Capital International Airport (ZBAA), Shanghai Pudong International Airport (ZSPD), Xian Xianyang International Airport (ZLXY), and Guangzhou Baiyun International Airport (ZGGG) for three consecutive days (from 06:00 to 22:00).

B. Prediction Results

From Fig. 6, it is easy to observe that STGCN and MSTAGCN generally perform better than SVR, being able
to effectively capture the dynamic changes from 07:00h to 09:00h and 17:00h to 20:00h. By considering the specific “spatial interactions” in airport networks, the deep learning models make more accurate predictions than SVR, which ignores them, especially in the morning peak and evening rush hours. More specifically, the proposed model can meet the practical requirement for long-term prediction and has the potential for short-term forecasting.

• One hour ahead delay predictions: the average error between actual delay and predicted delay is 5.884±0.072 minutes.
• Two hours ahead delay predictions: the average error between actual delay and predicted delay is 6.123±0.089 minutes.
• Three hours ahead delay predictions: the average error between actual delay and predicted delay is 6.252±0.093 minutes.

In practice, the errors of the Airport-Collaborative Decision Making System (A-CDM) powered by Beijing Capital International Airport for one hour, two hours and three hours ahead flight delay predictions are 7.84 minutes, 11.86 minutes and 17.26 minutes, respectively.

Unfortunately, the results of the proposed model present a lag phenomenon compared to the actual delay data in the ZBAA subfigure of Fig. 6. Note that this is the most common phenomenon in time-series analysis tasks based on deep learning approaches [48]–[50]. Moreover, each spot in Fig. 6 represents the delay of an airport during an hour. The lag phenomenon produces a minute-level error (less than 10 minutes), which has little negative effect on the practical flight delay prediction problem.

C. Analysis of Temporal Correlation among Graph Snapshots

To show the advantages of temporal correlation among graph snapshots in greater detail, we compare the graph-structured input of MSTAGCN with MSTAGCN-NE in the graph convolutional layer. We randomly select three adjacent graph snapshots from a time-evolving graph and randomly intercept a portion of them to form a 10 × 10 adjacency matrix, as shown in Fig. 7. Each element in these matrices denotes the number of flights per unit time between the corresponding origin-destination pair. The key connections among hub airports are represented by the mazarine (deep blue) dots (larger value) and the yellowish dots reveal the weak air route connections in the airport network. Specifically, if there is no scheduled air route connection between an origin-destination pair, the element is set to zero.

[Figure 7: adjacency-matrix snapshots; both axes labeled "airport".]

Fig. 7. Illustration of the temporal correlation among adjacent graph snapshots.

As shown in Fig. 7, there are significant differences in the graph snapshots at different time slots, namely, the graph-structured input of the model changes dramatically over time. Indeed, an airport at time t is not only correlated with other airports at the same time, but also depends on airports at the previous time step due to the large number of interconnected resources. MSTAGCN-NE takes static graph-structured information as input (e.g., one of the subplots in Fig. 7) and ignores the temporal correlation among graph snapshots. However, the proposed MSTAGCN utilizes a sequence of graph snapshots to model the temporal correlation, which finely captures the changes in an airport network and greatly improves the performance of deep learning models.

D. Analysis of Adaptive Spatial Interactions

To show the advantages of adaptive spatial interactions in greater detail, we compare the graph-structured input of MSTAGCN with MSTAGCN-NA in the graph convolutional layer. As shown in Fig. 8, the adaptive adjacency matrix (in the right panel) not only presents the inherent scheduled air routes but also exposes the unknown temporary relationships among airports under urgent requirements. From a macroscopic perspective, the adaptive matrix reduces the importance gap between hub airports and spoke airports. Namely, it decreases the difference among all elements in the adjacency matrix, which brings more comparability for various inputs and greatly improves the performance of deep learning models.

[Figure 8: two adjacency-matrix panels; both axes labeled "airport".]

Fig. 8. Illustration of the graph-structured input of MSTAGCN and MSTAGCN-NA. The left matrix represents the scheduled structure of an airport network (MSTAGCN-NA). An example of the corresponding adaptive matrix learned by our model is represented in the right matrix (MSTAGCN).

From a microscopic perspective, the change of each element in the adjacency matrix is calculated to evaluate the alterations of air route strength. Table III reports the top 10 air routes with the greatest reduction of the strength of connections in the Chinese airport network. Although these air routes accommodate a great number of flights in the scheduled structure of the airport network, they do not insert flight delays into the air transportation system and play an essential role in flight delay absorption based on superior infrastructures. Take ZGSZ→ZSSS as an example: both airports are capable of handling flight delays based on advanced infrastructures, so the flight delays from the upstream airports may be absorbed by the air route connection between ZGSZ and ZSSS.
TABLE III
TOP 10 AIR ROUTES WITH THE GREATEST REDUCTION OF THE STRENGTH OF CONNECTIONS IN THE CHINESE AIRPORT NETWORK.

Ranking  Air route    The scheduled strength  The strength learned by model  The strength reduction
1        ZGSZ→ZSSS    1.458                   1.042                          0.417
2        ZBAA→ZSSS    1.792                   1.413                          0.379
3        ZGGG→ZSSS    1.417                   1.054                          0.363
4        ZUCK→ZSSS    0.625                   0.337                          0.288
5        ZGSZ→ZHCC    0.458                   0.174                          0.284
6        ZBTJ→ZGGG    0.5                     0.22                           0.28
7        ZSSS→ZBAA    1.792                   1.518                          0.273
8        ZSSS→ZGSZ    1.292                   1.021                          0.271
9        ZSSS→ZLXY    0.667                   0.397                          0.269
10       ZBAA→ZUCK    1                       0.734                          0.266

Note: the strength reduction represents the difference of connection strength between the scheduled adjacency matrix and the adaptive matrix.

TABLE IV
TOP 10 AIR ROUTES WITH THE GREATEST IMPROVEMENT OF THE STRENGTH OF CONNECTIONS IN THE CHINESE AIRPORT NETWORK.

Ranking  Air route    The scheduled strength  The strength learned by model  The strength improvement
1        ZGZH→ZSQD    0                       0.241                          0.241
2        ZGDY→ZSWX    0                       0.223                          0.223
3        ZWWW→ZWKL    0.25                    0.47                           0.22
4        ZPDL→ZSYT    0                       0.215                          0.215
5        ZULS→ZSQZ    0                       0.21                           0.21
6        ZPDL→ZGZJ    0                       0.206                          0.206
7        ZBNY→ZPDL    0                       0.205                          0.205
8        ZWSH→ZBYN    0                       0.2                            0.2
9        ZSYA→ZUGY    0.042                   0.241                          0.199
10       ZGDY→ZHYC    0                       0.198                          0.198

Note: the strength improvement represents the difference of connection strength between the scheduled adjacency matrix and the adaptive matrix.
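The element-wise comparison behind Tables III and IV can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the airports "A", "B", "C" and their strengths are made-up toy values rather than data from the paper, and the adjacency matrices are stored as dictionaries keyed by (origin, destination). A route absent from the scheduled structure is treated as having strength zero, in line with the text.

```python
def strength_changes(scheduled, learned):
    """Return learned minus scheduled strength for every route in either matrix.

    Routes missing from a matrix are treated as having strength 0.
    """
    routes = set(scheduled) | set(learned)
    return {r: learned.get(r, 0.0) - scheduled.get(r, 0.0) for r in routes}


def top_k(changes, k, increase=True):
    """Rank routes by strength change.

    increase=True  -> greatest improvement first (as in Table IV).
    increase=False -> greatest reduction first (as in Table III).
    """
    ranked = sorted(changes.items(), key=lambda item: item[1], reverse=increase)
    return ranked[:k]


# Toy example with hypothetical airports and strengths.
scheduled = {("A", "B"): 1.5, ("A", "C"): 0.5}
learned = {("A", "B"): 1.1, ("A", "C"): 0.45, ("B", "C"): 0.2}

changes = strength_changes(scheduled, learned)
print("greatest reduction:", top_k(changes, 1, increase=False))
print("greatest improvement:", top_k(changes, 1, increase=True))
```

In this toy case the busy route A→B loses the most strength, while B→C, which has no scheduled connection, gains a non-zero learned strength, mirroring the two patterns reported in the tables.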

Moreover, Table IV reveals the top 10 air routes with the greatest improvement of the strength of connections in the Chinese airport network. On the one hand, although the air routes in Table IV accommodate few flights in the scheduled structure of the airport network (e.g., ZWWW→ZWKL), their poor infrastructures fail to handle flight delays from upstream airports and lead to an increased strength of air route connection. On the other hand, although there is no scheduled connection between these origin-destination pairs (e.g., ZGZH→ZSQD), our model learns a series of non-zero values representing increased strength of these connections. The increased strength may illustrate the importance of neighboring airports in absorbing flights requiring diversion when severe delays or emergencies occur.

VI. CONCLUSION AND FUTURE WORK

In this paper, we investigate the flight delay prediction problem from a novel perspective and develop a flight delay prediction method in which the time-evolving nature of an airport network is considered. Specifically, based on a sequence of graph snapshots, a temporal convolutional block in accordance with the Markov property is employed to mine the time-varying patterns of flight delays. Moreover, an adaptive graph convolutional block is embedded into the proposed approach to explore spatial interactions hidden in airport networks. Using operational data obtained from CAAC, the proposed model is compared with several benchmark approaches in a case study involving 74 civil airports in China in 2018. The results show that our model achieves the best performance on all the evaluation metrics in both the short-term and the long-term predictions. Additionally, by considering interactions among airports, the deep learning approaches generally outperform the classical methods in which the interactions among airports are ignored.

The proposed model could be extended further if relevant data is available. For example, a more comprehensive scenario can be modeled by considering the overall landside and airside operation, such as boarding/de-boarding, security checks, ground holding, and taxiing. Moreover, considering the fact that most of the interactions and operations in air transportation are subject to highly regulated terms and conditions, it would be interesting to integrate data-driven methods with the operating rules of air traffic management.

ACKNOWLEDGMENT

This work is supported by the Funds of the National Natural Science Foundation of China (Grant Nos. 61822102, U2033215, U1833125).

REFERENCES

[1] International Air Transport Association (IATA), “Annual review 2019,” https://www.iata.org/contentassets/, Accessed February, 2020.
[2] EUROCONTROL, “Annual network operations report 2019,” https://www.eurocontrol.int/publication/annual-network-operations-report-2019, Accessed April, 2020.
[3] United States Department of Transportation, “Bureau of transportation statistics, airline service quality performance,” https://transtats.bts.gov/, Accessed October, 2020.
[4] Civil Aviation Administration of China, “Bulletin on the development of the civil aviation industry in 2019,” http://www.caac.gov.cn/, Accessed June 5, 2020.
[5] International Air Transport Association (IATA), “Iata economics,” https://www.iata.org/en/publications/economics/, Accessed March, 2020.
[6] J. J. Rebollo and H. Balakrishnan, “Characterization and prediction of air traffic delays,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 231–241, 2014.
[7] Y. Tu, M. O. Ball, and W. S. Jank, “Estimating flight departure delay distributions—a statistical approach with long-term trend and short-term pattern,” Journal of the American Statistical Association, vol. 103, no. 481, pp. 112–125, 2008.
[8] Y. Zhang and N. Nayak, “Macroscopic tool for measuring delay performance in national airspace system,” Transportation Research Record, vol. 2177, no. 1, pp. 88–97, 2010.
[9] R. D. Windhorst, “Towards a fast-time simulation analysis of benefits of the spot and runway departure advisor,” AIAA Guidance, Navigation, and Control Conference, 2012.
[10] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” pp. 3634–3640, 7 2018.
[11] Y. J. Kim, S. Choi, S. Briceno, and D. Mavris, “A deep learning approach to flight delay prediction,” 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), pp. 1–6, 2016.
[12] Y. Guleria, Q. Cai, S. Alam, and L. Li, “A multi-agent approach for reactionary delay prediction of flights,” IEEE Access, vol. 7, pp. 181 565–181 579, 2019.
[13] G. Gui, F. Liu, J. Sun, J. Yang, Z. Zhou, and D. Zhao, “Flight delay prediction based on aviation big data and machine learning,” IEEE Transactions on Vehicular Technology, vol. 69, no. 1, pp. 140–150, 2020.
[14] L. Hao, M. Hansen, Y. Zhang, and J. Post, “New york, new york: Two ways of estimating the delay impact of new york airports,” Transportation Research Part E: Logistics and Transportation Review, vol. 70, pp. 245–260, 2014.
[15] J. Chen and M. Li, “Chained predictions of flight delay using machine learning,” AIAA Science and Technology Forum and Exposition, 01 2019.
[16] B. Yu, Z. Guo, S. Asian, H. Wang, and G. Chen, “Flight delay prediction for commercial air transport: A deep learning approach,” Transportation Research Part E: Logistics and Transportation Review, vol. 125, pp. 203–221, 2019.
[17] W.-B. Du, M.-Y. Zhang, Y. Zhang, X.-B. Cao, and J. Zhang, “Delay causality network in air transport systems,” Transportation Research Part E: Logistics and Transportation Review, vol. 118, pp. 466–476, 2018.
[18] K. Cai, J. Zhang, M. Xiao, K. Tang, and W. Du, “Simultaneous optimization of airspace congestion and flight delay in air traffic network flow management,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 11, pp. 3072–3082, 2017.
[19] C. Yang, J. Mao, X. Qian, and P. Wei, “Designing robust air transportation networks via minimizing total effective resistance,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 6, pp. 2353–2366, 2019.
[20] C. Ciruelos, A. Arranz, I. Etxebarria, S. Peces, B. Campanelli, P. Fleurquin, V. Eguíluz, and J. J. Ramasco, “Modelling delay propagation trees for scheduled flights,” the 11th USA/Europe Air Traffic Management Research and Development Seminar, 06 2015.
[21] Y. Xiao, Y. Zhao, G. Wu, and Y. Jing, “Study on delay propagation relations among airports based on transfer entropy,” IEEE Access, vol. 8, pp. 97 103–97 113, 2020.
[22] C. Chen, C. Li, J. Chen, and C. Wang, “Vfdp: Visual analysis of flight delay and propagation on a geographical map,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–12, 2020.
[23] N. Pyrgiotis, K. M. Malone, and A. Odoni, “Modelling delay propagation within an airport network,” Transportation Research Part C: Emerging Technologies, vol. 27, pp. 60–75, 2013.
[24] C.-L. Wu and K. Law, “Modelling the delay propagation effects of multiple resource connections in an airline network using a bayesian network model,” Transportation Research Part E: Logistics and Transportation Review, vol. 122, pp. 62–77, 2019.
[25] G. Mehr and A. Eskandarian, “Estimating the probability that a vehicle reaches a near-term goal state using multiple lane changes,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–12, 2021.
[26] A. Eskandarian, C. Wu, and C. Sun, “Research advances and challenges of autonomous and connected ground vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 683–711, 2021.
[27] H. Zhao, Y. Li, W. Hao, S. Peeta, and Y. Wang, “Evaluating the effects of switching period of communication topologies and delays on electric connected vehicles stream with car-following theory,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–11, 2020.
[28] Y. Kan, Y. Wang, D. Wang, J. Sun, C. Shao, and M. Papageorgiou, “A novel approach to estimating missing pairs of on/off ramp flows,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 1287–1305, 2021.
[29] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transportation Research Part C: Emerging Technologies, vol. 54, pp. 187–197, 2015.
[30] X. Shi, H. Qi, Y. Shen, G. Wu, and B. Yin, “A spatial-temporal attention approach for traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–10, 2020.
[31] Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 11, pp. 4883–4894, 2020.
[32] K. Guo, Y. Hu, Z. Qian, H. Liu, K. Zhang, Y. Sun, J. Gao, and B. Yin, “Optimized graph convolution recurrent neural network for traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–12, 2020.
[33] C. Park, C. Lee, H. Bahng, T. won, K. Kim, S. Jin, S. Ko, and J. Choo, “Stgrat: A spatio-temporal graph attention network for traffic forecasting,” arXiv, vol. abs/1911.13181, 2019.
[34] X. Yin, G. Wu, J. Wei, Y. Shen, H. Qi, and B. Yin, “A comprehensive survey on traffic prediction,” arXiv, vol. abs/2004.08555, 2020.
[35] J. Ye, J. Zhao, K. Ye, and C. Xu, “How to build a graph-based deep learning architecture in traffic domain: A survey,” arXiv, vol. abs/2005.11691, 2020.
[36] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, “Deep learning: A generic approach for extreme condition traffic forecasting,” Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 777–785, 06 2017.
[37] Z. Pan, Y. Liang, W. Wang, Y. Yu, Y. Zheng, and J. Zhang, “Urban traffic prediction from spatio-temporal data using deep meta learning,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data, pp. 1720–1730, 2019.
[38] J. Zhang, X. Shi, J. Xie, H. Ma, I. King, and D. Yeung, “Gaan: Gated attention networks for learning on large and spatiotemporal graphs,” arXiv, vol. abs/1803.07294, 2018.
[39] J. Li, Z. Han, H. Cheng, J. Su, P. Wang, J. Zhang, and L. Pan, “Predicting path failure in time-evolving graphs,” arXiv, vol. abs/1905.03994, 2019.
[40] M. Schlichtkrull, T. Kipf, P. Bloem, R. V. Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional networks,” arXiv, vol. abs/1703.06103, 2017.
[41] L. Shi, Y. Zhang, J. Cheng, and H. Lu, “Two-stream adaptive graph convolutional networks for skeleton-based action recognition,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12 018–12 027, 06 2019.
[42] T. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv, vol. abs/1609.02907, 2017.
[43] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent models of visual attention,” arXiv, vol. abs/1406.6247, 2014.
[44] M. S. Ahmed and A. R. Cook, “Analysis of freeway traffic time-series data by using box-jenkins techniques,” Transp. Res. Rec., vol. 722, pp. 1–9, 1979.
[45] J.-M. H. C.-H. Wu and D. T. Lee, “Travel-time prediction with support vector regression,” IEEE Transactions on Intelligent Transportation Systems, vol. 5, pp. 276–281, 2004.
[46] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Graph convolutional recurrent neural network: Data-driven traffic forecasting,” arXiv, vol. abs/1707.01926, 2017. [Online]. Available: http://arxiv.org/abs/1707.01926
[47] H. Scheffe, “The analysis of variance,” John Wiley & Sons, vol. 72, 1999.
[48] Z. Hou and X. Li, “Repeatability and similarity of freeway traffic flow and long-term prediction under big data,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 6, pp. 1786–1796, 2016.
[49] J. Mackenzie, J. F. Roddick, and R. Zito, “An evaluation of htm and lstm for short-term arterial traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 5, pp. 1847–1857, 2019.
[50] H. Zheng, F. Lin, X. Feng, and Y. Chen, “A hybrid deep learning model with attention-based conv-lstm networks for short-term traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–11, 2020.

Kaiquan Cai received the B.S. and Ph.D. degrees from Beihang University in 2004 and 2013, respectively. He is currently a professor with the School of Electronic and Information Engineering, Beihang University, and deputy director of the National Key Laboratory of CNS/ATM. His research interests include intelligent air navigation and networked collaborative air traffic management.

Yue Li received the B.S. degree in mathematics and applied mathematics from Xidian University, Xian, China, in 2017. He is currently pursuing the Ph.D. degree with the School of Electronic and Information Engineering, Beihang University, China. His research interests include deep learning and networked air traffic management systems.

Yi-Ping Fang received his Ph.D. degree in Industrial Engineering from École Centrale Paris (ECP), France. He is currently an assistant professor in the Chair Risk and Resilience of Complex Systems, Laboratoire Génie Industriel, CentraleSupélec, Université Paris-Saclay, France. He was a postdoctoral research fellow at ETH Zurich, Switzerland, from March 2015 to January 2017. His primary research interests focus on the study and development of advanced computational methods for risk, reliability and resilience analytics of critical cyber-physical systems (e.g., power grids and electrified transportation systems). In particular, he is interested in resilience assessment, stochastic and robust optimization, and machine learning.

Yanbo Zhu received the B.S. and Ph.D. degrees from Beihang University, in 1995 and 2009, respectively. He is currently vice president of Aviation Data Communication Corporation, China, and he is also a part-time tutor with the School of Electronic and Information Engineering, Beihang University. His research interests include intelligent air navigation, aeronautical datalink communication, and collaborative air traffic management.
