Distributionally Robust Ground Delay Programs With
Abstract—Strategic Traffic Management Initiatives (TMIs) such as Ground Delay Programs (GDPs) play a crucial role in mitigating operational costs associated with demand-capacity imbalances. However, GDPs can only be planned (e.g., duration, delay assignments) with confidence if the future capacities at constrained resources (i.e., airports) are predictable. In reality, such future capacities are uncertain, and predictive models may provide forecasts that are vulnerable to errors and distribution shifts. Motivated by the goal of planning optimal GDPs that are distributionally robust against airport capacity prediction errors, we study a fully integrated learning-driven optimization framework. We design a deep learning-based prediction model capable of forecasting arrival and departure capacity distributions across a network of airports. We then integrate the forecasts into a distributionally robust formulation of the multi-airport ground holding problem (DR-MAGHP). We show how DR-MAGHP can outperform stochastic optimization when distribution shifts occur, and conclude with future research directions to improve both the learning and optimization stages.

Keywords—Air traffic management; Ground Delay Programs (GDPs); Airport capacity prediction; Distributionally robust optimization

I. INTRODUCTION

Congestion in air transportation systems results from demand-capacity imbalances, often stemming from airport capacity reductions. Within the US National Airspace System (NAS), the strategic implementation of Traffic Management Initiatives (TMIs) seeks to reduce the operational costs of such imbalances. A prominent example of TMIs is Ground Delay Programs (GDPs), which aim to strategically delay flights on the ground at the origin airport and mitigate costly airborne delays. The optimal implementation of GDPs is the purview of airport ground holding optimization problems, or GHPs.

Airport capacities at future time periods play a significant role in GDP implementations. If such capacities are known, the optimal delay allocation decisions can be found by solving GHPs [1]. However, in practice, it is extremely difficult for traffic management decision-makers to ascertain future airport capacities (i.e., Airport Arrival Rates and Airport Departure Rates) due to myriad uncertainties. Such uncertainties stem from environmental (e.g., convective weather forecasts [2]) and operational (e.g., runway availability and traffic volume [3]) sources. Although a large variety of prior work focuses on, e.g., airport runway configuration prediction [4], weather impact predictions [2], and TMI implementation predictions [5], all predictions result in a probability distribution over potential outcomes. If such predictions were to be incorrect (e.g., due to distribution shifts or misspecification), the resultant GDP implementation may be sub-optimal.

A. Motivation and research problem

In this work, we are motivated by the goal of leveraging advancements in machine learning (ML)-based predictions for airport capacities while adopting a cautiously optimistic approach towards prescribing GDP solutions. The former acknowledges the adoption of ML models, which enables data-driven predictive models for aviation (a key tenet of the Federal Aviation Administration’s Information-Centric NAS vision [6]). Moreover, the latter acknowledges the need to make robust decisions, understanding that such probabilistic predictions may be incorrect.

ML has continued to see significant developments and applications in the field of aviation, particularly concerning airport and airspace operations. By using historical and real-time data, ML models can help airlines and airport operators anticipate and mitigate flight delays. ML is aiding in the development of more advanced ATM systems, which can predict traffic flows and flight delays, optimize flight paths, and improve overall airspace efficiency. The European Union’s SESAR (Single European Sky ATM Research) and the United States’ NextGen programs have been working on integrating ML to enhance ATM. However, most existing ML methods can only generate point estimates, which do not fully characterize the uncertainties brought by the dynamic and complex nature of air transportation systems.

Existing methods for optimizing GDPs include the deterministic multi-airport ground holding problem (MAGHP) and the stochastic MAGHP; the former assumes deterministic airport capacities, while the latter relaxes this assumption. The feasibility of implementing ML models in real-world
H. Wu was partially supported by a Departmental Fellowship from the University of
Michigan. X. Zhu, S. Li, and Y. Zhou were partially supported by Hong Kong Innovation
and Technology Commission Innovation and Technology Fund (GHP/145/20) and City
University of Hong Kong Internal Fund (PJ9678283).
ICRAT 2024 Nanyang Technological University, Singapore
settings is compromised when the downstream learning-driven optimization model is highly sensitive to predictions from the ML model. This sensitivity leads to decision-making based on potentially inaccurate information (e.g., inaccurate airport capacity predictions). Consequently, it is critical for GDP solutions to acknowledge and address uncertainties in airport capacity prediction models.

To account for distributional uncertainties in airport capacity predictions, we investigate a distributionally robust version of the MAGHP, termed the distributionally robust multi-airport ground holding problem (DR-MAGHP). Distributionally robust optimization (DRO) aims to identify the optimal solution by considering the worst-case (with respect to the objective function) distribution within a predefined set of distributions. This predefined set is known as the ambiguity set, and is constructed based on the predicted airport capacity distribution from the upstream ML model. In Section III, we rigorously define the terms introduced above.

B. Background and prior works

The deterministic MAGHP, stochastic MAGHP, and other MAGHP variations have been well studied [1]. Stochastic implementations include using two-stage stochastic programming and chance-constrained programming [7, 8]. Moreover, [9] develops a data-driven control framework that not only minimizes the total cost but also redistributes delays spatially, potentially improving operational recovery capabilities. [10] and [11] also take equity, fairness, and passenger-centric considerations into account.

In the operations research literature, DRO is emerging as a rigorous way to combine elements of stochastic and robust optimization approaches. [12] proposes moment-based DRO, constructing the ambiguity set using the first and second moments of the distribution. Alternatively, [13] builds a Wasserstein distance-based ambiguity set: this approach considers all distributions within a Wasserstein ball of radius ϵ, and proposes a convex reformulation technique to recover a tractable form of the original DRO problem. [14] studies the Wasserstein ambiguity set-based distributionally robust mixed-integer program and solves it using dual decomposition methods.

Airport capacity can be defined as the maximum sustainable throughput for arriving (the Airport Arrival Rate, or AAR) and departing (the Airport Departure Rate, or ADR) flights [15]. In contrast to declared capacities obtained from theoretical analyses or statistical approaches [16], real-time capacity is dynamic and challenging to predict in advance. Real-time airport capacities are influenced by several interconnected operational and environmental factors [17]. As the prediction time horizon increases, forecast uncertainty grows as well, rendering accurate long-term predictions difficult [18]. A sampling of previous work includes analytical approaches such as the Integrated Airport Capacity Model (IACM) [15], and more recently, data-driven approaches such as [19]. Other related work focused on predictions of, e.g., traffic flow [20]. Few works have been conducted on the prediction of real-time airport capacities [21], and nascent work has been done on distribution predictions for flight delays [22].

II. CONTRIBUTIONS

In this work, our contributions are as follows:

1) We develop an ML framework for providing distributional predictions of the airport capacity, rather than a single value.
2) We formulate and solve the DR-MAGHP given input capacity distributions from the ML model.
3) We conduct a sensitivity analysis to evaluate the performance of DR-MAGHP under varying levels of airport capacity overestimation.

III. METHODOLOGY

We first introduce our airport capacity distribution prediction model. The predicted distributions serve as inputs to the DR-MAGHP. Figure 1 provides an overview of our combined learning-driven optimization framework.

A. Airport capacity distribution prediction

We develop a deep learning model for airport capacity distribution prediction. The model can forecast arrival and departure capacity distributions across a 12-hour prediction horizon, discretized in 15-minute intervals. A separate prediction model is built for each airport to account for airport-specific differences [23]. Furthermore, for each airport, separate and independent models are built for arrival and departure capacity predictions.

1) Deriving actual capacities from throughput: In order to train and validate the prediction model, we need observations of the actual, true airport capacity. A unique challenge for this prediction problem is that we do not directly observe the actual capacity value from historical data [24]. In this study, we derive estimates of the actual capacity from historical records of airport arrival and departure throughput. The throughput at time t (observable) is typically an underestimate of the actual capacity at time t (unobservable, to be estimated), particularly during off-peak periods. To address this, we use two rules as defined in (1) to filter out time periods with a low volume of flight operations and select peak operational times when throughput generally reaches its maximum to estimate the actual airport capacity:

$$\widehat{\text{capacity}}_t = \text{throughput}_t \iff (\text{Rule 1}) \lor (\text{Rule 2}), \tag{1}$$

where Rule 1 is given by $\text{demand}_t \ge \text{throughput}_t + 3$ and Rule 2 by $(\text{avg. delay}_t > 30) \land (\text{no. of delayed flights}_t > 1)$. Only data from time periods satisfying (1) are used to train and validate the airport capacity prediction model.
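The filtering rule in (1) can be illustrated with a minimal Python sketch. The `Period` record, its field names, and the helper functions are hypothetical and introduced only for illustration; the thresholds (3 flights, a delay of 30, more than 1 delayed flight) are taken from Rule 1 and Rule 2, and the delay is assumed to be in whatever unit the threshold of 30 uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Period:
    """One time period at a single airport (hypothetical field names)."""
    demand: int        # flights scheduled to operate in this period
    throughput: int    # flights that actually operated (observable)
    avg_delay: float   # average delay, in the same unit as the threshold of 30
    n_delayed: int     # number of delayed flights


def is_capacity_binding(p: Period) -> bool:
    """Return True when Rule 1 or Rule 2 from (1) holds for this period."""
    rule_1 = p.demand >= p.throughput + 3
    rule_2 = p.avg_delay > 30 and p.n_delayed > 1
    return rule_1 or rule_2


def estimate_capacities(periods: List[Period]) -> List[Optional[int]]:
    """Use throughput as the capacity estimate only for periods passing (1).

    Periods that fail both rules yield None and would be dropped from the
    training and validation data.
    """
    return [p.throughput if is_capacity_binding(p) else None for p in periods]


if __name__ == "__main__":
    sample = [
        Period(demand=20, throughput=15, avg_delay=5.0, n_delayed=0),   # Rule 1 holds
        Period(demand=12, throughput=11, avg_delay=10.0, n_delayed=1),  # neither rule holds
        Period(demand=10, throughput=10, avg_delay=45.0, n_delayed=4),  # Rule 2 holds
    ]
    print(estimate_capacities(sample))
```

Returning `None` for filtered periods, rather than silently dropping them, keeps the output aligned with the input time index; this is a design choice of the sketch, not something stated in the paper.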
Fig. 1: Learning-driven airport capacity distribution prediction and distributionally robust GDP optimization framework.
where $\Pi$ is a joint distribution of random variables $\xi_1$ and $\xi_2$ with marginals $Q_1$ and $Q_2$, respectively. We denote $D_\Pi(\xi_1, \xi_2)$ as the set of all joint distributions on $\xi_1$ and $\xi_2$ with marginals $Q_1$ and $Q_2$. Let $Z$ be the set of all airports and $\{\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_{|Z|}\}$ be the set of $|Z|$ airport capacities with the corresponding estimated probabilities of occurrence $\{\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_{|Z|}\}$ (i.e., this is precisely the predicted capacity

$$P_\epsilon(\hat{P}) := \left\{ Q \in M(\Xi) : d_w(\hat{P}, Q) \le \epsilon \right\}. \tag{3}$$

this, the full DR-MAGHP can be written as follows:

$$\min_{u,v} \left\{ \sum_{f \in F} (C_g g_f + C_a a_f) + \max_{p \in P_{\epsilon_g}(\hat{P}^{(g)})} E_p\left[ Q(u, \xi^{(g)}) \right] + \max_{p \in P_{\epsilon_a}(\hat{P}^{(a)})} E_p\left[ Q(v, \xi^{(a)}) \right] \right\} \tag{4a}$$

$$g_{f'} + a_{f'} - s_{f'} \le a_f, \quad \forall (f, f') \in C, \tag{4d}$$

where $Q(u, \xi^{(g)})$ is:
This approach results in a substantial increase in the number of scenarios, leading to poor scalability and computational intractability. To reduce the number of scenarios, we cluster together time periods with similar capacity distributions. We utilize the Wasserstein distance to quantify the pairwise similarity between capacity distributions of two consecutive time periods. Highly similar time periods are grouped together, whereas a distinct time period (marked by a significant change in predicted airport capacity distributions) is demarcated when the pairwise Wasserstein distance exceeds a predefined threshold. The representative capacity distribution for a group is its average (centroid) capacity distribution.

After scenario reduction, we reformulate the DR-MAGHP in (4a)-(4d) by converting the inner second-stage maximization problem into a minimization problem, resulting in a semi-infinite program. We then apply discretization techniques to address the continuous supports of this semi-infinite program, resulting in a deterministic equivalent form for the DR-MAGHP. Due to page limitations, we omit the technical details and refer readers to [27].

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. Capacity distribution prediction

1) Data description: We obtained airport throughput data from the US Department of Transportation’s Bureau of Transportation Statistics (BTS), and weather data from the US National Oceanic and Atmospheric Administration’s High-Resolution Rapid Refresh (HRRR) database. BTS provides detailed information for each flight, including the scheduled departure and arrival times, actual departure and arrival times, delay duration, and cancellation status, among others. Using these data points and the procedure described in Section III-A1, we estimate the actual airport capacity at each airport of interest. In this study, we examine the FAA Core 30 airports, but both the prediction and optimization frameworks can be easily extended to a larger network of airports. HRRR provides weather data on a 3 km × 3 km grid covering all 50 US states, with a forecast horizon of up to 23 hours from the current hour. We collect data for the entire year of 2019. Each day is divided into 96 quarter-hour intervals, resulting in 35,040 time periods in total. We use 2019 weather data from January 1-December 30 to train the capacity distribution prediction models. We reserve December 31, 2019 as the day across which we will forecast airport capacity distributions that will serve as inputs for the DR-MAGHP model.

2) Prediction experimental setup and results: We designate December 31, 2019 as the test set to evaluate model performance. For the training and validation sets, we randomly split the remaining data in a standard 80:20 ratio. We then normalize all numeric features on the training set using min-max normalization. We use the same normalization procedure for the test set as well. We give the 3-layer MLP architecture details in Table I; hyperparameter tuning is done via grid search, with a resultant learning rate of 0.0001, 300 epochs, and a batch size of 16.

We use three metrics to evaluate model performance: Root Mean Squared Error (RMSE), Coverage Rate (CR), and Average Confidence Interval Length (ACIL). Briefly, RMSE measures model performance with respect to a single, likeliest predicted capacity, ignoring the rest of the predicted distribution. In contrast, CR and ACIL incorporate differences between the predicted and actual capacity distributions through the use of confidence intervals. The deep learning model attempts to predict airport capacity distributions that minimize the RMSE, increase the CR, and reduce the ACIL.

3) Capacity distribution prediction results: Table II summarizes the model performance across the test set for the US Core 30 airports on December 31, 2019. A salient future research direction would be to tune specific ambiguity sets within the DR-MAGHP in response to the prediction performance at individual airports. Furthermore, these performance metrics can guide future prediction model enhancements (e.g., using refined prediction models for airports with large RMSEs such as IAD and CLT, or low CR with a large ACIL).

We select Los Angeles International (LAX) Airport as a case study to visualize the prediction results. Figure 3 plots the estimated actual departure capacity (red dots) and the predicted departure capacity distribution for LAX across December 31, 2019. At 18:00 local time, we observe a rapid decline in the predicted capacity, which aligns with the estimated actual capacity: during this time, LAX experienced a sudden change in the wind direction, which most likely triggered a change in the airport runway configuration. Furthermore, the predicted capacity distributions appear to be relatively stable, which reinforces the validity of the scenario reduction procedure described in Section III-C.

Fig. 3: Predicted departure capacity distributions for LAX on December 31, 2019. Blue area from 9:00-21:00 is the 12-hour solution window of the DR-MAGHP.

B. DR-MAGHP optimization

1) Experiment setup: To ensure consistency with the capacity distribution prediction procedure, we also use BTS
Fig. 5: Percent increase in delayed flights under DR-MAGHP versus SP-MAGHP with (a) ϵ = 0.1, (c) ϵ = 0.5; percent decrease in delayed flights under DR-MAGHP versus SP-MAGHP with (b) ϵ = 0.1, (d) ϵ = 0.5.

Fig. 6: Out-of-sample performances of SP-MAGHP and DR-MAGHP for two levels of capacity reductions: (a) 10% reduction in capacities; (b) 50% reduction in capacities.

TABLE III: Sensitivity analysis results.

Reduction    ϕ_OS^sp       ϕ_OS^dr       ϵ∗      % dec.
10%          331469.45     331443.65     0.01    0.007
20%          395454.05     395420.20     0.01    0.009
30%          473078.15     473068.50     0.02    0.002
40%          566420.20     566335.80     0.02    0.015
50%          673100.10     672943.65     0.04    0.023
60%          792804.80     792645.95     0.02    0.020
70%          935837.35     935473.75     0.04    0.038
80%          1105686.30    1104923.10    0.09    0.069
90%          1278823.30    1277608.70    0.09    0.095
100%         1454765.00    1453077.00    0.09    0.116
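As a quick arithmetic check on Table III, the sketch below recomputes the "% dec." column from the two cost columns. The formula used, 100 × (SP cost − DR cost) / SP cost, is an assumption about how that column was derived; the paper does not state it explicitly. Under this assumption, the recomputed values agree with the reported ones to within 0.001 percentage points, which suggests the column is expressed in percent (so 0.116 means roughly a tenth of one percent).

```python
# Rows of Table III: (capacity reduction %, SP-MAGHP cost, DR-MAGHP cost, reported % dec.)
TABLE_III = [
    (10, 331469.45, 331443.65, 0.007),
    (20, 395454.05, 395420.20, 0.009),
    (30, 473078.15, 473068.50, 0.002),
    (40, 566420.20, 566335.80, 0.015),
    (50, 673100.10, 672943.65, 0.023),
    (60, 792804.80, 792645.95, 0.020),
    (70, 935837.35, 935473.75, 0.038),
    (80, 1105686.30, 1104923.10, 0.069),
    (90, 1278823.30, 1277608.70, 0.095),
    (100, 1454765.00, 1453077.00, 0.116),
]


def percent_decrease(sp_cost: float, dr_cost: float) -> float:
    """Relative cost reduction of DR-MAGHP versus SP-MAGHP, in percent.

    Assumed formula (not stated in the paper): 100 * (sp - dr) / sp.
    """
    return 100.0 * (sp_cost - dr_cost) / sp_cost


if __name__ == "__main__":
    for reduction, sp_cost, dr_cost, reported in TABLE_III:
        computed = percent_decrease(sp_cost, dr_cost)
        print(f"{reduction:>3}%  computed={computed:.4f}  reported={reported:.3f}")
```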
by-day basis, just as unique GDP policies must be developed for each encountered NAS scenario.

3) Sensitivity to distribution shifts: To evaluate how variations in the discrepancies between predicted and realized capacity distributions influence delay costs, and to determine the effectiveness of DR-MAGHP in mitigating these effects, we perform a sensitivity analysis. This analysis generates testing distributions at various levels of capacity reductions to compare the out-of-sample performance of SP-MAGHP and DR-MAGHP.

To sample from reduced capacity distributions at various reduction levels, we introduce a linear program in (6) that performs valid adjustments to the PMFs to minimize the deviation between the current mean value and the targeted (reduced) mean value. We also introduce a parameter δ for the maximum variability rate to ensure a more uniform distribution of probability mass. The details of this procedure are outlined in Algorithm 1. We then draw 100 samples from the reduced capacity distributions, and evaluate the out-of-sample performance $\phi_{OS}(\bar{x}) = \sum_{i=1}^{|\bar{\xi}|} \widehat{\phi_{OS}}(\bar{x}, \bar{\xi}_i) / |\bar{\xi}|$, where $\widehat{\phi_{OS}}(\bar{x}, \bar{\xi}_i)$ is the objective function value of the derived ground holding policy $\bar{x}$ with reduced capacity sample $\bar{\xi}_i$.

We examine erroneous predictions where realized capacities are reduced between 10-100% (note that a GDP with 0 AAR is essentially a Ground Stop). When the realized capacities are only slightly lower than forecast (i.e., 10-20% reductions), only a small ambiguity set (i.e., small ϵ) is needed to modestly outperform SP-MAGHP. As the realized capacities move farther away from predicted ones, larger ambiguity sets are needed. It is important to note that the out-of-sample performances of SP-MAGHP and DR-MAGHP, when radii are set to zero, might not align, even if their in-sample performances appear identical under the same conditions. This discrepancy arises because the specific delay assignment policies implemented by SP-MAGHP and DR-MAGHP based on predicted distributions may vary. Two such policies can yield identical in-sample performance yet diverge in out-of-sample performance.

Table III gives the out-of-sample performances (e.g., delay assignment costs for SP-MAGHP and DR-MAGHP, respectively) for different realized capacity distributions with reduced capacities. The best ambiguity set size ϵ∗ is also given. Although the cost reductions may seem modest when utilizing DR-MAGHP, we note that this was across an arbitrary day (December 31, 2019) in the NAS. A full analysis of the advantages gained by robustifying ground holding decisions with ML-driven airport capacity prediction inputs will require performing this experiment across a longer time horizon.

The goal of this paper is to provide the foundations for robust, learning-driven optimization approaches to strategic ATM considering uncertain airport capacities. Should users trust predictive models, then the robustness aspects can (and should, to avoid overly-conservative solutions) be dialed down. On the other hand, should users not trust prediction model outputs (e.g., current conditions are particularly unstable, or the prediction horizon is long), distributionally robust approaches can be considered.
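The out-of-sample evaluation used in the sensitivity analysis (averaging the per-sample objective value of a fixed ground holding policy over sampled capacities) can be sketched as follows. This is only an illustration under stated assumptions: the capacity PMF, the `toy_cost` function, and its cost coefficient are hypothetical stand-ins, and the linear program (6) and Algorithm 1 that produce the reduced distributions are not reproduced here.

```python
import random
from typing import Callable, Sequence


def out_of_sample_performance(cost_fn: Callable[[int], float],
                              samples: Sequence[int]) -> float:
    """Average the per-sample cost of a fixed policy over all capacity samples.

    Mirrors phi_OS(x) = sum_i phi_hat_OS(x, xi_i) / |xi|, with cost_fn playing
    the role of phi_hat_OS for an already-derived policy x.
    """
    if not samples:
        raise ValueError("at least one capacity sample is required")
    return sum(cost_fn(s) for s in samples) / len(samples)


def toy_cost(planned_arrivals: int, airborne_cost: float = 3.0) -> Callable[[int], float]:
    """Hypothetical per-sample cost: each planned arrival above the realized
    capacity incurs a fixed airborne delay cost."""
    def cost(realized_capacity: int) -> float:
        return airborne_cost * max(0, planned_arrivals - realized_capacity)
    return cost


# Hypothetical reduced-capacity PMF for one time period (arrivals per period).
support = [6, 8, 10, 12]
pmf = [0.1, 0.2, 0.4, 0.3]

# Draw 100 capacity samples, as in the experiment; the seed is fixed for repeatability.
rng = random.Random(0)
samples = rng.choices(support, weights=pmf, k=100)

print(out_of_sample_performance(toy_cost(planned_arrivals=10), samples))
```

In the paper's setting, the same averaging would be applied to the policies derived by SP-MAGHP and DR-MAGHP on identical capacity samples, so that differences in the averages reflect the policies rather than sampling noise.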