Learning Optimal Objective Values For MILP.18321v1
Learning Optimal Objective Values For MILP.18321v1
A BSTRACT
Modern Mixed Integer Linear Programming (MILP) solvers use the Branch-and-Bound algorithm
together with a plethora of auxiliary components that speed up the search. In recent years, there
has been an explosive development in the use of machine learning for enhancing and supporting
these algorithmic components [18]. Within this line, we propose a methodology for predicting the
optimal objective value, or, equivalently, predicting if the current incumbent is optimal. For this
task, we introduce a predictor based on a graph neural network (GNN) architecture, together with
a set of dynamic features. Experimental results on diverse benchmarks demonstrate the efficacy of
our approach, achieving high accuracy in the prediction task and outperforming existing methods.
These findings suggest new opportunities for integrating ML-driven predictions into MILP solvers,
enabling smarter decision-making and improved performance.
1 Introduction
Mixed Integer Linear Programming (MILPs) is a widespread tool for modelling mathematical optimization problems,
with applications in numerous real-world scenarios. The Branch-and-Bound (B&B) algorithm, which employs a
divide-and-conquer approach, is the preferred method for solving MILPs to global optimality. In recent years, there
has been a surge in interest in harnessing the power of machine learning (ML) tools to aid the solution process of
MILPs. From solution prediction (e.g. [6, 15, 20]) to interventions on the heuristic rules used by the solvers (e.g.
[9, 4, 16]), several approaches have been studied in the literature (see Scavuzzo et al. [18] for an in-depth discussion
of this topic). The overarching trend is to build dynamic MILP solvers that can make active use of the large amounts
of data produced during the solving process.
Many of the decisions that must be made during the B&B process could be better informed were the optimal solution
known from the start. In fact, even knowing the optimal objective value can positively influence the solver behaviour.
For example, once a solution is found that matches this value, any effort to find new solutions can be avoided. With
perfect information of the optimal objective value, a solver can further do more aggressive pruning of nodes. In
general, having this knowledge can allow the solver to adapt its configuration, putting more emphasis on different
components. Even in absence of perfect information, a good prediction of the optimal objective value can still be used
to change the solver settings or to devise smarter rules, such as node selection policies that account for this predicted
value. Inspired by these observations we ask the two following closely-related questions:
(Q1) How well can we predict the optimal objective value?
(Q2) With what accuracy can we predict, during the solution process, whether or not a given solution is optimal?
Our contributions are as follows. First, we propose a methodology to predict optimal objective values, answering (Q1).
We then use the output of this predictive model, together with additional data, as input of our proposed classifiers,
which give an answer to question (Q2). For this second task, we also propose some metrics that capture the state of the
solving process, and that prove to be valuable for our classifier. Our computational study shows the high accuracy of
our proposed predictor. Furthermore, when compared to previous methods, our classifiers show better performance.
Finally, we provide further insight into how the performance can be tuned to the desired behaviour and into the ways
that the classifier makes use of the provided data.
Our discussion is organized as follows. We start by defining some key concepts and notation in Section 2, followed
by a discussion of the work most closely related to ours (Section 3). Section 4 describes our methodology in detail.
The results of our computational study are presented in Section 5. Finally, we conclude with some final remarks and
future work in Section 6. The code to reproduce all experiments is available online [17].
2 Background
Mixed Integer Linear Programming Given are a matrix A ∈ Qm×n , vectors c ∈ Qn and b ∈ Qm , and a partition
(I, C) of the variable index set {1, 2, ..., n}. A Mixed Integer Linear Program is the problem of finding
T
z ∗ = min c x
subject to Ax ≥ b,
(1)
xj ∈ Z≥0 ∀j ∈ I,
xj ≥ 0 ∀j ∈ C.
Notice that the variables in I are required to be integer. Removing this integrality constraint turns the problem into a
Linear Program (LP), which constitutes a relaxation of the original MILP, known as the LP relaxation. While MILP
is N P-hard, LPs are polynomial solvable.
Solving Mixed Integer Linear Programs The standard approach to solving MILPs is to use the LP-based branch-
and-bound (B&B) algorithm. This algorithm sequentially partitions the feasible region, while using LP relaxations to
obtain lower bounds on the quality of the solutions of each sub-region. This search can be represented as a binary1 tree.
At a given time t of the solution process we use Tt to denote the search tree, i.e. the set of nodes, constructed so far by
the B&B algorithm. We denote by x∗ the optimal solution to Problem (1) and z ∗ its corresponding optimal objective
value. For a given node i of the search tree, let ziLP be the optimal objective value of the node’s LP relaxation. We
use the notation z LP for the root node, i.e., the solution to the original problem’s LP relaxation. At any point of the
search, an integer feasible solution provides an upper bound on the optimal objective value. Let x̄(t) be the best known
T
solution at time t and let z̄(t) = c x̄(t) denote its objective value (also called the incumbent). Then we can prune any
LP
node i such that zi ≥ z̄(t).
The nodes of Tt can be classified into three types:
• It is the set of inner nodes of the tree. This is, nodes that have been processed (its LP relaxation solved) and
resulted in branching.
• Lt is the set of leaves of the tree. This is, the set of nodes that have been processed and resulted in pruning or
in an integer feasible solution.
• Ot is the set of open nodes, i.e., nodes that have not been processed yet.
As mentioned before, the incumbent z̄(t) provides an upper bound on z ∗ . We can also obtain a global lower bound.
Let z(t) := mini∈Ot {ziLP }. Then notice that necessarily z(t) ≤ z ∗ .
In practice, MILP solvers implement a plethora of other techniques to accelerate the solution process. Among them,
cutting planes and primal heuristics are essential parts of today’s mathematical optimization software.
MILP solving phases The B&B algorithm can solve MILPs to optimality. This means that, if the algorithm termi-
nates, it does so after having obtained a feasible solution and a proof of its optimality (or, on the contrary, proof of
infeasibility). Several solver components work together for this goal, each with more or less focus on the feasibility
and the optimality parts. Berthold et al. [2] point out that, typically, the optimal solution is found well before the solver
can prove optimality. Following this, they propose partitioning the search process into phases, according to three target
goals. These phases are the following.
1. Feasibility. This phase encompasses the time spanned from the beginning of the search until the first feasible
solution is found.
2. Improvement. From the moment the first feasible solution is found until an optimal solution is found.
3. Proving. Spans the time elapsed from the moment the optimal solution is found until the solver terminates
with a proof of optimality.
1
Standard implementations of the B&B algorithm use single-variable disjunctions that partition the feasible set into two. Other
approaches exist but are, to the best of our knowledge, not implemented in standard optimization software.
2
The transition between the first and the second phase happens when a feasible solution is found. In contrast, the
moment in which the solver transitions from the second to the third phase is unknown until the search is completed.
Notice that if the instance is infeasible the solver terminates while in the first phase. For the purpose of our study we
assume that the instances are feasible.
3 Related Work
MILP solution prediction In recent years, the topic of solution prediction for combinatorial optimization problems
has gained momentum [19, 7, 10]. For MILPs, the goal is to produce a (partial) assignment of the integer variables
via a predictive machine learning model. This prediction can then be used to guide the search in different ways. Ding
et al. [6] impose a constraint that forces the search to remain in a neighbourhood of the predicted optimal solution. In
this way, by restricting the size of the feasible region, the authors aim to accelerate the solution process. In contrast,
the approaches of Nair et al. [15] and Khalil et al. [12] consist in fixing a subset of variables to their predicted optimal
value, letting the solver optimize over the remaining ones. Khalil et al. [12] further propose a solver mode that uses
the predicted solution to guide the node processing order. In the present work, we take a different path by aiming to
predict the optimal objective value, as opposed to the solution, i.e., the values that each variable takes. This task is
easier from a learning perspective, yet still offers several ways in which one can exploit this information.
Phase transition predictions Berthold et al. [2] defined the three phases of MILP solving that were introduced
in Section 2. Their goal is to adapt the solver’s strategy depending on the phase. For this purpose, they propose
two criteria that can be used to predict the transition between phase 2 (improvement) and phase 3 (proving) without
knowledge of the optimal solution. These criteria are based on node estimates: for every node i ∈ Tt , the solver SCIP
keeps an estimate ĉ(i) of the objective value of the best solution attainable at that node (see [2] for a formal definition
of how this is computed). At time t of the solving process, let ĉmin (t) := min{ĉ(i) | i ∈ Ot } be the minimum of
these estimates among the open nodes. We further define d(i) to be the depth2 of node i. The first transition criterion,
the best-estimate criterion, indicates that the transition moment is the first time the incumbent becomes smaller than
ĉmin (t). Formally, let us define a binary classifier C est that indicates if the transition has occurred using the criterion
The second criterion is called rank-1 and is based on the set of open nodes with better estimate than the processed
nodes at the same depth. Formally, let
n o
R1 (t) := i ∈ Ot | ĉ(i) ≤ inf{ĉ(j) : j ∈ It ∪ Lt , d(j) = d(i)}
This set can be used to define a classifier C rank-1 that indicates that the transition has occurred once the set becomes
empty for the first time. This is,
The authors use these criteria to switch between different pre-determined solver settings depending on the phase of
solving. Their experiments show improved solving time, especially when using the rank-1 criterion. However, it is
also clear that both criteria tend to be satisfied before the phase transition actually occurs, and there is some room for
improvement in the accuracy of the classifiers, as we shall see from our own computational study.
B&B resolution predictions Closely related to the present work is that of Hendel et al. [11], who use a number
of solver metrics to predict the final B&B tree size. They use a combination of metrics from the literature, together
with their own, as input to a machine learning model that estimates the final tree size dynamically as the tree is being
constructed. Their method was incorporated into version 7.0 of the solver SCIP as a progress metric for the user. In
a similar fashion, Fischetti et al. [8] use a number of solver metrics to predict, during the solving process, whether or
not the run will end within the given time limit. This prediction can be used to adapt the solver behaviour in the case
that the answer is negative.
2
We define the depth of a node as its distance to the root node. Therefore, by definition, the depth of the root node is zero.
3
MILP Read problem Presolving
Root node
processing
B&B process z*
GNN z̃*
Figure 1: Optimal objective value prediction task. The MILP representation is computed after the root node has been
processed. This serves as an input to a GNN that outputs a prediction z̃ ∗ of the optimal objective value.
4 Methodology
This section details the methodology used to answer questions Q1 (Section 4.1) and Q2 (Section 4.2). We assume
we are given a space X of instances of interest. For some tasks, we will use the bipartite graph representation of
MILPs introduced by Gasse et al. [9]. This is, given an MILP instance X ∈ X defined as in Eq. 1, we build a graph
representation as follows: each constraint and each variable have a corresponding representative node. A constraint
node is connected to a variable node if the corresponding variable has a non-zero coefficient in the corresponding
constraint. Each node has an associated vector of features that describes it. We utilize the same features as Gasse et
al., except that we do not include any incumbent information. In short, instead of the raw data in X ∈ X we use the
graph representation, which we denote XG ∈ XG , and is composed of a tuple XG = (C, V , A), where C ∈ Rm×dc
and V ∈ Rn×dv represent the constraint and variable features, respectively, and A ∈ Rm×n is the adjacency matrix.
The first task we tackle is the one of predicting the optimal objective value (Q1). That is, given an MILP in-
stance X ∈ X , we want to predict the optimal objective value z ∗ . This prediction is computed once and for all at
the root node, once the LP solution is available. We frame this as a regression task. This process is depicted in Figure 1.
For this regression task, we utilise the bipartite graph representation of Gasse et al. [9] defined above, which is pro-
cessed using a Graph Neural Network (GNN) that performs two half-convolutions. In particular, the feature matrices
C and V first go through an embedding layer with two feedforward networks with ReLU activation. Next, one
first pass updates the constraint descriptors using the variable descriptors, while a second pass updates the variable
descriptors using the (new) constraint descriptors. This is done with message-passing operations, computed as
n
X
c′i = W (11) ci + W (12) Aij v j (4)
j=1
m
X
v ′j = W (21) v j + W (22) Aij c′i (5)
i=1
where W (11) , W (12) , W (21) and W (22) are trainable weights, ci is the feature vector of constraint i and v j is
the feature vector of variable j. The variable descriptors then go through another feedforward network with ReLU
activation. Finally, average pooling is applied to obtain one single output value.
Our goal is to learn a mapping f (Xg ) : XG 7→ R which outputs an approximation z̃ ∗ of the optimal objective value z ∗ .
At the moment of this prediction, the solution to the root LP relaxation is known and can be used for further context.
In order to exploit that knowledge, we test three potential targets for the machine learning model, namely
Θ1 = z ∗
4
z∗
Θ2 =
z LP
Θ3 = z ∗ − z LP .
This gives rise to three models f1 (Xg ), f2 (Xg ) and f3 (Xg ), which we later transform into the desired output by
setting either f (Xg ) = f1 (Xg ), f (Xg ) = f2 (Xg ) · z LP , or f (Xg ) = f3 (Xg ) + z LP .
The second task (Q2) is predicting the transition between phases 2 (improvement) and 3 (proving). That is, at any
point during the solution process we want to predict whether the incumbent is in fact optimal. We cast this problem
as a classification task.
We test the performance of two classifiers. The first one is based on the output of the GNN model discussed in Section
4.1. Given an instance X ∈ X (in fact its associated graph representation XG ) and the current incumbent z̄, we obtain
a binary prediction CϵGN N : XG × R 7→ {0, 1} in the following way
GN N 1 if z̄ < f (XG ) + ϵ · |f (XG )|
Cϵ (XG , z̄) = (6)
0 otherwise
for some ϵ ∈ [−1, 1]. The ϵ parameter allows us to control the confidence in the prediction.
The CϵGN N classifier is static, in the sense that it does not make use of any information coming from the B&B process.
On the contrary, the second predictor we propose, which we call C D , is based on a set of dynamic metrics that are
collected during the solving process. The metrics are the following.
Tree weight For a given node v ∈ Tt , let d(v) denote the node’s depth. Then, the tree weight at time t is defined as
X
ω(t) := 2−d(v) . (8)
v∈Lt
Median gap Let m(t) = median{ziLP | i ∈ Ot } and let z̄ 0 be the first incumbent found. We define the median gap
as
|z̄(t) − m(t)|
µ(t) = (9)
|z̄ 0 − z LP |
Trend of open nodes For a certain window size h, we store the values of |Ok | for k ∈ {t − h, t − h + 1, ..., t}. We
then fit a linear function using least squares to compute the trend of this sequence. We denote this trend at time t as
τ (t).
Ratio to GNN prediction We make use of the prediction f (XG ) coming from the GNN model and include the ratio
with respect to the current incumbent as a metric. In particular we use
f (XG )
ρ(t) = (10)
z̄(t)
Notice that, while the gap and the tree weight are metrics from the literature, the other three are our own.
The input to the classifier is therefore a tuple XD = (g(t), ω(t), µ(t), τ (t), ρ(t)). We train a classifier C D (XD ) that
makes use of these dynamic features to make a binary prediction on whether we are in phase 2 or 3. We use a simple
logistic regression, which will allow us to more easily interpret the resulting model, in contrast to more complex
machine learning models.
5
5 Computational Results
This section describes our computational setup and results. All experiments were performed with the solver SCIP
v.8.0 [3]. Code for reproducing all experiments in this section is available online [17].
Benchmarks We use three NP-hard problem benchmarks from the literature: set covering, combinatorial auctions
and generalized independent set problem (GISP). We create a fourth benchmark (mixed) that is comprised of instances
of the three types, in equal proportion. The method and configuration used for generation of the instances is summa-
rized in Table 1. For each instance type, we generate 10,000 instances for training, 2000 instances for validation and
another 2000 for testing.
Phase analysis As a first approach to the instances, we run an experiment to analyze the breakdown into solving
phases. We solve 100 of the training instances, each with 3 different randomization seeds, which gives us a total of
300 data points per benchmark. During the solution process we record the time when branching starts, the time when
the first solution is found, the time when a solution within 5% of the optimal is found, and the time when the optimal
solution is found. This allows us to compute the percentage of time spent on each phase, and the percentage of time
spent branching versus before branching (i.e., pre-processing the instance and processing the root node). We average
these numbers over the 300 samples to obtain a view of the typical behaviour of the solver on each benchmark. We
further divide phase 2 (improvement) into two sub-phases: (2a) from the first feasible solution to the first feasible
solution with objective value within 5% of the optimal, and (2b) which encompasses the rest of phase 2. The results
are shown in Figure 2. We observe the following. For all benchmarks, obtaining a feasible solution is trivial. For set
covering instances, the optimal solution is often known by the time that branching starts. In the case of combinatorial
auctions, the optimal solution is typically not known at the start of B&B, but a good solution is. For GISP, finding
optimal, or even good, solutions is not as easy, making the proving phase relatively shorter. We conclude that these
benchmarks allow us to test our methodology on three very different settings that may arise in a real-life situation.
Data collection procedure For each instance, we collect information at the root node: the bipartite graph represen-
tation XG = (C, V , A) and the optimal root LP value z LP . We then proceed to solve the instance. For the first 100
processed nodes and as long as no incumbent exists, no samples are collected. This allows us to initialize statistics
as the trend of open nodes τ (t), and to ignore instances that are solved within 100 nodes which are therefore too
easy. After 100 nodes have been processed and an incumbent exists, we collect samples with a probability of 0.02.
At sampling time, we record the value of the dynamic features (see Section 4.2), as well as the incumbent value z̄(t).
Once the instance is solved, the collected samples are completed by appending the root node information (XG , z LP )
as well as the optimal objective value z ∗ , which will be used as a target.
Optimal objective value prediction (Q1) We test the prediction accuracy of our GNN model on the four bench-
marks. We train a model for each of the targets described in Section 4. We measure the error as
N
1 X |zi∗ − z̃i∗ |
e = 100 × (11)
N i=1 |zi∗ |
Table 1: Method and configuration settings used to generate the instances of problem benchmark.
6
Phase 1: 0.0% Phase 1: 0.0%
Phase 2a: 11.9%
Phase 2a: 19.2%
Phase 3: 33.3%
First branching occurs
Phase 3: 55.1%
GISP
Phase 2b: 43.9%
(c) GISP
Figure 2: Phase analysis of three instance types. We divide the solution process into (1) Feasibility, in dark yellow,
(2a) Improvement up to 5% to optimality, in light yellow, (2b) Improvement from 5% to optimal, in light purple, and
(3) Proving, in dark purple. We also indicate when the first branching occurs. The data is averaged over 100 instances
with 3 randomization seeds (i.e., 300 samples).
where N is the number of samples, zi∗ is the optimal objective value of sample i and z̃i∗ is the predicted optimal
objective value of sample i. Notice that, independently of the learning target, we measure the error in the space of the
original prediction we want to make.
Prediction of phase transition (Q2) We make a prediction on whether we have transitioned to phase 3 (optimal
solution has been found). We compare the performance of four predictors. The first two predictors are the ones
proposed by Berthold et al. [2], namely C est (best-estimate, see Eq. 2) and C rank-1 (rank-1, see Eq. 3). The third
predictor CϵGN N is based on the GNN regression model, as described in Eq. 6. We report the performance of this
classifier with ϵ = 0 and with a tuned value ϵ∗ which was obtained by optimizing the accuracy with a small grid
search over the range [−0.02, 0.02] on the validation set. The fourth predictor C D is based on the dynamic features,
as described in Section 4.2.
5.2 Results
Tables 2 and 3 show the results of the optimal objective value prediction task. The GNN models tested in Table 2
were trained and tested on instances of the same type. On the contrary, the results of Table 3 correspond to one
unique model that was trained in the mixed dataset, and then tested on different benchmarks. First, we observe that
using targets that include LP information (Θ2 and Θ3 ) is beneficial to performance, as opposed to directly trying to
predict the optimal objective value (Θ1 ). There is no clear winner among targets Θ2 and Θ3 . Second, we observe
that the generalist model, the one trained on the mixed dataset, performs comparably to the specialized models, even
outperforming them in some cases.
We now select one GNN model per benchmark to be used in the next prediction task: the phase transition prediction.
We select the model in the following way: we use the specialized model that achieves the best result on the validation
set. Figure 3 (a-c) shows the results for all classifiers on the pure benchmarks (see Table 4 for the same results in table
form). Further, we include a column that shows the classification accuracy of a dummy model that always predicts
the majority class. We observe that the classifiers of Berthold et al. [2] (best-estimate and rank-1) tend to predict the
7
Table 2: Average relative error (as defined in Eq. 11) of the GNN model. One model was trained per benchmark. The
train and test instances in each case are of the same type.
Instances Θ1 Θ2 Θ3
Set covering 1.48 0.80 0.54
Combinatorial auctions 3.20 0.55 0.62
GISP 3.32 2.35 2.39
Table 3: Average relative error (as defined in Eq. 11) of the GNN mixed model. Only one model was trained on a
dataset comprised of intances of all types. The test sets are comprised of instances of one type only, except for the
mixed test set (last row).
Instances Θ1 Θ2 Θ3
Set covering 1.35 0.73 0.82
Combinatorial auctions 3.15 1.17 0.53
GISP 3.17 2.32 2.43
Mixed test set 1.70 0.97 0.75
phase transition too early. This is, they mostly output a positive prediction, which means they believe the incumbent
to be optimal. This results in the misclassified samples being almost exclusively false positives. On the contrary, the
GNN model C0GN N tends to be too pessimistic, which can be fixed with the right tuning of the ϵ parameter. For all
benchmarks, CϵGN∗
N
performs better than the classifiers of Berthold et al. [2]. At the same time, the inclusion of the dy-
namic features (C ) further improves the performance, except for set covering where CϵGN
D
∗
N
and C D are close to a tie.
It is important to notice that, depending on the application, false positives and false negatives could have very
different consequences. As an example, if the phase transition prediction is used to change the behaviour of the primal
heuristics (e.g. switch them off once the optimal is found) a false positive could excessively delay finding the optimal
solution and therefore has a much bigger potential of harming performance than a false negative. The parameter ϵ
provides an easy way to navigate this tradeoff, where one could sacrifice some accuracy to keep the rate of false
positives to a minimum.
Figure 3d shows the same experiment but on a mixed dataset. This is, the models were trained and tested on a
benchmark comprised of instances of all three types (in equal proportion). We observe a similar behaviour compared
to the specialized benchmarks. The GNN model C0GN N tends to be too pessimistic, while CϵGN ∗
N
achieves better
accuracy and better false positive rate than the classifiers of Berthold et al. [2]. Using dynamic features further
improves the accuracy of the model.
Finally, we analyze the importance of the dynamic features assigned by the C D classifier (Figure 4). We see that the
four learned models are in fact very different, with the GISP model mostly making decisions based on the gap and the
other three considering all features more uniformly. This speaks in favour of learning on sets of instances of the same
type.
6 Conclusions
In this paper, we presented our methodology for predicting the optimal objective value of MILPs. Compared to the
literature on predicting optimal solutions, our learning task is easier, yet still offers a variety of possibilities for its
application within MILP solvers. Our methods can be used to both predict the optimal objective value and to classify a
feasible solution into optimal or sub-optimal. Our computational study shows that our proposed approach outperforms
the existing approaches in the literature. Further, they provide more flexibility to tune the model into the desired
behaviour. We show that there are benefits to learning a model that specializes to an instance type, yet our model is
still able to generalize well and have superior performance to other methods on mixed instance sets.
These results open the door for many possible applications. In general terms, this prediction can be used to adapt the
behaviour of the different solver components and rules depending on the solving phase. These applications, however,
require further study and will be the subject of future work.
8
1.0 1.0
correct
fp
fn
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
correct
fp
fn
0.0 0.0
majority C est C rank 1 C0GNN C GNN
* CD majority C est C rank 1 C0GNN C GNN
* CD
1.0 1.0
correct correct
fp fp
fn fn
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0.0 0.0
majority C est C rank 1 C0GNN C GNN
* CD majority C est C rank 1 C0GNN C GNN
* CD
9
Table 4: Prediction accuracy of the different classifier models. We show the fraction of correctly classified samples,
the fraction of false positives and the fraction of false negatives.
Correct False positives False negatives
Majority 0.89 0.11 0.00
C est 0.91 0.09 0.00
C rank-1 0.92 0.08 0.00
C0GNN 0.52 0.05 0.43
CϵGNN
∗ 0.93 0.04 0.03
CD 0.90 0.05 0.05
Set covering
10
g(t)
(t)
(t)
(t)
(t)
11
References
[1] E. Balas and A. Ho. Set covering algorithms using cutting planes, heuristics, and subgradient optimization: a
computational study. In Combinatorial Optimization, pages 37–60. Springer, 1980.
[2] T. Berthold, G. Hendel, and T. Koch. From feasibility to improvement to proof: three phases of solving mixed-
integer programs. Optimization Methods and Software, 33(3):499–517, 2018.
[3] K. Bestuzheva, M. Besançon, W.-K. Chen, A. Chmiela, T. Donkiewicz, J. van Doornmalen, L. Eifler, O. Gaul,
G. Gamrath, A. Gleixner, L. Gottwald, C. Graczyk, K. Halbig, A. Hoen, C. Hojny, R. van der Hulst, T. Koch,
M. Lübbecke, S. J. Maher, F. Matter, E. Mühmer, B. Müller, M. E. Pfetsch, D. Rehfeldt, S. Schlein, F. Schlösser,
F. Serrano, Y. Shinano, B. Sofranac, M. Turner, S. Vigerske, F. Wegscheider, P. Wellner, D. Weninger, and
J. Witzig. The SCIP Optimization Suite 8.0. Technical report, Optimization Online, December 2021. URL
http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
[4] A. Chmiela, E. Khalil, A. Gleixner, A. Lodi, and S. Pokutta. Learning to schedule heuristics in branch and bound.
Advances in Neural Information Processing Systems, 34:24235–24246, 2021.
[5] M. Colombi, R. Mansini, and M. Savelsbergh. The generalized independent set problem: Polyhedral analysis
and solution approaches. European Journal of Operational Research, 260(1):41–55, 2017.
[6] J.-Y. Ding, C. Zhang, L. Shen, S. Li, B. Wang, Y. Xu, and L. Song. Accelerating primal solution findings
for mixed integer programs based on solution prediction. In Proceedings of the AAAI conference on artificial
intelligence, volume 34, pages 1452–1459, 2020.
[7] N. Efthymiou and N. Yorke-Smith. Predicting the optimal period for cyclic hoist scheduling problems. In
Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), volume
13884 of Lecture Notes in Computer Science, pages 238–253. Springer, 2023.
[8] M. Fischetti, A. Lodi, and G. Zarpellon. Learning MILP resolution outcomes before reaching time-limit. In
Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), volume 16,
pages 275–291. Springer, 2019.
[9] M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi. Exact combinatorial optimization with graph convo-
lutional neural networks. Advances in Neural Information Processing Systems, 32, 2019.
[10] Q. Han, L. Yang, Q. Chen, X. Zhou, D. Zhang, A. Wang, R. Sun, and X. Luo. A GNN-guided predict-and-search
framework for mixed-integer linear programming. In International Conference on Learning Representations,
2023. URL https://api.semanticscholar.org/CorpusID:256827203.
[11] G. Hendel, D. Anderson, P. Le Bodic, and M. E. Pfetsch. Estimating the size of branch-and-bound trees. IN-
FORMS Journal on Computing, 34(2):934–952, 2022.
[12] E. B. Khalil, C. Morris, and A. Lodi. MIP-GNN: A data-driven framework for guiding combinatorial solvers.
AAAI, 2022.
[13] P. Kilby, J. Slaney, S. Thiébaux, T. Walsh, et al. Estimating search tree size. In Proceedings of the AAAI
Conference on Artificial Intelligence, 2006.
[14] K. Leyton-Brown, M. Pearson, and Y. Shoham. Towards a universal test suite for combinatorial auction algo-
rithms. In Proceedings of the 2nd ACM conference on Electronic commerce, pages 66–76, 2000.
[15] V. Nair, S. Bartunov, F. Gimeno, I. von Glehn, P. Lichocki, I. Lobov, B. O’Donoghue, N. Sonnerat, C. Tjandraat-
madja, P. Wang, et al. Solving mixed integer programs using neural networks. arXiv preprint arXiv:2012.13349,
2020.
[16] M. B. Paulus, G. Zarpellon, A. Krause, L. Charlin, and C. Maddison. Learning to cut by looking ahead: Cutting
plane selection via imitation learning. In International Conference on Machine Learning, pages 17584–17600.
PMLR, 2022.
[17] L. Scavuzzo. Code for the paper “Learning optimal objective values for MILP”, 2024. https://github.com/
lascavana/ObjValPrediction.
[18] L. Scavuzzo, K. Aardal, A. Lodi, and N. Yorke-Smith. Machine learning augmented branch and bound for mixed
integer linear programming. Mathematical Programming, pages 1–44, 2024.
[19] Y. Shen, Y. Sun, X. Li, A. C. Eberhard, and A. T. Ernst. Adaptive solution prediction for combinato-
rial optimization. European Joural of Operational Research, 309:1392–1408, 2022. URL https://api.
semanticscholar.org/CorpusID:256358882.
[20] N. Sonnerat, P. Wang, I. Ktena, S. Bartunov, and V. Nair. Learning a large neighborhood search algorithm for
mixed integer programs. arXiv preprint arXiv:2107.10201, 2021.
12