Learning Optimal Objective Values for MILP

Lara Scavuzzo (TU Delft), Karen Aardal (TU Delft), Neil Yorke-Smith (TU Delft)
arXiv:2411.18321v1 [math.OC] 27 Nov 2024

Abstract
Modern Mixed Integer Linear Programming (MILP) solvers use the Branch-and-Bound algorithm
together with a plethora of auxiliary components that speed up the search. In recent years, there
has been an explosive development in the use of machine learning for enhancing and supporting
these algorithmic components [18]. Within this line, we propose a methodology for predicting the
optimal objective value, or, equivalently, predicting if the current incumbent is optimal. For this
task, we introduce a predictor based on a graph neural network (GNN) architecture, together with
a set of dynamic features. Experimental results on diverse benchmarks demonstrate the efficacy of
our approach, achieving high accuracy in the prediction task and outperforming existing methods.
These findings suggest new opportunities for integrating ML-driven predictions into MILP solvers,
enabling smarter decision-making and improved performance.

1 Introduction
Mixed Integer Linear Programming (MILP) is a widespread tool for modelling mathematical optimization problems,
with applications in numerous real-world scenarios. The Branch-and-Bound (B&B) algorithm, which employs a
divide-and-conquer approach, is the preferred method for solving MILPs to global optimality. In recent years, there
has been a surge in interest in harnessing the power of machine learning (ML) tools to aid the solution process of
MILPs. From solution prediction (e.g. [6, 15, 20]) to interventions on the heuristic rules used by the solvers (e.g.
[9, 4, 16]), several approaches have been studied in the literature (see Scavuzzo et al. [18] for an in-depth discussion
of this topic). The overarching trend is to build dynamic MILP solvers that can make active use of the large amounts
of data produced during the solving process.
Many of the decisions that must be made during the B&B process could be better informed were the optimal solution
known from the start. In fact, even knowing the optimal objective value can positively influence the solver behaviour.
For example, once a solution is found that matches this value, any effort to find new solutions can be avoided. With
perfect information of the optimal objective value, a solver can further do more aggressive pruning of nodes. In
general, having this knowledge can allow the solver to adapt its configuration, putting more emphasis on different
components. Even in absence of perfect information, a good prediction of the optimal objective value can still be used
to change the solver settings or to devise smarter rules, such as node selection policies that account for this predicted
value. Inspired by these observations we ask the two following closely-related questions:
(Q1) How well can we predict the optimal objective value?
(Q2) With what accuracy can we predict, during the solution process, whether or not a given solution is optimal?
Our contributions are as follows. First, we propose a methodology to predict optimal objective values, answering (Q1).
We then use the output of this predictive model, together with additional data, as input of our proposed classifiers,
which give an answer to question (Q2). For this second task, we also propose some metrics that capture the state of the
solving process, and that prove to be valuable for our classifier. Our computational study shows the high accuracy of
our proposed predictor. Furthermore, when compared to previous methods, our classifiers show better performance.
Finally, we provide further insight into how the performance can be tuned to the desired behaviour and into the ways
that the classifier makes use of the provided data.
Our discussion is organized as follows. We start by defining some key concepts and notation in Section 2, followed
by a discussion of the work most closely related to ours (Section 3). Section 4 describes our methodology in detail.
The results of our computational study are presented in Section 5. Finally, we conclude with some final remarks and
future work in Section 6. The code to reproduce all experiments is available online [17].

2 Background
Mixed Integer Linear Programming Given are a matrix $A \in \mathbb{Q}^{m \times n}$, vectors $c \in \mathbb{Q}^n$ and $b \in \mathbb{Q}^m$, and a partition
$(I, C)$ of the variable index set $\{1, 2, \dots, n\}$. A Mixed Integer Linear Program is the problem of finding
$$
\begin{aligned}
z^* = \min\ & c^T x \\
\text{subject to}\ & Ax \ge b, \\
& x_j \in \mathbb{Z}_{\ge 0} \quad \forall j \in I, \\
& x_j \ge 0 \quad \forall j \in C.
\end{aligned} \tag{1}
$$
Notice that the variables in $I$ are required to be integer. Removing this integrality constraint turns the problem into a
Linear Program (LP), which constitutes a relaxation of the original MILP, known as the LP relaxation. While MILP
is NP-hard, LPs are solvable in polynomial time.

Solving Mixed Integer Linear Programs The standard approach to solving MILPs is to use the LP-based branch-and-bound
(B&B) algorithm. This algorithm sequentially partitions the feasible region, while using LP relaxations to
obtain lower bounds on the quality of the solutions of each sub-region. This search can be represented as a binary¹ tree.
At a given time $t$ of the solution process we use $T_t$ to denote the search tree, i.e. the set of nodes, constructed so far by
the B&B algorithm. We denote by $x^*$ the optimal solution to Problem (1) and by $z^*$ its corresponding optimal objective
value. For a given node $i$ of the search tree, let $z_i^{LP}$ be the optimal objective value of the node's LP relaxation. We
use the notation $z^{LP}$ for the root node, i.e., the optimal value of the original problem's LP relaxation. At any point of the
search, an integer feasible solution provides an upper bound on the optimal objective value. Let $\bar{x}(t)$ be the best known
solution at time $t$ and let $\bar{z}(t) = c^T \bar{x}(t)$ denote its objective value (also called the incumbent). Then we can prune any
node $i$ such that $z_i^{LP} \ge \bar{z}(t)$.
The nodes of Tt can be classified into three types:

• $I_t$ is the set of inner nodes of the tree, that is, nodes that have been processed (their LP relaxation solved) and
resulted in branching.
• $L_t$ is the set of leaves of the tree, that is, nodes that have been processed and resulted in pruning or
in an integer feasible solution.
• $O_t$ is the set of open nodes, i.e., nodes that have not been processed yet.

As mentioned before, the incumbent $\bar{z}(t)$ provides an upper bound on $z^*$. We can also obtain a global lower bound.
Let $z(t) := \min_{i \in O_t} \{z_i^{LP}\}$. Then notice that necessarily $z(t) \le z^*$.
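The pruning rule and the global lower bound can be illustrated with a minimal Python sketch. The dictionary of open nodes, the node ids and the numeric values are hypothetical and only serve the illustration; they are not taken from the paper's code.

```python
import math

def global_lower_bound(open_lp_values):
    """z(t): smallest LP value among the open nodes (inf if none is open)."""
    return min(open_lp_values.values(), default=math.inf)

def prunable_nodes(open_lp_values, incumbent):
    """Ids of open nodes whose LP bound cannot improve on the incumbent."""
    return sorted(i for i, z_lp in open_lp_values.items() if z_lp >= incumbent)

# Hypothetical snapshot: node id -> LP relaxation value z_i^LP (minimization).
open_lp_values = {3: 10.5, 4: 12.0, 7: 14.2}
incumbent = 12.0

lower = global_lower_bound(open_lp_values)
pruned = prunable_nodes(open_lp_values, incumbent)
```

In this snapshot the optimal value is bracketed between `lower` and `incumbent`, and the nodes listed in `pruned` can be discarded without losing any improving solution.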
In practice, MILP solvers implement a plethora of other techniques to accelerate the solution process. Among them,
cutting planes and primal heuristics are essential parts of today’s mathematical optimization software.

MILP solving phases The B&B algorithm can solve MILPs to optimality. This means that, if the algorithm termi-
nates, it does so after having obtained a feasible solution and a proof of its optimality (or, on the contrary, proof of
infeasibility). Several solver components work together for this goal, each with more or less focus on the feasibility
and the optimality parts. Berthold et al. [2] point out that, typically, the optimal solution is found well before the solver
can prove optimality. Following this, they propose partitioning the search process into phases, according to three target
goals. These phases are the following.

1. Feasibility. This phase encompasses the time spanned from the beginning of the search until the first feasible
solution is found.
2. Improvement. From the moment the first feasible solution is found until an optimal solution is found.
3. Proving. Spans the time elapsed from the moment the optimal solution is found until the solver terminates
with a proof of optimality.
¹ Standard implementations of the B&B algorithm use single-variable disjunctions that partition the feasible set into two. Other
approaches exist but are, to the best of our knowledge, not implemented in standard optimization software.

The transition between the first and the second phase happens when a feasible solution is found. In contrast, the
moment in which the solver transitions from the second to the third phase is unknown until the search is completed.
Notice that if the instance is infeasible the solver terminates while in the first phase. For the purpose of our study we
assume that the instances are feasible.

3 Related Work
MILP solution prediction In recent years, the topic of solution prediction for combinatorial optimization problems
has gained momentum [19, 7, 10]. For MILPs, the goal is to produce a (partial) assignment of the integer variables
via a predictive machine learning model. This prediction can then be used to guide the search in different ways. Ding
et al. [6] impose a constraint that forces the search to remain in a neighbourhood of the predicted optimal solution. In
this way, by restricting the size of the feasible region, the authors aim to accelerate the solution process. In contrast,
the approaches of Nair et al. [15] and Khalil et al. [12] consist in fixing a subset of variables to their predicted optimal
value, letting the solver optimize over the remaining ones. Khalil et al. [12] further propose a solver mode that uses
the predicted solution to guide the node processing order. In the present work, we take a different path by aiming to
predict the optimal objective value, as opposed to the solution, i.e., the values that each variable takes. This task is
easier from a learning perspective, yet still offers several ways in which one can exploit this information.

Phase transition predictions Berthold et al. [2] defined the three phases of MILP solving that were introduced
in Section 2. Their goal is to adapt the solver’s strategy depending on the phase. For this purpose, they propose
two criteria that can be used to predict the transition between phase 2 (improvement) and phase 3 (proving) without
knowledge of the optimal solution. These criteria are based on node estimates: for every node i ∈ Tt , the solver SCIP
keeps an estimate ĉ(i) of the objective value of the best solution attainable at that node (see [2] for a formal definition
of how this is computed). At time $t$ of the solving process, let $\hat{c}_{\min}(t) := \min\{\hat{c}(i) \mid i \in O_t\}$ be the minimum of
these estimates among the open nodes. We further define $d(i)$ to be the depth² of node $i$. The first transition criterion,
the best-estimate criterion, indicates that the transition moment is the first time the incumbent becomes smaller than
$\hat{c}_{\min}(t)$. Formally, let us define a binary classifier $C^{est}$ that indicates if the transition has occurred using the criterion
$$
C^{est} = \begin{cases} 1 & \text{if } \min_{s \in [0,t]} \{\bar{z}(s) - \hat{c}_{\min}(s)\} < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2}
$$

The second criterion is called rank-1 and is based on the set of open nodes with better estimate than the processed
nodes at the same depth. Formally, let
$$
R_1(t) := \{\, i \in O_t \mid \hat{c}(i) \le \inf\{\hat{c}(j) : j \in I_t \cup L_t,\ d(j) = d(i)\} \,\}.
$$
This set can be used to define a classifier $C^{rank-1}$ that indicates that the transition has occurred once the set becomes
empty for the first time. That is,
$$
C^{rank-1} = \begin{cases} 1 & \text{if } \min_{s \in [0,t]} |R_1(s)| = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{3}
$$

The authors use these criteria to switch between different pre-determined solver settings depending on the phase of
solving. Their experiments show improved solving time, especially when using the rank-1 criterion. However, it is
also clear that both criteria tend to be satisfied before the phase transition actually occurs, and there is some room for
improvement in the accuracy of the classifiers, as we shall see from our own computational study.
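As an illustration of Eqs. (2) and (3), the following Python sketch evaluates both criteria on a recorded history of snapshots. The snapshot layout (an incumbent plus lists of `(estimate, depth)` pairs for open and processed nodes) and all numbers are assumptions made for this example; inside SCIP the estimates are maintained by the solver itself.

```python
import math

def best_estimate_fired(history):
    """Eq. (2): 1 once the incumbent has been below the best open-node estimate."""
    for snap in history:
        c_min = min((est for est, _ in snap["open"]), default=math.inf)
        if snap["incumbent"] - c_min < 0:
            return 1
    return 0

def rank1_set_size(snap):
    """|R_1(t)|: open nodes whose estimate is <= all processed estimates at their depth."""
    best_at_depth = {}
    for est, depth in snap["processed"]:
        best_at_depth[depth] = min(est, best_at_depth.get(depth, math.inf))
    # The infimum over an empty set is +inf, so such open nodes belong to R_1.
    return sum(1 for est, depth in snap["open"]
               if est <= best_at_depth.get(depth, math.inf))

def rank1_fired(history):
    """Eq. (3): 1 once R_1 has been empty at some recorded time."""
    return 1 if any(rank1_set_size(snap) == 0 for snap in history) else 0

# Hypothetical two-snapshot history; each node is an (estimate, depth) pair.
history = [
    {"incumbent": 20.0,
     "open": [(15.0, 1), (18.0, 2)],
     "processed": [(16.0, 1), (17.0, 2)]},
    {"incumbent": 14.0,
     "open": [(17.5, 2)],
     "processed": [(16.0, 1), (17.0, 2), (15.0, 1)]},
]
```

Both classifiers are monotone in time: once a criterion has been satisfied at some earlier snapshot, the prediction stays at 1.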

B&B resolution predictions Closely related to the present work is that of Hendel et al. [11], who use a number
of solver metrics to predict the final B&B tree size. They use a combination of metrics from the literature, together
with their own, as input to a machine learning model that estimates the final tree size dynamically as the tree is being
constructed. Their method was incorporated into version 7.0 of the solver SCIP as a progress metric for the user. In
a similar fashion, Fischetti et al. [8] use a number of solver metrics to predict, during the solving process, whether or
not the run will end within the given time limit. This prediction can be used to adapt the solver behaviour in the case
that the answer is negative.
² We define the depth of a node as its distance to the root node. Therefore, by definition, the depth of the root node is zero.

[Figure 1: pipeline diagram. MILP → Read problem → Presolving → Root node processing → B&B process → z*; the representation computed after root node processing is passed to a GNN, which outputs z̃*.]

Figure 1: Optimal objective value prediction task. The MILP representation is computed after the root node has been
processed. This serves as an input to a GNN that outputs a prediction $\tilde{z}^*$ of the optimal objective value.

4 Methodology
This section details the methodology used to answer questions Q1 (Section 4.1) and Q2 (Section 4.2). We assume
we are given a space $\mathcal{X}$ of instances of interest. For some tasks, we will use the bipartite graph representation of
MILPs introduced by Gasse et al. [9]. That is, given an MILP instance $X \in \mathcal{X}$ defined as in Eq. 1, we build a graph
representation as follows: each constraint and each variable has a corresponding representative node. A constraint
node is connected to a variable node if the corresponding variable has a non-zero coefficient in the corresponding
constraint. Each node has an associated vector of features that describes it. We utilize the same features as Gasse et
al., except that we do not include any incumbent information. In short, instead of the raw data in $X \in \mathcal{X}$ we use the
graph representation, which we denote $X_G \in \mathcal{X}_G$, composed of a tuple $X_G = (C, V, A)$, where $C \in \mathbb{R}^{m \times d_c}$
and $V \in \mathbb{R}^{n \times d_v}$ represent the constraint and variable features, respectively, and $A \in \mathbb{R}^{m \times n}$ is the adjacency matrix.
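The construction of the tuple $(C, V, A)$ can be sketched in Python on a toy instance. The node features used here (the right-hand side for constraints; the objective coefficient and an integrality flag for variables) are placeholders chosen for brevity, not the feature set of Gasse et al. [9], and storing the constraint coefficient as the adjacency entry is likewise an assumption of this sketch.

```python
def bipartite_representation(A, b, c, integer_idx):
    """Build toy (C, V, A) matrices plus the explicit edge list of the bipartite graph."""
    m, n = len(A), len(c)
    # One feature row per constraint node (placeholder feature: right-hand side).
    C = [[float(b[i])] for i in range(m)]
    # One feature row per variable node (placeholder: objective coeff., is-integer flag).
    V = [[float(c[j]), 1.0 if j in integer_idx else 0.0] for j in range(n)]
    # Constraint i and variable j are connected iff the coefficient A[i][j] is non-zero.
    edges = [(i, j) for i in range(m) for j in range(n) if A[i][j] != 0]
    return C, V, A, edges

# Toy MILP with m = 2 constraints and n = 3 variables (x_0 and x_2 integer).
A = [[1, 2, 0],
     [0, 1, 3]]
b = [4, 6]
c = [1, 1, 2]
C, V, A_adj, edges = bipartite_representation(A, b, c, integer_idx={0, 2})
```

Here `C` has shape m × d_c with d_c = 1 and `V` has shape n × d_v with d_v = 2, matching the dimensions in the text.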

4.1 Optimal value prediction

The first task we tackle is the one of predicting the optimal objective value (Q1). That is, given an MILP instance
$X \in \mathcal{X}$, we want to predict the optimal objective value $z^*$. This prediction is computed once and for all at
the root node, once the LP solution is available. We frame this as a regression task. This process is depicted in Figure 1.

For this regression task, we utilise the bipartite graph representation of Gasse et al. [9] defined above, which is
processed using a Graph Neural Network (GNN) that performs two half-convolutions. In particular, the feature matrices
$C$ and $V$ first go through an embedding layer with two feedforward networks with ReLU activation. Next, a
first pass updates the constraint descriptors using the variable descriptors, while a second pass updates the variable
descriptors using the (new) constraint descriptors. This is done with message-passing operations, computed as
$$
c'_i = W^{(11)} c_i + W^{(12)} \sum_{j=1}^{n} A_{ij} v_j \tag{4}
$$
$$
v'_j = W^{(21)} v_j + W^{(22)} \sum_{i=1}^{m} A_{ij} c'_i \tag{5}
$$
where $W^{(11)}$, $W^{(12)}$, $W^{(21)}$ and $W^{(22)}$ are trainable weights, $c_i$ is the feature vector of constraint $i$ and $v_j$ is
the feature vector of variable $j$. The variable descriptors then go through another feedforward network with ReLU
activation. Finally, average pooling is applied to obtain one single output value.
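The two message-passing passes of Eqs. (4) and (5) can be written out in plain Python as a sanity check. This is only a sketch: it assumes that constraint and variable embeddings already share one dimension d, it omits the embedding layers, the final feedforward network and the pooling, and the identity weight matrices are placeholders rather than trained parameters.

```python
def matvec(W, x):
    """Matrix-vector product W x for list-of-lists W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(x, y):
    return [a + b for a, b in zip(x, y)]

def weighted_sum(weights, vectors, dim):
    """sum_k weights[k] * vectors[k], returned as a list of length dim."""
    out = [0.0] * dim
    for w, v in zip(weights, vectors):
        for k in range(dim):
            out[k] += w * v[k]
    return out

def half_convolutions(C, V, A, W11, W12, W21, W22):
    m, n, d = len(C), len(V), len(V[0])
    # Eq. (4): constraints aggregate messages from their variables.
    C_new = [vec_add(matvec(W11, C[i]),
                     matvec(W12, weighted_sum(A[i], V, d)))
             for i in range(m)]
    # Eq. (5): variables aggregate messages from the updated constraints.
    V_new = [vec_add(matvec(W21, V[j]),
                     matvec(W22, weighted_sum([A[i][j] for i in range(m)], C_new, d)))
             for j in range(n)]
    return C_new, V_new

identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights
C = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
C_new, V_new = half_convolutions(C, V, A, identity, identity, identity, identity)
```

Note that the second pass uses `C_new`, not `C`, which is what "the (new) constraint descriptors" refers to.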
Our goal is to learn a mapping $f : \mathcal{X}_G \to \mathbb{R}$ which outputs an approximation $\tilde{z}^* = f(X_G)$ of the optimal objective value $z^*$.
At the moment of this prediction, the solution to the root LP relaxation is known and can be used for further context.
In order to exploit that knowledge, we test three potential targets for the machine learning model, namely
$$
\Theta_1 = z^*, \qquad \Theta_2 = \frac{z^*}{z^{LP}}, \qquad \Theta_3 = z^* - z^{LP}.
$$
This gives rise to three models $f_1(X_G)$, $f_2(X_G)$ and $f_3(X_G)$, which we later transform into the desired output by
setting either $f(X_G) = f_1(X_G)$, $f(X_G) = f_2(X_G) \cdot z^{LP}$, or $f(X_G) = f_3(X_G) + z^{LP}$.
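The relation between the three training targets and the final prediction can be made concrete with a small Python sketch. The function names and the integer `kind` argument are illustrative choices, not part of the paper's code.

```python
def encode_target(z_opt, z_lp, kind):
    """Training target Theta_kind computed from the optimal value and the root LP value."""
    if kind == 1:
        return z_opt
    if kind == 2:
        return z_opt / z_lp
    if kind == 3:
        return z_opt - z_lp
    raise ValueError("kind must be 1, 2 or 3")

def decode_prediction(model_output, z_lp, kind):
    """Map the output of f_kind back to a prediction of the optimal objective value."""
    if kind == 1:
        return model_output
    if kind == 2:
        return model_output * z_lp
    if kind == 3:
        return model_output + z_lp
    raise ValueError("kind must be 1, 2 or 3")

# Round trip on a hypothetical instance with z* = 12 and z^LP = 8.
targets = {k: encode_target(12.0, 8.0, k) for k in (1, 2, 3)}
decoded = {k: decode_prediction(targets[k], 8.0, k) for k in (1, 2, 3)}
```

A perfect model recovers $z^*$ under each target, so the targets differ only in what the network has to learn. The ratio target is undefined when $z^{LP} = 0$, a case this sketch does not handle.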

4.2 Prediction of phase transition

The second task (Q2) is predicting the transition between phases 2 (improvement) and 3 (proving). That is, at any
point during the solution process we want to predict whether the incumbent is in fact optimal. We cast this problem
as a classification task.

We test the performance of two classifiers. The first one is based on the output of the GNN model discussed in Section
4.1. Given an instance $X \in \mathcal{X}$ (in fact its associated graph representation $X_G$) and the current incumbent $\bar{z}$, we obtain
a binary prediction $C_\epsilon^{GNN} : \mathcal{X}_G \times \mathbb{R} \to \{0, 1\}$ in the following way
$$
C_\epsilon^{GNN}(X_G, \bar{z}) = \begin{cases} 1 & \text{if } \bar{z} < f(X_G) + \epsilon \cdot |f(X_G)|, \\ 0 & \text{otherwise} \end{cases} \tag{6}
$$
for some $\epsilon \in [-1, 1]$. The $\epsilon$ parameter allows us to control the confidence in the prediction.
The $C_\epsilon^{GNN}$ classifier is static, in the sense that it does not make use of any information coming from the B&B process.
In contrast, the second predictor we propose, which we call $C^D$, is based on a set of dynamic metrics that are
collected during the solving process. The metrics are the following.
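Eq. (6) amounts to a one-line threshold test, sketched below in Python. The argument `prediction` stands for the GNN output $f(X_G)$; the numeric values are hypothetical.

```python
def gnn_classifier(incumbent, prediction, eps=0.0):
    """Eq. (6): predict 1 (incumbent optimal) if it lies below the shifted prediction."""
    return 1 if incumbent < prediction + eps * abs(prediction) else 0

# Hypothetical prediction f(X_G) = 100 for a minimization problem.
strict = gnn_classifier(100.5, 100.0, eps=0.0)     # incumbent slightly above prediction
tolerant = gnn_classifier(100.5, 100.0, eps=0.01)  # threshold moved up to 101
cautious = gnn_classifier(99.5, 100.0, eps=-0.01)  # threshold moved down to 99
```

A positive $\epsilon$ makes positive predictions more frequent, while a negative $\epsilon$ makes the classifier more conservative. The absolute value keeps this direction the same when the predicted objective value is negative.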

Gap Following SCIP [3], we define the gap as
$$
g(t) := \begin{cases} 1 & \text{if no solution has been found yet or } \bar{z}(t) \cdot z(t) < 0, \\ \dfrac{|\bar{z}(t) - z(t)|}{\max\{|\bar{z}(t)|, |z(t)|, \epsilon\}} & \text{otherwise.} \end{cases} \tag{7}
$$
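A direct Python transcription of the gap in Eq. (7) might look as follows. Representing "no solution found yet" by `None` and using `1e-9` for the small constant in the denominator are assumptions of this sketch; the text does not state the value used.

```python
def relative_gap(incumbent, lower_bound, eps=1e-9):
    """Eq. (7); `incumbent` is None while no feasible solution is known."""
    if incumbent is None or incumbent * lower_bound < 0:
        return 1.0
    return abs(incumbent - lower_bound) / max(abs(incumbent), abs(lower_bound), eps)

gap_no_solution = relative_gap(None, 10.0)
gap_sign_change = relative_gap(5.0, -1.0)
gap_regular = relative_gap(12.0, 10.0)   # |12 - 10| / 12
gap_closed = relative_gap(10.0, 10.0)
```

The constant in the denominator only matters when both bounds are zero, where it avoids a division by zero.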

Tree weight For a given node $v \in T_t$, let $d(v)$ denote the node's depth. Then, the tree weight at time $t$ is defined as
$$
\omega(t) := \sum_{v \in L_t} 2^{-d(v)}. \tag{8}
$$
This metric was first defined by Kilby et al. [13].
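The tree weight of Eq. (8) only needs the depths of the current leaves, as the following Python sketch shows. The depth lists are hypothetical.

```python
def tree_weight(leaf_depths):
    """Eq. (8): sum of 2^(-depth) over the leaves of the current tree."""
    return sum(2.0 ** -d for d in leaf_depths)

# Root branched into two children; one child is a leaf, the other still open.
partial = tree_weight([1])
# The open child was branched and both grandchildren became leaves.
complete = tree_weight([1, 2, 2])
```

In a binary tree where every inner node has exactly two children, the weight reaches 1 once no open nodes remain, so $\omega(t)$ can be read as a rough measure of search progress.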

Median gap Let $m(t) = \mathrm{median}\{z_i^{LP} \mid i \in O_t\}$ and let $\bar{z}^0$ be the first incumbent found. We define the median gap
as
$$
\mu(t) = \frac{|\bar{z}(t) - m(t)|}{|\bar{z}^0 - z^{LP}|}. \tag{9}
$$

Trend of open nodes For a certain window size $h$, we store the values of $|O_k|$ for $k \in \{t - h, t - h + 1, \dots, t\}$. We
then fit a linear function using least squares to compute the trend of this sequence. We denote this trend at time $t$ as
$\tau(t)$.
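One way to compute such a trend is the closed-form slope of a least-squares line, sketched below in Python. Taking the slope of the fitted line as the trend value is an assumption of this sketch, as is indexing the window by 0, 1, ..., h.

```python
def open_node_trend(open_counts):
    """Least-squares slope of the open-node counts over the window."""
    n = len(open_counts)
    if n < 2:
        return 0.0  # a single point has no trend
    x_mean = (n - 1) / 2.0
    y_mean = sum(open_counts) / n
    num = sum((k - x_mean) * (y - y_mean) for k, y in enumerate(open_counts))
    den = sum((k - x_mean) ** 2 for k in range(n))
    return num / den

growing = open_node_trend([10, 12, 14, 16])   # tree still expanding
flat = open_node_trend([5, 5, 5])
shrinking = open_node_trend([8, 6, 4])        # open nodes being closed
```

A negative value indicates that nodes are being closed faster than new ones are created within the window.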

Ratio to GNN prediction We make use of the prediction $f(X_G)$ coming from the GNN model and include the ratio
with respect to the current incumbent as a metric. In particular we use
$$
\rho(t) = \frac{f(X_G)}{\bar{z}(t)}. \tag{10}
$$

Notice that, while the gap and the tree weight are metrics from the literature, the other three are our own.
The input to the classifier is therefore a tuple XD = (g(t), ω(t), µ(t), τ (t), ρ(t)). We train a classifier C D (XD ) that
makes use of these dynamic features to make a binary prediction on whether we are in phase 2 or 3. We use a simple
logistic regression, which will allow us to more easily interpret the resulting model, in contrast to more complex
machine learning models.
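To show how the feature tuple feeds a logistic regression, the Python sketch below computes the median gap (Eq. 9) and the ratio to the GNN prediction (Eq. 10), then applies a logistic model. The weights, the bias, the 0.5 decision threshold and all feature values are hypothetical placeholders, not fitted coefficients from the paper.

```python
import math
from statistics import median

def median_gap(incumbent, open_lp_values, first_incumbent, root_lp):
    """Eq. (9): distance from incumbent to median open LP value, normalised."""
    return abs(incumbent - median(open_lp_values)) / abs(first_incumbent - root_lp)

def gnn_ratio(prediction, incumbent):
    """Eq. (10): GNN prediction divided by the current incumbent value."""
    return prediction / incumbent

def predict_phase3(features, weights, bias, threshold=0.5):
    """Logistic regression: return (label, probability) of being in phase 3."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-score))
    return (1 if prob >= threshold else 0), prob

mu = median_gap(12.0, [10.0, 11.0, 11.5], first_incumbent=15.0, root_lp=10.0)
rho = gnn_ratio(11.4, 12.0)
# Feature order (g, omega, mu, tau, rho); g, omega and tau are made-up values.
features = (0.1, 0.6, mu, -1.5, rho)
weights = (-4.0, 3.0, -2.0, -0.5, 1.0)   # hypothetical coefficients
label, prob = predict_phase3(features, weights, bias=-1.0)
```

Because the model is linear in the features, the sign and magnitude of each coefficient can be inspected directly, which is the interpretability argument made in the text.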

5 Computational Results

This section describes our computational setup and results. All experiments were performed with the solver SCIP
v.8.0 [3]. Code for reproducing all experiments in this section is available online [17].

5.1 Experimental Set Up

Benchmarks We use three NP-hard problem benchmarks from the literature: set covering, combinatorial auctions
and generalized independent set problem (GISP). We create a fourth benchmark (mixed) that is comprised of instances
of the three types, in equal proportion. The method and configuration used for generation of the instances is summa-
rized in Table 1. For each instance type, we generate 10,000 instances for training, 2000 instances for validation and
another 2000 for testing.

Phase analysis As a first approach to the instances, we run an experiment to analyze the breakdown into solving
phases. We solve 100 of the training instances, each with 3 different randomization seeds, which gives us a total of
300 data points per benchmark. During the solution process we record the time when branching starts, the time when
the first solution is found, the time when a solution within 5% of the optimal is found, and the time when the optimal
solution is found. This allows us to compute the percentage of time spent on each phase, and the percentage of time
spent branching versus before branching (i.e., pre-processing the instance and processing the root node). We average
these numbers over the 300 samples to obtain a view of the typical behaviour of the solver on each benchmark. We
further divide phase 2 (improvement) into two sub-phases: (2a) from the first feasible solution to the first feasible
solution with objective value within 5% of the optimal, and (2b) which encompasses the rest of phase 2. The results
are shown in Figure 2. We observe the following. For all benchmarks, obtaining a feasible solution is trivial. For set
covering instances, the optimal solution is often known by the time that branching starts. In the case of combinatorial
auctions, the optimal solution is typically not known at the start of B&B, but a good solution is. For GISP, finding
optimal, or even good, solutions is not as easy, making the proving phase relatively shorter. We conclude that these
benchmarks allow us to test our methodology on three very different settings that may arise in a real-life situation.

Data collection procedure For each instance, we collect information at the root node: the bipartite graph representation
$X_G = (C, V, A)$ and the optimal root LP value $z^{LP}$. We then proceed to solve the instance. For the first 100
processed nodes and as long as no incumbent exists, no samples are collected. This allows us to initialize statistics
such as the trend of open nodes $\tau(t)$, and to ignore instances that are solved within 100 nodes and are therefore too
easy. After 100 nodes have been processed and an incumbent exists, we collect samples with a probability of 0.02.
At sampling time, we record the value of the dynamic features (see Section 4.2), as well as the incumbent value $\bar{z}(t)$.
Once the instance is solved, the collected samples are completed by appending the root node information $(X_G, z^{LP})$
as well as the optimal objective value $z^*$, which will be used as a target.

Optimal objective value prediction (Q1) We test the prediction accuracy of our GNN model on the four benchmarks.
We train a model for each of the targets described in Section 4. We measure the error as
$$
e = 100 \times \frac{1}{N} \sum_{i=1}^{N} \frac{|z_i^* - \tilde{z}_i^*|}{|z_i^*|} \tag{11}
$$

Table 1: Method and configuration settings used to generate the instances of each problem benchmark.

Benchmark                 Generation method                                        Configuration
Set covering              Balas and Ho [1]                                         Items: 750; Sets: 1000
Combinatorial auctions    Leyton-Brown et al. [14] with arbitrary relationships    Items: 200; Bids: 1000
GISP                      Colombi et al. [5] with Erdos-Renyi graphs               Nodes: 80; p = 0.6; α = 0.75; SET2, A

[Figure 2: three panels, (a) Set covering, (b) Combinatorial auctions and (c) GISP, showing the share of solving time spent in Phase 1, Phase 2a, Phase 2b and Phase 3, with a marker where the first branching occurs. Phase 1 is 0.0% in all panels; for GISP, Phase 2a: 30.9%, Phase 2b: 43.9%, Phase 3: 25.3%.]

Figure 2: Phase analysis of three instance types. We divide the solution process into (1) Feasibility, in dark yellow,
(2a) Improvement up to 5% to optimality, in light yellow, (2b) Improvement from 5% to optimal, in light purple, and
(3) Proving, in dark purple. We also indicate when the first branching occurs. The data is averaged over 100 instances
with 3 randomization seeds (i.e., 300 samples).

where $N$ is the number of samples, $z_i^*$ is the optimal objective value of sample $i$ and $\tilde{z}_i^*$ is the predicted optimal
objective value of sample $i$. Notice that, independently of the learning target, we measure the error in the space of the
original prediction we want to make.
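The error measure of Eq. (11) is a mean relative error expressed in percent. A minimal Python sketch, with made-up values, is:

```python
def mean_relative_error(z_true, z_pred):
    """Eq. (11): average of |z* - z~*| / |z*| over the samples, times 100."""
    n = len(z_true)
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(z_true, z_pred)) / n

# Three hypothetical samples: two predictions are off by 1%, one is exact.
error = mean_relative_error([100.0, 200.0, -50.0], [101.0, 198.0, -50.0])
```

The measure is undefined for samples with $z_i^* = 0$; the sketch does not guard against that case.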

Prediction of phase transition (Q2) We make a prediction on whether we have transitioned to phase 3 (optimal
solution has been found). We compare the performance of four predictors. The first two predictors are the ones
proposed by Berthold et al. [2], namely $C^{est}$ (best-estimate, see Eq. 2) and $C^{rank-1}$ (rank-1, see Eq. 3). The third
predictor $C_\epsilon^{GNN}$ is based on the GNN regression model, as described in Eq. 6. We report the performance of this
classifier with $\epsilon = 0$ and with a tuned value $\epsilon^*$ which was obtained by optimizing the accuracy with a small grid
search over the range $[-0.02, 0.02]$ on the validation set. The fourth predictor $C^D$ is based on the dynamic features,
as described in Section 4.2.
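The tuning of $\epsilon$ can be sketched as a plain grid search in Python. The validation samples, the grid spacing of 0.01 and the helper names are assumptions of this sketch; the text only fixes the range $[-0.02, 0.02]$ and the use of accuracy as the selection criterion.

```python
def classify(incumbent, prediction, eps):
    """Eq. (6) applied to one sample."""
    return 1 if incumbent < prediction + eps * abs(prediction) else 0

def confusion_fractions(samples, eps):
    """Fractions of correct, false positive and false negative predictions."""
    correct = fp = fn = 0
    for incumbent, prediction, is_optimal in samples:
        label = classify(incumbent, prediction, eps)
        if label == is_optimal:
            correct += 1
        elif label == 1:
            fp += 1
        else:
            fn += 1
    n = len(samples)
    return correct / n, fp / n, fn / n

def tune_eps(samples, grid):
    """Return the grid value with the highest accuracy (first one on ties)."""
    return max(grid, key=lambda eps: confusion_fractions(samples, eps)[0])

# Hypothetical validation samples: (incumbent, GNN prediction, 1 if incumbent is optimal).
validation = [
    (100.0, 99.5, 1),
    (101.0, 99.0, 0),
    (50.0, 50.2, 1),
    (52.0, 50.0, 0),
    (50.8, 50.0, 0),
]
grid = [-0.02, -0.01, 0.0, 0.01, 0.02]
best_eps = tune_eps(validation, grid)
```

The same `confusion_fractions` helper yields the three quantities reported for each classifier (correct, false positives, false negatives), so a different selection rule, for instance one that caps the false positive fraction, could replace the accuracy criterion in `tune_eps`.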

5.2 Results

Tables 2 and 3 show the results of the optimal objective value prediction task. The GNN models tested in Table 2
were trained and tested on instances of the same type. In contrast, the results of Table 3 correspond to a
single model that was trained on the mixed dataset, and then tested on different benchmarks. First, we observe that
using targets that include LP information (Θ2 and Θ3 ) is beneficial to performance, as opposed to directly trying to
predict the optimal objective value (Θ1 ). There is no clear winner among targets Θ2 and Θ3 . Second, we observe
that the generalist model, the one trained on the mixed dataset, performs comparably to the specialized models, even
outperforming them in some cases.

We now select one GNN model per benchmark to be used in the next prediction task: the phase transition prediction.
We select the model in the following way: we use the specialized model that achieves the best result on the validation
set. Figure 3 (a-c) shows the results for all classifiers on the pure benchmarks (see Table 4 for the same results in table
form). Further, we include a column that shows the classification accuracy of a dummy model that always predicts
the majority class. We observe that the classifiers of Berthold et al. [2] (best-estimate and rank-1) tend to predict the

Table 2: Average relative error (as defined in Eq. 11) of the GNN model. One model was trained per benchmark. The
train and test instances in each case are of the same type.

Instances                 Θ1      Θ2      Θ3
Set covering              1.48    0.80    0.54
Combinatorial auctions    3.20    0.55    0.62
GISP                      3.32    2.35    2.39

Table 3: Average relative error (as defined in Eq. 11) of the GNN mixed model. Only one model was trained, on a
dataset comprising instances of all types. The test sets consist of instances of one type only, except for the
mixed test set (last row).

Instances                 Θ1      Θ2      Θ3
Set covering              1.35    0.73    0.82
Combinatorial auctions    3.15    1.17    0.53
GISP                      3.17    2.32    2.43
Mixed test set            1.70    0.97    0.75

phase transition too early. That is, they mostly output a positive prediction, which means they believe the incumbent
to be optimal. This results in the misclassified samples being almost exclusively false positives. In contrast, the
GNN model $C_0^{GNN}$ tends to be too pessimistic, which can be fixed with the right tuning of the $\epsilon$ parameter. For all
benchmarks, $C_{\epsilon^*}^{GNN}$ performs better than the classifiers of Berthold et al. [2]. At the same time, the inclusion of the
dynamic features ($C^D$) further improves the performance, except for set covering where $C_{\epsilon^*}^{GNN}$ and $C^D$ are close to a tie.

It is important to notice that, depending on the application, false positives and false negatives could have very
different consequences. As an example, if the phase transition prediction is used to change the behaviour of the primal
heuristics (e.g. switch them off once the optimal is found) a false positive could excessively delay finding the optimal
solution and therefore has a much bigger potential of harming performance than a false negative. The parameter ϵ
provides an easy way to navigate this tradeoff, where one could sacrifice some accuracy to keep the rate of false
positives to a minimum.

Figure 3d shows the same experiment but on a mixed dataset. That is, the models were trained and tested on a
benchmark comprised of instances of all three types (in equal proportion). We observe a similar behaviour compared
to the specialized benchmarks. The GNN model $C_0^{GNN}$ tends to be too pessimistic, while $C_{\epsilon^*}^{GNN}$ achieves better
accuracy and a better false positive rate than the classifiers of Berthold et al. [2]. Using dynamic features further
improves the accuracy of the model.

Finally, we analyze the importance of the dynamic features assigned by the C D classifier (Figure 4). We see that the
four learned models are in fact very different, with the GISP model mostly making decisions based on the gap and the
other three considering all features more uniformly. This speaks in favour of learning on sets of instances of the same
type.

6 Conclusions
In this paper, we presented our methodology for predicting the optimal objective value of MILPs. Compared to the
literature on predicting optimal solutions, our learning task is easier, yet still offers a variety of possibilities for its
application within MILP solvers. Our methods can be used to both predict the optimal objective value and to classify a
feasible solution into optimal or sub-optimal. Our computational study shows that our proposed approach outperforms
the existing approaches in the literature. Further, it provides more flexibility to tune the model towards the desired
behaviour. We show that there are benefits to learning a model that specializes to an instance type, yet our model is
still able to generalize well and have superior performance to other methods on mixed instance sets.
These results open the door for many possible applications. In general terms, this prediction can be used to adapt the
behaviour of the different solver components and rules depending on the solving phase. These applications, however,
require further study and will be the subject of future work.

[Figure 3: four bar charts, (a) Set covering, (b) Combinatorial auctions, (c) GISP and (d) Mixed. Each panel has one bar per classifier (majority, $C^{est}$, $C^{rank-1}$, $C_0^{GNN}$, $C_{\epsilon^*}^{GNN}$, $C^D$) on a vertical axis from 0.0 to 1.0, split into correct, fp and fn.]

Figure 3: Prediction accuracy of the different classifier models. We show the fraction of correctly classified samples
(correct, in purple), the fraction of false positives (fp, dark yellow) and the fraction of false negatives (fn, light yellow).

Table 4: Prediction accuracy of the different classifier models. We show the fraction of correctly classified samples,
the fraction of false positives and the fraction of false negatives.

Set covering
Classifier                   Correct    False positives    False negatives
Majority                     0.89       0.11               0.00
$C^{est}$                    0.91       0.09               0.00
$C^{rank-1}$                 0.92       0.08               0.00
$C_0^{GNN}$                  0.52       0.05               0.43
$C_{\epsilon^*}^{GNN}$       0.93       0.04               0.03
$C^D$                        0.90       0.05               0.05

Combinatorial auctions
Classifier                   Correct    False positives    False negatives
Majority                     0.64       0.36               0.00
$C^{est}$                    0.67       0.33               0.00
$C^{rank-1}$                 0.68       0.32               0.00
$C_0^{GNN}$                  0.57       0.06               0.37
$C_{\epsilon^*}^{GNN}$       0.72       0.27               0.01
$C^D$                        0.84       0.07               0.09

GISP
Classifier                   Correct    False positives    False negatives
Majority                     0.59       0.00               0.41
$C^{est}$                    0.39       0.61               0.00
$C^{rank-1}$                 0.43       0.57               0.00
$C_0^{GNN}$                  0.68       0.10               0.22
$C_{\epsilon^*}^{GNN}$       0.69       0.12               0.19
$C^D$                        0.77       0.11               0.12

Mixed
Classifier                   Correct    False positives    False negatives
Majority                     0.64       0.36               0.00
$C^{est}$                    0.65       0.35               0.00
$C^{rank-1}$                 0.67       0.33               0.00
$C_0^{GNN}$                  0.59       0.08               0.34
$C_{\epsilon^*}^{GNN}$       0.73       0.14               0.13
$C^D$                        0.77       0.14               0.09
[Figure 4: four panels (S. cover, C. auctions, GISP, Mixed), each with a horizontal axis from 0.0 to 1.0 and one row per dynamic feature.]

Figure 4: Feature importance of the dynamic models trained to predict phase transition for each of the benchmarks.

References
[1] E. Balas and A. Ho. Set covering algorithms using cutting planes, heuristics, and subgradient optimization: a
computational study. In Combinatorial Optimization, pages 37–60. Springer, 1980.
[2] T. Berthold, G. Hendel, and T. Koch. From feasibility to improvement to proof: three phases of solving mixed-
integer programs. Optimization Methods and Software, 33(3):499–517, 2018.
[3] K. Bestuzheva, M. Besançon, W.-K. Chen, A. Chmiela, T. Donkiewicz, J. van Doornmalen, L. Eifler, O. Gaul,
G. Gamrath, A. Gleixner, L. Gottwald, C. Graczyk, K. Halbig, A. Hoen, C. Hojny, R. van der Hulst, T. Koch,
M. Lübbecke, S. J. Maher, F. Matter, E. Mühmer, B. Müller, M. E. Pfetsch, D. Rehfeldt, S. Schlein, F. Schlösser,
F. Serrano, Y. Shinano, B. Sofranac, M. Turner, S. Vigerske, F. Wegscheider, P. Wellner, D. Weninger, and
J. Witzig. The SCIP Optimization Suite 8.0. Technical report, Optimization Online, December 2021. URL
http://www.optimization-online.org/DB_HTML/2021/12/8728.html.
[4] A. Chmiela, E. Khalil, A. Gleixner, A. Lodi, and S. Pokutta. Learning to schedule heuristics in branch and bound.
Advances in Neural Information Processing Systems, 34:24235–24246, 2021.
[5] M. Colombi, R. Mansini, and M. Savelsbergh. The generalized independent set problem: Polyhedral analysis
and solution approaches. European Journal of Operational Research, 260(1):41–55, 2017.
[6] J.-Y. Ding, C. Zhang, L. Shen, S. Li, B. Wang, Y. Xu, and L. Song. Accelerating primal solution findings
for mixed integer programs based on solution prediction. In Proceedings of the AAAI conference on artificial
intelligence, volume 34, pages 1452–1459, 2020.
[7] N. Efthymiou and N. Yorke-Smith. Predicting the optimal period for cyclic hoist scheduling problems. In
Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), volume
13884 of Lecture Notes in Computer Science, pages 238–253. Springer, 2023.
[8] M. Fischetti, A. Lodi, and G. Zarpellon. Learning MILP resolution outcomes before reaching time-limit. In
Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), volume 16,
pages 275–291. Springer, 2019.
[9] M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi. Exact combinatorial optimization with graph convolutional
neural networks. Advances in Neural Information Processing Systems, 32, 2019.
[10] Q. Han, L. Yang, Q. Chen, X. Zhou, D. Zhang, A. Wang, R. Sun, and X. Luo. A GNN-guided predict-and-search
framework for mixed-integer linear programming. In International Conference on Learning Representations,
2023. URL https://api.semanticscholar.org/CorpusID:256827203.
[11] G. Hendel, D. Anderson, P. Le Bodic, and M. E. Pfetsch. Estimating the size of branch-and-bound trees.
INFORMS Journal on Computing, 34(2):934–952, 2022.
[12] E. B. Khalil, C. Morris, and A. Lodi. MIP-GNN: A data-driven framework for guiding combinatorial solvers.
AAAI, 2022.
[13] P. Kilby, J. Slaney, S. Thiébaux, T. Walsh, et al. Estimating search tree size. In Proceedings of the AAAI
Conference on Artificial Intelligence, 2006.
[14] K. Leyton-Brown, M. Pearson, and Y. Shoham. Towards a universal test suite for combinatorial auction algorithms.
In Proceedings of the 2nd ACM conference on Electronic commerce, pages 66–76, 2000.
[15] V. Nair, S. Bartunov, F. Gimeno, I. von Glehn, P. Lichocki, I. Lobov, B. O’Donoghue, N. Sonnerat, C. Tjandraatmadja,
P. Wang, et al. Solving mixed integer programs using neural networks. arXiv preprint arXiv:2012.13349,
2020.
[16] M. B. Paulus, G. Zarpellon, A. Krause, L. Charlin, and C. Maddison. Learning to cut by looking ahead: Cutting
plane selection via imitation learning. In International Conference on Machine Learning, pages 17584–17600.
PMLR, 2022.
[17] L. Scavuzzo. Code for the paper “Learning optimal objective values for MILP”, 2024.
https://github.com/lascavana/ObjValPrediction.
[18] L. Scavuzzo, K. Aardal, A. Lodi, and N. Yorke-Smith. Machine learning augmented branch and bound for mixed
integer linear programming. Mathematical Programming, pages 1–44, 2024.
[19] Y. Shen, Y. Sun, X. Li, A. C. Eberhard, and A. T. Ernst. Adaptive solution prediction for combinatorial
optimization. European Journal of Operational Research, 309:1392–1408, 2022. URL
https://api.semanticscholar.org/CorpusID:256358882.
[20] N. Sonnerat, P. Wang, I. Ktena, S. Bartunov, and V. Nair. Learning a large neighborhood search algorithm for
mixed integer programs. arXiv preprint arXiv:2107.10201, 2021.
