Machine Learning-Driven Optimization for Solution Space Reduction in the Quadratic Multiple Knapsack Problem
ABSTRACT The quadratic multiple knapsack problem (QMKP) is a well-studied problem in operations
research. This problem involves selecting a subset of items that maximizes the linear and quadratic profit
without exceeding a set of capacities for each knapsack. While its solution using metaheuristics has been
explored, exact approaches have recently been investigated. One way to improve the performance of these
exact approaches is by reducing the solution space in different instances, considering the properties of the
items in the context of QMKP. In this paper, machine learning (ML) models are employed to support an exact
optimization solver by predicting the inclusion of items with a certain level of confidence and classifying
them. This approach reduces the solution space for exact solvers, allowing them to tackle more manageable
problems. The methodological process is detailed, in which ML models are generated and the best one is
selected to be used as a preprocessing approach. Finally, we conduct comparison experiments, demonstrating
that using an ML model is highly beneficial for reducing computing times and achieving rapid convergence.
INDEX TERMS Machine learning, combinatorial optimization, knapsack problem, quadratic multiple
knapsack problem.
I. INTRODUCTION
The knapsack problem (KP) represents a significant challenge within combinatorial optimization, consisting of the optimal selection of a subset of items from a given set, with the goal of maximizing the total profit of these items while adhering to a capacity constraint. Over the years, the KP has been a subject of study and continues to be a primary area of interest in operations research. Recent studies and literature reviews, highlighted in Cacchiani et al. [1], underscore the relevance and the ongoing challenges surrounding this problem.

Two variants of the KP that have gained interest are the quadratic KP (QKP) and the multiple KP (MKP). The QKP is characterized by seeking the subset of items that maximizes not only a linear profit but also an additional quadratic profit derived from selecting pairs of items, all while adhering to a single capacity constraint [2]. On the other hand, the MKP involves selecting subsets of items that maximize the linear profit, but with the variation of applying capacity constraints to multiple knapsacks simultaneously [3]. Later, Hiley and Julstrom [4] proposed the quadratic MKP (QMKP), which combines aspects of both problems, presenting an even more complex challenge.

The QMKP integrates the properties of the QKP and the MKP. It seeks to select, from a set of items, the subset that maximizes not only a linear profit but also an additional
quadratic profit derived from selecting pairs of items, subject to specific capacity constraints for multiple knapsacks [1].

More formally, let N = {1, 2, ..., n} be the set of items and K = {1, 2, ..., m} be the set of knapsacks. Each item i ∈ N is assigned a positive linear profit p_i, a positive weight w_i, and a quadratic profit p_ij associated with the joint selection of item i ∈ N and another item j ∈ N : i < j within the same knapsack. Additionally, each knapsack k ∈ K has a maximum capacity C_k. Defining x_ik as the binary variable that takes the value of 1 if item i is assigned to knapsack k and 0 otherwise, the binary quadratic programming (BQP) model is described as follows:

\[ \text{maximize} \quad \sum_{i \in N \setminus \{n\}} \sum_{j \in N:\, i<j} \sum_{k \in K} p_{ij} x_{ik} x_{jk} + \sum_{i \in N} \sum_{k \in K} p_i x_{ik} \tag{1} \]
\[ \text{subject to:} \quad \sum_{i \in N} w_i x_{ik} \le C_k \quad \forall k \in K \tag{2} \]
\[ \sum_{k \in K} x_{ik} \le 1 \quad \forall i \in N \tag{3} \]
\[ x_{ik} \in \{0, 1\} \quad \forall i \in N,\ k \in K \tag{4} \]

The objective function, expressed in equation (1), aims to maximize both the linear and quadratic profit, which are associated with the selection of items in the problem. Constraints (2) ensure that the capacity of each knapsack is not exceeded. On the other hand, constraints (3) refer to the cardinality condition, which implies that each item can be assigned to at most one knapsack. Finally, constraints (4) define the domain of the decision variables.

The QMKP has numerous practical applications, such as project management, capital budgeting, assigning workers to cooperative tasks, selecting ground locations, determining sites for railway stations, and handling packages at airports [5], [6], [7].

Hiley and Julstrom [4] first introduced the problem for a team formation application for projects, where a decision-maker must assign people (items) to projects (knapsacks) to maximize total productivity. They presented three metaheuristics for solving the QMKP: a genetic algorithm (GA), a stochastic hill-climbing algorithm, and a greedy heuristic. Additionally, they presented the first instances in the literature, on which subsequent studies have based their comparisons using only metaheuristics.

The QMKP has been extensively addressed by metaheuristics. Classical metaheuristics such as a GA [5] and a memetic algorithm [8] have been presented. Other, more sophisticated metaheuristics have also been defined, such as an iterated greedy approach enhanced with tabu search [9], an iterated responsive threshold search [10], and a strategic oscillation algorithm [11]. The latest metaheuristics and variants have been presented in other articles [12], [13], [14].

Since the problem's introduction, the state of the art has mostly comprised an intensive study of heuristic and metaheuristic methods. However, Bergman [15] introduced the first exact-method study 12 years after the problem was first proposed. This study proposed a branch-and-price algorithm and worked with small instances due to the nature of the approach. Subsequently, Galli et al. [6] introduced relaxations and polynomial-size formulations for the problem. Recently, Fleszar [16] introduced a branch-and-bound algorithm, and Galli et al. [17] presented deterministic Lagrangian matheuristics to solve the QMKP.

In recent years, machine learning (ML) approaches have demonstrated promising results in solving various combinatorial optimization problems, including the KP and its variants. ML provides a data-driven approach to these challenges, offering insights and optimization techniques that can enhance the performance of algorithms by reducing their solution spaces. Instance preprocessing is a key technique in combinatorial optimization problems that helps decrease computing times by eliminating elements that are known, in advance, to be non-optimal or infeasible. For example, in the QMKP, there may be items that do not fit in any knapsack or those with non-positive linear or quadratic profits. By reducing the solution space, ML techniques can help achieve solutions more quickly, thereby improving convergence in both exact and metaheuristic methods.

Accelerating the search process within a general-purpose solver can help converge quickly to optimal solutions. Most traditional general-purpose solvers, such as CPLEX [18] and Gurobi [19], incorporate preprocessing before executing an optimization problem. Clear examples of these techniques are the processes called "Presolve" and "Dual-Presolve", reformulations, and decompositions. However, in quadratic problems, convergence is often not fast, and additional processes may be necessary [6].

A potential way to reduce the solution space is by using ML. This subfield of artificial intelligence focuses on enabling computers to learn from previous experiences, using statistical principles and established algorithms [20]. These experiences, commonly referred to as data, are a key aspect that has revealed the vast potential of these techniques. Integrating ML techniques with traditional optimization methods promises to offer significant benefits, especially in reducing the computing times needed to find an (optimal) solution.

While metaheuristic approaches for solving the QMKP have been well-explored, recent efforts have focused on exact methods. A strategy to enhance the performance of the exact approach is to improve the performance of the general-purpose solver, reducing the solution space by leveraging item properties specific to QMKP instances. To this end, we propose utilizing ML models to aid general-purpose solvers by predicting item inclusion with a certain confidence level and classifying items accordingly. This predictive approach effectively narrows the solution space, enabling general-purpose solvers to address smaller, more manageable subproblems. Thus, we evaluated seven different ML models and selected the most effective one to serve as a preprocessing step. Comparison experiments demonstrate that integrating ML significantly reduces computing times and accelerates
convergence, making it a valuable addition to exact optimization methods.

The remaining sections of this document are organized as follows: Section II reviews the literature on similar problems and related solving techniques. In Section III, the complete methodology for generating an ML model is presented. In Section IV, we present the approach to be used: preprocessing and the formulation of a mathematical model based on the QMKP. Section V presents and discusses the extensive computational experiments. Finally, we give concluding remarks and propose future research in Section VI.

II. RELATED WORK
ML, a subfield of artificial intelligence, is dedicated to equipping machines with the ability to learn directly from data without the need for specific programming for each task [20]. This approach is based on applying algorithmic and statistical techniques that facilitate interpretation and learning from available information, allowing systems to improve their performance over time autonomously, thanks to the accumulation of experiences and interactions with their environment [21]. The algorithms used in ML are classified into three main groups according to their methodology and the type of data they handle.

Supervised learning involves algorithms that learn to map inputs to outputs based on previously known examples. This prepares them to predict outcomes from new data. Notable examples include support vector machines (SVM), decision trees (DT), and neural networks (NN), used in classification and regression tasks. NNs have received a significant boost from recent advances in deep learning, which remains an ever-important area of research [22], [23].

The use of ML to tackle the KP and its variants has grown significantly in recent years. This review focuses on the literature surrounding ML-based techniques and methodologies.

In [24], an approach based on a linkage tree genetic algorithm (LTGA) and the Chu and Beasley genetic algorithm (CBGA) is presented for the multi-dimensional KP (MDKP). The LTGA uses supervised learning to detect problem structures and generate new solutions. However, it could not outperform the CBGA on the problem instances. In [25], an anytime approach is presented for automatically selecting algorithms for the KP. The predictions are developed through ML algorithms such as random forest (RF), gradient boosting, bagging, and extra-trees, with the latter achieving the best performance. The authors suggest that the same framework can be applied to other engineering problems [26]. In [27], a solver based on NNs is proposed for the KP. The approach is compared with other metaheuristics, where the latter perform better. However, the generated approach showed better performance when the values were correlated with the weights of each item. In [28], an improvement of the binarization framework that uses the k-means technique to solve the MDKP is developed. The method is based on perturbing the solution in a k-nearest neighbor (KNN) based algorithm and integrating it with two other metaheuristics.

In [29], a hybrid cuckoo search algorithm with K-means was proposed to binarize solutions and solve the set-union KP at medium and large scales. This approach includes a greedy initialization algorithm and a local search operator. The role of binarization and local search operators was investigated through random operators with different transition probabilities. The proposed algorithm improved the results of previous methods in most cases. In [30], ML is combined with a GA to solve the MDKP; the approach is competitive when compared with other approaches. In [31], a deep learning-based approach is presented to solve the MDKP using asynchronous advantage actor-critic. Experiments show that the proposed method performs better than the greedy algorithm and random solutions. Results on random, linear, and quadratic instances demonstrated that the proposed algorithm is robust. ML algorithms have recently been presented in [32] to discriminate items in the polynomial robust KP. Two heuristic methods based on the continuous relaxation of the problem are proposed: one uses ML, specifically an RF classifier, and the other employs a GA.

ML algorithms enable systems to generalize from data to new situations, improving their ability to make predictions or decisions based on past experiences. Studying the potential synergy between ML and classical solvers is crucial for integrating new approaches to combinatorial optimization problems.

III. MODEL GENERATION USING ML ALGORITHMS
Our approach adheres to the classical knowledge discovery in databases methodology [33] for generating an ML model. The process is divided into two main tasks:
• Synthetic data generation: synthetic data is created to apply supervised learning.
• Standard ML process: a model is trained using ML algorithms for classification. The algorithm with the best performance is selected as the final model.

A. DATA GENERATION
To generate a model based on ML using supervised learning, data must be obtained to determine if an item is (potentially) part of the optimal solution to the problem. In this context, selecting and generating attributes for an item is vital in our approach to solving the QMKP. Each of these data points can be obtained from instances widely used in the QMKP literature.

The features of an item considered for data generation are described below:
• p_i: the linear profit of item i.
• w_i: the weight of item i.
• w_i/c_k: the weight of item i divided by the capacity of knapsack k, C_k.
• min(p_ij): the minimum quadratic profit of item i with all other items j in the instance.
• avg(p_ij): the average quadratic profit of item i with all other items j in the instance.
• max(p_ij): the maximum quadratic profit of item i with all other items j in the instance.
• zeros(p_ij): the number of quadratic profits equal to zero for item i with all other items j in the instance.
• m: the number of knapsacks that can store items.
• d: the density of the instance in which the item is framed.
• sum(p_ij): the sum of all quadratic profits of item i with all other items in the instance.
• c_k − w_i: the difference between the capacity of the knapsack (c_k) and the weight of item i.
• sum(p_ij + p_i)/(w_i/c_k): the sum of the quadratic and linear profit of item i divided by the quotient between the weight of the same item and the capacity of the knapsack (c_k).
• p_i/sum(p_i): the ratio of the linear profit of item i to the sum of all linear profits of the items in an instance.
• w_i/sum(w_i): the ratio of the weight of item i to the sum of the weights of the items in the instance.
• x: this value represents the solution of the linear relaxation of the BQP model for the QMKP. Since it is an MKP, the number of relaxed values corresponds to the number of knapsacks, which can be interpreted as a type of probability of being included in each knapsack k.
• max(x): the maximum value of the problem's relaxation.
• min(x): the minimum value of the problem's relaxation.
• max(x) − min(x): the difference between the maximum and minimum values of the problem's relaxation.
• avg(x): the average of the problem's relaxation.
• std(x): the standard deviation of the problem's relaxation.
• C: the capacity of the knapsacks.
• Label: indicates whether the item belongs to the optimal solution or not.
• σ_i: a score of item i corresponding to an adaptation of the feature proposed by [34], which originally considered only the linear profit. In this case, the quadratic part of the problem is added. This score is computed using Equation (5):

\[ \sigma_i = \frac{p_i}{\sum_{l \in N} p_l} + \frac{\sum_{j \in N:\, j \neq i} p_{ij}}{\sum_{i \in N \setminus \{n\}} \sum_{j \in N:\, j \neq i} p_{ij}} \times n \times \frac{w_i}{\sum_{l \in N} w_l} \tag{5} \]

Each of the parameters was generated from new instances for this experiment. Specifically, the generator by [15] is used to create 10 different instances for n = {20, 25, 30, 35} and K = {3, 5, 10}, with densities d = {0.25, 0.50, 0.75}. Subsequently, the problem was solved using the model from [6] that uses the level-1 reformulation linearization technique, as it is necessary not only to obtain the Label values but also the relaxed values of the variable x, which cannot be obtained from the relaxation of a quadratic model.

B. DATA ANALYSIS
Once the new instances are generated, the synthetic data from these problem instances must be analyzed. This step is essential for preparing the training process of the ML models aimed at solving the optimization problem.

1) FEATURES ELIMINATION
Given the features of the data, Figure 1 shows the correlation matrix obtained from the generated data. Strong correlations exist between features such as:
• p_i and p_i/sum(p_i).
• zeros(p_ij) and avg(p_ij).
• zeros(p_ij) and d.
• sum(p_ij) and avg(p_ij).
• avg(p_ij) and d.
• c_k − w_i and C.
• sum(p_ij + p_i)/(w_i/c_k) and σ_i.
• min(x) and avg(x).
• max(x) − min(x) and std(x).
To avoid multicollinearity, some features were eliminated using the correlation matrix and expert knowledge of the problem.
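To make this elimination step concrete, the following is a minimal sketch (not the authors' code) of dropping one feature from each highly correlated pair with pandas; the DataFrame name, the 0.9 threshold, and the protected Label column are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute Pearson
    correlation exceeds `threshold`; the Label column is preserved."""
    corr = features.drop(columns=["Label"]).corr().abs()
    # Inspect each pair only once by keeping the upper triangle.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)
```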
FIGURE 3. Histogram of the predictor feature σi before and after logarithmic transformation.
2) OUTLIER ELIMINATION
After reducing the set of features to be used, the next step
is eliminating the out-of-range data. We use the three-sigma
method for this task, establishing a statistical control limit
encompassing up to three standard deviations from the
mean [35]. All data points outside are imputed using the
KNN method [36]. Figure 2 shows the box plots for each
feature under study, highlighting the out-of-range records and
allowing the identification of outliers in each feature.
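As an illustration of this step, here is a compact sketch (assuming a numeric feature DataFrame; k = 5 is an illustrative choice) that masks values outside the three-sigma control limits and imputes them with scikit-learn's KNNImputer:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def impute_outliers(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Mark values outside mean ± 3*std as missing, then fill them
    from the k nearest rows in feature space."""
    data = df.copy()
    mu, sigma = data.mean(), data.std()
    data[(data - mu).abs() > 3 * sigma] = np.nan  # three-sigma control limits
    imputer = KNNImputer(n_neighbors=k)
    return pd.DataFrame(imputer.fit_transform(data),
                        columns=data.columns, index=data.index)
```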
3) DATA TRANSFORMATION
Some ML models, such as logistic regression (LR), do not rely on distributional assumptions. However, the solution tends to be more stable if the predictor features have a normal distribution. Other models, such as SVM, adaptive boosting, DT, RF, or the multi-layer perceptron (MLP), work effectively even with non-normal data [37]. Therefore, since there is no normality restriction, a logarithmic transformation of the data distributions is performed. A logarithmic transformation was applied to the features w_i, w_i/c_k, max(p_ij), avg(p_ij), m, c_k − w_i, w_i/sum(w_i), and σ_i. The significant effect in reducing asymmetry is illustrated in Figure 3, where a logarithmic transformation is applied to the variable σ_i, resulting in a shift from an asymmetric distribution to a nearly normal one.

4) DATA SUMMARY
The final data set comprises 360 instances, equivalent to 21,800 items, of which 14,584 belong to class 1 (items that are inserted into some knapsack in their respective instance) and 7,216 belong to class 0 (items that are not inserted into any knapsack in their respective instance). Figure 4 shows the proportion of class 1 items (items that are part of the solution) and class 0 items (items that are not part of the solution).

FIGURE 4. Proportion of classes in the final data.

C. ML TRAINING
The final summary is shown in Figure 5. The data processing involved several steps to ensure the quality and relevance of the dataset for training ML models. First, synthetic data was generated with key attributes, such as profits, weights, and density, reflecting the characteristics of the QMKP. This was followed by an exploratory analysis to clean the data: redundant features were removed based on their correlations, outliers were identified and corrected using statistical methods and KNN imputation, and the data was normalized through logarithmic transformations to make it easier to work with. Afterward, relevant features were selected using ANOVA, keeping only those that contributed most to the model's performance. The remaining features were then scaled using the Min-Max scaler to standardize their ranges, preparing the dataset for use. Finally, the processed data was split into training and testing subsets to allow for effective evaluation of the ML models.

During the training process, the model learns to identify and understand patterns and relationships in the data. Without this process, the model would not be able to make accurate predictions or decisions [20]. Therefore, various training sessions were conducted using different models to generate and select an ML model. Below is a brief description of each model used:
• KNN: a supervised learning method that identifies the k nearest data points in the feature space to the point of interest and assigns the most common label (in the case of classification) or the average (in the case of regression) of these neighbors [36].
• SVM: a classification method that seeks to find the hyperplane that best separates the different classes in the feature space. The objective is to maximize the distance (margin) between the data points of the classes [38].
• DT: a hierarchical decision-support predictive model that iteratively divides the feature space into subsets based on simple conditions [39].
• AdaBoost (AB): an ensemble method that combines multiple weak classifiers (typically, single-level decision trees called stumps) to create a strong classifier. It assigns higher weights to misclassified examples and trains subsequent classifiers to focus on those examples [40].
• RF: an ensemble model that constructs multiple decision trees during training and outputs the class that is the mode (or the average in the case of regression) of the classes of the individual trees [41].
• MLP: an artificial NN consisting of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer. It uses deep learning techniques to model complex non-linear relationships [20].
• LR: a classification algorithm that models the probability that an instance belongs to a particular class. It uses the sigmoid function to transform a linear combination of input features into a probability [42].
These models cover complementary paradigms: ensemble methods like AB and RF enhance robustness and accuracy through model aggregation, the MLP was included for its ability to model intricate non-linear patterns, and SVM was selected for its efficacy in high-dimensional spaces.

1) TRAINING PROCESS
Once the data was obtained and pre-processed in the previous section, the next step was to scale the data. It is crucial to scale the data for the techniques mentioned earlier, as many ML algorithms, such as KNN and SVM, are sensitive to the magnitudes of the features. The scale of the data influences the distances and decision boundaries, directly impacting the model's accuracy and performance. Therefore, the Min-Max scaler [37] was applied to each feature, except for the label.

Subsequently, the training process begins. For this, different hyperparameters were set in order to conduct a search within feasible spaces for each of the models. The details of the hyperparameters can be found in Table 1. To find the best parameters, an exhaustive grid search was performed using Python's scikit-learn library [43]. This process systematically evaluates a predefined set of hyperparameter combinations to find the one that optimizes model performance. Finally, the training uses the selected features and the parameters found. The total data is divided into 70% for training and 30% for testing.
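The training pipeline just described can be sketched with scikit-learn as follows; the synthetic stand-in data and the AdaBoost grid are placeholders, not the data set or the grids reported in Table 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in data

X_scaled = MinMaxScaler().fit_transform(X)        # scale every feature except the label
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)  # 70% training / 30% testing

param_grid = {"n_estimators": [50, 100, 200],     # illustrative grid only
              "learning_rate": [0.1, 0.5, 1.0]}
search = GridSearchCV(AdaBoostClassifier(), param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```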
For each subset of features, a classification model is trained on the training data and evaluated on the test data. The model's accuracy is calculated, and the process continues until all possible feature subsets have been evaluated. The best subset of features is identified as the one that produces the highest accuracy on the test data. The selected feature subsets for each ML model are presented in Table 2.

TABLE 2. Features used to build ML models.

Applying each trained model to the test set shows that, although there are performance differences among the classification models, all have metrics above 90%. RF stands out slightly with the highest accuracy at 92.03%, while DT achieves a comparable accuracy of 91.93%. In terms of precision, RF leads with 93.70%, followed closely by DT at 93.69% and MLP at 93.33%. KNN shows the lowest precision within this close range, at 93.04%. Regarding sensitivity (recall), MLP and KNN perform the best, with values of 94.93% and 94.91%, respectively, while DT has the lowest sensitivity at 92.29%, significantly lower than the rest of the models.
The reduced model restricts the assignments according to the classifier's output: constraints (9) and (10) fix, respectively, the selected items in S_s and the non-selected items in S_ns. Finally, constraints (11) define the domain of the decision variables.

\[ \text{maximize} \quad \sum_{i \in \{S_s \cup S_r\}} \sum_{j \in \{S_s \cup S_r\}:\, i<j} \sum_{k \in K} p_{ij} x_{ik} x_{jk} + \sum_{i \in \{S_s \cup S_r\}} \sum_{k \in K} p_i x_{ik} \tag{6} \]
\[ \text{subject to:} \quad \sum_{i \in \{S_s \cup S_r\}} w_i x_{ik} \le C_k \quad \forall k \in K \tag{7} \]
\[ \sum_{k \in K} x_{ik} \le 1 \quad \forall i \in S_r \tag{8} \]
\[ \sum_{k \in K} x_{ik} = 1 \quad \forall i \in S_s \tag{9} \]
\[ \sum_{k \in K} x_{ik} = 0 \quad \forall i \in S_{ns} \tag{10} \]
\[ x_{ik} \in \{0, 1\} \quad \forall i \in N,\ \forall k \in K \tag{11} \]

The computational complexity of both models may appear similar; however, differences arise when utilizing the subsets generated by ML. The quadratic model ((1)-(4)) depends on the number of binary decision variables and constraints.

V. COMPUTATIONAL EXPERIMENTS
All experiments in this study were conducted on a 13th-generation Intel i9-13900KF CPU with 64 GB of RAM. The experiments were performed sequentially, meaning each experiment was executed after the previous one had finished, using a single CPU core to compare the results in terms of optimality gap and computing time without biasing the results due to resource allocation. The operating system is GNU/Linux Ubuntu 22.04 LTS, and the general-purpose solver used is Gurobi 10.0.3. The programming language is Python 3.12, utilizing the libraries pandas, numpy, scikit-learn, statistics, matplotlib, seaborn, and joblib.

The presentation of the results can be divided into two parts: validation on a subset of training instances, to measure the actual error for all instances and all generated ML models, and validation on instances from the literature, using the instances from the Bergman paper [15] with C_k = ((∑_{i∈N} w_i)/m) × 0.6.

A. RESULTS WITH TRAINING INSTANCES
Extensive computational experiments were conducted with seven ML models: AB, DT, KNN, MLP, SVM, RF, and LR.
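As a concrete illustration of how these experiments combine a classifier with Gurobi, the following hedged gurobipy sketch builds and solves the reduced model (6)-(11) under the one-hour time limit used here; the data structures, the confidence threshold tau, and the helper function are illustrative assumptions, not the authors' implementation.

```python
import gurobipy as gp
from gurobipy import GRB

def solve_reduced_qmkp(p, q, w, C, proba, tau=0.9, time_limit=3600):
    """Model (6)-(11): p[i] linear profits, q[i][j] quadratic profits (i < j),
    w[i] weights, C[k] capacities, proba[i] predicted probability that
    item i is packed. Confident predictions are fixed; the rest stay free."""
    n, m = len(p), len(C)
    Ss = [i for i in range(n) if proba[i] >= tau]        # predicted selected
    Sns = [i for i in range(n) if proba[i] <= 1 - tau]   # predicted not selected
    free = [i for i in range(n) if i not in Sns]         # S_s ∪ S_r

    mdl = gp.Model("AB-BQP")
    mdl.Params.TimeLimit = time_limit
    x = mdl.addVars(n, m, vtype=GRB.BINARY, name="x")
    mdl.setObjective(
        gp.quicksum(q[i][j] * x[i, k] * x[j, k]
                    for i in free for j in free if i < j for k in range(m))
        + gp.quicksum(p[i] * x[i, k] for i in free for k in range(m)),
        GRB.MAXIMIZE)                                                  # (6)
    mdl.addConstrs(gp.quicksum(w[i] * x[i, k] for i in free) <= C[k]
                   for k in range(m))                                  # (7)
    mdl.addConstrs(x.sum(i, "*") <= 1 for i in free if i not in Ss)    # (8)
    mdl.addConstrs(x.sum(i, "*") == 1 for i in Ss)                     # (9)
    mdl.addConstrs(x.sum(i, "*") == 0 for i in Sns)                    # (10)
    mdl.optimize()                          # x is binary by construction (11)
    return mdl.ObjVal
```

The percentage gap reported in the tables can then be computed per instance as 100 × (BKS − ObjVal)/BKS.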
Since the accuracy, precision, and recall metrics were quite similar across these models, Tables 4, 5, and 6 present the results for all of them, for different knapsack sizes with m = 3, m = 5, and m = 10. The first and second columns indicate the instance set for each size n and each density d, reflecting the average of ten instances executed per approach. The following seven groups of two columns correspond to the seven ML models, providing data on the percentage gap (gap) between the best cost found by the general-purpose solver at the end of the solving process and the best known solution (BKS), as well as the corresponding computing time (in seconds). The BKS is calculated as the maximum cost the general-purpose solver obtains for each instance. Moreover, each instance was executed with a time limit of one hour.

In comparing ML models on the training instances with m = 3, differences are observed for each n in Table 4. AB consistently achieved the best results across all instances for n = 20, with RF showing nearly double the gap. For n = 25, AB maintained its superior performance, followed closely by DT. When n = 30, AB again outperformed the other models, with RF in second place. Finally, for n = 35, AB produced the best results with a gap of 0.36%, followed by KNN, which had a more than three times larger gap. Overall, AB demonstrated the best performance with a 0.39% gap, followed by RF with 0.97%.

In comparing ML models on the training instances with m = 5, differences are observed for each n in Table 5. AB consistently achieved the best results across all instances for each n, followed consistently by LR in second place. Overall, AB demonstrated the best performance with a 0.32% gap, followed by LR with double the gap, 0.64%.

When comparing the ML models on the training instances with m = 10, very similar results are observed for each n, as shown in Table 6. For n = 20, all models except SVM consistently achieved the same performance across all instances. For n = 25, LR delivered the best performance, followed closely by AB. At n = 30, AB outperformed the other models, with LR in second place. Finally, for n = 35, AB again produced the best results, followed by LR. Overall, AB demonstrated the best performance with a 20.07% gap, closely followed by LR with 20.14%.

Several nonparametric statistical methods were employed to analyze the ML models' results. Following the recommendations of Demšar [44], we applied the Friedman test [45] to assess and reject the null hypothesis with at least 95% confidence. After establishing a statistically significant difference in the ML models' performance, we proceeded with the pairwise post-hoc analysis suggested by Benavoli et al. [46], using the Wilcoxon signed-rank test [47] along with Holm's alpha correction [48], [49]. In these analyses, a lower rank (positioned further to the left) indicates better performance of a model relative to the others, based on the gap metric. In the critical difference diagrams, a connecting line between models signifies no statistically significant difference in performance among them according to the Friedman test, which compares the ranks of multiple classifiers [44]. As shown in Figures 8, 9, and 10, the critical difference diagrams revealed a significant difference, with AB emerging as the best algorithm across the 160 instances with m = 3, m = 5, and m = 10, respectively.

FIGURE 8. Critical difference diagram of training instances with m = 3.
FIGURE 9. Critical difference diagram of training instances with m = 5.
FIGURE 10. Critical difference diagram of training instances with m = 10.

B. RESULTS WITH INSTANCES FROM LITERATURE
AB was selected as the best classifier for our approach from the testing process with all models. For this, a comparison was made between the BQP model combined with AB (AB-BQP) and the original BQP model, using the Bergman instances in [15] with a homogeneous capacity equal to C_k = ((∑_{i∈N} w_i)/m) × 0.6.

Tables 7, 8, and 9 present the results of the AB-BQP model and the original BQP model for different knapsack sizes with m = 3, m = 5, and m = 10. The first and second columns indicate the instance set for each size n and each density d, reflecting the average of five instances executed per approach. For both the AB-BQP and BQP models, the tables provide data on the percentage gap (gap) between the best cost found by the general-purpose solver at the end of the solving process and the BKS, and the corresponding computing time (in seconds). The BKS is calculated as the maximum cost the general-purpose solver obtains for each instance. Moreover, each instance was executed with a time limit of one hour.

Table 7 shows the results for the Bergman instances with m = 3. In general terms, using AB-BQP is faster in most cases, sacrificing convergence to the optimum by less than 1%. There are no significant performance differences for
n = 20 and n = 25. AB-BQP proves to be faster for n = 25. Finally, with n = 35, the performances are very similar. The overall average shows that the computing time drops to 150 seconds with AB-BQP.

TABLE 7. Comparison between the AB-BQP model and the BQP model for m = 3 for the Bergman instances.
TABLE 8. Comparison between the AB-BQP model and the BQP model for m = 5 for the Bergman instances.

Table 8 presents a comparison between the BQP and AB-BQP models for the Bergman instances with m = 5. In general, AB-BQP reduces computing time for n = {20, 25, 30}. However, for n = 35, the times are very similar, as in both approaches the solvers reach the time limit of 3,600 seconds. Regarding solution quality, the gap remains close to 0 in most instances, with an average difference of 0.4% between BQP and AB-BQP. This difference in quality is minimal compared to the gain in computing time, especially for the n = 25 instances, where the computing time is almost halved.

Table 9 compares the BQP and AB-BQP approaches for the Bergman instances with m = 10. AB-BQP is significantly faster than the basic model for this group of instances. For n = 20 and n = 25, the differences are minimal in both gap and computing times, with an average difference of around 1 second for each group and all gaps close to 0.0. However, the computing times differ for n = 30 and n = 35: almost 60 seconds using the BQP model for n = 30, compared to 15 seconds with AB-BQP for the same instances.
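For reference, the nonparametric comparison described in Section V-A can be reproduced along these lines; this is a sketch assuming a gaps matrix of shape instances × models, with SciPy providing the Friedman and Wilcoxon tests and statsmodels applying Holm's correction to the pairwise p-values.

```python
from itertools import combinations
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
gaps = rng.random((160, 7))            # placeholder: gap per (instance, model)

# Friedman test over the seven models' gap values.
stat, p = friedmanchisquare(*gaps.T)
print(f"Friedman statistic = {stat:.2f}, p-value = {p:.4f}")

# Pairwise Wilcoxon signed-rank tests with Holm's alpha correction.
pairs = list(combinations(range(gaps.shape[1]), 2))
raw_p = [wilcoxon(gaps[:, a], gaps[:, b]).pvalue for a, b in pairs]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
for (a, b), pv, rej in zip(pairs, adj_p, reject):
    print(f"model {a} vs {b}: adjusted p = {pv:.4f}, significant = {rej}")
```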
TABLE 9. Comparison between the AB-BQP model and the BQP model for m = 10 for the Bergman instances.

VI. CONCLUSION
This paper presented an approach to reduce the number of items in the QMKP and subsequently optimize them using an exact solver. First, the methodology and the characteristics considered in this experiment were detailed. Exploratory data analysis was performed, and redundant features were removed. Finally, with the data correctly consolidated, the training of the models KNN, SVM, DT, AB, RF, LR, and MLP was carried out to classify each item and obtain the probability associated with each classification.

The proposed process's novelty lies in its ability to predict the optimal selection of items for the QMKP. From the solver's perspective, this involves accurately forecasting whether a binary decision variable assumes a value of 1 or 0. This task is challenging in quadratic optimization problems, where discarding items is non-trivial due to the complex interdependencies among variables, and it becomes increasingly difficult as the number of items grows significantly.

The trained models show very similar results in the measured metrics, as the accuracy, precision, and recall for each one are above 90%. Therefore, all the models obtained were used within the proposed approach, although AB shows a significant difference in the statistical tests against all the other ML models.

Finally, AB was used with instances from the literature and compared to the BQP model. It was confirmed that using AB reduces computing times. For smaller instances, the use of ML is not recommended. However, as the number of items increases, the computing times start to differ drastically, while the gap remains quite similar.

Regarding computing times, the ML model integrated with BQP demonstrates lower computing times in most groups of instances. For the Bergman instances, the optimality sacrifice is less than 1%. Furthermore, the computing times are consistently lower than those of BQP across all average comparisons conducted for these instances.

For future work, we propose applying the AB-based model to larger instances typically used in metaheuristic approaches. Evaluating the model's performance on these new instances and integrating it with metaheuristics could yield substantial benefits. Additionally, we suggest enhancing these ML models by incorporating repair heuristics. This would allow not only for the assessment of the probability of an item belonging to a class but also the development of local search operators that refine the solution, all while maintaining the computational efficiency of the algorithms.

Possible extensions to other knapsack-related problems can be addressed using the same methodology. Specifically, problems such as the QKP with setup [50] or the QKP with conflicts and balance constraints [51] could be tackled. The first step would be to identify the features to be generated that allow computing times to be reduced for each case, as well as to conduct the corresponding exploratory data analyses.

Another interesting alternative is integrating our approach with recent automated algorithm design techniques [52], which create specialized algorithms by combining heuristic components with exact methods. Our approach could significantly reduce the solution space for challenging problems like the QMKP through confident predictions of item inclusion. This reduction enables automated algorithm techniques to focus their exact solvers on more manageable subsets, thereby achieving faster convergence. This combination of approaches could produce specialized algorithms tailored to reduced instances, further enhancing efficiency and performance.

ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their invaluable comments and suggestions, which have improved this work.

REFERENCES
[1] V. Cacchiani, M. Iori, A. Locatelli, and S. Martello, "Knapsack problems—An overview of recent advances. Part II: Multiple, multidimensional, and quadratic knapsack problems," Comput. Oper. Res., vol. 143, Jul. 2022, Art. no. 105693.
[2] D. Pisinger, "The quadratic knapsack problem—A survey," Discrete Appl. Math., vol. 155, no. 5, pp. 623–648, Mar. 2007.
[3] S. Martello and P. Toth, "Algorithms for knapsack problems," North-Holland Math. Stud., vol. 132, pp. 213–257, Jan. 1987.
[4] A. Hiley and B. A. Julstrom, "The quadratic multiple knapsack problem and three heuristic approaches to it," in Proc. 8th Annu. Conf. Genetic Evol. Comput., Jul. 2006, pp. 547–552.
[5] A. Singh and A. S. Baghel, "A new grouping genetic algorithm for the quadratic multiple knapsack problem," in Proc. 7th Eur. Conf. Evol. Comput. Combinat. Optim., Valencia, Spain. Cham, Switzerland: Springer, Jan. 2007, pp. 210–218.
[6] L. Galli, S. Martello, C. Rey, and P. Toth, "Polynomial-size formulations and relaxations for the quadratic multiple knapsack problem," Eur. J. Oper. Res., vol. 291, no. 3, pp. 871–882, Jun. 2021.
[7] T. Saraç and A. Sipahioğlu, "A genetic algorithm for the quadratic multiple knapsack problem," in Proc. Int. Symp. Brain, Vis., Artif. Intell., Naples, Italy, Sep. 2007, pp. 490–498.
[8] S.-M. Soak and S.-W. Lee, "A memetic algorithm for the quadratic multiple container packing problem," Appl. Intell., vol. 36, no. 1, pp. 119–135, Jan. 2012.
[9] J. Qin, X. Xu, Q. Wu, and T. C. E. Cheng, "Hybridization of Tabu search with feasible and infeasible local searches for the quadratic multiple knapsack problem," Comput. Oper. Res., vol. 66, pp. 199–214, Feb. 2016.
[10] Y. Chen and J.-K. Hao, "Iterated responsive threshold search for the quadratic multiple knapsack problem," Ann. Oper. Res., vol. 226, no. 1, pp. 101–131, Mar. 2015.
[11] C. García-Martínez, F. Glover, F. J. Rodriguez, M. Lozano, and R. Martí, "Strategic oscillation for the quadratic multiple knapsack problem," Comput. Optim. Appl., vol. 58, no. 1, pp. 161–185, May 2014.
[12] Y. Chen, J.-K. Hao, and F. Glover, "An evolutionary path relinking approach for the quadratic multiple knapsack problem," Knowl.-Based Syst., vol. 92, pp. 23–34, Jan. 2016.
[13] T. Tlili, H. Yahyaoui, and S. Krichen, "An iterated variable neighborhood descent hyperheuristic for the quadratic multiple knapsack problem," in Studies in Computational Intelligence. Cham, Switzerland: Springer, 2016, pp. 245–251.
[14] M. Aïder, O. Gacem, and M. Hifi, "Branch and solve strategies-based algorithm for the quadratic multiple knapsack problem," J. Oper. Res. Soc., vol. 73, no. 3, pp. 540–557, Mar. 2022.
[15] D. Bergman, "An exact algorithm for the quadratic multiknapsack problem with an application to event seating," INFORMS J. Comput., vol. 31, no. 3, pp. 477–492, Jul. 2019.
[16] K. Fleszar, "A branch-and-bound algorithm for the quadratic multiple knapsack problem," Eur. J. Oper. Res., vol. 298, no. 1, pp. 89–98, Apr. 2022.
[17] L. Galli, S. Martello, C. Rey, and P. Toth, "Lagrangian matheuristics for the quadratic multiple knapsack problem," Discrete Appl. Math., vol. 335, pp. 36–51, Aug. 2023.
[18] IBM. IBM ILOG CPLEX Optimization Studio. Accessed: Jul. 30, 2024. [Online]. Available: https://www.ibm.com/es-es/products/ilog-cplex-optimization-studio
[19] Gurobi Optimization, LLC. Gurobi Optimizer. Accessed: Jul. 30, 2024. [Online]. Available: https://www.gurobi.com/
[20] C. M. Bishop and N. M. Nasrabadi, Pattern Recognition and Machine Learning, vol. 4. Cham, Switzerland: Springer, 2006.
[21] F. Chollet, Deep Learning With Python. New York, NY, USA: Simon and Schuster, 2021.
[22] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, Jun. 2017, pp. 5998–6008.
[24] J. P. Martins, C. M. Fonseca, and A. C. B. Delbem, "On the performance of linkage-tree genetic algorithms for the multidimensional knapsack problem," Neurocomputing, vol. 146, pp. 17–29, Dec. 2014.
[25] I. I. Huerta, D. A. Neira, D. A. Ortega, V. Varas, J. Godoy, and R. Asín-Achá, "Anytime automatic algorithm selection for knapsack," Expert Syst. Appl., vol. 158, Nov. 2020, Art. no. 113613.
[26] I. I. Huerta, D. A. Neira, D. A. Ortega, V. Varas, J. Godoy, and R. Asín-Achá, "Improving the state-of-the-art in the traveling salesman problem: An anytime automatic algorithm selection," Expert Syst. Appl., vol. 187, Jan. 2022, Art. no. 115948.
[27] H. A. A. Nomer, K. A. Alnowibet, A. Elsayed, and A. W. Mohamed, "Neural knapsack: A neural network based solver for the knapsack problem," IEEE Access, vol. 8, pp. 224200–224210, 2020.
[28] J. García, E. Lalla-Ruiz, S. Voß, and E. L. Droguett, "Enhancing a machine learning binarization framework by perturbation operators: Analysis on the multidimensional knapsack problem," Int. J. Mach. Learn. Cybern., vol. 11, no. 9, pp. 1951–1970, Sep. 2020.
[29] J. García, J. Lemus-Romani, F. Altimiras, B. Crawford, R. Soto, M. Becerra-Rozas, P. Moraga, A. P. Becerra, A. P. Fritz, J.-M. Rubio, and G. Astorga, "A binary machine learning cuckoo search algorithm improved by a local search operator for the set-union knapsack problem," Mathematics, vol. 9, no. 20, p. 2611, Oct. 2021.
[30] A. Rezoug, M. Bader-El-Den, and D. Boughaci, "Application of supervised machine learning methods on the multidimensional knapsack problem," Neural Process. Lett., vol. 54, no. 2, pp. 871–890, Apr. 2022.
[31] G. Sur, S. Y. Ryu, J. Kim, and H. Lim, "A deep reinforcement learning-based scheme for solving multiple knapsack problems," Appl. Sci., vol. 12, no. 6, p. 3068, Mar. 2022.
[32] A. Baldo, M. Boffa, L. Cascioli, E. Fadda, C. Lanza, and A. Ravera, "The polynomial robust knapsack problem," Eur. J. Oper. Res., vol. 305, no. 3, pp. 1424–1434, Mar. 2023.
[33] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From Data Mining to Knowledge Discovery: An Overview. Washington, DC, USA: American Association for Artificial Intelligence, 1996, pp. 1–34.
[34] S. Senyu, "An approach to linear programming with 0-1 variables," Manage. Sci., vol. 15, pp. B196–B207, Jan. 1967.
[35] D. C. Montgomery, Introduction to Statistical Quality Control. Hoboken, NJ, USA: Wiley, 2019.
[36] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. IT-13, no. 1, pp. 21–27, Jan. 1967.
[37] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2. Cham, Switzerland: Springer, 2009.
[38] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995.
[39] D. von Winterfeldt and W. Edwards, Decision Analysis and Behavioral Research. Cambridge, U.K.: Cambridge Univ. Press, 1986.
[40] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in Computational Learning Theory, P. Vitányi, Ed., Berlin, Germany: Springer, 1995, pp. 23–37.
[41] T. K. Ho, "Random decision forests," in Proc. 3rd Int. Conf. Document Anal. Recognit., vol. 1, 1995, pp. 278–282.
[42] D. R. Cox, "The regression analysis of binary sequences," J. Roy. Stat. Soc. Ser. B, Stat. Methodol., vol. 20, no. 2, pp. 215–232, Jul. 1958.
[43] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. Vanderplas, A. Joly, B. Holt, and G. Varoquaux, "API design for machine learning software: Experiences from the scikit-learn project," 2013, arXiv:1309.0238.
[44] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," J. Mach. Learn. Res., vol. 7, no. 1, pp. 1–30, Dec. 2006.
[45] M. Friedman, "A comparison of alternative tests of significance for the problem of m rankings," Ann. Math. Statist., vol. 11, no. 1, pp. 86–92, Mar. 1940.
[46] A. Benavoli, G. Corani, and F. Mangili, "Should we really use post-hoc tests based on mean-ranks?" J. Mach. Learn. Res., vol. 17, no. 5, pp. 1–10, 2016.
[47] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bull., vol. 1, no. 6, p. 80, Dec. 1945.
[48] S. Holm, "A simple sequentially rejective multiple test procedure," Scandin. J. Statist., vol. 6, pp. 65–70, Jan. 1979.
[49] S. García and F. Herrera, "An extension on 'statistical comparisons of classifiers over multiple data sets' for all pairwise comparisons," J. Mach. Learn. Res., vol. 9, no. 89, pp. 2677–2694, 2008.
[50] L. Galli, S. Martello, C. Rey, and P. Toth, "The quadratic knapsack problem with setup," Comput. Oper. Res., vol. 173, Jan. 2025, Art. no. 106873.
[51] P. Olivier, A. Lodi, and G. Pesant, "The quadratic multiknapsack problem with conflicts and balance constraints," INFORMS J. Comput., vol. 33, no. 3, pp. 949–962, Jul. 2021.
[52] N. Acevedo, C. Rey, C. Contreras-Bolton, and V. Parada, "Automatic design of specialized algorithms for the binary knapsack problem," Expert Syst. Appl., vol. 141, Sep. 2019, Art. no. 112908.

DIEGO YÁÑEZ-OYARCE received the B.Sc. degree in industrial engineering and the M.Sc. degree in industrial engineering from University of Bío-Bío, Chile, in 2023 and 2024, respectively. He is currently pursuing the M.Sc. degree in computational engineering and intelligent systems with the University of the Basque Country. His research interests include machine learning, predictive analytics, computer vision, and optimization techniques.
CARLOS CONTRERAS-BOLTON received the B.Sc. degree in computer science and the M.Sc. degree in computer science engineering from the Universidad de Santiago de Chile, and the Ph.D. degree in biomedical, electrical, and systems engineering from the University of Bologna, Italy. He is currently an Assistant Professor with the Department of Industrial Engineering, Universidad de Concepción, Chile. His research interests include exact, heuristic, and hybrid algorithms to solve combinatorial optimization problems, such as traveling salesman problems, vehicle routing problems, and graph problems.

CARLOS REY received the master's degree in computer engineering from the University of Santiago, Chile, in 2014, and the Ph.D. degree from the University of Bologna, Italy, in 2022. Since his early career, he has contributed to the field of automatic algorithm generation, enhancing this line of research with various approaches to date. His Ph.D. focused primarily on operations research, addressing routing problems and knapsack-related issues.