The Min-Knapsack Problem With Compactness Constraints and Applications in Statistics - A. Santini and E. Malaguti
Article info

Article history: Received 9 January 2023; Accepted 14 July 2023; Available online 20 July 2023.
Keywords: Cutting; Knapsack problems; Applications in statistics; Dynamic programming.

Abstract

In the min-Knapsack problem, one is given a set of items, each having a certain cost and weight. The objective is to select a subset with minimum cost, such that the sum of the weights is not smaller than a given constant. In this paper, we introduce an extension of the min-Knapsack problem with additional "compactness constraints" (mKPC), stating that selected items cannot lie too far apart. This extension has applications in statistics, including in algorithms for change-point detection in time series. We propose three solution methods for the mKPC. The first two methods use the same Mixed-Integer Programming (MIP) formulation but with two different approaches: passing the complete model with a quadratic number of constraints to a black-box MIP solver or dynamically separating the constraints using a branch-and-cut algorithm. Numerical experiments highlight the advantages of this dynamic separation. The third approach is a dynamic programming labelling algorithm. Finally, we focus on the particular case of the unit-cost mKPC (1c-mKPC), which has a specific interpretation in the context of the statistical applications mentioned above. We prove that the 1c-mKPC is solvable in polynomial time with a different ad-hoc dynamic programming algorithm. Experimental results show that this algorithm vastly outperforms both generic approaches for the mKPC and a simple greedy heuristic from the literature.
© 2023 The Author(s). Published by Elsevier B.V.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
https://doi.org/10.1016/j.ejor.2023.07.020
Fig. 1. Comparison of the solutions of the min-Knapsack problem and the mKPC on the same instance. Parameter Δ = 2.
depicted in Fig. 1(a), has total cost 12. However, it violates compactness constraints: items 8 and 12 (with distance 4 > 2) are both selected, but no other item between them is selected. Indeed, an optimal solution of the mKPC has a cost of 13, as shown in Fig. 1(b).

1.1. Motivation

The motivation for the mKPC comes from applications in statistics. In the following, we give a detailed example from change-point detection in time series.

A time series is a sequence of numerical values indexed by discrete time points (Hamilton, 1994). Given a time series y_1, ..., y_n, the objective of change-point detection is to identify whether the underlying probability distribution of y changes, how many times it does so, and at which time points. Typical change-points for time series occur when the time series changes its expected value (see Fig. 2a), its variance (see Fig. 2b), or both. Change-point detection has important applications (Aminikhanghahi & Cook, 2017). Among the most prominent ones are those in healthcare, e.g., to detect changes in patient conditions (Yang, Dumont, & Ansermino, 2006); in climatology, e.g., to detect climate change (Reeves, Chen, Wang, Lund, & Lu, 2007); in econometrics, e.g., to detect warning signs of a crisis (Kim, Oh, Sohn, & Hwang, 2004); and in signal processing, e.g., to detect changes in recorded images (Radke, Andra, Al-Kofahi, & Roysam, 2005).

Cappello & Madrid Padilla (2022) introduced a state-of-the-art method, named PRISCA, for detecting changes in the variance of a Gaussian time series. They propose an iterative method which attempts to identify one change point at each iteration. As Fig. 2b shows, however, a method identifying one time point for each change point does not give results which are easy to interpret, because there is often considerable uncertainty about when the change takes place. In the figure, this uncertainty is represented by the wide shaded areas. Therefore, at each iteration, PRISCA builds a discrete probability distribution over {1, ..., n}, associating each time point with the probability that it is a change point. An example distribution relative to the first change point is depicted in Fig. 3. The height of the bars in the bottom chart corresponds to the probability associated with each time point.

Next, it identifies a level-q credible set, i.e., a subset of {1, ..., n} in which the sum of the probabilities is at least q (for a given threshold q ∈ [0, 1]). For example, a level-0.95 credible set corresponds to a 95% probability that the set contains the change point. Following a criterion of parsimony, it is desirable that the credible set contains as few elements as possible. Not all time points, however, must carry the same penalty if included in the credible set. For example, a time instant corresponding to an external shock might cost less in terms of parsimony compared to a time instant when no such shock occurred. Therefore, one can associate to each time point j a scaling factor c_j and minimise the sum of these factors. On the other hand, when no such information is present, one can just set c_j = 1 for all time instants. As we will see in Section 2.2, using a unitary scaling factor decidedly simplifies the problem. In the rest of this explanation, we will consider, for simplicity, this unit-cost case.

The most straightforward method to build the credible set is perhaps to follow a greedy approach which inserts points by decreasing value of probability until the desired threshold q is met. This criterion was used, for example, by Wang, Sarkar, Carbonetto, & Stephens (2020, Supplementary Data, Section A.3). Using this approach, however, can result in a situation in which time points belonging to different change points end up in the same credible set. Fig. 4 exemplifies this concept. The points highlighted in yellow in the bottom chart are included in the same credible set, but they are not all associated with the first change point.

To overcome this problem, one must then consider the compactness of the credible set: because each set should identify a single change point, its elements should be "compact" and, ideally, distributed tightly around the real (unknown) change point. This objective can be achieved via compactness constraints. Indeed, once the value of parameter Δ is fixed (usually to a small number such as 2 or 3), the problem of producing the most parsimonious credible set becomes our mKPC, in which the probability values associated to each time point take the role of the weights. Fig. 5 shows how including compactness leads to a better credible set construction.

2. Formal definition

In this section, we give a formal definition of the mKPC by means of an integer programming model, and we discuss the complexity of the mKPC and of the unit-cost mKPC (1c-mKPC). As mentioned in Section 1, in fact, the mKPC is NP-complete. In Section 2.2, however, we prove that the 1c-mKPC is solvable in polynomial time.

2.1. Mathematical model

We can formulate the mKPC as the following integer program, in which binary variable x_j takes value 1 if and only if the jth item is selected:
Fig. 2. Example time series which change their expected value and variance. Black points indicate the time series values y_t. Shaded areas represent periods where, qualitatively, an analyst would expect a change point.
\min \sum_{j=1}^{n} c_j x_j    (1)

The unit-cost case corresponds to the situation in which the user has no prior knowledge of which time instants of a time series are more likely to be change points. The following theorem establishes a strong result about the 1c-mKPC: namely, that it can be solved in polynomial time.
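As an illustration only, the following Python sketch shows a table-filling dynamic program of this kind. The recursion for the table W(i, ℓ), Eq. (5), is not restated above, so the form used below (W(i, ℓ) as the largest weight of a compact set of exactly ℓ items whose largest-index item is i) is our assumption, reconstructed from the complexity discussion in Section 2.2; the function name solve_1c_mkpc is ours.

```python
# Hedged sketch of a polynomial-time DP for the 1c-mKPC (our reconstruction,
# not the paper's exact recursion (5)).
import math

def solve_1c_mkpc(weights, q, delta):
    n = len(weights)
    NEG = -math.inf
    # W[i][ell]: largest weight of a compact set of exactly ell items whose
    # largest-index item is i (items are 0-indexed here).
    W = [[NEG] * (n + 1) for _ in range(n)]
    for i in range(n):
        W[i][1] = weights[i]
    for ell in range(2, n + 1):                 # columns in increasing order
        for i in range(n):                      # rows in increasing order
            best_prev = max((W[k][ell - 1] for k in range(max(0, i - delta), i)),
                            default=NEG)
            if best_prev > NEG:
                W[i][ell] = weights[i] + best_prev
    # Smallest cardinality ell for which some compact ell-subset reaches weight q
    # (the check of Eq. (6)); with unit costs this equals the optimal objective.
    for ell in range(1, n + 1):
        if any(W[i][ell] >= q for i in range(n)):
            return ell
    return None  # no feasible solution
```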
Fig. 3. Probabilities associated with each time point and representing how likely the point is to be the first change point of the time series.
index element. If that were not the case, in fact, the compactness constraint would be violated.

Finally, to know whether there is a subset of {1, ..., n} of size at most t such that its elements have weight at least q and that satisfies compactness constraints, we must check that

\min \left\{ \ell \in \{1, \ldots, n\} \,\middle|\, \exists\, i \in \{\ell, \ldots, n\} \text{ s.t. } W(i, \ell) \ge q \right\} \le t.    (6)

We now analyse the complexity of the above algorithm to conclude that it runs in polynomial time in the instance size n. Indeed, table W has size O(n^2) and we derive the worst-case complexity of computing an entry. To compute a generic entry W(i, ℓ) through (5) we need to compare values in rows i − Δ, ..., i − 1 of column ℓ − 1, i.e., we perform at most Δ comparisons. Noting that the table can be built in increasing order of columns and rows (indeed, W is lower-triangular) and that Δ ≤ n, we conclude that the total complexity of the DP algorithm is O(n^3).

3. Related problems

In addition to the applications in statistics discussed in Section 1.1, the mKPC has a specific combinatorial structure. As anticipated, the problem falls in the wide family of knapsack problems (see Kellerer, Pferschy, & Pisinger, 2004; Martello & Toth, 1990). In particular, it extends the min-Knapsack problem by introducing compactness constraints. For the earliest results on the min-Knapsack problem in English, we refer the reader to the seminal work of Csirik et al. (1991); for earlier works in Russian see, e.g., Babat (1975).

The special structure of compactness constraints can be represented by a graph G = (V, E) in which each item i corresponds to a vertex v_i ∈ V, and an edge {v_i, v_j} ∈ E is defined for each pair of vertices v_i and v_j, i < j, such that j − i ≤ Δ. The mKPC asks to select a subset of V inducing a connected subgraph, such that the corresponding items optimise the associated min-Knapsack problem.

If instead of graph G we are given a generic graph, and if we also have to include a predefined subset T ⊂ V of vertices in the connected subgraph, the problem is known as the Connection Subgraph problem (see Conrad, Gomes, van Hoeve, Sabharwal, & Suter, 2007). This problem is strongly NP-complete and remains so even when T = ∅. As discussed in Section 2.2, the mKPC (that is, the Connection Subgraph problem with T = ∅ and the special structure of graph G) remains NP-complete. The definition of the mKPC as a problem on a graph gives us an interpretation of inequalities (3) as a special case of the connectivity constraints introduced by Fischetti et al. (2017) to impose connectivity of Steiner trees.
Fig. 4. The bottom chart shows a credible set relative to the first change point of the time series in the top chart when disregarding compactness. The points in the credible
set are highlighted in yellow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
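Returning to the graph representation above: on the special graph G, a set of selected items induces a connected subgraph exactly when consecutive selected items lie at most Δ positions apart (our reading of the edge rule j − i ≤ Δ). A minimal illustrative check, with our own naming, is:

```python
# Sketch: compactness check equivalent to connectivity of the induced subgraph of G.
def is_compact(selected, delta):
    s = sorted(selected)  # positions of the selected items
    return all(j - i <= delta for i, j in zip(s, s[1:]))
```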
However, the special structure of graph G that results when solving the mKPC makes it more efficient to specialize those constraints to the specific problem, without the need to introduce G explicitly. In particular, the separation of inequalities (3) is straightforward (see Section 4.2).

As discussed, our compactness constraints can be interpreted as a connectivity requirement on a suited graph. Similar requirements appear in political districting problems, where one has to partition geographic units (e.g., counties or census blocks) to obtain districts for elections. Districts must contain geographically contiguous units and have the same number of inhabitants. Political districting problems are typically defined on a graph where vertices represent the geographic units and have a weight corresponding to the population, and the edges connect units that are contiguous. Hence, the problem consists in partitioning the vertices into subsets having approximately the same weight and inducing connected subgraphs. According to several recent contributions, this last requirement is the most challenging to satisfy (see, e.g., Ricca, Scozzari, & Simeone, 2013; Validi, Buchanan, & Lykhovyd, 2022; and Swamy, King, & Jacobson, 2022).

In a different perspective, Stiglmayr, Figueira, Klamroth, Paquete, & Schulze (2022) introduce some robustness measures for solutions in multi-objective integer linear programming. Here the idea is to select a solution which is not only efficient but also robust, in the sense that its "closeby" solutions are efficient as well (allowing for a substitution of the selected solution). The closeness of solutions depends on the specific problem and can be identified as a change of base via a pivot in a linear program or as a "move" in a combinatorial problem. In any case, close solutions are denoted as adjacent, thus defining a graph. The robustness of each solution is evaluated by analysing its neighbourhood in this graph.

4. Solution approaches

In this section, we describe exact approaches for the mKPC. We also describe a greedy heuristic for the 1c-mKPC, used in the PRISCA package (Cappello, 2022).

4.1. Integer programming

The first approach involves solving model (1)-(4) with a black-box integer programming solver. The model is compact because it uses O(n) variables and O(n^2) constraints.
Fig. 5. The bottom chart shows a credible set relative to the first change point of the time series in the top chart, considering compactness requirements. The points in the
credible set are highlighted in yellow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Strengthening compactness constraints. Compactness constraints (3) state that if two items lying more than Δ positions apart are selected, then at least another item between them must be selected. These constraints, however, can be made stronger. For example, if the two selected items lie more than 2Δ positions apart, then at least two further items between them must also be selected. In general, (3) can be strengthened as follows:

\left\lfloor \frac{j-i-1}{\Delta} \right\rfloor (x_i + x_j - 1) \le \sum_{k=i+1}^{j-1} x_k \quad \forall\, i, j \in \{1, \ldots, n\},\ j > i + \Delta.    (7)

In the example instance (with n = 1002 and Δ = 5), besides the two extreme items, an optimal integer solution must also select every fifth intermediate item, i.e., items 6, 11, ..., 1001. The optimal solution, therefore, selects 2 + 200 = 202 items.

When solving the continuous relaxation of the mKPC, however, an optimal solution is x_1 = x_1002 = 1, and x_j = 10^{-3} for all other j ∈ {2, ..., 1001}. Such a solution has cost 3 and does not violate any compactness constraint. For example, when i = 1 and j = 1002, we have x_2 + x_3 + ... + x_1001 = 1000 · 10^{-3} = 1 and thus (3) is satisfied. On the other hand, the strengthened constraint (7) would be violated by such a solution:

\left\lfloor \frac{j-i-1}{\Delta} \right\rfloor (x_i + x_j - 1) = 200 \cdot (1 + 1 - 1) = 200 \not\le 1 = \sum_{k=i+1}^{j-1} x_k.
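The branch-and-cut algorithm of Section 4.2 is not restated above; the sketch below only illustrates how a dynamic separation of the compactness cuts could look with gurobipy lazy constraints, using the strengthened coefficient of (7). It is an assumption-based illustration, not the authors' implementation.

```python
# Hedged sketch of a branch-and-cut variant: cuts (3)/(7) are not added up front
# but separated lazily in a solver callback.
import gurobipy as gp
from gurobipy import GRB

def solve_mkpc_branch_and_cut(costs, weights, q, delta):
    n = len(costs)
    model = gp.Model("mKPC-bc")
    model.Params.LazyConstraints = 1
    x = model.addVars(n, vtype=GRB.BINARY, name="x")
    model.setObjective(gp.quicksum(costs[j] * x[j] for j in range(n)), GRB.MINIMIZE)
    model.addConstr(gp.quicksum(weights[j] * x[j] for j in range(n)) >= q)

    def separate(model, where):
        if where != GRB.Callback.MIPSOL:
            return
        vals = model.cbGetSolution([x[j] for j in range(n)])
        selected = [j for j in range(n) if vals[j] > 0.5]
        # Two consecutive selected items further than delta apart have no selected
        # item between them, so the (strengthened) compactness cut is violated.
        for i, j in zip(selected, selected[1:]):
            if j - i > delta:
                coeff = (j - i - 1) // delta  # strengthened coefficient of (7)
                model.cbLazy(coeff * (x[i] + x[j] - 1)
                             <= gp.quicksum(x[k] for k in range(i + 1, j)))

    model.optimize(separate)
    return [j for j in range(n) if x[j].X > 0.5]
```

Separating only at integer solutions keeps the check linear in the number of selected items, consistent with the remark above that the separation of inequalities (3) is straightforward.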
4.3. Dynamic programming

To derive a DP algorithm for the (general) mKPC, we first introduce an auxiliary directed graph G = (V, A). The vertex set contains a source node σ, a sink node τ, and one node for each item. Overall, V = {σ, 1, ..., n, τ}. The arc set A contains:

• Arcs from σ to each node i ∈ {1, ..., n}.
• Arcs from each node i ∈ {1, ..., n} to τ.
• An arc from node i to node j, for each pair i, j ∈ {1, ..., n} such that i < j ≤ i + Δ.

Fig. 6 depicts graph G when Δ = 2. Thinner arrows represent arcs from σ and to τ, while the thicker ones represent arcs between nodes {1, ..., n}. A feasible solution of the mKPC corresponds to a path in G starting at σ, ending at τ, and such that the weight collected at visited nodes is at least q.

To avoid the complete enumeration of all feasible solutions, we propose a labelling algorithm in which we associate a label to each partial path from σ. A label L = (i, C, W) has three components: the last visited node i, the total cost C of visited nodes, and the total collected weight W. The initial label is L = (σ, 0, 0).

4.4. Greedy heuristic

For the special case of the 1c-mKPC, we describe here the greedy heuristic procedure used in the PRISCA package (Cappello, 2022) to determine whether a credible set corresponds to a valid change point. As mentioned in Section 1.1, the authors consider the case in which all costs are unitary, and they deem the credible set valid if their heuristic solution of the corresponding 1c-mKPC uses fewer than n/2 items.

The greedy procedure aims at identifying a subset of items P ⊆ {1, ..., n} with total weight at least q and satisfying the compactness constraints. The procedure starts by initialising P with a single item, namely the one with the highest weight:

P = \{ \operatorname{argmax} \{ w_j \mid j \in \{1, \ldots, n\} \} \}.

It then keeps augmenting P by adding, at each iteration, the heaviest item which is not yet selected and does not violate compactness constraints:

P \leftarrow P \cup \{ \operatorname{argmax} \{ w_j \mid j \in \{1, \ldots, n\} \setminus P,\ \exists\, i \in P : |j - i| \le \Delta \} \}.

The algorithm stops as soon as Σ_{j∈P} w_j ≥ q.
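A small Python sketch of this greedy procedure follows; the naming is ours (it is not the PRISCA code), and the early exit when no compatible candidate remains is our addition.

```python
# Sketch of the greedy heuristic described above (our naming, not the PRISCA code).
# weights: list of item weights (0-indexed); q: weight threshold; delta: compactness parameter.
def greedy_1c_mkpc(weights, q, delta):
    n = len(weights)
    start = max(range(n), key=lambda j: weights[j])  # start from the heaviest item
    selected = {start}
    total = weights[start]
    while total < q:
        # candidates: not yet selected and within delta of some selected item
        candidates = [j for j in range(n)
                      if j not in selected
                      and any(abs(j - i) <= delta for i in selected)]
        if not candidates:
            return None  # no compact extension can reach the threshold
        j = max(candidates, key=lambda j: weights[j])
        selected.add(j)
        total += weights[j]
    return sorted(selected)
```

On unit-cost instances the quality of the heuristic solution is measured purely by the number of selected items, as in the n/2 validity check mentioned above.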
5. Computational results

In this section, we report the results of computational experiments to test the effectiveness of the algorithms presented in Section 4. The code was implemented in C++, using Gurobi 9.5 as the MIP solver. Experiments ran on a machine equipped with an Intel Xeon CPU running at 2.4 GHz and 4GB RAM (increased to 8GB for instances with n = 600). The MIP solver was instructed to only use one thread. All algorithms used a time limit of 3600 s. The instances and the code used are available under an open-source licence (Santini, 2022).

After describing the instance set used, we analyse the results of three sets of experiments:

1. Experiments to assess the impact of strengthened constraints (7).
2. Experiments to compare the compact formulation, the branch-and-cut algorithm, and the DP labelling algorithm for the mKPC.
3. Experiments to investigate the difficulty of solving the unit-cost version of the problem. To this end, on top of the above algorithms, we also add the DP algorithm for the 1c-mKPC (described in the proof of Theorem 1) and the greedy heuristic described in Section 4.4.

5.1. Instances

We consider two sets of instances. We obtained the first set, denoted S1, from the authors of Cappello & Madrid Padilla (2022). This set consists of 300 instances with n ∈ {40, 200}, q ∈ {0.90, 0.95}, and Δ ∈ {2, 3, 5}. All costs are equal to 1 and, therefore, set S1 contains 1c-mKPC instances.

Because the costs in the S1 instances are all unitary, and the number of items is relatively low, we also generated a second set, denoted S2. This set contains 189 instances with n ∈ {200, 400, 600}, q = 0.95, and Δ ∈ {2, 3, 5, 10}. In the following, we explain how we generate the weights and the costs in the instances of set S2. We use three weight-generation methods:

• The Noise method first assigns each item j a weight

w_j = \frac{1}{n} + \mathcal{N}\left(0, \frac{1}{4n}\right),

where \mathcal{N}(\lambda, \sigma) denotes a normal distribution with location λ and scale σ. To avoid numerical issues, we also ensure that no weight is smaller than 10^{-12}, i.e., we set

w_j \leftarrow \max\{w_j, 10^{-12}\}.

Because the sum of the above weights is not necessarily equal to one, we finally normalise them:

w_j \leftarrow \frac{w_j}{\sum_{k=1}^{n} w_k}.    (9)

• The OnePeak method builds the weights from a truncated normal distribution with location λ and scale n/k. Here k ∈ {8, 16, 32} is an instance generation parameter. Weights will be more tightly distributed around the peak when k is larger. The method samples 5000 times from this distribution and builds the corresponding histogram with n bars. The jth bar counts how many samples fell in the interval [j, j + 1). The weight w_j of the jth item is then set as the height of the jth bar of the histogram. Finally, weights w_j are obtained by normalisation as in Eq. (9). Fig. 8 shows an example of a OnePeak instance.

• The TwoPeaks method is similar to OnePeak, except that the histogram is built by sampling from the sum of two truncated normal distributions with locations λ_1 and λ_2, and common scale n/(2k). Intuitively, λ_1 and λ_2 are the locations of the two peaks. The values of the two locations are drawn from two further truncated normal distributions between 1 and n, and rounded to the nearest integer. The first distribution has location n/3, the second one has location 2n/3, and both have scale n/6. Fig. 9 shows an example of a TwoPeaks instance.

We use three cost-generation methods:

• The Constant method simply assigns unit costs to all items and allows us to extend the results obtained on the S1 set to larger instances with different weight types.
• The Few method aims at modelling real-life statistical applications in which few items have a small cost, and all other items have a constant larger one. In particular, it first selects n/100 items using a roulette-wheel method with probabilities equal to item weights. It then assigns these items a cost of 0.10 and all other items a cost of 1. The reason we use roulette-wheel selection is that, in the application, the items with the lower costs correspond to time instants with a higher prior probability of containing a change point. These items are thus also more likely to be detected by the algorithm and, as a consequence, to have a larger weight. Therefore, assuming that the prior knowledge is accurate and that the algorithm works correctly, items with larger weights are more likely to have lower costs.
• The Random method assigns each item a cost uniformly distributed in the interval [1, 10].

Note that we have three possible values for parameter n, three values for parameter Δ, and three for the cost-generation method. Their combination gives 27 parameter combinations using weight-generation method Noise. Because we generate 3 instances for each combination, we build 81 Noise instances. Furthermore, for each of these 27 combinations, we have 3 possible values for parameter k, yielding 81 parameter combinations for each of the OnePeak and TwoPeaks weight-generation methods. Again, generating 3 instances for each combination, we obtain 243 instances for each method. Overall, we then construct 81 + 2 × 243 = 567 instances.
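As a usage illustration, the Noise weight generation described above can be reproduced with a few lines of numpy. We interpret the scale of N(λ, σ) as a standard deviation, and the function below is our sketch rather than the generator in the authors' repository (Santini, 2022).

```python
# Sketch of the "Noise" weight-generation method (assumed numpy reimplementation).
import numpy as np

def noise_weights(n, seed=0):
    rng = np.random.default_rng(seed)
    w = 1.0 / n + rng.normal(loc=0.0, scale=1.0 / (4 * n), size=n)  # w_j = 1/n + N(0, 1/(4n))
    w = np.maximum(w, 1e-12)   # avoid numerical issues
    return w / w.sum()         # normalisation as in Eq. (9)
```

For example, noise_weights(200) returns a weight vector summing to one, matching the role of probabilities in the statistical application.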
Impact of strengthened compactness constraints. We use two relevant metrics to assess the impact of strengthened inequalities (7):

1. The percentage optimality gap, i.e., the gap between the best primal and dual bounds returned by each algorithm within the time limit. This metric is denoted as Gap% and is defined as follows:

\text{Gap\%} = 100 \cdot \frac{UB - LB}{UB},

where "UB" indicates the best primal solution and LB is the tightest dual bound returned by the solver. Gap% corresponds to the familiar gap returned by black-box integer programming solvers and depends on both the quality of the primal and the dual bound.

2. The second metric is the solution time in seconds, including the time spent creating the model and exploring the branch-and-bound tree. It is denoted by Time (s).
Table 1
Impact of strengthened inequalities (7) on the performance of the Compact Formulation and the Branch & Cut algorithm. The table refers to instances whose weights are
generated with the Noise and OnePeak methods.
n   Weights   Costs   Compact MIP with (3): Time (s)   Compact MIP with (7): Time (s)   B&C with (3): Time (s)   B&C with (7): Time (s)
We also note that instances generated using the TwoPeaks weight type are considerably harder than the other instances. Therefore, we present the results obtained on Noise and OnePeak instances separately from those obtained on TwoPeaks instances. After commenting on these results, we will come back to the difficulty of TwoPeaks instances, and we will explain what sets them apart from the other instances.

Table 1 reports the results on the Noise and OnePeak instances of set S2. Because all algorithms solve to optimality all instances with up to 600 items, Table 1 only reports the runtimes. Note how the runtimes are very different for the complete compact formulation and for the B&C algorithm. For the largest instances, for example, the average runtimes needed to solve the compact formulation are in the order of hundreds of seconds. The B&C algorithm, on the other hand, closes these instances in a few hundredths of a second: a difference of five orders of magnitude. Regarding the effect of strengthened compactness constraints, we note that they do not seem to help when solving the full compact formulation. If anything, in fact, they slightly increase the computation time. On the other hand, they reduce the computation time of the B&C algorithm.

Table 2 presents the results on the TwoPeaks instances. These instances are considerably harder to solve: in several cases, the solvers run out of time without solving the model to optimality. Even when they solve the model to optimality, it takes, on average, much longer compared with Noise and OnePeak instances. For these instances, the strengthening constraints have a considerable effect on the solvers.
Table 2
Impact of strengthened inequalities (7) on the performance of the Compact Formulation and the Branch & Cut algorithm. The table refers to instances whose weights are
generated with the TwoPeaks method.
n Costs Compact MIP with (3) Compact MIP with (7) B&C with (3) B&C with (7)
Gap% Time (s) Gap% Time (s) Gap% Time (s) Gap% Time (s)
200 Constant 5.99 710.39 0.09 492.53 13.53 1746.66 0.83 884.48
Few 7.63 828.35 0.05 462.61 12.66 2056.36 0.63 627.83
Random 4.18 578.62 0.54 561.51 8.71 1362.68 0.48 591.01
400 Constant 6.11 1993.72 2.52 1999.99 15.98 2558.45 1.56 1818.52
Few 10.92 1531.91 2.48 1737.72 13.42 1747.64 3.00 1603.26
Random 3.51 1079.89 3.41 1915.56 13.21 2606.64 2.69 1350.17
600 Constant 18.88 2131.24 6.46 2147.31 15.97 2133.58 14.07 1740.09
Few 19.31 2603.57 6.72 2890.47 19.49 2851.15 7.70 2419.43
Random 12.92 2376.42 6.23 2677.12 15.92 2666.89 4.79 2085.05
Overall 9.94 1537.12 3.17 1653.87 14.13 2008.89 3.81 1428.38
Fig. 10. Left: percentage of items selected fractionally in the optimal solution of the continuous relaxation of the mKPC. Right: normalised Gini coefficient showing how
close the fractional values are to 0.5 (the higher the value, the closer to 0.5).
Indeed, by using the strengthened inequalities (7), the average gaps are roughly reduced by two-thirds. We also observe that, on these harder instances, the B&C algorithm loses its advantage over the compact formulation. The gaps produced by B&C are slightly worse, while the runtimes are comparable.

Peculiarity of the TwoPeaks instances. As Tables 1 and 2 show, TwoPeaks instances are much harder to solve using branch-and-bound methods compared with the other instances. The reason lies in the characteristics of the optimal solution of the continuous relaxation of the mKPC. Solutions of TwoPeaks instances have a large number of fractional items, and the values of the corresponding variables x are closer to 0.5. This implies that much more branching is necessary while exploring the branch-and-bound tree. To appreciate the extent by which TwoPeaks instances differ from the other instances, Fig. 10 shows boxplots of two metrics relative to the optimal solution of the continuous relaxation of the mKPC, divided by weight-generation method. Let x*_1, ..., x*_n be such a solution. Metric Frac% gives the percentage of items selected fractionally in the solution, i.e.,

\text{Frac\%} = 100 \cdot \frac{\left|\{ j \in \{1, \ldots, n\} : 0 < x^*_j < 1 \}\right|}{n}.

Metric FracGini is the normalised Gini coefficient of the fractional variables, i.e.,

\text{FracGini} = \frac{\sum_{j=1}^{n} x^*_j (1 - x^*_j)}{\left|\{ j \in \{1, \ldots, n\} : 0 < x^*_j < 1 \}\right|}.

The value of this metric is higher when many x*_j are concentrated around 0.5, while it is lower when the x*_j take values close to 0 or 1. Values x*_j ∈ {0, 1} do not contribute to the sum at the numerator. Therefore, solutions with more fractional items have more non-zero terms in the sum at the numerator. To compensate, we normalise by dividing by the number of fractional items.

Comparison of the algorithms for the mKPC. Table 3 compares the performance of three approaches for the mKPC. Because strengthened inequalities (7) result in lower gaps, we enable them for both the branch-and-cut algorithm and the compact formulation. Table 3 reports the following metrics:

1. Opt% is the percentage of instances in each row for which the algorithm found a provably optimal solution.
2. PGap% is the percentage primal gap, i.e., the gap between the best primal solution found by each algorithm and the best known primal solution. We use PGap% instead of Gap% because the labelling algorithm provides no dual bound when it cannot solve an instance within the time limit.
Table 3
Comparison of the MIP-based approaches (the branch-and-cut and the compact formulation) with the labelling algorithm presented in Section 4.3, on the S2 instances.
n Weights Costs B&C with (7) Compact MIP with (7) Labelling
Opt% PGap% Time (s) Opt% PGap% Time (s) Feas% Opt% PGap% Time (s)
200 Noise Constant 100.00 0.00 0.00 100.00 0.00 4.82 100.00 100.00 0.00 1.85
Few 100.00 0.00 0.01 100.00 0.00 6.72 100.00 100.00 0.00 3.05
Random 100.00 0.00 0.01 100.00 0.00 7.39 88.89 44.44 26.36 2917.73
OnePeak Constant 100.00 0.00 0.01 100.00 0.00 5.02 100.00 100.00 0.00 1.32
Few 100.00 0.00 0.01 100.00 0.00 5.25 100.00 100.00 0.00 1.59
Random 100.00 0.00 0.02 100.00 0.00 6.22 100.00 88.89 0.05 125.94
TwoPeaks Constant 77.78 0.49 884.48 96.30 0.05 492.53 100.00 96.30 0.13 1.81
Few 85.19 0.45 627.83 92.59 0.04 462.61 100.00 96.30 0.06 1.59
Random 88.89 0.01 591.01 88.89 0.00 561.51 100.00 85.19 0.04 144.46
400 Noise Constant 100.00 0.00 0.01 100.00 0.00 49.34 100.00 100.00 0.00 48.02
Few 100.00 0.00 0.02 100.00 0.00 99.85 100.00 100.00 0.00 160.17
Random 100.00 0.01 0.03 100.00 0.01 63.58 66.67 0.00 100.00 3600.00
OnePeak Constant 100.00 0.00 0.01 100.00 0.00 104.23 100.00 88.89 0.15 20.09
Few 100.00 0.00 0.02 100.00 0.00 76.48 100.00 100.00 0.00 34.99
Random 100.00 0.00 0.10 100.00 0.00 110.54 100.00 74.07 0.52 1975.23
TwoPeaks Constant 59.26 1.38 1818.52 55.56 1.42 1999.99 100.00 92.59 0.83 29.94
Few 55.56 2.12 1603.26 55.56 1.55 1737.72 100.00 92.59 0.79 45.55
Random 66.67 1.29 1350.17 55.56 1.31 1915.56 96.30 66.67 12.36 2071.43
600 Noise Constant 100.00 0.00 0.01 100.00 0.00 187.73 100.00 100.00 0.00 234.71
Few 100.00 0.00 0.02 100.00 0.00 239.03 100.00 100.00 0.00 1090.45
Random 100.00 0.01 0.03 100.00 0.01 220.93 66.67 0.00 100.00 3600.00
OnePeak Constant 100.00 0.00 0.01 100.00 0.00 635.56 100.00 96.30 0.02 207.80
Few 100.00 0.00 0.13 100.00 0.00 455.45 100.00 85.19 0.04 282.16
Random 100.00 0.00 0.19 100.00 0.00 554.11 88.89 40.74 25.05 3337.04
TwoPeaks Constant 51.85 14.12 1740.09 44.44 5.46 2147.31 100.00 100.00 3.32 231.36
Few 33.33 13.26 2419.43 29.63 4.83 2890.47 100.00 88.89 2.89 345.75
Random 44.44 4.20 2085.05 29.63 4.62 2677.12 81.48 18.52 39.37 3485.78
Overall 83.95 1.73 620.83 83.25 0.92 815.75 97.18 83.42 6.24 697.20
Table 4
Comparison of four algorithms on the unit-cost instances of sets S1 and S2.
3. Finally, we observe that the labelling algorithm can terminate in three different states. If it completes before the time limit, it has found the optimal solution. If it times out and there is at least one label extended to the sink node τ, then the algorithm can be used as a heuristic: any label extended to the sink node corresponds to a feasible solution. The algorithm can return the best such solution, although it cannot prove or disprove its optimality while there are still unextended labels. If the algorithm times out and no label was extended to τ, then we do not even have a feasible solution to compute PGap%. Therefore, we introduce the additional column Feas% for the labelling algorithm. It contains the percentage of instances in each row for which the algorithm found at least one feasible solution.

To ensure a fair comparison, we compute averages using a PGap% of 100% and a Time (s) of 3600 s when the labelling algorithm does not give any feasible solution.

Table 3 shows that branch-and-cut can usually find more optima than the other algorithms and in a shorter time. The average primal gap, however, is lower for the compact MIP formulation. The labelling algorithm does not always manage to produce a feasible solution. In particular, its performance degrades for instances with cost type Random and, to a lesser extent, with weight type Noise. For these instances, in fact, dominance is less likely, and the labelling algorithm becomes similar to a complete enumeration of feasible solutions.
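The exact dominance rule of the labelling algorithm is not restated above; the check sketched below is the standard one for labels of this form and is our assumption, shown only to illustrate why dominance becomes rare when costs and weights are diverse.

```python
# Assumed dominance test for labels L = (i, C, W) of the Section 4.3 labelling
# algorithm: with the same last node, a label with lower-or-equal cost and
# higher-or-equal weight dominates (our assumption, not the paper's exact rule).
def dominates(label_a, label_b):
    i_a, c_a, w_a = label_a
    i_b, c_b, w_b = label_b
    return i_a == i_b and c_a <= c_b and w_a >= w_b
```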
When the labelling algorithm finds feasible solutions, however, they are often optimal. In these cases, branch-and-cut usually also finds optimal solutions in a shorter time. A notable exception is TwoPeaks instances: when costs are Constant or Few, labelling is the best-performing algorithm, finding more optima in a considerably shorter time.

Experiments on 1c-mKPC instances. Finally, we report the performances of five algorithms on the unit-cost mKPC. In addition to the three algorithms considered above, we add the DP approach presented in the proof of Theorem 1 and the greedy heuristic introduced in Section 4.4. The test set for this experiment includes both S1 instances and S2 instances with Constant costs.

The results show that the tailored DP approach vastly outperforms all other algorithms. It solves all instances, even the largest ones with n = 600, in less than two-thousandths of a second per instance. The greedy heuristic is even faster (all measured times were under 0.00005 s) but often fails at identifying the optimal solution, especially when the size of the instance grows. We can conclude that specialised approaches for the 1c-mKPC are well justified and that there is no reason to use heuristic algorithms, because our proposed DP approach is extremely fast in practice (Table 4).

6. Conclusions

This paper introduced the min-Knapsack Problem with Compactness Constraints (mKPC), an extension of the classical min-Knapsack problem. The motivation for studying the mKPC is that it arises as a sub-problem in two state-of-the-art algorithms recently introduced in the statistical community. These are the PRISCA algorithm of Cappello & Madrid Padilla (2022) for detecting change points in time series and the SuSiE algorithm of Wang et al. (2020) for variable selection in high-dimensional regression.

We proposed three approaches to solve the mKPC: a compact formulation with a quadratic number of constraints, a branch-and-cut approach in which these constraints are separated dynamically, and a labelling algorithm. Despite branch-and-cut being more often used when the number of constraints is exponential in the problem size, computational experiments proved that this approach could also be helpful when dealing with compact models. In particular, the branch-and-cut algorithm solves the largest number of instances to optimality. It is orders of magnitude faster than solving the entire formulation with the state-of-the-art black-box solver Gurobi. Computational experiments also showed that the problem's difficulty depends considerably on instance characteristics. In particular, instances with a particular double-peak structure are harder to solve to optimality.

Finally, we focused our attention on a special case of the mKPC, named the unit-cost mKPC (1c-mKPC). This problem is especially relevant for the statistical applications because it corresponds to the case in which the user of the PRISCA and SuSiE algorithms mentioned above has no prior knowledge of, respectively, which time instants and which features are more likely to be selected. We proved that the 1c-mKPC is solvable in polynomial time and proposed a specific dynamic programming algorithm. Computational results clearly show that using this algorithm is better than the generic mKPC approaches and a greedy heuristic from the statistics literature.

This work contributes to the literature at the intersection between operational research (OR) and statistics. Although some authors recently explored problems in machine learning from the point of view of OR (see, e.g., Gambella, Ghaddar, & Naoum-Sawaya, 2021), there are not many works which address problems arising in classical statistics. In particular, we showed that OR can provide practical tools to solve to optimality problems which are often approached heuristically in the statistical community (see, e.g., Bertsimas, King, & Mazumder, 2016, for an illustrious example). Future work may identify further problems in statistics which OR techniques can efficiently approach.

Acknowledgments

We are grateful to Lorenzo Cappello and Oscar Madrid Padilla for sharing the instances of set S1, Lorenzo Cappello for fruitful discussions on the problem, and three anonymous reviewers for their constructive comments. Enrico Malaguti was supported by the Air Force Office of Scientific Research under award FA8655-20-1-7019.

References

Aminikhanghahi, S., & Cook, D. (2017). A survey of methods for time series change point detection. Knowledge and Information Systems, 51(2), 339–367. https://doi.org/10.1007/s10115-016-0987-z
Babat, L. (1975). Linear functionals on the n-dimensional unit cube. Reports of the Academy of Sciences of the Soviet Union, 221, 761–762.
Bertsimas, D., King, A., & Mazumder, R. (2016). Best subset selection via a modern optimization lens. Annals of Statistics, 44(2), 813–852. https://doi.org/10.1214/15-AOS1388
Cappello, L. (2022). prisca. GitHub repository. https://www.github.com/lorenzocapp/prisca
Cappello, L., & Madrid Padilla, O. H. (2022). Variance change point detection with credible sets. arXiv:2211.14097.
Conrad, J., Gomes, C., van Hoeve, W.-J., Sabharwal, A., & Suter, J. (2007). Connections in networks: Hardness of feasibility versus optimality. In P. Van Hentenryck, & L. Wolsey (Eds.), International conference on integration of artificial intelligence (AI) and operations research (OR) techniques in constraint programming (pp. 16–28). Springer. https://doi.org/10.1007/978-3-540-72397-4_2
Csirik, J., Frenk, H., Labbé, M., & Zhang, S. (1991). Heuristics for the 0–1 min-Knapsack problem. Acta Cybernetica, 10(1–2), 15–20.
Fischetti, M., Leitner, M., Ljubic, I., Luipersbeck, M., Monaci, M., Resch, M., ... Sinnl, M. (2017). Thinning out Steiner trees: A node-based model for uniform edge costs. Mathematical Programming Computation, 9, 203–229. https://doi.org/10.1007/s12532-016-0111-0
Gambella, C., Ghaddar, B., & Naoum-Sawaya, J. (2021). Optimization problems for machine learning: A survey. European Journal of Operational Research, 290, 807–828. https://doi.org/10.1016/j.ejor.2020.08.045
Hamilton, J. (1994). Time series analysis. Princeton University Press.
Kellerer, H., Pferschy, U., & Pisinger, D. (2004). Knapsack problems. Springer.
Kim, T. Y., Oh, K. J., Sohn, I., & Hwang, C. (2004). Usefulness of artificial neural networks for early warning system of economic crisis. Expert Systems with Applications, 26(4), 583–590. https://doi.org/10.1016/j.eswa.2003.12.009
Martello, S., & Toth, P. (1990). Knapsack problems. Wiley.
Radke, R., Andra, S., Al-Kofahi, O., & Roysam, B. (2005). Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing, 14(3), 294–307. https://doi.org/10.1109/TIP.2004.838698
Reeves, J., Chen, J., Wang, X. L., Lund, R., & Lu, Q. Q. (2007). A review and comparison of changepoint detection techniques for climate data. Journal of Applied Meteorology and Climatology, 46(6), 900–915. https://doi.org/10.1175/JAM2493.1
Ricca, F., Scozzari, A., & Simeone, B. (2013). Political districting: From classical models to recent approaches. Annals of Operations Research, 204, 271–299. https://doi.org/10.1007/s10479-012-1267-2
Santini, A. (2022). Algorithms for the min-Knapsack problem with compactness constraints. GitHub repository. https://doi.org/10.5281/zenodo.7492799
Stiglmayr, M., Figueira, J. R., Klamroth, K., Paquete, L., & Schulze, B. (2022). Decision space robustness for multi-objective integer linear programming. Annals of Operations Research, 319, 1769–1791. https://doi.org/10.1007/s10479-021-04462-w
Swamy, R., King, D. M., & Jacobson, S. H. (2022). Multiobjective optimization for politically fair districting: A scalable multilevel approach. Operations Research, 71(2), 536–562.
Validi, H., Buchanan, A., & Lykhovyd, E. (2022). Imposing contiguity constraints in political districting models. Operations Research, 70(2), 867–892.
Wang, G., Sarkar, A., Carbonetto, P., & Stephens, M. (2020). A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 1273–1300. https://doi.org/10.1111/rssb.12388
Yang, P., Dumont, G., & Ansermino, J. M. (2006). Adaptive change detection in heart rate trend monitoring in anesthetized children. IEEE Transactions on Biomedical Engineering, 53(11), 2211–2219. https://doi.org/10.1109/TBME.2006.877107