The Min-Knapsack Problem With Compactness Constraints and Applications in Statistics - A. Santini and E. Malaguti
Article info

Article history: Received 9 January 2023; Accepted 14 July 2023; Available online 20 July 2023.
Keywords: Cutting; Knapsack problems; Applications in statistics; Dynamic programming.

Abstract

In the min-Knapsack problem, one is given a set of items, each having a certain cost and weight. The objective is to select a subset with minimum cost, such that the sum of the weights is not smaller than a given constant. In this paper, we introduce an extension of the min-Knapsack problem with additional "compactness constraints" (mKPC), stating that selected items cannot lie too far apart. This extension has applications in statistics, including in algorithms for change-point detection in time series. We propose three solution methods for the mKPC. The first two methods use the same Mixed-Integer Programming (MIP) formulation but with two different approaches: passing the complete model with a quadratic number of constraints to a black-box MIP solver or dynamically separating the constraints using a branch-and-cut algorithm. Numerical experiments highlight the advantages of this dynamic separation. The third approach is a dynamic programming labelling algorithm. Finally, we focus on the particular case of the unit-cost mKPC (1c-mKPC), which has a specific interpretation in the context of the statistical applications mentioned above. We prove that the 1c-mKPC is solvable in polynomial time with a different ad-hoc dynamic programming algorithm. Experimental results show that this algorithm vastly outperforms both generic approaches for the mKPC and a simple greedy heuristic from the literature.
© 2023 The Author(s). Published by Elsevier B.V.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
https://doi.org/10.1016/j.ejor.2023.07.020
Fig. 1. Comparison of the solutions of the min-Knapsack problem and the mKPC on the same instance. Parameter Δ = 2.
depicted in Fig. 1(a), has total cost 12. However, it violates compactness constraints: items 8 and 12 (with distance 4 > 2) are both selected, but no other item between them is selected. Indeed, an optimal solution of the mKPC has a cost of 13, as shown in Fig. 1(b).

1.1. Motivation

The motivation for the mKPC comes from applications in statistics. In the following, we give a detailed example from change-point detection in time series.

A time series is a sequence of numerical values indexed by discrete time points (Hamilton, 1994). Given a time series y_1, ..., y_n, the objective of change-point detection is to identify whether the underlying probability distribution of y changes, how many times it does so, and at which time points. Typical change-points for time series occur when the time series changes its expected value (see Fig. 2a), its variance (see Fig. 2b), or both. Change-point detection has important applications (Aminikhanghahi & Cook, 2017). Among the most prominent ones are those in healthcare, e.g., to detect changes in patient conditions (Yang, Dumont, & Ansermino, 2006); in climatology, e.g., to detect climate change (Reeves, Chen, Wang, Lund, & Lu, 2007); in econometrics, e.g., to detect warning signs of a crisis (Kim, Oh, Sohn, & Hwang, 2004); and in signal processing, e.g., to detect changes in recorded images (Radke, Andra, Al-Kofahi, & Roysam, 2005).

Cappello & Madrid Padilla (2022) introduced a state-of-the-art method, named PRISCA, for detecting changes in the variance of a Gaussian time series. They propose an iterative method which attempts to identify one change point at each iteration. As Fig. 2b shows, however, a method identifying one time point for each change point does not give results which are easy to interpret, because there is often considerable uncertainty about when the change takes place. In the figure, this uncertainty is represented by the wide shaded areas. Therefore, at each iteration, PRISCA builds a discrete probability distribution over {1, ..., n}, associating each time point with the probability that it is a change point. An example distribution relative to the first change point is depicted in Fig. 3. The height of the bars in the bottom chart corresponds to the probability associated with each time point.

Next, it identifies a level-q credible set, i.e., a subset of {1, ..., n} in which the sum of the probabilities is at least q (for a given threshold q ∈ [0, 1]). For example, a level-0.95 credible set corresponds to a 95% probability that the set contains the change point. Following a criterion of parsimony, it is desirable that the credible set contains as few elements as possible. Not all time points, however, must carry the same penalty if included in the credible set. For example, a time instant corresponding to an external shock might cost less in terms of parsimony compared to a time instant when no such shock occurred. Therefore, one can associate to each time point j a scaling factor c_j and minimise the sum of these factors. On the other hand, when no such information is present, one can just set c_j = 1 for all time instants. As we will see in Section 2.2, using a unitary scaling factor decidedly simplifies the problem. In the rest of this explanation, we will consider, for simplicity, this unit-cost case.

The most straightforward method to build the credible set is perhaps to follow a greedy approach which inserts points by decreasing value of probability until the desired threshold q is met. This criterion was used, for example, by Wang, Sarkar, Carbonetto, & Stephens (2020, Supplementary Data, Section A.3). Using this approach, however, can result in a situation in which time points belonging to different change points end up in the same credible set. Fig. 4 exemplifies this concept. The points highlighted in yellow in the bottom chart are included in the same credible set, but they are not all associated with the first change point.

To overcome this problem, one must then consider the compactness of the credible set: because each set should identify a single change point, its elements should be "compact" and, ideally, distributed tightly around the real (unknown) change point. This objective can be achieved via compactness constraints. Indeed, once the value of parameter Δ is fixed (usually to a small number such as 2 or 3), the problem of producing the most parsimonious credible set becomes our mKPC, in which the probability values associated to each time point take the role of the weights. Fig. 5 shows how including compactness leads to a better credible set construction.

2. Formal definition

In this section, we give a formal definition of the mKPC by means of an integer programming model, and we discuss the complexity of the mKPC and of the unit-cost mKPC (1c-mKPC). As mentioned in Section 1, in fact, the mKPC is NP-complete. In Section 2.2, however, we prove that the 1c-mKPC is solvable in polynomial time.

2.1. Mathematical model

We can formulate the mKPC as the following integer program, in which binary variable x_j takes value 1 if and only if the jth item is selected:
Fig. 2. Example time series which change their expected value and variance. Black points indicate the time series values y_t. Shaded areas represent periods where, qualitatively, an analyst would expect a change point.
\min \sum_{j=1}^{n} c_j x_j    (1)

The unit-cost case corresponds to the situation in which the user has no prior knowledge of which time instants of a time series are more likely to be change points. The following theorem establishes a strong result about the 1c-mKPC: namely, that it can be solved in polynomial time.
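As an illustration only, the following Python sketch shows a table-filling dynamic program of this kind. The recursion for the table W(i, ℓ), Eq. (5), is not restated above, so the form used below (W(i, ℓ) as the largest weight of a compact set of exactly ℓ items whose largest-index item is i) is our assumption, reconstructed from the complexity discussion in Section 2.2; the function name solve_1c_mkpc is ours.

```python
# Hedged sketch of a polynomial-time DP for the 1c-mKPC (our reconstruction,
# not the paper's exact recursion (5)).
import math

def solve_1c_mkpc(weights, q, delta):
    n = len(weights)
    NEG = -math.inf
    # W[i][ell]: largest weight of a compact set of exactly ell items whose
    # largest-index item is i (items are 0-indexed here).
    W = [[NEG] * (n + 1) for _ in range(n)]
    for i in range(n):
        W[i][1] = weights[i]
    for ell in range(2, n + 1):                 # columns in increasing order
        for i in range(n):                      # rows in increasing order
            best_prev = max((W[k][ell - 1] for k in range(max(0, i - delta), i)),
                            default=NEG)
            if best_prev > NEG:
                W[i][ell] = weights[i] + best_prev
    # Smallest cardinality ell for which some compact ell-subset reaches weight q
    # (the check of Eq. (6)); with unit costs this equals the optimal objective.
    for ell in range(1, n + 1):
        if any(W[i][ell] >= q for i in range(n)):
            return ell
    return None  # no feasible solution
```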
Fig. 3. Probabilities associated with each time point and representing how likely the point is to be the first change point of the time series.
index element. If that were not the case, in fact, the compactness constraint would be violated.

Finally, to know whether there is a subset of {1, ..., n} of size at most t such that its elements have weight at least q and that satisfies compactness constraints, we must check that

\min \left\{ \ell \in \{1, \ldots, n\} \,\middle|\, \exists\, i \in \{\ell, \ldots, n\} \text{ s.t. } W(i, \ell) \ge q \right\} \le t.    (6)

We now analyse the complexity of the above algorithm to conclude that it runs in polynomial time in the instance size n. Indeed, table W has size O(n^2) and we derive the worst-case complexity of computing an entry. To compute a generic entry W(i, ℓ) through (5) we need to compare values in rows i − Δ, ..., i − 1 of column ℓ − 1, i.e., we perform at most Δ comparisons. Noting that the table can be built in increasing order of columns and rows (indeed, W is lower-triangular) and that Δ ≤ n, we conclude that the total complexity of the DP algorithm is O(n^3).

3. Related problems

In addition to the applications in statistics discussed in Section 1.1, the mKPC has a specific combinatorial structure. As anticipated, the problem falls in the wide family of knapsack problems (see Kellerer, Pferschy, & Pisinger, 2004; Martello & Toth, 1990). In particular, it extends the min-Knapsack problem by introducing compactness constraints. For the earliest results on the min-Knapsack problem in English, we refer the reader to the seminal work of Csirik et al. (1991); for earlier works in Russian see, e.g., Babat (1975).

The special structure of compactness constraints can be represented by a graph G = (V, E) in which each item i corresponds to a vertex v_i ∈ V, and an edge {v_i, v_j} ∈ E is defined for each pair of vertices v_i and v_j, i < j, such that j − i ≤ Δ. The mKPC asks to select a subset of V inducing a connected subgraph, such that the corresponding items optimise the associated min-Knapsack problem.

If instead of graph G we are given a generic graph, and if we also have to include a predefined subset T ⊂ V of vertices in the connected subgraph, the problem is known as the Connection Subgraph problem (see Conrad, Gomes, van Hoeve, Sabharwal, & Suter, 2007). This problem is strongly NP-complete and remains so even when T = ∅. As discussed in Section 2.2, the mKPC (that is, the Connection Subgraph problem with T = ∅ and the special structure of graph G) remains NP-complete. The definition of the mKPC as a problem on a graph gives us an interpretation of inequalities (3) as a special case of the connectivity constraints introduced by Fischetti et al. (2017) to impose connectivity of Steiner trees.
Fig. 4. The bottom chart shows a credible set relative to the first change point of the time series in the top chart when disregarding compactness. The points in the credible
set are highlighted in yellow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
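Returning to the graph representation above: on the special graph G, a set of selected items induces a connected subgraph exactly when consecutive selected items lie at most Δ positions apart (our reading of the edge rule j − i ≤ Δ). A minimal illustrative check, with our own naming, is:

```python
# Sketch: compactness check equivalent to connectivity of the induced subgraph of G.
def is_compact(selected, delta):
    s = sorted(selected)  # positions of the selected items
    return all(j - i <= delta for i, j in zip(s, s[1:]))
```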
However, the special structure of graph G that results when solving the mKPC makes it more efficient to specialize those constraints to the specific problem, without the need to introduce G explicitly. In particular, the separation of inequalities (3) is straightforward (see Section 4.2).

As discussed, our compactness constraints can be interpreted as a connectivity requirement on a suited graph. Similar requirements appear in political districting problems, where one has to partition geographic units (e.g., counties or census blocks) to obtain districts for elections. Districts must contain geographically contiguous units and have the same number of inhabitants. Political districting problems are typically defined on a graph where vertices represent the geographic units and have a weight corresponding to the population, and the edges connect units that are contiguous. Hence, the problem consists in partitioning the vertices into subsets having approximately the same weight and inducing connected subgraphs. According to several recent contributions, this last requirement is the most challenging to satisfy (see, e.g., Ricca, Scozzari, & Simeone, 2013; Validi, Buchanan, & Lykhovyd, 2022; and Swamy, King, & Jacobson, 2022).

In a different perspective, Stiglmayr, Figueira, Klamroth, Paquete, & Schulze (2022) introduce some robustness measures for solutions in multi-objective integer linear programming. Here the idea is to select a solution which is not only efficient but also robust, in the sense that its "closeby" solutions are efficient as well (allowing for a substitution of the selected solution). The closeness of solutions depends on the specific problem and can be identified as a change of base via a pivot in a linear program or as a "move" in a combinatorial problem. In any case, close solutions are denoted as adjacent, thus defining a graph. The robustness of each solution is evaluated by analysing its neighbourhood in this graph.

4. Solution approaches

In this section, we describe exact approaches for the mKPC. We also describe a greedy heuristic for the 1c-mKPC, used in the PRISCA package (Cappello, 2022).

4.1. Integer programming

The first approach involves solving model (1)-(4) with a black-box integer programming solver. The model is compact because it uses O(n) variables and O(n^2) constraints.
Fig. 5. The bottom chart shows a credible set relative to the first change point of the time series in the top chart, considering compactness requirements. The points in the
credible set are highlighted in yellow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Strengthening compactness constraints. Compactness constraints (3) state that if two items lying more than Δ positions apart are selected, then at least another item between them must be selected. These constraints, however, can be made stronger. For example, if the two selected items lie more than 2Δ positions apart, then at least two further items between them must also be selected. In general, (3) can be strengthened as follows:

\left\lfloor \frac{j-i-1}{\Delta} \right\rfloor (x_i + x_j - 1) \le \sum_{k=i+1}^{j-1} x_k \quad \forall\, i, j \in \{1, \ldots, n\},\ j > i + \Delta.    (7)

In the example instance (with n = 1002 and Δ = 5), besides the two extreme items, an optimal integer solution must also select every fifth intermediate item, i.e., items 6, 11, ..., 1001. The optimal solution, therefore, selects 2 + 200 = 202 items.

When solving the continuous relaxation of the mKPC, however, an optimal solution is x_1 = x_1002 = 1, and x_j = 10^{-3} for all other j ∈ {2, ..., 1001}. Such a solution has cost 3 and does not violate any compactness constraint. For example, when i = 1 and j = 1002, we have x_2 + x_3 + ... + x_1001 = 1000 · 10^{-3} = 1 and thus (3) is satisfied. On the other hand, the strengthened constraint (7) would be violated by such a solution:

\left\lfloor \frac{j-i-1}{\Delta} \right\rfloor (x_i + x_j - 1) = 200 \cdot (1 + 1 - 1) = 200 \not\le 1 = \sum_{k=i+1}^{j-1} x_k.
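The branch-and-cut algorithm of Section 4.2 is not restated above; the sketch below only illustrates how a dynamic separation of the compactness cuts could look with gurobipy lazy constraints, using the strengthened coefficient of (7). It is an assumption-based illustration, not the authors' implementation.

```python
# Hedged sketch of a branch-and-cut variant: cuts (3)/(7) are not added up front
# but separated lazily in a solver callback.
import gurobipy as gp
from gurobipy import GRB

def solve_mkpc_branch_and_cut(costs, weights, q, delta):
    n = len(costs)
    model = gp.Model("mKPC-bc")
    model.Params.LazyConstraints = 1
    x = model.addVars(n, vtype=GRB.BINARY, name="x")
    model.setObjective(gp.quicksum(costs[j] * x[j] for j in range(n)), GRB.MINIMIZE)
    model.addConstr(gp.quicksum(weights[j] * x[j] for j in range(n)) >= q)

    def separate(model, where):
        if where != GRB.Callback.MIPSOL:
            return
        vals = model.cbGetSolution([x[j] for j in range(n)])
        selected = [j for j in range(n) if vals[j] > 0.5]
        # Two consecutive selected items further than delta apart have no selected
        # item between them, so the (strengthened) compactness cut is violated.
        for i, j in zip(selected, selected[1:]):
            if j - i > delta:
                coeff = (j - i - 1) // delta  # strengthened coefficient of (7)
                model.cbLazy(coeff * (x[i] + x[j] - 1)
                             <= gp.quicksum(x[k] for k in range(i + 1, j)))

    model.optimize(separate)
    return [j for j in range(n) if x[j].X > 0.5]
```

Separating only at integer solutions keeps the check linear in the number of selected items, consistent with the remark above that the separation of inequalities (3) is straightforward.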
4.3. Dynamic programming

To derive a DP algorithm for the (general) mKPC, we first introduce an auxiliary directed graph G = (V, A). The vertex set contains a source node σ, a sink node τ, and one node for each item. Overall, V = {σ, 1, ..., n, τ}. The arc set A contains:

• Arcs from σ to each node i ∈ {1, ..., n}.
• Arcs from each node i ∈ {1, ..., n} to τ.
• An arc from node i to node j, for each pair i, j ∈ {1, ..., n} such that i < j ≤ i + Δ.

Fig. 6 depicts graph G when Δ = 2. Thinner arrows represent arcs from σ and to τ, while the thicker ones represent arcs between nodes {1, ..., n}. A feasible solution of the mKPC corresponds to a path in G starting at σ, ending at τ, and such that the weight collected at visited nodes is at least q.

To avoid the complete enumeration of all feasible solutions, we propose a labelling algorithm in which we associate a label to each partial path from σ. A label L = (i, C, W) has three components: the last visited node i, the total cost C of visited nodes, and the total collected weight W. The initial label is L = (σ, 0, 0).

4.4. Greedy heuristic

For the special case of the 1c-mKPC, we describe here the greedy heuristic procedure used in the PRISCA package (Cappello, 2022) to determine whether a credible set corresponds to a valid change point. As mentioned in Section 1.1, the authors consider the case in which all costs are unitary, and they deem the credible set valid if their heuristic solution of the corresponding 1c-mKPC uses fewer than n/2 items.

The greedy procedure aims at identifying a subset of items P ⊆ {1, ..., n} with total weight at least q and satisfying the compactness constraints. The procedure starts by initialising P with a single item, namely the one with the highest weight:

P = \{ \operatorname{argmax} \{ w_j \mid j \in \{1, \ldots, n\} \} \}.

It then keeps augmenting P by adding, at each iteration, the heaviest item which is not yet selected and does not violate compactness constraints:

P \leftarrow P \cup \{ \operatorname{argmax} \{ w_j \mid j \in \{1, \ldots, n\} \setminus P,\ \exists\, i \in P : |j - i| \le \Delta \} \}.

The algorithm stops as soon as Σ_{j∈P} w_j ≥ q.
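A small Python sketch of this greedy procedure follows; the naming is ours (it is not the PRISCA code), and the early exit when no compatible candidate remains is our addition.

```python
# Sketch of the greedy heuristic described above (our naming, not the PRISCA code).
# weights: list of item weights (0-indexed); q: weight threshold; delta: compactness parameter.
def greedy_1c_mkpc(weights, q, delta):
    n = len(weights)
    start = max(range(n), key=lambda j: weights[j])  # start from the heaviest item
    selected = {start}
    total = weights[start]
    while total < q:
        # candidates: not yet selected and within delta of some selected item
        candidates = [j for j in range(n)
                      if j not in selected
                      and any(abs(j - i) <= delta for i in selected)]
        if not candidates:
            return None  # no compact extension can reach the threshold
        j = max(candidates, key=lambda j: weights[j])
        selected.add(j)
        total += weights[j]
    return sorted(selected)
```

On unit-cost instances the quality of the heuristic solution is measured purely by the number of selected items, as in the n/2 validity check mentioned above.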
5. Computational results

In this section, we report the results of computational experiments to test the effectiveness of the algorithms presented in Section 4. The code was implemented in C++, using Gurobi 9.5 as the MIP solver. Experiments ran on a machine equipped with an Intel Xeon CPU running at 2.4 GHz and 4GB RAM (increased to 8GB for instances with n = 600). The MIP solver was instructed to only use one thread. All algorithms used a time limit of 3600 s. The instances and the code used are available under an open-source licence (Santini, 2022).

After describing the instance set used, we analyse the results of three sets of experiments:

1. Experiments to assess the impact of strengthened constraints (7).
2. Experiments to compare the compact formulation, the branch-and-cut algorithm, and the DP labelling algorithm for the mKPC.
3. Experiments to investigate the difficulty of solving the unit-cost version of the problem. To this end, on top of the above algorithms, we also add the DP algorithm for the 1c-mKPC (described in the proof of Theorem 1) and the greedy heuristic described in Section 4.4.

5.1. Instances

We consider two sets of instances. We obtained the first set, denoted S1, from the authors of Cappello & Madrid Padilla (2022). This set consists of 300 instances with n ∈ {40, 200}, q ∈ {0.90, 0.95}, and Δ ∈ {2, 3, 5}. All costs are equal to 1 and, therefore, set S1 contains 1c-mKPC instances.

Because the costs in the S1 instances are all unitary, and the number of items is relatively low, we also generated a second set, denoted S2. This set contains 189 instances with n ∈ {200, 400, 600}, q = 0.95, and Δ ∈ {2, 3, 5, 10}. In the following, we explain how we generate the weights and the costs in the instances of set S2. We use three weight-generation methods:

• The Noise method first assigns each item j a weight

w_j = \frac{1}{n} + \mathcal{N}\left(0, \frac{1}{4n}\right),

where \mathcal{N}(\lambda, \sigma) denotes a normal distribution with location λ and scale σ. To avoid numerical issues, we also ensure that no weight is smaller than 10^{-12}, i.e., we set

w_j \leftarrow \max\{w_j, 10^{-12}\}.

Because the sum of the above weights is not necessarily equal to one, we finally normalise them:

w_j \leftarrow \frac{w_j}{\sum_{k=1}^{n} w_k}.    (9)

• The OnePeak method builds the weights from a truncated normal distribution with location λ and scale n/k. Here k ∈ {8, 16, 32} is an instance generation parameter. Weights will be more tightly distributed around the peak when k is larger. The method samples 5000 times from this distribution and builds the corresponding histogram with n bars. The jth bar counts how many samples fell in the interval [j, j + 1). The weight w_j of the jth item is then set as the height of the jth bar of the histogram. Finally, weights w_j are obtained by normalisation as in Eq. (9). Fig. 8 shows an example of a OnePeak instance.

• The TwoPeaks method is similar to OnePeak, except that the histogram is built by sampling from the sum of two truncated normal distributions with locations λ_1 and λ_2, and common scale n/(2k). Intuitively, λ_1 and λ_2 are the locations of the two peaks. The values of the two locations are drawn from two further truncated normal distributions between 1 and n, and rounded to the nearest integer. The first distribution has location n/3, the second one has location 2n/3, and both have scale n/6. Fig. 9 shows an example of a TwoPeaks instance.

We use three cost-generation methods:

• The Constant method simply assigns unit costs to all items and allows us to extend the results obtained on the S1 set to larger instances with different weight types.
• The Few method aims at modelling real-life statistical applications in which few items have a small cost, and all other items have a constant larger one. In particular, it first selects n/100 items using a roulette-wheel method with probabilities equal to item weights. It then assigns these items a cost of 0.10 and all other items a cost of 1. The reason we use roulette-wheel selection is that, in the application, the items with the lower costs correspond to time instants with a higher prior probability of containing a change point. These items are thus also more likely to be detected by the algorithm and, as a consequence, to have a larger weight. Therefore, assuming that the prior knowledge is accurate and that the algorithm works correctly, items with larger weights are more likely to have lower costs.
• The Random method assigns each item a cost uniformly distributed in the interval [1, 10].

Note that we have three possible values for parameter n, three values for parameter Δ, and three for the cost-generation method. Their combination gives 27 parameter combinations using weight-generation method Noise. Because we generate 3 instances for each combination, we build 81 Noise instances. Furthermore, for each of these 27 combinations, we have 3 possible values for parameter k, yielding 81 parameter combinations for each of the OnePeak and TwoPeaks weight-generation methods. Again, generating 3 instances for each combination, we obtain 243 instances for each method. Overall, we then construct 81 + 2 × 243 = 567 instances.
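As a usage illustration, the Noise weight generation described above can be reproduced with a few lines of numpy. We interpret the scale of N(λ, σ) as a standard deviation, and the function below is our sketch rather than the generator in the authors' repository (Santini, 2022).

```python
# Sketch of the "Noise" weight-generation method (assumed numpy reimplementation).
import numpy as np

def noise_weights(n, seed=0):
    rng = np.random.default_rng(seed)
    w = 1.0 / n + rng.normal(loc=0.0, scale=1.0 / (4 * n), size=n)  # w_j = 1/n + N(0, 1/(4n))
    w = np.maximum(w, 1e-12)   # avoid numerical issues
    return w / w.sum()         # normalisation as in Eq. (9)
```

For example, noise_weights(200) returns a weight vector summing to one, matching the role of probabilities in the statistical application.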
Impact of strengthened compactness constraints. We use two relevant metrics to assess the impact of strengthened inequalities (7):

1. The percentage optimality gap, i.e., the gap between the best primal and dual bounds returned by each algorithm within the time limit. This metric is denoted as Gap% and is defined as follows:

\text{Gap\%} = 100 \cdot \frac{UB - LB}{UB},

where "UB" indicates the best primal solution and LB is the tightest dual bound returned by the solver. Gap% corresponds to the familiar gap returned by black-box integer programming solvers and depends on both the quality of the primal and the dual bound.

2. The second metric is the solution time in seconds, including the time spent creating the model and exploring the branch-and-bound tree. It is denoted by Time (s).
Table 1
Impact of strengthened inequalities (7) on the performance of the Compact Formulation and the Branch & Cut algorithm. The table refers to instances whose weights are
generated with the Noise and OnePeak methods.
n   Weights   Costs   Compact MIP with (3): Time (s)   Compact MIP with (7): Time (s)   B&C with (3): Time (s)   B&C with (7): Time (s)
We also note that instances generated using the TwoPeaks weight type are considerably harder than the other instances. Therefore, we present the results obtained on Noise and OnePeak instances separately from those obtained on TwoPeaks instances. After commenting on these results, we will come back to the difficulty of TwoPeaks instances, and we will explain what sets them apart from the other instances.

Table 1 reports the results on the Noise and OnePeak instances of set S2. Because all algorithms solve to optimality all instances with up to 600 items, Table 1 only reports the runtimes. Note how the runtimes are very different for the complete compact formulation and for the B&C algorithm. For the largest instances, for example, the average runtimes needed to solve the compact formulation are in the order of hundreds of seconds. The B&C algorithm, on the other hand, closes these instances in a few hundredths of a second: a difference of five orders of magnitude. Regarding the effect of strengthened compactness constraints, we note that they do not seem to help when solving the full compact formulation. If anything, in fact, they slightly increase the computation time. On the other hand, they reduce the computation time of the B&C algorithm.

Table 2 presents the results on the TwoPeaks instances. These instances are considerably harder to solve: in several cases, the solvers run out of time without solving the model to optimality. Even when they solve the model to optimality, it takes, on average, much longer compared with Noise and OnePeak instances. For these instances, the strengthening constraints have a considerable effect on the solvers.
Table 2
Impact of strengthened inequalities (7) on the performance of the Compact Formulation and the Branch & Cut algorithm. The table refers to instances whose weights are
generated with the TwoPeaks method.
n Costs Compact MIP with (3) Compact MIP with (7) B&C with (3) B&C with (7)
Gap% Time (s) Gap% Time (s) Gap% Time (s) Gap% Time (s)
200 Constant 5.99 710.39 0.09 492.53 13.53 1746.66 0.83 884.48
Few 7.63 828.35 0.05 462.61 12.66 2056.36 0.63 627.83
Random 4.18 578.62 0.54 561.51 8.71 1362.68 0.48 591.01
400 Constant 6.11 1993.72 2.52 1999.99 15.98 2558.45 1.56 1818.52
Few 10.92 1531.91 2.48 1737.72 13.42 1747.64 3.00 1603.26
Random 3.51 1079.89 3.41 1915.56 13.21 2606.64 2.69 1350.17
600 Constant 18.88 2131.24 6.46 2147.31 15.97 2133.58 14.07 1740.09
Few 19.31 2603.57 6.72 2890.47 19.49 2851.15 7.70 2419.43
Random 12.92 2376.42 6.23 2677.12 15.92 2666.89 4.79 2085.05
Overall 9.94 1537.12 3.17 1653.87 14.13 2008.89 3.81 1428.38
Fig. 10. Left: percentage of items selected fractionally in the optimal solution of the continuous relaxation of the mKPC. Right: normalised Gini coefficient showing how
close the fractional values are to 0.5 (the higher the value, the closer to 0.5).
Indeed, by using the strengthened inequalities (7), the average gaps are roughly reduced by two-thirds. We also observe that, on these harder instances, the B&C algorithm loses its advantage over the compact formulation. The gaps produced by B&C are slightly worse, while the runtimes are comparable.

Peculiarity of the TwoPeaks instances. As Tables 1 and 2 show, TwoPeaks instances are much harder to solve using branch-and-bound methods compared with the other instances. The reason lies in the characteristics of the optimal solution of the continuous relaxation of the mKPC. Solutions of TwoPeaks instances have a large number of fractional items, and the values of the corresponding variables x are closer to 0.5. This implies that much more branching is necessary while exploring the branch-and-bound tree. To appreciate the extent by which TwoPeaks instances differ from the other instances, Fig. 10 shows boxplots of two metrics relative to the optimal solution of the continuous relaxation of the mKPC, divided by weight-generation method. Let x*_1, ..., x*_n be such a solution. Metric Frac% gives the percentage of items selected fractionally in the solution, i.e.,

\text{Frac\%} = 100 \cdot \frac{\left|\{ j \in \{1, \ldots, n\} : 0 < x^*_j < 1 \}\right|}{n}.

Metric FracGini is the normalised Gini coefficient of the fractional variables, i.e.,

\text{FracGini} = \frac{\sum_{j=1}^{n} x^*_j (1 - x^*_j)}{\left|\{ j \in \{1, \ldots, n\} : 0 < x^*_j < 1 \}\right|}.

The value of this metric is higher when many x*_j are concentrated around 0.5, while it is lower when the x*_j take values close to 0 or 1. Values x*_j ∈ {0, 1} do not contribute to the sum at the numerator. Therefore, solutions with more fractional items have more non-zero terms in the sum at the numerator. To compensate, we normalise by dividing by the number of fractional items.

Comparison of the algorithms for the mKPC. Table 3 compares the performance of three approaches for the mKPC. Because strengthened inequalities (7) result in lower gaps, we enable them for both the branch-and-cut algorithm and the compact formulation. Table 3 reports the following metrics:

1. Opt% is the percentage of instances in each row for which the algorithm found a provably optimal solution.
2. PGap% is the percentage primal gap, i.e., the gap between the best primal solution found by each algorithm and the best known primal solution. We use PGap% instead of Gap% because the labelling algorithm provides no dual bound when it cannot solve an instance within the time limit.
Table 3
Comparison of the MIP-based approaches (the branch-and-cut and the compact formulation) with the labelling algorithm presented in Section 4.3, on the S2 instances.
n Weights Costs B&C with (7) Compact MIP with (7) Labelling
Opt% PGap% Time (s) Opt% PGap% Time (s) Feas% Opt% PGap% Time (s)
200 Noise Constant 100.00 0.00 0.00 100.00 0.00 4.82 100.00 100.00 0.00 1.85
Few 100.00 0.00 0.01 100.00 0.00 6.72 100.00 100.00 0.00 3.05
Random 100.00 0.00 0.01 100.00 0.00 7.39 88.89 44.44 26.36 2917.73
OnePeak Constant 100.00 0.00 0.01 100.00 0.00 5.02 100.00 100.00 0.00 1.32
Few 100.00 0.00 0.01 100.00 0.00 5.25 100.00 100.00 0.00 1.59
Random 100.00 0.00 0.02 100.00 0.00 6.22 100.00 88.89 0.05 125.94
TwoPeaks Constant 77.78 0.49 884.48 96.30 0.05 492.53 100.00 96.30 0.13 1.81
Few 85.19 0.45 627.83 92.59 0.04 462.61 100.00 96.30 0.06 1.59
Random 88.89 0.01 591.01 88.89 0.00 561.51 100.00 85.19 0.04 144.46
400 Noise Constant 100.00 0.00 0.01 100.00 0.00 49.34 100.00 100.00 0.00 48.02
Few 100.00 0.00 0.02 100.00 0.00 99.85 100.00 100.00 0.00 160.17
Random 100.00 0.01 0.03 100.00 0.01 63.58 66.67 0.00 100.00 3600.00
OnePeak Constant 100.00 0.00 0.01 100.00 0.00 104.23 100.00 88.89 0.15 20.09
Few 100.00 0.00 0.02 100.00 0.00 76.48 100.00 100.00 0.00 34.99
Random 100.00 0.00 0.10 100.00 0.00 110.54 100.00 74.07 0.52 1975.23
TwoPeaks Constant 59.26 1.38 1818.52 55.56 1.42 1999.99 100.00 92.59 0.83 29.94
Few 55.56 2.12 1603.26 55.56 1.55 1737.72 100.00 92.59 0.79 45.55
Random 66.67 1.29 1350.17 55.56 1.31 1915.56 96.30 66.67 12.36 2071.43
600 Noise Constant 100.00 0.00 0.01 100.00 0.00 187.73 100.00 100.00 0.00 234.71
Few 100.00 0.00 0.02 100.00 0.00 239.03 100.00 100.00 0.00 1090.45
Random 100.00 0.01 0.03 100.00 0.01 220.93 66.67 0.00 100.00 3600.00
OnePeak Constant 100.00 0.00 0.01 100.00 0.00 635.56 100.00 96.30 0.02 207.80
Few 100.00 0.00 0.13 100.00 0.00 455.45 100.00 85.19 0.04 282.16
Random 100.00 0.00 0.19 100.00 0.00 554.11 88.89 40.74 25.05 3337.04
TwoPeaks Constant 51.85 14.12 1740.09 44.44 5.46 2147.31 100.00 100.00 3.32 231.36
Few 33.33 13.26 2419.43 29.63 4.83 2890.47 100.00 88.89 2.89 345.75
Random 44.44 4.20 2085.05 29.63 4.62 2677.12 81.48 18.52 39.37 3485.78
Overall 83.95 1.73 620.83 83.25 0.92 815.75 97.18 83.42 6.24 697.20
Table 4
Comparison of four algorithms on the unit-cost instances of sets S1 and S2.
3. Finally, we observe that the labelling algorithm can terminate in three different states. If it completes before the time limit, it has found the optimal solution. If it times out and there is at least one label extended to the sink node τ, then the algorithm can be used as a heuristic: any label extended to the sink node corresponds to a feasible solution. The algorithm can return the best such solution, although it cannot prove or disprove its optimality while there are still unextended labels. If the algorithm times out and no label was extended to τ, then we do not even have a feasible solution to compute PGap%. Therefore, we introduce the additional column Feas% for the labelling algorithm. It contains the percentage of instances in each row for which the algorithm found at least one feasible solution.

To ensure a fair comparison, we compute averages using a PGap% of 100% and a Time (s) of 3600 s when the labelling algorithm does not give any feasible solution.

Table 3 shows that branch-and-cut can usually find more optima than the other algorithms and in a shorter time. The average primal gap, however, is lower for the compact MIP formulation. The labelling algorithm does not always manage to produce a feasible solution. In particular, its performance degrades for instances with cost type Random and, to a lesser extent, with weight type Noise. For these instances, in fact, dominance is less likely, and the labelling algorithm becomes similar to a complete enumeration of feasible solutions.
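The exact dominance rule of the labelling algorithm is not restated above; the check sketched below is the standard one for labels of this form and is our assumption, shown only to illustrate why dominance becomes rare when costs and weights are diverse.

```python
# Assumed dominance test for labels L = (i, C, W) of the Section 4.3 labelling
# algorithm: with the same last node, a label with lower-or-equal cost and
# higher-or-equal weight dominates (our assumption, not the paper's exact rule).
def dominates(label_a, label_b):
    i_a, c_a, w_a = label_a
    i_b, c_b, w_b = label_b
    return i_a == i_b and c_a <= c_b and w_a >= w_b
```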
When the labelling algorithm finds feasible solutions, however, they are often optimal. In these cases, branch-and-cut usually also finds optimal solutions in a shorter time. A notable exception is TwoPeaks instances: when costs are Constant or Few, labelling is the best-performing algorithm, finding more optima in a considerably shorter time.

Experiments on 1c-mKPC instances. Finally, we report the performances of five algorithms on the unit-cost mKPC. In addition to the three algorithms considered above, we add the DP approach presented in the proof of Theorem 1 and the greedy heuristic introduced in Section 4.4. The test set for this experiment includes both S1 instances and S2 instances with Constant costs.

The results show that the tailored DP approach vastly outperforms all other algorithms. It solves all instances, even the largest ones with n = 600, in less than two-thousandths of a second per instance. The greedy heuristic is even faster (all measured times were under 0.00005 s) but often fails at identifying the optimal solution, especially when the size of the instance grows. We can conclude that specialised approaches for the 1c-mKPC are well justified and that there is no reason to use heuristic algorithms, because our proposed DP approach is extremely fast in practice (Table 4).

6. Conclusions

This paper introduced the min-Knapsack Problem with Compactness Constraints (mKPC), an extension of the classical min-Knapsack problem. The motivation for studying the mKPC is that it arises as a sub-problem in two state-of-the-art algorithms recently introduced in the statistical community. These are the PRISCA algorithm of Cappello & Madrid Padilla (2022) for detecting change points in time series and the SuSiE algorithm of Wang et al. (2020) for variable selection in high-dimensional regression.

We proposed three approaches to solve the mKPC: a compact formulation with a quadratic number of constraints, a branch-and-cut approach in which these constraints are separated dynamically, and a labelling algorithm. Despite branch-and-cut being more often used when the number of constraints is exponential in the problem size, computational experiments proved that this approach could also be helpful when dealing with compact models. In particular, the branch-and-cut algorithm solves the largest number of instances to optimality. It is orders of magnitude faster than solving the entire formulation with the state-of-the-art black-box solver Gurobi. Computational experiments also showed that the problem's difficulty depends considerably on instance characteristics. In particular, instances with a particular double-peak structure are harder to solve to optimality.

Finally, we focused our attention on a special case of the mKPC, named the unit-cost mKPC (1c-mKPC). This problem is especially relevant for the statistical applications because it corresponds to the case in which the user of the PRISCA and SuSiE algorithms mentioned above has no prior knowledge of, respectively, which time instants and which features are more likely to be selected. We proved that the 1c-mKPC is solvable in polynomial time and proposed a specific dynamic programming algorithm. Computational results clearly show that using this algorithm is better than the generic mKPC approaches and a greedy heuristic from the statistics literature.

This work contributes to the literature at the intersection between operational research (OR) and statistics. Although some authors recently explored problems in machine learning from the point of view of OR (see, e.g., Gambella, Ghaddar, & Naoum-Sawaya, 2021), there are not many works which address problems arising in classical statistics. In particular, we showed that OR can provide practical tools to solve to optimality problems which are often approached heuristically in the statistical community (see, e.g., Bertsimas, King, & Mazumder, 2016, for an illustrious example). Future work may identify further problems in statistics which OR techniques can efficiently approach.

Acknowledgments

We are grateful to Lorenzo Cappello and Oscar Madrid Padilla for sharing the instances of set S1, Lorenzo Cappello for fruitful discussions on the problem, and three anonymous reviewers for their constructive comments. Enrico Malaguti was supported by the Air Force Office of Scientific Research under award FA8655-20-1-7019.

References

Aminikhanghahi, S., & Cook, D. (2017). A survey of methods for time series change point detection. Knowledge and Information Systems, 51(2), 339–367. https://doi.org/10.1007/s10115-016-0987-z
Babat, L. (1975). Linear functionals on the n-dimensional unit cube. Reports of the Academy of Sciences of the Soviet Union, 221, 761–762.
Bertsimas, D., King, A., & Mazumder, R. (2016). Best subset selection via a modern optimization lens. Annals of Statistics, 44(2), 813–852. https://doi.org/10.1214/15-AOS1388
Cappello, L. (2022). prisca. GitHub repository. https://www.github.com/lorenzocapp/prisca
Cappello, L., & Madrid Padilla, O. H. (2022). Variance change point detection with credible sets. arXiv:2211.14097.
Conrad, J., Gomes, C., van Hoeve, W.-J., Sabharwal, A., & Suter, J. (2007). Connections in networks: Hardness of feasibility versus optimality. In P. Van Hentenryck, & L. Wolsey (Eds.), International conference on integration of artificial intelligence (AI) and operations research (OR) techniques in constraint programming (pp. 16–28). Springer. https://doi.org/10.1007/978-3-540-72397-4_2
Csirik, J., Frenk, H., Labbé, M., & Zhang, S. (1991). Heuristics for the 0–1 min-Knapsack problem. Acta Cybernetica, 10(1–2), 15–20.
Fischetti, M., Leitner, M., Ljubic, I., Luipersbeck, M., Monaci, M., Resch, M., ... Sinnl, M. (2017). Thinning out Steiner trees: A node-based model for uniform edge costs. Mathematical Programming Computation, 9, 203–229. https://doi.org/10.1007/s12532-016-0111-0
Gambella, C., Ghaddar, B., & Naoum-Sawaya, J. (2021). Optimization problems for machine learning: A survey. European Journal of Operational Research, 290, 807–828. https://doi.org/10.1016/j.ejor.2020.08.045
Hamilton, J. (1994). Time series analysis. Princeton University Press.
Kellerer, H., Pferschy, U., & Pisinger, D. (2004). Knapsack problems. Springer.
Kim, T. Y., Oh, K. J., Sohn, I., & Hwang, C. (2004). Usefulness of artificial neural networks for early warning system of economic crisis. Expert Systems with Applications, 26(4), 583–590. https://doi.org/10.1016/j.eswa.2003.12.009
Martello, S., & Toth, P. (1990). Knapsack problems. Wiley.
Radke, R., Andra, S., Al-Kofahi, O., & Roysam, B. (2005). Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing, 14(3), 294–307. https://doi.org/10.1109/TIP.2004.838698
Reeves, J., Chen, J., Wang, X. L., Lund, R., & Lu, Q. Q. (2007). A review and comparison of changepoint detection techniques for climate data. Journal of Applied Meteorology and Climatology, 46(6), 900–915. https://doi.org/10.1175/JAM2493.1
Ricca, F., Scozzari, A., & Simeone, B. (2013). Political districting: From classical models to recent approaches. Annals of Operations Research, 204, 271–299. https://doi.org/10.1007/s10479-012-1267-2
Santini, A. (2022). Algorithms for the min-Knapsack problem with compactness constraints. GitHub repository. https://doi.org/10.5281/zenodo.7492799
Stiglmayr, M., Figueira, J. R., Klamroth, K., Paquete, L., & Schulze, B. (2022). Decision space robustness for multi-objective integer linear programming. Annals of Operations Research, 319, 1769–1791. https://doi.org/10.1007/s10479-021-04462-w
Swamy, R., King, D. M., & Jacobson, S. H. (2022). Multiobjective optimization for politically fair districting: A scalable multilevel approach. Operations Research, 71(2), 536–562.
Validi, H., Buchanan, A., & Lykhovyd, E. (2022). Imposing contiguity constraints in political districting models. Operations Research, 70(2), 867–892.
Wang, G., Sarkar, A., Carbonetto, P., & Stephens, M. (2020). A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 1273–1300. https://doi.org/10.1111/rssb.12388
Yang, P., Dumont, G., & Ansermino, J. M. (2006). Adaptive change detection in heart rate trend monitoring in anesthetized children. IEEE Transactions on Biomedical Engineering, 53(11), 2211–2219. https://doi.org/10.1109/TBME.2006.877107