A Scalable Learning Approach for the Capacitated Vehicle Routing Problem
Keywords: Capacitated vehicle routing problem; Green vehicle routing problem; Reinforcement learning; Set partitioning problem

Abstract: Designing efficient heuristics for the different variants of vehicle routing problems and customising the heuristics to various input distributions is a time-consuming and expensive task. In recent years, end-to-end machine learning techniques have been developed because they are easy to modify for different problem variants, thereby saving on the design time needed to develop new efficient heuristics. These learning techniques, such as the transformer-based constructive methods, struggle to provide high-quality solutions on problem instances with hundreds to thousands of customers in a reasonable time. Furthermore, many of the end-to-end heuristics do not guarantee that solutions obey fleet-size constraints. We propose a heuristic for solving large capacitated vehicle routing problem (CVRP) instances that carefully integrates a machine learning heuristic with integer linear programming techniques. To address the poor objective function values produced by end-to-end machine learning approaches on larger instances, we dynamically partition the CVRP instance into smaller sub-problems and apply a machine learning heuristic to these sub-problems. This allows the machine learning heuristic to always operate on problems similar in size to those for which it was trained. The machine learning heuristic generates many solutions for each sub-problem, which are then combined using a set partitioning approach based on an ILP formulation. The set partitioning ILP also guarantees that solutions obey fleet-size constraints.

We evaluate the performance of our heuristic on a difficult set of benchmark instances with hundreds to thousands of nodes, achieving small gaps (less than 3% on average) with respect to best known solutions and significantly improving upon the solution quality of existing learning heuristics. Furthermore, we demonstrate that our results generalise well to other vehicle routing problems, such as the green vehicle routing problem.
∗ Corresponding author.
E-mail address: [email protected] (D. Ajwani).
https://doi.org/10.1016/j.cor.2024.106787
Received 31 October 2023; Received in revised form 31 May 2024; Accepted 24 July 2024
Available online 8 August 2024
0305-0548/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
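The pipeline the abstract describes (decompose, solve sub-problems with a learned heuristic, recombine via set partitioning) can be summarised in a short illustrative sketch. All component heuristics are passed in as callables; the function and parameter names here are ours, not the paper's.

```python
def scalable_heuristic(P, initial_construction, select_subproblem,
                       learned_heuristic, solve_spp, cost, T=10):
    """Decompose-solve-recombine loop: build an initial feasible
    solution, repeatedly carve out a sub-problem, generate candidate
    routes with the learned heuristic, and keep the recombination
    returned by the set partitioning solver if it improves."""
    S = initial_construction(P)                  # list of routes
    for _ in range(T):
        sub_routes, rest = select_subproblem(S)  # split S into sub-problem / remainder
        candidates = learned_heuristic(P, sub_routes) + sub_routes
        K = solve_spp(candidates, sub_routes)    # best feasible recombination
        S = rest + (K if cost(K) < cost(sub_routes) else sub_routes)
    return S
```

Because the original routes are always among the candidates, the accepted solution can never be worse than the incumbent, mirroring the acceptance test of the paper's Algorithm 3.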
J. Fitzpatrick et al. Computers and Operations Research 171 (2024) 106787
a narrow range of problem sizes and only for problems that are similar in structure to those encountered in the training sets. Furthermore, the solutions obtained by these learning heuristics often violate the fleet-size constraint. This limits the applicability of the end-to-end NN models to academic analysis in most cases.

In this paper, we develop a learning-based heuristic with a NN solver at the core that scales well and can solve problem sizes from hundreds to thousands of customers. Our approach also obeys fleet-size constraints.

To ensure that our learning heuristic can provide high-quality solutions on large problem instances, we dynamically decompose the RP instance into a set of sub-problems. For this, we adopt a decomposition scheme similar to the POPMUSIC metaheuristic (Ribeiro et al., 2002; Queiroga et al., 2021; Li et al., 2021). Our key novel insight is that applying a reinforcement learning (RL) heuristic on the smaller sub-problems allows the RL heuristic to always operate on smaller problems similar in size to those for which it was trained. Thus, the RL heuristic can be used to generate many high-quality solutions for each sub-problem. These solutions can then be combined using a set partitioning approach based on a MILP formulation. The set partitioning MILP also guarantees that solutions obey fleet-size constraints.

We demonstrate that this decomposition enables learning-based approaches to leverage their success at a large scale by solving the smaller sub-problems effectively. We show that small changes in the training scheme improve the generalisation performance of the neural network solver. In particular, we demonstrate the efficacy of our learning heuristic approach for the capacitated vehicle routing problem (CVRP) with an extensive set of computational experiments. We test on a benchmark set of difficult problem instances with between 300 and 1000 customers (Uchoa et al., 2017), and achieve small gaps (less than 3% on average) to the best known solutions in seconds to minutes.

Further, we explore, with careful feature engineering and constraint masking, the ability of these approaches to be extended to other, more difficult vehicle routing problem variants. In particular, we demonstrate that these approaches can be used to solve the green vehicle routing problem, even for large problem instances. We solve problem instances from the difficult problem set of Erdoğan and Miller-Hooks (2012).

1.1. Contributions

This work presents an integration of learning-based optimisation problem solving with the traditional techniques of optimisation and operational research, namely decomposition. Using the cluster-first, route-second approach along with set partitioning, we leverage this marriage to solve difficult capacitated vehicle routing problem instances. In particular, we:

• show that neural construction heuristics can perform well, consistently, even on large problem instances;
• demonstrate this on a difficult benchmark problem set instead of contrived problem sets;
• show that introducing set partitioning approaches can improve performance;
• demonstrate that this approach can generalise to other problem variants, such as the green vehicle routing problem.

We note that our learning heuristic is the first to achieve such good results on the larger benchmark datasets by Uchoa et al. (2017). This has been possible because, instead of using NNs as an end-to-end machine learning approach to solve RPs, we use an RL-based heuristic to generate many good solutions for sub-problems and combine those solutions using an optimisation technique.

The rest of the paper is structured as follows: Section 2 presents background literature on optimisation techniques and machine learning heuristics for solving RPs. The necessary preliminaries on CVRP and the set partitioning problem are described in Section 3. Section 4 describes our methodology in detail. Section 5 shows the computational experiments and the comparison results. Section 7 presents a discussion on the strengths and limitations of the proposed approach.

2. Related literature

We first provide a brief review of the exact and heuristic techniques used for solving RPs and then focus on the ML heuristics developed for this purpose.

2.1. Exact and heuristic techniques for RPs

While we can obtain optimal solutions for very large instances of the travelling salesman problem (TSP) (Applegate et al., 2009), these successes have yet to be replicated for any of the vehicle routing problems (VRPs). Many of the VRPs are NP-hard optimisation problems (Cordeau et al., 2007). Unsurprisingly, even for the CVRP, the state-of-the-art optimisation techniques struggle to find optimal solutions once the instance size grows to a few hundred nodes. Thus, researchers have focused on developing increasingly sophisticated solution techniques using branch-and-cut-and-price (BCP) methods (Pessoa et al., 2020). Using BCP methods, problems with fewer than one hundred nodes can be solved to optimality in less than a minute. The CVRP, however, is considered to be mostly of academic significance or a testing ground for other problem variants with different objective functions and additional constraints more reflective of real-world problems (Pessoa et al., 2020; Queiroga et al., 2021). Adapting these techniques to different problem variants is difficult and time-intensive, requiring expert knowledge of these methods (Pessoa et al., 2020).

For this reason, many of the state-of-the-art solution techniques for VRPs are heuristics, some of which can construct reasonable solutions for very large problem instances in a short time frame (Vidal et al., 2012; Helsgaun, 2017). One of the most well-known effective heuristics for solving the CVRP and many related variants is the Lin–Kernighan heuristic. The LKH-3 implementation can obtain optimal solutions for many small problem instances and near-optimal solutions for many larger problem instances (Helsgaun, 2017). Hybrid Genetic Search (HGS) has also been shown to be extremely effective for the CVRP: it can obtain optimal solutions for many problem instances with one hundred nodes in seconds, and solutions with small optimality gaps for many larger instances with known optimal solutions (Vidal et al., 2012).

2.2. ML heuristics for solving VRPs

Learning approaches for solving optimisation problems have grown in popularity in recent years. Many learning-based techniques have been proposed to solve routing problems, with ML methods augmenting existing heuristics or exact approaches. Lu et al. (2019) develop an improvement heuristic that makes use of traditional perturbation and improvement operators, but uses a trained neural network instead of a hand-crafted heuristic to select these operators. In a related work, Hottung et al. (2021) propose an improvement heuristic that uses traditional destroy operators but leverages a neural network repair operator to fix destroyed routes. da Costa et al. (2021) develop an approach that depends on a neural network to select 2-opt moves in an existing solution. In some cases, the neural network is itself a construction heuristic, an idea prompted by the work of Vinyals et al. (2015) and Nazari et al. (2018). Such models, coupled with reinforcement learning techniques, have enabled fast and easy-to-modify heuristics to become effective at solving relatively small VRPs (Nazari et al., 2018; Kool et al., 2018). Recent advances have enabled these approaches to become competitive for solving instances with around one hundred nodes, achieving a small gap with respect to best known solutions (Kwon et al., 2020; Hottung et al., 2021). However, they have largely defied generalisation to much larger problem instances,
with poor and unpredictable performance or excessive computation required. Despite these shortcomings, a lot of interest remains in developing ML techniques because they have the potential to be easily adapted to different VRP variants. This can significantly reduce the time needed to develop new heuristics for the different VRP variants and to customise them to different input distributions.

2.2.1. The attention model

A number of construction heuristics for RPs involve node insertions. One type of insertion heuristic proceeds by inserting nodes one at a time into an initially empty solution until a feasible solution is obtained. At any step of the process, a node is selected from V and inserted somewhere into a partial solution. The space of possible node insertions can be quite large because it may be necessary to select both the next node to be inserted and the location in which to insert it. One way to reduce the size of the space of such decisions is to restrict the placement of inserted nodes to the end of a solution: they may only be appended to the current partial solution. This can be effective if there exists an effective mechanism selectNode for selecting the node at each step, given the state of the partial solution. Nodes must be selected such that no insertion renders the solution infeasible. This can be achieved by restricting the nodes under consideration for insertion to some set W ⊂ V with a method restrictNodes. This will depend both on the variant of the RP under consideration and on the current state of the partial solution S. A general pseudo-code for these approaches is given in Algorithm 1.

Algorithm 1 Appending Node-Insertion Heuristic
1: Given: V
2: S ← ()
3: W ← V
4: while W ≠ {} do
5:    W ← restrictNodes(V | S)
6:    u = selectNode(W, S)
7:    S ← append(S, u)
8: end while
9: return S

Starting at the depot, a partial solution may visit a sequence of customer nodes until it returns to a depot, completing a route. This process is continued until the heuristic constructs a set of routes that visits each customer exactly once.

The main idea of end-to-end learning techniques is that the node insertion can be thought of as a Markov Decision Process. The action is the node selection and the current state is the partial solution. Each action depends only on the current state. Kool et al. (2018) describe a neural network that estimates a score p_u for each node u in Algorithm 2. A neural network f requires a feature set representing each node u ∈ V, a representation of the partial solution S and the restricted set of nodes W. The scores can be interpreted as probabilities, since they can be made to sum to unity. The NN is trained to produce a higher score for nodes more likely to be beneficial if inserted next. Nodes can be selected greedily, i.e., selecting the node u with maximum probability max_u p_u at each step, or randomly by sampling, where node u is selected with probability p_u. We refer to the former as the greedy variant and the latter as the sampling variant of the AM heuristic.

The node-selection procedure is outlined in Algorithm 2.

Algorithm 2 selectNode
1: Given: W, S
2: for each v ∈ W do
3:    p_v = f(v | W, S)
4: end for
5: u = arg max_{v∈W} p_v   // Or u = rand_{v∈W}(p_v)
6: return u

A restriction on V ensures that any node not in W has a zero-valued p_u. The mechanism for this restriction is referred to as a mask in the machine learning literature. In the case of the CVRP, restrictNodes ensures that any customer node already in S has a zero-valued p_u and that the previously-visited node, which could be a depot, also has a zero-valued p_u. This mask can be easily modified for a wide range of RP variants.

The relative simplicity of the concept of predicting a node to append to the partial solution is attractive, but it scales poorly to larger problem instances. As the problem size increases, the computation required for the NN encoding grows quadratically, which is prohibitive. Gaps with respect to best known solutions become significant for problems with one hundred nodes or more, unless a sampling approach is taken (Kool et al., 2018). Even the sampling approach may require a large number of samples to reduce the gap enough. The POMO method, described in Section 2.2.2, illustrates a much faster approach that achieves results comparable to sampling thousands of solutions.

Several alternative architectural modifications have been made to improve the quality of the solutions produced by these attention-based end-to-end models. The work of Falkner and Schmidt-Thieme (2020) uses joint attention to construct several routes simultaneously in the hope that this will increase the likelihood of good node selections. Xin et al. (2021) use multiple decoders and dynamic embeddings to provide better information to the model that makes the node-selection decisions. Duan et al. (2020) implement a combined reinforcement and supervised learning approach, using node-selection techniques that inject edge-based features to improve solution construction. These approaches, while successful to some extent, lack a key quality that we need for a general heuristic: simplicity of idea and implementation. This, according to Cordeau et al. (2002), is an essential element of any good heuristic. This is why, despite the existence of many other promising solution techniques that take advantage of ML methods (Kool et al., 2022; Lu et al., 2019, among others), we proceed with the POMO approach. This is supported by further work that has extended the POMO model successfully to problems with difficult time-window constraints (Rabecq and Chevrier, 2022).

2.2.2. Improved solution construction

Solutions generated with AM models have a similar structure, limiting their diversity (Kwon et al., 2020). Diversity in structure is desirable because it increases the chance of avoiding local optima and of constructing a good solution. POMO greedy selection proposes a new greedy sampling technique which constructs N greedy solutions in a manner similar to that outlined in Section 2.2.1. However, a different initial customer is artificially fixed for each solution: each solution starts at the depot and must then visit a designated customer. This approach is naive in principle, since some nodes may not be desirable to visit immediately before or after the depot, but it works extremely well in practice. It yields results comparable to the sampling variant of the AM heuristic at a small fraction of the computational effort (Kwon et al., 2020).

Employing instance augmentation techniques yields even better solutions (see, for example, Kwon et al. (2020)). A problem instance can be augmented by applying a uniform affine transformation (rotations, reflections, translations) to the positions of the nodes. The ordering of the nodes in an optimal solution will not be affected, but the augmented problem instance will yield a different NN feature representation. The AM models have been shown to construct different solutions to these augmented problems than to the original, un-augmented problems (Kwon et al., 2020). Using POMO greedy selection on the same problem augmented with d different transforms allows one to greedily construct d × n solutions to the same problem. In practice, this works significantly better than the sampling variant of the AM heuristic, still at much smaller computational cost. The normalisation of the transformations can be kept consistent if they are restricted to reflections and ninety-degree rotations. Fig. 1 shows sample transformations that yield augmented problem instances.
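To make the append-only construction of Algorithms 1 and 2 concrete, the sketch below implements the loop with a CVRP mask: visited customers and customers whose demand exceeds the remaining capacity are excluded, and the depot is available except immediately after leaving it. The `score` function stands in for the trained network f; it and all names here are our illustrative assumptions, not the paper's model.

```python
def restrict_nodes(visited, current, load, demands, Q, depot=0):
    """CVRP mask (restrictNodes): exclude visited customers, customers
    whose demand would exceed remaining capacity, and the depot when we
    are already standing at it."""
    W = set()
    for u in demands:
        if u == depot:
            if current != depot:        # returning to the depot is allowed
                W.add(u)
        elif u not in visited and load + demands[u] <= Q:
            W.add(u)
    return W

def construct(demands, Q, score, depot=0):
    """Algorithms 1+2: repeatedly append the highest-scoring feasible
    node (greedy variant) until every customer has been visited."""
    S, visited, current, load = [depot], set(), depot, 0
    customers = set(demands) - {depot}
    while visited != customers:
        W = restrict_nodes(visited, current, load, demands, Q, depot)
        u = max(W, key=lambda v: score(v, S))   # arg max of the scores
        S.append(u)
        if u == depot:
            load = 0                            # new route, empty vehicle
        else:
            visited.add(u)
            load += demands[u]
        current = u
    return S + [depot]
```

With a sampling score in place of `max`, the same loop yields the sampling variant of the AM heuristic.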
Fig. 1. Example of a toy routing problem with POMO augmentation. The blue square represents the depot and the red circles the customers of arbitrary demand. The top row contains only rotations (about the geometric centre) of angles iπ/2, with i increasing from left to right, i ∈ {0, 1, 2, 3}. The second row is similar, but each problem has also been flipped vertically around the centre line.
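The eight transformations of Fig. 1 (four quarter-turn rotations about the centre, each optionally flipped) can be generated as in the following sketch for nodes with coordinates in the unit square. This is illustrative code of ours, not the authors' implementation; every transform is an isometry, so pairwise distances, and hence route lengths, are unchanged.

```python
def augment(coords):
    """Generate the 8 dihedral variants of a list of (x, y) node
    coordinates in the unit square: rotations by i*pi/2 about the
    centre (0.5, 0.5), each optionally reflected vertically."""
    def rot(p):                      # quarter-turn rotation within [0,1]^2
        x, y = p
        return (y, 1 - x)
    def flip(p):                     # vertical reflection about the centre line
        x, y = p
        return (x, 1 - y)
    variants = []
    current = list(coords)
    for _ in range(4):
        variants.append(current)
        variants.append([flip(p) for p in current])
        current = [rot(p) for p in current]
    return variants                  # 8 coordinate lists; the first is the original
```

Feeding each variant to POMO greedy selection gives the d × n solutions (here d = 8) discussed above.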
3. Preliminaries

We demonstrate the efficacy of our method on the CVRP and we use the set partitioning formulation to combine the solutions of our sub-problems. In this section, we describe these optimisation problems.

3.1. The capacitated vehicle routing problem

A CVRP instance P can be defined on a graph G = (V, E), where V is the set of nodes of the graph and E the set of edges. The set of nodes V is a disjoint union V = D ∪ I, where D is the set of depot nodes (usually D = {0}) and I is the set of customer nodes I = {1, …, N}. For each customer i ∈ I there is an associated nonzero demand d_i that must be satisfied. The customers are served by a homogeneous fleet of exactly k vehicles, each with a capacity Q. Vehicles serving the customers must begin and complete their route at a depot node.

A route r is an ordered set of customer nodes. Let |r| be the number of customers in route r. No route may contain the same customer twice, and if a customer is visited in some route r, it may not be visited in any other route in the solution. The sum of the demands of the customers served along a route may not exceed the capacity of a vehicle. Letting d_i be the demand of customer i (d_0 = 0) and ℛ the set of all possible routes, we have:

∑_{i∈r} d_i ≤ Q  ∀ r ∈ ℛ  (1)

Each edge (i, j) has an associated cost c_ij. An optimal solution satisfies the constraints and minimises the sum of the costs of the edges in the solution routes.

3.2. The set partitioning problem

A feasible CVRP solution S is a collectively exhaustive and mutually exclusive set of routes R = {r_1, …, r_|S|}, where |S| is the number of routes in the solution S. Note that R is a small subset of ℛ, |R| ≪ |ℛ|, where ℛ is the set of all feasible routes. A solution of a routing problem P can therefore be represented as a partition of the set I, where each element (subset of I) of the partition can be mapped to a feasible route. This allows any CVRP to be transformed into a set partitioning problem (SPP). Let x_r be a binary decision variable taking unit value if route r is selected and zero otherwise. Let a be a matrix with a unit entry at element a_ir if customer i is in route r and zero otherwise. We can formulate the SPP-CVRP as an integer linear programme (ILP) as follows:

minimise ∑_{r∈ℛ} c_r x_r  (2)
s.t. ∑_{r∈ℛ} a_ir x_r = 1  ∀ i ∈ I  (3)
∑_{r∈ℛ} x_r ≤ k  (4)
x_r ∈ {0, 1}  ∀ r ∈ ℛ.  (5)

Constraint (3) ensures that each customer is contained in exactly one selected route, while constraint (4) demands that at most k routes are selected.

The SPP is also an NP-hard optimisation problem. Furthermore, the number of feasible routes can grow exponentially with the number of customers. However, for most problem instances, the vast majority of the routes of ℛ are not going to be in an optimal solution. It may be sufficient to solve the SPP-CVRP by identifying a small subset ℛ_s of ℛ that is likely to contain a high-quality solution (Subramanian et al., 2013). For larger problem instances, this subset must be carefully constructed to ensure that the resulting SPP is tractable, and this is exactly what we aim to do in our learning heuristic. Note that the fleet-size constraint is guaranteed to be satisfied by the SPP-CVRP solution as long as there is at least one feasible solution in ℛ_s, a condition that is ensured in our learning heuristic.

3.3. The green vehicle routing problem

A variety of solution techniques have been proposed to solve the green vehicle routing problem and its variants. These have been largely derived from the previously existing literature for traditional vehicle routing problem variants. Although exact approaches have been proposed to solve variants of both the green vehicle routing problem (see, for example, Koç and Karaoglan (2016) and Bruglieri et al. (2019)) and the electric vehicle routing problem (see Desaulniers et al. (2016)), they have been limited to solving relatively small problem instances. Metaheuristics are by far the most popular research direction because of their ability to quickly and effectively search the solution space (Moghdani et al., 2021). These solution methodologies generally require extensive algorithm engineering efforts and are typically used to solve problem instances that consider relatively few real-world constraints. In industrial settings, expediency is often required not only in the run-time of a solution method, but also in the development of that solution method. One approach that promises to address both of these concerns is the recently emerging solution method known as end-to-end learning, pioneered by Kool et al. (2018). This approach uses attention-based neural networks (see Vaswani et al. (2017)) to learn how to solve combinatorial optimisation problems under a reinforcement learning regime. Recent works in this direction, such as those of Kwon et al. (2020) and Hottung et al. (2021), have become
increasingly effective for larger and more highly-constrained problem variants. However, previous attempts to scale these approaches to larger problem sizes have largely failed.

4. Methodology

We first outline the heuristic at a high level and then explain each element in detail in the following sections.

4.1. A scalable learning-based heuristic for the CVRP

The key idea behind our learning heuristic is to dynamically decompose the RP instance into a set of roughly equal-sized sub-problems. We then train an RL heuristic to solve RP instances of size equal to the size of the sub-problems. Our RL heuristic does not just generate one solution, but many different solutions for each sub-problem. For each sub-problem, we then combine the solutions generated by the RL heuristic using a set partitioning ILP formulation. The union of the combined solutions for the sub-problems gives us a high-quality solution of the original RP instance.

This heuristic can enable scaling to large graphs with little generalisation error, as the RL heuristic is only used on the sub-problems, which are of the small size that it was trained to deal with. Similarly, the SPP instances also do not grow too big, as they only deal with small sub-problems.

Algorithm 3 shows the high-level outline of the heuristic. The heuristic takes as input a routing problem P, a maximum sub-problem size l, and a maximum number of iterations T. We assume that the size N of problem P obeys N ≥ l; otherwise, no decomposition is required. An initial solution S is constructed for P using some fast initial construction heuristic initialConstructionHeuristic.

At each iteration i, a sub-problem P′ is created from the routing problem P using the solution S. A subset of routes R′ of S is selected to form P′ by the subroutine selectSubProblem. This sub-problem contains all of the customers of those routes, as well as the depot. It may not contain more than l nodes of the master problem. The routes R′ form an initial solution S′ to the sub-problem. The heuristic learnedHeuristic is then used to solve the sub-problem, giving a set of solutions 𝒦, composed of a set of routes (including the original solution S′). This is passed to an SPP solver to produce a solution K′, which must obey the fleet-size constraint. If the objective function value c(K′) is less than c(S′), we accept the new routes and substitute them in S in place of R′ to form an improved solution using the subroutine updateSolution.

Algorithm 3 The General Heuristic
Require: P, l, T
S = initialConstructionHeuristic(P)
for i ∈ {1, ..., T} do
   P′, S′ = selectSubProblem(P, S, l)
   𝒦 = learnedHeuristic(P′)
   K′ = setPartitioningProblem(𝒦, S′)
   S = updateSolution(S, S′, K′)
end for
return S

Our approach is very general. We may make use of any starting heuristic we like, as long as it can generate an initial feasible solution. The sub-problem selection heuristic can be replaced by a more intricate approach, which may yield better candidates to re-solve. Improved learning-based heuristics for the sub-problem may be used, and they may take the same form as those of Kool et al. (2018). We may also use alternative recombination heuristics instead of the set-partitioning approach. Section 4.2 describes our choice of initial construction heuristic for the CVRP, Section 4.3 describes how we dynamically decompose the RP instance using the initial solution, Section 4.4 describes our choice of learning heuristic to solve the sub-problems and Section 4.5 describes how we use the CVRP-SPP to combine the RL solutions for each sub-problem.

4.2. The initial construction heuristic

The initial heuristic provides a starting point for the decomposition strategy. The choice of the initial heuristic is thus crucial for the effectiveness and easy adaptability of our learning framework. We considered the following candidates for the initial heuristic in our framework:

• Hybrid genetic search (HGS): HGS (Vidal et al., 2012) is an extremely fast and effective heuristic for the CVRP, and can yield very good solutions for large instances in seconds. Queiroga et al. (2021) have used HGS to obtain an initial solution.
• Lin–Kernighan heuristic (LKH3): LKH3 (Helsgaun, 2017) is a well-known powerful solver for the CVRP and other routing problem variants. Li et al. (2021) ran the Lin–Kernighan heuristic for 30,000 iterations to obtain an initial solution, though it took two to three hours of computation.
• Clarke–Wright (CW) parallel-savings heuristic: CW (Clarke and Wright, 1964) is a relatively simple heuristic that can be easily adapted to a range of vehicle routing problems.

Even though HGS and LKH3 would have resulted in significantly better initial solutions compared to CW (see Section 5.5 for a detailed comparison), we decided to use CW as our starting heuristic. This is because our primary interest is to design a learning heuristic that can be easily adapted to a large number of VRP variants, and the CW heuristic is the easiest to adapt to the different variants. In contrast, we note that HGS is quite specific to the CVRP and is not easy to adapt to other VRP variants. For many variants of the VRP, there exist neither powerful, fast heuristics, nor readily-accessible implementations. In this case we allow the CW heuristic to make disimproving ``savings'' to ensure the fleet-size constraint is obeyed from the beginning.

4.3. Selecting sub-problems

Next, we describe how we use the initial feasible solution S generated by the initialConstructionHeuristic to dynamically decompose the RP instance into a set of sub-problems. To generate a sub-problem, we first select a subset S′ of routes in S and then define a sub-problem P′ based on it. We define the support graph G′ to contain all of the nodes selected in routes of S′ and the depot, as well as the edges E′ between those nodes. Let the number of vehicles k′ required to service the customers of the sub-problem be equal to the number of routes in S′. Then, the problem P′ is simply to solve a CVRP on G′ with at most k′ vehicles.

Consider that we can view the solution S as a set of routes S = {r_1, …, r_k}. The subset S′ is initialised with a route r ∈ S chosen at random. We define a distance measure d(r, r_i) between the centroids of the routes r and r_i. We insert routes into S′ in increasing order of their distance d(r, r_i) from the initial route r. We keep adding routes to S′ as long as the total number of nodes in all routes in S′ is less than a predefined threshold l. This ensures that the number of nodes in S′ is always less than or equal to l.

As the solution S gets updated during the course of the heuristic and the initial route r is randomly selected, we obtain different sub-problems. Fig. 2 shows an overview of our decomposition approach.

This decomposition allows an RL heuristic trained on graphs of around l nodes to provide many good solutions for each of these sub-problems. In our experiments, we set the default value of l to 125.

An important assumption that we make in this decomposition strategy is that, in P′, we ask for the nodes in G′ to be covered by k′ routes. It is conceivable that the optimal solution does not obey the decomposition boundaries and, even when it does, that it covers the nodes with fewer or more than k′ routes. However, as we show in Section 5, our heuristic still finds solutions with small gaps to the best known solutions in a reasonable time.
Fig. 2. An illustration of the problem decomposition and sub-problem construction. In (a), we have the original (master) problem. In (b), we start with an initial solution. Each
route is given a different colour and depot-customer edges are depicted as dashed for clarity. In (c) we select a subset of the nodes to form a new problem by selecting a subset of
the routes. The non-selected nodes are depicted in grey. In (d) we use the learning-based heuristic to solve the sub-problem consisting of the selected nodes. The new sub-solution
must have the same number of routes as the original solution to the sub-problem.
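The recombination step in Fig. 2(d) passes the candidate routes for a sub-problem to the SPP of Section 3.2. Over a small candidate pool, the SPP (2)–(5) can even be solved by brute force, as in this illustrative sketch of ours (a practical implementation would hand the model to an ILP solver instead):

```python
from itertools import combinations

def solve_spp(routes, costs, customers, k):
    """Exact set partitioning over a small candidate pool: choose at
    most k routes covering every customer exactly once, at minimum
    total cost. Brute force -- only sensible for tiny pools."""
    best, best_cost = None, float("inf")
    for m in range(1, k + 1):                        # constraint (4): at most k routes
        for combo in combinations(range(len(routes)), m):
            served = [c for i in combo for c in routes[i]]
            # constraint (3): each customer in exactly one chosen route
            if sorted(served) != sorted(customers):
                continue
            total = sum(costs[i] for i in combo)     # objective (2)
            if total < best_cost:
                best, best_cost = list(combo), total
    return best, best_cost
```

Including the current sub-solution S′ among the candidates guarantees that at least one feasible (fleet-size-respecting) partition exists.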
Fig. 3. The context vector may now include information relating to the capacity or fuel remaining in the vehicle.
4.4. The heuristic for solving sub-problems

Next, we describe the learning heuristic learnedHeuristic that we use to solve the sub-problems with up to 𝑙 customers. We base our heuristic on the AM model of Kool et al. (2018) and we use the NN to construct a set of 8𝑁 solutions of the sub-problem 𝑃′ using POMO (Kwon et al., 2020) construction with augmentation. We refer the reader to Sections 2.2.1 and 2.2.2 for more details of these techniques, and refer the reader to Kool et al. (2018) for an in-depth explanation of the neural network architecture and Kwon et al. (2020) for details of the training and inference.

Although the architecture of the neural network remains largely the same as that of Kwon et al. (2020), we make some important modifications in the training and masking to adapt the approach. The architectural change is reflected in the modification of the context vector, which includes the usual graph embedding vector, 𝑣𝑔, and the initial and final node embedding vectors 𝑣𝑖 and 𝑣𝑓. We may now insert representations of the current capacity 𝑄𝑡 or fuel 𝐵𝑡 of the vehicle (see Fig. 3). In particular, 𝑄𝑡 is the fraction of capacity remaining in the vehicle at step 𝑡 and 𝐵𝑡 is the fraction of fuel remaining at step 𝑡. These are concatenated into a vector of 386 elements if both 𝑄𝑡 and 𝐵𝑡 are included, and 384 otherwise (as it is in the original works).

As noted in Section 2.2.1, the existing literature on end-to-end learning techniques has mostly focused on training NN solvers to solve VRP problems of a fixed size. Our approach constructs sub-problems with a range of sizes (bounded above by our threshold 𝑙), depending upon the structure of the initial feasible solution and the random initial routes selected. This implies that our learning heuristic has to be effective on graphs of different structures and varying sizes (smaller than 𝑙). We achieve this by training the neural network on a range of problem sizes, exposing it to problems with different values of 𝑁 before it is expected to solve the varying-size sub-problems. These modifications are described below.

4.4.1. Training

In order to provide high-quality solutions on our sub-problems, we need to ensure that our training data is diverse in terms of the instance sizes and the problem structure. Ideally, the training dataset should have the same distribution as the test dataset. Unfortunately, in the literature there has been little focus on the role of diversity in training. Most previous learning heuristics for solving RPs have focused the training on a narrow set of simplistic problem instances with low variability in structure and no variability in size. For instance, all problem instances used for training the AM models in the works of Kool et al. (2018) and Kwon et al. (2020) have the same value of 𝑁, and each problem instance in each batch and epoch in a training run is generated by placing nodes randomly on a unit square with a small range of uniform random demands. Furthermore, each vehicle has an identical capacity 𝑄. As a result, the vast majority of the optimal solutions to these problems have very similar structure: several short routes with few customers and similar length. In contrast, real-life VRP instances exhibit a much greater variation in structure. Customer nodes can be closely clustered, depots can be far away from these clusters and some routes can be extremely long.

Since we want to generate many different high-quality solutions for our sub-problems, we use more diverse problem instances for training. The work of Bdeir et al. (2022) demonstrates the efficacy of this process. Training problem instances vary in size throughout an epoch. Each batch contains problem instances of the same size, but two subsequent batches may have different values of 𝑁. For each batch, we sample from the discrete uniform distribution 𝑁 ∼ 𝑈(75, 125) when 𝑙 = 125. Training instances are also generated using the approach of Uchoa et al. (2017) to ensure a wide variety of problem structures is encountered during training. For the purpose of this training, we do not fix 𝑘 for these instances, since we cannot guarantee that a solution produced by the neural network will have 𝑘 routes. As in previous works, the neural network is trained using reinforcement learning techniques, with the reward being greatest when the length of a solution is smallest. The remaining details of our training methodology are in line with Kool et al. (2018) and Kwon et al. (2020).

4.5. CVRP set partitioning problems

A major issue with end-to-end learning heuristics for solving RPs is that, when the RPs have a constraint on the number of routes 𝑘 (fleet-size constraint), the learning solutions frequently violate it and produce solutions with either too few or too many routes. This is because learning approaches minimise the objective function value (solution length) without any consideration of the constraint on the number of routes. To guarantee the fleet-size constraint and improve the solution quality further, we formulate a CVRP set partitioning problem.

The routes considered in the CVRP set partitioning problem include all the routes in 𝑆′. In addition, we consider all the routes in up to 8𝑁 different solutions produced by the POMO construction augmentation (refer to Sections 2.2.2 and 4.4 for how we get up to 8𝑁 solutions). We then use the ILP formulation described in Section 3.2 to find the minimum-cost subset of at most 𝑘′ (equal to the number of routes in 𝑆′) routes such that all customers in 𝐺′ are covered.

Note that even if all the solutions returned by the learnedHeuristic have more than 𝑘′ routes, the CVRP-SPP ILP will still return a guaranteed solution with at most 𝑘′ routes. This is because 𝑆′ is a feasible solution to the ILP. In fact, the solution 𝐾′ will be at least as good as 𝑆′ in terms of solution quality.

We then update the solution 𝑆 by replacing 𝑆′ in 𝑆 by 𝐾′. The modified 𝑆 is still a feasible solution covering all customers and satisfying the fleet-size constraint while improving the overall objective. The modified 𝑆 is then used to generate the next sub-problem, if needed. We make use of the formulation in Section 3.2 for this work.

5. Results and analysis

In this section, we present the results of our computational experiments comparing the different learning heuristics for solving CVRP. Our focus is on the large and complex instances of Uchoa et al. (2017) and hence we primarily focus on comparing our learning heuristic with the AM (Kool et al., 2018) and POMO (Kwon et al., 2020) techniques. We note that the other learning heuristics for solving CVRP take considerably longer to solve these larger instances.

To understand the gains resulting from the different components, we provide detailed ablation studies. We first compare our results with the Clarke–Wright heuristic that is used as the initial construction heuristic. Next, we examine the gain from the CVRP-SPP step and the use of the POMO model as the sub-problem solver.

We also present the solution quality of the state-of-the-art hand-tuned, carefully-crafted HGS heuristic for CVRP. This is to understand how far the learning heuristics are from the state-of-the-art optimisation heuristic for CVRP. As noted earlier, our focus is primarily on comparing learning heuristics that can be easily adapted to solve different variants of the VRP, and HGS cannot be easily adapted for other variants of VRP.

5.1. Datasets

We test the performance of our approach on the dataset of Uchoa et al. (2017). We consider only the larger problem instances with more than three hundred customers, for which the question of scalability is relevant. We consider two partitions of this benchmark set: medium-sized instances with between three and seven hundred customers (denoted X𝑀) and larger instances with between seven hundred and one thousand customers (denoted X𝐿).

5.2. Experimental setup

All experiments were carried out using Python. The neural networks were developed and trained using the PyTorch package, and the SPPs were formulated and solved using the Python interface for the Xpress solver.¹ The pyHygese package was used as an interface for the HGS solver. Neural network training was carried out on a machine with four Nvidia GeForce GTX 1080 GPUs and 72 Intel Xeon E5-2697 v4 @ 2.30 GHz CPUs. Inference and problem solving using the proposed method was carried out on a machine with twelve Intel Core i7-9750H @ 2.60 GHz CPUs and a single Nvidia GeForce GTX 1080 GPU.

¹ https://www.fico.com/en/products/fico-xpress-optimization.

5.3. Evaluation metric

We compare solutions in terms of the gap with respect to the best known solutions (BKSs) at the time of writing. This gap is defined in Eq. (6) in terms of the objective function value 𝑧 of the incumbent solution and the objective function value 𝑧_BKS of the best known solution:

100 × (𝑧 − 𝑧_BKS) ∕ 𝑧_BKS.    (6)

We report the mean gap of five separate runs for our proposed approach and for HGS. HGS runtimes were restricted to match the mean runtime of the five runs of our proposed approach. For the AM model, the solution was obtained by generating 𝑁 solutions. For POMO, we perform greedy node-selection inference with eight augmentations of the original problem instances, as described in the original work (Kwon et al., 2020) and Section 2.2.2.

5.4. Comparison with other ML heuristics on medium-sized problem instances

The results for the medium problem instances are presented in Table 1. Comparing the various learning heuristics, we observe that performance varies considerably for the different problem instances and solution approaches. The gaps with respect to the best known solution are particularly large for the AM model, which only produces one solution at test time. POMO yields relatively better solutions in some cases because of its improved node-selection technique. However, it still yields solutions with very large gaps for most instances in this set. A moderate positive correlation exists between the gap observed for these approaches (0.449 and 0.294 respectively for AM and POMO) and the size of the problem. A stronger correlation exists (0.505 and
Table 1
Performance on the X𝑀 problem instances. Columns: Problem; BKS; Gap (%) for HGS, POMO, AM, CW and our heuristic; Time (s) for AM, POMO, CW and our heuristic.
X-n303-k21 21 736 0.00 22.8 33.59 13.70 3.80 <1 <1 <1 91
X-n308-k13 25 859 0.01 15.96 16.91 3.57 0.90 <1 <1 <1 95
X-n313-k71 94 043 0.00 3.57 9.16 2.21 1.50 <1 <1 <1 94
X-n317-k53 78 355a 0.00 2.53 19.83 3.85 2.30 <1 <1 <1 101
X-n322-k28 29 834a 0.00 80.01 76.05 8.93 3.10 <1 <1 <1 104
X-n327-k20 27 532 0.00 36.44 19.39 9.31 1.20 <1 <1 <1 98
X-n331-k15 31 102a 0.00 25.06 305.83 7.11 3.40 <1 <1 <1 108
X-n336-k84 139 111 0.01 3.01 28.77 12.67 2.60 <1 1 <1 112
X-n344-k43 42 050 0.01 21.23 209.37 14.75 2.00 <1 1 <1 114
X-n351-k40 25 896 0.02 14.18 43.47 5.93 1.30 <1 1 <1 111
X-n359-k29 51 505 0.01 10.03 9.85 3.30 2.10 <1 1 <1 119
X-n367-k17 22 814 0.01 16.98 208.44 8.02 3.30 <1 1 <1 120
X-n376-k94 147 713a 0.02 1.91 27.62 1.68 3.30 <1 1 <1 124
X-n384-k52 65 928 0.02 11.39 10.26 2.37 2.00 <1 1 <1 126
X-n393-k38 38 620a 0.01 258.14 51.52 6.20 3.40 <1 1 <1 128
X-n401-k29 66 154 0.01 6.35 51.73 2.57 2.90 <1 1 <1 134
X-n411-k19 19 712 0.02 37.22 213.78 2.20 2.90 <1 1 <1 141
X-n420-k130 107 798a 0.02 4.83 31.57 4.71 2.30 <1 2 <1 146
X-n429-k61 19 712 0.03 26.60 18.04 4.46 3.60 <1 1 <1 149
X-n439-k37 36 391a 0.01 372.30 298.23 9.48 3.00 <1 1 <1 158
X-n449-k29 55 233 0.02 28.07 24.47 5.39 2.40 <1 1 <1 161
X-n459-k26 24 139 0.03 487.25 94.27 15.40 1.60 <1 1 <1 170
X-n469-k138 221 824a 0.00 3.36 59.57 15.86 1.20 <1 2 <1 177
X-n480-k70 89 449 0.02 9.31 11.31 3.34 1.90 <1 2 <1 182
X-n491-k59 66 483 0.02 28.94 18.43 3.13 2.40 <1 2 <1 189
X-n502-k39 69 226 0.00 23.35 287.18 5.97 2.70 <1 2 <1 193
X-n513-k21 24 201 0.02 727.86 312.55 8.90 2.10 <1 2 <1 195
X-n524-k153 154 593 0.03 8.41 97.39 6.40 3.10 <1 4 <1 198
X-n536-k96 94 896 0.03 12.91 27.14 7.11 3.30 <1 3 <1 210
X-n548-k50 86 700 0.02 70.15 12.89 3.87 1.60 <1 3 <1 204
X-n561-k42 42 717 0.02 479.50 302.29 3.66 1.50 <1 3 <1 203
X-n573-k30 50 673 0.02 19.47 224.79 7.88 3.00 <1 3 <1 207
X-n586-k159 190 316 0.03 6.31 78.32 8.62 2.00 <1 5 <1 218
X-n599-k92 108 451 0.03 70.91 27.56 3.66 1.40 <1 4 <1 211
X-n613-k62 59 535 0.04 288.40 314.07 4.77 2.30 <1 4 <1 233
X-n627-k43 62 164 0.03 229.13 241.97 3.59 1.80 <1 4 <1 231
X-n641-k35 63 682 0.03 209.78 402.12 9.43 1.80 <1 5 <1 243
X-n655-k131 106 780a 0.01 47.51 236.15 7.88 1.70 <1 7 <1 245
X-n670-k130 146 332 0.03 25.42 299.54 4.00 1.60 <1 7 <1 267
X-n685-k75 68 205 0.04 232.13 292.48 7.24 1.90 <1 7 <1 256
a Indicates proven optimal solutions.
0.450 respectively for AM and POMO) between the ratio 𝑁∕𝑘 and the observed gap. This suggests that for these approaches, performance is worse for larger problem instances and for solutions with more customers in a route. In contrast, our learning heuristic produces solutions with a gap no greater than 3.8%, which is a significant improvement over AM and POMO. In fact, our heuristic outperforms AM and POMO on almost all instances in this set. For our heuristic, the correlation between problem size and mean gap is 0.097, which is very weak and suggests no association. The correlation between the ratio 𝑁∕𝑘 and the observed gap is −0.349. The maximum value of 𝑁∕𝑘 for this problem set is 24. This suggests that partitioning the problem actually yields better performance on instances where the sub-problems have fewer routes (smaller 𝑘) with many customers (larger 𝑁).

The Clarke–Wright heuristic yields much smaller gaps, in less time, than the AM and POMO heuristics, for all but one problem instance in this set. Our heuristic can produce significant improvements in the quality of the CW initial solution, with no gap greater than 3.8%.

HGS performs excellently, with gaps smaller than 1% in most cases, achieving optimal solutions in some cases. However, as noted earlier, the focus of our work is on comparing learning heuristics that can easily adapt to other VRP variants, while HGS is very specific to CVRP and cannot easily be modified for other VRP variants. The mean gap of HGS solutions is provided to understand how far the solutions of our learning heuristic are from those of the state-of-the-art manually designed heuristic. Although there is still a sizeable gap between the solutions of our heuristic and HGS (see Table 1), we expect that as the learning techniques evolve further, this gap will reduce even for the simpler variants of VRP that have received significant research attention over many decades and where careful, hand-crafted heuristics have been developed.

The results in Table 1 are for the case where HGS is allowed to run for as much time as our learning heuristic. In contrast, if we restrict HGS to run for only as much time as the CW heuristic, we found that the HGS metaheuristic achieved a mean gap of 3.89% on the instances that it could obtain a solution for (84 of 100 instances), while the mean gap between the solution obtained with the CW heuristic and the best known solution across all instances of the Uchoa et al. (2017) problem set was 5.99%. This implies that the lower gaps of HGS come at the cost of additional running time.

5.5. Comparison on larger problem instances

Table 2 shows the performance results for the larger problem instances of the benchmark set. We note that none of these instances have proven optimal solutions. Similar to the case of the medium-sized problem instances, we observe that our learning heuristic achieves a mean gap of less than 3.3% with respect to the best known solution. It significantly outperforms the AM and POMO heuristics on all instances in this category. Furthermore, the mean gap of the solutions produced by our heuristic remains uncorrelated with the size of the problem instances (with an R-value of −0.07). The runtime of our heuristic does increase steadily with the instance size. However,
Table 2
Performance on the X𝐿 problem instances. Columns: Problem; BKS; Gap (%) for HGS, POMO, AM, CW and our heuristic; Time (s) for AM, POMO, CW and our heuristic.
X-n701-k44 81 923 0.03 85.96 354.24 4.62 2.80 <1 6 <1 278
X-n716-k35 43 373 0.04 85.68 300.73 5.57 3.30 <1 7 <1 303
X-n733-k159 136 187 0.03 23.39 221.93 2.58 2.20 1 8 <1 322
X-n749-k98 77 269 0.03 210.64 173.51 3.02 1.30 <1 7 <1 335
X-n766-k71 114 417 0.03 15.38 142.82 4.34 2.30 <1 7 <1 367
X-n783-k48 72 386 0.03 312.11 259.99 6.26 2.00 <1 9 <1 378
X-n801-k40 73 305 0.04 404.58 514.36 5.31 1.80 1 10 <1 414
X-n819-k171 158 121 0.03 25.08 46.38 5.02 2.40 1 11 <1 433
X-n837-k142 193 737 0.04 38.66 248.90 3.49 2.70 1 13 <1 467
X-n856-k95 88 965 0.03 415.89 270.32 3.84 3.30 1 12 <1 471
X-n876-k59 99 299 0.02 29.35 211.38 3.10 1.40 1 12 <1 481
X-n895-k37 53 860 0.04 389.04 630.38 8.81 2.50 1 15 <1 502
X-n916-k207 329 179 0.03 12.74 25.52 4.60 1.89 1 15 <1 525
X-n936-k151 132 715 0.05 93.08 86.12 11.39 2.80 1 15 <1 544
X-n957-k87 85 465 0.04 309.89 228.99 4.27 2.40 2 19 <1 564
X-n979-k58 118 796 0.05 23.02 87.12 3.88 2.90 2 16 <1 578
X-n1001-k43 72 355 0.04 420.38 303.93 7.05 2.10 2 16 <1 601
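The Gap (%) entries in Tables 1 and 2 follow Eq. (6). A quick sketch, using the BKS of X-n303-k21 from Table 1 and a hypothetical incumbent of 22 000:

```python
def gap_percent(z, z_bks):
    """Eq. (6): percentage gap of incumbent objective z to best known z_BKS."""
    return 100.0 * (z - z_bks) / z_bks

# hypothetical incumbent 22 000 against the BKS 21 736 of X-n303-k21
assert round(gap_percent(22000, 21736), 2) == 1.21
```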
Table 3
Comparing performance with and without the set partitioning step. Forgoing the recombination of solutions to sub-problems by solving an SPP improves run-times, but significantly impacts performance. Columns: Benchmark set; Mean gap (%) without SPP; Mean gap (%) with SPP; Mean time (s) without SPP; Mean time (s) with SPP.
X𝑀 4.56 2.31 95 167
X𝐿 4.66 2.35 341 435

Table 4
Performance comparison for the smallest of the medium-size problem instances. We compare timings and solution quality when the sub-solver is replaced with the original AM model. Columns: Problems; Mean gap (%) for AMG, AMS and our heuristic; Mean time (s) for AMG, AMS and our heuristic.
X𝑆 6.53 6.21 2.41 33 1653 104
even the largest problem of 1000 customers has a mean solve time of just over ten minutes.

Again, we observe that the mean gap of our heuristic solutions improves considerably over the CW heuristic. The HGS heuristic does achieve a gap of lower than 1% in almost all cases but, as stated earlier, this heuristic is not easy to adapt to other variants of VRP problems and is thus not the focus of our comparison.

5.6. Ablation study: Set partitioning

Worst-case solve-times for CVRP-SPPs can be quite long, even for small instances. The re-combination of many solutions to form an improved sub-solution is useful, but it is worth investigating the benefit it brings to the overall solution quality in our learning-based heuristic. If an initial solution obeys the fleet-size constraints, then it is sufficient to simply replace sub-solutions with any solution that has the same number of routes and a better objective function value. If this is not the case, then the SPP step can help to combine known solutions into a solution that obeys the fleet-size constraints.

Table 3 shows the effects of removing the SPP step from our proposed approach. We report the mean gap over each partition of the benchmark problem set, as well as the mean solve time. Removing the SPP step leads to a significant decrease in the solve time (reducing it by almost half on average in the case of the medium-sized problem instances), but it leads to a steep deterioration in the solution quality. For both the medium and the large problem instances, the mean gap with respect to the best known solutions doubles.

5.7. Ablation study: AM vs POMO as the internal heuristic

In this section, we study whether our choice of POMO (with modified features and training data) as the learnedHeuristic to solve sub-problems in our learning heuristic is justified. We explore this by testing the AM model as the internal sub-problem solver for the proposed approach. We test both the greedy solution construction approach of the AM model (AMG) and the sampling variant of the AM model (AMS). We perform these tests for the medium-sized problems with four hundred or fewer customers (denoted by X𝑆). We report performance based on the average gap from the best-known solutions and the average solution time across all instances.

From Table 4, we observe that the performance of our heuristic with modified POMO as the internal heuristic is considerably better than with AMG or AMS as the internal heuristic. The mean computation time of our heuristic with POMO as the internal heuristic is less than two minutes. Thus, we conclude that our choice of modified POMO as the sub-problem solver is well justified.

We note that the advantage of the POMO approach is that it achieves similar performance in terms of solution quality to the sampling variant of the AM model while taking approximately the same time (Kwon et al., 2020). With augmentation, the solve-time is increased somewhat, but this yields significantly better solutions than the AM approach. Our results show that these advantages of POMO also result in it being a better heuristic for solving our sub-problems.

5.8. Using the ML heuristic as the initial heuristic

In our learning heuristic, we have used CW as our initial construction heuristic to generate the first feasible solution. CW is a simple heuristic that can be adapted for many VRP variants.

Ideally, our goal is to replace the CW heuristic with a more adaptable learning heuristic. However, there are several challenges to achieving that. The results from Tables 1 and 2 indicate that the solution quality of the AM and POMO heuristics varies a lot over the benchmark problem instances. Some of the gaps with respect to the best known solutions reach hundreds of percent. Furthermore, in 49 of the 57 problem instances, the POMO greedy sampling approach produces solutions that do not obey the fleet-size constraint. In each case, more routes are produced than required. In some cases, there are hundreds more routes than in the best solution found.

To explore whether a learning heuristic can be used as an initial construction heuristic, we relax the constraint that the initial solution contains the correct number of routes. This enables us to use the AM or POMO method to generate an initial solution. We allow a
candidate sub-solution to replace an existing solution if it improves the objective function value or if it reduces the number of routes required to service the customers in the sub-problem. We allow candidate sub-solutions with fewer routes to replace existing sub-solutions until the fleet-size constraint is respected. After this, candidate sub-solutions may only replace existing sub-solutions if they contain the same number of routes as the existing sub-solution and the objective function value is lower. Doing this allows us to effectively match the performance of the proposed heuristic, even without SPP. Unlike the proposed heuristic, this is not guaranteed to satisfy the fleet-size constraint. Nonetheless, Table 5 shows that our learning heuristic with POMO as the initial construction heuristic (POMO initial) obtains solutions that respect the fleet-size constraints in 24 of the 57 problem instances.

Table 5
Using the learning-based heuristics as the initial heuristic. Allowing candidate sub-solutions with fewer routes to replace existing sub-solutions allows the proposed heuristic to obey fleet-size constraints for more of the problem instances. This yields a comparable gap to the best known solutions. Columns: Problem; Mean gap for POMO and for POMO initial; solutions obeying fleet-size constraints for POMO and for POMO initial.
X𝑀 76.97 2.68 8 21
X𝐿 229.11 2.89 0 3

6. Extending to other vehicle routing problem variants

Next, we show how our approach can be easily extended to other vehicle routing problem variants, with a particular focus on green vehicle routing problems and electric vehicle routing problems. This extension need not require any changes to the architecture of the neural network under consideration nor to the training scheme (though small changes may greatly improve performance). It is necessary, however, to consider additional constraints, which affects the masking scheme that is adopted.

Many variants of the vehicle routing problem can be considered with simple modifications of the existing masking scheme. In some cases, we also need to modify the features, embedder and decoder context. In the remainder of this section, we describe these modifications and demonstrate the effectiveness of the extension of our approach on a green vehicle routing problem benchmark.

6.1. Constraints in VRP problems

We first review the constraints that are often found in the various vehicle routing problem variants.

6.1.1. Capacity constraints

Vehicle capacity constraints are often ignored in formulations of the green and electric vehicle routing problems. We allow for the possibility that vehicles may have a limited cargo-carrying capacity 𝑄. The vehicles may service a demand 𝑑𝑖 at each customer in 𝐼. The sum of the demands of the customers in a given route 𝑟 may not exceed 𝑄.

6.1.2. Time windows constraints

Time windows constraints restrict or influence the decision to service customers at particular times. A vehicle arrives at a node 𝑖 at time 𝑡𝑖. In the case of hard time windows, the customer 𝑖 may only be serviced between the beginning of the time window 𝑙𝑖 and the close of the time window 𝑢𝑖. In the case of soft time windows, a customer may be serviced outside of this range of times, but a penalty may be incurred. It is always ensured that the time window beginning starts after the opening time of the problem (𝑡 = 0) and that the time window finish occurs before the problem closing time (𝑡 = 𝑇). To ensure feasibility in the case that 𝑇 is finite, there must be enough time to return from customer 𝑖 to the depot, that is: 𝑇 − (𝑢𝑖 + 𝑡𝑖𝛿) ≥ 0. For hard time windows, a vehicle must wait at the customer if it arrives before the opening of the time window 𝑙𝑖 and can only begin servicing the customer once the time window opens.

6.1.3. Route duration and length constraints

In the case of route length constraints, the cumulative length of the edges used to form a complete route 𝑟 may not exceed some threshold 𝐿. In the case of route duration constraints, the total time taken to traverse all edges, service each customer and refuel or recharge the vehicle cannot exceed the total allotted routing time 𝑇.

6.1.4. Battery and fuel constraints

Battery constraints and fuel constraints concern the prevention of the condition in which the vehicle may become stranded. The maximum battery capacity or fuel level of a vehicle is 𝐵. Traversing each arc or edge (𝑖, 𝑗) expends a certain amount of fuel or energy 𝜖𝑖𝑗. The total fuel level, or battery energy level, 𝑏𝑖, upon arrival at any node 𝑖 may never be less than zero. To prevent the vehicle from becoming stranded, it must either return to the depot before running out of fuel or battery energy, or it must visit a charging station node. Depending on the charging strategy, a family of constraints may exist to restrict the charging profile once the vehicle arrives.

The heuristic used to identify the initial solution is the end-to-end model, and improvements are permitted by accepting improving solutions with fewer routes at each stage of the decomposition. The architectural differences to the end-to-end neural network sub-solver are minor for each problem variant and are limited to the handling of the initial features through an embedding, the decoder context and the creation of masks to ensure the feasibility of each node selection.

6.2. Constraint masks

Masks are generated in the same manner as described in the original works of Kool et al. (2018), Kwon et al. (2020) and others. If visiting a customer would violate the capacity constraint, then the visit should not be considered. If a customer has already been visited, it may not be visited again. If a node has just been visited, then it cannot be selected next for insertion. A customer may not be visited if, when starting from a node 𝑖, there is not enough time remaining to reach 𝑗 (see Fig. 4). Similarly, a customer may not be visited if there is not enough time to both visit (and service) that customer and return to the depot. A customer may not be visited if there is not enough fuel remaining to reach node 𝑗 from node 𝑖. A customer node may not be visited if doing so would render the vehicle stranded. A customer may only be visited if the vehicle has enough fuel or charge to visit that customer and return directly to the depot, or if it has enough fuel or energy to visit that customer, reach a refuelling or recharging station, refuel or recharge, and still have enough time to return to the depot. This set of masks becomes a tree of decisions, much like the one depicted in Fig. 5. A node may only be visited if the conjunction of all the logical conditions for visiting that node is satisfied.

During training, it is necessary to ensure the convergence of a solution by mandating that at least one customer is selected in any route. This prevents behaviour observed at the beginning stages of training where multiple routes are constructed in which only refuelling or recharging stations are visited. Constructing many routes of this kind does not render the solutions for some of the models of these problems infeasible, but it does significantly slow training or possibly make training impossible. This is because the training updates occur only once all solutions for a batch of problem instances have been completed. If any one solution does not converge, then no updates to the weights of the neural network will be made. This mask can be removed at testing time if required, or retained to guarantee the feasibility of a solution. This may come at a cost in restricting the solution space that the end-to-end model can access.
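The conditions above combine as a conjunction over each candidate node. A simplified sketch, with illustrative names of our own and with fuel consumption taken proportional to distance (the refuelling-detour branch of Fig. 4 is omitted for brevity):

```python
def feasible(node, state, dist, demand, service):
    """A node may be visited only if every condition holds (a conjunction)."""
    if node in state["visited"]:
        return False                                    # no repeat visits
    if demand[node] > state["capacity"]:
        return False                                    # capacity constraint
    to_node = dist[state["current"]][node]
    back = dist[node][state["depot"]]
    # illustrative assumption: fuel use is proportional to distance travelled
    if state["fuel"] < to_node + back:
        return False                                    # would strand the vehicle
    if state["time"] < to_node + service[node] + back:
        return False                                    # must still reach the depot
    return True
```

In the full scheme, an infeasible direct visit triggers a further check on whether a detour via a refuelling or recharging station would make the visit feasible.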
10
J. Fitzpatrick et al. Computers and Operations Research 171 (2024) 106787
Fig. 4. Masking scheme. Starting at node 𝑖, which could be any node, the vehicle must consider whether it can now visit customer 𝑗. This is possible if it has both enough energy and enough time to do so and to return to the depot. Otherwise, a check can be performed to evaluate whether subsequently visiting a charging station could make such a visit feasible.
Fig. 5. A depiction of the decision tree used to generate the mask that determines whether or not a customer may be visited, assuming the vehicle is an electric vehicle. If a node is feasible, the probability 𝑝𝑡𝑗 that node 𝑗 is selected at time step 𝑡 is nonzero. It is zero-valued otherwise.
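The gate shown in Fig. 5 (𝑝𝑡𝑗 = 0 for infeasible nodes) is commonly implemented by setting infeasible logits to −∞ before the softmax, as in the AM/POMO family of models. A minimal sketch, assuming at least one node is feasible:

```python
import math

def masked_softmax(logits, mask):
    """Zero the selection probability of infeasible nodes (mask[j] == False).
    Assumes at least one entry of `mask` is True."""
    z = [x if m else float("-inf") for x, m in zip(logits, mask)]
    mx = max(z)                                   # subtract max for stability
    exps = [math.exp(x - mx) for x in z]          # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

Masking the logits rather than the probabilities keeps the remaining probabilities correctly renormalised over the feasible nodes.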
Next, we consider the feature vectors that can be used for these problem variants. The feature vectors describing each node represent the input information for the neural network for that node. Depending on the constraints that have been considered, the constructed feature vector may have a different size.

6.3.1. Depot features
The depot features are always represented by a two-dimensional feature vector. These are the coordinates of the depot in the first quadrant of the Cartesian plane.

6.3.2. Customer features
As with the depot features, the absolute coordinates on the Cartesian plane are always considered. In addition, the polar coordinates with respect to the depot are considered because they have been found to be useful in many classical heuristics developed in the literature. The normalised length of the vector between the depot node 𝛿 and the customer node 𝑖, $\rho_i = \ell_2(x_\delta, x_i)/\sqrt{2}$, is the first of these features. The normalised angle $\omega_i = \theta_i/2\pi$ (in radians) is computed from the angle subtended by the positive side of the horizontal axis and the line pointing from the depot to node 𝑖, assuming the depot to be at the origin of that coordinate system. The customer feature vector 𝑧𝑖 may contain a number of further elements, depending on the constraints that are considered. These include the normalised customer demand, $d_i/Q$, where 𝑑𝑖 is the demand of customer 𝑖 and 𝑄 is the vehicle capacity. It may also include the service time 𝑠𝑖, which we assume to be always less than half an hour (following the convention of Montoya et al. (2017)), normalised by the maximum service time 𝑆 to become $s_i/S$. It may also contain the normalised time window bounds $l_i/T$ and $u_i/T$, if they are considered.

6.3.3. Refuelling or recharging station features
The features for a refuelling station or a recharging station should indicate the likelihood that a given station is a suitable choice, given its location and its technology. As with the other node types, the Cartesian coordinates of the nodes are considered and, as with the customer nodes, the normalised vector length 𝜌𝑖 and angle 𝜔𝑖 with respect to the depot. If the refuelling or recharging function is modelled as a constant, then an additional feature is simply the time required to perform a full charge divided by the total time allotted for routing, $\Delta_i(0, B)/T$. If the refuelling or recharging function is modelled as a linear function of the current amount of fuel or state of charge upon arrival at the station, then this feature also suffices to differentiate charging stations. If the charging function is modelled as a nonlinear (or piecewise linear) function of the state of charge, then we may represent the charging station with the normalised time taken to complete a full refuelling or recharging and the normalised time required to refuel or recharge up to 80%, $\Delta_i(0, \frac{8B}{10})$. This is suggested by the fact that, up to this point, the charging function of most electric vehicles can be accurately approximated as a linear function of the state of charge (Montoya et al., 2017).

6.4. Node feature embedder
For each node type, we have a different set of weights in the encoding. Among nodes of the same node type, the weights used to perform the embedding are shared. For depot nodes, we use the following:

$q_i = W_D x_i + b_D \quad \forall i \in D$    (7)

For customer nodes, we use the following embedder:

$q_i = W_I x_i + b_I \quad \forall i \in I$    (8)

For refuelling stations, we use the following embedder:

$q_i = W_F x_i + b_F \quad \forall i \in F$    (9)

The rest of the encoder is as described in the work of Kwon et al. (2020).

The context for each problem variant contains different elements, depending on whether or not each element of that variant is considered. Each context vector, as in the original paper of Kool et al. (2018) and follow-up works such as Falkner and Schmidt-Thieme (2020), contains the vectors representing the embedding of the previously visited node and the graph embedding. The context must represent the current state of the solution. It must therefore contain information about the current amount of fuel, the time remaining for travel, the distance remaining for travel, the current cumulative variance in the time expended and the current cumulative variance in the fuel or battery energy used. If the current route contains a set of edges 𝑟 and the last-visited node is node 𝑗, then we have the potential elements of the context vector as follows:

$\bar{q}_i = 1 - \frac{1}{T}\sum_{e \in r}(t_e + \sigma_e^t)$    (10)

$\bar{q}_i = 1 - \frac{1}{B}\sum_{e \in r}(\epsilon_e + \sigma_e^\epsilon)$    (11)

Here, $\sigma_e^t$ is the variance in the time taken to traverse edge 𝑒 and $\sigma_e^\epsilon$ is the variance in the energy.

6.6. Demonstrating effectiveness on a green vehicle routing problem

We extended our learning approach to a green vehicle routing problem variant by adapting the mask, feature vector, embedders and decoding context as described above. Next, we demonstrate that this simple extension of our approach is already quite effective for this problem.

6.6.1. Generating problem instances
For the training dataset of our learning algorithm, all problem instances are generated with customers and demands distributed according to the approach of Uchoa et al. (2017). Customers can be placed in the unit square uniformly at random, in clusters, or in a mixture of both. The demands of the customers are also random, but are integer-valued and may be generated uniformly in a random range, which may be a small, medium or large range. The demand may also be some fixed value that depends on the quadrant. Service times are generated similarly, but are real-valued rather than integer; they may not exceed half an hour and may not be less than one twentieth of an hour.

The distance $d_{ij}$ between each pair of nodes is determined by computing the Euclidean distance $\ell_2(x_i, x_j)$ between them, multiplying it by one thousand, and rounding it up to the nearest integer using the ceiling function:

$d_{ij} = \lceil 1000 \cdot \ell_2(x_i, x_j) \rceil$    (12)

The base time $t_{ij}$ taken to travel between each pair of nodes is computed in the manner of the instances generated by Montoya et al. (2017). The base time is assumed to be the time taken by a vehicle travelling at 40 km/h. If the travel time is stochastic, then traffic conditions along an edge are presumed to be a function of the centrality of its nodes. A node that is closer on average to other nodes is assumed to have a higher degree of traffic. The centrality $c_i$ of a node is computed by determining the average Euclidean distance between a given node and all the other nodes and subtracting it from the maximum possible distance between nodes in the unit square, $\sqrt{2}$:

$c_i = \sqrt{2} - \frac{1}{n}\sum_{j=1, j \neq i}^{|V|} \ell_2(x_i, x_j)$    (13)

The fuel consumption along an edge is a linear function of the time taken to traverse that edge. If this time is stochastic, then the fuel consumption is a linear function of the stochastic time; that is, it is linear in the realised value of the time taken to traverse the given edge. Fleet-size constraints are not considered, to ensure that each problem instance is feasible.
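The distance and centrality computations of Eqs. (12) and (13) can be sketched as follows. This is a minimal illustration under our own assumptions: coordinates lie in the unit square, and the average in the centrality is taken over the other nodes (the paper's $1/n$ normalisation may differ by one from this choice).

```python
import math

def distance_matrix(coords):
    """Eq. (12): d_ij = ceil(1000 * l2(x_i, x_j))."""
    return [[math.ceil(1000 * math.dist(a, b)) for b in coords] for a in coords]

def centrality(coords, i):
    """Eq. (13): sqrt(2) minus the average distance from node i to the
    other nodes, so nodes that are close to the others on average (and
    hence presumed to see more traffic) receive a larger value."""
    others = [math.dist(coords[i], x) for j, x in enumerate(coords) if j != i]
    return math.sqrt(2) - sum(others) / len(others)
```

For example, two nodes at opposite corners of the unit square are at the maximum distance $\sqrt{2}$, giving a scaled distance of $\lceil 1000\sqrt{2} \rceil = 1415$.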
Table 6
Results obtained by testing the extension of our approach on the green vehicle routing problem set of Erdoğan and Miller-Hooks (2012). Here we compare the solution of our approach to the best known solution on each instance.

Instance    BKS         Our solution   Ratio   Our time (s)
20c2sC1     1235.21     1331.06        1.08    0
20c3sC2     1539.94     1610.44        1.05    0
20c3sC3     985.41      1094.53        1.11    0
20c3sC4     1080.16     1131.97        1.05    0
20c3sC5     2190.68     2210.11        1.01    0
20c3sC6     2785.86     2965.38        1.06    0
20c3sC7     1393.98     1455.74        1.04    0
20c3sC8     3319.71     3741.65        1.13    0
20c3sC9     1799.95     1810.29        1.01    0
20c3sC10    2583.42     2858.83        1.11    0
111c21s     5626.64     5914.43        1.05    1
111c22s     5610.57     5840.33        1.04    1
111c24s     5412.48     5903.10        1.09    1
111c26s     5408.38     5641.18        1.04    1
111c28s     5331.93     5491.93        1.03    1
200c21s     10428.59    11478.12       1.10    32
250c21s     11886.61    13976.34       1.18    36
300c21s     14242.56    15162.68       1.06    49
350c21s     16471.69    17651.53       1.07    54
400c21s     21952.48    24103.09       1.10    76
450c21s     21854.57    24634.06       1.13    75
500c21s     24527.46    26988.32       1.10    102

6.6.2. Results on the green vehicle routing problem benchmark
We trained our learning model on the generated instances described earlier, with between 20 and 50 customer nodes. For testing, we use the green vehicle routing problem benchmark set of Erdoğan and Miller-Hooks (2012). We summarise the results in Table 6.

First, we note that even on instances with 400–500 customers, we are able to compute good solutions in a small amount of time (75–102 s). To the best of our knowledge, our approach is the first machine learning technique to find such good solutions on these benchmark instances. This shows that our decomposition strategy is effective in dealing with large instances in reasonable time. We believe that with further algorithm engineering, this approach can be scaled to solve problem instances with thousands of nodes.

In terms of generalisation, we note that our approach achieves consistently good results for large problem instances, even though it was trained on significantly smaller ones. Compared to the best known solution on each instance, the solutions produced by our approach have a ratio of at most 1.18 on all instances and at most 1.10 on most instances. Also, from the results shown in Table 6, we observe that the optimality ratio with respect to the best known solutions shows moderate correlation with the size of the problem instances ($R = 0.408$). In terms of generalisation across instance distributions, we note that the training instances are synthetically generated while the test instances come from the benchmark set of Erdoğan and Miller-Hooks (2012).

These results indicate that by simply constructing a new masking scheme and slightly adapting the features, embedder and decoder, our approach can be used to solve other vehicle routing problem variants such as the green vehicle routing problem.

7. Discussion

Partitioning a large problem instance into smaller sub-problems enables machine-learning based solvers, which can solve smaller problems effectively, to scale. This does not involve adapting the architecture of the machine-learning solver, but uses it effectively out of the box. The benefit of this is that existing effective solvers can be incorporated immediately. Since many of these approaches can be made to solve a large variety of problem variants, this approach can be adapted to solve large problems for those variants also. This should minimise the engineering effort required to develop effective solvers for more difficult problem instances, across a large variety of instance sizes, using machine-learning techniques.

Our learning heuristic uses the ML models only on problems of a similar size to those on which they are trained. This means that we do not have to engineer the ML models themselves for scalability. Our decomposition is a route-first decomposition, instead of, say, a geometric decomposition based solely on the angular position of the customer nodes. This is to ensure that we can satisfy the fleet-size constraint.

Compared to the AM and POMO approaches, the main gain of our heuristic comes from the decomposition of the problem and using the learning models on sub-problems of roughly similar size. Running the CVRP-SPP improves the solution quality of sub-problems even further. In contrast, the gains from changing the features and updating the training procedure are relatively modest.

The Set Partitioning Problem
Our ablation study indicates that the set partitioning step can increase the quality of the solutions obtained by reducing the gap with respect to the best known solution, as shown in Table 3. However, the time required to yield such an improvement is significant. Should the time be available for such methods to be used, the set partitioning approach is a general one that can work for any variant of a routing problem. All that is required is an existing set of feasible routes, with a feasible solution guaranteed if these routes are obtained from solutions to the original problem. In some cases the performance of the machine learning heuristic may be poor enough on the sub-problems that improving solutions are rarely generated. The set partitioning step may then be necessary for any significant improvements to be made after the first few iterations. Without this step, the solution quality may remain close to that of the initial solution. An alternative approach might be to make use of the retraining mode discussed in the work of Kwon et al. (2020). Instead of constructing solutions with the trained machine learning heuristic, a training mode is used instead. In essence, the trained heuristic acts as a strong basis for finding good solutions for a particular problem variant. At test time, however, training can be restarted using only the problems encountered at test time. By re-training the machine learning heuristic on only these problems, generalisation performance is traded for improved solution quality on these test problems. By starting with an already-trained machine learning heuristic, we can yield improved solutions to these problems at test time. This may allow the approach to yield better solutions without the need for the set partitioning step. Future work could focus on the development of this approach, extending the retraining scheme to other problem variants and implementing it as the core machine learning heuristic of this technique.

Selecting Sub-Problems
In this work, we select sub-problems on which to attempt improvement with a hand-crafted heuristic, relying on the distances between route centroids. The initial route is selected randomly. Although this can yield improvements, it remains to develop this further. One approach may involve introducing a tabu list, for example, to ensure that focus is placed on regions of the problem where improvement has not yet occurred. Another approach may be to develop a learning-based approach for sub-problem selection. This could encourage improvement in the regions that are most likely to yield improvements and reduce the time spent on sub-problems where no improvement occurs.

Dealing with Routes that have Many Customers
The performance of our approach depends on the ability of the internal sub-solver to quickly and effectively solve the sub-problems. In our approach, we limit the sub-problem size to 𝑙 and assume that the sub-problem sizes are in a limited range below 𝑙. Thus, a learning model trained on instances of sizes in that limited range would be able to deal with the sub-problems effectively. However, if the number of vehicles 𝑘 serving the customers is small and the number of customers is large, it is possible that $N/k$ is close to 𝑙 or perhaps larger than 𝑙. If it is close to 𝑙,
then only one route will generally be used to solve a sub-problem. This effectively reduces each sub-problem to a travelling salesman problem over the customers contained in that route. If it is larger than 𝑙, then we have to increase 𝑙 or abandon the solve. If we increase 𝑙, the trained learning models may not be effective on the larger sub-problems and, as a result, there may be fewer (if any) improvements in the sub-problem solutions.

The recent Efficient Active Search approach (Hottung et al., 2021) offers a way to yield improved solution quality even for problem instances quite different from those encountered at training time. This is achieved by adding an instance-specific layer to the architecture of the neural network solver. Only this layer is retrained at test time for a given test instance, to improve the solution quality for that specific instance. This allows learning-based solvers to obtain high-quality solutions even for problem instances larger than those that the neural network was intended to solve. This comes at the cost of increased solve times as a result of retraining, but the solution quality gains can be significant. Classical techniques find it considerably harder to obtain good solutions for such problems than for problems with many shorter routes (Pecin et al., 2017). Initial experiments replacing the sub-solver with the EAS approach indicate that this may be a promising direction for future work, since it can be used to solve problem instances that have routes with many customers.

8. Conclusions

We proposed a learning heuristic for solving VRP problems that decomposes a problem instance into smaller sub-problems and solves the sub-problems using a neural network based machine learning heuristic. We leverage the decomposition to ensure that the neural network solver is only ever called on sub-problems of the master problem that it has been trained to solve effectively. This allows us to obtain good solutions for large problems without significantly changing the neural network architecture. The solutions for the sub-problems are further improved by formulating and solving a set-partitioning problem. We show that the solutions from our heuristic have small gaps with respect to the best known solutions on the large and complex CVRP problem instances from the dataset of Uchoa et al. (2017). We provide an ablation study that quantifies the gains resulting from our choice of heuristics for the different components.

Our learning heuristic provides a general framework for solving a wide range of other, more highly-constrained routing problem variants with little modification. This has the potential to significantly reduce the time needed to design new learning heuristics for the different variants and to customise them to specific input distributions. Initial experimentation shows that this approach can be used to solve problem instances of the GVRP of Erdoğan and Miller-Hooks (2012), even at scale.

CRediT authorship contribution statement

James Fitzpatrick: Conceptualization, Methodology, Software, Investigation, Writing – original draft. Deepak Ajwani: Conceptualization, Methodology, Writing – original draft. Paula Carroll: Conceptualization, Methodology, Writing – review & editing.

Data availability

We have used publicly available datasets in our work.

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 18/CRT/6183. For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

References

Applegate, D.L., Bixby, R.E., Chvátal, V., Cook, W., Espinoza, D.G., Goycoolea, M., Helsgaun, K., 2009. Certification of an optimal TSP tour through 85,900 cities. Oper. Res. Lett. 37 (1), 11–15.
Bdeir, A., Falkner, J.K., Schmidt-Thieme, L., 2022. Attention, filling in the gaps for generalization in routing problems. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD). Springer, pp. 505–520.
Bruglieri, M., Mancini, S., Pezzella, F., Pisacane, O., 2019. A path-based solution approach for the green vehicle routing problem. Comput. Oper. Res. 103, 109–122.
Clarke, G., Wright, J.W., 1964. Scheduling of vehicles from a central depot to a number of delivery points. Oper. Res. 12 (4), 568–581.
Cordeau, J.-F., Gendreau, M., Laporte, G., Potvin, J.-Y., Semet, F., 2002. A guide to vehicle routing heuristics. J. Oper. Res. Soc. 53, 512–522.
Cordeau, J.-F., Laporte, G., Savelsbergh, M.W., Vigo, D., 2007. Vehicle routing. In: Handbooks in Operations Research and Management Science. Vol. 14, Elsevier, pp. 367–428.
da Costa, P., Rhuggenaath, J., Zhang, Y., Akcay, A., Kaymak, U., 2021. Learning 2-opt heuristics for routing problems via deep reinforcement learning. SN Comput. Sci. 2, 1–16.
Desaulniers, G., Errico, F., Irnich, S., Schneider, M., 2016. Exact algorithms for electric vehicle-routing problems with time windows. Oper. Res. 64 (6), 1388–1405.
Duan, L., Zhan, Y., Hu, H., Gong, Y., Wei, J., Zhang, X., Xu, Y., 2020. Efficiently solving the practical vehicle routing problem: A novel joint learning approach. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 3054–3063.
Erdoğan, S., Miller-Hooks, E., 2012. A green vehicle routing problem. Transp. Res. E: Logist. Transp. Rev. 48 (1), 100–114.
Falkner, J.K., Schmidt-Thieme, L., 2020. Learning to solve vehicle routing problems with time windows through joint attention. arXiv preprint arXiv:2006.09100.
Helsgaun, K., 2017. An Extension of the Lin–Kernighan–Helsgaun TSP Solver for Constrained Traveling Salesman and Vehicle Routing Problems. Vol. 12, Roskilde: Roskilde University.
Hottung, A., Kwon, Y.-D., Tierney, K., 2021. Efficient active search for combinatorial optimization problems. arXiv preprint arXiv:2106.05126.
Koç, Ç., Karaoglan, I., 2016. The green vehicle routing problem: A heuristic based exact solution approach. Appl. Soft Comput. 39, 154–164.
Kool, W., van Hoof, H., Gromicho, J., Welling, M., 2022. Deep policy dynamic programming for vehicle routing problems. In: International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR). Springer, pp. 190–213.
Kool, W., Van Hoof, H., Welling, M., 2018. Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475.
Kwon, Y.-D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S., 2020. POMO: Policy optimization with multiple optima for reinforcement learning. Adv. Neural Inf. Process. Syst. 33, 21188–21198.
Li, S., Yan, Z., Wu, C., 2021. Learning to delegate for large-scale vehicle routing. Adv. Neural Inf. Process. Syst. 34, 26198–26211.
Lu, H., Zhang, X., Yang, S., 2019. A learning-based iterative method for solving vehicle routing problems. In: International Conference on Learning Representations.
Moghdani, R., Salimifard, K., Demir, E., Benyettou, A., 2021. The green vehicle routing problem: A systematic literature review. J. Clean. Prod. 279, 123691.
Montoya, A., Guéret, C., Mendoza, J.E., Villegas, J.G., 2017. The electric vehicle routing problem with nonlinear charging function. Transp. Res. B 103, 87–110.
Nazari, M., Oroojlooy, A., Snyder, L., Takác, M., 2018. Reinforcement learning for solving the vehicle routing problem. Adv. Neural Inf. Process. Syst. 31.
Pecin, D., Pessoa, A., Poggi, M., Uchoa, E., 2017. Improved branch-cut-and-price for capacitated vehicle routing. Math. Program. Comput. 9, 61–100.
Pessoa, A., Sadykov, R., Uchoa, E., Vanderbeck, F., 2020. A generic exact solver for vehicle routing and related problems. Math. Program. 183, 483–523.
Queiroga, E., Sadykov, R., Uchoa, E., 2021. A POPMUSIC matheuristic for the capacitated vehicle routing problem. Comput. Oper. Res. 136, 105475.
Rabecq, B., Chevrier, R., 2022. A deep learning attention model to solve the vehicle routing problem and the pick-up and delivery problem with time windows. arXiv preprint arXiv:2212.10399.
Ribeiro, C.C., Hansen, P., Taillard, É.D., Voss, S., 2002. POPMUSIC—Partial optimization metaheuristic under special intensification conditions. Essays Surv. Metaheuristics 613–629.
Subramanian, A., Uchoa, E., Ochi, L.S., 2013. A hybrid algorithm for a class of vehicle routing problems. Comput. Oper. Res. 40 (10), 2519–2531.
Uchoa, E., Pecin, D., Pessoa, A., Poggi, M., Vidal, T., Subramanian, A., 2017. New benchmark instances for the capacitated vehicle routing problem. European J. Oper. Res. 257 (3), 845–858.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I., 2017. Attention is all you need. Adv. Neural Inf. Process. Syst. 30.
Vidal, T., Crainic, T.G., Gendreau, M., Lahrichi, N., Rei, W., 2012. A hybrid genetic algorithm for multidepot and periodic vehicle routing problems. Oper. Res. 60 (3), 611–624.
Vinyals, O., Fortunato, M., Jaitly, N., 2015. Pointer networks. Adv. Neural Inf. Process. Syst. 28.
Xin, L., Song, W., Cao, Z., Zhang, J., 2021. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35, pp. 12042–12049.