A Multiobjective Edge-based Learning Algorithm for the Vehicle Routing Problem With Time Windows

Information Sciences

Keywords: Multiobjective optimization; Deep reinforcement learning; Vehicle routing problem

Dataset link: https://github.com/wzydeath/VRPTW-data

Abstract: The multiobjective vehicle routing problem with time windows has attracted much attention in recent decades. Until now, various metaheuristic methods have been proposed to solve the problem. However, designing effective methods is not trivial and heavily depends on experts' knowledge. As a research hotspot in recent years, a few deep reinforcement learning methods have been tried to solve the multiobjective vehicle routing problem with symmetric distance and time matrices. However, due to the complex traffic conditions, the travel distance and time between two nodes are probably asymmetric in real-world scenarios. This article introduces a multiobjective edge-based learning algorithm (MOEL) to tackle this issue. In this method, a single neural network model is established and trained to approximate the whole Pareto front of the problem. The edge features, including travel distance and time matrices, are fully learned and used to construct high-quality solutions. MOEL is compared against three state-of-the-art deep reinforcement learning methods (MODRL/D-EL, PMOCO, EMNH) and five metaheuristic methods (NSGA-II, MOEA/D, NSGA-III, MOEA/D-D, MOIA). Experimental results on the real-world instances indicate that MOEL significantly outperforms all competitors, improving IGD by up to 99.80% and HV by up to 62.84%. In addition, MOEL achieves a maximum runtime reduction of 88.65% compared to the deep reinforcement learning methods, highlighting its efficiency and effectiveness for solving the problem.
1. Introduction
With the development of the economy and society, logistics has become an increasingly essential part of the national economy.
Logistics planning plays a vital role in logistics, and the vehicle routing problem (VRP) [1] is a critical issue in logistics planning.
The basic VRP aims to dispatch vehicles from a depot to serve customers and meet their requests. The sequencing of the customers
assigned to each vehicle needs to be determined to minimize the total routing cost. Considering different business scenarios, many
variant problems have emerged based on the basic VRP [2]. VRP with time windows (VRPTW) is one of the variants of VRP that
considers time window constraints on customers. Each customer specifies a delivery time window for their order. If the vehicle arrives
before the start of the time window, it will need to wait to begin, while it cannot arrive after the end of the time window. The goal
of VRPTW is to plan a set of routes subject to vehicle capacity and time window constraints. The problem is single-objective if only
one objective (such as vehicle number or travel distance) is optimized. Otherwise, the problem is considered multiobjective when
multiple conflicting objectives are optimized simultaneously. This variant is crucial in the field of logistics, as delivery time windows
are commonly set by end-customers for the delivery of goods. Such planning is deeply connected to both the accuracy of deliveries
and the satisfaction of customers [2].
VRPTW is an NP-hard combinatorial optimization problem [3], making it difficult to solve. Metaheuristic methods are widely used
and have been successfully applied to address this problem [2]. However, designing effective heuristics requires significant experts’
domain knowledge and trial and error. Besides, most of these methods are iterative and need a long time to solve a problem. Recently,
learning-based methods have attracted more and more attention in solving VRPs [4]. As a representative learning methodology, deep
reinforcement learning (DRL) methods have successfully solved VRPs. These methods formulate the VRPs as a sequential decision
making process and train a deep neural network by reinforcement learning to solve problems. Until now, most DRL algorithms have
been designed and applied to single-objective VRPs. Recently, some works have focused on multi-objective VRPs (MOVRPs) [4].
However, these works have the following limitations:
1. Most of these methods are applied to the multi-objective traveling salesman problem (MOTSP) and the multi-objective capacitated VRP (MOCVRP). Very few have been applied to multi-objective VRPs with complex constraints, such as the multi-objective VRPTW (MOVRPTW).
2. Most of these methods are evaluated using manually created instances, which take Euclidean distance as both the travel distance
and travel time, i.e., the distance and time matrices are symmetric and directly computed from the coordinates of nodes (cities or
customers). However, the distance and time matrices are usually asymmetric due to varying traffic conditions in the real world.
This article focuses on a real-world MOVRPTW, with problem instances derived from the daily distribution scenarios of JingDong
logistics1 in China. Two conflicting objectives, total travel distance and total waiting time, are to be optimized simultaneously. The
travel distance between nodes is determined by the path length in the transportation network. Besides, although the travel time is
related to travel distance, complex traffic factors such as traffic jams need to be considered when calculating the travel time. Therefore,
the time matrix between two nodes is asymmetric and different from the distance matrix. Existing DRL methods for MOVRPs are
node-based models, using node features (including the coordinates of nodes) to construct a sequence of nodes as a solution for an
instance. The edge features (including the distance and time matrices) are not directly used for solution construction in these methods.
As a result, these methods may not be effective for the real-world MOVRPTW considered in this article.
To address the challenges of solving MOVRPTW, a novel multiobjective edge-based learning algorithm, termed MOEL, is proposed
in this article. MOEL employs a single neural network to approximate the whole Pareto front, enabling flexible trade-offs between
multiple objectives without requiring additional metaheuristics. The contributions of this article are summarized as follows:
1. This article introduces a new multiobjective DRL method, MOEL, for solving MOVRPTW with complex edge features, such as
asymmetric time and distance matrices. Unlike existing node-based DRL methods, MOEL directly leverages edge information for
solution construction, making it suitable for the complex MOVRPTW instances.
2. MOEL employs a single model to generate high-quality solutions for any given instance and preference. By incorporating the
preferences into the model, MOEL eliminates the need for training multiple models, providing a flexible way to solve MOVRPTW.
3. Comprehensive experiments show that MOEL significantly outperforms the state-of-the-art DRL and metaheuristic methods,
especially on large-scale MOVRPTW instances, in terms of both solution quality and computational efficiency.
The remaining sections are organized as follows. Section 2 provides an overview of related work, including the categories of
multiobjective optimization methods and the methods for MOVRPs. Section 3 describes the definition of MOVRPTW. Section 4
introduces the proposed MOEL. Section 5 gives the experimental details. Finally, Section 6 presents the conclusions and outlines
future work. Tables 1 and 2 summarize the main symbols used in this article.
1 www.jd.com, the largest online or offline retailer in China.
Table 2
Notations of MOEL.
2. Related work
Considering the timing of decision maker’s preference input, multiobjective optimization methods can generally be divided into
three categories [5]:
1. A priori methods: Preferences, such as the weights of objectives, are provided before the optimization process begins.
2. A posteriori methods: Preferences are specified after the optimization process, once trade-offs among nondominated solutions
are available for evaluation.
3. Interactive methods: Preferences are refined iteratively by obtaining feedback from the decision maker at multiple stages during the optimization process.
In this study, we focus on the second category, i.e., a posteriori methods. These methods aim to generate a diverse set of nondominated solutions that exhibit both good convergence (closeness to the Pareto front) and diversity (spread along the Pareto front). Such solutions help decision makers gain a better understanding of the problem and the available alternatives, thus leading to a conscious and better choice [6]. Accordingly, this article proposes an effective a posteriori method for the MOVRPTW. In the following sections, we briefly review multiobjective metaheuristics and DRL approaches that fall within the a posteriori category for solving MOVRPs.
Multiobjective metaheuristic methods are manually designed strategies that iteratively explore the solution space to approximate Pareto optimal solutions. These methods are widely applied to MOVRPs [7]. They can be broadly categorized into three types: multiobjective evolutionary algorithms, multiobjective local searches and multiobjective memetic algorithms. Multiobjective evolutionary algorithms are popular for decision making because their population-based approach allows them to approximate the entire Pareto front in a single run. The representative algorithms for multiobjective optimization problems (MOPs) include the nondominated sorting genetic algorithm (NSGA-II) [8], the reference-point-based many-objective evolutionary algorithm following the NSGA-II framework (NSGA-III) [9], the multiobjective evolutionary algorithm based on decomposition (MOEA/D) [10], multiobjective particle swarm optimization (PSO) [11], the competitive swarm optimizer (CSO) [12] and the multi-objective immune algorithm (MOIA) [13]. Most of the multiobjective
evolutionary algorithms are designed and tested on multiobjective continuous optimization problems. Multiobjective local searches,
like Pareto local search (PLS) [14], leverage problem-specific knowledge to guide the search directly toward the Pareto front. Therefore, this sort of algorithm is a good choice for tackling multiobjective combinatorial optimization problems (MOCOPs). However,
these methods usually depend on carefully handcrafted designs and are often specialized for each problem. Moreover, the running
time for these methods is still long due to their iterative nature.
DRL methods have become a novel approach to solving VRPs in the last few years [4]. They are data-driven methods. Most of
them automatically learn the heuristics for solving the problem through deep neural networks with encoder-decoder architecture.
This basically eliminates the need to design a solution strategy manually. Recently, multiobjective DRL algorithms have gradually
attracted researchers’ interest in solving MOCOPs. Based on the number of models, they can be roughly divided into two categories:
multiple models and single models.
The methods using multiple models often follow the decomposition discipline, i.e., decompose an MOCOP into a set of scalar
optimization subproblems and model each subproblem as a neural network. Following this concept, Li et al. [15] proposed a DRL-based multiobjective optimization algorithm (DRL-MOA) to solve the MOTSP. In DRL-MOA, each subproblem is modeled using a
pointer network, and the parameters of all subproblems are optimized via a neighborhood-based parameter-transfer strategy. Wu et
al. [16] extended the previous work [15] and proposed an attention model-based multiobjective optimization algorithm (MODRL/D-AM) for the MOTSP. The attention model can extract the node features and graph structures of the MOTSP, which achieves better results than DRL-MOA. Several studies have focused on evolutionary learning to further improve the performance of models. Shao et al. [17] proposed a multiobjective evolutionary algorithm based on decomposition and dominance (MONEADD). It uses genetic operations and reward signals to evolve neural networks without further engineering. Based on the work in [16], Zhang et al. [18] introduced a
multiobjective deep reinforcement learning with evolutionary learning algorithm (MODRL/D-EL) for the MOVRPTW. This approach
employs a two-stage hybrid learning strategy, where DRL with parameter-transfer is used in the first stage, followed by evolutionary
learning to fine-tune model parameters in the second stage.
Since the number of Pareto optimal solutions for an MOCOP may be extremely large, the required number of models to approximate
the whole Pareto front would be huge. In addition, the reference points for decomposition are predefined and fixed before training,
making methods using multiple models less flexible. Recently, several studies tried to solve MOCOPs with a single model. Lin et
al. [19] proposed a preference-conditioned neural multiobjective combinatorial optimization model (PMOCO) to approximate the
whole Pareto front with any reference points. It uses an attention encoder to extract the node features and a preference-conditioned
decoder to generate a solution with a given reference point. Instead of using a whole decoder parameterized by reference points,
Ye et al. [20] proposed a weight-specific-decoder attention model (WSDAM) that uses a small weight-adaptive layer in the decoder.
Gao et al. [21] proposed a single-model multiobjective pointer network (MOPN). It uses a pointer network as the encoder. To deal
with the MOCOPs, the node information and reference point are combined as the input of the encoder. Wang et al. [22] proposed a
multiobjective routing attention model (MORAM). Its encoder consists of the objective encoder, the router and the global encoder,
which can dynamically determine the subproblem embeddings. Following the concept of meta-learning, Zhang et al. [23] proposed a
meta-learning-based DRL (MLDRL) for the MOCOPs. First, a meta-model is trained to learn knowledge of the whole Pareto front. It is
updated by multiple subproblems constructed by different reference points. Then, in the fine-tuning step, a submodel of a subproblem
is obtained by updating the meta-model within a few steps. To accelerate the training process, Chen et al. [24] proposed an efficient
meta neural heuristic (EMNH) that uses a multi-task model composed of a parameter-shared body and respective task-related heads
to train the meta-model. To enhance the diversity of generated solutions, Li et al. [25] proposed a DRL-based method that employs
the multiple decoder attention model (MOMDAM). It applies one encoder to generate node embeddings for a given subproblem, and
then applies multiple decoders to produce a diverse set of solutions for the subproblem.
Table 3 summarizes the related work on multiobjective DRL algorithms. Most of the existing methods have been applied to the
MOTSP and MOCVRP, and very few have been applied to the MOVRPTW that has more complex constraints. Moreover, most of them
focus only on node features and are evaluated using manually generated problem sets that rely on Euclidean distance for both travel
distance and time. As a result, these methods may not perform well on real-world MOVRPTW problems that involve complex edge
features. Inspired by the reference [26], this article proposes a single model using edge features to construct solutions for MOVRPTW,
which is detailed in the following sections.
3. Problem definition
The MOVRPTW is defined as an MOCOP on a directed graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V} = \{v_i \mid i = 0, \ldots, n\}$ is the set of nodes in the graph and $\mathcal{E} = \{e_{ij} \mid v_i, v_j \in \mathcal{V}\}$ is the set of edges. The node $v_0$ denotes the depot, and the other nodes denote the customers. Each node $v_i$ has the following features: the demand $q_i$ (zero for the depot) and the time window $[tw^b_i, tw^e_i]$. Each edge $e_{ij}$ has two features: the travel distance $d_{v_i,v_j}$ and the travel time $t_{v_i,v_j}$ between the nodes $v_i$ and $v_j$.
A fleet of homogeneous vehicles, each with a capacity $C$, is dispatched to serve the customers. Each vehicle begins at the depot, serves a sequence of customers, and then returns to the depot. The objective is to plan $M$ routes $R_1, \ldots, R_M$ by simultaneously minimizing the total travel distance and total waiting time, while satisfying the constraints of the MOVRPTW. Each route $R_j$ is represented as a sequence of nodes $\langle \pi_0, \pi_1, \ldots, \pi_{N_j} \rangle$, where $\pi_0 = \pi_{N_j} = v_0$ denotes the depot, and the intermediate nodes $\pi_i \in \mathcal{V} \setminus \{v_0\}$ represent the customers in the $j$th route. Let $a_i$ be the arrival time at node $\pi_i$, and $l_{i-1}$ be the departure time from node $\pi_{i-1}$. The arrival time at node $\pi_i$ is $a_i = l_{i-1} + t_{\pi_{i-1}, \pi_i}$. If the vehicle arrives before the start of the time window, i.e., $a_i < tw^b_{\pi_i}$, it must wait until $tw^b_{\pi_i}$ before service can begin, so the waiting time at $\pi_i$ is $\max(0, tw^b_{\pi_i} - a_i)$. Arriving after the time window is forbidden. The total travel distance of the $j$th route is calculated as:
$dist_j = \sum_{i=0}^{N_j - 1} d_{\pi_i, \pi_{i+1}}$  (4)
In this article, two objectives, total travel distance 𝑓1 and total waiting time 𝑓2 , are optimized simultaneously. These objectives
have direct practical relevance in terms of logistics. 𝑓1 is an economic objective that significantly impacts total logistics costs. Mini
mizing 𝑓1 reduces the cost of transportation. Meanwhile, minimizing 𝑓2 increases efficiency and avoids wasting working time. The
two objectives 𝑓1 and 𝑓2 are conflicting due to the time window constraint. Minimizing travel distance often indicates selecting
the shortest path between nodes. However, because of the time windows, vehicles may arrive early and need to wait to serve the
customer, increasing the waiting time. On the other hand, to minimize waiting time, routes may need to be adjusted so that vehicles
arrive closer to the start of the time window, which may lead to longer travel distances. Therefore, multiobjective optimization is
necessary to generate a set of trade-off solutions. The objective 𝑓1 is defined as:
$f_1 = \sum_{j=1}^{M} dist_j$  (5)

The objective $f_2$ is the total waiting time accumulated at all customers over all routes. In addition, a feasible solution must satisfy the following constraints:
1. Capacity constraint: The total demand on each route must not exceed the vehicle's capacity $C$. Specifically, for the $j$th route $R_j$, the constraint is defined as $\sum_{i=1}^{N_j} g_i \le C$, where $g_i$ represents the demand of the $i$th customer in the route.
2. Time window constraint: A vehicle cannot arrive after the end of the time window of each node, i.e., for each 𝜋𝑖 , 𝑎𝑖 ≤ 𝑡𝑤𝑒𝑖 must
hold.
4. Solution methodology
Let 𝒙, 𝒚 ∈ Ω, 𝒙 Pareto dominates 𝒚 (denoted as 𝒙 ≺ 𝒚 ) iff 𝑓𝑖 (𝒙) ≤ 𝑓𝑖 (𝒚) for every 𝑖 ∈ {1, … , 𝑚}, and 𝑓𝑗 (𝒙) < 𝑓𝑗 (𝒚) for at least one
𝑗 ∈ {1, … , 𝑚}. A solution 𝒙∗ is Pareto optimal if no other solution in Ω dominates it. The objective vector of 𝒙∗ , 𝐹 (𝒙∗ ), represents
a Pareto optimal objective vector. The Pareto set includes all Pareto optimal solutions, while the Pareto front comprises all Pareto
optimal objective vectors. In MOPs, objectives are often conflicting, meaning improving one objective may result in the degradation
of another. Consequently, no single solution can simultaneously optimize all objectives. Instead, a set of trade-offs is often needed for
decision making.
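As a small illustration of the dominance relation just defined, the sketch below checks Pareto dominance between objective vectors and filters a set down to its nondominated members. The function names and the example vectors are assumptions for illustration only, not part of the original implementation.

```python
import numpy as np

def dominates(x, y):
    """Return True if objective vector x Pareto dominates y (minimization)."""
    return np.all(x <= y) and np.any(x < y)

def nondominated(F):
    """Keep only the nondominated objective vectors from the array F."""
    keep = []
    for i, f in enumerate(F):
        if not any(dominates(g, f) for j, g in enumerate(F) if j != i):
            keep.append(f)
    return np.array(keep)

# Example: (total distance, total waiting time) of three candidate solutions.
F = np.array([[5200.0, 310.0], [4800.0, 450.0], [5300.0, 500.0]])
print(nondominated(F))  # the third vector is dominated by the first
```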
The proposed MOEL employs a single model with an encoder-decoder structure, inspired by the attention model proposed by
Kool et al. [27]. The encoder-decoder architecture is a sequence-to-sequence modeling framework widely applied to combinatorial
optimization tasks. It consists of two main components:
• Encoder: The encoder processes the input problem features and transforms them into fixed-dimensional latent embeddings. These
embeddings capture the spatial and structural relationships within the problem instance, providing a comprehensive summary
of the input instance.
• Decoder: The decoder constructs the solution in a step-by-step manner. At each step, it selects an action from the available op
tions. Guided by the attention mechanism, the decoder dynamically focuses on the most relevant parts of the input representation
during each step of the solution construction.
This model treats the solution generation process as a sequential decision-making task, which can be trained using reinforcement
learning. The key elements include state, action and reward. State represents the current status of the problem. Action defines a
decision to be performed at the current state. Reward evaluates the quality of actions taken at each state. The policy defines a strategy
for selecting actions based on the current state. Reinforcement learning aims to optimize the policy to maximize the cumulative
reward over time. Representative algorithms for policy optimization include policy gradient methods and actor-critic methods.
In MOEL, the encoder takes edge features as input and produces edge embeddings via attention sublayers, while the decoder uses these embeddings to sequentially generate a series of edges as a solution. To deal with the multi-objective nature of the problem, the preference
for objectives is incorporated into the inputs and learned, thus allowing MOEL to solve the problem with any preferences. When
trained, the proposed MOEL is expected to obtain high-quality solutions in a short running time for the MOVRPTW. Fig. 1 illustrates
the inference process of MOEL. A decision maker’s preference is denoted as a reference point in the figure. For example, the point
(0.8, 0.2) means 80% importance on objective 𝑓1 and 20% on objective 𝑓2 . Each time, MOEL takes the problem instance and a
preference as an input vector and generates a corresponding edge-based solution for the preference. As a result, MOEL generates four
solutions satisfying different preferences.
To make MOEL tackle preferences effectively, the MOVRPTW is decomposed into single-objective subproblems using preference
based scalarization methods. These methods decompose the problem into multiple subproblems with different preferences. By solving
all subproblems, a set of approximated Pareto solutions can be obtained. Common scalarization techniques include weighted sum
method, Tchebycheff method and penalty-based boundary intersection (PBI) method [10]. Let 𝝀 = (𝜆1 , 𝜆2 )𝑇 be a reference point
where 𝜆1 + 𝜆2 = 1 and 𝜆1 , 𝜆2 ≥ 0. For a given instance 𝑠 of the MOVRPTW, the three scalarization methods are described as follows:
• Weighted Sum Method: The corresponding subproblem formulated by the weighted sum method minimizes the weighted sum of the normalized objectives, $g^{ws}(\boldsymbol{x} \mid \boldsymbol{\lambda}) = \lambda_1 \bar{f}_1(\boldsymbol{x}) + \lambda_2 \bar{f}_2(\boldsymbol{x})$, where each objective is normalized as

$\bar{f}_i(\boldsymbol{x}) = \dfrac{f_i(\boldsymbol{x}) - \min_i}{\max_i - \min_i}$  (10)

where $\max_i$ and $\min_i$ are the maximum and minimum values of the $i$th objective, respectively. Normalization is used because the two objectives, $f_1$ and $f_2$, have different scales.
• Tchebycheff Method: The Tchebycheff method formulates the subproblem as minimizing $g^{tch}(\boldsymbol{x} \mid \boldsymbol{\lambda}) = \max_{i \in \{1, 2\}} \lambda_i \left| \bar{f}_i(\boldsymbol{x}) - z^*_i \right|$, where $z^*_i$ is the minimum value of the normalized $i$th objective among the candidate solutions.
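The following sketch shows how the weighted sum and Tchebycheff scalarizations above can be computed for a pair of raw objective values. The normalization bounds `f_min` and `f_max`, the ideal point `z`, and all variable names are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def normalize(f, f_min, f_max):
    """Min-max normalize an objective vector f = (f1, f2), as in Eq. (10)."""
    return (f - f_min) / (f_max - f_min)

def weighted_sum(f_bar, lam):
    """Weighted sum scalarization of the normalized objectives."""
    return float(np.dot(lam, f_bar))

def tchebycheff(f_bar, lam, z=np.zeros(2)):
    """Tchebycheff scalarization with an (assumed) ideal point z at the origin."""
    return float(np.max(lam * np.abs(f_bar - z)))

# Example: a solution with raw objectives (total distance, total waiting time).
f = np.array([5200.0, 310.0])
f_min, f_max = np.array([4000.0, 0.0]), np.array([9000.0, 1200.0])
lam = np.array([0.8, 0.2])          # reference point (0.8, 0.2)
f_bar = normalize(f, f_min, f_max)
print(weighted_sum(f_bar, lam), tchebycheff(f_bar, lam))
```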
The proposed MOEL aims to use a single model to solve any subproblems of the MOVRPTW. A subproblem of the MOVRPTW
can be described as a Markov decision process. In the node-based scheme, given an initial state with an empty solution, the model
sequentially appends a node to the solution at each step and creates a complete solution at the end. The methods for the node-based
scheme aim to find a stochastic policy that produces a sequence of nodes $\boldsymbol{\pi} = (\pi_0, \ldots, \pi_T)$ to minimize the expected value of the subproblem's scalar function, where $\pi_0 = \pi_T = v_0$ and the other $\pi_i \in \mathcal{V}$.
According to [26], the node-based scheme can be transformed into an edge-based scheme. While the node-based approach selects
a node at each step, the edge-based approach chooses an edge at each step. Specifically, let an edge-based solution be 𝝉 = (𝜏1 , … , 𝜏𝑇 ),
where $\tau_t = e_{\pi_{t-1}, \pi_t} \in \mathcal{E}$ for $1 \le t \le T$. The proposed MOEL aims to find a stochastic policy $p_\theta(\boldsymbol{\tau})$ parameterized by $\theta$. For an instance
𝑠 and a subproblem with a reference point 𝝀, the policy sequentially constructs a feasible edge-based solution 𝝉 with the minimal
scalar value:
$p_\theta(\boldsymbol{\tau} \mid s, \boldsymbol{\lambda}) = \prod_{t=1}^{T} p_\theta(\tau_t \mid s, \boldsymbol{\lambda}, \tau_{1:t-1})$  (13)
4.3. Encoder
The encoder of the MOEL is illustrated in Fig. 2(a). The input vectors of the encoder consist of the node features, edge features
and the information of the reference point. Specifically, given a reference point $\boldsymbol{\lambda} = (\lambda_1, \lambda_2)^T$, the input vector of the edge $e_{ij}$ collects seven features, $[q_j; tw^b_j; tw^e_j; d_{v_i,v_j}; t_{v_i,v_j}; \lambda_1; \lambda_2]$, where $q_j$, $tw^b_j$, $tw^e_j$ are the node features of the edge $e_{ij}$'s ending node $v_j$. The coordinates of $v_j$ are excluded from the node features,
because an edge’s travel distance and travel time depend on the traffic condition and are not directly calculated with the coordinates.
Meanwhile, only the features of the ending node are contained because when constructing a solution, the next edge is selected based
on the ending node of the current edge. 𝑑𝑣𝑖 ,𝑣𝑗 and 𝑡𝑣𝑖 ,𝑣𝑗 are the edge features of 𝑒𝑖𝑗 .
First, the encoder maps each input vector to a $d$-dimensional space by a trainable parameter matrix $W_0 \in \mathbb{R}^{d \times 7}$, yielding the initial edge embedding $h^0_{ij}$ of $e_{ij}$. Then, $L$ encoding layers further update the edge embeddings. Each layer consists of an attention layer and an aggregation layer. Let $\boldsymbol{H}^{l-1} = \{h^{l-1}_{11}, \ldots, h^{l-1}_{nn}\}$ be the input of the $l$th attention layer. The output of this layer for $e_{ij}$ is calculated using the following components:
• 𝑆(𝑒𝑖𝑗 ) represents the candidate edges starting from the node 𝑣𝑗 . Suppose the current selected edge is 𝑒𝑖𝑗 . When constructing
a solution, the next selected edge will be an edge 𝑒𝑗𝑘 starting from 𝑣𝑗 . Therefore, the most relevant edges of 𝑒𝑖𝑗 will be 𝑒𝑗𝑘 ,
𝑘 = 1, … , 𝑛 among all edges. To reduce the computational complexity, 𝑆(𝑒𝑖𝑗 ) contains the 𝑟-nearest edges starting from 𝑣𝑗 .
• SHA(⋅) denotes the single-head attention operation [28], BN(⋅) denotes the batch normalization operation [29], and ReLU(⋅)
denotes the ReLU activation [30].
• 𝑊1 ∈ ℝ𝑑×𝑑 is a trainable parameter matrix.
The attention output is then passed to the aggregation layer, which involves trainable parameters $W_2 \in \mathbb{R}^{d \times 2d}$ and $W_3 \in \mathbb{R}^{d \times d}$, where $[\,;\,]$ represents the concatenation operation. The parameters are not shared between layers.
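To make the encoder input concrete, the sketch below assembles the seven-feature input vector for every edge, projects it with $W_0$ into the initial embeddings $h^0_{ij}$, and collects the $r$-nearest candidate edges $S(e_{ij})$. The feature ordering, tensor shapes and function names are assumptions for illustration; the attention and aggregation sublayers are omitted.

```python
import torch

def initial_edge_embeddings(q, tw_b, tw_e, dist, time, lam, W0):
    """Project the 7-dimensional edge input vectors to d-dimensional embeddings.
    q, tw_b, tw_e: (n,) node features of the ending node; dist, time: (n, n) edge
    features; lam: (2,) reference point; W0: (d, 7) trainable matrix.
    The ordering of the seven features is an assumption for illustration."""
    n = dist.shape[0]
    # Node features of the ending node v_j, broadcast over all starting nodes v_i.
    end_feats = torch.stack([q, tw_b, tw_e], dim=-1).unsqueeze(0).expand(n, n, 3)
    edge_feats = torch.stack([dist, time], dim=-1)            # (n, n, 2)
    pref = lam.view(1, 1, 2).expand(n, n, 2)                  # (n, n, 2)
    x = torch.cat([end_feats, edge_feats, pref], dim=-1)      # (n, n, 7)
    return x @ W0.T                                           # (n, n, d)

def r_nearest_candidates(dist, r):
    """Indices of the r nearest successor nodes of each node, defining S(e_ij)."""
    return torch.topk(dist, k=r, largest=False).indices       # (n, r)
```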
4.4. Decoder
The decoder sequentially generates a solution $\boldsymbol{\tau}$ based on the edge embeddings $\boldsymbol{H}^L = \{h^L_{11}, \ldots, h^L_{nn}\}$. For simplicity, the superscript $L$ is omitted, and $\boldsymbol{H}$ is used instead of $\boldsymbol{H}^L$. Fig. 2(b) illustrates the structure of the decoder. At time step $t \in \{1, \ldots, T\}$, the decoder selects an edge $\tau_t$ using the context embedding $\tilde{h}^t_c$ and the embeddings of candidate edges. The context embedding $\tilde{h}^t_c$ represents the decoding context information at step $t$, which is defined as follows:

$\tilde{h}^t_c = \begin{cases} [Q_t; T_t; D_t; R_t; \boldsymbol{\lambda}] & t > 1, \\ [Q_t; T_t; D_t; v_1; \boldsymbol{\lambda}] & t = 1, \end{cases}$  (19)
where 𝝀 is the same reference point used in the encoder. 𝑄𝑡 , 𝑇𝑡 and 𝐷𝑡 are the remaining capacity, travel time and total travel distance
of the vehicle at step 𝑡, respectively. Initially, 𝑄1 is set to the maximum capacity 𝐶 , and 𝑇1 , 𝐷1 are set to 0. If an edge 𝑒𝑖𝑗 is selected
at step 𝑡 − 1, 𝑄𝑡 , 𝑇𝑡 and 𝐷𝑡 are computed as follows:
$Q_t = \begin{cases} C & \text{if } v_j = v_0, \\ Q_{t-1} - q_j & \text{otherwise}, \end{cases}$  (20)

$T_t = \begin{cases} 0 & \text{if } v_j = v_0, \\ \max(T_{t-1} + t_{v_i,v_j},\; tw^b_j) & \text{otherwise}, \end{cases}$  (21)

$D_t = D_{t-1} + d_{v_i,v_j}$  (22)
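A minimal sketch of the state transition of Eqs. (20)-(22) after an edge $e_{ij}$ has been selected; the function signature and argument names are illustrative assumptions, not the authors' implementation.

```python
def update_vehicle_state(Q_prev, T_prev, D_prev, i, j, q, tw_b, dist, time, C, depot=0):
    """State transition of Eqs. (20)-(22) after selecting the edge e_ij.
    q, tw_b: demands and time-window starts; dist, time: travel matrices;
    C: vehicle capacity; depot: index of the depot node."""
    if j == depot:                                  # returning to the depot resets the vehicle
        Q_t = C                                     # Eq. (20)
        T_t = 0.0                                   # Eq. (21)
    else:
        Q_t = Q_prev - q[j]                         # Eq. (20)
        T_t = max(T_prev + time[i][j], tw_b[j])     # Eq. (21): wait until the window opens
    D_t = D_prev + dist[i][j]                       # Eq. (22): distance accumulates in both cases
    return Q_t, T_t, D_t
```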
$R_t$ represents the embedding of the partial solution $\boldsymbol{\tau}_{1:t-1}$ constructed from step 1 to step $t-1$. Since $\boldsymbol{\tau}_{1:t-1}$ is a sequence, the long short-term memory (LSTM) network [31] is adopted to calculate the $d$-dimensional $R_t$. The concatenated context is then projected by a trainable matrix $W_c$:

$\tilde{h}^t_c = W_c \tilde{h}^t_c$  (24)

Suppose the vehicle is at node $v_i$ at step $t$. $\tilde{h}^t_c$ is then updated by the multi-head attention (MHA) layer over the embeddings of the candidate edges $e_{ij}$ leaving $v_i$, producing the head-wise compatibilities $u^k_{ij}$ between the context and each candidate edge.
An edge $e_{ij}$ cannot be feasibly selected if, for example, the customer $v_j$ has already been served, the remaining capacity $Q_t$ is smaller than its demand, or the vehicle would arrive after the end of its time window $tw^e_j$. If one of these conditions holds, $u^k_{ij}$ is masked as $-\infty$.
From the compatibilities $u^k_{ij}$, the attention weight $\alpha^k_{ij}$ is calculated by the softmax operation:

$\alpha^k_{ij} = \dfrac{e^{u^k_{ij}}}{\sum_{j'=1}^{n} e^{u^k_{ij'}}}$  (27)
The output of the MHA layer is calculated by combining all $v^k_{ij}$ ($k = 1, \ldots, K$) as follows:

$h^t_c = \sum_{k=1}^{K} W^k_O v^k_{ij}$  (29)
where 𝑊𝑉𝑘 ∈ ℝ𝑑𝑘 ×𝑑 and 𝑊𝑂𝑘 ∈ ℝ𝑑×𝑑𝑘 are trainable parameter matrices.
Finally, ℎ𝑡𝑐 is fed into a single-head attention layer to compute the probability of choosing an edge at step 𝑡. The compatibility of
ℎ𝑡𝑐 for 𝑒𝑖𝑗 (𝑗 = 1, … , 𝑛) is calculated as follows:
$\bar{u}_{ij} = \begin{cases} C \cdot \tanh\!\left(\dfrac{(h^t_c)^T (W'_K h_{ij})}{\sqrt{d}}\right) & \text{if } e_{ij} \text{ can be feasibly selected}, \\ -\infty & \text{otherwise}, \end{cases}$  (30)
where 𝑊𝐾′ ∈ ℝ𝑑×𝑑 is a trainable parameter matrix, and 𝐶 = 10 is a hyperparameter for clipping. Then, the softmax operation is used
to calculate the probability of choosing 𝜏𝑡 = 𝑒𝑖𝑗 at step 𝑡 as follows:
$p_{ij} = p_\theta(\tau_t \mid s, \boldsymbol{\lambda}, \tau_{1:t-1}) = \dfrac{e^{\bar{u}_{ij}}}{\sum_{j'=1}^{n} e^{\bar{u}_{ij'}}}$  (31)
The decoder can select an edge based on the probabilities of the edges at each step and construct a feasible solution at the end.
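A sketch of the final edge-selection step of Eqs. (30)-(31): compatibilities are clipped with tanh, infeasible edges are masked to $-\infty$, and a softmax yields the selection probabilities. The tensor shapes and the names `W_K` and `feasible_mask` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def edge_probabilities(h_c, H_i, W_K, feasible_mask, clip=10.0):
    """Probability of choosing each edge e_ij at the current step (Eqs. (30)-(31)).
    h_c: (d,) context embedding; H_i: (n, d) embeddings of edges leaving node v_i;
    W_K: (d, d) trainable matrix; feasible_mask: (n,) bool, True if e_ij is feasible."""
    d = h_c.shape[-1]
    keys = H_i @ W_K.T                                 # (n, d)
    u = clip * torch.tanh((keys @ h_c) / d ** 0.5)     # clipped compatibilities, Eq. (30)
    u = u.masked_fill(~feasible_mask, float('-inf'))   # mask infeasible edges
    return F.softmax(u, dim=-1)                        # Eq. (31)

# Sampling one edge (during training) or taking the argmax (greedy decoding):
# probs = edge_probabilities(h_c, H_i, W_K, mask)
# j = torch.multinomial(probs, 1).item()
```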
Algorithm 1 Training process of MOEL.
Input: distribution of reference point Λ, distribution of MOVRPTW instances 𝑆 , number of training steps 𝑆𝑖𝑡𝑒𝑟 , batch size 𝐵
Output: The trained parameters 𝜃
1: Initialize the model parameters 𝜃
2: for 𝑖𝑡𝑒𝑟 = 1 to 𝑆𝑖𝑡𝑒𝑟 do
3: 𝝀 ← Sample_reference_point(Λ)
4: 𝑠𝑖 ← Sample_instance(𝑺 ) ∀𝑖 ∈ {1, … , 𝐵}
5: 𝝉 𝑖 ← Sample_solution(𝑝𝜃 (⋅|𝑠𝑖 , 𝝀)) ∀𝑖 ∈ {1, … , 𝐵}
6: 𝑏(𝑠𝑖 |𝝀) ← Greedy_rollout(𝑝𝜃 (⋅|𝑠𝑖 , 𝝀)) ∀𝑖 ∈ {1, … , 𝐵}
7:  $\nabla J(\theta) \leftarrow \frac{1}{B} \sum_{i=1}^{B} \left[ \left( g(\boldsymbol{\tau}_i \mid s_i, \boldsymbol{\lambda}) - b(s_i \mid \boldsymbol{\lambda}) \right) \nabla_\theta \log p_\theta(\boldsymbol{\tau}_i \mid s_i, \boldsymbol{\lambda}) \right]$
8: 𝜃 ← ADAM(𝜃 , ∇𝐽 (𝜃))
9: end for
For an instance $s$ of the MOVRPTW and a given reference point $\boldsymbol{\lambda}$, the goal of the proposed model is to minimize the expected value of the scalar function (denoted as $g(\boldsymbol{\tau} \mid s, \boldsymbol{\lambda})$), i.e., $J(\theta \mid s, \boldsymbol{\lambda}) = \mathbb{E}_{\boldsymbol{\tau} \sim p_\theta(\cdot \mid s, \boldsymbol{\lambda})}\left[ g(\boldsymbol{\tau} \mid s, \boldsymbol{\lambda}) \right]$. The model is trained with the REINFORCE policy gradient and a greedy rollout baseline $b(s \mid \boldsymbol{\lambda})$, as outlined in Algorithm 1.
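The training loop of Algorithm 1 can be condensed into the sketch below. `policy`, `sample_instances`, `sample_preference`, `scalar_g` and `greedy_rollout` are hypothetical placeholders for the components described in the paper; the snippet only illustrates the REINFORCE update with the greedy-rollout baseline, not the authors' exact implementation.

```python
import torch

def train_moel(policy, sample_instances, sample_preference, scalar_g,
               greedy_rollout, optimizer, n_steps, batch_size):
    """REINFORCE with a greedy-rollout baseline, mirroring Algorithm 1."""
    for _ in range(n_steps):
        lam = sample_preference()                        # line 3: reference point
        batch = sample_instances(batch_size)             # line 4: problem instances
        tours, log_probs = policy.sample(batch, lam)     # line 5: sampled solutions
        with torch.no_grad():
            baseline = greedy_rollout(policy, batch, lam)    # line 6: b(s | lambda)
            cost = scalar_g(tours, batch, lam)               # g(tau | s, lambda)
        loss = ((cost - baseline) * log_probs).mean()    # line 7: policy gradient
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                 # line 8: ADAM update
```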
The complexity of the proposed MOEL mainly lies in the attention mechanism. For a problem with 𝑛 nodes (𝑛2 edges) and
embedding dimension 𝑑 , the complexity is as follows:
• In the encoder, the attention mechanism generates embeddings for all edges, which takes $O(n^2 \cdot r \cdot d + n^2 \cdot d^2)$, where $r$ denotes the number of candidate edges.
• During the decoding process, the solution is generated sequentially. At each step, the attention mechanism is applied to select an edge, which takes $O(n \cdot d^2 + n^2 \cdot d)$. Therefore, the complexity of the decoder for generating a complete solution is $O(n^2 \cdot d^2 + n^3 \cdot d)$.
Since the key components in the attention mechanism, such as matrix multiplications, can be calculated in parallel, the efficiency
of MOEL can be significantly improved using GPU acceleration.
5. Experiments
In this section, all experiments are conducted on a server equipped with eight Intel Xeon (Cascade Lake) Platinum 8269CY CPUs
at 2.5 GHz and 16.0 GB of RAM to investigate the performance of the proposed MOEL. A single 4090 GPU is used to train the network
models in the experiments. All algorithms are implemented in Python.
The problem instance used in this article comes from the B2B delivery scenarios of JingDong logistics in Beijing [34]. Specifically,
the whole data set contains one depot and 1600 customers. Node coordinates, travel distance and time matrices are obtained from
real business scenarios. All data are normalized into [0, 1]. The time windows and demands are set as follows:
• Time windows: The time window of the depot is set to $[480, 1440]$, which means the depot opens at 8:00 and closes at 24:00. As the working time is 8 hours, i.e., 480 minutes, the begin service time $tw^b_i$ of the $i$th customer is randomly selected from $[480, 930]$. The length of the time window is randomly selected from $\{30, 60, 90, 120\}$. The end service time $tw^e_i$ cannot exceed 16:00. The time window of each node, $tw_i = [tw^b_i, tw^e_i]$, is further normalized into $[0, 1]$ as follows:

$tw_i = \dfrac{tw_i - 480}{1000}$  (36)
• Demands: The demands of the nodes are set as in [18]. The depot has a demand of 0, while customer demands are randomly generated. Specifically, the demand $q_i$ for the $i$th customer is sampled from a normal distribution $\mathcal{N}(15, 10)$ and truncated to an integer within the range $[1, 42]$. This value is then scaled by the vehicle capacity $C$, where $q_i = q_i / C$. For consistency with [18], the vehicle capacity $C$ is set to 750. A small sampling sketch is given below.
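A minimal sketch of the time-window and demand sampling described above, assuming uniform sampling of the window start, an 8:00 = 480 min origin with a 16:00 = 960 min cap, and treating the second parameter of the normal distribution as its standard deviation; all names are illustrative.

```python
import numpy as np

def sample_customer(rng):
    """Sample one customer's normalized time window (Eq. (36)) and scaled demand."""
    tw_b = rng.uniform(480, 930)                 # window start in [480, 930] minutes
    width = rng.choice([30, 60, 90, 120])        # window length
    tw_e = min(tw_b + width, 960)                # end of service no later than 16:00
    tw = (np.array([tw_b, tw_e]) - 480) / 1000   # Eq. (36): normalize into [0, 1]
    C = 750                                      # vehicle capacity
    q = int(np.clip(rng.normal(15, 10), 1, 42)) / C   # truncated demand, scaled by C
    return tw, q

rng = np.random.default_rng(0)
print(sample_customer(rng))
```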
The performance of an algorithm for the MOVRPTW is assessed in terms of convergence and diversity. In this article, the following
two indicators are used:
1. Inverted generational distance (IGD) [10]: Let $\boldsymbol{P}^*$ be a set of uniformly distributed points along the Pareto front and $\boldsymbol{A}$ be an obtained solution set. The IGD value is calculated as:

$\mathrm{IGD}(\boldsymbol{A}, \boldsymbol{P}^*) = \dfrac{\sum_{v \in \boldsymbol{P}^*} d(v, \boldsymbol{A})}{|\boldsymbol{P}^*|}$  (37)
where 𝑑(𝑣, 𝑨) is the minimum Euclidean distance between 𝑣 and the points in 𝐴. Since the true Pareto front of the MOVRPTW
is unknown, all nondominated solutions generated by the competitors are gathered as the Pareto front.
2. Hypervolume indicator (HV) [35]: A reference point and an obtained solution set are required to calculate HV. Let $\mathbb{L}$ be the Lebesgue measure in $\mathbb{R}^m$; the HV is defined as follows:

$\mathrm{HV}(\boldsymbol{A}, \boldsymbol{z}^*) = \mathbb{L}\left(\bigcup_{\boldsymbol{a} \in \boldsymbol{A}} \{\boldsymbol{x} \mid \boldsymbol{a} \prec \boldsymbol{x} \prec \boldsymbol{z}^*\}\right)$  (38)

where $\boldsymbol{A}$ is the solution set and $\boldsymbol{z}^*$ is a reference point. For the problems with customer sizes 20, 50, 100 and 150, $\boldsymbol{z}^*$ is set to $(15, 3000)$, $(30, 6000)$, $(60, 9000)$ and $(90, 12000)$, respectively.
A higher (lower) value of HV (IGD) can be regarded as a better set of solutions in terms of convergence and diversity. In addition, nonparametric statistical tests, i.e., Holm's test and the Wilcoxon signed-rank test [36] at the 5% significance level, are applied to show the difference between MOEL and the compared algorithms.
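The two indicators can be computed as in the sketch below: IGD follows Eq. (37) directly, and HV is computed for the biobjective case by a simple sweep over the points dominated by the reference point. The function names and example usage are illustrative; they do not reproduce the exact tooling used in the paper.

```python
import numpy as np

def igd(P_star, A):
    """Inverted generational distance, Eq. (37).
    P_star: (|P*|, m) reference points; A: (|A|, m) obtained objective vectors."""
    # For each reference point, the Euclidean distance to its nearest obtained point.
    dists = np.linalg.norm(P_star[:, None, :] - A[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

def hv_2d(A, z_star):
    """Hypervolume for a biobjective minimization problem, Eq. (38),
    computed by sweeping the points sorted by the first objective."""
    pts = sorted((a for a in A if np.all(a < z_star)), key=lambda a: a[0])
    hv, prev_f2 = 0.0, z_star[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                        # dominated points add no area
            hv += (z_star[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```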
Table 4
Comparisons of average IGD and HV values of MOEL-WS, MOEL-TCH and MOEL-PBI on the instances with 50 customers.
Table 5
Average HV values of MOEL with different combinations of the parameters L and r on 100 instances with 50 customers.

No.  L  r   Average HV
1    2  5   11682525.34
2    2  10  11767618.13
3    2  15  11733958.74
4    3  5   11783095.44
5    3  10  11906972.5 (best)
6    3  15  11862821.59
7    4  5   11836763.94
8    4  10  11823853.05
9    4  15  11828377.24
Table 5 shows the average HV values of MOEL with different combinations of the parameters $L$ (the number of encoding layers) and $r$ (the number of candidate edges). The best value is marked in the table. In addition, Fig. 3 shows the mean effect of each parameter. The results show that the best configuration for MOEL is $\{L = 3, r = 10\}$. This setting is adopted for the rest of the experiments in this article.
MOEL is trained on instances with 50 customers and tested on instances with varying customer sizes. In the training and testing
instances, the customers and the information of travel distance and travel time are randomly selected from the whole data set. The
time window and demand of each customer are randomly sampled as introduced in Section 5.1. To study the training behavior of
MOEL, a validation dataset containing 100 instances with 50 customers is randomly generated to assess the performance of MOEL.
The HV values for each training step on the validation dataset are recorded. Fig. 4 shows the variation of average HV values obtained
Table 6
Average IGD and HV values of MONL and MOEL on the datasets. The running time for each algorithm is listed.
by MOEL. It can be observed that MOEL is almost convergent at 150000 steps. Therefore, the training step 𝑆𝑖𝑡𝑒𝑟 is set to 150000. The
training time of MOEL is about 23 hours.
The main feature of MOEL is that it is an edge-based model that extracts the edge features and constructs a feasible solution
using edge embeddings. To validate the effectiveness of the edge-based model, a node-based model, termed MONL, is proposed to
be compared with MOEL. The MONL and MOEL have the same architecture. The difference is that MONL extracts only the node
features and constructs a solution based on node embeddings. The hyperparameters of MONL are set the same as those of MOEL. The
training time of MOEL and MONL is about 23 hours and 20 hours, respectively. Since MOEL needs more computational resources to
compute all edge embeddings, its training and testing time is generally longer than that of the algorithm that computes only the node
embeddings.
Table 6 shows the average IGD and HV values of MONL and MOEL on instances with different problem sizes. The best average
value between the two algorithms is highlighted in bold. It can be observed that MOEL obtains better values than MONL on all
datasets in terms of IGD and HV. Since all instances in a dataset can be tested in parallel, the total running time for a dataset is
presented in Table 6. MOEL has a slightly longer running time compared with MONL. From the Wilcoxon signed-rank test in Table 7,
MOEL obtains higher 𝑅+ than 𝑅− values and 𝑝-values < 0.05 on all datasets in terms of IGD and HV, meaning MOEL significantly
outperforms MONL on these datasets.
Fig. 5 illustrates the nondominated solutions obtained by MOEL and MONL on one instance of each dataset to visually show the
results. From the figure, the nondominated solutions obtained by MOEL have better convergence and diversity properties than MONL.
From the numeric and visual results, MOEL performs better than MONL in solving the MOVRPTW, which confirms the effectiveness
of the edge-based model. Since the travel distance and time matrices are usually asymmetric due to the traffic condition in the real
Table 7
Statistical results of performance comparisons of MOEL with MONL (MOEL
vs. MONL) by Wilcoxon’s test.
IGD
Problem size 𝑅+ 𝑅− 𝑝-value 𝛼 =0.05 𝛼 =0.15
20 customers 3347 1703 0.004684 Yes Yes
50 customers 4300 750 0 Yes Yes
100 customers 5032 18 0 Yes Yes
150 customers 5050 0 0 Yes Yes
HV
Problem size 𝑅+ 𝑅− 𝑝-value 𝛼 =0.05 𝛼 =0.15
20 customers 3570 1480 0.000325 Yes Yes
50 customers 4440 660 0 Yes Yes
100 customers 5020 30 0 Yes Yes
150 customers 5050 0 0 Yes Yes
Fig. 5. Nondominated solutions obtained by MOEL and MONL on an instance of each dataset.
world MOVRPTW, the edge-based model can better learn these features, whereas the node-based model cannot directly use them. It
is recommended to use the edge-based model to solve the MOVRPs with asymmetric travel distance and time matrices.
The proposed MOEL is compared with several baseline algorithms, which fall into two categories: DRL methods and metaheuristic
methods.
• MODRL/D-EL: It is a two-stage method for MOVRPTW. In the first stage, the decomposition strategy is applied to generate 100
subproblems, and each subproblem is solved by an attention model. As a result, MODRL/D-EL has 100 models to be trained.
These models are trained by the DRL with parameter-transfer strategy. In the second stage, the evolutionary learning is applied
to fine-tune the parameters of the models.
• PMOCO: It is a single model that approximates the whole Pareto front of MOCOPs. To this end, the decoder of PMOCO is
parameterized by reference points. Since the original PMOCO has not been tested on MOVRPTW, we have adapted it to solve
the MOVRPTW considered in this article.
• EMNH: It is a newly proposed meta neural heuristic for MOCOP. In EMNH, a meta-model is first trained and then fine-tuned
by an efficient hierarchical method. EMNH has been shown to be superior to MLDRL [23], another meta neural heuristic for
Table 8
Average IGD and HV values of MODRL/D-EL, PMOCO, EMNH and MOEL on the datasets. The running time for each algorithm is listed.
Table 9
Statistical results of performance comparisons of MOEL with MODRL/D-EL, PMOCO and EMNH by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MODRL/D-EL 1348 3702 1 No No MOEL vs. MODRL/D-EL 757 4293 1 No No
MOEL vs. PMOCO 2466 2584 1 No No MOEL vs. PMOCO 2010 3040 1 No No
MOEL vs. EMNH 4782 268 0 Yes Yes MOEL vs. EMNH 4745 305 0 Yes Yes
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MODRL/D-EL 4625 425 0 Yes Yes MOEL vs. MODRL/D-EL 3872 1178 0.000004 Yes Yes
MOEL vs. PMOCO 4702 348 0 Yes Yes MOEL vs. PMOCO 4023 1027 0 Yes Yes
MOEL vs. EMNH 5050 0 0 Yes Yes MOEL vs. EMNH 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MODRL/D-EL 5048 2 0 Yes Yes MOEL vs. MODRL/D-EL 5045 5 0 Yes Yes
MOEL vs. PMOCO 5050 0 0 Yes Yes MOEL vs. PMOCO 5048 2 0 Yes Yes
MOEL vs. EMNH 5050 0 0 Yes Yes MOEL vs. EMNH 5050 0 0 Yes Yes
MOVRPTW. Since EMNH has also not been tested on MOVRPTW, we have adapted it to solve the MOVRPTW considered in this
article.
The hyperparameters of MODRL/D-EL, PMOCO and EMNH are set the same as in their original references, except that the training
time is set the same as that of MOEL for fair comparisons. Moreover, the number of symmetric sampled weight vectors 𝑁̃ in EMNH
is set to 1 instead of the number of objectives 𝑚, because we find that the model is difficult to train when 𝑁̃ = 𝑚 but can be well
trained when 𝑁 ̃ = 1 for solving MOVRPTW. All the compared DRL methods are node-based models. To be compared with MOEL,
these methods are trained on the problem instances with 50 customers described in Section 5.1. The weighted sum is chosen as the
scalarizing method in all the compared methods. When testing, PMOCO and EMNH use 100 reference points generated by simplex
lattice design for subproblem decomposition. Since MODRL/D-EL contains 100 models, it can also generate 100 solutions. Therefore,
MOEL and the compared algorithms all generate 100 solutions for each instance.
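For two objectives, the 100 reference points produced by the simplex lattice design reduce to evenly spaced weight vectors; a minimal sketch, assuming $H = 99$ divisions, is given below.

```python
import numpy as np

# Simplex lattice design for m = 2 objectives with H = 99 divisions,
# yielding 100 reference points of the form (i/H, 1 - i/H).
H = 99
weights = np.arange(H + 1) / H
ref_points = np.stack([weights, 1.0 - weights], axis=1)
print(ref_points.shape)  # (100, 2)
```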
Table 8 shows the average IGD and HV values of MODRL/D-EL, PMOCO, EMNH and MOEL on each dataset. The best average value
for a dataset is highlighted in bold. Besides, Table 9 shows the statistical results of comparisons between MOEL and the competitors.
The result shows that MOEL performs better in solving large-scale instances. Specifically, MOEL significantly outperforms EMNH
on all datasets and outperforms MODRL/D-EL and PMOCO on the datasets with 100 and 150 customers. For instances with 100
customers, MOEL improves HV by 1.2%, 1.45%, and 11.2%, and IGD by 40.37%, 45.51%, and 95% over MODRL/D-EL, PMOCO,
and EMNH, respectively. For instances with 150 customers, MOEL achieves HV improvements of 2.8%, 4.59%, and 12.28%, and IGD
improvements of 89.9%, 91.8%, and 98.93% over the same algorithms.
Fig. 6 illustrates the nondominated solutions obtained by MOEL and the competitors on one instance of each dataset. It is clear
that the solutions obtained by MOEL have better convergence and diversity properties on instances with larger problem sizes. MOEL
is inferior to MODRL/D-EL and PMOCO on the datasets with 20 and 50 customers. The reason may be that the parameter-transfer
strategy in MODRL/D-EL and the parameterized decoder of PMOCO can help the models better learn the strategy to solve the problem.
However, because MOEL can explicitly capture complex edge features, it has good generalization ability on larger-scale instances.
Therefore, edge-based models, e.g., MOEL, are recommended for solving the large-scale MOVRPTW.
Fig. 6. Nondominated solutions obtained by MOEL and the competitors on an instance of each dataset.
The metaheuristic competitors are described as follows:

• NSGA-II: It is a Pareto dominance-based multiobjective evolutionary method. Initially, a population is generated. During each iteration, parents are randomly selected, and offspring are created using crossover and mutation operators. The combined parents and offspring are then sorted using the fast nondominated sorting method. Finally, a new population is formed for the next iteration based on the rank of nondominated fronts and crowding distance.
• MOEA/D: Different from NSGA-II, MOEA/D is a decomposition-based multiobjective evolutionary method. It decomposes an
MOP into multiple single-objective subproblems by a scalarizing method. In the beginning, a set of reference points and an
initial population are created. The neighborhood relation of subproblems is calculated based on Euclidean distances between
reference points. In each iteration, each subproblem is optimized using the information of its neighboring subproblems and
reproduction operations. The population is updated by the scalar values of the parents and offspring.
• NSGA-III: It extends NSGA-II by introducing a reference point-based selection mechanism, which ensures better convergence and
diversity on high-dimensional Pareto front. The key features include nondominated sorting, reference point association and an
elite preserving strategy.
• MOEA/D-D: The original MOEA/D uses a set of predefined reference points for decomposition. Inspired by [34], a weight space
partition strategy is incorporated into MOEA/D, which dynamically generates a set of well-distributed reference points at each
iteration. This strategy can lead to a solution set with better convergence and diversity.
• MOIA: It is a decomposition-based multiobjective immune algorithm. In this method, each solution is associated with a subproblem. Then, a novel decomposition-based clonal selection strategy is designed to clone the solutions with larger improvements for the subproblems, encouraging searching around the promising subproblems. The algorithm shows superior performance compared to the original MOIA for solving multiobjective continuous optimization problems.
The population size is set to 100 for all competitors. Therefore, MOEL and the competitors generate 100 solutions for each instance.
For MOVRPTW, the route-exchange crossover and the remove and reinsert mutation operator [38] are applied to generate feasible
offspring. The crossover and mutation rates are set to 1 and 0.01 respectively. Other parameters in the competitors are set the same
as in their original references. The weighted sum is chosen as the scalarizing method in MOEA/D, MOEA/D-D and MOIA for fair
comparison. As in [15], four settings of the maximum number of iterations are tested for the competitors: 500, 1000, 2000, and 4000.
Table 10 shows the average IGD and HV values of the competitors with different iterations and MOEL on instances with different
problem sizes. The best average value for a dataset is highlighted in bold. Tables 11-15 show the statistical results of comparisons
between MOEL and the competitors. Although MOEL is trained on instances with 50 customers, it performs well on instances with
different problem sizes. The tables show that MOEL significantly outperforms NSGA-II, MOEA/D, and MOEA/D-D in terms of IGD
and HV on instances with problem sizes of {50, 100, 150}. In addition, MOEL shows superior performance compared to NSGA-III
and MOIA on instances with problem sizes of {100, 150}. Specifically, for instances with 100 customers, MOEL improves HV by
Table 10
Average IGD and HV values of the competitors with different iterations and MOEL on the datasets. The running time for each algorithm is listed.
Table 11
Statistical results of performance comparisons of MOEL with NSGA-II by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-II-500 5025 25 0 Yes Yes MOEL vs. NSGA-II-500 3752 1298 0.000024 Yes Yes
MOEL vs. NSGA-II-1000 4598 452 0 Yes Yes MOEL vs. NSGA-II-1000 3172 1878 0.025993 Yes Yes
MOEL vs. NSGA-II-2000 3982 1068 0.000001 Yes Yes MOEL vs. NSGA-II-2000 2849 2201 0.264535 No No
MOEL vs. NSGA-II-4000 2287 2763 1 No No MOEL vs. NSGA-II-4000 1627 3423 1 No No
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-II-500 5050 0 0 Yes Yes MOEL vs. NSGA-II-500 5050 0 0 Yes Yes
MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-II-500 5050 0 0 Yes Yes MOEL vs. NSGA-II-500 5050 0 0 Yes Yes
MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes
41.57%, 27.47%, 39.97%, 25.88%, and 25.34%, and IGD by 97.80%, 97.14%, 98.09%, 96.66%, and 95.94% compared to NSGA-II,
MOEA/D, NSGA-III, MOEA/D-D, and MOIA, respectively. For instances with 150 customers, MOEL achieves HV improvements of
62.84%, 40.83%, 61.61%, 39.63%, and 41.59%, and IGD improvements of 99.78%, 99.50%, 99.80%, 99.45%, and 99.40% over the
same algorithms.
Figs. 7-11 visually present the results of MOEL and the competitors on one instance of each dataset. The competitors are iterative
methods requiring numerous iterations to enhance the quality of solutions. It can be observed that the solutions obtained by the
competitors improve as the number of iterations increases. Nevertheless, the solutions obtained by MOEL show better convergence
and diversity properties, especially for the instances with more customers. In terms of running time, because the competitors are
iterative methods, whereas the DRL methods, e.g., MOEL, are constructive methods, the running time of MOEL is much shorter than
that of the competitors. For example, for an instance with 150 customers, NSGA-II, MOEA/D, NSGA-III, MOEA/D-D, MOIA with
Fig. 7. Nondominated solutions obtained by MOEL and NSGA-II on an instance of each dataset.
Fig. 8. Nondominated solutions obtained by MOEL and MOEA/D on an instance of each dataset.
Fig. 9. Nondominated solutions obtained by MOEL and MOEA/D-D on an instance of each dataset.
Fig. 10. Nondominated solutions obtained by MOEL and NSGA-III on an instance of each dataset.
Table 12
Statistical results of performance comparisons of MOEL with MOEA/D by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-500 4937 113 0 Yes Yes MOEL vs. MOEA/D-500 4646 404 0 Yes Yes
MOEL vs. MOEA/D-1000 4032 1018 0 Yes Yes MOEL vs. MOEA/D-1000 4146 904 Yes Yes
MOEL vs. MOEA/D-2000 3234 1816 0.014708 Yes Yes MOEL vs. MOEA/D-2000 3740 1310 0.000029 Yes Yes
MOEL vs. MOEA/D-4000 2408 2642 1 No No MOEL vs. MOEA/D-4000 3393 1657 0.002825 Yes Yes
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes
Table 13
Statistical results of performance comparisons of MOEL with NSGA-III by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-III-500 4941 109 0 Yes Yes MOEL vs. NSGA-III-500 4326 724 0 Yes Yes
MOEL vs. NSGA-III-1000 4483 567 0 Yes Yes MOEL vs. NSGA-III-1000 3379 1671 0.003303 Yes Yes
MOEL vs. NSGA-III-2000 2494 2556 1 No No MOEL vs. NSGA-III-2000 2492 2558 1 No No
MOEL vs. NSGA-III-4000 1439 3611 1 No No MOEL vs. NSGA-III-4000 1154 3896 1 No No
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-III-500 5050 0 0 Yes Yes MOEL vs. NSGA-III-500 5050 0 0 Yes Yes
MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-III-500 5050 0 0 Yes Yes MOEL vs. NSGA-III-500 5050 0 0 Yes Yes
MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes
4000 iterations and MOEL cost 647, 359, 758, 388, 616 and 86 seconds respectively, as shown in Table 10. This represents runtime
reductions of 86.77%, 76.04%, 88.65%, 77.84%, and 86.04% compared to these methods. The results demonstrate that MOEL is more
efficient than the metaheuristic methods.
In summary, MOEL performs better than the compared metaheuristic methods in terms of solution quality and running time for
solving MOVRPTW. Note that we use the classical crossover and mutation operators in the competitors, and we believe that novel
operators exist that are better suited to solving the MOVRPTW. However, designing such operators is not trivial and requires specific
knowledge from experts. The merit of MOEL is that it can automatically learn heuristics from the data and does not heavily depend on expert knowledge. Meanwhile, once trained, MOEL can obtain the nondominated solutions in a short running time, which makes it more suitable for real-time scenarios than the traditional metaheuristic methods.
Table 14
Statistical results of performance comparisons of MOEL with MOEA/D-D by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-D-500 4693 357 0 Yes Yes MOEL vs. MOEA/D-D-500 4603 447 0 Yes Yes
MOEL vs. MOEA/D-D-1000 3893 1157 0.000003 Yes Yes MOEL vs. MOEA/D-D-1000 4163 887 0 Yes Yes
MOEL vs. MOEA/D-D-2000 3065 1985 0.06311 No Yes MOEL vs. MOEA/D-D-2000 3888 1162 0.000003 Yes Yes
MOEL vs. MOEA/D-D-4000 2617 2433 0.750451 No No MOEL vs. MOEA/D-D-4000 3747 1303 0.000026 Yes Yes
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes
Table 15
Statistical results of performance comparisons of MOEL with MOIA by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOIA-500 4702 348 0 Yes Yes MOEL vs. MOIA-500 4547 503 0 Yes Yes
MOEL vs. MOIA-1000 3407 1643 0.002411 Yes Yes MOEL vs. MOIA-1000 3854 1196 0.000005 Yes Yes
MOEL vs. MOIA-2000 1832 3218 1 No No MOEL vs. MOIA-2000 3404 1646 0.002495 Yes Yes
MOEL vs. MOIA-4000 1445 3605 1 No No MOEL vs. MOIA-4000 2788 2262 0.364937 No No
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOIA-500 5050 0 0 Yes Yes MOEL vs. MOIA-500 5050 0 0 Yes Yes
MOEL vs. MOIA-1000 5050 0 0 Yes Yes MOEL vs. MOIA-1000 5050 0 0 Yes Yes
MOEL vs. MOIA-2000 5050 0 0 Yes Yes MOEL vs. MOIA-2000 5050 0 0 Yes Yes
MOEL vs. MOIA-4000 5050 0 0 Yes Yes MOEL vs. MOIA-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOIA-500 5050 0 0 Yes Yes MOEL vs. MOIA-500 5050 0 0 Yes Yes
MOEL vs. MOIA-1000 5050 0 0 Yes Yes MOEL vs. MOIA-1000 5050 0 0 Yes Yes
MOEL vs. MOIA-2000 5050 0 0 Yes Yes MOEL vs. MOIA-2000 5050 0 0 Yes Yes
MOEL vs. MOIA-4000 5050 0 0 Yes Yes MOEL vs. MOIA-4000 5050 0 0 Yes Yes
6. Conclusions and future work

This article focuses on a real-world MOVRPTW whose travel distance and time matrices are asymmetric due to complex traffic
conditions. To solve this problem, a multiobjective edge-based learning algorithm, MOEL, is designed and implemented. The algorithm
follows the encoder-decoder architecture. To explicitly learn the complex edge features, the encoder encodes each edge, including
time and distance information, to produce edge embeddings. The decoder generates feasible solutions based on the edge embeddings.
To handle the multiobjective nature of the problem, the preference of the decision maker is incorporated into the input vectors to
make a single model solve the problem with any preferences. To assess the performance of the proposed MOEL, an experiment is
first conducted to show the effectiveness of the edge-based model. Then, MOEL is compared with three state-of-the-art DRL methods
and five representative metaheuristic methods on the real-world MOVRPTW instances. Experimental results indicate that MOEL
significantly outperforms the competitors on most instances and is especially effective in solving large-scale instances.
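The following sketch illustrates the edge-based encoding idea described above; it is not the actual MOEL implementation, and the layer sizes, feature layout, and the way the preference vector is injected are all assumptions made for illustration.

```python
# Illustrative sketch of the edge-based encoding idea: each directed edge (i, j) carries
# asymmetric distance/time features, and the preference vector is appended to the input so
# that a single model can serve any trade-off. All names and sizes here are assumptions.
import torch
import torch.nn as nn

class EdgePreferenceEncoder(nn.Module):
    def __init__(self, edge_feat_dim=2, pref_dim=2, embed_dim=128):
        super().__init__()
        # Project [distance_ij, time_ij, preference] into the embedding space.
        self.edge_proj = nn.Linear(edge_feat_dim + pref_dim, embed_dim)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, edge_feats, preference):
        # edge_feats: (batch, n, n, edge_feat_dim); entries may be asymmetric, i.e.,
        # edge_feats[:, i, j] != edge_feats[:, j, i]
        # preference: (batch, pref_dim), one objective weight vector per instance
        b, n, _, _ = edge_feats.shape
        pref = preference[:, None, None, :].expand(b, n, n, preference.shape[-1])
        x = torch.cat([edge_feats, pref], dim=-1)
        return self.norm(self.edge_proj(x))          # (batch, n, n, embed_dim)

# Hypothetical usage: 4 instances, a depot plus 20 customers, two features per edge.
encoder = EdgePreferenceEncoder()
edges = torch.rand(4, 21, 21, 2)                     # [distance_ij, time_ij]
prefs = torch.tensor([[0.2, 0.8]]).repeat(4, 1)      # weights for the two objectives
print(encoder(edges, prefs).shape)                   # torch.Size([4, 21, 21, 128])
```

A full model would add attention layers over these embeddings and a decoder that masks moves infeasible with respect to capacity and time windows; the sketch only shows how edge and preference information can enter the input.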
Fig. 11. Nondominated solutions obtained by MOEL and MOIA on an instance of each dataset.

The proposed MOEL has several limitations:
• Unlike node-based models, MOEL must compute embeddings for all edges of a batch of instances, so it requires more computational resources, including GPU time and memory (a rough memory estimate is sketched after this list). For larger-scale problems (e.g., 200 customers), training and deploying the MOEL model is difficult under our available resources (a single RTX 4090 GPU).
• MOEL obtains worse results than MODRL/D-EL and PMOCO on the small-scale instances.
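As a rough, assumption-laden illustration of the first limitation, the snippet below estimates how much memory one layer of edge embeddings occupies compared with node embeddings. The batch size, embedding width, and fp32 storage are assumed values, and attention maps and gradients are not counted.

```python
# Back-of-the-envelope memory estimate (assumptions: batch of 64, 128-d fp32 embeddings,
# a single stored layer, no attention maps or gradients). Edge storage grows with n^2.
def edge_embedding_mb(n_nodes, batch=64, embed_dim=128, bytes_per_value=4):
    return batch * n_nodes ** 2 * embed_dim * bytes_per_value / 1024 ** 2

def node_embedding_mb(n_nodes, batch=64, embed_dim=128, bytes_per_value=4):
    return batch * n_nodes * embed_dim * bytes_per_value / 1024 ** 2

for n in (21, 101, 151, 201):      # depot plus 20/100/150/200 customers
    print(f"n={n:3d}: edge embeddings ~{edge_embedding_mb(n):7.1f} MB, "
          f"node embeddings ~{node_embedding_mb(n):5.1f} MB")
```

Even under these conservative assumptions, the quadratic growth in the number of edges explains why a single 24 GB GPU becomes a bottleneck once the activations and gradients of multiple layers are included.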
Based on these limitations, this work can be extended in several directions. First, divide-and-conquer methods [39] can be adopted to tackle MOVRPTW instances with more customers under limited resources. Second, the edge-based formulation is general and can be used in any algorithm following the encoder-decoder architecture; incorporating it into other algorithms, e.g., MODRL/D-EL and PMOCO, may improve their performance. Third, instance augmentation can be used to enhance the generalization ability of MOEL.
CRediT authorship contribution statement

Ying Zhou: Writing -- original draft, Methodology, Investigation, Conceptualization. Lingjing Kong: Writing -- review & editing, Validation, Software. Hui Wang: Writing -- review & editing, Validation, Funding acquisition.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (62202314, 62072483), the Natural Science Foundation of Guangdong Province of China (2018A0303130055, 2022A1515010417, 2018A030310664), the Shenzhen Fundamental Research Fund under Grant No. 0220820010535001, and the Education Planning Project of Guangdong Province "The Exploration and Practice of Academic Value-Added Evaluation Under Artificial Intelligence Empowerment" (2024GXJK782).
Data availability
The data that support the findings of this study are openly available at https://github.com/wzydeath/VRPTW-data.
References
[1] G.B. Dantzig, J.H. Ramser, The truck dispatching problem, Manag. Sci. 6 (1) (1959) 80--91.
[2] G.D. Konstantakopoulos, S.P. Gayialis, E.P. Kechagias, Vehicle routing problem and related algorithms for logistics distribution: a literature review and classification, Oper. Res. 22 (2022) 2033--2062.
[3] J.K. Lenstra, A.H.G.R. Kan, Complexity of vehicle routing and scheduling problems, Networks 11 (2) (1981) 221--227.
[4] A. Bogyrbayeva, M. Meraliyev, T. Mustakhov, B. Dauletbayev, Machine learning to solve vehicle routing problems: a survey, IEEE Trans. Intell. Transp. Syst.
25 (6) (2024) 4754--4772.
[5] M.T.M. Emmerich, A.H. Deutz, A tutorial on multiobjective optimization: fundamentals and evolutionary methods, Nat. Comput. 17 (2018) 585--609.
[6] K. Deb, K. Sindhya, J. Hakanen, Multi-objective optimizations, in: Decision Sciences, Taylor & Francis Group, 2016, pp. 146--179.
[7] S. Zajac, S. Huber, Objectives and methods in multi-objective routing problems: a survey and classification scheme, Eur. J. Oper. Res. 290 (1) (2021) 1--25.
[8] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182--197.
[9] K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems
with box constraints, IEEE Trans. Evol. Comput. 18 (4) (2014) 577--601.
[10] Q. Zhang, H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11 (6) (2007) 712--731.
[11] L. Li, Y. Li, Q. Lin, Z. Ming, C.A.C. Coello, A convergence and diversity guided leader selection strategy for many-objective particle swarm optimization, Eng.
Appl. Artif. Intell. 115 (2022) 105249.
[12] L. Li, Y. Li, Q. Lin, S. Liu, J. Zhou, Z. Ming, Neural net-enhanced competitive swarm optimizer for large-scale multiobjective optimization, IEEE Trans. Cybern.
54 (6) (2023) 3502--3515.
[13] L. Li, Q. Lin, S. Liu, D. Gong, C.A.C. Coello, Z. Ming, A novel multi-objective immune algorithm with a decomposition-based clonal selection, Appl. Soft Comput.
81 (2019) 105490.
[14] L. Paquete, T. Schiavinotto, T. Stützle, On local optima in multiobjective combinatorial optimization problems, Ann. Oper. Res. 156 (1) (2007) 83.
[15] K. Li, T. Zhang, R. Wang, Deep reinforcement learning for multiobjective optimization, IEEE Trans. Cybern. 51 (6) (2021) 3103--3114.
[16] H. Wu, J. Wang, Z. Zhang, MODRL/D-AM: multiobjective deep reinforcement learning algorithm using decomposition and attention model for multiobjective optimization, in: Artificial Intelligence Algorithms and Applications, ISICA 2019, 2020, pp. 575--589.
[17] Y. Shao, J.C.-W. Lin, G. Srivastava, D. Guo, H. Zhang, H. Yi, A. Jolfaei, Multi-objective neural evolutionary algorithm for combinatorial optimization problems,
IEEE Trans. Neural Netw. Learn. Syst. 34 (4) (2023) 2133--2143.
[18] Y. Zhang, J. Wang, Z. Zhang, Y. Zhou, MODRL/D-EL: multiobjective deep reinforcement learning with evolutionary learning for multiobjective optimization, in: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1--8.
[19] X. Lin, Z. Yang, Q. Zhang, Pareto set learning for neural multi-objective combinatorial optimization, in: 10th International Conference on Learning Representations
(ICLR 2022), 2022, pp. 1--30.
[20] T. Ye, Z. Zhang, J. Chen, J. Wang, Weight-specific-decoder attention model to solve multiobjective combinatorial optimization problems, in: 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2022, pp. 2839--2844.
[21] L.-Y. Gao, R. Wang, C. Liu, Z.-H. Jia, Multi-objective pointer network for combinatorial optimization, arXiv:2204.11860, 2022.
[22] Z. Wang, S. Yao, G. Li, Q. Zhang, Multiobjective combinatorial optimization using a single deep reinforcement learning model, IEEE Trans. Cybern. 54 (3) (2024)
1984--1996.
[23] Z. Zhang, Z. Wu, H. Zhang, J. Wang, Meta-learning-based deep reinforcement learning for multiobjective optimization problems, IEEE Trans. Neural Netw. Learn.
Syst. 34 (10) (2023) 7978--7991.
[24] J. Chen, J. Wang, Z. Zhang, Z. Cao, T. Ye, S. Chen, Efficient meta neural heuristic for multi-objective combinatorial optimization, in: Advances in Neural Information Processing Systems (NeurIPS 2023), vol. 36, 2023, pp. 1--13.
[25] S. Li, F. Wang, Q. He, X. Wang, Deep reinforcement learning for multi-objective combinatorial optimization: a case study on multi-objective traveling salesman
problem, Swarm Evol. Comput. 83 (2023) 101398.
[26] Y. Zhang, J. Wang, Z. Zhang, Edge-based formulation with graph attention network for practical vehicle routing problem with time windows, in: 2022 International
Joint Conference on Neural Networks (IJCNN), 2022, pp. 1--8.
[27] W. Kool, H. van Hoof, M. Welling, Attention, learn to solve routing problems!, in: ICLR 2019, 2019, pp. 1--25.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems (NIPS 2017), vol. 30, 2017, pp. 1--11.
[29] S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: 32nd International Conference on Machine
Learning, vol. 37, 2015, pp. 448--456.
[30] V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: 27th International Conference on Machine Learning, 2010, pp. 1--8.
[31] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735--1780.
[32] R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn. 8 (1992) 229--256.
[33] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: International Conference on Learning Representations, 2015, pp. 1--15.
[34] Y. Zhou, L. Kong, Y. Cai, Z. Wu, S. Liu, J. Hong, K. Wu, A decomposition-based local search for large-scale many-objective vehicle routing problems with
simultaneous delivery and pickup and time windows, IEEE Syst. J. 14 (4) (2020) 5253--5264.
[35] J.G. Falcón-Cardona, C.A.C. Coello, Indicator-based multi-objective evolutionary algorithms: a comprehensive survey, ACM Comput. Surv. 53 (2) (2021) 1--35.
[36] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and
swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3--18.
[37] H. Scheffé, Experiments with mixtures, J. R. Stat. Soc., Ser. B, Methodol. 20 (2) (1958) 344--360.
[38] K.C. Tan, Y.H. Chew, A hybrid multiobjective evolutionary algorithm for solving vehicle routing problem with time windows, Comput. Optim. Appl. 34 (2006)
115--151.
[39] H. Ye, J. Wang, H. Liang, Z. Cao, Y. Li, F. Li, GLOP: learning global partition and local construction for solving large-scale routing problems in real-time, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, 2024, pp. 20284--20292.