A Multiobjective Edge-based Learning Algorithm for the Vehicle Routing Problem With Time Windows

Information Sciences

Keywords: Multiobjective optimization; Deep reinforcement learning; Vehicle routing problem

Dataset link: https://github.com/wzydeath/VRPTW-data

Abstract: The multiobjective vehicle routing problem with time windows has attracted much attention in recent decades. Until now, various metaheuristic methods have been proposed to solve the problem. However, designing effective methods is not trivial and heavily depends on experts' knowledge. As a research hotspot in recent years, a few deep reinforcement learning methods have been tried to solve the multiobjective vehicle routing problem with symmetric distance and time matrices. However, due to the complex traffic conditions, the travel distance and time between two nodes are probably asymmetric in real-world scenarios. This article introduces a multiobjective edge-based learning algorithm (MOEL) to tackle this issue. In this method, a single neural network model is established and trained to approximate the whole Pareto front of the problem. The edge features, including travel distance and time matrices, are fully learned and used to construct high-quality solutions. MOEL is compared against three state-of-the-art deep reinforcement learning methods (MODRL/D-EL, PMOCO, EMNH) and five metaheuristic methods (NSGA-II, MOEA/D, NSGA-III, MOEA/D-D, MOIA). Experimental results on the real-world instances indicate that MOEL significantly outperforms all competitors, improving IGD by up to 99.80% and HV by up to 62.84%. In addition, MOEL achieves a maximum runtime reduction of 88.65% compared to the deep reinforcement learning methods, highlighting its efficiency and effectiveness for solving the problem.
1. Introduction
With the development of the economy and society, logistics has become an increasingly essential part of the national economy.
Logistics planning plays a vital role in logistics, and the vehicle routing problem (VRP) [1] is a critical issue in logistics planning.
The basic VRP aims to dispatch vehicles from a depot to serve customers and meet their requests. The sequencing of the customers
assigned to each vehicle needs to be determined to minimize the total routing cost. Considering different business scenarios, many
variant problems have emerged based on the basic VRP [2]. VRP with time windows (VRPTW) is one of the variants of VRP that
considers time window constraints on customers. Each customer specifies a delivery time window for their order. If the vehicle arrives
before the start of the time window, it will need to wait to begin, while it cannot arrive after the end of the time window. The goal
of VRPTW is to plan a set of routes subject to vehicle capacity and time window constraints. The problem is single-objective if only
one objective (such as vehicle number or travel distance) is optimized. Otherwise, the problem is considered multiobjective when
multiple conflicting objectives are optimized simultaneously. This variant is crucial in the field of logistics, as delivery time windows
are commonly set by end-customers for the delivery of goods. Such planning is deeply connected to both the accuracy of deliveries
and the satisfaction of customers [2].
VRPTW is an NP-hard combinatorial optimization problem [3], making it difficult to solve. Metaheuristic methods are widely used
and have been successfully applied to address this problem [2]. However, designing effective heuristics requires significant experts’
domain knowledge and trial and error. Besides, most of these methods are iterative and need a long time to solve a problem. Recently,
learning-based methods have attracted more and more attention in solving VRPs [4]. As a representative learning methodology, deep
reinforcement learning (DRL) methods have successfully solved VRPs. These methods formulate the VRPs as a sequential decision
making process and train a deep neural network by reinforcement learning to solve problems. Until now, most DRL algorithms have
been designed and applied to single-objective VRPs. Recently, some works have focused on multi-objective VRPs (MOVRPs) [4].
However, these works have the following limitations:
1. Most of these methods are applied to the multi-objective traveling salesman problem (MOTSP) and the multi-objective capacitated VRP (MOCVRP). Very few have been applied to multi-objective VRPs with complex constraints, such as the multi-objective VRPTW (MOVRPTW).
2. Most of these methods are evaluated using manually created instances, which take Euclidean distance as both the travel distance
and travel time, i.e., the distance and time matrices are symmetric and directly computed from the coordinates of nodes (cities or
customers). However, the distance and time matrices are usually asymmetric due to varying traffic conditions in the real world.
This article focuses on a real-world MOVRPTW, with problem instances derived from the daily distribution scenarios of JingDong
logistics1 in China. Two conflicting objectives, total travel distance and total waiting time, are to be optimized simultaneously. The
travel distance between nodes is determined by the path length in the transportation network. Besides, although the travel time is
related to travel distance, complex traffic factors such as traffic jams need to be considered when calculating the travel time. Therefore,
the time matrix between two nodes is asymmetric and different from the distance matrix. Existing DRL methods for MOVRPs are
node-based models, using node features (including the coordinates of nodes) to construct a sequence of nodes as a solution for an
instance. The edge features (including the distance and time matrices) are not directly used for solution construction in these methods.
As a result, these methods may not be effective for the real-world MOVRPTW considered in this article.
To address the challenges of solving MOVRPTW, a novel multiobjective edge-based learning algorithm, termed MOEL, is proposed
in this article. MOEL employs a single neural network to approximate the whole Pareto front, enabling flexible trade-offs between
multiple objectives without requiring additional metaheuristics. The contributions of this article are summarized as follows:
1. This article introduces a new multiobjective DRL method, MOEL, for solving MOVRPTW with complex edge features, such as
asymmetric time and distance matrices. Unlike existing node-based DRL methods, MOEL directly leverages edge information for
solution construction, making it suitable for the complex MOVRPTW instances.
2. MOEL employs a single model to generate high-quality solutions for any given instance and preference. By incorporating the
preferences into the model, MOEL eliminates the need for training multiple models, providing a flexible way to solve MOVRPTW.
3. Comprehensive experiments show that MOEL significantly outperforms the state-of-the-art DRL and metaheuristic methods,
especially on large-scale MOVRPTW instances, in terms of both solution quality and computational efficiency.
The remaining sections are organized as follows. Section 2 provides an overview of related work, including the categories of
multiobjective optimization methods and the methods for MOVRPs. Section 3 describes the definition of MOVRPTW. Section 4
introduces the proposed MOEL. Section 5 gives the experimental details. Finally, Section 6 presents the conclusions and outlines
future work. Tables 1 and 2 summarize the main symbols used in this article.
1 www.jd.com, the largest online or offline retailer in China.
Table 2
Notations of MOEL.
2. Related work
Considering the timing of decision maker’s preference input, multiobjective optimization methods can generally be divided into
three categories [5]:
1. A priori methods: Preferences, such as the weights of objectives, are provided before the optimization process begins.
2. A posteriori methods: Preferences are specified after the optimization process, once trade-offs among nondominated solutions
are available for evaluation.
3. Interactive methods: Preferences are refined iteratively by obtaining feedback from the decision maker at multiple stages during the optimization process.
In this study, we focus on the second category, i.e., a posteriori methods. These methods aim to generate a diverse set of nondominated solutions that exhibit both good convergence (closeness to the Pareto front) and diversity (spread along the Pareto front). Such solutions help decision makers gain a better understanding of the problem and the available alternatives, thus leading to a conscious and better choice [6]. Accordingly, this article proposes an effective a posteriori method for the MOVRPTW. In the following sections, we briefly review multiobjective metaheuristics and DRL approaches that fall within the a posteriori category for solving MOVRPs.
Multiobjective metaheuristic methods are manually designed strategies that iteratively explore the solution space to approximate Pareto optimal solutions. These methods are widely applied to MOVRPs [7]. They can be broadly categorized into three types: multiobjective evolutionary algorithms, multiobjective local searches and multiobjective memetic algorithms. Multiobjective evolutionary algorithms are popular for decision making because their population-based approach allows them to approximate the entire Pareto front in a single run. The representative algorithms for multiobjective optimization problems (MOPs) include the nondominated sorting genetic algorithm (NSGA-II) [8], the reference-point-based many-objective evolutionary algorithm following the NSGA-II framework (NSGA-III) [9], the multiobjective evolutionary algorithm based on decomposition (MOEA/D) [10], multiobjective particle swarm optimization (PSO) [11], the competitive swarm optimizer (CSO) [12] and the multi-objective immune algorithm (MOIA) [13]. Most of the multiobjective
evolutionary algorithms are designed and tested on multiobjective continuous optimization problems. Multiobjective local searches,
like Pareto local search (PLS) [14], leverage problem-specific knowledge to guide the search directly toward the Pareto front. Therefore, this sort of algorithm is a good choice for tackling multiobjective combinatorial optimization problems (MOCOPs). However,
these methods usually depend on carefully handcrafted designs and are often specialized for each problem. Moreover, the running
time for these methods is still long due to their iterative nature.
DRL methods have become a novel approach to solving VRPs in the last few years [4]. They are data-driven methods. Most of
them automatically learn the heuristics for solving the problem through deep neural networks with encoder-decoder architecture.
This basically eliminates the need to design a solution strategy manually. Recently, multiobjective DRL algorithms have gradually
attracted researchers’ interest in solving MOCOPs. Based on the number of models, they can be roughly divided into two categories:
multiple models and single models.
The methods using multiple models often follow the decomposition discipline, i.e., decompose an MOCOP into a set of scalar
optimization subproblems and model each subproblem as a neural network. Following this concept, Li et al. [15] proposed a DRL-based multiobjective optimization algorithm (DRL-MOA) to solve the MOTSP. In DRL-MOA, each subproblem is modeled using a
pointer network, and the parameters of all subproblems are optimized via a neighborhood-based parameter-transfer strategy. Wu et
al. [16] extended the previous work [15] and proposed an attention model-based multiobjective optimization algorithm (MODRL/D-AM) for the MOTSP. The attention model can extract the node features and graph structures of the MOTSP, which achieves better results than DRL-MOA. Several studies have focused on evolutionary learning to further improve the performance of models. Shao et al. [17] proposed a multiobjective evolutionary algorithm based on decomposition and dominance (MONEADD). It uses genetic operations and reward signals to evolve neural networks without further engineering. Based on the work in [16], Zhang et al. [18] introduced a
multiobjective deep reinforcement learning with evolutionary learning algorithm (MODRL/D-EL) for the MOVRPTW. This approach
employs a two-stage hybrid learning strategy, where DRL with parameter-transfer is used in the first stage, followed by evolutionary
learning to fine-tune model parameters in the second stage.
Since the number of Pareto optimal solutions for an MOCOP may be extremely large, the required number of models to approximate
the whole Pareto front would be huge. In addition, the reference points for decomposition are predefined and fixed before training,
making methods using multiple models less flexible. Recently, several studies tried to solve MOCOPs with a single model. Lin et
al. [19] proposed a preference-conditioned neural multiobjective combinatorial optimization model (PMOCO) to approximate the
whole Pareto front with any reference points. It uses an attention encoder to extract the node features and a preference-conditioned
decoder to generate a solution with a given reference point. Instead of using a whole decoder parameterized by reference points,
Ye et al. [20] proposed a weight-specific-decoder attention model (WSDAM) that uses a small weight-adaptive layer in the decoder.
Gao et al. [21] proposed a single-model multiobjective pointer network (MOPN). It uses a pointer network as the encoder. To deal
with the MOCOPs, the node information and reference point are combined as the input of the encoder. Wang et al. [22] proposed a
multiobjective routing attention model (MORAM). Its encoder consists of the objective encoder, the router and the global encoder,
which can dynamically determine the subproblem embeddings. Following the concept of meta-learning, Zhang et al. [23] proposed a
meta-learning-based DRL (MLDRL) for the MOCOPs. First, a meta-model is trained to learn knowledge of the whole Pareto front. It is
updated by multiple subproblems constructed by different reference points. Then, in the fine-tuning step, a submodel of a subproblem
is obtained by updating the meta-model within a few steps. To accelerate the training process, Chen et al. [24] proposed an efficient
meta neural heuristic (EMNH) that uses a multi-task model composed of a parameter-shared body and respective task-related heads
to train the meta-model. To enhance the diversity of generated solutions, Li et al. [25] proposed a DRL-based method that employs
the multiple decoder attention model (MOMDAM). It applies one encoder to generate node embeddings for a given subproblem, and
then applies multiple decoders to produce a diverse set of solutions for the subproblem.
Table 3 summarizes the related work on multiobjective DRL algorithms. Most of the existing methods have been applied to the
MOTSP and MOCVRP, and very few have been applied to the MOVRPTW that has more complex constraints. Moreover, most of them
focus only on node features and are evaluated using manually generated problem sets that rely on Euclidean distance for both travel
distance and time. As a result, these methods may not perform well on real-world MOVRPTW problems that involve complex edge
features. Inspired by the reference [26], this article proposes a single model using edge features to construct solutions for MOVRPTW,
which is detailed in the following sections.
3. Problem definition
The MOVRPTW is defined as an MOCOP on a directed graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V} = \{v_i \mid i = 0, \ldots, n\}$ is the set of nodes in the graph and $\mathcal{E} = \{e_{ij} \mid v_i, v_j \in \mathcal{V}\}$ is the set of edges. The node $v_0$ denotes the depot, and the other nodes denote the customers. Each node $v_i$ has the following features: the demand $q_i$ (zero for the depot) and the time window $[tw^b_i, tw^e_i]$. Each edge $e_{ij}$ has two features: the travel distance $d_{v_i,v_j}$ and the travel time $t_{v_i,v_j}$ between the nodes $v_i$ and $v_j$.
A fleet of homogeneous vehicles, each with a capacity $C$, is dispatched to serve the customers. Each vehicle begins at the depot, serves a sequence of customers, and then returns to the depot. The objective is to plan $M$ routes $R_1, \ldots, R_M$ by simultaneously minimizing the total travel distance and total waiting time, while satisfying the constraints of the MOVRPTW. Each route $R_j$ is represented as a sequence of nodes $\langle \pi_0, \pi_1, \ldots, \pi_{N_j} \rangle$, where $\pi_0 = \pi_{N_j} = v_0$ denotes the depot, and the intermediate nodes $\pi_i \in \mathcal{V} \setminus \{v_0\}$ represent the customers in the $j$th route. Let $a_i$ be the arrival time at node $\pi_i$, and $l_{i-1}$ be the departure time from node $\pi_{i-1}$. The arrival time at node $\pi_i$ is $a_i = l_{i-1} + t_{\pi_{i-1}, \pi_i}$. If the vehicle arrives before the start of the time window, i.e., $a_i < tw^b_{\pi_i}$, it must wait until $tw^b_{\pi_i}$ before service can begin, so the waiting time at $\pi_i$ is $\max(0, tw^b_{\pi_i} - a_i)$. Arriving after the time window is forbidden. The total travel distance of the $j$th route is calculated as:
$dist_j = \sum_{i=0}^{N_j - 1} d_{\pi_i, \pi_{i+1}}$  (4)
In this article, two objectives, total travel distance 𝑓1 and total waiting time 𝑓2 , are optimized simultaneously. These objectives
have direct practical relevance in terms of logistics. 𝑓1 is an economic objective that significantly impacts total logistics costs. Mini
mizing 𝑓1 reduces the cost of transportation. Meanwhile, minimizing 𝑓2 increases efficiency and avoids wasting working time. The
two objectives 𝑓1 and 𝑓2 are conflicting due to the time window constraint. Minimizing travel distance often indicates selecting
the shortest path between nodes. However, because of the time windows, vehicles may arrive early and need to wait to serve the
customer, increasing the waiting time. On the other hand, to minimize waiting time, routes may need to be adjusted so that vehicles
arrive closer to the start of the time window, which may lead to longer travel distances. Therefore, multiobjective optimization is
necessary to generate a set of trade-off solutions. The objective 𝑓1 is defined as:
$f_1 = \sum_{j=1}^{M} dist_j$  (5)

The objective $f_2$ is the total waiting time accumulated at all customers over all routes. In addition, a feasible solution must satisfy the following constraints:
1. Capacity constraint: The total demand on each route must not exceed the vehicle's capacity $C$. Specifically, for the $j$th route $R_j$, the constraint is defined as $\sum_{i=1}^{N_j} g_i \le C$, where $g_i$ represents the demand of the $i$th customer in the route.
2. Time window constraint: A vehicle cannot arrive after the end of the time window of each node, i.e., for each 𝜋𝑖 , 𝑎𝑖 ≤ 𝑡𝑤𝑒𝑖 must
hold.
4. Solution methodology
Let 𝒙, 𝒚 ∈ Ω, 𝒙 Pareto dominates 𝒚 (denoted as 𝒙 ≺ 𝒚 ) iff 𝑓𝑖 (𝒙) ≤ 𝑓𝑖 (𝒚) for every 𝑖 ∈ {1, … , 𝑚}, and 𝑓𝑗 (𝒙) < 𝑓𝑗 (𝒚) for at least one
𝑗 ∈ {1, … , 𝑚}. A solution 𝒙∗ is Pareto optimal if no other solution in Ω dominates it. The objective vector of 𝒙∗ , 𝐹 (𝒙∗ ), represents
a Pareto optimal objective vector. The Pareto set includes all Pareto optimal solutions, while the Pareto front comprises all Pareto
optimal objective vectors. In MOPs, objectives are often conflicting, meaning improving one objective may result in the degradation
of another. Consequently, no single solution can simultaneously optimize all objectives. Instead, a set of trade-offs is often needed for
decision making.
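As a small illustration of the dominance relation just defined, the sketch below checks Pareto dominance between objective vectors and filters a set down to its nondominated members. The function names and the example vectors are assumptions for illustration only, not part of the original implementation.

```python
import numpy as np

def dominates(x, y):
    """Return True if objective vector x Pareto dominates y (minimization)."""
    return np.all(x <= y) and np.any(x < y)

def nondominated(F):
    """Keep only the nondominated objective vectors from the array F."""
    keep = []
    for i, f in enumerate(F):
        if not any(dominates(g, f) for j, g in enumerate(F) if j != i):
            keep.append(f)
    return np.array(keep)

# Example: (total distance, total waiting time) of three candidate solutions.
F = np.array([[5200.0, 310.0], [4800.0, 450.0], [5300.0, 500.0]])
print(nondominated(F))  # the third vector is dominated by the first
```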
The proposed MOEL employs a single model with an encoder-decoder structure, inspired by the attention model proposed by
Kool et al. [27]. The encoder-decoder architecture is a sequence-to-sequence modeling framework widely applied to combinatorial
optimization tasks. It consists of two main components:
• Encoder: The encoder processes the input problem features and transforms them into fixed-dimensional latent embeddings. These
embeddings capture the spatial and structural relationships within the problem instance, providing a comprehensive summary
of the input instance.
• Decoder: The decoder constructs the solution in a step-by-step manner. At each step, it selects an action from the available op
tions. Guided by the attention mechanism, the decoder dynamically focuses on the most relevant parts of the input representation
during each step of the solution construction.
This model treats the solution generation process as a sequential decision-making task, which can be trained using reinforcement
learning. The key elements include state, action and reward. State represents the current status of the problem. Action defines a
decision to be performed at the current state. Reward evaluates the quality of actions taken at each state. The policy defines a strategy
for selecting actions based on the current state. Reinforcement learning aims to optimize the policy to maximize the cumulative
reward over time. Representative algorithms for policy optimization include policy gradient methods and actor-critic methods.
In MOEL, the encoder takes edge features as input and produces edge embeddings via attention sublayers, while the decoder uses these embeddings to sequentially generate a series of edges as a solution. To deal with the multi-objective nature of the problem, the preference
for objectives is incorporated into the inputs and learned, thus allowing MOEL to solve the problem with any preferences. When
trained, the proposed MOEL is expected to obtain high-quality solutions in a short running time for the MOVRPTW. Fig. 1 illustrates
the inference process of MOEL. A decision maker’s preference is denoted as a reference point in the figure. For example, the point
(0.8, 0.2) means 80% importance on objective 𝑓1 and 20% on objective 𝑓2 . Each time, MOEL takes the problem instance and a
preference as an input vector and generates a corresponding edge-based solution for the preference. As a result, MOEL generates four
solutions satisfying different preferences.
To make MOEL tackle preferences effectively, the MOVRPTW is decomposed into single-objective subproblems using preference
based scalarization methods. These methods decompose the problem into multiple subproblems with different preferences. By solving
all subproblems, a set of approximated Pareto solutions can be obtained. Common scalarization techniques include weighted sum
method, Tchebycheff method and penalty-based boundary intersection (PBI) method [10]. Let 𝝀 = (𝜆1 , 𝜆2 )𝑇 be a reference point
where 𝜆1 + 𝜆2 = 1 and 𝜆1 , 𝜆2 ≥ 0. For a given instance 𝑠 of the MOVRPTW, the three scalarization methods are described as follows:
• Weighted Sum Method: The corresponding subproblem formulated by the weighted sum method minimizes the weighted sum of the normalized objectives, $g^{ws}(\boldsymbol{x} \mid \boldsymbol{\lambda}) = \lambda_1 \bar{f}_1(\boldsymbol{x}) + \lambda_2 \bar{f}_2(\boldsymbol{x})$, where each objective is normalized as

$\bar{f}_i(\boldsymbol{x}) = \dfrac{f_i(\boldsymbol{x}) - \min_i}{\max_i - \min_i}$  (10)

where $\max_i$ and $\min_i$ are the maximum and minimum values of the $i$th objective, respectively. Normalization is used because the two objectives, $f_1$ and $f_2$, have different scales.
• Tchebycheff Method: The Tchebycheff method formulates the subproblem as minimizing $g^{tch}(\boldsymbol{x} \mid \boldsymbol{\lambda}) = \max_{i \in \{1, 2\}} \lambda_i \left| \bar{f}_i(\boldsymbol{x}) - z^*_i \right|$, where $z^*_i$ is the minimum value of the normalized $i$th objective among the candidate solutions.
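The following sketch shows how the weighted sum and Tchebycheff scalarizations above can be computed for a pair of raw objective values. The normalization bounds `f_min` and `f_max`, the ideal point `z`, and all variable names are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def normalize(f, f_min, f_max):
    """Min-max normalize an objective vector f = (f1, f2), as in Eq. (10)."""
    return (f - f_min) / (f_max - f_min)

def weighted_sum(f_bar, lam):
    """Weighted sum scalarization of the normalized objectives."""
    return float(np.dot(lam, f_bar))

def tchebycheff(f_bar, lam, z=np.zeros(2)):
    """Tchebycheff scalarization with an (assumed) ideal point z at the origin."""
    return float(np.max(lam * np.abs(f_bar - z)))

# Example: a solution with raw objectives (total distance, total waiting time).
f = np.array([5200.0, 310.0])
f_min, f_max = np.array([4000.0, 0.0]), np.array([9000.0, 1200.0])
lam = np.array([0.8, 0.2])          # reference point (0.8, 0.2)
f_bar = normalize(f, f_min, f_max)
print(weighted_sum(f_bar, lam), tchebycheff(f_bar, lam))
```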
The proposed MOEL aims to use a single model to solve any subproblems of the MOVRPTW. A subproblem of the MOVRPTW
can be described as a Markov decision process. In the node-based scheme, given an initial state with an empty solution, the model
sequentially appends a node to the solution at each step and creates a complete solution at the end. The methods for the node-based
scheme aim to find a stochastic policy that produces a sequence of nodes $\boldsymbol{\pi} = (\pi_0, \ldots, \pi_T)$ to minimize the expected value of the subproblem's scalar function, where $\pi_0 = \pi_T = v_0$ and the other $\pi_i \in \mathcal{V}$.
According to [26], the node-based scheme can be transformed into an edge-based scheme. While the node-based approach selects
a node at each step, the edge-based approach chooses an edge at each step. Specifically, let an edge-based solution be 𝝉 = (𝜏1 , … , 𝜏𝑇 ),
where $\tau_t = e_{\pi_{t-1}, \pi_t} \in \mathcal{E}$ for $1 \le t \le T$. The proposed MOEL aims to find a stochastic policy $p_\theta(\boldsymbol{\tau})$ parameterized by $\theta$. For an instance
𝑠 and a subproblem with a reference point 𝝀, the policy sequentially constructs a feasible edge-based solution 𝝉 with the minimal
scalar value:
$p_\theta(\boldsymbol{\tau} \mid s, \boldsymbol{\lambda}) = \prod_{t=1}^{T} p_\theta(\tau_t \mid s, \boldsymbol{\lambda}, \tau_{1:t-1})$  (13)
4.3. Encoder
The encoder of the MOEL is illustrated in Fig. 2(a). The input vectors of the encoder consist of the node features, edge features
and the information of the reference point. Specifically, given a reference point $\boldsymbol{\lambda} = (\lambda_1, \lambda_2)^T$, the input vector of the edge $e_{ij}$ collects seven features, $[q_j; tw^b_j; tw^e_j; d_{v_i,v_j}; t_{v_i,v_j}; \lambda_1; \lambda_2]$, where $q_j$, $tw^b_j$, $tw^e_j$ are the node features of the edge $e_{ij}$'s ending node $v_j$. The coordinates of $v_j$ are excluded from the node features,
because an edge’s travel distance and travel time depend on the traffic condition and are not directly calculated with the coordinates.
Meanwhile, only the features of the ending node are contained because when constructing a solution, the next edge is selected based
on the ending node of the current edge. 𝑑𝑣𝑖 ,𝑣𝑗 and 𝑡𝑣𝑖 ,𝑣𝑗 are the edge features of 𝑒𝑖𝑗 .
First, the encoder maps each input vector to a $d$-dimensional space by a trainable parameter matrix $W_0 \in \mathbb{R}^{d \times 7}$, yielding the initial edge embedding $h^0_{ij}$ of $e_{ij}$. Then, $L$ encoding layers further update the edge embeddings. Each layer consists of an attention layer and an aggregation layer. Let $\boldsymbol{H}^{l-1} = \{h^{l-1}_{11}, \ldots, h^{l-1}_{nn}\}$ be the input of the $l$th attention layer. The output of this layer for $e_{ij}$ is calculated using the following components:
• 𝑆(𝑒𝑖𝑗 ) represents the candidate edges starting from the node 𝑣𝑗 . Suppose the current selected edge is 𝑒𝑖𝑗 . When constructing
a solution, the next selected edge will be an edge 𝑒𝑗𝑘 starting from 𝑣𝑗 . Therefore, the most relevant edges of 𝑒𝑖𝑗 will be 𝑒𝑗𝑘 ,
𝑘 = 1, … , 𝑛 among all edges. To reduce the computational complexity, 𝑆(𝑒𝑖𝑗 ) contains the 𝑟-nearest edges starting from 𝑣𝑗 .
• SHA(⋅) denotes the single-head attention operation [28], BN(⋅) denotes the batch normalization operation [29], and ReLU(⋅)
denotes the ReLU activation [30].
• 𝑊1 ∈ ℝ𝑑×𝑑 is a trainable parameter matrix.
The attention output is then passed to the aggregation layer, which involves trainable parameters $W_2 \in \mathbb{R}^{d \times 2d}$ and $W_3 \in \mathbb{R}^{d \times d}$, where $[\,;\,]$ represents the concatenation operation. The parameters are not shared between layers.
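To make the encoder input concrete, the sketch below assembles the seven-feature input vector for every edge, projects it with $W_0$ into the initial embeddings $h^0_{ij}$, and collects the $r$-nearest candidate edges $S(e_{ij})$. The feature ordering, tensor shapes and function names are assumptions for illustration; the attention and aggregation sublayers are omitted.

```python
import torch

def initial_edge_embeddings(q, tw_b, tw_e, dist, time, lam, W0):
    """Project the 7-dimensional edge input vectors to d-dimensional embeddings.
    q, tw_b, tw_e: (n,) node features of the ending node; dist, time: (n, n) edge
    features; lam: (2,) reference point; W0: (d, 7) trainable matrix.
    The ordering of the seven features is an assumption for illustration."""
    n = dist.shape[0]
    # Node features of the ending node v_j, broadcast over all starting nodes v_i.
    end_feats = torch.stack([q, tw_b, tw_e], dim=-1).unsqueeze(0).expand(n, n, 3)
    edge_feats = torch.stack([dist, time], dim=-1)            # (n, n, 2)
    pref = lam.view(1, 1, 2).expand(n, n, 2)                  # (n, n, 2)
    x = torch.cat([end_feats, edge_feats, pref], dim=-1)      # (n, n, 7)
    return x @ W0.T                                           # (n, n, d)

def r_nearest_candidates(dist, r):
    """Indices of the r nearest successor nodes of each node, defining S(e_ij)."""
    return torch.topk(dist, k=r, largest=False).indices       # (n, r)
```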
4.4. Decoder
The decoder sequentially generates a solution $\boldsymbol{\tau}$ based on the edge embeddings $\boldsymbol{H}^L = \{h^L_{11}, \ldots, h^L_{nn}\}$. For simplicity, the superscript $L$ is omitted, and $\boldsymbol{H}$ is used instead of $\boldsymbol{H}^L$. Fig. 2(b) illustrates the structure of the decoder. At time step $t \in \{1, \ldots, T\}$, the decoder selects an edge $\tau_t$ using the context embedding $\tilde{h}^t_c$ and the embeddings of candidate edges. The context embedding $\tilde{h}^t_c$ represents the decoding context information at step $t$, which is defined as follows:

$\tilde{h}^t_c = \begin{cases} [Q_t; T_t; D_t; R_t; \boldsymbol{\lambda}] & t > 1, \\ [Q_t; T_t; D_t; v_1; \boldsymbol{\lambda}] & t = 1, \end{cases}$  (19)
where 𝝀 is the same reference point used in the encoder. 𝑄𝑡 , 𝑇𝑡 and 𝐷𝑡 are the remaining capacity, travel time and total travel distance
of the vehicle at step 𝑡, respectively. Initially, 𝑄1 is set to the maximum capacity 𝐶 , and 𝑇1 , 𝐷1 are set to 0. If an edge 𝑒𝑖𝑗 is selected
at step 𝑡 − 1, 𝑄𝑡 , 𝑇𝑡 and 𝐷𝑡 are computed as follows:
$Q_t = \begin{cases} C & \text{if } v_j = v_0, \\ Q_{t-1} - q_j & \text{otherwise}, \end{cases}$  (20)

$T_t = \begin{cases} 0 & \text{if } v_j = v_0, \\ \max(T_{t-1} + t_{v_i,v_j},\; tw^b_j) & \text{otherwise}, \end{cases}$  (21)

$D_t = D_{t-1} + d_{v_i,v_j}$  (22)
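A minimal sketch of the state transition of Eqs. (20)-(22) after an edge $e_{ij}$ has been selected; the function signature and argument names are illustrative assumptions, not the authors' implementation.

```python
def update_vehicle_state(Q_prev, T_prev, D_prev, i, j, q, tw_b, dist, time, C, depot=0):
    """State transition of Eqs. (20)-(22) after selecting the edge e_ij.
    q, tw_b: demands and time-window starts; dist, time: travel matrices;
    C: vehicle capacity; depot: index of the depot node."""
    if j == depot:                                  # returning to the depot resets the vehicle
        Q_t = C                                     # Eq. (20)
        T_t = 0.0                                   # Eq. (21)
    else:
        Q_t = Q_prev - q[j]                         # Eq. (20)
        T_t = max(T_prev + time[i][j], tw_b[j])     # Eq. (21): wait until the window opens
    D_t = D_prev + dist[i][j]                       # Eq. (22): distance accumulates in both cases
    return Q_t, T_t, D_t
```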
$R_t$ represents the embedding of the partial solution $\boldsymbol{\tau}_{1:t-1}$ constructed from step 1 to step $t-1$. Since $\boldsymbol{\tau}_{1:t-1}$ is a sequence, the long short-term memory (LSTM) network [31] is adopted to calculate the $d$-dimensional $R_t$. The concatenated context is then projected by a trainable matrix $W_c$:

$\tilde{h}^t_c = W_c \tilde{h}^t_c$  (24)

Suppose the vehicle is at node $v_i$ at step $t$. $\tilde{h}^t_c$ is then updated by the multi-head attention (MHA) layer over the embeddings of the candidate edges $e_{ij}$ leaving $v_i$, producing the head-wise compatibilities $u^k_{ij}$ between the context and each candidate edge.
An edge $e_{ij}$ cannot be feasibly selected if, for example, the customer $v_j$ has already been served, the remaining capacity $Q_t$ is smaller than its demand, or the vehicle would arrive after the end of its time window $tw^e_j$. If one of these conditions holds, $u^k_{ij}$ is masked as $-\infty$.
From the compatibilities $u^k_{ij}$, the attention weight $\alpha^k_{ij}$ is calculated by the softmax operation:

$\alpha^k_{ij} = \dfrac{e^{u^k_{ij}}}{\sum_{j'=1}^{n} e^{u^k_{ij'}}}$  (27)
The output of the MHA layer is calculated by combining all $v^k_{ij}$ ($k = 1, \ldots, K$) as follows:

$h^t_c = \sum_{k=1}^{K} W^k_O v^k_{ij}$  (29)
where 𝑊𝑉𝑘 ∈ ℝ𝑑𝑘 ×𝑑 and 𝑊𝑂𝑘 ∈ ℝ𝑑×𝑑𝑘 are trainable parameter matrices.
Finally, ℎ𝑡𝑐 is fed into a single-head attention layer to compute the probability of choosing an edge at step 𝑡. The compatibility of
ℎ𝑡𝑐 for 𝑒𝑖𝑗 (𝑗 = 1, … , 𝑛) is calculated as follows:
$\bar{u}_{ij} = \begin{cases} C \cdot \tanh\!\left(\dfrac{(h^t_c)^T (W'_K h_{ij})}{\sqrt{d}}\right) & \text{if } e_{ij} \text{ can be feasibly selected}, \\ -\infty & \text{otherwise}, \end{cases}$  (30)
where 𝑊𝐾′ ∈ ℝ𝑑×𝑑 is a trainable parameter matrix, and 𝐶 = 10 is a hyperparameter for clipping. Then, the softmax operation is used
to calculate the probability of choosing 𝜏𝑡 = 𝑒𝑖𝑗 at step 𝑡 as follows:
$p_{ij} = p_\theta(\tau_t \mid s, \boldsymbol{\lambda}, \tau_{1:t-1}) = \dfrac{e^{\bar{u}_{ij}}}{\sum_{j'=1}^{n} e^{\bar{u}_{ij'}}}$  (31)
The decoder can select an edge based on the probabilities of the edges at each step and construct a feasible solution at the end.
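A sketch of the final edge-selection step of Eqs. (30)-(31): compatibilities are clipped with tanh, infeasible edges are masked to $-\infty$, and a softmax yields the selection probabilities. The tensor shapes and the names `W_K` and `feasible_mask` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def edge_probabilities(h_c, H_i, W_K, feasible_mask, clip=10.0):
    """Probability of choosing each edge e_ij at the current step (Eqs. (30)-(31)).
    h_c: (d,) context embedding; H_i: (n, d) embeddings of edges leaving node v_i;
    W_K: (d, d) trainable matrix; feasible_mask: (n,) bool, True if e_ij is feasible."""
    d = h_c.shape[-1]
    keys = H_i @ W_K.T                                 # (n, d)
    u = clip * torch.tanh((keys @ h_c) / d ** 0.5)     # clipped compatibilities, Eq. (30)
    u = u.masked_fill(~feasible_mask, float('-inf'))   # mask infeasible edges
    return F.softmax(u, dim=-1)                        # Eq. (31)

# Sampling one edge (during training) or taking the argmax (greedy decoding):
# probs = edge_probabilities(h_c, H_i, W_K, mask)
# j = torch.multinomial(probs, 1).item()
```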
Algorithm 1 Training process of MOEL.
Input: distribution of reference point Λ, distribution of MOVRPTW instances 𝑆 , number of training steps 𝑆𝑖𝑡𝑒𝑟 , batch size 𝐵
Output: The trained parameters 𝜃
1: Initialize the model parameters 𝜃
2: for 𝑖𝑡𝑒𝑟 = 1 to 𝑆𝑖𝑡𝑒𝑟 do
3: 𝝀 ← Sample_reference_point(Λ)
4: 𝑠𝑖 ← Sample_instance(𝑺 ) ∀𝑖 ∈ {1, … , 𝐵}
5: 𝝉 𝑖 ← Sample_solution(𝑝𝜃 (⋅|𝑠𝑖 , 𝝀)) ∀𝑖 ∈ {1, … , 𝐵}
6: 𝑏(𝑠𝑖 |𝝀) ← Greedy_rollout(𝑝𝜃 (⋅|𝑠𝑖 , 𝝀)) ∀𝑖 ∈ {1, … , 𝐵}
7:  $\nabla J(\theta) \leftarrow \frac{1}{B} \sum_{i=1}^{B} \left[ \left( g(\boldsymbol{\tau}_i \mid s_i, \boldsymbol{\lambda}) - b(s_i \mid \boldsymbol{\lambda}) \right) \nabla_\theta \log p_\theta(\boldsymbol{\tau}_i \mid s_i, \boldsymbol{\lambda}) \right]$
8: 𝜃 ← ADAM(𝜃 , ∇𝐽 (𝜃))
9: end for
For an instance $s$ of the MOVRPTW and a given reference point $\boldsymbol{\lambda}$, the goal of the proposed model is to minimize the expected value of the scalar function (denoted as $g(\boldsymbol{\tau} \mid s, \boldsymbol{\lambda})$), i.e., $J(\theta \mid s, \boldsymbol{\lambda}) = \mathbb{E}_{\boldsymbol{\tau} \sim p_\theta(\cdot \mid s, \boldsymbol{\lambda})}\left[ g(\boldsymbol{\tau} \mid s, \boldsymbol{\lambda}) \right]$. The model is trained with the REINFORCE policy gradient and a greedy rollout baseline $b(s \mid \boldsymbol{\lambda})$, as outlined in Algorithm 1.
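The training loop of Algorithm 1 can be condensed into the sketch below. `policy`, `sample_instances`, `sample_preference`, `scalar_g` and `greedy_rollout` are hypothetical placeholders for the components described in the paper; the snippet only illustrates the REINFORCE update with the greedy-rollout baseline, not the authors' exact implementation.

```python
import torch

def train_moel(policy, sample_instances, sample_preference, scalar_g,
               greedy_rollout, optimizer, n_steps, batch_size):
    """REINFORCE with a greedy-rollout baseline, mirroring Algorithm 1."""
    for _ in range(n_steps):
        lam = sample_preference()                        # line 3: reference point
        batch = sample_instances(batch_size)             # line 4: problem instances
        tours, log_probs = policy.sample(batch, lam)     # line 5: sampled solutions
        with torch.no_grad():
            baseline = greedy_rollout(policy, batch, lam)    # line 6: b(s | lambda)
            cost = scalar_g(tours, batch, lam)               # g(tau | s, lambda)
        loss = ((cost - baseline) * log_probs).mean()    # line 7: policy gradient
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                 # line 8: ADAM update
```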
The complexity of the proposed MOEL mainly lies in the attention mechanism. For a problem with 𝑛 nodes (𝑛2 edges) and
embedding dimension 𝑑 , the complexity is as follows:
• In the encoder, the attention mechanism generates embeddings for all edges, which takes $O(n^2 \cdot r \cdot d + n^2 \cdot d^2)$, where $r$ denotes the number of candidate edges.
• During the decoding process, the solution is generated sequentially. At each step, the attention mechanism is applied to select an edge, which takes $O(n \cdot d^2 + n^2 \cdot d)$. Therefore, the complexity of the decoder for generating a complete solution is $O(n^2 \cdot d^2 + n^3 \cdot d)$.
Since the key components in the attention mechanism, such as matrix multiplications, can be calculated in parallel, the efficiency
of MOEL can be significantly improved using GPU acceleration.
5. Experiments
In this section, all experiments are conducted on a server equipped with eight Intel Xeon (Cascade Lake) Platinum 8269CY CPUs
at 2.5 GHz and 16.0 GB of RAM to investigate the performance of the proposed MOEL. A single 4090 GPU is used to train the network
models in the experiments. All algorithms are implemented in Python.
The problem instance used in this article comes from the B2B delivery scenarios of JingDong logistics in Beijing [34]. Specifically,
the whole data set contains one depot and 1600 customers. Node coordinates, travel distance and time matrices are obtained from
real business scenarios. All data are normalized into [0, 1]. The time windows and demands are set as follows:
• Time windows: The time window of the depot is set to $[480, 1440]$, which means the depot opens at 8:00 and closes at 24:00. As the working time is 8 hours, i.e., 480 minutes, the begin service time $tw^b_i$ of the $i$th customer is randomly selected from $[480, 930]$. The length of the time window is randomly selected from $\{30, 60, 90, 120\}$. The end service time $tw^e_i$ cannot exceed 16:00. The time window of each node, $tw_i = [tw^b_i, tw^e_i]$, is further normalized into $[0, 1]$ as follows:

$tw_i = \dfrac{tw_i - 480}{1000}$  (36)
• Demands: The demands of the nodes are set as in [18]. The depot has a demand of 0, while customer demands are randomly generated. Specifically, the demand $q_i$ for the $i$th customer is sampled from a normal distribution $\mathcal{N}(15, 10)$ and truncated to an integer within the range $[1, 42]$. This value is then scaled by the vehicle capacity $C$, where $q_i = q_i / C$. For consistency with [18], the vehicle capacity $C$ is set to 750. A small sampling sketch is given below.
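A minimal sketch of the time-window and demand sampling described above, assuming uniform sampling of the window start, an 8:00 = 480 min origin with a 16:00 = 960 min cap, and treating the second parameter of the normal distribution as its standard deviation; all names are illustrative.

```python
import numpy as np

def sample_customer(rng):
    """Sample one customer's normalized time window (Eq. (36)) and scaled demand."""
    tw_b = rng.uniform(480, 930)                 # window start in [480, 930] minutes
    width = rng.choice([30, 60, 90, 120])        # window length
    tw_e = min(tw_b + width, 960)                # end of service no later than 16:00
    tw = (np.array([tw_b, tw_e]) - 480) / 1000   # Eq. (36): normalize into [0, 1]
    C = 750                                      # vehicle capacity
    q = int(np.clip(rng.normal(15, 10), 1, 42)) / C   # truncated demand, scaled by C
    return tw, q

rng = np.random.default_rng(0)
print(sample_customer(rng))
```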
The performance of an algorithm for the MOVRPTW is assessed in terms of convergence and diversity. In this article, the following
two indicators are used:
1. Inverted generational distance (IGD) [10]: Let $\boldsymbol{P}^*$ be a set of uniformly distributed points along the Pareto front and $\boldsymbol{A}$ be an obtained solution set. The IGD value is calculated as:

$\mathrm{IGD}(\boldsymbol{A}, \boldsymbol{P}^*) = \dfrac{\sum_{v \in \boldsymbol{P}^*} d(v, \boldsymbol{A})}{|\boldsymbol{P}^*|}$  (37)
where 𝑑(𝑣, 𝑨) is the minimum Euclidean distance between 𝑣 and the points in 𝐴. Since the true Pareto front of the MOVRPTW
is unknown, all nondominated solutions generated by the competitors are gathered as the Pareto front.
2. Hypervolume indicator (HV) [35]: A reference point and an obtained solution set are required to calculate HV. Let $\mathbb{L}$ be the Lebesgue measure in $\mathbb{R}^m$; the HV is defined as follows:

$\mathrm{HV}(\boldsymbol{A}, \boldsymbol{z}^*) = \mathbb{L}\left(\bigcup_{\boldsymbol{a} \in \boldsymbol{A}} \{\boldsymbol{x} \mid \boldsymbol{a} \prec \boldsymbol{x} \prec \boldsymbol{z}^*\}\right)$  (38)

where $\boldsymbol{A}$ is the solution set and $\boldsymbol{z}^*$ is a reference point. For the problems with customer sizes 20, 50, 100 and 150, $\boldsymbol{z}^*$ is set to $(15, 3000)$, $(30, 6000)$, $(60, 9000)$ and $(90, 12000)$, respectively.
A higher (lower) value of HV (IGD) can be regarded as a better set of solutions in terms of convergence and diversity. In addition, nonparametric statistical tests, i.e., Holm's test and the Wilcoxon signed-rank test [36] at the 5% significance level, are applied to show the difference between MOEL and the compared algorithms.
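The two indicators can be computed as in the sketch below: IGD follows Eq. (37) directly, and HV is computed for the biobjective case by a simple sweep over the points dominated by the reference point. The function names and example usage are illustrative; they do not reproduce the exact tooling used in the paper.

```python
import numpy as np

def igd(P_star, A):
    """Inverted generational distance, Eq. (37).
    P_star: (|P*|, m) reference points; A: (|A|, m) obtained objective vectors."""
    # For each reference point, the Euclidean distance to its nearest obtained point.
    dists = np.linalg.norm(P_star[:, None, :] - A[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

def hv_2d(A, z_star):
    """Hypervolume for a biobjective minimization problem, Eq. (38),
    computed by sweeping the points sorted by the first objective."""
    pts = sorted((a for a in A if np.all(a < z_star)), key=lambda a: a[0])
    hv, prev_f2 = 0.0, z_star[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                        # dominated points add no area
            hv += (z_star[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```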
Table 4
Comparisons of average IGD and HV values of MOEL-WS, MOEL-TCH and MOEL-PBI on the instances with 50 customers.
Table 5
Average HV values of MOEL with different combinations of the parameters L and r on 100 instances with 50 customers.

No.  L  r   Average HV
1    2  5   11682525.34
2    2  10  11767618.13
3    2  15  11733958.74
4    3  5   11783095.44
5    3  10  11906972.5 (best)
6    3  15  11862821.59
7    4  5   11836763.94
8    4  10  11823853.05
9    4  15  11828377.24
Table 5 shows the average HV values of MOEL with different combinations of the parameters $L$ (the number of encoding layers) and $r$ (the number of candidate edges). The best value is marked in the table. In addition, Fig. 3 shows the mean effect of each parameter. The results show that the best configuration for MOEL is $\{L = 3, r = 10\}$. This setting is adopted for the rest of the experiments in this article.
MOEL is trained on instances with 50 customers and tested on instances with varying customer sizes. In the training and testing
instances, the customers and the information of travel distance and travel time are randomly selected from the whole data set. The
time window and demand of each customer are randomly sampled as introduced in Section 5.1. To study the training behavior of
MOEL, a validation dataset containing 100 instances with 50 customers is randomly generated to assess the performance of MOEL.
The HV values for each training step on the validation dataset are recorded. Fig. 4 shows the variation of average HV values obtained
Table 6
Average IGD and HV values of MONL and MOEL on the datasets. The running time for each algorithm is listed.
by MOEL. It can be observed that MOEL is almost convergent at 150000 steps. Therefore, the training step 𝑆𝑖𝑡𝑒𝑟 is set to 150000. The
training time of MOEL is about 23 hours.
The main feature of MOEL is that it is an edge-based model that extracts the edge features and constructs a feasible solution
using edge embeddings. To validate the effectiveness of the edge-based model, a node-based model, termed MONL, is proposed to
be compared with MOEL. The MONL and MOEL have the same architecture. The difference is that MONL extracts only the node
features and constructs a solution based on node embeddings. The hyperparameters of MONL are set the same as those of MOEL. The
training time of MOEL and MONL is about 23 hours and 20 hours, respectively. Since MOEL needs more computational resources to
compute all edge embeddings, its training and testing time is generally longer than that of the algorithm that computes only the node
embeddings.
Table 6 shows the average IGD and HV values of MONL and MOEL on instances with different problem sizes. The best average
value between the two algorithms is highlighted in bold. It can be observed that MOEL obtains better values than MONL on all
datasets in terms of IGD and HV. Since all instances in a dataset can be tested in parallel, the total running time for a dataset is
presented in Table 6. MOEL has a slightly longer running time compared with MONL. From the Wilcoxon signed-rank test in Table 7,
MOEL obtains higher 𝑅+ than 𝑅− values and 𝑝-values < 0.05 on all datasets in terms of IGD and HV, meaning MOEL significantly
outperforms MONL on these datasets.
Fig. 5 illustrates the nondominated solutions obtained by MOEL and MONL on one instance of each dataset to visually show the
results. From the figure, the nondominated solutions obtained by MOEL have better convergence and diversity properties than MONL.
From the numeric and visual results, MOEL performs better than MONL in solving the MOVRPTW, which confirms the effectiveness
of the edge-based model. Since the travel distance and time matrices are usually asymmetric due to the traffic condition in the real
Table 7
Statistical results of performance comparisons of MOEL with MONL (MOEL
vs. MONL) by Wilcoxon’s test.
IGD
Problem size 𝑅+ 𝑅− 𝑝-value 𝛼 =0.05 𝛼 =0.15
20 customers 3347 1703 0.004684 Yes Yes
50 customers 4300 750 0 Yes Yes
100 customers 5032 18 0 Yes Yes
150 customers 5050 0 0 Yes Yes
HV
Problem size 𝑅+ 𝑅− 𝑝-value 𝛼 =0.05 𝛼 =0.15
20 customers 3570 1480 0.000325 Yes Yes
50 customers 4440 660 0 Yes Yes
100 customers 5020 30 0 Yes Yes
150 customers 5050 0 0 Yes Yes
Fig. 5. Nondominated solutions obtained by MOEL and MONL on an instance of each dataset.
world MOVRPTW, the edge-based model can better learn these features, whereas the node-based model cannot directly use them. It
is recommended to use the edge-based model to solve the MOVRPs with asymmetric travel distance and time matrices.
The proposed MOEL is compared with several baseline algorithms, which fall into two categories: DRL methods and metaheuristic
methods.
• MODRL/D-EL: It is a two-stage method for MOVRPTW. In the first stage, the decomposition strategy is applied to generate 100
subproblems, and each subproblem is solved by an attention model. As a result, MODRL/D-EL has 100 models to be trained.
These models are trained by the DRL with parameter-transfer strategy. In the second stage, the evolutionary learning is applied
to fine-tune the parameters of the models.
• PMOCO: It is a single model that approximates the whole Pareto front of MOCOPs. To this end, the decoder of PMOCO is
parameterized by reference points. Since the original PMOCO has not been tested on MOVRPTW, we have adapted it to solve
the MOVRPTW considered in this article.
• EMNH: It is a newly proposed meta neural heuristic for MOCOP. In EMNH, a meta-model is first trained and then fine-tuned
by an efficient hierarchical method. EMNH has been shown to be superior to MLDRL [23], another meta neural heuristic for
Table 8
Average IGD and HV values of MODRL/D-EL, PMOCO, EMNH and MOEL on the datasets. The running time for each algorithm is listed.
Table 9
Statistical results of performance comparisons of MOEL with MODRL/D-EL, PMOCO and EMNH by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MODRL/D-EL 1348 3702 1 No No MOEL vs. MODRL/D-EL 757 4293 1 No No
MOEL vs. PMOCO 2466 2584 1 No No MOEL vs. PMOCO 2010 3040 1 No No
MOEL vs. EMNH 4782 268 0 Yes Yes MOEL vs. EMNH 4745 305 0 Yes Yes
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MODRL/D-EL 4625 425 0 Yes Yes MOEL vs. MODRL/D-EL 3872 1178 0.000004 Yes Yes
MOEL vs. PMOCO 4702 348 0 Yes Yes MOEL vs. PMOCO 4023 1027 0 Yes Yes
MOEL vs. EMNH 5050 0 0 Yes Yes MOEL vs. EMNH 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MODRL/D-EL 5048 2 0 Yes Yes MOEL vs. MODRL/D-EL 5045 5 0 Yes Yes
MOEL vs. PMOCO 5050 0 0 Yes Yes MOEL vs. PMOCO 5048 2 0 Yes Yes
MOEL vs. EMNH 5050 0 0 Yes Yes MOEL vs. EMNH 5050 0 0 Yes Yes
MOVRPTW. Since EMNH has also not been tested on MOVRPTW, we have adapted it to solve the MOVRPTW considered in this
article.
The hyperparameters of MODRL/D-EL, PMOCO and EMNH are set the same as in their original references, except that the training
time is set the same as that of MOEL for fair comparisons. Moreover, the number of symmetric sampled weight vectors 𝑁̃ in EMNH
is set to 1 instead of the number of objectives 𝑚, because we find that the model is difficult to train when 𝑁̃ = 𝑚 but can be well
trained when 𝑁 ̃ = 1 for solving MOVRPTW. All the compared DRL methods are node-based models. To be compared with MOEL,
these methods are trained on the problem instances with 50 customers described in Section 5.1. The weighted sum is chosen as the
scalarizing method in all the compared methods. When testing, PMOCO and EMNH use 100 reference points generated by simplex
lattice design for subproblem decomposition. Since MODRL/D-EL contains 100 models, it can also generate 100 solutions. Therefore,
MOEL and the compared algorithms all generate 100 solutions for each instance.
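For two objectives, the 100 reference points produced by the simplex lattice design reduce to evenly spaced weight vectors; a minimal sketch, assuming $H = 99$ divisions, is given below.

```python
import numpy as np

# Simplex lattice design for m = 2 objectives with H = 99 divisions,
# yielding 100 reference points of the form (i/H, 1 - i/H).
H = 99
weights = np.arange(H + 1) / H
ref_points = np.stack([weights, 1.0 - weights], axis=1)
print(ref_points.shape)  # (100, 2)
```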
Table 8 shows the average IGD and HV values of MODRL/D-EL, PMOCO, EMNH and MOEL on each dataset. The best average value
for a dataset is highlighted in bold. Besides, Table 9 shows the statistical results of comparisons between MOEL and the competitors.
The result shows that MOEL performs better in solving large-scale instances. Specifically, MOEL significantly outperforms EMNH
on all datasets and outperforms MODRL/D-EL and PMOCO on the datasets with 100 and 150 customers. For instances with 100
customers, MOEL improves HV by 1.2%, 1.45%, and 11.2%, and IGD by 40.37%, 45.51%, and 95% over MODRL/D-EL, PMOCO,
and EMNH, respectively. For instances with 150 customers, MOEL achieves HV improvements of 2.8%, 4.59%, and 12.28%, and IGD
improvements of 89.9%, 91.8%, and 98.93% over the same algorithms.
Fig. 6 illustrates the nondominated solutions obtained by MOEL and the competitors on one instance of each dataset. It is clear
that the solutions obtained by MOEL have better convergence and diversity properties on instances with larger problem sizes. MOEL
is inferior to MODRL/D-EL and PMOCO on the datasets with 20 and 50 customers. The reason may be that the parameter-transfer
strategy in MODRL/D-EL and the parameterized decoder of PMOCO can help the models better learn the strategy to solve the problem.
However, because MOEL can explicitly capture complex edge features, it has good generalization ability on larger-scale instances.
Therefore, edge-based models, e.g., MOEL, are recommended for solving the large-scale MOVRPTW.
Fig. 6. Nondominated solutions obtained by MOEL and the competitors on an instance of each dataset.
The metaheuristic competitors are described as follows:

• NSGA-II: It is a Pareto dominance-based multiobjective evolutionary method. Initially, a population is generated. During each iteration, parents are randomly selected, and offspring are created using crossover and mutation operators. The combined parents and offspring are then sorted using the fast nondominated sorting method. Finally, a new population is formed for the next iteration based on the rank of nondominated fronts and crowding distance.
• MOEA/D: Different from NSGA-II, MOEA/D is a decomposition-based multiobjective evolutionary method. It decomposes an
MOP into multiple single-objective subproblems by a scalarizing method. In the beginning, a set of reference points and an
initial population are created. The neighborhood relation of subproblems is calculated based on Euclidean distances between
reference points. In each iteration, each subproblem is optimized using the information of its neighboring subproblems and
reproduction operations. The population is updated by the scalar values of the parents and offspring.
• NSGA-III: It extends NSGA-II by introducing a reference point-based selection mechanism, which ensures better convergence and
diversity on high-dimensional Pareto front. The key features include nondominated sorting, reference point association and an
elite preserving strategy.
• MOEA/D-D: The original MOEA/D uses a set of predefined reference points for decomposition. Inspired by [34], a weight space
partition strategy is incorporated into MOEA/D, which dynamically generates a set of well-distributed reference points at each
iteration. This strategy can lead to a solution set with better convergence and diversity.
• MOIA: It is a decomposition-based multiobjective immune algorithm. In this method, each solution is associated with a subproblem. Then, a novel decomposition-based clonal selection strategy is designed to clone the solutions with larger improvements for the subproblems, encouraging searching around the promising subproblems. The algorithm shows superior performance compared to the original MOIA for solving multiobjective continuous optimization problems.
The population size is set to 100 for all competitors. Therefore, MOEL and the competitors generate 100 solutions for each instance.
For MOVRPTW, the route-exchange crossover and the remove and reinsert mutation operator [38] are applied to generate feasible
offspring. The crossover and mutation rates are set to 1 and 0.01 respectively. Other parameters in the competitors are set the same
as in their original references. The weighted sum is chosen as the scalarizing method in MOEA/D, MOEA/D-D and MOIA for fair
comparison. As in [15], four settings of the maximum number of iterations are tested for the competitors: 500, 1000, 2000, and 4000.
Table 10 shows the average IGD and HV values of the competitors with different iterations and MOEL on instances with different
problem sizes. The best average value for a dataset is highlighted in bold. Tables 11-15 show the statistical results of comparisons
between MOEL and the competitors. Although MOEL is trained on instances with 50 customers, it performs well on instances with
different problem sizes. The tables show that MOEL significantly outperforms NSGA-II, MOEA/D, and MOEA/D-D in terms of IGD
and HV on instances with problem sizes of {50, 100, 150}. In addition, MOEL shows superior performance compared to NSGA-III
and MOIA on instances with problem sizes of {100, 150}. Specifically, for instances with 100 customers, MOEL improves HV by
Table 10
Average IGD and HV values of the competitors with different iterations and MOEL on the datasets. The running time for each algorithm is listed.
Table 11
Statistical results of performance comparisons of MOEL with NSGA-II by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-II-500 5025 25 0 Yes Yes MOEL vs. NSGA-II-500 3752 1298 0.000024 Yes Yes
MOEL vs. NSGA-II-1000 4598 452 0 Yes Yes MOEL vs. NSGA-II-1000 3172 1878 0.025993 Yes Yes
MOEL vs. NSGA-II-2000 3982 1068 0.000001 Yes Yes MOEL vs. NSGA-II-2000 2849 2201 0.264535 No No
MOEL vs. NSGA-II-4000 2287 2763 1 No No MOEL vs. NSGA-II-4000 1627 3423 1 No No
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-II-500 5050 0 0 Yes Yes MOEL vs. NSGA-II-500 5050 0 0 Yes Yes
MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-II-500 5050 0 0 Yes Yes MOEL vs. NSGA-II-500 5050 0 0 Yes Yes
MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes MOEL vs. NSGA-II-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes MOEL vs. NSGA-II-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes MOEL vs. NSGA-II-4000 5050 0 0 Yes Yes
41.57%, 27.47%, 39.97%, 25.88%, and 25.34%, and IGD by 97.80%, 97.14%, 98.09%, 96.66%, and 95.94% compared to NSGA-II,
MOEA/D, NSGA-III, MOEA/D-D, and MOIA, respectively. For instances with 150 customers, MOEL achieves HV improvements of
62.84%, 40.83%, 61.61%, 39.63%, and 41.59%, and IGD improvements of 99.78%, 99.50%, 99.80%, 99.45%, and 99.40% over the
same algorithms.
Figs. 7-11 visually present the results of MOEL and the competitors on one instance of each dataset. The competitors are iterative
methods requiring numerous iterations to enhance the quality of solutions. It can be observed that the solutions obtained by the
competitors improve as the number of iterations increases. Nevertheless, the solutions obtained by MOEL show better convergence
and diversity properties, especially for the instances with more customers. In terms of running time, because the competitors are
iterative methods, whereas the DRL methods, e.g., MOEL, are constructive methods, the running time of MOEL is much shorter than
that of the competitors. For example, for an instance with 150 customers, NSGA-II, MOEA/D, NSGA-III, MOEA/D-D, MOIA with
Fig. 7. Nondominated solutions obtained by MOEL and NSGA-II on an instance of each dataset.
Fig. 8. Nondominated solutions obtained by MOEL and MOEA/D on an instance of each dataset.
Fig. 9. Nondominated solutions obtained by MOEL and MOEA/D-D on an instance of each dataset.
Fig. 10. Nondominated solutions obtained by MOEL and NSGA-III on an instance of each dataset.
Table 12
Statistical results of performance comparisons of MOEL with MOEA/D by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-500 4937 113 0 Yes Yes MOEL vs. MOEA/D-500 4646 404 0 Yes Yes
MOEL vs. MOEA/D-1000 4032 1018 0 Yes Yes MOEL vs. MOEA/D-1000 4146 904 Yes Yes
MOEL vs. MOEA/D-2000 3234 1816 0.014708 Yes Yes MOEL vs. MOEA/D-2000 3740 1310 0.000029 Yes Yes
MOEL vs. MOEA/D-4000 2408 2642 1 No No MOEL vs. MOEA/D-4000 3393 1657 0.002825 Yes Yes
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-4000 5050 0 0 Yes Yes
Table 13
Statistical results of performance comparisons of MOEL with NSGA-III by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-III-500 4941 109 0 Yes Yes MOEL vs. NSGA-III-500 4326 724 0 Yes Yes
MOEL vs. NSGA-III-1000 4483 567 0 Yes Yes MOEL vs. NSGA-III-1000 3379 1671 0.003303 Yes Yes
MOEL vs. NSGA-III-2000 2494 2556 1 No No MOEL vs. NSGA-III-2000 2492 2558 1 No No
MOEL vs. NSGA-III-4000 1439 3611 1 No No MOEL vs. NSGA-III-4000 1154 3896 1 No No
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-III-500 5050 0 0 Yes Yes MOEL vs. NSGA-III-500 5050 0 0 Yes Yes
MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. NSGA-III-500 5050 0 0 Yes Yes MOEL vs. NSGA-III-500 5050 0 0 Yes Yes
MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes MOEL vs. NSGA-III-1000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes MOEL vs. NSGA-III-2000 5050 0 0 Yes Yes
MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes MOEL vs. NSGA-III-4000 5050 0 0 Yes Yes
4000 iterations and MOEL cost 647, 359, 758, 388, 616 and 86 seconds respectively, as shown in Table 10. This represents runtime
reductions of 86.77%, 76.04%, 88.65%, 77.84%, and 86.04% compared to these methods. The results demonstrate that MOEL is more
efficient than the metaheuristic methods.
In summary, MOEL performs better than the compared metaheuristic methods in terms of solution quality and running time for
solving MOVRPTW. Note that we use the classical crossover and mutation operators in the competitors, and we believe that novel
operators exist that are better suited to solving the MOVRPTW. However, designing such operators is not trivial and requires specific
knowledge from experts. The merit of MOEL is that it can automatically learn heuristics from the data and does not heavily depend on expert knowledge. Meanwhile, once trained, MOEL can obtain the nondominated solutions in a short running time, which makes it more suitable for real-time scenarios than the traditional metaheuristic methods.
Table 14
Statistical results of performance comparisons of MOEL with MOEA/D-D by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-D-500 4693 357 0 Yes Yes MOEL vs. MOEA/D-D-500 4603 447 0 Yes Yes
MOEL vs. MOEA/D-D-1000 3893 1157 0.000003 Yes Yes MOEL vs. MOEA/D-D-1000 4163 887 0 Yes Yes
MOEL vs. MOEA/D-D-2000 3065 1985 0.06311 No Yes MOEL vs. MOEA/D-D-2000 3888 1162 0.000003 Yes Yes
MOEL vs. MOEA/D-D-4000 2617 2433 0.750451 No No MOEL vs. MOEA/D-D-4000 3747 1303 0.000026 Yes Yes
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-500 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-1000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-2000 5050 0 0 Yes Yes
MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes MOEL vs. MOEA/D-D-4000 5050 0 0 Yes Yes
Table 15
Statistical results of performance comparisons of MOEL with MOIA by Wilcoxon’s test.
HV IGD
20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 20 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOIA-500 4702 348 0 Yes Yes MOEL vs. MOIA-500 4547 503 0 Yes Yes
MOEL vs. MOIA-1000 3407 1643 0.002411 Yes Yes MOEL vs. MOIA-1000 3854 1196 0.000005 Yes Yes
MOEL vs. MOIA-2000 1832 3218 1 No No MOEL vs. MOIA-2000 3404 1646 0.002495 Yes Yes
MOEL vs. MOIA-4000 1445 3605 1 No No MOEL vs. MOIA-4000 2788 2262 0.364937 No No
100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 100 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOIA-500 5050 0 0 Yes Yes MOEL vs. MOIA-500 5050 0 0 Yes Yes
MOEL vs. MOIA-1000 5050 0 0 Yes Yes MOEL vs. MOIA-1000 5050 0 0 Yes Yes
MOEL vs. MOIA-2000 5050 0 0 Yes Yes MOEL vs. MOIA-2000 5050 0 0 Yes Yes
MOEL vs. MOIA-4000 5050 0 0 Yes Yes MOEL vs. MOIA-4000 5050 0 0 Yes Yes
150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15 150 customers 𝑅+ 𝑅- 𝑝-value 𝛼 =0.05 𝛼 =0.15
MOEL vs. MOIA-500 5050 0 0 Yes Yes MOEL vs. MOIA-500 5050 0 0 Yes Yes
MOEL vs. MOIA-1000 5050 0 0 Yes Yes MOEL vs. MOIA-1000 5050 0 0 Yes Yes
MOEL vs. MOIA-2000 5050 0 0 Yes Yes MOEL vs. MOIA-2000 5050 0 0 Yes Yes
MOEL vs. MOIA-4000 5050 0 0 Yes Yes MOEL vs. MOIA-4000 5050 0 0 Yes Yes
6. Conclusions and future work

This article focuses on a real-world MOVRPTW whose travel distance and time matrices are asymmetric due to complex traffic
conditions. To solve this problem, a multiobjective edge-based learning algorithm, MOEL, is designed and implemented. The algorithm
follows the encoder-decoder architecture. To explicitly learn the complex edge features, the encoder encodes each edge, including
time and distance information, to produce edge embeddings. The decoder generates feasible solutions based on the edge embeddings.
To handle the multiobjective nature of the problem, the preference of the decision maker is incorporated into the input vectors to
make a single model solve the problem with any preferences. To assess the performance of the proposed MOEL, an experiment is
first conducted to show the effectiveness of the edge-based model. Then, MOEL is compared with three state-of-the-art DRL methods
and five representative metaheuristic methods on the real-world MOVRPTW instances. Experimental results indicate that MOEL
significantly outperforms the competitors on most instances and is especially effective in solving large-scale instances.
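The following sketch illustrates the edge-based encoding idea described above; it is not the actual MOEL implementation, and the layer sizes, feature layout, and the way the preference vector is injected are all assumptions made for illustration.

```python
# Illustrative sketch of the edge-based encoding idea: each directed edge (i, j) carries
# asymmetric distance/time features, and the preference vector is appended to the input so
# that a single model can serve any trade-off. All names and sizes here are assumptions.
import torch
import torch.nn as nn

class EdgePreferenceEncoder(nn.Module):
    def __init__(self, edge_feat_dim=2, pref_dim=2, embed_dim=128):
        super().__init__()
        # Project [distance_ij, time_ij, preference] into the embedding space.
        self.edge_proj = nn.Linear(edge_feat_dim + pref_dim, embed_dim)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, edge_feats, preference):
        # edge_feats: (batch, n, n, edge_feat_dim); entries may be asymmetric, i.e.,
        # edge_feats[:, i, j] != edge_feats[:, j, i]
        # preference: (batch, pref_dim), one objective weight vector per instance
        b, n, _, _ = edge_feats.shape
        pref = preference[:, None, None, :].expand(b, n, n, preference.shape[-1])
        x = torch.cat([edge_feats, pref], dim=-1)
        return self.norm(self.edge_proj(x))          # (batch, n, n, embed_dim)

# Hypothetical usage: 4 instances, a depot plus 20 customers, two features per edge.
encoder = EdgePreferenceEncoder()
edges = torch.rand(4, 21, 21, 2)                     # [distance_ij, time_ij]
prefs = torch.tensor([[0.2, 0.8]]).repeat(4, 1)      # weights for the two objectives
print(encoder(edges, prefs).shape)                   # torch.Size([4, 21, 21, 128])
```

A full model would add attention layers over these embeddings and a decoder that masks moves infeasible with respect to capacity and time windows; the sketch only shows how edge and preference information can enter the input.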
Fig. 11. Nondominated solutions obtained by MOEL and MOIA on an instance of each dataset.

The proposed MOEL has several limitations:
• Unlike node-based models, MOEL must compute embeddings for all edges of a batch of instances, so it requires more computational resources, including GPU time and memory (a rough memory estimate is sketched after this list). For larger-scale problems (e.g., 200 customers), training and deploying the MOEL model is difficult under our available resources (a single RTX 4090 GPU).
• MOEL obtains worse results than MODRL/D-EL and PMOCO on the small-scale instances.
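As a rough, assumption-laden illustration of the first limitation, the snippet below estimates how much memory one layer of edge embeddings occupies compared with node embeddings. The batch size, embedding width, and fp32 storage are assumed values, and attention maps and gradients are not counted.

```python
# Back-of-the-envelope memory estimate (assumptions: batch of 64, 128-d fp32 embeddings,
# a single stored layer, no attention maps or gradients). Edge storage grows with n^2.
def edge_embedding_mb(n_nodes, batch=64, embed_dim=128, bytes_per_value=4):
    return batch * n_nodes ** 2 * embed_dim * bytes_per_value / 1024 ** 2

def node_embedding_mb(n_nodes, batch=64, embed_dim=128, bytes_per_value=4):
    return batch * n_nodes * embed_dim * bytes_per_value / 1024 ** 2

for n in (21, 101, 151, 201):      # depot plus 20/100/150/200 customers
    print(f"n={n:3d}: edge embeddings ~{edge_embedding_mb(n):7.1f} MB, "
          f"node embeddings ~{node_embedding_mb(n):5.1f} MB")
```

Even under these conservative assumptions, the quadratic growth in the number of edges explains why a single 24 GB GPU becomes a bottleneck once the activations and gradients of multiple layers are included.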
Based on these limitations, this work can be extended in several directions. First, divide-and-conquer methods [39] can be adopted to tackle MOVRPTW instances with more customers under limited resources. Second, the edge-based formulation is general and can be used in any algorithm following the encoder-decoder architecture; incorporating it into other algorithms, e.g., MODRL/D-EL and PMOCO, may improve their performance. Third, instance augmentation can be used to enhance the generalization ability of MOEL.
CRediT authorship contribution statement

Ying Zhou: Writing -- original draft, Methodology, Investigation, Conceptualization. Lingjing Kong: Writing -- review & editing, Validation, Software. Hui Wang: Writing -- review & editing, Validation, Funding acquisition.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (62202314, 62072483), the Natural Science Foundation of Guangdong Province of China (2018A0303130055, 2022A1515010417, 2018A030310664), the Shenzhen Fundamental Research Fund under Grant No. 0220820010535001, and the Education Planning Project of Guangdong Province "The Exploration and Practice of Academic Value-Added Evaluation Under Artificial Intelligence Empowerment" (2024GXJK782).
Data availability
The data that support the findings of this study are openly available at https://github.com/wzydeath/VRPTW-data.
References
[1] G.B. Dantzig, J.H. Ramser, The truck dispatching problem, Manag. Sci. 6 (1) (1959) 80--91.
[2] G.D. Konstantakopoulos, S.P. Gayialis, E.P. Kechagias, Vehicle routing problem and related algorithms for logistics distribution: a literature review and classification, Oper. Res. 22 (2022) 2033--2062.
[3] J.K. Lenstra, A.H.G.R. Kan, Complexity of vehicle routing and scheduling problems, Networks 11 (2) (1981) 221--227.
[4] A. Bogyrbayeva, M. Meraliyev, T. Mustakhov, B. Dauletbayev, Machine learning to solve vehicle routing problems: a survey, IEEE Trans. Intell. Transp. Syst.
25 (6) (2024) 4754--4772.
[5] M.T.M. Emmerich, A.H. Deutz, A tutorial on multiobjective optimization: fundamentals and evolutionary methods, Nat. Comput. 17 (2018) 585--609.
[6] K. Deb, K. Sindhya, J. Hakanen, Multi-objective optimizations, in: Decision Sciences, Taylor & Francis Group, 2016, pp. 146--179.
[7] S. Zajac, S. Huber, Objectives and methods in multi-objective routing problems: a survey and classification scheme, Eur. J. Oper. Res. 290 (1) (2021) 1--25.
[8] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182--197.
[9] K. Deb, H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems
with box constraints, IEEE Trans. Evol. Comput. 18 (4) (2014) 577--601.
[10] Q. Zhang, H. Li, MOEA/D: a multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11 (6) (2007) 712--731.
[11] L. Li, Y. Li, Q. Lin, Z. Ming, C.A.C. Coello, A convergence and diversity guided leader selection strategy for many-objective particle swarm optimization, Eng.
Appl. Artif. Intell. 115 (2022) 105249.
[12] L. Li, Y. Li, Q. Lin, S. Liu, J. Zhou, Z. Ming, Neural net-enhanced competitive swarm optimizer for large-scale multiobjective optimization, IEEE Trans. Cybern.
54 (6) (2023) 3502--3515.
[13] L. Li, Q. Lin, S. Liu, D. Gong, C.A.C. Coello, Z. Ming, A novel multi-objective immune algorithm with a decomposition-based clonal selection, Appl. Soft Comput.
81 (2019) 105490.
[14] L. Paquete, T. Schiavinotto, T. Stützle, On local optima in multiobjective combinatorial optimization problems, Ann. Oper. Res. 156 (1) (2007) 83.
[15] K. Li, T. Zhang, R. Wang, Deep reinforcement learning for multiobjective optimization, IEEE Trans. Cybern. 51 (6) (2021) 3103--3114.
[16] H. Wu, J. Wang, Z. Zhang, MODRL/D-AM: multiobjective deep reinforcement learning algorithm using decomposition and attention model for multiobjective optimization, in: Artificial Intelligence Algorithms and Applications, ISICA 2019, 2020, pp. 575--589.
[17] Y. Shao, J.C.-W. Lin, G. Srivastava, D. Guo, H. Zhang, H. Yi, A. Jolfaei, Multi-objective neural evolutionary algorithm for combinatorial optimization problems,
IEEE Trans. Neural Netw. Learn. Syst. 34 (4) (2023) 2133--2143.
[18] Y. Zhang, J. Wang, Z. Zhang, Y. Zhou, MODRL/D-EL: multiobjective deep reinforcement learning with evolutionary learning for multiobjective optimization, in: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1--8.
[19] X. Lin, Z. Yang, Q. Zhang, Pareto set learning for neural multi-objective combinatorial optimization, in: 10th International Conference on Learning Representations
(ICLR 2022), 2022, pp. 1--30.
[20] T. Ye, Z. Zhang, J. Chen, J. Wang, Weight-specific-decoder attention model to solve multiobjective combinatorial optimization problems, in: 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2022, pp. 2839--2844.
[21] L.-Y. Gao, R. Wang, C. Liu, Z.-H. Jia, Multi-objective pointer network for combinatorial optimization, arXiv:2204.11860, 2022.
[22] Z. Wang, S. Yao, G. Li, Q. Zhang, Multiobjective combinatorial optimization using a single deep reinforcement learning model, IEEE Trans. Cybern. 54 (3) (2024)
1984--1996.
[23] Z. Zhang, Z. Wu, H. Zhang, J. Wang, Meta-learning-based deep reinforcement learning for multiobjective optimization problems, IEEE Trans. Neural Netw. Learn.
Syst. 34 (10) (2023) 7978--7991.
[24] J. Chen, J. Wang, Z. Zhang, Z. Cao, T. Ye, S. Chen, Efficient meta neural heuristic for multi-objective combinatorial optimization, in: Advances in Neural Information Processing Systems (NeurIPS 2023), vol. 36, 2023, pp. 1--13.
[25] S. Li, F. Wang, Q. He, X. Wang, Deep reinforcement learning for multi-objective combinatorial optimization: a case study on multi-objective traveling salesman
problem, Swarm Evol. Comput. 83 (2023) 101398.
[26] Y. Zhang, J. Wang, Z. Zhang, Edge-based formulation with graph attention network for practical vehicle routing problem with time windows, in: 2022 International
Joint Conference on Neural Networks (IJCNN), 2022, pp. 1--8.
[27] W. Kool, H. van Hoof, M. Welling, Attention, learn to solve routing problems!, in: ICLR 2019, 2019, pp. 1--25.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems (NIPS 2017), vol. 30, 2017, pp. 1--11.
[29] S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: 32nd International Conference on Machine
Learning, vol. 37, 2015, pp. 448--456.
[30] V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: 27th International Conference on Machine Learning, 2010, pp. 1--8.
[31] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735--1780.
[32] R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn. 8 (1992) 229--256.
[33] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: International Conference on Learning Representations, 2015, pp. 1--15.
[34] Y. Zhou, L. Kong, Y. Cai, Z. Wu, S. Liu, J. Hong, K. Wu, A decomposition-based local search for large-scale many-objective vehicle routing problems with
simultaneous delivery and pickup and time windows, IEEE Syst. J. 14 (4) (2020) 5253--5264.
[35] J.G. Falcón-Cardona, C.A.C. Coello, Indicator-based multi-objective evolutionary algorithms: a comprehensive survey, ACM Comput. Surv. 53 (2) (2021) 1--35.
[36] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and
swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3--18.
[37] H. Scheffé, Experiments with mixtures, J. R. Stat. Soc., Ser. B, Methodol. 20 (2) (1958) 344--360.
[38] K.C. Tan, Y.H. Chew, A hybrid multiobjective evolutionary algorithm for solving vehicle routing problem with time windows, Comput. Optim. Appl. 34 (2006)
115--151.
[39] H. Ye, J. Wang, H. Liang, Z. Cao, Y. Li, F. Li, GLOP: learning global partition and local construction for solving large-scale routing problems in real-time, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, 2024, pp. 20284--20292.