
Engineering Applications of Artificial Intelligence 112 (2022) 104848


A graph convolutional encoder and multi-head attention decoder network for TSP via reinforcement learning

Jia Luo a, Chaofeng Li a,*, Qinqin Fan a, Yuxin Liu b

a Institute of Logistics Science and Engineering, Shanghai Maritime University, 200135, Shanghai, China
b College of Information Engineering, Shanghai Maritime University, 200135, Shanghai, China

ARTICLE INFO

Keywords: TSP; Graph convolutional network; Attention mechanism; Deep reinforcement learning

ABSTRACT

For the traveling salesman problem (TSP), it is usually hard to find a high-quality solution in polynomial time. In the last two years, graph neural networks have emerged as a promising technique for TSP. However, most related learning-based methods do not make full use of hierarchical features, resulting in relatively low performance. Furthermore, the decoder in those methods generates only a single permutation and needs additional search strategies to improve it, which leads to more computing time. In this work, we propose a novel graph convolutional encoder and multi-head attention decoder network (GCE-MAD Net) to fix these two drawbacks. The graph convolutional encoder aggregates neighborhood information through updated edge features and extracts hierarchical graph features from all graph convolutional layers. The multi-head attention decoder takes the first and last selected node embeddings and the fused graph embedding as input to generate the probability distribution over the next unvisited node, so that global features are taken into account. The GCE-MAD Net further allows several nodes to be chosen at each time step and generates a pool of permutations after decoding, which increases the diversity of the solution space. To assess the performance of GCE-MAD Net, we conduct experiments with randomly generated instances. The simulation results show that the proposed GCE-MAD Net outperforms traditional heuristics and existing learning-based algorithms on all evaluation metrics. In particular, when encountering large-scale problem instances, the small-scale pretrained GCE-MAD Net obtains much better solutions than the CPLEX solver in less time.

1. Introduction

Combinatorial Optimization (CO) problems have always gained widespread attention in applied mathematics and operations research, and exist in many real-life industries such as manufacturing, supply chain management, urban transportation, and lately drone routing (Davendra, 2010; MirHassani and Habibi, 2013; Huang et al., 2020; Tran et al., 2020). Although a wide range of research papers present new approaches to this field, it is still a challenge to obtain satisfactory results due to the NP-hardness of those CO problems, especially in practical application scenarios (Paschos, 2014). The Traveling Salesman Problem (TSP) is among the most extensively solved CO problems in practice, and has been studied for its simple problem description (Hromkovič, 2013; Osaba et al., 2020). The state-of-the-art methodologies for TSP can be classified into exact methods, approximation methods, and heuristic methods, which either require too much time to compute or are not mathematically well defined. Exact methods can find the optimal solution with a theoretical guarantee, e.g., the branch-and-bound (B&B) framework (Wang et al., 2012; Subramanyam and Gounaris, 2016; Kinable et al., 2017), but perform poorly on large-scale routing problems because of their exponential complexity in the worst case. For some specific problems, approximation methods (Williamson and Shmoys, 2011) can find sub-optimal solutions with provable worst-case guarantees in polynomial time, but may still suffer from poor approximation ratios (Rego et al., 2011). Although heuristics can find satisfactory results within reasonable computational time, they lack a theoretical guarantee on the solution quality, require substantial trial-and-error, and depend heavily on the intuition and experience of human experts to improve solution quality (Khan and Maiti, 2019; Pandiri and Singh, 2019; Ebadinezhad, 2020; Al-Gaphari et al., 2021; Saji and Barkatou, 2021).

To make a better trade-off between solution quality and solving time for TSP, learning-based methods have been investigated and achieve performance competitive with the above non-learning-based methods (Bengio et al., 2021). The first challenge is how to introduce a learning-based algorithm for TSP. Vinyals et al. (2015b), for the first time, proposed a Ptr-Net for TSP with Recurrent Neural Networks (RNNs), which is trained by supervised learning and achieved significant improvement over non-learning-based methods in computing time. However, it is hard to obtain the label data when the

∗ Corresponding author.
E-mail address: [email protected] (C. Li).

https://doi.org/10.1016/j.engappai.2022.104848
Received 18 September 2021; Received in revised form 20 January 2022; Accepted 22 March 2022
Available online 7 April 2022
0952-1976/© 2022 Elsevier Ltd. All rights reserved.

number of nodes becomes large. To ease the difficulty of training the model without label data, Bello et al. (2016) further applied reinforcement learning to train the Ptr-Net for TSP. Although RL pretraining updated the model parameters with the actor–critic algorithm (Mnih et al., 2015), it failed to utilize the graph-structured features of TSP instances, which can be embedded in node representations for different downstream tasks by graph embedding or network embedding techniques (Goyal and Ferrara, 2018). As a result, the pretrained model cannot make full use of node features.

What is more, the same or similar actions would be taken according to the policy at decoding steps. Multiple decoders can generate different subsequences, which contributes to better complete solutions. However, the existing learning-based methods, e.g. AM (Kool et al., 2018), S2V-DQN (Dai et al., 2017), and Ptr-Net (Vinyals et al., 2015b), do not pursue more sequences when executing the decoding strategy. Although Joshi et al. (2019) proposed a model which can output TSP solutions in one shot, it needs additional procedures, such as beam search, to generate reasonable solutions once the training period is finished. Those additional procedures are used to find the optimal solution based on the same pretrained model, which also reduces the diversity of the solution space.

To tackle the issues and limitations above, we propose a novel graph convolutional encoder and multi-head attention decoder network (GCE-MAD Net) that extracts hierarchical features from the original TSP graph input and decodes multiple sequences to increase the diversity of the solution space. The encoder is based on a GCN with node and edge features as input. The node features are 2-dimensional city coordinates, and the edge features are binary elements indicating whether two cities are connected. The outputs of the encoder are then passed into a decoder using the attention mechanism (Vaswani et al., 2017) to predict the probability distribution over unselected nodes. A multiple decoders scheme is utilized to increase the diversity of the solution space at each timestep; specifically, each decoder generates a TSP solution and the optimal solution is selected from those solutions. The entire encoder–decoder network is trained by an improved REINFORCE (Williams, 1992) algorithm.

We propose a novel graph convolutional encoder and multi-head attention decoder network (GCE-MAD Net) for TSP. The contributions of this work are as follows:

(1) We propose a graph convolutional network as an encoder to aggregate the neighbor features of each node. The node and edge features affect each other: the relative weight between two neighboring nodes is computed from the edge features, and the edge features are updated through the two connected node features. Furthermore, shallow features from the original graph input are preserved through residual blocks, so that each node can aggregate features from all graph convolutional layers.

(2) We propose a multiple decoders strategy which can generate several complete sequences at once. The probability distribution for selecting the next node is calculated by the multi-head attention based decoder, which takes the graph features, the first selected node features and the last selected node features as input. Furthermore, the multiple decoders scheme selects several nodes at each time step to produce several complete sequences, which increases the diversity of the solution space.

(3) A tailored deep reinforcement learning algorithm is designed to train the GCE-MAD Net. During training, the baseline is fixed until a stronger baseline appears. The new baseline is the minimal cost of the several solutions generated by the multiple decoders scheme at each epoch. This baseline update policy ensures that GCE-MAD Net is always improved over itself.

(4) Our experiments show that the GCE-MAD Net is efficient and has stronger generalization ability than state-of-the-art learning-based algorithms.

This paper is organized as follows. First, related work on tackling CO problems with machine learning based algorithms is discussed in Section 2. Section 3 gives the definition of TSP and the Markov Decision Process of the GCE-MAD Net. Section 4 introduces the proposed GCE-MAD Net in detail. Section 5 introduces the training method of the proposed deep reinforcement learning approach. We evaluate GCE-MAD Net on TSP instances and compare it against state-of-the-art machine learning methods, the optimal solver CPLEX, and traditional heuristics in Section 6. Finally, conclusions and prospects are listed in Section 7.

2. Related work

Existing methods for TSP mainly include exact algorithms, approximation methods, and heuristic methods. Those methods can be regarded as model-based methods (Wu et al., 2019; Ali et al., 2020; Al-Gaphari et al., 2021; Kanna et al., 2021; Saji and Barkatou, 2021; Wang and Han, 2021), which need to build a MIP model first and only fit a specific instance. The details of those model-based methods for TSP are summarized in Davendra (2010). Here, we focus on the emerging learning-based methods (Bengio et al., 2021; Li et al., 2021; Talbi, 2021), which have achieved dramatic advantages over conventional model-based methods in solution quality and computing time.

The recent success of applying deep learning to solve CO problems can be traced back to the pointer network (Ptr-Net) (Vinyals et al., 2015b), which treated three challenging CO problems as sequence-to-sequence problems and overcame the drawback that the output length depends on the input by means of a pointer. Ptr-Net is a variant of the attention mechanism and uses attention as a probability distribution, called a 'pointer', to select an element of the input sequence as the output. Ptr-Net is trained by supervised learning and the ground-truth output permutations are given by the Concorde solver. Because Ptr-Net is sensitive to the quality of the label data, which is expensive to obtain, Bello et al. (2016) created an actor–critic reinforcement learning based algorithm, in which Ptr-Net is the actor network and three other network modules constitute the critic network. Although the Ptr-Net architecture can learn good solutions for CO problems, it does not reflect the graph structure of CO problems. The original network (Vinyals et al., 2015b) is designed for NLP, not for TSP, so it has the limitation of neglecting the permutation invariance of the input cities. The work of Nazari (2018) presented a permutation-invariant encoder to let the network learn invariance to the input order. Kool et al. (2018) did not use positional encoding in the Transformer (Vaswani et al., 2017), and produced node embeddings which were invariant to the input order.

Considering that there is no unique representation of a TSP graph, Graph Neural Networks (GNNs) have the potential to play the role of an encoder because of their permutation-invariance and sparsity-awareness (Wu et al., 2020; Zhou, 2020). Dai et al. (2017) encode CO problems with a structure2vec graph embedding network and construct solutions incrementally. Replacing the structure2vec graph embedding model, graph convolutional networks (GCNs) (Duvenaud et al., 2015; Defferrard et al., 2016; Gehring et al., 2017; Marcheggiani and Titov, 2017; Chen et al., 2018; Li et al., 2018) play an important role in encoding node representations for estimating the likelihood of a node being part of the optimal solution. Deudon et al. (2018) took a graph attention network as the encoder to aggregate the neighbor features of each node, and utilized the same pointer network for selecting the node inserted into the subtour. The Sinkhorn Policy Gradient (SPG) algorithm was proposed to learn policies on permutation matrices; one Sinkhorn layer following a GRU (Cho et al., 2014) was used to produce continuous relaxations of permutation matrices.

Different from the above deep reinforcement learning methods, some works obtain the parameters of the GCN encoder for generating node embeddings for the downstream task by supervised learning, i.e., a large number of ground-truth output permutations are needed in advance to optimize the parameters. The graph learning network (GLN) directly learned the patterns of generated TSP solutions; in a certain sense, this model was trained on ground-truth circuits (Nammouchi et al., 2020). Joshi et al. (2019) took a GCN as the encoder to aggregate neighbor features,


and utilized an MLP to output the heatmap of possible connected edges of each node in one shot. Although both methods were trained in a supervised way, additional strategies for searching for the optimal solution, e.g. beam search, still increase the computing time.

The above learning-based methods have achieved competitive performance on solving TSP, but most of them seldom reflect hierarchical features from the original graph input. Motivated by this, a residual graph convolutional network is used to extract and fuse features from all layers. Furthermore, a single decoder in the existing learning-based methods easily makes similar decisions when choosing the next node, so we design a multiple decoders scheme to obtain various probability distributions at each time step, which improves solution quality and enhances generalization ability.

3. Problem definition

In this paper, we focus on solving any random instance s of the symmetric two-dimensional Euclidean TSP, formulated as an undirected graph G(V, E). In the graph, V = {1, 2, ..., n} (with |V| = n) represents the set of nodes, and E = {e_11, e_12, ..., e_nn} denotes the set of edges, where e_ij describes the relationship between nodes i and j. x_i ∈ R^2 is a vector representing the coordinate of node i. Given the coordinates of all nodes in the graph, X = (x_1, ..., x_n), we wish to find the optimal permutation π = (π_1, ..., π_n) with minimal tour length R. The elements π_t ∈ V of permutation π, selected at each time step t ∈ {1, ..., n}, are the visiting orders of the nodes in the graph. A feasible permutation π must satisfy two conditions: (1) each node is served exactly once; (2) each node can only be served once, i.e., π_t ≠ π_{t'}, ∀t ≠ t'. The tour length of a feasible permutation π is defined as Eq. (1).

R(\pi) = \| x_{\pi_1} - x_{\pi_n} \|_2 + \sum_{t=2}^{n} \| x_{\pi_t} - x_{\pi_{t-1}} \|_2        (1)

Given a random TSP instance s, a stochastic policy p_θ(π | s) used to generate a permutation π = (π_1, ..., π_n) is defined as:

p_\theta(\pi \mid s) = \prod_{t=1}^{n} p_\theta(\pi_t \mid s, \pi_{1:t-1})        (2)

here, θ represents the parameters to be learned.

The iterative process of selecting nodes is modeled as the following Markov Decision Process (MDP).

(1) Observation π_{1:t} represents the generated subtour π_{1:t} = (π_1, ..., π_t) at time step t ∈ {1, ..., n}.
(2) Action a_t defines one node to be inserted into the subtour at time step t ∈ {1, ..., n}.
(3) Transition function l(π_{1:t}, a_t) converts observation π_{1:t} to π_{1:t+1}, i.e., π_{1:t+1} = l(π_{1:t}, a_t).
(4) The reward function is defined as Eq. (3).

r_t = r(\pi_{1:t}, a_t, \pi_{1:t+1}) = R(\pi_{1:t})        (3)
embedding’, which are calculated by Eqs. (6) and (7).
4. Proposed GCE-MAD Net

The details of the proposed GCE-MAD Net are explained in the following subsections in terms of the encoder architecture and the decoder architecture. As visualized in Fig. 1, the architecture of the GCE-MAD Net follows the so-called encoder–decoder paradigm. In the figure, the GCN block consists of several graph convolutional layers, which are stacked on top of each other. After the node features are aggregated by the last graph convolutional layer, a mean pooling strategy is used to generate a graph-level representation from the node embeddings. At each time step, in the multi-head attention layer, the tailored query comes from the graph-level representation and the node representations of the first selected node and the last selected node. The keys and values come from the node representations. The query in the single-head attention layer is the output of the multi-head attention layer, and the keys also come from the node representations. The output of the single-head attention layer is the compatibility of the query with all nodes, and it is used to obtain the probability distribution for picking the next unvisited node through the softmax function. It is notable that the multiple decoder strategy in this paper allows several nodes to be selected for different permutations at a time, and the optimal solution comes from those permutations.

Fig. 1. The framework of proposed GCE-MAD Net.

4.1. The encoder

Our encoder is based on a GCN which exploits neural network operations over graph-structured data. For TSP instances, the input of the encoder consists of two parts: node and edge features. The node feature x_i ∈ [0, 1]^2 is a vector representing the 2-dimensional coordinate of the i-th node. Meanwhile, the edge feature is a binary element indicating whether nodes i and j are connected, and is defined by Eq. (4).

e_{ij} = \begin{cases} 1, & \text{if node } i \text{ connects with node } j \\ 0, & \text{otherwise} \end{cases}        (4)

Because pairwise computation over all nodes is intractable when generalizing the model to large-scale problem instances, k-nearest neighbors is adopted to make the input graph sparse. Specifically, the number of neighbors of each node, N_e, is computed by Eq. (5), which allows information to diffuse with the same number of message steps in different graph sizes.

N_e = n \times k\%        (5)

where k is a hyperparameter that is a multiple of 10.
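One possible implementation of the sparsification in Eqs. (4)-(5) is sketched below; it builds the binary edge features as a k%-nearest-neighbour adjacency matrix. The function name and the exact handling of self-loops are our assumptions, not taken from the paper's code.

```python
import torch

def knn_adjacency(coords: torch.Tensor, k_percent: int = 50) -> torch.Tensor:
    """Eqs. (4)-(5): binary edge features e_ij, keeping the n*k% nearest neighbours per node."""
    n = coords.shape[0]
    num_neighbors = max(1, int(n * k_percent / 100))           # N_e = n x k%
    dist = torch.cdist(coords, coords)                         # pairwise Euclidean distances
    idx = dist.topk(num_neighbors + 1, largest=False).indices  # +1: each node is its own nearest point
    adj = torch.zeros(n, n)
    adj.scatter_(1, idx, 1.0)
    adj.fill_diagonal_(0.0)                                    # drop self-loops
    return adj

adj = knn_adjacency(torch.rand(20, 2), k_percent=50)           # TSP20 with the 50% setting of Section 6.3
```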


The input node and edge features are first embedded into h-dimensional features through two fully connected layers; this operation is called 'primitive embedding' and is represented by Eqs. (6) and (7).

h_i^0 = A_0 x_i + b_0, \quad \forall i \in \{1, \dots, n\}        (6)

e_{ij}^0 = A_1 e_{ij} + b_1, \quad \forall j \in \{1, \dots, n\}        (7)

where A_0 ∈ R^{h×2} and A_1 ∈ R^h are learnable weight parameters, and b_0 ∈ R^h and b_1 ∈ R^h are the bias parameters. The primitive node representation h_i^0 and edge representation e_{ij}^0 are passed into the first graph convolutional layer of the encoder. In the remainder of this section, h_i^l and e_{ij}^l denote the node and edge representations of graph convolutional layer l ∈ {1, ..., L} in the encoder, respectively, and both representations are alternately updated as in Fig. 2.

Fig. 2 describes a single graph convolutional layer that updates the node representations. The graph convolutional layer is regarded as a message passing process in which information is passed from one node to one of its neighbors with a certain probability. The probabilities are computed from the additional edge representations, and they sum to one over all neighbors of node i. A residual connection is also applied to memorize information across each graph convolutional layer (Bresson and Laurent, 2017). As a result, the node states of the next graph convolutional layer are derived by Eqs. (8) and (9), where h_i^0 and e_{ij}^0 are the outputs of the 'primitive embedding' calculated by Eqs. (6) and (7).

h_i^l = h_i^{l-1} + \mathrm{ReLU}\Big(\mathrm{BN}\Big(W_1^{l-1} h_i^{l-1} + \sum_{j \in N(i)} \eta_{ij}^{l-1} \odot W_2^{l-1} h_j^{l-1}\Big)\Big), \quad \eta_{ij}^{l-1} = \frac{\sigma(e_{ij}^{l-1})}{\sum_{j \in N(i)} \sigma(e_{ij}^{l-1})}        (8)

e_{ij}^l = e_{ij}^{l-1} + \mathrm{ReLU}\Big(\mathrm{BN}\Big(W_3^{l-1} h_i^{l-1} + W_4^{l-1} h_j^{l-1} + W_5^{l-1} e_{ij}^{l-1}\Big)\Big), \quad j \in N(i)        (9)

where W_1^{l-1}, W_2^{l-1}, W_3^{l-1}, W_4^{l-1} and W_5^{l-1} are R^{h×h} weight matrices to be learned, η_{ij}^{l-1} is a weight function computing the relative weight between two neighboring nodes, σ(·) is the sigmoid function, ReLU(·) is the rectified linear unit, and BN(·) stands for batch normalization.
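The layer update of Eqs. (8)-(9) can be written compactly in PyTorch. The module below is a minimal dense-adjacency sketch of our reading of those equations (class and tensor names are illustrative, not the released implementation); adj is the k-nearest-neighbour mask produced by Eq. (5).

```python
import torch
import torch.nn as nn

class ResidualGatedGCNLayer(nn.Module):
    """One graph convolutional layer of the encoder: edge-gated neighbourhood
    aggregation with residual connections, following Eqs. (8)-(9)."""
    def __init__(self, h: int):
        super().__init__()
        self.W1, self.W2 = nn.Linear(h, h), nn.Linear(h, h)
        self.W3, self.W4, self.W5 = nn.Linear(h, h), nn.Linear(h, h), nn.Linear(h, h)
        self.bn_h, self.bn_e = nn.BatchNorm1d(h), nn.BatchNorm1d(h)

    def forward(self, h, e, adj):
        # h: (n, h) node states, e: (n, n, h) edge states, adj: (n, n) 0/1 neighbour mask
        n, d = h.shape
        gate = torch.sigmoid(e) * adj.unsqueeze(-1)              # sigma(e_ij), non-neighbours zeroed
        eta = gate / (gate.sum(dim=1, keepdim=True) + 1e-10)     # weights sum to one over N(i), Eq. (8)
        agg = (eta * self.W2(h).unsqueeze(0)).sum(dim=1)         # sum_j eta_ij * W2 h_j
        h_out = h + torch.relu(self.bn_h(self.W1(h) + agg))      # residual node update, Eq. (8)
        e_in = self.W3(h).unsqueeze(1) + self.W4(h).unsqueeze(0) + self.W5(e)
        e_out = e + torch.relu(self.bn_e(e_in.view(-1, d)).view(n, n, d))  # residual edge update, Eq. (9)
        return h_out, e_out
```

Stacking L such layers and mean-pooling the final node states h_i^L would then give the graph-level representation used by the decoder, as described next.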

Because stacking multiple graph convolutional layers can capture long-range dependencies among the graph-structured data, the encoder adopts more than one graph convolutional layer; its structure is depicted in Fig. 3. Firstly, the primitive embedding module receives the node features x_i ∈ R^2 and edge features e_ij ∈ R and projects them from low dimension to dimension h. Next, those h-dimensional representations are passed to the graph convolutional layers to update the node representations. Specifically, the l-th graph convolutional layer takes the output of the (l−1)-th graph convolutional layer as input. Finally, the output of the L-th graph convolutional layer is the output of the encoder, namely the updated node representations h_i^L of dimension h.

4.2. The decoder

The multiple decoders scheme consists of more than one decoder with identical structures but unshared parameters, and the decoding procedure of a single decoder follows Kool et al. (2018). Let m ∈ {1, ..., M} be the index of a decoder (the superscript (m) in the following denotes the index of the decoder); every decoder constructs a solution π^m sequentially. To produce probabilities of visiting each valid node at timestep t ∈ {1, ..., T} efficiently, a tailored context node is designed to gather messages from the node representations h_i^L. The context vector c_t^m can be viewed as the embedding of this tailored context node and is defined formally as follows:

c_t^m = \begin{cases} \mathrm{concat}(\bar{h}, v_1^m, v_2^m), & t = 1 \\ \mathrm{concat}(\bar{h}, h^L_{\pi^m_{t-1}}, h^L_{\pi^m_1}), & t > 1 \end{cases}        (10)

here concat(·) is the horizontal concatenation operator. The context vector c_t^m ∈ R^{3h} is calculated by concatenating the mean of the node embeddings \bar{h} = (1/n) \sum_{i=1}^{n} h_i^L, the embedding of the last selected node h^L_{\pi^m_{t-1}}, and the embedding of the first selected node h^L_{\pi^m_1}. For t = 1, we use learned parameters v_1^m ∈ R^h and v_2^m ∈ R^h as input placeholders of each decoder.

Following the multi-head attention mechanism (Vaswani et al., 2017), a query and a set of key–value pairs are needed to map the output.

q_t^m, k_i^m, v_i^m = W_Q^m c_t^m, W_K^m h_i^L, W_V^m h_i^L        (11)
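The lines below sketch how one decoder m could assemble the context vector of Eq. (10) and the projections of Eq. (11); the variable names and the single-instance setting are illustrative assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn

def context_vector(h_L, first_idx, last_idx, v1, v2, t):
    """Eq. (10): concat(mean node embedding, last selected node, first selected node)."""
    h_bar = h_L.mean(dim=0)                          # graph-level representation, (h,)
    if t == 1:
        return torch.cat([h_bar, v1, v2])            # learned placeholders at the first step
    return torch.cat([h_bar, h_L[last_idx], h_L[first_idx]])

h_dim, n = 128, 20
h_L = torch.randn(n, h_dim)                          # encoder output h_i^L
W_Q = nn.Linear(3 * h_dim, h_dim, bias=False)        # W_Q^m: maps the 3h context to the query space
W_K = nn.Linear(h_dim, h_dim, bias=False)
W_V = nn.Linear(h_dim, h_dim, bias=False)
v1, v2 = torch.randn(h_dim), torch.randn(h_dim)      # placeholders v_1^m, v_2^m

c_1 = context_vector(h_L, first_idx=0, last_idx=0, v1=v1, v2=v2, t=1)
q_1, K, V = W_Q(c_1), W_K(h_L), W_V(h_L)             # Eq. (11); K and V are computed once per instance
```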


Fig. 2. Updating procedure of node representations (Graph Convolutional Layer).

Fig. 3. The structure of the encoder.

here, the parameter W_Q^m is an R^{k×3h} matrix and W_K^m is an R^{k×h} matrix, with k = h/F, where F is the number of heads, and W_V^m is an R^{v×h} matrix. In this work, the keys k_i^m ∈ R^k and values v_i^m ∈ R^v remain unchanged during the decoding period, so the keys and values of each decoder are computed by Eq. (11) only once, while a new query q_t^m is computed from the context vector at every timestep t.

The compatibility u_{j,t}^m ∈ R of query q_t^m with all nodes is computed according to Eq. (12). For TSP, nodes are masked once they have been visited, i.e., u_{j,t}^m = −∞ at time step t:

u_{j,t}^m = \begin{cases} \dfrac{(q_t^m)^T k_j^m}{\sqrt{k}}, & \text{if } j \neq \pi_{t'}^m, \ \forall t' \neq t \\ -\infty, & \text{otherwise} \end{cases}        (12)

Next, we obtain the vector a_t^m through a convex combination of the messages v_j^m at timestep t:

a_t^m = \sum_{j} \mathrm{softmax}(u_{j,t}^m)\, v_j^m        (13)

As we take the multi-head attention mechanism, let us denote the above result vector by a_t^{m,f} for f ∈ {1, 2, ..., F}. A learnable matrix W_O^{m,f} ∈ R^{h×v} projects it back to a vector c_t^{m*} ∈ R^h. The final multi-head attention value for the context node is calculated as Eq. (14).

c_t^{m*} = \sum_{f=1}^{F} W_O^{m,f} a_t^{m,f}        (14)

Finally, to compute the probability distribution, a layer with a single attention head is added on top of the multi-head layer. Following Bello et al. (2016), we clip the compatibilities within [−D, D] using tanh:

u_{j,t}^m = \begin{cases} D \cdot \tanh\!\Big(\dfrac{(r_t^m)^T g_j^m}{\sqrt{k}}\Big), & \text{if } j \neq \pi_{t'}^m, \ \forall t' \neq t \\ -\infty, & \text{otherwise} \end{cases}        (15)

r_t^m, g_j^m = W_R^m c_t^{m*}, W_G^m h_j^L        (16)

here, the parameters W_R^m and W_G^m are R^{h×h} matrices because a single head is used, i.e., k = h/F with F = 1.

Fig. 4. The decoding procedure for TSP.

Fig. 5. The structure of the decoder.

The compatibilities computed by Eq. (15), interpreted as unnormalized log-probabilities, are used to compute the probability vector p^m via the softmax function. The elements of p^m are defined by Eq. (17).

p_{i,t}^m = p_\theta(\pi_t^m = i \mid s, \pi_{1:t-1}^m) = \mathrm{softmax}(u_{i,t}^m)        (17)

The process by which a single decoder picks nodes is described in Fig. 4. At each time step t, the multi-head attention layer takes the context vector c_t^m and the node embeddings h_i^L as input, and outputs the final multi-head attention value c_t^{m*} for the context node. Then, the single-head attention layer takes the vector c_t^{m*} and the node embeddings h_i^L as input to generate the compatibilities. The probability distribution for selecting the next valid node is derived by Eq. (17).

Fig. 5 shows the structure of the decoder; the 'Decoder' module in it has been described in Fig. 4. Each decoder generates a permutation, and the final solution is selected from those permutations.

5. Training method

For a random input instance s, each decoder individually samples a trajectory π^m to obtain a separate REINFORCE loss with the same greedy rollout baseline. For training the model, we define the sum of the expectations of the cost R(π^m) (tour length computed as in Eq. (1)) as the loss function, presented in Eq. (18).

loss(\theta \mid s) = \sum_{m} \mathbb{E}_{p_\theta(\pi^m \mid s)}\big[ R(\pi^m) \big]        (18)

where the parameters θ are optimized by gradient descent using the REINFORCE algorithm with rollout baseline b(s).

\nabla_{\theta} loss(\theta \mid s) = \sum_{m} \mathbb{E}_{p_\theta(\pi^m \mid s)}\big[ (R(\pi^m) - b(s)) \nabla_{\theta} \log p_\theta(\pi^m \mid s) \big]        (19)

The model with the best set of parameters among the previous epochs is used as the baseline model, which greedily decodes the result serving as the baseline b(s). The value of the baseline b(s) in this work is the minimal cost defined in Eq. (20).

b(s) = \min_{m} R\big(\pi^m = (\pi_1^m, \dots, \pi_n^m) \mid s\big)        (20)

\pi_t^m = \arg\max_{i} p_\theta\big(\pi_t^m = i \mid s, \pi_{1:t-1}^m\big)        (21)

All steps are listed as Algorithm 1.
𝑚


(3) Fail rate. The fail rate r_fail is the proportion of failed cases over the total number of tested cases N, as defined in Eq. (24). A case is considered failed when its tour length computed by a method is longer than the tour length computed by CPLEX.

r_{fail} = N_{fail} / N        (24)

where N_fail is the number of failed cases among the tested instances.

(4) Optimality gap. The average optimality gap between the optimization solver CPLEX and our model is computed as Eq. (25).

Gap = \frac{1}{N} \sum_{i=1}^{N} \Big( \frac{R_i'}{R_i} - 1 \Big)        (25)

here, N denotes the number of instances, R_i' is the solution delivered by our model, and R_i is the IBM CPLEX result.

(5) Evaluation time. The inference time of our model shown in Tables 2–4 is measured on a single Nvidia Geforce 3090 GPU. The times of CPLEX and the other heuristic solutions are obtained on an Intel(R) Core(TM) i9-10900 CPU @ 2.80 GHz.
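For completeness, these metrics reduce to a few lines of code; the helper below is an illustrative sketch assuming paired per-instance tour lengths for our model and for CPLEX.

```python
import numpy as np

def evaluation_metrics(model_lengths, cplex_lengths):
    """Eqs. (22)-(25): mean tour length, win/fail rates against CPLEX, and optimality gap."""
    R_model = np.asarray(model_lengths, dtype=float)   # R'_i, tour lengths from our model
    R_cplex = np.asarray(cplex_lengths, dtype=float)   # R_i, tour lengths from CPLEX
    return {
        "avg_length": R_model.mean(),                  # Eq. (22)
        "win_rate": (R_model < R_cplex).mean(),        # Eq. (23)
        "fail_rate": (R_model > R_cplex).mean(),       # Eq. (24)
        "gap": (R_model / R_cplex - 1.0).mean(),       # Eq. (25)
    }

print(evaluation_metrics([3.8, 4.0, 3.9], [3.9, 3.9, 3.9]))
```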
6.2. Dataset generation

Training data. As described in the problem statement in Section 3, the 2D coordinates of each node are the only information that should be known in advance. Following the way of generating datasets in Kool et al. (2018), for each node we generate its coordinates by sampling uniformly at random in the unit square [0, 1] × [0, 1], i.e., problem instances consist of a set of 2D coordinates. It is notable that each coordinate value retains 16 decimal places, which allows 10^16 × 10^16 different nodes in the unit square. The parameters of the GCE-MAD Net are trained for 100 epochs on a training dataset which is generated on the fly. In each training epoch, 2500 batches of 512 instances are processed.

Validation and test data. Validation and test instances are generated separately in advance with the same data generator.

6.3. Training setting

The proposed approach is implemented using PyTorch with CUDA acceleration; specifically, we use a Nvidia Geforce 3090 GPU with 24 GB of memory to complete the training, construct the model with Torch 1.7.0, and implement the code in Python 3.6.

Following Kool et al. (2018) and Joshi et al. (2019), our model is trained in an end-to-end manner using Adam, with a constant learning rate of 1 × 10^{-4}, a batch size of 512, and 100 epochs. We initialize the weights of our model using the uniform initializer U(−1/√d, 1/√d), where d is the input dimension. The number of graph convolutional layers is 3, the node and edge embedding dimensions are both set to 128, and the graph diameter of each node equals 50%, which leads to a good trade-off between solution quality and computational complexity. The number of heads in the multi-head attention decoder is 8, and the number of decoders equals 5. Finally, the significance level in the t-test is 0.05.

Owing to the limited GPU memory, datasets of two problem sizes (20 and 50) generated on the fly are used to train the parameters of the GCE-MAD Net, which yields two parameter sets of the GCE-MAD Net. For evaluating their performance, test datasets of three problem sizes (20, 50 and 100) are generated. For readability, the details of those datasets are listed in Table 1. The models pretrained on the TSP20_Training and TSP50_Training sets are named Model_TSP20 and Model_TSP50, respectively.

Table 1
The details of the data sets.

Data             Name                Number
Training set     TSP20_Training      1,280,000
                 TSP50_Training      1,280,000
Validation set   TSP20_Validation    1280
                 TSP50_Validation    1280
Testing set      TSP20_Testing       1280
                 TSP50_Testing       1280
                 TSP100_Testing      1280

Table 2
Comparison on instances with 20 customers.

Method                               R̄             r_win    r_fail    Gap       t (s)
MIP             CPLEX                3.89 ± 0.32    –        –         0.00%     18.12
Heuristic       Nearest neighbor     4.52 ± 0.03    2.81%    97.19%    16.13%    2.78
                Nearest insertion    4.34 ± 0.02    2.03%    97.97%    12.03%    0
                Random insertion     4.02 ± 0.02    24.77%   75.23%    3.32%     0
                Farthest insertion   3.94 ± 0.02    39.22%   60.78%    1.30%     0
Learning-based  AM                   3.86 ± 0.31    59.69%   26.40%    −0.71%    0.07
algorithm       GCN                  3.93 ± 0.33    31.32%   61.88%    1.24%     0.09
                GLN-TSP              3.85           –        –         –         –
Our             GCE-MAD Net          3.85 ± 0.31    71.41%   12.73%    −0.91%    0.23

Table 3
Comparison on instances with 50 customers.

Method                               R̄             r_win    r_fail    Gap       t (s)
MIP             CPLEX                5.76 ± 0.27    –        –         0.00%     2.78
Heuristic       Nearest neighbor     7.00 ± 0.03    0.08%    99.92%    21.50%    0
                Nearest insertion    6.78 ± 0.02    0        100%      17.84%    0
                Random insertion     6.13 ± 0.02    2.9%     97.1%     6.48%     0
                Farthest insertion   6.01 ± 0.02    9.92%    90.08%    4.36%     0.17
Learning-based  AM                   5.78 ± 0.28    39.38%   59.3%     0.35%     0.25
algorithm       GCN                  6.08 ± 0.32    19.92%   80.08%    4.86%     –
                GLN-TSP              5.85           –        –         –         0.70
Our             GCE-MAD Net          5.73 ± 0.27    58.75%   40.31%    −0.40%    2.78

Table 4
Comparison on instances with 100 customers.

Method                               R̄             r_win    r_fail    Gap       t (s)
MIP             CPLEX                9.01 ± 0.73    –        –         0.00%     121.17
Heuristic       Nearest neighbor     9.67 ± 0.03    21.48%   78.52%    7.94%     2.68
                Nearest insertion    9.45 ± 0.02    23.59%   76.41%    5.54%     0.01
                Random insertion     8.52 ± 0.02    73.52%   26.48%    −4.93%    0
                Farthest insertion   8.35 ± 0.02    81.80%   18.20%    −6.77%    0
Learning-based  AM                   8.14 ± 0.28    93.60%   6.40%     −9.19%    0.26
algorithm       GCN                  8.78 ± 0.34    57.81%   42.19%    −1.98%    0.60
                GLN-TSP              –              –        –         –         –
Our             GCE-MAD Net          8.04 ± 0.27    96.33%   3.77%     −10.27%   0.28
6.4. Sensitivity analysis

In this section, we analyze the sensitivity of four parameters that could greatly influence the proposed model, i.e., the number of decoders, the number of attention heads in each decoder, the embedding dimension in the GCN block, and the number of layers in the encoder. The line plots in Fig. 6 show how the average tour length changes when we change those parameters in our model. As we aim to find the optimal solution with minimal cost (tour length), the line plots should always present a downward trend.

From Fig. 6(a), we find that the embedding dimension in the encoder can greatly influence our model. Useful information from the customer features may be neglected in a low-dimensional space, which explains the improvement with increased embedding dimension. However, little change happens when the dimension is increased from 128 to 256, because the node and edge features of TSP are easy to represent.

Similarly, we assess the effect of the number of embedding layers in the encoder, as shown in Fig. 6(b). The improvement from additional layers reflects that GCNs aggregate and pass messages by stacking layers.

Next, we retrain our model with the number of decoders set to 1, 2, 3, 4 and 5. The performance curves are depicted in Fig. 6(c). Obviously, better solutions can be found with more decoders, especially when increasing the number of decoders from 1 to 2.

Finally, the evaluation curves for multiple attention heads are plotted in Fig. 6(d). More attention heads allow the context node to receive more types of messages from all nodes. Through those messages, we can select the next node to insert into the solution, so more heads improve the solution quality.

6.5. Baselines

The approaches selected for the performance comparison include two state-of-the-art deep learning methods (Kool et al., 2018; Joshi et al., 2019), the heuristics used for comparison in Kool et al. (2018), and the commonly-used solver CPLEX. The details of these approaches are introduced as follows.

(1) Nearest neighbor

The nearest neighbor heuristic initializes a path with a single random node (we always start with the first node in the input). In each iteration, the next node selected is the one nearest to the end node of the current partial path; the node selected in this iteration becomes the new end node. Finally, after all nodes are selected, the end node is connected to the start node to form a tour.

(2) Nearest/random/farthest insertion

The insertion heuristics initialize a tour with two nodes. In each iteration, one node is selected to be inserted into the tour according to some rule; different rules generate different insertion heuristics. Let S be the set of nodes in the tour and d_ij the distance from node i to node j. Nearest insertion inserts the node i that is nearest to any node in the tour:

i^* = \arg\min_{i \notin S} \min_{j \in S} d_{ij}        (26)

Farthest insertion inserts the node i whose distance to its nearest tour node j is maximal:

i^* = \arg\max_{i \notin S} \min_{j \in S} d_{ij}        (27)

Random insertion inserts a node randomly.
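The baselines above are straightforward to reproduce. The sketch below is one possible implementation, where the choice of insertion position (cheapest insertion) is our assumption, since Eqs. (26)-(27) only specify how the next node is selected.

```python
import numpy as np

def nearest_neighbor_tour(coords: np.ndarray) -> list:
    """Nearest neighbor: repeatedly append the closest unvisited node, starting from node 0."""
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    tour, remaining = [0], set(range(1, len(coords)))
    while remaining:
        nxt = min(remaining, key=lambda j: dist[tour[-1], j])
        tour.append(nxt)
        remaining.remove(nxt)
    return tour                      # closed implicitly by the edge back to node 0

def insertion_tour(coords: np.ndarray, rule: str = "farthest") -> list:
    """Insertion heuristics: pick the next node by Eq. (26) ('nearest') or Eq. (27) ('farthest'),
    then insert it at the position that increases the tour length the least."""
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    tour, remaining = [0, 1], set(range(2, len(coords)))
    while remaining:
        score = {i: min(dist[i, j] for j in tour) for i in remaining}
        pick = (min if rule == "nearest" else max)(score, key=score.get)
        best = min(range(len(tour)), key=lambda p: dist[tour[p], pick]
                   + dist[pick, tour[(p + 1) % len(tour)]]
                   - dist[tour[p], tour[(p + 1) % len(tour)]])
        tour.insert(best + 1, pick)
        remaining.remove(pick)
    return tour
```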


6.5. Baselines
6.6. Comparison with state-of-the-art methods
The approaches selected for the performance comparison include
two state-of-the-art deep learning methods (Kool et al., 2018; Joshi Tables 2–4 show the predicted tour length of each method under
et al., 2019), two heuristics used to compare in Kool et al. (2018) different instance sizes and each method is run on the same test dataset.


Fig. 6. Sensitivity analysis in GCE-MAD Net parameters.

For a fair comparison, we take the pretrained models from Kool et al. (2018) and GCN (Joshi et al., 2019), and both models have the same 100 trial runs. The results of GLN-TSP are the same as in the original paper (Nammouchi et al., 2020). As shown in Tables 2–4, the results on 20 and 50 customers are obtained with Model_TSP20 and Model_TSP50, respectively. Note that our result on 100 customers is also obtained with Model_TSP50. As we can see, our GCE-MAD Net achieves the minimal average tour length on the test data among the classical heuristics and learning-based algorithms. It also outperforms CPLEX, especially on the instances with 100 customers.

To compare the performance of the heuristic methods and CPLEX on the 1,280 test instances more clearly, besides the optimality gap we also report the win rate r_win and fail rate r_fail. When solving TSP20_Testing and TSP50_Testing, our GCE-MAD Net finds better solutions than CPLEX on over 50% of the instances. It is notable that the GCE-MAD Net outperforms CPLEX on nearly 100% of the TSP100_Testing instances, which demonstrates that our GCE-MAD Net keeps a good competitive performance with increasing problem complexity.

Fig. 7 shows how the farthest insertion, the AM (Kool et al., 2018), and the GCE-MAD Net compare with CPLEX when solving TSP20_Testing, TSP50_Testing, and TSP100_Testing. The coordinates of a blue point represent the tour length computed by the comparison method and the tour length computed by CPLEX. If a blue point is under the red line (y = x), CPLEX has found a better solution than the comparison method, and the opposite situation can be understood in the same way. Fig. 7 (a), (b), and (c) compare the three methods with CPLEX on TSP20_Testing; as we can see, our GCE-MAD Net finds better solutions than CPLEX in most cases, and only a few blue points are under the red line. Fig. 7 (d), (e), and (f) compare the three methods with CPLEX on TSP50_Testing; the farthest insertion behaves badly, because, as Fig. 7 (d) shows, almost all blue points are under the red line. In contrast, the two learning-based methods perform well, and the GCE-MAD Net outperforms the AM (Kool et al., 2018). Fig. 7 (g), (h), and (i) compare the three methods with CPLEX on TSP100_Testing. CPLEX performs worse than the three comparison methods in most cases, which also demonstrates that the traditional solver CPLEX is powerless on large-scale problems, and our GCE-MAD Net is a good alternative.

6.7. Impact of graph density

In this paper, we use the fixed graph diameter strategy defined by Eq. (5) to give different graph sizes different graph densities, i.e., the number of neighbors of each node varies with the graph size. Fig. 8 shows the change with different graph density. The pretrained Model_TSP20 and Model_TSP50 with different graph densities are tested on three hold-out datasets: TSP20_Testing, TSP50_Testing, and TSP100_Testing. From Fig. 8 we can see that, for both the pretrained Model_TSP20 and Model_TSP50, performance is better with a sparse graph of 40%-nearest neighbors than with the full graph, i.e., having no neighbors or too many neighbors decreases the performance of the pretrained model.

Fig. 7. The comparison of three methods (the farthest insertion, the AM (Kool et al., 2018), and our GCE-MAD Net) with the CPLEX on TSP20_Testing (Fig. 7 (a), (b), and (c)),
TSP50_Testing (Fig. 7 (d), (e), and (f)), and TSP100_Testing (Fig. 7 (g), (h), and (i)).

Fig. 8. The impact of different graph diameter.


Fig. 9. Visualization of random samples for ablation study.


6.8. Ablation study

In order to show the efficiency of the GCE-MAD Net, two integral parts of this network need to be tested: (1) the residual block in the encoder, and (2) the multi-decoder strategy. Three models with the corresponding part absent are trained, and their performance on the same test dataset is compared to the complete model (the GCE-MAD Net). Details of those four models are listed in Table 5. It is notable that the four networks have the same graph diameter k = 40%, are pretrained on TSP20, and are then tested on TSP20, TSP50 and TSP100. The baselines for those problem sizes are obtained by model 1st without the residual block and multi-decoder, which are very poor (the average tour lengths are 3.867 ± 0.315, 5.941 ± 0.316 and 9.055 ± 0.392, respectively). Then, either the residual block or the multi-decoder is added to the baseline, resulting in 3.865 ± 0.314, 5.929 ± 0.313, 9.028 ± 0.415 and 3.854 ± 0.310, 5.852 ± 0.297, 8.641 ± 0.333, respectively (models 2nd and 3rd in Table 5), which validates that each component can efficiently improve the performance of the baseline. Finally, we add both components to the baseline, resulting in 3.850 ± 0.309, 5.832 ± 0.292 and 8.620 ± 0.336 (model 4th in Table 5). It can be concluded that the two components together perform better than only one component. We also visualize random samples of the three problem sizes in Fig. 9. The 4th model (the GCE-MAD Net) performs the best, which demonstrates the effectiveness and benefits of the residual block and multi-decoder strategy.

Table 5
Ablation investigation of residual block and multi-decoder.

Model                         1st             2nd             3rd             4th
Residual block                ✗               ✓               ✗               ✓
Multi-decoder                 ✗               ✗               ✓               ✓
Average tour length (20)      3.867 ± 0.315   3.865 ± 0.314   3.854 ± 0.310   3.850 ± 0.309
Average tour length (50)      5.941 ± 0.316   5.929 ± 0.313   5.852 ± 0.297   5.832 ± 0.292
Average tour length (100)     9.055 ± 0.392   9.028 ± 0.415   8.641 ± 0.333   8.620 ± 0.336

6.9. Generalization ability of GCE-MAD Net

In the real world, delivery requests may come from more than 50 customers, but training on larger graphs from scratch is intractable and sample-inefficient, so it is very important that a small-scale pretrained model generalizes well to large-scale problem instances. We demonstrate the generalization ability of the proposed model in Fig. 10. A hold-out test set of 124,160 TSP instances is used, consisting of 1280 instances each of TSP3, TSP4, ..., TSP100. Model_TSP20 and Model_TSP50 are two pretrained models trained on graphs of 20 and 50 nodes, respectively, and are used to evaluate instances of variable sizes (from 3 to 100). The optimality gaps on those instances are shown in Fig. 10. As it is difficult for CPLEX to find the best solution of large-scale instances, the gap oscillates back and forth for graph sizes from 70 to 100. When the graph size is from 4 to 40, Model_TSP20 and Model_TSP50 both outperform CPLEX. Starting from graph size 40, Model_TSP50 still outperforms CPLEX. In general, we can conclude that the GCE-MAD Net pretrained on a larger graph size achieves better generalization ability.

Fig. 10. The generalization performance.

7. Conclusion and future direction

In this research, a novel deep reinforcement learning based encoder–decoder framework called GCE-MAD Net is proposed to solve TSP. We propose a graph convolutional network to aggregate the neighbor features of each node through edge features and to fuse features from all graph convolutional layers. The attention decoder selects the next city according to graph-level features, which is beneficial for taking an action from a global view. Furthermore, the multiple decoders strategy in this work yields several solutions at one time, which diversifies the solution space. The computational experiments verify that the proposed GCE-MAD Net performs better than the state-of-the-art deep reinforcement learning based algorithms, and is also superior to the CPLEX optimizer and traditional heuristics. Our findings suggest that deep reinforcement learning based algorithms are successful at solving CO problems. Unlike some traditional heuristics, our model performs well in both solution quality and efficiency. Moreover, well-trained models can be used offline to find optimal solutions in very little time, even for instances with variable numbers of customers.

In future research, extending the model to solve very large-scale TSP with a huge number of customers is of great interest. A more challenging task is to propose deep reinforcement learning based algorithms to solve more realistic problems, e.g., TSP with time windows, dynamic demands, and so on.

CRediT authorship contribution statement

Jia Luo: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Visualization, Writing – original draft. Chaofeng Li: Conceptualization, Validation, Supervision, Project administration, Formal analysis, Writing – review & editing. Qinqin Fan: Formal analysis, Writing – review & editing. Yuxin Liu: Investigation, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by grants from the National Natural Science Foundation of China (No. 62176150) and the Shanghai Sailing Program, China (No. 21YF1416700). Thanks to Hideyuki Takagi for his valuable comments and discussions on this work.

Appendix. Solutions of variable sizes

Fig. 11 shows instance solutions for TSP with 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 nodes obtained by GCE-MAD Net, compared with the optimal solutions found by CPLEX. CPLEX tends to add the node nearest to the current node to the partial route, which leads to a higher cost than GCE-MAD Net.

Fig. 11. Random TSP instances of variable sizes are solved by CPLEX (left column) and our model (right column).





References

Al-Gaphari, G.H., Al-Amry, R., Al-Nuzaili, A.S., 2021. Discrete crow-inspired algorithms for traveling salesman problem. Eng. Appl. Artif. Intell. 97, 104006.
Ali, I.M., Essam, D., Kasmarik, K., 2020. A novel design of differential evolution for solving discrete traveling salesman problems. Swarm Evol. Comput. 52, 100607.
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S., 2016. Neural combinatorial optimization with reinforcement learning. In: International Conference on Learning Representations. San Juan.
Bengio, Y., Lodi, A., Prouvost, A., 2021. Machine learning for combinatorial optimization: a methodological tour d'Horizon. European J. Oper. Res. 290, 405–421.
Bresson, X., Laurent, T., 2017. Residual gated graph ConvNets. arXiv preprint arXiv:1711.07553.
Chen, J., Ma, T., Xiao, C., 2018. FastGCN: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247.
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: EMNLP.
Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., Song, L., 2017. Learning combinatorial optimization algorithms over graphs. In: Advances in Neural Information Processing Systems, vol. 30. Long Beach, CA, pp. 6348–6358.
Davendra, D., 2010. Traveling Salesman Problem: Theory and Applications. BoD–Books on Demand.
Defferrard, M., Bresson, X., Vandergheynst, P., 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, vol. 29. Barcelona, Spain, pp. 3844–3852.
Deudon, M., Cournut, P., Lacoste, A., Adulyasak, Y., Rousseau, L., 2018. Learning heuristics for the TSP by policy gradient. In: International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Delft, The Netherlands, pp. 170–181.
Duvenaud, D.K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., Adams, R.P., 2015. Convolutional networks on graphs for learning molecular fingerprints. In: Advances in Neural Information Processing Systems, vol. 28. pp. 2224–2232.
Ebadinezhad, S., 2020. DEACO: Adopting dynamic evaporation strategy to enhance ACO algorithm for the traveling salesman problem. Eng. Appl. Artif. Intell. 92, 103649.
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N., 2017. Convolutional sequence to sequence learning. In: International Conference on Machine Learning, pp. 1243–1252.
Goyal, P., Ferrara, E., 2018. Graph embedding techniques, applications, and performance: A survey. Knowl.-Based Syst. 151, 78–94.
Hromkovič, J., 2013. Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics. Springer Science & Business Media.
Huang, H., Savkin, A.V., Huang, C., 2020. A new parcel delivery system with drones and a public train. J. Intell. Robot. Syst. 100 (3), 31341–31354.
Joshi, C.K., Laurent, T., Bresson, X., 2019. An efficient graph convolutional network technique for the travelling salesman problem. arXiv:1906.01227.
Kanna, S.K.R., Sivakumar, K., Lingaraj, N., 2021. Development of deer hunting linked earthworm optimization algorithm for solving large scale traveling salesman problem. Knowl.-Based Syst. 227, 107199.
Khan, I., Maiti, M.K., 2019. A swap sequence based artificial bee colony algorithm for traveling salesman problem. Swarm Evol. Comput. 44, 428–438.
Kinable, J., Smeulders, B., Delcour, E., Spieksma, F.C.R., 2017. Exact algorithms for the equitable traveling salesman problem. European J. Oper. Res. 261 (2), 475–485.
Kool, W., van Hoof, H., Welling, M., 2018. Attention, learn to solve routing problems! In: International Conference on Learning Representations. Vancouver, BC.
Li, Q., Han, Z., Wu, X.-M., 2018. Deeper insights into graph convolutional networks for semi-supervised learning. In: Thirty-Second AAAI Conference on Artificial Intelligence.
Li, W., Wang, G.-G., Gandomi, A., 2021. A survey of learning-based intelligent optimization algorithms. In: Archives of Computational Methods in Engineering. pp. 1–19.
Marcheggiani, D., Titov, I., 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In: EMNLP.
MirHassani, S.A., Habibi, F., 2013. Solution approaches to the course timetabling problem. 39 (2), 133–149.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., 2015. Human-level control through deep reinforcement learning. Nature 518 (7540), 529–533.
Nammouchi, A., Ghazzai, H., Massoud, Y., 2020. A generative graph method to solve the travelling salesman problem. In: IEEE 63rd International Midwest Symposium on Circuits and Systems. pp. 89–92.
Nazari, M., 2018. Reinforcement learning for solving the vehicle routing problem. In: Advances in Neural Information Processing Systems. pp. 9839–9849.
Osaba, E., Yang, X.-S., Del Ser, J., 2020. Traveling Salesman Problem: A Perspective Review of Recent Research and New Results with Bio-Inspired Metaheuristics. pp. 135–164.
Pandiri, V., Singh, A., 2019. An artificial bee colony algorithm with variable degree of perturbation for the generalized covering traveling salesman problem. Appl. Soft Comput. 78, 481–495.
Paschos, V.T., 2014. Applications of Combinatorial Optimization, vol. 3. John Wiley & Sons.
Rego, C., Gamboa, D., Glover, F., Osterman, C., 2011. Traveling salesman problem heuristics: Leading methods, implementations and latest advances. European J. Oper. Res. 211 (3), 427–441.
Saji, Y., Barkatou, M., 2021. A discrete bat algorithm based on Lévy flights for Euclidean traveling salesman problem. Expert Syst. Appl. 172, 114639.
Subramanyam, A., Gounaris, C.E., 2016. A branch-and-cut framework for the consistent traveling salesman problem. European J. Oper. Res. 248 (2), 384–395.
Talbi, E.-G., 2021. Machine learning into metaheuristics: A survey and taxonomy. ACM Comput. Surv. 54 (6), 1–32.
Tran, D.-D., Vafaeipour, M., El Baghdadi, M., Barrero, R., Van Mierlo, J., Hegazy, O., 2020. Thorough state-of-the-art analysis of electric and hybrid vehicle powertrains: Topologies and integrated energy management strategies. Renew. Sustain. Energy Rev. 119, 109596.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need. In: Advances in Neural Information Processing Systems. Long Beach, CA, pp. 5998–6008.
Vinyals, O., Fortunato, M., Jaitly, N., 2015b. Pointer networks. In: Advances in Neural Information Processing Systems, vol. 28. Montréal, Canada, pp. 2692–2700.
Wang, Y., Han, Z., 2021. Ant colony optimization for traveling salesman problem based on parameters optimization. Appl. Soft Comput. 107, 107439.
Wang, Z., Zhang, Y., Zhou, W., Liu, H., 2012. Solving traveling salesman problem in the Adleman–Lipton model. Appl. Math. Comput. 219 (4), 2267–2270.
Williams, R.J., 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8 (3), 229–256.
Williamson, D.P., Shmoys, D.B., 2011. The Design of Approximation Algorithms. Cambridge University Press.
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Philip, S.Y., 2020. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32 (1), 4–24.
Wu, J., Muren, Zhou, L., Du, Z., Lv, Y., 2019. Mixed steepest descent algorithm for the traveling salesman problem and application in air logistics. Transp. Res. E 126, 87–102.
Zhou, J., 2020. Graph neural networks: A review of methods and applications. AI Open 1, 57–81.
