Abstract—Surrogate-assisted evolutionary algorithms (SAEAs) have emerged as a promising approach to addressing expensive and black-box problems. Most existing SAEAs leverage regression models to predict the objective values, reducing the use of the true objective functions. However, these methods focus on learning the mapping from the decision space to the objective space and may fail to reveal the relationships between solutions in the decision space. Recently, graph neural networks (GNNs) have attracted increased attention due to their powerful ability to expose sample interactions. In this paper, we propose employing a graph neural network to learn embeddings of solutions in the decision space, followed by a classification task aimed at predicting dominance relationships and a regression task for obtaining the estimated fitness values. To this end, we generate a graph at each generation, where nodes represent solutions and edges are added depending on the Euclidean distances between nodes. In addition, a new acquisition function that adaptively weights the predictions of objective values and dominance relationships is proposed to effectively identify new samples. The performance of the proposed method is examined by extensive empirical studies on a widely used test suite in comparison with its peer algorithms, and the results confirm the effectiveness of the proposed method.

Index Terms—Expensive optimization, Surrogate model, Graph neural network, Acquisition function.

I. INTRODUCTION

Expensive multi-objective optimization problems arise in many real-world applications [1], [2], such as feature selection [3], robust optimization of large-scale networks [4], and neural architecture search [5]. Taking neural architecture search as an example, evaluating an architecture requires training and testing on datasets, which is computationally expensive [6]. Therefore, optimizing a neural architecture is challenging for optimization algorithms, as the number of affordable evaluations is limited.

Multi-objective evolutionary algorithms (MOEAs) have been widely used for solving multi-objective optimization problems, as they can obtain a set of Pareto optimal solutions in a single run. In general, MOEAs generate an offspring population from the previous population and select better solutions, based on their fitness values, to survive to the next generation. However, MOEAs generally require a large number of objective evaluations to find a set of non-dominated solutions, which makes them inapplicable to expensive multi-objective optimization problems. Therefore, surrogate-assisted evolutionary algorithms (SAEAs) have been proposed, which construct surrogate models to approximate the objective functions or to learn the dominance relationship. Based on the predictions provided by the surrogates, promising candidate solutions can be selected for evaluation using the true objective functions, thereby reducing the number of true objective evaluations and improving search efficiency [6].

Many SAEAs have been proposed based on different types of cheap surrogate models, such as Kriging models [7], [8], random forests [9], radial basis functions (RBFs) [10], and neural networks [11]. These models often take the decision vector as the input and its corresponding objective values as the output, and generally learn a regression model to efficiently approximate expensive objective functions. Alternatively, classification models have been adopted in multi-objective SAEAs to identify whether a candidate solution is non-dominated. Classification models are also often used in constrained problems [9] to determine whether a solution is feasible, using k-nearest neighbors [12] or support vector machines [13], [14], to name a few. Moreover, ensemble models have been studied to take advantage of different surrogate models with the help of model management strategies [15], [16]. After constructing surrogate models, the next step is model management, i.e., collaboratively using the surrogate models and the true objective functions, efficiently updating the surrogates, and effectively guiding the search. For example, acquisition functions (AFs) in Bayesian optimization are designed to strike a balance between exploration and exploitation; candidate solutions that are promising for improving optimization performance are obtained by optimizing the acquisition function. Commonly used AFs include the expected improvement [17] and the lower confidence bound (LCB) [2].

Most surrogate models, whether for regression or classification, pay little attention to the relationships between solutions in the decision space. However, exploiting the topology of solutions in the decision space is crucial for further enhancing prediction accuracy, because the relative positions of solutions in the decision space can reflect their corresponding positions in the objective space, particularly in continuous optimization. A function f is continuous at x0 if for every ϵ > 0 there exists a δ > 0 such that for all x in the domain:

|x − x0| < δ implies |f(x) − f(x0)| < ϵ. (1)

Furthermore, when the function exhibits Lipschitz continuity, its rate of change is bounded by a real constant. In other words, the difference in objective function values is expected to be
relatively small if the two solutions are neighbors in the decision space. Therefore, exploring the topological structure of the decision space should help better approximate the functional relationship between the decision variables and the objectives. Fortunately, this concept can be realized through graph neural networks, given their adeptness at capturing relationships among nodes within a graph [18].

Different from other machine learning methods, such as multilayer perceptrons (MLPs) [19], convolutional neural networks [20], and recurrent neural networks [21], GNNs are capable of dealing with irregular data structures and have been applied in many real-world applications [22], including chemical reaction prediction [23], traffic state prediction [24], and social recommendation [25]. The distinct characteristic of GNNs is that they update nodes by aggregating and combining information (i.e., the embeddings) from their neighboring nodes through edges, taking graphs as the input [26], demonstrating remarkable power and flexibility in revealing the topology within graphs.

In this paper, we propose a GNN-assisted evolutionary algorithm, called GNNAEA, for solving expensive multi-objective optimization problems (EMOPs). Inspired by Lipschitz continuity, a graph is constructed depending on the Euclidean distances between solutions, which is expected to contain implicit information about the objective space. A GNN model is employed to generate embeddings for the downstream classification and regression tasks. Based on the predictions of the classification and regression models, a new acquisition function is proposed to effectively identify new samples. The main contributions of this work are summarized as follows.

• To better represent the distance relationships between solutions in the decision space, a graph is constructed wherein each node represents a solution. Edges are added if the Euclidean distance between two nodes is below a predefined threshold among all Euclidean distances.
• Taking the generated graph as its input, a graph neural network is applied to learn the representation of the solutions in the decision space, followed by two MLPs designed to predict the approximate fitness values (treated as a regression task) and the domination relationship (a classification task), respectively. After training, this surrogate model can generalize the learned knowledge to unseen graphs during the evolutionary process.
• A new acquisition function (AF) is proposed based on the predictions of the objective values and the dominance relationship. First, an MOEA optimizes the predicted objective values for a fixed number of generations. Afterward, the final optimized population is evaluated by the AF, which adaptively weights the objective value prediction and the dominance relationship prediction, and from which the new samples are selected.

The remainder of this paper is organized as follows. Section II briefly introduces the definition of Pareto fronts, surrogate models, and graph neural networks. Section III describes the proposed method GNNAEA in detail. In Section IV, we present the results of comparative experiments on benchmark problems, as well as the sensitivity analysis and ablation studies. Section V summarizes this paper and suggests promising future research directions.

II. RELATED WORK

A. Pareto Optimal Fronts

Multi-objective optimization problems (MOPs) aim to simultaneously optimize multiple objective functions that often conflict with each other. Therefore, there is no single solution that is optimal for all functions. Instead, the target is to obtain a set of solutions forming the Pareto optimal set, where every element x∗ in the set satisfies:

∀i: fi(x∗) ≤ fi(x), ∃j: fj(x∗) < fj(x), (2)

where x represents any solution not included in the Pareto optimal set, i, j ∈ {1, 2, . . . , M}, and M is the number of objective functions. The objective values of the elements in the Pareto optimal set constitute the Pareto optimal front.
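For illustration, the following is a minimal Python/NumPy sketch (an illustrative addition rather than a component of any algorithm discussed here) of the dominance relation in Eq. (2) for minimization; it also derives the 0/1 dominated/non-dominated labels of the kind used by the classification task in Section III.

```python
# Minimal sketch of Pareto dominance for minimization, matching Eq. (2):
# a dominates b if a is no worse in every objective and strictly better
# in at least one.
import numpy as np

def dominates(a, b):
    return np.all(a <= b) and np.any(a < b)

def nondominated_labels(Y):
    """Label each row of the objective matrix Y (one solution per row):
    1 = non-dominated, 0 = dominated by some other solution."""
    n = len(Y)
    labels = np.ones(n, dtype=int)
    for i in range(n):
        if any(dominates(Y[j], Y[i]) for j in range(n) if j != i):
            labels[i] = 0
    return labels
```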
B. Surrogate Models

Surrogate models are often used in evolutionary algorithms to reduce the time or cost of real fitness evaluations [27]. In general, surrogate models are trained before the evolutionary process, using the collected data, and are used to evaluate fitness values during the evolutionary process. In online data-driven optimization, the models can be updated and fine-tuned by leveraging the real fitness evaluations of newly generated solutions.

The accuracy of surrogate models is essential to the overall performance of MOEAs, and some simple yet efficient surrogate models have been studied over the past decades. The second-order polynomial response surface (PRS) [28] is often used in low-dimensional optimization problems, due to its simple structure. The Kriging model is a statistical method that gives the best linear unbiased prediction, as well as its uncertainty, at unsampled locations [29]. Radial basis function networks [30] can be considered machine learning-based surrogate models, which contain several weighted basis functions. The basic idea of using RBFs is that the predicted objective values should be closely aligned for solutions that share similar values of the decision variables. However, RBFs and GNNs use the Euclidean distance in fundamentally different ways: RBFs emphasize the Euclidean distance between samples and basis function centers, while GNNs focus on capturing the relationships between samples.

C. Graph Neural Networks

Over the past decade, graph and node embedding methods have gained popularity for solving graph-based problems, with numerous promising approaches proposed to explore and reveal structural information within graphs, such as DeepWalk [31] and Node2Vec [32]. These methods often employ random walks around nodes to gain insights within a local area.
Fig. 1: The framework of the proposed method GNNAEA.
However, they face limitations in capturing generalized patterns within a single graph or across multiple graphs. In recent years, graph neural networks have attracted increasing attention due to their learning ability on graphs. These networks are typically categorized into spectral-based and spatial-based approaches. The former aims to design new graph filters based on graph signal processing theory [26], while the latter focuses on introducing novel message-passing schemes [24].

Here, we give a brief introduction to the basic idea of the spectral-based GNNs that we utilize in this work. In general, filtering an input signal x with a graph filter g is expressed as

x ∗G g = F^(−1)(F(x) ⊙ F(g)), (3)

where ⊙ represents the element-wise product, and F(x) = U^T x and F^(−1)(x̂) = U x̂ are the graph Fourier transform and the inverse graph Fourier transform, respectively. U is obtained from L = I_n − D^(−1/2) A D^(−1/2) = U Λ U^T, where A is the adjacency matrix of the graph, D is the diagonal matrix of node degrees, and Λ and U are the diagonal matrix of eigenvalues and the corresponding matrix of eigenvectors of L.
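To make Eq. (3) concrete, the following self-contained NumPy demo filters a node signal on a toy graph; the four-node graph and the low-pass filter g(λ) = 1/(1 + λ) are arbitrary illustrative choices, not components of GNNAEA.

```python
# Demo of Eq. (3): build the normalized Laplacian L = I - D^(-1/2) A D^(-1/2),
# eigendecompose it as L = U diag(lam) U^T, and apply a spectral filter
# to a node signal x.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix of a toy graph
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

lam, U = np.linalg.eigh(L)                  # L = U diag(lam) U^T
x = np.array([1.0, 0.0, 0.0, 0.0])          # signal on the nodes

x_hat = U.T @ x                             # graph Fourier transform F(x)
g_hat = 1.0 / (1.0 + lam)                   # filter in the spectral domain
x_filtered = U @ (g_hat * x_hat)            # inverse transform, Eq. (3)
print(x_filtered)
```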
III. THE PROPOSED METHOD

A. The Overall Framework

The overall framework of the proposed GNNAEA is shown in Fig. 1. First, Latin hypercube sampling is applied to sample 11D − 1 points in the decision space, to be evaluated by real fitness evaluations, where D is the dimension of the decision space. The training dataset comprises pairs, each associating a solution with its real fitness values and dominance relationship. Within the surrogate model, the initial step involves generating a graph based on the Euclidean distances among the training data. This graph is then used as the input of the GNN block to obtain a richer representation by revealing the graph structure and the relationships between nodes. The embeddings of solutions obtained from the GNN block are subsequently utilized by two MLPs, denoted as MLP_C for learning the dominance relationship and MLP_R for fitting the fitness values of the solutions, respectively.

On the other hand, the acquisition function is used to evaluate solutions generated by the evolutionary algorithm NSGA-II [33] and to choose u solutions for real evaluations after several iterations. In the evolutionary process, the population is first initialized and is considered as a parent population to generate offspring by crossover and mutation. The generated offspring population is then processed by the surrogate model to obtain the approximate fitness values. The new parent population is obtained after environmental selection if w < w_max; otherwise, the population is evaluated by the AF and the u most promising solutions are output for real evaluations.

After obtaining solutions through real evaluations, the surrogate model is updated. Finally, all solutions in the training dataset are output as the final result once the number of real fitness evaluations FE reaches the predefined value FE_max.
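This control flow can be summarized by the following minimal, runnable Python sketch. The GNN surrogate, the NSGA-II variation and selection operators, and the classifier output are replaced here by trivial stand-ins (a nearest-sample lookup, Gaussian perturbation with a crude objective-sum selection, and a random p0), so only the bookkeeping of GNNAEA is illustrated; the settings (11D − 1 initial samples, w_max surrogate-guided generations, u true evaluations per cycle, budget FE_max) follow the description above, while every stand-in is an assumption.

```python
import numpy as np
from scipy.stats import qmc

D, M, POP, FE_max, w_max, u = 10, 3, 300, 20, 3, 3
POP = 100

def true_eval(X):                       # expensive objectives (toy stand-in)
    return np.stack([((X - i / M) ** 2).sum(axis=1) for i in range(M)], axis=1)

rng = np.random.default_rng(0)
X = qmc.LatinHypercube(d=D, seed=0).random(11 * D - 1)   # initial design
Y = true_eval(X)
FE = len(X)

while FE < FE_max:
    # (1) "train" the surrogate on (X, Y); stubbed as nearest-sample lookup
    def surrogate(P):
        d2 = ((P[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        return Y[d2.argmin(axis=1)]
    # (2) evolve a population for w_max generations on surrogate values
    pop = rng.random((POP, D))
    for _ in range(w_max):
        off = np.clip(pop + 0.05 * rng.standard_normal(pop.shape), 0, 1)
        both = np.vstack([pop, off])
        keep = surrogate(both).sum(axis=1).argsort()[:POP]  # crude selection
        pop = both[keep]
    # (3) score the final population with the AF; truly evaluate u solutions
    alpha = FE / FE_max                                     # Eq. (9)
    score = alpha * surrogate(pop).mean(axis=1) - (1 - alpha) * rng.random(POP)
    new = pop[score.argsort()[:u]]
    X, Y, FE = np.vstack([X, new]), np.vstack([Y, true_eval(new)]), FE + u
```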
B. A GNN-assisted Model

1) Graph Generation: The graph is constructed depending on the Euclidean distances among solutions. The Euclidean distance matrix E ∈ R^(N×N) is first calculated, where E_ij represents the Euclidean distance between solutions i and j, with N being the number of solutions. Then, the adjacency matrix A ∈ R^(N×N) is derived by setting the elements of E whose values are greater than the threshold to zero and the rest to one. In other words, two nodes (solutions) are connected if their Euclidean distance is less than the specified threshold. The threshold is determined as the value below which a ratio r_c of the distance values falls; for example, the threshold is the lower quartile when r_c = 0.25. This concept is motivated by the idea that solutions positioned closely together in the decision space tend to exhibit similarities and can thus benefit from learning from each other.
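A NumPy sketch of this step is given below; computing the r_c-quantile over the off-diagonal distances only is an implementation detail we assume.

```python
# Connect two solutions when their Euclidean distance falls below the
# r_c-quantile of all pairwise distances (r_c = 0.25: the lower quartile).
import numpy as np

def build_adjacency(X, r_c=0.25):
    diff = X[:, None, :] - X[None, :, :]
    E = np.sqrt((diff ** 2).sum(axis=2))        # Euclidean distance matrix
    iu = np.triu_indices(len(X), k=1)           # off-diagonal (upper) entries
    threshold = np.quantile(E[iu], r_c)         # r_c-quantile of distances
    A = (E < threshold).astype(float)
    np.fill_diagonal(A, 0.0)                    # no self-loops at this stage
    return A
```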
2) GNN Block: Here, we apply the classic spectral-based GNN model, GCN [26], to process the generated graph and obtain representations containing the relationships between nodes.
We assume that each row of the solution matrix X ∈ R^(N×D) represents one solution, where D is the dimensionality of the decision space. The k-th layer of GCN can be expressed as

H^(k) = ReLU(D̃^(−1/2) Ã D̃^(−1/2) H^(k−1) W^(k)), (4)

where Ã = A + I_n and D̃ is the diagonal degree matrix of Ã. W^(k) ∈ R^(D′×D′) is the learnable weight matrix in the k-th layer (k ∈ Z^+), where D′ is the dimensionality of the embeddings. H^(0) = X, and W^(1) ∈ R^(D×D′). From Equation (4), we can see that the i-th solution is updated by its neighboring solutions through D̃^(−1/2) Ã D̃^(−1/2).
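Equation (4) translates directly into code. The following NumPy sketch uses random weight matrices in place of the learned W^(k):

```python
# GCN layer of Eq. (4): symmetric normalization of the self-loop-augmented
# adjacency matrix, feature propagation, then ReLU.
import numpy as np

def gcn_layer(H, A, W):
    A_tilde = A + np.eye(A.shape[0])            # A~ = A + I_n
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D~^(-1/2)
    return np.maximum(0.0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

rng = np.random.default_rng(0)
N, D, D_prime = 8, 10, 16
X = rng.random((N, D))                          # one solution per row
A = (rng.random((N, N)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops
H1 = gcn_layer(X, A, rng.standard_normal((D, D_prime)))         # W^(1)
H2 = gcn_layer(H1, A, rng.standard_normal((D_prime, D_prime)))  # W^(2)
```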
3) MLPs for Regression and Classification: The final node embedding matrix H^(K) is obtained from K hidden layers and contains the information extracted from each node and its neighboring nodes. The downstream tasks are regression, for fitting the real fitness values, and classification, for distinguishing the non-dominated solutions from the dominated ones, facilitated by two MLPs.

Applying MLP_R, the estimated fitness values are obtained through the equation:

F_e = σ(H^(K) × W_e), (5)

where σ represents the sigmoid function and W_e ∈ R^(D′×M). On the other hand, the classification task is conducted by MLP_C as follows:

L_e = δ(H^(K) × W_c), (6)

where δ is the softmax function and W_c ∈ R^(D′×2). Since the softmax function is applied as the activation function, the two dimensions of L_e represent the probabilities of a solution being grouped into class 0 (dominated) and class 1 (non-dominated), respectively.
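The two heads of Eqs. (5) and (6) amount to single linear layers with sigmoid and softmax activations; a sketch with random placeholder weights standing in for the learned W_e and W_c:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, D_prime, M = 8, 16, 3
H_K = rng.random((N, D_prime))               # final node embeddings
W_e = rng.standard_normal((D_prime, M))
W_c = rng.standard_normal((D_prime, 2))

F_e = sigmoid(H_K @ W_e)   # Eq. (5): estimated (normalized) fitness values
L_e = softmax(H_K @ W_c)   # Eq. (6): per row, [P(dominated), P(non-dominated)]
```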
4) Training Method: For the regression task, the loss function f_r is the mean squared error (MSE), which aims to minimize the distance between the estimated fitness values and the normalized real fitness values. On the other hand, the cross-entropy function f_c is applied to the classification task. Therefore, the final loss function f is obtained as follows:

f = f_c + 5 × min{f_c, f_r}. (7)

To avoid overfitting, only part of the training data (a ratio r_s of the number of training data) is sampled for generating the graph. The AdamW optimizer with a weight decay of 0.01 is applied to optimize and update the weights in the GNN model. The learning rate is set to 0.005, and 20 epochs are used for one round of training.
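A sketch of Eq. (7), assuming the MSE is averaged over all entries of F_e and the cross-entropy is computed from the softmax output L_e against the 0/1 dominance labels:

```python
import numpy as np

def combined_loss(F_e, Y_norm, L_e, labels):
    f_r = np.mean((F_e - Y_norm) ** 2)                        # regression MSE
    f_c = -np.mean(np.log(L_e[np.arange(len(labels)), labels] + 1e-12))
    return f_c + 5.0 * min(f_c, f_r)                          # Eq. (7)
```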
C. New Sample Selection

After training the GNN model and the MLPs to predict objective values and to classify the candidate solutions into non-dominated and dominated ones, the next step is to identify promising new samples by balancing exploration and exploitation. To achieve this, we propose a new selection strategy based on the information provided by the surrogate models.

First, a multi-objective evolutionary algorithm (NSGA-II in this work) is adopted to optimize the predicted values for a fixed number of generations, resulting in the optimized population P∗. Then an AF is proposed to select u promising candidate solutions to be evaluated by the true objective functions. Based on the way the surrogate model is constructed, an insight into the predictions of MLP_R and MLP_C can be provided: minimizing the objective values f̂ predicted by MLP_R enhances exploitation, while maximizing the probability of being dominated, p0, for candidate solutions prioritizes exploration. Hence, we carefully design a new acquisition function based on the predictions of the regression model and the classifier as follows:

AF_RC(x) = α · f̂ − (1 − α) · p0, (8)

where α is a trade-off parameter. Generally, learning the functional relationship between the decision variables and each objective value is more difficult than learning that between the decision variables and the domination relationship. Accordingly, α is adapted along with the optimization process. In the beginning, the selection relies more on the prediction provided by the classifier, as the regression model may fail to effectively estimate the candidates; this corresponds to a smaller value of α. As the optimization proceeds, more data are acquired, further enhancing the quality of the regression model. As a result, the fitness predictions can play a more important role in the selection of new samples. That is, the value of α is adapted as follows:

α = FE/FE_max, (9)

where FE is the current number of fitness evaluations and FE_max denotes the maximum number of available fitness evaluations.
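A sketch of Eqs. (8) and (9) follows. Here p0 is column 0 of the softmax output L_e, and since f̂ is vector-valued over M objectives, it is aggregated by its mean — an assumption, as the aggregation is not spelled out above. Candidates with the smallest AF_RC values are selected.

```python
import numpy as np

def af_rc(f_hat, p0, FE, FE_max, u=3):
    alpha = FE / FE_max                                    # Eq. (9)
    score = alpha * f_hat.mean(axis=1) - (1 - alpha) * p0  # Eq. (8)
    return np.argsort(score)[:u]   # indices of the u selected candidates

# usage: pick u = 3 of 100 candidates for true evaluation
rng = np.random.default_rng(0)
chosen = af_rc(rng.random((100, 3)), rng.random(100), FE=150, FE_max=300)
```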
IV. EXPERIMENTS

A. Experimental Settings

1) Test Problems: To test the effectiveness of the proposed GNN-assisted evolutionary algorithm, we have selected a widely used test suite of scalable multi-objective test problems, i.e., the DTLZ [34] test suite. For all the test instances used in the experimental studies, the number of decision variables D is set to 10, and the number of objectives is set to M = 3 and 5, respectively.

2) Performance Indicators: The inverted generational distance (IGD) [35] is adopted as the performance indicator to evaluate the quality of the non-dominated solutions obtained by each algorithm. We denote by P∗ a set of uniformly distributed solutions sampled from the objective space along the true Pareto front, and by P̂ the obtained approximation of the Pareto front. IGD is calculated as follows:

IGD(P∗, P̂) = ( Σ_{υ ∈ P∗} d(υ, P̂) ) / |P∗|, (10)

where d(υ, P̂) is the minimum Euclidean distance between υ and all points in P̂. The smaller the IGD value, the better the achieved solution set.
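Eq. (10) translates directly into code:

```python
# IGD as in Eq. (10): the mean, over the reference set P_star sampled
# from the true Pareto front, of each reference point's distance to the
# nearest obtained solution in P_hat.
import numpy as np

def igd(P_star, P_hat):
    diff = P_star[:, None, :] - P_hat[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))   # shape (|P_star|, |P_hat|)
    return dists.min(axis=1).mean()
```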
Fig. 2: Sensitivity analysis of the hyperparameters (panels: number of GCN layers ∈ {2, 3, 4}; ratio of connections in graphs ∈ {0.1, 0.25, 0.5}; ratio of samplings in training ∈ {0.5, 0.7, 1.0}; vertical axis: mean IGD ranking). All experiments are conducted over 10 independent runs.
TABLE I: Mean (standard deviation) IGD values obtained by K-RVEA, SMSEGO, MESMO and GNNAEA for MOPs with M = 3 and 5.

Problem  M  K-RVEA               SMSEGO               MESMO                GNNAEA
DTLZ1    3  1.04e+2 (1.53e+1) –  1.05e+2 (3.22e+1) –  8.91e+1 (1.61e+1) –  6.72e+1 (2.34e+1)
DTLZ1    5  5.31e+1 (1.40e+1) –  5.58e+1 (1.27e+1) –  /                    4.57e+1 (1.04e+1)
DTLZ2    3  1.87e–1 (5.19e–2) –  3.18e–1 (4.11e–2) –  1.43e–1 (2.41e–2) –  1.13e–1 (3.23e–3)
DTLZ2    5  2.84e–1 (2.15e–2) –  3.94e–1 (1.28e–2) –  /                    2.09e–1 (1.11e–2)
DTLZ3    3  2.49e+2 (5.18e+1) –  2.47e+2 (6.25e+1) –  2.18e+2 (1.86e+0) ≈  2.14e+2 (4.32e+1)
DTLZ3    5  1.45e+2 (3.76e+1) ≈  1.64e+2 (4.00e+1) –  /                    1.40e+1 (4.41e+1)
DTLZ4    3  4.40e–1 (9.82e–2) +  7.21e–1 (1.51e–1) –  4.32e–1 (1.69e–1) +  6.13e–1 (9.60e–2)
DTLZ4    5  4.40e–1 (9.82e–2) +  7.63e–1 (6.51e–2) ≈  /                    8.00e–1 (5.41e–2)
DTLZ5    3  1.05e–1 (3.29e–2) –  1.85e–1 (4.58e–2) –  1.70e–1 (1.34e–2) –  6.46e–2 (4.23e–3)
DTLZ5    5  4.56e–2 (1.41e–2) +  1.28e–1 (2.52e–2) –  /                    6.07e–2 (8.80e–3)
DTLZ6    3  3.00e+0 (4.81e–1) +  4.99e+0 (3.96e–1) +  4.53e+0 (1.67e+0) +  6.66e+0 (2.69e–2)
DTLZ6    5  1.94e+0 (2.06e–1) +  4.11e+0 (4.03e–1) ≈  /                    4.83e+0 (4.01e–2)
DTLZ7    3  1.29e–1 (7.67e–3) +  2.58e+0 (1.10e+0) –  1.17e+0 (6.14e–1) –  6.42e+0 (8.23e–2)
DTLZ7    5  4.95e–1 (4.42e–2) +  1.15e+0 (2.77e–1) +  /                    6.73e+0 (1.62e+0)
+/–/≈       6/6/2                2/10/2               2/4/1
3) Parameter Settings: All algorithms use evolutionary algorithms as the optimizer, and the population size is set to 100. After 20 generations of optimization based on the surrogate models, three promising candidate solutions are evaluated by the true objective functions.

In our experiments, we run each algorithm 10 times on each test instance and record the corresponding mean and standard deviation (std) of the IGD results. Furthermore, we use the Wilcoxon rank-sum test at a significance level of 0.05 to estimate whether there is a significant difference between the proposed method and the algorithms under comparison. The symbol "(–)" indicates that a significantly better performance is achieved by the proposed algorithm, while the symbol "(+)" indicates that the compared algorithm achieves a significantly better performance. Additionally, the symbol "(≈)" indicates that the compared and proposed algorithms show similar performance in terms of IGD values.

B. Sensitivity Analysis

We conduct a sensitivity analysis on three hyperparameters: the number of GCN layers in the GNN block, the ratio of connections in the generated graphs r_c, and the ratio of sampled data used to generate the graph r_s. The experimental results are obtained over 10 independent runs on the three-objective DTLZ1-DTLZ7, and the rankings of the IGD values are presented in Fig. 2.

As we can see in Fig. 2 (a), the performance of the algorithm degrades with an increase in the number of GCN layers. This can be attributed to over-smoothing [36], wherein the performance of GNNs diminishes with deeper structures as nodes tend to exhibit similar or indistinguishable representations. The IGD ranking decreases as the ratio of connections r_c in the generated graph increases from 0.1 to 0.5 in Fig. 2 (b), which is reasonable, as nodes can gather more information in a dense graph. Based on the mean value of the IGD ranking in Fig. 2 (c), sampling 70% of the data yields the best performance, balancing overfitting against sample size. Therefore, we adopt two GCN layers, r_c = 0.5, and r_s = 0.7 in the following experiments.

C. Comparative Experimental Results

We compare the proposed algorithm with several representative and state-of-the-art surrogate-assisted evolutionary algorithms, i.e., K-RVEA [7], SMSEGO [37] and MESMO [38]. We first test all algorithms on the DTLZ test suite with M = 3 and 5, and the IGD results are presented in Table I. Note that the IGD results for M = 5 achieved by MESMO are not given in Table I due to its unaffordable computational cost.
Fig. 3: The final solution set with the median IGD values found by K-RVEA, SMSEGO, MESMO and GNNAEA on DTLZ1
with M = 3.
Fig. 4: The final solution set with the median IGD values found by K-RVEA, SMSEGO, MESMO and GNNAEA on DTLZ2
with M = 3.
Fig. 5: The final solution set with the median IGD values found by K-RVEA, SMSEGO, MESMO and GNNAEA on DTLZ5
with M = 3.
TABLE II: Mean (standard deviation) IGD values obtained by GNNAEA and its three variants MLP1-AEA, MLP2-AEA, and GNN2-AEA for MOPs with M = 3.
Problem M MLP1-AEA MLP2-AEA GNN2-AEA GNNAEA
DTLZ1 3 9.36e+1 (1.38e+1) – 9.89e+1 (1.30e+1) – 1.05e+2 (1.39e+1) – 6.72e+1 (2.34e+1)
DTLZ2 3 3.57e–1 (2.53e–2) – 3.08e–1 (3.82e–2) – 2.05e–1 (1.88e–2) – 1.13e–1 (3.23e–3)
DTLZ3 3 3.50e+2 (6.84e+1) – 2.83e+2 (6.01e+1) – 3.03e+2 (6.14e+1) – 2.14e+2 (4.32e+1)
DTLZ4 3 7.19e–1 (6.53e–2) – 6.50e–1 (9.26e–2) – 7.43e–1 (6.07e–2) – 6.13e–1 (9.60e–2)
DTLZ5 3 2.70e–1 (3.55e–2) – 2.37e–1 (2.23e–2) – 1.05e–1 (1.67e–2) – 6.46e–2 (4.23e–3)
DTLZ6 3 6.16e+0 (1.28e–1) + 6.15e+0 (8.40e–1) + 6.75e+0 (9.24e–2) – 6.66e+0 (2.69e–2)
DTLZ7 3 1.17e+0 (4.42e–1) + 1.63e+0 (2.42e–1) + 5.73e+0 (1.54e+0) + 6.42e+0 (8.23e–2)
+/–/ ≈ 2/5/0 2/5/0 1/6/0
Accordingly, we can observe that the proposed GNNAEA achieves the best overall performance on the DTLZ test suite, compared with K-RVEA, SMSEGO and MESMO. Specifically, GNNAEA can find better Pareto front approximations on DTLZ1, DTLZ2, DTLZ3 and DTLZ5. This indicates that the proposed surrogate model can effectively guide the optimization of expensive multi-objective optimization problems. The proposed algorithm fails to address DTLZ6 and DTLZ7, where K-RVEA shows promising performance. A possible explanation is that DTLZ6 has a degenerate Pareto front, which is always a curve in the hyperspace regardless of the number of objectives, while DTLZ7 has a disconnected Pareto front, where the number of segments can be as large as 2^(M−1), with M being the number of objectives. While learning the topology of the Pareto fronts of DTLZ6 and DTLZ7 is challenging for GNNAEA, K-RVEA has shown its ability to address problems with degenerate or disconnected Pareto fronts [39].

To take a closer look at the results achieved by each algorithm, Figs. 3-5 illustrate the Pareto front approximations on DTLZ1, DTLZ2 and DTLZ5 with M = 3, respectively. Specifically, Fig. 3 shows the superiority of GNNAEA, as the Pareto front approximation found by GNNAEA is better than those found by all other algorithms under comparison. Although K-RVEA also converges to the true Pareto front, it fails to maintain the diversity of the population. By contrast, MESMO is able to achieve a set of non-dominated solutions with good diversity, which is, however, far from the true Pareto front. Similar observations can be made from the results on DTLZ2 and DTLZ5. Therefore, the proposed GNNAEA shows competitive performance on MOPs, indicating the efficiency of the GNN-based surrogate model and the designed new acquisition function.

D. Ablation Study

In this subsection, we analyze the effectiveness of the proposed surrogate model by comparing it with three variants, namely, MLP1-AEA, MLP2-AEA, and GNN2-AEA. To be specific, MLP1-AEA and MLP2-AEA replace the GNN block with a one-layer MLP and a two-layer MLP with sigmoid activation functions, respectively. On the other hand, in GNN2-AEA, two GNN models are employed and updated independently to generate embeddings for the classification and regression tasks, respectively. The results are shown in Table II.

In general, the proposed method GNNAEA performs the best on five out of seven instances. The two GNN-based methods outperform the MLP-based ones in most cases, indicating the information processing and extraction capabilities of GNNs. Additionally, the superior performance of GNNAEA compared to GNN2-AEA indicates that a single GNN, concurrently driven by updates from both the classification and regression tasks, can generate higher-quality representations. In other words, the classification and regression tasks mutually enhance each other's performance.

V. CONCLUSION

In this paper, we represent solutions as a graph, based on which we propose a new GNN-based surrogate model to learn the graph structure. Afterward, regression models to predict the objective values and a classification model to provide the domination relationship are constructed. Finally, a new acquisition function is proposed to carefully trade off between exploration and exploitation, resulting in effective new sample selection. The performance of the proposed GNNAEA and the effectiveness of the proposed surrogate models and new acquisition function are validated on the DTLZ test suite in comparison with three representative surrogate-assisted multi-objective evolutionary algorithms.

Although GNNAEA provides insights into using GNN models to leverage the graph structure information of the search space, it fails to address some multi-objective optimization problems with degenerate and discontinuous PFs. Therefore, further investigation into how to effectively learn the topology of irregular PFs is an interesting research direction. Moreover, the generation of graphs based on the solutions in the search space impacts the quality of the GNN models, which deserves more exploration.

REFERENCES

[1] Y. Jin and B. Sendhoff, "A systems approach to evolutionary multiobjective structural optimization and beyond," IEEE Computational Intelligence Magazine, vol. 4, no. 3, pp. 62–76, 2009.
[2] B. Liu, Q. Zhang, and G. G. Gielen, "A Gaussian process surrogate model assisted evolutionary algorithm for medium scale expensive optimization problems," IEEE Transactions on Evolutionary Computation, vol. 18, no. 2, pp. 180–192, 2013.
[3] S. Liu, H. Wang, W. Peng, and W. Yao, "A surrogate-assisted evolutionary feature selection algorithm with parallel random grouping for high-dimensional classification," IEEE Transactions on Evolutionary Computation, vol. 26, no. 5, pp. 1087–1101, 2022.
[4] S. Wang, J. Liu, and Y. Jin, "Surrogate-assisted robust optimization of large-scale networks based on graph embedding," IEEE Transactions on Evolutionary Computation, vol. 24, no. 4, pp. 735–749, 2019.
[5] R. Shi, J. Luo, and Q. Liu, "Fast evolutionary neural architecture search based on Bayesian surrogate model," in 2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2021, pp. 1217–1224.
[6] Y. Jin, H. Wang, T. Chugh, D. Guo, and K. Miettinen, "Data-driven evolutionary optimization: An overview and case studies," IEEE Transactions on Evolutionary Computation, vol. 23, no. 3, pp. 442–458, 2018.
[7] T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, and K. Sindhya, "A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization," IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 129–142, 2016.
[8] L. Willmes, T. Back, Y. Jin, and B. Sendhoff, "Comparing neural networks and Kriging for fitness approximation in evolutionary optimization," in The 2003 Congress on Evolutionary Computation, CEC'03, vol. 1. IEEE, 2003, pp. 663–670.
[9] H. Wang and Y. Jin, "A random forest-assisted evolutionary algorithm for data-driven constrained multiobjective combinatorial optimization of trauma systems," IEEE Transactions on Cybernetics, vol. 50, no. 2, pp. 536–549, 2018.
[10] R. G. Regis, "Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 326–347, 2013.
[11] Y. Jin, M. Olhofer, and B. Sendhoff, "A framework for evolutionary optimization with approximate fitness functions," IEEE Transactions on Evolutionary Computation, vol. 6, no. 5, pp. 481–494, 2002.
[12] Y. Tenne, K. Izui, and S. Nishiwaki, "Handling undefined vectors in expensive optimization problems," in European Conference on the Applications of Evolutionary Computation. Springer, 2010, pp. 582–591.
[13] S. D. Handoko, C. K. Kwoh, and Y.-S. Ong, "Feasibility structure modeling: An effective chaperone for constrained memetic algorithms," IEEE Transactions on Evolutionary Computation, vol. 14, no. 5, pp. 740–758, 2010.
[14] J. Poloczek and O. Kramer, "Local SVM constraint surrogate models for self-adaptive evolution strategies," in KI 2013: Advances in Artificial Intelligence: 36th Annual German Conference on AI, Koblenz, Germany, September 16-20, 2013, Proceedings 36. Springer, 2013, pp. 164–175.
[15] D. Lim, Y. Jin, Y.-S. Ong, and B. Sendhoff, "Generalizing surrogate-assisted evolutionary computation," IEEE Transactions on Evolutionary Computation, vol. 14, no. 3, pp. 329–355, 2009.
[16] H. Wang, Y. Jin, and J. Doherty, "Committee-based active learning for surrogate-assisted particle swarm optimization of expensive problems," IEEE Transactions on Cybernetics, vol. 47, no. 9, pp. 2664–2677, 2017.
[17] D. R. Jones, M. Schonlau, and W. J. Welch, "Efficient global optimization of expensive black-box functions," Journal of Global Optimization, vol. 13, pp. 455–492, 1998.
[18] J. Kakkad, J. Jannu, K. Sharma, C. Aggarwal, and S. Medya, "A survey on explainability of graph neural networks," arXiv preprint arXiv:2306.01958, 2023.
[19] X. Wang, J. Wang, K. Zhang, F. Lin, and Q. Chang, "Convergence and objective functions of noise-injected multilayer perceptrons with hidden multipliers," Neurocomputing, vol. 452, pp. 796–812, 2021.
[20] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, "A survey of convolutional neural networks: Analysis, applications, and prospects," IEEE Transactions on Neural Networks and Learning Systems, 2021.
[21] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000.
[22] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, "Graph neural networks: A review of methods and applications," AI Open, vol. 1, pp. 57–81, 2020.
[23] K. Do, T. Tran, and S. Venkatesh, "Graph transformation policy network for chemical reaction prediction," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 750–760.
[24] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, "Attention based spatial-temporal graph convolutional networks for traffic flow forecasting," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 922–929.
[25] W. Fan, Y. Ma, Q. Li, Y. He, E. Zhao, J. Tang, and D. Yin, "Graph neural networks for social recommendation," in The World Wide Web Conference, 2019, pp. 417–426.
[26] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[27] C. He, Y. Zhang, D. Gong, and X. Ji, "A review of surrogate-assisted evolutionary algorithms for expensive optimization problems," Expert Systems with Applications, p. 119495, 2023.
[28] D. Wang and C. Xie, "An efficient hybrid sequential approximate optimization method for problems with computationally expensive objective and constraints," Engineering with Computers, vol. 38, no. 1, pp. 727–738, 2022.
[29] Y. He, J. Sun, P. Song, and X. Wang, "Dual Kriging assisted efficient global optimization of expensive problems with evaluation failures," Aerospace Science and Technology, vol. 105, p. 106006, 2020.
[30] J. Yi, L. Gao, X. Li, C. A. Shoemaker, and C. Lu, "An on-line variable-fidelity surrogate-assisted harmony search algorithm with multi-level screening strategy for expensive engineering design optimization," Knowledge-Based Systems, vol. 170, pp. 1–19, 2019.
[31] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.
[32] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[33] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[34] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, "Scalable multi-objective optimization test problems," in Proceedings of the 2002 Congress on Evolutionary Computation, CEC'02, vol. 1. IEEE, 2002, pp. 825–830.
[35] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, "Performance assessment of multiobjective optimizers: An analysis and review," IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
[36] C. Yang, R. Wang, S. Yao, S. Liu, and T. Abdelzaher, "Revisiting over-smoothing in deep GCNs," arXiv preprint arXiv:2003.13663, 2020.
[37] T. Wagner, M. Emmerich, A. Deutz, and W. Ponweiser, "On expected-improvement criteria for model-based multi-objective optimization," in Parallel Problem Solving from Nature, PPSN XI: 11th International Conference, Kraków, Poland, September 11-15, 2010, Proceedings, Part I. Springer, 2010, pp. 718–727.
[38] S. Belakaria, A. Deshwal, and J. R. Doppa, "Max-value entropy search for multi-objective Bayesian optimization," in Advances in Neural Information Processing Systems, vol. 32, 2019.
[39] K. Wan, C. He, A. Camacho, K. Shang, R. Cheng, and H. Ishibuchi, "A hybrid surrogate-assisted evolutionary algorithm for computationally expensive many-objective optimization," in 2019 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2019, pp. 2018–2025.