
Expert Systems With Applications 277 (2025) 127232


A Q-learning-based multi-objective hyper-heuristic algorithm with fuzzy policy decision technology

Fuqing Zhao *, 1, Zewu Geng, Jianlin Zhang, Tianpeng Xu
School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

A R T I C L E  I N F O

Keywords: Hyper-heuristics; Q-learning; Fuzzy policy; Multi-objective optimization

A B S T R A C T

While metaheuristic methods have achieved success in solving various computationally difficult optimization problems, the design of metaheuristic methods relies on domain-specific knowledge and requires expert experience. On the contrary, selection hyper-heuristics are emerging cross-domain search methods that perform search over a set of low-level heuristics. In this study, a selection hyper-heuristic method is introduced to learn, select, and apply low-level heuristics to solve multi-objective optimization problems. The proposed hyper-heuristic is an iterative process including a learning strategy based on Q-learning, an exploration strategy based on ε-greedy, and a fuzzy policy decision technology. The performance of the proposed selection hyper-heuristic is experimentally studied on a range of benchmarks as well as several multi-objective real-world problems. Empirical results demonstrate the effectiveness and cross-domain capability of the proposed selection hyper-heuristic through comparison with certain state-of-the-art algorithms.

1. Introduction

Multi-objective optimization problems (MOPs) exist in various fields of the real world and are filled with challenges compared with single-objective optimization problems (SOPs) (Caramia & Dell'Olmo, 2020). The most widely utilized search techniques for MOPs are multi-objective evolutionary algorithms (MOEAs), which are able to find a set of trade-off solutions that meet the requirements in a reasonable time. The main distinction between MOEAs is their environment selection strategy, i.e., the way the candidate solutions are ranked and retained at each iteration (Hua et al., 2021). Based on different environment selection strategies, existing MOEAs can be broadly divided into three categories. The first is Pareto front (PF) dominance-based MOEAs, with representative algorithms such as the fast non-dominated sorting algorithm (NSGA-II) and its variants (Deb, Agrawal, Pratap, & Meyarivan, 2000; Tian, Cheng, Zhang, Su, et al., 2019). The second category is indicator-based MOEAs, such as the indicator-based evolutionary algorithm (IBEA) and the hypervolume estimation algorithm (HypE) (Zitzler & Künzli, 2004; Bader & Zitzler, 2011), and the third category is decomposition-based MOEAs, such as the multi-objective evolutionary algorithm based on decomposition (MOEA/D) and its variants (Cao et al., 2021).

MOEAs, as metaheuristics (Table 1), have the advantage that they can easily incorporate domain knowledge of problems, whereas their disadvantage lies in poor generality, which requires designing customized algorithms for particular problems. Once switching to new problem domains, or to new problems from a similar domain, MOEAs often need to be tuned or even redesigned. In order to improve the cross-domain ability, an intuitive way is to combine multiple MOEAs so as to obtain their advantages while avoiding their shortcomings. Thus a new type of algorithm, the hyper-heuristic, has emerged. With the development of optimization algorithms, hyper-heuristics have gained increasing attention in various research fields (Y. Zhang et al., 2022). Unlike a heuristic algorithm, a hyper-heuristic controls the heuristic algorithms using the information received during the search process to obtain solutions to the problem. There are two primary categories of hyper-heuristics: (1) heuristic selection methodologies, which select suitable meta-heuristics; and (2) heuristic generation methodologies, which generate meta-heuristics from given components for problems (Burke et al., 2019). Selection hyper-heuristics mix meta-heuristics as low-level heuristics (LLHs), and the high-level strategy guides the search process intelligently and decides which low-level heuristic should be applied to the current solutions according to the situation (Venske et al., 2022).

* Corresponding author.
E-mail addresses: [email protected] (F. Zhao), [email protected] (Z. Geng), [email protected] (J. Zhang), [email protected] (T. Xu).
1 https://orcid.org/0000-0002-7336-9699.

https://doi.org/10.1016/j.eswa.2025.127232
Received 16 August 2023; Received in revised form 7 March 2025; Accepted 9 March 2025
Available online 18 March 2025

Table 1
The traits of test instances.

Problem   Properties                                     M     D
DTLZ1     Regular PF, linear and multimodal              2,3   6,7
DTLZ2     Regular PF, concave                            2,3   11,12
DTLZ3     Regular PF, concave, multimodal                2,3   11,12
DTLZ4     Regular PF, concave, biased                    2,3   11,12
DTLZ5     Irregular PF, concave, degenerate              2,3   11,12
DTLZ6     Irregular PF, concave                          2,3   11,12
DTLZ7     Irregular PF, mixed, discontinuous, biased     2,3   21,22
ZDT1      Regular PF, convex                             2     30
ZDT2      Regular PF, non-convex                         2     30
ZDT3      Regular PF, non-contiguous convex              2     30
ZDT4      Regular PF, multimodal                         2     10
ZDT6      Regular PF, non-uniformity                     2     10
IMOP1     Regular PF, convex and concave                 2     10
IMOP2     Regular PF, convex and concave                 2     10
IMOP3     Irregular PF, multi-segment discontinuous      2     10
IMOP4     Irregular PF, wavy line                        3     10
IMOP5     Irregular PF, 8 independent circular faces     3     10
IMOP6     Irregular PF, multi-grid                       3     10
IMOP7     Irregular PF, 1/8 sphere                       3     10
IMOP8     Irregular PF, many-segment discontinuous       3     10

Besides, to obtain a set of well converged and uniformly distributed approximate solutions, designing algorithms that can balance convergence as well as diversity during the search is a challenging issue. As far as we know, most of the existing algorithms study the balance between convergence and diversity within the Pareto-based and decomposition-based frameworks. A search process that first guarantees convergence and then improves diversity can obtain acceptable solution sets. However, research on this balance issue within the framework of hyper-heuristics has not been reported.

Recent research that integrated learning strategies with selection hyper-heuristics has revealed that learning methods are essential for creating efficient selection hyper-heuristics with the capacity for adaptive adjustment (Zhao et al., 2022). As a well-established and typical reinforcement learning (RL) algorithm, Q-learning has been extensively researched and used to address the problem of how agents make decisions in complex situations (Zhu et al., 2023; Jia et al., 2023). In general, Q-learning trains an intelligent agent (a Q-table) using the Monte Carlo approach; the agent then obtains prospective Q-values between various states and makes judgments based on the trained Q-table and the state in which it is situated. In selection hyper-heuristics, the high-level strategy is used to select the appropriate low-level heuristic during evolution.

Based on the above observations, this study proposes a Q-learning-based hyper-heuristic (QLFHH).2 QLFHH is an iterative process that learns, selects, and applies LLHs to solve the given problem. QLFHH includes three main components: the Q-learning-based high-level strategy, an improved ε-greedy strategy, and the fuzzy policy decision technology.

The main contributions of this paper are as follows:

1) Q-learning is applied to the high-level strategy of the selection hyper-heuristic to choose an action from three well-established MOEAs through iterative learning.
2) In QLFHH, appropriate state sets and reward mechanisms are devised for Q-learning. The utilization of the fast non-dominated sorting method to acquire the state of individuals in a population visually illustrates the distribution of solutions. Evaluating the population based on performance metrics after each iteration allows for the allocation of rewards and penalties, thereby facilitating agent training.
3) An improved ε-greedy strategy allows the algorithm to maintain a certain degree of exploration while training the intelligent agent, and the efficiency of agent training is enhanced.
4) The fuzzy policy decision technology is incorporated into the process of offspring reproduction, which effectively reduces the original decision variable space, ensuring that the agent maintains a certain convergence rate in training the Q-table and enhancing the quality of the obtained solutions.

The remainder of this paper is organized as follows: Section 2 reviews the related work. The proposed Q-learning-based multi-objective hyper-heuristic algorithm is described in Section 3. The experimental settings and empirical results are explained in Section 4. Finally, the conclusion and future work are discussed in Section 5.

2. Related works

2.1. MOEAs

In recent years, researchers have designed several novel multi-objective optimization algorithms based on the strengths and weaknesses of existing MOEAs. Li et al. (2016) proposed a bicriteria evolution framework (BCE) for the Pareto criterion (PC) and non-Pareto criterion (NPC) to enhance PC- and NPC-based MOEAs that suffer a loss in diversity, and a BCE-based MOEA/D (BCEMOEAD) is proposed. Li and Yao (2019b) presented a method to adaptively adjust the weights (AdaW) during the evolution process. AdaW improves the quality of the solutions obtained by MOEAs by detailing the five components of weight adaptation (weight generation, weight addition, weight deletion, profile maintenance, and weight update frequency) to identify the ideal weight distribution for a given problem step by step. Tian et al. (2020) presented a generalized frontier modeling method for MOEAs (GFMMOEA) that estimates the shape of the non-dominated frontier by training a generalized simplex model; based on the estimated frontier, the MOEA is further developed. Wang et al. (2020) presented an adaptive Bayesian approach for surrogate-assisted evolutionary algorithms to solve MOPs.

2.2. Selection hyper-heuristic

Hyper-heuristics use a high-level strategy to control and select LLHs. One merit of hyper-heuristics is that the search is performed on the search space of LLHs, rather than the search space of the problems. The separation between the high-level strategy and the problems increases the level of generality of hyper-heuristics, for it is easy to switch to new problems simply by replacing LLHs. Another merit is that, by combining multiple LLHs, hyper-heuristics can inherit the advantages of each LLH while avoiding their disadvantages.

In general, existing multi-objective hyper-heuristics can be divided into two categories: (1) multi-objective selection hyper-heuristics: this type of hyper-heuristic selects one LLH during each iteration and applies it to solve the problem, such as MCHH (Markov chain hyper-heuristic), HHCF (hyper-heuristic based on choice function), HH-RILA (learning automata-based selection hyper-heuristic), PAPHH (perturbation adaptive pursuit strategy based hyper-heuristic) and HHEG (hyper-heuristic algorithm based on adaptive epsilon-greedy selection) (McClymont & Keedwell, 2011; Maashi et al., 2014; W. Li et al., 2019; S. Zhang et al., 2020; T. Yang et al., 2021); (2) multi-objective combination/hybridization hyper-heuristics: this type of hyper-heuristic combines a set of LLHs that can run simultaneously to create new solutions, such as AMALGAM (a multialgorithm genetically adaptive multiobjective method) and MS-MOEA (multi-strategy ensemble multi-objective evolutionary algorithm) (Vrugt & Robinson, 2007; Y. Wang & Li, 2010).

The focus of this paper is the multi-objective selection hyper-heuristic. The successful applications to practical MOPs verify the effectiveness of multi-objective selection hyper-heuristics. However, in designing multi-objective selection hyper-heuristics, there are a series of difficulties.

2 The code of the QLFHH and relevant experimental data are published on GitHub (https://github.com/gengzewu/QLFHH-matlab).


Firstly, the evaluation of MOPs is complicated. Single-objective optimization problems can use the objective value to evaluate the quality of the obtained solution. However, the result of an MOP is a solution set instead of a single solution. The objective functions of the solution set must be promoted simultaneously and the set must be distributed well over the true Pareto front. Various indicators for evaluating the solution set of MOPs have been proposed in the literature. When evaluating the quality of the intermediate solution set obtained by a selected LLH in the framework of hyper-heuristics, most of the existing multi-objective selection hyper-heuristics use a single indicator. For example, MCHH uses the Pareto dominance concept to estimate the ratio of the offspring solution set dominating the parent solution set. HH-RILA employs the hyper-volume (HV) indicator for evaluation, while PAPHH employs the Spacing indicator or the HV indicator as the fitness function. HHCF uses four indicators to rank the performance of LLHs, and is applied to solve the software module clustering problem.

Secondly, the learning strategy plays an important part, as it guides the selection strategy to select a suitable LLH. How to design an efficient learning strategy is crucial. It is worth noting that learning the performance of LLHs is based on the evaluation results. In the literature, MHypEA (multi-objective hyper-heuristic evolutionary algorithm) (Kumari & Srinivas, 2016) and MOHH (multi-objective hyper-heuristic algorithm) (Qian et al., 2018) use reinforcement learning to learn LLHs. MCHH employs reinforcement learning as well as Markov chains to adaptively learn the performance of each LLH. HHCF proposes a novel choice function mechanism considering both intensification and diversification aspects. HH-RILA and HH-mRILA (W. Li et al., 2023) measure the change in HV values, and the measurement is used to reward or punish an LLH in a linear reward-penalty scheme. In addition, a number of studies incorporate learning into tabu search to design hyper-heuristics for solving space allocation and timetabling problems. In recent years, HH-AP (Hitomi & Selva, 2016), PAPHH, and HHEG incorporated adaptive operator selection (AOS) into multi-objective selection hyper-heuristics to solve the exploration vs. exploitation dilemma, that is, preferring high-quality LLHs, but still giving 'poor' LLHs the opportunity to be selected.

Thirdly, after learning, another main question is how to select an appropriate LLH. Certain selection strategies are based on the stochastic method. For example, McClymont et al. and Walker et al. use the Markov chain to stochastically select the next LLH by traversing weighted edges with the current LLH as the start node (Walker & Keedwell, 2016). MHypEA and PAPHH use the roulette-wheel approach to select an LLH in proportion to the weights of the LLHs. Others are based on the greedy method, which only selects the LLH with the highest selection probability or the best performance. For example, HHCF selects the LLH with the highest choice function value into the next generation. MOHH utilizes the greedy method to select the LLH with the highest score obtained from the reinforcement learning. Besides, HHEG, HH-RILA and HH-mRILA are rare studies combining the stochastic method and the greedy method in the selection strategy.

Inspired by the above studies, this paper uses Q-learning to learn, select, and apply LLHs to solve the given problem in the process of iteration, and the comprehensive indicator HV used in this paper is also used to evaluate the quality of the solutions obtained by each LLH.

3. Q-learning-based multi-objective hyper-heuristic

3.1. The proposed QLFHH

The framework of a selection hyper-heuristic consists of two parts: a high-level strategy and a group of low-level heuristics. The acceptance criterion and the heuristic selection mechanism are both included in the high-level strategy, and the heuristic selection mechanism determines which LLH should be called. The acceptance criterion is then used to evaluate the population and decide whether or not to accept the obtained solutions. The low-level heuristics consist of a set of meta-heuristics or operators. The pseudo-code for QLFHH is illustrated in Algorithm 1.

In QLFHH, reasonable designs of the actions and states are essential to Q-learning, because they are able to effectively reflect population states and boost learning effectiveness. In this paper, each state corresponds to the situation of the population at that moment, and each action is a low-level heuristic.

In the agent training process, an improved ε-greedy strategy is utilized to select the action, as shown in Eq. (1) and Eq. (2):

\epsilon = \frac{0.5}{1 + e^{10 \times (t - 0.6 \times t_{max}) / t_{max}}}    (1)

\mu(a_t \mid s_t) = \begin{cases} \mathrm{random}(Set(A)), & rand > 1 - \epsilon \\ a^{*}, & \text{otherwise} \end{cases}    (2)

where ∊ is the exploration rate at iteration t computed by Eq. (1), and rand is a random number sampled from the standard uniform distribution. tmax is the stopping criterion, which means the maximum number of iterations, and tmax = maxFEs/N; in this paper, maxFEs denotes the maximum number of function evaluations. Set(A) is the action set of Q-learning, and a* denotes the action with the maximum Q-value at the state st. From Eq. (1) it is clear that the value of ∊ gradually decreases to 0 as the number of evaluations increases. According to Eq. (1) and Eq. (2), at the beginning of training the agent has an almost 50 % probability of exploring new actions. As time t increases, the probability of the condition rand > 1 − ∊ holding gradually decreases, and the agent tends to select the action with the maximum Q-value. In other words, during the early stage of action selection (i.e., the selection of an LLH), the agent maintains a certain degree of exploration, but as the training progresses it favors using the learned knowledge to guide the selection of actions.

The environment-based learning experiences of the agent are stored in a Q-table. Each row and column in the Q-table stands for a state and an action, respectively. Each Q-value in the Q-table is calculated by Eq. (3):

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right)    (3)

\alpha(t) = 1 - 0.9 \times \frac{t}{t_{max}}    (4)

where Q(st, at) represents the Q-value when the action at is taken in the state st. The parameter α controls the learning weight of each training round and ranges in [0, 1]; it is called the learning rate. As shown in Eq. (4), the value of α is gradually decreased from 1; as α decreases from 1 to 0, the agent pays increasing attention to the reward obtained by the current action. The discount rate is indicated by γ, which ranges in [0, 1]. The reward after taking action at at state st is rt+1, and maxQ(st+1, at+1) is the maximum Q-value of the Q-table at the state st+1 when taking action at+1. The pseudo-code for QLFHH is illustrated in Algorithm 1.

The flowchart of the QLFHH is shown in Fig. 1, and the process is described as follows: First, initialize the algorithm parameters, population, Q-table, and state, and randomly select a low-level heuristic as the initial action. Then, enter the main loop: obtain the current population state, perform the selected action to generate offspring (with fuzzy operations), and produce the next generation through environmental selection. Update the population state, calculate the reward value via the reward mechanism, and update the Q-table. Repeat this loop until the termination criterion is met.
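To make the above selection and update rules concrete, the following minimal Python sketch implements the improved ε-greedy schedule of Eqs. (1)-(2) and the Q-table update of Eqs. (3)-(4). It is an illustrative re-implementation rather than the authors' released MATLAB code; encoding the four states and three LLHs as integer indices, and the helper names used here, are assumptions of this example.

import math
import random

N_STATES, N_ACTIONS = 4, 3                        # four population states, three LLHs
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # Q-table: rows = states, columns = actions

def epsilon(t, t_max):
    # Exploration rate of Eq. (1): starts near 0.5 and decays towards 0.
    return 0.5 / (1.0 + math.exp(10.0 * (t - 0.6 * t_max) / t_max))

def select_action(state, t, t_max):
    # Improved ε-greedy rule of Eq. (2): explore with probability ε, otherwise act greedily.
    if random.random() > 1.0 - epsilon(t, t_max):
        return random.randrange(N_ACTIONS)                    # random LLH from Set(A)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])   # a* = argmax_a Q(s_t, a)

def update_q(state, action, reward, next_state, t, t_max, gamma=0.8):
    # Q-value update of Eq. (3) with the decaying learning rate of Eq. (4).
    alpha = 1.0 - 0.9 * (t / t_max)
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

The value gamma = 0.8 mirrors the discount rate calibrated in Section 4.3; the reward and the state index are supplied by the mechanisms described in Sections 3.3 and 3.4.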


Fig. 1. Flowchart of the proposed QLFHH.

Algorithm 1 QLFHH
Input: A stop criterion (MaxFE); the size of the population (N); the initialized population P0; learning rate (α); discount rate (γ); population state set (S); action set (A)
Output: Pend
1: Initialize the Q-table, the state set (S), the action set (A), and the ε-greedy policy
2: while the stopping criterion is not met do
3:   get the state of the current population (Pt): st
4:   if rand ≤ ∊ then
5:     actiont ← random action
6:   else
7:     actiont ← actionk // k = argmaxk(Q(st, ak))
8:   end if
9:   offspring ← Reproduction(Pt, actiont)
10:  Pt+1 ← EnvironmentalSelection(offspring, Pt, actiont)
11:  evaluate Pt+1, then obtain the reward rt
12:  update the next state st+1
13:  update the Q-table by using Eq. (3)
14: end while
15: return Pend

3.2. Fuzzy policy

Fuzzy theory is also employed in QLFHH for solving MOPs. This paper introduces a fuzzy policy decision technique in which the decision variables belong to two fuzzy sets within the universe of discourse. The technique involves calculating the degree of membership of each decision variable separately with respect to the two fuzzy sets. The decision variable values are then updated to match the values represented by the fuzzy set with the higher degree of membership. This process transforms the original solution into a fuzzy solution, as all decision variable values in the solution vector are adjusted accordingly. Subsequently, the fuzzy solution is utilized in the evolutionary process instead of the original solution. For a specific introduction to fuzzy theory, such as the degree of membership and fuzzy sets, please refer to the references (X. Yang et al., 2021). The incorporation of fuzzy sets and membership functions in evolutionary algorithms offers several advantages. It enables the algorithm to explore the original decision space while simultaneously narrowing down the search range of the decision space. Consequently, during the agent's training process, the fuzzy policy effectively reduces the original decision variable space, ensuring that Q-learning with fuzzy hyper-parameters maintains a certain convergence rate in training the Q-table and enhances the quality of the obtained solutions.

\Gamma_n^1 = \left\lfloor 10^{i} \cdot R_n^{-1} \cdot \left( X_n - X_n^{l} \right) \right\rfloor \cdot R_n \cdot 10^{-i} + X_n^{l}    (5)

\Gamma_n^2 = \left\lceil 10^{i} \cdot R_n^{-1} \cdot \left( X_n - X_n^{l} \right) \right\rceil \cdot R_n \cdot 10^{-i} + X_n^{l}    (6)

\mu_{\tilde{A}_1}(X_n) = \frac{1}{X_n - \Gamma_n^1}    (7)

\mu_{\tilde{A}_2}(X_n) = \frac{1}{\Gamma_n^2 - X_n}    (8)

\tilde{A}_1 = \left\{ \left( X_n, \mu_{\tilde{A}_1}(X_n) \right) \mid n = 1, 2, \cdots, D \right\}    (9)

\tilde{A}_2 = \left\{ \left( X_n, \mu_{\tilde{A}_2}(X_n) \right) \mid n = 1, 2, \cdots, D \right\}    (10)

X_n^{'} = \begin{cases} \Gamma_n^1, & \mu_{\tilde{A}_1}(X_n) > \mu_{\tilde{A}_2}(X_n) \\ \Gamma_n^2, & \mu_{\tilde{A}_1}(X_n) < \mu_{\tilde{A}_2}(X_n) \\ \mathrm{rand}\left( \Gamma_n^1, \Gamma_n^2 \right), & \mu_{\tilde{A}_1}(X_n) = \mu_{\tilde{A}_2}(X_n) \end{cases}    (11)

where X represents a D-dimensional original solution vector; X′ represents a D-dimensional fuzzy solution vector; Xn represents the n-th decision variable of the original solution X; Xnl represents the lower limit of the value of the n-th decision variable; Rn is the length of the value interval of the n-th decision variable; Γn1 and Γn2 are the fuzzy target values of the n-th decision variable, and the n-th decision variable is fuzzified into either Γn1 or Γn2. Ã1 and Ã2 are the fuzzy sets corresponding to the two fuzzy target values, and μÃ1 and μÃ2 are the membership functions corresponding to the two fuzzy sets. The process of fuzzifying an original solution X into a fuzzy solution X′ is as follows: calculate the membership function values of X belonging to the two fuzzy sets, and X will be fuzzified into the fuzzy target value corresponding to the fuzzy set with the larger degree of membership.
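Algorithm 2 in the next subsection gives the pseudo-code of the fuzzy operation; as a complementary illustration of Eqs. (5)-(11), the short Python sketch below fuzzifies a single solution vector. It is a hedged example, not the authors' implementation: the precision exponent i = 1, the use of the upper bound to obtain the interval length Rn, and the handling of a variable that already sits on a fuzzy target are assumptions made here.

import math
import random

def fuzzify(x, lower, upper, i=1):
    # Fuzzify a D-dimensional solution x into a fuzzy solution x' (Eqs. (5)-(11)).
    fuzzy_x = []
    for x_n, x_l, x_u in zip(x, lower, upper):
        r_n = x_u - x_l                                                    # interval length R_n
        g1 = math.floor(10**i / r_n * (x_n - x_l)) * r_n * 10**(-i) + x_l  # Eq. (5)
        g2 = math.ceil(10**i / r_n * (x_n - x_l)) * r_n * 10**(-i) + x_l   # Eq. (6)
        if x_n == g1 or x_n == g2:        # zero distance to a target: keep the value (assumption)
            fuzzy_x.append(x_n)
            continue
        mu1 = 1.0 / (x_n - g1)                                             # Eq. (7)
        mu2 = 1.0 / (g2 - x_n)                                             # Eq. (8)
        if mu1 > mu2:                                                      # Eq. (11)
            fuzzy_x.append(g1)
        elif mu1 < mu2:
            fuzzy_x.append(g2)
        else:
            fuzzy_x.append(random.choice([g1, g2]))
    return fuzzy_x

print(fuzzify([0.4337], [0.0], [1.0]))   # -> [0.4], the closer of the two fuzzy targets

The example shows the intended effect: each decision variable is snapped to one of two nearby grid values, which shrinks the effective search space while the agent is still being trained.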


Algorithm 2 Fuzzy Operation
Input: P (population), N (size of population), Rate (fuzzy evolutionary rate), offspring (∅)
Output: final offspring (fuzzy solution)
1: for 1 : N do
2:   calculate Γ1 and Γ2 by Eq. (5) and Eq. (6)
3:   calculate μÃ1 and μÃ2 by Eq. (7) and Eq. (8)
4:   if μÃ1 > μÃ2 then // Eq. (11)
5:     logical ← 1
6:   else
7:     logical ← 0
8:   end if
9:   X′ ← Γ2
10:  X′(find(logical)) ← Γ1(find(logical))
11:  offspring ← X′ ∪ offspring
12: end for
13: return offspring

The detailed procedure of the fuzzy operation is given in Algorithm 2. First, obtain the length of the decision variable value interval. Next, calculate the two fuzzy target values Γ1 and Γ2 of the decision variables; Γ1 and Γ2 correspond to the fuzzy sets Ã1 and Ã2, respectively. Calculate the membership degree of the decision variables in the two fuzzy sets. Then, update the value of each decision variable according to the degree of membership; the update rule is that the value of the decision variable will be updated to the fuzzy target value corresponding to the fuzzy set with the larger membership degree. Finally, the program returns the fuzzy solution. In particular, logical is a Boolean variable matrix: find(1) returns true and find(0) returns false.

3.3. State set

In this paper, Q-learning is employed as the high-level selection strategy, so it is essential to clarify the representation of the current population state. For multi-objective optimization problems, the population state is constructed based on the fitness of the population (the quality of the solutions), and the following two aspects are considered:

(1) The degree of population proximity to the true Pareto front of the function.
(2) Population diversity.

In NSGA-II, fast non-dominated sorting can effectively obtain the dominance relationships among individuals in a population (Deng et al., 2022). The dominance of an individual reflects the quality of the solutions. To obtain the population state, the fast non-dominated sorting technique divides the population into different Pareto fronts; the value of FrontNo denotes the Pareto front where a solution lies and indicates the quality of the solution. If max(FrontNo) > 1, the solutions in FrontNo = 1 dominate the solutions in FrontNo > 1; the closer the population is to the true Pareto front, the larger the proportion of non-dominated solutions, while if max(FrontNo) = 1, all solutions are non-dominated. Population diversity can then be approximated by calculating the average crowding distance (CDavg) of the individuals, where CDtavg is calculated by Eq. (12):

CD_{avg}^{t} = \sum_{i=1}^{N_d} \sum_{j=1}^{M} \left| f_j^{i+1} - f_j^{i-1} \right| / N_d    (12)

where t denotes the t-th generation, Nd denotes the number of all individuals except for the boundary points, M represents the number of objectives, and fji+1 and fji−1 represent the j-th objective function values of the (i+1)-th individual and the (i−1)-th individual, respectively.

The state set is divided into four states, S = (S1, S2, S3, S4), according to whether the maxFront value of the current population is greater than one and whether the average crowding distance is better than that of the previous generation. The state set is as follows:

S1: max(FrontNo) > 1, CDtavg > CDt−1avg
S2: max(FrontNo) > 1, CDtavg < CDt−1avg
S3: max(FrontNo) = 1, CDtavg > CDt−1avg
S4: max(FrontNo) = 1, CDtavg < CDt−1avg

3.4. Action and reward

Theoretically, QLFHH can incorporate any population-based meta-heuristics into the framework. In QLFHH, three MOEAs are used as LLHs, i.e., NSGA-II, IBEA, and SPEA2 (Zitzler, Laumanns, & Thiele, 2001). These well-known meta-heuristics are chosen as LLHs because they are classic MOEAs that can solve both low-dimensional and high-dimensional MOPs. Moreover, they differ in their environmental selection strategies.

NSGA-II uses a fast non-dominated sorting algorithm to rank solutions into different levels. It also uses the crowding distance to maintain the diversity of the solutions and avoid solutions clustering together. This method ensures that the solution set is evenly distributed on the Pareto front.

IBEA uses the hypervolume indicator to filter the best solutions and to balance the diversity and superiority of the solutions. In addition, IBEA uses a mechanism called "ε-dominance" to handle the relative superiority of the solutions and better preserve the diversity of the solutions. In ε-dominance, a reference point is used to define a threshold for the hypervolume indicator, and solutions that are ε-dominated by the reference point are considered non-dominated. This mechanism allows IBEA to find solutions that are not only Pareto optimal but also diverse.

SPEA2 incorporates a precise fitness assignment strategy, considering both the number of individuals dominating a solution and the number of individuals by which it is dominated. It employs a nearest-neighbor density estimation technique to enhance search efficiency. Furthermore, SPEA2 enhances the archive truncation method used in the previous version of the strength Pareto evolutionary algorithm (SPEA) by replacing the average linkage method. This improvement ensures the preservation of boundary points in the archive.

For each generation, the agent selects an appropriate action based on the current population state. In QLFHH, the aforementioned NSGA-II, IBEA, and SPEA2 are selected as the low-level heuristics within the hyper-heuristic framework. These chosen low-level heuristics have been proven successful in solving multi-objective optimization problems over the past decades.

In QLFHH, the agent is not told which action to take; instead, it discovers which action yields the higher reward by executing the actions on the population, and it is clear that this form of reward method can have a positive effect. The hyper-volume (HV) metric is introduced to evaluate the population. If the HV value of the t-th generation is larger than that of the (t−1)-th generation, the agent gives a positive reward value to the current action; otherwise, it gives a negative reward value.
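The state and reward design above can be summarised in a few lines of Python. This is an illustrative sketch under stated assumptions: the front numbers come from a standard fast non-dominated sorting routine (not reproduced here), Nd in Eq. (12) is taken as the population size minus the two boundary points, the reward values of ±1 are one reasonable choice consistent with the text, and the HV values are assumed to be supplied by an external hypervolume routine.

def average_crowding_distance(objs):
    # Eq. (12): for each objective, sort the population and accumulate the gap
    # |f_j^(i+1) - f_j^(i-1)| of every interior individual, then average over N_d.
    n, m = len(objs), len(objs[0])
    if n <= 2:
        return 0.0
    total = 0.0
    for j in range(m):
        vals = sorted(o[j] for o in objs)
        total += sum(abs(vals[i + 1] - vals[i - 1]) for i in range(1, n - 1))
    return total / (n - 2)

def population_state(front_numbers, cd_avg_t, cd_avg_prev):
    # Map the population to one of the four states S1-S4 (returned as Q-table row 0-3).
    dominated_left = max(front_numbers) > 1
    diversity_up = cd_avg_t > cd_avg_prev
    if dominated_left:
        return 0 if diversity_up else 1      # S1 / S2
    return 2 if diversity_up else 3          # S3 / S4

def reward(hv_t, hv_prev):
    # HV-based reward of Section 3.4: positive if HV improved, negative otherwise.
    return 1.0 if hv_t > hv_prev else -1.0

These helpers supply the state index and the reward consumed by the selection and Q-update sketch given after Section 3.1.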


3.5. Offspring reproduction and environmental selection

The process of training the agent requires multiple explorations, which affects the convergence speed of QLFHH. Therefore, in this paper, the fuzzy optimization policy is introduced to produce fuzzy solutions as offspring when the current iteration progress Iter <= Rate, where Rate represents the fuzzy evolutionary rate and Iter = FE/maxFE. Otherwise, the offspring are generated by the crossover and mutation strategies in the selected low-level heuristic. The whole procedure of offspring reproduction is shown in Algorithm 3. In Algorithm 3, the crossover and mutation strategies are simulated binary crossover (SBX) and polynomial mutation (PM), respectively (Pan et al., 2021). These strategies are also the common offspring generation methods used in NSGA-II, IBEA, and SPEA2.

Algorithm 3 offspring = Reproduction(P)
Input: P (population), Rate (fuzzy evolutionary rate), crossover strategy, mutation strategy
Output: offspring
Initialize Iter // Iter = FE/maxFE
1: if Iter <= Rate then
2:   offspring ← Fuzzy Operation(Iter, P)
3: else
4:   offspring ← Crossover and Mutation(P)
5: end if
6: return offspring

The whole procedure of environmental selection is shown in Algorithm 4. For the detailed environmental selection process, please refer to the relevant literature on NSGA-II, IBEA, and SPEA2.

Algorithm 4 Environmental Selection
Input: offspring, P (population), action
Output: P (final population)
1: if action = NSGAII then
2:   population = NSGAII-EnvironmentalSelection(offspring, P)
3: else if action = IBEA then
4:   population = IBEA-EnvironmentalSelection(offspring, P)
5: else if action = SPEA2 then
6:   population = SPEA2-EnvironmentalSelection(offspring, P)
7: end if
8: return the final population P

3.6. Time complexity of QLFHH

This section provides an analysis of the computational complexity of QLFHH. The algorithm operates by continuously selecting low-level heuristics (LLHs) to apply to the population, while simultaneously training the Q-table. Each iteration involves two main processes: the selection of LLHs and the acceptance of solutions.

(1) Selection of LLHs: In each iteration, QLFHH evaluates the current state of the population and uses the Q-table to select the most appropriate LLH. This decision-making process involves a lookup in the Q-table, which is an O(1) operation given that it is a constant-time access.
(2) Application of LLHs: Once an LLH is selected, it is applied to the population. The time complexity of this step depends on the specific LLH used; different LLHs have different computational costs. For instance, crossover and mutation operations typical in evolutionary algorithms can vary in complexity.
(3) Evaluation of solutions: After the selected LLH produces offspring, the new solutions are evaluated on the M objective functions and the resulting quality measures feed the acceptance step; the cost of this step grows with the population size and the number of objectives.
(4) Updating the Q-table: The Q-values are updated based on the performance of the selected LLH. This involves updating the Q-table, which is again an O(1) operation.

The overall time complexity of QLFHH is dominated by the time complexity of applying the LLHs and evaluating the solutions. According to the literature on NSGA-II, IBEA, and SPEA2: NSGA-II has a time complexity of O(MN2), where M is the number of objectives and N is the population size; IBEA has a time complexity of O(N2); and SPEA2 has a time complexity of O(MN2).

Given that QLFHH utilizes these LLHs, its time complexity is influenced by the most computationally expensive LLH it employs. Specifically, if we consider the worst-case scenario where the selected LLH has the highest time complexity, the overall time complexity of QLFHH can be approximated by that of this LLH, i.e., O(MN2).

4. Experimental results and analysis

4.1. Test suites

The proposed Q-learning-based multi-objective hyper-heuristic algorithm with fuzzy policy (QLFHH) controls three low-level heuristics: NSGA-II, SPEA2, and IBEA. The performance of QLFHH is compared not only to each individual LLH, but also to three popular MOEAs of recent years: AdaW, BCEMOEAD, and GFMMOEA.

In this paper, the 20 complex MOP test problems ZDT(1-6), DTLZ(1-7), and IMOP(1-8) are used to evaluate the performance of QLFHH (Tian, Cheng, Zhang, Li, et al., 2019). These problems have various characteristics, such as linear, multimodal, concave, convex, multi-grid, and multi-segment discontinuous, etc. The characteristics of each test instance are listed in Table 1, where M represents the number of objectives of the optimization function and D represents the dimension of the decision variables. The proposed QLFHH algorithm has also been tested on multi-objective real-world problems (real-world MOPs).

4.2. Performance metrics

Hyper-volume (HV) and inverted generational distance (IGD) are two widely used performance indicators that cover all aspects of the quality of the solutions obtained by an algorithm (M. Li & Yao, 2019a). The IGD is calculated by Eq. (13) and Eq. (14):

IGD = \frac{\sum_{i=1}^{n} d_i}{n}    (13)

d_i = \sqrt{ \sum_{j=1}^{k} \left( pa_{i,j} - p_j^{true} \right)^2 }    (14)

Here n is the number of solutions in the true Pareto front, and di denotes the minimum Euclidean distance between the Pareto optimal solutions obtained by the algorithm and the solutions of the true Pareto front. k determines the number of objectives in the test function. pai,j represents the i-th Pareto solution obtained by the algorithm on the j-th objective function, and pjtrue is the nearest Pareto solution in the true Pareto front from pai,j. The lower the IGD value, the better the algorithm's performance in terms of convergence and diversity; IGD = 0 indicates that all of the algorithm's solutions are contained in the true Pareto front. When calculating the IGD, the true Pareto front is provided beforehand.

The HV is calculated using Eq. (15), where zr = (z1r, z2r, ⋯, zmr) represents a reference point in the objective space that is dominated by all the Pareto-optimal points. S denotes the set of solutions obtained by the algorithm, and HV assesses both the convergence and diversity of the solutions by measuring the size of the objective space dominated by the solutions in S, bounded by zr. Here, m represents the number of objectives.

HV(S) = VOL\left( \bigcup_{x \in S} \left[ f_1(x), z_1^r \right] \times \cdots \times \left[ f_m(x), z_m^r \right] \right)    (15)

From Eq. (15), it can be concluded that if the solutions are closer to
the complete true Pareto front, the better the quality of the solutions and
the larger the HV value, where VOL(*) represents the Lebesgue measure.
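For completeness, a small Python sketch of the IGD indicator of Eqs. (13)-(14) is shown below, following the standard definition of the metric (the average distance from each point of the true Pareto front to its nearest obtained solution). An exact HV computation is more involved and is usually delegated to library routines such as those bundled with PlatEMO or pymoo, so it is not reproduced here; the brute-force nested loop is an illustrative choice for small fronts only.

import math

def igd(true_front, obtained):
    # IGD of Eqs. (13)-(14): lower values indicate better convergence and diversity.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(min(dist(p, q) for q in obtained) for p in true_front) / len(true_front)

# Example on a coarse linear 2-objective front
true_pf = [(i / 10, 1 - i / 10) for i in range(11)]
approx = [(0.0, 1.0), (0.5, 0.52), (1.0, 0.0)]
print(round(igd(true_pf, approx), 4))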

4.3. Parameter sensitivity analysis and parameter setting

The proposed QLFHH has four main parameters: population size N,


learning rate α, fuzzy evolutionary rate Rate and discount factor γ. α is
the amount that weighs the previous learning result against this learning
result. When α is set too low, the agent only emphasizes prior knowl­
edge, and no new reward is accumulated. The impact of past perfor­
mance on the Q-value of state-action pairs decreases as α increases. In
general, 0.5 is taken to balance the previous knowledge and the new
reward (Ji, Guo, Gao, Gong, & Wang, 2023). After some preliminary
Fig. 2. Main effects plot of the control parameters.
experiments, the setting values of the three parameters are as follows:
N∈{60, 80, 100}, Rate∈{0.4, 0.6, 0.8} and γ∈{0.7, 0.8, 0.9}.
For the fairness of the experiment, the algorithm was run indepen­
dently 30 times under each set of parameter combinations, and the
maximum number of evaluations was set to 25,000. The test instance is
the 2-objective DTLZ1. The results of ANOVA for parameters calibration
are shown in Table 2 and the main effects plot of the control parameters
is shown in Fig. 2 (Zhao et al., 2022). From Table 2 and Fig. 3, N is the
main parameter affecting the performance of the QLFHH. From the
above experiments, the parameters are set as N = 100, Rate = 0.8 and γ
= 0.8.
In all the experiments afterwards, for a fair comparison, each algo­
rithm is executed 30 times independently with 50,000 evaluations for
each test function. In this paper, all algorithms’ population size (N) is set
to 100. The (Fig. 4) experiment results are saved in Tables 3–9, where
the best results of all algorithms are highlighted in bold.
In the comparison experiments, all algorithms are given real-valued
variables. The crossover operator uses SBX and the crossover probability
is set to Pc = 1.0. The mutation operator uses PM and the mutation
probability is set to Pm = 1/d, where d denotes the number of variables
in the decision space. In BCEMOEAD, the parameter T denotes the Fig. 3. Box plot of IGD on four test algorithms.
neighborhood size and is set to ⌈0.1*N⌉, the parameter δ represents the
neighborhood selection probability is set to 0.9, and the parameter nr et al., 2017), and performed on an AMD R5800H/16G/1T computer.
donates the maximum number of solutions replaced by each offspring,
which is set to ⌈0.01*N⌉ (N denotes the population size). In AdaW, the 4.4. Effectiveness analysis of strategies
time of updating the weight vectors is every 5 % of the total generations
and the time of not allowing the update is the last 10 % of generations, In QLFHH, there are mainly three different strategies, Q-learning, ε
the maximum archive capacity is set to 2 N. In GFM-MOEA, the penalty − greedy policy and fuzzy optimization mechanism. The ε − greedy
parameter θ and the frequency fr are set to 0.2 and 0.1 respectively. In policy, a sub-strategy of Q-learning, serves as its selection mechanism
SPEA2, both the primary population and secondary population size are and is therefore included within Q-learning. Q-learning is used as a high-
set to 100. For IBEA, the fitness scaling factor k is set to 0.05. The level strategy for the hyper-heuristic algorithm to select different low-
intensification parameter α of HH-CF is set to 100. Additionally, to level heuristics according to the environment. Fuzzy operation is
evaluate the statistical significance of the difference between the results added in the process of offspring reproduction to balance the conver­
obtained by QLFHH and other compared algorithms, Wilcoxon’s rank- gence speed of the algorithm and to improve the quality of the final
sum test is used, and the significance level α is set to 0.05. In the last solution of the algorithm.
row of Tables 2-6, the number of test problems for which the perfor­ The effectiveness of the proposed strategy is tested by removing the
mance of the compared algorithm is superior to, inferior to, or roughly recommended strategies from this paper separately and comparing them
equivalent to that of QLFHH is listed. with QLFHH. The QLFHH that replaces Q-learning with random selec­
All experiments in this paper are implemented on PlatEMO (Tian tion and removes fuzzy strategies is denoted as HHRC. The QLFHH with
a random choice instead of Q-learning and preserving the fuzzy strategy
Table 2 is denoted as HHRCF. The QLFHH that only removes fuzzy strategies is
The result of ANOVA for parameters calibration. denoted as HHQL. To ensure the fairness of the comparison, all algo­
Source Sum Sq. d. f. Mean Sq. F-ratio Prob > F rithms are run 30 times independently, 50,000 iterations, and the pop­
N 0.03217 2 0.01609 6017.5 0 ulation size is set to 100. All algorithms are tested on 3-objective DTLZ7
Rate 0.00298 2 0.00149 557.12 0 test problem.
γ 0.00025 2 0.0013 47.55 0 The box plot results for the four tested algorithms are shown in Fig. 3.
Error 0.00002 8 0 ​ ​ From Fig. 3, it can be seen that both 2 strategies have an impact on the
Total 0.03593 26
effectiveness of the algorithm. The performance of the algorithm is
​ ​ ​


Fig. 4. The convergence behavior of the different algorithms on ZDT.

worst when Q-learning and fuzzy strategy is removed. The performance 4.5. Analysis of results
of the algorithm with only fuzzy removed is better than the performance
of the algorithm with only Q-learning removed, and both are better than The convergence behavior of the different algorithms on the 3-objec­
HHRC, which proves the effectiveness of Q-learning and fuzzy strate­ tive DTLZ test problems are depicted in Fig. 5. The logarithm of the
gies. It is observed that the improvement of algorithm performance by inverse generational distance (IGD) values obtained by each algorithm
the combination of Q-learning and fuzzy strategies is greater than the on the test problems are used to provide a clearer representation of the
improvement of algorithm performance by a single strategy, and Q- convergence process for each algorithm. The graph reveals that QLFHH
learning has a greater impact on the stability of the algorithm. exhibits superior convergence speed and solution quality compared to
the other algorithms under consideration. Furthermore, Table 3 presents
the HV values obtained by comparing different algorithms and QLFHH
on DTLZ problems. From the table, it is evident that QLFHH outperforms


Table 3
The HV values obtained by comparison algorithms and QLFHH on DTLZ.
Problem AdaW BCEMOEAD GFMMOEA IBEA NSGA-II SPEA2 QLFHH

DTLZ1 5.8211e-1 (2.83e-4) 5.8219e-1 (4.82e-4) 5.8199e-1 (5.82e-4) 4.1301e-1 (2.69e-2) 5.8095e-1 (9.68e-4) 5.8199e-1 (5.70e-4) 5.8256e-1 (9.60e-
(2) − − − − − − 6)
DTLZ2 3.4745e-1 (1.96e-4) 3.4727e-1 (7.60e-5) 3.4747e-1 (5.97e-5) 3.4620e-1 (2.35e-4) 3.4656e-1 (1.87e-4) 3.4726e-1 (8.44e-5) 3.4756e-1 (4.37e-
(2) − − − − − − 5)
DTLZ3 3.4254e-1 (3.05e-3) 3.4246e-1 (2.94e-3) 3.4283e-1 (2.95e-3) 1.6774e-1 (4.18e-3) 3.4260e-1 (3.11e-3) 3.4133e-1 (4.00e-3) 3.4769e-1 (2.78e-
(2) − − − − − − 5)
DTLZ4 3.4726e-1 (4.36e-4) 3.2161e-1 (7.82e-2) 2.7049e-1 (1.20e-1) 2.6118e-1 (1.22e-1) 2.7839e-1 (1.15e-1) 3.2163e-1 (7.82e-2) 3.2162e-1 (7.82e-
(2) þ = − − − = 2)
DTLZ5 3.4743e-1 (2.42e-4) 3.4727e-1 (7.71e-5) 3.4745e-1 (6.50e-5) 3.4614e-1 (2.62e-4) 3.4658e-1 (1.71e-4) 3.4725e-1 (1.09e-4) 3.4757e-1 (4.30e-
(2) − − − − − − 5)
DTLZ6 3.4740e-1 (2.93e-4) 3.4746e-1 (9.22e-5) 3.4758e-1 (4.00e-5) 3.4269e-1 (6.62e-4) 3.4646e-1 (2.19e-4) 3.4756e-1 (3.75e-5) 3.4771e-1 (2.43e-
(2) − − − − − − 5)
DTLZ7 2.4291e-1 (9.49e-5) 2.4294e-1 (1.67e-5) 2.4150e-1 (2.55e-4) 2.4042e-1 (1.22e-2) 2.4269e-1 (5.60e-5) 2.4288e-1 (3.15e-5) 2.4295e-1 (7.70e-
(2) = − − − − − 6)
DTLZ1 8.4017e-1 (7.23e-4) 8.4093e-1 (6.60e-4) 8.4179e-1 (6.33e-4) 4.5677e-1 (6.02e-2) 8.2753e-1 (2.50e-3) 8.4174e-1 (8.12e-4) 8.4305e-1 (1.30e-
(3) − − − − − − 4)
DTLZ2 5.5919e-1 (8.94e-4) 5.5781e-1 (1.03e-3) 5.5986e-1 (1.17e-3) 5.5752e-1 (8.89e-4) 5.3453e-1 (5.22e-3) 5.5571e-1 (1.39e-3) 5.6137e-1 (5.44e-
(3) − − − − − − 4)
DTLZ3 5.4491e-1 (7.94e-3) 5.4922e-1 (8.44e-3) 4.9166e-1 (1.06e-1) 2.4150e-1 (6.37e-3) 5.1989e-1 (1.36e-2) 5.4646e-1 (5.35e-3) 5.6237e-1 (5.40e-
(3) − − − − − − 4)
DTLZ4 5.4560e-1 (5.41e-2) 5.5747e-1 (1.30e-3) 4.9426e-1 (1.36e-1) 5.5756e-1 (1.17e-3) 5.2354e-1 (8.18e-2) 4.4883e-1 (1.23e-1) 5.1827e-1 (6.85e-
(3) + + − þ + = 2)
DTLZ5 1.9985e-1 (1.80e-4) 1.9949e-1 (1.23e-4) 1.9990e-1 (6.95e-5) 1.9864e-1 (3.06e-4) 1.9913e-1 (1.42e-4) 1.9955e-1 (1.39e-4) 2.0003e-1 (2.01e-
(3) − − − − − − 4)
DTLZ6 1.9988e-1 (6.27e-5) 1.9994e-1 (4.39e-5) 2.0008e-1 (3.90e-5) 1.9665e-1 (8.28e-4) 1.9946e-1 (1.39e-4) 2.0006e-1 (4.23e-5) 2.0018e-1 (2.59e-
(3) − − − − − − 5)
DTLZ7 2.7953e-1 (5.48e-4) 2.7738e-1 (6.45e-4) 2.3969e-1 (5.55e-2) 2.7077e-1 (2.12e-2) 2.6594e-1 (1.04e-2) 2.7533e-1 (6.16e-3) 2.8031e-1 (4.47e-
(3) − − − − − − 4)
+/-/= 2/11/1 1/12/1 0/14/0 1/13/0 1/13/0 0/12/2 ​

Table 4
The IGD values obtained by comparison algorithms and QLFHH on ZDT.
Problem AdaW BCEMOEAD GFMMOEA IBEA NSGA-II SPEA2 QLFHH

ZDT1 3.8925e-3 (3.78e-5) 3.9481e-3 (5.38e-5) 3.8613e-3 (3.39e-5) 4.1221e-3 (5.83e-5) 4.7606e-3 (2.52e-4) 3.9561e-3 (6.66e-5) 3.7491e-3 (4.76e-
(2) − − − − − − 5)
ZDT2 3.8912e-3 (3.33e-5) 3.8929e-3 (3.69e-5) 3.8597e-3 (2.99e-5) 8.3190e-3 (7.19e-4) 4.8283e-3 (1.78e-4) 3.9381e-3 (5.07e-5) 3.8368e-3 (3.63e-
(2) − − − − − − 5)
ZDT3 4.6079e-3 (6.86e-5) 4.6470e-3 (4.28e-5) 1.6692e-2 (3.94e-3) 1.5741e-2 (6.36e-4) 6.4564e-3 (5.41e-3) 4.8606e-3 (1.05e-4) 5.0536e-3 (3.05e-
(2) þ + − − − + 4)
ZDT4 4.1694e-3 (2.83e-4) 4.0645e-3 (1.80e-4) 3.9516e-3 (1.14e-4) 1.8849e-2 (6.12e-3) 4.7403e-3 (2.20e-4) 4.0684e-3 (2.23e-4) 3.7028e-3 (3.73e-
(2) − − − − − − 5)
ZDT6 3.1325e-3 (4.35e-5) 3.0957e-3 (2.10e-5) 3.1907e-3 (1.34e-4) 4.4453e-3 (1.18e-4) 3.6948e-3 (9.50e-5) 3.0868e-3 (2.40e-5) 3.0705e-3 (1.83e-
(2) − − − − − − 5)
+/-/= 1/4/0 1/4/0 0/5/0 0/5/0 0/5/0 1/4/0 ​

Table 5
The HV values obtained by comparison algorithms and QLFHH on IMOP.
Problem AdaW BCEMOEAD GFMMOEA IBEA NSGA-II SPEA2 QLFHH

IMOP1 9.8729e-1 (4.86e-5) 9.8729e-1 (5.34e-5) 9.8722e-1 (4.12e-5) 9.8454e-1 (1.36e-3) 9.8713e-1 (9.07e-5) 9.8717e-1 (6.49e-5) 9.8754e-1 (3.57e-
(2) − − − − − − 5)
IMOP2 2.1556e-1 (2.55e-2) 1.2179e-1 (5.08e-2) 2.1302e-1 (1.68e-2) 1.9835e-1 (6.03e-2) 2.3140e-1 (1.65e-4) 2.3180e-1 (8.10e-5) 2.3210e-1 (8.51e-
(2) − − − − − − 5)
IMOP3 6.5283e-1 (6.12e-3) 6.5866e-1 (1.63e-4) 6.4993e-1 (1.29e-2) 6.5688e-1 (1.25e-4) 6.5843e-1 (9.49e-5) 6.5823e-1 (1.92e-3) 6.5858e-1 (2.46e-
(2) − ¼ − − − = 4)
IMOP4 4.3233e-1 (1.03e-3) 4.3306e-1 (4.71e-4) 4.3340e-1 (7.54e-4) 4.3339e-1 (2.23e-4) 4.3278e-1 (3.14e-4) 4.3307e-1 (4.15e-4) 4.3408e-1 (2.40e-
(3) − − − − − − 4)
IMOP5 5.1638e-1 (1.18e-2) 5.0719e-1 (8.96e-3) 5.0327e-1 (4.86e-3) 5.0932e-1 (3.76e-3) 4.9290e-1 (3.85e-3) 4.9966e-1 (1.67e-3) 5.6010e-1 (1.32e-
(3) − − − − − − 2)
IMOP6 5.2916e-1 (4.92e-4) 5.2815e-1 (5.86e-4) 5.1732e-1 (4.91e-2) 5.1755e-1 (1.27e-3) 4.9781e-1 (9.20e-3) 5.0292e-1 (6.65e-2) 5.2671e-1 (8.66e-
(3) þ + − − − − 4)
IMOP7 4.9288e-1 (8.30e-2) 5.2721e-1 (6.77e-4) 2.1774e-1 (1.93e-1) 1.7505e-1 (1.59e-1) 4.9359e-1 (8.30e-3) 1.5269e-1 (1.49e-1) 5.1813e-1 (3.17e-
(3) = + − − − − 3)
IMOP8 5.3014e-1 (5.20e-3) 5.1977e-1 (2.18e-3) 5.3866e-1 (4.48e-3) 5.3605e-1 (5.33e-4) 4.7191e-1 (5.71e-3) 5.0473e-1 (3.17e-2) 5.3639e-1 (3.52e-
(3) − − þ − − − 2)
+/-/= 1/6/1 2/5/1 1/7/0 0/8/0 0/8/0 0/7/1 ​


Table 6 the other compared algorithms on IMOP4, IMOP6, and IMOP7.


The HV values of each algorithms on real-world problems. In Fig. 8, the box plots of IGD on QLFHH and other compared algo­
Problem IBEA NSGAII SPEA2 QLFHH rithms are utilized to illustrate the proposed QLFHH’s performance.
There are six test problems are used to assess their effectiveness, i.e.,
MOKP(2) 5.2451e-1 5.2363e-1 5.2608e-1 5.3906e-1
(2.58e-3) − (2.94e-3) − (2.53e-3) − (2.76e-3) DTLZ3, DTLZ4, ZDT4., IMOP4, IMOP6, and IMOP7, and all simulations
MONRP 6.5277e-1 6.4513e-1 6.5443e-1 6.8031e-1 are run by 30 times. In these test problems, DTLZ3 and DTLZ4 are both
(2) (8.12e-3) − (8.99e-3) − (9.01e-3) − (4.70e-3) 3-object problems; ZDT4, DTLZ3, and DTLZ4 are all regular PF; IMOP4,
MOTSP 7.7030e-1 7.7066e-1 7.7078e-1 7.9740e-1 IMOP6, and IMOP7 are all irregular PF; In addition, ZDT4 and DTLZ3
(2) (1.08e-2) − (8.18e-3) − (9.60e-3) − (1.24e-2)
mQAP(2) 6.7909e-1 6.7929e-1 6.7884e-1 6.6793e-1
are both multimodal problems; DTLZ4 is biased, and IMOP6 is a multi-
(2.87e-3) + (2.95e-3) þ (1.95e-3) + (2.75e-3) grid problem. From Fig. 8, it can be learned that the results obtained by
MLDMP 7.7140e-1 7.2608e-1 5.6763e-1 8.3113e-1 the proposed QLFHH are concentrated on most test problems, and it is
(3) (1.12e-2) − (3.65e-2) − (1.67e-1) − (1.94e-3) reasonable that the solutions obtained by QLFHH on the test instances
MPDMP 3.1782e-2 2.7031e-1 2.7767e-1 2.7792e-1
are more stable than compared algorithms. Finally, it can be concluded
(3) (1.22e-2) − (2.02e-3) − (5.85e-4) = (5.59e-4)
MLDMP 7.8031e-2 8.1793e-2 1.6400e-1 2.9640e-1 that QLFHH is capable of handling both regular PF and irregular PF test
(5) (8.04e-3) − (6.28e-2) − (7.43e-2) − (6.42e-3) problems.
MPDMP 6.4367e-3 1.0289e-1 1.1583e-1 1.1644e-1 Based on the analysis of the experimental results on the test instances
(5) (8.59e-5) − (2.99e-3) − (5.77e-4) − (4.96e-4) of DTLZ, ZDT, and IMOP mentioned above, QLFHH performs better than
MLDMP 3.8341e-3 5.3276e-5 2.6650e-3 4.1450e-2
(8) (1.05e-4) − (2.03e-4) − (1.91e-3) − (8.71e-4)
the comparison algorithms on most test instances, but in other test in­
MPDMP 8.2762e-4 2.2449e-2 2.8189e-2 2.8310e-2 stances, its performance is slightly worse than that of the comparison
(8) (3.47e-5) − (1.19e-3) − (2.49e-4) = (2.51e-4) algorithms. The reason for analysis is that QLFHH uses Q-learning as the
MLDMP 1.1281e-3 0.0000e + 5.7978e-4 1.4057e-2 high-level selection mechanism, and in the training stage and selection
(10) (1.33e-4) − 0 (0.00e + 0) − (5.11e-4) − (5.74e-4)
stage, random selection of actions will inevitably occur in Q-learning,
MPDMP 9.8553e-5 6.9496e-3 8.8491e-3 8.8742e-3
(10) (9.27e-6) − (3.48e-4) − (8.36e-5) = (1.22e-4) that is, the best action in the current state is not selected. At the same
+/-/= 1/13/0 1/13/0 1/10/3 ​ time, the design of the state and reward mechanism of QLFHH may lead
to the inability of QLFHH to fully train the Q-table during the optimi­
zation process, so that the algorithm can exert its maximum perfor­
the other algorithms on DTLZ, except DTLZ4. The results of Friedman’s mance. The design of the low-level heuristics also affects the
test, as shown in Fig. 6(a), indicate that QLFHH exhibits the best per­ performance of the algorithm. Although NSGAII, IBEA, and SPEA2 cover
formance among all algorithms in solving DTLZ problems. Compared to three different types of multi-objective optimization algorithms, and
NSGA-II and IBEA, QLFHH shows significant differences. While there each can give play to its advantages on different problems, compared
may not be significant differences compared to other algorithms, with the comparison algorithms, these three low-level heuristics still
QLFHH still performs the best. Overall, the experimental results have shortcomings in some test problems with special characteristics.
demonstrate that QLFHH is an efficient algorithm for solving DTLZ From the above analysis, QLFHH still has certain room for improvement,
problems. so that it can be applicable to solve more multi-objective optimization
The convergence behavior of the different algorithms on ZDT test problems.
problems are shown in Fig. 5, and it shows that the proposed QLFHH has Plots of the approximate Pareto fronts (PFs) on 3-objective IMOP6
a better quality of solutions and speed of convergence on various of the obtained by each algorithm are illustrated in Fig. 9. From the figure, it
test problems. From Fig. 6(b), it can be concluded that QLFHH ranks first can be observed that, in comparison to the true Pareto front, the PFs
compared among all the algorithms on ZDT.
The experimental results of the IGD values obtained by comparison
algorithms and QLFHH on ZDT are listed in Table 4, where bold in­ Table 8
dicates that the algorithm obtained standard deviation results with the The statistical comparisons with each LLH on HV value.
best mean IGD value on the test instance. These results demonstrate that, Problem IBEA NSGA-II SPEA2 QLFHH
when compared to the other seven algorithms, the QLFHH performs best
Real-world-pro (36) (38) (31) (15)
on ZDT test problems, but for ZDT3, there are no appreciable differences
Average rank (36) (38) (31) (15)
among the compared algorithms. The results of Wilcoxon’s rank sum test Final rank 3 4 2 1
show that the QLFHH algorithm has better performance than compared
algorithms.
The HV values obtained by comparison algorithms and QLFHH on Table 9
IMOP are presented in Table 5. These simulation results demonstrate The statistical comparisons with hyper-heuristics on HV value.
that QLFHH achieves the best performance on IMOP1, IMOP2, IMOP3,
Problem HHRC HHCF HHEG QLFHH
IMOP5, and IMOP7. However, for IMOP4, IMOP6, and IMOP8, there are
no significant differences observed among the compared algorithms. Real-world-pro (33) (42) (29) (16)
Average rank (33) (42) (29) (16)
The convergence behavior of the different algorithms on the IMOP test
Final rank 3 4 2 1
problems is illustrated in Fig. 7. It is evident that QLFHH outperforms

Table 7
The statistical comparisons with each algorithm on HV value.
Problem AdaW BCEMOEAD GFMMOEA IBEA NSGA-II SPEA2 QLFHH

DTLZ (2) (20) (25) (25) (49) (37) (29) (9)


DTLZ (3) (26) (24) (27) (39) (40) (30) (11)
ZDT (2) (19) (16) (17) (31) (32) (20) (5)
IMOP (2) (12) (11) (16) (18) (12) (11) (4)
IMOP (3) (18) (17) (18) (19) (30) (29) (9)
Average rank (19.0) (18.6) (20.6) (31.2) (30.2) (23.8) (7.6)
Final rank 3 2 4 7 6 5 1


Fig. 5. The convergence behavior of the different algorithms on 3-objective DTLZ.

From the figure, it can be observed that, in comparison with the true Pareto front, the PFs obtained by IBEA are distributed along the boundaries, while NSGA-II prefers to concentrate on the left half. In contrast, the PFs obtained by SPEA2 are evenly distributed. It can also be observed that the PFs obtained by HHRC and HHCF are distributed along the boundaries, which may be attributed to the fact that they tend to select IBEA and NSGA-II frequently during the search process. Compared with the other algorithms, HHEG and QLFHH exhibit better convergence and diversity in their obtained PFs, because they utilize the collaborative effects of each LLH during the search process, which leads to better performance. In comparison with HHEG, the PFs obtained by QLFHH are uniform and smooth, which demonstrates that the proposed strategy in this paper performs better in combining the strengths of each LLH.

The details in Fig. 10 show that QLFHH ranks first on 12 out of 14 DTLZ problems. The advantages of QLFHH can also be found on the ZDT and IMOP test suites, with the best rank on all ZDT problems and on 5 out of 8 IMOP problems.

4.6. Experimental results on real-world MOPs

QLFHH is also tested on six kinds of multi-objective real-world problems. The QLFHH algorithm was compared with the state-of-the-art algorithms IBEA, NSGA-II, and SPEA2, as well as with various hyper-heuristics.


Fig. 6. Rankings for DTLZ and ZDT.

Fig. 7. The convergence behavior of the different algorithms on IMOP.

The compared hyper-heuristics include the random strategy based hyper-heuristic (HHRC), the choice function based hyper-heuristic (HHCF), and the epsilon-greedy based hyper-heuristic (HHEG). Brief introductions of these problems are given below.

Knapsack Problem (KP) (Zitzler & Thiele, 1999) involves the selection of items to be placed into two backpacks, each with a weight limit. The primary objective is to maximize the total value of all items placed into both backpacks. To convert the KP into a minimization problem, the objective functions can be multiplied by negative one (−1); a minimal evaluation sketch in this minimization form is given at the end of this subsection.

Traveling Salesman Problem (TSP) (Corne & Knowles, 2007) refers to the problem of finding the shortest path that visits all given locations while taking into account multiple objectives or weights. In multi-objective TSP (MOTSP), a traveling salesman needs to visit multiple locations and attempts to minimize the total path length while satisfying multiple objectives or weights. This problem has various applications in fields such as logistics, transportation planning, and intelligent transportation.

Next Release Problem (NRP) (Y. Zhang et al., 2007) is concerned with achieving maximum satisfaction and minimum resource consumption when a software development company completes the next version of its software product. This problem is of great importance in software engineering, as it involves optimizing the release planning process and balancing the needs and constraints of various stakeholders.


Fig. 8. Box plots of IGD value on QLFHH and compared algorithms.

Quadratic Assignment Problem (QAP) (Knowles & Corne, 2003) is a combinatorial optimization problem that involves allocating a set of facilities to a set of locations, where each facility has a certain flow to every other facility and each location has a certain distance to every other location. The objective of the QAP is to minimize the total weighted sum of the flows between all pairs of facilities, subject to the constraints that each facility is assigned to exactly one location and each location to exactly one facility.

Multipoint Distance Minimization Problem (MP-DMP) (M. Li et al., 2018) is a mathematical optimization problem that aims to minimize the distances between a single point and a given set of target points. This problem is widely applicable in various fields, such as facility location, network design, and vehicle routing.

Multiline Distance Minimization Problem (ML-DMP) (Koppen & Yoshida, 2007) is a mathematical optimization problem that aims to minimize the distances between a single point and a given set of target lines. This problem arises in various fields, such as robotics, computer vision, and geographic information systems.

More experimental details on KP, NRP, TSP, QAP, MP-DMP and ML-DMP are shown in Fig. 11, where the number of objectives is listed in brackets next to each problem. The mean HV value obtained by each algorithm is plotted in Fig. 11, and the rankings of each algorithm obtained by Wilcoxon's rank-sum test are shown on the right of each subplot. The mean HV values obtained by the comparison algorithms and QLFHH on the real-world problems are also listed in Table 6. When compared with the meta-heuristics, QLFHH ranks first on 11 out of 12 problems, the exception being the 2-objective mQAP. Compared with the hyper-heuristics, QLFHH is the top algorithm on 12 problems except for mQAP and the (5-, 8-, and 10-objective) ML-DMP.
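As noted in the KP description above, the maximization objectives are handled by negating them. The following short Python sketch illustrates that minimization form for a bi-objective knapsack; the data layout, the random instance, and the infeasibility penalty are illustrative assumptions rather than the exact implementation used in this study.

import numpy as np

def evaluate_mokp(x, profits, weights, capacities):
    """Evaluate a 0/1 multi-objective knapsack solution in minimization form.

    x          : binary vector, x[i] = 1 if item i is selected
    profits    : (m, n) array, profits[k][i] = profit of item i in knapsack k
    weights    : (m, n) array, weights[k][i] = weight of item i in knapsack k
    capacities : length-m array of weight limits
    Returns the negated total profit per knapsack (lower is better), or
    +inf for solutions that violate a capacity constraint.
    """
    x = np.asarray(x)
    total_weight = weights @ x
    if np.any(total_weight > capacities):      # infeasible: penalize (repair is also common)
        return np.full(len(capacities), np.inf)
    total_profit = profits @ x
    return -total_profit                        # maximizing profit == minimizing -profit

# Hypothetical 2-objective instance with 5 items.
rng = np.random.default_rng(0)
profits = rng.integers(10, 100, size=(2, 5))
weights = rng.integers(1, 20, size=(2, 5))
capacities = weights.sum(axis=1) * 0.5          # half of the total weight per knapsack
print(evaluate_mokp([1, 0, 1, 1, 0], profits, weights, capacities))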


Fig. 9. The approximate PFs among 30 runs obtained by each algorithm on 3-objective IMOP6.

4.7. Cross-domain performance analysis

The cross-domain ability of hyper-heuristics needs to be assessed when they are used to solve multiple problems. This section analyzes the cross-domain ability of each algorithm across the 2- and 3-objective DTLZ and IMOP test suites, as well as the real-world test suite. Each algorithm is run for 30 independent trials on each problem.


Fig. 10. Mean HV values and corresponding Wilcoxon’s rank-sum test of each algorithm.

Fig. 11. Mean HV values and corresponding Wilcoxon’s rank-sum test of each algorithm on real-world problems.


Table 7, Table 8 and Table 9 display the rankings of QLFHH and the other algorithms based on the experimental results over all test problems. They show that QLFHH is the best-performing algorithm on every test suite, which demonstrates that QLFHH is both competitive and effective in addressing MOPs with a wide range of different characteristics, including real-world MOPs. 'Average rank' denotes the rank value averaged over all test suites, while 'Final rank' is the rank according to the 'Average rank'. QLFHH is the top algorithm and has the best cross-domain ability, with 'Average rank' values of 7.6, 15, and 16 in the three comparisons, respectively, ranking first in each.
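To make the 'Average rank' and 'Final rank' entries of Tables 7–9 concrete, the short Python sketch below reproduces the computation on a subset of the rank values copied from Table 7 (the remaining algorithms are omitted only to keep the example small).

# Per-suite rank values for each algorithm (rows = test suites, columns = algorithms).
suite_ranks = {
    "DTLZ (2)": {"IBEA": 49, "NSGA-II": 37, "SPEA2": 29, "QLFHH": 9},
    "ZDT (2)":  {"IBEA": 31, "NSGA-II": 32, "SPEA2": 20, "QLFHH": 5},
    "IMOP (3)": {"IBEA": 19, "NSGA-II": 30, "SPEA2": 29, "QLFHH": 9},
}

algorithms = ["IBEA", "NSGA-II", "SPEA2", "QLFHH"]

# 'Average rank': mean of the rank values over all test suites.
average_rank = {
    alg: sum(suite[alg] for suite in suite_ranks.values()) / len(suite_ranks)
    for alg in algorithms
}

# 'Final rank': position of each algorithm when sorted by its average rank
# (a smaller average rank is better).
ordering = sorted(algorithms, key=lambda alg: average_rank[alg])
final_rank = {alg: ordering.index(alg) + 1 for alg in algorithms}

print(average_rank)
print(final_rank)   # QLFHH obtains final rank 1 in this illustration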
4.8. Discussions

From the experimental results above, it can be seen that QLFHH is an effective approach for MOPs. In the early stage of evolution, a slow convergence speed occurs on the ZDT(1–6), DTLZ(5–7), IMOP1 and IMOP6 test problems. This is because, during training, the agent requires numerous explorations and adopts different low-level heuristics in order to be trained. In the middle of the run, the ε value is reduced to approximately 0, the exploration process gradually decreases, the fuzzification decreases, and the search shifts to exact evolution. In the exact evolution, the trained agent selects the low-level heuristic with the highest Q-value in each state according to the Q-table. QLFHH outperformed the three individual low-level heuristics on almost all of the selected test problems and, compared with the other four algorithms, QLFHH also shows a certain degree of advantage.
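The selection behavior described above, random exploration while ε is large and greedy use of the learned Q-table once ε has decayed towards zero, can be summarized by the following hedged Python sketch. The state labels, the decay schedule, and the stand-in reward are placeholders and do not reproduce the exact state set, reward mechanism, or parameter settings of QLFHH.

import random

LLHS = ["NSGA-II", "SPEA2", "IBEA"]          # the three low-level heuristics

def select_llh(q_table, state, epsilon):
    """Epsilon-greedy selection of a low-level heuristic from the Q-table."""
    if random.random() < epsilon:            # exploration: try any LLH
        return random.choice(LLHS)
    q_row = q_table[state]                    # exploitation: highest Q-value
    return max(LLHS, key=lambda llh: q_row[llh])

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update."""
    best_next = max(q_table[next_state].values())
    q_table[state][action] += alpha * (reward + gamma * best_next
                                       - q_table[state][action])

# Illustrative loop: epsilon decays towards zero, so early iterations explore
# while later iterations greedily pick the LLH with the highest Q-value.
states = ["improving", "stagnating"]          # placeholder population states
q_table = {s: {llh: 0.0 for llh in LLHS} for s in states}
epsilon = 1.0
for it in range(200):
    state = random.choice(states)             # stand-in for the real state rule
    llh = select_llh(q_table, state, epsilon)
    reward = random.random()                  # stand-in for the HV-based reward
    q_update(q_table, state, llh, reward, random.choice(states))
    epsilon = max(0.0, epsilon * 0.98)        # decaying exploration rate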
In a selection hyper-heuristic, the performance of the low-level heuristics directly affects the optimization effect. Three well-known metaheuristic algorithms, NSGA-II, SPEA2, and IBEA, are selected as the LLHs in this paper; choosing other metaheuristic algorithms or operators is worth studying.

In summary, the proposed QLFHH performs competitively in solving MOPs with various features. At the same time, this study also provides a new idea for solving multi-objective optimization problems in the real world.

5. Conclusion and future work

In this paper, a novel multi-objective hyper-heuristic named QLFHH is proposed. The algorithm combines the advantages of three existing MOEAs to solve MOPs with different features and produces competitive results. Q-learning and ε-greedy are used in the high-level strategy of the hyper-heuristic framework, and the state set, action set, and reward mechanism are designed according to the characteristics of MOPs. A fuzzy policy is utilized in the offspring reproduction process, which helps to improve the performance of the algorithm. A potential drawback of QLFHH is that it uses the HV metric in the reward mechanism and non-dominated sorting in determining the population state; both HV and non-dominated sorting are time-consuming, which results in a longer overall running time for the algorithm. In QLFHH, Q-learning impacts the entire model, so the limitations of QLFHH are closely tied to those of Q-learning. When the dimension of the problem is very high, the state space and action space increase exponentially, and the Q-table of Q-learning becomes too large to be effectively learned and stored. In dynamically changing environments (such as real-time optimization or online learning), Q-learning may not be able to adapt to the changes quickly, so the model becomes outdated. Therefore, the proposed QLFHH is not suitable for situations with strict time requirements, excessive problem dimensions, or dynamic environments.

In future work, QLFHH will be applied to tackle different complex optimization problems (multi-modal, multi-tasking, or real-world problems) to assess its effectiveness and cross-domain ability. Furthermore, it is worthwhile to combine different low-level heuristics (LLHs) based on the problem at hand and to design various state sets and reward mechanisms. Addressing the limitations of QLFHH identified above is also challenging future work.


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China under Grant 62473182. It was also supported by the Key Program of the National Natural Science Foundation of Gansu Province under Grant 23JRRA784, the Industry Support Project of Gansu Province College under Grant 2024CYZC-15, and the Intellectual Property Program of Gansu Province under Grant 24ZSCQG045, respectively.

Data availability

Data will be made available on request.

References

Bader, J., & Zitzler, E. (2011). HypE: An Algorithm for Fast Hypervolume-Based Many-Objective Optimization. Evolutionary Computation, 19(1), 45–76. https://doi.org/10.1162/EVCO_A_00009
Burke, E. K., Hyde, M. R., Kendall, G., Ochoa, G., Özcan, E., & Woodward, J. R. (2019). A classification of hyper-heuristic approaches: Revisited. International Series in Operations Research and Management Science, 272, 453–477. https://doi.org/10.1007/978-3-319-91086-4_14/COVER
Cao, J., Zhang, J., Zhao, F., & Chen, Z. (2021). A two-stage evolutionary strategy based MOEA/D to multi-objective problems. Expert Systems with Applications, 185, Article 115654. https://doi.org/10.1016/j.eswa.2021.115654
Caramia, M., & Dell'Olmo, P. (2020). Multi-objective Optimization. Multi-Objective Management in Freight Logistics, 21–51. https://doi.org/10.1007/978-3-030-50812-8_2
Corne, D. W., & Knowles, J. D. (2007). Techniques for highly multiobjective optimisation: Some nondominated points are better than others. In Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference (pp. 773–780). https://doi.org/10.1145/1276958.1277115
Deb, K., Agrawal, S., Pratap, A., & Meyarivan, T. (2000). A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. Lecture Notes in Computer Science, 1917, 849–858. https://doi.org/10.1007/3-540-45356-3_83
Deng, W., Zhang, X., Zhou, Y., Liu, Y., Zhou, X., Chen, H., & Zhao, H. (2022). An enhanced fast non-dominated solution sorting genetic algorithm for multi-objective problems. Information Sciences, 585, 441–453. https://doi.org/10.1016/J.INS.2021.11.052
Hitomi, N., & Selva, D. (2016). A Hyperheuristic Approach to Leveraging Domain Knowledge in Multi-Objective Evolutionary Algorithms. Proceedings of the ASME Design Engineering Technical Conference, 2B–2016. https://doi.org/10.1115/DETC2016-59870
Hua, Y., Liu, Q., Hao, K., & Jin, Y. (2021). A Survey of Evolutionary Algorithms for Multi-Objective Optimization Problems with Irregular Pareto Fronts. IEEE/CAA Journal of Automatica Sinica, 8(2), 303–318. https://doi.org/10.1109/JAS.2021.1003817
Jia, Y., Yan, Q., & Wang, H. (2023). Q-learning driven multi-population memetic algorithm for distributed three-stage assembly hybrid flow shop scheduling with flexible preventive maintenance. Expert Systems with Applications, 232, Article 120837. https://doi.org/10.1016/J.ESWA.2023.120837
Ji, J. J., Guo, Y. N., Gao, X. Z., Gong, D. W., & Wang, Y. P. (2023). Q-Learning-Based Hyperheuristic Evolutionary Algorithm for Dynamic Task Allocation of Crowdsensing. IEEE Transactions on Cybernetics, 53(4), 2211–2224. https://doi.org/10.1109/TCYB.2021.3112675
Knowles, J., & Corne, D. (2003). Instance generators and test suites for the multiobjective quadratic assignment problem. Lecture Notes in Computer Science, 2632, 295–310. https://doi.org/10.1007/3-540-36970-8_21/COVER
Koppen, M., & Yoshida, K. (2007). Substitute distance assignments in NSGA-II for handling many-objective optimization problems. In Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-540-70928-2_55/COVER
Kumari, A. C., & Srinivas, K. (2016). Hyper-heuristic approach for multi-objective software module clustering. Journal of Systems and Software, 117, 384–401. https://doi.org/10.1016/J.JSS.2016.04.007
Li, M., Grosan, C., Yang, S., Liu, X., & Yao, X. (2018). Multiline Distance Minimization: A Visualized Many-Objective Test Problem Suite. IEEE Transactions on Evolutionary Computation, 22(1), 61–78. https://doi.org/10.1109/TEVC.2017.2655451
Li, M., Yang, S., & Liu, X. (2016). Pareto or Non-Pareto: Bi-criterion evolution in multiobjective optimization. IEEE Transactions on Evolutionary Computation, 20(5), 645–665. https://doi.org/10.1109/TEVC.2015.2504730
Li, M., & Yao, X. (2019a). Quality evaluation of solution sets in multiobjective optimisation: A survey. ACM Computing Surveys, 52(2), 1–43. https://doi.org/10.1145/3300148
Li, M., & Yao, X. (2019b). What weights work for you? Adapting weights for any Pareto front shape in decomposition-based evolutionary multiobjective optimisation. Evolutionary Computation, 28(2), 227–253. https://doi.org/10.1162/evco_a_00269
Li, W., Özcan, E., Drake, J. H., & Maashi, M. (2023). A generality analysis of multiobjective hyper-heuristics. Information Sciences, 627, 34–51. https://doi.org/10.1016/J.INS.2023.01.047
Li, W., Özcan, E., & John, R. (2019). A Learning Automata-Based Multiobjective Hyper-Heuristic. IEEE Transactions on Evolutionary Computation, 23(1), 59–73. https://doi.org/10.1109/TEVC.2017.2785346
Maashi, M., Özcan, E., & Kendall, G. (2014). A multi-objective hyper-heuristic based on choice function. Expert Systems with Applications, 41(9), 4475–4493. https://doi.org/10.1016/J.ESWA.2013.12.050
McClymont, K., & Keedwell, E. C. (2011). Markov chain hyper-heuristic (MCHH): An online selective hyper-heuristic for multi-objective continuous problems. Genetic and Evolutionary Computation Conference, GECCO'11, 2003–2010. https://doi.org/10.1145/2001576.2001845
Pan, L., Xu, W., Li, L., He, C., & Cheng, R. (2021). Adaptive simulated binary crossover for rotated multi-objective optimization. Swarm and Evolutionary Computation, 60, Article 100759. https://doi.org/10.1016/J.SWEVO.2020.100759
Qian, Z., Zhao, Y., Wang, S., Leng, L., & Wang, W. (2018). A hyper heuristic algorithm for low carbon location routing problem. In Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-92537-0_21/COVER
Tian, Y., Cheng, R., Zhang, X., Li, M., & Jin, Y. (2019). Diversity Assessment of Multi-Objective Evolutionary Algorithms: Performance Metric and Benchmark Problems [Research Frontier]. IEEE Computational Intelligence Magazine, 14(3), 61–74. https://doi.org/10.1109/MCI.2019.2919398
Tian, Y., Cheng, R., Zhang, X., Su, Y., & Jin, Y. (2019). A Strengthened Dominance Relation Considering Convergence and Diversity for Evolutionary Many-Objective Optimization. IEEE Transactions on Evolutionary Computation, 23(2), 331–345. https://doi.org/10.1109/TEVC.2018.2866854
Tian, Y., Zhang, X., Cheng, R., He, C., & Jin, Y. (2020). Guiding Evolutionary Multiobjective Optimization with Generic Front Modeling. IEEE Transactions on Cybernetics, 50(3), 1106–1119. https://doi.org/10.1109/TCYB.2018.2883914
Venske, S. M., Almeida, C. P., Lüders, R., & Delgado, M. R. (2022). Selection hyper-heuristics for the multi and many-objective quadratic assignment problem. Computers & Operations Research, 148, Article 105961. https://doi.org/10.1016/J.COR.2022.105961
Vrugt, J. A., & Robinson, B. A. (2007). Improved evolutionary optimization from genetically adaptive multimethod search. Proceedings of the National Academy of Sciences of the United States of America, 104(3), 708–711. https://doi.org/10.1073/PNAS.0610471104/SUPPL_FILE/IMAGE9.GIF
Walker, D. J., & Keedwell, E. (2016). Multi-objective optimisation with a sequence-based selection hyper-heuristic. In GECCO 2016 Companion - Proceedings of the 2016 Genetic and Evolutionary Computation Conference (pp. 81–82). https://doi.org/10.1145/2908961.2909016
Wang, X., Jin, Y., Schmitt, S., & Olhofer, M. (2020). An adaptive Bayesian approach to surrogate-assisted evolutionary multi-objective optimization. Information Sciences, 519, 317–331. https://doi.org/10.1016/j.ins.2020.01.048
Wang, Y., & Li, B. (2010). Multi-strategy ensemble evolutionary algorithm for dynamic multi-objective optimization. Memetic Computing, 2(1), 3–24. https://doi.org/10.1007/S12293-009-0012-0/METRICS
Yang, T., Zhang, S., & Li, C. (2021). A multi-objective hyper-heuristic algorithm based on adaptive epsilon-greedy selection. Complex and Intelligent Systems, 7(2), 765–780. https://doi.org/10.1007/S40747-020-00230-8/FIGURES/13
Yang, X., Zou, J., Yang, S., Zheng, J., & Liu, Y. (2021). A Fuzzy Decision Variables Framework for Large-scale Multiobjective Optimization. IEEE Transactions on Evolutionary Computation. https://doi.org/10.1109/TEVC.2021.3118593
Zhang, S., Ren, Z., Li, C., & Xuan, J. (2020). A perturbation adaptive pursuit strategy based hyper-heuristic for multi-objective optimization problems. Swarm and Evolutionary Computation, 54, Article 100647. https://doi.org/10.1016/J.SWEVO.2020.100647
Zhang, Y., Bai, R., Qu, R., Tu, C., & Jin, J. (2022). A deep reinforcement learning based hyper-heuristic for combinatorial optimisation with uncertainties. European Journal of Operational Research, 300(2), 418–427. https://doi.org/10.1016/J.EJOR.2021.10.032
Zhang, Y., Harman, M., & Mansouri, S. A. (2007). The multi-objective next release problem. In Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference (pp. 1129–1137). https://doi.org/10.1145/1276958.1277179
Zhao, F., Di, S., & Wang, L. (2022). A Hyperheuristic With Q-Learning for the Multiobjective Energy-Efficient Distributed Blocking Flow Shop Scheduling Problem. IEEE Transactions on Cybernetics. https://doi.org/10.1109/TCYB.2022.3192112
Zhu, Q., Wu, X., Lin, Q., Ma, L., Li, J., Ming, Z., & Chen, J. (2023). A survey on Evolutionary Reinforcement Learning algorithms. Neurocomputing, 126628. https://doi.org/10.1016/J.NEUCOM.2023.126628
Zitzler, E., & Künzli, S. (2004). Indicator-based selection in multiobjective search. Lecture Notes in Computer Science, 3242, 832–842. https://doi.org/10.1007/978-3-540-30217-9_84
Zitzler, E., Laumanns, M., & Thiele, L. (2001). SPEA2: Improving the Strength Pareto Evolutionary Algorithm. Evolutionary Methods for Design Optimization and Control with Applications to Industrial Problems, 95–100.
Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4), 257–271. https://doi.org/10.1109/4235.797969
