A Q-learning-based multi-objective hyper-heuristic algorithm with fuzzy policy decision technology

Keywords: Hyper-heuristics; Q-learning; Fuzzy policy; Multi-objective optimization

Abstract: While metaheuristic methods have achieved success in solving various computationally difficult optimization problems, the design of metaheuristic methods relies on domain-specific knowledge and requires expert experience. On the contrary, selection hyper-heuristics are emerging cross-domain search methods that perform search over a set of low-level heuristics. In this study, a selection hyper-heuristic method is introduced to learn, select, and apply low-level heuristics to solve multi-objective optimization problems. The proposed hyper-heuristic is an iterative process including a learning strategy based on Q-learning, an exploration strategy based on ε-greedy, and a fuzzy policy decision technology. The performance of the proposed selection hyper-heuristic is experimentally studied on a range of benchmarks as well as several multi-objective real-world problems. Empirical results demonstrate the effectiveness and cross-domain capability of the proposed selection hyper-heuristic through comparison with certain state-of-the-art algorithms.
1. Introduction

Multi-objective optimization problems (MOPs) exist in various fields of the real world and are filled with challenges compared with single-objective optimization problems (SOPs) (Caramia & Dell'Olmo, 2020). The most widely utilized search techniques for MOPs are multi-objective evolutionary algorithms (MOEAs), which are able to find a set of trade-off solutions that meet the requirements in a reasonable time. The main distinction between MOEAs is their environmental selection strategy, which is the way the candidate solutions are ranked and retained at each iteration (Hua et al., 2021). Based on different environmental selection strategies, existing MOEAs can be broadly divided into three categories. The first is Pareto front (PF) dominance-based MOEAs, whose representative algorithms are the fast non-dominated sorting algorithm (NSGA-II) and its variants (Deb, Agrawal, Pratap, & Meyarivan, 2000; Tian, Cheng, Zhang, Su, et al., 2019). The second category is indicator-based MOEAs, such as the indicator-based evolutionary algorithm (IBEA) and the hypervolume estimation algorithm (HypE) (Zitzler & Künzli, 2004; Bader & Zitzler, 2011), and the third category is decomposition-based MOEAs, such as the multi-objective evolutionary algorithm based on decomposition (MOEA/D) and its variants (Cao et al., 2021).

MOEAs, as metaheuristics (Table 1), have the advantage that they can easily incorporate domain knowledge of problems. However, their disadvantage lies in poor generality, which requires designing customized algorithms for particular problems. Once switching to new problem domains, or to new problems from a similar domain, MOEAs often need to be tuned or even redesigned. In order to improve the cross-domain ability, an intuitive way is to combine multiple MOEAs so as to obtain their advantages while avoiding their shortcomings. Thus, a new type of algorithm, the hyper-heuristic, has emerged. With the development of optimization algorithms, hyper-heuristics have gained increasing attention in various research fields (Y. Zhang et al., 2022). Unlike a heuristic algorithm, a hyper-heuristic algorithm controls heuristic algorithms using the information gathered during the search process to obtain solutions to the problem. There are two primary categories of hyper-heuristics: (1) heuristic selection methodologies, which select suitable meta-heuristics, and (2) heuristic generation methodologies, which generate meta-heuristics from given components for problems (Burke et al., 2019). Selection hyper-heuristics mix meta-heuristics as low-level heuristics (LLHs), and a high-level strategy guides the search process intelligently and decides which low-level heuristic should be applied to the current solutions according to the situation (Venske et al., 2022). Besides, to obtain a set of well converged and uniformly distributed
difficulties.

Firstly, the evaluation of MOPs is complicated. Single-objective optimization problems can use the objective value to evaluate the quality of the obtained solution. However, the result of MOPs is a solution set instead of a single solution. The objective functions of the solution set must be promoted simultaneously and distributed well over the true Pareto front. A variety of indicators for evaluating the solution set of MOPs have been proposed in the literature. When evaluating the quality of the intermediate solution set obtained by a selected LLH in the framework of hyper-heuristics, most of the existing multi-objective selection hyper-heuristics use a single indicator. For example, MCHH uses the Pareto dominance concept to estimate the ratio of the offspring solution set dominating the parent solution set. HH-RILA employs the hyper-volume (HV) indicator for evaluation, while PAPHH employs the Spacing indicator or the HV indicator as the fitness function. HHCF uses four indicators to rank the performance of LLHs, and is applied to solve the software module clustering problem.

The acceptance criterion and the heuristic selection mechanism are both included in the high-level strategy, and the heuristic selection mechanism determines which LLHs should be called. The acceptance criterion is then used to evaluate the population and decide whether or not to accept the obtained solutions. The low-level heuristics consist of a set of meta-heuristics or operators. The pseudo-code of QLFHH is illustrated in Algorithm 1.

In QLFHH, reasonable designs of actions and states are essential to Q-learning, because they are able to effectively reflect population states and boost learning effectiveness. In this paper, each state corresponds to the situation of the population at that moment. Each action is a low-level heuristic.

In the agent training process, an improved ε-greedy strategy is utilized to select the action, as shown in Eq. (1) and Eq. (2):

ε = 0.5 / (1 + e^{10(t − 0.6 t_max)/t_max})   (1)
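As an illustration, a minimal Python sketch of this exploration schedule is given below (helper names are hypothetical; the random choice in the explore branch stands in for Eq. (2), which is not reproduced in this excerpt):

```python
import math
import random

def epsilon(t, t_max):
    """Exploration rate of Eq. (1): decays smoothly from about 0.5
    towards 0 once t passes 0.6 * t_max."""
    return 0.5 / (1.0 + math.exp(10.0 * (t - 0.6 * t_max) / t_max))

def select_action(q_row, t, t_max, rng=random):
    """epsilon-greedy selection over one Q-table row (one value per LLH)."""
    if rng.random() <= epsilon(t, t_max):
        return rng.randrange(len(q_row))                  # explore: random LLH index
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploit: argmax of Q-values

# epsilon(0, 100) is about 0.499, epsilon(60, 100) = 0.25, epsilon(100, 100) is about 0.009
```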
termination criterion is met.

Algorithm 1 QLFHH
Input: A stop criterion (MaxFE); the size of the population (N); initialized population P0; learning rate (α); discount rate (γ); population state set (S); action set (A)
Output: Pend
1: Initialize Q-table, a state set (S), an action set (A), ε-greedy policy
2: while the stopping criterion is not met do:
3:   get the state of the current population (Pt): st
4:   if rand ≤ ∊
5:     actiont ← random action
6:   else
7:     actiont ← actionk // k = argmax_k(Q(s, at))
8:   end if
9:   offspring ← Reproduction(Pt, actiont)
10:  Pt+1 ← Environmental Selection(offspring, Pt, actiont)
11:  evaluate Pt+1, then obtain rt
12:  update the next state st+1
13:  update the Q-table by using Eq. (3)
14: end while
15: return Pend
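To make the learning step of Algorithm 1 concrete, the following minimal sketch shows one way the Q-table update could be realised. Eq. (3) is not reproduced in this excerpt, so the standard one-step Q-learning rule is assumed here; the four states correspond to S1-S4 of Section 3.3 and the three actions to the LLHs of Section 3.4.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Assumed one-step Q-learning update (a stand-in for Eq. (3)):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])

# Q-table with 4 population states (S1..S4) and 3 actions (NSGA-II, IBEA, SPEA2).
q_table = [[0.0] * 3 for _ in range(4)]
q_update(q_table, state=0, action=2, reward=1.0, next_state=1)
```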
3.2. Fuzzy policy

The fuzzy theory is also employed in QLFHH for solving MOPs. This paper introduces a fuzzy policy decision technique that incorporates decision variables belonging to two fuzzy sets within the universe of discourse. The technique involves calculating the degree of membership of each decision variable separately with respect to the two fuzzy sets. The decision variable values are then updated to the values represented by the fuzzy set with the higher degree of membership. This process transforms the original solution into a fuzzy solution, as all decision variable values in the solution vector are adjusted accordingly. Subsequently, the fuzzy solution is utilized in the evolutionary process instead of the original solution. For the specific introduction of fuzzy theory, such as the degree of membership and fuzzy sets, please refer to the references (X. Yang et al., 2021). The incorporation of fuzzy sets and membership functions in evolutionary algorithms offers several advantages. It enables the algorithm to explore the original decision space while simultaneously narrowing down the search range of the decision space. Consequently, during the agent's training process, the fuzzy policy effectively reduces the original decision variable space, ensuring that Q-learning with fuzzy hyper-parameters maintains a certain convergence rate in training the Q-table and enhances the quality of the obtained solutions.

Γ_n^1 = ⌊10^i · R_n^{−1} · (X_n − X_n^l)⌋ · R_n · 10^{−i} + X_n^l   (5)

Γ_n^2 = ⌈10^i · R_n^{−1} · (X_n − X_n^l)⌉ · R_n · 10^{−i} + X_n^l   (6)

μ_Ã1(X_n) = 1 / (X_n − Γ_n^1)   (7)

μ_Ã2(X_n) = 1 / (Γ_n^2 − X_n)   (8)

Ã_1 = {(X_n, μ_Ã1(X_n)) | n = 1, 2, …, D}   (9)

Ã_2 = {(X_n, μ_Ã2(X_n)) | n = 1, 2, …, D}   (10)

X′_n = Γ_n^1,              if μ_Ã1(X_n) > μ_Ã2(X_n)
       Γ_n^2,              if μ_Ã1(X_n) < μ_Ã2(X_n)     (11)
       rand(Γ_n^1, Γ_n^2), if μ_Ã1(X_n) = μ_Ã2(X_n)
where X represents a D-dimensional original solution vector; X′ represents a D-dimensional fuzzy solution vector; X_n represents the n-th decision variable of the original solution X; X_n^l represents the lower limit of the value of the n-th decision variable; and R_n is the length of the value interval of the n-th decision variable. Γ_n^1 and Γ_n^2 are the fuzzy target values of the n-th decision variable; the n-th decision variable is fuzzified into either Γ_n^1 or Γ_n^2. Ã_1 and Ã_2 are the fuzzy sets corresponding to the two fuzzy target values, and μ_Ã1 and μ_Ã2 are the membership functions corresponding to the two fuzzy sets. The following is the process of fuzzifying an original solution X into a fuzzy solution X′: calculate the membership function values of X with respect to the two fuzzy sets, and X is fuzzified into the fuzzy target value corresponding to the fuzzy set with the larger degree of membership.

Algorithm 2 Fuzzy Operation
Input: P (population), N (size of population), Rate (fuzzy evolutionary rate), offspring (∅)
Output: final offspring (fuzzy solution)
1: for 1:N do
2:   calculate Γ1 and Γ2 by Eq. (5) and Eq. (6)
3:   calculate μ_Ã1 and μ_Ã2 by Eq. (7) and Eq. (8)
4:   if μ_Ã1 > μ_Ã2 then // Eq. (11)
5:     logical ← 1
6:   else
7:     logical ← 0
8:   end if
9:   X′ ← Γ2
10:  X′(find(logical)) ← Γ1(find(logical));
11:  offspring ← X′ ∪ offspring
12: end for
13: return offspring

The detailed procedure of the fuzzy operation is given in Algorithm 2. First, obtain the length of the decision variable value interval. Next, calculate the two fuzzy target values Γ1 and Γ2 of the decision variables; Γ1 and Γ2 correspond to the fuzzy sets Ã_1 and Ã_2, respectively. Calculate the membership degree of the decision variables in the two fuzzy sets. Then, update the value of each decision variable according to its degree of membership: the decision variable is updated to the fuzzy target value corresponding to the fuzzy set with the larger membership degree. Finally, the program returns the fuzzy solution. In particular, logical is a Boolean variable matrix, where find(1) returns true and find(0) returns false.
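As a rough illustration of Eqs. (5)-(11) and Algorithm 2, the sketch below fuzzifies one decision variable and then a whole solution vector (a simplified Python rendering with hypothetical names; i is the retained precision and R_n is taken as the width of the variable's value interval):

```python
import math
import random

def fuzzify_variable(x, lower, upper, i=1, rng=random):
    """Fuzzify one decision variable following Eqs. (5)-(11): snap x to the
    lower/upper grid value (Gamma1/Gamma2) of a grid with 10**i cells over
    [lower, upper], choosing the side with the larger membership 1/|x - Gamma|."""
    r = upper - lower                                  # R_n: length of the value interval
    scaled = 10**i * (x - lower) / r
    g1 = math.floor(scaled) * r * 10**(-i) + lower     # Eq. (5)
    g2 = math.ceil(scaled) * r * 10**(-i) + lower      # Eq. (6)
    if g1 == g2:                                       # x already lies on the grid
        return x
    mu1 = 1.0 / (x - g1) if x != g1 else float("inf")  # Eq. (7)
    mu2 = 1.0 / (g2 - x) if x != g2 else float("inf")  # Eq. (8)
    if mu1 > mu2:                                      # Eq. (11)
        return g1
    if mu1 < mu2:
        return g2
    return rng.choice([g1, g2])

def fuzzy_operation(solution, lowers, uppers, i=1):
    """Apply the fuzzification to every decision variable of one solution
    (the per-individual loop body of Algorithm 2)."""
    return [fuzzify_variable(x, l, u, i) for x, l, u in zip(solution, lowers, uppers)]

# Example: fuzzify_variable(0.27, 0.0, 1.0, i=1) snaps towards 0.3, the closer grid point.
```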
3.3. State set

In this paper, Q-learning is employed as the high-level selection strategy. It is essential to clarify the representation of the current population state. For multi-objective optimization problems, the population state is constructed based on the fitness value of the population (the quality of the solutions), and the following two aspects are considered.

(1) The degree of population proximity to the true Pareto front of the function.
(2) Population diversity.

In NSGA-II, fast non-dominated sorting can effectively obtain the dominance relationship among individuals in a population (Deng et al., 2022). The dominance of an individual reflects the quality of the solutions. To obtain the population state, the fast non-dominated sorting technique divides the population into different Pareto fronts; the value of FrontNo denotes the Pareto front where a solution lies and indicates the quality of the solution. If max(FrontNo) > 1, the solutions in FrontNo = 1 dominate the solutions in FrontNo > 1; the closer the population is to the true Pareto front, the larger the proportion of non-dominated solutions. When max(FrontNo) = 1, all solutions are non-dominated. Population diversity can then be approximated by calculating the average crowding distance (CD_avg) of individuals, where CD_avg^t is calculated by Eq. (12):

CD_avg^t = (1/N_d) · Σ_{i=1}^{N_d} Σ_{j=1}^{M} |f_j^{i+1} − f_j^{i−1}|   (12)

where t denotes the t-th generation, N_d denotes the number of all individuals except for the boundary points, M represents the number of objectives, and f_j^{i+1} and f_j^{i−1} represent the j-th objective function values of the (i+1)-th and (i−1)-th individuals, respectively.

The state set is divided into four states, S = (S1, S2, S3, S4), according to whether the maximum front number of the current population is greater than one and whether the average crowding distance is better than that of the previous generation. The state set is as follows:

S1: max(FrontNo) > 1, CD_avg^t > CD_avg^{t−1}
S2: max(FrontNo) > 1, CD_avg^t < CD_avg^{t−1}
S3: max(FrontNo) = 1, CD_avg^t > CD_avg^{t−1}
S4: max(FrontNo) = 1, CD_avg^t < CD_avg^{t−1}
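The mapping from population statistics to the four states can be sketched as follows (hypothetical helper names; the per-objective sorting that normally precedes a crowding-distance computation is simplified to a pre-sorted objective matrix):

```python
def average_crowding_distance(objs):
    """CD_avg of Eq. (12): mean, over interior individuals, of the summed
    per-objective spread |f_j^(i+1) - f_j^(i-1)| (objs assumed pre-sorted)."""
    nd = max(len(objs) - 2, 1)            # boundary points are excluded
    total = sum(abs(objs[i + 1][j] - objs[i - 1][j])
                for i in range(1, len(objs) - 1)
                for j in range(len(objs[0])))
    return total / nd

def population_state(front_no, cd_avg_t, cd_avg_prev):
    """Return the state index 0..3 (S1..S4) from the maximum non-domination
    level and the change of the average crowding distance."""
    dominated_left = max(front_no) > 1     # population not yet fully non-dominated
    diversity_up = cd_avg_t > cd_avg_prev  # diversity improved vs. previous generation
    if dominated_left and diversity_up:
        return 0   # S1
    if dominated_left and not diversity_up:
        return 1   # S2
    if not dominated_left and diversity_up:
        return 2   # S3
    return 3       # S4

# Example: objs sorted by the first objective, e.g. [[0.0, 1.0], [0.2, 0.7], [0.5, 0.4], [1.0, 0.0]]
```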
3.4. Action and reward

Theoretically, QLFHH can incorporate any population-based meta-heuristics into the framework. In QLFHH, three MOEAs are used as LLHs, i.e., NSGA-II, IBEA, and SPEA2 (Zitzler, Laumanns, & Thiele, 2001). These well-known meta-heuristics are chosen as LLHs because they are classic MOEAs that can solve both low-dimensional and high-dimensional MOPs. Moreover, they differ in their environmental selection strategies.

NSGA-II uses a fast non-dominated sorting algorithm to rank solutions into different levels. It also uses the crowding distance to maintain the diversity of solutions and avoid solutions clustering together. This method ensures that the solution set is evenly distributed on the Pareto front.

IBEA uses the hypervolume indicator to filter the best solutions and balance the diversity and superiority of the solutions. In addition, IBEA uses a mechanism called "ε-dominance" to handle the relative superiority of the solutions and better preserve the diversity of solutions. In ε-dominance, a reference point is used to define a threshold for the hypervolume indicator, and solutions that are ε-dominated by the reference point are considered non-dominated. This mechanism allows IBEA to find solutions that are not only Pareto optimal but also diverse.

SPEA2 incorporates a precise fitness assignment strategy, considering both the number of individuals dominating a solution and the number of individuals by which it is dominated. It employs a nearest-neighbor density estimation technique to enhance search efficiency. Furthermore, SPEA2 enhances the archive truncation method used in the previous version of the strength Pareto evolutionary algorithm (SPEA) by replacing the average linkage method. This improvement ensures the preservation of boundary points in the archive.

For each generation, the agent selects an appropriate action based on the current population state. In QLFHH, the aforementioned NSGA-II, IBEA, and SPEA2 are selected as the low-level heuristics within the hyper-heuristic framework. These chosen low-level heuristics have been proven successful in solving multi-objective optimization problems over the past decades.

In QLFHH, the agent is not told which action to take but instead discovers which action yields the higher reward by executing them on the population; this form of reward method has a positive effect. The hypervolume (HV) metric is introduced to evaluate the population. If the HV value of the t-th generation is larger than that of the (t−1)-th generation, the agent gives a positive reward value to the current action; otherwise, it gives a negative reward value.
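The reward itself is a simple comparison of consecutive HV values; a minimal sketch follows (the exact reward magnitudes are not stated in this excerpt, so +1/−1 are placeholder values):

```python
def reward(hv_t, hv_prev, positive=1.0, negative=-1.0):
    """HV-based reward: positive if the hypervolume improved over the
    previous generation, negative otherwise."""
    return positive if hv_t > hv_prev else negative

# Example: reward(0.562, 0.548) -> 1.0, reward(0.548, 0.562) -> -1.0
```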
3.5. Offspring reproduction and environmental selection

The process of training the agent requires multiple explorations, which will affect the convergence speed of QLFHH. Therefore, in this paper, the fuzzy optimization policy is introduced to produce fuzzy solutions as offspring when the current iteration progress Iter <= Rate, where Rate represents the fuzzy evolutionary rate and Iter = FE / maxFE. Otherwise, the offspring are generated by the crossover and mutation strategies of the selected low-level heuristic. The whole procedure of offspring reproduction is shown in Algorithm 3. In Algorithm 3, the crossover and mutation strategies are simulated binary crossover (SBX) and polynomial mutation (PM), respectively (Pan et al., 2021). These strategies are also common offspring generation methods used in NSGA-II, IBEA, and SPEA2.

Algorithm 3 offspring = Reproduction(P)
Input: P (population), Rate (fuzzy evolutionary rate), Crossover strategy, Mutation strategy
Output: offspring
Initialize Iter // Iter = FE/maxFE
1: if Iter <= Rate then
2:   offspring ← Fuzzy Operation (Iter, P);
3: else
4:   offspring ← Crossover and Mutation (P);
5: end if
6: return: offspring

The whole procedure of environmental selection is shown in Algorithm 4. For the detailed environmental selection process, please refer to the relevant literature on NSGA-II, IBEA, and SPEA2.

Algorithm 4 Environmental Selection
Input: offspring, P (population), action
Output: P (final population)
1: if action = NSGAII
2:   population = NSGAII-EnvironmentalSelection(offspring, P);
3: else if action = IBEA
4:   population = IBEA-EnvironmentalSelection(offspring, P);
5: else if action = SPEA2
6:   population = SPEA2-EnvironmentalSelection(offspring, P);
7: end if
8: return: the final population P

3.6. Time complexity of QLFHH

This section provides a comprehensive analysis of the computational complexity of QLFHH. The algorithm operates by continuously selecting low-level heuristics (LLHs) to apply to the population, while simultaneously training the Q-table. Each iteration involves two main processes: the selection of LLHs and the acceptance of solutions.

(1) Selection of LLHs: In each iteration, QLFHH evaluates the current state of the population and uses the Q-table to select the most appropriate LLH. This decision-making process involves a lookup in the Q-table, which is a constant-time O(1) operation.
(2) Application of LLHs: Once an LLH is selected, it is applied to the population. The time complexity of this step depends on the specific LLH used; different LLHs have different computational costs. For instance, crossover and mutation operations typical in evolutionary algorithms can vary in complexity.
(3) Evaluation of solutions: After the selected LLH has been applied, the resulting population is evaluated, i.e., the objective values and the indicators used for the population state and the reward are computed.
(4) Updating the Q-table: The Q-values are updated based on the performance of the selected LLH. This involves updating the Q-table, which is again an O(1) operation.

The overall time complexity of QLFHH is dominated by the time complexity of applying the LLHs and evaluating the solutions. According to the literature on NSGA-II, IBEA, and SPEA2: NSGA-II has a time complexity of O(MN^2), where M is the number of objectives and N is the population size; IBEA has a time complexity of O(N^2); and SPEA2 has a time complexity of O(MN^2). Given that QLFHH utilizes these LLHs, its time complexity is determined by the most computationally expensive LLH it employs. Specifically, considering the worst case where the selected LLH has the highest time complexity, the overall time complexity of QLFHH per generation can be approximated by O(MN^2).

4. Experimental results and analysis

4.1. Test suites

The proposed Q-learning-based multi-objective hyper-heuristic algorithm with fuzzy policy (QLFHH) controls three low-level heuristics: NSGA-II, SPEA2, and IBEA. The performance of QLFHH is compared not only to each individual LLH, but also to three popular MOEAs of recent years: AdaW, BCEMOEAD, and GFMMOEA.

In this paper, 20 complex MOP test problems, ZDT(1–6), DTLZ(1–7), and IMOP(1–8), are used to evaluate the performance of QLFHH (Tian, Cheng, Zhang, Li, et al., 2019). These problems have various characteristics, such as linear, multimodal, concave, convex, multi-grid, and multi-segment discontinuous, etc. The characteristics of each test instance are listed in Table 1, where M represents the number of objectives of the optimization function and D represents the dimension of the decision variables. The proposed QLFHH algorithm has also been tested on multi-objective real-world problems (real-world MOPs).

4.2. Performance metrics

Hypervolume (HV) and inverted generational distance (IGD) are two widely used performance indicators that cover all aspects of the quality of the solutions obtained by an algorithm (M. Li & Yao, 2019a). The IGD is calculated by Eq. (13) and Eq. (14):

IGD = (Σ_{i=1}^{n} d_i) / n   (13)

d_i = sqrt( Σ_{j=1}^{k} (p_{i,j}^a − p_j^true)^2 )   (14)

Here n is the number of solutions in the true Pareto front, and d_i denotes the minimum Euclidean distance between the Pareto optimal solution obtained by the algorithm and the solutions of the true Pareto front. k is the number of objectives of the test function. p_{i,j}^a represents the i-th Pareto solution obtained by the algorithm on the j-th objective function, and p_j^true is the nearest Pareto solution in the true Pareto front to p_{i,j}^a. The lower the IGD value, the better the algorithm's performance in terms of convergence and diversity; IGD = 0 indicates that all of the algorithm's solutions are contained in the true Pareto front. When calculating the IGD, the true Pareto front is provided beforehand.

The HV is calculated using Eq. (15), where z^r = (z_1^r, z_2^r, …, z_m^r) represents a reference point in the objective space that is dominated by all the Pareto-optimal points. S denotes the set of solutions obtained by the algorithm, and HV assesses both the convergence and diversity of the solutions by measuring the size of the objective space dominated by the solutions in S, bounded by z^r. Here, m represents the number of objectives.
HV(S) = VOL( ⋃_{x∈S} [f_1(x), z_1^r] × ⋯ × [f_m(x), z_m^r] )   (15)

From Eq. (15), it can be concluded that the closer the solutions are to the complete true Pareto front, the better the quality of the solutions and the larger the HV value, where VOL(·) represents the Lebesgue measure.
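As a small self-contained illustration of the metric computations in this subsection, the sketch below implements IGD (the wording around Eqs. (13)-(14) is ambiguous about the direction of the distance, so the common convention of averaging over the true front is used):

```python
import math

def igd(approx_front, true_front):
    """Inverted generational distance: average, over the reference (true)
    Pareto front, of the Euclidean distance to the nearest obtained solution.
    Lower is better; 0 means the true front is fully covered."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    d = [min(dist(ref, sol) for sol in approx_front) for ref in true_front]
    return sum(d) / len(d)

# Example on a 2-objective problem:
# igd([[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]) is about 0.236
```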
worst when Q-learning and the fuzzy strategy are removed. The performance of the algorithm with only the fuzzy strategy removed is better than that of the algorithm with only Q-learning removed, and both are better than HHRC, which proves the effectiveness of the Q-learning and fuzzy strategies. It is observed that the improvement in algorithm performance from the combination of the Q-learning and fuzzy strategies is greater than the improvement from either single strategy, and Q-learning has the greater impact on the stability of the algorithm.

4.5. Analysis of results

The convergence behavior of the different algorithms on the 3-objective DTLZ test problems is depicted in Fig. 5. The logarithm of the inverted generational distance (IGD) values obtained by each algorithm on the test problems is used to provide a clearer representation of the convergence process of each algorithm. The graph reveals that QLFHH exhibits superior convergence speed and solution quality compared to the other algorithms under consideration. Furthermore, Table 3 presents the HV values obtained by the comparison algorithms and QLFHH on the DTLZ problems. From the table, it is evident that QLFHH outperforms
Table 3
The HV values obtained by comparison algorithms and QLFHH on DTLZ.
Problem (M) | AdaW | BCEMOEAD | GFMMOEA | IBEA | NSGA-II | SPEA2 | QLFHH
DTLZ1 (2) | 5.8211e-1 (2.83e-4) − | 5.8219e-1 (4.82e-4) − | 5.8199e-1 (5.82e-4) − | 4.1301e-1 (2.69e-2) − | 5.8095e-1 (9.68e-4) − | 5.8199e-1 (5.70e-4) − | 5.8256e-1 (9.60e-6)
DTLZ2 (2) | 3.4745e-1 (1.96e-4) − | 3.4727e-1 (7.60e-5) − | 3.4747e-1 (5.97e-5) − | 3.4620e-1 (2.35e-4) − | 3.4656e-1 (1.87e-4) − | 3.4726e-1 (8.44e-5) − | 3.4756e-1 (4.37e-5)
DTLZ3 (2) | 3.4254e-1 (3.05e-3) − | 3.4246e-1 (2.94e-3) − | 3.4283e-1 (2.95e-3) − | 1.6774e-1 (4.18e-3) − | 3.4260e-1 (3.11e-3) − | 3.4133e-1 (4.00e-3) − | 3.4769e-1 (2.78e-5)
DTLZ4 (2) | 3.4726e-1 (4.36e-4) + | 3.2161e-1 (7.82e-2) = | 2.7049e-1 (1.20e-1) − | 2.6118e-1 (1.22e-1) − | 2.7839e-1 (1.15e-1) − | 3.2163e-1 (7.82e-2) = | 3.2162e-1 (7.82e-2)
DTLZ5 (2) | 3.4743e-1 (2.42e-4) − | 3.4727e-1 (7.71e-5) − | 3.4745e-1 (6.50e-5) − | 3.4614e-1 (2.62e-4) − | 3.4658e-1 (1.71e-4) − | 3.4725e-1 (1.09e-4) − | 3.4757e-1 (4.30e-5)
DTLZ6 (2) | 3.4740e-1 (2.93e-4) − | 3.4746e-1 (9.22e-5) − | 3.4758e-1 (4.00e-5) − | 3.4269e-1 (6.62e-4) − | 3.4646e-1 (2.19e-4) − | 3.4756e-1 (3.75e-5) − | 3.4771e-1 (2.43e-5)
DTLZ7 (2) | 2.4291e-1 (9.49e-5) = | 2.4294e-1 (1.67e-5) − | 2.4150e-1 (2.55e-4) − | 2.4042e-1 (1.22e-2) − | 2.4269e-1 (5.60e-5) − | 2.4288e-1 (3.15e-5) − | 2.4295e-1 (7.70e-6)
DTLZ1 (3) | 8.4017e-1 (7.23e-4) − | 8.4093e-1 (6.60e-4) − | 8.4179e-1 (6.33e-4) − | 4.5677e-1 (6.02e-2) − | 8.2753e-1 (2.50e-3) − | 8.4174e-1 (8.12e-4) − | 8.4305e-1 (1.30e-4)
DTLZ2 (3) | 5.5919e-1 (8.94e-4) − | 5.5781e-1 (1.03e-3) − | 5.5986e-1 (1.17e-3) − | 5.5752e-1 (8.89e-4) − | 5.3453e-1 (5.22e-3) − | 5.5571e-1 (1.39e-3) − | 5.6137e-1 (5.44e-4)
DTLZ3 (3) | 5.4491e-1 (7.94e-3) − | 5.4922e-1 (8.44e-3) − | 4.9166e-1 (1.06e-1) − | 2.4150e-1 (6.37e-3) − | 5.1989e-1 (1.36e-2) − | 5.4646e-1 (5.35e-3) − | 5.6237e-1 (5.40e-4)
DTLZ4 (3) | 5.4560e-1 (5.41e-2) + | 5.5747e-1 (1.30e-3) + | 4.9426e-1 (1.36e-1) − | 5.5756e-1 (1.17e-3) + | 5.2354e-1 (8.18e-2) + | 4.4883e-1 (1.23e-1) = | 5.1827e-1 (6.85e-2)
DTLZ5 (3) | 1.9985e-1 (1.80e-4) − | 1.9949e-1 (1.23e-4) − | 1.9990e-1 (6.95e-5) − | 1.9864e-1 (3.06e-4) − | 1.9913e-1 (1.42e-4) − | 1.9955e-1 (1.39e-4) − | 2.0003e-1 (2.01e-4)
DTLZ6 (3) | 1.9988e-1 (6.27e-5) − | 1.9994e-1 (4.39e-5) − | 2.0008e-1 (3.90e-5) − | 1.9665e-1 (8.28e-4) − | 1.9946e-1 (1.39e-4) − | 2.0006e-1 (4.23e-5) − | 2.0018e-1 (2.59e-5)
DTLZ7 (3) | 2.7953e-1 (5.48e-4) − | 2.7738e-1 (6.45e-4) − | 2.3969e-1 (5.55e-2) − | 2.7077e-1 (2.12e-2) − | 2.6594e-1 (1.04e-2) − | 2.7533e-1 (6.16e-3) − | 2.8031e-1 (4.47e-4)
+/−/= | 2/11/1 | 1/12/1 | 0/14/0 | 1/13/0 | 1/13/0 | 0/12/2 |
Table 4
The IGD values obtained by comparison algorithms and QLFHH on ZDT.
Problem (M) | AdaW | BCEMOEAD | GFMMOEA | IBEA | NSGA-II | SPEA2 | QLFHH
ZDT1 (2) | 3.8925e-3 (3.78e-5) − | 3.9481e-3 (5.38e-5) − | 3.8613e-3 (3.39e-5) − | 4.1221e-3 (5.83e-5) − | 4.7606e-3 (2.52e-4) − | 3.9561e-3 (6.66e-5) − | 3.7491e-3 (4.76e-5)
ZDT2 (2) | 3.8912e-3 (3.33e-5) − | 3.8929e-3 (3.69e-5) − | 3.8597e-3 (2.99e-5) − | 8.3190e-3 (7.19e-4) − | 4.8283e-3 (1.78e-4) − | 3.9381e-3 (5.07e-5) − | 3.8368e-3 (3.63e-5)
ZDT3 (2) | 4.6079e-3 (6.86e-5) + | 4.6470e-3 (4.28e-5) + | 1.6692e-2 (3.94e-3) − | 1.5741e-2 (6.36e-4) − | 6.4564e-3 (5.41e-3) − | 4.8606e-3 (1.05e-4) + | 5.0536e-3 (3.05e-4)
ZDT4 (2) | 4.1694e-3 (2.83e-4) − | 4.0645e-3 (1.80e-4) − | 3.9516e-3 (1.14e-4) − | 1.8849e-2 (6.12e-3) − | 4.7403e-3 (2.20e-4) − | 4.0684e-3 (2.23e-4) − | 3.7028e-3 (3.73e-5)
ZDT6 (2) | 3.1325e-3 (4.35e-5) − | 3.0957e-3 (2.10e-5) − | 3.1907e-3 (1.34e-4) − | 4.4453e-3 (1.18e-4) − | 3.6948e-3 (9.50e-5) − | 3.0868e-3 (2.40e-5) − | 3.0705e-3 (1.83e-5)
+/−/= | 1/4/0 | 1/4/0 | 0/5/0 | 0/5/0 | 0/5/0 | 1/4/0 |
Table 5
The HV values obtained by comparison algorithms and QLFHH on IMOP.
Problem (M) | AdaW | BCEMOEAD | GFMMOEA | IBEA | NSGA-II | SPEA2 | QLFHH
IMOP1 (2) | 9.8729e-1 (4.86e-5) − | 9.8729e-1 (5.34e-5) − | 9.8722e-1 (4.12e-5) − | 9.8454e-1 (1.36e-3) − | 9.8713e-1 (9.07e-5) − | 9.8717e-1 (6.49e-5) − | 9.8754e-1 (3.57e-5)
IMOP2 (2) | 2.1556e-1 (2.55e-2) − | 1.2179e-1 (5.08e-2) − | 2.1302e-1 (1.68e-2) − | 1.9835e-1 (6.03e-2) − | 2.3140e-1 (1.65e-4) − | 2.3180e-1 (8.10e-5) − | 2.3210e-1 (8.51e-5)
IMOP3 (2) | 6.5283e-1 (6.12e-3) − | 6.5866e-1 (1.63e-4) = | 6.4993e-1 (1.29e-2) − | 6.5688e-1 (1.25e-4) − | 6.5843e-1 (9.49e-5) − | 6.5823e-1 (1.92e-3) = | 6.5858e-1 (2.46e-4)
IMOP4 (3) | 4.3233e-1 (1.03e-3) − | 4.3306e-1 (4.71e-4) − | 4.3340e-1 (7.54e-4) − | 4.3339e-1 (2.23e-4) − | 4.3278e-1 (3.14e-4) − | 4.3307e-1 (4.15e-4) − | 4.3408e-1 (2.40e-4)
IMOP5 (3) | 5.1638e-1 (1.18e-2) − | 5.0719e-1 (8.96e-3) − | 5.0327e-1 (4.86e-3) − | 5.0932e-1 (3.76e-3) − | 4.9290e-1 (3.85e-3) − | 4.9966e-1 (1.67e-3) − | 5.6010e-1 (1.32e-2)
IMOP6 (3) | 5.2916e-1 (4.92e-4) + | 5.2815e-1 (5.86e-4) + | 5.1732e-1 (4.91e-2) − | 5.1755e-1 (1.27e-3) − | 4.9781e-1 (9.20e-3) − | 5.0292e-1 (6.65e-2) − | 5.2671e-1 (8.66e-4)
IMOP7 (3) | 4.9288e-1 (8.30e-2) = | 5.2721e-1 (6.77e-4) + | 2.1774e-1 (1.93e-1) − | 1.7505e-1 (1.59e-1) − | 4.9359e-1 (8.30e-3) − | 1.5269e-1 (1.49e-1) − | 5.1813e-1 (3.17e-3)
IMOP8 (3) | 5.3014e-1 (5.20e-3) − | 5.1977e-1 (2.18e-3) − | 5.3866e-1 (4.48e-3) + | 5.3605e-1 (5.33e-4) − | 4.7191e-1 (5.71e-3) − | 5.0473e-1 (3.17e-2) − | 5.3639e-1 (3.52e-2)
+/−/= | 1/6/1 | 2/5/1 | 1/7/0 | 0/8/0 | 0/8/0 | 0/7/1 |
Table 7
The statistical comparisons with each algorithm on HV value.
Problem AdaW BCEMOEAD GFMMOEA IBEA NSGA-II SPEA2 QLFHH
obtained by IBEA are distributed along the boundaries, while NSGA-II prefers to concentrate on the left half. In contrast, the PFs obtained by SPEA2 are evenly distributed. It can also be observed that the PFs obtained by HHRC and HHCF are distributed along the boundaries. This may be attributed to the fact that they tend to select IBEA and NSGA-II frequently during the search process. Compared to the other algorithms, HHEG and QLFHH exhibit better convergence and diversity in their obtained PFs. This is because they utilize the collaborative effects of each LLH during the search process, which leads to better performance. In comparison to HHEG, the PFs obtained by QLFHH are uniform and smooth, which demonstrates that the proposed strategy in this paper performs better in combining the strengths of each LLH.

The details in Fig. 10 show that QLFHH ranks first on 12 out of 14 DTLZ problems. The advantages of QLFHH can also be found on the ZDT and IMOP test suites, with the best rank on all ZDT problems and on 5 out of 8 IMOP problems.

4.6. Experimental results on real-world MOPs

QLFHH is also tested on six kinds of multi-objective real-world problems. The QLFHH algorithm was compared with the state-of-the-art algorithms IBEA, NSGA-II, and SPEA2, as well as with various hyper-heuristics, including the random-strategy-based hyper-heuristic (HHRC), the choice-function-based hyper-heuristic (HHCF), and the epsilon-greedy-based hyper-heuristic (HHEG). Here are introductions of these problems:
Knapsack Problem (KP) (Zitzler & Thiele, 1999) involves the selection of items to be placed into two backpacks, each with a weight limit. The primary objective is to maximize the total value of all items placed into both backpacks. To convert the KP into a minimization problem, the objective function can be multiplied by negative one (−1).

Traveling Salesman Problem (TSP) (Corne & Knowles, 2007) refers to the problem of finding the shortest path that visits all given locations while taking into account multiple objectives or weights. In the multi-objective TSP (MOTSP), a traveling salesman needs to visit multiple locations and attempts to minimize the total path length while satisfying multiple objectives or weights. This problem has various applications in fields such as logistics, transportation planning, and intelligent transportation.

Next Release Problem (NRP) (Y. Zhang et al., 2007) is concerned with achieving maximum satisfaction and minimum resource consumption when a software development company completes the next version of its software product. This problem is of great importance in software engineering, as it involves optimizing the release planning process and balancing the needs and constraints of various stakeholders.
Quadratic Assignment Problem (QAP) (Knowles & Corne, 2003) is a combinatorial optimization problem that involves allocating a set of facilities to a set of locations, where each facility has a certain flow to every other facility and each location has a certain distance to every other location. The objective of the QAP is to minimize the total weighted sum of the flows between all pairs of facilities, while satisfying the constraints that each facility is assigned to exactly one location and each location to exactly one facility.

Multipoint Distance Minimization Problem (MP-DMP) (M. Li et al., 2018) is a mathematical optimization problem that aims to minimize the distances between a single point and a given set of target points. This problem is widely applicable in various fields, such as facility location, network design, and vehicle routing.

Multiline Distance Minimization Problem (ML-DMP) (Koppen & Yoshida, 2007) is a mathematical optimization problem that aims to minimize the distances between a single point and a given set of target lines. This problem arises in various fields, such as robotics, computer vision, and geographic information systems.

More experimental details on KP, NRP, TSP, QAP, MP-DMP and ML-DMP are shown in Fig. 11. The number of objectives is listed in the bracket next to each problem. The mean HV value obtained by each algorithm is plotted in Fig. 11, and the rankings of each algorithm obtained by Wilcoxon's rank-sum test are shown on the right of each subplot. The mean HV values obtained by the comparison algorithms and QLFHH on the real-world problems are also listed in Table 6. When comparing with the meta-heuristics, QLFHH ranks first on 11 out of 12 problems, the exception being the 2-objective mQAP. Compared to the hyper-heuristics, QLFHH is the top algorithm on 12 problems, the exceptions being mQAP and the (5-, 8-, and 10-objective) ML-DMP.
Fig. 9. The approximate PFs among 30 runs obtained by each algorithm on 3-objective IMOP6.
4.7. Cross-domain performance analysis

The cross-domain ability of hyper-heuristics needs to be assessed when solving multiple problems. This section analyzes the cross-domain ability of each algorithm across the 2- and 3-objective DTLZ and IMOP test suites, as well as the real-world test suite. Each algorithm is run in 30 independent trials on each problem. Table 7, Table 8 and Table 9 display the ranking of QLFHH and the other algorithms based on the experimental results of all test problems.
Fig. 10. Mean HV values and corresponding Wilcoxon’s rank-sum test of each algorithm.
Fig. 11. Mean HV values and corresponding Wilcoxon’s rank-sum test of each algorithm on real-world problems.
It can be shown that QLFHH is the best-performing algorithm on all test suites among all algorithms. This demonstrates that QLFHH is both competitive and effective in addressing MOPs with a wide range of different characteristics, including real-world MOPs. 'Average rank' denotes the rank value averaged over all test suites, while 'Final rank' is the rank according to the 'Average rank'. QLFHH is the top algorithm, having the best cross-domain ability with an 'Average rank' of 7.6, 15, and 16 respectively, ranking first.

4.8. Discussions

From the results of the experiments above, it can be seen that QLFHH is an effective approach for MOPs. In the early stage of evolution, a slow convergence speed occurs on the ZDT(1–6), DTLZ(5–7), IMOP1 and IMOP6 test problems, because during training the agent requires numerous explorations and adopts different low-level heuristics in order to be trained. In the middle of the algorithm, the ∊ value is reduced to approximately 0, the exploration process gradually decreases, then fuzzification decreases and the search shifts to exact evolution. In the exact evolution, the trained agent selects the low-level heuristic with the highest Q-value in different states according to the Q-table. QLFHH outperformed the three individual low-level heuristics on almost all of the selected test problems, and compared to the other four algorithms, QLFHH also shows a certain degree of advantage.

In selection hyper-heuristic algorithms, the performance of the low-level heuristics directly affects the optimization effect. Three well-known meta-heuristic algorithms, NSGA-II, SPEA2, and IBEA, are selected as LLHs in this paper. Choosing other metaheuristic algorithms or operators is worth studying.

In summary, the proposed QLFHH performs competitively in solving MOPs with various features. At the same time, this study also provides a new idea for solving multi-objective optimization problems in the real world.

5. Conclusion and future work

In this paper, a novel multi-objective hyper-heuristic, named QLFHH, is proposed. This algorithm combines the advantages of three existing MOEAs to solve MOPs with different features and produces competitive results. Q-learning and ∊-greedy are used in the high-level strategy of the hyper-heuristic framework. The state set, action set, and reward mechanism are designed according to the characteristics of MOPs. A fuzzy policy is utilized in the offspring reproduction process, which is helpful in improving the performance of the algorithm. A potential drawback of QLFHH is that it uses the HV metric in the reward mechanism and non-dominated sorting in determining the population state; both are time-consuming, which results in a longer overall running time for the algorithm. In QLFHH, the role of Q-learning impacts the entire model, so the limitations of QLFHH are closely tied to those of Q-learning. When the dimension of the problem is very high, the state space and action space increase exponentially, making the Q-table too large to be effectively learned and stored. In dynamically changing environments (such as real-time optimization and online learning), Q-learning may not be able to adapt to the changes quickly, leaving the model outdated. Therefore, the proposed QLFHH is not suitable for situations with strict time requirements, excessive problem dimensions, or dynamic environments.

In future work, QLFHH will be applied to tackle different complex optimization problems (multi-modal, multi-tasking, or real-world problems) to assess its effectiveness and cross-domain ability. Furthermore, it is worthwhile to combine different low-level heuristics (LLHs) based on the problem and to design various state sets and reward mechanisms. Addressing the limitations of QLFHH is also challenging future work.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China under grant 62473182. It was also supported by the Key Program of the Natural Science Foundation of Gansu Province under Grant 23JRRA784, the Industry Support Project of Gansu Province Colleges under Grant 2024CYZC-15, and the Intellectual Property Program of Gansu Province under Grant 24ZSCQG045, respectively.

Data availability

Data will be made available on request.

References

Bader, J., & Zitzler, E. (2011). HypE: An algorithm for fast hypervolume-based many-objective optimization. Evolutionary Computation, 19(1), 45–76. https://doi.org/10.1162/EVCO_A_00009
Burke, E. K., Hyde, M. R., Kendall, G., Ochoa, G., Özcan, E., & Woodward, J. R. (2019). A classification of hyper-heuristic approaches: Revisited. International Series in Operations Research and Management Science, 272, 453–477. https://doi.org/10.1007/978-3-319-91086-4_14/COVER
Cao, J., Zhang, J., Zhao, F., & Chen, Z. (2021). A two-stage evolutionary strategy based MOEA/D to multi-objective problems. Expert Systems with Applications, 185(July 2020), Article 115654. https://doi.org/10.1016/j.eswa.2021.115654
Caramia, M., & Dell'Olmo, P. (2020). Multi-objective optimization. Multi-Objective Management in Freight Logistics, 21–51. https://doi.org/10.1007/978-3-030-50812-8_2
Corne, D. W., & Knowles, J. D. (2007). Techniques for highly multiobjective optimisation: Some nondominated points are better than others. In Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference (pp. 773–780). https://doi.org/10.1145/1276958.1277115
Deb, K., Agrawal, S., Pratap, A., & Meyarivan, T. (2000). A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1917, 849–858. https://doi.org/10.1007/3-540-45356-3_83
Deng, W., Zhang, X., Zhou, Y., Liu, Y., Zhou, X., Chen, H., & Zhao, H. (2022). An enhanced fast non-dominated solution sorting genetic algorithm for multi-objective problems. Information Sciences, 585, 441–453. https://doi.org/10.1016/J.INS.2021.11.052
Hitomi, N., & Selva, D. (2016). A hyperheuristic approach to leveraging domain knowledge in multi-objective evolutionary algorithms. Proceedings of the ASME Design Engineering Technical Conference, 2B–2016. https://doi.org/10.1115/DETC2016-59870
Hua, Y., Liu, Q., Hao, K., & Jin, Y. (2021). A survey of evolutionary algorithms for multi-objective optimization problems with irregular Pareto fronts. IEEE/CAA Journal of Automatica Sinica, 8(2), 303–318. https://doi.org/10.1109/JAS.2021.1003817
Jia, Y., Yan, Q., & Wang, H. (2023). Q-learning driven multi-population memetic algorithm for distributed three-stage assembly hybrid flow shop scheduling with flexible preventive maintenance. Expert Systems with Applications, 232, Article 120837. https://doi.org/10.1016/J.ESWA.2023.120837
Ji, J. J., Guo, Y. N., Gao, X. Z., Gong, D. W., & Wang, Y. P. (2023). Q-learning-based hyperheuristic evolutionary algorithm for dynamic task allocation of crowdsensing. IEEE Transactions on Cybernetics, 53(4), 2211–2224. https://doi.org/10.1109/TCYB.2021.3112675
Knowles, J., & Corne, D. (2003). Instance generators and test suites for the multiobjective quadratic assignment problem. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2632, 295–310. https://doi.org/10.1007/3-540-36970-8_21/COVER
Koppen, M., & Yoshida, K. (2007). Substitute distance assignments in NSGA-II for handling many-objective optimization problems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-540-70928-2_55/COVER
Kumari, A. C., & Srinivas, K. (2016). Hyper-heuristic approach for multi-objective software module clustering. Journal of Systems and Software, 117, 384–401. https://doi.org/10.1016/J.JSS.2016.04.007
Li, M., Grosan, C., Yang, S., Liu, X., & Yao, X. (2018). Multiline distance minimization: A visualized many-objective test problem suite. IEEE Transactions on Evolutionary Computation, 22(1), 61–78. https://doi.org/10.1109/TEVC.2017.2655451
Li, M., Yang, S., & Liu, X. (2016). Pareto or non-Pareto: Bi-criterion evolution in multiobjective optimization. IEEE Transactions on Evolutionary Computation, 20(5), 645–665. https://doi.org/10.1109/TEVC.2015.2504730
Li, M., & Yao, X. (2019a). Quality evaluation of solution sets in multiobjective Walker, D. J., & Keedwell, E. (2016). Multi-objective optimisation with a sequence-based
optimisation: A survey. ACM Computing Surveys, 52(2), 1–43. https://doi.org/ selection hyper-heuristic. In GECCO 2016 Companion - Proceedings of the 2016 Genetic
10.1145/3300148 and Evolutionary Computation Conference (pp. 81–82). https://doi.org/10.1145/
Li, M., & Yao, X. (2019b). What weights work for you? Adapting weights for any Pareto 2908961.2909016
front shape in decomposition-based evolutionary multiobjective optimisation. Wang, X., Jin, Y., Schmitt, S., & Olhofer, M. (2020). An adaptive Bayesian approach to
Evolutionary Computation, 28(2), 227–253. https://doi.org/10.1162/evco_a_00269 surrogate-assisted evolutionary multi-objective optimization. Information Sciences,
Li, W., Özcan, E., Drake, J. H., & Maashi, M. (2023). A generality analysis of 519, 317–331. https://doi.org/10.1016/j.ins.2020.01.048
multiobjective hyper-heuristics. Information Sciences, 627, 34–51. https://doi.org/ Wang, Y., & Li, B. (2010). Multi-strategy ensemble evolutionary algorithm for dynamic
10.1016/J.INS.2023.01.047 multi-objective optimization. Memetic Computing, 2(1), 3–24. https://doi.org/
Li, W., Özcan, E., & John, R. (2019). A Learning Automata-Based Multiobjective Hyper- 10.1007/S12293-009-0012-0/METRICS
Heuristic. IEEE Transactions on Evolutionary Computation, 23(1), 59–73. https://doi. Yang, T., Zhang, S., & Li, C. (2021). A multi-objective hyper-heuristic algorithm based on
org/10.1109/TEVC.2017.2785346 adaptive epsilon-greedy selection. Complex and Intelligent Systems, 7(2), 765–780.
Maashi, M., Özcan, E., & Kendall, G. (2014). A multi-objective hyper-heuristic based on https://doi.org/10.1007/S40747-020-00230-8/FIGURES/13
choice function. Expert Systems with Applications, 41(9), 4475–4493. https://doi.org/ Yang, X., Zou, J., Yang, S., Zheng, J., & Liu, Y. (2021). A Fuzzy Decision Variables
10.1016/J.ESWA.2013.12.050 Framework for Large-scale Multiobjective Optimization. IEEE Transactions on
McClymont, K., & Keedwell, E. C. (2011). Markov chain hyper-heuristic (MCHH): An Evolutionary Computation. https://doi.org/10.1109/TEVC.2021.3118593
online selective hyper-heuristic for multi-objective continuous problems. Genetic and Zhang, S., Ren, Z., Li, C., & Xuan, J. (2020). A perturbation adaptive pursuit strategy
Evolutionary Computation Conference, GECCO’11, 2003–2010. https://doi.org/ based hyper-heuristic for multi-objective optimization problems. Swarm and
10.1145/2001576.2001845 Evolutionary Computation, 54, Article 100647. https://doi.org/10.1016/J.
Pan, L., Xu, W., Li, L., He, C., & Cheng, R. (2021). Adaptive simulated binary crossover SWEVO.2020.100647
for rotated multi-objective optimization. Swarm and Evolutionary Computation, 60, Zhang, Y., Bai, R., Qu, R., Tu, C., & Jin, J. (2022). A deep reinforcement learning based
Article 100759. https://doi.org/10.1016/J.SWEVO.2020.100759 hyper-heuristic for combinatorial optimisation with uncertainties. European Journal
Qian, Z., Zhao, Y., Wang, S., Leng, L., & Wang, W. (2018). A hyper heuristic algorithm for of Operational Research, 300(2), 418–427. https://doi.org/10.1016/J.
low carbon location routing problem. In Lecture Notes in Computer Science (Including EJOR.2021.10.032
Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Zhang, Y., Harman, M., & Mansouri, S. A. (2007). The multi-objective next release
https://doi.org/10.1007/978-3-319-92537-0_21/COVER problem. Proceedings of GECCO 2007: Genetic and Evolutionary Computation
Tian, Y., Cheng, R., Zhang, X., Li, M., & Jin, Y. (2019). Diversity Assessment of Multi- Conference, 1129–1137. doi: 10.1145/1276958.1277179.
Objective Evolutionary Algorithms: Performance Metric and Benchmark Problems Zhao, F., Di, S., & Wang, L. (2022). A Hyperheuristic With Q-Learning for the
[Research Frontier]. IEEE Computational Intelligence Magazine, 14(3), 61–74. https:// Multiobjective Energy-Efficient Distributed Blocking Flow Shop Scheduling Problem.
doi.org/10.1109/MCI.2019.2919398 IEEE Transactions on Cybernetics. https://doi.org/10.1109/TCYB.2022.3192112
Tian, Y., Cheng, R., Zhang, X., Su, Y., & Jin, Y. (2019). A Strengthened Dominance Zhu, Q., Wu, X., Lin, Q., Ma, L., Li, J., Ming, Z., & Chen, J. (2023). A survey on
Relation Considering Convergence and Diversity for Evolutionary Many-Objective Evolutionary Reinforcement Learning algorithms. Neurocomputing, 126628. https://
Optimization. IEEE Transactions on Evolutionary Computation, 23(2), 331–345. doi.org/10.1016/J.NEUCOM.2023.126628
https://doi.org/10.1109/TEVC.2018.2866854 Zitzler, E., & Künzli, S. (2004). Indicator-based selection in multiobjective search. Lecture
Tian, Y., Zhang, X., Cheng, R., He, C., & Jin, Y. (2020). Guiding Evolutionary Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and
Multiobjective Optimization with Generic Front Modeling. IEEE Transactions on Lecture Notes in Bioinformatics), 3242, 832–842. https://doi.org/10.1007/978-3-540-
Cybernetics, 50(3), 1106–1119. https://doi.org/10.1109/TCYB.2018.2883914 30217-9_84
Venske, S. M., Almeida, C. P., Lüders, R., & Delgado, M. R. (2022). Selection hyper- Zitzler, E., Laumanns, M., & Thiele, L. (2001). SPEA2: Improving the Strength Pareto
heuristics for the multi and many-objective quadratic assignment problem. Evolutionary Algorithm. Evolutionary Methods for Design Optimization and Control with
Computers & Operations Research, 148, Article 105961. https://doi.org/10.1016/J. Applications to Industrial Problems, 95–100. doi: 10.1.1.28.7571.
COR.2022.105961 Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: A comparative
Vrugt, J. A., & Robinson, B. A. (2007). Improved evolutionary optimization from case study and the strength Pareto approach. IEEE Transactions on Evolutionary
genetically adaptive multimethod search. Proceedings of the National Academy of Computation, 3(4), 257–271. https://doi.org/10.1109/4235.797969
Sciences of the United States of America, 104(3), 708–711. https://doi.org/10.1073/
PNAS.0610471104/SUPPL_FILE/IMAGE9.GIF