
Swarm and Evolutionary Computation 83 (2023) 101399

Contents lists available at ScienceDirect

Swarm and Evolutionary Computation


journal homepage: www.elsevier.com/locate/swevo

A collaborative iterated greedy algorithm with reinforcement learning for energy-aware distributed blocking flow-shop scheduling
Haizhu Bao (a), Quanke Pan (a, b, *), Rubén Ruiz (c), Liang Gao (d)
(a) School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, PR China
(b) School of Computer Science, Liaocheng University, Liaocheng 252000, PR China
(c) Grupo de Sistemas de Optimización Aplicada, Universitat Politècnica de València, Camino de Vera S/N, València 46021, Spain
(d) State Key Lab of Intelligent Manufacturing Equipment and Technology in Huazhong University of Science and Technology, Wuhan 430074, PR China

ARTICLE INFO
Keywords: Energy-aware scheduling; Flow-shop; Q-learning; Iterated greedy; Multi-objective optimization

ABSTRACT
Energy-aware scheduling has attracted increasing attention, mainly because of its economic benefits and its role in reducing the carbon footprint of companies. In this paper, an energy-aware scheduling problem in a distributed blocking flow-shop with sequence-dependent setup times is investigated to minimize both makespan and total energy consumption. A mixed-integer linear programming model is constructed, and a cooperative iterated greedy algorithm based on Q-learning (CIG) is proposed. In the CIG, a top-level Q-learning focuses on enhancing the utilization ratio of machines to minimize makespan by finding a scheduling policy over four sequence-related operations. A bottom-level Q-learning focuses on improving energy efficiency to reduce total energy consumption by learning the optimal speed-governing policy over four speed-related operations. Based on the structural characteristics of solutions, several properties are explored to design an energy-saving strategy and an acceleration strategy. The experimental results and statistical analysis show that the CIG is superior to the state-of-the-art competitors, with an improvement percentage of 20.16 % over 2880 instances from a well-known benchmark set in the literature.

1. Introduction

The blocking flow-shop scheduling problem (BFSP) arises in various production environments. The common characteristic of this kind of scheduling problem is that there are no buffers between machines, so a job can leave its current machine only when the next machine is not occupied [1]. Fig. 1 shows the common process of aluminum production in the non-ferrous metallurgical industry, which includes electrolysis, casting, cold rolling, annealing, and aluminum foil rolling. In the electrolysis process, pure aluminum is extracted and purified from alumina in electrolysis cells. The cells operate 24 h a day at a very high current (150,000 amps or more) to ensure that the molten aluminum does not solidify. In the continuous casting process, the molten aluminum can be released from the electrolytic cells to the foundry only when the continuous casting machine is not occupied, because the continuous casting machine has no buffer and the aluminum must be kept liquid to avoid solidification. In addition, this process also includes certain nonproductive operations, such as job transportation and tool replacement, which are significant for the smooth processing of jobs. The duration of these operations depends on the sequence of the two adjacent operations, which is called the sequence-dependent setup time (SDST) in the literature [2]. Further, the production process typically involves two conflicting objectives, such as equipment utilization and energy efficiency.

Hence, this paper addresses an energy-efficient distributed BFSP with SDSTs (EDBFSP-SDST) to minimize makespan and total energy consumption (TEC). Minimizing the makespan is akin to improving equipment utilization. Minimizing TEC helps increase energy efficiency and thus reduces the carbon footprint [3,4].

The iterated greedy (IG) algorithm is a proven approach in scheduling [2,5,6]. Framinan and Leisten [7] designed a multi-objective IG for the canonical flow-shop scheduling problem (FSP). Ciavotta et al. [8] developed a restarted iterated Pareto greedy for the FSP with setup times. Note that IG originally iterated with a single solution and was developed for a single objective, so simple adjustments are not enough to attain high-quality results in the multi-objective setting. Fortunately, reinforcement learning (RL) has proven effective for multi-objective scheduling problems [9,10]. In our work, a

* Corresponding author at: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, PR China.
E-mail address: [email protected] (Q. Pan).

https://doi.org/10.1016/j.swevo.2023.101399
Received 16 May 2023; Received in revised form 4 July 2023; Accepted 8 September 2023
Available online 11 September 2023
2210-6502/© 2023 Elsevier B.V. All rights reserved.

Fig. 1. The whole process of aluminum production.

collaborative IG algorithm with reinforcement learning (CIG) is designed. Compared with previous research, the contributions of this paper can be summarized as follows.

1) A bi-population cooperative IG with two-level Q-learning is developed.
2) The theoretical properties of the EDBFSP-SDST are explored. An energy-saving strategy and an acceleration strategy based on these properties are designed to further optimize the solutions.
3) The EDBFSP-SDST with the makespan and TEC criteria is presented and modeled as a mixed-integer linear programming (MILP) model, which has not been studied before.

The remainder of this paper is structured as follows. In Section 2, we present a survey of the closely related literature. The considered EDBFSP-SDST is described in Section 3. The proposed CIG is detailed in Section 4. The experimental results are shown in Section 5. Finally, the conclusions and further research topics are given in Section 6.

2. Related works

The literature on the distributed FSP (DFSP), the BFSP, the DFSP with SDST, and RL in scheduling is reviewed. Research on the DFSP falls into two subfields: problem models and solution approaches. For the models, Naderi and Ruiz [11] first studied the distributed permutation flow-shop scheduling problem (DPFSP) in 2010. Hatami et al. [12] solved a DFSP with an assembly stage (DAFSP) in supply chain systems. Ribas et al. [13] proposed a MILP model for a distributed BFSP (DBFSP), which was the first work on the DBFSP. The methods for DFSPs can be classified into three main categories: dispatching-rule-based, metaheuristic-based, and RL-based methods. Diverse dispatching rules have been developed in the last decade, such as basic priority rules [14], composite rules [15], and heuristic rules [16]. Although they are easy to implement, most dispatching rules are myopic and depend on the production environment and optimization objectives, so it is difficult to balance different objectives with a single rule.

Many metaheuristics have been employed for basic DFSPs and for DFSPs with additional constraints. For basic DFSPs, the metaheuristics include IG [6], the artificial bee colony algorithm [17], and the estimation of distribution algorithm [18]. For DFSPs with additional constraints, Ribas et al. [19] considered an IG algorithm for a DBFSP with a total tardiness criterion. Wang and Wang [20] considered an energy-efficient DFSP (EDFSP) and designed a knowledge-based cooperative algorithm (KCA). Wang et al. [21] investigated a multi-objective whale swarm algorithm (MOWSA) to solve an EDFSP. A multi-objective evolutionary algorithm based on decomposition (MOEA/D) [22] was designed for an EDFSP with the objectives of makespan and TEC. Other variants have also been studied, such as a parallel batching DFSP with deteriorating jobs [23], a DFSP with machine breakdown [24], and a distributed re-entrant PFSP [25]. In addition, the SDST in FSPs has been studied for practical applications [26], for example, in the BFSP [27], PFSP [28,29], mixed no-idle distributed PFSP [30], DBFSP [2,16,27,31–35], parallel FSP [2], no-idle flow-shop [36,37], and no-wait flow-shop [38,39]. In most of the above algorithms, constructive heuristics were employed to quickly obtain initial solutions, and metaheuristics were then used to improve them further. Metaheuristics use global information to generate new solutions in each iteration and obtain near-optimal solutions, but they suffer from poor time efficiency due to their complex iteration process and search patterns in a huge search space. This limits their practical application in large-scale production environments, where multi-objective production scheduling poses a further challenge for such metaheuristics.

For RL-based methods, Zhao et al. [40] used Q-learning to solve an EDFSP. Heger and Voss [41] presented an innovative RL-based hyper-heuristic that dynamically adjusts dispatching rules in a flexible flow-shop. Lee and Kim [42] addressed a robotic FSP with an RL method. Pan et al. [43] designed a deep RL method for the PFSP that produces end-to-end output. Cheng et al. [1] investigated a hyper-heuristic with RL for a mixture of job-shop and FSP. Cao et al. [44] addressed a scheduling problem of semiconductor testing facilities with the makespan criterion by a cuckoo search algorithm with RL. Shiue et al. [45] developed a real-time scheduling method based on RL over multiple dispatching rules. Park et al. [46] used an RL-based approach for a scheduling problem with reentrant flows and SDSTs to minimize makespan. Overall, the above studies used Q-learning to determine an appropriate dispatching rule in each state. Compared with fixed dispatching rules and metaheuristics, better long-term scheduling performance can be achieved when RL agents adaptively determine the best dispatching rules and local search methods for different states [47].

3. The EDBFSP-SDST problem

For the EDBFSP-SDST, there are n jobs and δ identical factories. Each factory is a flow-shop consisting of m machines. A job can be assigned to only one of the factories for processing. Every job follows the same machine route, and the job sequence does not change from one machine to another. The setup times are separate from the processing times and depend on the sequence. Since there is no intermediate buffer between consecutive machines, a job does not leave the current machine immediately after completion; it stays there until the next machine is free and its setup is complete. Each machine has s speed levels, and the speed cannot be changed during processing. We assume that the higher the processing speed, the shorter the processing time and the more energy the machine consumes. The other common flow-shop constraints are also considered:

(1) all jobs are independent and released at time zero;
(2) each job is processed on only one machine at a time;
(3) each machine processes only one job at a time;
(4) pre-emption is not allowed.

The EDBFSP-SDST can be decomposed into three subproblems: first, job assignment (assigning n jobs to δ factories); second,


Fig. 2. Aluminum production model in the distributed environment.

determining the sequence in which the jobs are processed; and third, determining the processing speed of each job on the m machines. The objectives are to minimize makespan and TEC.

3.1. Mathematical model

3.1.1. Notation
The notation is listed as follows.

Indexes:
$j, j'$: index of job, $j, j' = 1, 2, \dots, n$.
$i$: index of machine, $i = 0, 1, 2, \dots, m$; 0 indicates a dummy machine.
$f$: index of factory, $f = 1, 2, \dots, \delta$.
$v$: index of speed level, $v = 1, 2, \dots, s$.

Sets:
$J$: set of jobs, $J = \{J_1, J_2, \dots, J_n\}$.
$M$: set of machines, $M = \{M_1, M_2, \dots, M_m\}$.
$F$: set of factories, $F = \{F_1, F_2, \dots, F_\delta\}$.
$\Psi$: subset of $J$ containing at least two jobs.
$\pi_f$: job sequence of $F_f$, $\pi_f = \{\pi_f(1), \pi_f(2), \dots, \pi_f(n_f)\}$, $\pi_f \subseteq J$, $f = 1, 2, \dots, \delta$.
$\Pi$: a feasible solution, $\Pi = \{\pi_1, \dots, \pi_f, \dots, \pi_\delta\}$.

Parameters:
$n$: number of jobs.
$m$: number of machines.
$\delta$: number of factories.
$s$: number of processing speed levels.
$n_f$: number of jobs in $F_f$, $n_f < n$.
$h$: a sufficiently large positive number.
$V_v$: the $v$-th processing speed level.
$O_{j,i}$: operation of job $J_j$ on machine $M_i$.
$t_{j,i}$: standard processing time of $O_{j,i}$.
$p_{j,i}$: actual processing time of $O_{j,i}$.
$st_{j',j,i}$: setup time for the changeover from $J_{j'}$ to $J_j$ on $M_i$.
$\beta_{i,v}$: energy consumption per unit time of $M_i$ running at $V_v$.
$\gamma_i$: energy consumption per unit time of $M_i$ in setup mode.
$\zeta_i$: energy consumption per unit time of $M_i$ in blocked mode.
$\theta_i$: energy consumption per unit time of $M_i$ in idle mode.

Variables:
$x_{j',j}$: binary variable that equals 1 if $J_j$ is the immediate successor of $J_{j'}$, and 0 otherwise.
$\xi_{j,i,v}$: binary variable that equals 1 if $J_j$ is processed at $V_v$ on $M_i$, and 0 otherwise.
$PEC$: continuous variable for the energy consumed when machines run in processing mode.
$SEC$: continuous variable for the energy consumed when machines run in setup mode.
$BEC$: continuous variable for the energy consumed when machines run in blocked mode.
$IEC$: continuous variable for the energy consumed when machines run in idle mode.
$TEC$: continuous variable for the total energy consumption of the schedule.
$D_{j,i}$: departure time of $O_{j,i}$.
$C_{max}$: maximum completion time over all factories.

3.1.2. MILP model of EDBFSP-SDST
The EDBFSP-SDST is formulated as follows.

$$\text{Objectives: } \min\{C_{max}, TEC\} \tag{1}$$

subject to

$$\sum_{j'=0,\, j' \neq j}^{n} x_{j',j} = 1, \quad j = 1, 2, \dots, n \tag{2}$$

$$\sum_{j=0,\, j \neq j'}^{n} x_{j',j} = 1, \quad j' = 1, 2, \dots, n \tag{3}$$

$$\sum_{j'=1}^{n} x_{j',0} = \delta \tag{4}$$

$$\sum_{j=1}^{n} x_{0,j} = \delta \tag{5}$$

$$\sum_{v=1}^{s} \xi_{j,i,v} = 1, \quad j = 1, 2, \dots, n,\ i = 1, 2, \dots, m \tag{6}$$

$$p_{j,i} = \sum_{v=1}^{s} \frac{t_{j,i}\, \xi_{j,i,v}}{V_v}, \quad i = 1, 2, \dots, m,\ j = 1, 2, \dots, n \tag{7}$$

$$\sum_{J_{j'} \in \Psi} \sum_{J_j \in \Psi} x_{j',j} \le |\Psi| - 1, \quad \forall \Psi \subseteq J,\ j \neq j' \tag{8}$$

$$D_{j,i} \ge D_{j,i-1} + p_{j,i}, \quad i = 1, 2, \dots, m,\ j = 1, 2, \dots, n \tag{9}$$

$$D_{j,i} \ge D_{j',i+1} + st_{j',j,i+1} + (x_{j',j} - 1) \cdot h, \quad i = 1, 2, \dots, m-1,\ j, j' = 1, 2, \dots, n,\ j \neq j' \tag{10}$$

$$D_{j,i} \ge D_{j',i} + p_{j,i} + st_{j',j,i} + (x_{j',j} - 1) \cdot h, \quad i = 1, 2, \dots, m,\ j, j' = 1, 2, \dots, n,\ j \neq j' \tag{11}$$

$$D_{j,i} \ge st_{0,j,i} + p_{j,i} + (x_{0,j} - 1) \cdot h, \quad j = 1, 2, \dots, n,\ i = 1, 2, \dots, m \tag{12}$$

$$D_{j,i} \ge st_{0,j,i+1} + (x_{0,j} - 1) \cdot h, \quad j = 1, 2, \dots, n,\ i = 0, 1, \dots, m-1 \tag{13}$$

$$PEC = \sum_{j=1}^{n} \sum_{i=1}^{m} \sum_{v=1}^{s} \xi_{j,i,v}\, p_{j,i}\, \beta_{i,v} \tag{14}$$

$$SEC = \sum_{j'=0}^{n} \sum_{j=1}^{n} \sum_{i=1}^{m} x_{j',j}\, st_{j',j,i}\, \gamma_i, \quad j \neq j' \tag{15}$$

$$BEC = \sum_{j=1}^{n} \sum_{i=2}^{m-1} \left(D_{j,i} - D_{j,i-1} - p_{j,i}\right) \zeta_i + \sum_{j'=0}^{n} \sum_{j=1}^{n} x_{j',j} \left(D_{j,1} - D_{j',1} - st_{j',j,1} - p_{j,1}\right) \zeta_1, \quad j \neq j' \tag{16}$$


$$IEC = \sum_{j'=0}^{n} \sum_{j=1}^{n} \sum_{i=2}^{m} x_{j',j} \left(D_{j,i-1} - D_{j',i} - st_{j',j,i}\right) \theta_i, \quad j \neq j' \tag{17}$$

$$TEC = PEC + SEC + BEC + IEC \tag{18}$$

$$C_{max} = \max_{j=1,\dots,n} D_{j,m} \tag{19}$$

Objective (1) is to minimize Cmax and TEC. Constraints (2) and (3) indicate that each job has exactly one direct predecessor and one direct successor. Constraints (4) and (5) ensure that the dummy jobs are placed at the end and the start of the scheduling solution, respectively. Constraint (6) implies that each operation $O_{j,i}$ must be processed at exactly one speed. The actual processing time of operation $O_{j,i}$ is defined in (7). Constraint (8) ensures that the job sequence cannot form a subtour. Constraint (9) ensures that the departure time of $O_{j,i}$ is greater than or equal to the departure time of $O_{j,i-1}$ plus the actual processing time of $O_{j,i}$. Constraints (10) and (11) represent the relationship between the departure times of any two adjacent jobs. Constraints (12) and (13) define the departure time of the first job. The energy consumption when machines run in processing mode, setup mode, blocked mode, and idle mode is defined by Eqs. (14)–(17), respectively. The TEC is defined in Eq. (18), and Cmax is defined in Eq. (19). The MILP model is verified with the well-known CPLEX solver. The code of the MILP model is published on GitHub (https://github.com/banian2314/Model-of-EDBFSP-SDST.git).

3.2. Realistic problem example

Fig. 2 illustrates the realistic EDBFSP-SDST in the aluminum production industry. The aluminum industry is driven by customer orders: customers first conclude contracts with plants, and the contracts are then transformed into production orders specified by alloy composition, size, etc. The orders are then arranged on the electrolytic cells and continuous casting machines, which involves a large amount of energy consumption and environmental pollution from consuming electricity, natural gas, or other fossil fuels. Hence, the processes in aluminum production can be modelled as the proposed EDBFSP-SDST.

The following example, an EDBFSP-SDST with n = 5, m = 2, and δ = 2, shows how the decision variables reflect a solution. The energy-relevant parameters are listed in Table 1. The standard processing times of jobs and the SDSTs are shown in Tables 2 and 3, respectively. The processing speeds are given in Table 4. The energy consumption per unit of time is detailed in Table 5. Based on Table 4, the actual processing times of jobs are calculated by Eq. (7) and listed in Table 6.

Table 1
Energy-relevant parameters.
v    V_v     β_{i,v}       γ_i     ζ_i     θ_i
1    1.00    4 × (V_v)²    0.70    1.00    0.50
2    1.30
3    1.55
4    1.75
5    2.10

Table 2
The t_{j,i} of jobs on M1 and M2.
        J1      J2      J3      J4      J5
M1      4.00    7.80    8.75    9.00    5.25
M2      6.20    7.00    8.75    6.30    1.30

Table 3
SDSTs st_{j′,j,1} and st_{j′,j,2} of jobs on M1 and M2.
st_{j′,j,1}   J1  J2  J3  J4  J5      st_{j′,j,2}   J1  J2  J3  J4  J5
0             6   3   2   3   6       0             1   3   6   1   9
J1            0   3   6   8   9       J1            0   5   4   4   5
J2            2   0   4   5   9       J2            9   0   9   4   4
J3            2   5   0   2   3       J3            1   9   0   9   4
J4            1   2   1   0   9       J4            1   2   9   0   9
J5            1   5   2   5   0       J5            9   1   4   2   0

Table 4
Speed of jobs on machines.
        Speed level      Actual speed
Job     M1     M2        M1      M2
J1      1      3         1.00    1.55
J2      2      4         1.30    1.75
J3      4      5         1.75    2.10
J4      1      5         1.00    2.10
J5      4      2         1.75    1.30

Table 5
Energy consumption per unit of time.
Machine   J1      J2      J3      J4      J5
M1        4.00    6.76    12.25   4.00    12.25
M2        9.61    12.25   17.64   17.64   12.25

Table 6
The p_{j,i} of jobs on M1 and M2.
Machine   J1   J2   J3   J4   J5
M1        4    6    5    9    3
M2        4    4    5    3    1

Fig. 3 presents a Gantt chart for a solution in which the processing sequences are π1 = {3, 1, 5} and π2 = {2, 4}. The example in Fig. 3 was solved exactly with CPLEX using the model of Section 3.1.2, and the resulting optimal solution contains no block time. In the Appendix, we draw a Gantt chart for another schedule of this example. The two optimization objectives are Cmax = 26 and TEC = 461.88, respectively. The calculation details of Cmax and TEC are shown in the Appendix.

4. The proposed CIG algorithm

For the EDBFSP-SDST, the following three problems must be addressed: (1) assigning jobs to factories, (2) sequencing jobs within each factory, and (3) setting the processing speed of each operation. As this is a multi-objective optimization problem, Cmax and TEC must be balanced. We designed two subpopulations (pop1 and pop2), which focus on Cmax and TEC, respectively. In pop1, four IGs are selected for the Cmax criterion. In pop2, four common speed-adjustment strategies are used to update TEC. In each iteration, selecting an operator from these strategies is crucial, which motivates us to embed an RL method into the algorithm. Thus, a collaborative IG algorithm with RL, named CIG, is proposed with four main parts: initialization, bi-population collaborative IG, energy saving, and acceleration. In the second part, the destruction, reconstruction, local search, and acceptance criterion are redesigned within the RL framework.

The framework of the CIG is shown in Fig. 4. First, the initialized individuals are divided into two subpopulations based on Cmax. Next, the two objectives are optimized through the speed-change mechanism and operation adjustment. Specifically, the solutions with lower Cmax are continuously updated by a Q-learning-based IG in pop1. In pop2, the solutions with smaller TEC are continuously updated


Fig. 3. Gantt chart of the EDBFSP-SDST example.

through an IG algorithm based on Q-learning. Then, pop1 and pop2 are updated by the energy-saving operation and the acceleration operation, respectively. Finally, a non-dominated Pareto front approximation (Pareto archive) is returned for the next iteration. The pseudo-code of CIG is detailed in Algorithm 1.
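The Pareto archive mentioned above can be maintained with a plain non-dominated filter over (Cmax, TEC), both minimized, in the sense of Definition 1 in Section 4.4. The Python sketch below is a hypothetical helper, not the authors' implementation: it rejects a candidate that is dominated by (or identical to) an archived point and otherwise removes every archived point that the candidate dominates. The objective vectors in the usage example are made-up numbers for illustration only.

```python
from typing import List, Tuple

# An objective vector (Cmax, TEC); both objectives are minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Return a new non-dominated archive after offering `candidate` (hypothetical helper)."""
    # Reject the candidate if it is dominated by, or duplicates, an archived point.
    if any(dominates(p, candidate) or p == candidate for p in archive):
        return archive
    # Keep only the archived points that the candidate does not dominate, then add it.
    return [p for p in archive if not dominates(candidate, p)] + [candidate]


if __name__ == "__main__":
    archive: List[Objectives] = []
    for point in [(26.0, 461.88), (28.0, 430.0), (27.0, 470.0), (25.0, 500.0)]:
        archive = update_archive(archive, point)
    print(sorted(archive))  # [(25.0, 500.0), (26.0, 461.88), (28.0, 430.0)]
```

Here (27.0, 470.0) is discarded because (26.0, 461.88) dominates it; the remaining three points are mutually non-dominated.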

4.1. Population initialization

For the EDBFSP-SDST, a poor job sequence increases the setup time between operations, which in turn increases block time and idle time and leads to an excessive Cmax. Hence, a heuristic is developed to minimize, as far as possible, the idle time and block time created by setup times (MIBST). The pseudo-code of MIBST is detailed in Algorithm 2.

$$d_{\rho}^{f} = \sum_{i=2}^{m} \max\left\{0,\ \left(p_{\pi_f(\mu-1),i} + st_{\pi_f(\mu-1),\rho,i}\right) - \left(p_{\rho,i-1} + st_{\pi_f(\mu-1),\rho,i-1}\right)\right\} \tag{20}$$
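A minimal Python sketch of the measure in Eq. (20), under stated assumptions: the outer operator is read as a sum over machines i = 2, …, m; ρ is read as a candidate job placed directly after job π_f(μ−1); and machines are 0-indexed in the code. The data are the actual processing times of Table 6 and the setup times of Table 3, restricted to the jobs J1, J3 and J5 used below. The function name `mibst_measure` is hypothetical.

```python
from typing import Dict, List, Tuple

# P[j][i]: actual processing time of job j on machine i (0-based index; Table 6)
P: Dict[int, List[float]] = {1: [4, 4], 3: [5, 5], 5: [3, 1]}
# ST[(prev, j)][i]: setup time on machine i when job j directly follows prev (Table 3)
ST: Dict[Tuple[int, int], List[float]] = {(3, 1): [2, 1], (3, 5): [3, 4]}


def mibst_measure(prev: int, rho: int) -> float:
    """Assumed reading of Eq. (20): sum over machines 2..m of
    max(0, (p[prev][i] + st[prev,rho][i]) - (p[rho][i-1] + st[prev,rho][i-1]))."""
    st = ST[(prev, rho)]
    total = 0.0
    for i in range(1, len(P[rho])):  # 0-based i = 1..m-1 corresponds to machines 2..m
        first = P[prev][i] + st[i]          # first bracket of Eq. (20)
        second = P[rho][i - 1] + st[i - 1]  # second bracket of Eq. (20)
        total += max(0.0, first - second)
    return total


if __name__ == "__main__":
    for rho in (1, 5):
        print(rho, mibst_measure(3, rho))  # prints "1 0.0" then "5 3.0"
```

With these numbers, placing J1 after J3 gives a measure of 0 while J5 gives 3, so a rule preferring the smallest measure would presumably pick J1; the exact selection rule used by MIBST is given in Algorithm 2, not here.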

For the MIBST, the first step is to select the first job of each factory. Let U denote the set of all unscheduled jobs. The unscheduled jobs are allocated in the second step; the detailed execution process is shown in Lines 4 to 16 of Algorithm 2. In the third step, NEH is executed in each factory, except for the first and last jobs, as shown in Lines 17–21. Regarding computational effort, the time complexity of calculating the measurement is O(PS × mn) in the first part. In the second part, the speedup approach presented in [16] is used to evaluate neighbouring solutions. The overall complexity of MIBST is O(PS × fmn²).

Fig. 4. The framework of CIG.

Algorithm 1
The proposed CIG.

Algorithm 2
Population initialization procedure MIBST.

4.2. Bi-population collaborative IG

Q-learning is one of the most effective approaches in RL. Its purpose is to learn the optimal behavior for reaching a desired state through the rewards and punishments that the agent receives while interacting with a constantly changing environment [40,46,48,49]. Each state-action pair in Q-learning has a value (the Q-value) that reflects the accumulated reward and penalty and is updated by Eq. (21). Let S = {s1, s2, …, sn} denote the set of states and A = {a1, a2, …, am} the set of available actions; r_t is the reward signal at step t, α ∈ [0, 1] is the learning rate, γ ∈ [0, 1] is the discount factor, and Q_t(s_t, a_t) is the Q-value at step t. When different states are encountered later, the Q-values are used to choose the best action. The execution details of Q-learning are shown in Fig. 5.

$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t \left[ r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q_t(s_t, a_t) \right] \tag{21}$$

Fig. 5. Execution details of the Q-learning algorithm.

In this section, a sequence-related operation (SERQ-l) and a speed-related operation (SPRQ-l) are designed to update the subpopulations. In this process, the top-level Q-learnings (Q1 and Q3) focus on enhancing the utilization ratio of machines by finding, among four sequence-related operations, a scheduling policy that minimizes Cmax. The bottom-level Q-learnings (Q2 and Q4) focus on improving energy efficiency by learning, among four speed-related operations, the optimal speed-governing policy that minimizes TEC. The pseudo-code of SERQ-l and SPRQ-l is shown in Algorithm 3.

The sequence-related operations in SERQ-l are SER_IG1, SER_IG2, SER_IG3, and SER_IG4. The experimental section (Section 5.2) will show why all four are needed.
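To make Eq. (21) and the operator selection concrete, the following Python sketch keeps a tabular Q-function over four operators and chooses among them with an ε-greedy rule. Assumptions that are not taken from the paper: the state is simply the index of the previously applied operator; ε is the probability of exploring; and the default α, γ and ε are arbitrary placeholders rather than the calibrated values of Section 5.1. The reward follows Section 4.3 (1 if the chosen operator yields an improved solution, 0 otherwise). The class name `OperatorSelector` is hypothetical.

```python
import random
from typing import List


class OperatorSelector:
    """Tabular Q-learning over n_ops operators (hypothetical sketch).

    State = index of the previously applied operator (an illustrative assumption).
    """

    def __init__(self, n_ops: int = 4, alpha: float = 0.8, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.q: List[List[float]] = [[0.0] * n_ops for _ in range(n_ops)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.state = 0

    def select(self) -> int:
        """Epsilon-greedy: explore with probability epsilon, else pick the highest Q-value."""
        row = self.q[self.state]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        return max(range(len(row)), key=row.__getitem__)  # ties go to the lowest index

    def update(self, action: int, improved: bool) -> None:
        """Eq. (21) with reward 1 if the operator improved the solution, else 0."""
        reward = 1.0 if improved else 0.0
        next_state = action
        target = reward + self.gamma * max(self.q[next_state])
        self.q[self.state][action] += self.alpha * (target - self.q[self.state][action])
        self.state = next_state


if __name__ == "__main__":
    sel = OperatorSelector(epsilon=0.0)
    op = sel.select()              # all Q-values are 0, so operator 0 is chosen
    sel.update(op, improved=True)  # Q[0][0] = 0 + 0.8 * (1 + 0.9 * 0 - 0)
    print(op, sel.q[0][0])         # 0 0.8
```

In CIG the selected index would map to one of SER_IG1 to SER_IG4 (top level) or SPR_IG1 to SPR_IG4 (bottom level); how states are actually encoded is described only qualitatively in Section 4.3.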
Algorithm 3
SERQ-l and SPRQ-l subpopulation improvement procedures.

Fig. 6. Overall operating mechanism of the proposed CIG.

(1) SER_IG1. Destruction: randomly select and remove two jobs in the critical factory; Reconstruction: swap the positions of the two jobs; Number of executions: the number of jobs in the critical factory; Acceptance criterion: greedy acceptance of the solution with the best Cmax.
(2) SER_IG2. Destruction: select and remove a job in the critical factory and a job in a random non-critical factory; Reconstruction: swap the positions of the two jobs; Number of executions: the number of jobs in the non-critical factory; Acceptance criterion: greedy acceptance of the solution with the best Cmax.
(3) SER_IG3. Destruction: randomly select and remove a job in the critical factory; Reconstruction: insert the job into all positions in the critical factory; Number of executions: once; Acceptance criterion: greedy acceptance of the solution with the best Cmax.
(4) SER_IG4. Destruction: randomly select and remove a job from the critical factory; Reconstruction: insert the job into all positions in the non-critical factory; Number of executions: once; Acceptance criterion: greedy acceptance of the solution with the best Cmax.

SPRQ-l aims to reduce energy consumption by changing the speeds of machines in pop2. It also involves four procedures:

(1) SPR_IG1. Destruction: randomly select two jobs from a random factory; Reconstruction: swap their processing speeds on the same machines; Number of executions: the number of jobs in the


selected factory; Acceptance criterion: greedy acceptance of the solution with the best TEC.
(2) SPR_IG2. Destruction: randomly select two jobs from a random factory; Reconstruction: swap their processing speeds on different machines; Number of executions: the number of jobs in the selected factory; Acceptance criterion: greedy acceptance of the solution with the best TEC.
(3) SPR_IG3. Destruction: randomly select two jobs from a random factory; Reconstruction: reverse the processing speeds between the two positions on the same machines; Number of executions: half the number of jobs in the selected factory; Acceptance criterion: greedy acceptance of the solution with the best TEC.
(4) SPR_IG4. Destruction: randomly select two jobs from a random factory; Reconstruction: shuffle the processing speeds between the two positions on the same machines; Number of executions: half the number of jobs in the selected factory; Acceptance criterion: greedy acceptance of the solution with the best TEC.

4.3. Markov decision process model in CIG

RL is generally modeled as a Markov decision process (MDP) with a 3-tuple (s, a, r), where s, a, and r are the state, action, and reward, respectively [40]. Fig. 6 shows the overall operating mechanism of the CIG. The MDP model is described in detail below.
State (s): for the EDBFSP-SDST, the job sequence, the processing speeds, and the optimization objectives (Cmax and TEC) describe the current situation after an action has been performed.
Action (a): there are two types of actions: job-based actions (a_{J,t}: SER_IG1, SER_IG2, SER_IG3, and SER_IG4) and machine-speed-based actions (a_{v,t}: SPR_IG1, SPR_IG2, SPR_IG3, and SPR_IG4).
Reward (r): if an improved solution is obtained by performing action a, then r = 1; otherwise, r = 0.
Agent: each individual of CIG is regarded as an agent.

4.4. Energy-saving and acceleration operations

For the EDBFSP-SDST, several theoretical properties are explored, which can be used to design effective search operators.

Definition 1. Let $\Pi_a$ and $\Pi_b$ be two solutions. $\Pi_a$ dominates $\Pi_b$ (denoted by $\Pi_a \succ \Pi_b$) if and only if 1) $\forall i \in \{1, 2\}$, $f_i(\Pi_a) \le f_i(\Pi_b)$; and 2) $\exists i \in \{1, 2\}$, $f_i(\Pi_a) < f_i(\Pi_b)$. Here, $f_1 = C_{max}$ and $f_2 = TEC$.

Theorem 1. Let $D^f_{max}$ denote the maximum completion time of $F_f$. If $\Pi_a$ and $\Pi_b$ satisfy 1) $\forall f \in \{1, 2, \dots, \delta\}$, $D^f_{max}(\Pi_a) = D^f_{max}(\Pi_b)$; 2) $\forall j \in \{1, 2, \dots, n\}, i \in \{1, 2, \dots, m\}$, $V_v(\Pi_a) \le V_v(\Pi_b)$; and 3) $\exists j \in \{1, 2, \dots, n\}, i \in \{1, 2, \dots, m\}$, $V_v(\Pi_a) < V_v(\Pi_b)$, then $TEC(\Pi_a) < TEC(\Pi_b)$ and $\Pi_a \succ \Pi_b$.

Property 1. Let $BT(O_{j',i})$ denote the block time of operation $O_{j',i}$. If $\Pi_a$ satisfies 1) $\exists j' \in \{1, 2, \dots, n-1\}, i \in \{1, 2, \dots, m\}$, $BT(\Pi_a(O_{j',i})) > 0$; and 2) for a new solution $\Pi'_a$, if $V_v(P_{\Pi'_a}(O_{j',i})) < V_v(P_{\Pi_a}(O_{j',i}))$ while the other speeds are the same, then $TEC(\Pi'_a) < TEC(\Pi_a)$ and $C_{max}(\Pi'_a) = C_{max}(\Pi_a)$. The maximum decrement Δ for $O_{j',i}$ satisfies

$$\Delta = \begin{cases} BT_{j',i} = D_{j',i} - D_{j',i-1} - p_{j',i}, & i > 1 \\ BT_{j,i} = D_{j,1} - D_{j',1} - st_{j',j,1} - p_{j,1}, & i = 1,\ j \neq j' \end{cases}$$

Property 2. If $\Pi_a$ satisfies 1) $\exists j' \in \{1, 2, \dots, n-1\}, i \in \{3, 4, \dots, m\}$, $BT(\Pi_a(O_{j',i})) > 0$; 2) $IT(P_{\Pi_a}(O_{j,i-1})) > 0$; and 3) for a new solution $\Pi'_a$, if $V_v(P_{\Pi'_a}(O_{j',i-1})) < V_v(P_{\Pi_a}(O_{j',i-1}))$ while the other speeds are the same, then $TEC(\Pi'_a) < TEC(\Pi_a)$ and $C_{max}(\Pi'_a) = C_{max}(\Pi_a)$. The maximum decrement Δ for $O_{j',i}$ satisfies

$$\Delta = \min\{BT_{j',i}, IT_{j,i-1}\} = \min\{D_{j',i} - D_{j',i-1} - p_{j',i},\ D_{j,i-2} - D_{j',i-1} - st_{j',j,i-1}\}$$

Theorem 2. Let $IT(O_{j,i})$ denote the idle time of $O_{j,i}$, and let $F_c$ denote the critical factory in $\Pi_a$. The operations on the critical path (the method for determining the critical path is shown in Algorithm 4) are called critical operations (denoted by $P_{\Pi_a}(O_{j,i})$). If $\Pi_a$ satisfies 1) there is only one critical path in $F_c$; 2) $\exists j \in \{1, 2, \dots, n\}, i \in \{2, 3, \dots, m\}$, $IT(P_{\Pi_a}(O_{j,i})) > 0$; and 3) for a new solution $\Pi'_a$, if $V_v(P_{\Pi'_a}(O_{j,i-1})) > V_v(P_{\Pi_a}(O_{j,i-1}))$ while the other speeds are the same, then $C_{max}(\Pi'_a) < C_{max}(\Pi_a)$, and the maximum decrement Δ for $O_{j,i-1}$ satisfies $\Delta = IT_{j,i} = D_{j,i-1} - D_{j',i} - st_{j',j,i}$, $i > 2$, $j \neq j'$.

Algorithm 4
Determine the critical path.
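Properties 1 and 2 and Theorem 2 are stated in terms of departure times D_{j,i}, block times BT and idle times IT. The Python sketch below computes these for one factory, assuming a semi-active schedule in which every operation departs as early as constraints (9)–(13) allow, and accumulates the energy terms of Eqs. (14)–(17). It uses the example of Section 3.2: processing times from Table 6, speeds from Table 4, power β_{i,v} = 4·V_v² together with γ_i, ζ_i, θ_i from Table 1, and only the Table 3 setup times needed for π1 = {3, 1, 5} and π2 = {2, 4}, reading the first row of Table 3 as the dummy start job 0. The function names are hypothetical. On this example the sketch yields Cmax = 26 and TEC = 461.88, the values reported in Section 3.2, with zero block time.

```python
from typing import Dict, List, Tuple

# Example of Section 3.2 (m = 2); list index 0 = M1, 1 = M2.
P = {1: [4, 4], 2: [6, 4], 3: [5, 5], 4: [9, 3], 5: [3, 1]}  # p_{j,i}, Table 6
SPEED = {1: [1.00, 1.55], 2: [1.30, 1.75], 3: [1.75, 2.10],
         4: [1.00, 2.10], 5: [1.75, 1.30]}                   # V_v, Table 4
# ST[(j_prev, j)] = [st on M1, st on M2] (Table 3); j_prev = 0 is the dummy start job.
ST = {(0, 3): [2, 6], (3, 1): [2, 1], (1, 5): [9, 5], (0, 2): [3, 3], (2, 4): [5, 4]}
GAMMA, ZETA, THETA = 0.70, 1.00, 0.50  # setup, blocked and idle power (Table 1)


def departure_times(seq: List[int], m: int) -> Dict[int, List[float]]:
    """Earliest departure times D[j][0..m] for one factory; D[j][0] is the start on M1."""
    D: Dict[int, List[float]] = {0: [0.0] * (m + 1)}  # dummy job 0 frees all machines at t = 0
    prev = 0
    for j in seq:
        st = ST[(prev, j)]
        d = [0.0] * (m + 1)
        d[0] = D[prev][1] + st[0]  # M1 released by prev and set up for j
        for i in range(1, m):
            # Blocking: j leaves M_i after processing AND once M_{i+1} is freed and set up.
            d[i] = max(d[i - 1] + P[j][i - 1], D[prev][i + 1] + st[i])
        d[m] = d[m - 1] + P[j][m - 1]  # the last machine never blocks
        D[j] = d
        prev = j
    return D


def factory_energy(seq: List[int], m: int) -> Tuple[float, Tuple[float, float, float, float]]:
    """Return (factory makespan, (PEC, SEC, BEC, IEC)) following Eqs. (14)-(17)."""
    D = departure_times(seq, m)
    pec = sec = bec = iec = 0.0
    prev = 0
    for j in seq:
        st = ST[(prev, j)]
        for i in range(1, m + 1):  # machine M_i, stored at list index i - 1
            pec += P[j][i - 1] * 4 * SPEED[j][i - 1] ** 2  # beta_{i,v} = 4 * V_v^2
            sec += st[i - 1] * GAMMA
            if i < m:  # block time D_{j,i} - D_{j,i-1} - p_{j,i} (both sums of Eq. (16))
                bec += (D[j][i] - D[j][i - 1] - P[j][i - 1]) * ZETA
            if i >= 2:  # idle time D_{j,i-1} - D_{j',i} - st_{j',j,i} (Eq. (17))
                iec += (D[j][i - 1] - D[prev][i] - st[i - 1]) * THETA
        prev = j
    return D[seq[-1]][m], (pec, sec, bec, iec)


if __name__ == "__main__":
    cmax, tec = 0.0, 0.0
    for seq in ([3, 1, 5], [2, 4]):  # pi_1 and pi_2
        span, parts = factory_energy(seq, m=2)
        cmax, tec = max(cmax, span), tec + sum(parts)
    print(cmax, round(tec, 2))  # 26.0 461.88
```

Because the block and idle times computed here are exactly the Δ bounds of Properties 1 and 2 and Theorem 2, re-running the computation after changing one speed is a simple numerical check of those statements.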


Fig. 7. Several properties and theorem diagrams of EDBFSP-SDST. (a) Property 1; (b) Property 2; (c) Theorem 2.

Fig. 8. (a) Find the critical path. (b) Decrease the speeds of noncritical operations to obtain a new solution with better TEC.

Fig. 9. (a) Find the critical path. (b) Increase the speeds of critical operations to obtain a new solution with a better Cmax.

Fig. 7 illustrates the above properties and theorem. Properties 1 and 2 are two different cases of Theorem 1: both reduce TEC by increasing the processing time of specific operations without increasing Cmax. The difference is that Property 1 reduces TEC by converting block time into processing time, while Property 2 reduces TEC by using idle times. Theorem 2 shortens Cmax by reducing the processing time of specific operations.

According to Properties 1 and 2, reducing the speeds of noncritical operations can decrease TEC without deteriorating Cmax. Hence, the energy-saving approach is carried out for all


individuals. First, the critical path is determined. Second, the speeds of noncritical operations are decreased without affecting Cmax. Finally, the Pareto archive is updated with the newly generated solutions. Fig. 8 shows the process of decreasing the speeds in a factory.
Theorem 2 implies that Cmax can decrease if the speeds of critical operations are increased. Therefore, the following acceleration method is performed for all individuals with lower TEC. First, the critical path of the critical factory is determined. Then, the speeds of those critical operations whose idle time on the next machine is not zero are increased. Finally, the Pareto archive is updated with the newly generated solutions. Fig. 9 shows the acceleration process in the critical factory.

5. Experiments and analysis

The well-known benchmark for the SDST/PFSP [50] is employed to verify the performance of CIG. The benchmark consists of four subsets: SSD10, SSD50, SSD100, and SSD125. In SSD10, the setup times amount to 10 % of the average processing time, and so on for the other subsets. Each subset consists of the 120 Taillard instances [51], covering 12 different (n, m) combinations. The number of factories ranges from 2 to 7. Hence, the benchmark contains 4 × 120 × 6 = 2,880 instances in total. The processing speeds Vv are set to {1.0, 1.3, 1.55, 1.75, 2.1}. The energy-related parameters are shown in Table 1. For the calibrations in Sections 5.1 and 5.3, a distinct benchmark is generated with the same approach to avoid over-fitting. The termination criterion of all algorithms is a maximum elapsed CPU time of c × n × m × f milliseconds, where c is set to 20.

Four metrics are used to measure the results: the well-known hypervolume (HV) [8], the C Metric (CM) [52], the Inverted Generational Distance (IGD) [53], and the Overall Non-dominated Vector Generation (ONVG) [40]. HV calculates the normalized volume of the objective space dominated by a solution set. CM reflects the convergence performance of an algorithm. IGD is an integrated indicator that measures the convergence and diversity of a solution set at the same time. Higher values of HV, CM, and ONVG indicate better performance; conversely, the smaller the IGD value, the better the result. The experiments are run on Windows 10 Professional Edition with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz and 128 GB of RAM. All algorithms are implemented in Java.

5.1. Investigation of parameter setting

The CIG contains four critical parameters: the population size PS, the learning rate α, the discount factor γ, and the epsilon-greedy rate ε. The design of experiments (DOE) approach [54] is used to calibrate these parameters. Based on the relevant literature [40,49], the levels are set as follows:

- PS = {20, 30, 40, 50}
- α = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
- γ = {0.7, 0.8, 0.9, 1.0}
- ε = {0.7, 0.8, 0.9, 1.0}

This gives 4 × 6 × 4 × 4 = 384 parameter combinations in total. The termination criterion is a maximum CPU time of t = 10 × n × m × f milliseconds. The calibration benchmark uses n × m = {20, 50, 100, 200} × {5, 10, 20}. Each combination includes one instance and four setup time levels, and the number of factories is set to f = {3, 5, 7}. Hence, a total of 384 × 4 × 3 × 1 × 4 × 3 = 55,296 treatments are tested. Each treatment is run independently five times (replicates), giving 55,296 × 5 = 276,480 results in this calibration experiment.

The results are evaluated with the analysis of variance (ANOVA) technique to study the statistical significance of the effects of the studied factors and their interactions. Three main hypotheses are checked, in order of importance: independence of the residuals, homoscedasticity across the levels of the different factors, and normality of the model residuals. No significant deviations are found. The response variable of the ANOVA experiment is HV. The ANOVA results are listed in Table 7.

Table 7
ANOVA results of CIG.
Source | Sum of squares | Degrees of freedom | Mean square | F-ratio | p-value
PS | 136.8 | 5 | 62.25 | 40.33 | 0
α | 14.4 | 3 | 3.08 | 2 | 0
γ | 2.4 | 2 | 0.943 | 0.61 | 0
ε | 7.3 | 2 | 0.767 | 0.5 | 0
PS ∗ α | 12.8 | 9 | 1.69 | 1.09 | 0.0361
PS ∗ γ | 2.5 | 4 | 0.984 | 0.64 | 0.0464
PS ∗ ε | 0.2 | 4 | 0.744 | 0.48 | 0.8864
α ∗ γ | 0.1 | 5 | 1.56 | 1.01 | 0.4415
α ∗ ε | 0.2 | 5 | 1.04 | 0.68 | 0.8067
γ ∗ ε | 0.1 | 4 | 1.43 | 0.93 | 0.5014
Error | 20173.6 | 213840 | 1.54 | |
Total | 26083.0 | 276479 | | |

According to Table 7, PS, α, and γ have statistically significant effects on CIG, as their p-values are smaller than 0.05. The main effects plots are drawn in Fig. 10, which shows that PS = 30 yields better solutions than the other levels. Large PS values result in expensive local search steps, which means fewer iterations within the allotted CPU time. Conversely, small values limit intensification, so the intermediate value of 30 gives the best results.
Additionally, the interactions PS ∗ α and PS ∗ γ are significant, since their p-values in Table 7 are less than 0.05. When parameters interact significantly, the main effects plot in Fig. 10 alone can be misleading [55,56]. Therefore, the two-factor interactions are drawn in Fig. 11. These interactions are weak and do not contradict the above conclusions. As a result, the parameters of CIG are set to PS = 30, α = 0.7, γ = 0.8, and ε = 0.8.
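The sketch below shows where the three calibrated Q-learning parameters α, γ, and ε enter a standard tabular Q-learning agent, using the calibrated values. It is a generic illustration, not SERQ-l or SPRQ-l themselves. The integer states and actions are hypothetical placeholders for whatever states and operators the two agents use. This section does not restate the ε convention, so the sketch assumes that ε is the probability of exploiting the best-known action; a high calibrated value such as 0.8 is consistent with that reading.

```python
import random
from collections import defaultdict

# Calibrated values from Section 5.1.
ALPHA, GAMMA, EPSILON = 0.7, 0.8, 0.8


class QAgent:
    """Tabular Q-learning with epsilon-greedy action selection (generic sketch)."""

    def __init__(self, n_actions, alpha=ALPHA, gamma=GAMMA, epsilon=EPSILON, seed=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)  # state -> action values
        self.rng = random.Random(seed)

    def select(self, state):
        # Assumed convention: exploit with probability epsilon, else explore.
        if self.rng.random() < self.epsilon:
            values = self.q[state]
            return max(range(self.n_actions), key=values.__getitem__)
        return self.rng.randrange(self.n_actions)

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


agent = QAgent(n_actions=3, seed=1)
action = agent.select(state=0)
agent.update(state=0, action=1, reward=1.0, next_state=2)
print(agent.q[0])  # [0.0, 0.7, 0.0]
```

With α = 0.7, a single rewarded step moves the action value 70 % of the way toward its target. γ = 0.8 discounts the best value of the next state.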

Fig. 10. Main effects plot of all parameters.


Fig. 11. Interaction plots with Tukey Honest Significant Differences (HSD) 95 % confidence intervals. (a) PS ∗ α. (b) PS ∗ γ.

Fig. 12. Interval plot of the C metric at the 95 % confidence level. (a) S1 and S1′; (b) S2 and S2′; (c) S3 and S3′; (d) S4 and S4′; (e) S5 and S5′.
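The four indicators can be sketched for the bi-objective case (Cmax, TEC), with both objectives minimized. This is a minimal illustration with the following assumptions:

- HV is computed on raw objective values against a user-chosen reference point. The paper reports normalized HV, so the objectives would first be rescaled, e.g. to [0, 1].
- IGD measures the distance to a reference front. Such a front is commonly built as the non-dominated union of all results; the paper's exact choice is not stated here.
- C(A, B) follows its usual definition: the fraction of B weakly dominated by at least one member of A.
- ONVG is the size of the non-dominated set.

All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (Cmax, TEC); both objectives are minimized


def weakly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return weakly_dominates(a, b) and any(x < y for x, y in zip(a, b))


def nondominated(points: List[Point]) -> List[Point]:
    """Non-dominated subset; its size is the ONVG of the set."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the set and bounded by the reference point."""
    front = sorted(set(nondominated(points)))  # ascending Cmax, descending TEC
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        if f1 >= ref[0] or f2 >= prev_f2:
            continue  # point lies outside the reference box
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv


def igd(reference: List[Point], approx: List[Point]) -> float:
    """Mean distance from each reference point to its nearest obtained point."""
    return sum(min(math.dist(r, a) for a in approx) for r in reference) / len(reference)


def c_metric(A: List[Point], B: List[Point]) -> float:
    """C(A, B): fraction of B weakly dominated by at least one member of A."""
    covered = sum(1 for b in B if any(weakly_dominates(a, b) for a in A))
    return covered / len(B)


front = [(1, 3), (2, 2), (3, 1), (3, 3)]
print(len(nondominated(front)))              # 3 (ONVG)
print(hypervolume_2d(front, ref=(4, 4)))     # 6.0
print(c_metric([(1, 1)], [(2, 2), (0, 3)]))  # 0.5
```

The C metric is not symmetric, which is why both C(CIG, X) and C(X, CIG) are plotted in Fig. 12.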

5.2. Evaluation of main components of CIG

Five variants of CIG are developed to validate the effectiveness of its components. The CIG contains five improvement components:

(1) the initialization method (MIBST);
(2) the sequence-related operation (SERQ-l) based on Q-learning (pop1);
(3) the speed-related operation (SPRQ-l) based on Q-learning (pop2);
(4) the energy-saving strategy based on the properties;
(5) the acceleration strategy based on the properties.

Hence, five variants of CIG are compared: CIG with random initialization (CIG-RI), CIG without SERQ-l (CIG-NE), CIG without SPRQ-l (CIG-NP), CIG without the energy-saving strategy (CIG-NS), and CIG without the acceleration strategy (CIG-NA). The termination criterion is the same as in the previous section.

For clarity, the C Metric comparisons are abbreviated as S1 = C(CIG, CIG-RI) and S′1 = C(CIG-RI, CIG), S2 = C(CIG, CIG-NE) and S′2 = C(CIG-NE, CIG), S3 = C(CIG, CIG-NP) and S′3 = C(CIG-NP, CIG), S4 = C(CIG, CIG-NS) and S′4 = C(CIG-NS, CIG), and S5 = C(CIG, CIG-NA) and S′5 = C(CIG-NA, CIG). As shown in Fig. 12, the values of Si are generally greater than those of S′i, i = 1, 2, 3, 4, 5. This indicates that the solutions of the variants tend to be dominated by those of CIG.

The ONVG results are reported in Table 8. The best value for each group is highlighted in boldface.

- CIG-RI: in SSD10, SSD50, SSD100, and SSD125, the ONVG values of CIG are larger than those of CIG-RI in all instances. The proposed MIBST therefore yields more non-dominated solutions than random initialization, indicating that the construction heuristic based on problem knowledge has a positive impact on CIG.
- CIG-NE: according to Table 8 and Fig. 12(b), CIG generates more non-dominated solutions than CIG-NE in different scenarios. This shows that using Q-learning to guide the search direction of the sequence-related operation is beneficial.
- CIG-NP: CIG-NP is inferior to CIG, which demonstrates that the Q-learning-based speed-related operation is important for reducing energy consumption.
- CIG-NS: CIG-NS is inferior to CIG. The energy-saving strategy effectively reduces TEC by decreasing the speed of certain operations without increasing Cmax, according to Properties 1 and 2 in Section 4.4.
- CIG-NA: the complete CIG outperforms CIG-NA. In Table 8, CIG obtains more non-dominated solutions than CIG-NA in every instance group except 500 × 20 under SSD100, where the two tie, and Fig. 12(e) confirms this advantage.

The Cmax of each


Table 8
Average ONVG of all variants.
(n, m) SSD10 (CIG-RI CIG-NE CIG-NP CIG-NS CIG-NA CIG) SSD50 (CIG-RI CIG-NE CIG-NP CIG-NS CIG-NA CIG)
20 × 5 371 375 343 399 334 433 339 350 314 371 314 390
20 × 10 358 359 359 347 318 443 318 336 326 356 317 387
20 × 20 362 342 331 356 338 400 315 358 350 341 320 366
50 × 5 390 376 378 361 357 464 323 340 322 322 358 386
50 × 10 375 363 362 344 378 416 340 315 354 339 309 375
50 × 20 333 329 356 355 350 419 307 338 318 316 315 363
100 × 5 359 375 382 353 385 405 347 340 335 374 357 393
100 × 10 306 330 348 331 332 351 310 330 331 319 331 349
100 × 20 315 302 328 336 349 366 310 308 318 299 320 324
200 × 10 315 311 337 319 335 392 298 304 279 298 301 338
200 × 20 299 317 304 292 287 324 299 306 279 300 323 332
500 × 20 284 284 319 326 286 329 293 293 282 294 293 310
Average 338.9 338.6 345.6 343.3 337.4 395.2 316.6 326.5 317.3 327.4 321.5 359.4

(n, m) SSD100 (CIG-RI CIG-NE CIG-NP CIG-NS CIG-NA CIG) SSD125 (CIG-RI CIG-NE CIG-NP CIG-NS CIG-NA CIG)
20 × 5 245 263 265 255 251 302 217 260 245 260 231 268
20 × 10 270 281 284 273 273 319 237 270 254 276 258 286
20 × 20 303 315 277 319 302 331 277 275 288 263 306 334
50 × 5 269 264 245 282 262 316 235 248 250 242 242 303
50 × 10 304 315 286 303 291 344 254 261 251 253 271 316
50 × 20 296 289 292 313 288 337 284 288 294 288 274 341
100 × 5 312 299 299 272 285 338 265 253 280 292 251 310
100 × 10 263 289 290 278 282 299 260 251 287 280 237 299
100 × 20 283 279 301 285 296 314 281 235 271 261 280 286
200 × 10 282 269 269 271 283 305 256 251 278 277 265 283
200 × 20 265 293 290 287 294 309 254 253 253 269 263 277
500 × 20 264 279 267 266 279 279 235 235 235 241 235 250
Average 279.7 286.3 280.4 283.7 282.2 316.1 254.6 256.7 265.5 266.8 259.4 296.1

Fig. 13. Pareto front of CIG and its variants at different factories. (a) f = 2; (b) f = 4; (c) f = 6;

solution is reduced by shortening the critical path in the factory with the largest Cmax, according to Theorem 2. In Fig. 13, the Pareto front generated by CIG is more diverse and uniform than those generated by the other variants, which means that CIG balances the two objectives well. Each Pareto front is obtained by merging the results of all runs. Additionally, Fig. 14 depicts the means plots with Tukey Honest Significant Difference (HSD) intervals at the 95 % confidence level. These plots show the interaction between the variants and (a) the setup time level and (b) the number of factories. The results in Fig. 14 are consistent with the above conclusions. Taken together, these experiments and analyses verify that all components contribute to the performance of CIG.
Furthermore, a statistical test is conducted for each component to demonstrate its contribution to the proposed CIG algorithm. A Wilcoxon signed-rank test is applied to verify the significance of the differences between CIG and its variants in Table 9, with the confidence level set to 95 % (α = 0.05). The sign "+" indicates that CIG significantly outperforms the compared variant at this confidence level. Conversely, the sign "−" indicates that CIG performs worse than the compared variant with a


Fig. 14. Means plots and Tukey HSD 95 % confidence intervals in the ANOVA experiment for the CIG and its variants. (a) at different SSD. (b) at different factories.

Table 9
Wilcoxon signed-rank comparison between the proposed CIG and its variants with C Metric (α = 0.05).
(n, m) CIG vs. CIG-RI CIG vs. CIG-NE CIG vs. CIG-NP CIG vs. CIG-NS CIG vs. CIG-NA

R+/R- p win R+/R- p win R+/R- p win R+/R- p win R+/R- p win

20 × 5 64/56 6.88E-01 = 99/21 1.38E-14 + 64/56 3.71E-01 = 52/68 8.41E-02 = 66/54 1.72E-01 =
20 × 10 61/59 7.58E-01 = 94/26 4.85E-12 + 57/63 2.74E-01 = 65/55 7.59E-01 = 66/54 9.82E-01 =
20 × 20 51/69 3.26E-01 = 91/29 9.33E-11 + 66/54 6.55E-01 = 56/64 2.49E-01 = 65/55 2.69E-01 =
50 × 5 64/56 9.51E-02 = 93/27 1.11E-12 + 69/51 4.51E-01 = 67/53 5.71E-01 = 63/57 8.83E-01 =
50 × 10 62/58 8.44E-01 = 99/21 1.49E-11 + 60/60 8.98E-01 = 62/58 7.63E-01 = 64/56 7.66E-01 =
50 × 20 72/48 1.30E-01 = 97/23 1.99E-12 + 60/60 6.09E-01 = 68/52 3.31E-01 = 66/54 4.91E-01 =
100 × 5 58/62 5.93E-01 = 85/35 4.00E-06 + 62/58 4.55E-01 = 61/59 1.74E-01 = 60/60 9.45E-01 =
100 × 10 68/52 4.08E-01 = 69/51 9.81E-02 = 58/62 2.92E-01 = 68/52 5.43E-01 = 62/58 5.77E-01 =
100 × 20 64/56 5.55E-01 = 70/50 6.28E-02 = 53/67 1.55E-01 = 59/61 9.84E-01 = 65/55 5.65E-01 =
200 × 10 58/62 7.06E-01 = 83/37 6.70E-05 + 62/58 9.07E-01 = 56/64 4.29E-01 = 67/53 7.71E-01 =
200 × 20 64/56 6.41E-01 = 66/54 6.82E-02 = 57/63 6.28E-01 = 56/64 2.25E-01 = 60/60 2.16E-01 =
500 × 20 81/39 8.84E-03 + 120/0 1.73E-06 + 52/68 1.02E-02 = 57/63 2.89E-02 = 101/19 6.72E-02 +
+ / = /− 1/11/0 9/3/0 0/12/0 0/12/0 1/11/0

significant confidence level, and the sign "=" means that there is no significant difference between CIG and the compared variant. R+ denotes the sum of the ranks, over the 5 independent replications, in which CIG outperforms the variant; conversely, R− is the sum of the ranks for the opposite outcome. The p-value indicates the significance of the difference between CIG and the compared variant.
In Table 9, CIG is not inferior to any variant on any instance group. CIG with MIBST is superior to CIG with random initialization in terms of the C Metric, and the advantage becomes significant when the number of jobs is 500, which demonstrates the effectiveness of MIBST. Against CIG-NE, CIG wins on 9 instance groups at the chosen confidence level, which demonstrates the contribution of the sequence-related operation to the proposed algorithm. Against CIG-NP, CIG-NS, and CIG-NA, CIG does not win on all instances. Even so, the averages over all instances in Fig. 12(c), (d), and (e) show that CIG still outperforms these three variants in overall performance. In summary, the good performance of CIG can be attributed to its main components.

Table 10
Parameter settings of the algorithms.
Algorithm | Description | Parameters (factors) and values (levels) | Parameter combinations
KCA (Wang and Wang, 2020) | Knowledge-based cooperative algorithm | PS = {10, 20, 30, 40}; LS = {0, 100, 200, 300} | 16
MOWSA (Wang et al., 2020) | Multi-objective whale swarm algorithm | PS = {20, 50, 80, 100}; α = {0.8, 0.85, 0.9, 0.95}; β = {0.1, 0.15, 0.2, 0.25} | 64
MOEA/D (Wang et al., 2021) | Multi-objective evolutionary algorithm based on decomposition | PS = {50, 80, 100}; T = {2, 5, 10}; α = {0.8, 0.85, 0.9}; β = {0.05, 0.1, 0.2} | 81
HHQL [40] | Hyper-heuristic with Q-learning | PS = {20, 30, 40, 50, 60}; γ = {0.7, 0.8, 0.9} | 15
CWOA [32] | Cooperative whale optimization algorithm | PS = {5, 10, 15, 20} | 4
INSGA-II [39] | Improved NSGA-II | pc = {0.2, 0.4, 0.6, 0.8}; pm = {0.2, 0.4, 0.6, 0.8} | 16

5.3. Calibration of the competing methods

The effectiveness of CIG is further demonstrated by comparing it with six algorithms:

- KCA [20];
- MOWSA [21];
- MOEA/D [22];
- the hyper-heuristic with Q-learning (HHQL) [40];
- the cooperative whale optimization algorithm (CWOA) [32];
- the improved NSGA-II (INSGA-II) [39].

KCA, MOWSA, MOEA/D, and INSGA-II are effective approaches for other EFSPs. HHQL, which also employs Q-learning, is an advanced method for the energy-efficient DBFSP (EDBFSP). CWOA is the most recently reported method for the EDBFSP. Hence,


Fig. 15. Main effects plot of all comparison algorithms parameters.

Table 11
Average HV of CIG and compared algorithms (SSD10 and SSD50).
(n, m) SSD10 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG) SSD50 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG)
20 × 5 0.088 0.096 0.084 0.097 0.094 0.097 0.119 0.097 0.112 0.092 0.109 0.092 0.109 0.118
20 × 10 0.072 0.081 0.070 0.080 0.077 0.080 0.100 0.078 0.095 0.074 0.088 0.077 0.088 0.105
20 × 20 0.056 0.065 0.054 0.062 0.060 0.062 0.082 0.062 0.079 0.060 0.069 0.061 0.069 0.091
50 × 5 0.072 0.081 0.069 0.088 0.073 0.088 0.094 0.080 0.105 0.077 0.100 0.076 0.100 0.117
50 × 10 0.059 0.067 0.057 0.076 0.062 0.076 0.080 0.063 0.083 0.061 0.082 0.060 0.082 0.091
50 × 20 0.046 0.051 0.045 0.058 0.049 0.058 0.064 0.052 0.068 0.051 0.066 0.050 0.066 0.078
100 × 5 0.042 0.039 0.039 0.046 0.047 0.045 0.052 0.041 0.044 0.039 0.051 0.030 0.032 0.052
100 × 10 0.041 0.044 0.039 0.051 0.062 0.043 0.056 0.048 0.064 0.046 0.068 0.058 0.038 0.068
100 × 20 0.038 0.045 0.037 0.053 0.067 0.042 0.054 0.042 0.058 0.041 0.059 0.054 0.035 0.059
200 × 10 0.037 0.039 0.027 0.037 0.033 0.021 0.055 0.028 0.031 0.021 0.028 0.024 0.015 0.042
200 × 20 0.030 0.034 0.022 0.031 0.025 0.016 0.046 0.026 0.029 0.020 0.026 0.021 0.014 0.038
500 × 20 0.024 0.022 0.019 0.025 0.024 0.019 0.040 0.019 0.016 0.014 0.018 0.019 0.014 0.029
Average 0.051 0.055 0.047 0.059 0.056 0.054 0.070 0.053 0.065 0.050 0.064 0.052 0.055 0.074

these methods are compared with the proposed CIG. Metaheuristics must be properly calibrated to perform well. As in Section 5.1, the DOE methodology is employed to calibrate these competing approaches. The response variable is HV, which is to be maximized. As shown in Table 10, the parameter levels are set according to the relevant literature.
The experimental results are evaluated by the ANOVA method. Due to space limitations, the complete details of the calibrations for these


Table 12
Average HV of CIG and comparison algorithms (SSD100 and SSD125).
(n, m) SSD100 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG) SSD125 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG)
20 × 5 0.084 0.091 0.082 0.096 0.082 0.096 0.099 0.077 0.081 0.073 0.086 0.067 0.079 0.086
20 × 10 0.063 0.071 0.061 0.073 0.066 0.073 0.088 0.067 0.076 0.067 0.078 0.065 0.078 0.083
20 × 20 0.053 0.060 0.051 0.059 0.058 0.059 0.078 0.056 0.065 0.055 0.064 0.055 0.064 0.072
50 × 5 0.057 0.064 0.055 0.072 0.058 0.072 0.073 0.057 0.063 0.056 0.073 0.047 0.055 0.075
50 × 10 0.051 0.060 0.050 0.068 0.053 0.068 0.071 0.053 0.058 0.052 0.068 0.046 0.055 0.069
50 × 20 0.043 0.051 0.042 0.055 0.045 0.055 0.060 0.047 0.055 0.047 0.060 0.043 0.054 0.060
100 × 5 0.032 0.035 0.031 0.041 0.026 0.027 0.042 0.028 0.031 0.028 0.036 0.023 0.019 0.038
100 × 10 0.035 0.042 0.034 0.050 0.044 0.031 0.050 0.030 0.033 0.029 0.038 0.024 0.020 0.042
100 × 20 0.033 0.040 0.032 0.046 0.041 0.030 0.046 0.028 0.029 0.027 0.034 0.023 0.020 0.038
200 × 10 0.024 0.027 0.019 0.024 0.020 0.013 0.034 0.024 0.027 0.019 0.023 0.019 0.013 0.035
200 × 20 0.022 0.025 0.018 0.022 0.018 0.012 0.031 0.022 0.025 0.018 0.021 0.017 0.012 0.032
500 × 20 0.015 0.013 0.013 0.016 0.016 0.013 0.024 0.014 0.012 0.011 0.014 0.014 0.012 0.022
Average 0.043 0.048 0.041 0.052 0.044 0.046 0.058 0.042 0.046 0.040 0.050 0.037 0.040 0.054

Table 13
Average IGD of CIG and compared algorithms (SSD10 and SSD50).
(n, m) SSD10 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG) SSD50 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG)
20 × 5 0.042 0.008 0.010 0.048 0.035 0.033 0.013 0.061 0.047 0.017 0.075 0.061 0.024 0.010
20 × 10 0.048 0.010 0.012 0.054 0.043 0.038 0.014 0.066 0.054 0.016 0.071 0.066 0.022 0.011
20 × 20 0.059 0.013 0.016 0.063 0.053 0.048 0.017 0.074 0.061 0.012 0.079 0.072 0.016 0.010
50 × 5 0.055 0.020 0.024 0.062 0.049 0.034 0.025 0.073 0.053 0.015 0.084 0.074 0.020 0.013
50 × 10 0.081 0.037 0.043 0.088 0.085 0.040 0.038 0.093 0.064 0.021 0.102 0.095 0.028 0.018
50 × 20 0.084 0.039 0.045 0.093 0.084 0.048 0.044 0.102 0.067 0.031 0.111 0.101 0.039 0.026
100 × 5 0.061 0.043 0.050 0.070 0.051 0.083 0.026 0.072 0.040 0.047 0.081 0.057 0.066 0.038
100 × 10 0.089 0.060 0.070 0.097 0.082 0.105 0.042 0.097 0.041 0.047 0.105 0.074 0.113 0.057
100 × 20 0.126 0.078 0.092 0.132 0.092 0.121 0.052 0.110 0.044 0.051 0.115 0.070 0.122 0.069
200 × 10 0.138 0.101 0.110 0.156 0.133 0.174 0.046 0.146 0.103 0.128 0.167 0.152 0.176 0.064
200 × 20 0.160 0.116 0.128 0.174 0.160 0.195 0.066 0.162 0.120 0.137 0.177 0.167 0.195 0.064
500 × 20 0.251 0.210 0.237 0.254 0.244 0.262 0.066 0.210 0.174 0.188 0.220 0.193 0.197 0.082
Average 0.100 0.061 0.070 0.108 0.093 0.098 0.037 0.106 0.072 0.059 0.116 0.099 0.085 0.038

Table 14
Average IGD of CIG and comparison algorithms (SSD100 and SSD125).
(n, m) SSD100 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG) SSD125 (KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG)
20 × 5 0.077 0.059 0.025 0.090 0.076 0.031 0.017 0.083 0.061 0.023 0.105 0.086 0.032 0.019
20 × 10 0.080 0.061 0.023 0.085 0.076 0.027 0.020 0.090 0.066 0.021 0.102 0.094 0.029 0.018
20 × 20 0.074 0.059 0.022 0.078 0.070 0.023 0.018 0.087 0.071 0.018 0.090 0.090 0.025 0.015
50 × 5 0.075 0.056 0.032 0.092 0.069 0.038 0.028 0.100 0.060 0.047 0.114 0.097 0.063 0.041
50 × 10 0.095 0.053 0.050 0.106 0.096 0.047 0.043 0.102 0.065 0.044 0.117 0.100 0.060 0.038
50 × 20 0.116 0.065 0.050 0.126 0.119 0.048 0.043 0.116 0.078 0.040 0.126 0.119 0.054 0.034
100 × 5 0.077 0.048 0.056 0.085 0.068 0.089 0.046 0.092 0.053 0.062 0.102 0.068 0.107 0.050
100 × 10 0.119 0.070 0.082 0.127 0.114 0.130 0.060 0.118 0.077 0.090 0.128 0.106 0.140 0.062
100 × 20 0.126 0.075 0.087 0.135 0.125 0.131 0.065 0.131 0.089 0.103 0.143 0.121 0.159 0.068
200 × 10 0.128 0.081 0.141 0.264 0.134 0.317 0.087 0.160 0.113 0.149 0.184 0.153 0.199 0.082
200 × 20 0.145 0.089 0.135 0.341 0.147 0.436 0.079 0.180 0.132 0.165 0.215 0.184 0.236 0.078
500 × 20 0.200 0.166 0.190 0.214 0.192 0.201 0.090 0.221 0.184 0.215 0.250 0.245 0.234 0.096
Average 0.109 0.074 0.075 0.145 0.107 0.127 0.050 0.123 0.087 0.082 0.140 0.122 0.112 0.050


Table 15
CIG improvement percentages over other algorithms with respect to performance.
Algorithm SSD10 SSD50 SSD100 SSD125 Average
Improvement percentages (HV)
KCA 41.41 % 41.79 % 38.97 % 34.45 % 39.16 %
MOWSA 31.10 % 20.06 % 25.11 % 24.59 % 25.22 %
MOEA/D 58.45 % 58.17 % 49.80 % 45.85 % 53.07 %
HHQL 23.68 % 22.07 % 17.15 % 17.72 % 20.16 %
CWOA 31.70 % 48.65 % 38.44 % 55.92 % 43.68 %
INSGA-II 53.05 % 64.52 % 53.79 % 68.33 % 59.92 %
Improvement percentages (IGD)
KCA 61.20 % 66.18 % 56.64 % 61.29 % 61.33 %
MOWSA 7.93 % 41.42 % 32.67 % 42.91 % 31.23 %
MOEA/D 64.66 % 69.08 % 65.14 % 65.66 % 66.14 %
HHQL 22.25 % 21.26 % 25.51 % 27.27 % 24.07 %
CWOA 57.38 % 60.47 % 55.52 % 59.16 % 58.13 %
INSGA-II 52.36 % 47.87 % 41.69 % 48.18 % 47.53 %

competing methods are provided in the supporting documents. In summary, Fig. 15 presents all the parameters tested for each approach. The values in the red circles are the best-calibrated levels of each parameter after the experiment and analysis.

5.4. Comparison of related algorithms

The results for the approaches are listed in Tables 11–14. The best observed values in these tables are highlighted in boldface. Each algorithm is run independently 10 times on each test instance. To ensure a fair comparison, the initial solutions of all seven algorithms are generated by the proposed MIBST of Section 4.1. As observed in Tables 11–14, the HV values of CIG are better than those of the compared algorithms in almost all cases. HHQL also considers certain problem characteristics of the EDBFSP, which is likely the main reason its performance is second only to CIG. To summarize the evaluation, Table 15 presents the improvement percentages of CIG over the other algorithms. Even against its strongest competitor in terms of HV (HHQL), CIG improves HV by 20.16 % on average.
The boxplots for the different setup time levels are shown in Fig. 16; they directly reflect the stability of the algorithms. As shown in

Fig. 16. Boxplots of IGD for all algorithms. (a) SSD10; (b) SSD50; (c) SSD100; (d) SSD125.


Fig. 17. Interval plot of the C metric at the 95 % confidence level.

Fig. 16, the box of CIG is shorter than those of the other algorithms, so the results of CIG are more concentrated. According to Fig. 16, CIG is fairly sensitive to the level of the setup times. Setup times cannot be shortened by changing the speed. As a result, the proposed speed-related operations become less effective when setup times are large, which is most evident under SSD125.
The interval plots are drawn in Fig. 17. For clarity, the C Metric comparisons are abbreviated as follows:

- A = C(CIG, KCA) and A′ = C(KCA, CIG);
- B = C(CIG, MOWSA) and B′ = C(MOWSA, CIG);
- C = C(CIG, MOEA/D) and C′ = C(MOEA/D, CIG);
- D = C(CIG, HHQL) and D′ = C(HHQL, CIG);
- E = C(CIG, CWOA) and E′ = C(CWOA, CIG);
- F = C(CIG, INSGA-II) and F′ = C(INSGA-II, CIG).

The C Metric of CIG is generally larger than that of all other approaches.
The previous tables and figures only show the total averages over all instances and replicates. Fig. 18 shows the means plots and Tukey HSD 95 % confidence intervals in the ANOVA experiment for all algorithms against instance size. In small instances, it can be observed that there is


Fig. 18. Means plots and Tukey HSD 95 % confidence intervals in the ANOVA experiment for all algorithms. (a) at different jobs. (b) at different instance sizes.

Fig. 19. Pareto front of all algorithms in some instances.

no overlap between the intervals of CIG and those of the other compared algorithms, except for MOEA/D and MOWSA. Nevertheless, CIG performs much better on medium and large instances. There, the other algorithms converge slowly in the huge search space because they lack prior knowledge. Fig. 19 shows the approximate distribution of the non-dominated solutions obtained by the seven algorithms on four instances; each Pareto front is obtained by merging all runs. As illustrated in Fig. 19, CIG provides a more diverse set of non-dominated solutions than the other algorithms, balancing the two conflicting objectives. This approximate Pareto-optimal set can help decision makers choose their preferred schedules. Overall, CIG is superior to the other algorithms.
Then, the Friedman test is used to carry out a statistical comparison.
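The Friedman comparison can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code:

- `average_ranks` computes per-algorithm average ranks like those in Table 16. Rank 1 is the smallest value, which suits IGD directly; for HV, a larger rank then means a larger HV.
- `friedman_chi2` gives the test statistic.
- `bonferroni_dunn_cd` gives the critical difference.

The section does not name the CD variant. The sketch assumes the Bonferroni–Dunn form in Demšar's formulation, CD = z·sqrt(k(k+1)/(6N)) with z = Φ⁻¹(1 − α/(2(k − 1))). For k = 7 algorithms and N = 720 cases, this gives 0.300 at α = 0.05 and 0.273 at α = 0.1, matching the "Crit. Diff" rows of Table 16.

```python
import math
from statistics import NormalDist


def average_ranks(cases):
    """Friedman average ranks. cases: one list per case with one value per
    algorithm; rank 1 = smallest value, ties share the mean of their ranks."""
    k = len(cases[0])
    totals = [0.0] * k
    for row in cases:
        order = sorted(range(k), key=lambda a: row[a])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
            for t in range(pos, end + 1):
                totals[order[t]] += mean_rank
            pos = end + 1
    return [t / len(cases) for t in totals]


def friedman_chi2(avg_ranks, n_cases):
    """Friedman statistic, chi-square distributed with k - 1 degrees of freedom."""
    k = len(avg_ranks)
    return 12 * n_cases / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def bonferroni_dunn_cd(k, n_cases, alpha):
    """Critical difference of average ranks for comparisons with a control."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * (k - 1)))
    return z * math.sqrt(k * (k + 1) / (6 * n_cases))


print(round(bonferroni_dunn_cd(7, 720, 0.05), 3))  # 0.3
print(round(bonferroni_dunn_cd(7, 720, 0.10), 3))  # 0.273
```

If SciPy is available, `scipy.stats.chi2.sf(stat, k - 1)` converts the statistic into a p-value. Two algorithms differ significantly when their average ranks differ by more than the CD.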


Table 16
Results of the Friedman test (average ranks) for IGD and HV under different setup time levels.
Algorithms SSD10 (IGD HV) SSD50 (IGD HV) SSD100 (IGD HV) SSD125 (IGD HV)
KCA 5.42 2.71 5.34 2.58 5.26 2.65 5.19 2.47
MOWSA 3.09 5.33 2.86 5.18 2.86 5.46 2.81 5.53
MOEA/D 5.64 4.61 5.61 4.99 5.58 4.75 5.53 4.97
HHQL 3.02 1.99 2.72 1.78 3.09 1.94 2.89 1.76
CWOA 5.18 3.54 2.85 3.63 4.43 3.80 4.12 3.66
INSGA-II 3.95 3.74 5.87 3.94 4.39 3.57 4.30 3.67
CIG 1.59 6.08 1.85 5.31 1.69 5.83 1.76 5.94
CN 720 720 720 720 720 720 720 720
p-value 6.62E-58 5.01E-48 2.14E-77 4.53E-46 1.72E-56 4.73E-46 1.81E-65 8.51E-52
Crit. Diff (α = 0.05) 0.300 0.300 0.300 0.300 0.300 0.300 0.300 0.300
Crit. Diff (α = 0.1) 0.273 0.273 0.273 0.273 0.273 0.273 0.273 0.273

Fig. 20. Friedman-test at different metrics. (a) IGD; (b) HV.

Fig. 21. Gantt chart of the schedule (π1 = {3, 1, 5} and π2 = {2, 4}).

The results are shown in Table 16 and Fig. 20, where CN is the number of cases. Both IGD and HV are used to evaluate all the compared algorithms. The solid and dotted lines in Fig. 20 are the critical differences (CD) at the 95 % and 90 % confidence levels. As illustrated in Fig. 20, CIG has the best ranking among the seven algorithms.
These experiments show that the results of CIG are significantly superior to those of the compared algorithms. The reasons differ by algorithm:

- CWOA and MOWSA are metaheuristics without problem-specific knowledge. On large-scale problems, they converge slowly and reach poor solution accuracy because of the huge search space. CIG, in contrast, embeds the problem-specific properties into the evolutionary process and therefore outperforms the other compared algorithms on large-scale problems.
- MOEA/D and INSGA-II combine the decomposition method with neighborhood search for continuous iterative optimization. These two algorithms mainly focus on improving local search, which makes them suitable for small-scale benchmarks. However, as the problem size increases, they perform worse than CIG because they lack an effective global search.
- HHQL is an effective hyper-heuristic based on problem-specific knowledge, and among the compared algorithms it finds good solutions on large-scale problems. However, HHQL does not consider the impact of setup times on TEC and Cmax. Its performance is therefore closely related to the setup time level; under SSD125 in particular, HHQL is far inferior to CIG.
- KCA uses a search strategy designed around the characteristics of the DPFSP. The experiments show that KCA performs worse than CIG because this strategy is not suitable for the considered EDBFSP-SDST.
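The text does not spell out how the improvement percentages in Table 15 are averaged. One plausible reading, assumed here, is the mean over the 12 (n, m) groups of the relative difference between CIG and the competitor. Applied to the SSD10 rows of Tables 11 and 13, this reproduces the KCA entries of Table 15: 41.41 % for HV and 61.20 % for IGD. The function names are illustrative.

```python
def hv_improvement(cig, other):
    """Mean relative HV gain (%) of CIG over another algorithm (larger HV is better)."""
    return 100 * sum((c - o) / o for c, o in zip(cig, other)) / len(cig)


def igd_improvement(cig, other):
    """Mean relative IGD reduction (%) of CIG over another algorithm (smaller IGD is better)."""
    return 100 * sum((o - c) / o for c, o in zip(cig, other)) / len(cig)


# Per-(n, m) averages for SSD10, copied from Tables 11 (HV) and 13 (IGD).
hv_kca = [0.088, 0.072, 0.056, 0.072, 0.059, 0.046, 0.042, 0.041, 0.038, 0.037, 0.030, 0.024]
hv_cig = [0.119, 0.100, 0.082, 0.094, 0.080, 0.064, 0.052, 0.056, 0.054, 0.055, 0.046, 0.040]
igd_kca = [0.042, 0.048, 0.059, 0.055, 0.081, 0.084, 0.061, 0.089, 0.126, 0.138, 0.160, 0.251]
igd_cig = [0.013, 0.014, 0.017, 0.025, 0.038, 0.044, 0.026, 0.042, 0.052, 0.046, 0.066, 0.066]

print(round(hv_improvement(hv_cig, hv_kca), 2))    # 41.41
print(round(igd_improvement(igd_cig, igd_kca), 2))  # 61.2
```

Averaging per-group ratios gives each (n, m) group equal weight. Computing the ratio of the overall averages instead (0.070 vs. 0.051 for HV) would give a different figure.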


Fig. 22. Schematic diagram of block time in EDBFSP-SDST.

6. Conclusion and future work

A CIG is proposed to solve the EDBFSP-SDST with the minimization of Cmax and TEC. The theoretical properties of the EDBFSP-SDST are integrated into the evolutionary process of CIG. In the initialization, MIBST yields better solutions than random initialization by reducing block time and idle time, as shown in Section 5.2. Through a bi-population collaborative IG based on Q-learning, the diversity and quality of the solutions are effectively improved. The property-based energy-saving and acceleration strategies further optimize the objectives. Extensive numerical tests show that the proposed algorithm performs better than the existing algorithms in terms of solution quality and diversity. Moreover, the proposed algorithm obtains good feasible solutions under different stopping criteria. The superior performance of the algorithm is mainly owed to the following aspects.
(1) A heuristic based on theoretical properties produces a population with good quality and diversity.
(2) Two cooperating populations balance the two objectives.
(3) Two-level Q-learning enriches the search behavior.
(4) The energy-saving and acceleration strategies enhance exploitation capability.
Because the operators are problem-specific, the algorithm may not be directly applicable to other scheduling problems. However, the ideas of designing problem-specific operators and of using reinforcement learning cooperatively provide guidelines for solving other complex scheduling problems.
There are several further research directions: (1) distributed production scheduling in uncertain environments, with events such as material shortages and variations in processing times, is worthy of attention; (2) a more accurate environment description for flow-shop scheduling problems should be designed to handle large numbers of states or actions in RL; (3) theoretical properties should be extracted and used to develop effective green scheduling approaches.

CRediT authorship contribution statement

Haizhu Bao: Software, Writing – original draft, Methodology. Quanke Pan: Funding acquisition, Investigation, Supervision, Resources, Formal analysis. Rubén Ruiz: Project administration, Writing – review & editing, Conceptualization. Liang Gao: Visualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

I have shared the link to my data/code at the Attach File step.

Acknowledgments

This work is supported by the National Natural Science Foundation of China 62273221 and 61973203, Program of Shanghai Academic/Technology Research Leader 21XD1401000, and Shanghai Key Laboratory of Power station Automation Technology.

Appendix

A.1. A detailed explanation of Fig. 3

The schedule in Fig. 3 was solved to optimality with CPLEX in Section 3.1.2 and contains no block time. Fig. 21 shows the Gantt chart of another schedule (π1 = {5, 2, 3} and π2 = {4, 1}). After J3 is processed on M1 in F1, J3 is blocked on M1 because M2 is still processing J2.


A.2. The implementation details of Cmax and TEC

Cmax is calculated as follows:

$$
\begin{aligned}
D_{3,0} &= D_{0,1} + st_{0,3,1} = 0 + 2 = 2\\
D_{3,1} &= \max\{D_{0,2} + st_{0,3,2},\ D_{3,0} + p_{3,1}\} = \max\{0 + 6,\ 2 + 5\} = 7\\
D_{3,2} &= D_{3,1} + p_{3,2} = 7 + 5 = 12\\
D_{1,0} &= D_{3,1} + st_{3,1,1} = 7 + 2 = 9\\
D_{1,1} &= \max\{D_{3,2} + st_{3,1,2},\ D_{1,0} + p_{1,1}\} = \max\{12 + 1,\ 9 + 4\} = 13\\
D_{1,2} &= D_{1,1} + p_{1,2} = 13 + 4 = 17\\
D_{5,0} &= D_{1,1} + st_{1,5,1} = 13 + 9 = 22\\
D_{5,1} &= \max\{D_{1,2} + st_{1,5,2},\ D_{5,0} + p_{5,1}\} = \max\{17 + 5,\ 22 + 3\} = 25\\
D_{5,2} &= D_{5,1} + p_{5,2} = 25 + 1 = 26\\[4pt]
D_{2,0} &= D_{0,1} + st_{0,2,1} = 0 + 3 = 3\\
D_{2,1} &= \max\{D_{0,2} + st_{0,2,2},\ D_{2,0} + p_{2,1}\} = \max\{0 + 3,\ 3 + 6\} = 9\\
D_{2,2} &= D_{2,1} + p_{2,2} = 9 + 4 = 13\\
D_{4,0} &= D_{2,1} + st_{2,4,1} = 9 + 5 = 14\\
D_{4,1} &= \max\{D_{2,2} + st_{2,4,2},\ D_{4,0} + p_{4,1}\} = \max\{13 + 4,\ 14 + 9\} = 23\\
D_{4,2} &= D_{4,1} + p_{4,2} = 23 + 3 = 26
\end{aligned}
$$

Since the two factory sequences end with J5 and J4, $C_{max} = \max\{D_{5,2}, D_{4,2}\} = 26$.
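The recurrence above can be checked mechanically. The following is a minimal Python sketch. It assumes the usual generalization of the two-machine blocking recurrence to m machines, where a job leaves M_i only when M_{i+1} has been released by its predecessor and set up. The factory sequences {3, 1, 5} and {2, 4} and the p and st values are read off the calculation above. The function and variable names are hypothetical and are not taken from the authors' code.

```python
def departure_times(seq, p, st):
    """Departure times D[j] = [D_j0, D_j1, ..., D_jm] for one factory's job sequence.

    p[j][i-1]        processing time of job j on machine M_i
    st[(j_prev, j)]  setup times on M_1..M_m when j follows j_prev (0 = dummy job)
    """
    m = len(p[seq[0]])
    prev, prev_D = 0, [0] * (m + 1)  # dummy job 0: D_{0,i} = 0 on every machine
    D = {}
    for j in seq:
        s = st[(prev, j)]
        d = [0] * (m + 1)
        d[0] = prev_D[1] + s[0]  # D_{j,0} = D_{j',1} + st_{j',j,1}
        for i in range(1, m):
            # Blocking: D_{j,i} = max{D_{j',i+1} + st_{j',j,i+1}, D_{j,i-1} + p_{j,i}}
            d[i] = max(prev_D[i + 1] + s[i], d[i - 1] + p[j][i - 1])
        d[m] = d[m - 1] + p[j][m - 1]  # last machine never blocks
        D[j] = d
        prev, prev_D = j, d
    return D


# Data of the two-machine example, read off the calculation above
p = {3: [5, 5], 1: [4, 4], 5: [3, 1], 2: [6, 4], 4: [9, 3]}
st = {(0, 3): [2, 6], (3, 1): [2, 1], (1, 5): [9, 5], (0, 2): [3, 3], (2, 4): [5, 4]}
factories = [[3, 1, 5], [2, 4]]

schedules = [departure_times(seq, p, st) for seq in factories]
# Cmax is the largest departure time of the last job in any factory
cmax = max(D[seq[-1]][-1] for D, seq in zip(schedules, factories))
print(schedules)
print("Cmax =", cmax)
```

Each job's list reproduces one group of three lines in the derivation, for example [22, 25, 26] for J5.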

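TEC is calculated as follows:

$$
\begin{aligned}
PEC &= \sum_{j=1}^{n}\sum_{i=1}^{m}\sum_{v=1}^{s} \xi_{j,i,v}\cdot p_{j,i}\cdot \beta_{i,v}\\
&= 5\times4\times1.75^2 + 5\times4\times2.1^2 + 4\times4\times1^2 + 4\times4\times1.55^2 + 3\times4\times1.75^2\\
&\quad + 1\times4\times1.3^2 + 6\times4\times1.3^2 + 4\times4\times1.75^2 + 9\times4\times1^2 + 3\times4\times2.1^2\\
&= 425.88
\end{aligned}
$$

$$SEC = \sum_{j'=0}^{n}\sum_{j=1}^{n}\sum_{i=1}^{m} x_{j',j}\cdot st_{j',j,i}\cdot\gamma_i = (2+6+2+1+9+5+3+3+5+4)\times0.7 = 28$$

$$BEC = 0 \quad \text{(no job is blocked in this schedule)}$$

$$IEC = \sum_{j'=0}^{n}\sum_{j=1}^{n}\sum_{i=2}^{m} x_{j',j}\cdot\left(D_{j,i-1} - D_{j',i} - st_{j',j,i}\right)\cdot\theta_i = (1+3+6+6)\times0.5 = 8$$

$$TEC = PEC + SEC + BEC + IEC = 425.88 + 28 + 0 + 8 = 461.88$$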
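The energy terms can be recomputed in the same way. This Python sketch hard-codes the example's processing times, setup times, and the departure times from the Cmax calculation. It rests on two readings of the PEC arithmetic, both of which are assumptions. First, the speed level of each operation is matched to the PEC terms in the order they appear (J3, J1, J5, J2, J4, each on M1 and then M2). Second, the factor 4 in every PEC term is treated as a power coefficient multiplying v². The dictionary layout and the blocking-power placeholder are also hypothetical.

```python
# Values read off the A.2 example (factory sequences {3, 1, 5} and {2, 4}, m = 2)
m = 2
factories = [[3, 1, 5], [2, 4]]
p = {3: (5, 5), 1: (4, 4), 5: (3, 1), 2: (6, 4), 4: (9, 3)}  # p_{j,i}
speed = {3: (1.75, 2.1), 1: (1.0, 1.55), 5: (1.75, 1.3),
         2: (1.3, 1.75), 4: (1.0, 2.1)}  # assumed speed level per operation
setup = {(0, 3): (2, 6), (3, 1): (2, 1), (1, 5): (9, 5),
         (0, 2): (3, 3), (2, 4): (5, 4)}  # st_{j',j,i}
D = {0: (0, 0, 0), 3: (2, 7, 12), 1: (9, 13, 17), 5: (22, 25, 26),
     2: (3, 9, 13), 4: (14, 23, 26)}  # D_{j,0..m} from the Cmax step

POWER_COEF = 4.0   # assumption: each PEC term reads p x 4 x v^2
GAMMA = 0.7        # setup power gamma_i (same on both machines here)
THETA = 0.5        # idle power theta_i
BLOCK_POWER = 1.0  # hypothetical placeholder; irrelevant here because block time is 0


def pairs(seq):
    """Yield consecutive (j', j) pairs of a factory sequence, starting from dummy job 0."""
    prev = 0
    for j in seq:
        yield prev, j
        prev = j


pec = sum(p[j][i] * POWER_COEF * speed[j][i] ** 2 for j in p for i in range(m))
sec = sum(setup[pair][i] * GAMMA for seq in factories for pair in pairs(seq) for i in range(m))
# Block time on M_1..M_{m-1}: departure - start - processing (start on M_i is D_{j,i-1})
block = sum(D[j][i] - D[j][i - 1] - p[j][i - 1] for j in p for i in range(1, m))
bec = block * BLOCK_POWER
# Idle time on M_i (i >= 2) before job j: D_{j,i-1} - D_{j',i} - st_{j',j,i}
iec = sum((D[j][i - 1] - D[jp][i] - setup[(jp, j)][i - 1]) * THETA
          for seq in factories for jp, j in pairs(seq) for i in range(2, m + 1))
tec = pec + sec + bec + iec
print(f"PEC={pec:.2f} SEC={sec:.2f} BEC={bec:.2f} IEC={iec:.2f} TEC={tec:.2f}")
```

The idle-time sum also contains the zero term for J1 (13 − 12 − 1 = 0), which is why only four nonzero values appear in the IEC line.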

A.3. A detailed explanation of Eq. (20)

Fig. 22 shows the Gantt chart of four jobs processed on four machines in the sequence J4, J2, J3, J1, i.e., π = {4, 2, 3, 1}. Cmax can be calculated with the following expression:

$$C_{max} = \sum_{i=2}^{m} p_{\pi(n),i} + \sum_{j=2}^{n} BT_{\pi(j),1} + \sum_{i=2}^{m} BT_{\pi(n),i} + \sum_{j=1}^{n}\left(p_{\pi(j),1} + st_{\pi(j-1),\pi(j),1}\right) \qquad (21)$$

where π(0) = 0. The block time of J2 on M1 is $BT_{2,1} = BT_{\pi(2),1} = (p_{4,2} + st_{4,2,2}) - (p_{2,1} + st_{4,2,1})$. Similarly, all the other block times in Fig. 22 are obtained:

$$
\begin{aligned}
BT_{2,2} &= BT_{\pi(2),2} = (p_{4,3} + st_{4,2,3}) - (p_{2,2} + st_{4,2,2}),\\
BT_{3,1} &= BT_{\pi(3),1} = BT_{2,2} + (p_{2,2} + st_{2,3,2}) - (p_{3,1} + st_{2,3,1}),\\
BT_{3,2} &= BT_{\pi(3),2} = (p_{2,3} + st_{2,3,3}) - (p_{3,2} + st_{2,3,2}),\\
BT_{1,2} &= BT_{\pi(4),2} = (p_{3,3} + st_{3,1,3}) - (p_{1,2} + st_{3,1,2}) - IT_{1,2},
\end{aligned}
$$

where $IT_{1,2}$ is the idle time before J1 is processed on M2.
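The Cmax decomposition given above for Fig. 22 can be checked numerically. It follows from telescoping the departure-time recurrence of A.2, first along M1 and then along the last job's route. The Python sketch below is an illustration under stated assumptions:

- It reuses an m-machine generalization of the A.2 recurrence.
- It defines the block time of job j on M_i as the time the job stays on M_i after finishing processing.
- It draws random, hypothetical instances rather than the data behind Fig. 22.
- It keeps the first job's block time on M1, which the expression omits because its M1 sum starts at j = 2. That term is zero whenever the first job is not blocked on M1, and then the two forms coincide.

```python
import random


def departure_times(seq, p, st):
    """D[j] = [D_j0, D_j1, ..., D_jm] for one job sequence (A.2 recurrence, m machines)."""
    m = len(p[seq[0]])
    prev, prev_D = 0, [0] * (m + 1)  # dummy job 0 leaves every machine at time 0
    D = {}
    for j in seq:
        s = st[(prev, j)]
        d = [0] * (m + 1)
        d[0] = prev_D[1] + s[0]  # start on M1 after the predecessor leaves and setup ends
        for i in range(1, m):
            # blocking: leave M_i only once M_{i+1} is released and set up for job j
            d[i] = max(d[i - 1] + p[j][i - 1], prev_D[i + 1] + s[i])
        d[m] = d[m - 1] + p[j][m - 1]  # the last machine never blocks
        D[j] = d
        prev, prev_D = j, d
    return D


def block_time(D, p, j, i):
    """BT_{j,i}: time job j stays on M_i after finishing (it starts on M_i at D_{j,i-1})."""
    return D[j][i] - D[j][i - 1] - p[j][i - 1]


def decomposed_cmax(seq, p, st, D):
    """Right-hand side of the Cmax decomposition, keeping the first job's M1 block time."""
    m = len(p[seq[0]])
    last = seq[-1]
    preds = [0] + list(seq[:-1])  # pi(j-1), with pi(0) = 0
    total = sum(p[last][i - 1] for i in range(2, m + 1))
    total += sum(block_time(D, p, j, 1) for j in seq)
    total += sum(block_time(D, p, last, i) for i in range(2, m + 1))
    total += sum(p[j][0] + st[(jp, j)][0] for jp, j in zip(preds, seq))
    return total


def random_instance(n, m, rng):
    """Hypothetical random data: processing times in [1, 9], setup times in [0, 5]."""
    jobs = range(1, n + 1)
    p = {j: [rng.randint(1, 9) for _ in range(m)] for j in jobs}
    st = {(a, b): [rng.randint(0, 5) for _ in range(m)]
          for a in range(n + 1) for b in jobs if a != b}
    return p, st


rng = random.Random(1)
p, st = random_instance(4, 4, rng)
seq = [4, 2, 3, 1]  # same job order as Fig. 22; the data are random, not the figure's
D = departure_times(seq, p, st)
print(D[seq[-1]][-1], decomposed_cmax(seq, p, st, D))  # the two values coincide
```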


To reduce the occurrence of block time, Eq. (22) defines the distance $d_j$ between two adjacent jobs, where $J_j$ is the immediate successor of $J_{j'}$; $d_j$ represents the impact of $J_{j'}$ on the block time of $J_j$. The smaller $d_j$ is, the better $J_{j'}$ fits in front of $J_j$. After the first job of the initial sequence is determined, the next job is the unscheduled job with the shortest distance $d_j$ from the last determined job, and so on until the sequence is complete.

$$d_j = \sum_{i=2}^{m} \max\left\{0,\ (p_{j',i} + st_{j',j,i}) - (p_{j,i-1} + st_{j',j,i-1})\right\} \qquad (22)$$

In MIBST, after the first job of each factory is determined, the job with the lowest $d_j^f$ according to Eq. (20) is placed in the next position of $\pi^f$, and so on until the sequence is complete. Eq. (20) extends Eq. (22) to the distributed setting with multiple factories.
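The single-factory construction described above is a nearest-neighbour loop driven by the distance $d_j$ of Eq. (22). The Python sketch below makes three assumptions. The first job is supplied by the caller, because this section does not say how it is chosen. Ties are broken by the smaller job index. The 4-job, 3-machine instance is hypothetical. The distributed MIBST variant based on Eq. (20) is not reproduced here.

```python
def distance(jp, j, p, st):
    """d_j of Eq. (22): sum over i = 2..m of
    max{0, (p_{j',i} + st_{j',j,i}) - (p_{j,i-1} + st_{j',j,i-1})}."""
    m = len(p[j])
    s = st[(jp, j)]
    return sum(max(0, (p[jp][i - 1] + s[i - 1]) - (p[j][i - 2] + s[i - 2]))
               for i in range(2, m + 1))


def greedy_sequence(first, jobs, p, st):
    """Single-factory construction: repeatedly append the unscheduled job closest to the last one."""
    seq = [first]
    remaining = [j for j in jobs if j != first]
    while remaining:
        last = seq[-1]
        # ties broken by the smaller job index (an arbitrary choice in this sketch)
        nxt = min(remaining, key=lambda j: (distance(last, j, p, st), j))
        seq.append(nxt)
        remaining.remove(nxt)
    return seq


# Hypothetical 4-job, 3-machine instance; all setup times are 1 for brevity
jobs = [1, 2, 3, 4]
p = {1: [4, 6, 3], 2: [2, 3, 5], 3: [7, 4, 2], 4: [5, 8, 6]}
st = {(a, b): [1, 1, 1] for a in jobs for b in jobs if a != b}

print(greedy_sequence(1, jobs, p, st))  # -> [1, 3, 4, 2]
```

Starting from J1, the distances to J2, J3 and J4 are 4, 0 and 1, so J3 comes next. From J3, J4 (distance 0) beats J2 (distance 2).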

References [20] J. Wang, L. Wang, A knowledge-based cooperative algorithm for energy-efficient


scheduling of distributed flow-shop, IEEE Trans. Syst. Man Cybern. Syst. 50 (2020)
1805–1819, https://doi.org/10.1109/TSMC.2017.2788879.
[1] L. Cheng, Q. Tang, L. Zhang, Z. Zhang, Multi-objective Q-learning-based hyper-
[21] G. Wang, L. Gao, X. Li, P. Li, M.F. Tasgetiren, Energy-efficient distributed
heuristic with Bi-criteria selection for energy-aware mixed shop scheduling, Swarm
permutation flow shop scheduling problem using a multi-objective whale swarm
Evol. Comput. 69 (2022), 100985, https://doi.org/10.1016/j.swevo.2021.100985.
algorithm, Swarm Evol. Comput. 57 (2020), 100716, https://doi.org/10.1016/j.
[2] I. Ribas, R. Companys, X. Tort-Martorell, An iterated greedy algorithm for the
swevo.2020.100716.
parallel blocking flow shop scheduling problem and sequence-dependent setup
[22] G. Wang, X. Li, L. Gao, P. Li, Energy-efficient distributed heterogeneous welding
times, Expert Syst. Appl. 184 (2021), 115535, https://doi.org/10.1016/j.
flow shop scheduling problem using a modified MOEA/D, Swarm Evol. Comput. 62
eswa.2021.115535.
(2021), 100858, https://doi.org/10.1016/j.swevo.2021.100858.
[3] R. Li, W. Gong, L. Wang, C. Lu, X. Zhuang, Surprisingly popular-based adaptive
[23] J. Li, M. Song, L. Wang, P. Duan, Y. Han, H. Sang, Q. Pan, Hybrid artificial bee
memetic algorithm for energy-efficient distributed flexible job shop scheduling,
colony algorithm for a parallel batching distributed flow-shop problem with
IEEE Trans. Cybern. PP (2023) 1–11, https://doi.org/10.1109/
deteriorating jobs, IEEE Trans. Cybern. 50 (2020) 2425–2439.
TCYB.2023.3280175.
[24] K. Wang, Y. Huang, H. Qin, A fuzzy logic-based hybrid estimation of distribution
[4] R. Li, W. Gong, C. Lu, L. Wang, A learning-based memetic algorithm for energy-
algorithm for distributed permutation flowshop scheduling problems under
efficient flexible job-shop scheduling with type-2 fuzzy processing time, IEEE
machine breakdown, J. Oper. Res. Soc. 67 (2016) 68–82, https://doi.org/10.1057/
Trans. Evol. Comput. 27 (2023) 610–620, https://doi.org/10.1109/
jors.2015.50.
TEVC.2022.3175832.
[25] A. Rifai, H. Nguyen, S. Dawal, Multi-objective adaptive large neighborhood search
[5] F. Zhao, Z. Xu, L. Wang, N. Zhu, T. Xu, J. Jonrinaldi, A population-based iterated
for distributed reentrant permutation flow shop scheduling, Appl. Soft Comput. 40
greedy algorithm for distributed assembly no-wait flow-shop scheduling problem,
(2016) 42–57, https://doi.org/10.1016/j.asoc.2015.11.034.
IEEE Trans. Ind. Inf. 19 (2023) 6692–6705, https://doi.org/10.1109/
[26] S. Hatami, R. Ruiz, C. Andrés-Romano, Heuristics and metaheuristics for the
TII.2022.3192881.
distributed assembly permutation flowshop scheduling problem with sequence
[6] R. Ruiz, Q.K. Pan, B. Naderi, Iterated greedy methods for the distributed
dependent setup times, Int. J. Prod. Econ. 169 (2015) 76–88, https://doi.org/
permutation flowshop scheduling problem, Omega 83 (2019) 213–222, https://
10.1016/j.ijpe.2015.07.027.
doi.org/10.1016/j.omega.2018.03.004.
[27] I. Ribas, R. Companys, A computational evaluation of constructive heuristics for
[7] J.M. Framinan, R. Leisten, A multi-objective iterated greedy search for flowshop
the parallel blocking flow shop problem with sequence-dependent setup times, Int.
scheduling with makespan and flowtime criteria, OR Spectr. 30 (2008) 787–804,
J. Ind. Eng. Comput. 12 (2021) 321–328, https://doi.org/10.5267/j.
https://doi.org/10.1007/s00291-007-0098-z.
ijiec.2021.1.004.
[8] M. Ciavotta, G. Minella, R. Ruiz, Multi-objective sequence dependent setup times
[28] F. Zhao, G. Zhou, L. Wang, A cooperative scatter search with reinforcement
permutation flowshop: a new algorithm and a comprehensive study, Eur. J. Oper.
learning mechanism for the distributed permutation flowshop scheduling problem
Res. 227 (2013) 301–313, https://doi.org/10.1016/j.ejor.2012.12.031.
with sequence-dependent setup times, IEEE Trans. Syst. Man Cybern. Syst. (2023),
[9] A.S. Xanthopoulos, D.E. Koulouriotis, V.D. Tourassis, D.M. Emiris, Intelligent
https://doi.org/10.1109/TSMC.2023.3256484.
controllers for bi-objective dynamic scheduling on a single machine with sequence-
[29] V. Riahi, M.A.H. Newton, A. Sattar, Constraint based local search for flowshops
dependent setups, Appl. Soft Comput. J. 13 (2013) 4704–4717, https://doi.org/
with sequence-dependent setup times, Eng. Appl. Artif. Intell. 102 (2021), 104264,
10.1016/j.asoc.2013.07.015.
https://doi.org/10.1016/J.ENGAPPAI.2021.104264.
[10] F. Zhao, X. Hu, L. Wang, T. Xu, N. Zhu, A reinforcement learning-driven brain
[30] F. Rossi, M. Nagano, Heuristics and iterated greedy algorithms for the distributed
storm optimisation algorithm for multi-objective energy-efficient distributed
mixed no-idle flowshop with sequence-dependent setup times, Comput. Ind. Eng.
assembly no-wait flow shop scheduling problem, Int. J. Prod. Res. (2022), https://
157 (2021), 107337, https://doi.org/10.1016/j.cie.2021.107337.
doi.org/10.1080/00207543.2022.2070786.
[31] F. Zhao, L. Zhao, L. Wang, H. Song, An ensemble discrete differential evolution for
[11] B. Naderi, R. Ruiz, The distributed permutation flowshop scheduling problem,
the distributed blocking flowshop scheduling with minimizing makespan criterion,
Comput. Oper. Res. 37 (2010) 754–768, https://doi.org/10.1016/j.
Expert Syst. Appl. 160 (2020), 113678, https://doi.org/10.1016/J.
cor.2009.06.019.
ESWA.2020.113678.
[12] S. Hatami, R. Ruiz, C. Andrés-Romano, The distributed assembly permutation
[32] F. Zhao, Z. Xu, H. Bao, T. Xu, N. Zhu, Jonrinaldi, A cooperative whale optimization
flowshop scheduling problem, Int. J. Prod. Res. 51 (2013) 5292–5308, https://doi.
algorithm for energy-efficient scheduling of the distributed blocking flow-shop
org/10.1080/00207543.2013.807955.
with sequence-dependent setup time, Comput. Ind. Eng. (2023), 109082, https://
[13] I. Ribas, R. Companys, X. Tort-Martorell, Efficient heuristics for the parallel
doi.org/10.1016/J.CIE.2023.109082.
blocking flow shop scheduling problem, Expert Syst. Appl. 74 (2017) 41–54,
[33] Z. Shao, D. Pi, W. Shao, A novel discrete water wave optimization algorithm for
https://doi.org/10.1016/j.eswa.2017.01.006.
blocking flow-shop scheduling problem with sequence-dependent setup times,
[14] C. Koulamas, S.S. Panwalkar, New index priority rules for no-wait flow shops,
Swarm Evol. Comput. 40 (2018) 53–75, https://doi.org/10.1016/j.
Comput. Ind. Eng. 115 (2018) 647–652, https://doi.org/10.1016/j.
swevo.2017.12.005.
cie.2017.12.015.
[34] H. Miyata, M. Nagano, An iterated greedy algorithm for distributed blocking flow
[15] K. Kianfar, S.M.T. Fatemi Ghomi, A. Oroojlooy Jadid, Study of stochastic sequence-
shop with setup times and maintenance operations to minimize makespan,
dependent flexible flow shop via developing a dispatching rule and a hybrid GA,
Comput. Ind. Eng. 171 (2022), 108366, https://doi.org/10.1016/j.
Eng. Appl. Artif. Intell. 25 (2012) 494–506, https://doi.org/10.1016/j.
cie.2022.108366.
engappai.2011.12.004.
[35] X. Han, Y. Han, B. Zhang, H. Qin, J. Li, Y. Liu, D. Gong, An effective iterative
[16] F. Zhao, H. Bao, L. Wang, T. Xu, N. Zhu, A heuristic and meta-heuristic based on
greedy algorithm for distributed blocking flowshop scheduling problem with
problem-specific knowledge for distributed blocking flow-shop scheduling problem
balanced energy costs criterion, Appl. Soft Comput. 129 (2022), 109502, https://
with sequence-dependent setup times, Eng. Appl. Artif. Intell. 116 (2022), 105443,
doi.org/10.1016/j.asoc.2022.109502.
https://doi.org/10.1016/j.engappai.2022.105443.
[36] F. Zhao, L. Zhang, J. Cao, J. Tang, A cooperative water wave optimization
[17] J. Li, S. Bai, P. Duan, H. Sang, Y. Han, Z. Zheng, An improved artificial bee colony
algorithm with reinforcement learning for the distributed assembly no-idle
algorithm for addressing distributed flow shop with distance coefficient in a
flowshop scheduling problem, Comput. Ind. Eng. 153 (2021), 107082, https://doi.
prefabricated system, Int. J. Prod. Res. 57 (2019) 6922–6942, https://doi.org/
org/10.1016/J.CIE.2020.107082.
10.1080/00207543.2019.1571687.
[37] F. Zhao, R. Ma, L. Wang, A self-learning discrete jaya algorithm for multiobjective
[18] S. Wang, L. Wang, M. Liu, Y. Xu, An effective estimation of distribution algorithm
energy-efficient distributed no-idle flow-shop scheduling problem in
for solving the distributed permutation flow-shop scheduling problem, Int. J. Prod.
heterogeneous factory system, IEEE Trans. Cybern. 52 (2022) 12675–12686,
Econ. 145 (2013) 387–396, https://doi.org/10.1016/J.IJPE.2013.05.004.
https://doi.org/10.1109/TCYB.2021.3086181.
[19] I. Ribas, R. Companys, X. Tort-Martorell, An iterated greedy algorithm for solving
[38] F. Zhao, X. He, L. Wang, A two-stage cooperative evolutionary algorithm with
the total tardiness parallel blocking flow shop scheduling problem, Expert Syst.
problem-specific knowledge for energy-efficient scheduling of no-wait flow-shop
Appl. 121 (2019) 347–361, https://doi.org/10.1016/j.eswa.2018.12.039.

22
H. Bao et al. Swarm and Evolutionary Computation 83 (2023) 101399

problem, IEEE Trans. Cybern. 51 (2021) 5291–5303, https://doi.org/10.1109/ Autom. Sci. Eng. 19 (2022) 3020–3038, https://doi.org/10.1109/
TCYB.2020.3025662. TASE.2021.3104716.
[39] Q. Zeng, J. Li, R. Li, T. Huang, Y. Han, H. Sang, Improved NSGA-II for energy- [48] H. Wang, B. Sarker, J. Li, J. Li, Adaptive scheduling for assembly job shop with
efficient distributed no-wait flow-shop with sequence-dependent setup time, uncertain assembly times based on dual Q-learning, Int. J. Prod. Res. 59 (2021)
Complex Intell. Syst. 9 (2022) 825–849, https://doi.org/10.1007/S40747-022- 5867–5883, https://doi.org/10.1080/00207543.2020.1794075.
00830-6/FIGURES/11. [49] J. Lin, Y. Li, H. Song, Semiconductor final testing scheduling using Q-learning
[40] F. Zhao, S. Di, L. Wang, A hyperheuristic with q-learning for the multiobjective based hyper-heuristic, Expert Syst. Appl. 187 (2022), 115978, https://doi.org/
energy-efficient distributed blocking flow shop scheduling problem, IEEE Trans. 10.1016/j.eswa.2021.115978.
Cybern. (2022) 1–14, https://doi.org/10.1109/tcyb.2022.3192112. [50] R. Ruiz, T. Stützle, An iterated greedy heuristic for the sequence dependent setup
[41] J. Heger, T. Voss, Dynamically adjusting the k-values of the ATCS rule in a flexible times flowshop problem with makespan and weighted tardiness objectives, Eur. J.
flow shop scenario with reinforcement learning, Int. J. Prod. Res. 61 (2023) Oper. Res. 187 (2008) 1143–1159, https://doi.org/10.1016/j.ejor.2006.07.029.
146–160, https://doi.org/10.1080/00207543.2021.1943762. [51] E. Taillard, Benchmarks for basic scheduling problems, Eur. J. Oper. Res. 64 (1993)
[42] J. Lee, H. Kim, Reinforcement learning for robotic flow shop scheduling with 278–285, https://doi.org/10.1016/0377-2217(93)90182-M.
processing time variations, Int. J. Prod. Res. 60 (2022) 2346–2368, https://doi. [52] J. Wang, L. Wang, A cooperative memetic algorithm with feedback for the energy-
org/10.1080/00207543.2021.1887533. aware distributed flow-shops with flexible assembly scheduling, IEEE Trans. Evol.
[43] Z. Pan, L. Wang, J. Wang, J. Lu, Deep reinforcement learning based optimization Comput. 168 (2022) 461–475, https://doi.org/10.1016/j.cie.2022.108126.
algorithm for permutation flow-shop scheduling, IEEE Trans. Emerg. Top. Comput. [53] J. Li, X. Chen, P. Duan, J. Mou, KMOEA: a knowledge-based multiobjective
Intell (2021) 1–12, https://doi.org/10.1109/TETCI.2021.3098354. algorithm for distributed hybrid flow shop in a prefabricated system, IEEE Trans.
[44] Z. Cao, C. Lin, M. Zhou, R. Huang, Scheduling semiconductor testing facility by Ind. Inf. 18 (2022) 5318–5329, https://doi.org/10.1109/TII.2021.3128405.
using cuckoo search algorithm with reinforcement learning and surrogate [54] A. Dean, D. Voss, Fractional factorial experiments. Design and Analysis of
modeling, IEEE Trans. Autom. Sci. Eng. 16 (2019) 825–837, https://doi.org/ Experiments, Springer, New York, New York, NY, 1999, pp. 483–545, https://doi.
10.1109/TASE.2018.2862380. org/10.1007/0-387-22634-6_15.
[45] Y. Shiue, K. Lee, C. Su, Real-time scheduling for a smart factory using a [55] S. Du, W. Zhou, D. Wu, M. Fei, An effective discrete monarch butterfly optimization
reinforcement learning approach, Comput. Ind. Eng. 125 (2018) 604–614, https:// algorithm for distributed blocking flow shop scheduling with an assembly machine,
doi.org/10.1016/j.cie.2018.03.039. Expert Syst. Appl. 225 (2023), https://doi.org/10.1016/J.ESWA.2023.120113.
[46] I. Park, J. Huh, J. Kim, J. Park, A reinforcement learning approach to robust [56] X. He, Q. Pan, L. Gao, L. Wang, P.N. Suganthan, A greedy cooperative Co-evolution
scheduling of semiconductor manufacturing facilities, IEEE Trans. Autom. Sci. Eng. ary algorithm with problem-specific knowledge for multi-objective flowshop group
17 (2020) 1420–1431, https://doi.org/10.1109/TASE.2019.2956762. scheduling problems, IEEE Trans. Evol. Comput. 639798 (2021), https://doi.org/
[47] S. Luo, L. Zhang, Y. Fan, Real-time scheduling for dynamic partial-no-wait 10.1109/tevc.2021.3115795, 1–1.
multiobjective flexible job shop by deep reinforcement learning, IEEE Trans.
