A Collaborative Iterated Greedy Algorithm With Reinforcement Learning For Energy-Aware Distributed Blocking Flow-Shop Scheduling
ARTICLE INFO

Keywords:
Energy-aware scheduling
Flow-shop
Q-learning
Iterated greedy
Multi-objective optimization

ABSTRACT

Energy-aware scheduling has attracted increasing attention mainly due to economic benefits as well as reducing the carbon footprint at companies. In this paper, an energy-aware scheduling problem in a distributed blocking flow-shop with sequence-dependent setup times is investigated to minimize both makespan and total energy consumption. A mixed-integer linear programming model is constructed and a cooperative iterated greedy algorithm based on Q-learning (CIG) is proposed. In the CIG, a top-level Q-learning is focused on enhancing the utilization ratio of machines to minimize makespan by finding a scheduling policy from four sequence-related operations. A bottom-level Q-learning is centered on improving energy efficiency to reduce total energy consumption by learning the optimal speed governing policy from four speed-related operations. According to the structure characteristics of solutions, several properties are explored to design an energy-saving strategy and acceleration strategy. The experimental results and statistical analysis prove that the CIG is superior to the state-of-the-art competitors with improvement percentages of 20.16 % over 2880 instances from the well-known benchmark set in the literature.
1. Introduction

The blocking flow-shop scheduling problem (BFSP) arises in various production environments. The common characteristic of this kind of scheduling problem is that machines do not have buffers, so a job that has finished an operation can be released from its current machine only when the following machine is not occupied [1]. Fig. 1 shows the common process in aluminum production in the non-ferrous metallurgical industry, which includes electrolysis, casting, cold rolling, annealing, and aluminum foil rolling. In the process of electrolysis, pure aluminum is extracted and purified from alumina in electrolysis cells. The cells operate 24 h a day at a very high current (150,000 amps or more) to ensure that the molten aluminum does not solidify. In the continuous casting process, the molten aluminum can be released from the electrolytic cells to the foundry only when the continuous casting machine is not occupied, because the continuous casting machine does not have buffers and needs to keep the aluminum in liquid form and avoid solidification. In addition, this process also includes certain nonproductive operations, such as job transportation and tool replacement, which are significant for the smooth processing of jobs. The duration of these operations depends on the sequence of two adjacent operations, which is called sequence-dependent setup time (SDST) in the literature [2]. Further, the production process typically involves two conflicting objectives such as equipment utilization and energy efficiency.

Hence, this paper addresses an energy-efficient distributed BFSP with SDSTs (EDBFSP-SDST) to minimize makespan and total energy consumption (TEC). Minimizing the makespan is akin to improving equipment utilization. Minimizing TEC is helpful in increasing energy efficiency and thus reducing carbon footprint [3,4].

The iterated greedy (IG) algorithm is a proven and effective approach in scheduling [2,5,6]. Framinan and Leisten [7] designed a multi-objective IG for the canonical flow-shop scheduling problem (FSP). Ciavotta et al. [8] developed a restarted iterated Pareto greedy for the FSP with setup times. Note that IG originally iterates on a single solution and was developed for a single objective, so simple adjustments are not enough to attain high-quality results in a multi-objective setting. Fortunately, reinforcement learning (RL) has been actively utilized and shown to be effective for multi-objective scheduling problems [9,10]. In our work, a
* Corresponding author at: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, PR China.
E-mail address: [email protected] (Q. Pan).
https://doi.org/10.1016/j.swevo.2023.101399
Received 16 May 2023; Received in revised form 4 July 2023; Accepted 8 September 2023
Available online 11 September 2023
2210-6502/© 2023 Elsevier B.V. All rights reserved.
H. Bao et al. Swarm and Evolutionary Computation 83 (2023) 101399
collaborative IG algorithm with reinforcement learning (CIG) is designed. Compared to previous research work, the contribution of this paper can be summarized as follows.

1) A bi-population cooperative IG with two-level Q-learning is developed.
2) The theoretical properties of the EDBFSP-SDST are explored. An energy-saving strategy and an acceleration strategy based on the properties are designed to further optimize the solutions.
3) The EDBFSP-SDST with the makespan and TEC criteria is presented and modeled by using a mixed-integer linear programming (MILP) model, which has not been studied before.

The remainder of this paper is structured as follows. In Section 2, we present a survey of the closely related literature. The considered EDBFSP-SDST is described in Section 3. The proposed CIG is detailed in Section 4. The experimental results are shown in Section 5. Finally, the conclusions and further research topics are given in Section 6.

2. Related works

The literature on the distributed FSP (DFSP), BFSP, DFSP with SDST, and RL in scheduling is reviewed. The study of the DFSP has two subfields: the problem models and the solution approaches. For the models, Naderi and Ruiz [11] first studied a distributed permutation flow-shop scheduling (DPFSP) model in 2010. Hatami et al. [12] solved the model of a DFSP with an assembly stage (DAFSP) in supply chain systems. Ribas et al. [13] proposed a MILP model for a distributed BFSP (DBFSP), which is the first work on the DBFSP. The methods for the DFSPs can be classified into three main categories: dispatching rules-based, metaheuristic-based, and RL-based methods. Diverse dispatching rules have been developed in the last decade, such as the basic priority rules [14], composite rules [15], and heuristic rules [16]. Despite the ease of implementation, most dispatching rules are myopic and depend on the production environment and optimization objectives, so it is difficult to balance different objectives using a single rule.

Various metaheuristics have been employed for the basic DFSPs and DFSPs with additional constraints. For the basic DFSPs, the metaheuristics include IG [6], artificial bee colony algorithm [17], and estimation of the distribution algorithm [18]. For the DFSPs with additional constraints, Ribas et al. [19] considered an IG algorithm for a DBFSP with a total tardiness criterion. Wang and Wang [20] considered an energy-efficient DFSP (EDFSP) and designed a knowledge-based cooperative algorithm (KCA). Wang et al. [21] investigated a multi-objective whale swarm algorithm (MOWSA) to solve an EDFSP. An evolutionary algorithm based on decomposition (MOEA/D) [22] was designed for solving an EDFSP with the objectives of makespan and TEC. Other variants have also been studied, such as a parallel batching DFSP with deteriorating jobs [23], a DFSP with machine breakdown [24], and a distributed re-entrant PFSP [25]. In addition, the SDST in FSPs has been studied for practical applications [26], for example, the BFSP [27], PFSP [28,29], mixed no-idle distributed PFSP [30], DBFSP [2,16,27,31–35], parallel FSP [2], no-idle flow-shop [36,37], and no-wait flow-shop [38,39]. For most of the above algorithms, constructive heuristics were employed to quickly obtain initial solutions. Then, metaheuristics were utilized to further improve them. Metaheuristics utilized global information to generate new solutions in each iteration and obtained near-optimal solutions but suffered from poor time efficiency due to their complex iteration process and search patterns in a huge search space. This limits their practical application in large-scale production environments, where multi-objective production scheduling further challenges such metaheuristics.

For RL-based methods, Zhao et al. [40] utilized Q-learning to solve an EDFSP. Heger and Voss [41] presented an innovative RL as a hyper-heuristic to dynamically regulate dispatching rules in a flexible flow-shop. Lee and Kim [42] addressed a robotic FSP by the RL method. Pan et al. [43] designed a deep RL for the PFSP to realize the end-to-end output. Cheng et al. [1] investigated a hyper-heuristic with RL for a mixture of job-shop and FSP. Cao et al. [44] addressed a scheduling problem of semiconductor testing facilities with the makespan criterion by a cuckoo search algorithm with RL. Shiue et al. [45] developed a real-time scheduling approach based on RL with multiple dispatching rules. Park et al. [46] used an approach based on RL for the scheduling problem with reentrant flows and SDSTs to minimize makespan. Overall, the above studies utilized Q-learning to determine an appropriate dispatching rule in each state. Compared with the dispatching rules and metaheuristics, long-term scheduling performance can be accomplished by adaptively determining optimal dispatching rules and local search methods for different states by RL agents [47].

3. The EDBFSP-SDST problem

For the EDBFSP-SDST, we have n jobs and δ identical factories. Each factory is a flow-shop consisting of m machines. A job can only be assigned to one of the factories for processing. Each job follows the same machine route, and the sequence of jobs does not change from one machine to another. The setup times are separate from the processing times and depend on the sequence. Since there is no intermediate buffer between consecutive machines, a job cannot leave its current machine after completion until the next machine is free and its setup is complete. Each machine has s speed levels, and the speed cannot be changed while an operation is being processed. It is assumed that the higher the processing speed, the shorter the processing time and the higher the energy consumption of the machine. The other common constraints of flow-shops are also considered:

(1) All jobs are independent and released at time zero;
(2) each job is processed on only one machine at a time;
(3) each machine processes only one job at a time;
(4) pre-emption is not allowed.

The EDBFSP-SDST can be decomposed into three subproblems: first, the problem of job assignment (assigning n jobs to δ factories), second,
determining the sequence in which the jobs are processed, and third, determining the processing speed of each job on m machines. The objectives are to minimize makespan and TEC.

Indexes:
j, j′ Index of job, j, j′ = 1, 2, ⋯, n.
i Index of machine, i = 0, 1, 2, ⋯, m. 0 indicates a dummy machine.
f Index of factory, f = 1, 2, ⋯, δ.

3.1.2. MILP model of EDBFSP-SDST

The EDBFSP-SDST is described as follows.

$$\text{s.t.}\quad \sum_{j'=0,\, j' \ne j}^{n} x_{j',j} = 1, \quad j = 1, 2, \ldots, n \qquad (2)$$

$$\sum_{j'=1}^{n} x_{j',0} = \delta \qquad (4)$$

$$BEC = \sum_{j=1}^{n} \sum_{i=2}^{m-1} \left( D_{j,i} - D_{j,i-1} - p_{j,i} \right) \cdot \zeta_i + \sum_{j'=0}^{n} \sum_{j=1,\, j \ne j'}^{n} x_{j',j} \cdot \left( D_{j,1} - D_{j',1} - st_{j',j,1} - p_{j,1} \right) \cdot \zeta_1$$
(6) implies that each operation Oj,i must be processed at one speed. The actual processing time of operation Oj,i is defined in (7). Constraint (8) ensures that the job sequence cannot form a sub-ring. Constraint (9) ensures that the departure time of Oj,i must be greater than or equal to the sum of the departure time of Oj,i−1 plus the actual processing time of operation Oj,i. Constraints (10) and (11) represent the relationship between the departure times of any two adjacent jobs. Constraints (12) and (13) represent the departure time of the first job. The total energy consumption when machines run in processing mode, setup mode, blocked mode, and idle mode is defined by Eqs. (14)–(17), respectively. The TEC is defined in Eq. (18). The Cmax is defined in Eq. (19). The MILP model is verified by the well-known CPLEX solver. The code of the MILP model is published on GitHub (https://github.com/banian2314/Model-of-EDBFSP-SDST.git).

Fig. 2 shows an illustration of the realistic EDBFSP-SDST in the aluminum production industry. The aluminum industry is driven by customer orders: customers first conclude contracts with plants, and the contracts are then transformed into production orders in the form of alloy composition, size, etc. The orders are then scheduled on the electrolytic cells and continuous casting machines, which is accompanied by a large amount of energy consumption and environmental pollution as a result of the consumption of electricity, natural gas, or other fossil fuels. Hence, the processes in aluminum production can be modelled as the proposed EDBFSP-SDST.

An example with n = 5, m = 2, and δ = 2 shows how the decision variables reflect a solution. The energy-relevant parameters are listed in Table 1. The standard processing times of jobs and SDSTs are shown in Tables 2 and 3, respectively. The processing speeds are given in Table 4. The energy consumption per unit of time is detailed in Table 5. According to Table 4, the actual processing times of jobs are calculated in Table 6 by Eq. (7). Fig. 3 presents a Gantt chart for a solution, where the processing sequences are π1 = {3, 1, 5} and π2 = {2, 4}. Since the example in Fig. 3 was solved exactly with CPLEX using the model in Section 3.1.2 and is the optimal solution, there is no block time. In the Appendix, we have drawn a Gantt chart for another schedule of this example. The two optimization objectives are Cmax = 26 and TEC = 461.88, respectively. The implementation details of Cmax and TEC are shown in the Appendix.

Table 1
Energy-relevant parameters.
v   Vv     βi,v          γi     ζi     θi
1   1.00   4 × (Vv)^2    0.70   1.00   0.50
2   1.30
3   1.55
4   1.75
5   2.10

Table 2
The tj,i of jobs on M1 and M2.
     J1     J2     J3     J4     J5
M1   4.00   7.80   8.75   9.00   5.25
M2   6.20   7.00   8.75   6.30   1.30

Table 3
SDSTs stj′,j,1 and stj′,j,2 of jobs on M1 and M2.
stj′,j,1   J1  J2  J3  J4  J5      stj′,j,2   J1  J2  J3  J4  J5
           6   3   2   3   6                  1   3   6   1   9
J1         0   3   6   8   9       J1         0   5   4   4   5
J2         2   0   4   5   9       J2         9   0   9   4   4
J3         2   5   0   2   3       J3         1   9   0   9   4
J4         1   2   1   0   9       J4         1   2   9   0   9
J5         1   5   2   5   0       J5         9   1   4   2   0

Table 4
Speed of jobs on machines.
Job   Speed level     Actual speed
      M1     M2       M1      M2
J1    1      3        1.00    1.55
J2    2      4        1.30    1.75
J3    4      5        1.75    2.10
J4    1      5        1.00    2.10
J5    4      2        1.75    1.30

Table 5
Energy consumption per unit of time.
Machine   J1     J2      J3      J4      J5
M1        4.00   6.76    12.25   4.00    12.25
M2        9.61   12.25   17.64   17.64   12.25
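
The derived quantities of the worked example can be recomputed from Tables 1, 2 and 4. The sketch below is illustrative only: it assumes that Eq. (7) has the form "standard time divided by actual speed" (the formula itself is not reproduced in this section) and uses the processing energy rate β = 4 × V² from Table 1. The dictionary and function names are hypothetical.

```python
# Speed levels V_v from Table 1 (level -> actual speed).
SPEED = {1: 1.00, 2: 1.30, 3: 1.55, 4: 1.75, 5: 2.10}

# Standard processing times from Table 2 (job -> (time on M1, time on M2)).
STD_TIME = {1: (4.00, 6.20), 2: (7.80, 7.00), 3: (8.75, 8.75),
            4: (9.00, 6.30), 5: (5.25, 1.30)}

# Speed level of each job on (M1, M2), from Table 4.
LEVEL = {1: (1, 3), 2: (2, 4), 3: (4, 5), 4: (1, 5), 5: (4, 2)}


def actual_time(job, machine):
    """Assumed form of Eq. (7): standard time divided by the chosen speed."""
    return STD_TIME[job][machine] / SPEED[LEVEL[job][machine]]


def unit_energy(job, machine):
    """Processing energy per unit of time, beta = 4 * V^2 (Table 1)."""
    return 4.0 * SPEED[LEVEL[job][machine]] ** 2


if __name__ == "__main__":
    for job in sorted(STD_TIME):
        times = [round(actual_time(job, i), 2) for i in (0, 1)]
        rates = [round(unit_energy(job, i), 2) for i in (0, 1)]
        print(f"J{job}: actual times {times}, energy rates {rates}")
```

Under these assumptions the M1 energy rates come out as 4.00, 6.76, 12.25, 4.00 and 12.25 for J1 to J5, which matches the M1 row of Table 5.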
4. The proposed CIG algorithm
Fig. 4. The framework of CIG.

For the MIBST, the first step is to select the first job of each factory. Let U denote the set of all sequences that are not scheduled. The unscheduled jobs are allocated in the second step. The detailed execution process is shown in Lines 4 to 16. For the third step, the NEH is executed in each factory except for the first and last job as shown in Lines 17-21. For the computational effort, the time complexity of calculating the
Algorithm 1
The proposed CIG.
Algorithm 2
Population initialization procedure MIBST.
Algorithm 3
SERQ-l and SPRQ-l subpopulation improvement procedures.
Acceptance criterion: greedy acceptance of the one with the best Cmax.
(2) SER_IG2. Destruction: select and remove a job in the critical factory and a job in a random non-critical factory; Reconstruction: swap positions of the jobs; Number of executions: number of jobs in the non-critical factory; Acceptance criterion: greedy acceptance of the one with the best Cmax.
(3) SER_IG3. Destruction: select and remove a job randomly in the critical factory; Reconstruction: insert the job into all positions in the critical factory; Number of executions: once; Acceptance criterion: greedy acceptance of the one with the best Cmax.
(4) SER_IG4. Destruction: select and remove a job randomly from the critical factory; Reconstruction: insert the job into all positions in the non-critical factory; Number of executions: once; Acceptance criterion: greedy acceptance of the one with the best Cmax.

SPRQ-l aims to reduce energy consumption by changing the speed of machines in pop2. There are also four procedures involved:

(1) SPR_IG1. Destruction: select two jobs randomly from a random factory; Reconstruction: swap these processing speeds on the same machines; Number of executions: number of jobs in the
Algorithm 4
Determine the critical path.
selected factory; Acceptance criterion: greedy acceptance of the one with the best TEC.
(2) SPR_IG2. Destruction: select two jobs randomly from a random factory; Reconstruction: swap these processing speeds on the different machines; Number of executions: number of jobs in the selected factory; Acceptance criterion: greedy acceptance of the one with the best TEC.
(3) SPR_IG3. Destruction: select two jobs randomly from a random factory; Reconstruction: reverse the processing speeds between two positions on the same machines; Number of executions: half of the number of jobs in the selected factory; Acceptance criterion: greedy acceptance of the one with the best TEC.
(4) SPR_IG4. Destruction: select two jobs randomly from a random factory; Reconstruction: shuffle the processing speeds between two positions on the same machines; Number of executions: half of the number of jobs in the selected factory; Acceptance criterion: greedy acceptance of the one with the best TEC.

4.3. Markov decision process model in CIG

RL is generally modeled as a 3-tuple (s, a, r) Markov Decision Process (MDP) model, where s, a, and r are the state, action, and reward, respectively [40]. Fig. 6 shows the overall operating mechanism of the CIG. The MDP model is described in detail below.
State (s): for the EDBFSP-SDST, job sequence, processing speed, and optimization objectives (Cmax and TEC) are regarded as the different states indicating the current situation after achieving a certain action.
Action (a): there are two types of actions: those based on the job (aJ,t: SER_IG1, SER_IG2, SER_IG3, and SER_IG4) and those based on machine speed (av,t: SPR_IG1, SPR_IG2, SPR_IG3, and SPR_IG4).
Reward (r): if an improved solution is obtained by carrying out the action a, then the value of r is 1, otherwise r = 0.
Agent: each individual of CIG is regarded as an agent.

4.4. Energy-saving and acceleration operations

For EDBFSP-SDST, several theoretical properties are explored, which can be utilized to design effective search operators.

Definition 1. Let $\Pi_a$ and $\Pi_b$ be two solutions. $\Pi_a$ dominates $\Pi_b$ (denoted by $\Pi_a \succ \Pi_b$) if and only if 1) $\forall i \in \{1, 2\}$, $f_i(\Pi_a) \le f_i(\Pi_b)$; and 2) $\exists i \in \{1, 2\}$, $f_i(\Pi_a) < f_i(\Pi_b)$. Here, $f_1 = C_{max}$ and $f_2 = TEC$.

Theorem 1. Let $D^{f}_{max}$ represent the maximum completion time of $F_f$. If $\Pi_a$ and $\Pi_b$ satisfy 1) $\forall f \in \{1, 2, \ldots, \delta\}$, $D^{f}_{max}(\Pi_a) = D^{f}_{max}(\Pi_b)$; 2) $\forall j \in \{1, 2, \ldots, n\}$, $i \in \{1, 2, \ldots, m\}$, $V_v(\Pi_a) \le V_v(\Pi_b)$; and 3) $\exists j \in \{1, 2, \ldots, n\}$, $i \in \{1, 2, \ldots, m\}$, $V_v(\Pi_a) < V_v(\Pi_b)$, then $TEC(\Pi_a) < TEC(\Pi_b)$ and $\Pi_a \succ \Pi_b$.

Property 1. Let $BT(O_{j',i})$ denote the block time of the operation $O_{j',i}$. If $\Pi_a$ satisfies 1) $\exists j' \in \{1, 2, \ldots, n-1\}$, $i \in \{1, 2, \ldots, m\}$, $BT(\Pi_a(O_{j',i})) > 0$; and 2) for a new solution $\Pi'_a$, if $V_v(P_{\Pi'_a}(O_{j',i})) < V_v(P_{\Pi_a}(O_{j',i}))$ while other speeds are the same, then $TEC(\Pi'_a) < TEC(\Pi_a)$ and $C_{max}(\Pi'_a) = C_{max}(\Pi_a)$. The maximum decrement $\Delta$ for $O_{j',i}$ satisfies

$$\Delta = \begin{cases} BT_{j',i} = D_{j',i} - D_{j',i-1} - p_{j',i}, & i > 1 \\ BT_{j,i} = D_{j,1} - D_{j',1} - st_{j',j,1} - p_{j,1}, & i = 1,\ j \ne j' \end{cases}$$

Property 2. If $\Pi_a$ satisfies 1) $\exists j' \in \{1, 2, \ldots, n-1\}$, $i \in \{3, 4, \ldots, m\}$, $BT(\Pi_a(O_{j',i})) > 0$; 2) $IT(P_{\Pi_a}(O_{j,i-1})) > 0$; and 3) for a new solution $\Pi'_a$, if $V_v(P_{\Pi'_a}(O_{j',i-1})) < V_v(P_{\Pi_a}(O_{j',i-1}))$ while other speeds are the same, then $TEC(\Pi'_a) < TEC(\Pi_a)$ and $C_{max}(\Pi'_a) = C_{max}(\Pi_a)$. The maximum decrement $\Delta$ for $O_{j',i}$ satisfies $\Delta = \min\{BT_{j',i}, IT_{j,i-1}\} = \min\{D_{j',i} - D_{j',i-1} - p_{j',i},\ D_{j,i-2} - D_{j',i-1} - st_{j',j,i-1}\}$.

Theorem 2. Let $IT(O_{j,i})$ denote the idle time of $O_{j,i}$. Let $F_c$ denote the critical factory in $\Pi_a$. The operations in the critical path (the method to determine the critical path is shown in Algorithm 4) are called critical operations (denoted by $P_{\Pi_a}(O_{j,i})$). If $\Pi_a$ satisfies 1) there is only one critical path in $F_c$; 2) $\exists j \in \{1, 2, \ldots, n\}$, $i \in \{2, 3, \ldots, m\}$, $IT(P_{\Pi_a}(O_{j,i})) > 0$; and 3) for a new solution $\Pi'_a$, if $V_v(P_{\Pi'_a}(O_{j,i-1})) > V_v(P_{\Pi_a}(O_{j,i-1}))$ while the other speeds are the same, then $C_{max}(\Pi'_a) < C_{max}(\Pi_a)$ and the maximum decrement $\Delta$ for $O_{j,i-1}$ satisfies $\Delta = IT_{j,i} = D_{j,i-1} - D_{j',i} - st_{j',j,i}$, $i > 2$, $j \ne j'$.
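
The MDP model of Section 4.3 (four candidate operators as actions, reward 1 for an improvement and 0 otherwise) can be illustrated with a tabular Q-learning selector. This is a minimal sketch and not the authors' implementation: the ε-greedy rule, the parameter values, the string-valued state, and the class name `OperatorSelector` are all assumptions made for illustration.

```python
import random

# Action names for the sequence-related agent, taken from Section 4.2.
ACTIONS = ["SER_IG1", "SER_IG2", "SER_IG3", "SER_IG4"]


class OperatorSelector:
    """Tabular Q-learning over a small, hypothetical state space."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (epsilon-greedy is an assumption)
        self.q = {}             # (state, action) -> Q-value, default 0.0
        self.rng = random.Random(seed)

    def choose(self, state):
        """Pick a random operator with probability epsilon, else the best-valued one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward reward + gamma * max_a' Q(s', a')."""
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)


def reward(improved):
    """Reward from Section 4.3: 1 if the action improved the solution, else 0."""
    return 1 if improved else 0


if __name__ == "__main__":
    selector = OperatorSelector(ACTIONS)
    state = "start"                      # hypothetical state label
    action = selector.choose(state)
    selector.update(state, action, reward(True), next_state=action)
    print(action, selector.q)
```

A second selector built over SPR_IG1 to SPR_IG4 would play the role of the speed-related agent; how the two agents exchange information is not covered by this sketch.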
Fig. 7. Several properties and theorem diagrams of EDBFSP-SDST. (a) Property 1; (b) Property 2; (c) Theorem 2.
Fig. 8. (a) Find the critical path. (b) Decrease the speed of noncritical operation and give a new solution with better TEC.
Fig. 9. (a) Find the critical path. (b) Increase the speed of critical operations and give a new solution with a better Cmax .
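
The energy-saving idea behind Property 1 (slow down an operation that is followed by block time, without extending its departure time) can be sketched for discrete speed levels. The sketch assumes that the actual processing time is the standard time divided by the speed and that the extra processing time must not exceed the block time Δ. The function name `slowest_feasible_level` is hypothetical.

```python
# Speed levels V_v from Table 1 (level -> actual speed).
SPEED = {1: 1.00, 2: 1.30, 3: 1.55, 4: 1.75, 5: 2.10}


def slowest_feasible_level(std_time, level, block_time, speed=SPEED, eps=1e-9):
    """Return the lowest speed level whose extra processing time fits in block_time.

    If no slower level fits, the current level is returned unchanged.
    """
    current = std_time / speed[level]
    for candidate in sorted(speed):          # slowest level first
        if candidate >= level:
            break
        extra = std_time / speed[candidate] - current
        if extra <= block_time + eps:
            return candidate
    return level


if __name__ == "__main__":
    # Standard time 8.75 at level 4 takes 5.0 time units; with 2.0 units of
    # block time available, the search looks for the slowest admissible level.
    print(slowest_feasible_level(8.75, 4, 2.0))
```

With the energy rate β = 4 × V² of Table 1 and processing time t / V, the processing energy of one operation is 4 × V × t, so under these assumptions a lower speed level always lowers the processing energy of that operation.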
Fig. 7 illustrates the above properties and theorem. Properties 1 and 2 are two different cases of Theorem 1 which reduce TEC by increasing the processing time of the specific operations but without increasing Cmax. The difference is that Property 1 reduces TEC by converting block time into increased processing time, while Property 2 reduces TEC by using idle times. Theorem 2 is to shorten Cmax by reducing the processing time of specific operations.

According to Properties 1 and 2, diminishing the speeds of the noncritical operations can decrease the TEC without deteriorating the Cmax. Hence, the energy-saving approach is carried out for all
Fig. 11. Interaction plots with Tukey Honest Significant Differences (HSD) 95 % confidence intervals. (a) PS ∗ α. (b) PS ∗ γ.
Fig. 12. Interval plot of the C metric with the 95 % confidence interval. (a) S1 and S1′; (b) S2 and S2′; (c) S3 and S3′; (d) S4 and S4′; (e) S5 and S5′.
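
The dominance relation of Definition 1 and the C metric shown in Fig. 12 can be sketched as follows. The coverage definition used here (share of the second set that is dominated by, or equal to, some member of the first set) is the commonly used one and is an assumption, since the formula is not stated in this section. Each solution is represented as a hypothetical (Cmax, TEC) tuple.

```python
def dominates(a, b):
    """Definition 1 for minimization: a is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points):
    """Keep the points that no other point dominates; its length gives the ONVG."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def c_metric(a_set, b_set):
    """C(A, B): fraction of B covered (dominated or equalled) by at least one member of A."""
    if not b_set:
        return 0.0
    covered = sum(
        1 for b in b_set
        if any(dominates(a, b) or tuple(a) == tuple(b) for a in a_set)
    )
    return covered / len(b_set)


if __name__ == "__main__":
    front_a = [(10, 50), (12, 40)]            # illustrative (Cmax, TEC) pairs
    front_b = [(11, 55), (12, 40), (9, 60)]
    print(c_metric(front_a, front_b), c_metric(front_b, front_a))
    print(len(nondominated(front_a + front_b)))
```

Because C(A, B) and C(B, A) are not complementary, both directions are reported, which is why each panel of Fig. 12 shows a pair Si and Si′.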
5.2. Evaluation of main components of CIG

Five variants of CIG are developed to validate the effectiveness of the above components. The CIG contains five improvement components: (1) initialization method (MIBST), (2) sequence-related operation (SERQ-l) based on Q-learning (pop1), (3) speed-related operation (SPRQ-l) based on Q-learning (pop2), (4) energy-saving strategy based on the properties, and (5) acceleration strategy based on the properties. Hence, five variants of CIG are compared: CIG with random initialization (CIG-RI), CIG without SERQ-l (CIG-NE), CIG without SPRQ-l (CIG-NP), CIG without the energy-saving strategy (CIG-NS), and CIG without the acceleration strategy (CIG-NA). The termination criterion is the same as in the above section.

The expression of the C Metric is abbreviated for clarity: S1 = C(CIG, CIG-RI) and S′1 = C(CIG-RI, CIG), S2 = C(CIG, CIG-NE) and S′2 = C(CIG-NE, CIG), S3 = C(CIG, CIG-NP) and S′3 = C(CIG-NP, CIG), S4 = C(CIG, CIG-NS) and S′4 = C(CIG-NS, CIG), S5 = C(CIG, CIG-NA) and S′5 = C(CIG-NA, CIG). As shown in Fig. 12, the values of the C Metric for Si are generally greater than S′i, i = 1, 2, 3, 4, 5, respectively. The interval plot of the C Metric indicates that the solutions of the variants are dominated by the solutions of CIG.

The ONVG results are reported in Table 8. The best value for each group is highlighted in boldface. By comparing ONVG in SSD10, SSD50, SSD100, and SSD125, the values of ONVG for CIG are larger than CIG-RI in all instances. As a result, the proposed MIBST leads to more non-dominated solutions than random initialization, indicating that the construction heuristic based on problem knowledge has a positive impact on the CIG. According to Table 8 and Fig. 12(b), CIG generates more non-dominated solutions than CIG-NE for different scenarios, which reveals that the sequence-related operation based on Q-learning to conduct a search direction is meaningful. Moreover, CIG-NP is inferior to the CIG, which demonstrates that the speed-related operation based on Q-learning is significant for reducing energy consumption. Additionally, CIG-NS is inferior to CIG. The energy-saving strategy can effectively reduce TEC by reducing the speed of certain processes, but it does not increase Cmax according to Property 2 in Section 4.4.

The complete CIG outperforms CIG-NA in all instances. As observed in Table 8 and Fig. 12(e), CIG is better than CIG-NA. The Cmax of each
Table 8
Average ONVG of all variants.
(n, m) SSD10 SSD50
CIG-RI CIG-NE CIG-NP CIG-NS CIG-NA CIG CIG-RI CIG-NE CIG-NP CIG-NS CIG-NA CIG
20 × 5 371 375 343 399 334 433 339 350 314 371 314 390
20 × 10 358 359 359 347 318 443 318 336 326 356 317 387
20 × 20 362 342 331 356 338 400 315 358 350 341 320 366
50 × 5 390 376 378 361 357 464 323 340 322 322 358 386
50 × 10 375 363 362 344 378 416 340 315 354 339 309 375
50 × 20 333 329 356 355 350 419 307 338 318 316 315 363
100 × 5 359 375 382 353 385 405 347 340 335 374 357 393
100 × 10 306 330 348 331 332 351 310 330 331 319 331 349
100 × 20 315 302 328 336 349 366 310 308 318 299 320 324
200 × 10 315 311 337 319 335 392 298 304 279 298 301 338
200 × 20 299 317 304 292 287 324 299 306 279 300 323 332
500 × 20 284 284 319 326 286 329 293 293 282 294 293 310
Average 338.9 338.6 345.6 343.3 337.4 395.2 316.6 326.5 317.3 327.4 321.5 359.4
SSD100 SSD125
20 × 5 245 263 265 255 251 302 217 260 245 260 231 268
20 × 10 270 281 284 273 273 319 237 270 254 276 258 286
20 × 20 303 315 277 319 302 331 277 275 288 263 306 334
50 × 5 269 264 245 282 262 316 235 248 250 242 242 303
50 × 10 304 315 286 303 291 344 254 261 251 253 271 316
50 × 20 296 289 292 313 288 337 284 288 294 288 274 341
100 × 5 312 299 299 272 285 338 265 253 280 292 251 310
100 × 10 263 289 290 278 282 299 260 251 287 280 237 299
100 × 20 283 279 301 285 296 314 281 235 271 261 280 286
200 × 10 282 269 269 271 283 305 256 251 278 277 265 283
200 × 20 265 293 290 287 294 309 254 253 253 269 263 277
500 × 20 264 279 267 266 279 279 235 235 235 241 235 250
Average 279.7 286.3 280.4 283.7 282.2 316.1 254.6 256.7 265.5 266.8 259.4 296.1
Fig. 13. Pareto front of CIG and its variants at different factories. (a) f = 2; (b) f = 4; (c) f = 6;
solution is reduced by reducing the Cmax of the critical path in the factory with the largest Cmax according to Theorem 2. The Pareto front generated by CIG is more diverse and uniform than that generated by the other variants in Fig. 13, which means that CIG can weigh the two objectives well. The Pareto front is the result obtained from all runs. Additionally, Fig. 14 depicts the means plots with Tukey Honest Significant Differences (HSD) confidence intervals with a 95 % confidence level and investigates the interaction between the response variable. The results in Fig. 14 are consistent with the above conclusions. From the above experiments and analysis, it can be verified that all components contribute to the performance of CIG.

Furthermore, in order to demonstrate the contribution of each component to the proposed CIG algorithm, the statistical test of each component and its combination is conducted. A Wilcoxon signed-rank test is applied to verify the significant differences between CIG and its variants in Table 9. The confidence level is set as 95 % (α = 0.05). The sign “+” indicates that the proposed CIG significantly outperforms another variant at this confidence level. On the contrary, the sign “−” denotes that the CIG has worse performance compared with another variant with a
Fig. 14. Means plots and Tukey HSD 95 % confidence intervals in the ANOVA experiment for the CIG and its variants. (a) at different SSD. (b) at different factories.
Table 9
Wilcoxon signed-rank comparison between the proposed CIG and its variants with C Metric (α = 0.05).
(n, m) CIG vs. CIG-RI CIG vs. CIG-NE CIG vs. CIG-NP CIG vs. CIG-NS CIG vs. CIG-NA
R+/R- p win R+/R- p win R+/R- p win R+/R- p win R+/R- p win
20 × 5 64/56 6.88E-01 = 99/21 1.38E-14 + 64/56 3.71E-01 = 52/68 8.41E-02 = 66/54 1.72E-01 =
20 × 10 61/59 7.58E-01 = 94/26 4.85E-12 + 57/63 2.74E-01 = 65/55 7.59E-01 = 66/54 9.82E-01 =
20 × 20 51/69 3.26E-01 = 91/29 9.33E-11 + 66/54 6.55E-01 = 56/64 2.49E-01 = 65/55 2.69E-01 =
50 × 5 64/56 9.51E-02 = 93/27 1.11E-12 + 69/51 4.51E-01 = 67/53 5.71E-01 = 63/57 8.83E-01 =
50 × 10 62/58 8.44E-01 = 99/21 1.49E-11 + 60/60 8.98E-01 = 62/58 7.63E-01 = 64/56 7.66E-01 =
50 × 20 72/48 1.30E-01 = 97/23 1.99E-12 + 60/60 6.09E-01 = 68/52 3.31E-01 = 66/54 4.91E-01 =
100 × 5 58/62 5.93E-01 = 85/35 4.00E-06 + 62/58 4.55E-01 = 61/59 1.74E-01 = 60/60 9.45E-01 =
100 × 10 68/52 4.08E-01 = 69/51 9.81E-02 = 58/62 2.92E-01 = 68/52 5.43E-01 = 62/58 5.77E-01 =
100 × 20 64/56 5.55E-01 = 70/50 6.28E-02 = 53/67 1.55E-01 = 59/61 9.84E-01 = 65/55 5.65E-01 =
200 × 10 58/62 7.06E-01 = 83/37 6.70E-05 + 62/58 9.07E-01 = 56/64 4.29E-01 = 67/53 7.71E-01 =
200 × 20 64/56 6.41E-01 = 66/54 6.82E-02 = 57/63 6.28E-01 = 56/64 2.25E-01 = 60/60 2.16E-01 =
500 × 20 81/39 8.84E-03 + 120/0 1.73E-06 + 52/68 1.02E-02 = 57/63 2.89E-02 = 101/19 6.72E-02 +
+ / = /− 1/11/0 9/3/0 0/12/0 0/12/0 1/11/0
Table 11
Average HV of CIG and compared algorithms (SSD10 and SSD50).
(n, m) SSD10 SSD50
KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG
20 × 5 0.088 0.096 0.084 0.097 0.094 0.097 0.119 0.097 0.112 0.092 0.109 0.092 0.109 0.118
20 × 10 0.072 0.081 0.070 0.080 0.077 0.080 0.100 0.078 0.095 0.074 0.088 0.077 0.088 0.105
20 × 20 0.056 0.065 0.054 0.062 0.060 0.062 0.082 0.062 0.079 0.060 0.069 0.061 0.069 0.091
50 × 5 0.072 0.081 0.069 0.088 0.073 0.088 0.094 0.080 0.105 0.077 0.100 0.076 0.100 0.117
50 × 10 0.059 0.067 0.057 0.076 0.062 0.076 0.080 0.063 0.083 0.061 0.082 0.060 0.082 0.091
50 × 20 0.046 0.051 0.045 0.058 0.049 0.058 0.064 0.052 0.068 0.051 0.066 0.050 0.066 0.078
100 × 5 0.042 0.039 0.039 0.046 0.047 0.045 0.052 0.041 0.044 0.039 0.051 0.030 0.032 0.052
100 × 10 0.041 0.044 0.039 0.051 0.062 0.043 0.056 0.048 0.064 0.046 0.068 0.058 0.038 0.068
100 × 20 0.038 0.045 0.037 0.053 0.067 0.042 0.054 0.042 0.058 0.041 0.059 0.054 0.035 0.059
200 × 10 0.037 0.039 0.027 0.037 0.033 0.021 0.055 0.028 0.031 0.021 0.028 0.024 0.015 0.042
200 × 20 0.030 0.034 0.022 0.031 0.025 0.016 0.046 0.026 0.029 0.020 0.026 0.021 0.014 0.038
500 × 20 0.024 0.022 0.019 0.025 0.024 0.019 0.040 0.019 0.016 0.014 0.018 0.019 0.014 0.029
Average 0.051 0.055 0.047 0.059 0.056 0.054 0.070 0.053 0.065 0.050 0.064 0.052 0.055 0.074
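
For readers unfamiliar with the HV indicator reported in Tables 11 and 12, a two-objective hypervolume can be computed with a single sweep. The sketch below assumes both objectives are minimized and already normalized, and that the reference point is (1, 1). The normalization and reference point used in the experiments are not stated in this section, so these are illustrative choices, and `hypervolume_2d` is a hypothetical name.

```python
def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Area dominated by `points` and bounded by `ref`, with both objectives minimized."""
    # Discard points that do not strictly improve on the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:                 # sweep in ascending order of the first objective
        if f2 < best_f2:               # dominated points add no area and are skipped
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]   # illustrative normalized front
    print(hypervolume_2d(front))
```

A larger value means the front covers more of the objective space below the reference point.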
these methods are compared with the proposed CIG. As we all know, the metaheuristics need to be properly calibrated for optimal performance. Same as Section 5.1, the DOE methodology is employed to calibrate these four competing approaches. The response variable to minimize is HV. As shown in Table 10, the levels of the parameters are set according to the relevant literature.

The experimental results are evaluated by the ANOVA method. Due to space limitations, the complete details of the calibrations for these
Table 12
Average HV of CIG and comparison algorithms (SSD100 and SSD125).
(n, m) SSD100 SSD125
KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG
20 × 5 0.084 0.091 0.082 0.096 0.082 0.096 0.099 0.077 0.081 0.073 0.086 0.067 0.079 0.086
20 × 10 0.063 0.071 0.061 0.073 0.066 0.073 0.088 0.067 0.076 0.067 0.078 0.065 0.078 0.083
20 × 20 0.053 0.060 0.051 0.059 0.058 0.059 0.078 0.056 0.065 0.055 0.064 0.055 0.064 0.072
50 × 5 0.057 0.064 0.055 0.072 0.058 0.072 0.073 0.057 0.063 0.056 0.073 0.047 0.055 0.075
50 × 10 0.051 0.060 0.050 0.068 0.053 0.068 0.071 0.053 0.058 0.052 0.068 0.046 0.055 0.069
50 × 20 0.043 0.051 0.042 0.055 0.045 0.055 0.060 0.047 0.055 0.047 0.060 0.043 0.054 0.060
100 × 5 0.032 0.035 0.031 0.041 0.026 0.027 0.042 0.028 0.031 0.028 0.036 0.023 0.019 0.038
100 × 10 0.035 0.042 0.034 0.050 0.044 0.031 0.050 0.030 0.033 0.029 0.038 0.024 0.020 0.042
100 × 20 0.033 0.040 0.032 0.046 0.041 0.030 0.046 0.028 0.029 0.027 0.034 0.023 0.020 0.038
200 × 10 0.024 0.027 0.019 0.024 0.020 0.013 0.034 0.024 0.027 0.019 0.023 0.019 0.013 0.035
200 × 20 0.022 0.025 0.018 0.022 0.018 0.012 0.031 0.022 0.025 0.018 0.021 0.017 0.012 0.032
500 × 20 0.015 0.013 0.013 0.016 0.016 0.013 0.024 0.014 0.012 0.011 0.014 0.014 0.012 0.022
Average 0.043 0.048 0.041 0.052 0.044 0.046 0.058 0.042 0.046 0.040 0.050 0.037 0.040 0.054
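The HV values in Tables 11 and 12 measure the objective-space region dominated by a front. As a minimal sketch of how such a value could be computed for two minimized objectives (Cmax and TEC), the function below sweeps the points in order of the first objective. The normalization of the objectives and the reference point are assumptions here, since the HV definition used in the experiments lies outside this section; the name `hypervolume_2d` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (Cmax, TEC), both assumed normalized and minimized


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by `front` and bounded by the reference point `ref`."""
    # Only points strictly better than the reference point contribute.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far in the sweep
    for f1, f2 in pts:
        if f2 < best_f2:  # dominated points add no area and are skipped
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


# Illustrative front (hypothetical values) with an assumed reference point (1, 1).
example_front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.6, 0.6)]
example_hv = hypervolume_2d(example_front, (1.0, 1.0))
```

A larger HV indicates a front that is closer to the ideal region and better spread, which is why the largest entry in each row of the HV tables is the best one.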
Table 13
Average IGD of CIG and compared algorithms (SSD10 and SSD50).
(n, m) SSD10 SSD50
KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG
20 × 5 0.042 0.008 0.010 0.048 0.035 0.033 0.013 0.061 0.047 0.017 0.075 0.061 0.024 0.010
20 × 10 0.048 0.010 0.012 0.054 0.043 0.038 0.014 0.066 0.054 0.016 0.071 0.066 0.022 0.011
20 × 20 0.059 0.013 0.016 0.063 0.053 0.048 0.017 0.074 0.061 0.012 0.079 0.072 0.016 0.010
50 × 5 0.055 0.020 0.024 0.062 0.049 0.034 0.025 0.073 0.053 0.015 0.084 0.074 0.020 0.013
50 × 10 0.081 0.037 0.043 0.088 0.085 0.040 0.038 0.093 0.064 0.021 0.102 0.095 0.028 0.018
50 × 20 0.084 0.039 0.045 0.093 0.084 0.048 0.044 0.102 0.067 0.031 0.111 0.101 0.039 0.026
100 × 5 0.061 0.043 0.050 0.070 0.051 0.083 0.026 0.072 0.040 0.047 0.081 0.057 0.066 0.038
100 × 10 0.089 0.060 0.070 0.097 0.082 0.105 0.042 0.097 0.041 0.047 0.105 0.074 0.113 0.057
100 × 20 0.126 0.078 0.092 0.132 0.092 0.121 0.052 0.110 0.044 0.051 0.115 0.070 0.122 0.069
200 × 10 0.138 0.101 0.110 0.156 0.133 0.174 0.046 0.146 0.103 0.128 0.167 0.152 0.176 0.064
200 × 20 0.160 0.116 0.128 0.174 0.160 0.195 0.066 0.162 0.120 0.137 0.177 0.167 0.195 0.064
500 × 20 0.251 0.210 0.237 0.254 0.244 0.262 0.066 0.210 0.174 0.188 0.220 0.193 0.197 0.082
Average 0.100 0.061 0.070 0.108 0.093 0.098 0.037 0.106 0.072 0.059 0.116 0.099 0.085 0.038
Table 14
Average IGD of CIG and comparison algorithms (SSD100 and SSD125).
(n, m) SSD100 SSD125
KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG KCA MOWSA MOEA/D HHQL CWOA INSGA-II CIG
20 × 5 0.077 0.059 0.025 0.090 0.076 0.031 0.017 0.083 0.061 0.023 0.105 0.086 0.032 0.019
20 × 10 0.080 0.061 0.023 0.085 0.076 0.027 0.020 0.090 0.066 0.021 0.102 0.094 0.029 0.018
20 × 20 0.074 0.059 0.022 0.078 0.070 0.023 0.018 0.087 0.071 0.018 0.090 0.090 0.025 0.015
50 × 5 0.075 0.056 0.032 0.092 0.069 0.038 0.028 0.100 0.060 0.047 0.114 0.097 0.063 0.041
50 × 10 0.095 0.053 0.050 0.106 0.096 0.047 0.043 0.102 0.065 0.044 0.117 0.100 0.060 0.038
50 × 20 0.116 0.065 0.050 0.126 0.119 0.048 0.043 0.116 0.078 0.040 0.126 0.119 0.054 0.034
100 × 5 0.077 0.048 0.056 0.085 0.068 0.089 0.046 0.092 0.053 0.062 0.102 0.068 0.107 0.050
100 × 10 0.119 0.070 0.082 0.127 0.114 0.130 0.060 0.118 0.077 0.090 0.128 0.106 0.140 0.062
100 × 20 0.126 0.075 0.087 0.135 0.125 0.131 0.065 0.131 0.089 0.103 0.143 0.121 0.159 0.068
200 × 10 0.128 0.081 0.141 0.264 0.134 0.317 0.087 0.160 0.113 0.149 0.184 0.153 0.199 0.082
200 × 20 0.145 0.089 0.135 0.341 0.147 0.436 0.079 0.180 0.132 0.165 0.215 0.184 0.236 0.078
500 × 20 0.200 0.166 0.190 0.214 0.192 0.201 0.090 0.221 0.184 0.215 0.250 0.245 0.234 0.096
Average 0.109 0.074 0.075 0.145 0.107 0.127 0.050 0.123 0.087 0.082 0.140 0.122 0.112 0.050
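The IGD values in Tables 13 and 14 are smaller when an obtained front lies closer to a reference front. The sketch below shows one common form of the indicator, the mean Euclidean distance from each reference point to its nearest obtained point. How the reference front is built and whether objectives are normalized are assumptions, as the exact definition is given outside this section; the function name `igd` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (Cmax, TEC), assumed normalized


def igd(reference_front: List[Point], approx_front: List[Point]) -> float:
    """Mean distance from each reference point to its nearest approximation point."""
    if not reference_front or not approx_front:
        raise ValueError("both fronts must be non-empty")
    total = 0.0
    for r in reference_front:
        # Distance from this reference point to the closest obtained solution.
        total += min(math.dist(r, a) for a in approx_front)
    return total / len(reference_front)


# Illustrative fronts (hypothetical values).
ref_front = [(0.0, 1.0), (1.0, 0.0)]
obtained = [(0.0, 1.0), (1.0, 1.0)]
example_igd = igd(ref_front, obtained)
```

Because the average runs over the reference points, a front that misses part of the reference front is penalized even if its own points are accurate.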
Table 15
CIG improvement percentages over other algorithms with respect to performance.
SSD10 SSD50 SSD100 SSD125 Average
four competing methods are provided in supporting documents. As a summary, we present in Fig. 15 all the parameters tested for each approach. The values in the red circle are the best-calibrated level for each parameter after the experiment and analysis.

5.4. Comparison of related algorithms

The results for the approaches are listed in Tables 11–14. The best-observed values in these tables are highlighted in boldface. Each algorithm is independently run 10 times on each test instance. It should be noted that the initial solutions of the seven algorithms are yielded by the proposed MIBST in Section 4.1 to ensure a comparable scenario. As observed in Tables 11–14, the HV values of CIG are better than those of the compared algorithms in almost all cases. HHQL also considers certain problem characteristics of the EDBFSP, which should be the main reason why its performance is second only to CIG. To summarize the performance evaluation results, Table 15 presents the improvement percentages over the other algorithms with respect to performance. CIG outperforms the state-of-the-art competitors, with a lowest improvement percentage of 20.16 %.
The boxplots of different setup times are shown in Fig. 16. The boxplots directly reflect the stability of the algorithms. As shown in
Fig. 16. Boxplots of IGD for all algorithms. (a) SSD10; (b) SSD50; (c) SSD100; (d) SSD125.
Fig. 16, the box for CIG is shorter than those of the other algorithms. Therefore, the results of CIG are concentrated. HIG is quite sensitive to the variety of setup times, according to Fig. 16. As the setup times cannot be changed by the speed, the performance of the proposed operation will be compromised in the case of a large level of setup time, which is more obvious under SSD125.
The interval plots are drawn in Fig. 17. The expression of the C Metric is simplified for clarity. The specific simplifications are as follows: A = C(CIG, KCA) and A′ = C(KCA, CIG), B = C(CIG, MOWSA) and B′ = C(MOWSA, CIG), C = C(CIG, MOEA/D) and C′ = C(MOEA/D, CIG), D = C(CIG, HHQL) and D′ = C(HHQL, CIG), E = C(CIG, CWOA) and E′ = C(CWOA, CIG), F = C(CIG, INSGA-II) and F′ = C(INSGA-II, CIG). The C Metric of CIG is generally larger than that of all other approaches.
The previous tables and figures only show the total averages for all instances and replicates. Fig. 18 shows the means plots and Tukey HSD 95 % confidence intervals in the ANOVA experiment for all algorithms against instance size. In small instances, it can be observed that there is
Fig. 18. Means plots and Tukey HSD 95 % confidence intervals in the ANOVA experiment for all algorithms. (a) at different jobs. (b) at different instance sizes.
no overlapping interval between CIG and the other compared algorithms except for MOEA/D and MOWSA. Nevertheless, CIG performs much better on medium and large instances because the other algorithms converge slowly in a huge search space when prior knowledge is lacking. Fig. 19 shows the approximate distribution of the non-dominated solutions acquired by the seven algorithms in four instances. The Pareto front is the result obtained from all runs. As illustrated in Fig. 19, CIG provides a more diverse set of non-dominated solutions than the other algorithms to balance these conflicting objectives for the decision makers. The set of Pareto-optimal solutions obtained by CIG helps decision makers choose their preferred solutions in management. Overall, CIG is superior to the other algorithms.
Then, the Friedman test is utilized to carry out a statistical comparison.
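Two operations used in this comparison rest on Pareto dominance: pooling the solutions of all runs into one non-dominated front, and the C Metric C(A, B) plotted in Fig. 17. The sketch below illustrates both for two minimized objectives. It assumes the usual set-coverage definition, in which C(A, B) is the fraction of B that is dominated by or equal to some member of A; the paper's exact definition lies outside this section, and all function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (Cmax, TEC), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def non_dominated(points: Iterable[Point]) -> List[Point]:
    """Reduce a pooled set of solutions to its non-dominated subset."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def c_metric(a: List[Point], b: List[Point]) -> float:
    """C(A, B): share of B covered (dominated or matched) by at least one point of A."""
    if not b:
        raise ValueError("B must be non-empty")
    covered = sum(1 for q in b if any(dominates(p, q) or p == q for p in a))
    return covered / len(b)


# Hypothetical objective vectors from two runs of one algorithm.
runs = [[(1.0, 5.0), (3.0, 3.0)], [(2.0, 4.0), (3.0, 3.0), (4.0, 4.0)]]
pooled_front = non_dominated(p for run in runs for p in run)

# Hypothetical front of a competing algorithm.
other_front = [(2.0, 5.0), (4.0, 4.0)]
c_ab = c_metric(pooled_front, other_front)
c_ba = c_metric(other_front, pooled_front)
```

Note that C(A, B) and C(B, A) are not complementary, which is why both directions (for example A and A′) are reported for each pair of algorithms.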
Table 16
Results achieved by Friedman test at different jobs (IGD).
Algorithms SSD10 SSD50 SSD100 SSD125
Fig. 21. Gantt chart of the schedule (π1 = {3, 1, 5} and π2 = {2, 4}).
The results are shown in Table 16 and Fig. 20, where CN is the number of cases. IGD and HV are used to evaluate all the compared algorithms. The solid line and dotted line in Fig. 20 are the critical difference (CD) at the 95 % and 90 % confidence levels. As illustrated in Fig. 20, CIG has the best ranking among the seven algorithms.
As in the above experiments, the results of CIG are significantly superior to those of the compared algorithms. To be specific, CWOA and MOWSA are meta-heuristics without problem-specific knowledge. In large-scale problems, CWOA and MOWSA have slow convergence and poor solution accuracy due to the huge search space. In CIG, the problem-specific properties are embedded into the evolution process. Therefore, CIG outperforms the other compared algorithms in large-scale problems. MOEA/D and INSGA-II combine the decomposition method with the neighborhood search for continuous iterative optimization. These two algorithms mainly focus on improving the ability of local search. They are suitable for the small-scale benchmarks. However, as the scale of the problem increases, their performance is not as good as that of CIG due to the lack of a better global search.
Furthermore, HHQL is an effective hyper-heuristic based on problem-specific knowledge. Among the compared algorithms, HHQL can find good solutions on the large-scale problems. However, HHQL does not consider the impact of setup time on TEC and Cmax. Therefore, the performance of HHQL is closely related to the level of setup time; especially when SSD125 is used, the performance of HHQL is far inferior to that of CIG. The KCA cleverly designs a search strategy that combines the characteristics of the DPFSP. Experiments show that KCA performs worse than CIG because this strategy is not suitable for the considered EDBFSP-SDST.
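The ranking step behind a Friedman test can be sketched as follows. For illustration only, the rows are the group-average IGD values under SSD125 copied from Table 14, whereas the test reported in the paper is presumably run on the underlying per-instance results. The statistic below is the basic chi-square form without tie correction, and the function names are hypothetical.

```python
from typing import List

ALGORITHMS = ["KCA", "MOWSA", "MOEA/D", "HHQL", "CWOA", "INSGA-II", "CIG"]

# Average IGD under SSD125 (Table 14), one row per (n, m) group.
IGD_SSD125 = [
    [0.083, 0.061, 0.023, 0.105, 0.086, 0.032, 0.019],
    [0.090, 0.066, 0.021, 0.102, 0.094, 0.029, 0.018],
    [0.087, 0.071, 0.018, 0.090, 0.090, 0.025, 0.015],
    [0.100, 0.060, 0.047, 0.114, 0.097, 0.063, 0.041],
    [0.102, 0.065, 0.044, 0.117, 0.100, 0.060, 0.038],
    [0.116, 0.078, 0.040, 0.126, 0.119, 0.054, 0.034],
    [0.092, 0.053, 0.062, 0.102, 0.068, 0.107, 0.050],
    [0.118, 0.077, 0.090, 0.128, 0.106, 0.140, 0.062],
    [0.131, 0.089, 0.103, 0.143, 0.121, 0.159, 0.068],
    [0.160, 0.113, 0.149, 0.184, 0.153, 0.199, 0.082],
    [0.180, 0.132, 0.165, 0.215, 0.184, 0.236, 0.078],
    [0.221, 0.184, 0.215, 0.250, 0.245, 0.234, 0.096],
]


def average_ranks(row: List[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(row)), key=lambda idx: row[idx])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def mean_ranks(table: List[List[float]]) -> List[float]:
    """Average rank of each column (algorithm) over all rows (cases)."""
    totals = [0.0] * len(table[0])
    for row in table:
        for idx, rank in enumerate(average_ranks(row)):
            totals[idx] += rank
    return [t / len(table) for t in totals]


def friedman_statistic(table: List[List[float]]) -> float:
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(table), len(table[0])
    r = mean_ranks(table)
    return 12.0 * n / (k * (k + 1)) * (sum(x * x for x in r) - k * (k + 1) ** 2 / 4.0)


ranks_ssd125 = dict(zip(ALGORITHMS, mean_ranks(IGD_SSD125)))
```

Lower mean ranks are better for IGD. Two algorithms would then be declared different when their mean ranks differ by more than the critical difference drawn in Fig. 20.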
6. Conclusion and future work

A CIG is proposed to solve the EDBFSP-SDST with the minimization of Cmax and TEC. The theoretical properties of the EDBFSP-SDST are integrated into the evolutionary process of the CIG. In the initialization, MIBST yields better solutions than random initialization by reducing block time and idle time, which has been shown in Section 5.2. Through a bi-population collaborative IG based on Q-learning, the diversity and quality of solutions are improved effectively. The energy-saving strategy and the acceleration strategy based on properties further optimize the objectives. Extensive numerical tests show that the proposed algorithm performs better than the existing algorithms in terms of solution quality and diversity. Moreover, the proposed algorithm can obtain feasible solutions with good quality at different stopping criteria. The superior performance of the algorithm is mainly owed to the following aspects.
(1) Utilization of the heuristic based on theoretical properties to produce a population with good quality and diversity.
(2) Cooperation of two populations to balance two objectives.
(3) Utilization of the two-level Q-learning to enrich search behavior.
(4) Utilization of the energy-saving strategy and acceleration strategy to enhance exploitation capability.
Since the operators are problem-specific, the algorithm may not be directly applicable to some other scheduling problems. However, the ideas in designing problem-specific operators and the cooperative utilization of reinforcement learning can serve as guidelines for solving other complex scheduling problems.
There are further research directions: (1) distributed production scheduling in an uncertain environment, with situations like material shortages and variations in processing times, is worthy of attention; (2) a more accurate environment description equation for flow-shop scheduling problems should be designed to deal with the situation of a large number of states or actions in RL; (3) theoretical properties should be extracted and utilized to develop effective green scheduling approaches.

CRediT authorship contribution statement

Haizhu Bao: Software, Writing – original draft, Methodology. Quanke Pan: Funding acquisition, Investigation, Supervision, Resources, Formal analysis. Rubén Ruiz: Project administration, Writing – review & editing, Conceptualization. Liang Gao: Visualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

I have shared the link to my data/code at the Attach File step.

Acknowledgments

This work is supported by the National Natural Science Foundation of China 62273221 and 61973203, Program of Shanghai Academic/Technology Research Leader 21XD1401000, and Shanghai Key Laboratory of Power Station Automation Technology.
Appendix
Since the example in Fig. 3 was solved exactly using CPLEX in Section 3.1.2 and the resulting schedule is optimal, it contains no block time. In Fig. 21, we draw a Gantt chart for another schedule (π1 = {5, 2, 3} and π2 = {4, 1}). After J3 is processed on M1 in F1, J3 is blocked on M1 because M2 is still processing J2.
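$$\mathrm{SEC} = \sum_{j'=0}^{n} \sum_{j=1}^{n} \sum_{i=1}^{m} x_{j',j} \cdot st_{j',j,i} \cdot \gamma_i = (2 + 6 + 2 + 1 + 9 + 5 + 3 + 3 + 5 + 4) \times 0.7 = 28$$
$$\mathrm{BEC} = 0$$
$$\mathrm{IEC} = \sum_{j'=0}^{n} \sum_{j=1}^{n} \sum_{i=2}^{m} x_{j',j} \cdot \left( D_{j,i-1} - D_{j',i} - st_{j',j,i} \right) \cdot \theta_i = (1 + 3 + 6 + 6) \times 0.5 = 8$$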
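The arithmetic in the SEC and IEC expressions can be checked with a few lines. This is only a sketch: the duration lists are the summands that appear in the two expressions, the rates 0.7 and 0.5 are the values of γ and θ used there (taken here as identical on every machine), and the helper name `energy` is hypothetical.

```python
from typing import Sequence


def energy(durations: Sequence[float], power: float) -> float:
    """Energy as total duration times a constant power rate."""
    return sum(durations) * power


# Summands of the SEC expression (setup durations) and of the IEC expression (idle durations).
SETUP_DURATIONS = [2, 6, 2, 1, 9, 5, 3, 3, 5, 4]
IDLE_DURATIONS = [1, 3, 6, 6]

sec = energy(SETUP_DURATIONS, 0.7)  # setup energy consumption
bec = 0.0                           # no block time in this schedule, so no blocking energy
iec = energy(IDLE_DURATIONS, 0.5)   # idle energy consumption
```

The total setup duration is 40 and the total idle duration is 16, which gives the stated values of 28 and 8.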
Fig. 22 illustrates the Gantt chart with four machines and the sequence of four jobs J4, J2, J3, J1, i.e., π = {4, 2, 3, 1}. Cmax can be calculated by the following expression:
$$C_{\max} = \sum_{i=2}^{m} p_{\pi(n),i} + \sum_{j=2}^{n} BT_{\pi(j),1} + \sum_{i=2}^{m} BT_{\pi(n),i} + \sum_{j=1}^{n} \left( p_{\pi(j),1} + st_{\pi(j-1),\pi(j),1} \right), \tag{22}$$
where π(0) = 0. The block time of J2 on M1 is $BT_{2,1} = BT_{\pi(2),1} = (p_{4,2} + st_{4,2,2}) - (p_{2,1} + st_{4,2,1})$. Similarly, we can obtain all the block times in Fig. 22: $BT_{2,2} = BT_{\pi(2),2} = (p_{4,3} + st_{4,2,3}) - (p_{2,2} + st_{4,2,2})$, $BT_{3,1} = BT_{\pi(3),1} = BT_{2,2} + (p_{2,2} + st_{2,3,2}) - (p_{3,1} + st_{2,3,1})$, $BT_{3,2} = BT_{\pi(3),2} = (p_{2,3} + st_{2,3,3}) - (p_{3,2} + st_{2,3,2})$, and $BT_{1,2} = BT_{\pi(4),2} = (p_{3,3} + st_{3,1,3}) - (p_{1,2} + st_{3,1,2}) - IT_{1,2}$, where $IT_{1,2}$ represents the idle time before J1 is processed on M2.
To minimize the occurrence of block time in scheduling, Eq. (22) is defined as the distance between two adjacent jobs (Jj is an immediate successor of Jj′), which represents the impact of Jj′ on the block time of Jj. The smaller $d_j$ is, the better Jj′ fits in front of Jj. After the first job in the initial sequence is determined, the next job is the unscheduled job with the shortest distance $d_j$ from the most recently determined job, and so on, until the sorting is completed.
$$d_j = \sum_{i=2}^{m} \max\left\{ 0, \left( p_{j',i} + st_{j',j,i} \right) - \left( p_{j,i-1} + st_{j',j,i-1} \right) \right\} \tag{22}$$
In the MIBST, after determining the first job of each factory, the job with the lowest $d_j^f$ is arranged in the next position of $\pi^f$ by Eq. (20), and so on, until the sorting is completed. Eq. (20) is an extension of Eq. (22) to the distributed factory setting.
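The single-factory ordering rule described above can be sketched as a nearest-neighbour construction on the distance d_j. The code is an illustration under stated assumptions rather than the MIBST itself: jobs and machines are indexed from 0, the processing and setup times are hypothetical, ties are broken by job order, and the function names are not from the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]         # p[job][machine]
SetupCube = Sequence[Sequence[Sequence[float]]]  # st[prev_job][job][machine]


def distance(prev: int, job: int, p: Matrix, st: SetupCube) -> float:
    """d_j of `job` when it immediately follows `prev`.

    For each machine i >= 2 (index 1 onward here), compare the work of `prev`
    on machine i with the work of `job` on machine i-1; only positive gaps count.
    """
    m = len(p[job])
    return sum(
        max(0, (p[prev][i] + st[prev][job][i]) - (p[job][i - 1] + st[prev][job][i - 1]))
        for i in range(1, m)
    )


def greedy_sequence(first: int, jobs: Sequence[int], p: Matrix, st: SetupCube) -> List[int]:
    """Append, step by step, the unscheduled job closest to the last scheduled job."""
    sequence = [first]
    remaining = [j for j in jobs if j != first]
    while remaining:
        nxt = min(remaining, key=lambda j: distance(sequence[-1], j, p, st))
        sequence.append(nxt)
        remaining.remove(nxt)
    return sequence


# Hypothetical instance: 3 jobs, 2 machines.
p_example = [[3, 5], [6, 2], [4, 4]]
st_example = [
    [[0, 0], [2, 1], [1, 2]],
    [[1, 1], [0, 0], [1, 1]],
    [[2, 2], [1, 1], [0, 0]],
]
order = greedy_sequence(0, [0, 1, 2], p_example, st_example)
```

In this instance job 1 follows job 0 because its distance is 0, whereas job 2 would incur a distance of 2.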
problem, IEEE Trans. Cybern. 51 (2021) 5291–5303, https://doi.org/10.1109/ Autom. Sci. Eng. 19 (2022) 3020–3038, https://doi.org/10.1109/
TCYB.2020.3025662. TASE.2021.3104716.
[39] Q. Zeng, J. Li, R. Li, T. Huang, Y. Han, H. Sang, Improved NSGA-II for energy- [48] H. Wang, B. Sarker, J. Li, J. Li, Adaptive scheduling for assembly job shop with
efficient distributed no-wait flow-shop with sequence-dependent setup time, uncertain assembly times based on dual Q-learning, Int. J. Prod. Res. 59 (2021)
Complex Intell. Syst. 9 (2022) 825–849, https://doi.org/10.1007/S40747-022- 5867–5883, https://doi.org/10.1080/00207543.2020.1794075.
00830-6/FIGURES/11. [49] J. Lin, Y. Li, H. Song, Semiconductor final testing scheduling using Q-learning
[40] F. Zhao, S. Di, L. Wang, A hyperheuristic with q-learning for the multiobjective based hyper-heuristic, Expert Syst. Appl. 187 (2022), 115978, https://doi.org/
energy-efficient distributed blocking flow shop scheduling problem, IEEE Trans. 10.1016/j.eswa.2021.115978.
Cybern. (2022) 1–14, https://doi.org/10.1109/tcyb.2022.3192112. [50] R. Ruiz, T. Stützle, An iterated greedy heuristic for the sequence dependent setup
[41] J. Heger, T. Voss, Dynamically adjusting the k-values of the ATCS rule in a flexible times flowshop problem with makespan and weighted tardiness objectives, Eur. J.
flow shop scenario with reinforcement learning, Int. J. Prod. Res. 61 (2023) Oper. Res. 187 (2008) 1143–1159, https://doi.org/10.1016/j.ejor.2006.07.029.
146–160, https://doi.org/10.1080/00207543.2021.1943762. [51] E. Taillard, Benchmarks for basic scheduling problems, Eur. J. Oper. Res. 64 (1993)
[42] J. Lee, H. Kim, Reinforcement learning for robotic flow shop scheduling with 278–285, https://doi.org/10.1016/0377-2217(93)90182-M.
processing time variations, Int. J. Prod. Res. 60 (2022) 2346–2368, https://doi. [52] J. Wang, L. Wang, A cooperative memetic algorithm with feedback for the energy-
org/10.1080/00207543.2021.1887533. aware distributed flow-shops with flexible assembly scheduling, IEEE Trans. Evol.
[43] Z. Pan, L. Wang, J. Wang, J. Lu, Deep reinforcement learning based optimization Comput. 168 (2022) 461–475, https://doi.org/10.1016/j.cie.2022.108126.
algorithm for permutation flow-shop scheduling, IEEE Trans. Emerg. Top. Comput. [53] J. Li, X. Chen, P. Duan, J. Mou, KMOEA: a knowledge-based multiobjective
Intell (2021) 1–12, https://doi.org/10.1109/TETCI.2021.3098354. algorithm for distributed hybrid flow shop in a prefabricated system, IEEE Trans.
[44] Z. Cao, C. Lin, M. Zhou, R. Huang, Scheduling semiconductor testing facility by Ind. Inf. 18 (2022) 5318–5329, https://doi.org/10.1109/TII.2021.3128405.
using cuckoo search algorithm with reinforcement learning and surrogate [54] A. Dean, D. Voss, Fractional factorial experiments. Design and Analysis of
modeling, IEEE Trans. Autom. Sci. Eng. 16 (2019) 825–837, https://doi.org/ Experiments, Springer, New York, New York, NY, 1999, pp. 483–545, https://doi.
10.1109/TASE.2018.2862380. org/10.1007/0-387-22634-6_15.
[45] Y. Shiue, K. Lee, C. Su, Real-time scheduling for a smart factory using a [55] S. Du, W. Zhou, D. Wu, M. Fei, An effective discrete monarch butterfly optimization
reinforcement learning approach, Comput. Ind. Eng. 125 (2018) 604–614, https:// algorithm for distributed blocking flow shop scheduling with an assembly machine,
doi.org/10.1016/j.cie.2018.03.039. Expert Syst. Appl. 225 (2023), https://doi.org/10.1016/J.ESWA.2023.120113.
[46] I. Park, J. Huh, J. Kim, J. Park, A reinforcement learning approach to robust [56] X. He, Q. Pan, L. Gao, L. Wang, P.N. Suganthan, A greedy cooperative Co-evolution
scheduling of semiconductor manufacturing facilities, IEEE Trans. Autom. Sci. Eng. ary algorithm with problem-specific knowledge for multi-objective flowshop group
17 (2020) 1420–1431, https://doi.org/10.1109/TASE.2019.2956762. scheduling problems, IEEE Trans. Evol. Comput. 639798 (2021), https://doi.org/
[47] S. Luo, L. Zhang, Y. Fan, Real-time scheduling for dynamic partial-no-wait 10.1109/tevc.2021.3115795, 1–1.
multiobjective flexible job shop by deep reinforcement learning, IEEE Trans.
23