Dynamic Programming Approach To Solve Multi-Project Scheduling Problem-2024
Dynamic Programming Approach To Solve Multi-Project Scheduling Problem-2024
Discrete optimisation
Keywords: We consider the dynamic and stochastic resource-constrained multi-project scheduling problem which allows
Project scheduling for the random arrival of projects and stochastic task durations. Completing projects generates rewards, which
Markov decision processes are reduced by a tardiness cost in the case of late completion. Multiple types of resource are available, and
Approximate dynamic programming
projects consume different amounts of these resources when under processing. The problem is modelled as
Dynamic resource allocation
an infinite-horizon discrete-time Markov decision process and seeks to maximise the expected discounted
Dynamic programming
long-run profit. We use an approximate dynamic programming algorithm (ADP) with a linear approximation
model which can be used for online decision making. Our approximation model uses project elements that
are easily accessible by a decision-maker, with the model coefficients obtained offline via a combination of
Monte Carlo simulation and least squares estimation. Our numerical study shows that ADP often statistically
significantly outperforms the optimal reactive baseline algorithm (ORBA). In experiments on smaller problems
however, both typically perform suboptimally compared to the optimal scheduler obtained by stochastic
dynamic programming. ADP has an advantage over ORBA and dynamic programming in that ADP can be
applied to larger problems. We also show that ADP generally produces statistically significantly higher profits
than common algorithms used in practice, such as a rule-based algorithm and a reactive genetic algorithm.
1. Introduction refers to random project arrivals from different types of projects and
stochastic refers to uncertain task processing times. Dynamic general-
Project management and project scheduling are challenging. En- isations of RCMPSP are the dynamic RCMPSP and the dynamic and
gineering services, software development, IT services, construction stochastic RCMPSP. A discussion of the RCMPSP and its variants can
and R&D operate in dynamic environments, often processing multi-
be found in Satic et al. (2022).
ple projects simultaneously. Many unplanned factors may disturb the
The non-dynamic (i.e., static) variants of RCMPSP are extensively
project execution plan with new project arrivals and delays in task pro-
studied (Creemers, 2015). However, the dynamic variants of the
cessing. A recent project management survey (Wellingtone PPM, 2018)
showed that only 40% of projects are completed within their planned RCMPSP where new projects randomly arrive in the system are scarce
time, 46% of projects are completed within their predicted budget, and in the literature. To the best of our knowledge, there are only three
36% of projects realise their full benefits. In this paper we consider research papers available for the dynamic RCMPSP which are Pamay
the dynamic arrival of new projects and stochastic durations of tasks, et al. (2014), Parizi et al. (2017), Satic et al. (2020), and there are
and we propose a comprehensive model and solution approach for the only ten research papers available for the dynamic and stochastic
dynamic and stochastic resource-constrained multi-project scheduling RCMPSP which are Adler et al. (1995), Capa and Ulusoy (2015),
problem (dynamic and stochastic RCMPSP). Chen et al. (2019), Choi et al. (2007), Cohen et al. (2005), Fliedner
The dynamic and stochastic RCMPSP is a generalisation of the et al. (2012), Melchiors (2015), Melchiors and Kolisch (2009), Mel-
precedence-constrained scheduling problem, which was shown to be
chiors et al. (2018), Satic et al. (2022). They adopt different solution
an NP-hard problem in Garey and Johnson (1979, p. 239). Thus the
approaches, which have their own strengths and weaknesses.
dynamic and stochastic RCMPSP is also an NP-hard problem. Dynamic
∗ Corresponding author at: Lancaster University Management School, Bailrigg, Lancaster, LA1 4YX, United Kingdom.
E-mail address: [email protected] (U. Satic).
https://doi.org/10.1016/j.ejor.2023.10.046
Received 9 August 2021; Accepted 31 October 2023
Available online 10 November 2023
0377-2217/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Adler et al. (1995), Cohen et al. (2005), Melchiors and Kolisch resource consumption and decision rewards as features and can be used
(2009) took advantage of the well-developed queueing network ap- for online decision making after estimating the coefficients of the linear
proach where interdependent resources process project tasks. This re- value function approximation in a simulation-based training phase.
quires consideration of projects of relatively simple structure such We compare the performance of our ADP algorithm with four solu-
as tasks requiring the allocation of a single unit of a single type of tion approaches from Satic et al. (2022), namely a DP algorithm that
resource. Capa and Ulusoy (2015), Fliedner et al. (2012), Pamay et al. computes the optimal policy; an optimal reactive baseline algorithm
(2014) considered a reactive scheduling method which generates a
(ORBA) and a genetic algorithm (GA) that both generate schedules to
baseline schedule for current projects and then updates it at each time a
maximise the total profit of ongoing projects; and a rule-based algo-
new project arrival disrupts the schedule. This approach can be remark-
rithm (RBA) that uses the longest task first rule to guide the allocation
ably suboptimal as evidenced in our computational study in Section 5.
of remaining resources to tasks.
Melchiors et al. (2018), Satic et al. (2022) modelled the problem as
a Markov decision process (MDP), using dynamic programming (DP) We run our benchmark tests on the problems of Satic et al. (2022).
to evaluate optimal policies. This solution approach suffers from the In addition, we generate new comparison problems that are larger and
curse of dimensionality and thus can only be used for unrealistically include non-sequential project networks and multiple resource types.
small problems. Chen et al. (2019) divided the multi-project problem The larger size problems are computationally intractable for DP and
into states according to the project’s completion conditions and then ORBA; thus, we benchmark ADP with GA and RBA on these problems.
searched best priority rules for each state, but priority rules are notably We contribute to the literature by (i) a new comprehensive MDP
prone to be suboptimal. model which considers the random arrival of new projects, stochastic
Our methodological approach is similar to Choi et al. (2007), Mel- task durations, multiple resource types, non-sequential project net-
chiors (2015), Parizi et al. (2017) in that we also formulate the problem works, project completion rewards, project due dates and tardiness
as an MDP and design a scheduling policy via approximate dynamic costs, (ii) a new approximation function that uses project completion re-
programming (ADP). However, our model is notably more comprehen- wards, tardiness costs and spent resource amounts for decision making,
sive and allows for solving problems that are larger and/or have a more and is capable of solving much larger, more complex and much more
complex structure, which are closer to those appearing in practice.
general problems than ADPs from existing literature, (iii) an extensive
Choi et al. (2007) considered applications in the agricultural and phar-
simulation study illustrating the strengths and weaknesses of different
maceutical industries; thus, they focused on serial project networks,
approaches, (iv) benchmarking with DP and ORBA whenever tractable
stochastic task outcomes (success or failure), a single resource type,
and with two other approaches in larger problems, (v) developing an
single resource usage per task and no project due dates. Melchiors
(2015, chapter 7) conducted experiments on small problems with two efficient implementation of the proposed ADP method in the Julia
projects with three tasks with a single unit of resource capacity for each programming language to solve dynamic and stochastic RCMPSPs.
resource type, tasks require a single unit of resource only, identical This paper is organised as follows: In Section 2, we describe the
project networks for both projects, rejection, holding and processing problem setting, the MDP model and our goal function. In Section 3,
costs, but no project due dates. Parizi et al. (2017) considered deter- we describe our ADP algorithm and its coefficient training procedure.
ministic task processing times with rejection, holding and processing In Section 4, we describe the comparator algorithms and discuss the
costs. Their numerical study had short simulation durations with heavy comparison results in Section 5. In Section 6 we conclude.
discounting.
ADP is a powerful tool that provides researchers with the ability
to adjust the complexity of the optimisation model to trade-off the 2. The dynamic and stochastic RCMPSP model
solution complexity of large (realistic) problems at the expense of
a modest suboptimality. An acceptable trade-off can be achieved by 2.1. Problem setting
careful mathematical modelling of the problem in hand; this is in con-
trast to general purpose methods such as genetic algorithms and other
A project is a group of tasks which are bound to each other with
heuristics which typically rely only on tuning of algorithm parameters.
Our literature summary shows that ADP has been used in dynamic predecessor–successor relationships, also called a project network. We
variants of the RCMPSP such as Choi et al. (2007), Melchiors (2015), consider ‘‘finish to start’’ precedence relations between tasks. There are
Parizi et al. (2017). It has also been applied in static variants of the 𝐾 types of renewable resources and the available integer number of
RCMPSP (Li & Womer, 2015; Li et al., 2023). Outside of applications units is represented by 𝐵𝑘 > 0 for resource type 𝑘 = 1, … , 𝐾. These
in project scheduling, ADP methods have been applied in areas such resources need to be allocated to process tasks of random duration from
as clinical trials (Ahuja & Birge, 2020), vehicle scheduling (He et al., randomly arriving projects.
2018), capacity allocation (Schütz & Kolisch, 2012), machine schedul- We assume that each arriving project can be categorised as one of
ing (Ronconi & Powell, 2010) and missile defence systems (Davis et al., the possible project types ( = {1, 2, … , 𝐽 }). All projects of type 𝑗 share
2017). features such as: inter-arrival time distribution 𝛬𝑗 , completion reward
We consider new projects arriving at random during the execution 𝑟𝑗 , due date 𝐹𝑗 , and tardiness cost 𝑤𝑗 ; set of tasks 𝑗 = {1, 2, … , 𝐼𝑗 },
of ongoing projects, project completions generate rewards which are project network with sets 𝑗,𝑖 of predecessors of each task 𝑖, task
decreased by tardiness costs if completed after their respective due resource usages 𝑏𝑘𝑗,𝑖 , and task duration distribution 𝛤𝑗,𝑖 .
dates, processing times of the project tasks are uncertain, multiple types
The system always accepts newly arrived projects until the system
of resources are available and multiple amounts of resources can be
capacity for type 𝑗 projects is reached and rejects the remaining newly
used by each project type. Thus, our model can help optimise the
arrived projects. We only consider non-preemptive task processing; thus,
use of company resources, reduce project delays, and improve overall
task processing cannot be paused from after it began until the end of its
productivity. We model the problem as an infinite-horizon discrete-
time MDP and seek to maximise the expected total discounted long-run random duration. The duration of a task is not revealed to the decision
profit. maker until it is actually reached.
In this paper we show that ADP is a very useful and advanta- When all tasks of a project are processed, the project is completed,
geous method for the dynamic and stochastic RCMPSP. We use an and the project completion reward is earned. However, if the project
ADP algorithm with a linear approximation model to approximate the due date passes before the project is completed, the tardiness cost is
value function of the Bellman equation. Our approximation model uses incurred.
455
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
456
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
An important element of the pre-decision state is also the number 2.4.4. Pre-acceptance state
of free resources, 𝐵𝑘free (𝑠) ∈ {0, 1, … , 𝐵𝑘 }, 𝑘 = 1, … , 𝐾. This is not A pre-acceptance state 𝑠∗ represents the system information at the
explicitly included as part of the state because it can be calculated from end of the transition time but immediately before the pre-decision
the other state elements via: state 𝑠′ at the following decision epoch. A pre-acceptance state consists
𝐼 of its project states 𝑷 ∗𝑗 which consist of pre-acceptance task states
∑
𝐽 ∑𝑗
𝐵𝑘free (𝑠) = 𝐵𝑘 − 𝑏𝑘𝑗,𝑖 𝛶 {𝑥𝑗,𝑖 > 0}, (8) 𝑥∗𝑗,𝑖 for each task 𝑖 ∈ 𝑗 and pre-acceptance due date states 𝑑𝑗∗ . A
𝑗=1 𝑖=1 pre-acceptance state shows the task processing progress after a post-
where 𝛶 {⋅} is an indicator function. decision state during the transitional time without new project arrivals.
Eq. (17) shows possible task state transitions from a 𝑥̂ 𝑗,𝑖 to 𝑥∗𝑗,𝑖 with their
2.4.2. Action probabilities. As (15) presents, a pre-acceptance due date state is zero
An action 𝑎 represents the decision regarding which tasks to begin (𝑑𝑗∗ = 0) if project type 𝑗 has just been completed or if the post-decision
processing from those tasks whose states are ‘‘pending for processing’’. due date state is zero. For the other possibilities, a pre-acceptance due
The action consists of action elements 𝑎𝑗,𝑖 for each task 𝑖 ∈ 𝑗 of all date state is equal to the post-decision due date minus one:
project types (𝑗 ∈ ). An action element takes the value of 1 (𝑎𝑗,𝑖 = 1) ⎧0, if 𝑑𝑗 = 0,
to represent the decision to start processing a qualifying task and takes ⎪
𝑑𝑗∗ = ⎨0, if ∃𝑖 ∈ 𝑗 ∶ 𝑥̂ 𝑗,𝑖 ≥ 0 and ∀𝑖 ∈ 𝑗 ∶ 𝑥∗𝑗,𝑖 = 0, (15)
the value 0 (𝑎𝑗,𝑖 = 0) otherwise: ⎪
⎩𝑑𝑗 − 1, otherwise.
⎡ 𝑎1,1 , 𝑎1,2 , … 𝑎1,𝐼1 ⎤
⎢ 𝑎 , ⎥ If a type 𝑗 project arrived during the transition time and state 𝑷 ∗𝑗 is such
𝑎2,2 , … 𝑎2,𝐼2
𝑎 = ⎢ 2,1 ⎥. (9) that a type 𝑗 project is completed or not present (∀𝑖 ∈ 𝑗 ∶ 𝑥∗𝑗,𝑖 = 0) the
⎢ ⋮ ⋮ ⋱ ⋮ ⎥
⎢ 𝑎 , ⎥ system accepts the new type 𝑗 project. Otherwise, the system rejects
⎣ 𝐽 ,1 𝑎𝐽 ,2 , … 𝑎𝐽 ,𝐼𝐽 ⎦ the new arrival. From a pre-acceptance state 𝑠∗ to the following pre-
An action 𝑎 must fulfil three requirements: decision state 𝑠′ , the new task state becomes −1 and the due date state
becomes 𝐹𝑗 .
(1) The selected tasks for processing must have the task state ‘‘pending
for processing’’:
2.4.5. Transition function
∀𝑗 ∈ , 𝑖 ∈ 𝑗 ∶ 𝑎𝑗,𝑖 = 1 ⇒ 𝑥𝑗,𝑖 = −1. (10) The transition function, 𝑠′ = 𝑠𝑀 (𝑠, 𝑎, 𝑐), represents the transformation
of a system from a pre-decision state 𝑠 under action 𝑎 to a pre-
(2) There must be enough free resources of each type to allocate for decision state 𝑠′ at the next decision epoch by random events 𝑐 during
processing the selected tasks: the transition time. Random events include new project arrivals, task
completions and project completions. All task completions and project
𝐼
∑
𝐽 ∑𝑗
arrivals are stochastically independent.
𝑏𝑘𝑗,𝑖 𝛶 {𝑎𝑗,𝑖 = 1} ≤ 𝐵𝑘free (𝑠), 𝑘 = 1, … , 𝐾. (11)
A project of each type may arrive in the system during a transition
𝑗=1 𝑖=1
time according to its type’s arrival probability 𝜆𝑗 and is accepted if no
(3) All predecessor tasks of the selected tasks must be completed: type 𝑗 project exists in the system.
A task may complete processing according to a conditional prob-
∀𝑗 ∈ , 𝑖 ∈ 𝑗 ∶ 𝑎𝑗,𝑖 = 1 ⇒ ∀𝑚 ∈ 𝑗,𝑖 ∶ 𝑥𝑗,𝑚 = 0. (12) ability 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), if the task’s processed time following this transition
Here, 𝑚 represents a predecessor of task 𝑖 (𝑚 ∈ 𝑗 ⧵ 𝑖, 𝑚 ∈ 𝑗,𝑖 ). (𝑡max
𝑗,𝑖 − 𝑥 ̂ 𝑗,𝑖 + 1) is equal to or greater than its minimal possible duration
The action where all action elements are zero is also a valid action 𝑡min
𝑗,𝑖 . Formally, we define 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ) = 0 for ∀𝑥̂ 𝑗,𝑖 > 𝑡max min
𝑗,𝑖 − 𝑡𝑗,𝑖 + 1, and
max min
require 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ) > 0 for 𝑥̂ 𝑗,𝑖 = 𝑡𝑗,𝑖 − 𝑡𝑗,𝑖 + 1, which guarantees that
and indicates that no task was selected to begin processing in that
period. More than one action may fulfil all three requirements for a 𝑡min
𝑗,𝑖 is the minimal possible duration. We also require 𝛾𝑗,𝑖 (1) = 1, which
pre-decision state. The set of these actions is named an action set (𝑠). guarantees that 𝑡max 𝑗,𝑖 is the maximal possible duration.
The probability of reaching a pre-decision state 𝑠′ from a pre-
2.4.3. Post-decision state decision state 𝑠 under action 𝑎 with the transition function 𝑃 (𝑠′ |𝑠, 𝑎) is
A post-decision state 𝑠̂ represents the system information immedi- the joint probability of task completions 𝑃 (𝑥′𝑗,𝑖 |𝑥̂ 𝑗,𝑖 ) and project arrivals
ately after a decision epoch and just before the transition time begins. 𝑃 (𝑷 ′𝑗 |𝑷̂ 𝑗 ):
In other words, a post-decision state is the system information from
∏𝐽 ⎛∏ 𝐼𝑗 ⎞
the pre-decision state 𝑠 updated by an action 𝑎 but before any task ⎜
𝑃 (𝑠′ |𝑠, 𝑎) = 𝑃 (𝑥∗𝑗,𝑖 |𝑥̂ 𝑗,𝑖 )⎟ 𝑃 (𝑷 ′𝑗 |𝑷𝑗∗ ), (16)
processing or random event occurs, i.e., 𝑠̂ ∶= 𝑓 (𝑠, 𝑎) where 𝑓 is a ⎜
𝑗=1 ⎝ 𝑖=1
⎟
⎠
deterministic function defined in (14) for each element of 𝑠 and 𝑎.
⎧𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), if 𝑥̂ 𝑗,𝑖 ≥ 1 and 𝑥∗𝑗,𝑖 = 0,
A post-decision state consists of the post-decision project states 𝑷̂ 𝑗 . A ⎪
post-decision project state consists of post-decision task states 𝑥̂ 𝑗,𝑖 of ⎪1 − 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), if 𝑥̂ 𝑗,𝑖 ≥ 2 and 𝑥∗𝑗,𝑖 = 𝑥̂ 𝑗,𝑖 − 1,
𝑃 (𝑥∗𝑗,𝑖 |𝑥̂ 𝑗,𝑖 ) = ⎨ (17)
each task 𝑖 ∈ 𝑗 and the same due date states 𝑑𝑗 of the pre-decision if 𝑥∗𝑗,𝑖 = 𝑥̂ 𝑗,𝑖 ≤ 0,
⎪1,
project state 𝑷 𝑗 : ⎪0, otherwise,
⎩
⎡ 𝑥̂ 1,1 , 𝑥̂ 1,2 , … 𝑥̂ 1,𝐼1 , 𝑑1 ⎤ ⎧𝜆𝑗 ,
⎢ 𝑥̂ , ⎥ if ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 = −1, 𝑥∗𝑗,𝑖 = 0,
𝑥̂ 2,2 , … 𝑥̂ 2,𝐼2 , 𝑑2 ⎪
𝑠̂ = ⎢ 2,1 ⎥. (13) ⎪1 − 𝜆𝑗 , if ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 = 𝑥∗𝑗,𝑖 = 0,
⎢ ⋮ ⋮ ⋱ ⋮ ⋮ ⎥ ′ ∗
𝑃 (𝑷 𝑗 |𝑷𝑗 ) = ⎨ (18)
⎢ 𝑥̂ , 𝑥̂ 𝐽 ,2 , … 𝑥̂ 𝐽 ,𝐼𝐽 , 𝑑𝐽 ⎥ ⎪1, if ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 = 𝑥∗𝑗,𝑖 ≠ 0,
⎣ 𝐽 ,1 ⎦
⎪0, otherwise.
A post-decision task state 𝑥̂ 𝑗,𝑖 is the updated state of a task from the ⎩
preceding pre-decision task state 𝑥𝑗,𝑖 . It is only the tasks that have Here in (17), the first line represents that the post-decision task state
been selected to start their processing (𝑎𝑗,𝑖 = 1) that change from the of task 𝑖 from project type 𝑗 allows for task completion and, with
pre-decision state −1 to the post-decision state 𝑡max
𝑗,𝑖 :
probability 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), the task will be completed by the pre-acceptance
{ state. The second line represents that, with probability 1 − 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), the
𝑡max
𝑗,𝑖 , if 𝑥𝑗,𝑖 = −1 and 𝑎𝑗,𝑖 = 1, task will not be completed by the pre-acceptance state. The third line
𝑥̂ 𝑗,𝑖 = (14)
𝑥𝑗,𝑖 , otherwise. represents that the post-decision task state of task 𝑖 from project type
457
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
In (18), the first line represents that, with 𝜆𝑗 probability, there will Model : 1 2 3 4 5 6 7 8 9 10 11
be an arrival of project type 𝑗 during the transition time and the new Feature 1 PT CRU TRU DR PT PT PT CRU TRU PT PT
type 𝑗 project will take the place of the previously completed or non- Feature 2 – – – – CRU TRU DR DR DR CRU TRU
existing type 𝑗 project. The second line represents that, with 1 − 𝜆𝑗 Feature 3 – – – – – – – – – DR DR
Here, 𝑡max
𝑗,𝑖 − 𝑥̂ 𝑗,𝑖 represents task 𝑖’s processed time. Since the task’s
𝑅𝑠,𝑎,𝑠′ processed time is zero when the action is taken (𝑥̂ 𝑗,𝑖 = 𝑡max
𝑗,𝑖 ), we use
𝐽 { }}
∑ ( { }) { the task’s processed time after the transition time ends (𝑡max
𝑗,𝑖 − 𝑥 ̂ 𝑗,𝑖 + 1)
= 𝑟𝑗 − 𝑤𝑗 𝛶 𝑑𝑗 = 0 𝛶 ∃𝑖 ∈ 𝑗 ∶ 𝑥̂ 𝑗,𝑖 > 0 and ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 ≤ 0 . to differentiate the effect of actions.
𝑗=1
The second feature is the current resource usage (CRU) of project 𝑗:
(19)
𝐼
Here, the first indicator is for late project completion that takes the 𝐾 ∑
∑ 𝑗
{ }
value 1 if a project completes later than its due date (i.e., the project’s 𝑏𝑘𝑗,𝑖 𝛶 𝑥̂ 𝑗,𝑖 > 0 . (23)
𝑘=1 𝑖=1
due date state 𝑑𝑗 = 0) and takes the value 0 otherwise. The second
indicator is for project completion that takes the value 1 if a project Here, we consider the post-decision state’s resource allocation because
completes (at least one task is in progress in the post-decision state and it gives the best information about resources used by an action. For
all project tasks are complete at the end of the period) and takes the example, for single-period tasks, the resource allocation information of
value 0 otherwise. action may disappear after one transition time. Thus, the post-decision
Our objective function seeks to find the policy that maximises the state’s resource allocation gives the most precise information about
expected total discounted long-run profit: resource usage after the action is taken.
[∞ ] The third feature is the total resource used (TRU) by project 𝑗 to
∑
𝑉 ∗ (𝑠1 ) = max E𝜋 𝛼 𝑡−1 𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 . (20) date:
𝜋∈𝛱
𝑡=1 𝐼
∑
𝐾 ∑𝑗
Here, 𝑠1 is a given initial state; 𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 is the immediate profit of state 𝐵𝑗 (𝑠) + 𝑏𝑘𝑗,𝑖 𝛶 {𝑥̂ 𝑗,𝑖 > 0}. (24)
transition from pre-decision states 𝑠𝑡 to 𝑠𝑡+1 under the action 𝑎 at the 𝑘=1 𝑖=1
time 𝑡; 𝛼 is a discount factor in the interval (0,1); 𝜋 is a policy from Here, 𝐵𝑗 (𝑠) is the TRU of the previous state with the selected action
the set of all stationary deterministic policies 𝛱 that prescribe in every ∑𝐼𝑗 𝑘
𝐵𝑗 (𝑠𝑡 ) = 𝑇 𝑅𝑈𝑗 (𝑠̂𝑡−1 ) and 𝑖=1 𝑏𝑗,𝑖 𝛶 {𝑥̂ 𝑗,𝑖 > 0} represents the amount
state 𝑠 an action from the action set (𝑠). of type 𝑘 resource that will be used to process project type 𝑗 under
the latest taken action. This feature reflects the resource usage of a
3. Approximate dynamic programming (ADP)
project and, indirectly, the time that the project has been processed
in the system.
In theory, the problem (20) can be solved using the Bellman equa-
The fourth feature is decision reward (DR), which is the reward
tion:
∑ per period of processing, under the assumption the project will be
𝑉 ∗ (𝑠) = max 𝑃 (𝑠′ |𝑠, 𝑎)[𝑅𝑠,𝑎,𝑠′ + 𝛼𝑉 ∗ (𝑠′ )] ∀𝑠 ∈ , (21) processed continuously until completion:
𝑎∈(𝑠) ′
𝑠 ∈
⎧ 𝑟𝑗
but in practice, it suffers from ‘‘the curse of dimensionality’’. ‘‘The ⎪ℎ if 𝑑𝑗 > ℎ𝑗
curse of dimensionality’’ means that the number of states and compu- 𝑅̄ 𝑗 = ⎨ 𝑟𝑗𝑗−𝑤𝑗 (25)
⎪ ℎ𝑗 , otherwise.
tational requirements expands exponentially with the number of state ⎩
variables (Sutton & Barto, 2018). Satic et al. (2022) investigated the
In (25), the remaining late project horizon ℎ𝑗 is given by:
limitations of DP and stated that a state space larger than their five
projects with two tasks problem is computationally intractable for their 𝐼𝑗 { }
∑
hardware. ℎ𝑗 = 𝑥̂ 𝑗,𝑖 𝛶 {𝑥̂ 𝑗,𝑖 ≠ −1} + 𝑡max
𝑗,𝑖 𝛶 {𝑥
̂ 𝑗,𝑖 = −1} , (26)
ADP is a modelling strategy to overcome ‘‘the curse of dimension- 𝑖=1
ality’’ problem of DP due to the use of the Bellman equation (Powell, i.e. the sum of remaining task durations assuming all remaining tasks
2009). In our ADP algorithm, we estimate the value function of the are completed on the maximum task duration. This is compared to the
Bellman Eq. (21) using a linear approximation model (27). A linear remaining duration to the due date, 𝑑𝑗 . When the remaining duration
approximation model is a regression model that fits the value function to the due date exceeds the remaining late project horizon, 𝑅̄ 𝑗 takes
of the Bellman equation by estimating a parameter vector 𝜃 (Powell, the value of the project completion reward divided by the remaining
2011, p 304). The linear approximation model only requires the current late project horizon. Otherwise, 𝑅̄ 𝑗 is reward minus tardiness cost
state and action information, and future state information and storing divided by the remaining late project horizon. Using this feature, the
the states becomes unnecessary. With the linear approximation model, duration to the project’s due date, a worst-case estimate of the remain-
the decision making is done in an online fashion; thus, ADP can be used ing processing duration, reward, and tardiness cost become decision
for larger size problems. elements.
458
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 2 Table 4
Counts of when the model is not statistically significantly different from the highest Counts of when model has higher* profit from than Model 3.
profit. Model : 1 2 4 5 6 7 8 9 10 11
Model : 1 2 3 4 5 6 7 8 9 10 11
2p2t 1 2 2 2 0 2 2 2 2 2
2p2t 9 10 8 10 10 0 10 10 10 9 8 2p3t 0 0 0 0 0 0 0 0 0 0
2p3t 10 2 10 6 2 9 10 2 9 2 9 3p2t 6 2 9 1 7 7 4 6 3 5
3p2t 3 1 2 4 2 3 6 3 3 0 5 2p10t 0 1 0 1 0 0 2 5 1 0
2p10t 0 3 5 1 3 0 1 5 7 4 0 5p5t 0 0 0 0 2 0 0 2 0 0
5p5t 1 0 7 1 1 3 1 0 4 1 0 5p10t 0 0 0 0 0 0 0 5 0 0
5p10t 1 1 5 1 1 0 1 1 6 1 1 6p5t 3 4 3 8 7 4 5 8 6 7
6p5t 0 3 0 1 2 4 0 1 0 0 3 10p10t2r 0 7 1 9 9 0 8 8 8 8
10p10t2r 0 0 0 0 5 4 0 1 3 1 2 5p30t4r 2 0 2 0 0 1 2 0 0 0
5p30t4r 7 1 7 7 1 0 1 1 1 2 0 2p30t2r 0 0 0 0 0 0 0 0 0 0
2p30t2r 0 1 10 0 1 0 0 1 1 1 0
SUM 12 16 17 21 25 14 23 36 20 22
SUM 31 22 54 31 28 23 30 25 44 21 28
Table 5
Table 3 Counts of when model has lower* profit from than Model 3.
Counts of when the model is not statistically significantly different from the lowest Model : 1 2 4 5 6 7 8 9 10 11
profit.
2p2t 0 0 3 0 10 0 0 0 1 4
Model : 1 2 3 4 5 6 7 8 9 10 11
2p3t 0 9 6 9 1 0 9 1 9 1
2p2t 0 0 0 0 0 10 0 0 0 0 0 3p2t 9 3 9 5 10 8 2 3 3 10
2p3t 1 9 1 1 9 1 1 9 2 9 2 2p10t 2 7 1 5 2 2 5 0 6 5
3p2t 2 4 0 0 1 2 2 3 0 3 2 5p5t 8 10 9 9 8 9 10 6 9 9
2p10t 0 0 0 0 0 1 7 0 0 0 3 5p10t 10 9 9 9 10 10 9 5 9 9
5p5t 0 0 0 0 0 1 0 2 0 0 8 6p5t 6 6 5 2 3 5 5 2 4 3
5p10t 5 0 0 0 0 1 4 0 1 0 0 10p10t2r 10 2 9 1 1 10 2 2 2 2
6p5t 1 2 0 1 0 2 1 2 1 1 0 5p30t4r 3 7 4 9 10 7 7 9 7 10
10p10t2r 9 0 0 0 0 1 2 0 0 1 0 2p30t2r 10 9 10 9 10 10 9 9 9 10
5p30t4r 0 0 0 0 0 5 2 1 2 0 2
SUM 58 62 65 58 65 61 58 37 59 63
2p30t2r 1 0 0 1 0 1 8 0 0 0 1
SUM 19 15 1 3 10 25 27 17 6 14 18
2
* represents a statistical significance level of 0.05. ** represents a statis- 0.001. NS represents the absence of a statistically significant difference at a
tical significance level of 0.01. *** represents a statistical significance level of level of 0.05.
459
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
3
We also considered 𝜏ℎ𝑎𝑟𝑚𝑜𝑛𝑖𝑐 = 10 from Powell (2011, p 451) and KESTEN’s ORBA is an exact brute force algorithm that sequentially solves a
stepsize rule (Powell, 2011, p 436), but we received better results with the static RCMPSP from state 𝑠𝑡 to optimality. The static problem results by
stated settings. ignoring the possibility of future project arrivals. New project arrivals
460
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Algorithm 3: : Value Iteration. highest ranked TSO is called the best TSO. After the first generation is
generated, GA begins iterations.
procedure state value iteration procedure
At each iteration, GA creates an empty set of TSOs and fills this new
𝛽 = 0.001 ⊳ 𝛽 is the stopping parameter
set with TSOs to the desired population size amount using elitist selec-
∀𝑠 ∈ ∶ 𝑉 𝑜𝑙𝑑 (𝑠) = 0 ⊳ initial state values
tion, crossover and mutation operators. The elitist selection operator
repeat
copies the desired amount of highest ranked TSOs from the previous
for ∀𝑠 ∈ ∶ do
∑ generation of TSOs to the new empty set of TSOs (new generation).
𝑉 (𝑠) = max 𝑠′ ∈ 𝑃 (𝑠′ |𝑠, 𝑎)(𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 + 𝛼𝑉 𝑜𝑙𝑑 (𝑠′ )) ⊳ value
𝑎∈𝐴(𝑠) Crossover and mutation operators fill the rest of the new generation.
function The crossover operator randomly selects two TSOs from the previ-
end for ous generation and randomly selects a task inside of the first TSO. The
𝑊𝑚𝑎𝑥 = max |𝑉 (𝑠) − 𝑉 𝑜𝑙𝑑 (𝑠)| ⊳ maximum value change crossover operator copies tasks, from the earliest task to be processed
𝑠∈
Update ∀𝑠 ∈ ∶ 𝑉 𝑜𝑙𝑑 (𝑠)
= 𝑉 (𝑠) up to the randomly selected task of the first (selected) TSO, to (make)
until 𝑊𝑚𝑎𝑥 ≤ 𝛽(1 − 𝛼)∕(2𝛼) a new TSO without changing the order of these tasks. Then, the
end procedure crossover operator re-orders the remaining tasks of the first (selected)
TSO according to order of these tasks in the second (selected) TSO,
then adds them to the new TSO (to after the randomly selected task).
The new TSO is always a precedence-feasible TSO since it is created
disrupt the current schedule and a new schedule incorporating these according to the order of tasks in both selected precedence-feasible
is required. In such a manner, ORBA represents an optimal reactive TSOs.
scheduling algorithm. Due to the computational requirements of brute The new TSO may be adjusted by the mutation operator with a
force algorithms, ORBA runs in factorial time and only small-size desired probability. Under the mutation operation, a task is selected
dynamic and stochastic RCMPSP problems can be solved within a at random and the location of this task in the TSO is randomly re-
reasonable time. Thus, we limit our test problems with ORBA to a assigned. The new location cannot be later than the task’s previous
maximum of 10 tasks. The ORBA used here extends that in Satic et al. order and cannot be sooner than its latest to be processed predecessor
(2022) by allowing for multiple resource types. task. Thus the mutation operator also ensures that the newly generated
Specifically, for a given pre-decision state 𝑠𝑡 , ORBA calculates the TSO is a precedence-feasible TSOs. Then the new TSO is added to
profits and makespans of all precedence-feasible task scheduling orders the new generation. When the size of the new generation reaches
(TSOs) then selects a TSO of maximal profit. A precedence-feasible TSO the population size amount, the TSOs are then ranked as in the first
is a permutation 𝜎 of tasks such that, for any 𝑚 < 𝑛 with 𝜎(𝑚) = (𝑗, 𝑖) and generation. GA iterates the generations until the desired number of
𝜎(𝑛) = (𝑗, 𝑘), 𝑘 ∉ 𝑗,𝑖 . In the case of ties, the candidate schedule with generations is reached. The best TSO of the final generation is used
minimal makespan is selected, and the schedule of the smallest project for decision making.
and task indices are selected from any remaining candidate schedules. The TSO is converted to an action in the same way as for ORBA.
The generated TSO is converted to non-idling action 𝑎 for given Similar to ORBA, GA’s TSO can be used for future pre-decision states
pre-decision state 𝑠𝑡 using a serial schedule-generation scheme (SGS). as long as a new project arrival does not disturb the system.
A non-idling action is one that will always allocate resource to tasks Since the reactive scheduling method reruns GA for each time an ar-
when it is possible to do so. In an SGS, if there are enough free rival disturbs the processing plan, the computational time requirement
resources to process the next task in the TSO, its action becomes one increases with the problem size.
(𝑎𝑗,𝑖 = 1), and its resource usage is subtracted from the free resources.
The process then repeats for the remaining tasks in order of the TSO 4.4. Rule-based algorithm (RBA)
until either no tasks can begin processing or there is insufficient free
resources to process any of the remaining tasks. The TSO is followed Rule-Based Algorithm (RBA) is a priority-based heuristic algorithm
in future periods as long as no new project arrives. If a new project which uses the longest task first priority rule. We considered RBA in
arrival disturbs the system, the current TSO becomes invalid, and ORBA benchmarking to show the performance of a simple heuristic algorithm.
generates a new TSO. Due to the simplicity of the algorithm, it runs very fast for all problem
sizes.
For a given pre-decision state 𝑠𝑡 , RBA creates a precedence-feasible
4.3. Genetic algorithm (GA)
TSO where the tasks with the longest task processing durations have
priority over other tasks. Then the TSO is converted to an action as
GA is a heuristic algorithm which searches the solution space using same as in ORBA.
a set of solutions (population). GA then improves the population many
times (generation) using bio-inspired operators such as crossover and 5. Computational results
mutation to find better solutions. We used the genetic algorithm to
benchmark with our ADP algorithm since GA is the most popular We simulate the dynamic project scheduling environment with
method for RCPSP family. GA is applied to dynamic problems using a random new project arrivals and stochastic task durations, and we
reactive scheduling method. We used GA from Satic et al. (2022) and, compare the expected total discounted long-run profit performance
in this paper, we extended it to multiple resource types. of DP, ADP, ORBA, GA and RBA. Algorithm 4 shows the simulation
For a given pre-decision state 𝑠𝑡 , GA generates the desired pop- procedure we used in our comparisons. The statistical significance of
ulation size amount of precedence-feasible TSOs, which are random ADP (Model 9) against other methods are shown in the tables at three
permutations of the pending for processing tasks (𝑥𝑗,𝑖 = −1). The levels (0.001, 0.01, 0.05). The experiments are coded in JuliaPro 1.3.1.2
algorithm evaluates profits and makespans of the TSOs and then ranks on a desktop computer with Intel i5-11400F CPU with 2.60 GHz clock
them using these values. These were evaluated via simulation under speed and 32 GB of RAM.
an assumption that there will be no new project arrivals and all tasks We used 100 problem scenarios in our comparison which are a
complete at 𝑡max . TSOs with higher profits get a higher rank. In the combination of 10 project settings and 10 project arrival probabilities.
case of ties, TSOs with smaller makespans get a higher rank. If the These arrival probabilities 𝜆𝑗 are 0.01 and from 0.1 to 0.9 with in-
tie continues, TSOs ranked according to their creation time (earliest crements of 0.1. Since we consider a dynamic environment 𝜆𝑗 = 0 is
to latest). This first set of TSOs is called the first generation, and the not used in this comparison instead 𝜆𝑗 = 0.01 is used to represent the
461
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Algorithm 4: : Simulation. some tasks require the allocation of all resources. The resource strength
procedure Profit simulation
of the second problem is 0.50. Here, the initial task of each project can
for 𝑠𝑖𝑚 = 1 to 𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 do ⊳ for each simulation be processed in parallel with any task of the other projects. Using these
𝑉̃𝑠𝑖𝑚 = 0, 𝑠1 = 𝟎 ⊳ 𝑉̃𝑠𝑖𝑚 is cumulative simulation profit, 𝑠1 the initial small-size problems we are able to compare ADP’s performance with
empty pre-decision state optimal policies of DP, scheduling orders of ORBA, GA and RBA.
for 𝑡 = 1 to 𝑃 𝑒𝑟𝑖𝑜𝑑 do ⊳ for each simulation period Tables 6–8 illustrate the expected total discounted long-run profits
𝑠𝑡+1 = 𝑠𝑀 (𝑠𝑡 , 𝑎, 𝑐𝑡 ) ⊳ 𝑎 ∈ 𝜋(𝑠𝑡 ), 𝜋(𝑠𝑡 ) is the policy of selected (which are averages of simulations) of policy generation methods (ver-
solution method tical) at different arrival probabilities (horizontal). The colour of cells
𝑉̃𝑠𝑖𝑚 = 𝑉̃𝑠𝑖𝑚 + 𝛼 𝑡−1 𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 shows the statistical significance of compared algorithms against ADP.
end for The simulation results of two project types, two tasks and one
end for
1 ∑𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 ̃ resource type problems are shown in Table 6. ADP produces higher***
𝑉̃ = 𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑠𝑖𝑚=1
𝑉𝑠𝑖𝑚
̃ profits than ORBA, GA and RBA, except for 𝜆𝑗 = 0.01. Recall that GA
return 𝑉
end procedure is evaluated on fewer simulations. Hence, at 𝜆𝑗 = 0.01, RBA’s profit
is lower*** than ADP’s while GA’s profit is NS with a similar profit
achieved.
The simulation results of two project types, three tasks and one
nearly-static case. Also, 𝜆𝑗 = 1 is not used because it causes a non- resource type problems are shown in Table 7. Algorithms’ profits are
ergodic MDP where some feasible states cannot be reachable from any NS different from each other at 𝜆𝑗 = 0.01. From 𝜆𝑗 = 0.1 to 𝜆𝑗 = 0.8,
states (for example a state where all projects are completed but no new ADP has higher** profits than ORBA, GA and RBA.
project has arrived). Of the 100 scenarios, 30 represent smaller sized The simulation results of three project types, two tasks and one
problems on which we are able to compare all solution algorithms. For resource type problems are shown at Table 8. ADP has higher*** profits
the 70 larger sized problems it is only possible to evaluate ADP, GA than RBA except for 𝜆𝑗 = 0.01. At 𝜆𝑗 = 0.1, 0.2, 0.8, 0.9, ADP has higher*
and RBA. profits than ORBA, GA and RBA.
In Algorithm 4, we run 100 simulations for 1000 simulation periods We see the same result as in Satic et al. (2022) in that the reactive
and a discount rate of 𝛼 = 0.999. (For the small instances we use scheduling methods ORBA and GA have close to optimal profits at
10 000 simulations for DP, ADP, RBA and ORBA.) If a period repre- 𝜆𝑗 = 0.01 where the results are NS different from each other. However,
sents a day, this period would represent approximately three years of their results usually diverge from optimum as 𝜆𝑗 increases.
processing time. Simulations start from the empty pre-decision state In summary, our comparison of project settings from Satic et al.
(𝑠1 = 𝟎). In each simulation period an action is generated with the (2022) shows that ADP cannot match the optimal policy of DP. DP’s
policy being investigated (𝜋). Then the following pre-decision state is advantage over the other policies is that it is able to better identify
generated given the action taken and the transition function. The profit opportunities to defer the use of resources to start processing a project
is generated, recorded and included within 𝑉̃𝑠𝑖𝑚 . The discounted profits task to more profitable projects in later periods. ADP generated higher*
of completed projects during the transition time are added to 𝑉̃𝑠𝑖𝑚 . After profits than ORBA, GA and RBA respectively in 22, 21 and 27 of 30
the end of the simulations, the average discounted long-run profit 𝑉̃ of problem scenarios. ADP generated NS different profits than ORBA, GA
the investigated solution method is calculated. and RBA respectively in 2, 3 and 2 of 30 problem scenarios. ADP
ADP (Algorithm 1) is trained for 100 iterations each having 100 generated lower* profits than ORBA, GA and RBA respectively in 6,
simulations with 1000 periods. GA is trained for 100 generations, each 6 and 1 of 30 problem scenarios. Thus, we showed that our linear ADP
with 100 solutions. The elitest selection operator transfers the best 10% model performs better than ORBA, GA and RBA in up to 90% of the
of solutions to the next generation. A new TSO created by the crossover scenarios from Satic et al. (2022).
operator is handled by the mutation operator with a 50% chance. We
note that, while ADP requires training once prior to the simulations, 5.2. Test problem generation
GA requires multiple training occurrences during the simulations. GA
is required to generate a new schedule each time a new project arrives. The problems of Satic et al. (2022) were the only dynamic and
In this study, we assume that the number of tasks of different project stochastic RCMPSPs in the literature that have a reward after comple-
types is equal. Project tasks have a completion duration range 𝑡max − tion, a tardiness cost after a given due date, arrival probability of new
𝑡min + 1 = 3 and have uniformly distributed completion probabilities. projects during a transition time, randomly early, normal and late task
The shortest maximum task duration in our study is 𝑡max = 2, for completions. However, Satic et al. (2022) only considered small-size
such tasks the completion duration range is 2 and have an arbitrary problems where the project network is sequential (serial, 𝑂𝑆𝑗 = 1).
completion probability distribution. The completion probabilities used Thus we generate larger size test problems using ProGen/Max and
in this computational study are: MPSPLIB problems.
ProGen/Max is an RCMPSP generation software which is developed
⎧ 1 , if 𝑡max
𝑗,𝑖 ≥ 3, 1 ≤ 𝑥
̂ 𝑗,𝑖 ≤ 3 by Schwindt (1998) which extends its predecessor ProGen (Kolisch
⎪ 𝑥̂ 𝑗,𝑖
⎪1, if 𝑡max et al., 1995) with an option to consider the minimum and maximum
𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ) = ⎨ 3 𝑗,𝑖 = 2, 𝑥
̂ 𝑗,𝑖 = 2
(31) time lags between the start of activities.
⎪1, if 𝑡max
𝑗,𝑖 = 2, 𝑥
̂ 𝑗,𝑖 = 1
⎪ We used ProGen/Max to generate RCPSPs with different activity-on-
⎩0, otherwise. node networks, order strength (denoted 𝑂𝑆𝑗 ), task durations, resource
usage, and resource availability. We combined these RCPSPs prob-
5.1. Optimality gaps for small instances lems and added stochastic task completion, project arrival probability,
project completion reward, late completion cost, and due date. We add
We used project settings from Satic et al. (2022). The problems’ reasonable completion rewards and tardiness costs to each project. We
data is available in their paper and at https://github.com/ugursatic/ used the generated task durations as expected task durations 𝑡𝑗,𝑖 and
DSRCMPSP. These problems are arbitrarily created to be small and added one minimal possible (𝑡min 𝑗,𝑖 = 𝑡𝑗,𝑖 − 1) and one maximal possible
solvable by DP. In these problems, order strength, resource factor (𝑡max
𝑗,𝑖 = 𝑡𝑗,𝑖 + 1) duration options. We tested all problems with ten dif-
and the number of resources are set to 1.00. In other words, project ferent 𝜆𝑗 options which are 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. We
networks are serial, and all tasks use the same amount of resources. The generated the due date of the project via (32), where 𝜌 is an arbitrary
resource strength of the first and third problems is 0.00, which means factor, which we set to 1.5. We adjusted the resource availability using
462
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 6
Two project types and two tasks problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DP 77 503 759 907 1000 1063 1109 1143 1169 1190
ADP 75 481 710 835 911 960 995 1020 1038 1051
ORBA 73 466 669 768 817 840 846 844 837 829
GA 72 452 636 708 745 752 754 735 735 724
RBA 72 413 529 551 542 525 507 491 480 473
Table 7
Two project and three tasks problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DP 115 590 798 905 970 1013 1044 1066 1083 1097
ADP 115 584 786 889 949 988 1015 1034 1048 983
ORBA 115 575 768 862 915 947 966 979 988 995
GA 114 573 772 857 911 952 965 987 984 999
RBA 115 573 762 854 905 937 954 968 975 983
Table 8
Three project and two tasks problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DP 199 878 1044 1122 1197 1263 1325 1378 1427 1468
ADP 181 746 857 936 1012 1089 1163 1229 1342 1373
ORBA 182 728 854 950 1042 1129 1207 1273 1325 1366
GA 186 727 848 947 1040 1128 1209 1277 1327 1367
RBA 183 736 815 862 921 986 1050 1114 1173 1228
(33) because the combination of resource availabilities of multiple the stochastic task completion and arrival probabilities as same as
single project problems makes the multi-project problem resource-rich: ProGen/Max generated problems. We generated the due date of the
project as:
( ∑𝐼 ) ∑𝐼𝑗
⎛ ⎧ ∑𝐾 𝑗
(𝑡 𝑏𝑘 ) ⎫ ∑𝐾 (𝑡 𝑏𝑘 )
𝑖=1 𝑗,𝑖 𝑗,𝑖
𝑖=1 𝑗,𝑖 𝑗,𝑖
⎜ ⎪ 𝑘=1 𝐵𝑘 ⎪ 𝑘=1 𝐵𝑘
⎜ ⎪ ⎪ 𝐹𝑗 ≈ 𝐵𝑒𝑠𝑡𝑗 + 𝐽 . (34)
𝐹𝑗 ≈ ⎜(1 − 𝑂𝑆𝑗 ) max ⎨𝐽 , max{𝑡𝑗,1 , 𝑡𝑗,2 , … , 𝑡𝑗,𝐼𝑗 }⎬ + 𝐾
⎜ ⎪ 𝐾 ⎪
⎜ ⎪ ⎪ 5.3. Performance analysis for larger instances
⎝ ⎩ ⎭
( ∑𝐼 )
⎧ ∑𝐾 𝑗
(𝑡 𝑏 ) ⎫⎞
𝑘
𝑖=1 𝑗,𝑖 𝑗,𝑖
⎪ 𝐼𝑗 ⎪⎟ We created five project settings with ProGen/Max and two project
𝑘=1
⎪∑
𝐵𝑘
⎪⎟ settings with MPSPLIB problems. The parameters of these problems are
(𝑂𝑆𝑗 ) max ⎨ 𝑡𝑗,𝑖 , 𝐽 ⎬⎟ 𝜌,
⎪ 𝑖=1 𝐾 ⎪⎟ shown in Appendix A. Our ProGen/Max problems are not resource-rich,
⎪ ⎪⎟ and their resource strengths are between 0.007 and 0.176. Maximum
⎩ ⎭⎠
task durations of these problems range between 2 and 21 according to
(32) a uniform distribution. More detailed information (e.g., task duration,
project network, resource usages) about these problems and more
∑𝐽 ( ( )) detailed test results are available at https://github.com/ugursatic/
50 − 4𝐽
𝐵𝑘 ≈ 𝐵𝑘𝑗 . (33) DSRCMPSP. The size of these problems exceeds the computational
𝑗=1
100
limits of DP and ORBA on our hardware, so we only compared ADP,
The MPSPLIB (http://www.mpsplib.com/) is a RCMPSP library GA and RBA.
which contains the problem set of Homberger (2007). These RCMPSP In a number of the larger scale problems, there can be relatively
problems are made by combining single project problems of PSPLIB small increases in profit as the arrival probability increases. We have
(http://www.om-db.wi.tum.de/psplib) which is RCPSP library (Kolisch included the full range of results for completeness and to appraise
& Sprecher, 1996). PSPLIB problems are generated with ProGen. potential differences in performance at different arrival probabilities.
The MPSPLIB contains 140 instances that differ by project type A broader discussion on the effect of arrival probabilities is provided
number, project number, task number, global resource type number in Appendix C.
and arrival times. Global resources are shared among all projects, and In five project types, five tasks and four resource types problems,
local resources are only used for a single project. Compared to our shorter projects are more profitable than longer ones. Thus, RBA with
ProGen/Max generated problems, MPSPLIB problems have predefined longer tasks first rule is disadvantaged, and we expect that ADP out-
tardiness costs and due dates. However, these due dates (𝐵𝑒𝑠𝑡𝑗 ) are the performs RBA. Table 9 shows that ADP produces better*** results than
shortest completion time found for single project problems, which we GA and RBA in all 𝜆𝑗 values.
need to modify to use in the dynamic multi-project setting. In two project types, ten tasks and two resource types problems,
From the MPSPLIB, we only considered 30 tasks per project prob- project type 1 is more profitable based on its reward/project horizon4
lems for algorithm benchmarking. We combined the given global and ratio. Also, tasks of project type 1 are longer than project type 2. Thus
local resources for each resource type. Then we reduced the resource
amount using (33). We use the predefined tardiness costs and twice
4
the tardiness cost as completion rewards to each project. We use The project horizon is the sum of its maximal task durations.
463
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 9
Five project types, five tasks and four resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 937 1492 1453 1495 1475 1494 1369 1473 1295 1485
GA 869 1128 1143 1141 1158 1146 1143 1148 1148 1145
RBA 698 601 576 562 558 549 527 538 517 507
Table 10
Two project types, ten tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 292 424 490 502 519 754 756 759 759 760
GA 304 448 454 455 457 457 457 458 459 455
RBA 290 420 423 427 437 445 453 454 464 467
Table 11
Five project types, ten tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 530 2463 2489 2464 2442 2508 2436 2457 2513 2509
GA 1757 2324 2330 2337 2336 2339 2346 2341 2339 2342
RBA 1769 2341 2356 2377 2376 2378 2389 2377 2392 2389
Table 12
Six project types, five tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 1113 1632 824 1408 1660 1443 1740 1538 1310 1560
GA 1226 1308 1313 1312 1307 1301 1301 1299 1318 1311
RBA 1210 1145 1107 1106 1127 1084 1100 1098 1095 1086
this problem is advantageous for the longest task first priority rule of shows that ADP produces higher*** results than RBA in 𝜆𝑗 = 0.1 and
RBA. However, Table 10 shows that ADP usually produces higher*** 𝜆𝑗 = 0.5.
profits than alternatives except for low project arrival rates 𝜆𝑗 = 0.01 In summary, our comparison of larger size problems shows that
and 𝜆𝑗 = 0.1. ADP’s profit was higher* than GA in 53 of 70 problem scenarios; GA’s
profit was higher* for 13 problem scenarios, and there was NS dif-
In five project types, ten tasks and two resource types problems,
ference in the remaining 4 problem scenarios. ADP generated higher*
the tardiness costs are set to approximately 10% of the project rewards.
profits than RBA in 55 problem scenarios; RBA’s profit was higher*
Table 11 shows that ADP’s profits are higher*** than GA and RBA from
than ADP only in 12 problem scenarios, and results between ADP and
𝜆𝑗 = 0.1 to 𝜆𝑗 = 0.9.
RBA are NS different in 3 problem scenarios. These results show that
Six project types, five tasks and two resource types problems consist the overall performance of ADP is better* in the majority of larger-size
of six copies of the same project with different reward and tardiness problems than GA and RBA.
cost combinations. In Table 12 ADP produces higher*** profits for most
project arrival probabilities except for 𝜆𝑗 = 0.01 and 𝜆𝑗 = 0.2. ADP’s and 6. Conclusion
GA’s profits are NS different at 𝜆𝑗 = 0.8.
The ten project types, ten tasks and two resource types problems are Paper summary. In this paper, we modelled the dynamic and
the largest problems we generate with ProGen/Max. In this problem, stochastic RCMPSP as an infinite-horizon discrete-time MDP where
we arbitrarily assigned rewards and tardiness costs. Table 13 shows projects have identical arrival probabilities at each transition time, and
that ADP has higher*** profits than GA and RBA at all project arrival tasks have random durations. Our objective function maximises the
probabilities. expected total discounted long-run profit. We used a linear approxi-
mation model to design a practical scheduling policy and showed that
Two project types, thirty tasks and four resource types problem is
it performs near-optimally in small problems and compares favourably
the smallest MPSPLIB problem in our comparison. The problem name
to existing heuristics in large problems.
is mp_j30_a2_nr2_set in MPSPLIB, and it has one global and three local
The motivation of this study is to create a more comprehensive
resources. The resource strength of type one resource is 0.207. But
project scheduling model by considering the uncertainties of stochastic
other resource strength values are 0.01, 0 and 0. These values represent
task durations, random new project arrivals, multiple types of resource
that at least one large task cannot be processed in parallel with different
usages and bigger and complex project networks. For this purpose, we
tasks and might create a bottleneck. Table 14 shows that ADP leads to
suggest a linear approximation model which generates decisions based
higher*** profits than GA at most arrival probabilities except for 𝜆𝑗 =
on resource consumption and decision rewards. Our linear approxima-
0.1. ADP’s profits are higher*** than RBA at most arrival probabilities
tion model generated the best profits after the exact methods in our
except for 𝜆𝑗 = 0.1 and 𝜆𝑗 = 0.3.
comparisons and contributed to the literature by extending the work
Five project types, thirty tasks and four resource types problem is of Satic et al. (2022) which only considered small-sized projects with
the largest problem we have used in our comparison. The problem sequential networks and single resource type.
name is mp_j30_a5_nr4_set in MPSPLIB. The original problem has three This study provides an efficient ADP algorithm for dynamic and
global and one local resource type. We used the given due dates in stochastic RCMPSP, which generates profits that are significantly higher
the problem without changing them. The resource strengths of resource than or equal to the profits of ORBA, GA and RBA in respectively
types one, two and three are high (0.449–0.687) and the order strengths 80%, 81% and 87% of our comparisons. DP produces better* results
of these projects are low. In other words, many tasks can be processed than ADP for small-size problems. However, it suffers from the curse of
together, and free-resource availability may allow it easily. Table 15 dimensionality and is not suitable for larger problems. ADP is a viable
464
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 13
Ten project types, ten tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 650 592 576 960 928 865 889 894 932 884
GA 522 536 546 544 539 542 546 540 540 548
RBA 552 550 555 554 553 556 549 552 554 555
Table 14
Two project types, thirty tasks and four resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 176 184 221 206 224 212 213 214 214 214
GA 168 197 195 192 193 192 192 190 189 193
RBA 161 200 208 209 207 209 210 209 209 210
Table 15
Five project types, thirty tasks and four resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 721 1515 864 1143 904 1748 913 1104 1591 1595
GA 719 1546 1665 1706 1727 1740 1741 1748 1755 1754
RBA 717 1493 1573 1589 1609 1614 1617 1627 1623 1627
Table A.18
solution method for more practical, larger scale problems that cannot Three project types and two tasks problem.
Also, this study gives insights to project managers to determine Number of project types 3
Project type no 1 2 3
more suitable methods for their environment by providing a perfor-
Completion reward 8 5 20
mance comparison of ADP, DP, ORBA, GA and RBA methods in various Due date 10 8 10
project settings and various arrival probabilities. We suggest using Tardiness cost 5 3 19
DP for problems that are within the computational limitations of DP. Order strength 1.00 1.00 1.00
However, we suggest using ADP methods for larger problems. Resource attributes
Future research direction. In real life, there are more dynamic and Resource factor 1.00
stochastic elements in dynamic project scheduling environments than Number of resource types 1
those (stochastic task durations, uncertain new project arrivals, finish to Resource type no 1
Resource amounts 3
start project networks, multiple resource types and usage) considered in Resource strength 0.00
this paper. Future work might consider other elements of the dynamic
project scheduling environment, such as stochastic resource availability
or multiple modes of task processing.
Appendix B. Modelling assumptions and model generality
Acknowledgements Models are always simplifications of reality, but may still be useful
for making decisions if they capture the key problem features and are
We acknowledge Mahshid Salemi Parizi for making their code avail- solvable within a practical time amount.
able. The first author acknowledges the Ministry of National Education Project due dates and task durations are given in whole numbers
of The Republic of Turkey for providing a PhD scholarship. We thank of periods, and available resources and task resource usages are given
the two anonymous reviewers for their careful and detailed reviews of in non-negative whole units. If a task would in practice last for a
the paper. fractional number of periods, we round it up to an integer assuming
that the resource employed on that task cannot be re-allocated to a new
task until the beginning of the next period (for the remainder of the
Appendix A. Problem attributes
period during which a task is completed, the resource may be allowed
to take a vacation or be allocated to other tasks which are shorter
See Tables A.16–A.25. than one period or are preemptive; these are however not included in
465
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table A.19
Five project types, five tasks and four resource types problem.

Project attributes
Number of project types   5
Project type no           1     2     3     4     5
Completion reward         63    50    19    48    46
Due date                  54    105   232   96    139
Tardiness cost            21    25    15    25    25
Order strength            0.80  0.70  0.50  0.50  0.40

Resource attributes
Resource factor           0.80
Number of resource types  4
Resource type no          1     2     3     4
Resource amounts          18    19    15    18
Resource strength         0.12  0.10  0.08  0.13

Table A.20
Two project types, ten tasks and two resource types problem.

Project attributes
Number of project types   2
Project type no           1     2
Completion reward         10    72
Due date                  125   129
Tardiness cost            5     15
Order strength            0.51  0.51

Resource attributes
Resource factor           0.75
Number of resource types  2
Resource type no          1     2
Resource amounts          12    16
Resource strength         0.10  0.18

Table A.21
Five project types, ten tasks and two resource types problem.

Project attributes
Number of project types   5
Project type no           1     2     3     4     5
Completion reward         99    83    85    98    92
Due date                  131   138   166   135   180
Tardiness cost            10    8     9     10    9
Order strength            0.36  0.36  0.36  0.69  0.67

Resource attributes
Resource factor           0.70
Number of resource types  2
Resource type no          1     2
Resource amounts          19    20
Resource strength         0.11  0.13

Table A.22
Six project types, five tasks and two resource types problem.

Project attributes
Number of project types   6
Project type no           1     2     3     4     5     6
Completion reward         99    75    50    30    10    80
Due date                  177   177   177   177   177   177
Tardiness cost            25    25    25    25    5     60
Order strength            0.50  0.50  0.50  0.50  0.50  0.50

Resource attributes
Resource factor           1.00
Number of resource types  2
Resource type no          1     2
Resource amounts          24    18
Resource strength         0.15  0.11
Table A.23
Ten project types, ten tasks and two resource types problem.

Project attributes
Number of project types   10
Project type no           1     2     3     4     5     6     7     8     9     10
Completion reward         36    37    93    69    38    62    12    80    52    62
Due date                  429   354   654   771   447   288   448   1124  288   708
Tardiness cost            29    14    70    24    18    21    2     50    40    29
Order strength            0.36  0.58  0.89  1.00  0.51  0.53  0.51  0.53  0.69  0.67

Resource attributes
Resource factor           0.70
Number of resource types  2
Resource type no          1     2
Resource amounts          11    11
Resource strength         0.01  0.01
Table A.24
Two project types, thirty tasks and four resource types problem.

Project attributes
Number of project types   2
Project type no           1     2
Completion reward         40    19
Due date                  137   74
Tardiness cost            20    9
Order strength            0.33  0.57

Resource attributes
Resource factor           0.50
Number of resource types  4
Resource type no          1     2     3     4
Resource amounts          21    11    10    10
Resource strength         0.20  0.01  0.00  0.00

Table A.25
Five project types, thirty tasks and four resource types problem.

Project attributes
Number of project types   5
Project type no           1     2     3     4     5
Completion reward         48    2     22    32    56
Due date                  62    82    72    89    72
Tardiness cost            24    1     11    16    28
Order strength            0.45  0.44  0.39  0.50  0.44

Resource attributes
Resource factor           0.25
Number of resource types  4
Resource type no          1     2     3     4
Resource amounts          58    78    102   19
Resource strength         0.45  0.65  0.69  0.13

Appendix B. Modelling assumptions and model generality

Models are always simplifications of reality, but may still be useful for making decisions if they capture the key problem features and are solvable within a practical amount of time.

Project due dates and task durations are given in whole numbers of periods, and available resources and task resource usages are given in non-negative whole units. If a task would in practice last for a fractional number of periods, we round it up to an integer, assuming that the resource employed on that task cannot be re-allocated to a new task until the beginning of the next period (for the remainder of the period during which a task is completed, the resource may be allowed to take a vacation or be allocated to other tasks which are shorter than one period or are preemptive; these are however not included in the considered system). Analogously, if a task would in practice use a fractional number of resource units, we round it up to an integer, assuming that each resource unit employed on that task cannot be split between several tasks (the remainder of the fractional resource unit may be allowed to take a vacation or be allocated to other tasks which require less than one resource unit; these are however not included in the considered system).
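To make this rounding convention concrete, the following minimal sketch (illustrative only; the function name and data layout are our own, not part of the paper's implementation) discretises a task in the way just described:

    import math

    def discretise_task(duration, resource_usages):
        # Round a fractional duration (in periods) and the per-type resource
        # usages up to whole units: a resource stays committed until the start
        # of the next period, and a resource unit cannot be split across tasks.
        return math.ceil(duration), [math.ceil(u) for u in resource_usages]

    # A task of 2.3 periods using 1.5 units of resource 1 and 0.2 units of
    # resource 2 is modelled as 3 periods using 2 and 1 whole units.
    print(discretise_task(2.3, [1.5, 0.2]))  # -> (3, [2, 1])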
Geometrically distributed inter-arrival times are the discrete-time analogue of the exponentially distributed inter-arrival times commonly deployed to model random arrivals in continuous-time settings. Thanks to the memoryless property of this distribution, the probability of a project arrival is constant and independent of the system state. Other discrete probability distributions can be used with some additional effort. For example, assume that the project arrival probability distribution is defined by the number of periods since the last project arrival of the same type. In that case, by expanding the state to include this information, we can evaluate the conditional probability of an arrival in the next period given the number of periods since the last arrival.
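To illustrate (in our notation, not the paper's): let T denote the number of periods until the next arrival of a given project type, with probability mass function f and cumulative distribution function F. For a general discrete distribution, the one-period arrival probability after n periods without an arrival is

    \Pr(T = n+1 \mid T > n) \;=\; \frac{f(n+1)}{1 - F(n)}, \qquad n = 0, 1, 2, \ldots

whereas for T \sim \mathrm{Geometric}(p) this conditional probability equals p for every n, which is why no additional state information is needed under the geometric model.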
The technical assumptions of our model, such as the existence of project types and the limit of one project of each type in the system, are features that provide structure which is then exploited to find a solution efficiently. The model is however still useful even in situations that do not strictly fit these technical assumptions, as we describe below. The model could be extended in a straightforward manner to allow for more than one project of each type, at the expense of more complicated notation (requiring additional indexing by project number) and dynamics, which would potentially still lead to an efficient algorithm; this modification was omitted in this paper as we believe the current model captures the key problem features.

Our model (and algorithm) can be directly used to allow for any fixed number of projects instead of just one project of each type. For instance, if a company considers two real project types and is willing to accept in parallel up to three projects of the first real project type and up to five projects of the second, then our model can be employed with eight model project types (each limited to one project in the system), by defining each of the first three model project types identically to the first real project type and each of the remaining five identically to the second. In other words, our model treats parallel projects of the same real type as distinct model types. The arrival probability of each model project type can be approximated by dividing the arrival probability of the real project type by the number of projects acceptable in parallel.

In fact, one of our examples uses this modelling trick. In our ''Six project types, five tasks and two resource types'' problem, all projects are copies of each other, so there is actually only one type of project, whose capacity in the system becomes six. Although in that problem we assigned different rewards, durations and tardiness costs, these project variables could have been identical. So, multiple projects of the same type can be accommodated in our model.
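A minimal sketch of this construction (our illustration; the class and field names are assumptions rather than the paper's code) is:

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class ProjectType:
        name: str
        arrival_prob: float  # per-period arrival probability
        # task network, rewards, due date etc. would sit here as well

    def expand_types(real_types, capacities):
        # Expand each real project type into `capacity` model project types,
        # each limited to one project in the system, splitting the arrival
        # probability evenly across the copies.
        model_types = []
        for real, cap in zip(real_types, capacities):
            for i in range(cap):
                model_types.append(replace(real,
                    name=f"{real.name}#{i + 1}",
                    arrival_prob=real.arrival_prob / cap))
        return model_types

    # Two real types accepted up to 3 and 5 times in parallel give
    # 8 model project types, as in the example above.
    reals = [ProjectType("A", 0.3), ProjectType("B", 0.5)]
    print(len(expand_types(reals, [3, 5])))  # -> 8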
The number of projects of each type in the system is naturally limited by the amount of resources and by the ratio between the due date and the minimum project duration. Suppose that a company manages a single resource and the due dates are so tight that two projects cannot be completed one after the other by the earlier due date (i.e., the above ratio is strictly lower than 2). Then it would unlikely be a good idea to have three or more projects of the same type in the system at any given time, because every project beyond the first two (in the order of completion) would surely incur the tardiness cost. Therefore, except perhaps in some rare situations in which processing a project that incurs the tardiness cost is still better than processing other projects, it would be wise for the company to limit the number of projects of each type to two, which would then allow them to use our model to optimise the scheduling of the projects and achieve the highest possible profit.
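As a back-of-envelope formalisation in our own notation (it measures all due dates from a common start, so it ignores staggered arrivals): if projects of a type with due date D and minimum duration d are processed back to back on the single resource, the k-th completion occurs no earlier than period kd, so it can be on time only if

    k\,d \;\le\; D \quad\Longleftrightarrow\quad k \;\le\; \left\lfloor D/d \right\rfloor,

and a ratio D/d < 2 caps the number of on-time completions among simultaneously held projects at one, consistent with the two-project limit suggested above.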
Another limitation arising in practice, in some sectors such as software development, production, construction and R&D, is that some types of resources are shared among all types of projects while other resources are specific to one project type. For example, imagine a software development company that has two types of projects: mobile app development and web app development. Each type of project requires a different set of programming skills and testing environments. If these special resources are limited to one unit each, only one project of each type can be processed in the system at once, while projects of different types can be processed in parallel. In such environments, the system cannot process multiple projects of the same type at once, and it therefore makes sense not to accept more than one project of each type into the system.

Classifying projects into types may often be possible in practice, even for companies that deal with bespoke projects. Our model can be employed at a chosen level of detail, but there is a trade-off. A higher level of detail would possibly lead to more distinct project types, i.e., to a larger problem which may be harder to solve, although the arrival probabilities would be smaller (or zero in the most extreme case with all projects bespoke). On the other hand, a lower level of detail would allow projects to be grouped into types according to their key characteristics (such as the number of tasks, the durations of tasks and the required resources), while potentially heterogeneous project rewards and tardiness costs can be replaced by their corresponding average values.

Appendix C. The effect of arrival probabilities

A common feature of the algorithms' performance in Section 5 is that the accrued profit is relatively consistent for scenarios with higher arrival probabilities. This is particularly true in the larger instances. Fig. C.2 highlights this phenomenon for four of the project scenarios by considering relative measures of profit generation and resource consumption for ADP Model 3. $/Max is the expected total discounted long-run profit achieved for the given arrival probability, expressed as a percentage of the maximal profit achieved across all tested arrival probabilities. Rk_usage, k = 1, …, K, is the average usage of the type-k resource, expressed as a percentage of the total type-k resource available. Profits increase in line with the increase in arrival probability, but the increase typically arrests as the resource usage plateaus. This indicates that there is little additional profit to gain as resources approach their consumption limits or their capabilities for processing tasks in parallel.
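In symbols (our reconstruction of these definitions): writing V(\lambda) for the expected total discounted long-run profit under arrival probability \lambda, \Lambda for the set of tested arrival probabilities, \bar{u}_k(\lambda) for the average number of type-k resource units in use per period, and B_k for the number of type-k units available,

    \$/\mathrm{Max}(\lambda) \;=\; 100 \cdot \frac{V(\lambda)}{\max_{\lambda' \in \Lambda} V(\lambda')}\,\%,
    \qquad
    \mathrm{R}k\_\mathrm{usage}(\lambda) \;=\; 100 \cdot \frac{\bar{u}_k(\lambda)}{B_k}\,\%.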
We argue that there is value in investigating algorithmic performance across the tested range of arrival probabilities. To achieve these profits, the policies deployed by the algorithms must respond to the differing arrival rates. For example, take a problem with two identical project types that differ only in that one project's net profits (on-time and tardy) exceed the other's. Under low arrival probabilities, the system will experience periods where no project, or only one, is available for processing. A sensible use of resources is then to progress both projects towards on-time completion. At high arrival probabilities, the availability of both project types for processing is more common. A sensible policy will focus on processing the higher-return project and deploy resources to the other project when it is viable to do so, for example, in parallel or when it is the only project in the system at that time.

All policies we consider do this to some extent, as seen in Section 5, with strongly performing policies better able to take advantage. DP does this optimally, at the expense of evaluating the policy. ADP can learn to do this through appropriate training, by taking into account the downstream profit implications of its actions. Reactive baseline algorithms, by assuming no future arrivals, have a limited view of the future profit implications of their actions; but good policies in this class would aim to maximise the net profit from clearing the existing projects from the system and would hence prioritise the higher-value project. Rule-based algorithm performance depends on how the rule is applied: if the two projects were in an identical state and there was sufficient resource to process a single task from one of these projects, the longest-task rule would decide randomly which project to action. This could unnecessarily delay one or both project completions, resulting in lower net profits.
Fig. C.2. Relative profit performance and resource consumption for ADP Model 3 in four problem scenarios.
Table C.26
Performance analysis of DP policies designed for given arrival probabilities when applied to problems with different arrival probabilities.

𝜆𝑗    DP      1%      10%     20%     30%     40%     50%     60%     70%     80%     90%
1%    199.3   199.3   199.3   199.4   199.2   198.9   198.3   198.0   198.0   197.8   197.6
10%   878.3   877.4   878.3   875.8   862.4   844.2   815.6   803.9   801.6   796.0   785.3
20%   1043.9  1036.1  1039.6  1043.9  1033.7  1016.0  986.0   973.0   968.4   963.3   953.1
30%   1122.0  1085.0  1093.0  1110.7  1122.0  1117.8  1099.1  1090.3  1086.2  1081.7  1074.1
40%   1196.6  1114.1  1124.3  1155.4  1189.2  1196.6  1189.7  1186.4  1182.2  1178.6  1170.9
50%   1263.5  1141.4  1153.5  1195.6  1245.9  1260.8  1263.5  1263.1  1259.2  1255.7  1250.7
60%   1324.7  1171.5  1184.1  1234.7  1294.5  1314.5  1320.8  1324.7  1323.4  1321.9  1316.0
70%   1377.6  1205.3  1217.4  1272.3  1334.5  1360.4  1370.3  1375.5  1377.6  1376.2  1373.2
80%   1426.8  1247.8  1254.0  1309.8  1369.4  1401.2  1413.5  1419.4  1425.0  1426.8  1422.1
90%   1467.9  1302.5  1295.0  1347.7  1396.6  1440.4  1451.9  1455.9  1464.5  1468.3  1467.9
We expand the discussion by considering the robustness of policies to deviations from the arrival probabilities that they were designed for. We focus on optimal DP policies for the three project types, two tasks and one resource problem. In Table C.26 we present the expected total discounted long-run profit of the optimal policy for each 𝜆𝑗, 𝑗 ∈ 𝒥, applied to all arrival probability scenarios. In each instance, we ran 10,000 simulations. The first two columns show the optimal profit (DP) for the problem scenario with arrival rate 𝜆𝑗. The remaining columns show the profit performance of the DP policy optimised for the arrival probability in the column header when applied to the problem with the arrival probability given in the 𝜆𝑗-row. Naturally, the diagonal of these columns reproduces the optimal profit in column DP. Away from the diagonal, we see the value of a policy being able to adapt to different conditions. As we move along a row, away from the optimal profit, the performance of policies designed for other arrival rates deteriorates. For low and high arrival probabilities, policies designed for adjacent arrival probabilities can achieve a close level of performance.
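The cross-evaluation behind Table C.26 can be reproduced with a simple simulation loop; the sketch below is illustrative only (policies, keyed by the design arrival probability, and simulate_episode are hypothetical stand-ins for the DP policies and the simulator, which the paper does not publish):

    import itertools
    import statistics

    ARRIVAL_PROBS = [0.01] + [i / 10 for i in range(1, 10)]  # 1%, 10%, ..., 90%
    N_RUNS = 10_000

    def cross_evaluate(policies, simulate_episode):
        # Estimate the expected total discounted profit of each policy
        # (optimised for a "design" arrival probability) when applied to a
        # scenario with a possibly different "actual" arrival probability.
        table = {}
        for design, actual in itertools.product(ARRIVAL_PROBS, repeat=2):
            profits = [simulate_episode(policies[design], arrival_prob=actual)
                       for _ in range(N_RUNS)]
            table[design, actual] = statistics.mean(profits)
        return table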
In summary, our investigations highlight the effect of project arrival probabilities on profitable project selection. Comparing policy performance across a broad range of arrival probabilities is useful for establishing the overall effectiveness of the algorithms considered.
References

Adler, P. S., Mandelbaum, A., Nguyen, V., & Schwerer, E. (1995). From project to process management: An empirically-based framework for analyzing product development time. Management Science, 41(3), 458–484. http://dx.doi.org/10.1287/mnsc.41.3.458.

Ahuja, V., & Birge, J. R. (2020). An approximation approach for response adaptive clinical trial design. INFORMS Journal on Computing, 32(4), 877–894. http://dx.doi.org/10.1287/ijoc.2020.0969.

Capa, C., & Ulusoy, G. (2015). Proactive project scheduling in an R&D department: a bi-objective genetic algorithm. In 2015 International conference on industrial engineering and operations management (IEOM), Vol. 1 (pp. 1–6). http://dx.doi.org/10.1109/IEOM.2015.7093733.

Chen, H., Ding, G., Zhang, J., & Qin, S. (2019). Research on priority rules for the stochastic resource constrained multi-project scheduling problem with new project arrival. Computers & Industrial Engineering, 137, Article 106060. http://dx.doi.org/10.1016/j.cie.2019.106060.

Choi, J., Realff, M. J., & Lee, J. H. (2007). A Q-learning-based method applied to stochastic resource constrained project scheduling with new project arrivals. International Journal of Robust and Nonlinear Control, 17(13), 1214–1231. http://dx.doi.org/10.1002/rnc.1164.

Cohen, I., Golany, B., & Shtub, A. (2005). Managing stochastic, finite capacity, multi-project systems through the cross-entropy methodology. Annals of Operations Research, 134(1), 183–199. http://dx.doi.org/10.1007/s10479-005-5730-1.

Creemers, S. (2015). Minimizing the expected makespan of a project with stochastic activity durations under resource constraints. Journal of Scheduling, 18(3), 263–273. http://dx.doi.org/10.1007/s1095.

Davis, M. T., Robbins, M. J., & Lunday, B. J. (2017). Approximate dynamic programming for missile defense interceptor fire control. European Journal of Operational Research, 259(3), 873–886. http://dx.doi.org/10.1016/j.ejor.2016.11.023.

Fliedner, T., Gutjahr, W., Kolisch, R., & Melchiors, P. (2012). Solving the dynamic stochastic resource-constrained multi-project scheduling problem with SRCPSP-methods. In Proceedings of the 13th international conference on project management and scheduling, Leuven, Belgium: KU Leuven (pp. 148–151).

Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. New York, NY: W. H. Freeman & Co.

He, F., Yang, J., & Li, M. (2018). Vehicle scheduling under stochastic trip times: An approximate dynamic programming approach. Transportation Research Part C (Emerging Technologies), 96, 144–159. http://dx.doi.org/10.1016/j.trc.2018.09.010.

Homberger, J. (2007). A multi-agent system for the decentralized resource-constrained multi-project scheduling problem. International Transactions in Operational Research, 14(6), 565–589. http://dx.doi.org/10.1111/j.1475-3995.2007.00614.x.

Kolisch, R., & Sprecher, A. (1996). PSPLIB: A project scheduling problem library. European Journal of Operational Research, 96(1), 205–216. http://dx.doi.org/10.1016/S0377-2217(96)00170-1.

Kolisch, R., Sprecher, A., & Drexl, A. (1995). Characterization and generation of a general class of resource-constrained project scheduling problems. Management Science, 41(10), 1693–1703. http://dx.doi.org/10.1287/mnsc.41.10.1693.

Li, H., & Womer, N. K. (2015). Solving stochastic resource-constrained project scheduling problems by closed-loop approximate dynamic programming. European Journal of Operational Research, 246(1), 20–33. http://dx.doi.org/10.1016/j.ejor.2015.04.015.

Li, H., Zhang, X., Sun, J., & Dong, X. (2023). Dynamic resource levelling in projects under uncertainty. International Journal of Production Research, 61(1), 198–218. http://dx.doi.org/10.1080/00207543.2020.1788737.

Melchiors, P. (2015). Lecture notes in economics and mathematical systems, Dynamic and stochastic multi-project planning. Cham, Switzerland: Springer, http://dx.doi.org/10.1007/978-3-319-04540-5.

Melchiors, P., & Kolisch, R. (2009). Scheduling of multiple R&D projects in a dynamic and stochastic environment. In Operations research proceedings 2008 (pp. 135–140). Heidelberg: Springer, http://dx.doi.org/10.1007/978-3-642-00142-0_22.

Melchiors, P., Leus, R., Creemers, S., & Kolisch, R. (2018). Dynamic order acceptance and capacity planning in a stochastic multi-project environment with a bottleneck resource. International Journal of Production Research, 56(1–2), 459–475. http://dx.doi.org/10.1080/00207543.2018.1431417.

Pamay, M. B., Bülbül, K., & Ulusoy, G. (2014). Dynamic resource constrained multi-project scheduling problem with weighted earliness/tardiness costs. In P. S. Pulat, S. C. Sarin, & R. Uzsoy (Eds.), International series in operations research & management science: vol. 200, Essays in production, project planning and scheduling (pp. 219–247). Springer US, http://dx.doi.org/10.1007/978-1-4614-9056-2_10.

Parizi, M. S., Gocgun, Y., & Ghate, A. (2017). Approximate policy iteration for dynamic resource-constrained project scheduling. Operations Research Letters, 45(5), 442–447. http://dx.doi.org/10.1016/j.orl.2017.06.002.

Powell, W. B. (2009). What you should know about approximate dynamic programming. Naval Research Logistics, 56(3), 239–249. http://dx.doi.org/10.1002/nav.20347.

Powell, W. B. (2011). Approximate dynamic programming: Solving the curses of dimensionality. Hoboken, NJ: John Wiley, http://dx.doi.org/10.1002/9781118029176.

Ronconi, D. P., & Powell, W. B. (2010). Minimizing total tardiness in a stochastic single machine scheduling problem using approximate dynamic programming. Journal of Scheduling, 13, 597–607. http://dx.doi.org/10.1007/s10951-009-0160-6.

Satic, U., Jacko, P., & Kirkbride, C. (2020). Performance evaluation of scheduling policies for the DRCMPSP. In M. Gribaudo, E. Sopin, & I. Kochetkova (Eds.), Analytical and stochastic modelling techniques and applications, Vol. 12023 (pp. 100–114). Cham: Springer International Publishing, http://dx.doi.org/10.1007/978-3-030-62885-7_8.

Satic, U., Jacko, P., & Kirkbride, C. (2022). Performance evaluation of scheduling policies for the dynamic and stochastic resource-constrained multi-project scheduling problem. International Journal of Production Research, 60(4), 1411–1423. http://dx.doi.org/10.1080/00207543.2020.1857450.

Schütz, H.-J., & Kolisch, R. (2012). Approximate dynamic programming for capacity allocation in the service industry. European Journal of Operational Research, 218(1), 239–250. http://dx.doi.org/10.1016/j.ejor.2011.09.007.

Schwindt, C. (1998). Generation of resource constrained project scheduling problems subject to temporal constraints: Report WIOR-543. Kaiserstrasse 12, D-76128 Karlsruhe, Germany: Universitat Karlsruhe.

Sutton, R. S., & Barto, A. G. (2018). Adaptive computation and machine learning, Reinforcement learning: An introduction (second ed.). Cambridge, MA, USA: MIT Press.

Wellingtone PPM (2018). The state of project management annual survey 2018. http://www.wellingtone.co.uk/wp-content/uploads/2018/05/The-State-of-Project-Management-Survey-2018-FINAL.pdf, Accessed October 31, 2023.