Dynamic Programming Approach To Solve Multi-Project Scheduling Problem-2024
Dynamic Programming Approach To Solve Multi-Project Scheduling Problem-2024
Discrete optimisation
Keywords: We consider the dynamic and stochastic resource-constrained multi-project scheduling problem which allows
Project scheduling for the random arrival of projects and stochastic task durations. Completing projects generates rewards, which
Markov decision processes are reduced by a tardiness cost in the case of late completion. Multiple types of resource are available, and
Approximate dynamic programming
projects consume different amounts of these resources when under processing. The problem is modelled as
Dynamic resource allocation
an infinite-horizon discrete-time Markov decision process and seeks to maximise the expected discounted
Dynamic programming
long-run profit. We use an approximate dynamic programming algorithm (ADP) with a linear approximation
model which can be used for online decision making. Our approximation model uses project elements that
are easily accessible by a decision-maker, with the model coefficients obtained offline via a combination of
Monte Carlo simulation and least squares estimation. Our numerical study shows that ADP often statistically
significantly outperforms the optimal reactive baseline algorithm (ORBA). In experiments on smaller problems
however, both typically perform suboptimally compared to the optimal scheduler obtained by stochastic
dynamic programming. ADP has an advantage over ORBA and dynamic programming in that ADP can be
applied to larger problems. We also show that ADP generally produces statistically significantly higher profits
than common algorithms used in practice, such as a rule-based algorithm and a reactive genetic algorithm.
1. Introduction refers to random project arrivals from different types of projects and
stochastic refers to uncertain task processing times. Dynamic general-
Project management and project scheduling are challenging. En- isations of RCMPSP are the dynamic RCMPSP and the dynamic and
gineering services, software development, IT services, construction stochastic RCMPSP. A discussion of the RCMPSP and its variants can
and R&D operate in dynamic environments, often processing multi-
be found in Satic et al. (2022).
ple projects simultaneously. Many unplanned factors may disturb the
The non-dynamic (i.e., static) variants of RCMPSP are extensively
project execution plan with new project arrivals and delays in task pro-
studied (Creemers, 2015). However, the dynamic variants of the
cessing. A recent project management survey (Wellingtone PPM, 2018)
showed that only 40% of projects are completed within their planned RCMPSP where new projects randomly arrive in the system are scarce
time, 46% of projects are completed within their predicted budget, and in the literature. To the best of our knowledge, there are only three
36% of projects realise their full benefits. In this paper we consider research papers available for the dynamic RCMPSP which are Pamay
the dynamic arrival of new projects and stochastic durations of tasks, et al. (2014), Parizi et al. (2017), Satic et al. (2020), and there are
and we propose a comprehensive model and solution approach for the only ten research papers available for the dynamic and stochastic
dynamic and stochastic resource-constrained multi-project scheduling RCMPSP which are Adler et al. (1995), Capa and Ulusoy (2015),
problem (dynamic and stochastic RCMPSP). Chen et al. (2019), Choi et al. (2007), Cohen et al. (2005), Fliedner
The dynamic and stochastic RCMPSP is a generalisation of the et al. (2012), Melchiors (2015), Melchiors and Kolisch (2009), Mel-
precedence-constrained scheduling problem, which was shown to be
chiors et al. (2018), Satic et al. (2022). They adopt different solution
an NP-hard problem in Garey and Johnson (1979, p. 239). Thus the
approaches, which have their own strengths and weaknesses.
dynamic and stochastic RCMPSP is also an NP-hard problem. Dynamic
∗ Corresponding author at: Lancaster University Management School, Bailrigg, Lancaster, LA1 4YX, United Kingdom.
E-mail address: [email protected] (U. Satic).
https://doi.org/10.1016/j.ejor.2023.10.046
Received 9 August 2021; Accepted 31 October 2023
Available online 10 November 2023
0377-2217/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Adler et al. (1995), Cohen et al. (2005), Melchiors and Kolisch resource consumption and decision rewards as features and can be used
(2009) took advantage of the well-developed queueing network ap- for online decision making after estimating the coefficients of the linear
proach where interdependent resources process project tasks. This re- value function approximation in a simulation-based training phase.
quires consideration of projects of relatively simple structure such We compare the performance of our ADP algorithm with four solu-
as tasks requiring the allocation of a single unit of a single type of tion approaches from Satic et al. (2022), namely a DP algorithm that
resource. Capa and Ulusoy (2015), Fliedner et al. (2012), Pamay et al. computes the optimal policy; an optimal reactive baseline algorithm
(2014) considered a reactive scheduling method which generates a
(ORBA) and a genetic algorithm (GA) that both generate schedules to
baseline schedule for current projects and then updates it at each time a
maximise the total profit of ongoing projects; and a rule-based algo-
new project arrival disrupts the schedule. This approach can be remark-
rithm (RBA) that uses the longest task first rule to guide the allocation
ably suboptimal as evidenced in our computational study in Section 5.
of remaining resources to tasks.
Melchiors et al. (2018), Satic et al. (2022) modelled the problem as
a Markov decision process (MDP), using dynamic programming (DP) We run our benchmark tests on the problems of Satic et al. (2022).
to evaluate optimal policies. This solution approach suffers from the In addition, we generate new comparison problems that are larger and
curse of dimensionality and thus can only be used for unrealistically include non-sequential project networks and multiple resource types.
small problems. Chen et al. (2019) divided the multi-project problem The larger size problems are computationally intractable for DP and
into states according to the project’s completion conditions and then ORBA; thus, we benchmark ADP with GA and RBA on these problems.
searched best priority rules for each state, but priority rules are notably We contribute to the literature by (i) a new comprehensive MDP
prone to be suboptimal. model which considers the random arrival of new projects, stochastic
Our methodological approach is similar to Choi et al. (2007), Mel- task durations, multiple resource types, non-sequential project net-
chiors (2015), Parizi et al. (2017) in that we also formulate the problem works, project completion rewards, project due dates and tardiness
as an MDP and design a scheduling policy via approximate dynamic costs, (ii) a new approximation function that uses project completion re-
programming (ADP). However, our model is notably more comprehen- wards, tardiness costs and spent resource amounts for decision making,
sive and allows for solving problems that are larger and/or have a more and is capable of solving much larger, more complex and much more
complex structure, which are closer to those appearing in practice.
general problems than ADPs from existing literature, (iii) an extensive
Choi et al. (2007) considered applications in the agricultural and phar-
simulation study illustrating the strengths and weaknesses of different
maceutical industries; thus, they focused on serial project networks,
approaches, (iv) benchmarking with DP and ORBA whenever tractable
stochastic task outcomes (success or failure), a single resource type,
and with two other approaches in larger problems, (v) developing an
single resource usage per task and no project due dates. Melchiors
(2015, chapter 7) conducted experiments on small problems with two efficient implementation of the proposed ADP method in the Julia
projects with three tasks with a single unit of resource capacity for each programming language to solve dynamic and stochastic RCMPSPs.
resource type, tasks require a single unit of resource only, identical This paper is organised as follows: In Section 2, we describe the
project networks for both projects, rejection, holding and processing problem setting, the MDP model and our goal function. In Section 3,
costs, but no project due dates. Parizi et al. (2017) considered deter- we describe our ADP algorithm and its coefficient training procedure.
ministic task processing times with rejection, holding and processing In Section 4, we describe the comparator algorithms and discuss the
costs. Their numerical study had short simulation durations with heavy comparison results in Section 5. In Section 6 we conclude.
discounting.
ADP is a powerful tool that provides researchers with the ability
to adjust the complexity of the optimisation model to trade-off the 2. The dynamic and stochastic RCMPSP model
solution complexity of large (realistic) problems at the expense of
a modest suboptimality. An acceptable trade-off can be achieved by 2.1. Problem setting
careful mathematical modelling of the problem in hand; this is in con-
trast to general purpose methods such as genetic algorithms and other
A project is a group of tasks which are bound to each other with
heuristics which typically rely only on tuning of algorithm parameters.
Our literature summary shows that ADP has been used in dynamic predecessor–successor relationships, also called a project network. We
variants of the RCMPSP such as Choi et al. (2007), Melchiors (2015), consider ‘‘finish to start’’ precedence relations between tasks. There are
Parizi et al. (2017). It has also been applied in static variants of the 𝐾 types of renewable resources and the available integer number of
RCMPSP (Li & Womer, 2015; Li et al., 2023). Outside of applications units is represented by 𝐵𝑘 > 0 for resource type 𝑘 = 1, … , 𝐾. These
in project scheduling, ADP methods have been applied in areas such resources need to be allocated to process tasks of random duration from
as clinical trials (Ahuja & Birge, 2020), vehicle scheduling (He et al., randomly arriving projects.
2018), capacity allocation (Schütz & Kolisch, 2012), machine schedul- We assume that each arriving project can be categorised as one of
ing (Ronconi & Powell, 2010) and missile defence systems (Davis et al., the possible project types ( = {1, 2, … , 𝐽 }). All projects of type 𝑗 share
2017). features such as: inter-arrival time distribution 𝛬𝑗 , completion reward
We consider new projects arriving at random during the execution 𝑟𝑗 , due date 𝐹𝑗 , and tardiness cost 𝑤𝑗 ; set of tasks 𝑗 = {1, 2, … , 𝐼𝑗 },
of ongoing projects, project completions generate rewards which are project network with sets 𝑗,𝑖 of predecessors of each task 𝑖, task
decreased by tardiness costs if completed after their respective due resource usages 𝑏𝑘𝑗,𝑖 , and task duration distribution 𝛤𝑗,𝑖 .
dates, processing times of the project tasks are uncertain, multiple types
The system always accepts newly arrived projects until the system
of resources are available and multiple amounts of resources can be
capacity for type 𝑗 projects is reached and rejects the remaining newly
used by each project type. Thus, our model can help optimise the
arrived projects. We only consider non-preemptive task processing; thus,
use of company resources, reduce project delays, and improve overall
task processing cannot be paused from after it began until the end of its
productivity. We model the problem as an infinite-horizon discrete-
time MDP and seek to maximise the expected total discounted long-run random duration. The duration of a task is not revealed to the decision
profit. maker until it is actually reached.
In this paper we show that ADP is a very useful and advanta- When all tasks of a project are processed, the project is completed,
geous method for the dynamic and stochastic RCMPSP. We use an and the project completion reward is earned. However, if the project
ADP algorithm with a linear approximation model to approximate the due date passes before the project is completed, the tardiness cost is
value function of the Bellman equation. Our approximation model uses incurred.
455
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
456
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
An important element of the pre-decision state is also the number 2.4.4. Pre-acceptance state
of free resources, 𝐵𝑘free (𝑠) ∈ {0, 1, … , 𝐵𝑘 }, 𝑘 = 1, … , 𝐾. This is not A pre-acceptance state 𝑠∗ represents the system information at the
explicitly included as part of the state because it can be calculated from end of the transition time but immediately before the pre-decision
the other state elements via: state 𝑠′ at the following decision epoch. A pre-acceptance state consists
𝐼 of its project states 𝑷 ∗𝑗 which consist of pre-acceptance task states
∑
𝐽 ∑𝑗
𝐵𝑘free (𝑠) = 𝐵𝑘 − 𝑏𝑘𝑗,𝑖 𝛶 {𝑥𝑗,𝑖 > 0}, (8) 𝑥∗𝑗,𝑖 for each task 𝑖 ∈ 𝑗 and pre-acceptance due date states 𝑑𝑗∗ . A
𝑗=1 𝑖=1 pre-acceptance state shows the task processing progress after a post-
where 𝛶 {⋅} is an indicator function. decision state during the transitional time without new project arrivals.
Eq. (17) shows possible task state transitions from a 𝑥̂ 𝑗,𝑖 to 𝑥∗𝑗,𝑖 with their
2.4.2. Action probabilities. As (15) presents, a pre-acceptance due date state is zero
An action 𝑎 represents the decision regarding which tasks to begin (𝑑𝑗∗ = 0) if project type 𝑗 has just been completed or if the post-decision
processing from those tasks whose states are ‘‘pending for processing’’. due date state is zero. For the other possibilities, a pre-acceptance due
The action consists of action elements 𝑎𝑗,𝑖 for each task 𝑖 ∈ 𝑗 of all date state is equal to the post-decision due date minus one:
project types (𝑗 ∈ ). An action element takes the value of 1 (𝑎𝑗,𝑖 = 1) ⎧0, if 𝑑𝑗 = 0,
to represent the decision to start processing a qualifying task and takes ⎪
𝑑𝑗∗ = ⎨0, if ∃𝑖 ∈ 𝑗 ∶ 𝑥̂ 𝑗,𝑖 ≥ 0 and ∀𝑖 ∈ 𝑗 ∶ 𝑥∗𝑗,𝑖 = 0, (15)
the value 0 (𝑎𝑗,𝑖 = 0) otherwise: ⎪
⎩𝑑𝑗 − 1, otherwise.
⎡ 𝑎1,1 , 𝑎1,2 , … 𝑎1,𝐼1 ⎤
⎢ 𝑎 , ⎥ If a type 𝑗 project arrived during the transition time and state 𝑷 ∗𝑗 is such
𝑎2,2 , … 𝑎2,𝐼2
𝑎 = ⎢ 2,1 ⎥. (9) that a type 𝑗 project is completed or not present (∀𝑖 ∈ 𝑗 ∶ 𝑥∗𝑗,𝑖 = 0) the
⎢ ⋮ ⋮ ⋱ ⋮ ⎥
⎢ 𝑎 , ⎥ system accepts the new type 𝑗 project. Otherwise, the system rejects
⎣ 𝐽 ,1 𝑎𝐽 ,2 , … 𝑎𝐽 ,𝐼𝐽 ⎦ the new arrival. From a pre-acceptance state 𝑠∗ to the following pre-
An action 𝑎 must fulfil three requirements: decision state 𝑠′ , the new task state becomes −1 and the due date state
becomes 𝐹𝑗 .
(1) The selected tasks for processing must have the task state ‘‘pending
for processing’’:
2.4.5. Transition function
∀𝑗 ∈ , 𝑖 ∈ 𝑗 ∶ 𝑎𝑗,𝑖 = 1 ⇒ 𝑥𝑗,𝑖 = −1. (10) The transition function, 𝑠′ = 𝑠𝑀 (𝑠, 𝑎, 𝑐), represents the transformation
of a system from a pre-decision state 𝑠 under action 𝑎 to a pre-
(2) There must be enough free resources of each type to allocate for decision state 𝑠′ at the next decision epoch by random events 𝑐 during
processing the selected tasks: the transition time. Random events include new project arrivals, task
completions and project completions. All task completions and project
𝐼
∑
𝐽 ∑𝑗
arrivals are stochastically independent.
𝑏𝑘𝑗,𝑖 𝛶 {𝑎𝑗,𝑖 = 1} ≤ 𝐵𝑘free (𝑠), 𝑘 = 1, … , 𝐾. (11)
A project of each type may arrive in the system during a transition
𝑗=1 𝑖=1
time according to its type’s arrival probability 𝜆𝑗 and is accepted if no
(3) All predecessor tasks of the selected tasks must be completed: type 𝑗 project exists in the system.
A task may complete processing according to a conditional prob-
∀𝑗 ∈ , 𝑖 ∈ 𝑗 ∶ 𝑎𝑗,𝑖 = 1 ⇒ ∀𝑚 ∈ 𝑗,𝑖 ∶ 𝑥𝑗,𝑚 = 0. (12) ability 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), if the task’s processed time following this transition
Here, 𝑚 represents a predecessor of task 𝑖 (𝑚 ∈ 𝑗 ⧵ 𝑖, 𝑚 ∈ 𝑗,𝑖 ). (𝑡max
𝑗,𝑖 − 𝑥 ̂ 𝑗,𝑖 + 1) is equal to or greater than its minimal possible duration
The action where all action elements are zero is also a valid action 𝑡min
𝑗,𝑖 . Formally, we define 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ) = 0 for ∀𝑥̂ 𝑗,𝑖 > 𝑡max min
𝑗,𝑖 − 𝑡𝑗,𝑖 + 1, and
max min
require 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ) > 0 for 𝑥̂ 𝑗,𝑖 = 𝑡𝑗,𝑖 − 𝑡𝑗,𝑖 + 1, which guarantees that
and indicates that no task was selected to begin processing in that
period. More than one action may fulfil all three requirements for a 𝑡min
𝑗,𝑖 is the minimal possible duration. We also require 𝛾𝑗,𝑖 (1) = 1, which
pre-decision state. The set of these actions is named an action set (𝑠). guarantees that 𝑡max 𝑗,𝑖 is the maximal possible duration.
The probability of reaching a pre-decision state 𝑠′ from a pre-
2.4.3. Post-decision state decision state 𝑠 under action 𝑎 with the transition function 𝑃 (𝑠′ |𝑠, 𝑎) is
A post-decision state 𝑠̂ represents the system information immedi- the joint probability of task completions 𝑃 (𝑥′𝑗,𝑖 |𝑥̂ 𝑗,𝑖 ) and project arrivals
ately after a decision epoch and just before the transition time begins. 𝑃 (𝑷 ′𝑗 |𝑷̂ 𝑗 ):
In other words, a post-decision state is the system information from
∏𝐽 ⎛∏ 𝐼𝑗 ⎞
the pre-decision state 𝑠 updated by an action 𝑎 but before any task ⎜
𝑃 (𝑠′ |𝑠, 𝑎) = 𝑃 (𝑥∗𝑗,𝑖 |𝑥̂ 𝑗,𝑖 )⎟ 𝑃 (𝑷 ′𝑗 |𝑷𝑗∗ ), (16)
processing or random event occurs, i.e., 𝑠̂ ∶= 𝑓 (𝑠, 𝑎) where 𝑓 is a ⎜
𝑗=1 ⎝ 𝑖=1
⎟
⎠
deterministic function defined in (14) for each element of 𝑠 and 𝑎.
⎧𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), if 𝑥̂ 𝑗,𝑖 ≥ 1 and 𝑥∗𝑗,𝑖 = 0,
A post-decision state consists of the post-decision project states 𝑷̂ 𝑗 . A ⎪
post-decision project state consists of post-decision task states 𝑥̂ 𝑗,𝑖 of ⎪1 − 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), if 𝑥̂ 𝑗,𝑖 ≥ 2 and 𝑥∗𝑗,𝑖 = 𝑥̂ 𝑗,𝑖 − 1,
𝑃 (𝑥∗𝑗,𝑖 |𝑥̂ 𝑗,𝑖 ) = ⎨ (17)
each task 𝑖 ∈ 𝑗 and the same due date states 𝑑𝑗 of the pre-decision if 𝑥∗𝑗,𝑖 = 𝑥̂ 𝑗,𝑖 ≤ 0,
⎪1,
project state 𝑷 𝑗 : ⎪0, otherwise,
⎩
⎡ 𝑥̂ 1,1 , 𝑥̂ 1,2 , … 𝑥̂ 1,𝐼1 , 𝑑1 ⎤ ⎧𝜆𝑗 ,
⎢ 𝑥̂ , ⎥ if ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 = −1, 𝑥∗𝑗,𝑖 = 0,
𝑥̂ 2,2 , … 𝑥̂ 2,𝐼2 , 𝑑2 ⎪
𝑠̂ = ⎢ 2,1 ⎥. (13) ⎪1 − 𝜆𝑗 , if ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 = 𝑥∗𝑗,𝑖 = 0,
⎢ ⋮ ⋮ ⋱ ⋮ ⋮ ⎥ ′ ∗
𝑃 (𝑷 𝑗 |𝑷𝑗 ) = ⎨ (18)
⎢ 𝑥̂ , 𝑥̂ 𝐽 ,2 , … 𝑥̂ 𝐽 ,𝐼𝐽 , 𝑑𝐽 ⎥ ⎪1, if ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 = 𝑥∗𝑗,𝑖 ≠ 0,
⎣ 𝐽 ,1 ⎦
⎪0, otherwise.
A post-decision task state 𝑥̂ 𝑗,𝑖 is the updated state of a task from the ⎩
preceding pre-decision task state 𝑥𝑗,𝑖 . It is only the tasks that have Here in (17), the first line represents that the post-decision task state
been selected to start their processing (𝑎𝑗,𝑖 = 1) that change from the of task 𝑖 from project type 𝑗 allows for task completion and, with
pre-decision state −1 to the post-decision state 𝑡max
𝑗,𝑖 :
probability 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), the task will be completed by the pre-acceptance
{ state. The second line represents that, with probability 1 − 𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ), the
𝑡max
𝑗,𝑖 , if 𝑥𝑗,𝑖 = −1 and 𝑎𝑗,𝑖 = 1, task will not be completed by the pre-acceptance state. The third line
𝑥̂ 𝑗,𝑖 = (14)
𝑥𝑗,𝑖 , otherwise. represents that the post-decision task state of task 𝑖 from project type
457
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
In (18), the first line represents that, with 𝜆𝑗 probability, there will Model : 1 2 3 4 5 6 7 8 9 10 11
be an arrival of project type 𝑗 during the transition time and the new Feature 1 PT CRU TRU DR PT PT PT CRU TRU PT PT
type 𝑗 project will take the place of the previously completed or non- Feature 2 – – – – CRU TRU DR DR DR CRU TRU
existing type 𝑗 project. The second line represents that, with 1 − 𝜆𝑗 Feature 3 – – – – – – – – – DR DR
Here, 𝑡max
𝑗,𝑖 − 𝑥̂ 𝑗,𝑖 represents task 𝑖’s processed time. Since the task’s
𝑅𝑠,𝑎,𝑠′ processed time is zero when the action is taken (𝑥̂ 𝑗,𝑖 = 𝑡max
𝑗,𝑖 ), we use
𝐽 { }}
∑ ( { }) { the task’s processed time after the transition time ends (𝑡max
𝑗,𝑖 − 𝑥 ̂ 𝑗,𝑖 + 1)
= 𝑟𝑗 − 𝑤𝑗 𝛶 𝑑𝑗 = 0 𝛶 ∃𝑖 ∈ 𝑗 ∶ 𝑥̂ 𝑗,𝑖 > 0 and ∀𝑖 ∈ 𝑗 ∶ 𝑥′𝑗,𝑖 ≤ 0 . to differentiate the effect of actions.
𝑗=1
The second feature is the current resource usage (CRU) of project 𝑗:
(19)
𝐼
Here, the first indicator is for late project completion that takes the 𝐾 ∑
∑ 𝑗
{ }
value 1 if a project completes later than its due date (i.e., the project’s 𝑏𝑘𝑗,𝑖 𝛶 𝑥̂ 𝑗,𝑖 > 0 . (23)
𝑘=1 𝑖=1
due date state 𝑑𝑗 = 0) and takes the value 0 otherwise. The second
indicator is for project completion that takes the value 1 if a project Here, we consider the post-decision state’s resource allocation because
completes (at least one task is in progress in the post-decision state and it gives the best information about resources used by an action. For
all project tasks are complete at the end of the period) and takes the example, for single-period tasks, the resource allocation information of
value 0 otherwise. action may disappear after one transition time. Thus, the post-decision
Our objective function seeks to find the policy that maximises the state’s resource allocation gives the most precise information about
expected total discounted long-run profit: resource usage after the action is taken.
[∞ ] The third feature is the total resource used (TRU) by project 𝑗 to
∑
𝑉 ∗ (𝑠1 ) = max E𝜋 𝛼 𝑡−1 𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 . (20) date:
𝜋∈𝛱
𝑡=1 𝐼
∑
𝐾 ∑𝑗
Here, 𝑠1 is a given initial state; 𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 is the immediate profit of state 𝐵𝑗 (𝑠) + 𝑏𝑘𝑗,𝑖 𝛶 {𝑥̂ 𝑗,𝑖 > 0}. (24)
transition from pre-decision states 𝑠𝑡 to 𝑠𝑡+1 under the action 𝑎 at the 𝑘=1 𝑖=1
time 𝑡; 𝛼 is a discount factor in the interval (0,1); 𝜋 is a policy from Here, 𝐵𝑗 (𝑠) is the TRU of the previous state with the selected action
the set of all stationary deterministic policies 𝛱 that prescribe in every ∑𝐼𝑗 𝑘
𝐵𝑗 (𝑠𝑡 ) = 𝑇 𝑅𝑈𝑗 (𝑠̂𝑡−1 ) and 𝑖=1 𝑏𝑗,𝑖 𝛶 {𝑥̂ 𝑗,𝑖 > 0} represents the amount
state 𝑠 an action from the action set (𝑠). of type 𝑘 resource that will be used to process project type 𝑗 under
the latest taken action. This feature reflects the resource usage of a
3. Approximate dynamic programming (ADP)
project and, indirectly, the time that the project has been processed
in the system.
In theory, the problem (20) can be solved using the Bellman equa-
The fourth feature is decision reward (DR), which is the reward
tion:
∑ per period of processing, under the assumption the project will be
𝑉 ∗ (𝑠) = max 𝑃 (𝑠′ |𝑠, 𝑎)[𝑅𝑠,𝑎,𝑠′ + 𝛼𝑉 ∗ (𝑠′ )] ∀𝑠 ∈ , (21) processed continuously until completion:
𝑎∈(𝑠) ′
𝑠 ∈
⎧ 𝑟𝑗
but in practice, it suffers from ‘‘the curse of dimensionality’’. ‘‘The ⎪ℎ if 𝑑𝑗 > ℎ𝑗
curse of dimensionality’’ means that the number of states and compu- 𝑅̄ 𝑗 = ⎨ 𝑟𝑗𝑗−𝑤𝑗 (25)
⎪ ℎ𝑗 , otherwise.
tational requirements expands exponentially with the number of state ⎩
variables (Sutton & Barto, 2018). Satic et al. (2022) investigated the
In (25), the remaining late project horizon ℎ𝑗 is given by:
limitations of DP and stated that a state space larger than their five
projects with two tasks problem is computationally intractable for their 𝐼𝑗 { }
∑
hardware. ℎ𝑗 = 𝑥̂ 𝑗,𝑖 𝛶 {𝑥̂ 𝑗,𝑖 ≠ −1} + 𝑡max
𝑗,𝑖 𝛶 {𝑥
̂ 𝑗,𝑖 = −1} , (26)
ADP is a modelling strategy to overcome ‘‘the curse of dimension- 𝑖=1
ality’’ problem of DP due to the use of the Bellman equation (Powell, i.e. the sum of remaining task durations assuming all remaining tasks
2009). In our ADP algorithm, we estimate the value function of the are completed on the maximum task duration. This is compared to the
Bellman Eq. (21) using a linear approximation model (27). A linear remaining duration to the due date, 𝑑𝑗 . When the remaining duration
approximation model is a regression model that fits the value function to the due date exceeds the remaining late project horizon, 𝑅̄ 𝑗 takes
of the Bellman equation by estimating a parameter vector 𝜃 (Powell, the value of the project completion reward divided by the remaining
2011, p 304). The linear approximation model only requires the current late project horizon. Otherwise, 𝑅̄ 𝑗 is reward minus tardiness cost
state and action information, and future state information and storing divided by the remaining late project horizon. Using this feature, the
the states becomes unnecessary. With the linear approximation model, duration to the project’s due date, a worst-case estimate of the remain-
the decision making is done in an online fashion; thus, ADP can be used ing processing duration, reward, and tardiness cost become decision
for larger size problems. elements.
458
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 2 Table 4
Counts of when the model is not statistically significantly different from the highest Counts of when model has higher* profit from than Model 3.
profit. Model : 1 2 4 5 6 7 8 9 10 11
Model : 1 2 3 4 5 6 7 8 9 10 11
2p2t 1 2 2 2 0 2 2 2 2 2
2p2t 9 10 8 10 10 0 10 10 10 9 8 2p3t 0 0 0 0 0 0 0 0 0 0
2p3t 10 2 10 6 2 9 10 2 9 2 9 3p2t 6 2 9 1 7 7 4 6 3 5
3p2t 3 1 2 4 2 3 6 3 3 0 5 2p10t 0 1 0 1 0 0 2 5 1 0
2p10t 0 3 5 1 3 0 1 5 7 4 0 5p5t 0 0 0 0 2 0 0 2 0 0
5p5t 1 0 7 1 1 3 1 0 4 1 0 5p10t 0 0 0 0 0 0 0 5 0 0
5p10t 1 1 5 1 1 0 1 1 6 1 1 6p5t 3 4 3 8 7 4 5 8 6 7
6p5t 0 3 0 1 2 4 0 1 0 0 3 10p10t2r 0 7 1 9 9 0 8 8 8 8
10p10t2r 0 0 0 0 5 4 0 1 3 1 2 5p30t4r 2 0 2 0 0 1 2 0 0 0
5p30t4r 7 1 7 7 1 0 1 1 1 2 0 2p30t2r 0 0 0 0 0 0 0 0 0 0
2p30t2r 0 1 10 0 1 0 0 1 1 1 0
SUM 12 16 17 21 25 14 23 36 20 22
SUM 31 22 54 31 28 23 30 25 44 21 28
Table 5
Table 3 Counts of when model has lower* profit from than Model 3.
Counts of when the model is not statistically significantly different from the lowest Model : 1 2 4 5 6 7 8 9 10 11
profit.
2p2t 0 0 3 0 10 0 0 0 1 4
Model : 1 2 3 4 5 6 7 8 9 10 11
2p3t 0 9 6 9 1 0 9 1 9 1
2p2t 0 0 0 0 0 10 0 0 0 0 0 3p2t 9 3 9 5 10 8 2 3 3 10
2p3t 1 9 1 1 9 1 1 9 2 9 2 2p10t 2 7 1 5 2 2 5 0 6 5
3p2t 2 4 0 0 1 2 2 3 0 3 2 5p5t 8 10 9 9 8 9 10 6 9 9
2p10t 0 0 0 0 0 1 7 0 0 0 3 5p10t 10 9 9 9 10 10 9 5 9 9
5p5t 0 0 0 0 0 1 0 2 0 0 8 6p5t 6 6 5 2 3 5 5 2 4 3
5p10t 5 0 0 0 0 1 4 0 1 0 0 10p10t2r 10 2 9 1 1 10 2 2 2 2
6p5t 1 2 0 1 0 2 1 2 1 1 0 5p30t4r 3 7 4 9 10 7 7 9 7 10
10p10t2r 9 0 0 0 0 1 2 0 0 1 0 2p30t2r 10 9 10 9 10 10 9 9 9 10
5p30t4r 0 0 0 0 0 5 2 1 2 0 2
SUM 58 62 65 58 65 61 58 37 59 63
2p30t2r 1 0 0 1 0 1 8 0 0 0 1
SUM 19 15 1 3 10 25 27 17 6 14 18
2
* represents a statistical significance level of 0.05. ** represents a statis- 0.001. NS represents the absence of a statistically significant difference at a
tical significance level of 0.01. *** represents a statistical significance level of level of 0.05.
459
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
3
We also considered 𝜏ℎ𝑎𝑟𝑚𝑜𝑛𝑖𝑐 = 10 from Powell (2011, p 451) and KESTEN’s ORBA is an exact brute force algorithm that sequentially solves a
stepsize rule (Powell, 2011, p 436), but we received better results with the static RCMPSP from state 𝑠𝑡 to optimality. The static problem results by
stated settings. ignoring the possibility of future project arrivals. New project arrivals
460
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Algorithm 3: : Value Iteration. highest ranked TSO is called the best TSO. After the first generation is
generated, GA begins iterations.
procedure state value iteration procedure
At each iteration, GA creates an empty set of TSOs and fills this new
𝛽 = 0.001 ⊳ 𝛽 is the stopping parameter
set with TSOs to the desired population size amount using elitist selec-
∀𝑠 ∈ ∶ 𝑉 𝑜𝑙𝑑 (𝑠) = 0 ⊳ initial state values
tion, crossover and mutation operators. The elitist selection operator
repeat
copies the desired amount of highest ranked TSOs from the previous
for ∀𝑠 ∈ ∶ do
∑ generation of TSOs to the new empty set of TSOs (new generation).
𝑉 (𝑠) = max 𝑠′ ∈ 𝑃 (𝑠′ |𝑠, 𝑎)(𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 + 𝛼𝑉 𝑜𝑙𝑑 (𝑠′ )) ⊳ value
𝑎∈𝐴(𝑠) Crossover and mutation operators fill the rest of the new generation.
function The crossover operator randomly selects two TSOs from the previ-
end for ous generation and randomly selects a task inside of the first TSO. The
𝑊𝑚𝑎𝑥 = max |𝑉 (𝑠) − 𝑉 𝑜𝑙𝑑 (𝑠)| ⊳ maximum value change crossover operator copies tasks, from the earliest task to be processed
𝑠∈
Update ∀𝑠 ∈ ∶ 𝑉 𝑜𝑙𝑑 (𝑠)
= 𝑉 (𝑠) up to the randomly selected task of the first (selected) TSO, to (make)
until 𝑊𝑚𝑎𝑥 ≤ 𝛽(1 − 𝛼)∕(2𝛼) a new TSO without changing the order of these tasks. Then, the
end procedure crossover operator re-orders the remaining tasks of the first (selected)
TSO according to order of these tasks in the second (selected) TSO,
then adds them to the new TSO (to after the randomly selected task).
The new TSO is always a precedence-feasible TSO since it is created
disrupt the current schedule and a new schedule incorporating these according to the order of tasks in both selected precedence-feasible
is required. In such a manner, ORBA represents an optimal reactive TSOs.
scheduling algorithm. Due to the computational requirements of brute The new TSO may be adjusted by the mutation operator with a
force algorithms, ORBA runs in factorial time and only small-size desired probability. Under the mutation operation, a task is selected
dynamic and stochastic RCMPSP problems can be solved within a at random and the location of this task in the TSO is randomly re-
reasonable time. Thus, we limit our test problems with ORBA to a assigned. The new location cannot be later than the task’s previous
maximum of 10 tasks. The ORBA used here extends that in Satic et al. order and cannot be sooner than its latest to be processed predecessor
(2022) by allowing for multiple resource types. task. Thus the mutation operator also ensures that the newly generated
Specifically, for a given pre-decision state 𝑠𝑡 , ORBA calculates the TSO is a precedence-feasible TSOs. Then the new TSO is added to
profits and makespans of all precedence-feasible task scheduling orders the new generation. When the size of the new generation reaches
(TSOs) then selects a TSO of maximal profit. A precedence-feasible TSO the population size amount, the TSOs are then ranked as in the first
is a permutation 𝜎 of tasks such that, for any 𝑚 < 𝑛 with 𝜎(𝑚) = (𝑗, 𝑖) and generation. GA iterates the generations until the desired number of
𝜎(𝑛) = (𝑗, 𝑘), 𝑘 ∉ 𝑗,𝑖 . In the case of ties, the candidate schedule with generations is reached. The best TSO of the final generation is used
minimal makespan is selected, and the schedule of the smallest project for decision making.
and task indices are selected from any remaining candidate schedules. The TSO is converted to an action in the same way as for ORBA.
The generated TSO is converted to non-idling action 𝑎 for given Similar to ORBA, GA’s TSO can be used for future pre-decision states
pre-decision state 𝑠𝑡 using a serial schedule-generation scheme (SGS). as long as a new project arrival does not disturb the system.
A non-idling action is one that will always allocate resource to tasks Since the reactive scheduling method reruns GA for each time an ar-
when it is possible to do so. In an SGS, if there are enough free rival disturbs the processing plan, the computational time requirement
resources to process the next task in the TSO, its action becomes one increases with the problem size.
(𝑎𝑗,𝑖 = 1), and its resource usage is subtracted from the free resources.
The process then repeats for the remaining tasks in order of the TSO 4.4. Rule-based algorithm (RBA)
until either no tasks can begin processing or there is insufficient free
resources to process any of the remaining tasks. The TSO is followed Rule-Based Algorithm (RBA) is a priority-based heuristic algorithm
in future periods as long as no new project arrives. If a new project which uses the longest task first priority rule. We considered RBA in
arrival disturbs the system, the current TSO becomes invalid, and ORBA benchmarking to show the performance of a simple heuristic algorithm.
generates a new TSO. Due to the simplicity of the algorithm, it runs very fast for all problem
sizes.
For a given pre-decision state 𝑠𝑡 , RBA creates a precedence-feasible
4.3. Genetic algorithm (GA)
TSO where the tasks with the longest task processing durations have
priority over other tasks. Then the TSO is converted to an action as
GA is a heuristic algorithm which searches the solution space using same as in ORBA.
a set of solutions (population). GA then improves the population many
times (generation) using bio-inspired operators such as crossover and 5. Computational results
mutation to find better solutions. We used the genetic algorithm to
benchmark with our ADP algorithm since GA is the most popular We simulate the dynamic project scheduling environment with
method for RCPSP family. GA is applied to dynamic problems using a random new project arrivals and stochastic task durations, and we
reactive scheduling method. We used GA from Satic et al. (2022) and, compare the expected total discounted long-run profit performance
in this paper, we extended it to multiple resource types. of DP, ADP, ORBA, GA and RBA. Algorithm 4 shows the simulation
For a given pre-decision state 𝑠𝑡 , GA generates the desired pop- procedure we used in our comparisons. The statistical significance of
ulation size amount of precedence-feasible TSOs, which are random ADP (Model 9) against other methods are shown in the tables at three
permutations of the pending for processing tasks (𝑥𝑗,𝑖 = −1). The levels (0.001, 0.01, 0.05). The experiments are coded in JuliaPro 1.3.1.2
algorithm evaluates profits and makespans of the TSOs and then ranks on a desktop computer with Intel i5-11400F CPU with 2.60 GHz clock
them using these values. These were evaluated via simulation under speed and 32 GB of RAM.
an assumption that there will be no new project arrivals and all tasks We used 100 problem scenarios in our comparison which are a
complete at 𝑡max . TSOs with higher profits get a higher rank. In the combination of 10 project settings and 10 project arrival probabilities.
case of ties, TSOs with smaller makespans get a higher rank. If the These arrival probabilities 𝜆𝑗 are 0.01 and from 0.1 to 0.9 with in-
tie continues, TSOs ranked according to their creation time (earliest crements of 0.1. Since we consider a dynamic environment 𝜆𝑗 = 0 is
to latest). This first set of TSOs is called the first generation, and the not used in this comparison instead 𝜆𝑗 = 0.01 is used to represent the
461
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Algorithm 4: : Simulation. some tasks require the allocation of all resources. The resource strength
procedure Profit simulation
of the second problem is 0.50. Here, the initial task of each project can
for 𝑠𝑖𝑚 = 1 to 𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 do ⊳ for each simulation be processed in parallel with any task of the other projects. Using these
𝑉̃𝑠𝑖𝑚 = 0, 𝑠1 = 𝟎 ⊳ 𝑉̃𝑠𝑖𝑚 is cumulative simulation profit, 𝑠1 the initial small-size problems we are able to compare ADP’s performance with
empty pre-decision state optimal policies of DP, scheduling orders of ORBA, GA and RBA.
for 𝑡 = 1 to 𝑃 𝑒𝑟𝑖𝑜𝑑 do ⊳ for each simulation period Tables 6–8 illustrate the expected total discounted long-run profits
𝑠𝑡+1 = 𝑠𝑀 (𝑠𝑡 , 𝑎, 𝑐𝑡 ) ⊳ 𝑎 ∈ 𝜋(𝑠𝑡 ), 𝜋(𝑠𝑡 ) is the policy of selected (which are averages of simulations) of policy generation methods (ver-
solution method tical) at different arrival probabilities (horizontal). The colour of cells
𝑉̃𝑠𝑖𝑚 = 𝑉̃𝑠𝑖𝑚 + 𝛼 𝑡−1 𝑅𝑠𝑡 ,𝑎,𝑠𝑡+1 shows the statistical significance of compared algorithms against ADP.
end for The simulation results of two project types, two tasks and one
end for
1 ∑𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 ̃ resource type problems are shown in Table 6. ADP produces higher***
𝑉̃ = 𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑠𝑖𝑚=1
𝑉𝑠𝑖𝑚
̃ profits than ORBA, GA and RBA, except for 𝜆𝑗 = 0.01. Recall that GA
return 𝑉
end procedure is evaluated on fewer simulations. Hence, at 𝜆𝑗 = 0.01, RBA’s profit
is lower*** than ADP’s while GA’s profit is NS with a similar profit
achieved.
The simulation results of two project types, three tasks and one
nearly-static case. Also, 𝜆𝑗 = 1 is not used because it causes a non- resource type problems are shown in Table 7. Algorithms’ profits are
ergodic MDP where some feasible states cannot be reachable from any NS different from each other at 𝜆𝑗 = 0.01. From 𝜆𝑗 = 0.1 to 𝜆𝑗 = 0.8,
states (for example a state where all projects are completed but no new ADP has higher** profits than ORBA, GA and RBA.
project has arrived). Of the 100 scenarios, 30 represent smaller sized The simulation results of three project types, two tasks and one
problems on which we are able to compare all solution algorithms. For resource type problems are shown at Table 8. ADP has higher*** profits
the 70 larger sized problems it is only possible to evaluate ADP, GA than RBA except for 𝜆𝑗 = 0.01. At 𝜆𝑗 = 0.1, 0.2, 0.8, 0.9, ADP has higher*
and RBA. profits than ORBA, GA and RBA.
In Algorithm 4, we run 100 simulations for 1000 simulation periods We see the same result as in Satic et al. (2022) in that the reactive
and a discount rate of 𝛼 = 0.999. (For the small instances we use scheduling methods ORBA and GA have close to optimal profits at
10 000 simulations for DP, ADP, RBA and ORBA.) If a period repre- 𝜆𝑗 = 0.01 where the results are NS different from each other. However,
sents a day, this period would represent approximately three years of their results usually diverge from optimum as 𝜆𝑗 increases.
processing time. Simulations start from the empty pre-decision state In summary, our comparison of project settings from Satic et al.
(𝑠1 = 𝟎). In each simulation period an action is generated with the (2022) shows that ADP cannot match the optimal policy of DP. DP’s
policy being investigated (𝜋). Then the following pre-decision state is advantage over the other policies is that it is able to better identify
generated given the action taken and the transition function. The profit opportunities to defer the use of resources to start processing a project
is generated, recorded and included within 𝑉̃𝑠𝑖𝑚 . The discounted profits task to more profitable projects in later periods. ADP generated higher*
of completed projects during the transition time are added to 𝑉̃𝑠𝑖𝑚 . After profits than ORBA, GA and RBA respectively in 22, 21 and 27 of 30
the end of the simulations, the average discounted long-run profit 𝑉̃ of problem scenarios. ADP generated NS different profits than ORBA, GA
the investigated solution method is calculated. and RBA respectively in 2, 3 and 2 of 30 problem scenarios. ADP
ADP (Algorithm 1) is trained for 100 iterations each having 100 generated lower* profits than ORBA, GA and RBA respectively in 6,
simulations with 1000 periods. GA is trained for 100 generations, each 6 and 1 of 30 problem scenarios. Thus, we showed that our linear ADP
with 100 solutions. The elitest selection operator transfers the best 10% model performs better than ORBA, GA and RBA in up to 90% of the
of solutions to the next generation. A new TSO created by the crossover scenarios from Satic et al. (2022).
operator is handled by the mutation operator with a 50% chance. We
note that, while ADP requires training once prior to the simulations, 5.2. Test problem generation
GA requires multiple training occurrences during the simulations. GA
is required to generate a new schedule each time a new project arrives. The problems of Satic et al. (2022) were the only dynamic and
In this study, we assume that the number of tasks of different project stochastic RCMPSPs in the literature that have a reward after comple-
types is equal. Project tasks have a completion duration range 𝑡max − tion, a tardiness cost after a given due date, arrival probability of new
𝑡min + 1 = 3 and have uniformly distributed completion probabilities. projects during a transition time, randomly early, normal and late task
The shortest maximum task duration in our study is 𝑡max = 2, for completions. However, Satic et al. (2022) only considered small-size
such tasks the completion duration range is 2 and have an arbitrary problems where the project network is sequential (serial, 𝑂𝑆𝑗 = 1).
completion probability distribution. The completion probabilities used Thus we generate larger size test problems using ProGen/Max and
in this computational study are: MPSPLIB problems.
ProGen/Max is an RCMPSP generation software which is developed
⎧ 1 , if 𝑡max
𝑗,𝑖 ≥ 3, 1 ≤ 𝑥
̂ 𝑗,𝑖 ≤ 3 by Schwindt (1998) which extends its predecessor ProGen (Kolisch
⎪ 𝑥̂ 𝑗,𝑖
⎪1, if 𝑡max et al., 1995) with an option to consider the minimum and maximum
𝛾𝑗,𝑖 (𝑥̂ 𝑗,𝑖 ) = ⎨ 3 𝑗,𝑖 = 2, 𝑥
̂ 𝑗,𝑖 = 2
(31) time lags between the start of activities.
⎪1, if 𝑡max
𝑗,𝑖 = 2, 𝑥
̂ 𝑗,𝑖 = 1
⎪ We used ProGen/Max to generate RCPSPs with different activity-on-
⎩0, otherwise. node networks, order strength (denoted 𝑂𝑆𝑗 ), task durations, resource
usage, and resource availability. We combined these RCPSPs prob-
5.1. Optimality gaps for small instances lems and added stochastic task completion, project arrival probability,
project completion reward, late completion cost, and due date. We add
We used project settings from Satic et al. (2022). The problems’ reasonable completion rewards and tardiness costs to each project. We
data is available in their paper and at https://github.com/ugursatic/ used the generated task durations as expected task durations 𝑡𝑗,𝑖 and
DSRCMPSP. These problems are arbitrarily created to be small and added one minimal possible (𝑡min 𝑗,𝑖 = 𝑡𝑗,𝑖 − 1) and one maximal possible
solvable by DP. In these problems, order strength, resource factor (𝑡max
𝑗,𝑖 = 𝑡𝑗,𝑖 + 1) duration options. We tested all problems with ten dif-
and the number of resources are set to 1.00. In other words, project ferent 𝜆𝑗 options which are 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. We
networks are serial, and all tasks use the same amount of resources. The generated the due date of the project via (32), where 𝜌 is an arbitrary
resource strength of the first and third problems is 0.00, which means factor, which we set to 1.5. We adjusted the resource availability using
462
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 6
Two project types and two tasks problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DP 77 503 759 907 1000 1063 1109 1143 1169 1190
ADP 75 481 710 835 911 960 995 1020 1038 1051
ORBA 73 466 669 768 817 840 846 844 837 829
GA 72 452 636 708 745 752 754 735 735 724
RBA 72 413 529 551 542 525 507 491 480 473
Table 7
Two project and three tasks problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DP 115 590 798 905 970 1013 1044 1066 1083 1097
ADP 115 584 786 889 949 988 1015 1034 1048 983
ORBA 115 575 768 862 915 947 966 979 988 995
GA 114 573 772 857 911 952 965 987 984 999
RBA 115 573 762 854 905 937 954 968 975 983
Table 8
Three project and two tasks problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
DP 199 878 1044 1122 1197 1263 1325 1378 1427 1468
ADP 181 746 857 936 1012 1089 1163 1229 1342 1373
ORBA 182 728 854 950 1042 1129 1207 1273 1325 1366
GA 186 727 848 947 1040 1128 1209 1277 1327 1367
RBA 183 736 815 862 921 986 1050 1114 1173 1228
(33) because the combination of resource availabilities of multiple the stochastic task completion and arrival probabilities as same as
single project problems makes the multi-project problem resource-rich: ProGen/Max generated problems. We generated the due date of the
project as:
( ∑𝐼 ) ∑𝐼𝑗
⎛ ⎧ ∑𝐾 𝑗
(𝑡 𝑏𝑘 ) ⎫ ∑𝐾 (𝑡 𝑏𝑘 )
𝑖=1 𝑗,𝑖 𝑗,𝑖
𝑖=1 𝑗,𝑖 𝑗,𝑖
⎜ ⎪ 𝑘=1 𝐵𝑘 ⎪ 𝑘=1 𝐵𝑘
⎜ ⎪ ⎪ 𝐹𝑗 ≈ 𝐵𝑒𝑠𝑡𝑗 + 𝐽 . (34)
𝐹𝑗 ≈ ⎜(1 − 𝑂𝑆𝑗 ) max ⎨𝐽 , max{𝑡𝑗,1 , 𝑡𝑗,2 , … , 𝑡𝑗,𝐼𝑗 }⎬ + 𝐾
⎜ ⎪ 𝐾 ⎪
⎜ ⎪ ⎪ 5.3. Performance analysis for larger instances
⎝ ⎩ ⎭
( ∑𝐼 )
⎧ ∑𝐾 𝑗
(𝑡 𝑏 ) ⎫⎞
𝑘
𝑖=1 𝑗,𝑖 𝑗,𝑖
⎪ 𝐼𝑗 ⎪⎟ We created five project settings with ProGen/Max and two project
𝑘=1
⎪∑
𝐵𝑘
⎪⎟ settings with MPSPLIB problems. The parameters of these problems are
(𝑂𝑆𝑗 ) max ⎨ 𝑡𝑗,𝑖 , 𝐽 ⎬⎟ 𝜌,
⎪ 𝑖=1 𝐾 ⎪⎟ shown in Appendix A. Our ProGen/Max problems are not resource-rich,
⎪ ⎪⎟ and their resource strengths are between 0.007 and 0.176. Maximum
⎩ ⎭⎠
task durations of these problems range between 2 and 21 according to
(32) a uniform distribution. More detailed information (e.g., task duration,
project network, resource usages) about these problems and more
∑𝐽 ( ( )) detailed test results are available at https://github.com/ugursatic/
50 − 4𝐽
𝐵𝑘 ≈ 𝐵𝑘𝑗 . (33) DSRCMPSP. The size of these problems exceeds the computational
𝑗=1
100
limits of DP and ORBA on our hardware, so we only compared ADP,
The MPSPLIB (http://www.mpsplib.com/) is a RCMPSP library GA and RBA.
which contains the problem set of Homberger (2007). These RCMPSP In a number of the larger scale problems, there can be relatively
problems are made by combining single project problems of PSPLIB small increases in profit as the arrival probability increases. We have
(http://www.om-db.wi.tum.de/psplib) which is RCPSP library (Kolisch included the full range of results for completeness and to appraise
& Sprecher, 1996). PSPLIB problems are generated with ProGen. potential differences in performance at different arrival probabilities.
The MPSPLIB contains 140 instances that differ by project type A broader discussion on the effect of arrival probabilities is provided
number, project number, task number, global resource type number in Appendix C.
and arrival times. Global resources are shared among all projects, and In five project types, five tasks and four resource types problems,
local resources are only used for a single project. Compared to our shorter projects are more profitable than longer ones. Thus, RBA with
ProGen/Max generated problems, MPSPLIB problems have predefined longer tasks first rule is disadvantaged, and we expect that ADP out-
tardiness costs and due dates. However, these due dates (𝐵𝑒𝑠𝑡𝑗 ) are the performs RBA. Table 9 shows that ADP produces better*** results than
shortest completion time found for single project problems, which we GA and RBA in all 𝜆𝑗 values.
need to modify to use in the dynamic multi-project setting. In two project types, ten tasks and two resource types problems,
From the MPSPLIB, we only considered 30 tasks per project prob- project type 1 is more profitable based on its reward/project horizon4
lems for algorithm benchmarking. We combined the given global and ratio. Also, tasks of project type 1 are longer than project type 2. Thus
local resources for each resource type. Then we reduced the resource
amount using (33). We use the predefined tardiness costs and twice
4
the tardiness cost as completion rewards to each project. We use The project horizon is the sum of its maximal task durations.
463
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 9
Five project types, five tasks and four resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 937 1492 1453 1495 1475 1494 1369 1473 1295 1485
GA 869 1128 1143 1141 1158 1146 1143 1148 1148 1145
RBA 698 601 576 562 558 549 527 538 517 507
Table 10
Two project types, ten tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 292 424 490 502 519 754 756 759 759 760
GA 304 448 454 455 457 457 457 458 459 455
RBA 290 420 423 427 437 445 453 454 464 467
Table 11
Five project types, ten tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 530 2463 2489 2464 2442 2508 2436 2457 2513 2509
GA 1757 2324 2330 2337 2336 2339 2346 2341 2339 2342
RBA 1769 2341 2356 2377 2376 2378 2389 2377 2392 2389
Table 12
Six project types, five tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 1113 1632 824 1408 1660 1443 1740 1538 1310 1560
GA 1226 1308 1313 1312 1307 1301 1301 1299 1318 1311
RBA 1210 1145 1107 1106 1127 1084 1100 1098 1095 1086
this problem is advantageous for the longest task first priority rule of shows that ADP produces higher*** results than RBA in 𝜆𝑗 = 0.1 and
RBA. However, Table 10 shows that ADP usually produces higher*** 𝜆𝑗 = 0.5.
profits than alternatives except for low project arrival rates 𝜆𝑗 = 0.01 In summary, our comparison of larger size problems shows that
and 𝜆𝑗 = 0.1. ADP’s profit was higher* than GA in 53 of 70 problem scenarios; GA’s
profit was higher* for 13 problem scenarios, and there was NS dif-
In five project types, ten tasks and two resource types problems,
ference in the remaining 4 problem scenarios. ADP generated higher*
the tardiness costs are set to approximately 10% of the project rewards.
profits than RBA in 55 problem scenarios; RBA’s profit was higher*
Table 11 shows that ADP’s profits are higher*** than GA and RBA from
than ADP only in 12 problem scenarios, and results between ADP and
𝜆𝑗 = 0.1 to 𝜆𝑗 = 0.9.
RBA are NS different in 3 problem scenarios. These results show that
Six project types, five tasks and two resource types problems consist the overall performance of ADP is better* in the majority of larger-size
of six copies of the same project with different reward and tardiness problems than GA and RBA.
cost combinations. In Table 12 ADP produces higher*** profits for most
project arrival probabilities except for 𝜆𝑗 = 0.01 and 𝜆𝑗 = 0.2. ADP’s and 6. Conclusion
GA’s profits are NS different at 𝜆𝑗 = 0.8.
The ten project types, ten tasks and two resource types problems are Paper summary. In this paper, we modelled the dynamic and
the largest problems we generate with ProGen/Max. In this problem, stochastic RCMPSP as an infinite-horizon discrete-time MDP where
we arbitrarily assigned rewards and tardiness costs. Table 13 shows projects have identical arrival probabilities at each transition time, and
that ADP has higher*** profits than GA and RBA at all project arrival tasks have random durations. Our objective function maximises the
probabilities. expected total discounted long-run profit. We used a linear approxi-
mation model to design a practical scheduling policy and showed that
Two project types, thirty tasks and four resource types problem is
it performs near-optimally in small problems and compares favourably
the smallest MPSPLIB problem in our comparison. The problem name
to existing heuristics in large problems.
is mp_j30_a2_nr2_set in MPSPLIB, and it has one global and three local
The motivation of this study is to create a more comprehensive
resources. The resource strength of type one resource is 0.207. But
project scheduling model by considering the uncertainties of stochastic
other resource strength values are 0.01, 0 and 0. These values represent
task durations, random new project arrivals, multiple types of resource
that at least one large task cannot be processed in parallel with different
usages and bigger and complex project networks. For this purpose, we
tasks and might create a bottleneck. Table 14 shows that ADP leads to
suggest a linear approximation model which generates decisions based
higher*** profits than GA at most arrival probabilities except for 𝜆𝑗 =
on resource consumption and decision rewards. Our linear approxima-
0.1. ADP’s profits are higher*** than RBA at most arrival probabilities
tion model generated the best profits after the exact methods in our
except for 𝜆𝑗 = 0.1 and 𝜆𝑗 = 0.3.
comparisons and contributed to the literature by extending the work
Five project types, thirty tasks and four resource types problem is of Satic et al. (2022) which only considered small-sized projects with
the largest problem we have used in our comparison. The problem sequential networks and single resource type.
name is mp_j30_a5_nr4_set in MPSPLIB. The original problem has three This study provides an efficient ADP algorithm for dynamic and
global and one local resource type. We used the given due dates in stochastic RCMPSP, which generates profits that are significantly higher
the problem without changing them. The resource strengths of resource than or equal to the profits of ORBA, GA and RBA in respectively
types one, two and three are high (0.449–0.687) and the order strengths 80%, 81% and 87% of our comparisons. DP produces better* results
of these projects are low. In other words, many tasks can be processed than ADP for small-size problems. However, it suffers from the curse of
together, and free-resource availability may allow it easily. Table 15 dimensionality and is not suitable for larger problems. ADP is a viable
464
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table 13
Ten project types, ten tasks and two resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 650 592 576 960 928 865 889 894 932 884
GA 522 536 546 544 539 542 546 540 540 548
RBA 552 550 555 554 553 556 549 552 554 555
Table 14
Two project types, thirty tasks and four resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 176 184 221 206 224 212 213 214 214 214
GA 168 197 195 192 193 192 192 190 189 193
RBA 161 200 208 209 207 209 210 209 209 210
Table 15
Five project types, thirty tasks and four resource types problem.
𝜆𝑗 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ADP 721 1515 864 1143 904 1748 913 1104 1591 1595
GA 719 1546 1665 1706 1727 1740 1741 1748 1755 1754
RBA 717 1493 1573 1589 1609 1614 1617 1627 1623 1627
Table A.18
solution method for more practical, larger scale problems that cannot Three project types and two tasks problem.
Also, this study gives insights to project managers to determine Number of project types 3
Project type no 1 2 3
more suitable methods for their environment by providing a perfor-
Completion reward 8 5 20
mance comparison of ADP, DP, ORBA, GA and RBA methods in various Due date 10 8 10
project settings and various arrival probabilities. We suggest using Tardiness cost 5 3 19
DP for problems that are within the computational limitations of DP. Order strength 1.00 1.00 1.00
However, we suggest using ADP methods for larger problems. Resource attributes
Future research direction. In real life, there are more dynamic and Resource factor 1.00
stochastic elements in dynamic project scheduling environments than Number of resource types 1
those (stochastic task durations, uncertain new project arrivals, finish to Resource type no 1
Resource amounts 3
start project networks, multiple resource types and usage) considered in Resource strength 0.00
this paper. Future work might consider other elements of the dynamic
project scheduling environment, such as stochastic resource availability
or multiple modes of task processing.
Appendix B. Modelling assumptions and model generality
Acknowledgements Models are always simplifications of reality, but may still be useful
for making decisions if they capture the key problem features and are
We acknowledge Mahshid Salemi Parizi for making their code avail- solvable within a practical time amount.
able. The first author acknowledges the Ministry of National Education Project due dates and task durations are given in whole numbers
of The Republic of Turkey for providing a PhD scholarship. We thank of periods, and available resources and task resource usages are given
the two anonymous reviewers for their careful and detailed reviews of in non-negative whole units. If a task would in practice last for a
the paper. fractional number of periods, we round it up to an integer assuming
that the resource employed on that task cannot be re-allocated to a new
task until the beginning of the next period (for the remainder of the
Appendix A. Problem attributes
period during which a task is completed, the resource may be allowed
to take a vacation or be allocated to other tasks which are shorter
See Tables A.16–A.25. than one period or are preemptive; these are however not included in
465
U. Satic et al. European Journal of Operational Research 315 (2024) 454–469
Table A.19
Five project types, five tasks and four resource types problem.

Project attributes
Number of project types   5
Project type no           1     2     3     4     5
Completion reward         63    50    19    48    46
Due date                  54    105   232   96    139
Tardiness cost            21    25    15    25    25
Order strength            0.80  0.70  0.50  0.50  0.40

Resource attributes
Resource factor           0.80
Number of resource types  4
Resource type no          1     2     3     4
Resource amounts          18    19    15    18
Resource strength         0.12  0.10  0.08  0.13

Table A.20
Two project types, ten tasks and two resource types problem.

Project attributes
Number of project types   2
Project type no           1     2
Completion reward         10    72
Due date                  125   129
Tardiness cost            5     15
Order strength            0.51  0.51

Resource attributes
Resource factor           0.75
Number of resource types  2
Resource type no          1     2
Resource amounts          12    16
Resource strength         0.10  0.18

Table A.21
Five project types, ten tasks and two resource types problem.

Project attributes
Number of project types   5
Project type no           1     2     3     4     5
Completion reward         99    83    85    98    92
Due date                  131   138   166   135   180
Tardiness cost            10    8     9     10    9
Order strength            0.36  0.36  0.36  0.69  0.67

Resource attributes
Resource factor           0.70
Number of resource types  2
Resource type no          1     2
Resource amounts          19    20
Resource strength         0.11  0.13

Table A.22
Six project types, five tasks and two resource types problem.

Project attributes
Number of project types   6
Project type no           1     2     3     4     5     6
Completion reward         99    75    50    30    10    80
Due date                  177   177   177   177   177   177
Tardiness cost            25    25    25    25    5     60
Order strength            0.50  0.50  0.50  0.50  0.50  0.50

Resource attributes
Resource factor           1.00
Number of resource types  2
Resource type no          1     2
Resource amounts          24    18
Resource strength         0.15  0.11
Table A.23
Ten project types, ten tasks and two resource types problem.

Project attributes
Number of project types   10
Project type no           1     2     3     4     5     6     7     8     9     10
Completion reward         36    37    93    69    38    62    12    80    52    62
Due date                  429   354   654   771   447   288   448   1124  288   708
Tardiness cost            29    14    70    24    18    21    2     50    40    29
Order strength            0.36  0.58  0.89  1.00  0.51  0.53  0.51  0.53  0.69  0.67

Resource attributes
Resource factor           0.70
Number of resource types  2
Resource type no          1     2
Resource amounts          11    11
Resource strength         0.01  0.01
Table A.24
Two project types, thirty tasks and four resource types problem.

Project attributes
Number of project types   2
Project type no           1     2
Completion reward         40    19
Due date                  137   74
Tardiness cost            20    9
Order strength            0.33  0.57

Resource attributes
Resource factor           0.50
Number of resource types  4
Resource type no          1     2     3     4
Resource amounts          21    11    10    10
Resource strength         0.20  0.01  0.00  0.00

Table A.25
Five project types, thirty tasks and four resource types problem.

Project attributes
Number of project types   5
Project type no           1     2     3     4     5
Completion reward         48    2     22    32    56
Due date                  62    82    72    89    72
Tardiness cost            24    1     11    16    28
Order strength            0.45  0.44  0.39  0.50  0.44

Resource attributes
Resource factor           0.25
Number of resource types  4
Resource type no          1     2     3     4
Resource amounts          58    78    102   19
Resource strength         0.45  0.65  0.69  0.13

Appendix B. Modelling assumptions and model generality

Models are always simplifications of reality, but may still be useful for making decisions if they capture the key problem features and are solvable within a practical amount of time.

Project due dates and task durations are given in whole numbers of periods, and available resources and task resource usages are given in non-negative whole units. If a task would in practice last for a fractional number of periods, we round it up to an integer, assuming that the resource employed on that task cannot be re-allocated to a new task until the beginning of the next period (for the remainder of the period during which a task is completed, the resource may be allowed to take a vacation or be allocated to other tasks which are shorter than one period or are preemptive; these are however not included in the considered system). Analogously, if a task would in practice use a fractional number of resource units, we round it up to an integer, assuming that each resource unit employed on that task cannot be split between several tasks (the remainder of the fractional resource unit may be allowed to take a vacation or be allocated to other tasks which require less than one resource unit; these are however not included in the considered system).
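To make this rounding convention concrete, the following minimal sketch (illustrative only; the function name and data layout are our own, not part of the paper's implementation) discretises a task in the way just described:

    import math

    def discretise_task(duration, resource_usages):
        # Round a fractional duration (in periods) and the per-type resource
        # usages up to whole units: a resource stays committed until the start
        # of the next period, and a resource unit cannot be split across tasks.
        return math.ceil(duration), [math.ceil(u) for u in resource_usages]

    # A task of 2.3 periods using 1.5 units of resource 1 and 0.2 units of
    # resource 2 is modelled as 3 periods using 2 and 1 whole units.
    print(discretise_task(2.3, [1.5, 0.2]))  # -> (3, [2, 1])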
Geometrically distributed inter-arrival times are the discrete-time analogue of the exponentially distributed inter-arrival times commonly deployed to model random arrivals in continuous-time settings. Thanks to the memoryless property of this distribution, the probability of a project arrival is constant and independent of the system state. Other discrete probability distributions can be used with some additional effort. For example, assume that the project arrival probability distribution is defined by the number of periods since the last project arrival of the same type. In that case, by expanding the state to include this information, we can evaluate the conditional probability of an arrival in the next period given the number of periods since the last arrival.
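To illustrate (in our notation, not the paper's): let T denote the number of periods until the next arrival of a given project type, with probability mass function f and cumulative distribution function F. For a general discrete distribution, the one-period arrival probability after n periods without an arrival is

    \Pr(T = n+1 \mid T > n) \;=\; \frac{f(n+1)}{1 - F(n)}, \qquad n = 0, 1, 2, \ldots

whereas for T \sim \mathrm{Geometric}(p) this conditional probability equals p for every n, which is why no additional state information is needed under the geometric model.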
The technical assumptions of our model, such as the existence of project types and the limit of one project of each type in the system, are features that provide structure which is then exploited to find a solution efficiently. The model is however still useful even in situations that do not strictly fit these technical assumptions, as we describe below. The model could be extended in a straightforward manner to allow for more than one project of each type, at the expense of more complicated notation (requiring additional indexing by project number) and dynamics, which would potentially still lead to an efficient algorithm; this modification was omitted in this paper as we believe the current model captures the key problem features.

Our model (and algorithm) can be directly used to allow for any fixed number of projects instead of just one project of each type. For instance, if a company considers two real project types and is willing to accept in parallel up to three projects of the first real project type and up to five projects of the second, then our model can be employed with eight model project types (each limited to one project in the system), by defining each of the first three model project types identically to the first real project type and each of the remaining five identically to the second. In other words, our model treats parallel projects of the same real type as distinct model types. The arrival probability of each model project type can be approximated by dividing the arrival probability of the real project type by the number of projects acceptable in parallel.

In fact, one of our examples uses this modelling trick. In our ''Six project types, five tasks and two resource types'' problem, all projects are copies of each other, so there is actually only one type of project, whose capacity in the system becomes six. Although in that problem we assigned different rewards, durations and tardiness costs, these project variables could have been identical. So, multiple projects of the same type can be accommodated in our model.
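A minimal sketch of this construction (our illustration; the class and field names are assumptions rather than the paper's code) is:

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class ProjectType:
        name: str
        arrival_prob: float  # per-period arrival probability
        # task network, rewards, due date etc. would sit here as well

    def expand_types(real_types, capacities):
        # Expand each real project type into `capacity` model project types,
        # each limited to one project in the system, splitting the arrival
        # probability evenly across the copies.
        model_types = []
        for real, cap in zip(real_types, capacities):
            for i in range(cap):
                model_types.append(replace(real,
                    name=f"{real.name}#{i + 1}",
                    arrival_prob=real.arrival_prob / cap))
        return model_types

    # Two real types accepted up to 3 and 5 times in parallel give
    # 8 model project types, as in the example above.
    reals = [ProjectType("A", 0.3), ProjectType("B", 0.5)]
    print(len(expand_types(reals, [3, 5])))  # -> 8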
The number of projects of each type in the system is naturally limited by the amount of resources and by the ratio between the due date and the minimum project duration. Suppose that a company manages a single resource and the due dates are so tight that two projects cannot be completed one after the other by the earlier due date (i.e., the above ratio is strictly lower than 2). Then it would unlikely be a good idea to have three or more projects of the same type in the system at any given time, because every project beyond the first two (in the order of completion) would surely incur the tardiness cost. Therefore, except perhaps in some rare situations in which processing a project that incurs the tardiness cost is still better than processing other projects, it would be wise for the company to limit the number of projects of each type to two, which would then allow them to use our model to optimise the scheduling of the projects and achieve the highest possible profit.
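As a back-of-envelope formalisation in our own notation (it measures all due dates from a common start, so it ignores staggered arrivals): if projects of a type with due date D and minimum duration d are processed back to back on the single resource, the k-th completion occurs no earlier than period kd, so it can be on time only if

    k\,d \;\le\; D \quad\Longleftrightarrow\quad k \;\le\; \left\lfloor D/d \right\rfloor,

and a ratio D/d < 2 caps the number of on-time completions among simultaneously held projects at one, consistent with the two-project limit suggested above.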
Another limitation arising in practice, in some sectors such as software development, production, construction and R&D, is that some types of resources are shared among all types of projects while other resources are specific to one project type. For example, imagine a software development company that has two types of projects: mobile app development and web app development. Each type of project requires a different set of programming skills and testing environments. If these special resources are limited to one unit each, only one project of each type can be processed in the system at once, while projects of different types can be processed in parallel. In such environments, the system cannot process multiple projects of the same type at once, and it therefore makes sense not to accept more than one project of each type into the system.

Classifying projects into types may often be possible in practice, even for companies that deal with bespoke projects. Our model can be employed at a chosen level of detail, but there is a trade-off. A higher level of detail would possibly lead to more distinct project types, i.e., to a larger problem which may be harder to solve, although the arrival probabilities would be smaller (or zero in the most extreme case with all projects bespoke). On the other hand, a lower level of detail would allow projects to be grouped into types according to their key characteristics (such as the number of tasks, the durations of tasks and the required resources), while potentially heterogeneous project rewards and tardiness costs can be replaced by their corresponding average values.

Appendix C. The effect of arrival probabilities

A common feature of the algorithms' performance in Section 5 is that the accrued profit is relatively consistent for scenarios with higher arrival probabilities. This is particularly true in the larger instances. Fig. C.2 highlights this phenomenon for four of the project scenarios by considering relative measures of profit generation and resource consumption for ADP Model 3. $/Max is the expected total discounted long-run profit achieved for the given arrival probability, expressed as a percentage of the maximal profit achieved across all tested arrival probabilities. Rk_usage, k = 1, …, K, is the average usage of the type-k resource, expressed as a percentage of the total type-k resource available. Profits increase in line with the increase in arrival probability, but the increase typically arrests as the resource usage plateaus. This indicates that there is little additional profit to gain as resources approach their consumption limits or their capabilities for processing tasks in parallel.
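In symbols (our reconstruction of these definitions): writing V(\lambda) for the expected total discounted long-run profit under arrival probability \lambda, \Lambda for the set of tested arrival probabilities, \bar{u}_k(\lambda) for the average number of type-k resource units in use per period, and B_k for the number of type-k units available,

    \$/\mathrm{Max}(\lambda) \;=\; 100 \cdot \frac{V(\lambda)}{\max_{\lambda' \in \Lambda} V(\lambda')}\,\%,
    \qquad
    \mathrm{R}k\_\mathrm{usage}(\lambda) \;=\; 100 \cdot \frac{\bar{u}_k(\lambda)}{B_k}\,\%.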
We argue that there is value in investigating algorithmic performance across the tested range of arrival probabilities. To achieve these profits, the policies deployed by the algorithms must respond to the differing arrival rates. For example, take a problem with two identical project types that differ only in that one project's net profits (on-time and tardy) exceed the other's. Under low arrival probabilities, the system will experience periods where no project, or only one, is available for processing. A sensible use of resources is then to progress both projects towards on-time completion. At high arrival probabilities, the availability of both project types for processing is more common. A sensible policy will focus on processing the higher-return project and deploy resources to the other project when it is viable to do so, for example, in parallel or when it is the only project in the system at that time.

All policies we consider do this to some extent, as seen in Section 5, with strongly performing policies better able to take advantage. DP does this optimally, at the expense of evaluating the policy. ADP can learn to do this through appropriate training, by taking into account the downstream profit implications of its actions. Reactive baseline algorithms, by assuming no future arrivals, have a limited view of the future profit implications of their actions; but good policies in this class would aim to maximise the net profit from clearing the existing projects from the system and would hence prioritise the higher-value project. Rule-based algorithm performance depends on how the rule is applied: if the two projects were in an identical state and there was sufficient resource to process a single task from one of these projects, the longest-task rule would decide randomly which project to action. This could unnecessarily delay one or both project completions, resulting in lower net profits.
Fig. C.2. Relative profit performance and resource consumption for ADP Model 3 in four problem scenarios.
Table C.26
Performance analysis of DP policies designed for given arrival probabilities when applied to problems with different arrival probabilities.

𝜆𝑗    DP      1%      10%     20%     30%     40%     50%     60%     70%     80%     90%
1%    199.3   199.3   199.3   199.4   199.2   198.9   198.3   198.0   198.0   197.8   197.6
10%   878.3   877.4   878.3   875.8   862.4   844.2   815.6   803.9   801.6   796.0   785.3
20%   1043.9  1036.1  1039.6  1043.9  1033.7  1016.0  986.0   973.0   968.4   963.3   953.1
30%   1122.0  1085.0  1093.0  1110.7  1122.0  1117.8  1099.1  1090.3  1086.2  1081.7  1074.1
40%   1196.6  1114.1  1124.3  1155.4  1189.2  1196.6  1189.7  1186.4  1182.2  1178.6  1170.9
50%   1263.5  1141.4  1153.5  1195.6  1245.9  1260.8  1263.5  1263.1  1259.2  1255.7  1250.7
60%   1324.7  1171.5  1184.1  1234.7  1294.5  1314.5  1320.8  1324.7  1323.4  1321.9  1316.0
70%   1377.6  1205.3  1217.4  1272.3  1334.5  1360.4  1370.3  1375.5  1377.6  1376.2  1373.2
80%   1426.8  1247.8  1254.0  1309.8  1369.4  1401.2  1413.5  1419.4  1425.0  1426.8  1422.1
90%   1467.9  1302.5  1295.0  1347.7  1396.6  1440.4  1451.9  1455.9  1464.5  1468.3  1467.9
We expand the discussion by considering the robustness of policies to deviations from the arrival probabilities that they were designed for. We focus on optimal DP policies for the three project types, two tasks and one resource problem. In Table C.26 we present the expected total discounted long-run profit of the optimal policy for each 𝜆𝑗, 𝑗 ∈ 𝒥, applied to all arrival probability scenarios. In each instance, we ran 10,000 simulations. The first two columns show the optimal profit (DP) for the problem scenario with arrival rate 𝜆𝑗. The remaining columns show the profit performance of the DP policy optimised for the arrival probability in the column header when applied to the problem with the arrival probability given in the 𝜆𝑗-row. Naturally, the diagonal of these columns reproduces the optimal profit in column DP. Away from the diagonal, we see the value of a policy being able to adapt to different conditions. As we move along a row, away from the optimal profit, the performance of policies designed for other arrival rates deteriorates. For low and high arrival probabilities, policies designed for adjacent arrival probabilities can achieve a close level of performance.
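The cross-evaluation behind Table C.26 can be reproduced with a simple simulation loop; the sketch below is illustrative only (policies, keyed by the design arrival probability, and simulate_episode are hypothetical stand-ins for the DP policies and the simulator, which the paper does not publish):

    import itertools
    import statistics

    ARRIVAL_PROBS = [0.01] + [i / 10 for i in range(1, 10)]  # 1%, 10%, ..., 90%
    N_RUNS = 10_000

    def cross_evaluate(policies, simulate_episode):
        # Estimate the expected total discounted profit of each policy
        # (optimised for a "design" arrival probability) when applied to a
        # scenario with a possibly different "actual" arrival probability.
        table = {}
        for design, actual in itertools.product(ARRIVAL_PROBS, repeat=2):
            profits = [simulate_episode(policies[design], arrival_prob=actual)
                       for _ in range(N_RUNS)]
            table[design, actual] = statistics.mean(profits)
        return table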
In summary, our investigations highlight the effect of project arrival probabilities on profitable project selection. Comparing policy performance across a broad range of arrival probabilities is useful for establishing the overall effectiveness of the algorithms considered.
References

Adler, P. S., Mandelbaum, A., Nguyen, V., & Schwerer, E. (1995). From project to process management: An empirically-based framework for analyzing product development time. Management Science, 41(3), 458–484. http://dx.doi.org/10.1287/mnsc.41.3.458.

Ahuja, V., & Birge, J. R. (2020). An approximation approach for response adaptive clinical trial design. INFORMS Journal on Computing, 32(4), 877–894. http://dx.doi.org/10.1287/ijoc.2020.0969.

Capa, C., & Ulusoy, G. (2015). Proactive project scheduling in an R&D department: a bi-objective genetic algorithm. In 2015 International conference on industrial engineering and operations management (IEOM), Vol. 1 (pp. 1–6). http://dx.doi.org/10.1109/IEOM.2015.7093733.

Chen, H., Ding, G., Zhang, J., & Qin, S. (2019). Research on priority rules for the stochastic resource constrained multi-project scheduling problem with new project arrival. Computers & Industrial Engineering, 137, Article 106060. http://dx.doi.org/10.1016/j.cie.2019.106060.

Choi, J., Realff, M. J., & Lee, J. H. (2007). A Q-learning-based method applied to stochastic resource constrained project scheduling with new project arrivals. International Journal of Robust and Nonlinear Control, 17(13), 1214–1231. http://dx.doi.org/10.1002/rnc.1164.

Cohen, I., Golany, B., & Shtub, A. (2005). Managing stochastic, finite capacity, multi-project systems through the cross-entropy methodology. Annals of Operations Research, 134(1), 183–199. http://dx.doi.org/10.1007/s10479-005-5730-1.

Creemers, S. (2015). Minimizing the expected makespan of a project with stochastic activity durations under resource constraints. Journal of Scheduling, 18(3), 263–273. http://dx.doi.org/10.1007/s1095.

Davis, M. T., Robbins, M. J., & Lunday, B. J. (2017). Approximate dynamic programming for missile defense interceptor fire control. European Journal of Operational Research, 259(3), 873–886. http://dx.doi.org/10.1016/j.ejor.2016.11.023.

Fliedner, T., Gutjahr, W., Kolisch, R., & Melchiors, P. (2012). Solving the dynamic stochastic resource-constrained multi-project scheduling problem with SRCPSP-methods. In Proceedings of the 13th international conference on project management and scheduling, Leuven, Belgium: KU Leuven (pp. 148–151).

Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. New York, NY: W. H. Freeman & Co.

He, F., Yang, J., & Li, M. (2018). Vehicle scheduling under stochastic trip times: An approximate dynamic programming approach. Transportation Research Part C (Emerging Technologies), 96, 144–159. http://dx.doi.org/10.1016/j.trc.2018.09.010.

Homberger, J. (2007). A multi-agent system for the decentralized resource-constrained multi-project scheduling problem. International Transactions in Operational Research, 14(6), 565–589. http://dx.doi.org/10.1111/j.1475-3995.2007.00614.x.

Kolisch, R., & Sprecher, A. (1996). PSPLIB: A project scheduling problem library. European Journal of Operational Research, 96(1), 205–216. http://dx.doi.org/10.1016/S0377-2217(96)00170-1.

Kolisch, R., Sprecher, A., & Drexl, A. (1995). Characterization and generation of a general class of resource-constrained project scheduling problems. Management Science, 41(10), 1693–1703. http://dx.doi.org/10.1287/mnsc.41.10.1693.

Li, H., & Womer, N. K. (2015). Solving stochastic resource-constrained project scheduling problems by closed-loop approximate dynamic programming. European Journal of Operational Research, 246(1), 20–33. http://dx.doi.org/10.1016/j.ejor.2015.04.015.

Li, H., Zhang, X., Sun, J., & Dong, X. (2023). Dynamic resource levelling in projects under uncertainty. International Journal of Production Research, 61(1), 198–218. http://dx.doi.org/10.1080/00207543.2020.1788737.

Melchiors, P. (2015). Lecture notes in economics and mathematical systems, Dynamic and stochastic multi-project planning. Cham, Switzerland: Springer, http://dx.doi.org/10.1007/978-3-319-04540-5.

Melchiors, P., & Kolisch, R. (2009). Scheduling of multiple R&D projects in a dynamic and stochastic environment. In Operations research proceedings 2008 (pp. 135–140). Heidelberg: Springer, http://dx.doi.org/10.1007/978-3-642-00142-0_22.

Melchiors, P., Leus, R., Creemers, S., & Kolisch, R. (2018). Dynamic order acceptance and capacity planning in a stochastic multi-project environment with a bottleneck resource. International Journal of Production Research, 56(1–2), 459–475. http://dx.doi.org/10.1080/00207543.2018.1431417.

Pamay, M. B., Bülbül, K., & Ulusoy, G. (2014). Dynamic resource constrained multi-project scheduling problem with weighted earliness/tardiness costs. In P. S. Pulat, S. C. Sarin, & R. Uzsoy (Eds.), International series in operations research & management science: vol. 200, Essays in production, project planning and scheduling (pp. 219–247). Springer US, http://dx.doi.org/10.1007/978-1-4614-9056-2_10.

Parizi, M. S., Gocgun, Y., & Ghate, A. (2017). Approximate policy iteration for dynamic resource-constrained project scheduling. Operations Research Letters, 45(5), 442–447. http://dx.doi.org/10.1016/j.orl.2017.06.002.

Powell, W. B. (2009). What you should know about approximate dynamic programming. Naval Research Logistics, 56(3), 239–249. http://dx.doi.org/10.1002/nav.20347.

Powell, W. B. (2011). Approximate dynamic programming: Solving the curses of dimensionality. Hoboken, NJ: John Wiley, http://dx.doi.org/10.1002/9781118029176.

Ronconi, D. P., & Powell, W. B. (2010). Minimizing total tardiness in a stochastic single machine scheduling problem using approximate dynamic programming. Journal of Scheduling, 13, 597–607. http://dx.doi.org/10.1007/s10951-009-0160-6.

Satic, U., Jacko, P., & Kirkbride, C. (2020). Performance evaluation of scheduling policies for the DRCMPSP. In M. Gribaudo, E. Sopin, & I. Kochetkova (Eds.), Analytical and stochastic modelling techniques and applications, Vol. 12023 (pp. 100–114). Cham: Springer International Publishing, http://dx.doi.org/10.1007/978-3-030-62885-7_8.

Satic, U., Jacko, P., & Kirkbride, C. (2022). Performance evaluation of scheduling policies for the dynamic and stochastic resource-constrained multi-project scheduling problem. International Journal of Production Research, 60(4), 1411–1423. http://dx.doi.org/10.1080/00207543.2020.1857450.

Schütz, H.-J., & Kolisch, R. (2012). Approximate dynamic programming for capacity allocation in the service industry. European Journal of Operational Research, 218(1), 239–250. http://dx.doi.org/10.1016/j.ejor.2011.09.007.

Schwindt, C. (1998). Generation of resource constrained project scheduling problems subject to temporal constraints: Report WIOR-543. Kaiserstrasse 12, D-76128 Karlsruhe, Germany: Universitat Karlsruhe.

Sutton, R. S., & Barto, A. G. (2018). Adaptive computation and machine learning, Reinforcement learning: An introduction (second ed.). Cambridge, MA, USA: MIT Press.

Wellingtone PPM (2018). The state of project management annual survey 2018. http://www.wellingtone.co.uk/wp-content/uploads/2018/05/The-State-of-Project-Management-Survey-2018-FINAL.pdf, Accessed October 31, 2023.