A Ship-Construction Dataset For Resource Leveling Optimization in Large Project Management Problems
Data in Brief
Data Article
Corresponding author.
E-mail address: [email protected] (C. Kyriklidis).
https://doi.org/10.1016/j.dib.2023.109340
2352-3409/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
2 C. Kyriklidis and G. Dounias / Data in Brief 49 (2023) 109340
Specifications Table
• Project managers usually use suitable software for the project scheduling process. Their goal
concerns the scheduling of the project characteristics (i.e., budget, duration, activity correla-
tions, and resources). The combination of these characteristics leads to the project organiza-
tion in the initial phase and during its execution. The Resource Levelling Problem, as part of
Project Management, demands the design of the project network and the calculation of the
resource profiles, which are basic elements of the project structure. Their creation is based on
the following information: 1) activity duration, 2) activity predecessors, and 3) activity resources.
The data presented in this document provide this necessary information for the ship project
network construction and the corresponding resource profiles. Three files are available
containing the related elements (integer values only), easy to download and manage.
• Researchers or decision-makers, who want to evaluate their approaches through the imple-
mentation of a highly complex real-world project, can benefit from the present database.
• The related information is available in the form of Excel files from the Mendeley Database.
Anyone can download them, use them as input to their own software, and create
the project network (CPM) and feasible resource profiles. The experimental results can
be compared with results from other research, showing at the same time how competitive the
new methods are.
1. Objective
Many researchers, while developing and evaluating their methodologies, search for compa-
rable project networks in the literature to prove the suitability and effectiveness of their new
proposed approaches. Intelligent approaches manage to solve complex problems, producing
high-accuracy solutions, which are competitive for practical application [1,2]. These intelli-
gent techniques implement different evaluation functions for the Resource Leveling Problem
[3,4]. Researchers working on resource-leveling optimization approaches lack access to large
real-world data collections, for testing their methodologies in high-complexity conditions.
In the field of project management, and specifically in resource allocation problems, the avail-
able data sets correspond to case studies with few activities (i.e., ten to twenty) [5,6]. The work
in [4] presents some of the small projects existing in related literature. Theoretical benchmark
projects with a maximum of 90 and 120 activities are available in the Project Scheduling Problem
Library (PSPlib: https://www.om-db.wi.tum.de/psplib/), which mainly concerns another important
subfield of project management, the Resource Allocation Problem. The Resource Allocation
Problem consists of two subproblems: 1) the Resource Scheduling Problem and 2) the Resource Leveling
Problem. The main difference is that the Resource Scheduling Problem aims at computing the
project duration under specific resource availability, while Resource Leveling provides the
minimum project duration together with smoothing of the resource profile. Moreover, the existing
data sets are artificially created and have rather low complexity. A benchmark project of 90 activities
has an approximate number of alternative solutions ranging between $11 \times 10^{75}$ and $7 \times 10^{105}$
[4], while a project with 120 activities has an approximate number of alternative solutions
between $10 \times 10^{112}$ and $6 \times 10^{124}$ [7]. Large, realistic, and highly complex projects are rarely
available in the literature.
The current data set aims to fill this gap, providing indispensable information for a real and
complex ship construction project. Thus, researchers can access these data and evaluate the per-
formance of their approaches under real-world conditions. The availability of large real-world
datasets for resource-leveling optimization will prove the worth of obtaining optimal or subop-
timal related solutions which will result in considerable resource savings.
2. Data Description
The data correspond to a large construction project, i.e., the construction of a ship undertaken in a
shipyard in South Korea. The optimal resource management problem has become popular in the
construction sector in the last decades. The related literature contains different types of projects
(i.e., construction projects, software projects) with small (maximum of 20 activities per project)
or medium (not more than 120 activities per project) size projects, which are considered of low complexity
[5,6]. This article presents data from a real-world benchmark ship construction project with
realistic complexity. Real-world projects 1) usually contain numerous activities and 2) usually take a
long time to execute. These two factors result in a large number of
alternative start times for most activities (the non-critical activities) in a project. Additionally,
the number of feasible resource profiles increases rapidly. Therefore, this kind of
problem is classified as NP-hard in Operations Research [3].
The specific shipbuilding project concerns a 50,000 DWT ship which consists of 1178 different
activities with several interrelations among them. The project lasted 208 days and the duration
of the activities ranged between 1 and 40 days, according to the construction plan. In total, 21
different kinds of resources were involved in the ship construction. The dataset has also been
used several times during the last decade for teaching and research purposes in the Manage-
ment and Decision Engineering Laboratory (MDE-Lab) of the University of the Aegean, Dept. of
Financial & Management Engineering, Chios, Greece.
Ship construction or shipbuilding is a domain concentrating huge amounts of experience and
engineering knowledge. However, real-world data corresponding to the construction process
are not available. As a result, the development of proper optimization methods for resource
leveling is challenging. Cost management in shipbuilding is one of the main issues that can be
found in the literature, which is related to the planning, analysis, and control of product costs in
the maritime industry [8]. It seems that shipyards often face organizational problems with plan-
ning and scheduling production processes, estimating resources, and setting priorities, and thus,
modern managerial and computational techniques have been introduced in some cases [9]. Nev-
ertheless, related studies mostly consider all the above-mentioned problems as parts of a more
global strategic problem in shipbuilding, containing different managerial issues to be studied for
improving the productivity and reengineering of the scheduling process.
Ship construction consists of four main parts or phases [8]: 1) Hull construction, 2) Machin-
ery setting, 3) Outfitting, and 4) Painting. The Hull construction includes sheet metal processing
by welding, straightening, and cutting. Machinery is related to all principal or accessory ma-
chines listed on the ship. Outfitting has to do with the installation of systems such as plumbing,
electrical, etc. Finally, the painting includes cleaning and sheet metal painting (depending on the
ship type, different treatment is required).
The ship-project network requires two project elements: 1) the activities' durations and 2)
the activities' correlations. Its construction is implemented through the well-known Critical Path
Method (CPM) [10]. CPM provides the Early-Start and Late-Start for every activity, sorting these
activities into two groups: 1) the critical activities (Early-Start = Late-Start) and 2) the non-critical
activities (Early-Start ≠ Late-Start).
Fig. 1 presents a network example of a small project (i.e., 14 activities). Every node contains
information related to the corresponding activity. This information is: (a) the id denoted as i,
(b) the Early Start of activity i, i.e., ES, (c) the Late Start of activity i, i.e., LS, and (d) the duration of
activity i, i.e., d. The arrows between nodes represent the predecessor correlation. For example,
the activity with id 5 has a predecessor activity with id 2. Finally, the red path depicts
the Critical Path of the project (critical activities).
The Early Start Time of an activity is calculated from its predecessors: for each predecessor, the
predecessor's Early Start is added to the predecessor's duration, and the maximum of these sums is the
current activity's Early Start. For example, in Fig. 1 the activity with id:10 has predecessor activities with id: 8 and 9.
ES (id:8) = 12, Duration (id:8) = 2, the sum ES+D provides ES = 14.
ES (id:9) = 5, Duration (id:9) = 3, the sum ES+D provides ES = 8.
Between the two values, the activity with id:10 obtains the maximum, ES = 14.
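The Early Start rule can be illustrated with a minimal Python sketch. The function name `early_start` is a hypothetical helper introduced here for illustration only; the numeric values are the ones quoted above for Fig. 1.

```python
# Early Start of one activity from its predecessors (forward-pass rule).
# `early_start` is an illustrative helper, not part of the dataset.

def early_start(pred_es_and_duration):
    """Return max(ES_p + d_p) over all predecessors p."""
    return max(es + d for es, d in pred_es_and_duration)

# Predecessors of activity 10: id 8 (ES=12, d=2) and id 9 (ES=5, d=3)
preds_of_10 = [(12, 2), (5, 3)]
es_10 = early_start(preds_of_10)
print(es_10)  # 14
```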
On the other hand, the Late Start Time of an activity is calculated from its followers: the
activity's own duration is subtracted from the Late Start of each follower activity, and the minimum
of these differences is the current activity's Late Start. For example, in Fig. 1 the activity with id:1 has
follower activities with id: 2, 3, and 4.
LS (id:2) = 1, Duration (id:1) = 0, the difference LS-D provides LS = 1.
LS (id:3) = 9, Duration (id:1) = 0, the difference LS-D provides LS = 9.
LS (id:4) = 7, Duration (id:1) = 0, the difference LS-D provides LS = 7.
Between the three values, the activity with id:1 obtains the minimum, LS = 1.
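The Late Start rule can be sketched in the same way. The helper name `late_start` is hypothetical; the values are those quoted above for activity 1 of Fig. 1.

```python
# Late Start of one activity from its followers (backward-pass rule).
# `late_start` is an illustrative helper, not part of the dataset.

def late_start(duration, follower_ls):
    """Return min(LS_f) - d_i over all followers f of activity i."""
    return min(follower_ls) - duration

# Activity 1 (d=0) has followers 2, 3 and 4 with LS = 1, 9 and 7
ls_1 = late_start(0, [1, 9, 7])
print(ls_1)  # 1
```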
Fig. 2 shows the resource profile of a project example. The project execution lasts
more than 200 days, and the resource requirement exceeds 600 resource units.
The data describe the ship-project network and its relative resource profile. The data set con-
sists of the following files:
a) activities duration.xlsx: containing the activities’ duration.
b) correlations.xlsx: containing the activities’ correlations.
c) resources.xlsx: containing the resources’ requirements per activity.
The duration di of every activity i is the execution time of the activity. Its value is
calculated as the mean value of similar activities [11]. This approximation is performed by the
experts who plan the project, because in real-world projects the scheduled duration often
differs from the actual execution duration. Suppose an activity has a duration of 10 days: this
is an average value, because in one project the activity may need 9 days or fewer,
and in other projects 11 days or more. The final
activity duration is therefore set to an average value. The corresponding file contains 1178 rows,
one per activity of the project. Every row holds the duration of the corresponding
activity. For example, the activity with the number 5 has a duration of 14 days. The duration of
the activities ranges between 1 and 40 days, according to the construction plan.
The file with the correlations of the activities contains 1178 rows as well.
The cells of every row contain the correlations of the corresponding activity, i.e.,
the predecessor activities (p) of activity i, defined as pi. The activity's correlations refer
to activities that must be completed before the execution of the present activity starts. For example,
the activity with the number 66 has two predecessor activities, the activities with the numbers
60 and 65. The number of predecessors per activity ranges between 1 and 136, while the majority of the activities
have at most 10 predecessor activities.
Table 1
Human resources for ship construction.
Human resources
Table 2
Project example data.
Id  Duration  Predecessor activities  Resources  Start time  Finish time  Early start time  Late start time
1   5         0                       4          1           5            1                 1
2   4         1                       8          6           9            6                 7
3   5         1                       2          6           10           6                 6
Using the files “activities duration” and “activities correlations”, the ship-project network can be
constructed through the Critical Path Method (CPM).
When the ship-project network is available, the decision-maker can build feasible resource
profiles. The resource profiles depict the demand for resources for every different project day.
The daily resource requirement for an activity i, is defined as ri .
The file “resources” contains 1178 rows (the total number of activities), and every row presents
the total resource requirement of the corresponding activity. For example, the activity with the number 23 requires 8
resources. All kinds of resources are human resources (Table 1), and the initial capacity per kind
is not restricted. Therefore, the current data set considers one type of resource. No further
insight per resource type was given in the raw data. The purpose of the resource leveling problem
is the minimization of the total daily resource requirement [4,11], which also contributes to the
improvement of the individual kinds of resources.
After the completion of the ship-project network and the production of the feasible resource
profiles, the Resource Management process takes place.
An analysis of the project elements (activity duration, activity predecessors, and activity re-
sources) follows with a small project example (Table 2). The small project contains three activ-
ities with id: 1-3. The execution time of the activities ranges between 4 and 5 days. Activities
with id: 2 and 3 have as the predecessor the activity with id: 1.
Implementing the Critical Path Method, the Early Start Time and Late Start Time of every
activity are calculated; the Start Time of each activity can take values between these two limits.
Activities with id: 1 and 3 are critical activities because their Early Start
Time and Late Start Time values are the same. On the other hand, the activity with id: 2 is a non-critical
activity (ES≠LS).
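The values of Table 2 can be reproduced with a short CPM sketch. This is only an illustration under stated assumptions, not the authors' implementation: days are numbered from 1, every predecessor id is smaller than the activity id (so processing ids in ascending order is a valid topological order), and the function name `cpm` and the dictionary layout are hypothetical.

```python
# Minimal CPM sketch for the three-activity example of Table 2.
# Assumptions: days start at 1; predecessor ids are smaller than the
# activity id; an empty list means "no predecessor" (0 in the data files).

durations = {1: 5, 2: 4, 3: 5}
predecessors = {1: [], 2: [1], 3: [1]}

def cpm(durations, predecessors):
    ids = sorted(durations)
    # Forward pass: ES = max(ES_p + d_p); activities without predecessors start on day 1.
    es = {}
    for i in ids:
        es[i] = max((es[p] + durations[p] for p in predecessors[i]), default=1)
    project_end = max(es[i] + durations[i] - 1 for i in ids)  # last working day

    # Follower lists derived from the predecessor lists.
    followers = {i: [] for i in ids}
    for i in ids:
        for p in predecessors[i]:
            followers[p].append(i)

    # Backward pass: LS = min(LS_f) - d; activities without followers
    # must finish by the last project day.
    ls = {}
    for i in reversed(ids):
        next_start = min((ls[f] for f in followers[i]), default=project_end + 1)
        ls[i] = next_start - durations[i]

    critical = [i for i in ids if es[i] == ls[i]]
    return es, ls, project_end, critical

es, ls, project_end, critical = cpm(durations, predecessors)
print(es)           # {1: 1, 2: 6, 3: 6}
print(ls)           # {1: 1, 2: 7, 3: 6}
print(project_end)  # 10
print(critical)     # [1, 3]
```

For the full dataset, the same function could in principle be fed with the 1178 durations and the non-zero entries of each predecessor row, although dedicated project management software may be preferable at that scale.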
For the development of the related network, the duration of the activities as well as their
predecessor activities are required (Fig. 3). The figure presents the information of every activity:
activity id, activity duration, activity Start Time, and activity Finish Time. Activities with id: 2
and 3 have the 6th day as their Start Time because the execution of the activity with id: 1
finishes on the 5th day of the project. The activity with id: 3 is completed
on the 10th day of the project, while the activity with id: 2 has already been completed on the 9th day of
the project (shorter execution duration).
After the formation of the project network, the resource profile of the project is produced
(Fig. 4). The project example is executed in 10 days. The first 5 days require resources solely for
the activity with id: 1, so the daily resource requirement during these days is 4 units. From the 6th until the
9th day of the project the activities with id: 2 and 3 take place. The requirement of resources
sums to 10 units per day. On the last day of the example, the only activity still executed is the
one with id: 3, and the total daily requirement of resources for that day is reduced to 2 units.
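The daily requirements described above can be checked with a small sketch that sums the resources of all activities running on each day. The start times, durations and resources are taken from Table 2; the function name `resource_profile` is a hypothetical helper, and days are assumed to be numbered from 1.

```python
# Resource profile of the Table 2 example: daily sum of the resources of
# all activities in execution. `resource_profile` is an illustrative helper.

start = {1: 1, 2: 6, 3: 6}
durations = {1: 5, 2: 4, 3: 5}
resources = {1: 4, 2: 8, 3: 2}

def resource_profile(start, durations, resources):
    horizon = max(start[i] + durations[i] - 1 for i in start)  # last project day
    profile = [0] * horizon  # index 0 corresponds to day 1
    for i in start:
        for day in range(start[i], start[i] + durations[i]):
            profile[day - 1] += resources[i]
    return profile

profile = resource_profile(start, durations, resources)
print(profile)  # [4, 4, 4, 4, 4, 10, 10, 10, 10, 2]
```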
The last example could be a suitable introduction for new researchers or university students.
On the other hand, the present ship construction data, as a project of high complexity, requires
the use of well-known Project Management Software for constructing its network or its resource
profile. Even from the above example, it is evident that complexity increases very quickly with project size and that the
number of alternative feasible solutions to be considered quickly becomes enormous. Real-world
examples for resource-leveling optimization like the one proposed in this paper could provide
new engineers with useful knowledge and experience for their future careers in project
management.
The present study contains the basic elements of the ship project network and of the
resource profiles. Activity duration, activity correlations, and activity resources are available in Excel
files and are easy to download and manage.
The specific information has undergone simple processing to obtain its final form because it
was collected manually. During the collection of the data, a member of the Management
and Decision Engineering Laboratory (MDE-Lab) was present at the site of the
ship project. In collaboration with the project manager, he recorded the necessary information
in handwritten form.
The original data were not in a form suitable for direct use by a software program. For this reason,
the information was entered into an Excel file during the elaboration of the Ph.D. Dissertation
research [11]. The published dataset was also anonymized by removing any names and costs of
activities, client information, and addresses corresponding to construction sites.
The project consists of 1178 activities. The duration of each activity was entered in a separate row
of Excel, together with the activity id. The same procedure was followed to enter the data corresponding
to the required resources for each activity. Thus, both the activity duration and the resource
recordings consist of only one column, and each cell contains only one corresponding integer.
On the other hand, the predecessor activities file contains more columns. The predecessors of an activity are
recorded on the activity's row, in as many columns as the activity has predecessor activities. As previously
mentioned, an activity has between 1 and 136 predecessor activities, and all those activities have
a smaller id than that of the row's activity.
If one wants to insert these data into a software program (e.g., MATLAB), one more intervention
is required for inputting all the necessary information at once and not row by row.
If an activity has fewer than 136 predecessor activities, a zero (0) is added to the data in each
subsequent column up to the 136th column. Thus, a matrix of dimensions 1178×136 is formed.
Where there is a zero (0) value in the dataset, it is understood that no additional
predecessor activity exists. In case blank cells appear in the data, the code that uses the
specific file must be adapted accordingly.
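The zero-padding step can be sketched as follows. This is a minimal illustration in Python rather than MATLAB; the function name `pad_predecessors` is hypothetical, and blank cells are assumed to arrive as `None` or empty strings, which depends on how the Excel file is actually read. For the ship project the width would be 136.

```python
# Pad ragged predecessor rows with zeros to obtain a fixed-width matrix.
# `pad_predecessors` is an illustrative helper; blank cells are assumed
# to be None or "" after reading the spreadsheet.

def pad_predecessors(rows, width):
    """Pad each predecessor list with zeros up to `width` columns."""
    matrix = []
    for row in rows:
        # Drop blank cells so that they are treated as "no predecessor".
        cleaned = [int(v) for v in row if v not in (None, "")]
        if len(cleaned) > width:
            raise ValueError("row has more predecessors than columns")
        matrix.append(cleaned + [0] * (width - len(cleaned)))
    return matrix

ragged = [[], [1], [1, None], [2, 3, ""]]
matrix = pad_predecessors(ragged, width=4)
print(matrix)  # [[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [2, 3, 0, 0]]
```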
An illustrative example follows, showing activity 78 in the correlations Excel file, which has
activities 72 and 77 as predecessor activities. Both these activities must be completed prior to the
execution start time of activity 78. Furthermore, three advantages of the specific matrix
structure can be emphasized:
1) The order in which the predecessor activities are listed does not change how the values are used.
For example, activity 72 can be recorded as the first predecessor (column A) and activity
77 as the second predecessor (column B) of activity 78. Nothing changes
when activity 77 is recorded as the first predecessor (column A)
and 72 in the second position (column B), as is observed in the specific Excel file (line
78). The remaining 134 elements of the row all have the value 0 (136 elements in total:
2 holding predecessors and 134 holding 0). This means that there is no other
predecessor activity for activity 78. In short, the predecessor activities of an activity are not listed
in any required order, and the absence of a correlation is represented by the number 0.
2) Duplicate or multiple registrations of a predecessor activity can be detected, and the predecessor is kept
only once. For example, if activity 72 is recorded as the first predecessor (column A),
activity 77 as the second predecessor (column B), and activity 72 again as the third predecessor
(column C), the third entry must be replaced with a zero (0). Multiple registrations
of the same predecessor activity are not allowed, and extra registrations of the same
predecessor activity take the value 0, meaning no correlation.
3) The id of a predecessor must not be larger than the id of the present activity. For example,
activity 72 can be recorded as the first predecessor (column A), but activity 90 cannot be
recorded as the second predecessor (column B) of activity 78. Activity 90 must be replaced
with the right predecessor activity (activity 77). With thousands of activities,
checking the correlations of all activities is complicated. This restriction reduces random
registration errors during the production of the correlations file.
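The three rules above can be expressed as a small row-cleaning check. This is a sketch under stated assumptions: the function name `clean_row` is hypothetical, a row is assumed to be a list of integers already padded with zeros, and an id equal to the activity's own id is also rejected, following the earlier statement that all predecessors have a smaller id.

```python
# Check one row of the predecessor matrix against the three rules:
# order is irrelevant, duplicates are kept once, ids must be smaller
# than the activity id. `clean_row` is an illustrative helper.

def clean_row(activity_id, row):
    seen = set()
    kept = []
    for p in row:
        if p == 0:
            continue  # 0 means "no predecessor"
        if p >= activity_id:
            # Rule 3: the correct predecessor cannot be inferred automatically.
            raise ValueError(f"activity {activity_id}: invalid predecessor {p}")
        if p not in seen:  # Rule 2: keep each predecessor only once
            seen.add(p)
            kept.append(p)
    # Re-pad with zeros so that the row keeps its original width.
    return kept + [0] * (len(row) - len(kept))

print(clean_row(78, [72, 77, 72, 0]))  # [72, 77, 0, 0]
# Rule 1: the listing order does not change the set of predecessors.
same = set(clean_row(78, [77, 72, 0, 0])) == set(clean_row(78, [72, 77, 0, 0]))
print(same)  # True
```

A row such as `[72, 90, 0, 0]` for activity 78 raises an error instead of being silently corrected, because the right predecessor (activity 77 in the example) has to be supplied by the person who recorded the data.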
The data collection was time-consuming and exceeded 6 months, as emphasis
was placed on accurate recording. After manually recording the data, each entry was checked
separately to avoid typographical errors or illogical information. Entering the data into Excel was
not particularly demanding, as all the elements are integers. Finally, the three Excel files
were created and re-checked to avoid any possibility of a misplaced or incorrect element.
The authors declare that they have no known competing financial interests or personal rela-
tionships that could have appeared to influence the work reported in this paper.
Ethics Statement
References
[1] J.Q. Geng, L.P. Weng, S.H. Liu, An improved ant colony optimization algorithm for nonlinear resource-leveling prob-
lems, Comput. Math. Appl. 61 (2011) 2300–2305, doi:10.1016/j.camwa.2010.09.058.
[2] N. Kartam, T. Tongthong, An artificial neural network for resource leveling problems, Artif. Intell. Eng. Des. Anal.
Manuf. 12 (1998) 273–287, doi:10.1017/S0890060498123053.
[3] J.w. Huang, X.x. Wang, R. Chen, Genetic algorithms for optimization of allocation in large scale construction project
management, J. Comput. 5 (12) (2010) 1916–1924, doi:10.4304/jcp.5.12.1916-1924.
[4] C. Kyriklidis, G. Dounias, Evolutionary computation for resource leveling optimization in project management, Inte-
grat. Comput.-Aided Eng. 23 (2) (2016) 173–184 IOS Press, doi:10.3233/ICA-150508.
[5] S.S. Leu, C.H. Yang, J.C. Huang, Resource leveling in construction by genetic algorithm-based optimization and its
decision support system application, Autom. Construct. 10 (2000) 27–41, doi:10.1016/S0926-5805(99)00011-4.
[6] S.S. Leu, T.H. Hung, An optimal construction resource leveling scheduling simulation model, Canad. J. Civil Eng. 29
(2002) 267–275, doi:10.1139/l02-007.
[7] C. Kyriklidis, Intelligent Methods for Solving Resource Leveling Problems in Projects, Department of Financial and
Management Engineering, University of the Aegean, Chios, Greece, 2015 http://hdl.handle.net/10442/hedi/42879.
[8] J.O. Fischer, G. Holbach, Cost Management in Shipbuilding - Planning, Analyzing and Controlling Product Cost
in the Maritime Industry, GKP Publishing, Cologne, Germany, 2011
https://costfact.de/wp-content/uploads/files/CostManagementBook_Excerpt_3.pdf.
[9] J.K. Lee, K.J. Lee, H.K. Park, J. Hong, S-J. Lee, Developing scheduling systems for Daewoo shipbuilding: DAS project,
Eur. J. Oper. Res. 97 (2) (1997) 380–395, doi:10.1016/S0377-2217(96)00205-6.
[10] J. Kelley, Critical path planning and scheduling: mathematical basis, Oper. Res. 9 (3) (1961), doi:10.1287/opre.9.3.296.
[11] J. Rieck, J. Zimmermann, T. Gather, Mixed-integer linear programming for resource leveling problems, Eur. J. Oper.
Res. 221 (2012) 27–37, doi:10.1016/j.ejor.2012.03.003.