Predicting Initial Duration of Project Using Linear and Nonlinear Regression Models
Abstract
A fair prediction of the completion time of a road project at the initial stage is crucial for the preparation of tenders.
Usually, engineers rely on past experience to assign an execution time for this type of project, and such estimates are
often inaccurate because they are not supported by any statistical model. This paper aims to find an acceptable
mathematical model that helps road project specialists estimate the implementation time more realistically. Two
mathematical models have been developed that can predict the execution time of road projects in Iraq based on the
quantities of base layer, road length, road width, earthwork, subbase, and the final cost of the road project. The test
results suggest that the nonlinear regression model (NLRM) is more accurate than the multi linear regression model
(MLRM). The coefficient of determination (R2) is 88.6% for NLRM and 76% for MLRM. The mean absolute percentage
error (MAPE) is 20.29% for MLRM and 11.65% for NLRM. The root mean square error (RMSE) is 76.59 for MLRM and
45.73 for NLRM, and the average accuracy (AA) is 79.71% for MLRM and 88.35% for NLRM. The developed NLRM
strengthens the ability of the decision maker in such projects to accurately estimate the required time before the
tendering process.
Keywords
Road duration, Nonlinear model, Project cost, Regression.
goal. The items of the bill of quantities for this type of project were used as model inputs. Simulation was carried out to create a statistical sample from the same statistical distribution as the original cases.

This paper reviews previous studies in section 2, while section 3 deals with the method of the study, which covers the factors that affect the time prediction of roads in the study area and the data simulation. The results of the study and the model validations are in section 4, followed by discussions in section 5. Lastly, the conclusions and suggested future studies are presented in section 6, along with the study limitations.

2.Literature review
Data collected from the Indiana Department of Transportation (INDOT) were used to predict contract time and to examine the relation between highway construction project cost and duration. Factors such as project region, kind of highway, and weather conditions were used as prediction parameters. The authors concluded that the contract time could be predicted with a 95% confidence interval (CI) using regression analysis. They made adjustments to include the impact of the main factors, such as road type, location, traffic volume, and execution season, on the estimated time. The developed model is intended for the transportation agency rather than for contractors [4].

Project type and planned cost, which are the factors available at the initial stage of a project, were used to estimate the duration of 120 highway projects in Pakistan from 2001 to 2012. These factors, in addition to the project location and their relation to cost overrun risk, were used to develop an MLRM. Data from four different regions in Pakistan were used to assess the effect of location on time risk. The authors concluded that there is a weak correlation between road duration and the geographic location of the project. MAPE values ranged from 20% to 40% overestimate or underestimate in project time [5].

An artificial neural network (ANN) model was developed to estimate the duration of prefabricated steel bridge projects on rural routes in Ghana. The bill of quantities from the final payment certificate was used and prepared for the ANN. Principal component analysis (PCA) was used to reduce the factors to the smallest number of predictors, namely the formwork and the bridge span. The authors concluded that the in-situ formwork and the bridge span were strongly related to the project duration. The small sample size and the absence of factors other than bill of quantity factors are the two limitations of the study [6].

Three MLRMs for highway construction duration in Nigeria were developed. The models took the form of semi-log, linear, and log-log transformations using road length, road thickness, number of culverts, and cost per unit length. The results showed that the log-log model outperformed the other models in terms of goodness of fit and prediction accuracy [7].

Data from 39 bridge projects in Greece were collected. The Statistical Package for Social Science (SPSS) and the Waikato Environment for Knowledge Analysis (WEKA) were used to perform correlation analysis between variables. The authors concluded that the length of the deck, followed by the surface of the bridge and the concrete amount of the deck, are the variables most correlated with the project duration. Fast artificial neural network tool (FANN) models were created using the data filtered in the previous step. The small sample size and the model variables were the two limitations of this research. The authors suggested adopting another modelling method to compare the results [8].

An ANN model was developed to predict road duration in Iraq based on six factors: road length, number of lanes, number of intersections, earthworks, type of pavement, and furniture level. A significant correlation (90.6%) was found between the actual and predicted duration of the road projects, with an AA of 74.27%. The results of the study were limited to projects constructed by the roads and bridges directorate in Iraq between 2011 and 2012 [9].

Two variables, the initial cost and the initial duration, were used in an MLRM to predict the actual duration of road projects using data from 37 projects in Greece at the bidding stage. An ANN model was also built using factors such as initial duration, initial cost, length, lanes, embankment, existence of bridges, geotechnical projects, tender offer, technical projects, and landfills. The MLRM successfully predicted the execution period of the road project with R2=72%, while the ANN model had an RMSE of 1.53E-06. The authors suggested including further projects to resolve the correlation issues that were not checked in their study [10].

Based on the information available at the letting stage, a hazard-based model was developed to predict the duration of a highway project. The study used survival analysis and the Kaplan–Meier method to
Gafel Kareem Aswed et al.
build their models. Project groups were identified as having a similar survival distribution using the log-rank test. Time-to-event parametric models were developed to connect the project time to the external factors. The models depict the nonlinear relationship between project time and external independent variables such as the location of the project, bidding time, source of design, and the contractor. The authors concluded that, of the five models tested, the Weibull model is the best for predicting the construction time of bridge projects and traffic work categories, while the log-logistic model is the best for other work categories [11].

Five MLRM relationships were tested to predict the preliminary road construction time in the West Bank in Palestine using bid quantities, length and width of road, and area as predictors. The MAPE of the constructed models ranged from 19.1% to 31.4%. The study concluded that including bid quantities as predictors makes the model better than models that include the size of the project. The researcher suggested that the responsible authorities in the construction industry in Palestine must continuously update the productivity data of machines and labour to improve the accuracy of road project duration prediction [12].

The performance of MLRM and ANN models was compared based on the factors available at the bidding stage, and the factors most correlated with road duration were identified. Four analytical tools, namely time series analysis, ANN, smoothing techniques, and Brimelow’s model of time–cost (BTC), were used to predict the construction duration of India’s government infrastructure projects (highway road projects). The study used highway status, type, planned duration, time overrun, and contract cost as predictors [13].

A study was carried out on a road rehabilitation project in Karbala city in Iraq. The main activities in the road were municipal works, sewer works, trench works, optical cable works, and water works. Six main activities of the selected road were chosen and presented to a group of experts in scheduling such activities. The study confirmed that the amended project evaluation and review technique (PERT) method using the fuzzy Delphi (FD) process was the most appropriate technique to predict the execution time of the road [14].

MLRM and ANN models were developed using data from Greek highway projects. The data were extracted and tested with correlation analysis. Modelling started with the variable of highest correlation, and the remaining variables were entered into the models one after the other in descending order of their correlation coefficient. Project length, initial cost, land requirement, landfill, tender offer, embankment, initial duration, geotechnical project, number of lanes, existence of a tunnel, number of technical projects, and existence of a bridge are the study factors. The study revealed that the ANN model is more accurate than the MLRM in estimating the executed time of road projects. The small sample size, which does not exceed 37 road projects, is the main limitation of this study [15].

Data from 100 public road projects finished between 2003 and 2008 in three adjacent cities in southeastern Poland were used to predict road duration at the feasibility stage. Factors such as cost, length, culverts, bays, and intersections were adopted in this study. The authors constructed three models, a simple regression, a multifactor regression, and a regression tree, and stated that the models were statistically correct but of low precision [16].

To the knowledge of the researchers, no previous study has dealt with predicting the duration of road projects in Iraq using simplified models, except for one study that used artificial neural networks [9], and the resulting model was complex and difficult to use. In this research, some data from the bill of quantities are used to predict the road duration before the bidding phase. A complete list of abbreviations is shown in Appendix I.

3.Methods
3.1Identification of factors
The most recent studies were explored to summarize the factors affecting the preliminary estimate of road project duration in the planning phase, when not enough data are available. The Google Scholar search engine was used to search the literature on the issue. Studies such as Mohamed and Moselhi [17], Sharma et al. [18], Le et al. [19], Nevett et al. [20], Son et al. [21], Okere [22], and Nani et al. [23] used multiple factors that could be adopted to forecast road duration at the early stage of the project. This study adopted the factors available after the estimator has prepared the bill of quantities and priced all road activities. To achieve the study goal, the researchers collected data on completed road projects in order to build the predictive models of road duration. These factors are: length of the road (L) in (m), width (W) in (m), actual duration (D) in days, final total cost
International Journal of Advanced Technology and Engineering Exploration, Vol 9(97)
(C) in Iraqi dinars (IQD), barriers cost (B) in IQD, covering obstacles such as water lines, sewers, and electrical cables, rock (R) in (m3), earthwork (E) in (m2), culverts (CL) in m, subbase (S) in m2, and lastly asphalt (AS) in m2, as in Appendix II.

3.2Study area
This research was conducted in Karbala province, Iraq. Karbala is located in the central part of the country, southwest of the capital, Baghdad, at a distance of 110 km. The total area of the city is about 5,560 km2, which constitutes 1.1% of the total area of Iraq. Karbala province has more than 1500 km of highways and secondary roads. About 85% of the roads in the governorate were paved as of 2009, and the unpaved roads are part of the secondary and rural roads. The population, which was around 1.22 million in 2018 with a growth rate of 2.1%, is expected to reach around 3 million by 2025 due to random migration. This situation urged the city authorities to open new roads and new districts to cope with the unplanned immigration.

3.3Data simulation and methods
Simulation is a method of statistical sampling used to approximate solutions to quantitative factors. The probability distribution function (PDF) of each factor is determined first. The values of the factors are then simulated many thousands of times; each time, random numbers are drawn from the probability distribution of that factor, and the resulting data represent a new probable project [24]. Data from fourteen completed projects were collected from the Directorate of Roads and Bridges of Karbala Governorate in Iraq. The distribution of each variable was examined as a first step in simulating the data. Fifty thousand cases of simulated data were generated using the SPSS program, as shown in Figure 1.

Linear regression model assumptions were checked using the SPSS program. Where collinearity exists between variables, the correlated ones are eliminated, as in Figure 2.

After that, the NLRM was tested. Finally, the developed models were tested to check their efficiency.
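
The simulation step of Section 3.3 can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study. It takes the 14 road lengths listed in Appendix II, fits a log-normal distribution to them, and draws 50,000 simulated values. The log-normal family, the function name `simulate_factor`, and the fixed seed are assumptions made only for illustration, since the fitted distribution of each factor is not reported here. The sketch also samples a single factor independently, so it does not preserve any correlation between factors.

```python
import numpy as np

# Road lengths (m) of the 14 completed projects listed in Appendix II.
length_m = np.array([2300, 1800, 1100, 2500, 650, 2400, 1800,
                     1100, 1100, 2450, 2400, 1800, 1100, 1100], dtype=float)


def simulate_factor(observed, n_cases, rng):
    """Draw n_cases values from a log-normal fitted to the observed sample.

    ASSUMPTION: the log-normal family is a placeholder chosen because it
    only produces positive values; the study selects a distribution per
    factor in SPSS and that choice is not reproduced here.
    """
    log_obs = np.log(observed)
    mu = log_obs.mean()            # mean of the log-values
    sigma = log_obs.std(ddof=1)    # sample standard deviation of the log-values
    return rng.lognormal(mean=mu, sigma=sigma, size=n_cases)


rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable
simulated_length = simulate_factor(length_m, 50_000, rng)

print("simulated cases:", simulated_length.size)
print("observed mean (m):", length_m.mean())
print("simulated mean (m):", simulated_length.mean())
```

Repeating the same call for each factor would give one simulated column per factor; modelling the dependence between factors would need a joint distribution, which is outside this sketch.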
3.4Non-linear model test
Nonlinear regression is a method of finding a nonlinear model of the relation between the dependent variable and a set of independent variables. Even when a linear approach performs well, a nonlinear model can be used to explore a finer performance of the parameters [25]. Nine models (quadratic, s-curve, logarithmic, inverse, power, logistic, growth, compound, and exponential) were checked. The preferred model between duration and each of the six factors (subbase quantity in m3, road length in m, total cost in IQD, base layer in m2, earthwork in m3, and road width in m) was examined. SPSS version 26 was used with the previously simulated fifty thousand cases to find the best-fit model based on the maximum R2, as shown in Figure 3. The conceptual framework of the study, running from problem identification through literature review, data collection, data simulation, linear and nonlinear model testing, and evaluation to discussion, is shown in Figure 4.
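
The curve-estimation idea of Section 3.4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the study: it fits only four of the nine listed forms (logarithmic, power, exponential, quadratic), uses the 14 road lengths and durations from Appendix II instead of the 50,000 simulated cases, and computes R2 on the original duration scale. The helper names `r_squared` and `fit_candidates` are hypothetical.

```python
import numpy as np

# Road length (m) and actual duration (days) of the 14 projects in Appendix II.
length_m = np.array([2300, 1800, 1100, 2500, 650, 2400, 1800,
                     1100, 1100, 2450, 2400, 1800, 1100, 1100], dtype=float)
duration_days = np.array([210, 180, 180, 270, 120, 300, 240,
                          180, 180, 240, 300, 240, 180, 180], dtype=float)


def r_squared(y, y_hat):
    """Coefficient of determination on the original scale of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_candidates(x, y):
    """Fit a few single-predictor curve forms and return R2 per form."""
    results = {}

    # Logarithmic: y = a + b * ln(x)
    b, a = np.polyfit(np.log(x), y, 1)
    results["logarithmic"] = r_squared(y, a + b * np.log(x))

    # Power: y = a * x**b, fitted as ln(y) = ln(a) + b * ln(x)
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)
    results["power"] = r_squared(y, np.exp(ln_a) * x ** b)

    # Exponential: y = a * exp(b * x), fitted as ln(y) = ln(a) + b * x
    b, ln_a = np.polyfit(x, np.log(y), 1)
    results["exponential"] = r_squared(y, np.exp(ln_a) * np.exp(b * x))

    # Quadratic: y = c0 + c1 * x + c2 * x**2
    c2, c1, c0 = np.polyfit(x, y, 2)
    results["quadratic"] = r_squared(y, c0 + c1 * x + c2 * x ** 2)

    return results


r2_by_model = fit_candidates(length_m, duration_days)
best_model = max(r2_by_model, key=r2_by_model.get)

for name, value in sorted(r2_by_model.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} R2 = {value:.3f}")
print("best fit by maximum R2:", best_model)
```

Selecting the form with the maximum R2 mirrors the selection criterion stated in the text; with only 14 observations the ranking is indicative at best.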
[Figure 4: conceptual framework of the study. Recoverable box labels: Literature review; Data Simulation (50000 Cases); Performance Criteria (R2, MPE, RMSE, AA).]
statistics are shown in Table 1. Residual tests indicate a normal probability distribution with mean ≈ zero and a standard deviation of one. The resulting Equation 1 (R2=82.3%) is:
D=b0+b1×L+b2×W+b3×E (1)
Where; b0, b1, b2, b3 are constants.

The length of the road is the most important factor in predicting the road duration, since its Beta value is the greatest, equal to 0.890. All three variables are statistically significant based on the significance (sig.) values in Table 1.
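
As an illustration of the form of Equation 1, the following Python sketch estimates b0 to b3 by ordinary least squares on the 14 projects listed in Appendix II. It is a sketch under stated assumptions: the study fits the model in SPSS on 50,000 simulated cases, so the coefficients and R2 printed here are not expected to match the reported values. The variable names are hypothetical, and earthwork is used in the units listed in Appendix II.

```python
import numpy as np

# Predictors and response taken from Appendix II (14 completed projects).
length_m = np.array([2300, 1800, 1100, 2500, 650, 2400, 1800,
                     1100, 1100, 2450, 2400, 1800, 1100, 1100], dtype=float)
width_m = np.array([6, 6, 5, 6, 6, 6, 5, 5, 6, 5, 6, 5, 5, 6], dtype=float)
earthwork = np.array([18000, 7000, 6000, 20000, 5500, 15500, 12000,
                      52600, 7000, 10000, 15500, 12000, 8250, 7000], dtype=float)
duration_days = np.array([210, 180, 180, 270, 120, 300, 240,
                          180, 180, 240, 300, 240, 180, 180], dtype=float)

# Design matrix for D = b0 + b1*L + b2*W + b3*E (column of ones for b0).
X = np.column_stack([np.ones_like(length_m), length_m, width_m, earthwork])

# Ordinary least squares estimate of the four constants.
coef, *_ = np.linalg.lstsq(X, duration_days, rcond=None)
b0, b1, b2, b3 = coef

# Goodness of fit on the same 14 rows (in-sample R2).
predicted = X @ coef
ss_res = np.sum((duration_days - predicted) ** 2)
ss_tot = np.sum((duration_days - duration_days.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"b0={b0:.4f}, b1={b1:.6f}, b2={b2:.4f}, b3={b3:.8f}")
print(f"in-sample R2 = {r2:.3f}")
```

The standardized Beta values discussed in the text would require rescaling each predictor and the response to unit variance before fitting, which this sketch does not do.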
Actual duration (days)   Linear model predicted duration (days)   Non-linear model predicted duration (days)
180                      212.96                                   205.02
240                      290.44                                   261.89
300                      288.88                                   273.06
240                      254.77                                   236.14
180                      214.19                                   189.42
180                      212.96                                   205.02
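
The performance criteria can be reproduced for the six rows listed above with a short Python sketch. It covers only these visible rows, so its output is not expected to equal the MAPE, RMSE, and AA values reported for the full test set. The definition AA = 100 − MAPE is an assumption, though it is consistent with the reported pairs (79.71% with 20.29%, and 88.35% with 11.65%). The function names are hypothetical.

```python
import numpy as np

# The six rows of actual and predicted durations (days) listed above.
actual = np.array([180, 240, 300, 240, 180, 180], dtype=float)
linear_pred = np.array([212.96, 290.44, 288.88, 254.77, 214.19, 212.96])
nonlinear_pred = np.array([205.02, 261.89, 273.06, 236.14, 189.42, 205.02])


def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))


def rmse(y, y_hat):
    """Root mean square error, in the units of y (days)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))


def average_accuracy(y, y_hat):
    """Average accuracy in percent. ASSUMPTION: defined as 100 - MAPE."""
    return 100.0 - mape(y, y_hat)


for name, pred in [("MLRM", linear_pred), ("NLRM", nonlinear_pred)]:
    print(f"{name}: MAPE={mape(actual, pred):.2f}%  "
          f"RMSE={rmse(actual, pred):.2f} days  "
          f"AA={average_accuracy(actual, pred):.2f}%")
```

On these six rows the nonlinear predictions have the lower MAPE and RMSE, which agrees in direction with the comparison reported for the full test set.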
expected situation of weather based on [3] suggestion. Modeling can be done by other methods as well, such as earned value management, support vector machine, random forest, etc.

Acknowledgment
The authors would like to thank Directorate of Roads and Bridges of Karbala Governorate in Iraq for their cooperation with us and providing executed road project information.

Conflicts of interest
The authors have no conflicts of interest to declare.

Author’s contribution statement
Gafel Kareem Aswed: Analysis, Interpretation of results and approved the final version of the manuscript. Mohammed Neamah Ahmed: Data collection and preparation. Hussein Ali Mohammed: Supervision and Draft manuscript preparation.

References
[1] Khalid FJ. The impact of poor planning and management on the duration of construction projects: a review. Multi-Knowledge Electronic Comprehensive Journal for Education and Science Publications. 2017; 2:161-81.
[2] Khaleel TA, Hadi IZ. Controlling of time-overrun in construction projects in Iraq. Engineering and Technology Journal. 2017; 35(2 Part A):111-7.
[3] Al HBI. An investigation into factors causing delays in highway construction projects in Iraq. In MATEC web of conferences 2018 (pp. 1-11). EDP Sciences.
[4] Jiang Y, Wu H. A method for highway agency to estimate highway construction durations and set contract times. International Journal of Construction Education and Research. 2007; 3(3):199-216.
[5] Kaleem S, Irfan M, Gabriel HF. Estimation of highway project duration at the planning stage and analysis of risk factors leading to time overrun. In T&DI congress: planes, trains, and automobiles 2014 (pp. 612-26).
[6] Mensah I, Nani G, Adjei-kumi T. Development of a model for estimating the duration of bridge construction projects in Ghana. International Journal of Construction Engineering and Management. 2016; 5(2):55-64.
[7] Waziri BS, Jibrin AT, Kadai B. Functional duration models for highway construction projects in Nigeria. Arid Zone Journal of Engineering, Technology and Environment. 2014; 10:13-23.
[8] Karadimos P, Anthopoulos L. Neural network models for actual cost and actual duration estimation in construction projects: findings from Greece. International Journal of Structural and Construction Engineering. 2021; 15(5):250-61.
[9] Al-sadi AM, Zamiem SK, Al-jumaili LA. Estimating the optimum duration of road projects using neural network model. International Journal of Engineering and Technology. 2017; 9(5):3458-69.
[10] Titirla M, Aretoulis G. Neural network models for actual duration of Greek highway projects. Journal of Engineering, Design and Technology. 2019; 17(6):1323-9.
[11] Qiao Y, Labi S, Fricker JD. Hazard-based duration models for predicting actual duration of highway projects using nonparametric and parametric survival analysis. Journal of Management in Engineering. 2019; 35(6):1-16.
[12] Mahamid I. The development of regression models for preliminary prediction of road construction duration. International Journal of Engineering and Information Systems. 2019; 3(4):14-20.
[13] Velumani P, Nampoothiri NV, Urbański M. A comparative study of models for the construction duration prediction in highway road projects of India. Sustainability. 2021; 13(8):1-13.
[14] Nemaa ZK, Aswed GK. Forecasting construction time for road projects and infrastructure using the fuzzy PERT method. In IOP conference series: materials science and engineering 2021 (pp. 1-13). IOP Publishing.
[15] Titirla M, Aretoulis G. Comparison of linear regression and neural network models to estimate the actual duration of Greek highway projects. In XIV Balkan conference on operational research, virtual BALCOR 2020 (pp. 2-6).
[16] Czarnigowska A, Sobotka A. Estimating construction duration for public roads during the preplanning phase. Journal of Engineering, Project, and Production Management. 2014; 4(1):26-35.
[17] Mohamed B, Moselhi O. Conceptual estimation of construction duration and cost of public highway projects. Journal of Information Technology in Construction. 2022; 27(29):595-618.
[18] Sharma S, Ahmed S, Naseem M, Alnumay WS, Singh S, Cho GH. A survey on applications of artificial intelligence for pre-parametric project cost and soil shear-strength estimation in construction and geotechnical engineering. Sensors. 2021; 21(2):1-44.
[19] Le C, Wai YM, David JH, Choi K. Comprehensive evaluation of influential factors on public roadway project contract time. Journal of Management in Engineering. 2021; 37(5).
[20] Nevett G, Goodrum PM, Littlejohn RL. Understanding the effect of bid quantities, project characteristics, and project locations on the duration of road transportation construction projects during early stages. Transportation Research Record. 2021; 2675(2):121-34.
[21] Son J, Khwaja N, Milligan DS. Planning-phase estimation of construction time for a large portfolio of highway projects. Journal of Construction Engineering and Management. 2019; 145(4):1-11.
[22] Okere G. An evaluation of a predictive conceptual method for contract time determination on highway projects based on project types. International Journal of Civil Engineering. 2019; 17(7):1057-73.
[23] Nani G, Mensah I, Adjei-kumi T. Duration estimation model for bridge construction projects in Ghana. Journal of Engineering, Design and Technology. 2017; 15(6):754-77.
[24] Kwak YH, Ingall L. Exploring monte carlo simulation applications for project management. Risk Management. 2007; 9(1):44-57.
[25] Seber GA. Nonlinear regression models. In the linear model and hypothesis 2015 (pp. 117-28). Springer, Cham.
[26] Vyas T, Varia HR. Predicting traffic induced noise using artificial neural network and multiple linear regression approach. International Journal of Advanced Technology and Engineering Exploration. 2022; 9(92):1009-27.
[27] Stevens JP. Applied multivariate statistics for the social sciences. Routledge; 2012.
[28] Waters D, Waters CD. Quantitative methods for business. Pearson Education; 2008.

Gafel Kareem Aswed received the B.Sc. degree in building and construction engineering and the M.Sc. degree in project management engineering from university of technology (UOT) Bagdad- Iraq. He is a faculty member (Assistant Professor) in Kerbala University-college of engineering, Karbala, Iraq.
Email: [email protected]

Mohammed Neamah Ahmed received the B.Sc. degree in Civil Engineering from university of Babylon, the M.Sc. and PhD degree in project management engineering from university of Bagdad- Iraq. He is a faculty member (Assistant Professor) in Kerbala University-college of engineering, Karbala, Iraq.
Email: [email protected]

Hussein Ali Mohammed received the B.Sc. degree in building and construction engineering, the M.Sc. degree in project management engineering from university of technology (UOT) Bagdad- Iraq and the PhD degree from university of Bagdad. He is a faculty member (Assistant Professor) in Kerbala University-college of engineering, Karbala, Iraq.
Email: [email protected]

Appendix I
S. No.  Abbreviation  Description
1       AA            Accuracy Percentage Average
2       ANN           Artificial neural network model
3       AS            Asphalt
4       B             Barriers Cost
5       BTC           Brimelow’s model of time–cost
6       C             Final Total Cost
7       CI            Confidence Interval
8       CL            Culverts
9       D             Actual Duration
10      df            Degree of Freedom
11      E             Earthwork
12      FANN          Fast Artificial Neural Network Tool
13      FD            Fuzzy Delphi
14      L             Length of the Road
15      INDOT         Indiana Department of Transportation
16      MAPE          Mean Absolute Percentage Error
17      MLRM          Multi Linear Regression Model
18      NLRM          Nonlinear Regression Model
19      PCA           Principal Component Analysis
21      PDF           Probability Distribution Function
22      R             Rock
23      R2            Coefficient of Determination
24      RMSE          Root Mean Square Error
25      S             Subbase
26      Sig.          Significance Value
27      SPSS          Statistical Package for Social Science
28      VIF           Variance Inflation Factor
29      W             Width
30      WEKA          Waikato Environment for Knowledge Analysis
Appendix II Actual project data used for models’ evaluation
Project No.  Length (m)  Width (m)  Duration (days)  Total cost (IQD)  Barriers (IQD)  Rock (m3)  Earthwork (m2)  Culverts (m.l)  Subbase (m2)  Asphalt (m2)
1            2300        6          210              619158000         15605000        300        18000           5               8500          14500
2            1800        6          180              405715000         9558000         600        7000            0               3510          10800
3            1100        5          180              297330000         1.13E+08        0          6000            132             2145          6600
4            2500        6          270              783589250         16775000        3000       20000           20              6750          15000
5            650         6          120              221550000         58589250        0          5500            75              1600          3900
6            2400        6          300              669439500         32500000        1600       15500           0               4700          14400
7            1800        5          240              416000000         500000          0          12000           8               3250          9000
8            1100        5          180              275260250         80489500        3500       52600           0               2350          5500
9            1100        6          180              427155000         23700000        0          7000            30              2145          6600
10           2450        5          240              665800000         16097750        800        10000           77              5200          12250
11           2400        6          300              669439500         1.53E+08        0          15500           48              4700          14400
12           1800        5          240              416000000         500000          0          12000           8               3250          9000
13           1100        5          180              275260250         80489500        3500       8250            0               2350          5500
14           1100        6          180              427155000         23700000        0          7000            30              2145          6600