12 Machine Learning Model To Predict Construction Duration
RESEARCH ARTICLE
Abstract
The construction industry is witnessing a rapid rise in tall building projects due to an anticipated urban
population explosion. However, this building typology has been subject to time overruns and total
abandonment due to an underestimation of the project duration. Consequently, this paper presents the
development of a model to predict the construction duration of tall building projects. In developing the
model, a suite of machine learning algorithms was adopted including Multi-Linear Regression Analysis
(MLRA), k-Nearest Neighbors (KNN), Artificial Neural Networks (ANN), Support Vector Machines
(SVM), and Ensemble Methods. Thus, twelve models were developed in the process, and the most efficient
model was selected. The procedure described in this study presents researchers and practitioners with a
strategy to enhance the time performance of tall building projects through the adoption of modern digital
technologies such as machine learning. The proposed model was based on an ensemble method using ANN
as the combiner, with a Correlation Coefficient (R²) of 0.69, Root Mean Squared Error (RMSE) of 301.72,
and Mean Absolute Percentage Error (MAPE) of 18%.
Keywords
Duration prediction; Regression; k nearest neighbour; Neural networks; Support vector machines; Ensemble
methods
Received: 12 January 2021; Accepted: 22 March 2021
ISSN: 2630-5771 (online) © 2021 Golden Light Publishing All rights reserved.
*
Corresponding author
23 Sanni-Anibire et al.
Alzara et al. [5] reported delays in the range of 50% to 150%. Tall building projects in particular are notorious for their delayed completion times. The Council on Tall Buildings and Urban Habitat (CTBUH) [6], in its report "Dream Deferred: Unfinished Tall Buildings", noted the alarming rate of increase in "never completed" tall buildings.
Previous researchers have suggested that a reliable prediction of the duration of construction projects is crucial to avoiding construction delays [7,8,3,9]. Traditional methods such as the Critical Path Method (CPM) and the Program Evaluation and Review Technique (PERT) have been shown to consistently underestimate the actual project duration [10]. In practice, durations are often set according to the client's time constraints or budget, or through a detailed analysis that depends on the skill, experience, and individual intuition of the project engineer. The process is therefore highly subjective, which ultimately yields high levels of uncertainty [11].
In this regard, some studies have sought to apply Artificial Intelligence (AI) and Machine Learning (ML) to the duration prediction of construction projects [11-22]. These studies are, however, limited in the techniques used: they have focused on one or two algorithms without exploring ensemble methods to achieve improved performance. Moreover, despite the rapid growth of tall building construction and the recurring time overruns of such projects, there is a dearth of research on estimating their duration.
In light of the foregoing, this research set out to develop an ML-based model for estimating the duration of tall building projects. Historical data on the construction duration of tall building projects were obtained and used to develop duration prediction models based on popular machine learning algorithms: Multi-Linear Regression Analysis (MLRA), k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Ensemble Methods. The performance of these models was evaluated based on the Correlation Coefficient (R²), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The outcome of the systematic model development process described in this study is the proposed ML model for the duration prediction of tall building projects: an ensemble method that combines the outputs of the ML algorithms considered in this study, using ANN as the combiner.
2. Literature review
The following sections present an overview of relevant background on construction duration estimation. First, traditional approaches and modern trends in construction duration estimation are reviewed. Subsequently, previous studies on the development of mathematical models and on the application of artificial intelligence and machine learning techniques are presented.
2.1. Approaches to construction duration estimation
The duration of an activity is simply the length of time it takes to complete that activity, typically measured in hours, days, weeks, months, or years. Determining task durations through detailed analysis depends on the required human and material resources, as well as the productivity rates of these resources.
Traditionally, two modeling techniques are used in construction project scheduling: the Critical Path Method (CPM) and the Program Evaluation and Review Technique (PERT). A CPM schedule assumes that the durations of work items are known with some level of certainty. PERT, on the other hand, considers the uncertainty in determining the durations of work items. Hence, PERT is based on a "three-time estimate", i.e., the optimistic, most likely, and pessimistic estimates, and a weighted average of the three, conventionally (optimistic + 4 × most likely + pessimistic)/6, is adopted as the expected duration [23].
Regardless of the method applied, the calculated values remain approximate and are subject to high levels of uncertainty. The estimator's background and experience are closely correlated with the accuracy of the estimate. Lack of adequate experience and thorough understanding of the
Developing a machine learning model to predict the construction duration of tall building projects 24
projects' scope of work will lead to poor estimations. Additionally, there is the problem of material and labor price variations and inflation, which are characteristic of construction projects. To solve the problem of uncertainty, which may be due to insufficient information, variations, and human error, researchers have sought to employ more intelligent methods. Though research interest in duration estimation can be traced back to the 1960s, the past few decades have witnessed a resurgence [24,25]. The investigated approaches can be broadly classified into three categories: AI-based scheduling, which includes knowledge-based scheduling, expert systems, case-based reasoning (CBR), genetic algorithms, and neural networks; simulation-based scheduling; and integrated BIM-based scheduling [24].
2.2. Previous studies on the development of models for duration prediction
Bromilow [4] is credited with developing the first empirical model that establishes the relationship between cost and time. Bromilow's Time-Cost (BTC) model is based on historical data from 309 building projects completed in Australia between July 1964 and July 1967. The BTC model has been the subject of many other studies that re-calibrated it and tested its performance in other locations and on various project types. Chan and Kumaraswamy [26] further developed Bromilow's model to combine cost and floor area in a similar model. Other researchers studied the model further and included other variables in the equation. This could be seen as the foundation for later studies developing mathematical models with the aid of the multi-linear regression method. The late 1980s witnessed the acceptance of more intelligent methods, such as AI, for solving the inherent construction problem of estimating durations. Mohan [27] outlined 37 expert-system applications in the field of construction engineering and management. In the five decades since Bromilow's initial model, considerable technological advances have been made in construction project management, planning, and scheduling. However, construction projects continue to suffer from performance and productivity issues [28]. The following sections provide a non-exhaustive review of the literature on the application of artificial intelligence and machine learning techniques to duration estimation.
2.2.1. Knowledge-based expert system
Knowledge-based expert systems are computer programs, originally developed in the field of Artificial Intelligence (AI), that are designed to reach the level of performance of a human expert in some specialized problem-solving domain. Hendrickson et al. [29] presented a framework for modifying standard work productivities for activity duration estimation and proposed the expert system "MASON". Moselhi and Nicholas [30] presented ESCHEDULER, a prototype system for precedence setting and modifying durations. Shaked and Warszawski [31] developed HISCHED, a knowledge-based expert system for the construction planning of high-rise buildings.
2.2.2. Linear regression analysis
Hoffman et al. [32] identified the factors influencing construction duration through an assessment of 856 facility projects. The study compared the results of a multiple linear regression model with the BTC model and concluded that multiple linear regression provided a more accurate prediction. Yeom et al. [21] presented a multiple linear regression model that predicted the construction durations of general office buildings in Korea with an accuracy of 94.72%. Blyth et al. [15] presented a multiple linear regression analysis showing that the twenty-one most influential project variables could accurately predict construction duration for buildings in the UK. The developed model was further validated with a new set of data, which showed that the absolute percentage error for the overall duration varied between 0.38% and 6.68%. Lin et al. [33] developed a regression model for predicting the construction duration of steel-reinforced concrete building projects in Taiwan. Khosrowshahi and Kaka [12] proposed two models for cost and
sample size of 35 projects was identified, with construction completion dates between 1993 and 2015. Notably, all the projects in the dataset are from China, which, according to CTBUH [34], accounted for 61.5% of the world's 200-meter-plus buildings in 2018 and has maintained its role as the most prolific country in tall building construction for over two decades.
3.2. Data pre-processing
Since the dataset was obtained from the real world, it may exhibit characteristics that are not ideal for ML modeling and thus require pre-processing and re-shaping. In this study, the Waikato Environment for Knowledge Analysis (Weka 3.8.3) was used. This is open-source machine learning software written in Java and developed at the University of Waikato, New Zealand [35]. Table 1 provides descriptive statistics of the numerical features of the dataset, while Table 2 describes the non-numerical features of the dataset and their conversion to dummy variables.
3.3. Views of the dataset
Views are copies of the dataset, created in addition to the original dataset by applying a transformation such as normalization or standardization. Evaluating the performance of algorithms on various views of the dataset gives a general idea of which views are better suited to the machine-learning problem [36]. Generally, the best algorithm for an ML problem is not known beforehand. Experts have suggested that common ML algorithms should be explored first, especially those common in the field of the ML problem at hand [36,37]. In this study, four ML algorithms were considered: Multi-Linear Regression Analysis (MLRA), Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM). The dataset was split into training and test sets in a ratio of 66% to 34%. Additionally, five views were considered as follows:
Raw dataset: the original dataset, as described in Table 1.
Normalized view (input features only): values in the input dataset are rescaled to the range 0 to 1, such that the largest value of each feature is 1 and the smallest is 0. Normalization is a good technique to use when the distribution of the data is either unknown or known not to be Gaussian (i.e., not a bell curve). The formula for normalization is expressed in Eq. 1:
$$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{1}$$
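The rescaling of Eq. 1, the standardization of Eq. 2 used for the standardized view, and the de-normalization step needed when the output is also normalized can be sketched in a few lines of Python. This is a minimal illustration, not the Weka filters used in the study; the function names and the sample floor counts are hypothetical, and it assumes σ is the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Eq. 1: map the smallest value to 0 and the largest to 1.

    Returns the scaled values plus (min, max) so the scaling can be inverted.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0] * len(values), (lo, hi)
    return [(x - lo) / (hi - lo) for x in values], (lo, hi)


def min_max_denormalize(scaled, bounds):
    """Invert Eq. 1, e.g. to turn normalized duration predictions back into days."""
    lo, hi = bounds
    return [lo + s * (hi - lo) for s in scaled]


def standardize(values):
    """Eq. 2: subtract the mean, divide by the population standard deviation.

    Assumes the feature is not constant (sigma > 0).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


if __name__ == "__main__":
    floors = [30, 45, 60, 75, 90]  # hypothetical "# of total floors" values
    scaled, bounds = min_max_normalize(floors)
    print(scaled)                               # [0.0, 0.25, 0.5, 0.75, 1.0]
    print(min_max_denormalize(scaled, bounds))  # [30.0, 45.0, 60.0, 75.0, 90.0]
    print([round(z, 3) for z in standardize(floors)])
```

Keeping the (min, max) pair alongside the scaled values is what makes de-normalization possible. In practice these statistics would normally be computed on the 66% training split only and reused on the test split, so that no test information leaks into training.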
Normalized view (entire dataset): input and output values are rescaled to the range 0 to 1. In this case, a further step was required to de-normalize the output data.
Standardized view (input features): input values are rescaled so that each feature has a mean of 0 and a standard deviation of 1. This technique is more useful if the data have a Gaussian (bell curve) distribution [36]. The transformation is given by Eq. 2, where µ represents the arithmetic mean and σ the standard deviation:
$$x_{new} = \frac{x - \mu}{\sigma} \tag{2}$$
Replace missing values: datasets for machine learning usually contain missing values that need to be treated by removing or replacing them [36]. The ReplaceMissingValues filter in Weka was used to create this view, replacing missing values with the mean of the distribution for numerical features and the mode for categorical features.
3.4. Feature selection
The best view of the dataset determined in the previous section was selected for further processing. The "CorrelationAttributeEval" technique was used to determine the attributes contributing most to the predictive performance: the correlation of each feature in the dataset with the prediction output is first determined, and the features are then ranked. Attribute selection was then based on the Recursive Feature Elimination (RFE) procedure [38]. In RFE, the entire feature set (V), ranked according to the correlation coefficient, is split in half: the best V/2 features are retained and the worst V/2 features are eliminated. The splitting continues recursively until only the single best feature is left. Thereafter, the feature subset that achieved the best accuracy or performance measure is chosen as the subset to be used.
3.5. Hyperparameter optimization
The performance of ML algorithms depends on tuning their hyperparameters, which involves searching for the hyperparameter values that yield the best performance of an algorithm on a given set of data. The ML algorithms used in this study are named in the Weka environment as follows: MLRA: "LinearRegression", k-NN: "IBk", ANN: "MultilayerPerceptron", and SVM: "SMOreg". To determine the hyperparameters that yield optimal model performance, Weka was used to execute a modified systematic search: a range of randomly spaced values is searched first, and the best-performing range is then zoomed in on for further investigation. The hyperparameters optimized for KNN are the k value (search range 1–30) and the search and distance functions, while ANN depends on the learning rate (search range 0.1–0.3), the number of hidden layers (search range 1–4), and the number
of nodes (search range 1–4). SVM optimization depends on the regularization factor C (search range 1–1000), the type of kernel function, and the epsilon parameter (search range 0.1–0.00001).
where $(y_a - y_p)$ is the difference between the actual and predicted values and $n$ is the size of the dataset used.
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_a - y_p}{y_p}\right| \tag{5}$$
where $y_a$ is the actual value, $y_p$ is the predicted value, and $n$ is the size of the dataset used.
3.7. Combining algorithms
To improve the performance of the techniques used, an ensemble method was used. This approach combines the prediction outcomes of a set of algorithms trained on the same or different sets of features. The combination can be achieved through averaging (fixed rules) or stacking (trained rules) [39,40]. Averaging is a simple aggregation of the predictions of other models based on a fixed rule such as the mean, maximum, or minimum value. Stacking is an extension of averaging that allows another algorithm to learn how best to combine the predictions of other models [36]. The systematic procedure followed in this study is further summarized and illustrated in Fig. 1.
4. Results and findings
4.2. Performance of machine learning algorithms
To develop the initial models on which the performance of the ML algorithms would be evaluated, the combination of features that yields optimum performance was first determined. This was achieved using the "CorrelationAttributeEval" and RFE techniques discussed in the "Feature selection" section. Thus, in addition to a dataset containing all features, four more feature sets were developed, as described in Table 4. Table 4 shows that the most important feature influencing the duration of tall building projects is the number of total floors, followed by the number of floors above ground. As shown in Table 5, the performance of the four algorithms varied with different sets of features. MLRA performed best with the best two features ("# of total floors" and "# of floors above ground floors") and with the single best feature. These two feature sets (best V/8 and best feature) also exhibit similar performance for all algorithms except KNN. This suggests that the two features are collinear and that one of them could satisfactorily replace the other.
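The correlation ranking and recursive halving described in the feature selection procedure (Section 3.4), which yield the all-features, V/2, V/4, V/8, and single-best-feature subsets discussed above, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: features are ranked by absolute Pearson correlation with the target (in the spirit of Weka's CorrelationAttributeEval), and `evaluate` is a hypothetical user-supplied function returning an error, such as test RMSE, for a feature subset. Note that this halving variant differs from RFE implementations that refit a model and drop its lowest-weighted features at each step.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(var_a * var_b)


def rank_features(X, y):
    """Return feature indices sorted by |r| with the target, best first."""
    strength = [abs(pearson_r([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(strength)), key=lambda j: -strength[j])


def halving_subsets(ranked):
    """Yield V, the best V/2, the best V/4, ... down to the single best feature."""
    subset = list(ranked)
    while True:
        yield subset
        if len(subset) == 1:
            return
        subset = subset[: len(subset) // 2]  # keep the better half


def select_features(X, y, evaluate):
    """Evaluate every halved subset and keep the one with the lowest error."""
    results = [(evaluate(s), s) for s in halving_subsets(rank_features(X, y))]
    best_error, best_subset = min(results, key=lambda r: r[0])
    return best_subset, best_error, results


if __name__ == "__main__":
    # Toy data: 4 projects x 4 hypothetical features; target = duration.
    X = [[1, 5, 2, 9], [2, 3, 2, 7], [3, 4, 1, 8], [4, 1, 3, 6]]
    y = [10, 20, 30, 40]
    print(rank_features(X, y))  # [0, 1, 3, 2]
    # Placeholder evaluator (subset size); in practice, train on the subset
    # and return the test error.
    best, error, _ = select_features(X, y, evaluate=len)
    print(best)  # [0]
```

With 16 input features, for example, the halving would produce subsets of 16, 8, 4, 2, and 1 features, i.e. the full set plus four smaller sets.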
ANN performed best with the best V/4 features (described in Table 4), with an RMSE of 208.61, a 74% decrease in error compared with the case where all features were used (RMSE of 800.47), as shown in Table 5. KNN performed best with the best V/2 features (RMSE of 225.89), while SVM performed best with the best V/4 features (RMSE of 261.19). The results presented in Table 5 formed the basis for the development of the initial models for further investigation through hyperparameter tuning and optimization.
Based on the performance of the various combinations of ML algorithms and feature sets, five initial models were selected. The highlighted figures in Table 5 indicate the preferred configuration for each model. To further optimize the performance of the ML algorithms, their hyperparameters were tuned. Table 6 presents the developed models, showing the combination of ML algorithms, feature sets, and optimal hyperparameters. The performance of the various models is presented in Table 7. To determine the best-performing models, the models with the lowest RMSE and MAPE values and the highest R² values are considered first. MOD1, MOD2, and MOD4 performed best when the RMSE, MAPE, and R² results are compared. Although MOD3 had a higher correlation coefficient (R²) than MOD4, the cross-plot of actual versus predicted values presented in Fig. 2 shows that the predicted values for MOD3 were simply a general average and thus did not reflect realistic predictions, which is also evident from its high RMSE and MAPE values.
4.3. Performance of ensemble methods
To seek further improvement in predictive performance, an ensemble method was adopted. The three best-performing models (MOD1, MOD2, and MOD4) were combined using fixed and trained rules, also referred to as averaging and stacking, respectively. Thus, seven more models were created for further investigation, as shown in Table 8.
4.3.1. Averaging (fixed rules)
To combine the selected best-performing models through a fixed-rule system, the mean, maximum, and minimum values of the models' predicted outputs were considered, forming MOD6, MOD7, and MOD8. The performance results are presented in Table 9. MOD8 performed best, with the lowest RMSE and MAPE values and the highest R² value.
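The fixed rules of Section 4.3.1 and the error measures used to compare the resulting models can be sketched as below. This is an illustration with hypothetical prediction values, not the study's data, and `combine`, `rmse`, and `mape` are illustrative names. RMSE is assumed to take its usual form, and `mape` follows Eq. 5 as printed, dividing by the predicted value $y_p$ (a more common convention divides by the actual value $y_a$); multiply by 100 to express it as a percentage.

```python
from math import sqrt


def combine(predictions, rule="mean"):
    """Fixed-rule ensemble: aggregate several models' predictions per project.

    predictions: one list of predicted durations per model, all equal length.
    rule: "mean", "max", or "min" (the three fixed rules named in the text).
    """
    aggregators = {"mean": lambda v: sum(v) / len(v), "max": max, "min": min}
    aggregate = aggregators[rule]
    # zip(*predictions) groups the models' predictions for each project.
    return [aggregate(per_project) for per_project in zip(*predictions)]


def rmse(actual, predicted):
    """Root mean squared error (standard definition, assumed here)."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Eq. 5 as printed: mean of |(y_a - y_p) / y_p|, as a fraction."""
    n = len(actual)
    return sum(abs((a - p) / p) for a, p in zip(actual, predicted)) / n


if __name__ == "__main__":
    # Hypothetical predicted durations (days) from three base models.
    model_a = [900, 1100, 1300]
    model_b = [1000, 1000, 1400]
    model_c = [800, 1200, 1200]
    actual = [950, 1150, 1250]
    for rule in ("mean", "max", "min"):
        combined = combine([model_a, model_b, model_c], rule)
        print(rule, combined, round(rmse(actual, combined), 2),
              round(100 * mape(actual, combined), 1))
```

Because a fixed rule has no parameters to fit, it can be applied directly to the base models' test-set predictions; a trained (stacking) combiner, by contrast, needs its own training data.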
Table 6. ML models (combinations of algorithms, optimized hyperparameters and selected feature sets)
Model | ML algorithm | Selected features | Optimized hyperparameters
MOD1 | ANN (Multilayer Perceptron) | # of total floors; # of floors above ground floors; # of parking spaces; cost | Learning rate: 0.3; one hidden layer with four nodes
MOD2 | KNN (IBk) | # of total floors; # of floors above ground floors; # of parking spaces; cost; building area; height to tip; floor area; # of elevators | Nearest-neighbour search: LinearNNSearch; distance function: Manhattan distance; K: 1
MOD3 | KNN (IBk) | # of total floors | Nearest-neighbour search: LinearNNSearch; distance function: Manhattan distance; K: 18
MOD4 | SVM (SMOreg) | # of total floors; # of floors above ground floors; # of parking spaces; cost | Kernel: PolyKernel; cost function C: 1; epsilon: 1E-12; epsilon parameter: 1E-3
MOD5 | SVM (SMOreg) | # of total floors | Kernel: Pearson VII function-based universal kernel (PUK); cost function C: 100; epsilon: 1E-12; epsilon parameter: 1E-4
Fig. 2. Cross-plots of the actual vs. predicted duration values for selected models
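The "modified systematic search" of Section 3.5, of the kind used to arrive at settings such as K in Table 6, can be sketched as a two-stage, coarse-to-fine search over an integer hyperparameter such as k for IBk (search range 1–30). This is a hypothetical reconstruction under stated assumptions, not Weka's implementation: `objective` stands for whatever validation error is being minimized, and the quadratic error curve in the example is invented for illustration.

```python
import random


def coarse_to_fine_search(objective, low, high, n_coarse=6, seed=0):
    """Minimize objective(k) over the integers low..high in two stages.

    Coarse stage: evaluate randomly spaced values plus both endpoints.
    Fine stage: evaluate every integer between the neighbours of the best
    coarse value. For a unimodal objective this recovers the exact minimum.
    """
    rng = random.Random(seed)
    sampled = rng.sample(range(low, high + 1), min(n_coarse, high - low + 1))
    coarse = sorted({low, high, *sampled})
    scores = [objective(k) for k in coarse]
    best = min(range(len(coarse)), key=scores.__getitem__)
    # Zoom in on the interval bracketed by the best coarse value's neighbours.
    left = coarse[max(best - 1, 0)]
    right = coarse[min(best + 1, len(coarse) - 1)]
    return min(range(left, right + 1), key=objective)


def toy_error(k):
    """Hypothetical validation-error curve for k, lowest at k = 7."""
    return (k - 7) ** 2 + 200


if __name__ == "__main__":
    print(coarse_to_fine_search(toy_error, 1, 30))  # 7
```

Including both endpoints guarantees that the fine stage always has a bracketing interval. With a noisy, multi-modal validation curve, as is typical for small datasets, the method is only a heuristic, and different seeds may settle on different values.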
performance of machine learning algorithms was the normalized view, where all features (input and output) were rescaled to range between 0 and 1. The study further explored the feature sets that contribute to the performance of the algorithms, in order to identify and select the features that improve the predictive performance of the ML algorithms. Additionally, feature selection helps to control the "curse of dimensionality", a phenomenon common in real-world data. In this study, the feature most correlated with the prediction output (i.e., duration) was the number of total floors. This is a logical outcome given the study's focus on tall buildings. This was followed by the algorithm evaluation process, in which the hyperparameters of the selected algorithms were adjusted to determine their optimal values for the initial duration estimation models (MOD1, MOD2, MOD3, MOD4, and MOD5). The initial duration estimation models were further combined through ensemble techniques, i.e., fixed and trained rules. The final result of the overall process was the selection of a model that combines three initial models using ANN as the combiner. The best-performing model in this study, MOD10, had an R² of 0.69, a MAPE of 0.18, and an RMSE of 301.76.
The level of accuracy of MOD10 suggests that it could be recommended as a decision support tool for estimating the duration of tall building projects. Compared with the poorest-performing model in this study (MOD5), MOD10 achieved gains in performance of 47% in the correlation coefficient (R²), 44% in MAPE, and 37% in RMSE. MOD10 also arguably outperforms the CBR model developed by Li et al. [11] for skyscrapers. The R² value obtained in this study for MOD10 was 69%, whereas the CBR model developed by Li et al. [11] achieved 62%, which improved to 69% only when some poorly predicted cases were deleted from the CBR model. Thus, the superiority of MOD10 is apparent.
The limitations of this study lie in the source and size of the dataset used. The dataset contains 35 cases, which may limit the predictive performance of the model because of the limited amount of data available for training and testing. Similarly, previous research on duration prediction has relied on datasets of similar size. It may thus be concluded that the construction industry is deficient in recording and publishing data suitable for ML applications for duration prediction. Furthermore, the limitation that the dataset was sourced entirely from China may be mitigated by China's role as the major driver of tall building construction globally [34]. The dataset also contains the GDP of the cities where the building projects are located, which may provide an opportunity to extend the model to other construction climates globally based on a city's GDP. The significance of this study lies in its addressing a current trend in the construction industry: the exponential rise of tall building projects in urban centers across the globe. These projects are known for their delayed completion times.
6. Conclusion
This study demonstrated the significance of leveraging the capabilities of machine learning for enhanced time performance in the construction industry. Specifically, the literature reveals a dearth of studies in the construction domain on the time performance of tall buildings. Tall building projects have become a dominant building typology of the modern urban habitat and are now widely considered an important area of construction engineering and management research. Several factors justify studying this building typology separately from other, horizontal building types. For instance, this study reveals that the most significant factor influencing construction duration is the total number of floors, an intrinsic attribute of tall buildings. Other potential factors may include the structural systems used. Furthermore, tall buildings have specific characteristics that may lead to delayed completion times, such as the complexity of their design and construction and the large number of professionals involved.
[10] Ballesteros-Pérez P, Larsen GD, González-Cruz MC (2018) Do projects really end late? On the shortcomings of the classical scheduling techniques. JOTSE: Journal of Technology and Science Education, 8(1): 17-33.
[11] Li Y, Lu K, Lu Y (2016) Project schedule forecasting for skyscrapers. Journal of Management in Engineering, 33(3): 05016023.
[12] Khosrowshahi F, Kaka AP (1996) Estimation of project total cost and duration for housing projects in the UK. Building and Environment, 31(4): 375-383.
[13] Chan DWM, Kumaraswamy MM (1999) Forecasting construction durations for public housing projects: a Hong Kong perspective. Building and Environment, 34(5): 633-646.
[14] Skitmore RM, Ng ST (2003) Forecast models for actual construction time and cost. Building and Environment, 38(8): 1075-1083.
[15] Blyth K, Lewis J, Kaka A (2004) Predicting project and activity duration for buildings in the UK. Journal of Construction Research, 5(02): 329-347.
[16] Attal A (2010) Development of neural network models for prediction of highway construction cost and project duration. Doctoral dissertation, Ohio University.
[17] Koo C, Hong T, Hyun C, Koo K (2010) A CBR-based hybrid model for predicting a construction duration and cost based on project characteristics in multi-family housing projects. Canadian Journal of Civil Engineering, 37(5): 739-752.
[18] Mensah I, Nani G, Adjei-Kumi T (2016) Development of a model for estimating the duration of bridge construction projects in Ghana. International Journal of Construction Engineering and Management, 5(2): 55-64.
[19] Jin R, Han S, Hyun C, Cha Y (2016) Application of case-based reasoning for estimating preliminary duration of building projects. Journal of Construction Engineering and Management, 142(2): 04015082.
[20] Peško I, Mučenski V, Šešlija M, Radović N, Vujkov A, Bibić D, Krklješ M (2017) Estimation of costs and durations of construction of urban roads using ANN and SVM. Complexity, Article ID: 2450370.
[21] Yeom DJ, Seo HM, Kim YJ, Cho CS, Kim Y (2018) Development of an approximate construction duration prediction model during the project planning phase for general office buildings. Journal of Civil Engineering and Management, 24(3): 238-253.
[22] Sağlam B, Bettemir ÖH (2018) Estimation of duration of earthwork with backhoe excavator by Monte Carlo Simulation. Journal of Construction Engineering, Management & Innovation, 1(2): 85-94.
[23] Hinze J (2011) Construction Planning and Scheduling. Pearson/Prentice Hall.
[24] Faghihi V, Nejat A, Reinschmidt KF, Kang JH (2015) Automation in construction scheduling: a review of the literature. The International Journal of Advanced Manufacturing Technology, 81(9-12): 1845-1856.
[25] Liu H, Al-Hussein M, Lu M (2015) BIM-based integrated approach for detailed construction scheduling under resource constraints. Automation in Construction, 53: 29-43.
[26] Chan DW, Kumaraswamy MM (1995) A study of the factors affecting construction durations in Hong Kong. Construction Management and Economics, 13(4): 319-333.
[27] Mohan S (1990) Expert systems applications in construction management and engineering. Journal of Construction Engineering and Management, 116(1): 87-99.
[28] Al-Kofahi ZG, Mahdavian A, Oloufa A (2020) System dynamics modeling approach to quantify change orders impact on labor productivity 1: principles and model development comparative study. International Journal of Construction Management, doi:10.1080/15623599.2020.1711494.
[29] Hendrickson C, Zozaya-Gorostiza C, Rehak D, Baracco-Miller E, Lim P (1987) Expert system for construction planning. Journal of Computing in Civil Engineering, 1(4): 253-269.
[30] Moselhi O, Nicholas MJ (1990) Hybrid expert system for construction planning and scheduling. Journal of Construction Engineering and Management, 116(2): 221-238.
[31] Shaked O, Warszawski A (1995) Knowledge-based system for construction planning of high-rise buildings. Journal of Construction Engineering and Management, 121(2): 172-182.
[32] Hoffman GJ, Thal Jr AE, Webb TS, Weir JD (2007) Estimating performance time for construction projects. Journal of Management in Engineering, 23(4): 193-199.
[33] Lin MC, Tserng HP, Ho SP, Young DL (2011) Developing a construction-duration model based on a historical dataset for building project. Journal of Civil Engineering and Management, 17(4): 529-539.
[34] CTBUH (2018) CTBUH Year in Review: Tall Trends of 2018. https://www.skyscrapercenter.com/year-in-review/2018
[35] Witten IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
[36] Brownlee J (2018) Machine learning mastery with Weka. https://machinelearningmastery.com/machine-learning-mastery-weka/
[37] Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou ZH (2008) Top 10 algorithms in data mining. Knowledge and Information Systems, 14: 1-37.
[38] Sanni-Anibire MO, Zin RM, Olatunji SO (2020) Machine learning model for delay risk assessment in tall building projects. International Journal of Construction Management, doi:10.1080/15623599.2020.1768326.
[39] Xia R, Zong C, Li S (2011) Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences, 181(6): 1138-1152.
[40] Kuncheva LI (2014) Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons.
[41] Gunduz M, Nielsen Y, Ozdemir M (2015) Fuzzy assessment model to estimate the probability of delay in Turkish construction projects. Journal of Management in Engineering, 31(4): 04014055.