Applied Artificial Intelligence For Predicting Construction Projects Delay

This article discusses applying artificial intelligence and machine learning techniques to predict construction project delays. The authors first reviewed literature on factors that influence construction delays. They then collected quantitative delay data from experts and used it to train ensemble machine learning algorithms like bagging, boosting, and naive bayes. This allowed them to develop optimized predictive models like random forest and gradient boosting. Finally, they created an ensemble of ensembles model using stacking to maximize predictive performance. Evaluation metrics showed ensemble methods improved predictive ability over single algorithms for forecasting construction delays.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/354871960

Applied Artificial Intelligence for Predicting Construction Projects Delay

Article in Machine Learning · September 2021

DOI: 10.1016/j.mlwa.2021.100166


Machine Learning with Applications 6 (2021) 100166

Contents lists available at ScienceDirect

Machine Learning with Applications

journal homepage: www.elsevier.com/locate/mlwa

Applied artificial intelligence for predicting construction projects delay

Christian Nnaemeka Egwim a, Hafiz Alaka a,∗, Luqman Olalekan Toriola-Coker b, Habeeb Balogun a, Funlade Sunmola c
a Big Data Technologies and Innovation Laboratory, University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom
b School of Built Environment, University of Salford, Manchester, M5 4WT, United Kingdom
c School of Physics, Engineering & Computer Science, University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom

ARTICLE INFO

Keywords:
Artificial intelligence
Machine learning
Ensemble of ensembles
Ensemble learning
Project delay
Predictive analytics

ABSTRACT

This study presents evidence of a developed ensemble of ensembles predictive model for delay prediction, a global phenomenon that has continued to strangle the construction sector despite considerable mitigation efforts. First, a review of the existing body of knowledge on influencing factors of construction project delay was used to survey experts for quantitative data collection. Secondly, data cleaning, feature selection and engineering, hyperparameter optimization, and algorithm evaluation were carried out using the quantitative data to train ensemble machine learning algorithms (EMLA), namely bagging, boosting, and naïve Bayes, which in turn were used to develop hyperparameter optimized predictive models: Decision Tree, Random Forest, Bagging, Extremely Randomized Trees, Adaptive Boosting (CART), Gradient Boosting Machine, Extreme Gradient Boosting, Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes. Finally, a multilayer high performant ensemble of ensembles (stacking) predictive model was developed to maximize the overall performance of the EMLA combined. Results from the evaluation metrics (accuracy score, confusion matrix, precision, recall, f1 score, and Area Under the Receiver Operating Characteristic Curve (ROC AUC)) showed that ensemble algorithms are capable of improving the predictive force relative to the use of a single algorithm in predicting construction projects delay.

1. Introduction

The construction sector is considered a major contributor to the global economy. It represents 13% of the global gross domestic product (GDP), with projected growth of 85% to $15.5 trillion globally by the year 2030 and three leading countries (China, the United States and India) contributing 57% of its global demand (Robinson, 2015). Furthermore, Woetzel et al. (2017) estimate global infrastructure spending at $3.4 trillion annually from 2013 to 2030, which is roughly 4% of total GDP. The sector is also considered a major backbone of any country's economy: it represents 3% of the total economic output of Nigeria (Egwim et al., 2021), 4.3% of the total economic output of Germany (European Commission, 2017), 6% of the total economic output of the United Kingdom (UK) (Rhodes, 2019), and 4.1% and 6.8% of the total economic output of the United States of America (USA) and China respectively (Wang, 2018, 2019).

However, despite its importance the construction industry has continued to underperform. According to Egan (2018), the construction industry is under-achieving, as evident in its low profitability, capital investment, and research and development, generally caused by delay of construction projects, resulting in great dissatisfaction from the industry's clients with its overall performance. Some research publications, for instance Flyvbjerg (2014) and Rhodes (2019), indicated that 9 out of 10 global mega projects encounter delay, which usually results in excess cost overruns. Delay is the main factor in the general completion of every construction project as it raises overflow costs (Haq et al., 2017). Delay is described as an increase in time beyond the stakeholders' negotiated timeline of project completion or beyond the termination date of a lawful contract. For the client, delay connotes loss of revenue or investments at the end of the agreed time, while to the contractor, a delay can imply an increase in overhead cost (Assaf & Al-Hejji, 2006). Also, Bartholomew (2001) makes an important point, arguing that delay is a deceleration of some part of a construction project without a complete halt.

Investigations by several researchers have shown that delay of construction projects has an adverse effect on the reputation of the construction industry's contribution to the global economy. With reference to Abdul-Rahman et al. (2011), the effects of construction delay can be evaluated with respect to its national footprint, which adversely sways the industry's contribution to the economy; at an industry level, where

∗ Corresponding author.
E-mail addresses: [email protected] (C.N. Egwim), [email protected] (H. Alaka), [email protected] (L.O. Toriola-Coker),
[email protected] (H. Balogun), [email protected] (F. Sunmola).

https://doi.org/10.1016/j.mlwa.2021.100166
Received 19 February 2021; Received in revised form 28 May 2021; Accepted 13 September 2021
Available online 25 September 2021
2666-8270/© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
C.N. Egwim, H. Alaka, L.O. Toriola-Coker et al. Machine Learning with Applications 6 (2021) 100166

Fig. 1. This study’s organogram.

delays impact profitability and productivity negatively; and at a project level, where delay fosters industry clients' dissatisfaction with its overall performance, cessation of contracts by the owner, and unprofitability for contractor(s). Furthermore, it has been argued (Kumar, 2016) that delay often leads to project cost overruns, insolvency of organizations, loss of opportunity for future projects, and disputes among project stakeholders. Major delay factors have been identified by several researchers, as evident in a vast body of international literature (from amongst the oldest articles to the most recent), e.g. bad weather and jurisdictional/contractual disputes in the United States of America (USA) and the UK by Baldwin, Manthei, Rothbart, and Harris (1971) and Sullivan and Harris (1986) respectively; variation orders in both Nigeria and United Arab Emirates (UAE) by Motaleb and Kishk (2010) and Odeyinka and Adebayo (1997) respectively; planning and scheduling deficiencies in Australia, delay in payment certificates in Ghana and poor site management in Malaysia by Shah (2016); ground problems and inefficient structural connections for prefabricated components in the UK and India by Agyekum-Mensah and Knight (2017) and Ji et al. (2018) respectively; and finally shortage of adequate equipment and poor communication among contracting parties in China by Chen et al. (2019).

Several research approaches and guidelines for mitigating delay of construction projects have been established over the decades. For instance, Sullivan and Harris (1986) suggested more teamwork, especially at the early stages of project planning. According to Alaghbari and Sultan (2018), Assaf, Al-Khalil, and Al-Hazmi (1995), Enshassi, Al Najjar, and Kumaraswamy (2009) and Owalabi et al. (2014), clients should adhere to timely payment of progress fees and consider funding levels at the planning stage of a project. Furthermore, the surveys by Gondia et al. (2020) and Yaseen et al. (2020) recommended the use of predictive models to mitigate delay risks and time claims in construction projects. Despite all these delay factors and recommendations towards mitigating delay in construction, delay still thrives in the industry, hence the first motivation of this study. Interestingly, only a few studies have taken advantage of the contemporary analysis method which best explains the factors that can be affecting a phenomenon like delay based on its predictive capabilities. This analysis method is Artificial Intelligence (AI)/Machine Learning (ML), which has been widely adopted across other industries but which the construction industry is slow to adopt (Blanco et al., 2018). The adoption of AI/ML algorithms in construction is still evolving, especially when compared to other industries such as healthcare (guiding the choice of treatment), education (virtual lectures) and transportation (autonomous vehicles), as it currently uses many methods that were used in centuries past (Marks, 2017).

The industry routinely produces a massive amount of data daily on every project. For example, data produced from images captured by smart devices, IoT sensors, Building Information Modelling (BIM, defined as a structured model of data that represents building elements, with its usage spanning beyond the pre-construction phase to the post-construction phase (Ameziane, 2000)), etc., present a window of opportunity for the industry and its customers to examine and profit from insights generated from past construction data with the aid of AI and ML. AI is defined as a collection of state-of-the-art technologies that permit machines or any computer programme to sense, comprehend, act and learn (Goyal, 2019). ML, on the other hand, is a branch of AI that allows computers to learn directly from examples, data and experience, replacing the traditional approaches to programming that relied on hardcoded step-by-step rules (Royal Society, 2017). Several ML algorithms, such as Genetic Algorithm, Neural Networks, Linear Regression, Logistic Regression, Nearest-Neighbour Mapping, Decision Trees, K-Means Clustering, Random Forests, and Support Vector Machines, exist for ML model implementation. Which ML algorithm to use depends on a lot of factors, e.g., ease of use, accuracy, training time, etc. Few researchers have attempted the use of AI and ML algorithms in some aspect of construction. Poh, Ubeynarayana, and Goh (2018) used five popular ML algorithms to predict accident occurrence and severity on construction sites in Singapore; Zou and Ergan (2019) leveraged three ML techniques to predict the influence of construction projects on urban quality of life; Arditi and Pulket (2005) and Mahfouz and Kandil (2012) used only one and three ML models respectively to forecast end results of construction litigation, all in the USA.


Table 1
List of features and target.

Table 2
Reliability statistics.
Cronbach's alpha    Cronbach's alpha based on standardized factors    N of factors
0.938               0.935                                             24

Only a handful of studies have attempted the adoption of AI or ML to mitigate construction delay. For example, Gondia et al. (2020) used two ML models, Decision Tree and Naïve Bayesian Classifiers (with accuracy values of 74.5% and 78.4% respectively), towards expediting precise project delay risk assessment and forecasting in building projects in Egypt. Also, Asadi, Alsubaey, and Makatsoris (2015) used two ML approaches (with accuracy values of 79.41% and 73.52% for the decision tree and Naive Bayes models respectively) to predict delays in construction logistics in Qatar. Furthermore, Yaseen et al. (2020) developed a hybrid artificial intelligence model (a combination of Random Forest and Genetic Algorithm) and achieved an accuracy value of 91.67% for delay problem prediction in Iraq. Evidently, to the best of our knowledge at the time of this study, no specific literature has attempted to use Ensemble Machine Learning Algorithms (EMLA) to predict delays of construction projects, hence the final motivation of this study. EMLA utilize a group of algorithms whose cumulative outcome is almost always greater in terms of predictive accuracy than that of a single model, as they integrate decisions from different algorithms to maximize the overall performance (Badawi et al., 2019; Dietterich, 2000; Hastie, Tibshirani, & Friedman, 2009). Consequently, this study aims to develop a multilayer high performant ensemble of ensembles predictive model using hyperparameter optimized EMLA to predict delay of construction projects. The following objectives will be used to achieve this aim:

1. Carry out a literature review towards gathering the most common factors affecting delay of construction projects and use it to conduct a survey of experts to establish the most applicable factors affecting delay of construction projects.
2. Utilize the established factors in objective 1 as independent variables for EMLA (bagging and boosting) to develop hyperparameter optimized predictive models.
3. Combine the best predictive models from objective 2 to develop a multilayer high performant ensemble of ensembles (stacking) predictive model.

Bagging is an ensemble machine learning technique where multiple models of the same algorithm are used, each with a different randomly selected subset of the data (Opitz & Maclin, 1999). Boosting is an iterative technique that adapts the weight of each observation according to the last classification: if an observation has been falsely categorized, its weight is raised, and conversely (Dietterich, 2000). Naive Bayes, an effective and efficient inductive ensemble method also referred to as conditional independence, is the most basic type of Bayesian network, in which all characteristics are independent given the class variable's value (Zhang, 2004). Unlike the bagging, boosting and naïve Bayes ensemble machine learning techniques, stacking often considers heterogeneous weak learners, combining the base algorithms using a meta-model rather than some averaging process (Seni & Elder, 2010). To achieve these objectives, this study proceeds to its research methodological approach to data collection and exploration in the next section. Section 3 follows, detailing its results and analysis of how the high performant ensemble of ensembles predictive model was developed. Finally, Section 4 details its conclusion and recommendation (see Fig. 1).

2. Research methodology

A review of existing literature on influencing factors of construction projects delay was used to establish the most applicable factors, thereby fulfilling part of the first objective. Twenty-four applicable factors (see Table 1) were consolidated at the end of the review, which was stopped once search results became repetitive. These factors were used to design a survey in the form of a questionnaire to fulfil the remaining part of the first objective.

The questionnaire was divided into five sections such that each section deals with a specific feature of the event under investigation (delay factors). Section A asked the responders to rate how eighteen factors affected the duration of the project. Where a project did not have an official schedule/programme of work indicating the duration of the project, they were asked to use an assumed duration that such a project would have taken, or the duration based on an agreed date of completion with the client. Section B enquired what level of detail one factor had, Section C asked for the frequency of occurrence of two factors, and Section D enquired what percentage a responder would give to


Fig. 2. EMLA Prediction Architecture.

Fig. 3. Shape of Dataset’s Distribution.

three factors. All these made a total of twenty-four factors as features (independent variables). Also, in the final Section E, the responders were asked to rate how long the entire project was delayed; this represents the target (dependent variable) for the EMLA implementation. In the end, a total of 302 questionnaires were distributed. A total of 120 responses were received, and since sampling cannot be done in isolation, as there is no single right decision for determining the sample size for a piece of research (Flick, 2014), this number of responses is considered satisfactory (Delice, 2001; Durbarry, 2019; James, Joe, & Chadwick, 2001).

The questions were designed on a Likert scale of one to five. Although questions in each section were analysed individually, they were also linked together in such a way that their respective answers cumulatively helped to arrive at a finding (delay). The use of questionnaire research signifies independent observation, which implies that the questionnaire is completed in the absence of the researcher; and since one of the objectives of this study is to establish the true (most) applicable factors of construction delay, this makes it positivist research. A positivist researcher is usually independent (of the subject) as an observer, reduces a phenomenon to simpler measurable factors (here, causes of delay in construction projects deduced from the literature were reduced using a Likert scale), explains the elements with regard to how they affect the phenomenon (cause and effect) and often uses large samples (Easterby-Smith et al., 2008; Morgan, 1980).

Prior to distribution of the questionnaire, pilot testing was conducted by asking a group of experts in construction to comment on the representativeness and suitability of the questions. This was done to ensure thorough understanding of the questions by the responders, to avoid errors when recording data, and to assess the questions' validity and the likely reliability of the data to be collected (Saunders et al., 2009, p.425). The responders to the questionnaire were experienced stakeholders from the construction industry in Nigeria. They were instructed to have in mind any project of their choice that they had worked on in the past while answering the questions. Because this study aims to develop a multilayer high performant ensemble of ensembles predictive model using hyperparameter optimized EMLA to predict delay of construction projects, it is deductive research, which further reinforces its positivism. The convenience sampling method was selected due to its ease of accessibility, geographical proximity and affordability, which satisfied this research. Convenience sampling (also called haphazard/accidental sampling) is a typical nonprobability/non-random sampling method where a researcher considers the most convenient object(s), time, effort and money for conducting data collection (Matthews, Ross, & Ellison, 2010).


Fig. 4. Correlation Matrix Plot.

Fig. 5. Standardized Dataset Distribution.

The collected data received via Google Forms were extracted and converted into a comma-separated values file. To achieve the second objective, this raw data was pre-processed into a clean and analysable dataset by carrying out data imputation and outlier detection. Scaling and encoding feature engineering techniques were employed to enable the selection of features or input variables to increase the predictive power (hyperparameter optimization) of the EMLA. The resulting clean, pre-processed and feature engineered dataset was split randomly into a training dataset and a testing dataset in a ratio of 60% to 40% respectively. EMLA were imported into a running instance of Jupyter Notebook using Scikit-learn, an integral Python programming language module with a broad spectrum of state-of-the-art algorithms for supervised and unsupervised medium-scale problems (Pedregosa et al., 2011). Since EMLA fit input variables (delay factors) to a known output variable (delay), a supervised modelling taxonomy was chosen. The training dataset (60% of the total dataset) was used to fit different EMLA while their hyperparameters were optimized during successive runs to further improve the performance for making predictions on the unseen test dataset (40% of the total dataset). The resulting best performing EMLA, selected via the hard and soft voting rule, were used as new input variables which produced a multilayer high performant ensemble of ensembles algorithm (to achieve the third objective). Finally, Accuracy, Confusion Matrix, Precision, Recall, F1-Score and ROC curve modelling evaluation metrics were employed to measure the new model and EMLA performance on the testing dataset, as shown in Fig. 2.

3. Results and analysis

3.1. Reliability analysis of survey outcomes

A reliability analysis using Cronbach's alpha test was carried out to test the reliability of the respondents' answers for all 24 factors. Cronbach's alpha (α) can be written as:

\[
\alpha = \frac{N \cdot c}{v + (N - 1) \cdot c}
\]

where \(N\) is the number of factors, \(c\) the average covariance between factor-pairs and \(v\) the average variance. The main purpose of α was to assess how accurate the data obtained from the survey were, by evaluating the internal consistency coefficient of the data. In addition, it was important to decide whether the combined factors help to measure the same construct (delay).

While there is no lower bound, the closer Cronbach's alpha coefficient is to 1, the greater the internal consistency of the factors (Gliem & Gliem, 2003). An α of 0.7 or higher is known to be symptomatic of strong internal consistency of the factors in determining the reliability


Fig. 6. Chi-squared Test.

Table 3
Algorithms and libraries.

Table 4
Confusion matrix.
                                                  Prediction
                                                  Negative                         Positive
                                                  (delay < threshold limit) = 0    (delay > threshold limit) = 1
Actual   Negative (delay < threshold limit) = 0   True negative                    False positive
Actual   Positive (delay > threshold limit) = 1   False negative                   True positive

of the construct (Bhatnagar, Kim, & E. Many, 2014). However, it is the viewpoint of Nunnally (1978) that α should surpass 0.8 for fundamental science to consider responses to a factor accurate. The findings of this study on the 24 factors in this analysis show strong internal consistency, with an α of 0.938, as shown in Table 2.

3.2. Data pre-processing

An initial investigation of the data through Exploratory Data Analysis (EDA) showed that the data is a two-dimensional array with 120 rows and 25 columns, where the 1st to the 24th columns (F1–F24 factor IDs) represent the features/independent variables and the 25th column (F25) represents the target/dependent variable, as shown in Table 1. Descriptive statistics of these columns showed they contain discrete categorical data with ordinal values from 1–5. Furthermore, Fig. 3 displays a summary of the central tendency, dispersion and shape of the dataset's distribution, as relates to its mean, median and standard deviation (std). For instance, F3 has a mean of 2.48, median of 3, and std of 1.11, which implies that, on average, during the course of most of the projects on which the respondents' answers were based, inflation or sudden increase in goods/commodities (F3) was medium.

After EDA, correlation analysis was done to identify multicollinearity among predictors (features vs target) using their respective correlation coefficient values (see Fig. 4). The correlation matrix plot in Fig. 4


Fig. 7a. DT ROC AUC Plot.

Fig. 7b. RF ROC AUC Plot.

Fig. 7c. Bagging ROC AUC Plot.

Fig. 7d. Extra Tree ROC AUC Plot.

Fig. 7e. AdaBoost ROC AUC Plot.

Fig. 7f. GBM ROC AUC Plot.

shows the cross correlation between each feature (F1–F24) and the target (F25). For example, F13 has a positive correlation of 0.4 to F25, F5 has a negative correlation of −0.07 to F25, and so on. In general, the existence of multicollinearity implies an absolute correlation coefficient >0.7 among two or more predictors (Dormann et al., 2013). Evidently, there exists multicollinearity between F12 and F13, F13 and F14, F13 and F16, etc., as shown below.

3.3. Feature engineering

As a habitual requirement for most ML estimators, owing to their underlying assumption that any given dataset is normally distributed with zero mean and unit variance (Pedregosa et al., 2011), this study utilized the standardization feature scaling method to meet this requirement by subtracting the mean from each feature observation and


dividing by the standard deviation, as shown in the equation below:

\[
X' = \frac{X - \bar{x}}{\sigma}
\]

where \(X'\) represents the standardized value, \(X\) a given feature observation, \(\bar{x}\) the mean and \(\sigma\) the standard deviation. Hence our resulting feature-scaled dataset has a variance of 1 and a mean centred at 0, with varying minimum and maximum values, as shown in Fig. 5.

Fig. 7g. XGBoost ROC AUC Plot.

Fig. 7h. Bernoulli ROC AUC Plot.

Fig. 7i. Gaussian ROC AUC Plot.

Fig. 7j. Multinomial ROC AUC Plot.

Furthermore, as a final transformation on the dataset, one-hot encoding (k-1 variant), a categorical encoding technique, was done on the target (F25). Consequently, data from column F25 were encoded into 0 (no delay) for ordinal values <= 3 and 1 (delay) for ordinal values of 4 and 5, such that on each occurrence this value (0 or 1) shows whether the category (delay) is present or not. Finally, the dataset was split using Scikit-learn's train_test_split function at a ratio of 60:40 for training (72 data points) and testing (48 data points) respectively.

3.4. Feature selection

In a typical machine learning pipeline, feature selection is a crucial mechanism designed to eliminate obsolete, redundant, and noisy characteristics and retain a limited subset of features from the primary feature space (Kira & Rendell, 1992; Wei et al., 2020). In relation to this study, a univariate filter-based feature selection method called Chi-squared was chosen to eliminate obsolete, redundant and noisy features, boost model accuracy, improve model interpretability, lower computational complexity and enhance generalizability by mitigating overfitting. The Chi-squared method was chosen because our data contains categorical features (frequencies) and a binary target variable. Fig. 6 shows the outcome of the Chi-squared test with the varying minimal degree of association of each feature and the target.

Consequently, the irrelevant features with the coloured bars (F6, F3, F1, F5, F2, F4, F19, F15, F7, F21, F23, F9, F22, F20, and F14) were removed before fitting different EMLA on the training dataset (60% of the total dataset) and comparing their respective parameter settings on the unseen test dataset (40% of the total dataset) as a bias trade-off for individual EMLA to further improve their performance for making predictions. Hence only the remaining 9 important factors out of the initial 24 were subsequently used.

3.5. Ensemble machine learning technique

This technique involves the use of multiple algorithms whose cumulative outcome is almost always greater in terms of predictive accuracy than that of a single algorithm, as it integrates decisions from different algorithms to maximize the overall performance (Badawi et al., 2019; Dietterich, 2000; Hastie et al., 2009). All examples of ensemble learning techniques available in Scikit-learn version 0.23.2 were used for EMLA experimentations in this study. They include Bagging (Bootstrap Aggregating), Boosting (Hypothesis Boosting), Naive Bayes, and Stacking. Their respective algorithms and libraries are listed in Table 3.

To further clarify the principles behind the proposed approach of this study, we represent these ensemble methods mathematically in the formulae that follow.
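The pre-processing steps described in Sections 3.3 and 3.4 can be sketched in Scikit-learn roughly as follows. This is a minimal illustration on synthetic Likert-style data, not the authors' code: the random data, the `random_state` values, and the use of `SelectKBest` with `k=9` are assumptions chosen to mirror the text. Because Scikit-learn's `chi2` scorer rejects negative inputs, the sketch scores the raw 1 to 5 ratings and standardizes afterwards, which may differ from the order used in the study.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the survey data: 120 responses, features F1-F24
# and target F25, all on a 1-5 Likert scale (synthetic, not the study's data).
rng = np.random.default_rng(0)
columns = [f"F{i}" for i in range(1, 26)]
data = pd.DataFrame(rng.integers(1, 6, size=(120, 25)), columns=columns)

X = data.drop(columns="F25")
# Binarise the target as described: ratings <= 3 -> 0 (no delay), 4 or 5 -> 1 (delay).
y = (data["F25"] >= 4).astype(int)

# 60:40 split, giving 72 training rows and 48 testing rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Chi-squared scoring needs non-negative inputs, so it is applied to the raw
# ratings; k=9 mirrors the nine factors retained in the text (assumption).
selector = SelectKBest(score_func=chi2, k=9).fit(X_train, y_train)
selected = X_train.columns[selector.get_support()]

# Standardise the retained features using training-set statistics only.
scaler = StandardScaler().fit(X_train[selected])
X_train_std = scaler.transform(X_train[selected])
X_test_std = scaler.transform(X_test[selected])
```

Fitting the selector and the scaler on the training rows only is a design choice of this sketch; it keeps information from the 48 held-out responses out of the pre-processing statistics.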


Table 5
Default parameter ensemble’s performance metrics report.

Bagging is expressed mathematically by the following formula:

$$f_{bag} = f_1(x) + f_2(x) + \cdots + f_b(x)$$

where the term on the left, $f_{bag}$, is the bagged prediction, the terms on the right, $f_1(x)$ to $f_b(x)$, are the individual learners (Random Forest, Bagging and Extra-Trees in this study), and $b$ is the total number of learners.

The boosting ensemble technique was applied in three key steps. First, the target variable (project delay) is predicted using an initial model $f_0$, which leaves a residual $(y - f_0)$. Second, a new model $h_1$ is fit to the residuals of the previous step. Finally, $f_0$ and $h_1$ are merged to produce $f_1$, the boosted variant of $f_0$, as shown below:

$$f_1(x) \leftarrow f_0(x) + h_1(x)$$

To improve on $f_1$, a new model is fit to the residuals of $f_1$, and this is repeated for $m$ iterations until the residuals are as low as possible, as shown below:

$$f_m(x) \leftarrow f_{m-1}(x) + h_m(x)$$

The Naive Bayes ensemble methods we used follow Bayes' theorem, which establishes the link between the dependent variable $y$ and the vector of related independent variables $x_1$ to $x_n$, as shown below:

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\, P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

Thus,

$$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

where $\hat{y}$ is the predicted class and $P$ is the probability of occurrence.

Unlike bagging, boosting, and Naive Bayes, stacking (also known as stacked generalization) trains its base estimator(s), e.g., the DT algorithm used in this study, on heterogeneous EMLA, and the base estimators' outputs are combined using a meta-classifier, as shown below:

$$\min_{f} \sum_{i=1}^{n} l\big(f(x_i), y_i\big) + \lambda\, r(f)$$

where the first term is the empirical risk, defined by a loss function $l$ that evaluates the effectiveness of the function $f$. The second term is the regularization term; it evaluates the complexity of the function $f$ and is normally a norm of $f$ or of its derivatives. We now proceed to the performance metrics of EMLA in the next subsection.

3.6. Algorithms performance metrics

Typical performance metrics for evaluating classification problems such as the one in this study are the accuracy score, confusion matrix, precision, recall, $F_1$ score, and area under the receiver operating characteristic curve (ROC AUC).
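As a rough illustration of how the label-based metrics listed above relate to one another, the sketch below derives accuracy, precision, recall and the F1 score from the four confusion-matrix counts. It is a minimal pure-Python example, not the code used in this study; the function names `confusion_counts` and `binary_metrics` and the ten toy labels are assumptions made for illustration (1 stands for a delayed project, 0 for a project that is not delayed).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # delayed project correctly flagged
        elif pred == positive:
            fp += 1          # on-time project wrongly flagged
        elif truth == positive:
            fn += 1          # delayed project missed
        else:
            tn += 1          # on-time project correctly passed
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy (in percent), precision, recall and F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = 100.0 * (tp + tn) / total
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not data from this study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels, three of the four delayed projects are recovered and one on-time project is flagged by mistake, so precision, recall and F1 all equal 0.75 while accuracy is 80%.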

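ROC AUC, also named in the list of metrics above, depends on predicted scores rather than hard labels: the ROC curve traces the true positive rate against the false positive rate as the decision threshold is lowered, and AUC is the area under that curve. The following sketch shows one way to compute it in pure Python with the trapezoidal rule. The helper names `roc_points` and `area_under_curve` and the six toy scores are hypothetical, and the sketch assumes binary labels with at least one positive and one negative case.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Assumes labels are 1 (positive) or 0 (negative), with at least one of each.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Integrate a piecewise-linear curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ground truth and predicted probabilities of delay.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = area_under_curve(roc_points(y_true, scores))
print(round(auc, 4))
```

In this toy case eight of the nine (delayed, not delayed) pairs are ranked correctly, so the area is 8/9, or about 0.889. A perfect ranking gives 1 and a random ranking gives about 0.5.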

Table 6
Hyperparameter optimized ensemble’s performance metrics report.

Accuracy score is the number of correct predictions divided by the total number of predictions, expressed as a percentage. Accuracy alone is generally not the preferred metric for classifiers, especially with skewed datasets. The formula used to calculate the accuracy score in this study is:

$$\text{Accuracy score} = \frac{\text{True Negatives} + \text{True Positives}}{\text{Number of Predictions}} \times 100$$

The confusion matrix is a 2 × 2 matrix describing the number of accurate and inaccurate predictions made by a classifier. The confusion matrix used for the EMLA experiments is shown in Table 4.

Precision (positive predictive value) measures the accuracy of positive predictions. Hence, in this study it is the ratio of cases with delay < threshold limit that were accurately predicted to be less than or equal to the threshold limit to the total number of cases with delay < threshold limit in the test data. It is expressed mathematically as:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

Recall (sensitivity or true positive rate) is the ratio of positive instances that are correctly detected by the classifier. In this study, it is the ratio of cases with delay > threshold that were accurately predicted to surpass the threshold to the total number of cases with delay > threshold in the test data. It is expressed mathematically as:

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

The $F_1$ score is the harmonic mean of precision and recall. Whereas the regular mean gives equal weight to all values, the harmonic mean gives more weight to low values. The $F_1$ score therefore favours classifiers that have similar precision and recall. It is expressed mathematically as:

$$F_1 = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The Receiver Operating Characteristic (ROC) curve plots recall (the true positive rate) on the $y$-axis against the false positive rate on the $x$-axis. In this study, the threshold of the algorithm, which ranges from zero to one in steps of 0.1, is shown on the vertical axis to the right of the plot and on the curve as well. Area under the curve (AUC) is the area under the ROC curve that is generally recognized as the best indicator of the overall performance


Fig. 8. Bagging Decision Boundaries.

of a classifier. Since both axes of the ROC curve have a maximum value of 1, the maximum AUC value, indicating excellent accuracy, is 1. The minimum AUC value considered here, corresponding to a classifier that performs no better than chance, is 0.5.

Cross validation is a resampling technique for evaluating machine learning models on a limited data sample. Since the estimators (see Table 3) for this study are classifiers and the target variable is binary, stratified k-fold, a variant of k-fold that returns stratified folds containing about the same proportion of each target class as the initial dataset, is used to evaluate cross validation scores. As a result, the variation between the estimates is minimized and the average error estimation is more accurate, which helps mitigate potential overfitting. To obtain the cross validation score in this study, we took the mean of stratified 10-fold (i.e., k = 10) scores for each EMLA.

The main parameters for each model are as follows.
(i) DT: random_state = 42, min_samples_leaf = 5, criterion = 'gini', min_samples_split = 4, n_jobs = -1;
(ii) RF: n_estimators = 100, n_jobs = -1, random_state = 42, bootstrap = True, warm_start = False;
(iii) Bagging: n_estimators = 100, bootstrap = True, n_jobs = -1, random_state = 42, min_samples_leaf = 2, min_samples_split = 3, verbose = 1;
(iv) Extra-Trees: min_samples_split = 4, random_state = 42, criterion = 'entropy', n_jobs = -1, min_samples_leaf = 2, n_estimators = 100;
(v) AdaBoost: max_depth = 1, n_estimators = 100, learning_rate = 3, min_samples_split = 3, n_jobs = -1, random_state = 42;
(vi) GBM: loss = 'deviance', n_jobs = -1, random_state = 42, learning_rate = 2, min_samples_split = 3, min_samples_leaf = 3;
(vii) XGBoost: base_score = 1, booster = 'gbtree', colsample_bylevel = 1.9, colsample_bynode = 1, colsample_bytree = 1, gamma = 0, gpu_id = -1, importance_type = 'gain', learning_rate = 1, max_delta_step = 2, max_depth = 6, min_child_weight = 1, n_estimators = 100, n_jobs = -1, num_parallel_tree = 3, random_state = 42, tree_method = 'exact';
(viii) BNB: binarize = True, alpha = 0, fit_prior = True, class_prior = None;
(ix) MNB: alpha = 100, fit_prior = False, class_prior = None;
(x) GNB: priors = None, var_smoothing = 1e-01.

3.7. Algorithms performance evaluation

Initially, the EMLA experiments were performed on the first 7 algorithms (see Table 3) with their default parameters (without hyperparameter optimization), evaluated on the unseen test dataset (40% of the total dataset) after fitting the models on the training dataset. Table 5 shows their respective evaluation metrics on the test dataset. Column 9 of Table 5 shows that the best of them achieved only a 15% increase over the minimum AUC value. Hyperparameter optimization therefore became necessary, so we proceeded with it and decided not to continue the default-parameter training and testing experiment for the remaining algorithms because of computational cost and time.

Interestingly, after hyperparameter optimization of the EMLA we obtained almost double that value (27%) compared with the initial default-parameter experiments on the test dataset (40% of the total dataset), as shown in Table 6.

Table 6 presents the evaluation metrics of the models developed with the 11 algorithms used for the EMLA experiments on the test data (40% of the total dataset), with a weak learner, DT, as the base estimator. Comparing the proposed ensemble of ensembles (stacking) approach with the existing naïve Bayes method used by Gondia et al. (2020) to predict construction project delay risk, this report confirms that, although naïve Bayes is considered one of the most effective and efficient inductive EMLA, the conditional independence assumption on which it is theoretically built is rarely valid in real-world applications such as


Fig. 9. Boosting Decision Boundaries.

the case in this study (Zhang, 2004). Furthermore, the better performance of Gaussian naïve Bayes (75%, 76%, and 0.7448 for accuracy score, cross validation score and ROC AUC respectively) over Bernoulli naïve Bayes (60.42%, 60% and 0.6171 respectively) and multinomial naïve Bayes (60.42%, 75% and 0.6031 respectively) is attributable to the existence of dependences between the features, since our dataset was standardized to normally distributed features (see Section 3.3). Fig. 11 (see Appendix A) shows the tree structure of this base estimator, while Figs. 7a to 7j present the ROC AUC plots for the algorithms used.

To begin, we benchmarked our EMLA on DT (a typically unstable model) as their base estimator. We then sought to gain stability for the base estimator using 3 bagging ensemble algorithms, namely optimized RF (a natural ensemble of DT), Bagging and Extremely Randomized Trees. As expected, they all performed better than the base estimator on all evaluation metrics (see Table 6). The challenge, however, was how to identify the best ensemble algorithm. To avoid bias and to enhance generalizability, we cast a vote among the bagging ensembles using the hard and soft voting rules in Scikit-learn's VotingClassifier.

A more performant model (higher accuracy score and lower variance) emerged as a result. The experiment was repeated with the base estimator, this time using 3 boosting ensemble algorithms: Adaptive Boosting (CART), Gradient Boosting Machine, and Extreme Gradient Boosting. Not surprisingly, and similar to the bagging ensembles, they all performed better than the base estimator (see Table 6). We again cast votes among them, which yielded yet another performant model. Finally, we repeated the experiment with the base estimator using 3 naïve Bayes ensembles: Bernoulli, multinomial and Gaussian. Although one of them (multinomial) did not perform as well as the base estimator, we proceeded with vote casting, which ultimately yielded a third performant model. Figs. 8–10 show how these ensembles made their respective decisions interdependently.

Here, Ensemble 1, 2, and 3 are the new performant models obtained from vote casting among the bagging, boosting, and naïve Bayes ensembles respectively. Finally, we used the aggregated predictions from Ensemble 1, 2, and 3 to train a new algorithm, called Ensemble of Ensembles, using the Scikit-learn-compatible Mlens library. This resulted in a more performant model with a much better accuracy score, confusion matrix, precision, recall, F1 score, and ROC AUC, as shown in Table 6.

4. Conclusion & recommendations

The perpetual occurrence of delay in the construction sector, a global phenomenon, remains a huge concern to its policy makers despite considerable mitigation efforts. Interestingly, this sector, which produces massive amounts of data daily from IoT sensors and building information modelling on most of its projects, is slow in taking advantage of artificial intelligence/machine learning, the contemporary analysis method that best explains the factors affecting a phenomenon like delay based on its predictive capabilities and that has been widely adopted across other sectors. In this study, therefore, a premise for using ensemble machine learning algorithms (EMLA) to predict the delay of construction projects was architected, built and presented. First, a review of the existing body of knowledge on factors influencing construction project delay was used to conduct a survey of experts as the approach to data collection and exploration.


Fig. 10. Naïve Bayes Decision Boundaries.

The resulting dataset was applied to EMLA to develop hyperparameter optimized predictive models: Decision Tree, Random Forest, Bagging, Extremely Randomized Trees, Adaptive Boosting (CART), Gradient Boosting Machine, and Extreme Gradient Boosting. Finally, a multilayer high performant ensemble of ensembles predictive model was developed to maximize the overall performance of the combined EMLA.

Results from the algorithm evaluation metrics (accuracy score, confusion matrix, precision, recall, F1, and ROC AUC) showed that EMLA are capable of improving predictive power relative to the use of a single algorithm in predicting construction project delay. By developing a multilayer high performant ensemble of ensembles predictive model, the current research contributes to the effort of improving the time efficiency of construction projects, a key performance indicator for successful projects. Ultimately, this model can be integrated into a construction information system to promote evidence-based decision-making, thereby enabling constructive project risk management initiatives. Compared to existing numerical or statistical approaches, which used pure mathematical techniques such as the arithmetic mean, standard deviation, hypothesis testing, etc. to draw inference from data, our predictive analytics approach used known results (input variables), statistical methods and advanced ML algorithms to develop a novel multilayer high performant ensemble of ensembles predictive model that forecasts future delay values for complex and new data of typical construction projects. This will help improve the quality of decisions made and risks taken by construction sector stakeholders on their present or future construction projects, which in turn will foster trust, increase productivity and revenue and, more importantly, yield timely delivery of construction projects in the sector. While the proposed contemporary method of analysis is assumed to be applicable in mitigating the delay of any construction project within the sector, the unique data transformation employed in this study may not, as is typical of any data driven model, be transferable to data from other regions. Nevertheless, other regions' project datasets can be applied to the processes described in this study. Also, the sample size of the respondents of this study may not be representative of the total population size of the region. In order to produce improved classification outcomes, future studies should be targeted at extending the algorithms either by further parameter optimization or by feature engineering. Other methods used in the creation of ensemble models, apart from bagging, boosting, naïve Bayes and stacking, should also be considered for predicting construction project delay.

CRediT authorship contribution statement

Christian Nnaemeka Egwim: Conceptualization, Methodology, Software, Formal analysis, Writing – original draft. Hafiz Alaka: Supervision, Writing – review & editing, Investigation, Visualization. Luqman Olalekan Toriola-Coker: Data curation. Habeeb Balogun: Resources, Data curation. Funlade Sunmola: Supervision, Writing – review & editing.

Appendix A

See Fig. 11.

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.mlwa.2021.100166.


Fig. 11. DT as base estimator.

References

Abdul-Rahman, H., et al. (2011). Project schedule influenced by financial issues: Evidence in construction industry. Scientific Research and Essays, 6(1), 205–212. http://dx.doi.org/10.5897/SRE10.989.
Agyekum-Mensah, G., & Knight, A. D. (2017). The professionals' perspective on the causes of project delay in the construction industry. Engineering, Construction and Architectural Management, 24(5), 828–841. http://dx.doi.org/10.1108/ECAM-03-2016-0085.
Alaghbari, W., & Sultan, B. (2018). Delay factors impacting construction projects in Sana'a-Yemen 1. PM World Journal, VII(December), 1–28, Available at: https://www.researchgate.net/publication/329656460%0ADelay.
Ameziane, F. (2000). Information system for building production management. International Journal of Production Economics, 64(1), 345–358. http://dx.doi.org/10.1016/S0925-5273(99)00071-7, Elsevier Science B.V.
Arditi, D., & Pulket, T. (2005). Predicting the outcome of construction litigation using boosted decision trees. Journal of Computing in Civil Engineering, 19(4), 387–393. http://dx.doi.org/10.1061/(ASCE)0887-3801(2005)19:4(387), American Society of Civil Engineers.
Asadi, A., Alsubaey, M., & Makatsoris, C. (2015). A machine learning approach for predicting delays in construction logistics. International Journal of Advanced Logistics, 4(2), 115–130. http://dx.doi.org/10.1080/2287108x.2015.1059920, Informa UK Limited.
Assaf, S. A., & Al-Hejji, S. (2006). Causes of delay in large construction projects. International Journal of Project Management. Pergamon, 24(4), 349–357. http://dx.doi.org/10.1016/j.ijproman.2005.11.010.


Assaf, S. A., Al-Khalil, M., & Al-Hazmi, M. (1995). Causes of delay in large building construction projects. Journal of Management in Engineering. American Society of Civil Engineers, 11(2), 45–50. http://dx.doi.org/10.1061/(ASCE)0742-597X(1995)11:2(45).
Badawi, H., et al. (2019). Use of ensemble methods for indirect test of RF circuits: Can it bring benefits? In LATS 2019-20th IEEE Latin American test symposium (no. 1). http://dx.doi.org/10.1109/LATW.2019.8704641.
Baldwin, J. R., Manthei, J. M., Rothbart, H., & Harris, R. B. (1971). Causes of delay in the construction industry. Journal of the Construction Division, 177–187, Available at: https://cedb.asce.org/CEDBsearch/record.jsp?dockey=0018302. (Accessed 23 April 2020).
Bartholomew, S. H. (2001). Construction contracting: business and legal principles. Prentice Hall.
Bhatnagar, R., Kim, J., & E. Many, J. (2014). Candidate surveys on program evaluation: Examining instrument reliability, validity and program effectiveness. American Journal of Educational Research. Science and Education Publishing Co. Ltd, 2(8), 683–690. http://dx.doi.org/10.12691/education-2-8-18.
Blanco, J. L., et al. (2018). Artificial intelligence: Construction technology's next frontier. Mckinsey & Company, (April), 1–8. Available at: https://www.mckinsey.com/industries/capital-projects-and-infrastructure/our-insights/artificial-intelligence-construction-technologys-next-frontier.
Chen, G. X., et al. (2019). Investigating the causes of delay in grain bin construction projects: the case of China. International Journal of Construction Management. Taylor and Francis Ltd., 19(1), 1–14. http://dx.doi.org/10.1080/15623599.2017.1354514.
Delice, A. (2001). The sampling issues in quantitative research. Educational Sciences: Theory & Practices, 10(4), 2001–2019.
Dietterich, Thomas G. (2000). Ensemble methods in machine learning. In Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), (pp. 1–15). http://dx.doi.org/10.1007/3-540-45014-9_1.
Dormann, C. F., et al. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography. Blackwell Publishing Ltd, 36(1), 27–46. http://dx.doi.org/10.1111/j.1600-0587.2012.07348.
Durbarry, R. (2019). Quantitative research. Research Methods for Tourism Students, (August), 98–113. http://dx.doi.org/10.4324/9780203703588-9.
Easterby-Smith, M., et al. (2008). Management research: theory and practice. cloudgene.uibk.ac.at. Available at: https://cloudgene.uibk.ac.at/l2xq3pav9aoy/12-alysha-lowe/read-9781847871770-management-research-theory-and-practice-paperbac.pdf. (Accessed 24 September 2020).
Egan, S. J. (2018). Rethinking the Report of the Construction Task Force, Construction Task Force, Available at: http://constructingexcellence.org.uk/wp-content/uploads/2014/10/rethinking_construction_report.pdf.
Egwim, C. N., et al. (2021). Extraction of underlying factors causing construction projects delay in Nigeria. Journal of Engineering, Design and Technology, http://dx.doi.org/10.1108/jedt-04-2021-0211, ahead-of-p(ahead-of-print).
Enshassi, A., Al Najjar, J., & Kumaraswamy, M. (2009). Delays and cost overruns in the construction projects in the Gaza Strip. Journal of Financial Management of Property and Construction, 14(2), 126–151. http://dx.doi.org/10.1108/13664380910977592.
European Commission (2017). European construction sector observatory. (p. 27). (June).
Flick, U. (2014). An introduction to qualitative research. Available at: https://us.sagepub.com/en-us/nam/an-introduction-to-qualitative-research/book240398.
Flyvbjerg, B. (2014). What you should know about megaprojects and why: An overview. Project Management Journal, 45(2), 6–19. http://dx.doi.org/10.1002/pmj.21409.
Gliem, J. A., & Gliem, R. R. (2003). Calculating, interpreting, and reporting Cronbach's alpha reliability coefficient for Likert-type scales. In 2003 Midwest research to practice conference in adult, continuing, and community education (1992), (pp. 82–88). http://dx.doi.org/10.1109/PROC.1975.9792.
Gondia, A., et al. (2020). Machine learning algorithms for construction projects delay risk prediction. Journal of Construction Engineering and Management. American Society of Civil Engineers (ASCE), 146(1), Article 04019085. http://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0001736.
Goyal, M. (2019). Artificial intelligence: A tool for hyper personalization. International Journal of 360 Management Review, 07, 2320–7132.
Haq, S., et al. (2017). Effects of delay in construction projects of punjab-Pakistan: An empirical study effects of delay in construction projects of punjab-Pakistan. Journal of Basic and Applied Scientific Research, 3(January 2014), 87–96.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Ensemble learning. (pp. 1–20). http://dx.doi.org/10.1007/b94608_16.
James, E., Joe, W., & Chadwick, C. (2001). Organizational research: Determining appropriate sample size in survey research. Information Technology, Learning, and Performance Journal, 19(1), 43–50.
Ji, Y., et al. (2018). Assessing and prioritising delay factors of prefabricated concrete building projects in China. Applied Sciences, 8(11), 2324. http://dx.doi.org/10.3390/app8112324, (Switzerland). MDPI AG.
Kira, K., & Rendell, L. A. (1992). A practical approach to feature selection. In Machine learning proceedings 1992 (pp. 249–256). Elsevier, http://dx.doi.org/10.1016/b978-1-55860-247-2.50037-1.
Kumar, R. D. (2016). Causes and effects of delays in construction industry. International Research Journal of Engineering and Technology, 3(4), 1831–1837, Available at: www.irjet.net.
Mahfouz, T., & Kandil, A. (2012). Litigation outcome prediction of differing site condition disputes through machine learning models. Journal of Computing in Civil Engineering, 26(3), 298–308. http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000148.
Marks, M. (2017). Construction: The next great tech transformation Voices Michael Marks.
Matthews, B., Ross, L., & Ellison, N. (2010). Research methods: A great starting point for students and would-be social researchers. Available at: www.pearsoned.co.uk/matthews. (Accessed 24 September 2020).
Morgan, D. H. J. (1980). Sociological paradigms and organisational analysis. Sociology, 14(2), 332–333. http://dx.doi.org/10.1177/003803858001400219.
Motaleb, O., & Kishk, M. (2010). An investigation into causes and effects of construction delays in UAE. In Association of Researchers in Construction Management, ARCOM 2010 - Proceedings of the 26th Annual Conference (pp. 1149–1157). Available at: https://www.researchgate.net/publication/266174953. (Accessed 24 April 2020).
Nunnally, J. C. (1978). Psychometric theory (p. 640).
Odeyinka, H., & Adebayo, Y. (1997). The causes and effects of construction delays on completion cost of housing projects in Nigeria. Journal of Financial Management of Property and Construction, 2(3), 31–44, Available at: https://www.researchgate.net/publication/249643683. (Accessed 24 April 2020).
Opitz, D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research. Morgan Kaufmann Publishers, 11, 169–198. http://dx.doi.org/10.1613/jair.614.
Owalabi, J. D., et al. (2014). Causes and effects of delay on project construction delivery time. International Journal of Education and Research, 2(4), 197–208, Available at: www.ijern.com. (Accessed 24 April 2020).
Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830, Available at: http://scikit-learn.sourceforge.net. (Accessed 7 January 2021).
Poh, C. Q. X., Ubeynarayana, C. U., & Goh, Y. M. (2018). Safety leading indicators for construction sites: A machine learning approach. Automation in Construction, 93, 375–386. http://dx.doi.org/10.1016/j.autcon.2018.03.022, Elsevier B.V.
Rhodes, C. (2019). Construction industry: statistics and policy (01432), (pp. 1–13). House of Commons Library.
Robinson, T. G. (2015). Global construction market to grow $8 trillion by 2030: driven by China, US and India. Global Construction, 44.
Royal Society, T. (2017). Machine learning: The power and promise of computers that learn by example.
Saunders, M., et al. (2009). Research methods for business students (5th ed.). Pearson Education Limited, Available at www.pearsoned.co.uk. (Accessed 24 September 2020).
Seni, G., & Elder, J. F. (2010). Synthesis lectures on data mining and knowledge discovery: vol. 2, Ensemble methods in data mining: improving accuracy through combining predictions (1), (pp. 1–126). Morgan & Claypool Publishers LLC, http://dx.doi.org/10.2200/s00240ed1v01y200912dmk002.
Shah, R. K. (2016). An exploration of causes for delay and cost overruns in construction projects: Case study of Australia, Malaysia & Ghana. Journal of Advanced College of Engineering and Management, 2, 41. http://dx.doi.org/10.3126/jacem.v2i0.16097, Nepal Journals Online (JOL).
Sullivan, A., & Harris, F. C. (1986). Delays on large construction projects. International Journal of Operations & Production Management, 6(1), 25–33. http://dx.doi.org/10.1108/eb054752.
Wang, T. (2018). China: construction industry's contribution share to GDP 2018–2021 | statista. Statista. Available at: https://www.statista.com/statistics/1068213/china-construction-industry-gdp-contribution-share/. (Accessed 18 April 2020).
Wang, T. (2019). Value added of U.S. construction as a percentage of GDP 2018. Statista. Available at: https://www.statista.com/statistics/192049/value-added-by-us-construction-as-a-percentage-of-gdp-since-2007/. (Accessed 18 April 2020).
Wei, G., et al. (2020). A novel hybrid feature selection method based on dynamic feature importance. Applied Soft Computing, 93, Article 106337. http://dx.doi.org/10.1016/j.asoc.2020.106337.
Woetzel, J., et al. (2017). Bridging infrastructure gaps has the world made progress?. McKinsey Global Institute, (October), p. 10. Available at: https://www.mckinsey.com/~/media/McKinsey/Industries/CapitalProjectsandInfrastructure/OurInsights/Bridginginfrastructuregapshastheworldmadeprogress/BridginginfrastructuregapsHowhastheworldmadeprogressv2/MGI-Bridging-infrastructure-gaps.
Yaseen, Z. M., et al. (2020). Prediction of risk delay in construction projects using a hybrid artificial intelligence model. Sustainability, 12(4), 1514. http://dx.doi.org/10.3390/su12041514.
Zhang, H. (2004). The optimality of naive Bayes naive Bayes and augmented naive Bayes. Aa, 1(2), 3, Available at: www.aaai.org. (Accessed 27 May 2021).
Zou, Z., & Ergan, S. (2019). Leveraging data driven approaches to quantify the impact of construction projects on urban quality of life. (pp. 1–31). arXiv preprint arXiv:1901.09084. Available at: https://arxiv.org/abs/1901.09084. (Accessed 8 May 2020).

12 pages
Ecosystems Architecture
From Everand
Ecosystems Architecture
Philip Tetlow
No ratings yet
Quality infrastructure for smart mini-grids
From Everand
Quality infrastructure for smart mini-grids
International Renewable Energy Agency (IRENA)
No ratings yet
Monthly People
From Everand
Monthly People
Sung-rae Park
No ratings yet
Guidelines for Wind Resource Assessment: Best Practices for Countries Initiating Wind Development
From Everand
Guidelines for Wind Resource Assessment: Best Practices for Countries Initiating Wind Development
Asian Development Bank
No ratings yet
Production of Ogi
No ratings yet
Production of Ogi
10 pages
Design and Construction of A Smart Door With Embedded Spy Camera
No ratings yet
Design and Construction of A Smart Door With Embedded Spy Camera
9 pages
Comparative Studyon Education Funding
No ratings yet
Comparative Studyon Education Funding
13 pages
Wagner Proposition in Nigeria: An Econometric Analysis: Heliyon August 2020
No ratings yet
Wagner Proposition in Nigeria: An Econometric Analysis: Heliyon August 2020
11 pages
Improving Energy Efficiency and Reducing Emissions through Intelligent Railway Station Buildings
From Everand
Improving Energy Efficiency and Reducing Emissions through Intelligent Railway Station Buildings
Asian Development Bank
No ratings yet
Training Facility Norms and Standard Equipment Lists: Volume 2---Mechatronics Technology
From Everand
Training Facility Norms and Standard Equipment Lists: Volume 2---Mechatronics Technology
Fook Yen Chong
No ratings yet
Development of An Electronic Weighing Indicator For Digital Measurement
No ratings yet
Development of An Electronic Weighing Indicator For Digital Measurement
8 pages
Technical Aspects Related to the Design and Construction of Engineered Containment Barriers for Environmental Remediation
From Everand
Technical Aspects Related to the Design and Construction of Engineered Containment Barriers for Environmental Remediation
IAEA
No ratings yet
Foundational Models and Architectures S1: Generative AI, #1
From Everand
Foundational Models and Architectures S1: Generative AI, #1
Leaster Startx
No ratings yet
Comparative Analysis of The Tensile Strength of Bamboo and Reinforcement Steel Bars As Structural Member in Building Construction
No ratings yet
Comparative Analysis of The Tensile Strength of Bamboo and Reinforcement Steel Bars As Structural Member in Building Construction
7 pages
Cloud Computing: Master the Concepts, Architecture and Applications with Real-world examples and Case studies
From Everand
Cloud Computing: Master the Concepts, Architecture and Applications with Real-world examples and Case studies
Ruchi Doshi
No ratings yet
Artificial Intelligence and Knowledge Processing: Methods and Applications
From Everand
Artificial Intelligence and Knowledge Processing: Methods and Applications
Hemachandran K.
No ratings yet
A Review On Design and Fabrication of Fuel Fired Crucible Furnace
No ratings yet
A Review On Design and Fabrication of Fuel Fired Crucible Furnace
12 pages
Fatunlaetal2022FebPublishedJIDCMalariaPaper14894 ArticleText 147289 1 10 20220315
No ratings yet
Fatunlaetal2022FebPublishedJIDCMalariaPaper14894 ArticleText 147289 1 10 20220315
11 pages
In Vitro Antioxidant Properties and Digestibility
No ratings yet
In Vitro Antioxidant Properties and Digestibility
8 pages
Endrias Ayaleneh 2013 TE Gamo
No ratings yet
Endrias Ayaleneh 2013 TE Gamo
7 pages
Agile Approaches on Large Projects in Large Organizations
From Everand
Agile Approaches on Large Projects in Large Organizations
Brian Hobbs
No ratings yet
Construction and Evaluation of a Power Inverter
No ratings yet
Construction and Evaluation of a Power Inverter
6 pages
Training and Human Resource Considerations for Nuclear Facility Decommissioning
From Everand
Training and Human Resource Considerations for Nuclear Facility Decommissioning
IAEA
No ratings yet
1-s2.0-S2468550X18300352-main
No ratings yet
1-s2.0-S2468550X18300352-main
9 pages
Steps of Cable Splicing
No ratings yet
Steps of Cable Splicing
2 pages
GSO ED203 Practicum-Educational-Management-Planning-Syllabus-2021-Final Revision 02262021
No ratings yet
GSO ED203 Practicum-Educational-Management-Planning-Syllabus-2021-Final Revision 02262021
7 pages
Introduction To Industrial Engineering-TAPEC
No ratings yet
Introduction To Industrial Engineering-TAPEC
28 pages
The Rationale For Using Artificial Intelligence
No ratings yet
The Rationale For Using Artificial Intelligence
2 pages
01 Problem Solving and Algorithm Design
No ratings yet
01 Problem Solving and Algorithm Design
27 pages
NominalRollClass09
No ratings yet
NominalRollClass09
16 pages
Sociology UPSC PYQ P1-2013-2022
No ratings yet
Sociology UPSC PYQ P1-2013-2022
46 pages
MCQ - MEd
No ratings yet
MCQ - MEd
4 pages
Western Communication Modes
No ratings yet
Western Communication Modes
3 pages
3.1. Theoretical and Conceptual Framework
No ratings yet
3.1. Theoretical and Conceptual Framework
18 pages
White and Purple Modern Artificial Intelligence Presentation - 20240727 - 142338 - 0000
No ratings yet
White and Purple Modern Artificial Intelligence Presentation - 20240727 - 142338 - 0000
9 pages
A Comparative Study of Online Learning and Traditional Learning Among Grade 11 Abm Students in St. Clare College
100% (1)
A Comparative Study of Online Learning and Traditional Learning Among Grade 11 Abm Students in St. Clare College
35 pages
Bueza, Vallejo, Flores (BPED) - Final Thesis
No ratings yet
Bueza, Vallejo, Flores (BPED) - Final Thesis
71 pages
Vijayam Junior College::Chittoor: Intermediate Text Books
No ratings yet
Vijayam Junior College::Chittoor: Intermediate Text Books
4 pages
Boxman-Shabtai, L. (2020) - Meaning Multiplicity Across Communication Subfields Bridging The Gaps.
No ratings yet
Boxman-Shabtai, L. (2020) - Meaning Multiplicity Across Communication Subfields Bridging The Gaps.
23 pages
CHAPTER 1. LIM-IT. et. al-1
No ratings yet
CHAPTER 1. LIM-IT. et. al-1
10 pages
Artificial Intelligence: Computer Science Engineering
No ratings yet
Artificial Intelligence: Computer Science Engineering
1 page
2 My DDM Research
No ratings yet
2 My DDM Research
25 pages
(Andrew Pollard) Teaching and Learning in The Prim
100% (1)
(Andrew Pollard) Teaching and Learning in The Prim
337 pages
AP EFN 4302 Week 4 Research Problem Literature Review PDF
No ratings yet
AP EFN 4302 Week 4 Research Problem Literature Review PDF
34 pages
Reflective Essay - Edited
No ratings yet
Reflective Essay - Edited
5 pages
(eBook PDF) Children and Their Development 3rd Canadian Editioninstant download
100% (4)
(eBook PDF) Children and Their Development 3rd Canadian Editioninstant download
49 pages
Parts of Plot DLP
No ratings yet
Parts of Plot DLP
4 pages
List of Online Courses For The Year 2020-21: Ugc-Human Resource Development Centre
No ratings yet
List of Online Courses For The Year 2020-21: Ugc-Human Resource Development Centre
2 pages
Jurnal Penelitian Survey
No ratings yet
Jurnal Penelitian Survey
13 pages
Copy of Time Table 2025-2026(1)
No ratings yet
Copy of Time Table 2025-2026(1)
40 pages
Reflection Paper No.2
No ratings yet
Reflection Paper No.2
2 pages
INTRODUCTION
No ratings yet
INTRODUCTION
20 pages


Applied Artificial Intelligence for Predicting Construction Projects Delay

Article in Machine Learning with Applications · September 2021

Christian Nnaemeka Egwim, Hafiz A. Alaka, Olalekan Luqman Toriola-Coker, Habeeb Balogun


Contents lists available at ScienceDirect

Machine Learning with Applications

Applied artificial intelligence for predicting construction projects delay

ARTICLE INFO ABSTRACT

1. Introduction

construction projects, resulting in great dissatisfaction from the indus-

Fig. 1. This study’s organogram.

Only a handful of studies have attempted the adoption of AI or

Table 2

Fig. 2. EMLA Prediction Architecture.

Fig. 3. Shape of Dataset’s Distribution.

Fig. 4. Correlation Matrix Plot.

Fig. 5. Standardized Dataset Distribution.

Fig. 6. Chi-squared Test.

Fig. 7c. Bagging ROC AUC Plot.

Fig. 7j. Multinomial ROC AUC Plot.

3.4. Feature selection

In a typical machine learning pipeline, feature selection is a crucial

3.5. Ensemble machine learning technique

This technique involves the use of multiple algorithms where the

Bagging is mathematically expressed by the following formula: Thus,

Fig. 8. Bagging Decision Boundaries.
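
To make the bagging idea named in this section concrete, the following is a minimal sketch rather than the authors' formulation or code. It assumes the common textbook form of bagging: each base learner is trained on a bootstrap resample of the training set, and the ensemble predicts by majority vote. The base learner (a one-feature decision stump), the helper names `fit_stump`, `fit_bagging` and `predict_bagging`, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump (one feature, one threshold) by exhaustive search."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one made-up risk score per project, label 1 = delayed.
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(X, y)
low_risk = predict_bagging(models, [0])
high_risk = predict_bagging(models, [10])
```

Averaging many learners trained on resampled data is intended to reduce variance relative to a single learner. In practice a library implementation with deeper trees as base estimators would normally be used instead of this hand-written stump.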

Fig. 9. Boosting Decision Boundaries.

Fig. 10. Naïve Bayes Decision Boundaries.

Fig. 11. DT as base estimator.
