To cite this article: T. Q. D. Pham, T. Le-Hong & X. V. Tran (2021): Efficient estimation and
optimization of building costs using machine learning, International Journal of Construction
Management, DOI: 10.1080/15623599.2021.1943630
ABSTRACT

This study provides a fast and accurate Machine learning (ML) and optimization framework, which allows a quick estimate for building costs, hence improving the operational efficiency and competitiveness of a construction company. A dataset composed of 10,000 parametric building configurations, collected from end-to-end real-world activities in our partner company, was used to train and validate the ML models to perform multiple tasks. Among the 13 ML regression algorithms used, the Artificial Neural Network (ANN), Gradient Boosting, and XGBoost models appear to be the most suitable to estimate the building costs and the required resources with an accuracy of 99% within less than a second of training time. ANN models are also developed to identify available options of the building features under a given budget. The optimization problem under constraints is solved, helping clients determine the optimal building costs according to their preferences. Besides, the optimized building costs obtained by this study are 7% smaller than those of the actual data, hence improving the company's competitiveness. This study showcases that ML models can be efficiently used in the construction sector to optimize the workflow for cost savings and provides some practical implications for data-driven management.

KEYWORDS: Building cost; maintenance cost; machine learning; regression analysis; neural networks; operational efficiency
help to streamline operations, improve design and engineering cost savings as well as risk management efficiency.
4. Finally, in order to take into account the very diverse preferences of customers, an optimization tool under constraints needs to be developed to estimate the building costs in different scenarios.

It should be noted that these four tasks can be done and continuously improved by the current practices. This paper aims to demonstrate how a construction company can exploit the available data to perform the above-mentioned tasks with better efficiency.

Supervised machine learning (ML) models, based on self-learning algorithms using a set of training data, are widely used in different technology sectors (Kim et al. 2004; 2004; 2013). As discussed in detail in Section literature review, the application of ML for building cost prediction using the available real-world engineering data from a construction company has not yet become extensive in construction management and operation applications (Kim et al. 2004; 2004; 2013). Also, it appears that few papers have reported a systematic deployment of a dozen ML models to perform all the above-mentioned four tasks, to identify their suitability, and to provide some practical implications. More precisely, solving an inverse prediction model to determine the range of feature options under a given budget, and optimization problems with constraints to save costs, have not gained extensive interest. These problems are investigated in this paper in order to provide fast and accurate ML-based models that can be used to predict and optimize the building costs under different scenarios.

In this study, based on the actual data obtained from a long-established construction company, an ML-based estimation tool is developed to provide a fast and efficient estimation of the building and maintenance costs directly from the customer requests (end-to-end services). More precisely, the laborious and time-consuming steps performed by the back-end offices are to be integrated into an automatic tool that the front-end company can employ to quickly estimate the building costs and better discuss them with customers. The ML-based model is expected to link the client requests and the final cost estimation within a short time, improving the client-builder relationship. Besides, repeated designs, as well as resource estimations, can be avoided, leading to significantly enhanced work efficiency. In addition, for a set of feature constraints required by the customer preferences, the building cost needs to be minimized to increase the company's profits. Thus, the tool developed in this study will help the company save costs, increase revenues, and improve customer satisfaction.

In this paper, Section literature review is dedicated to a literature review. The data description, data processing techniques, and ML models are presented in Section dataset and methodology. In Section results and discussions, the results obtained from the ML models performing the four tasks mentioned previously, including the cost prediction for a given set of building features, the identification of the possible building features for a given budget, the prediction of the required resources for a given set of building features, and the optimization of the building costs with and without constraints on building features, are discussed in detail prior to conclusions in the last section.

Literature review

It should be noted that we focus on a medium-size construction company that is working on many relatively small projects, such as few-storey buildings, and there is constant and open interaction between the house owner and builder to finalize the building features. The purpose of the building cost prediction is to identify the end-to-end relationship between the building features (i.e., number of floors) requested by the customer and the building costs computed by the company to set the offered cost estimation. Focussing on the four tasks described in Section introduction, this section aims to review articles related to building cost estimation using ML models.

Note that the objective of this study is to predict the building cost from the construction company's perspective rather than the house price from the clients' one. However, as the building costs and the house price are somehow interlinked, both these perspectives are reviewed. In the housing sector, ML is used to predict house prices from mortgage data based on house features, for example, location, state, useful surface, garden … (Limsombunchai 2004; Park and Bae 2015; Gao et al. 2019; Madhuri et al. 2019), and mainly for the demand side once the houses are already built (Limsombunchai 2004; Gao et al. 2019). From the supply side of house building companies, advanced data analytics has recently gained interest. Le et al. (2019) performed an in-depth literature review of civil integrated management data. They concluded that data sharing and integration were critical for successfully implementing integrated civil infrastructure management.

Ghosalkar and Dhage (2018) predicted the building cost using multiple linear regression ML algorithms. The three most important factors influencing the building cost were physical conditions, concepts, and location. Their research proved that the linear regression model predicted the building cost with a mean squared error of 0.3713, ensuring a good predictive model. Besides, Bhagat et al. (2016) developed a purely statistical framework to predict efficient house pricing for real estate customers with respect to their budgets and priorities using a linear regression algorithm. Based on a dataset composed of 286 building configurations collected in the United Kingdom, Lowe et al. (2006) predicted the total client costs given the construction and client costs with an R2 (coefficient of determination) value of 0.661 using multiple linear regression. Recently, Huang and Hsieh (2020) proposed a simple linear regression framework to predict the labour cost using 19 completed Building Information Modelling (BIM) projects from a leading construction company in Taiwan. Their results indicated that the linear regression model was the most stable, compared to the Random Forest model, as the number of projects increased.

Regarding the nonlinear ML methods, Chen et al. (2017) adopted a novel approach based on the Support Vector Machine (Suykens and Vandewalle 1999) algorithm to predict residential house prices. Their results showed that the maximum accuracy (R2 value) could reach 0.72. Phan (2018) developed a framework using a polynomial regression method to forecast house prices in Australia with an accuracy (R2) of 0.84. Besides, the Artificial Neural Network (ANN) has also been used in the construction sector, for example, to predict engineering service costs (Matel et al. 2019) or to classify the delay risk in tall building projects (Sanni-Anibire et al. 2020). These studies suggested that the ANN model can reach a high accuracy of more than 93%.

As for the boosting-based ML algorithms, Ceh et al. (2018) evaluated Random Forest's performance versus that of multiple linear regression when predicting the prices of 7407 apartments using data from 2008-2013. The authors concluded that the Random Forest outperformed the other regression method on all performance measures. The Random Forest has also been used to predict
the building cost in many studies, as in (Afonso et al. 2019; Hong et al. 2020; Jui et al. 2020). Several frameworks related to building cost prediction were recently developed using the XGBoost method (Chen et al. 2015). Peng et al. (2019) used this technique to analyze 35,417 pieces of data captured by the Chengdu HOME LINK network. The results indicated that the XGBoost prediction accuracy was the highest, with an R2 value reaching about 0.93. In a similar study, Truong et al. (2020) concluded that XGBoost appeared to be the best algorithm to predict the house price.

It should be noted that the above-mentioned studies (Suykens and Vandewalle 1999; Lowe et al. 2006; Chen et al. 2015; Bhagat et al. 2016; Chen et al. 2017; Ceh et al. 2018; Ghosalkar and Dhage 2018; Phan 2018; Afonso et al. 2019; Matel et al. 2019; Peng et al. 2019; Hong et al. 2020; Huang and Hsieh 2020; Jui et al. 2020; Sanni-Anibire et al. 2020; Truong et al. 2020) mainly focussed on the estimation of the building costs using data from one or a few internal services, without going through an end-to-end workflow from customer requests to the offered building price. Regarding cost optimization used to improve profits, few studies have been reported in the literature. Kravanja and Zula (2010) presented a simultaneous cost, topology, and standard cross-section optimization of industrial steel building structures using a mixed-integer nonlinear programming approach. Similarly, Risbeck et al. (2015) developed a framework to optimize the combined building heating/cooling equipment cost. Besides, Lesic et al. (2017) proposed a modular energy cost optimization for buildings with an integrated microgrid. The results showed a considerable cost saving of modular energy in different configurations.

Based on this review, it appears that the use of advanced data analytics for real-world data collected from the end-to-end activities of a construction company to perform the four tasks described in Section introduction has not been extensively studied. Also, most of the published works focussed on predictive models to estimate the costs from the building features. However, limited attention has been given to an inverse approach that identifies the features under a given budget. Besides, the optimization of the building costs performed in the literature was usually subjected to constraints on the input, which will be considered in this paper. Additionally, our study aims to demonstrate the suitability of different ML regression methods and optimization under constraints using real-world data from the end-to-end activities of a construction company in order to provide some practical implications for construction management.

Dataset and methodology

Data description

In this study, a dataset composed of 10,000 parametric building configurations collected from the end-to-end activities of a construction company was used to train, validate, and optimize ML models in order to estimate building costs and required resources. The dataset consists of 24 variables divided into three specific groups, including building features, costs, and required resources. The building features, denoted by the vector X, have 11 features named X1, X2, … , X11. They consist of key characteristics of a building, which are provided by the client, such as the number of floors, the building depth, the width of windows, the floor height, etc. The costs, denoted by the vector Z, have 2 components, Z1 and Z2, corresponding to the building and maintenance cost per unit surface, respectively. The required resources, denoted by the vector Y, are composed of 11 parameters, named Y1, Y2, … , Y11. Internal to the company, these variables represent the amount of cement, sand, etc., estimated by the company, which serve to calculate costs and prepare the necessary resources for construction.

The actual dataset is included in the Appendix. This dataset is a matrix of 10,000 lines and 24 columns formed by the standardized data. The first 11 columns represent the independent variables of the building features X. The next two columns represent the costs Z, which are functions of the building features X. The last 11 columns are the required resources Y estimated by different company departments, which are also functions of X. Each line in the data represents a configuration of the building to be built, the associated costs estimated beforehand, and the required resources. They were obtained across the internal services and departments of the company.

Figure 1. Statistical distributions of the variables X, Z, and Y from the actual dataset.

Figure 1 shows the distributions of the variables X, Z, and Y from the actual dataset. Note that Figure 1 is presented in logarithmic scale without normalization, and the negative values of X1 are excluded. As can be seen in the figure, each feature
exhibits a different distribution. These differences could be very sensitive and considerably affect the algorithms that consider the Euclidean distance, such as the K-means algorithm. Additionally, this significantly increases the ML model's training time, in particular that of the ANN.

Regression frameworks

Figure 2 represents the workflow for the ML regression framework used in this study. It includes data processing, the ML regression algorithm, and model evaluation. The actual data, defined as those collected from the company without any modification, were first analyzed using several processing methods, such as data normalization, data checking, and Principal Component Analysis (PCA) (Wold et al. 1987). They were then split into training and test datasets with a ratio of 80% and 20%, respectively. The training dataset was used as the input for the ML models, which were then validated using the test dataset. During this process, cross-validation and parameter tuning were performed to improve/maximize the model accuracy.

Data processing

Data normalization

As shown in Figure 1, due to the diverse intrinsic characteristics and properties of the features X from the actual data, their scales are relatively large, with different orders of magnitude and statistical distributions. This significantly slows down the computation process and negatively affects the model performance. In order to deal with these issues, the data of each column in the actual dataset were normalized to the range of 0 and 1 using the following equation:

X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}    (1)

where X_scaled is the normalized value of each variable of X from the actual dataset, and X_max and X_min represent its actual maximum and minimum values. A similar normalization was performed on the Y and Z variables. The distributions of the normalized data are shown in Figure 3. In this paper, the normalized values were used for the training, testing, and validation processes, as well as for result illustration, except for the optimization task under constraints presented in Section optimization of the costs Z under constraints on the building features X.

Data checking and Principal Component Analysis

Figure 4 shows the correlation coefficients between the building features X before and after applying the Principal Component Analysis (PCA), using Pearson correlation analysis. It can be seen that without the PCA, some features are strongly correlated, in particular X9 or X10 with other features, with absolute values of the correlation coefficient of 0.4 or 0.5, as shown in Figure 4. The correlations
Figure 4. Correlation matrices for the building features X before (left) and after (right) applying the PCA by Pearson correlation analysis.
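As a rough illustration of this data-processing step, the min-max normalization of Eq. (1) and the Pearson correlation check visualized in Figure 4 could be reproduced with pandas and scikit-learn along the following lines. This is a minimal sketch, not the authors' original script: the file name and the column labels X1–X11 are assumptions, since the published dataset carries no explicit variable labels.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical loading of the standardized 10,000 x 24 dataset described in the
# Appendix; file name and column labels are assumed for illustration only.
data = pd.read_csv("building_dataset.csv")        # columns X1..X11, Z1, Z2, Y1..Y11
feature_cols = [f"X{i}" for i in range(1, 12)]

# Eq. (1): rescale each column to [0, 1] using its own minimum and maximum.
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(data[feature_cols]),
                        columns=feature_cols)

# Pearson correlation matrix of the building features (cf. Figure 4, left panel).
corr = X_scaled.corr(method="pearson")
print(corr.round(2))
```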
Table 1. Analysis results of the PCA method for the dataset (the values higher than 0.5 are marked in bold).
                         PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8
X1                       7e-5     2.1e-4   8.9e-4   1.1e-3   8.1e-5   6.9e-3   1.9e-3   1.2e-3
X2                       2.5e-4   7.8e-4   1.5e-3   7.6e-4   5.1e-3   1.1e-3   0.02     0.012
X3                       -0.94    0.23     0.012    5.6e-3   0.12     0.01     0.10     0.15
X4                       0.28     0.87     0.021    4.7e-3   0.13     5e-3     0.044    0.28
X5                       0.073    0.086    -0.55    0.7      0.37     0.051    0.15     0.11
X6                       0.079    0.077    -0.55    -0.7     0.38     0.074    0.14     0.12
X7                       0.014    0.055    0.082    0.061    0.19     0.74     0.49     0.21
X8                       0.015    0.053    0.074    0.065    0.21     -0.66    0.61     0.19
X9                       5.9e-3   0.024    0.043    1.7e-5   0.11     0.042    0.37     0.20
X10                      0.054    0.049    0.46     4.1e-3   0.52     0.027    0.15     0.68
X11                      0.10     0.039    0.32     3.2e-3   -0.55    0.033    0.38     0.52
Variance explained (%)   62       18       9        7        2        0.7      0.5      0.3
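The PCA reduction summarized in Table 1 could be sketched with scikit-learn as below. This is an illustrative reconstruction under the assumption that the normalized feature matrix X_scaled from the previous sketch is available; the choice of 8 retained components follows the reduction from 11 to 8 features reported later for the optimization task.

```python
from sklearn.decomposition import PCA

# Decorrelate the 11 building features and keep 8 principal components.
pca = PCA(n_components=8)
X_pca = pca.fit_transform(X_scaled)

# Percentage of variance explained by each component
# (compare with the last row of Table 1).
print((100 * pca.explained_variance_ratio_).round(1))

# The loadings in pca.components_ map components back to the original
# 11 features, which is how optimized candidates can later be re-expressed
# in the original feature space.
X_back = pca.inverse_transform(X_pca)
```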
X8. A more detailed analysis of each feature's influence on the building costs will be shown in Section prediction of the building and maintenance costs Z as a function of the features X and Section prediction of the required resources Y as a function of the building features X.

Machine learning regression algorithms

In this study, 13 ML algorithms were applied to identify the most suitable models to estimate the costs Z and the required resources Y with respect to the building features X. As a basic ML algorithm, regression is a training process to label the input independent variables with the output dependent variables. It becomes a supervised learning problem and can be solved using various ML algorithms. The next section provides a brief description of the three families of ML regression algorithms.

Linear regression-based ML models
Linear regression is a supervised learning algorithm that uses a linear approach for a prediction problem. In the context of this study, the values of the independent features X are associated with those of the dependent variables Z by a linear relationship as follows:

Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_{11} X_{11}    (3)

As mentioned above, Z may be either the building cost per unit surface Z1 or the maintenance cost per unit surface Z2. The total budget (Z1+Z2) is also considered. Four linear regression-based algorithms, namely Linear Regression (Seber and Lee 2012), Lasso Regression (Tibshirani 1996), Ridge Regression (Hoerl and Kennard 1970), and Elastic-Net Regression (Zou and Hastie 2005), were tested in this study.

Nonlinear regression-based ML models
The nonlinear regression-based ML models h are extended versions of the linear regression-based ML models, which can be expressed as follows:

Z = h(X_1, X_2, \ldots, X_{11})    (4)

In this method, the data are fitted using a method of successive approximation for the model's hyperparameters. Decision Tree (Safavian and Landgrebe 1991), K-Nearest Neighbour (Keller et al. 1985), Support Vector Machine (Suykens and Vandewalle 1999), and Artificial Neural Network (Jain et al. 1996) were tested in this study.

Boosting regression-based ML models
Boosting regression-based ML models, or Ensemble regression models, aim to obtain a simple function with small residuals at almost every point in a sufficiently large sample. In this study, five models, namely AdaBoost (Roe et al. 2005), Extra Tree (Maier et al. 2015), Random Forest (Aldous 1993), Gradient Boosting (Friedman 2002), and XGBoost (Chen et al. 2015), were tested.

Evaluation metrics

In order to assess the performance of the building cost prediction model using ML, two error metrics, the Mean Squared Error (MSE) and the coefficient of determination R2, were used in this study. The MSE measures how close the estimated data are to the actual data by calculating the average of the squared deviation between the estimated values (obtained from the ML model) and the actual values by the following formula:

MSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}    (5)

where n, \hat{y}, and y are the number of samples, the predicted values from the regression, and the actual values, respectively. Hence, the lower the value of MSE, the better the model predicts.

The coefficient of determination R2 represents a measure of how much the model replicates the actual values on the basis of the set of various errors:

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (6-a)
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2    (6-b)
R^2 = 1 - \frac{SSE}{SST}    (6-c)

where SSE, SST, and \bar{y} represent the residual sum of squares, the total sum of squares, and the mean of the actual values, respectively. Thus, the closer to 1 the value of R2 is, the better the model predicts and the higher its accuracy.

Genetic algorithm for optimization models

A Genetic Algorithm (Goldberg 2006) is a computational method for solving constrained and unconstrained optimization problems, which is based on the natural selection process that mimics biological evolution. This technique repeatedly modifies a population of individual solutions, i.e., candidate inputs that minimize the optimization function, to search for the optimal one created through the propagation of positive traits and mutation. Each solution is rated by a target function named the fitness function; only the solutions with the highest fitness score are allowed to move on from one generation to the next. The whole algorithm can be defined as a 4-step process: population representation and initialization, objective and fitness functions, selection and crossover, and mutation, as shown in Figure 5.

Results and discussions

In this section, the results of the ML models for the different tasks are presented. First, an overview of the four tasks described in Section introduction and inspired by real-world scenarios is discussed to recall the ML models' goals. Then, the results for these four tasks are detailed. Finally, practical implications and contributions to construction management are discussed.

To develop the ML-based models in this study, Python scripts were implemented using several Python libraries, such as Numpy (numerical computing) (Van Der Walt et al. 2011), Scipy (scientific and technical computing) (Jones et al. 2001), Matplotlib (visualization) (Hunter 2007), Pandas (data loading and analysis) (McKinney 2011), Scikit-learn (ML-based regression models) (Pedregosa et al. 2011), and Keras (artificial neural networks) (Gulli and Pal 2017). The training time was obtained on an Intel Core i5 laptop equipped with 6 GB RAM and 2 CPU cores.
Figure 5. Genetic algorithm framework for the building cost optimization with and without constraints.
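A minimal sketch of the four-step loop in Figure 5 is given below, using the crossover and mutation rules quoted later in the optimization section (a child is the average of two parents, and mutation adds a random value in [-0.1, 0.1]) and a population of 100 individuals evolved over 100 generations. The function names are assumptions; in the paper the fitness function is the trained ANN predicting the cost Z from the features X.

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_minimize(cost_fn, n_features, bounds=(0.0, 1.0),
                     pop_size=100, n_generations=100):
    """Minimize cost_fn over [bounds]^n_features with a simple Genetic Algorithm.

    cost_fn maps an array of shape (n_features,) to a scalar cost; in the paper
    this role is played by the trained ANN cost model.
    """
    low, high = bounds
    # 1) Population representation and initialization.
    pop = rng.uniform(low, high, size=(pop_size, n_features))
    for _ in range(n_generations):
        # 2) Objective and fitness: a lower cost means a higher fitness.
        costs = np.array([cost_fn(ind) for ind in pop])
        order = np.argsort(costs)
        parents = pop[order[: pop_size // 2]]        # keep the fittest half
        # 3) Selection and crossover: each child averages two random parents.
        idx_a = rng.integers(0, len(parents), size=pop_size)
        idx_b = rng.integers(0, len(parents), size=pop_size)
        children = 0.5 * (parents[idx_a] + parents[idx_b])
        # 4) Mutation: add a random value in [-0.1, 0.1] to each individual.
        children += rng.uniform(-0.1, 0.1, size=children.shape)
        pop = np.clip(children, low, high)
    costs = np.array([cost_fn(ind) for ind in pop])
    best = int(np.argmin(costs))
    return pop[best], costs[best]

# Example usage with an assumed trained Keras cost model:
# best_x, best_cost = genetic_minimize(
#     lambda x: float(cost_model.predict(x[None, :], verbose=0)[0, 0]),
#     n_features=11)
```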
Overview of tasks performed by machine learning models

In this section, the four main tasks previously described in Section introduction were performed using ML-based models. The first task was to predict the building cost per unit surface (Z1) and the maintenance cost per unit surface (Z2) for a given set of features X. This task aims to provide the company with a fast and accurate tool to estimate the building costs based on the actual dataset. The second task was to identify potential options of the building features X that a client could afford for a given cost (budget) Z (= Z1 + Z2). This task aims to guide the company to the available building options to communicate with the client effectively. The third task was to predict the required resources Y for a given set of features X so that all necessary materials and labour can be stocked and planned before construction. This task aims to ensure that the risks of delays due to resource shortage can be limited, and the costs can be accurately estimated beforehand. The fourth task was to minimize the costs Z under a given set of constrained features X according to the clients' preferences. This task aims to ensure that the company's competitive but profitable prices are always offered to the client.

Prediction of the building and maintenance costs Z as a function of the features X

In order to evaluate the performance of the selected ML-based models described in Section machine learning regression algorithms, the dataset was split into two parts for training and testing. The test dataset was used for the model validation using the R2 metric in Eq. (6-c). For the Boosting methods, 2000 estimators were used during aggregation to maximize the model performance. The ANN model was built from three layers, with 11 nodes in the first layer corresponding to the input features, five nodes in the second layer, named the hidden layer, and one node in the last layer representing the cost. The ANN model architecture is shown in Figure 6. The model hyperparameters, such as the number of neurons in the hidden layers, the number of hidden layers, and the activation function, were selected using the Bayesian optimization algorithm from the scikit-learn library (Aldous 1993).

Table 2 lists the R2 metrics obtained from the 13 ML regression algorithms mentioned in Section machine learning regression algorithms. It can be seen that the Boosting regression-based models and the ANN model exhibit the best accuracy when performing the cost prediction, in particular Gradient Boosting, XGBoost, and ANN with a very high R2 value of 0.99. In other words, the building and maintenance costs, as well as the total cost, can be predicted very well for a given set of building features X. This observation is expected and can be explained by the fact that the Boosting regression techniques and the ANN model use gradient descent to minimize the error function. However, these methods need much more time to converge compared to the others. Regarding the linear regression-based models, except the multiple linear regression, which can reach an R2 of 0.94 after only about 0.002 s of training time, the Lasso, Ridge, and Elastic-Net regression models show a relatively low value of R2. This lack of accuracy may be due to the high correlation between the variables. In this case, these algorithms will only retain one variable and set the other correlated variables to zero. Thus, the dataset will lose information, resulting in lower model performance. It can be noticed that all the linear regression models converge rapidly, after only 0.002-0.003 s. This can be straightforwardly explained by the fact that solving a linear equation is much easier and less time-consuming than a nonlinear Boosting equation. In terms of accuracy and training time, the ANN model appears to be the
most suitable to predict the building costs, as shown in Table 2.

Figure 6. Architecture of the Artificial Neural Network for task 1.

Table 2. Values of R2 for different ML-based regression models.

                           R2                        Training time (s)
Regression model           Z1     Z2     (Z1+Z2)     Z1      Z2      (Z1+Z2)
Multiple Linear            0.94   0.94   0.94        0.002   0.003   0.003
Lasso                      0.13   0.12   0.13        0.003   0.003   0.004
Ridge                      0.12   0.13   0.13        0.003   0.003   0.004
Elastic-Net                0.15   0.13   0.13        0.003   0.003   0.003
Decision Tree              0.95   0.83   0.9         0.03    0.03    0.03
KNN                        0.82   0.79   0.8         0.3     0.3     0.4
SVM                        0.93   0.91   0.97        0.09    0.1     0.25
AdaBoost                   0.92   0.86   0.89        14.87   14.9    16.12
Gradient Boosting          0.99   0.99   0.99        12.08   12.0    12.86
Extra Tree                 0.97   0.92   0.95        48.9    48.2    49.5
Random Forest              0.97   0.92   0.95        44.94   46.3    47.7
XGBoost                    0.99   0.99   0.99        16.87   18.21   18.62
ANN                        0.99   0.99   0.99        2.79    2.83    2.94

Figure 7 shows the actual and predicted results using the linear regression and ANN models with cross-validation. Note that cross-validation was only performed for the linear regression and ANN models due to their best quality in both training time and accuracy. In the figure, the colour bar on the right side represents the density of points. Since the linear regression shows a lower R2 than the ANN model, cross-validation presents the same trend in the 3 cases Z1, Z2, and Z1+Z2, but with more scattering around the y = x line (dashed red line in Figure 7), where x and y represent the actual and predicted values of Z, respectively.

Figure 7. Scatter plot of the cross-validation result for (a) the linear regression models, and (b) the ANN models (the dashed red lines represent the y = x line).

As shown in Figure 7, the ANN model's predicted results are in excellent agreement with the actual data. As a result, the ANN was used to perform the remaining studies in this paper, thanks to its accuracy and fast training time. This result helps to establish the link between client requests and the final cost described in Figure 2 within a few seconds, instead of many days of work performed by different services of a construction company as in current practice. This can result in a more efficient and smoother interaction between the company and clients, enhancing customer satisfaction. Simultaneously, the costs incurred by the internal services can be avoided; the company's experience and knowledge are streamlined and exploited, leading to a better estimate of the building costs.

In order to investigate the influence of each building feature on the building and maintenance costs, the linear regression model was used to find the parameters with the highest coefficients \beta_i, as shown in Table 3. Note that the actual data were normalized using Eq. (1); thus, the coefficients \beta_i are called the normalized regression coefficients. According to statistical theory, the coefficient with the highest value corresponds to the most influential parameter. Also, a positive coefficient indicates an increasing contribution of the feature to the output, while a negative coefficient tends to decrease the output as the input feature increases. As listed in Table 3, among the 11 features, X4 and X9 exhibit the highest regression coefficient values in absolute terms, leading to the most influence on the building cost per unit surface Z1. Indeed, X4 and X9 correspond to the floor depth and the floor height, respectively. Thus, it is reasonable that these key geometrical building parameters can directly increase the building cost. In the case of the maintenance cost per unit surface Z2, according to the regression coefficient values shown in Table 3, the two most important features are X9 (floor height) and X10 (kitchen width). Similar to the building cost Z1, it appears that X4 and X9 are the two most influential features in the case of the total cost (Z1+Z2). These results are beneficial for the sales representatives when choosing the appropriate features to meet as much as possible the client's requests with a
reasonable price. Also, the client and seller can quickly identify the critical features that need to be modified, but always under a given budget. The customer's waiting time to receive the quotes after such changes can be minimized, improving customer satisfaction.

Identification of the features X for a given value of the cost (budget) Z

This task aims to perform a multi-output regression with only one independent variable (input). Three types of input variables were investigated, including the building cost per unit surface Z1, the maintenance cost per unit surface Z2, and the total cost (Z1+Z2). The ANN regression model was employed for this task because of its high quality in training time and accuracy, as shown in Section prediction of the building and maintenance costs Z as a function of the features X. In addition, it appears that the ANN model is the most suitable for this task since there is only one input variable with multiple outputs. The ANN architecture was selected as 1-3-5-7-11, and it contains five layers densely connected to one another, as shown in Figure 8. The first layer has one node representing the single input. The final layer represents the 11 features corresponding to the input features of Task 1.

The ReLU activation function (Li and Yuan 2017) and the Adam optimizer (Kingma and Ba 2014) were used in the inverse regression model for this task. The loss, computed by the Mean Squared Error (MSE) for training and validation (see Eq. (5)), is shown in Figure 9. Note that an epoch is defined as one cycle in which the network looks through the full training dataset. It can be seen that the models rapidly converge after about 150 epochs. Additionally, no significant difference in the loss can be observed for the three types of costs Z1, Z2, and (Z1+Z2) after model convergence.

Table 4 shows that the values of MSE obtained by the ANN regression model reach only about 0.0577, 0.058, and 0.0638 for Z1, Z2, and (Z1+Z2), respectively. For the training time, the models can converge after only a few tens of seconds. Besides, there is no significant difference among the three cases. The result indicates that the ANN model is capable of predicting the inverse task, consisting of a set of building features (X1 to X11) as the outputs with one input, in a short period of time. In practice, this information can allow the company to quickly select each feature's value for a given budget, then communicate it to the client. For example, under a given amount of budget, the model is able to indicate that the client can afford a building with a specific number of floors and windows, etc.

Prediction of the required resources Y as a function of the building features X

The required resources Y are critical information for company operation, which can be used to prepare the needed resources and estimate the costs for a construction project. Table 5 lists the regression results for two cases. Similar to Tasks 1 and 2 for the costs Z, shown in Sections prediction of the building and maintenance costs Z as a function of the features X and identification of the features X for a given value of the cost (budget) Z, the first case consists of predicting the required resources Y from the input features X. The inverse case aims to determine the building features X as a function of the input variables corresponding to the required resources Y. Contrary to Task 1 (see Section prediction of the building and maintenance costs Z as a function of the features X), in which the output contains only one variable, the first problem investigated in this part consists of a multi-output regression task, knowing that the number of required resources is 11, similar to the input features X. The first problem can be summarized as follows:

+ Train on the dataset {(X_1, X_2, \ldots, X_{11}), (Y_1, Y_2, \ldots, Y_{11})}, with Y_i \in R^{10000 \times 1}.
+ Predict the vector Y = (Y_1, Y_2, \ldots, Y_{11}) for a given X.

It is observed that the R2 values, describing the model accuracy, obtained by the ANN models for the two problem cases can reach 0.99 and 0.985, as shown in Table 5. This result is similar to that of the one-output regression in Task 1 for the costs Z (see Section prediction of the building and maintenance costs Z as a function of the features X). Due to the complicated structure
Figure 9. Epoch-dependent MSE loss of the training and validation for (a) Z1, (b) Z2, (c) (Z1þZ2).
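A hedged sketch of this 1-3-5-7-11 inverse network is given below in Keras, with the ReLU activation, the Adam optimizer, and a mean-squared-error loss. The epoch count, batch size, and variable names are assumptions for illustration rather than the authors' exact training setup.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 1-3-5-7-11 inverse network for Task 2: one normalized cost input
# (e.g. Z1, Z2, or Z1+Z2) mapped to the 11 building features.
inverse_model = keras.Sequential([
    layers.Input(shape=(1,)),
    layers.Dense(3, activation="relu"),
    layers.Dense(5, activation="relu"),
    layers.Dense(7, activation="relu"),
    layers.Dense(11, activation="linear"),   # the 11 features X1..X11
])
inverse_model.compile(optimizer="adam", loss="mse")

# z_train: normalized costs, shape (n, 1); X_train: normalized features, shape (n, 11).
history = inverse_model.fit(z_train, X_train, validation_split=0.2,
                            epochs=200, batch_size=64, verbose=0)
# history.history["loss"] and ["val_loss"] give the epoch-wise curves of Figure 9.
```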
Table 4. MSE and training time obtained by the ANN model for task 2: identification of the features X as a function of the costs Z.

Input case    Mean squared error (MSE)    Training time (s)
Z1            0.0577                      19.7
Z2            0.0580                      20.3
Z1+Z2         0.0638                      17.6

Table 5. ANN model's results for the two regression tasks: required resources Y vs. building features X, and the inverse problem.

                                            R2       Training time (s)
Prediction of the required resources Y      0.990    21.2
Prediction of the building features X       0.985    23.4
Table 6. Maximum and minimum values of the building features X after standardization.
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
Max 1.55 1.21 1.47 5.42 1.69 1.68 1.52 1.53 1.05 1.10 1.28
Min 1.54 1.97 3.27 0.93 2.46 2.47 1.92 1.94 3.24 4.41 2.81
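As a purely illustrative extension of the GA sketch given after Figure 5, the box constraints of the constrained cases could be handled by penalizing candidates that leave the admissible feature range, with the trained cost model acting as the fitness function. The names X_scaled, cost_model, and genetic_minimize refer to the earlier sketches and are assumptions; the exact constraint definitions used in the paper (Table 7 and the quantile trimming described below) may differ.

```python
import numpy as np

# Per-feature box constraints in the spirit of case 2: tighten the admissible
# range of every feature by 10% at both ends (an illustrative choice only).
x_min = X_scaled.min(axis=0).to_numpy()
x_max = X_scaled.max(axis=0).to_numpy()
span = x_max - x_min
low, high = x_min + 0.1 * span, x_max - 0.1 * span

def constrained_cost(x):
    # Reject candidates outside the allowed range with a large penalty,
    # otherwise evaluate the trained cost model used as the GA fitness.
    if np.any(x < low) or np.any(x > high):
        return 1e6
    return float(cost_model.predict(x[None, :], verbose=0)[0, 0])

best_x, best_cost = genetic_minimize(constrained_cost, n_features=11)
print("optimized normalized cost under constraints:", best_cost)
```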
some building features are highly correlated to one another, the Principal Component Analysis (PCA) was used in this task.

In this study, three ranges of features X were selected, corresponding to 100% (case 1, Xmin ≤ X ≤ Xmax), 80% (case 2, 1.1Xmin ≤ X ≤ 0.9Xmax), and 60% (case 3, 1.2Xmin ≤ X ≤ 0.8Xmax) of the minimum and maximum values of the actual dataset, as listed in Table 7. Case 1 can be considered equivalent to a non-constrained optimization problem. Note that this section aims to demonstrate that an optimization problem under constraints can help to take into account customers' preferences. For the sake of simplification, the building features were obtained by eliminating the top and bottom 10% quantiles from the dataset for case 2, and 20% for case 3, to facilitate discussions. For customers' real-world requirements, the constraints may be different, and solving these real requirements is straightforward.

For this task, the fitness function was obtained from the output of the ANN model that was developed in Section prediction of the building and maintenance costs Z as a function of the features X. The ANN model took the 11 building features as the input (X1 to X11). The output consists of the three types of building costs Z. As a consequence, three Genetic Algorithm frameworks were separately developed to consider the three types of building costs, including the building cost per unit surface Z1, the maintenance cost per unit surface Z2, and the total cost (Z1+Z2). The GA algorithm was performed for 100 generations, corresponding to the initial population's size. Besides, the crossover is defined as the average of each pair of individuals, and the mutation is the addition of a random value from the range [-0.1, 0.1] to each individual. Using the PCA, the number of building features was reduced from 11 to 8, as described previously in Section data processing, leading to a new matrix of 10,000 rows and 8 columns. This matrix was then multiplied by the inverse of the conjunction matrix of 11 rows × 8 columns, transforming the building features back into the original data matrix of 10,000 rows × 11 columns. Note that the lowest value of the actual cost after normalization is 0; thus, the obtained actual cost after optimization is expected to be negative in Case 1.

For case 1, the optimized values are equal to -0.12, -0.06, and -0.08 for Z1, Z2, and (Z1+Z2), respectively. Besides, the actual normalized minimum costs are 0 for all three cases. After reconversion into their original space of 11 dimensions, the minimum costs are reduced by 7%, 1%, and 1% for Z1, Z2, and (Z1+Z2), respectively. Regarding the optimal values of the building features for the costs, the value of the feature X3 is reduced compared to its actual value. Besides, the feature X4 increases in comparison with its actual value. Note that X3 denotes the number of floors, which can be considered one of the most important building features, since a lower value of X3 can lead to a lower building cost. In addition, when the number of floors X3 decreases, the depth of the building X4 should increase to ensure its balance (negative correlation between these two features, as shown in Figure 6). That is why the value of X4 increases compared to its actual value for the three types of costs. In brief, using the ML-based optimization, the minimum costs can be reduced by 7% at best compared to the actual values. This finding helps the company identify the optimal features for more profitable prices.

For case 2, the range of the features was reduced by removing the 10% of maximum and minimum values from the dataset. In this case, the actual normalized minimum costs become 0.11, 0.04, and 0.09 for the three types of costs, respectively. The optimized values are 0.04, 0.03, and 0.03 for the three costs Z1, Z2, and (Z1+Z2), respectively. As listed in Table 7, the optimization process can reduce up to 6%, 1%, and 4% of the minimum costs for Z1, Z2, and (Z1+Z2) compared to the actual values. Similar to case 1, a lower value of X3 and a higher value of X4 can lower the costs. Different from case 1, in case 2, X8 and X11 are decreased compared to their real data, while the others are increased. This may be due to the large range of devices to be maintained in the building, knowing that X8 and X11 correspond to the number of facilities. As a consequence, the building cost will be decreased.

Another range of the boundary values was considered in case 3, in which the building features were limited to 80% of the maximum and 120% of the minimum. Similar to the other cases,
the optimization result shows that a decrease in X3, X8, and X11 can reduce the three types of costs, as listed in Table 7. Besides, the optimization algorithm reduces 1%, 5%, and 2% of the actual minimum costs for Z1, Z2, and (Z1+Z2). It is noted that the (Z1+Z2) value is the total budget that the client needs to pay for owning the building. The result obtained by the optimization process will help the client select the most influential variables to reduce the building cost according to his/her budget. Similarly, the company can quickly identify the critical features to change according to the client's feature constraints. All these functions will help to improve operational efficiency and customer satisfaction, and to save costs.

In summary, about 1 to 7% of the three types of costs can be reduced in comparison with the actual values using the Genetic Algorithm with the fitness function calculated from the Artificial Neural Network. Furthermore, it shows that the more the features are constrained, the smaller the minimum cost reduction that can be achieved. This result may be explained by the fact that the minimum value is searched for within a smaller space in the case of constraints than in the space without constraints. In case 1, the highest reduction in the minimum cost is observed for Z1. However, in case 2 and case 3, Z2 is the most reduced by the optimization process. Practical constraints by an expert will be provided in the implementation phase of this project.

Practical implications and contributions to construction management

As discussed in the previous sections, this paper aims to demonstrate how ML could be used in a construction company to exploit real-world operational data. Some practical implications can be drawn from this study as follows:

- In this paper, four tasks are identified by asking the right and interesting questions stemming from real-world activities. This process is of great importance to assure that the dataset and suitable algorithms can be selected to deliver the expected value of the data projects. Regularly identifying the pain points and opportunities for improvement can help the company figure out the tasks with a potentially good return on investment.
- It is shown in this paper that once a sizable dataset of high quality from end-to-end activities is available, ML is a powerful tool to streamline operations, save design and engineering costs, and improve customer experience. Also, the self-improvement of the ML-based models will help to integrate skills and knowledge of different services within the company to better predict the building costs when the models are implemented and more data are collected. Data collection plays a vital role in any data-driven project. The company should adopt a centralized and integrated approach to assure that both the quality and quantity of data across different services are collected, shared, and exploited.
- The paper provides a systematic application of 13 ML models and employs different optimization techniques in order to perform the four tasks stemming from real-world activities. Not only should the hyperparameters of the ML models be carefully computed, but the accurate interpretation of the prediction and performance of the results obtained from the ML models is also essential.
- The fourth task, involving the optimization of the costs (building and maintenance) Z under constrained building features X, can be extended to constraints on the required resources Y, as both variables Y and Z are functions of X. This task represents an important mission to plan potential scenarios for risk management, especially when facing a world of uncertainty, such as the labour shortage due to the pandemic or the lack of certain materials resulting from supply chain disruption.

It should be noted that the above implications may only be valid for a medium-sized partner company that builds thousands of relatively similar small projects, such as few-storey buildings, where the owner often directly contacts the company for a proposal without a need to go through a formal procurement process, and where the building is co-designed by the owner and the building company.

Conclusions

This study provides a fast and accurate Machine learning (ML) and optimization framework, which allows a quick estimate for building costs, hence improving the operational efficiency and competitiveness of a construction company.

A dataset composed of 10,000 parametric building configurations, collected from end-to-end real-world activities in the company, was used to train and validate the ML models to perform multiple tasks. First, the costs and required resources were predicted for a given set of building features. Then, the possible building features were identified for a given budget. Finally, the optimization of the building costs with feature constraints was conducted. The numerical data were carefully processed, and the Principal Component Analysis method was applied to remove the correlation of the data features. Multiple algorithms were tested to explore the suitability of ML to estimate the building costs. Among the 13 ML regression algorithms used, the Artificial Neural Network, Gradient Boosting, and XGBoost models appear to be the most suitable to estimate the building costs and the required resources with an accuracy of 99% within less than a second of training time. The number of floors, the floor depth, the floor height, and the kitchen's width are found to be the most influential in estimating the building costs. Artificial Neural Network models were also developed to identify options that a client can afford under a given budget. The optimization problem under constraints was successfully solved, helping clients determine the optimal building costs according to their preferences. The optimized building costs obtained by the optimization framework using ML models are 7% smaller than those of the actual data, hence improving the company's competitiveness.

This study showcases that ML models can be efficiently used in the construction sector to optimize the workflow for cost savings and provides some practical implications for data-driven construction management.

For future study, the optimization with a constrained set of required resources will be performed in order to exploit the datasets in depth. Also, further investigation on the sensitivity of the ML models will be performed.

Acknowledgement

The support of Thu Dau Mot University for this project is greatly appreciated.
Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Afonso B, Melo L, Oliveira W, Sousa S, Berton L. 2019. Housing prices prediction with a deep learning and random forest ensemble. In Anais do XVI Encontro Nacional de Inteligência Artificial e Computacional. SBC; p. 389–400.
Aldous D. 1993. The continuum random tree II. Ann Probab. 167:248–289.
Bhagat N, Mohokar A, Mane S. 2016. House price forecasting using data mining. IJCA. 152(2):23–26.
Ceh M, Kilibarda M, Lisec A, Bajat B. 2018. Estimating the performance of random forest versus multiple regression for predicting prices of the apartments. IJGI. 7(5):168.
Chen JH, Ong CF, Zheng L, Hsu SC. 2017. Forecasting spatial dynamics of the housing market using support vector machine. Int J Strategic Prop Manag. 21(3):273–283.
Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H. 2015. Xgboost: extreme gradient boosting. R package version 0.4-2, 1(4).
Friedman JH. 2002. Stochastic gradient boosting. Comput Stat Data Anal. 38(4):367–378.
Gao G, Bao Z, Cao J, Qin AK, Sellis T, Wu Z. 2019. Location-centered house price prediction: a multi-task learning approach. arXiv preprint arXiv:1901.01774.
García de Soto B, Agustí-Juan I, Joss S, Hunhevicz J. 2019. Implications of construction 4.0 to the workforce and organizational structures. Int J Constr Manag. 1–13. DOI: 10.1080/15623599.2019.1616414.
Ghosalkar NN, Dhage SN. 2018. Real estate value prediction using linear regression. In 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). IEEE; p. 1–5.
Goldberg DE. 2006. Genetic algorithms. USA: Springer.
Gulli A, Pal S. 2017. Deep learning with Keras. Birmingham, England: Packt Publishing Ltd.
Hoerl AE, Kennard RW. 1970. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 12(1):55–67.
Hong J, Choi H, Kim WS. 2020. A house price valuation based on the random forest approach: the mass appraisal of residential property in South Korea. Int J Strategic Prop Manag. 24(3):140–152.
Huang CH, Hsieh SH. 2020. Predicting BIM labor cost with random forest and simple linear regression. Autom Constr. 118:103280.
Hunter JD. 2007. Matplotlib: a 2D graphics environment. IEEE Ann Hist Comput. 9(03):90–95.
Jain AK, Mao J, Mohiuddin KM. 1996. Artificial neural networks: a tutorial. Computer. 29(3):31–44.
Jones E, Oliphant T, Peterson P. 2001. SciPy: open source scientific tools for Python.
Jui JJ, Molla MI, Bari BS, Rashid M, Hasan MJ. 2020. Flat price prediction using linear and random forest regression based on machine learning techniques. In Embracing Industry 4.0. Singapore: Springer; p. 205–217.
Keller JM, Gray MR, Givens JA. 1985. A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst, Man Cybern. SMC-15(4):580–585.
Kim GH, An SH, Kang KI. 2004. Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning. Build Environ. 39(10):1235–1242.
Kim GH, Shin JM, Kim S, Shin Y. 2013. Comparison of school building construction costs estimation methods using regression analysis, neural network, and support vector machine. JBCPR. 01(01):1–7.
Kim GH, Yoon JE, An SH, Cho HH, Kang KI. 2004. Neural network model incorporating a genetic algorithm in estimating construction costs. Build Environ. 39(11):1333–1340.
Kingma DP, Ba J. 2014. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kravanja S, Zula T. 2010. Cost optimization of industrial steel building structures. Adv Eng Softw. 41(3):442–450.
Le T, Hassan F, Le C, Jeong HD. 2019. Understanding dynamic data interaction between civil integrated management technologies: a review of use cases and enabling techniques. Int J Constr Manag. 1–22. DOI: 10.1080/15623599.2019.1678863.
Lesic V, Martincevic A, Vasak M. 2017. Modular energy cost optimization for buildings with integrated microgrid. Appl Energy. 197:14–28.
Li Y, Yuan Y. 2017. Convergence analysis of two-layer neural networks with relu activation. In Advances in neural information processing systems; p. 597–607.
Limsombunchai V. 2004. House price prediction: hedonic price model vs. artificial neural network. In New Zealand agricultural and resource economics society conference; p. 25–26.
Lowe DJ, Emsley MW, Harding A. 2006. Predicting construction cost using multiple regression techniques. J Constr Eng Manage. 132(7):750–758.
Madhuri CR, Anuradha G, Pujitha MV. 2019, March. House price prediction using regression techniques: A comparative study. In 2019 IEEE International Conference on Smart Structures and Systems (ICSSS); p. 1–5.
Maier O, Wilms M, von der Gablentz J, Krämer UM, Münte TF, Handels H. 2015. Extra tree forests for sub-acute ischemic stroke lesion segmentation in MR sequences. J Neurosci Methods. 240:89–100.
Matel E, Vahdatikhaki F, Hosseinyalamdary S, Evers T, Voordijk H. 2019. An artificial neural network approach for cost estimation of engineering services. Int J Constr Manag. 1–14. DOI: 10.1080/15623599.2019.1692400.
McKinney W. 2011. pandas: a foundational Python library for data analysis and statistics. Python High Performance Scientific Comput. 14(9):1–9.
Park B, Bae JK. 2015. Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data. Expert Syst Appl. 42(6):2928–2934.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, … Duchesnay E. 2011. Scikit-learn: Machine learning in Python. J Machine Learn Res. 12:2825–2830.
Peng Z, Huang Q, Han Y. 2019. Model research on forecast of second-hand house price in Chengdu based on xgboost algorithm. In 2019 IEEE 11th International Conference on Advanced Infocomm Technology (ICAIT). IEEE; p. 168–172.
Phan TD. 2018. Housing price prediction using machine learning algorithms: The case of Melbourne city, Australia. In 2018 International Conference on Machine Learning and Data Engineering (iCMLDE). IEEE; p. 35–42.
Rardin RL. 1998. Optimization in operations research (Vol. 166). Upper Saddle River, NJ: Prentice Hall.
Risbeck MJ, Maravelias CT, Rawlings JB, Turney RD. 2015. Cost optimization of combined building heating/cooling equipment via mixed-integer linear programming. In 2015 American Control Conference (ACC). IEEE; p. 1689–1694.
Roe BP, Yang HJ, Zhu J, Liu Y, Stancu I, McGregor G. 2005. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl Instrum Methods Phys Res, Sect A. 543(2-3):577–584.
Safavian SR, Landgrebe D. 1991. A survey of decision tree classifier methodology. IEEE Trans Syst, Man, Cybern. 21(3):660–674.
Sanni-Anibire MO, Zin RM, Olatunji SO. 2020. Machine learning model for delay risk assessment in tall building projects. Int J Constr Manag. 1–10. DOI: 10.1080/15623599.2020.1768326.
Seber GA, Lee AJ. 2012. Linear regression analysis. Vol. 329. New Jersey, USA: John Wiley & Sons.
Sha'ar KZ, Assaf SA, Bambang T, Babsail M, Fattah AAE. 2017. Design–construction interface problems in large building construction projects. Int J Constr Manag. 17(3):238–250.
Suykens JA, Vandewalle J. 1999. Least squares support vector machine classifiers. Neural Process Lett. 9(3):293–300.
Tepeli E, Taillandier F, Breysse D. 2019. Multidimensional modelling of complex and strategic construction projects for a more effective risk management. Int J Constr Manag. 1–22. DOI: 10.1080/15623599.2019.1606493.
Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J Stat Soc Ser B Methodol. 58(1):267–288.
Truong Q, Nguyen M, Dang H, Mei B. 2020. Housing Price Prediction via Improved Machine Learning Techniques. Procedia Comput Sci. 174:433–442.
Van Der Walt S, Colbert SC, Varoquaux G. 2011. The NumPy array: a structure for efficient numerical computation. Comput Sci Eng. 13(2):22–30.
Wold S, Esbensen K, Geladi P. 1987. Principal component analysis. Chemometrics Intell Lab Syst. 2(1-3):37–52.
Zou H, Hastie T. 2005. Regression shrinkage and selection via the elastic net, with applications to microarrays. J Royal Statistical Soc B. 67(2):301–320.

Appendix

Due to competitive advantage constraints and confidentiality concerns, the attached dataset was standardized, and no explicit labels of the variables are listed. Interested readers are invited to contact the authors for access to the actual dataset.