Performance Evaluation of Normalization-Based CBR Models for Improving Construction Cost Estimation
Keywords: Case-based reasoning; Construction cost estimation; Data preprocessing; Normalization

Abstract

Case-based reasoning (CBR) can be an effective approach to achieve reliable accuracy in cost estimation for construction projects, especially in the early design stages where only limited information is available. As CBR relies on historical data, it is important to perform data preprocessing to obtain high-quality base cases. Normalization preprocessing gives all attributes standard scores so that they can be compared. This research examines the effects of normalization methods through performance evaluations of normalization-based CBR models to improve construction cost estimation in the early design stages. Multi-family housing complexes were used as case studies, and leave-one-out cross-validation (LOOCV) was used for model validation. The performance of the CBR models was evaluated using the mean absolute error rate (MAER), mean squared deviation (MSD), mean absolute deviation (MAD), and standard deviation (SD) for accuracy and stability. The kernel density estimation (KDE) method was used to examine the appropriateness of the normalization methods. The results are expected to contribute to the enhancement of the accuracy and stability of CBR-based cost estimation and to support decision-making. The suggested method could also be applied to other CBR areas, such as energy prediction, noise management, bid decision-making, and scheduling, as well as to other data-oriented methods, such as regression analysis and artificial neural networks.
1. Introduction

A successful one-off construction project can be achieved by estimating construction cost with a high level of accuracy, which is crucial in the conceptual and schematic design stages due to the influence on cost reduction [1–3]. A considerable amount of building cost is required with the increased expectations for higher quality with lower or stable budgeting, so construction costs need to be accurately estimated and managed as one of the key decision-making elements [4,5]. However, minimal information is obtainable in the early design phases, unlike the detailed design and documentation stages, where precise material take-off information is available. Thus, owners and cost estimators require highly effective strategies [6].

Case-based reasoning (CBR) uses knowledge gained from past projects to resolve new problems and has become an effective problem-solving method to improve the accuracy of construction cost estimation [1–3,7–11]. There is a tendency among most people to think about problems they have encountered in the past when they are faced with new challenges [12]. People try to establish a relationship between a current problem and those they have faced in the past [13]. CBR relies heavily on past experience or data, so a quality base case needs to be prepared to obtain quality cost estimation results [14]. Historical data is the foundation of accurate cost estimation and provides credibility, accuracy, and defensibility [15–17].

Stiff and Mongeau [18] observed that a more familiar and reliable source will result in more comprehension and persuasion among a specific group of people. For this reason, even owners and cost estimators who do not have much experience and knowledge can build
effective strategies and be more persuasive by using CBR cost estimation [6], whereas artificial neural networks lack transparency in terms of their processes [10,19]. However, it is difficult to use advanced cost estimating models, such as statistical, CBR, expert systems, fuzzy logic rules, and machine learning techniques, in the predesign stages. Therefore, developing a more user-friendly, precise, and reliable cost model is important [20].

A quality base case is obtained through data preprocessing, which is a preliminary process that prepares secondary data to identify relationships and available patterns hidden in a large quantity of data [14]. Data preprocessing is a standard practice in data mining used for normalization and for denoising internal errors or abnormal values [21]. It is always important first to ensure normalization or standardization in CBR cost estimation so that attribute similarity is assessed under identical standards [22,23]. Normalization refers to the adjustment of all collected data to standardized values assigned to specific ranges, such as 0.0 to 1.0. This allows for the comparison of corresponding standardized values for different datasets in a way that eliminates the effect of certain gross influences [24,25]. The units used to measure each attribute can distort the data and result in unnecessarily complicated interactions by causing some dimensions to be very different from others. This can impact the accuracy of a data-oriented model due to the input effect of a large value [26]. Therefore, normalization is a necessary process in CBR that can influence the estimation results [22,27].

This research examines the effects of normalization methods applied to CBR cost models based on a hypothesis: CBR cost estimation accuracy and stability can be improved by employing statistically accurate normalization methods. CBR cost estimation models were compared using five different normalization methods: interval standardization, Gaussian distribution-based normalization, z-score normalization, logistic function-based normalization, and ratio standardization. Euclidean distance was used as a similarity measurement, and genetic algorithms were used as an attribute weight-assigning method. The cost models are intended for the early design stages (i.e., conceptual and schematic design), and the cost data was constructed using public multi-family housing.

The research process is as follows. First, a comprehensive literature review of normalization in CBR cost estimation and the CBR method was conducted. Second, the characteristics of the normalization methods are explained. Finally, the CBR cost models were developed and validated using the leave-one-out cross-validation (LOOCV) method. The mean absolute error rate (MAER), mean squared deviation (MSD), and mean absolute deviation (MAD) were used as performance measurements for accuracy, and the standard deviation (SD) was used to show the stability of the models [28]. The kernel density estimation (KDE) method was performed for the appropriate selection of normalization methods.

2. Normalization in CBR cost estimation

2.1. Case-based reasoning

CBR is a process of solving new problems using solutions that have been used in the past to solve similar problems and provide references for decision-makers [29]. CBR can be seen as a four-part cycle [30]: 1) retrieving the most similar cases; 2) reusing the cases to resolve new issues; 3) revising the suggested solution; and 4) retaining the cases and findings for future use. According to Leake [31], retrieved information is vital in predicting the values of target features of a new problem. The method has proven effective in solving complex problems with many alternative solutions.

In CBR systems, memory is seen as forming the basis of learning capability [13]. A CBR system has the potential to learn without looking into particular formulas, cases, symbolic representations, or rules [32]. This type of demand-driven approach has many advantages [26,33], which include improved user acceptance, incremental learning, reduced problem-solving effort, and easier knowledge acquisition. Leake [31] asserts that CBR is more effective than rule-based systems, which are useful when only one or a few solutions to a problem are possible. Eisenstadt and Althoff [34] discussed an overview of modifications of the CBR cycle and emphasized the potential of integrating CBR systems with artificial intelligence.

2.2. Literature reviews on normalization in CBR cost estimation

In CBR cost estimation, data preprocessing is a preliminary process that is often used for working out vital or meaningful relationships and patterns hidden within a large quantity of information [35–37]. Thus, normalization is crucial in data mining, which comprises a variety of processing procedures aimed at preparing raw data for further processes, including the actual estimations [21,25]. Normalization in this context refers to the adjustment of values measured on different scales to a notionally common scale. The process of normalization involves the conversion of all collected data into standardized values [38], and the different methods each focus on achieving a different goal. Ji et al. [6] suggested a cost estimation model for building projects using CBR and used z-score normalization to standardize errors when population attributes are known. This normalization method can be applied to normally distributed populations. In statistics, the standard score (also referred to as the z-score, normal score, or z-value) is the number of standard deviations by which an observation or datum is above the mean.

According to Sevgi et al. [3], the median (Med) method is similar to the total count (TC) method, in which unit counts are divided by the total number of mapped reads associated with their lane and multiplied by the mean total count across all samples of the dataset. However, in the Med method, the total counts are replaced by the median counts that are different from 0 in the computation of the normalization factors. The method by Sevgi et al. proved instrumental in instances where decision trees are used to determine attribute weights in a case-based model of early cost prediction.

Feature scaling is a normalization method used to standardize the range of independent attributes of data. Koo et al. [11] developed a construction cost prediction model with improved prediction capacity using an advanced CBR approach. The method was used to deal with challenges posed by a wide range of raw data values. This data normalization procedure was carried out during the data preprocessing step and was intended to standardize features that had a broad range of values so that each of the values can contribute proportionately to the investigated final attribute.

Shalabi et al. [21] examined different methods of normalization with an induction decision tree (ID3) using Hue Saturation Value (HSV) data-testing procedures. Three normalization methods were tested: the z-score method, min-max, and decimal point normalization. The results showed that min-max was the best method for the training dataset with regard to the accuracy and efficiency of the whole HSV dataset. However, the test of normalization methods was limited to the use of training data and the data structure, and the types of attributes were not clearly stated. Therefore, tests of normalization methods need further examination based on real construction-project data using CBR methods.

As summarized in Table 1, very limited research on CBR cost estimation has emphasized the importance of applying a normalization method. Most previous studies were limited to stating whether normalization was performed or not, and some did not indicate which normalization methods were used. Overall, there is little in-depth examination of which normalization methods are more reliable in terms of accuracy and stability when they are applied to CBR cost estimation. The statistical appropriateness of normalization methods also needs to be further discussed. Therefore, more comparative research is needed.
Table 1
Literature reviews on normalization in CBR cost estimation (revised from [6]).

Du and Bormann [19]. Objective: quantity take-off in proposal development. Project type: power plant; year of construction: 1993–2010; cases for model validation: 47; cases for attribute selection: 1 (20 crafts); attributes: 12; scale type: nominal, ratio. Weighting method: Sobol's total sensitivity. Normalization: applied; method: N/A; reason for method: N/A; method comparison: N/A.

Kim and Hong [39]. Objective: cost estimation for railroad-bridge construction projects in the planning phase. Project type: railroad bridge; year: 1998–2009; cases for model validation: 134; cases for attribute selection: 5; attributes: 8; scale type: nominal, ratio. Weighting method: GA. Normalization: N/A; method: N/A; reason for method: N/A; method comparison: N/A.

Ji et al. [2]. Objective: develop a case adaptation method for construction cost estimation. Project type: military barrack; year: 2004–2008; cases for model validation: 129; cases for attribute selection: 13; attributes: 18; scale type: nominal, ratio. Weighting method: GA. Normalization: applied; method: STANDARDIZE & NORMDIST functions (Excel); reason for method: to organize data for more efficient access; method comparison: N/A.

Jin et al. [9]. Objective: MRA-based revised CBR cost prediction model. Project type: business facility, multi-family housing; year: not reported; cases for model validation: 59 (multi-family), 31 (business); cases for attribute selection: 40 (multi-family), 10 (business); attributes: 10; scale type: nominal, ratio. Weighting method: MRA. Normalization: N/A; method: N/A; reason for method: N/A; method comparison: N/A.

Ji et al. [40]. Objective: develop a military facility cost estimation system. Project type: military barrack; year: 2004–2009; cases for model validation: 422; cases for attribute selection: 10; attributes: 9–18; scale type: nominal, ratio. Weighting method: GA. Normalization: applied; method: probability density function; reason for method: N/A; method comparison: N/A.

Koo et al. [23]. Objective: CBR-based hybrid model for cost and duration estimation. Project type: multi-family housing; year: 2000–2005; cases for model validation: 101; cases for attribute selection: not reported; attributes: 20; scale type: nominal, ratio. Weighting method: GA. Normalization: applied; method: original value/maximum value; reason for method: to analyze under identical standards; method comparison: N/A.

Kim and Kim [10]. Objective: preliminary cost estimation. Project type: bridge; year: 2000–2005; cases for model validation: 585; cases for attribute selection: 30; attributes: 5; scale type: nominal, ratio. Weighting method: GA. Normalization: N/A; method: N/A; reason for method: N/A; method comparison: N/A.

An et al. [41]. Objective: CBR cost estimation model using AHP. Project type: residential building; year: 1997–2002; cases for model validation: 540; cases for attribute selection: 40; attributes: 9; scale type: nominal, ratio. Weighting method: AHP, feature counting, gradient descent. Normalization: N/A; method: N/A; reason for method: N/A; method comparison: N/A.

Doğan et al. [1]. Objective: cost of structural system estimation. Project type: residential building; year: not reported; cases for model validation: 24; cases for attribute selection: 5; attributes: 8; scale type: nominal, ratio. Weighting method: decision tree. Normalization: N/A; method: N/A; reason for method: N/A; method comparison: N/A.

Yau and Yang [42]. Objective: cost and duration estimation for a building project. Project type: office building; year: not reported; cases for model validation: 60 (hypothetical); cases for attribute selection: 3 (hypothetical); attributes: 10; scale type: nominal, ratio. Weighting method: subjectively assigned by the authors. Normalization: N/A; method: N/A; reason for method: N/A; method comparison: N/A.
3. Normalization-based CBR cost model

3.1. Normalization methods

This part explains five normalization methods: interval standardization, Gaussian distribution-based normalization, z-score normalization, logistic function-based normalization, and ratio standardization. The characteristics, pros, and cons of each normalization method are described. These normalization methods are applied to the case base, and the normalized data are utilized in the CBR cost models in Section 3.2, CBR Cost Model Development.

3.1.1. Interval standardization method
The interval standardization method (also called score range transformation) can cover attributes of ordinal, interval, and ratio scale types and transforms the score range to exactly between 0 and 1. This method applies the theory and practice of rigorously working on a computer with certain and uncertain real numbers, which are represented as intervals. The method involves the appropriate use of standard floating-point calculations, direct floating-point calculations, and interval arithmetic [43]. These three computations are combined in a way that gives reasonable enclosures of the results at an acceptable cost. However, this type of normalization method also has a disadvantage in that the transformed scores are not proportional to the original data. Thus, relative distances among the original data are not preserved. The method also requires much attention, especially with the use of interval arithmetic, so the results might be invalid [43].

r_{ij} = (x_{ij} - x_j^{min}) / (x_j^{max} - x_j^{min}) for a benefit criterion; r_{ij} = (x_j^{max} - x_{ij}) / (x_j^{max} - x_j^{min}) for a cost criterion; r_{ij} = |x_{ij} - T| / max{x_j^{max} - T, T - x_j^{min}} for a desired value T. (1)

3.1.2. Gaussian distribution-based normalization
Gaussian distribution normalization can be used to describe physical events when the number of events is very large. A Gaussian distribution is a continuous probability distribution that approximates the exact binomial distribution of events [44]. Gaussian distribution normalization has other names, such as normal distribution. Because of its curved flaring shape, the Gaussian distribution is also referred to as a "bell-shaped curve." According to Havil [44], the biggest advantage of Gaussian distribution-based normalization is its many convenient properties, such as the normal sum distribution and the normal difference distribution. However, there is always an unfortunate tendency to invoke normal distributions in situations where they may not be applicable [45].

Standard score z_{ij} = (x_{ij} - \bar{x}_j) / s_j, where \bar{x}_j is the mean and s_j the standard deviation of attribute j. (2)

r_{ij} = \Phi(z_{ij}), where \Phi(\cdot) is the cumulative distribution function of the standard normal distribution.

3.1.3. Z-score normalization
Subtracting the mean of an attribute from a raw score and dividing the difference by the standard deviation yields a dimensionless quantity, which is the standard score [46]. This whole conversion process is referred to as normalizing or standardizing. According to Carroll and Carroll [47], the strong point of z-score normalization is that it can work out prediction intervals. As z-score normalization is based on the standard deviation and not the range, normalized attributes are less affected by outliers. However, the weakness of z-score normalization is its ineffectiveness in cases where the population parameters are unknown or estimated. Each attribute needs to have a normal or at least a symmetric distribution to properly use the z-score method. The value v of attribute A is normalized to V by computing:

V = (v - \mu_A) / \sigma_A, where \mu_A is the mean and \sigma_A the standard deviation. (3)

3.1.4. Logistic function-based normalization
The logistic function is mainly applied in logistic regression, where it is used to model how the probability of an event may be affected by one or more explanatory attributes. The logistic function is also instrumental in the Rasch model, which is used in item response theory [48]. After computing the standard score of each attribute, normalization is performed using the values of the logistic function that correspond to the standard scores. However, the coefficient in a logistic function changes according to the logit. Thus, the interpretation of normalized values using a logistic function is more complicated [49].

Normalized score r_{ij} = L(z_{ij}) = 1 / (1 + e^{-z_{ij}}), where L(\cdot) is the logistic function. (4)

3.1.5. Ratio standardization
Ratio standardization, or maximum score transformation, is the simplest linear transformation method. In the context of score transformation, ratio standardization is a procedure for converting raw scores into transformed scores. The main advantage is that the proportional properties of the original data still remain. In other words, the normalized values preserve the relative distances of the original data. According to Bod et al. [48], the procedure serves crucial purposes, such as giving meaning to the scores, thus allowing for easy interpretation. Direct comparison of two scores is also possible. The two main types of transformation are percentile ranks and linear transformation. However, the disadvantage is that the minimum value of the transformed data is not 0, which causes difficulties for interpretation when both positive and negative values coexist in the original data.

r_{ij} = x_{ij} / x_j^{max} for benefit criteria; r_{ij} = x_j^{min} / x_{ij} for cost criteria. (5)
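To make Eqs. (1)–(5) concrete, the following Python sketch applies each normalization method to a single attribute column. The function names, the restriction of Eq. (1) to benefit/cost criteria, and the use of the population standard deviation are illustrative assumptions rather than part of the original models.

```python
import math

def interval_norm(x, benefit=True):
    """Interval standardization, Eq. (1) (benefit/cost criteria only)."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) if benefit else (hi - v) / (hi - lo) for v in x]

def z_score(x):
    """Standard scores, Eqs. (2) and (3): subtract the mean, divide by the SD."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]

def gaussian_norm(x):
    """Gaussian distribution-based normalization: r = Phi(z), the standard normal CDF."""
    return [0.5 * (1.0 + math.erf(z / math.sqrt(2.0))) for z in z_score(x)]

def logistic_norm(x):
    """Logistic function-based normalization, Eq. (4): r = 1 / (1 + exp(-z))."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(x)]

def ratio_norm(x, benefit=True):
    """Ratio standardization, Eq. (5): x / max for benefit criteria, min / x for cost criteria."""
    return [v / max(x) if benefit else min(x) / v for v in x]

# Example on a hypothetical gross floor area column (values in m2):
# ratio_norm([45000.0, 52000.0, 60000.0])  ->  [0.75, 0.8666..., 1.0]
```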
3.2. CBR cost model development

The methods applied for developing CBR cost models differ between studies [7,8,40,42,50,51], so the details of the CBR cost model need to be elaborated to satisfy the particular research aims, such as the normalization, similarity measurement, and attribute weight assignment methods. The development of the CBR cost models (Fig. 1) is explained in three sections: 1) data acquisition and analysis, 2) Euclidean distance-based similarity measurement, and 3) genetic algorithm-based attribute weight assignment.

Fig. 1. Development of the normalization-based CBR cost models: data normalization (interval, Gaussian, z-score, logistic, ratio) into a normalized database; attribute assessment; measuring attribute and case similarity; computing the average cost of the k-nearest neighbor cases (1, 2, 5, 10-NN); and reporting the cost estimation results.

3.2.1. Data acquisition and analysis
A CBR cost model obtains quality estimation results from quality base cases. Raw data must be collected and arranged in an appropriate manner to avoid ambiguity and inconsistency [52]. Table 2 shows the data structure for the base cases. It comprises many cases, from 1 to n. Attributes are numbered from one to an indefinite value m. The relationships between the attributes and the cases give the estimated values of the project cost. Twelve attributes were used, which were selected through a first screening by a literature review and expert interviews, a second screening by correlation analysis and regression analysis of sample cases, and a third screening by expert interviews with experienced practitioners in the construction industry [8]. The 12 extracted attributes are 1) the number of households, 2) gross floor area, 3) number of unit floor households, 4) number of elevators, 5) number of floors, 6) number of pilotis with household scale, 7) number of households of unit floor per elevator, 8) height between stories, 9) depth of pit, 10) roof type, 11) hallway type, and 12) structural type.

Based on the data structure, we constructed base cases of 100 multi-family housing projects [7,8,35]. The cost data is the priced bills of quantities that are used by public owners to estimate an initial project budget and by contractors to prepare bid proposals. As shown in Fig. 2, the typical cost breakdown structure in Korea is a composition of bills of quantities organized by trades (i.e., work types), such as preliminaries, site work, interior work, stone and tile, and so forth. The work types consist of elemental work like living room wall finishing, interior ceiling finishing, bathroom floor tile work, and entrance stone and tile. The work is divided into elements that consist of a combination of material cost, labor cost, and other expenses.

Korea's housing supply legislation has enforced the use of unit gross area for multi-family housing construction, so a database could be built in accordance with unit types. The obtained cost data comprised either singular or plural types of four units (i.e., 49 m2, 59 m2, 84 m2, and 114 m2). The convenience of the data analysis and the accuracy of the cost data were improved by the separation of all historical data. We classified the data by singular-unit gross area. The cost for each unit type from the historical data was computed using equations from Ji et al. [35] and Ahn et al. [7]. To consider economic issues such as the inflation rate, the cost data was adjusted using a Korean construction cost index published by the Korea Institute of Construction Technology (KICT).

Table 2
Data structure of case base.

            Attribute 1   Attribute 2   ...   Attribute j   ...   Attribute m   Project cost
Case 1      x11           x12           ...   x1j           ...   x1m           C1
Case 2      x21           x22           ...   x2j           ...   x2m           C2
...
Case i      xi1           xi2           ...   xij           ...   xim           Ci
...
Case n      xn1           xn2           ...   xnj           ...   xnm           Cn
Project 0   x01           x02           ...   x0j           ...   x0m           C0
Weight      w1            w2            ...   wj            ...   wm

Note: (X1) number of households, (X2) gross floor area, (X3) number of unit floor households, (X4) number of elevators, (X5) number of floors, (X6) number of pilotis with household scale, (X7) number of households of unit floor per elevator, (X8) height between stories, (X9) depth of pit, (X10) roof type, (X11) hallway type, (X12) structure type.
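As a minimal illustration of how a row of Table 2 and the cost index adjustment described above can be represented, the Python sketch below may be used; the simple ratio form of the index adjustment and all numeric values are assumptions for illustration, not actual KICT indices or project data.

```python
def adjust_cost(historical_cost, index_at_construction, index_at_estimate):
    """Escalate a historical cost to the estimate date with a construction cost
    index (the KICT index in the paper); the simple ratio form is an assumption."""
    return historical_cost * index_at_estimate / index_at_construction

# One row of the Table 2 case base: the attribute values X1-X12 plus the adjusted cost.
case = {
    "attributes": [520, 48_000.0, 6, 4, 15, 2, 3, 2.9, 1.5, 1, 1, 1],   # X1 ... X12
    "cost": adjust_cost(1_150_000, index_at_construction=95.2, index_at_estimate=112.7),
}
```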
3.2.2. Euclidean distance-based similarity measurement
Similarity is defined as a quantity that denotes the strength of a relationship between two features or two objects. The purpose of a similarity analysis is to compare two lists of components and compute a single number that represents their evaluation [53]. In regard to CBR, a similarity measurement is a fundamental component that helps in problem solving, since the basic idea of the case-based argument in cost estimation is the hypothesis that similar problems have similar solutions [30]. Similarity measurements are used in the retrieval of similar cases from the case base. There are two primary retrieval
approaches in CBR. The first approach deals with the measure of case similarity by the computation of distances between two cases, whereas the other approach operates mostly with a method of representing or indexing the structures. The latter method is more suitable for text-based case applications [30].

Euclidean space is the most used type of distance measurement method and is based on the location of objects. In this method, a distance is computed as the square root of the summation of squares of the numerical differences between two analogous objects [54]. The standard Euclidean distance is the most fundamental procedure for describing the relationship between two cases, which constitutes the neighboring figures of an arbitrary case [42]. In CBR, Euclidean space is used to represent more complex symbolic representations such as assumptions [6]. Most importantly, it is possible to define the weighted Euclidean distance in the form of an equation.

Euclidean space provides an acceptable similarity measure since the concept of invariance is considered [53]. Therefore, we used weighted Euclidean distance-based similarity measurements and the equations used by Ahn et al. [8] (Eq. (6)). Eq. (6) denotes the similarity between the new case (case 0) and the retrieved case (case i) over the jth attributes:

SIM^E_{i0} = 1 - WDIS_{i0} = 1 - sqrt( \sum_j w_j (x_{ij} - x_{0j})^2 ) (6)
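A minimal Python sketch of the weighted Euclidean similarity in Eq. (6) is given below, assuming both attribute vectors have already been normalized by one of the methods in Section 3.1; the function and variable names are illustrative.

```python
import math

def weighted_euclidean_similarity(case_i, case_0, weights):
    """Eq. (6): SIM = 1 - sqrt( sum_j w_j * (x_ij - x_0j)^2 ) on normalized attributes."""
    dist = math.sqrt(sum(w * (xi - x0) ** 2
                         for w, xi, x0 in zip(weights, case_i, case_0)))
    return 1.0 - dist

# Attribute weights from Table 3, listed in X1 ... X12 order:
weights = [0.019, 0.361, 0.176, 0.004, 0.292, 0.007,
           0.041, 0.002, 0.006, 0.007, 0.024, 0.061]
```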
3.2.3. Genetic algorithm-based attribute weight assignment
Genetic algorithms offer an approach to learning that generates successor hypotheses through processes such as iterative mutation and crossover [55]. Biological evolution processes inspired this generic procedure. Genetic algorithms have been considered a valid way to solve problems that require competent and efficient searching [56]. Genetic algorithms are mostly applied in business, science, and engineering and are efficient for making improvements as they keep the computations simple. They provide a ranking criterion for potential hypotheses that is referred to as the hypothesis fitness function, so they are useful in weighting all the members of a given population [57].

Genetic algorithms are fundamental in CBR cost estimation because they make it possible to optimize the attribute weight values. Most importantly, genetic algorithms are used to examine a space of candidate solutions and isolate the best one [58]. We used the hypothesis fitness function and attribute weights (Table 3) proposed by Ji et al. [6] to search for the optimal values of \omega_i that minimize the sum of the squared deviations:

min \sum_{n=1}^{j} D_n^2, subject to [C_1, ..., C_j]^T = [X_{11} ... X_{1i}; ...; X_{j1} ... X_{ji}] [\omega_1, ..., \omega_i]^T + [D_1, ..., D_j]^T (7)

where C_j denotes the cost of the jth case project, \omega_i represents the weight value of the ith attribute, and X_{ji} denotes the ith attribute value of the jth case.

Table 3
Attribute weights.

Rank   Attribute   Weight   Cumulative
1      X2          0.361    0.361
2      X5          0.292    0.653
3      X3          0.176    0.829
4      X12         0.061    0.890
5      X7          0.041    0.931
6      X11         0.024    0.955
7      X1          0.019    0.974
8      X6          0.007    0.981
9      X10         0.007    0.988
10     X9          0.006    0.994
11     X4          0.004    0.998
12     X8          0.002    1.000

Note: (X1) number of households, (X2) gross floor area, (X3) number of unit floor households, (X4) number of elevators, (X5) number of floors, (X6) number of pilotis with household scale, (X7) number of households of unit floor per elevator, (X8) height between stories, (X9) depth of pit, (X10) roof type, (X11) hallway type, (X12) structure type.
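The following Python sketch shows one generic way such a genetic search could be set up for Eq. (7); the selection, crossover, and mutation operators, the hyperparameters, and the weight re-normalization are illustrative assumptions and not the configuration used by Ji et al. [6].

```python
import random

def fitness(weights, X, C):
    """Sum of squared deviations D_n^2 from Eq. (7), with D_n = C_n - sum_i w_i * X_ni."""
    return sum((c - sum(w * x for w, x in zip(weights, row))) ** 2
               for row, c in zip(X, C))

def ga_weights(X, C, pop_size=50, generations=200, mutation_rate=0.1, seed=0):
    """A generic GA (truncation selection, one-point crossover, random-reset mutation)."""
    rng = random.Random(seed)
    n_attr = len(X[0])

    def normalize(w):                        # keep weights non-negative and summing to 1
        total = sum(w)
        return [v / total for v in w] if total else [1.0 / n_attr] * n_attr

    population = [normalize([rng.random() for _ in range(n_attr)]) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, X, C))
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_attr)                  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:                # random-reset mutation
                child[rng.randrange(n_attr)] = rng.random()
            children.append(normalize(child))
        population = parents + children
    return min(population, key=lambda w: fitness(w, X, C))
```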
4. Model validation

4.1. Validation process and methods

Fig. 3 shows the experiment design used to examine the impacts of the normalization methods applied to the CBR cost model. The base cases were normalized by the five normalization methods and used to retrieve the similar cases of k-NN (nearest neighbors), using Euclidean distance-based similarity measures and genetic algorithms as the attribute weight-assigning method. The k-nearest neighbor principle is an approach to compare the effectiveness of different models using some specified criterion, such as accuracy estimation [59]. The principle involves using a distance measure to locate the k nearest cases with respect to the current input case [60].

Fig. 3. Experiment design: the base cases are normalized by each of the five normalization methods into separate databases, similar cases are retrieved by k-NN, and the cost estimation results are compared using the mean absolute error rate (MAER), MSD, MAD, and SD.

Next, a retrieval case is selected. This is the class selected from the majority of the k cases of the entire population. The retrieved cases of 1, 2, 5, and 10-NN were reused to estimate the average costs. Based on the retrieved cases of k-NN, the performance of the CBR cost estimation models was evaluated and compared in terms of accuracy and stability. The MAER, MSD, MAD, and SD of the cost estimation results were compared as performance indicators; they are defined in Eqs. (8)–(11).
MAER (%) = (1/n) \sum_{i=1}^{n} |\hat{c}_i - c_i| / c_i \times 100, where \hat{c}_i is the estimated or hypothetical cost. (8)

MAER is a ratio used to quantify how close the predictions or estimates are to the target data. The MAER method is a simple way to evaluate the accuracy of single sequences and is easy to comprehend and calculate. MAER is scale dependent and cannot be interrelated across series [8,28].

MSD = (1/n) \sum_{i=1}^{n} (\hat{c}_i - c_i)^2, where \hat{c}_i is the estimated or hypothetical cost. (9)

MSD can be expressed as the second moment of a given set of observations made from an arbitrary origin. If the stated origin represents the mean of the set, then the MSD is equivalent to the variance computed from the set of observations [28].

MAD = (1/n) \sum_{i=1}^{n} |\hat{c}_i - c_i|, where \hat{c}_i is the estimated or hypothetical cost. (10)

MAD is simply the average distance of all the elements in the dataset from the mean computed from the same dataset. MAD is obtained in a three-step approach: calculating the total mean, finding the absolute deviations, and computing the mean of the absolute deviation figures obtained [28]. SD is a measure used to determine the level of variation in a group of data values. SD is based on the mean. If the obtained SD is close to 0, the data values are close to the mean. A dataset whose values are dispersed over a wider range of values generates a high SD [28].

\sigma = sqrt( (1/N) \sum_{i=1}^{N} (x_i - \mu)^2 ), where \sigma is the standard deviation. (11)
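A direct Python reading of Eqs. (8)–(11) is sketched below; the absolute value in MAER follows the reconstructed form of Eq. (8), and dropping abs() would give the signed error rate instead.

```python
import math

def maer(actual, estimated):
    """Eq. (8): mean absolute error rate, in percent."""
    return 100.0 * sum(abs(e - a) / a for a, e in zip(actual, estimated)) / len(actual)

def msd(actual, estimated):
    """Eq. (9): mean squared deviation of the estimates."""
    return sum((e - a) ** 2 for a, e in zip(actual, estimated)) / len(actual)

def mad(actual, estimated):
    """Eq. (10): mean absolute deviation of the estimates."""
    return sum(abs(e - a) for a, e in zip(actual, estimated)) / len(actual)

def sd(values):
    """Eq. (11): population standard deviation."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
```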
The LOOCV method, based on the base cases of 100 multi-family housings, was used to validate the CBR cost models. In statistics, there are different techniques to validate models and to assess the ability of a data analysis to generalize to an independent set of data. Cross-validation is one of the most widely used models in predictions and estimations [28]. LOOCV is a type of k-fold cross-validation in which k is equal to the number of data points in the given set (N). The model is evaluated using the calculated average error, thus making it easy to make predictions using a broad range of datasets. Compared to calculating a typical residual error, computing the LOOCV error is simpler and faster [28].
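Combining the pieces above, a minimal LOOCV loop for the k-NN reuse step might look as follows; it reuses the weighted_euclidean_similarity, maer, msd, and mad sketches from the earlier sections, and the case dictionary layout is an assumption.

```python
def knn_estimate(target, cases, weights, k):
    """Reuse step: average the costs of the k cases most similar to the target (Eq. (6))."""
    ranked = sorted(cases, reverse=True,
                    key=lambda c: weighted_euclidean_similarity(c["attributes"],
                                                                target["attributes"], weights))
    return sum(c["cost"] for c in ranked[:k]) / k

def loocv(cases, weights, k):
    """Leave-one-out cross-validation: estimate each case from the remaining ones."""
    actual, estimated = [], []
    for i, held_out in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        actual.append(held_out["cost"])
        estimated.append(knn_estimate(held_out, others, weights, k))
    return actual, estimated

# Hypothetical usage over the four neighborhood sizes compared in the paper:
# for k in (1, 2, 5, 10):
#     a, e = loocv(normalized_cases, weights, k)
#     print(k, maer(a, e), msd(a, e), mad(a, e))
```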
To examine the appropriateness of the normalization methods, kernel density estimation (KDE) is performed. KDE is a way to estimate a probability density function. KDE is a non-parametric technique and is used mostly for random variables. KDE is an essential tool in data smoothing, which enables population inferences to be made from a finite data sample [61]. Kernel density estimators are related to histograms but are superior. KDE is a preferred technique because it alleviates estimation problems encountered when using histograms. The results using histograms are not smooth in comparison to those of KDE. Histograms depend on the end points and widths of the bins, but with kernel density estimators the resulting curves are smooth, have no end points, and depend greatly on the bandwidth [62].
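For illustration, a basic Gaussian-kernel density estimator is sketched below; the kernel choice and the fixed bandwidth are assumptions, and in practice a data-driven bandwidth selector such as the Sheather-Jones method [61] would be used.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate of the sample as a callable density."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density

# Hypothetical usage on one normalized attribute column:
# f = gaussian_kde(ratio_norm(gross_floor_areas), bandwidth=0.05)
# f(0.8)  ->  estimated density at 0.8
```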
4.2. Results and discussions

As shown in Fig. 4 and Table 4, the MAER displayed some distinctive characteristics that arose from the different normalization methods. The MAER calculated for the z-score method takes negative values, whereas the other normalization methods yield positive values. The ratio standardization method had the lowest MAER values for k = 1, 2, 5, and 10 (refer to Appendices 1–4 for the error rates of the cost estimation of 1, 2, 5, and 10-NN). Lower values of MAER indicate a higher accuracy of the CBR model, and the ratio standardization-based CBR cost model appeared to be the most accurate.

Fig. 4. MAER, MSD, MAD, and SD of the interval, Gaussian, logistic, and ratio normalization-based CBR cost models for k = 1, 2, 5, and 10.

Table 4
Results of mean absolute error ratio (MAER).

           k = 1     k = 2     k = 5     k = 10
Interval   0.163     0.177     0.211     0.259
Gaussian   0.308     0.313     0.355     0.425
Z-score   -0.497    -0.340    -0.080    -0.183
Logistic   0.141     0.142     0.150     0.157
Ratio      0.076     0.082     0.089     0.095

Other patterns of results were obtained for the MSD (Fig. 4 and Table 5). The results using the z-score method were plotted on the positive plane, while the other values were plotted immediately above the 0 point. The interval, Gaussian, logistic, and ratio normalization-based CBR cost models had very low MSD, which means that these methods gave highly precise cost estimation.

Table 5
Results of mean squared deviation (MSD).

           k = 1     k = 2     k = 5     k = 10
Interval   0.009     0.009     0.010     0.009
Gaussian   0.014     0.012     0.012     0.013
Z-score    0.306     0.220     0.225     0.210
Logistic   0.009     0.008     0.007     0.007
Ratio      0.006     0.005     0.006     0.005
As shown in Fig. 4 and Table 6, the MAD values of the z-score model were the highest and ranged from approximately 0.286 to 0.298 for all values of k. The MAD values resulting from the other normalization methods ranged from 0.043 to 0.078 for 1, 2, 5, and 10-NN. The lowest MAD values were obtained using the ratio normalization method. Overall, the normalization methods other than the z-score method had relatively low MAD values, which means that these methods had very narrow ranges of estimate errors.

Table 6
Results of mean absolute deviation (MAD).

           k = 1     k = 2     k = 5     k = 10
Interval   0.051     0.057     0.061     0.067
Gaussian   0.072     0.073     0.076     0.078
Z-score    0.298     0.293     0.297     0.286
Logistic   0.055     0.057     0.058     0.058
Ratio      0.043     0.046     0.049     0.051

The results obtained from the computation of the SD were relatively stable when using the Gaussian, interval, logistic, and ratio normalization-based CBR models and ranged from 0.071 to 0.117 for all values of k (Fig. 4 and Table 7). However, the z-score model had relatively high values of SD. The results from the z-score model suggested that the dataset values were dispersed over a wider range. In contrast, the other models indicated that the datasets were closer to the mean, which represents more stable CBR-based cost estimate models.

Table 7
Results of standard deviation (SD).

           k = 1     k = 2     k = 5     k = 10

As illustrated in Fig. 5, density estimations were conducted using the different methods. In regard to the original score, the methods displayed different graph trends. For example, when using the Gaussian model, the density obtained was under-smoothed because the bandwidth was too small, with six modes. Similarly, in the case of the logistic form, the bandwidth was increased, so the estimate was flatter, with three modes. This situation was considered over-smoothed since the bandwidth was too large, and most of the data structure was obscured [61].
However, the graphs using the ratio and interval normalizations provided optimally smooth kernel estimates with fewer modes. In this case, the value of the bandwidth minimized the error between the estimated density and the actual density [61,63]. The bandwidth of the kernel density estimation acts as a smoothing parameter that has a strong effect on the estimate results and controls the trade-off between bias and variance in the result. A large bandwidth might cause a very smooth density distribution (i.e., high bias), whereas a small bandwidth might bring about an unsmooth density distribution (i.e., high variance). There is an ongoing dispute about selecting the optimal bandwidth [64–66].

To summarize, in terms of estimate accuracy, ratio normalization yielded the lowest MAER, MSD, and MAD. Thus, the ratio normalization method is considered to be more accurate in retrieving similar cases. Regarding estimate stability, we also obtained a lower SD for ratio standardization than for the other normalization methods. Consequently, the results verified the hypothesis that CBR cost estimation accuracy and stability can differ and can be improved by employing statistically accurate normalization methods. In terms of the appropriate selection of normalization methods, the kernel density estimation results showed that interval and ratio normalization can be more appropriate. Thus, this experiment showed that appropriate normalization methods can be identified for a CBR cost model.

5. Conclusions

Case-based reasoning can be an effective approach to achieve a high level of accuracy for construction cost estimation in the early design stages. As CBR relies on past historical data, it is important to perform data preprocessing to obtain high-quality base cases. The preprocessing of raw data has a positive effect on the estimation results of a CBR cost model [2,35]. The highlights of the validation results can be summarized as follows:

• In terms of estimate accuracy, ratio normalization yielded the highest accuracy in terms of MAER, MSD, and MAD.
• In terms of estimate stability, ratio normalization also obtained the highest stability.
• The experiment results confirmed that the MAER, MSD, MAD, and SD can vary according to the normalization method.
• In terms of the appropriateness of normalization methods, the kernel density estimation results demonstrated that interval and ratio normalization can be appropriate methods.
• The ratio normalization-based CBR cost model was superior to its model counterparts for multi-family housing projects.

Since the comparative experiments were conducted using public multi-family housings, various types of buildings need to be used as base cases for generalization of the models. Comparative experiments with the same hypothesis also need to be performed using various similarity measurements and attribute weight assignment methods. In addition, a silhouette method, a graphical display for cluster analysis, needs to be further applied to evaluate clustering validity and to select the optimal number of clusters when applying k-NN [67].

Nevertheless, this research can be viewed as an accurate and appropriate approach to dealing with the data-mining processes that affect the degree of reliability of models' estimation results. The results are expected to contribute to the enhancement of the accuracy, stability, and appropriateness of CBR-based cost estimation by applying the proposed performance evaluation criteria for normalization method selection. Eventually, this improved CBR cost model can support decision-making based on a data-preprocessed case base in practice. The suggested method could also be applied to other CBR areas, such as energy prediction, noise management, bid decision-making, and scheduling, as well as to other data-oriented methods, such as regression analysis and artificial neural networks.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment

This research was supported by the Institute of Construction and Environmental Engineering and the Institute of Engineering Research at Seoul National University. This work was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1F1A1058866).

Appendix 1. MAER cost estimation results of the different normalization methods when 1-NN is used for each of the 99 cases
Appendix 2. MAER cost estimation results of the different normalization methods when 2-NN is used for each of the 99 cases
Appendix 3. MAER cost estimation results of the different normalization methods when 5-NN is used for each of the 99 cases
Appendix 4. MAER cost estimation results of the different normalization methods when 10-NN is used for each of the 99 cases
References

[1] S.Z. Doğan, D. Arditi, H.M. Günadın, Determining attribute weights in a CBR model for early cost prediction of structural system, J. Constr. Eng. Manag. 132 (10) (2006) 1092–1098, https://doi.org/10.1061/(ASCE)0733-9364(2006)132:10(1092).
[2] S.H. Ji, M. Park, H.S. Lee, Case adaptation method of case-based reasoning for construction cost estimation in Korea, J. Constr. Eng. Manag. 138 (1) (2012) 43–52, https://doi.org/10.1061/(ASCE)CO.1943-7862.0000409.
[3] Z.D. Sevgi, D. Arditi, G. Murat, Using decision trees for determining attribute weights in a case-based model of early cost prediction, J. Constr. Eng. Manag. 134 (2) (2008) 146–152, https://doi.org/10.1061/(ASCE)0733-9364(2008)134:2(146).
[4] M.H. Kim, Cost Planning in Architecture, Kimoondang, Seoul, 9788970867632, 2005.
[5] R. Kirkham, Ferry and Brandon's Cost Planning of Buildings, 9th ed., Wiley-Blackwell, 978-1-119-96862-7, 2014.
[6] S.H. Ji, M. Park, H.S. Lee, Cost estimation model for building projects using case-based reasoning, Can. J. Civ. Eng. 38 (5) (2011) 570–581, https://doi.org/10.1139/l11-016.
[7] J. Ahn, S.H. Ji, M. Park, H.S. Lee, S. Kim, S.W. Suh, The attribute impact concept: applications in case-based reasoning and parametric cost estimation, Autom. Constr. 43 (2014) 195–203, https://doi.org/10.1016/j.autcon.2014.03.011.
[8] J. Ahn, M. Park, H.S. Lee, S.J. Ahn, S.H. Ji, K.S. Song, B.S. Son, Covariance effect analysis of similarity measurement methods for early construction cost estimation using case-based reasoning, Autom. Constr. 81 (2017) 254–266, https://doi.org/10.1016/j.autcon.2017.04.009.
[9] R.Z. Jin, K.M. Cho, C.T. Hyun, M.J. Son, MRA-based revised CBR model for cost prediction in the early stage of construction projects, Expert Syst. Appl. 39 (2012) 5214–5222, https://doi.org/10.1016/j.eswa.2011.11.018.
[10] K. Kim, K. Kim, Preliminary cost estimation model using case-based reasoning and genetic algorithms, J. Comput. Civ. Eng. 24 (6) (2010) 499–505, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000054.
[11] C.W. Koo, T.H. Hong, C.T. Hyun, The development of a construction cost prediction model with improved prediction capacity using the advanced CBR approach, Expert Syst. Appl. 38 (7) (2011) 8597–8606, https://doi.org/10.1016/j.eswa.2011.01.063.
[12] B. Ozorhon, L. Dikmen, M. Birgonul, Case-based reasoning model for international market selection, J. Constr. Eng. Manag. 132 (9) (2006) 940–948, https://doi.org/10.1061/(ASCE)0733-9364(2006)132:9(940).
[13] R.C. Schank, A. Kass, C.K. Riesbeck, Inside Case-Based Explanation, Lawrence Erlbaum Associates, Hillsdale, NJ, 9780805810295, 1994.
[14] S.B. Kotsiantis, D. Kanellopoulos, P.E. Pintelas, Data preprocessing for supervised learning, Int. J. Comput. Sci. 1 (2) (2006) 111–117 (ISSN 1306-4428).
[15] GAO, GAO Cost Estimating and Assessment Guide: Best Practices for Developing and Managing Capital Program Costs, GAO-09-3SP, Washington, D.C., 9781437917024, 2009.
[16] ISPA, Parametric Estimating Handbook, 4th ed., ISPA, Vienna, VA, 0-9720204-7-0, 2008.
[17] S.H. Ji, J. Ahn, E.B. Lee, Y.G. Kim, Learning method for knowledge retention in CBR cost models, Autom. Constr. 96 (2018) 65–74, https://doi.org/10.1016/j.autcon.2018.08.019.
[18] J.B. Stiff, P.A. Mongeau, Persuasive Communication, Guilford Press, New York, 9781572307025, 2003.
[19] J. Du, J. Bormann, Improved similarity measure in case-based reasoning with global sensitivity analysis: an example of construction quantity estimating, J. Comput. Civ. Eng. 28 (6) (2014) 04014020, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000267.
[20] O.S. Idowu, K.C. Lam, Web-based application for predesign cost planning of vertical building envelopes, Autom. Constr. 106 (2019), https://doi.org/10.1016/j.autcon.2019.102909.
[21] L.A. Shalabi, Z. Shaaban, B. Kasasbeh, Data mining: a preprocessing engine, J. Comput. Sci. 2 (9) (2006) 735–739 (ISSN 1549-3636).
[22] A. Attig, P. Perner, The problem of normalization and a normalized similarity measure by online data, Trans. Case-Based Reason. 4 (1) (2011) 3–17 (ISBN 978-3-942952-09-5).
[23] C.W. Koo, T.H. Hong, C.T. Hyun, K.J. Koo, A CBR-based hybrid model for predicting a construction duration and cost based on project characteristics in multi-family housing projects, Can. J. Civ. Eng. 37 (2010) 739–752, https://doi.org/10.1139/L10-007.
[24] S.J. Ahn, Statistical Decision Theory, Freeacademy, 978-89-7338-633-8, 2007.
[25] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 1-55860-901-6, 2006.
[26] H. Li, Fuzzy Neural Intelligent Systems: Mathematical Foundation and the Applications in Engineering, CRC Press, 9781420057997, 2000.
[27] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, AAAI Press, 978-0262560979, 1996.
[28] K. Black, Business Statistics: For Contemporary Decision Making, John Wiley & Sons, 978-1118749647, 2014.
[29] J. Liu, H. Li, M. Skitmore, Y. Zhang, Experience mining based on case-based reasoning for dispute settlement of international construction projects, Autom. Constr. 97 (2019) 181–191, https://doi.org/10.1016/j.autcon.2018.11.006.
[30] A. Aamodt, E. Plaza, Case-based reasoning: foundational issues, methodological variations and system approaches, AI Commun. 7 (1) (1994) 39–59, https://doi.org/10.3233/AIC-1994-7104.
[31] D. Leake, Case-Based Reasoning: Experience, Lessons, and Future Directions, AAAI Press/MIT Press, Menlo Park, NJ, 0-262-62110-X, 1996.
[32] C. Globig, K.P. Jantke, S. Lange, Y. Sakakibara, On case-based learnability of languages, N. Gener. Comput. 15 (1) (1997) 59–83, https://doi.org/10.1007/BF03037560.
[33] D.W. Aha, L.A. Breslow, H. Muñoz-Avila, Conversational case-based reasoning, Appl. Intell. 14 (1) (2001) 9–32, https://doi.org/10.1023/A:1008346807097.
[34] V. Eisenstadt, K.D. Althoff, Overview of the 4R CBR Cycle Modification, http://ceur-ws.org/Vol-2454/paper_68.pdf, 2019.
[35] S.H. Ji, M. Park, H.S. Lee, Data preprocessing-based parametric cost model for building projects: with case studies of Korean construction projects, J. Constr. Eng. Manag. 136 (8) (2010) 844–853, https://doi.org/10.1061/(ASCE)CO.1943-7862.0000197.
[36] H. Liu, H. Metoda, Instance Selection and Constructive Data Mining, Kluwer, Boston, MA, 978-1-4757-3359-4, 2001.
[37] D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann Publishers, Los Altos, CA, 978-1558605299, 1999.
[38] S.K. Pal, S.C.K. Shiu, Foundations of Soft Case-Based Reasoning, Wiley Interscience, Hoboken, NJ, 978-0471086352, 2004.
[39] B.S. Kim, T.H. Hong, Revised case-based reasoning model development based on multiple regression analysis for railroad bridge construction, J. Constr. Eng. Manag. 138 (1) (2012) 154–162, https://doi.org/10.1061/(ASCE)CO.1943-7862.0000393.
[40] S.H. Ji, M. Park, H.S. Lee, J. Ahn, N. Kim, B. Son, Military facility cost estimation system (MilFaCE) using case-based reasoning in Korea, J. Comput. Civ. Eng. 25 (3) (2011) 218–231, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000082.
[41] S.H. An, G.H. Kim, K.I. Kang, A case-based reasoning cost estimating model using experience by analytic hierarchy process, Build. Environ. 42 (2007) 2573–2579, https://doi.org/10.1016/j.buildenv.2006.06.007.
[42] N.J. Yau, J.B. Yang, Case-based reasoning in construction management, Comput. Aided Civ. Infrastruct. Eng. 13 (1998) 143–150, https://doi.org/10.1111/0885-9507.00094.
[43] A. Neumaier, Vienna Proposal for Interval Standardization, Universität Wien, Wien, 2008, http://www.mat.univie.ac.at/~neum/papers.html#1788.
[44] J. Havil, Gamma: Exploring Euler's Constant, Princeton University Press, Princeton, NJ, 9781400832538, 2010.
[45] J.K. Patel, C.B. Read, Handbook of the Normal Distribution, 2nd ed., CRC Press, New York, 9780824793425, 1996.
[46] E. Kreyszig, Advanced Engineering Mathematics, 4th ed., Wiley, 978-0471021407, 1979.
[47] S.R. Carroll, D.J. Carroll, Statistics Made Simple for School Leaders: Data-Driven Decision Making, R&L Education, New York, 978-0810843226, 2002.
[48] R. Bod, J. Hay, S. Jannedy, Probabilistic Linguistics, MIT Press, Cambridge, MA, 9780262025362, 2003.
[49] N.A. Gershenfeld, The Nature of Mathematical Modeling, Cambridge University Press, Cambridge, UK, 9780521570954, 1999.
[50] J.S. Chou, Web-based CBR system applied to early cost budgeting for pavement maintenance project, Expert Syst. Appl. 39 (2009) 2947–2960, https://doi.org/10.1016/j.eswa.2008.01.025.
[51] A. Schirmer, Case-based reasoning and improved adaptive search for project scheduling, Nav. Res. Logist. 47 (3) (2000) 201–222, https://doi.org/10.1002/(SICI)1520-6750(200004)47:3<201::AID-NAV2>3.0.CO;2-L.
[52] B. Melnyk, D. Morrison-Beedy, Intervention Research: Designing, Conducting, Analyzing, and Funding, Springer Publishing Company, 9780826109576, 2012.
[53] I. Ragnemalm, The Euclidean distance transform in arbitrary dimensions, Pattern Recogn. Lett. 14 (11) (1993) 883–888, https://doi.org/10.1016/0167-8655(93)90152-4.
[54] E. Deza, M.M. Deza, Encyclopedia of Distances, Springer, 2009, https://doi.org/10.1007/978-3-642-00234-2_1.
[55] K.S. Shin, I. Han, Case-based reasoning supported by genetic algorithms for corporate bond rating, Expert Syst. Appl. 16 (2) (1999) 85–95, https://doi.org/10.1016/S0957-4174(98)00063-3.
[56] J. Jarmulak, S. Craw, R. Rowe, Genetic algorithms to optimise CBR retrieval, Adv. Case-Based Reason. 1898 (2000) 136–147, https://doi.org/10.1007/3-540-44527-7_13.
[57] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 0201157675, 1989.
[58] T.M. Mitchell, Machine Learning, McGraw-Hill, 9780070428072, 1997.
[59] D.B. Torres, C.P. Rodriguez, D.H. Peluffo-Ordóñez, X.B. Valencia, J. Revelo-Fuelagán, M.A. Becerra, A.E. Castro-Ospina, L.L. Lorente-Leyva, Adaptation and recovery stages for case-based reasoning systems using Bayesian estimation and density estimation with nearest neighbors, Intelligent Information and Database Systems, Springer, Cham, 2019.
[60] K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, When is "nearest neighbor" meaningful? Database Theory-ICDT'99 (1999) 217–235 (ISBN 3-540-65452-6).
[61] S.J. Sheather, M.C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, J. R. Stat. Soc. 53 (3) (1991) 683–690, http://www.jstor.org/stable/2345597.
[62] Z.I. Botev, J.F. Grotowski, D.P. Kroese, Kernel density estimation via diffusion, Ann. Stat. 38 (5) (2010) 2916–2957, https://doi.org/10.1214/10-AOS799.
[63] M.P. Wand, M.C. Jones, Kernel Smoothing, CRC Press, London, 9780412552700, 1994.
[64] M.C. Jones, J.S. Marron, S.J. Sheather, A brief survey of bandwidth selection for density estimation, J. Am. Stat. Assoc. 91 (433) (1996) 401–407, http://www.jstor.org/stable/2291420.
[65] B.U. Park, J.S. Marron, Comparison of data-driven bandwidth selectors, J. Am. Stat. Assoc. 85 (409) (1990) 66–72, https://doi.org/10.1080/01621459.1990.10475307.
[66] X. Xu, Z. Yan, S. Xu, Estimating wind speed probability distribution by diffusion-based kernel density method, Electr. Power Syst. Res. 121 (2015) 28–37, https://doi.org/10.1016/j.epsr.2014.11.029.
[67] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Comput. Appl. Math. 20 (1987) 53–65, https://doi.org/10.1016/0377-0427(87)90125-7.