A Performance Comparison of Multi-Objective Optimization-Based
Article history:
Received 3 December 2019
Revised 23 January 2020
Accepted 10 March 2020
Available online 14 March 2020

Keywords:
Building energy model
Calibration approach
Multi-objective optimization
Genetic algorithms
Pareto curve
Calibration variables
Simulation outputs
Error functions
Accuracy
Robustness

Abstract

Building Energy Model (BEM) calibration is the process of reducing the gap between the simulation outputs and the actual measured data under the same conditions. The literature shows that BEM calibration approaches can lead to significant errors in the model inputs even when the calibration has been conducted successfully based on the model outputs (i.e., error functions). This paper compares the performance (i.e., accuracy and robustness) of 60 optimization-based calibration approaches. The approaches have different error functions (individual or combinations of NMBE, NME, CV(RMSE), R2, Cχ2) to be minimized and different outputs (heating demand, cooling demand, and/or indoor temperature for weeks, months, or a year) to be calibrated. The BESTEST600, predefined by ANSI/ASHRAE 140-2001, is selected as a white-box BEM case study for conducting the comparison test. Having the case study inputs and outputs without uncertainty gives a trustworthy comparison between the tested approaches. EnergyPlus is used for conducting the simulations, while the multi-objective optimization algorithm (a variant of NSGA-II) from MATLAB is used to minimize the error function(s) associated with each calibration approach. Among the 60 calibration approaches, eight proved to be the most accurate in predicting all calibration variables, with percentage errors lower than 10%. CV(RMSE) was found to be the most robust error function under different calibration datasets. The results also show that the current standard calibration requirements are not proper stopping criteria for automatic optimization-based calibration.

© 2020 Elsevier B.V. All rights reserved.
1. Introduction

The building sector consumes 20.1% of the total energy delivered worldwide [1]. Energy Conservation Measures (ECMs) and building refurbishment can contribute to reducing the building energy demand. In Europe, the potential for saving energy through renovation is huge, as two-thirds of the European building stock was built before 1980 [2]. Building simulation tools are a solution to identify ECMs and quantify their possible energy savings. However, there are significant discrepancies between simulation results and the actual measured consumption of real buildings [3]. Deviations between building design and construction, deterioration of the building's thermal performance or the randomness of occupancy behavior can cause these discrepancies [4].
Calibration is defined by the American Society of Heating, Refrigerating and Air Conditioning Engineers (ASHRAE) as the process of reducing the uncertainty of a model by comparing the predicted output of the model under a specific set of conditions to the actual measured data for the same set of conditions [5]. Calibration allows approximating the results of the energy models to the experimental data, improving the accuracy of the simulations [6]. The International Performance Measurement and Verification Protocol (IPMVP) and ASHRAE recommend the use of calibrated building models to evaluate ECMs and building renovation measures [7,8].

Coakley et al. [9] conducted a review of methods to match building energy simulation models to measured data. This review states that the calibration methods can be broadly classified as either manual or automated.

Abbreviations: BEM, Building Energy Model; ECM, Energy Conservation Measure; ASHRAE, American Society of Heating, Refrigerating and Air-conditioning Engineers; IPMVP, International Performance Measurement and Verification Protocol; UBEM, Urban Building Energy Model; ANN, Artificial Neural Networks; CV(RMSE), Coefficient of Variation of the Root Mean Square Error; NMBE, Normalized Mean Bias Error; MBE, Mean Bias Error; BESTEST-EX, Building Simulation Test for Existing Homes; MOO, Multi-Objective Optimization; NSGA-II, Non-dominated Sorting Genetic Algorithm II; NME, Normalized Mean Error; DI, Discrepancy Indicator; PE, Percentage Error; MPR, Maximum Performance Regret; PR, Performance Regret.
∗ Corresponding author.
E-mail addresses: [email protected] (S. Martínez), [email protected] (P. Eguía), [email protected] (E. Granada), [email protected] (A. Moazami), [email protected] (M. Hamdy).
https://doi.org/10.1016/j.enbuild.2020.109942
0378-7788/© 2020 Elsevier B.V. All rights reserved.
2 S. Martínez, P. Eguía and E. Granada et al. / Energy & Buildings 216 (2020) 109942
The manual methods rely on the iterative, pragmatic intervention of the modeller and cover the following methods:

• Characterization techniques consist of collecting the physical and operational features that affect the energy performance of the building being modelled, for example through detailed energy audits [10,11]. This category also involves the use of expert knowledge or databases [12,13], high-resolution data [14] or short-term energy monitoring [15] for the characterization and identification of the energy use profiles.
• Advanced graphical methods use graphical statistical indexes and representations to compare the differences between simulated and measured building data [16] or to identify faulty parameters [17].
• Procedural extension methods use standard processes to improve and assist the building model simulation. Some of these works make changes according to a procedural approach [18]. Sensitivity analysis helps to identify the parameters that have a great influence on the energy performance of a building [19]. Uncertainty quantification and identification of the error in the model input parameters is used to determine the effect on the outcome of the simulation [20,21].

In the automatic methods, a software program is in charge of performing the successive simulations or calculations needed for the calibration process. The main automatic methods are:

• Optimization techniques cover procedures to optimize the prediction performance of building models. This category includes mathematical techniques that minimize an objective function between simulated and measured data to predict the model variables under uncertainty [22–24]. Bayesian calibration is also included in this group; it applies Bayes' law to obtain a probabilistic distribution for the value of the calibration variables [25,26]. Bayesian calibration offers the advantage of naturally accounting for uncertainty in model prediction through the use of prior input distributions [9], avoiding overfitted parameters.
• Pattern-recognition-based calibration between simulated and measured data is proposed by Sun et al. [27] instead of mathematical optimization techniques, which can match the model numerically to the measured data but not to the actual physics of the building. More recently, Y. Chen and T. Hong [28] presented a novel automated technique for the rapid calibration of Urban Building Energy Models (UBEMs). The method is based on energy performance databases derived from simulations of models created by Monte Carlo sampling of the unknown parameters.
• Alternative modeling techniques cover novel building model procedures such as black-box models and Artificial Neural Networks (ANNs) [29–31]. ANNs need an accurate and large set of training data for predicting building energy consumption. Metamodeling [32,33] reduces the computation time of model-based optimization by creating an emulator of the original building model. Black-box metamodels can be used for calibrating detailed model input variables with respect to normalized observed data [33].

Automated optimization methods are increasingly being used for building model calibration [34–40] because they show good performance with few requirements. Among these works, there is a variety of optimization strategies. Robertson et al. [38], Hong et al. [40] and Lara et al. [41] used the Coefficient of Variation of the Root Mean Square Error (CV(RMSE)) and the Normalized Mean Bias Error (NMBE) of the simulation outputs in optimization-based calibration of building energy models. Tahmasebi and Mahdavi [39] and Ramos et al. [42] applied CV(RMSE) and the Coefficient of Determination (R2) as objective functions for calibration. Other studies, like Li et al. [4] and Monetti et al. [36], use a cost function that combines the Mean Bias Error (MBE) and CV(RMSE) [4,36]. Another factor that varies among optimization-based calibrations is the calibration period, which can be some days [4,35,43], months [34,36,39] or a year [37,38,40]. The most used calibration objectives are temperatures [34,35,39] and energy consumption [36,37,40,43]. Therefore, optimization-based calibration involves a series of choices that may affect the results: the error functions selected, the simulation periods and the basis of the calibration.

The suitability of the calibration results can be evaluated according to standards like those proposed by ASHRAE Guideline 14 or the IPMVP [7,8], based on the CV(RMSE) and NMBE of the simulated outputs with respect to measured data. Nevertheless, the many degrees of freedom of calibration may provide solutions that produce a good agreement with measured data even though individual parameters are incorrectly defined [13,44]. The uncertainty involved in the model inputs that define real buildings makes it impossible to assess the effectiveness of a calibration approach by measuring the input-side error. A technique for testing calibration approaches on both model output and model input errors is described in the Building Energy Simulation Test for Existing Homes (BESTEST-EX) presented by Judkoff [45]. The technique allows testing the accuracy of calibration methods under ideal laboratory conditions, providing a true benchmark for comparing all calibration methods [46].

Garret and New [46] studied whether alternative metrics to the ones proposed by ASHRAE Guideline 14 showed a correlation between input-side and output-side error. They used the BESTEST-EX reference building as base model, where certain input variables were varied within specified bounds. They did not find any significant correlations and stated that the metrics proposed by ASHRAE are as good as any of the other metrics tested. Vogt et al. [47] tested several combinations of statistical indices for calibration. Their work provides recommendations on which statistical indices can improve the simulation outputs while taking into account the deterioration of other error metrics.

In this work, we test sixty different optimization-based calibration approaches in which the error functions (NMBE, NME, CV(RMSE), R2, Cχ2), the calibration objectives (energy demands and/or temperatures) and the calibration period (week, month or year) are varied. The aim is to determine the most accurate and robust optimization-based approaches for calibrating BEMs. The novelty of the work is testing the calibration performance of the optimization-based approaches by measuring the error made in the input domain (calibrated variables) while the error made in the output domain (simulation results) is minimized.

A Multi-Objective Optimization (MOO) technique with a genetic algorithm is developed in MATLAB for the automatic execution of the calibration approaches. The case study selected for the assessment of the calibration approaches is the BESTEST 600 from ANSI/ASHRAE 140-2001 [48], an ideal case developed for testing the results of different energy simulation software. BESTEST 600 is a fully specified model where all the values for the model inputs and outputs are known; therefore it is possible to provide a benchmark for comparing the different calibration approaches. The simulation of the BESTEST model with the known values of the calibration variables produces the true output. The calibration approaches have to be able to predict the known values of the calibrated variables while reducing the error between true and simulated output.
Fig. 1. Holistic diagram describing the followed methodology. The block of calibration approaches definition and the block of optimization-based calibration are explained
in more detail in Fig. 3 and Fig. 4 respectively.
Table 1
Model input variables selected for calibration and their calibration range.
• NMBE: Normalized Mean Bias Error

\[ \mathrm{NMBE}\,(\%) = 100 \times \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)}{\sum_{i=1}^{n} y_i} \tag{1} \]

The NMBE normalizes the sum of the error between the simulated data and the known data, making it a good indicator for evaluating the overall positive or negative bias of a model. Its main drawback, however, is the cancellation effect: positive and negative errors cancel each other out.

• NME: Normalized Mean Error

\[ \mathrm{NME}\,(\%) = 100 \times \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\sum_{i=1}^{n} y_i} \tag{2} \]

The NME solves the problem of the cancellation effect, as the error between model data and known data is considered in absolute value.

• CV(RMSE): Coefficient of Variation of the Root Mean Square Error

\[ \mathrm{CV(RMSE)}\,(\%) = 100 \times \frac{\sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / n}}{\bar{y}} \tag{3} \]

CV(RMSE) normalizes the root mean square error between known and simulated data. The root mean square error is normalized by the mean value of the known data, ȳ. It has no cancellation effect, as the error is squared.

• R2: Coefficient of Determination

\[ R^2 = \left( \frac{n \sum_{i=1}^{n} y_i \hat{y}_i - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} \hat{y}_i}{\sqrt{\left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right] \left[ n \sum_{i=1}^{n} \hat{y}_i^2 - \left( \sum_{i=1}^{n} \hat{y}_i \right)^2 \right]}} \right)^2 \tag{4} \]

The coefficient of determination is a measure of the proportion of the variance in the known data that can be explained by the model. An R2 value of one means a perfect adjustment of the model to the known data; thus the term (1 − R2) is minimized when this index is used for optimization-based calibration.

• Cχ2: Standardized Contingency Coefficient

\[ C\chi^2 = \sqrt{\frac{m}{m-1} \times \frac{\chi^2}{\chi^2 + N}} \tag{5} \]

where

\[ \chi^2 = \sum_{k=1}^{3} \sum_{j=1}^{3} \frac{\left( N_{k,j}^{\mathrm{obs}} - N_{k,j}^{\mathrm{exp}} \right)^2}{N_{k,j}^{\mathrm{exp}}}, \quad \text{with} \quad N_{k,j}^{\mathrm{exp}} = \frac{N_k \times N_j}{N} \tag{6} \]

The standardized contingency coefficient, Cχ2, expresses the strength of the correlation between measured data and simulation data. It ranges from zero, which means no association, to one, which is the maximum association possible. For that reason, the term (1 − Cχ2) is minimized in the optimization-based calibration approaches that use this coefficient. To calculate this standardized coefficient, it is first necessary to calculate the chi-squared coefficient, χ2 in Eq. (6), which measures the statistical dependency between the known data and the data predicted by the model. To determine this coefficient, output data intervals, Δy_i = y_i − y_{i−1}, are calculated as hourly differences for both simulated and known data. There are three types of intervals: (1) increasing, (2) decreasing and (3) constant. In Eq. (6), k is the interval type for simulated data and j is the interval type for known data. N^obs and N^exp are the observed and expected numbers of a certain occurrence in all intervals. The observed number is derived from the known and simulated data, whereas the expected number is calculated from the margin totals [47]. The observed and expected numbers were calculated in MATLAB through a contingency table of the simulated and known data intervals. Once χ2 is known, the standardized contingency coefficient is calculated from Eq. (5), where m is the number of interval types, which is three, and N is the total number of data intervals.
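As an illustration only (not the authors' MATLAB code), the five error functions of Eqs. (1)–(6) can be sketched in plain Python; the guard that skips empty contingency-table cells in Cχ2 is our own assumption:

```python
from math import sqrt

def nmbe(y, yhat):
    """Eq. (1): normalized mean bias error (%); signed, so errors can cancel out."""
    return 100.0 * sum(a - b for a, b in zip(y, yhat)) / sum(y)

def nme(y, yhat):
    """Eq. (2): normalized mean error (%); absolute errors avoid cancellation."""
    return 100.0 * sum(abs(a - b) for a, b in zip(y, yhat)) / sum(y)

def cv_rmse(y, yhat):
    """Eq. (3): coefficient of variation of the RMSE (%), normalized by mean(y)."""
    n = len(y)
    rmse = sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / n)
    return 100.0 * rmse / (sum(y) / n)

def r2(y, yhat):
    """Eq. (4): squared Pearson correlation between known and simulated data."""
    n = len(y)
    sy, syh = sum(y), sum(yhat)
    num = n * sum(a * b for a, b in zip(y, yhat)) - sy * syh
    den = sqrt((n * sum(a * a for a in y) - sy ** 2)
               * (n * sum(b * b for b in yhat) - syh ** 2))
    return (num / den) ** 2

def c_chi2(y, yhat):
    """Eqs. (5)-(6): standardized contingency coefficient over the 3x3 table of
    increasing / decreasing / constant hourly-difference interval types."""
    def types(x):  # 0 = increasing, 1 = decreasing, 2 = constant
        return [0 if b > a else 1 if b < a else 2 for a, b in zip(x, x[1:])]
    sim, known = types(yhat), types(y)
    n_obs = [[0.0] * 3 for _ in range(3)]
    for k, j in zip(sim, known):
        n_obs[k][j] += 1
    N = float(len(sim))
    row = [sum(r) for r in n_obs]                 # margin totals (simulated)
    col = [sum(c) for c in zip(*n_obs)]           # margin totals (known)
    # Expected counts come from the margins; empty cells are skipped (our choice).
    chi2 = sum((n_obs[k][j] - row[k] * col[j] / N) ** 2 / (row[k] * col[j] / N)
               for k in range(3) for j in range(3) if row[k] * col[j] > 0)
    m = 3  # number of interval types
    return sqrt(m / (m - 1) * chi2 / (chi2 + N))
```

Note that a dataset with symmetric errors yields NMBE = 0 while NME stays positive, which is exactly the cancellation effect the text describes.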
Fig. 3. Error functions used and dataset types and sizes to form the calibration approaches. There are 10 error function combinations and six datasets, forming 60 different calibration approaches.

Table 2
Identification of the optimization-based calibration approaches (App.) defined by the error function set, data set type and data set size. Data set types: heating and cooling demands; heating, cooling demands and temperatures.

2.3.2. Definition of the optimization-based calibration approaches

The proposed calibration approaches consist of ten combinations of five error functions (presented in SubSection 2.3.1) applied to two data set types and three data set sizes, as shown in Fig. 3. Five of the ten combinations are single-error functions applied to the different calibration dataset types. The other five are combinations of single-error functions, as shown in Table 2. The combination used for calibration evaluation, CV(RMSE) and NMBE, is tested for optimization-based calibration. The other four combinations were selected according to the results of the work done by Vogt et al. [47], who tested different statistical indices for building model calibration. There are 60 calibration approaches, coming from multiplying 10 error function combinations by six datasets, as calculated in Eq. (7). Table 2 presents all the optimization-based calibration approaches that are tested to compare their calibration performance.

Number of calibration approaches = 10 error functions × (2 dataset types × 3 dataset sizes) = 60 (7)

The calibration approaches are multi-objective because more than one objective function is minimized in each approach. In the approaches where only one type of error function is used, each objective is the error function applied to each of the calibration bases that form the dataset type (heating demand, cooling demand and/or indoor temperatures), as in the example provided by Eq. (8). In the approaches where a combination of error functions is used, each objective is the sum of one of the error functions applied to heating demand, cooling demand or indoor temperatures, as in the example provided by Eq. (9). If the data set size is a week or a month, January is selected for simulating the heating demand and July for simulating the cooling demand. To calibrate with average interior temperatures, the week, month and year are simulated in free-floating temperature conditions. Therefore, for calibrating the heating demand we use a representative week and month of winter, and for calibrating the cooling demand we use a representative week and month of summer.

Approaches 33, 38 and 43 = min(CV(RMSE)_Heat.) and min(CV(RMSE)_Cool.) and min(CV(RMSE)_Temp.) (8)

Approaches 46, 51, 56 = min(CV(RMSE)_Heat. + CV(RMSE)_Cool. + CV(RMSE)_Temp.) and min(NMBE_Heat. + NMBE_Cool. + NMBE_Temp.) (9)

2.4. Automatic multi-objective optimization-based calibration with MATLAB

The process of the MOO-based calibration developed in MATLAB is shown in Fig. 4. The MOO is implemented in four different MATLAB functions that call one another. Each function is represented as a gray-dashed rectangle in Fig. 4. The MATLAB function numbered "1" in Fig. 4 contains the genetic algorithm that looks for the values of the calibration variables (X) that minimize the error functions of the output domain. The algorithm evaluates the population on the basis of the objective function values and continues generating new populations until the maximum number of generations, which is 40, is reached. The main function also contains the definition of the dataset type and size and the lower and upper bounds for the calibration variables. In addition, the Pareto results of the MOO are collected in this function.
6 S. Martínez, P. Eguía and E. Granada et al. / Energy & Buildings 216 (2020) 109942
Fig. 4. Flow diagram of the MOO based calibration developed in MATLAB. The code for the MOO was developed in four MATLAB functions represented with gray dashed
rectangles.
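As a generic illustration of the parameter-substitution step that Fig. 4 assigns to MATLAB function "2", the following Python sketch uses an invented placeholder syntax and a hypothetical model snippet; the paper itself locates the calibration variables by name inside the EnergyPlus input file:

```python
import re

def substitute(template: str, values: dict) -> str:
    """Replace {placeholders} in a building-model template with the candidate
    values proposed by the optimizer (the placeholder syntax is our invention)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(values[m.group(1)]), template)

# Hypothetical fragment of a model file with two calibration variables.
template = "ZoneInfiltration, rate={infiltration_rate}; Material, k={conductivity};"
candidate = {"infiltration_rate": 0.41, "conductivity": 0.034}
model_text = substitute(template, candidate)
# The real workflow would write model_text to disk, launch an EnergyPlus
# simulation on it, and collect the outputs as the predicted data set.
```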
The MATLAB function numbered "2" in Fig. 4 reads the building model, looks for the names of the calibration variables and replaces their values with the values given by the algorithm. Once the values are replaced, EnergyPlus simulates the file that contains the model. The output of the simulation is collected as the predicted data set. The MATLAB function numbered "3" collects the simulation results and the known outputs and calculates the different error function sets. Depending on the calibration approach presented in Table 2, different error functions applied to different datasets are used as objective functions by the algorithm. The MATLAB function numbered "4" reads the error functions and selects which ones are used as objective functions for each calibration approach. The process represented in Fig. 4 is repeated automatically for each approach presented in Table 2.

Furthermore, each calibration approach is executed 10 times beginning with the same initial population. This initial population is randomly generated within the lower and upper bounds of uncertainty of each calibration variable. The results of an optimization can vary if it is run several times due to the random behavior of the crossover and mutation operators of the genetic algorithm (see SubSection 2.4.1). To avoid an incorrect evaluation of an approach's performance, the results from the 10 iterations of the approach are used to evaluate its performance.

2.4.1. Genetic algorithm

Genetic algorithms are inspired by biological evolution as a strategy for solving optimization problems. A genetic algorithm begins the optimization with a population of individuals. Each individual is represented by a chromosome that contains genes; in an optimization-based calibration, for instance, the genes are values for the calibrated variables. Optimization is performed by three genetic operators: crossover, mutation and selection. The crossover is carried out by swapping one segment of one chromosome with the corresponding segment of another chromosome at a random position. The mutation is achieved by flipping randomly selected genes. The selection of the best chromosomes, also called elitism, is the use of the best solutions to pass on to the next generations [49].

NSGA-II is a genetic algorithm that has shown competitive performance in various applications and can be improved with effective operators like parameter control, hybrid algorithms and information-oriented operators [50]. Moreover, it has proven to be one of the most efficient algorithms for multi-objective optimization and multi-criteria decision making for improving building performance [51–54]. Ramos et al. [55] used a methodology based on a multi-objective genetic algorithm with NSGA-II for building envelope calibration, showing that the methodology captures the building dynamics with high accuracy. The algorithm used to run the optimization is called "gamultiobj", a controlled, elitist genetic algorithm that is a variant of NSGA-II. A controlled elitist genetic algorithm favors individuals with better fitness values but also individuals that can increase the diversity of the population, even though they have worse fitness values [56].

Population size is an important parameter affecting the performance of optimization methods based on genetic algorithms. The recommended population sizes are twice to four times the number of variables [53]. In our work, each individual of the population is formed by the seven calibration variable values presented in Table 1; hence, the population size chosen for the algorithm is 30. The same initial population, randomly created between the lower and upper values of the variables, is used in all calibration approaches. Each individual of the population is evaluated by replacing the variable values in the EnergyPlus file with the values given by the algorithm. Then, to create the next generation, the algorithm scores the current population according to its fitness values, that is, the evaluated values of the objective functions. These fitness values are scaled and called expectation values. Some members of the population are selected as elite based on their lower expectation values and pass directly to the next generation. Other members are selected as parents, and children are produced either by mutating a single parent or by crossover of two different parents. This process is repeated until the maximum number of generations (stopping criterion, see SubSection 2.4.2) is reached.
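At the core of NSGA-II-style selection is non-dominated ranking. A minimal, library-free sketch of extracting the Pareto front from a set of objective vectors (toy data, not the paper's results; all objectives are minimized, like the error functions here):

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better
    in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy objective vectors, e.g. (CV(RMSE)_heating, CV(RMSE)_cooling) per individual.
pop = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
front = pareto_front(pop)  # (3,3) and (4,4) are dominated by (2,2)
```

A full NSGA-II additionally ranks the dominated individuals into successive fronts and uses a crowding measure to preserve diversity, which is what the "controlled elitist" variant described above does.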
2.5. Performance comparison of the optimization-based calibration approaches

We carried out a performance comparison among the optimization-based calibration approaches following two criteria, as shown in Fig. 5: on one hand, an accuracy comparison of the approaches; on the other hand, a robustness comparison of the error functions. Knowing the true values of the input domain allows testing the accuracy of each calibration approach. Robustness is defined as the capacity of an error function set to provide accurate calibration results under different calibration datasets. Both criteria are based on the use of a Discrepancy Indicator (DI) to evaluate the error made in the prediction of the calibration variables. This DI is calculated as the Percentage Error (PE) shown in Eq. (10); it measures the discrepancy between the true value of a calibration variable and the prediction of that variable in a Pareto solution.

\[ DI = PE\,(\%) = \frac{|\text{True value} - \text{Predicted value}|}{\text{True value}} \times 100 \tag{10} \]

Fig. 5. Framework for the performance comparison of the optimization-based calibration approaches. On one hand, the calibration approaches are compared in terms of accuracy. On the other hand, the error functions used for optimization are compared in terms of robustness to different datasets.

2.5.1. Accuracy definition

As each approach is executed 10 times, 10 Pareto frontiers are obtained per approach. Each Pareto frontier provides a different number of solutions, and each solution provides a value for the seven calibration variables, as shown in Fig. 6.

The accuracy in predicting the values of the calibration variables is defined in Eq. (11) as the number of Pareto variables with a DI lower than 10% divided by the total number of variables given by the Pareto solutions. The accuracy evaluated over the 10 optimization runs per calibration approach is calculated as shown in Eq. (12). In this case, the total number of Pareto variables is calculated as the number of Pareto solutions multiplied by the seven calibration variables for the 10 optimization runs, r in Eq. (12). A low accuracy value indicates the risk of calibration error when selecting a Pareto solution from a specific approach.

\[ \text{Accuracy per opt. run}\,(\%) = \frac{\text{No. of Pareto variables with } DI < 10\%}{\text{Total no. of Pareto variables}} \times 100 \tag{11} \]

\[ \text{Accuracy 10 opt. runs}\,(\%) = \frac{\sum_{r=1}^{10} \text{No. of Pareto variables with } DI < 10\%\,(r)}{\sum_{r=1}^{10} \left( \text{No. of Pareto solutions}(r) \times 7 \text{ calibration variables} \right)} \times 100 \tag{12} \]

\[ MPR = \max{(PR)}_{\text{error function set}} \tag{14} \]

3. Results and discussion

The results of the study only serve as recommendations for MOO-based calibrations of white-box BEMs using genetic algorithms. To obtain results in real building cases comparable to those provided in this work, calibration users must ensure they have a proper initial population that explores the whole search space. In addition, it is advisable to perform a minimum of 1000 evaluations with the algorithm. The number and type of selected calibration variables also affect the results. In our case, a simple model, it was not necessary to carry out a sensitivity analysis. However, we recommend performing a sensitivity analysis in real building models to detect the variables that have the greatest influence on the simulation outputs.
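The DI of Eq. (10) and the aggregated accuracy of Eq. (12) can be sketched as follows (the variable names and the toy numbers are ours; each Pareto solution holds one predicted value per calibration variable):

```python
def discrepancy(true_value, predicted_value):
    """Eq. (10): percentage error between true and predicted variable values."""
    return abs(true_value - predicted_value) / abs(true_value) * 100.0

def accuracy(runs, true_values, threshold=10.0):
    """Eq. (12): share (%) of Pareto variables across all optimization runs whose
    DI is below the threshold. `runs` is a list of Pareto frontiers; each frontier
    is a list of solutions; each solution is a list of variable values."""
    hits = total = 0
    for frontier in runs:
        for solution in frontier:
            for true_v, pred_v in zip(true_values, solution):
                total += 1
                if discrepancy(true_v, pred_v) < threshold:
                    hits += 1
    return 100.0 * hits / total

# Two runs with one Pareto solution each and two calibration variables.
true_vals = [10.0, 2.0]
runs = [[[10.5, 3.0]], [[9.8, 2.1]]]  # DIs: ~5%, 50%, 2%, 5%
print(accuracy(runs, true_vals))       # 3 of 4 variables under 10% -> 75.0
```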
Fig. 6. (a) Example of a Pareto curve for the calibration approach based on minimizing the CV(RMSE) of heating and cooling demands. Each Pareto solution is numbered
and contains a value for each of the seven calibration variables. (b) Accumulated PE of the seven calibration variables of each numbered Pareto solution.
Fig. 7. Boxplots of the accuracy values obtained in the 10 optimization runs per error function set applied to: (a) heating and cooling demands, (b) heating demand, cooling demand and temperatures. For each error function set, the first boxplot corresponds to a calibration period of a week, the second to a month and the third to a year.
3.1. Accuracy results

Fig. 7 presents the boxplots of the accuracy values of each calibration approach for the 10 optimization runs. The medians of the boxplots for the single-error functions, except NMBE, and for the combination of error functions NME, R2 and CV(RMSE), have the highest values of accuracy. Calibrating using temperatures combined with energy demands does not cause a notable improvement over calibrating using only energy demands for most of the error functions. However, in the case of Cχ2, an increase in accuracy is observed when using temperatures and energy demands. The calibration accuracy also increases if CV(RMSE) is used with yearly data of temperatures and energy demands.
Table 3
Accuracy values for each optimization-based calibration approach. The accuracy is the percentage of Pareto variables in
the 10 optimization runs with DI<10%. Accuracy < 40% (red), 40% ≤ accuracy < 50% (orange), 50% ≤ accuracy < 60%
(yellow), 60% ≤ accuracy <70% (light green), accuracy ≥ 70% (dark green).
In Table 3, the accuracy values per approach are calculated as the percentage of calibration variables with a DI lower than 10% over all the Pareto solutions provided in the 10 optimization runs, as described in Eq. (12), thus taking into account the variability of the results of each approach across the 10 optimization runs. Higher accuracy values indicate that most of the calibrated variables resulting from the Pareto solutions of a calibration approach have a lower input-side error. It is important to mention that these are the accuracy results for 1200 evaluations in each of the 10 optimization runs per approach. These results could vary if the optimization algorithm performs a different number of evaluations.

The calibration approaches in Table 3 with higher values of accuracy (more than 70%) are eight of the sixty, and they are the approaches numbered in Table 2 as 3, 8, 9, 15, 39, 40, and 43. Three of these approaches use the error function CV(RMSE), applied to a week and a month of heating and cooling demand data, and to a year of heating, cooling and temperature data. R2 has two approaches with more than 70% accuracy, both obtained with monthly datasets. The error function Cχ2 provides three approaches with higher accuracies; two of them are yearly, and the monthly one uses heating and cooling demands and temperature data.

Even though no approach reaches 100% accuracy, some approaches have Pareto solutions with all of their calibration variables at DI < 10%; we, however, take into account all the solutions provided by the 10 Pareto frontiers in the calculation. The criterion a calibration user will apply to select one Pareto solution over the others is unknown. Thus, we take all the Pareto solutions provided by an approach into account when measuring its accuracy. The accuracy is then an indicator of the likelihood of obtaining Pareto solutions whose calibration values have DI < 10%.

As shown in Table 3, NMBE is not a proper index for optimization-based calibration. In its mathematical definition, the error is not measured in absolute value, and therefore it is not valid for minimization purposes. All the optimization combinations that include the NMBE present an inaccurate calibration. There is one exception: the combination of NME, R2 and CV(RMSE), which shows accuracy values higher than 60% in five out of the six calibration datasets in which it is tested. The error function NME presents acceptable accuracy values of around 60% in all presented datasets.

The error function CV(RMSE) shows a robust input-side calibration performance, as it gives accuracies higher than 60% for three datasets and higher than 70% for the other three. R2 shows the best accuracy values using a calibration period of a month. The dataset size influences the calculation of the R2 values: weekly datasets contain less data for the model to explain the variance in the known data, while with larger periods like a year, R2 has difficulty adjusting the model data behavior to the known data, especially if the dataset contains heating and cooling demands and temperatures. Cχ2 also shows accurate prediction of the calibration variables under different datasets, except in the case of weekly energy demands, which can be insufficient data for the index to establish a proper correlation between model and known data.

A remarkable finding is that some optimization-based calibration approaches are able to accurately predict the calibration variable values with weekly datasets. If heating and cooling demand data are available, NME has an accuracy of 62%, CV(RMSE) of 70%, and their combination with R2 of 69%. If indoor temperatures are known, R2 and Cχ2 can also be used as error functions for weekly optimization-based calibration.

3.2. Robustness results

Table 4 presents the Performance Regrets (PRs) for each approach. The MPR is the maximum PR of an error function set across all scenarios, as shown in Eq. (14). The objective of the MPR is to measure the non-robustness of each error function to different calibration scenarios. Lower MPR values mean that the error function performs robustly for all the calibration scenarios, considering different dataset sizes and types.

According to Table 4, the most robust error function along all
performance. CV(RMSE) and NMBE are commonly used to evaluate scenarios is CV(RMSE) followed in order by Cχ 2 , NME and R2 . The
the output-based calibration results, however using their combina- combination of NME, R2 and CV(RMSE) also presents a robust per-
tion for optimization-based calibration doesn’t give accurate results formance to different calibration scenarios. On the contrary, NMBE
in calibration variables prediction. presents the worst calibration robustness in comparison to the rest
The single-error functions for optimization-based calibration of the error functions sets. Furthermore, the combination of NMBE
(NME, CV(RMSE), R2 and Cχ 2 ) provide more accurate input-side with CV(RMSE) proposed by ASHRAE Guideline 14 for calibration
calibration results than using the combination of different error evaluation, does not provide robust results for optimization-based
calibration under different datasets.
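The PR and MPR calculations described above can be sketched as follows. This is a minimal illustration, not the paper's code: the dictionary holds only two of the six Table 4 scenarios (the scenario labels such as `HC-week` are ours), using the scenario maxima (70%, 74%) and the accuracies implied by the tabulated regrets.

```python
accuracies = {
    # error function set -> accuracy of its best approach in each scenario
    "CV(RMSE)": {"HC-week": 0.70, "HC-month": 0.74},
    "NMBE":     {"HC-week": 0.28, "HC-month": 0.27},
}

# Maximum performance (accuracy) achieved in each scenario by any set.
best_per_scenario = {
    scenario: max(acc[scenario] for acc in accuracies.values())
    for scenario in next(iter(accuracies.values()))
}

def performance_regret(error_fn: str, scenario: str) -> float:
    """PR: distance from the best accuracy achieved in that scenario."""
    return best_per_scenario[scenario] - accuracies[error_fn][scenario]

def max_performance_regret(error_fn: str) -> float:
    """MPR (Eq. (14)): worst PR across scenarios; low MPR = robust."""
    return max(performance_regret(error_fn, s) for s in accuracies[error_fn])

print(max_performance_regret("CV(RMSE)"))  # 0.0  (robust)
print(max_performance_regret("NMBE"))      # ~0.47 (non-robust)
```

With the full six-scenario data of Table 4 the same function reproduces the Max. PR column.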
10 S. Martínez, P. Eguía and E. Granada et al. / Energy & Buildings 216 (2020) 109942
Table 4
Performance Regrets for each optimization-based calibration approach and Maximum Performance Regret for each error function set.

                                        Heating and cooling    Heating, cooling demands
                                             demands              and temperatures
Error function set                      Week   Month   Year    Week   Month   Year    Max. PR
Maximum performance (accuracy)           70%    74%    70%      66%    78%    77%       -
of each scenario
NMBE                                     42%    47%    38%      39%    51%    47%      51%
NME                                       8%    18%     7%       7%    13%    14%      18%
CV(RMSE)                                  0%     0%     3%       1%    15%     0%      15%
R²                                       11%     1%     9%       3%     6%    19%      19%
Cχ²                                      16%    10%     0%       4%     0%     6%      16%
CV(RMSE) & NMBE                          31%    39%    32%      33%    39%    33%      39%
NME & R²                                 13%    25%    18%       9%    25%    23%      25%
Cχ² & CV(RMSE)                           19%    20%    12%      17%    29%    15%      29%
NME & R² & CV(RMSE)                       1%     5%     6%       0%    18%    20%      20%
NMBE & NME & CV(RMSE) & Cχ² & R²         29%    39%    32%      26%    33%    38%      39%
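The contrast between NMBE and the absolute-error indices can be illustrated with simplified implementations. These are the common textbook forms; the paper's exact normalizations (for instance whether CV(RMSE) divides by n or n−1) may differ slightly.

```python
from math import sqrt

def nmbe(measured, simulated):
    """Normalized Mean Bias Error (%). Errors keep their sign, so
    over- and under-prediction cancel out -- the reason NMBE is a
    poor objective for minimization on its own."""
    errors = [s - m for m, s in zip(measured, simulated)]
    return sum(errors) / sum(measured) * 100

def nme(measured, simulated):
    """Normalized Mean Error (%). Absolute errors cannot cancel."""
    errors = [abs(s - m) for m, s in zip(measured, simulated)]
    return sum(errors) / sum(measured) * 100

def cv_rmse(measured, simulated):
    """Coefficient of Variation of the RMSE (%)."""
    n = len(measured)
    mse = sum((s - m) ** 2 for m, s in zip(measured, simulated)) / n
    return sqrt(mse) / (sum(measured) / n) * 100

# A symmetric miss: +5 on one point, -5 on another.
measured, simulated = [10.0, 10.0], [15.0, 5.0]
print(nmbe(measured, simulated))     # 0.0  -> looks "calibrated"
print(nme(measured, simulated))      # 50.0 -> large real error
print(cv_rmse(measured, simulated))  # 50.0
```

The example makes the cancellation concrete: a model that is badly wrong on every point can still score a perfect NMBE of zero, while NME and CV(RMSE) expose the error.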
Table 5
Requirements for evaluating the calibration results proposed by ASHRAE Guideline 14 and IPMVP.

Fig. 8. Results of the 1200 evaluations (error functions vs. number of calibration variables with DI > 10%) of jointly optimizing the CV(RMSE) and NMBE of yearly heating and cooling energy demands. (a) Results for the CV(RMSE) between simulated and known energy demands. (b) Results for the NMBE between simulated and known energy demands. The black lines represent the ASHRAE Guideline 14 thresholds for evaluating yearly calibration results.
3.3. Stopping criterion analysis

Due to the computational effort required to simulate a building model over many iterations, it is necessary to define a stopping criterion for the optimization algorithm. Depending on the stopping configuration of the optimization, the calibration solutions can be closer to or further from the true values. The intention of this section is to show how the selection of the stopping criterion for the optimization algorithm can affect the calibration results.

The requirements proposed by ASHRAE Guideline 14 and IPMVP for evaluating the outputs of yearly simulations with hourly data are presented in Table 5. These requirements consist of thresholds on CV(RMSE) and NMBE. Fig. 8 shows the values obtained when jointly optimizing the CV(RMSE) and NMBE of yearly heating and cooling demands, together with the number of calibrated variables with DI > 10% in the corresponding evaluations. We check whether, even when the simulation outputs of the optimization fulfill the calibration evaluation criteria, there is still error in the prediction of the calibration variables.

According to Fig. 8, CV(RMSE) values lower than 30% and NMBE values within ±10% for the heating and cooling energy demands cannot provide solutions with all their calibrated variables with DI < 10%. Furthermore, most of those solutions have from two to six calibrated variables with DI > 10%. If an optimization were stopped according to these criteria, the simulation outputs would be calibrated; however, there would be a considerable error in the prediction of some calibration variables. Even though the optimization performed more evaluations than these stopping criteria require, the algorithm would not find any Pareto solution with all its calibration variables with DI < 10%. This fact shows that NMBE is not a proper statistical index for optimization-based calibration together with CV(RMSE).

Instead, we propose using the most robust error function for optimization-based calibration, CV(RMSE). In Fig. 9 we used the same error function, CV(RMSE), for optimizing the two objectives, heating and cooling energy demands. If the selected stopping criterion is that both CV(RMSE) values are lower than 3%, it is possible to find Pareto solutions with all their calibration variables with DI < 10%. For instance, in Fig. 9 there is a solution with CV(RMSE) values of 2.48% for heating demand and 2.35% for cooling demand whose calibrated variables all have DI < 10%.
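The proposed stopping rule can be sketched as a simple predicate evaluated after each generation. This is our illustration, not the paper's implementation: `pareto_front` is an assumed list of (CV(RMSE) heating, CV(RMSE) cooling) objective pairs, in percent, as returned by a multi-objective optimizer.

```python
def should_stop(pareto_front, threshold=3.0):
    """True once some Pareto solution has CV(RMSE) below the
    threshold on both objectives, e.g. the 2.48% / 2.35% solution
    cited in Fig. 9."""
    return any(h < threshold and c < threshold for h, c in pareto_front)

print(should_stop([(12.4, 8.1), (4.0, 2.9)]))   # False: no solution under 3% on both
print(should_stop([(2.48, 2.35), (1.9, 6.7)]))  # True: first solution qualifies
```

In an NSGA-II-style loop, this check would replace (or supplement) a fixed evaluation budget as the termination condition.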
Fig. 9. (a) Results of the 1200 evaluations of optimizing the CV(RMSE) of yearly heating and cooling demands. (b) Zoom of Fig. 9a showing that minimizing the CV(RMSE) of the energy demands below 3% provides Pareto solutions with all of their calibration variables with DI < 10%.
tions of the calibration variables without discrepancies greater than 10% with respect to their true values.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Sandra Martínez: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing - original draft, Writing - review & editing. Pablo Eguía: Conceptualization, Writing - review & editing, Project administration, Funding acquisition. Enrique Granada: Conceptualization, Supervision, Resources, Writing - review & editing, Project administration, Funding acquisition. Amin Moazami: Conceptualization, Methodology, Formal analysis. Mohamed Hamdy: Conceptualization, Methodology, Software, Formal analysis, Resources, Writing - original draft, Writing - review & editing, Supervision.

Acknowledgments

The Ministry of Science, Innovation and Universities of the Spanish Government (grant number BES-2016-076516) supported this work.

The authors would like to thank the support given by the project Smartherm (RTI2018-096296-N-C21) of the Ministry of Science, Innovation and Universities of the Spanish Government.