Energy & Buildings 303 (2024) 113768

Contents lists available at ScienceDirect

Energy & Buildings

journal homepage: www.elsevier.com/locate/enbuild

Urban building energy performance prediction and retrofit analysis using data-driven machine learning approach

Usman Ali a,*, Sobia Bano a, Mohammad Haris Shamsi d, Divyanshu Sood a, Cathal Hoare a, Wangda Zuo c, Neil Hewitt b, James O'Donnell a

a School of Mechanical and Materials Engineering and UCD Energy Institute, UCD, Dublin, Ireland
b School of Architecture and The Built Environment, Ulster University, Belfast, UK
c Pennsylvania State University, University Park, PA, USA
d Flemish Institute for Technological Research (VITO), Boeretang Mol, Belgium

ARTICLE INFO

Keywords: Building energy performance; Data-driven approaches; Urban building energy modeling; Machine learning; Building retrofit

ABSTRACT

Stakeholders such as urban planners and energy policymakers use building energy performance modeling and analysis to develop strategic sustainable energy plans with the aim of reducing energy consumption and emissions from the built environment. However, inconsistent energy data and the lack of scalable building models create a gap between building energy modeling and traditional planning practices. An alternative approach is to conduct a large-scale energy usage survey, which is time-consuming. Similarly, existing studies rely on traditional machine learning or statistical approaches for calculating large-scale energy performance. This paper proposes a solution that employs a data-driven machine learning approach to predict the energy performance of urban residential buildings, using both ensemble-based machine learning and end-use demand segregation methods. The proposed methodology consists of five steps: data collection, archetype development, physics-based parametric modeling, machine learning modeling, and urban building energy performance analysis. The devised methodology is tested on the Irish residential building stock and generates a synthetic building dataset of one million buildings through the parametric modeling of 19 identified vital variables for four residential building archetypes. As a part of the machine learning modeling process, the study implemented an end-use demand segregation method, including heating, lighting, equipment, photovoltaic, and hot water, to predict the energy performance of buildings at an urban scale. Furthermore, the model's performance is enhanced by employing an ensemble-based machine learning approach, achieving 91% accuracy compared to the traditional approach's 76%. Accurate prediction of building energy performance enables stakeholders, including energy policymakers and urban planners, to make informed decisions when planning large-scale retrofit measures.

* Corresponding author.
E-mail addresses: [email protected] (U. Ali), [email protected] (S. Bano), [email protected] (M.H. Shamsi), [email protected] (D. Sood), [email protected] (C. Hoare), [email protected] (W. Zuo), [email protected] (N. Hewitt), [email protected] (J. O'Donnell).

https://doi.org/10.1016/j.enbuild.2023.113768
Received 19 September 2023; Received in revised form 1 November 2023; Accepted 17 November 2023
Available online 22 November 2023
0378-7788/© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Nomenclature

BEM       Building Energy Modeling
BEPS      Building Energy Performance Simulator
BER       Building Energy Rating
CEA       City Energy Analyst
CityBES   City Building Energy Saver
CSO       Central Statistics Office
DEAP      Dwelling Energy Assessment Procedure
DT        Decision Tree
EPBD      European Union Energy Performance of Buildings Directive
EPC       Energy Performance Certificate
GB        Gradient Boosting
HGB       Histogram-Based Gradient Boosting
HVAC      Heating, Ventilation, and Air Conditioning
KNN       K-Nearest Neighbor
LGBM      Light Gradient Boosted Machine
LR        Linear Regression
NN        Neural Network
RF        Random Forest
SEAI      Sustainable Energy Authority of Ireland
SVR       Support Vector Regression
UBEM      Urban Building Energy Modeling
UMI       Urban Modeling Interface
XGB       Extreme Gradient Boosting

1. Introduction

The operation of buildings accounted for 30% of global energy consumption and 27% of total energy sector greenhouse gas (GHG) emissions in 2021 [1]. Within this context, 8% comprised direct emissions occurring within buildings, while 19% represented indirect emissions resulting from the production of electricity and heat used in buildings. To address these environmental concerns, the member nations of the European Union (EU) have established a legislative infrastructure to advance sustainable strategic planning initiatives and strengthen energy efficiency within the building sector using the Energy Performance of Buildings Directive (EPBD). The primary objective of this directive is to facilitate the adoption of policies and measures that will enable the achievement of a highly energy-efficient and decarbonized building stock by the years 2030 and 2050, respectively [2].

The rise in annual energy consumption, especially in urban areas, is expected to increase carbon emissions significantly [1]. As a result, there is a growing focus on reducing energy use and emissions from the building sector. Urban planners and policymakers are exploring innovative strategies to make existing buildings more sustainable, including creating comprehensive sustainable energy plans. Furthermore, long-term renovation strategies are necessary to achieve a higher level of sustainability and reduce carbon emissions from buildings. These plans aim to minimize overall energy consumption and CO2 emissions by analyzing data on the energy performance of buildings on a large scale. As a result, the EU has implemented the aforementioned EPBD to ensure that member states develop a buildings database comprising Energy Performance Certificates (EPCs). However, even with this mandate, building stock databases typically cover only 30-50% of the total building stock [3].

Moreover, available data are often inadequate for stakeholders such as urban planners, energy policymakers, utility planners, and manufacturers to create effective and sustainable energy conservation measures. Gathering accurate and comprehensive data for urban modeling poses a significant challenge [4]. The limited availability and accessibility of data at the urban scale make it difficult to understand the urban context thoroughly. This poses a hurdle for researchers and practitioners who aim to develop accurate and reliable models that capture the complexities of urban systems. Overcoming this issue requires innovative approaches and collaborations to improve data collection and sharing mechanisms, ensuring more comprehensive and representative urban modeling and analysis. Similarly, estimating the energy performance of the entire building stock is challenging due to numerous factors that impact energy usage, including the building envelope, the geometry of buildings, the behavior of occupants, heating and cooling systems, and the weather conditions [5,6].

Generally, there are two main approaches to estimating building energy performance: physical and data-driven models [7]. Physical models are based on detailed building physics and are analyzed using simulation tools such as EnergyPlus, ESP-r, and TRNSYS [5]. Simulation with these tools requires extensive building characteristics, including geometric and non-geometric information [6]. On the other hand, the data-driven approach predicts energy usage based on historical data, employing statistical or machine learning algorithms [8]. Unlike the physical modeling approach, this method does not require a deep understanding of the building. This approach has gained significant popularity in the building energy sector because it allows prediction and estimation of energy consumption with limited building information [6]. Similarly, data-driven models can uncover complex relationships between various characteristics of buildings and energy consumption, which can be challenging to identify using traditional methods.

In recent years, researchers have implemented various data-driven approaches in building energy demand prediction. These approaches use historical data and employ statistical and machine learning (ML) algorithms to develop data-driven models [6,9–12]. Machine learning algorithms can be broadly classified into supervised and unsupervised learning techniques, with supervised learning further divided into regression and classification algorithms [13]. Supervised learning algorithms commonly used in building energy demand prediction include nearest neighbor, naive Bayes, rule induction, deep learning, Support Vector Machines (SVM), and neural networks [14,15,13]. On the other hand, unsupervised learning techniques are applied without any corresponding output variable for inputs [14]. Unsupervised learning algorithms commonly implemented in this domain include k-means clustering and association rules [16,11]. However, previous studies employing the data-driven methodology primarily concentrated on forecasting the energy consumption of individual buildings [17]. This limited focus is mainly due to the scarcity of high-quality and reliable data on a large scale. In addition, these studies have relied on only a few parameters to forecast the potential energy consumption of the building [18].

The novelty of this research lies in the integration of parametric simulations, ensemble-based machine learning approaches, and segregation methods to predict building energy performance at an urban scale using limited resources. Parametric simulation techniques can create synthetic data encompassing a wide range of relevant scenarios for stakeholders. This study implements ensemble-based machine learning algorithms to predict building energy performance on an urban scale by segregating end-use demands such as electricity, hot water, and heating. Furthermore, this research identifies the key building characteristics for each end-use demand prediction. The research additionally analyses the impact of retrofit measures and future stakeholder policies using historical and future weather data.

This paper is structured as follows. Section 2 provides an overview of the existing work done on the prediction of the energy performance of urban buildings. Section 3 outlines the methodology devised, including an explanation of the steps followed in the development of the machine learning model. The results of the Irish case study are presented in Section 4, followed by discussions of possible implications and improvements in the case study in Section 5. Section 6 presents conclusions, potential challenges, and future work.

2. Literature review

Urban building energy modeling can effectively analyze building energy performance and facilitate sustainable energy planning. The most common modeling approaches, such as physics-based or data-driven approaches, differ based on implementation and data requirements, as described in the following sections.

2.1. Physics-based urban building energy modeling

The physics-based urban building energy modeling approach, also referred to as the engineering or simulation approach, uses simulation techniques along with data related to building characteristics, construction, weather conditions, and data from heating-cooling systems to compute the consumption of end-use energy [19,20]. The physics-based
approach can simulate and estimate building energy usage or production on site, incorporating renewable energy technologies [13]. These models determine the end-use energy consumption of each building by type and rating using measurable data [7].

In the context of cities, the bottom-up archetype method has been widely used to analyze the overall impact of energy efficiency strategies and new technologies at a regional or national scale [5,21]. Each building archetype is modeled in the simulation engine to estimate energy consumption, with these estimates then scaled up to represent the regional or national building stock [22]. These approaches heavily rely on quantitative data obtained from building physics. These methods require various inputs, such as the thermal properties (U values) of the building components (walls, windows, roof, floor, doors), internal and external temperatures, heating system patterns, ventilation rates, appliance quantities, occupancy, schedules, and internal loads [7,6]. In addition, these models require numerous assumptions to establish the behavior of the occupants and a substantial amount of technical data to estimate energy consumption.

One of the most prominent projects, the City Building Energy Saver (CityBES), offers a platform for modeling and analyzing the thermal performance of different retrofit scenarios [23]. CityBES uses the EnergyPlus simulation engine to model buildings and analyze retrofits at the district or city scale [24]. Another project, CitySim, involves a decision support tool that assists energy planners and stakeholders in minimizing energy usage and emissions while incorporating various optimization and retrofit analyses [25]. Urban Modeling Interface (UMI) integrates the EnergyPlus and Daysim simulation engines and a Python module to assess the operational energy, daylighting, and walkability of urban buildings [26]. MIT's UBEM (Urban Building Energy Model) platform uses the EnergyPlus simulation engine to model approximately 83,541 buildings by integrating official GIS datasets and a custom building archetype library [27]. URBANopt (Urban Renewable Building And Neighborhood Optimization) provides an EnergyPlus and OpenStudio-based simulation software development kit (SDK) to simulate the energy performance of low-energy districts and campus-scale thermal and electrical analyses [28].

One of the significant challenges in modeling at an urban scale is the availability of both building geometric and non-geometric data. A few recent studies have focused on the generation of new building geometric data. UBEM.io, a novel web-based framework, automates the generation of urban-scale building geometries based on widely available inputs such as shapefiles, LiDAR, and tax assessor data [29]. Soroush et al. developed a detailed urban building energy model using the CityGML format for 3D urban geometry and employed spatial joining to incorporate the features required for archetype selection [30]. Ali et al. proposed urban building energy and microclimate modeling by generating 3D city models from sources such as Google Earth, Microsoft Footprints, and OpenStreetMap [31]. Irene et al. developed a modeling framework to assess the potential of creating energy communities by combining UBEM capabilities with the rooftops' potential for solar generation [32].

With increased data availability and more sophisticated modeling techniques, it has become crucial to devise a generalized UBEM framework and improve the existing work to facilitate the modeling and analysis of different use cases. Previous studies provide a limited view of the different building energy aspects in an urban setting. This stems mainly from the fact that simulating each building individually, along with their interdependencies, requires significant time and resources [33]. Furthermore, these methods usually deploy a physics-based simulation engine, which can be computationally demanding and time-consuming due to the intricate nature of urban systems.

Data-driven urban building energy modeling can address the aforementioned challenges by estimating building energy consumption using basic knowledge of the buildings' features. However, this approach still has research gaps, as discussed in the next section.

2.2. Data-driven urban building energy modeling

In urban energy modeling, a data-driven approach can predict and assess buildings' energy usage by considering various factors related to the characteristics of the buildings [7,19]. This approach is based on the analysis of existing data sources that include building stock datasets, billing data (such as electricity and gas consumption), survey data, and socioeconomic variables [7]. Data-driven urban energy modeling is conducted mainly using machine learning and statistical approaches. Recent studies on urban energy have increasingly focused on using machine learning algorithms over traditional statistical techniques [7].

Rahman et al. used deep recurrent neural networks to predict medium- to long-term electricity use in commercial and residential buildings [34]. Meanwhile, Kontokosta and Tull devised statistical models to determine the energy consumption of electricity and natural gas in more than a million buildings in New York City [35]. Feifeng et al. proposed a semi-supervised learning method for predicting energy use intensity (EUI) using 34,456 unlabeled samples [36]. Zhang et al. proposed a data-driven framework for the prediction of energy usage and greenhouse gas emissions, which considered various factors such as building characteristics, geometry, and urban morphology [37]. Similarly, Seo et al. developed a data-driven model to predict the energy demand for heating of 10,000 low-income households in South Korea [38]. Razak et al. developed a machine learning model that forecasts annual average energy use based on building design features in the initial development stages [18]. Ngo et al. used ensemble machine learning models to forecast building energy consumption over 24 hours [39]. Lastly, Wurm et al. developed a workflow for modeling the heat demand of building stock on an urban scale, using deep learning algorithms [40].

Although a significant amount of research has been conducted on predicting energy consumption in individual buildings using their specific characteristics, few studies have explored data-driven models for predicting energy consumption on a larger scale. The main challenge lies in the lack of high-quality data in sufficient quantities to train prediction models effectively. This underscores the need for a robust building energy modeling approach capable of accurately predicting the energy performance of entire building stocks, even when faced with limited resources for complex decision-making analysis. Furthermore, previous research on predicting building energy consumption has been limited by considering only a small set of parameters [18]. A few recent studies have started incorporating crucial factors such as U-values, HVAC systems, and renewable energy systems into their machine learning algorithms to better estimate energy performance in buildings [37]. However, only a few studies have specifically investigated the impact of parameters such as U-values, HVAC system types, and the presence of renewable energy systems on the estimation of the energy performance of buildings using machine learning algorithms [18,39–41].

Predicting the energy performance of buildings at an urban scale poses a significant challenge for urban planners and policymakers. The accurate prediction of energy consumption and the identification of opportunities for enhancing energy efficiency are crucial for fostering sustainable development in cities. There is significant potential to expand current research and establish a comprehensive methodology for data-driven building energy modeling on an urban level.

However, one major issue that arises in an urban context is the availability of data. Obtaining comprehensive and reliable data at an urban scale can be challenging, as it requires collecting and integrating information from multiple sources [4]. Addressing this issue is essential to enable effective energy planning and modeling techniques, empowering stakeholders to make informed decisions and drive positive change in urban energy management.

These findings highlight the importance of adopting a holistic approach to building energy modeling, considering all relevant factors, to accurately predict building energy performance and align with the

Fig. 1. Overarching methodology for urban building energy performance prediction using machine learning.
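
One concrete piece of the workflow in Fig. 1 is the sampling of parameter combinations for the parametric simulation step, for which the paper names Latin Hypercube Sampling (Section 3.3). The following is a minimal sketch of that idea using only the Python standard library, not the authors' implementation. The function name `latin_hypercube` and the three parameter ranges are hypothetical illustrations; they are not the 19 variables used in the study.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, for every parameter, each of the
    n_samples equal-width strata of its range is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of the unit interval ...
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... then shuffle so strata of different parameters pair up randomly.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose the per-parameter columns into one tuple per sample.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical parameter ranges, for illustration only.
bounds = [
    (0.1, 2.0),   # wall U-value
    (0.8, 5.0),   # window U-value
    (0.6, 0.95),  # heating system efficiency
]
samples = latin_hypercube(10, bounds)
```

Each tuple in `samples` could then be written into one simulation job. Compared with simple random sampling, the stratification guarantees that no part of a parameter's range is left unsampled, which matters when every sample costs a full building simulation.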

objectives of various stakeholders. Therefore, this research proposes a methodology that combines and harnesses the strengths of physics-based and data-driven approaches to accurately predict the energy performance of buildings on an urban scale. In the physics-based approach, parametric simulation methods are employed to generate synthetic data that encompass all possible scenarios relevant to stakeholders. Similarly, ensemble machine learning and end-use demand segregation methods are used in the data-driven approach instead of relying on a single model to achieve accurate predictions of building energy performance on an urban scale.

3. Methodology

This study proposes a novel methodology that uses supervised machine learning algorithms to predict building energy performance on a large scale. This research aims to identify the most effective model using physics-based and data-driven approaches. The prediction methodology for the energy performance of urban buildings involves five steps (Fig. 1).

1. The initial step involves collecting data from various sources such as building stock, census, weather, and geographical data.
2. The next step involves developing building archetypes using existing building stock data to identify representative baseline models.
3. The subsequent step focuses on parametric simulation to develop appropriate synthetic data.
4. The machine learning modeling step predicts building energy performance on a large scale using an ensemble or segregation method.
5. Finally, the urban building energy performance analysis step analyzes the modeling process results for planning and decision-making purposes.

3.1. Data collection

The data collection process involves gathering various inputs for urban building energy performance prediction using machine learning, including building stock data, weather information, census data, reports on energy policies, and construction data [5].

The building stock data are necessary for conducting physics-based simulations that encompass buildings' geometric and non-geometric data. This includes data such as building envelope specifications, shapes, number of floors, type of building, geometry, geographical position, and window opening ratios [42]. Typically, the geometric data required for building energy modeling is gathered from building stock and energy performance certificate databases and existing construction databases such as TABULA, EPISCOPE, and building typology databases [43].

Along with geometric data, non-geometric data, such as user occupancy patterns, equipment loads, HVAC systems, and usage patterns, also need to be modeled. One of the significant challenges in this regard is the availability of non-geometric building information on a large scale. Non-geometric building data can be obtained through the building archetypes approach, using available national census databases, statistical surveys, and energy performance certificate data.

Weather data sets are essential to accurately model energy use in building thermal simulations [44]. The most commonly used climate data sets, such as the typical meteorological year (TMY) data, have been available for a long time and describe the local climate [45]. Another helpful resource is the set of EnergyPlus Weather format (EPW) files, which can be accessed online for more than 3,034 locations. These files are arranged by region and country of the World Meteorological Organization. Furthermore, this study incorporates future weather files to assess the impact of weather conditions on retrofit measures under various climate scenarios, aiming to achieve the energy policy targets set by policymakers, such as those for 2030 or 2050. The sources of these future weather files can vary, including resources like Meteonorm, WeatherShift, and CCWorldWeatherGen [46].

Similarly, the modeling process relies on additional sources such as census data, reports on energy policies, and construction data. These sources offer valuable insights into demographic patterns, energy consumption trends, and infrastructure development, facilitating a more comprehensive analysis and meeting the requirements of urban systems.

3.2. Building archetypes development

Several buildings on an urban scale often share similar characteristics and can be classified into building archetypes. In the context of urban building energy simulation, a building archetype, also referred to as a reference building, is a representative model that captures the typical characteristics and performance of a specific category or group of buildings within a large building stock. The parametric simulation framework uses each building archetype as a baseline model. These data can be sourced from established national building stock databases, such as the TABULA or EPC databases [43]. Building archetypes or reference buildings serve as standardized models that simplify the simulation process by providing a baseline or template for analysis. They are typically developed based on existing data collection, statistical analysis, and empirical studies of buildings within the target building stock. Moreover, simulating any building archetype requires geometric and non-geometric data for each baseline model. These building archetypes are the starting point for parametric modeling of different buildings to develop a synthetic stock.

3.3. Parametric simulation

Parametric simulation provides an optimal solution, particularly when only sparse data sets are available for energy modeling. To execute complex parametric simulations involving multiple parameters, a parametric tool is used to perform numerous simulations using a Building Energy Performance Simulator (BEPS) model [47]. This study uses jEPlus as a parametric tool for energy simulations. Furthermore, jEPlus uses EnergyPlus for simulation and incorporates DesignBuilder construction templates to integrate diverse parameter values. Parametric simulation using EnergyPlus presents a robust approach to assess the energy performance of buildings and investigate various design alternatives. In the parametric simulation, EnergyPlus facilitates a systematic exploration of the design parameters, providing insights into their impact on energy consumption, comfort, and other performance metrics.

The selection of parametric features plays a crucial role in developing parametric simulation-based models and generating synthetic datasets. The accuracy of the building energy model is highly dependent on the careful selection of each parameter in this process. These parameter values, which encompass the necessary variations for synthetic data generation, can be obtained from literature surveys that are specific to the relevant climate environments [48,3].

In the parametric simulation process, various essential parameters are commonly used that include construction characteristics such as walls, windows, floors, roofs, internal gains, occupancy density, and heating or cooling systems. They all contribute to the overall energy performance assessment and are integral to the parametric simulation. By considering these parameters and their variations, parametric simulation enables the exploration of different design alternatives and their impact on energy consumption, comfort levels, and other performance metrics. It allows for a comprehensive evaluation of the energy efficiency of the building and helps to make decisions about design optimizations. Therefore, selecting the appropriate parameters and their values, based on literature surveys and specific climate environments, is crucial to creating accurate and representative synthetic datasets and ensuring the reliability of parametric simulation-based models.

However, dealing with the complexity of many parameters makes it nearly impossible to generate simulated data for all possible combinations. Sampling methods such as Simple Random Sampling (SRS) and Latin Hypercube Sampling (LHS) are used to generate synthetic data to address this challenge [49,50]. Simple Random Sampling (SRS) is a straightforward method in which each sample is randomly and independently selected from the population. On the other hand, Latin Hypercube Sampling (LHS) is a more advanced sampling method that aims to achieve a more uniform distribution of samples across the entire range of the data. LHS ensures that each parameter value combination is balanced, allowing for a more comprehensive design space exploration. These methods allow for generating representative synthetic datasets encompassing a range of parameter combinations, facilitating a more comprehensive analysis of design alternatives and optimizing energy modeling outcomes.

3.4. Machine learning modeling

This process involves formulating machine learning models to estimate the building energy performance (Fig. 2). Synthetic building stock data, generated from the parametric simulation step, is intended to serve as input for the development of machine learning models.

Fig. 2. Process of machine learning modeling to predict Energy Use Intensity (EUI) using machine learning models.

3.4.1. Data preprocessing

The process begins with data preprocessing, during which inconsistencies within the dataset are identified and eliminated before the data are used for further analysis and model development.

3.4.2. Data splitting

The preprocessed data are divided into two subsets to ensure optimal training of the model: a training dataset used for training the model and a test dataset for evaluating the performance of the trained model. Two standard techniques for data splitting are random data splitting and cross-validation.

Random data splitting is a straightforward method in which data is randomly divided into training and testing datasets, typically in an 80-20% split ratio. However, this method may cause problems with uneven data distribution, and an incorrect selection of training and testing datasets can also adversely affect the machine learning model's performance [51]. On the other hand, cross-validation is a more sophisticated method that is often used to strike a balance between minimal bias and variance in the trained model. This study adopts the k-fold cross-validation algorithm for data splitting to prevent overfitting or underfitting the model.

3.4.3. Non-segregation models development

This paper implements and compares three different machine learning model approaches to predict building energy performance, namely: the single model approach, end-use demand segregation method, and ensemble-based segregation method. In the single model approach, also referred to as the "non-segregation" method, this study conducts a comparative analysis of various machine learning algorithms, assessing their predictive accuracy, efficiency, and suitability for building energy performance modeling. Over recent years, machine learning models have garnered considerable attention in data-driven modeling. Among the most frequently used models are Linear Regression (LR), Neural Network (NN), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), Gradient Boosting (GB), and Support Vector Regression (SVR) [7]. Some of the popular implementations of gradient boosting include XGBoost (Extreme Gradient Boosting), Histogram-Based Gradient Boosting (HGB), and LGBM (Light Gradient Boosted Machine). These algorithms have demonstrated exceptional performance in energy forecasting and prediction, particularly in the context of energy modeling, due to their extensive use and success in previous studies [17,11]. By


Fig. 3. Methodology for end-use demand segregation modeling to predict Energy Use Intensity (EUI) using machine learning.

assessing the effectiveness of these models, this study aims to discern the most efficient approach to predict building energy performance using machine learning techniques.

3.4.4. End-use demand segregation models development

End-use demand segregation methods use different machine learning models to predict each end-use demand. This strategy diverges from the traditional approach of employing a single machine learning model and aims to achieve superior predictive performance (Fig. 3). The workflow includes developing distinct regression machine learning models for each end-use demand, such as heating, cooling, lighting, and hot water. The predictions of these end-use demands are aggregated to calculate the final energy performance of the building, measured in terms of Energy Use Intensity (EUI). The prediction for each end-use demand is multiplied by its corresponding primary energy factor. The resulting values for heating, cooling, equipment, lighting, and hot water are then aggregated, and photovoltaic energy generation is deducted from them to calculate the total energy consumption of the building. This cumulative total is then divided by the building area to calculate the EUI, a measure of the energy performance of the building, as defined in Equation (1). Finally, the EUI is classified into an Energy Performance Certificate (EPC) label or rating.

\[
EUI = \frac{(E_{heating} \times PEF_{heating}) + (E_{cooling} \times PEF_{cooling})}{A_{total}} + \frac{(E_{lighting} \times PEF_{lighting}) + (E_{equipment} \times PEF_{equipment})}{A_{total}} + \frac{(E_{hotwater} \times PEF_{hotwater}) - (E_{PV} \times PEF_{PV})}{A_{total}} \tag{1}
\]

where $E_{heating}$, $E_{cooling}$, $E_{lighting}$, $E_{equipment}$, $E_{hotwater}$, and $E_{PV}$ represent the energy consumption (or generation for $E_{PV}$) for each respective category in kilowatt hours per year (kWh/year). $PEF_{heating}$, $PEF_{cooling}$, $PEF_{lighting}$, $PEF_{equipment}$, $PEF_{hotwater}$, and $PEF_{PV}$ are the primary energy factors (PEFs) for each respective category. $A_{total}$ represents the total floor area of the building in square meters (m²).

3.4.5. Ensemble and segregation models development

The workflow further implements ensemble machine learning methods to test multiple learning algorithms and obtain better predictive performance. Ensemble techniques are commonly used in machine learning to enhance model accuracy by mitigating overfitting and increasing generalizability. By leveraging the complementary strengths of multiple models, ensemble learning provides more stable predictions and improves accuracy compared to the conventional approach of using a single model. Ensemble learning techniques differ mainly in the kind of model, data sampling, and decision function, and can be broadly classified into stacking and voting techniques.

The stacking method, also known as stacked generalization, was introduced by Wolpert [52]. The goal is to reduce the generalization error of different machine learning models. The final meta-model is trained on the predictions of n machine learning base models, obtained through the k-fold cross-validation technique. The voting ensemble method, on the other hand, is one of the most intuitive and easy to understand. It comprises n machine learning models, and the final prediction is the one with “the most votes” or the highest weighted and averaged probability. Generally, ensemble learning techniques combine several of the best-performing machine learning models. The study implements a stacking-based ensemble method to predict each end-use demand, enhancing model accuracy in predicting building energy performance. This method combines predictions from multiple models by training another model to consolidate their outputs, often resulting in more accurate and robust predictions compared to the voting ensemble method (Fig. 4).

Fig. 4. Methodology for ensemble machine learning modeling approach for enhanced predictive performance in machine learning models.

3.4.6. Models performance

To evaluate the effectiveness of machine learning models, commonly used performance indices such as R-Squared ($R^2$), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) are employed ([7,11]). The model with the lowest RMSE and MAE values and the $R^2$ value nearest to 1 is deemed superior among all models. Finally, in order to assess the model’s accuracy, the predicted value of EUI (expressed in kWh/(m²·year)) is transformed into an Energy Performance Certificate (EPC) label or rating. Furthermore, precision and recall are crucial metrics used for a detailed analysis of each class. Precision assesses the accuracy of positive predictions made by the model, whereas recall quantifies the model’s capability to detect all positive instances within the dataset [3].

3.4.7. End-use features extraction

The final step of this process is to find the importance of features for each end-use demand using the developed machine learning model. Feature importance refers to the relevance or contribution of individual features in a machine learning model to making accurate predictions. It helps in understanding which features have the most significant impact on the model’s predictions.

One popular method for calculating feature importance is SHAP (SHapley Additive exPlanations). SHAP values provide a unified measure of feature importance by considering the contribution of each feature value to the prediction for a specific instance while also accounting for interactions between features. By using SHAP values, we can gain insight into which features impact the model’s predictions the most. This information can be valuable for understanding the underlying relationships in the data and identifying the key drivers or factors that influence the target variable.

3.5. Urban building energy performance analysis

In the final phase of the methodology, the developed machine learning model predicts the energy performance of the entire building stock. The availability of comprehensive building stock data can help stakeholders analyze the building stock at an urban scale and successfully implement sustainable energy policies. Furthermore, the developed model can be applied to practical application scenarios, such as implementing and evaluating proposed retrofit measures as part of national-level policy decisions. These measures, often proposed at the national level, aim to improve the energy performance of existing buildings through modifications and improvements. For example, this could include installing heat pumps or integrating renewable energy systems like solar panels. The proposed models can evaluate their impact before implementation and identify potential energy savings. This predictive capability reduces the risk of implementing ineffective or inefficient measures, ensuring that resources are used optimally. It also helps fine-tune such measures to better fit the specific needs and constraints of the building stock.

In general, the developed model offers a holistic approach to urban-scale energy management and policy implementation, creating a more sustainable built environment. Using modeling outcomes, stakeholders can navigate the complexities of urban building stock analysis and energy policy implementation, even without extensive knowledge of building dynamics. This empowers policymakers and stakeholders alike to make informed decisions when retrofitting existing building stock to improve energy efficiency and mitigate environmental impact.

4. Case study

The primary objective of this case study is to test the proposed methodology by calculating the energy performance of Ireland’s residential building stock. This methodology seamlessly integrates a data-driven approach with parametric simulation modeling to predict the energy performance of buildings on an urban scale. This case study follows the same structure as the proposed methodology discussed in the previous section, with subsequent subsections following the same order.

4.1. Data collection

Collecting urban-scale building stock data is challenging as individual building information is often unavailable [4]. The data collection process involves acquiring raw building data from various sources to implement the proposed methodology, including building stock datasets, building census datasets, weather data, and data from energy policymakers’ reports (see Table 1).

In Ireland, building stock data are available as Energy Performance Certificates (EPCs) maintained by the Sustainable Energy Authority of Ireland (SEAI). The EPC (also called the Building Energy Rating (BER) certificate) dataset of the Irish residential stock represents the measured building stock and comprises more than 200 building characteristics. These features include building fabric, heating systems, estimated end-use, CO2 emissions, and estimated delivered and primary energy consumption. Each entry in the Irish EPC dataset contains an energy rating for the respective building, ranking its energy performance on a graded scale from A1 to G based on the estimated energy consumption per square meter per year [53]. In 2023, the Irish EPC dataset contained approximately 1,126,817 residential buildings, with a significant proportion of building ratings within the range of C1 to D2 (Fig. 5). The most common types of buildings in the dataset are semi-detached and detached houses.

The Irish census, conducted every five years by the Central Statistics Office (CSO), collects various data points on the building where the respondent resides. Therefore, the census provides the number of buildings in each geographic area [56]. According to the CSO 2022 dataset, Ireland has approximately 1,841,152 residential buildings. Similarly, the GeoDirectory database provides statistical and geographical information on Ireland’s entire building stock [54]. The Q4 2022 GeoDirectory report, published by An Post (Irish Postal Service) and Ordnance Survey Ireland, comprises geocoded addresses of 2,100,905 residential buildings in Ireland. Detached dwellings remained the most prevalent type of residence (30.7% of the national total), followed by terraced dwellings (28.2%) and semi-detached dwellings (24.7%). This study focuses on Dublin City in Ireland and the Dublin EPC dataset, which includes 339,494 of the 624,758 residential buildings, representing the highest proportion of the entire Irish building stock. This suggests that


Table 1
Building data requirements and associated data sources for Irish case study.

Data Type                      Case Study Data Source           Publisher
Building Stock                 Irish EPC (BER) Database [53]    SEAI
Geographic data                GeoDirectory [54,55]             An Post / Ordnance Survey Ireland
Census                         Irish Census database [56]       Central Statistics Office
Weather                        Dublin EPW File [45]             EnergyPlus and Meteonorm
Energy policymakers’ Reports   Irish Climate Action Plan [57]   Government of Ireland

Fig. 5. Irish EPC building energy rating chart used to determine building energy performance, percentage of total EPC vs. Non-EPC residential buildings.

Fig. 6. 3D geometry of Irish residential building archetypes for energy parametric simulation [44,48].

EPC data are available for only approximately 54% of the residential building stock of Dublin City ([53]). This study employs machine learning algorithms to predict the energy rating of the remaining 46% of the stock using limited variables (Fig. 5). Furthermore, the weather data for Dublin are obtained from the default EnergyPlus dataset, which includes historical data and also incorporates future weather files for 2030 by Meteonorm. This allows us to assess the impact of weather conditions on retrofit measures in various climate scenarios.

Similarly, energy policy reports are necessary to explore future scenarios. Irish national reports, such as the Climate Action Plan 2023, are used to test scenarios in this case study. This provides valuable insight into future plans and strategies for Irish residential buildings. These reports outline the goals and roadmaps set by policymakers to address climate change, reduce greenhouse gas emissions, and improve energy efficiency in the residential sector [57].

4.2. Building archetypes development

The parametric simulation framework uses each building archetype as a baseline model. In this case study, four building types are considered as archetypes of the Irish residential building stock [44]. These types are selected to represent the primary variations of building types based on data from the CSO, Irish EPC, and GeoDirectory datasets. These building archetypes serve as the starting point for the parametric modeling of different buildings, helping to develop a synthetic stock representation. The same four types of residential buildings also exist in the GeoDirectory database, namely terraced houses, detached houses, semi-detached houses, and bungalows (Fig. 6).

Building archetypes require both geometric and non-geometric data to model each baseline model. The initial step involves identifying the non-geometric and geometric parameters associated with the existing building stock of Dublin. This information is essential for performing a parametric simulation using the archetypes. Geometric information collected from various types of Irish buildings is based on existing studies and Irish building regulations guidelines. Non-geometric parameters, however, are determined using current building energy performance databases and literature surveys. For example, the Irish EPC provides values for essential building physics parameters, such as U-values for walls, roofs, floors, and windows, along with their respective ranges. Other relevant non-geometric parameters that impact the energy performance of the Irish building stock have been identified based on previous research [44,48]. The geometric and non-geometric parameters of baseline archetypes with default values used for the Irish case study are shown in Table 2 [44,48,62,63].

Table 2
Geometric and non-geometric parameters of baseline archetypes used in the Irish case study.

Geometric Parameters (Default Model Values)
Parameters                             Unit        Terraced      Detached      Semi-detached   Bungalow
Total Floor Area                       m²          91.66         130.81        107.69          85.91
Net Conditioned Area                   m²          91.66         130.81        107.69          85.91
Gross Roof Area                        m²          65.66         115.68        81.76           130.43
Window to Wall Ratio on NWSE facades   %           0.4/0/0.4/0   0/0.5/0/0.5   0.4/0/0.4/0     0.4/0/0.4/0
Number of Stories (Height 2.7 meters)  Numeric     2             2             2               1
Number of Zones                        Numeric     10            13            10              8
Orientation                            degree      0             90            0               0

Non-Geometric Parameters (Default Model Values)
Parameters                             Unit        Terraced      Detached      Semi-detached   Bungalow
Wall U-value                           W/(m²·K)    0.5           0.5           0.5             0.5
Window U-value                         W/(m²·K)    3             3             3               3
Floor U-value                          W/(m²·K)    0.5           0.58          0.5             0.58
Roof U-value                           W/(m²·K)    0.33          0.33          0.33            0.33
Door U-value                           W/(m²·K)    2.041         2.041         2.041           2.041
Lighting Density                       W/m²        2.92          2.95          2.92            3.025
Occupancy                              Person      3             4             3               4
Equipment Density                      W/m²        1.47          1.61          1.47            1.56
Heating setpoint                       °C          21            21            21              21
Heating setback                        °C          12            12            12              12
HVAC Efficiency/COP                    Numeric     0.8           0.8           0.8             0.8
DHW                                    l/m²/day    1.5           1.5           1.5             1.5
ACH                                    Numeric     0.94          0.87          0.94            0.74
Renewables                             W           2400          2400          2400            2400

4.3. Parametric simulation

The selection of parametric features is pivotal in developing physics-based models based on parametric simulation and generating synthetic datasets after the archetype development process. The accuracy of the building energy model relies on the careful selection of each input and output parameter in this process. These parameter values embody the necessary variations for synthetic data generation. In this study, 19 input parameters are used to simulate Irish residential building archetypes. The selection of these parameters is based on existing studies on residential buildings [48,3]. However, these previous studies do not include certain advanced features. Therefore, several additional parameters, including HVAC systems, are incorporated to conduct a complete analysis of HVAC systems, primary heating factors, and renewable parameters (Table 3). Furthermore, this study employed a building feature reduction approach by integrating DesignBuilder construction templates and reducing the number of dependent features. For instance, building elements require material features such as thickness, conductivity, density, and specific heat. In this study, existing templates were used, and U-values were used to represent these features. This approach ultimately reduces the number of parameters required as inputs to the UBEM and further reduces the model computing time by eliminating dependent parameters.

Table 3
Parameters needed for parametric simulation of archetypes.

No    Parameters              Unit          Minimum    Maximum    Source
P1    Building type           Categorical   Semi Detached, Detached, House, Terrace, Bungalow    [53]
P2    Location                Categorical   Dublin                                               [56]
P3    Weather                 Categorical   Historical, 2030                                     EPW
P4    Wall U-value            W/(m²·K)      0.09       2.4        [48,58]
P5    Window U-value          W/(m²·K)      0.73       5.7        [48,58]
P6    Floor U-value           W/(m²·K)      0.15       1.23       [48,58]
P7    Roof U-value            W/(m²·K)      0.07       2.3        [48,58]
P8    Door U-value            W/(m²·K)      0.81       5.9        [48,58]
P9    Orientation             degree        0          315        [48,58]
P10   Lighting density        W/m²          1          9          [48,58]
P11   Occupancy               Person        1          6          [56]
P12   Equipment density       W/m²          1          21         [48,58]
P13   Heating setpoint        °C            18         23         [48,58]
P14   Heating setback         °C            10         14         [48,58]
P15   HVAC efficiency or COP  0.45 to 4     0.3        4.5        [53]
P16   Domestic hot water      l/m²/day      0.5        3.5        [48,58]
P17   Air changes per hour    Numeric       0.35       3          [59,53]
P18   Window-to-wall ratio    %             30         70         [48,58]
P19   Renewables              W             Yes/No                [53]

One of the primary output parameters in this study is the Energy Use Intensity (EUI), also referred to as the final primary energy use per building’s total floor area per year, measured in kWh/(m²·year). Irish EPC data provide information on building energy performance or certificate ratings in terms of EUI (kWh/(m²·year)), which is further interpreted on an A1 to G rating scale. An A1-rated building demonstrates the highest level of energy efficiency, typically associated with the lowest energy consumption and CO2 emissions. On the other hand, a building with a G rating represents the least energy-efficient rating (Fig. 5). Furthermore, this study focuses on the end-use demand segregation method to calculate the Energy Use Intensity. Therefore, each


Fig. 7. Distribution of 1 million residential buildings synthetic data in terms of the Irish building energy rating labels.
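The Latin hypercube sampling used to generate the synthetic stock can be illustrated with a minimal standard-library sketch. It is not the jEPlus implementation used in the study: the function name `latin_hypercube`, the seed, and the choice of three parameters are assumptions made for illustration, while the ranges are taken from Table 3 (P4, P5, P17).

```python
import random


def latin_hypercube(bounds, n_samples, seed=0):
    """Draw n_samples points so that, for every parameter, each of the
    n_samples equal-width strata of its range is used exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniformly random point inside each stratum of this parameter.
        points = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in bounds}
            for i in range(n_samples)]


# Ranges from Table 3; the selection of these three parameters is illustrative.
BOUNDS = {
    "wall_u_value": (0.09, 2.4),
    "window_u_value": (0.73, 5.7),
    "air_changes_per_hour": (0.35, 3.0),
}

for sample in latin_hypercube(BOUNDS, n_samples=5):
    print({name: round(value, 3) for name, value in sample.items()})
```

Unlike simple random sampling, every stratum of every parameter is represented once, which is why a comparatively small sample can still cover the full range of each input.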

Table 4
Comparative analysis of machine learning models to predict end-use demand in kWh/yr using RMSE metrics.

Models   Heating    Interior Lighting   Interior Equipment   Photovoltaic Power   Water Systems
XGB      683.17     0                   0                    0.02                 0
LGBM     801.69     0                   0                    0                    0
HGB      1256.58    0.02                0.06                 0.06                 0.21
GB       2809.86    67.72               193.83               16                   13.56
RF       1613.23    0                   0                    0                    0
NN       3400.93    1.01                18.94                6.85                 16.78
DT       2430.7     0                   0                    0                    0
LR       5162.23    181.24              546.94               172.78               6440.26
KNN      5106.97    175.26              483.64               310.06               5629.45
SVM      7330.98    192.2               575.37               175.56               7976.76

end-use demand, including heating, lighting, equipment, photovoltaic, and hot water, is considered an output parameter in the parametric simulation process.

This study employs jEPlus as a parametric tool for physics-based parametric simulation. jEPlus uses the capabilities of EnergyPlus for thermal simulation and integrates DesignBuilder construction templates to incorporate diverse parameter values. A sample of 1 million buildings is generated using the Latin hypercube sampling (LHS) method to construct a reliable machine learning model. This sampling process ensures that the resulting distribution covers all energy rating data for Irish buildings (Fig. 7).

4.4. Machine learning modeling

This process involves formulating an urban-scale building energy performance machine learning model. The process begins with the synthetic building stock data generated in the previous step, which are preprocessed to remove outliers and improve the quality of the dataset before implementing machine learning models. Subsequently, the data are divided into two subsets to create training and testing datasets. This study uses a 10-fold cross-validation method during data division to mitigate the risk of overfitting, rather than using a random data selection for training and testing.

Ten different machine learning algorithms are analyzed to assess their ability to predict building energy performance (EUI) from a given dataset. These regression algorithms have shown exceptional performance in energy forecasting and prediction, particularly within the context of energy modeling ([17,11,7]). The algorithms include XGBoost (XGB), LightGBM (LGBM), Gradient Boosting (GB), Histogram-based Gradient Boosting (HGB), Random Forest (RF), Neural Network (NN), Decision Tree (DT), Linear Regression (LR), K-Nearest Neighbors (KNN) and Support Vector Machine (SVM). The performance of each developed model is evaluated using metrics such as R-Squared ($R^2$), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). A model is considered superior if it achieves values closer to zero for RMSE and MAE and a value close to one for $R^2$. The target feature is EUI, which is predicted using regression models as the measure of building energy performance. Furthermore, the final predicted EUI is also converted into an energy rating based on the Irish EPC rating (Fig. 5). Finally, the model’s performance is further tested using an accuracy estimation of the energy rating, with the model producing the highest accuracy being considered the best learning model.

This study conducts a comparative analysis of the three machine learning approaches proposed in this research to evaluate which one is best suited for predicting building energy performance. These approaches include the single-model approach (non-segregation method), the end-use demand segregation method, and the ensemble-based segregation method. In the non-segregation method, EUI is predicted using all ten machine learning models. The workflow then develops learning models using the segregation method for each end-use demand, namely heating, interior lighting, interior equipment, photovoltaic power, and water systems. The process implemented and tested ten machine learning models for each end-use demand (Table 4). The results show that the XGB model showed the best performance in predicting the demand for heating with an RMSE of 683.17. For interior lighting, interior equipment, photovoltaic power and water systems, the XGB,


Table 5
List of important features with rank that affect end-use demand machine learning models using SHAP method.

Rank   Heating                Lighting           Equipment           Photovoltaic   Water Systems
1      Air changes per hour   Lighting density   Equipment density   Renewables     Building type
2      Heating setpoint       Building type      Building type       Orientation    Domestic hot water
3      Wall U-value                                                  Weather
4      Building type
5      Occupancy
6      Window U-value
7      Equipment density
8      Weather
9      Roof U-value
10     Lighting density
11     Heating setback
12     Floor U-value
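The idea behind the SHAP ranking in Table 5 can be illustrated by computing exact Shapley values for a very small model. The sketch below enumerates all feature orderings, which is only feasible for a handful of features; it does not use the SHAP library, and `toy_heating_demand` with its coefficients is a hypothetical stand-in rather than the trained heating model. The instance and baseline values are borrowed from the Table 2 defaults and Table 3 minima.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline).

    Features outside a coalition keep their baseline value. The cost grows
    factorially with the number of features, so this suits tiny examples only.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_heating_demand(x):
    # Hypothetical surrogate with one interaction term (setpoint x wall U-value).
    return (1500.0 * x["air_changes_per_hour"]
            + 200.0 * (x["heating_setpoint"] - 18.0) * (1.0 + x["wall_u_value"])
            + 800.0 * x["wall_u_value"])


baseline = {"air_changes_per_hour": 0.35, "heating_setpoint": 18.0, "wall_u_value": 0.09}
instance = {"air_changes_per_hour": 0.94, "heating_setpoint": 21.0, "wall_u_value": 0.5}

phi = shapley_values(toy_heating_demand, instance, baseline)
for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {value:.1f}")
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared between the two features involved. Ranking features by the mean absolute attribution over many instances gives an ordering of the kind reported in Table 5.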

LGBM, RF and DT models reported an RMSE at or near 0, indicating excellent performance.

In addition, models such as LR, KNN, and SVM exhibited relatively higher root mean square errors (RMSE) in all categories, indicating less accurate predictions. The results demonstrate that the RMSE for most end-use demands is nearly 0. This can be attributed to the fact that end-use demands calculated in EnergyPlus are derived using static calculations, meaning that values are determined based on fixed parameters and equations without accounting for variability or randomness. Therefore, machine learning models can easily learn and map these fixed relationships between input features and end-use demands, resulting in a near-perfect fit to the data. Furthermore, the SHAP method is employed to gain further insight into the main features that affect the model output (Table 5). The findings reveal significant factors that affect energy consumption in buildings. The rate of air changes per hour emerged as the most influential feature, highlighting the importance of ventilation in determining heating demand. The heating setpoint and wall U-value also ranked high, underscoring the importance of temperature control and insulation in regulating energy usage. The type of building appeared consistently throughout the ranking, indicating its substantial influence on overall energy demand and usage patterns. The relevance of orientation and weather in photovoltaic power generation emphasizes the need to consider building direction for optimal energy production. These results provide valuable information for stakeholders to understand these critical features and design effective strategies aimed at reducing energy consumption, improving energy efficiency, and promoting sustainability in the built environment.

Finally, the prediction of each end-use demand is multiplied by its respective Irish primary energy factor, and these values are then summed to determine the total energy consumption of the building. This cumulative total is then divided by the area of the building to calculate the EUI, a measure of the energy performance of the building. The results illustrate the significant improvement in the performance of various machine learning models in predicting EUI when the segregation method is applied (Fig. 8). Firstly, in the non-segregation scenario, the XGB model demonstrates the best performance on all metrics, with an RMSE of 13.89, an MAE of 9.72, and an accuracy of 76% in terms of building rating. LGBM follows closely in performance. However, as we move down the table, the performance degrades, with the SVM having an RMSE of 71.96, MAE of 50.98, R-squared of 0.76 and accuracy of 29%. This suggests that gradient boosting models, such as XGB and LGBM, are better suited for the non-segregation problem.

Secondly, when considering the EUI segregation scenario, there is a notable enhancement in the performance of several models. Specifically, the XGB and LGBM models excel with good R-squared values and substantially lower RMSE and MAE values compared to those without the segregation method. These models achieve substantially higher accuracy, with XGB reaching 89% and LGBM reaching 87%. This signifies that segregation could efficiently capture the underlying data patterns, aiding these models in making more precise predictions. However, it is essential to note that some models, such as NN, LR, KNN, and SVM, continue to demonstrate suboptimal performance even in the segregation scenario. The Neural Network (NN) model shows relatively less improvement compared to other models, which might suggest that it does not benefit as much from segregation in this particular context. The poor performance of SVM persisted even with segregation, indicating that this model might not be suitable for this dataset irrespective of the data processing method.

These results indicate that incorporating segregation in the analysis improves the performance of most models, particularly XGB, LGBM, and HGB. These findings highlight the importance of considering segregation in the machine learning process to obtain more accurate predictions for EUI values and emphasize the potential for future research to explore novel approaches to improve the performance of models that are lagging.

The modeling process is further improved using ensemble learning techniques to combine the best-performing models (XGB, LGBM, and HGB). By comparing the results of these approaches, this study seeks to identify the most effective approach for predicting building energy performance using machine learning techniques.

These results highlight the importance of EUI segregation and the effectiveness of ensemble modeling in improving the accuracy of end-use demand prediction (Table 6). In the non-segregation method, the XGB model achieved an RMSE of 13.89, with an accuracy of 76%. In contrast, the XGB model with the segregation method results in a significantly lower RMSE of 7.69, indicating reduced prediction errors compared to the previous method. The accuracy improves to 89%, suggesting more accurate predictions in most cases. Finally, the ensemble-based segregation approach, combining the XGB, LGBM, and HGB models, achieves the lowest RMSE of 6.48, demonstrating a further reduction in prediction errors compared to the previous methods. Accuracy reaches 91%, indicating a higher level of correct predictions than the other methods. The confusion matrix shows that the model performs well with all energy ratings of the building (Fig. 9). The findings suggest that the combination of models can enhance prediction capabilities and provide more reliable estimates for decision-making processes.

4.5. Urban building energy performance analysis

In the urban building energy performance analysis phase, the developed model is applied to practical application scenarios, implementing retrofit measures outlined in Ireland’s National Climate Action Plan 2023. The objective is to retrofit existing residential buildings with below B2 ratings and install heat pumps. Two different scenarios are developed, improving the U-values of windows, walls and roofs as recommended by Part L of the Irish Building Regulations and upgrading the HVAC system from a boiler to a heat pump. Additionally, the scenarios include options with and without renewables (Table 7).
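The scenario definition above can be sketched as a transformation of the model inputs. All numeric targets below (`FABRIC_TARGETS`, `HEAT_PUMP_COP`) are hypothetical placeholders, not the values used in the study, and treating an HVAC efficiency below 1 as a boiler is likewise an assumption made for illustration. The field names follow the parameters in Table 3, and the 2400 W renewable capacity is the default from Table 2.

```python
# Ratings that already meet the B2 threshold and therefore need no retrofit.
AT_OR_ABOVE_B2 = {"A1", "A2", "A3", "B1", "B2"}

# Hypothetical upgrade targets for illustration only.
FABRIC_TARGETS = {"wall_u_value": 0.35, "roof_u_value": 0.25, "window_u_value": 1.4}
HEAT_PUMP_COP = 3.0
PV_CAPACITY_W = 2400  # default renewable capacity listed in Table 2


def needs_retrofit(building):
    """Select buildings rated below B2 that are heated by a boiler (assumed COP < 1)."""
    return building["rating"] not in AT_OR_ABOVE_B2 and building["hvac_cop"] < 1.0


def apply_retrofit(building, with_renewables):
    """Return upgraded model inputs; the original record is left unchanged."""
    upgraded = dict(building)
    for key, target in FABRIC_TARGETS.items():
        # Never worsen an element that already beats the target.
        upgraded[key] = min(upgraded[key], target)
    upgraded["hvac_cop"] = max(upgraded["hvac_cop"], HEAT_PUMP_COP)
    if with_renewables:
        upgraded["renewables_w"] = max(upgraded.get("renewables_w", 0), PV_CAPACITY_W)
    return upgraded


example = {"rating": "D1", "hvac_cop": 0.8, "wall_u_value": 0.5,
           "roof_u_value": 0.2, "window_u_value": 3.0}
if needs_retrofit(example):
    print(apply_retrofit(example, with_renewables=False))  # fabric + heat pump
    print(apply_retrofit(example, with_renewables=True))   # plus renewables
```

The upgraded records would then be passed to the trained end-use models to obtain the post-retrofit EUI and rating for each building.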


Fig. 8. Comparative analysis of RMSE and accuracy of machine learning models with and without the end-use demand segregation method to predict EUI.

Table 6
Comparative analysis of method and machine learning models for predicting EUI using model performance metrics.

Methods                Models           RMSE    MAE    R-squared   Accuracy
Non-Segregation        XGB              13.89   9.72   0.99        76%
Segregation            XGB              7.69    4.67   1           89%
Ensemble Segregation   XGB, LGBM, HGB   6.48    3.9    1           91%
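The quantities reported in Table 6 can be reproduced in outline with a short standard-library sketch: aggregating end-use predictions into an EUI following Equation (1), mapping the EUI to a rating band, and computing RMSE, MAE, R-squared and rating accuracy. The band limits in `RATING_BANDS` are assumed here and should be checked against the SEAI scale shown in Fig. 5. The primary energy factors and demand values are hypothetical, and all function names are illustrative.

```python
import math

# Assumed upper band limits in kWh/(m2*year); verify against the SEAI BER scale.
RATING_BANDS = [(25, "A1"), (50, "A2"), (75, "A3"), (100, "B1"), (125, "B2"),
                (150, "B3"), (175, "C1"), (200, "C2"), (225, "C3"), (260, "D1"),
                (300, "D2"), (340, "E1"), (380, "E2"), (450, "F")]


def eui(end_use_kwh, pef, pv_kwh, pef_pv, area_m2):
    """Equation (1): PEF-weighted end uses minus PV generation, per m2."""
    primary = sum(end_use_kwh[name] * pef[name] for name in end_use_kwh)
    return (primary - pv_kwh * pef_pv) / area_m2


def rating(eui_value):
    """Return the first band whose upper limit the EUI does not exceed."""
    for upper, label in RATING_BANDS:
        if eui_value <= upper:
            return label
    return "G"


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rating_accuracy(y_true, y_pred):
    """Share of buildings whose predicted EUI falls in the true rating band."""
    hits = sum(rating(t) == rating(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical per-end-use predictions (kWh/year) and primary energy factors.
predicted = {"heating": 10000.0, "lighting": 500.0, "equipment": 800.0, "hotwater": 2000.0}
factors = {"heating": 1.1, "lighting": 2.0, "equipment": 2.0, "hotwater": 2.0}
example_eui = eui(predicted, factors, pv_kwh=1000.0, pef_pv=2.0, area_m2=100.0)
print(round(example_eui, 2), rating(example_eui))

y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 200.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred),
      r_squared(y_true, y_pred), rating_accuracy(y_true, y_pred))
```

Because the rating is a banded version of the EUI, a model can have an R-squared close to 1 and still miss the band for buildings near a band limit, which is consistent with the gap between the R-squared and accuracy columns in Table 6.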

Both retrofit scenarios are applied to a dataset of 10,000 buildings with ratings below B2 and boilers as the HVAC system. This dataset size of 10,000 buildings allows for a sufficiently large sample to analyze and apply retrofit scenarios effectively, covering all inefficient building ratings from B3 to G. In general, there is a significant improvement in the distribution of energy ratings in buildings. Furthermore, implementing both retrofit scenarios in sample buildings resulted in a notable improvement, as indicated by the change in the distribution curve from lower energy ratings to higher ones (Fig. 10). However, the results indicate that in Scenario I, where the heat pumps are installed with windows, walls, and roofs refurbished, only 2,725 buildings achieved a rating of B2 and above.

In contrast, Scenario II, which included renewable installations, showed a slight improvement, with 3,467 buildings reaching higher ratings. These results demonstrate that both scenarios lift only a relatively small share of buildings, between 27% and 34%, to a rating of B2 or above. This highlights the need for deeper retrofitting measures, including heat pumps and renewables, to achieve higher ratings (Fig. 10).

The results are further examined using historical and future weather conditions, using a weather file for the year 2030. The emission scenarios considered in this study are based on a Representative Concentration Pathway (RCP), which is a greenhouse gas concentration trajectory adopted by the IPCC [60]. The 2030 weather file is based on RCP 4.5, described by the IPCC as an intermediate scenario and the most probable baseline scenario, considering the exhaustible nature of non-renewable fuels. The study shows no significant differences when using the future weather file. However, due to global warming and projected average temperature increases of 1–1.6 °C, heating demand is expected to decrease in the future, potentially leading to an improvement in building energy ratings [61]. Furthermore, the rating distribution for buildings is expected to change, primarily through using photovoltaics as renewable energy sources (Fig. 11).

The results demonstrate that the proposed methodology helps urban planners, energy policymakers, utility planners, and manufacturers in evaluating the implementation of retrofit measures on a large scale. Additionally, this case study highlights that fabric renovation in buildings is insufficient as a standalone solution. In conjunction with the installation of the heat pump, it is crucial to address other factors such as the airtightness of the building and the control of the heating to effectively improve the energy performance of the building, as evidenced by the importance of these characteristics.
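
The shares quoted for the two retrofit scenarios follow directly from the building counts. The short calculation below is a sketch that reproduces them from the numbers reported in the text (10,000 sample buildings, 2,725 and 3,467 reaching B2 or above); the variable names are illustrative.

```python
SAMPLE_SIZE = 10_000  # buildings rated below B2 with boilers (from the text)

# Buildings reaching B2 or above after each scenario (from the text).
reached_b2 = {"Scenario I": 2_725, "Scenario II": 3_467}

# Share of the sample reaching B2 or above, in percent.
shares_pct = {name: 100 * count / SAMPLE_SIZE for name, count in reached_b2.items()}

# Additional buildings lifted to B2 or above once renewables are included.
extra_buildings = reached_b2["Scenario II"] - reached_b2["Scenario I"]
extra_points = shares_pct["Scenario II"] - shares_pct["Scenario I"]

for name, share in shares_pct.items():
    print(f"{name}: {share:.2f}% of the sample reaches B2 or above")
print(f"Renewables add {extra_buildings} buildings ({extra_points:.2f} percentage points)")
```

Roughly two thirds of the sample therefore stays below B2 under either scenario, which is the basis for the call for deeper measures.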


Fig. 9. Confusion matrix showing the performance of the ensemble-based segregation model for each building rating. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
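
A per-rating confusion matrix like the one in Fig. 9 can be built by converting true and predicted EUI values into rating bands and counting the pairs. The sketch below makes two assumptions that should be checked: the band edges are the commonly published Irish BER thresholds in kWh/m²/yr (verify against SEAI documentation before reuse), and accuracy is taken to mean the share of buildings whose predicted EUI falls in the correct band. The EUI values are toy numbers, not study data.

```python
from bisect import bisect_left
from collections import Counter
from typing import Sequence, Tuple

# Assumed upper band edges (kWh/m2/yr) and labels; not taken from the paper.
UPPER_EDGES = [25, 50, 75, 100, 125, 150, 175, 200, 225, 260, 300, 340, 380, 450]
LABELS = ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3",
          "D1", "D2", "E1", "E2", "F", "G"]


def rating(eui: float) -> str:
    """Map an EUI value to its rating band (upper edge inclusive)."""
    return LABELS[bisect_left(UPPER_EDGES, eui)]


def confusion_and_accuracy(
    true_eui: Sequence[float], pred_eui: Sequence[float]
) -> Tuple[Counter, float]:
    """Count (true rating, predicted rating) pairs and the share that match."""
    pairs = [(rating(t), rating(p)) for t, p in zip(true_eui, pred_eui)]
    confusion = Counter(pairs)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return confusion, accuracy


# Toy EUI values (illustrative only).
true_eui = [90.0, 120.0, 160.0, 310.0]
pred_eui = [95.0, 130.0, 155.0, 305.0]
confusion, accuracy = confusion_and_accuracy(true_eui, pred_eui)
```

Because band widths are narrow, a small EUI error near a band edge (120 vs. 130 in the toy data) is enough to produce a misclassification, so rating accuracy can be lower than the R-squared alone would suggest.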

Table 7
Retrofit scenarios used to analyze the pre- and post-retrofit effect on building energy performance at urban scale (U-values in W/m²K).

Retrofit scenario  Window U-value  Wall U-value  Roof U-value  HVAC       Renewables
Scenario I         1.4             0.21          0.16          Heat Pump  No
Scenario II        1.4             0.21          0.16          Heat Pump  Yes
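
The scenarios in Table 7 can be represented as small configuration records and applied to a building's feature set before it is passed to a prediction model. The sketch below is one possible encoding, not the authors' code: the field names, the `apply_scenario` helper, and the example building are hypothetical, and the rule that a retrofit never worsens an existing U-value is a design assumption.

```python
from dataclasses import dataclass
from typing import Dict, Union

Feature = Union[float, str, bool]


@dataclass(frozen=True)
class RetrofitScenario:
    """One row of Table 7 (U-values as listed there)."""
    name: str
    window_u: float
    wall_u: float
    roof_u: float
    hvac: str
    renewables: bool


SCENARIO_I = RetrofitScenario("Scenario I", 1.4, 0.21, 0.16, "Heat Pump", False)
SCENARIO_II = RetrofitScenario("Scenario II", 1.4, 0.21, 0.16, "Heat Pump", True)


def apply_scenario(building: Dict[str, Feature], scenario: RetrofitScenario) -> Dict[str, Feature]:
    """Return a retrofitted copy of `building`; the input is left unchanged."""
    retrofitted = dict(building)
    targets = (
        ("window_u", scenario.window_u),
        ("wall_u", scenario.wall_u),
        ("roof_u", scenario.roof_u),
    )
    for key, target in targets:
        # Assumption: keep the existing value if it already beats the target.
        retrofitted[key] = min(float(building[key]), target)
    retrofitted["hvac"] = scenario.hvac
    retrofitted["renewables"] = bool(building.get("renewables", False)) or scenario.renewables
    return retrofitted


# Hypothetical pre-retrofit building record.
before = {"window_u": 2.8, "wall_u": 0.21, "roof_u": 0.4, "hvac": "Boiler", "renewables": False}
after = apply_scenario(before, SCENARIO_II)
```

The retrofitted record can then be scored with the trained model in the same way as the original one, which is what makes a pre- and post-retrofit comparison across thousands of buildings cheap.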

Fig. 10. Impact of the retrofit scenarios on the rating distribution of the 10,000-building sample, pre- and post-retrofit.

5. Discussion

The proposed data-driven methodology offers a potential solution by enabling the analysis of the energy performance of residential buildings on a large scale, facilitating the decision-making process. The methodology uses limited available data to generate a synthetic dataset of 1 million buildings. This dataset is then used to develop a machine-learning model explicitly designed for the urban context. However, the data required to implement the proposed methodology, such as building geometry and non-geometry data, census information, and weather data, originate from various sources and come in different formats, leading to data inconsistencies. Consequently, due to these inconsistencies and the absence of standardized urban-scale data, available data present a significant and ongoing barrier to accurately implementing urban-scale modeling. The developed model allows for the prediction of various retrofit scenarios, even with limited resources. Segregation and ensemble-based methods improve the overall performance of the model, raising accuracy by 15 percentage points (from 76% to 91%, Table 6). However, it is essential to note that the accuracy and implementation of the model depend on the quality and availability of input data and may vary in different contexts and countries. Moreover, developing synthetic data for different building archetypes in other contexts might require additional computational time.

Furthermore, the study identifies the key characteristics that influence end-use energy demand in buildings. This finding enables policymakers to prioritize these influential features when considering retrofit


Fig. 11. Impact of historical and future weather conditions on the post-retrofit scenarios.

measures. By focusing on these critical factors, policymakers can effectively allocate resources and implement targeted retrofit strategies to improve building energy efficiency. However, it should be acknowledged that the importance of characteristics may differ for different sample data, weather conditions, or urban contexts.

Finally, the proposed solution is a valuable tool for urban planners, energy policymakers, utility planners, and manufacturers in evaluating and implementing retrofit scenarios at the urban scale. However, the models inherently depend on the quality of the data input. Therefore, incorrect synthetic data that do not closely represent real-world conditions might not accurately capture the complexities and uncertainties of the actual urban context. Furthermore, machine learning models are often considered 'black boxes,' which could lead to a lack of understanding of the underlying reasons behind the predictions. This lack of knowledge makes it difficult for policymakers and planners to trust and fully understand the recommendations. Additionally, the complexity and computational requirements of machine learning models and parametric simulations can be prohibitive, necessitating significant computational resources.

6. Conclusion and future work

Stakeholders analyze the energy performance of buildings on an urban scale to develop effective policy measures that reduce energy consumption and CO2 emissions. However, collecting and analyzing building energy performance data on a large scale is complex and time-consuming, requiring multiple resources. To address this challenge, we propose a novel methodology that uses machine learning algorithms to predict the energy performance of an entire urban building stock. This methodology allows stakeholders to make informed decisions and implement targeted interventions to promote sustainable urban development. In this paper, we implement the end-use demand segregation method and the ensemble-based approach to develop a robust learning model to predict building energy performance. This approach improves the predictive performance of machine learning and supports informed decision-making in building energy performance assessment.

The methodology is tested on Dublin City by developing a synthetic building dataset of 1 million residential buildings using parametric analysis of 19 key parameters identified from four building archetypes. The results show that the segregation method is highly effective for predicting EUI based on the given dataset, compared to the traditional single-model approach. Among the ten different machine learning algorithms compared, variations of the Gradient Boosting algorithm (XGB, LGBM, and HGB) are found to be the most efficient and accurate models to predict building energy performance. Furthermore, the ensemble-based approach further improved the results, achieving an accuracy of 91%. Comparing the ten different models revealed that the ensemble-based segregation method is highly effective in predicting EUI, increasing the accuracy of building energy rating prediction by 15 percentage points (from 76% to 91%). Accurate prediction of building energy performance enables stakeholders, such as energy policymakers and urban planners, to make informed decisions when planning large-scale retrofit measures.

In general, the proposed methodology offers valuable information and tools to support urban planners and energy policymakers in addressing the challenges of sustainable planning and energy efficiency on an urban scale. The data-driven approach, coupled with feature analysis and predictive modeling, empowers decision-makers to make informed choices and drive positive change in urban energy systems. The findings of this study offer valuable assistance to energy policymakers and urban planners by providing information that can contribute to the development of effective retrofit measures. These measures aim to decrease building energy consumption and mitigate carbon emissions. By incorporating the knowledge gained from this study, policymakers and planners can make well-informed decisions that facilitate sustainable urban development and address the pressing issue of climate change. Furthermore, the study helps policymakers and urban planners evaluate the feasibility and impact of implementing retrofit measures on a larger scale. This comprehensive approach supports the formulation and execution of strategies to address energy efficiency and environmental concerns.

Future research directions could investigate the influence of different mid-rise or high-rise apartments and non-residential archetype models on the predictive performance of machine learning algorithms. Furthermore, the integration of cloud computing parametric simulation could further enhance the research results. Currently, this research focuses on annual energy use and could be expanded to analyze seasonal and monthly variations.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

This publication has emanated from research supported by Science Foundation Ireland through US-Ireland R&D Partnership Research Grant 20/US/3695, the U.S. National Science Foundation through Award Number 2217410, and the Department for the Economy in Northern Ireland through USI 167. The opinions, findings, and conclusions or recommendations expressed in this material are those of the


author(s) and do not necessarily reflect the views of the Science Foundation Ireland or other funding agencies.

References

[1] EU-Energy, Energy for Europe by European commission, Online; https://energy.ec.europa.eu/index_en, 2022. (Accessed 1 December 2022).
[2] W.A. Benjamin, Revision of the energy performance of buildings directive: fit for 55 package, 2022.
[3] U. Ali, M.H. Shamsi, M. Bohacek, C. Hoare, K. Purcell, E. Mangina, J. O'Donnell, A data-driven approach to optimize urban scale energy retrofit decisions for residential buildings, Appl. Energy 267 (2020) 114861.
[4] C. Hoare, R. Aghamolaei, M. Lynch, A. Gaur, J. O'Donnell, A linked data approach to multi-scale energy modelling, Adv. Eng. Inform. 54 (2022) 101719.
[5] C.F. Reinhart, C.C. Davila, Urban building energy modeling–a review of a nascent field, Build. Environ. 97 (2016) 196–202.
[6] T. Hong, Y. Chen, X. Luo, N. Luo, S.H. Lee, Ten questions on urban building energy modeling, Build. Environ. 168 (2020) 106508.
[7] U. Ali, M.H. Shamsi, C. Hoare, E. Mangina, J. O'Donnell, Review of urban building energy modeling (UBEM) approaches, methods and tools using qualitative and quantitative analysis, Energy Build. 246 (2021) 111073.
[8] T. Ahmad, H. Chen, Y. Guo, J. Wang, A comprehensive overview on the data driven and large scale based approaches for forecasting of building energy demand: a review, Energy Build. 165 (2018) 301–320.
[9] Y. Zhao, C. Zhang, Y. Zhang, Z. Wang, J. Li, A review of data mining technologies in building energy systems: load prediction, pattern identification, fault detection and diagnosis, Energy Built Environ. 1 (2) (2020) 149–164.
[10] Y. Wang, T. Wu, H. Li, M. Skitmore, B. Su, A statistics-based method to quantify residential energy consumption and stock at the city level in China: the case of the Guangdong-Hong Kong-Macao Greater Bay area cities, J. Clean. Prod. 251 (2020) 119637.
[11] Y. Sun, F. Haghighat, B.C. Fung, A review of the-state-of-the-art in data-driven approaches for building energy prediction, Energy Build. 221 (2020) 110022.
[12] C. Tian, Y. Ye, Y. Lou, W. Zuo, G. Zhang, C. Li, Daily power demand prediction for buildings at a large scale using a hybrid of physics-based model and generative adversarial network, in: Building Simulation, vol. 15, Springer, 2022, pp. 1685–1701.
[13] N. Abbasabadi, M. Ashayeri, Urban energy use modeling methods and tools; a review and an outlook for future tools, Build. Environ. (2019) 106270.
[14] P. Manandhar, H. Rafiq, E. Rodriguez-Ubinas, Current status, challenges, and prospects of data-driven urban energy modeling: a review of machine learning methods, Energy Rep. 9 (2023) 2757–2776.
[15] C. Benavente-Peces, N. Ibadah, Buildings energy efficiency analysis and classification using various machine learning technique classifiers, Energies 13 (13) (2020) 3497.
[16] U. Ali, M.H. Shamsi, F. Alshehri, E. Mangina, J. O'Donnell, Comparative analysis of machine learning algorithms for building archetypes development in urban building energy modeling, in: Building Performance Modeling Conference and SimBuild, 2018.
[17] Y. Chen, M. Guo, Z. Chen, Z. Chen, Y. Ji, Physical energy and data-driven models in building energy prediction: a review, Energy Rep. 8 (2022) 2656–2671.
[18] R. Olu-Ajayi, H. Alaka, I. Sulaimon, F. Sunmola, S. Ajayi, Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques, J. Build. Eng. 45 (2022) 103406.
[19] Y. Pan, M. Zhu, Y. Lv, Y. Yang, Y. Liang, R. Yin, Y. Yang, X. Jia, X. Wang, F. Zeng, et al., Building energy simulation and its application for building performance optimization: a review of methods, tools, and case studies, Adv. Appl. Energy (2023) 100135.
[20] M. Ferrando, F. Causone, T. Hong, Y. Chen, Urban building energy modeling (UBEM) tools: a state-of-the-art review of bottom-up physics-based approaches, Sustain. Cities Soc. 62 (2020) 102408.
[21] O. Pasichnyi, J. Wallin, O. Kordas, Data-driven building archetypes for urban building energy modelling, Energy 181 (2019) 360–377.
[22] L.G. Swan, V.I. Ugursal, Modeling of end-use energy consumption in the residential sector: a review of modeling techniques, Renew. Sustain. Energy Rev. 13 (8) (2009) 1819–1835.
[23] T. Hong, Y. Chen, S.H. Lee, M.A. Piette, Citybes: a web-based platform to support city-scale building energy efficiency, Urban Comput. 14 (2016) 2016.
[24] Y. Chen, T. Hong, M.A. Piette, Automatic generation and simulation of urban building energy models based on city datasets for city-scale building retrofit analysis, Appl. Energy 205 (2017) 323–335.
[25] D. Robinson, F. Haldi, P. Leroux, D. Perez, A. Rasheed, U. Wilke, Citysim: comprehensive micro-simulation of resource flows for sustainable urban planning, in: Proceedings of the Eleventh International IBPSA Conference, no. CONF, 2009, pp. 1083–1090.
[26] C. Reinhart, T. Dogan, J.A. Jakubiec, T. Rakha, A. Sang, Umi-an urban simulation environment for building energy use, daylighting and walkability, in: 13th Conference of International Building Performance Simulation Association, Chambery, France, 2013.
[27] C.C. Davila, C.F. Reinhart, J.L. Bemis, Modeling Boston: a workflow for the efficient generation and maintenance of urban building energy models from existing geospatial datasets, Energy 117 (2016) 237–250.
[28] R. El Kontar, B. Polly, T. Charan, K. Fleming, N. Moore, N. Long, D. Goldwasser, Urbanopt: an open-source software development kit for community and urban district energy modeling, Tech. rep., National Renewable Energy Lab. (NREL), Golden, CO (United States), 2020.
[29] Y.Q. Ang, Z.M. Berzolla, S. Letellier-Duchesne, V. Jusiega, C. Reinhart, Ubem.io: a web-based framework to rapidly generate urban building energy models for carbon reduction technology pathways, Sustain. Cities Soc. 77 (2022) 103534.
[30] S.S. Abolhassani, M. Amayri, N. Bouguila, U. Eicker, A new workflow for detailed urban scale building energy modeling using spatial joining of attributes for archetype selection, J. Build. Eng. 46 (2022) 103661.
[31] A. Katal, M. Mortezazadeh, L.L. Wang, H. Yu, Urban building energy and microclimate modeling–from 3d city generation to dynamic simulations, Energy 251 (2022) 123817.
[32] I.M. Borràs, D. Neves, R. Gomes, Using urban building energy modeling data to assess energy communities' potential, Energy Build. 282 (2023) 112791.
[33] A. Nutkiewicz, Z. Yang, R.K. Jain, Data-driven urban energy simulation (due-s): integrating machine learning into an urban building energy simulation workflow, Energy Proc. 142 (2017) 2114–2119.
[34] A. Rahman, V. Srikumar, A.D. Smith, Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks, Appl. Energy 212 (2018) 372–385.
[35] C.E. Kontokosta, C. Tull, A data-driven predictive model of city-scale energy use in buildings, Appl. Energy 197 (2017) 303–317.
[36] F. Jiang, J. Ma, Z. Li, Y. Ding, Prediction of energy use intensity of urban buildings using the semi-supervised deep learning model, Energy 249 (2022) 123631.
[37] Y. Zhang, B.K. Teoh, M. Wu, J. Chen, L. Zhang, Data-driven estimation of building energy consumption and ghg emissions using explainable artificial intelligence, Energy 262 (2023) 125468.
[38] J. Seo, S. Kim, S. Lee, H. Jeong, T. Kim, J. Kim, Data-driven approach to predicting the energy performance of residential buildings using minimal input data, Build. Environ. 214 (2022) 108911.
[39] N.-T. Ngo, A.-D. Pham, T.T.H. Truong, N.-S. Truong, N.-T. Huynh, T.M. Pham, An ensemble machine learning model for enhancing the prediction accuracy of energy consumption in buildings, Arab. J. Sci. Eng. 47 (4) (2022) 4105–4117.
[40] M. Wurm, A. Droin, T. Stark, C. Geiß, W. Sulzer, H. Taubenböck, Deep learning-based generation of building stock data from remote sensing for urban heat demand modeling, ISPRS Int. J. Geo-Inf. 10 (1) (2021) 23.
[41] A.S. Mohammed, P.G. Asteris, M. Koopialipoor, D.E. Alexakis, M.E. Lemonis, D.J. Armaghani, Stacking ensemble tree models to predict energy performance in residential buildings, Sustainability 13 (15) (2021) 8298.
[42] F. Johari, G. Peronato, P. Sadeghian, X. Zhao, J. Widén, Urban building energy modeling: state of the art and future prospects, Renew. Sustain. Energy Rev. 128 (2020) 109902.
[43] T. Loga, B. Stein, N. Diefenbach, Tabula building typologies in 20 European countries–making energy-related features of residential building stocks comparable, Energy Build. 132 (2016) 4–12.
[44] U. Ali, M.H. Shamsi, C. Hoare, E. Mangina, J. O'Donnell, A data-driven approach for multi-scale building archetypes development, Energy Build. 202 (2019) 109364.
[45] W. Wang, S. Li, S. Guo, M. Ma, S. Feng, L. Bao, Benchmarking urban local weather with long-term monitoring compared with weather datasets from climate station and energyplus weather (EPW) data, Energy Rep. 7 (2021) 6501–6514.
[46] M.P. Tootkaboni, I. Ballarini, M. Zinzi, V. Corrado, A comparative analysis of different future weather data for building energy performance simulation, Climate 9 (2) (2021) 37.
[47] Y. Zhang, I. Korolija, Performing complex parametric simulations with jeplus, in: SET2010-9th International Conference on Sustainable Energy Technologies, 2010, pp. 24–27.
[48] J. Egan, D. Finn, P.H.D. Soares, V.A.R. Baumann, R. Aghamolaei, P. Beagon, O. Neu, F. Pallonetto, J. O'Donnell, Definition of a useful minimal-set of accurately-specified input data for building energy performance simulation, Energy Build. 165 (2018) 172–183.
[49] Y. Choi, D. Song, S. Yoon, J. Koo, Comparison of factorial and Latin hypercube sampling designs for meta-models of building heating and cooling loads, Energies 14 (2) (2021) 512.
[50] W. Tian, Y. Heo, P. De Wilde, Z. Li, D. Yan, C.S. Park, X. Feng, G. Augenbroe, A review of uncertainty analysis in building energy assessment, Renew. Sustain. Energy Rev. 93 (2018) 285–301.
[51] Y. Ye, M. Strong, Y. Lou, C.A. Faulkner, W. Zuo, S. Upadhyaya, Evaluating performance of different generative adversarial networks for large-scale building power demand prediction, Energy Build. 269 (2022) 112247.
[52] D.H. Wolpert, Stacked generalization, Neural Netw. 5 (2) (1992) 241–259.
[53] Building energy rating certificate database by SEAI, Online; https://ndber.seai.ie/BERResearchTool/ber/search.aspx. (Accessed 25 October 2023).
[54] K. McDonagh, Geodirectory technical guide, an post and ordnance survey Ireland, 2023.


[55] Ordnance survey Ireland, Online; https://www.osi.ie. (Accessed 25 October 2023).
[56] Census of population 2022 - profile 1 housing in Ireland by Central Statistics Office, Online; https://www.cso.ie/en/releasesandpublications/ep/p-cpsr/censusofpopulation2022-summaryresults/, 2022. (Accessed 25 October 2023).
[57] Ireland climate action plan 2023, Online; https://www.gov.ie/en/publication/7bd8c-climate-action-plan-2023/. (Accessed 25 October 2023).
[58] U. Ali, M.H. Shamsi, M. Bohacek, C. Hoare, K. Purcell, E. Mangina, J. O'Donnell, A data-driven approach to optimize urban scale energy retrofit decisions for residential buildings, Appl. Energy 267 (2020) 114861.
[59] J. Laue, Ashrae 62.1: using the ventilation rate procedure, Consult.-Specif. Eng. 55 (2018) 14–17.
[60] Intergovernmental panel on climate change (IPCC), Online; https://www.ipcc.ch. (Accessed 25 October 2023).
[61] P. Nolan, J. Flanagan, High-resolution climate projections for Ireland–a multi-model ensemble approach, Environmental Protection Agency, 2020.
[62] D. Sood, I. Alhindawi, U. Ali, J.A. McGrath, M.A. Byrne, D. Finn, J. O'Donnell, Simulation-based evaluation of occupancy on energy consumption of multi-scale residential building archetypes, J. Build. Eng. 75 (2023) 106872.
[63] D. Sood, I. Alhindawi, U. Ali, D. Finn, J.A. McGrath, M.A. Byrne, J. O'Donnell, Zone-wise occupancy schedules developed using Time Use Survey data for building energy performance simulations, Data Brief 49 (2023) 109453.

Most Essential Learning Competencies (Melcs) : Pecial Urricular Rograms
100% (2)
Most Essential Learning Competencies (Melcs) : Pecial Urricular Rograms
16 pages
