Research Article
Machine Learning Algorithms for Predicting Energy
Consumption in Educational Buildings
Khaoula Elhabyb,1 Amine Baina,1 Mostafa Bellafkih,1 and Ahmed Farouk Deifalla2

1 National Institute of Posts and Telecommunications (INPT), Rabat, Morocco
2 Future University in Egypt, Cairo, Egypt
Received 14 December 2023; Revised 19 March 2024; Accepted 26 March 2024; Published 13 May 2024
Copyright © 2024 Khaoula Elhabyb et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
In the past few years, there has been a notable interest in the application of machine learning methods to enhance energy efficiency
in the smart building industry. The paper discusses the use of machine learning in smart buildings to improve energy efficiency by
analyzing data on energy usage, occupancy patterns, and environmental conditions. The study focuses on implementing and
evaluating energy consumption prediction models using algorithms like long short-term memory (LSTM), random forest, and
gradient boosting regressor. Real-life case studies on educational buildings are conducted to assess the practical applicability of
these models. The data is rigorously analyzed and preprocessed, and performance metrics such as root mean square error
(RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used to compare the effectiveness of
the algorithms. The results highlight the importance of tailoring predictive models to the specific characteristics of each
building’s energy consumption.
[Figure 1: projected growth of the smart building market, reaching $67.4 billion, over the years 2019–2030.]
This data is centralized for optimizing building operations, improving resident comfort, and reducing energy usage. ML, on the other hand, is a powerful tool for processing large amounts of data from various sources. It analyzes this data to identify patterns and predict future events, such as equipment failures, enabling preventative maintenance [5].

The American Council for an Energy-Efficient Economy (ACEEE) [6] suggests that commercial buildings can significantly reduce their energy bills, by up to 30%, by implementing energy-efficient technologies such as smart thermostats and controlled lighting. The US Department of Energy [7] reports that commercial buildings account for a significant portion of total energy consumption and greenhouse gas emissions in the US. This highlights the importance of buildings that can predict energy consumption and plan efficiently to reduce energy usage. Intel research [8] also indicates that energy consumption prediction has the potential to achieve operational cost savings, staff productivity gains, and energy usage reductions. Given these findings, the primary emphasis will be on forecasting the energy usage of smart buildings, with a specific focus on educational facilities, which will be analyzed for the first time. Understanding and predicting energy consumption in educational environments is paramount for optimizing resource allocation, implementing effective efficiency measures, and establishing sustainable and cost-effective operational procedures [9]. By focusing on this sector, valuable insights can be gained to inform strategies for enhancing energy efficiency and sustainability in educational buildings, ultimately contributing to improved resource management and environmental conservation efforts.

The research concentrates on energy management within smart buildings, aiming to forecast power consumption through three distinct approaches: a traditional statistical approach employing the random forest algorithm, a deep learning approach utilizing long short-term memory (LSTM), and a hybrid approach leveraging the gradient boosting regressor algorithm. These three techniques were chosen to investigate a research gap in the majority of data-driven methodologies: while significant progress has been made in this area, limited attention has been given to utilizing streaming and temporal data for forecasting buildings' energy demand. This gap will be addressed through the utilization of real historical electricity data. The data is analyzed to evaluate model performance and accuracy, aiming to identify the most effective approach for smart building energy management. The research is aimed at optimizing forecasting techniques through rigorous comparative analysis, leveraging the strengths of the LSTM, RF, and GBR models. The study highlights the importance of advanced machine learning in shaping smart building strategies and is aimed at enhancing sustainability and efficiency in energy usage. Insights from this research will inform future advancements in energy management practices for sustainable development. The article is structured into several delineated sections, each serving a specific purpose:

(i) Introduction: This section introduces the application of AI within the smart building sector, setting the context for the study

(ii) Literature analysis: Here, a comparative examination of various ML algorithms used for energy prediction in smart building systems is provided, drawing insights from existing research

(iii) Methodology: This section outlines the systematic approach adopted in the study, encompassing data analytics, model development, and model evaluation processes

(iv) Results and discussions: Findings obtained from the methodology are presented, followed by a comparative analysis that juxtaposes these results with prior research initiatives

(v) Conclusion: This section synthesizes the results and provides conclusions, offering perspectives on the implications of the study's findings for the field of smart building energy management

2. Literature Review

A recent study conducted by the International Energy Agency [10] has revealed concerning levels of energy
International Journal of Energy Research 3
[Figure 2: Smart building functionalities — energy (smart meters, demand response); water (smart meters, use and flow sensing); lighting (occupancy sensing); HVAC (fans, variable air volume, air quality); fire detectors (functionality checks, service); elevators (maintenance, 24/7 monitoring, performance); parking (condition monitoring, parking lot utilization); access and security (badge in, cameras, perimeter, doors); PHEV charging integration (charging of hybrid and electric vehicles).]
consumption in buildings. The study found that buildings are responsible for a significant portion of electricity consumption and overall energy consumption in urban areas. Buildings account for 72% of total electricity consumption and 38% of average energy consumption in urban areas. Additionally, buildings contribute to almost 40% of total carbon dioxide pollution in urban areas. A smart building is a modern infrastructure that incorporates automated control systems and uses data to improve the building's performance and occupants' comfort. Figure 2 presents the smart building functionalities and its most important axes of work.

The top technology companies are currently prioritizing IoT (Internet of Things) and AI (artificial intelligence). The future of building innovation is expected to focus on achieving maximum energy efficiency, and this challenge can be addressed by integrating AI-powered systems like machine learning (ML) and deep learning. ML systems continuously improve themselves, leading to advancements in various AI research areas [12]. ML involves algorithms that respond to inputs from their environment and identify nonlinear connections in complicated or uncertain systems. ML is divided into four major categories based on the type of learning task they manage: supervised learning, unsupervised learning, semisupervised learning, and reinforcement learning.

(i) Supervised learning is a method of developing a machine learning model by using a labeled data set. In this process, each data point in the set is associated with a known intended output. The model is trained to predict the output

(ii) Unsupervised learning: in contrast to traditional supervised learning, developing a model on an unlabeled data set involves working with data where the target outputs are unknown. In this scenario, the model is not explicitly instructed on what to search for but instead learns

(iii) Semisupervised learning is a learning approach that combines supervised and unsupervised learning. In this approach, the model is trained using a data set that is partly labeled, meaning that some of the data points have known labels

(iv) Reinforcement learning, where a model is trained to make a series of decisions in a changing environment. The model learns through trial and error, receiving feedback in the form of rewards or costs

Energy consumption prediction is a valuable technique that involves forecasting the amount of energy a system or device will use within a specific time frame. This technique serves various purposes, such as optimizing energy usage, predicting future energy demands, and identifying potential inefficiencies in energy consumption. To predict energy consumption, different methods can be employed, including statistical models, machine learning algorithms, and physics-based models. The choice of technique depends on factors such as data availability, system complexity, and the desired level of accuracy. In this particular case, the focus is on utilizing machine learning algorithms to predict energy consumption by leveraging historical data and other relevant factors.

The quality and relevance of the data used in machine learning algorithms greatly influence their performance. In a study conducted by Ahajjam et al. [13] on the Moroccan Buildings' Electricity Consumption Data Set, electricity consumption was categorized into three types: whole premises (WP), individual loads (IL), and circuit-level (CL) data.

(1) Labeled WP: Labeled whole premises (WP) consumption data refers to electricity usage data collected from 13 households in the MORED data set.
4 International Journal of Energy Research
This data is valuable as it includes not only the raw electricity consumption measurements but also additional information that can assist in analyzing, modeling, and comprehending the patterns of electricity usage in different households

(2) Labeled IL: Ground-truth electricity consumption refers to the electricity consumption data of individual loads (IL) that have been labeled or annotated with accurate information. This involves recording and labeling the operational states of specific loads, such as refrigerators or air conditioners, when they are turned on or off at specific times. Having this ground-truth information is valuable for researchers and analysts as it allows for accurate load disaggregation, energy management, and appliance recognition

(3) CL: Measurements in the context of energy refer to the circuit-level energy measurements obtained from the electrical mains of a premises. These measurements provide information about the overall energy consumption of a circuit and can be used to understand the energy consumption of a group of loads

The current work focuses on three educational buildings located at Down Town University. Further information about these buildings will be provided next. The subsequent section presents a literature review on energy consumption forecasting in various buildings using multiple machine learning algorithms.

2.1. Traditional Machine Learning Approach. ML algorithms have been utilized to tackle the primary challenges of physics-driven methods in load prediction. For instance, Somu et al. [14] developed eDemand, a new building energy use forecasting model, using long short-term memory networks and an improved sine cosine optimization algorithm; as a result, the model outperformed previous state-of-the-art models in real-time energy load prediction. Next, Suranata et al. [15] focused on predicting energy consumption in kitchens. They used a feature engineering technique and a long short-term memory (LSTM) model. Principal component analysis (PCA) was applied to extract important features, and the LSTM model was used on two tables. In addition, Shapi et al. [16] developed a prediction model for energy demand making use of the Microsoft Azure cloud-based machine learning framework. The methodology of the prediction model is provided using three distinct techniques: support vector machine, artificial neural network, and k-nearest neighbors. The study focuses on real-world applications in Malaysia, with two tenants from an industrial structure chosen as case studies. The experimental findings show that each tenant's energy consumption has a particular distribution pattern, and the suggested model can accurately estimate energy consumption for each renter. To forecast daily energy consumption based on weather data, Faiq et al. [17] developed a new energy usage prediction technique for institutional buildings using long short-term memory (LSTM). The model, trained using Malaysian Meteorological Department weather forecasting data, outperformed support vector regression (SVR) and Gaussian process regression (GPR) with the best RMSE scores. The dropout method reduces overfitting, and Shapley's additive explanation is used for feature analysis. Accurate energy consumption estimates can help detect and diagnose system faults in buildings, aiding in energy policy implementation. Further, Kawahara et al. [18] explore the application of various machine learning models to predict voltage in lithium-ion batteries. The study includes algorithms such as support vector regression, Gaussian process regression, and multilayer perceptron. The hyperparameters of each model were optimized using 5-fold cross-validation on training data. The data set used consists of both simulation data, generated by combining driving patterns and applying an electrochemical model, and experimental data. The performance of the ML models was evaluated using both simulation and experimental data, with different data sets created to simulate variations in state of charge distribution.

2.2. Deep Learning and Hybrid Approaches. Additionally, various networks integrate multiple techniques to devise data-driven approaches. These integrated mechanisms are commonly referred to as hybrid networks. For example, Mohammed et al. [19] focus on the application of an intelligent control algorithm in HVAC systems to enhance energy efficiency and thermal comfort. The authors propose integrating SCADA systems with an intelligent building management system to optimize heat transmission coefficients and air temperature values. Genetic algorithms are employed to maintain user comfort while minimizing energy consumption. Similar to [19], Aurna et al. [20] compare the performance of ARIMA and Holt-Winters models in predicting energy consumption data in Ohio and Kentucky. The study finds that the Holt-Winters model is more accurate and effective for long-term forecasting. The authors recommend further research to consider other parameters and environmental factors and to explore hybrid models for better short-term load forecasting. Next, Ferdoush et al. [21] developed a hybrid forecasting model for time series electrical load data. The model combines random forest and bidirectional long short-term memory methods and was tested on a 36-month Bangladeshi electricity consumption data set. The results showed that the hybrid model outperformed standard models in terms of accuracy. The study emphasizes the effectiveness of the hybrid machine learning approach in improving short-term load forecasting accuracy in the dynamic electric industry. In their study, He and Tsang [22] developed a hybrid network combining long short-term memory (LSTM) and improved complete ensemble empirical mode decomposition with adaptive noise (iCEEMDAN) to optimize electricity consumption. They divided the initial power consumption data into patterns using iCEEMDAN and used Bayesian-optimized LSTM to forecast each mode independently. In the same direction, Jin et al. [23] proposed an attention-based encoder-decoder network with Bayesian optimization for short-term electrical load forecasting, using a gated recurrent unit recurrent neural network for time series data modeling and a temporal attention layer for improved prediction accuracy and
International Journal of Energy Research 5
precision. Further in their study, Olu-Ajayi et al. [24] used various machine learning techniques to predict yearly building energy consumption using a large data set of residential buildings. The model allows designers to enter key building design features and anticipate energy usage early in the development process. DNN was found to be the most efficient predictive model, motivating building designers to make informed choices and optimize structures. Jang et al. [25] created three LSTM models to compare the effects of incorporating operation pattern data on prediction performance. The model using operation pattern data performed the best, with a CVRMSE of 17.6% and an MBE of 0.6%. The article by Ndife et al. [26] presents a smart power consumption forecast model for low-powered devices. The model utilizes advanced methodologies, such as the ConvLSTM encoder-decoder algorithm, to accurately predict power consumption trends. The performance evaluation of the model demonstrates improved accuracy and computational efficiency compared to traditional methods. Also, Duong and Nam [27] developed a machine learning system that monitors electrical appliances to improve electricity usage behavior and reduce environmental impact. The system utilizes load and activity sensors to track energy consumption and operating status. After three weeks of testing, the system achieved a state prediction accuracy of 93.60%. In their approach, Vennila et al. [28] propose a hybrid model that integrates machine learning and statistical techniques to improve the accuracy of predicting solar energy production. The model also helps in reducing placement costs by emphasizing the significance of feature selection in forecasting. In the same context, Kapp et al. [29] developed a supervised machine learning model to address energy use reduction in the industrial sector. They collected data from 45 manufacturing sites through energy audits and used various characteristics and parameters to predict weather dependency and production reliance. The results showed that a linear regressor over a transformed feature space was a better predictor than a support vector machine. In their research, Bhol et al. [30] propose a new method for predicting reactive power based on real power demand. They utilize a flower pollination algorithm to optimize their model and show that it outperforms other models like GA, PSO, and FPA. Asiri et al. [31] used an advanced deep learning model for accurate load forecasting in smart grid systems. They use hybrid techniques, including LSTM and CNN, feature engineering, and wavelet transforms, to enhance forecasting accuracy and efficiency. The results show significant improvements in short-term load prediction, outperforming traditional forecasting methods.

Table 1 contains detailed information about the algorithms used, performance evaluation measurements, and the advantages and disadvantages of each approach.

[…] disciplines. The buildings under study (referred to as CLAS, NHAI, and Cronkite) are all part of the same institution and serve distinct functions. Building CLAS, an abbreviation of Center of Law and Society, mainly consists of an amphitheater and offices, and building NHAI, which means Nursing and Health Innovation, consists of offices and laboratories. In contrast, Cronkite consists of classrooms and seminar halls.

The buildings are equipped with IoT sensors connected to power intel sockets, and the collected data is stored on an open-source website server [32]. The prediction method will use three machine learning algorithms: long short-term memory (LSTM), random forest (RF), and gradient boosting regressor (GBR). The data will be analyzed and prepared before being used to train and test the models. The methodology for forecasting energy consumption will be divided into three sections:

(1) Data analysis involves evaluating raw data to understand patterns and characteristics of electrical power consumption data

(2) Model training trains machine learning models using past data to identify patterns and correlations between input characteristics and daily power use

(3) Model testing evaluates the trained models using validation metrics to assess their performance and accuracy

3.1. Data Analysis

3.1.1. Data Preparation. This study focuses on the process of data preparation in machine learning, which is time-consuming and computationally challenging due to the presence of missing values and uneven value scales between features. The data was prepared using two techniques: imputation of missing data and standardization. The imputation procedure was carried out using the probabilistic principal component analysis (PPCA) approach, a maximum likelihood estimate-based technique that estimates missing values using the expectation-maximization (EM) algorithm. This method is developed from the principal component analysis (PCA) method, which is used for data compression or dimensionality reduction. The resulting cleaned data was then subjected to standardization, also known as Z-score normalization, to ensure an even distribution of the data above and below the mean value, as shown in equation (1):

$$x_{\text{standardized}} = \frac{x - \mu}{\sigma}, \qquad (1)$$

where $\mu$ represents the mean, $\sigma$ denotes the standard deviation, and $x$ is the original data point.
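As a sketch, the Z-score normalization of equation (1) can be written in a few lines of Python; the function and the sample readings below are our own illustrations, not part of the study's pipeline:

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score normalization (equation (1)): (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical hourly power readings (kW), for illustration only
readings = [4200.0, 5100.0, 6000.0, 6900.0, 7800.0]
z_scores = standardize(readings)
# After standardization the data is centered on 0 with unit spread
```

This brings features with very different scales (e.g., kW readings versus occupancy counts) onto a common scale before model training.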
while skewness measures irregular probability distribution around the mean value [33]. Equations (2) and (3) provide formulas for skewness and kurtosis, which are essential for understanding the data set distribution and its impact on the prediction outcome:

$$\mathrm{Skewness} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^3}{(N-1)\,\sigma^3}, \qquad (2)$$

$$\mathrm{Kurtosis} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^4}{N\,\sigma^4}, \qquad (3)$$

where $N$ is the number of data points in the collection, $x_i$ is an individual data point within the sample, and $\bar{x}$ is the sample mean.

3.1.3. Feature Selection. Feature engineering is a crucial aspect of machine learning, involving the creation of meaningful data representations to enhance model performance. It involves careful selection, transformation, and creation of features that capture relevant information from raw data, enhancing predictive accuracy and interpretability. Techniques like principal component analysis, domain knowledge extraction, and creative data manipulation help models extract patterns and make accurate predictions, bridging the gap between raw data and actionable insights.

As previously stated, our data set comprises 27 features detailing the characteristics of the selected buildings. To ensure optimal input for our predictive model, we employed a feature engineering approach leveraging a tree-based model, specifically the random forest algorithm.

3.2. Model Development. This study uses supervised machine learning to predict energy usage using data prepared and trained in two groups. The model employs regressive prediction using random forest, LSTM, and gradient boosting regressor. The process from data collection to model generation is depicted in Figure 3.

3.2.1. Random Forest. A random forest regressor is a machine learning method that combines multiple decision trees to create a predictive model for regression tasks. Each tree is constructed using a randomly selected subset of training data and features, with $H(x;\theta_k),\ k = 1, \ldots, K$, where $x$ represents the observed input (covariate) vector of length $p$ with associated random vector $X$. During prediction, the regressor aggregates predictions from all trees to generate the final output, typically the average of the individual tree predictions, $h(x) = \frac{1}{K}\sum_{k=1}^{K} h(x;\theta_k)$ [34]. This method is commonly used for pattern identification and prediction due to its ability to learn complicated behavior; consequently, it is the best choice for constructing the prediction model in the present study. In Figure 4, we present a flow chart of the random forest algorithm.

3.2.2. Long Short-Term Memory. Sepp Hochreiter and Juergen Schmidhuber introduced long short-term memory (LSTM) in 1997 as an advanced application of recurrent neural networks. LSTM is effective in processing and predicting time series data with varying durations. It captures long-term relationships, handles variable-length sequences, and recalls previous data, making it useful for energy consumption prediction [35]. The LSTM model structure consists of three layers: input, LSTM unit, and output. The mathematical equations used in LSTM include the forget gate, input gate, output gate, and cell state. The following are the equations utilized in LSTM:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t, \qquad (4)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \quad h_t = o_t \cdot \tanh(C_t),$$

where $x_t$ is the input at step $t$; $i_t$, $f_t$, and $o_t$ are the input, forget, and output vectors; $\tilde{C}_t$ is the candidate activation vector; and $C_t$ is the cell state at time $t$.

The LSTM algorithm is a powerful tool for collecting and transmitting information across long sequences. It is commonly used in applications such as audio recognition, natural language processing, and time series analysis. Based on previous research and the availability of a time series data set, LSTM is chosen as the algorithm for predicting energy with high precision. Figure 5 presents a flowchart of LSTM.

3.2.3. Gradient Boosting Regressor. The gradient boosting approach is an iterative method that combines weak learners to create a strong learner by focusing on errors at every step. It is aimed at decreasing the loss function by finding an approximation of the function $F(x)$ that translates $x$ to $y$. This method improves prediction performance and lowers prediction error by matching weak learner models to the loss function [36]. The squared error function is often used to estimate the approximation function, which is then used to find the ideal settings for weak learners. The gradient boosting regressor's mathematical equation is as follows:

$$\hat{y}_i = F(x_i) + \sum_{m=1}^{M} \gamma_m h_m(x_i), \qquad (5)$$

where $\hat{y}_i$ is the predicted target, $x_i$ is the input features, $F(x_i)$ is the ensemble model prediction, $M$ is the number of weak models, $\gamma_m$ is the learning rate, and $h_m(x_i)$ is the prediction of the $m$-th weak model. The current research utilized gradient boosting due to its robust predictive performance, ability to capture complex data linkages and nonlinear patterns, and flexibility and customization capabilities. Figure 6 depicts the gradient boosting regressor algorithm's flow chart.

3.3. Model Evaluation. The data set was divided into a training group (25%) and a testing group (75%). The training group was used to train machine learning algorithms and create predictive models for maximum consumption data. The testing group was used to evaluate the performance of these models. This process is illustrated in Figure 7.

The training and testing process involved a simple partitioning of data to prevent overfitting. Machine learning algorithms' predictive models were evaluated for performance
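To make the LSTM gate equations of Section 3.2.2 concrete, a single scalar cell step can be sketched in plain Python. The hand-picked weights below are arbitrary toy values for illustration, not the trained network used in this study:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One scalar LSTM step following equation (4).

    W and b hold per-gate weights applied to the pair (h_prev, x_t);
    everything here is a scalar purely for illustration.
    """
    combine = lambda w: w[0] * h_prev + w[1] * x_t
    i_t = sigmoid(combine(W["i"]) + b["i"])        # input gate
    f_t = sigmoid(combine(W["f"]) + b["f"])        # forget gate
    c_tilde = math.tanh(combine(W["c"]) + b["c"])  # candidate activation
    c_t = f_t * c_prev + i_t * c_tilde             # new cell state
    o_t = sigmoid(combine(W["o"]) + b["o"])        # output gate
    h_t = o_t * math.tanh(c_t)                     # new hidden state
    return h_t, c_t

# Toy weights, chosen arbitrarily for the sketch
W = {g: (0.5, 0.5) for g in "ifco"}
b = {g: 0.0 for g in "ifco"}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
```

In a real model each gate uses weight matrices over vector-valued states; the scalar form only shows how the gates interact per equation (4).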
8 International Journal of Energy Research
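The additive form of equation (5) can likewise be illustrated with a miniature boosting loop, using one-dimensional threshold "stumps" fitted to pseudo-residuals. This is a toy sketch under squared-error loss, not the library implementation used in the study:

```python
def fit_gbr_1d(xs, ys, n_rounds=50, learning_rate=0.1):
    """Toy gradient boosting for 1-D regression with squared-error loss.

    Each weak learner h_m is a threshold stump returning the mean residual
    on either side of the best split; predictions follow equation (5):
    y_hat = F(x) + sum_m gamma_m * h_m(x).
    """
    base = sum(ys) / len(ys)          # initial model F(x): the global mean
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # pseudo-residuals
        best = None
        for t in sorted(set(xs))[:-1]:                  # candidate splits
            left = [r for x, r in zip(xs, residuals) if x <= t]
            right = [r for x, r in zip(xs, residuals) if x > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, t, lm, rm)
        _, t, lm, rm = best
        stumps.append((t, lm, rm))                      # fit weak learner
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for p, x in zip(preds, xs)]

    def predict(x):
        return base + sum(learning_rate * (lm if x <= t else rm)
                          for t, lm, rm in stumps)
    return predict

# Toy data: consumption roughly doubles past a threshold
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
model = fit_gbr_1d(xs, ys)
```

Each round fits a weak learner to the current errors, so the ensemble's predictions converge toward the targets, which is the behavior the flow chart in Figure 6 describes.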
[Figures 4–7: flow charts of the random forest algorithm (construct trees, prediction by trees, aggregation, final prediction), the gradient boosting regressor (for each iteration: calculate pseudo-residuals, fit a weak learner, aggregate predictions, final prediction), the LSTM network (input x_t), and the model evaluation process (data preparation, training data (70%)/(30%) split, generation of predictive values, and comparison with the actual recorded maximum consumption using root mean square error, mean absolute error, mean squared error, and mean absolute percentage error).]
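The evaluation metrics named in Figure 7 (RMSE, MAE, MSE, and MAPE) have straightforward definitions that can be sketched directly; the sample values below are hypothetical, purely to exercise the functions:

```python
import math

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Expressed as a percentage; assumes no zero actual values
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily peak consumption values (kW) for illustration
actual    = [5000.0, 5200.0, 4800.0, 5100.0]
predicted = [4900.0, 5300.0, 4750.0, 5150.0]
scores = {"RMSE": rmse(actual, predicted), "MAE": mae(actual, predicted),
          "MSE": mse(actual, predicted), "MAPE": mape(actual, predicted)}
```

RMSE and MSE penalize large errors more heavily, while MAPE expresses the error relative to the magnitude of consumption, which makes it easy to compare across buildings of different sizes.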
4. Results and Discussion

The experiment results were reviewed in sections, discussing the initial processing and imputation of missing data, energy consumption prediction for each building, and performance comparisons for the random forest, long short-term memory, and gradient boosting regressor models. The presentation of results follows a hierarchy, starting with the normality test, then data preprocessing, and finally model evaluation.

4.1. Normality Testing of Data. The evaluation is aimed at examining the impact of data shape on predictive model development performance, using measures of skewness and kurtosis. Results were compiled in Table 3 to evaluate the data's shape and potential deviations from normal distribution. To evaluate the normality of the energy demand data, the two values were computed using the aggregated data from each building spanning from January 2020 to January 2023. Figure 8 also depicts the format of the data set for a graphical examination of normality.

Based on Table 3, the data sets for the CLAS, NHAI, and Cronkite buildings were approximately symmetrical and skewed with bidirectional shape distribution. However, there were some differences in the skewness values for each building. The CLAS building showed normal asymmetry due to power consumption and KWS, with a slightly negative skewness indicating a longer left tail. The CHWTON distribution was skewed, with a skewness of 427578, indicating a longer left tail. The nursing and health innovation building had a pronounced asymmetry, with power consumption having a positive skewness and KWS and CHWTON having a negative skewness, indicating balanced tails. The Cronkite building had positive skewness values, indicating a moderate right-skewed distribution. Overall, all three data sets were approximately symmetric, skewed, and bimodal in their form density.

The kurtosis values of all three buildings in Table 3 were less than 0, indicating that their distributions were
[Figure 8: probability density plots of Power_consumption (KW) and CHWTON for the three buildings.]
platykurtic. This was also evident in Figure 8, where the probability distribution plot had a higher tail and a larger peak center. However, the Cronkite building had a kurtosis value greater than 0, indicating a leptokurtic distribution with higher variance. CLAS and NHAI had roughly normal distributions, but CLAS had a lower mean than the median. Department CLAS also had an almost normal distribution but with higher skewness and kurtosis. The CHWTON data set had a higher variation compared to the other data sets.
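The skewness and kurtosis figures discussed above follow equations (2) and (3) and can be sketched in a few lines. Note that the text's comparison of kurtosis against 0 implies excess kurtosis, i.e., the value of equation (3) minus 3; the functions below compute the raw forms as stated in the equations:

```python
import math

def skewness(xs):
    """Skewness per equation (2): sum((x - mean)^3) / ((N - 1) * sigma^3)."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum((x - mean) ** 3 for x in xs) / ((n - 1) * sigma ** 3)

def kurtosis(xs):
    """Kurtosis per equation (3): sum((x - mean)^4) / (N * sigma^4)."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum((x - mean) ** 4 for x in xs) / (n * sigma ** 4)

# A symmetric toy sample: its skewness should be ~0,
# and its kurtosis well below 3 (platykurtic, i.e., negative excess kurtosis)
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
```

A flat, symmetric sample like this illustrates the platykurtic case reported for the CLAS and NHAI data sets.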
4.2. Data Preprocessing. Based on Figure 9, the original data set had various scale ranges for power consumption factors like KWS, CHWTON, voltage, and building occupants. To verify the prediction capacity of the 29 features, multiple approaches like correlation analysis, ensemble analysis, and tree-based models were used. After testing these methods, the most suitable features for projecting energy demand and consumption are as follows:
(1) Previous consumption patterns
(2) Calendar: weekday, month, and season
(3) Demography: a building's population might influence consumption patterns
(4) Geographical factors such as climate: people use more electrical appliances at high and low temperatures
The study on missing data utilized the missingness matrix to quantify the extent of missing data and identify rows that contained missing values. Upon analyzing Figure 10, it is noteworthy that none of the three data sets exhibited any missing data.
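The two preprocessing steps described here — checking for missing rows and bringing features with very different scale ranges onto a comparable scale — could be sketched as follows; the column names and values are illustrative placeholders, not the study's actual data:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Report missing entries per column, then standardize the features."""
    # Missingness check: counts gaps per column, analogous to reading
    # a missingness matrix to locate rows with missing values.
    print(df.isna().sum().to_dict())

    # Standardization puts features with very different ranges
    # (e.g. KWS in the thousands vs. CHWTON near zero) on one scale.
    scaler = StandardScaler()
    return pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)

# Toy frame with two features on very different scales.
df = pd.DataFrame({"KWS": [5400.0, 5900.0, 6100.0], "CHWTON": [0.1, 0.3, 0.2]})
out = preprocess(df)
```

After this step every feature has zero mean and unit variance, so no single large-scale factor dominates the model fit.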
Figure 10: Missingness graph of CLAS building.
4.3. Feature Selection. Selecting the most crucial features plays a vital role in enhancing the effectiveness, stability, and scalability of our prediction model. Through the utilization of a feature importance assessment method, as summarized in Table 4, we identified the top five influential features: KW, KWS, CHWTON, total houses, and CHWTONgalsgas. The ranking of these features is illustrated in Figure 11, which shows the order of their importance. Although the initial analysis considered all 29 parameters, the figure only highlights features that significantly contribute to precision, ensuring a streamlined and informative depiction.

Table 4: Feature importance.
Features            Importance
CHWTON              0.303456
CHWTONgalsgas       0.23327
Total houses        0.291235
KW                  0.353657
HTmmBTU             0.083478
Combined mmBTU      0.183562
HTmmBTUgalsgas      0.285535
Total light bulbs   0.229835
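An importance ranking like the one in Table 4 can be produced from a tree-based model's feature_importances_ attribute; this sketch uses synthetic data, and the feature names are borrowed only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins for a few of the candidate predictors.
X = pd.DataFrame({
    "KW": rng.normal(size=200),
    "CHWTON": rng.normal(size=200),
    "Total houses": rng.normal(size=200),
    "HTmmBTU": rng.normal(size=200),
})
# The target depends mostly on KW and CHWTON, so they should rank highest.
y = 3.0 * X["KW"] + 2.0 * X["CHWTON"] + 0.1 * rng.normal(size=200)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
# Importances are normalized to sum to 1; sorting gives the ranking.
ranking = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking)
```

Keeping only the highest-ranked columns is then a one-line slice, which is how a 29-feature candidate set can be reduced to the handful reported above.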
The study is aimed at predicting energy consumption in three educational buildings by identifying key parameters. Through feature selection, we have identified key parameters that significantly impact energy usage. These include "CHWTON," or chilled water tons, which measures the cooling capacity of chilled water systems, representing the heat energy required to melt one ton of ice in 24 hours. Additionally, "KW" denotes the power consumption of electrical equipment and lighting systems within the buildings. "Total light bulbs" denotes the aggregate number of light bulbs or lamps within the buildings, crucial for various assessments. Furthermore, aspects of HVAC systems, like "CHWTONgalsgas," offer insights into chilled water and gas usage. Moreover, "Combined mmBTU" expresses heat in millions of British thermal units, where one BTU is the heat required to raise the temperature of one pound of water by one degree Fahrenheit. The feature selection process helps identify the most influential parameters for the predictive model, enabling more accurate energy consumption forecasts.

4.4. Performance Evaluation and Comparison. The prediction models' performance was evaluated by comparing multiple methods for each building after training and testing. Comparative results are shown in Table 5.
Based on the performance evaluation measurements presented in Table 5, the GBR method exhibited outstanding performance across all buildings. Notably, the determination coefficients were remarkably high, reaching 0.998 for Cronkite, 0.984 for CLAS, and 0.845 for NHAI. Furthermore, the corresponding mean squared error (MSE) values were 8.148, 5.09, and 9.17, respectively. The root mean squared error (RMSE) and mean absolute error (MAE) also supported these results, indicating that GBR outperformed other methods and yielded the best values. Additionally, when assessing the mean absolute percentage error (MAPE) results, GBR surpassed the other methods, demonstrating the lowest error percentage. The LSTM method exhibited lower determination coefficients compared to the GBR results, with values of 0.86 for CLAS, 0.7772 for NHAI, and 0.7609 for Cronkite. However, when comparing LSTM to the RF method, the performance varied across buildings.
[Figure 11: feature ranking by importance for KW, KWS, CHWTON, HTmmBTU, Combined mmBTU, Totalgalsgas, Totallightbulbs, Total#Houses, CHWTONgalsgas, and HTmmBTUgalsgas.]
Table 6: Real and predicted average consumption for each method.
Real values   GBR test    LSTM test    RF test
5364.07       5511.33     5646.09      5666.75
5902.25       5881.72     5811.137     5822.69
5915.77       5900.722    5871.49      5870.67
5496.93       5491.65     5499.29      5496.55
5512.42       5523.29     5535.02      5535.18
6173.30       6178.63     6366.57      6354.28
6141.73       6296.113    6345.22      6365.09
6302.20       6364.17     6371.89      6389.07
6182.52       6182.480    6234.04      6240.37
6251.62       6171.9      6166.97      6168.348
6251.62       5606.99     5530.53      5527.52
5602.05       5626.398    5637.729     5527.52
5678.72       6769.29     6437.53      5641.79
6842.53       6893.64     6884.76      6417.95
6980.2        6701.16     6606.624     6876.06
6767.83       6695.89     6520.20      6622.32
6568.83       6394.089    6358.6       6506.90
5789.52       5730.96     5596.32934   6355.81

Table 7: Cross-validation score for models.
Algorithm          RF      LSTM    GBR
Validation score   0.83    0.92    0.95

Specifically, in the Cronkite building, the random forest method outperformed LSTM with an R2 value of 0.89. Nevertheless, in terms of other metrics such as MSE and RMSE, LSTM yielded comparably smaller values than the RF method. Moreover, there was a significant difference in the MAPE results, with LSTM generating fewer errors compared to random forest. This observation suggests that, in terms of errors, LSTM performed better than RF. According to the forecast evaluation, the squared-error metrics were deemed more suitable for assessing the accuracy of the predictions. Following this examination, it became clear that the gradient boosting regressor (GBR) method performed the best across all buildings.
Considering the data presented in Table 6, it is evident that the algorithm closest to the real testing values is the gradient boosting regressor, demonstrating good precision. The
Figure 12: Real and predicted average consumption for CLAS building.
long short-term memory (LSTM) method follows in second place, and the random forest algorithm comes last in terms of accuracy in predicting average consumption. In the context of result validation, K-fold cross-validation is a highly suitable technique for our case due to its inherent advantages. By partitioning the data set into K subsets, each containing a representative sample of the data, K-fold cross-validation ensures thorough training and validation of the model. This approach maximizes data utilization and minimizes bias, as every data point is utilized for both training and validation across different folds. Furthermore, the averaging of performance metrics over multiple splits provides a robust evaluation, effectively reducing the variance associated with a single train-test split. Additionally, K-fold cross-validation facilitates better generalization by assessing the model's performance across diverse subsets of the data, ensuring that it can effectively handle various scenarios. Its utility extends to hyperparameter tuning, enabling the comparison of different parameter configurations across multiple validation sets.
In our scenario, we chose 5-fold cross-validation for our moderate data set size, balancing computational efficiency and robust performance estimation. This method ensures reliable model evaluation without excessive computational overhead and aligns with common practices in the field, allowing easier comparison with existing literature and benchmarks. Table 7 provides the outcome of the 5-fold cross-validation.
A line graph comparison was used to better demonstrate the difference between the actual and anticipated average consumption levels, as depicted in Figures 12–14. In addition, Figures 15–17 show the graphical presentation of the regression line for the three buildings. In the CLAS and Cronkite buildings, the gradient boosting regressor (GBR) produces a symmetric regression line, indicating that its predicted values closely align with the actual ones. Conversely, for the NHAI building data set, characterized by nonsymmetrical data, long short-term memory (LSTM) outperforms other models due to its ability to capture temporal dependencies.
However, in the case of NHAI, the performance difference between LSTM and GBR is minimal, highlighting the suitability of both algorithms for different data characteristics. GBR excels in all cases, while LSTM's recurrent nature makes it valuable for handling nonlinear, time-dependent data.
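The 5-fold cross-validation described here can be sketched with scikit-learn; the model choice mirrors the GBR used in this study, but the data below is synthetic and the scores are illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
# Synthetic regression data standing in for the building features and target.
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, 2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)

# 5 folds: each observation is used once for validation and four times for training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(GradientBoostingRegressor(random_state=0), X, y, cv=cv, scoring="r2")
print(scores.round(3), round(scores.mean(), 3))
```

Averaging the five fold scores gives a single validation score per algorithm, which is how a comparison like Table 7 can be assembled.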
Figure 13: Real and predicted average consumption for Cronkite building.
From the analysis of all the tables and figures, we conclude that the best performances are consistently achieved by the gradient boosting regressor (GBR). GBR trains weak learners sequentially, correcting errors from previous iterations and fine-tuning the model's predictive capabilities with each step. Additionally, gradient descent optimization minimizes prediction errors, leading to more accurate predictions. Following GBR, long short-term memory (LSTM) stands out as it is specifically designed for handling sequential data, making it well-suited for time series forecasting and similar tasks. Its ability to understand and process temporal patterns contributes to accurate predictions in time-dependent scenarios. Lastly, the random forest algorithm also delivers good results, particularly when it comes to capturing complex nonlinear correlations between the features and the target variable, and its ability to model complex interactions and patterns makes it effective.
The CLAS building has a significantly higher energy consumption rate, exceeding 30 kWh, in contrast to the other buildings. The main reason for this difference is the large surface area and the simultaneous use for many educational objectives. On the other hand, the Cronkite building has an energy consumption rate of 26 kWh per hour, while NHAI has a consumption rate of 12 kWh per hour. Predictive modeling approaches are necessary for efficient energy allocation and management. In this particular instance, the gradient boosting regressor model demonstrates its superiority in effectively predicting outcomes for both the CLAS and Cronkite buildings. The choice is backed by the model's remarkable performance metrics, as shown by its coefficient of determination (R-squared) values of 0.99 for Cronkite and 0.98 for CLAS. This model improves the accuracy of forecasting by offering proactive insights into the energy needs of each building. It also helps in preventing energy loss before it happens and promotes efforts to reduce energy usage.

5. Comparison with the Previous Study

The study compared three algorithms: random forest, LSTM, and gradient boosting regressor, revealing their performance in forecasting monthly average consumption.
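A comparison of this kind — fitting several regressors on the same split and scoring each on held-out data — can be sketched as below; LSTM is omitted because it requires a deep-learning framework, and the data is synthetic rather than the buildings' consumption records:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Synthetic stand-in for building features and average consumption (kWh).
X = rng.normal(size=(300, 4))
y = 5500 + 300 * X[:, 0] + 150 * X[:, 1] + 10 * rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "RF": RandomForestRegressor(random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
}
# Fit each model on the training split and score it on the held-out data.
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te)) for name, m in models.items()}
print({k: round(v, 3) for k, v in scores.items()})
```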
Figure 14: Real and predicted average consumption for NHAI building.
The development of prediction models demonstrated their capabilities, urging further optimization. The findings also led to a comparative analysis with previous machine learning studies. In the first research conducted by Khaoula et al. in 2022 [40], four machine learning algorithms were implemented to predict energy demand for a commercial building over two years. The algorithms used were multiple linear regression (MLR), long short-term memory (LSTM), simple linear regression (LR), and random forest (RF). The results indicated that LSTM performed the best, followed by RF, MLR, and LR, providing valuable insights into the regression algorithms' capabilities. In the second research, Khaoula et al. in 2023 [41] examined energy consumption prediction in a low-energy house over four months. Unlike the first research, this time, the prediction considered not only the house's energy but also its appliances. Three machine learning algorithms, namely, artificial neural networks (ANN), recurrent neural networks (RNN), and random forest (RF), were employed for tests. Recurrent neural networks, especially LSTM, once again outperformed the other algorithms, achieving an impressive accuracy of 96%. RF followed with 88% accuracy. However, ANN yielded negative predictions, indicating its unsuitability for time series data sets. Furthermore, in their research, Khaoula et al. [42] used three deep learning algorithms—recurrent neural networks (RNNs), artificial neural networks (ANNs), and autoregressive neural networks (AR-NNs)—to forecast the total load of HVAC systems. The results showed that the autoregressive neural network model outperformed the other two due to its ability to capture temporal dependencies and patterns in time series data, which is crucial for HVAC load prediction. AR-NNs use a simpler architecture, focusing on past observations to predict future values, and their autoregressive nature allows them to effectively model the self-dependence of time series data, leading to more accurate predictions.
Drawing insights from these three studies, significant findings emerge regarding the efficacy of regression algorithms for energy consumption prediction. Specifically, long short-term memory (LSTM) and random forest (RF) consistently emerge as top performers, especially in handling time series data. However, our research introduces a novel aspect by exploring the effectiveness of gradient boosting regressor (GBR), which yielded exceptional results. Notably, GBR
Figure 15: Regression line between observation and predictions for CLAS building.
Figure 16: Regression line between observation and predictions for Cronkite building.
Figure 17: Regression line between observation and predictions for NHAI building.
achieved remarkable precision, boasting an impressive accuracy of 98.9%. Moreover, compared to other algorithms, GBR demonstrated superior performance with fewer errors, as evidenced by lower root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) values. This underscores the potential of GBR as a formidable contender in energy consumption prediction tasks, offering a promising alternative to LSTM and RF in certain contexts.

6. Perspectives and Future Work

For future contributions, we plan to optimize the GBR model by increasing the data used for training and prediction, which may improve efficiency and performance on larger data sets. We also intend to apply a novel approach to the gradient boosting optimizer to fine-tune the model's parameters and hyperparameters more effectively. These efforts are aimed at enhancing the GBR algorithm's performance for accurate energy consumption forecasting and other applications.
Another significant contribution of our future research lies in the utilization of transformer models for predicting diurnal energy consumption patterns. Transformers, originally designed for natural language processing tasks, have shown remarkable capabilities in capturing long-range dependencies in sequential data, making them well-suited for time series forecasting tasks as well. By applying transformer architectures to predict diurnal energy consumption, we aim to leverage their ability to effectively model complex
[13] M. A. Ahajjam, D. B. Licea, C. Essayeh, M. Ghogho, and A. Kobbane, "MORED: a Moroccan buildings' electricity consumption dataset," Energies, vol. 13, no. 24, p. 6737, 2020.
[14] N. Somu, M. R. Gauthama Raman, and K. Ramamritham, "A hybrid model for building energy consumption forecasting using long short term memory networks," Applied Energy, vol. 261, article 114131, 2020.
[15] I. W. A. Suranata, I. N. K. Wardana, N. Jawas, and I. K. A. A. Aryanto, "Feature engineering and long short-term memory for energy use of appliances prediction," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 3, pp. 920–930, 2021.
[16] M. K. M. Shapi, N. A. Ramli, and L. J. Awalin, "Energy consumption prediction by using machine learning for smart building: case study in Malaysia," Developments in the Built Environment, vol. 5, article 100037, 2021.
[17] M. Faiq, K. G. Tan, C. P. Liew et al., "Prediction of energy consumption in campus buildings using long short-term memory," Alexandria Engineering Journal, vol. 67, pp. 65–76, 2023.
[18] T. Kawahara, K. Sato, and Y. Sato, "Battery voltage prediction technology using machine learning model with high extrapolation accuracy," International Journal of Energy Research, vol. 2023, Article ID 5513446, 17 pages, 2023.
[19] S. A. Mohammed, O. A. Awad, and A. M. Radhi, "Optimization of energy consumption and thermal comfort for intelligent building management system using genetic algorithm," Indonesian Journal of Electrical Engineering and Computer Science, vol. 20, no. 3, pp. 1613–1625, 2020.
[20] N. F. Aurna, M. T. M. Rubel, T. A. Siddiqui et al., "Time series analysis of electric energy consumption using autoregressive integrated moving average model and Holt Winters model," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 3, pp. 991–1000, 2021.
[21] Z. Ferdoush, B. N. Mahmud, A. Chakrabarty, and J. Uddin, "A short-term hybrid forecasting model for time series electrical-load data using random forest and bidirectional long short-term memory," International Journal of Electrical and Computer Engineering, vol. 11, no. 1, pp. 763–771, 2021.
[22] Y. He and K. F. Tsang, "Universities power energy management: a novel hybrid model based on iCEEMDAN and Bayesian optimized LSTM," Energy Reports, vol. 7, pp. 6473–6488, 2021.
[23] X.-B. Jin, W.-Z. Zheng, J.-L. Kong et al., "Deep-learning forecasting method for electric power load via attention-based encoder decoder with Bayesian optimization," Energies, vol. 14, no. 6, p. 1596, 2021.
[24] R. Olu-Ajayi, H. Alaka, I. Sulaimon, F. Sunmola, and S. Ajayi, "Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques," Journal of Building Engineering, vol. 45, article 103406, 2022.
[25] J. Jang, J. Han, and S.-B. Leigh, "Prediction of heating energy consumption with operation pattern variables for non-residential buildings using LSTM networks," Energy and Buildings, vol. 255, article 111647, 2022.
[26] A. N. Ndife, W. Rakwichian, P. Muneesawang, and Y. Mensin, "Smart power consumption forecast model with optimized weighted average ensemble," IAES International Journal of Artificial Intelligence, vol. 11, no. 3, p. 1004, 2022.
[27] V. H. Duong and N. H. Nguyen, "Machine learning algorithms for electrical appliances monitoring system using open-source systems," IAES International Journal of Artificial Intelligence, vol. 11, no. 1, p. 300, 2022.
[28] C. Vennila, T. Anita, T. Sri Sudha et al., "Forecasting solar energy production using machine learning," International Journal of Photoenergy, vol. 2022, Article ID 7797488, 7 pages, 2022.
[29] S. Kapp, J.-K. Choi, and T. Hong, "Predicting industrial building energy consumption with statistical and machine learning models informed by physical system parameters," Renewable and Sustainable Energy Reviews, vol. 172, article 113045, 2023.
[30] R. Bhol, S. C. Swain, R. Dash, K. J. Reddy, C. Dhanamjayulu, and B. Khan, "Short-term reactive power forecasting based on real power demand using Holt-Winters' model ensemble by global flower pollination algorithm for microgrid," International Journal of Energy Research, vol. 2023, Article ID 9733723, 22 pages, 2023.
[31] M. M. Asiri, G. Aldehim, F. A. Alotaibi, M. M. Alnfiai, M. Assiri, and A. Mahmud, "Short-term load forecasting in smart grids using hybrid deep learning," IEEE Access, vol. 12, pp. 23504–23513, 2024.
[32] "Real-time electricity consumption," https://portal.emcs.cornell.edu/d/2/dashboard-list?orgId=2.
[33] P. Mishra, C. M. Pandey, U. Singh, A. Gupta, C. Sahu, and A. Keshri, "Descriptive statistics and normality tests for statistical data," Annals of Cardiac Anaesthesia, vol. 22, no. 1, pp. 67–72, 2019.
[34] M. R. Segal, Machine learning benchmarks and random forest regression, 2004.
[35] Y. Yu, X. Si, C. Hu, and J. Zhang, "A review of recurrent neural networks: LSTM cells and network architectures," Neural Computation, vol. 31, no. 7, pp. 1235–1270, 2019.
[36] A. Natekin and A. Knoll, "Gradient boosting machines, a tutorial," Frontiers in Neurorobotics, vol. 7, p. 21, 2013.
[37] A. Di Bucchianico, "Coefficient of determination (R2)," Encyclopedia of Statistics in Quality and Reliability, 2008.
[38] D. Wallach and B. Goffinet, "Mean squared error of prediction as a criterion for evaluating and comparing system models," Ecological Modelling, vol. 44, no. 3-4, pp. 299–306, 1989.
[39] P. Goodwin and R. Lawton, "On the asymmetry of the symmetric MAPE," International Journal of Forecasting, vol. 15, no. 4, pp. 405–408, 1999.
[40] E. Khaoula, B. Amine, and B. Mostafa, "Evaluation and comparison of energy consumption prediction models," in International Conference on Advanced Technologies for Humanity, Marrakech, Morocco, 2022.
[41] E. Khaoula, B. Amine, and B. Mostafa, "Evaluation and comparison of energy consumption prediction models case study: smart home," in The International Conference on Artificial Intelligence and Computer Vision, pp. 179–187, Springer Nature Switzerland, Cham, 2023.
[42] E. Khaoula, B. Amine, and B. Mostafa, "Forecasting diurnal heating energy consumption of HVAC system using ANN, RNN, ARNN," in 2023 14th International Conference on Intelligent Systems: Theories and Applications (SITA), pp. 1–6, Casablanca, Morocco, 2023.
[43] "Real time utility use data," https://portal.emcs.cornell.edu/d/2/dashboard-list?orgId=2.