
2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO)

Amity University, Noida, India. June 4-5, 2020

Exploratory Analysis of Machine Learning Techniques to predict Energy Efficiency in Buildings

Monika Goyal, Research Scholar, Computer Science and Technology, Manav Rachna University, Faridabad, India ([email protected])
Mrinal Pandey, Associate Professor, Computer Science and Technology, Manav Rachna University, Faridabad, India ([email protected])
Rahul Thakur, Research Scholar, Computer Science and Technology, Manav Rachna University, Faridabad, India ([email protected])

Abstract—The utilization of energy in the most efficient manner is an urgent demand of the modern era, as energy is being used in each and every field. Globally, buildings consume the largest percentage of energy, and the HVAC system consumes most of the energy in a building. HVAC maintains the desired temperature within a building by meeting the Heating load and Cooling load requirements. These requirements should be minimized to reduce energy consumption and achieve energy efficiency. Some characteristics of buildings greatly affect the Heating load and Cooling load requirements. This research analyses eight important characteristics of a building (Relative Compactness, Surface Area, Wall Area, Roof Area, Overall Height, Orientation, Glazing Area, Glazing Area Distribution) and uses them for predicting Heating load and Cooling load. Experiments have been performed using three Machine Learning algorithms (Multiple Linear Regression, K Nearest Neighbours, Support Vector Regression) and three Ensemble algorithms (Random Forests, Gradient Boosting Machines, Extreme Gradient Boosting). Models have been evaluated using the performance metrics RMSE, MSE, MAE, R Squared and Accuracy. Results show that Ensemble techniques outperform individual Machine Learning techniques by an appreciable margin.

Keywords—Machine Learning; Energy Optimization; Random Forests; Multiple Linear Regression; Gradient Boosting Machines; Extreme Gradient Boosting; K Nearest Neighbours; Support Vector Regression

I. INTRODUCTION

The use of energy is widespread and has reached almost every known area, from industries to means of transport, from offices to households. In the modern world, no aspect of life can be thought of that does not consume energy. Such massive usage of energy leads us to work towards its conservation and optimization, so that energy is used optimally and the environment is safeguarded [1]. Studies by experts all over the world show that the highest percentage of energy is consumed by buildings [2]-[5]. Buildings consume about 40% of the total energy consumed in the world. After buildings, the second major energy consumer is Industry, which accounts for 32% of energy consumption. The third major area is Transport, with 28% energy consumption. These studies motivated us to devise solutions for energy optimization in buildings.

Further studies show that within buildings, the HVAC (Heating, Ventilation and Air Conditioning) system is the largest energy consumer, accounting for 40-50% of consumption [6]-[7]. HVAC consumes energy to maintain the desired temperature within a building and to control humidity. It is responsible for meeting the Heating load and Cooling load of a building. These two are related to the thermal load of the building: when the building is cold, the thermal load is converted into Heating load, and when the building is hot, into Cooling load [8].

The Heating load and Cooling load of a building directly affect its energy performance, which calls for an analysis of the factors that influence these loads. Studies reveal that various characteristics of a building and its structure affect Heating and Cooling loads to a major extent [9]-[10].

This paper focuses on several important features of buildings, analyses them and uses them for predicting the Heating load and Cooling load using three well-known Machine Learning techniques and three Ensemble techniques.

The paper is organized as follows: Section 1 introduces the problem. Section 2 reviews the literature. Section 3 describes the Machine Learning techniques and Ensemble techniques used in the research. Section 4 describes the methodology of the work. Section 5 explains the experiments performed and the results obtained. Section 6 concludes the paper.

II. LITERATURE SURVEY

This section reviews the analysis and findings of several authors worldwide in the field of energy consumption and optimization in buildings using various individual Machine Learning and Ensemble techniques, as summarized in TABLE I.

TABLE I. SURVEY OF LITERATURE

| Authors | ML model used | Work Done | Results |
|---|---|---|---|
| Tsanas A. & Xifara A., 2012 [11] | Iteratively Reweighted Least Squares, Random Forests | ML framework developed to analyze the effect of different building parameters on Heating load and Cooling load | Random Forests were better at revealing relationships between input and output variables |
| Fan C. et al., 2014 [12] | MLR, ARIMA, SVR, BT, RF, MLP, MARS, KNN | Ensemble models developed to predict next-day energy demand in buildings | Prediction accuracy of Ensemble models higher than individual models as evaluated by MAPE |
| Jain R. K. et al., 2014 [2] | Support Vector Regression | Developed a sensor-based model for forecasting energy consumption in multi-family residential buildings | Spatial granularity impacts the prediction power of the model significantly |
| Wei X. et al., 2015 [13] | MLP ensemble | Model developed for HVAC energy optimization | Energy savings were greater when Internal Air Quality was taken into consideration |
| Park H. S. et al., 2016 [14] | Decision Tree | Developed a new energy benchmark to improve the operational rating system of office buildings | Proposed benchmark better than conventional and baseline systems |
| Candanedo L. M. et al., 2017 [15] | MLR, SVM, RF, GBM | Model developed for predicting energy usage by appliances in a residential building | GBM outperformed other models; atmospheric pressure is an important predictor |
| Manjarres D. et al., 2017 [16] | Random Forest | A framework developed to optimize HVAC energy consumption | Energy consumption reduced by 48% for heating and 39% for cooling |
| Peng Y. et al., 2018 [17] | K Nearest Neighbour | A model developed to optimize energy consumption in building space according to occupancy | Energy savings of 7-52% obtained |
| Gallagher C. V. et al., 2018 [18] | LSR, DT, KNN, ANN, SVM | ML algorithms used for measurement and verification of energy saved in industrial buildings | Error reduced by 51.09% |
| Deb C. et al., 2018 [19] | MLR, ANN | Prediction models developed for energy savings in HVAC in office buildings | ANN more accurate with MAPE of 14.8% |
| Nayak S. C., 2019 [20] | ARIMA, RBFNN, MLP, SVM, FLANN | Developed a new model which linearly combined the five ML models for better accuracy | Model developed was better in terms of feasibility and performance |
| Sethi J. K. & Mittal M., 2019 [21] | DT, Naïve Bayes, SVM, RF, LR, Stacking ensemble, Voting ensemble | ML techniques applied to predict an accurate Air Quality Index | Ensemble techniques outperformed others |

978-1-7281-7016-9/20/$31.00 ©2020 IEEE

III. MACHINE LEARNING

The concept of Machine Learning is to make a machine learn and behave in a certain manner upon being fed a particular type of data as input. Machine Learning can be broadly classified into Unsupervised Learning and Supervised Learning. Clustering is a common Unsupervised Learning technique that takes a given dataset as input and groups it into a finite number of clusters according to some similarity index.

Supervised Learning uses a labeled set of training data and maps the input data to the desired output. Classification and Regression are two common Supervised Learning techniques. Classification assigns a data sample to one of a set of predefined discrete classes; Regression is a statistical methodology generally used for numeric prediction.

A. Machine Learning Techniques

The Machine Learning techniques used in this paper are briefly explained below:

- Multiple Linear Regression: Although quite similar to Linear Regression, there is one significant difference: in an MLR model, one response variable B depends upon multiple independent variables A1, A2, A3, ..., An. Interpreting the results becomes more complex in this model because of correlation between the different independent variables [22].

- K Nearest Neighbours: This technique considers the "k" closest instances in the training dataset to predict the value of an unknown instance. These k instances are the k nearest neighbours of the unknown instance [23]. The value returned is obtained by averaging the values of the k nearest neighbours. In our experiments, we have kept the value of k at 4.

- Support Vector Regression: It aims at finding a function f(x) which allows deviation to a certain extent from the obtained target values yi in the training data samples, while also ensuring maximum flatness [24].

B. Ensemble Techniques

The basic idea of Ensemble techniques is to integrate the results of individual Machine Learning models so that the prediction results improve in terms of accuracy and robustness. Bagging and Boosting are two popular ensemble methods. The Ensemble techniques used in this paper are explained as follows:

- Random Forests: A tree-based ensemble technique that can be applied to both Classification and Regression. Features that make Random Forests appealing include prediction efficiency, suitability for highly multi-dimensional problems, missing-value handling and outlier removal [25].

- Gradient Boosting Machines: GBM is also an ensemble learning technique, whose underlying structure is the Decision Tree. In GBM, additive regression models are created by iteratively fitting a simple base learner to the currently updated pseudo-residuals by least squares at each iteration [26].

- Extreme Gradient Boosting: XGBoost is also a tree-based ensemble technique. Besides performance and speed as its key features, it has the added feature of scalability. Several optimizations have been performed on the basic algorithm to ensure the scalability of the model [27].
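As a rough sketch of how the six techniques above could be instantiated (the paper states only that Python was used; scikit-learn and the xgboost package are assumptions on our part, as are all hyperparameters except k = 4, which the paper fixes):

```python
# Sketch of the six regressors discussed above. The paper does not name
# its libraries; scikit-learn and xgboost are assumed here.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

models = {
    # Multiple Linear Regression: one response on several predictors
    "MLR": LinearRegression(),
    # K Nearest Neighbours: the paper keeps k at 4
    "KNN": KNeighborsRegressor(n_neighbors=4),
    # Support Vector Regression: epsilon-insensitive loss
    "SVR": SVR(kernel="rbf"),
    # Ensemble techniques (hyperparameters are illustrative)
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBM": GradientBoostingRegressor(random_state=0),
}

# XGBRegressor mirrors the scikit-learn API; skip it if not installed.
try:
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor()
except ImportError:
    pass

def fit_and_predict(X_train, y_train, X_test):
    """Fit every model and return a dict of test-set predictions."""
    return {name: m.fit(X_train, y_train).predict(X_test)
            for name, m in models.items()}
```

Since these regressors predict a single target, each would be fitted once per output variable (Y1, Heating load, and Y2, Cooling load).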

IV. METHODOLOGY

This section explains the workflow followed in this research. Fig. 1 shows the steps of the methodology.

Fig. 1. Methodology

A. Data Collection

The dataset used in this research is a standard dataset collected from the UCI repository [28]. It relates to Energy Efficiency in Buildings and consists of eight different characteristics of buildings, which act as input variables, and the Heating load and Cooling load of buildings as two output variables. Table II describes the parameters of the data along with their symbols and types.

TABLE II. DATASET PARAMETER DESCRIPTION

| Parameter | Symbol | Type |
|---|---|---|
| Relative Compactness | X1 | Input |
| Surface Area | X2 | Input |
| Wall Area | X3 | Input |
| Roof Area | X4 | Input |
| Overall Height | X5 | Input |
| Orientation | X6 | Input |
| Glazing Area | X7 | Input |
| Glazing Area Distribution | X8 | Input |
| Heating Load | Y1 | Output |
| Cooling Load | Y2 | Output |

B. Data Analysis and Pre-processing

Data pre-processing consists of checking the dataset for missing values and filling them with appropriate values, detecting and removing outliers, converting the data into a form suitable for the algorithm, attribute selection, etc. [12].

The dataset used in this research was specifically created by a civil engineer for performing energy analysis in different kinds of buildings, so little pre-processing was required. The Pearson Correlation Coefficient was calculated to derive the strength of the relationship among the variables of the dataset. A Pearson Correlation Coefficient of zero means the variables are not correlated, i.e. they are independent; a value closer to 1 indicates a high correlation between variables [29]. The obtained coefficients are represented in matrix form in Fig. 2.

Fig. 2. Correlation Matrix

C. Data Partitioning

The dataset was partitioned according to the 70%-30% rule into two subsets, a Training dataset and a Testing dataset. For partitioning, Random Sampling without Replacement was applied, which resulted in 70% training data and 30% test data.

D. Model Construction

Models were constructed by applying three Machine Learning techniques, namely Multiple Linear Regression, K Nearest Neighbours and Support Vector Regression, and three Ensemble techniques, namely Random Forests, Gradient Boosting Machines and Extreme Gradient Boosting. The models were fitted on the Training dataset and validated using the Testing dataset.

E. Model Evaluation

The results obtained after applying the models were evaluated using five well-known performance metrics: Root Mean Square Error, Mean Square Error, Mean Absolute Error, R Squared and Accuracy.

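A minimal sketch of the error metrics used in the Model Evaluation step (numpy assumed; the paper does not define how "Accuracy" is computed for this regression task, so that metric is left out rather than guessed at):

```python
# Sketch of the evaluation metrics from Section IV-E, assuming numpy.
import numpy as np

def evaluate(y_true, y_pred):
    """Return RMSE, MSE, MAE and R Squared for one output variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                        # Mean Square Error
    rmse = np.sqrt(mse)                            # Root Mean Square Error
    mae = np.mean(np.abs(err))                     # Mean Absolute Error
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                     # R Squared
    return {"RMSE": rmse, "MSE": mse, "MAE": mae, "R2": r2}
```

Each model would be scored twice, once against Y1 (Heating load) and once against Y2 (Cooling load), giving the per-output columns of the results tables.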
V. EXPERIMENTS AND RESULTS

All the experiments in this research were performed using Python. Three Machine Learning algorithms, namely MLR, KNN and SVR, and three Ensemble techniques, namely RF, GBM and XGBoost, were applied to the collected dataset. The results of the experiments are described in Table III and Table IV.

A. Results of Machine Learning Techniques

Table III summarizes the results obtained after applying all three aforementioned Machine Learning algorithms to the dataset, based on the performance metrics. The RMSE values range between 3.24 and 3.61 for both output variables Y1 and Y2 with MLR and SVR, whereas they are lower with KNN: 2.22 for Y1 and 1.8 for Y2. Correspondingly, the MSE values for KNN are also lower than those for MLR and SVR. Similarly, MAE values range between 2.32 and 2.63 with MLR and SVR, against 1.50 and 1.35 with KNN. R Squared values are better for KNN (0.94 and 0.96) than for MLR (0.88 and 0.83) and SVR (0.82 and 0.79). KNN also outperforms the other two algorithms in terms of accuracy.

B. Results of Ensemble Techniques

The results of the experiments performed with the Ensemble techniques are summarized in Table IV. As per the results, RMSE is 0.48 for output variable Y1 for all three algorithms. The value for Y2 varies slightly: 1.88 for RF, 1.79 for GBM and 1.91 for XGBoost. The corresponding MSE value is 0.23 for Y1 for each algorithm, and varies slightly for Y2. The MAE value for Y1 is 0.35 using RF and 0.36 when either GBM or XGBoost is applied. The R Squared values are almost the same for all the algorithms: 0.99 and 0.95 for Y1 and Y2 respectively. All three algorithms also perform identically in terms of accuracy for Y1 (99.76%), and range between 96.01% and 96.14% for Y2.

C. Comparative Analysis of Algorithms based on Performance Metrics

In this section, the results of the experiments are represented graphically on the basis of the various performance metrics used for evaluation. Fig. 3 - Fig. 6 show the results on the basis of RMSE, MAE, R Squared and Accuracy respectively.

Fig. 3. MAE Results
Fig. 4. R Squared Results
Fig. 5. Accuracy Score

TABLE III. RESULTS OF ML ALGORITHMS

| Performance Metric | MLR (Y1) | MLR (Y2) | KNN (Y1) | KNN (Y2) | SVR (Y1) | SVR (Y2) |
|---|---|---|---|---|---|---|
| RMSE | 3.24 | 3.61 | 2.22 | 1.8 | 3.38 | 3.40 |
| MSE | 10.50 | 13.093 | 4.94 | 3.24 | 11.49 | 11.60 |
| MAE | 2.32 | 2.62 | 1.50 | 1.35 | 2.63 | 2.46 |
| R Squared | 0.88 | 0.83 | 0.94 | 0.96 | 0.82 | 0.79 |
| Accuracy | 89.63% | 85.75% | 95.12% | 96.47% | 88.66% | 87.37% |

TABLE IV. RESULTS OF ENSEMBLE ALGORITHMS

| Performance Metric | RF (Y1) | RF (Y2) | GBM (Y1) | GBM (Y2) | XGBoost (Y1) | XGBoost (Y2) |
|---|---|---|---|---|---|---|
| RMSE | 0.48 | 1.88 | 0.48 | 1.79 | 0.48 | 1.91 |
| MSE | 0.23 | 3.54 | 0.23 | 3.20 | 0.23 | 3.66 |
| MAE | 0.35 | 1.19 | 0.36 | 1.21 | 0.36 | 1.27 |
| R Squared | 0.99 | 0.95 | 0.99 | 0.96 | 0.99 | 0.95 |
| Accuracy | 99.76% | 96.14% | 99.76% | 96.01% | 99.76% | 96.01% |

VI. CONCLUSION

Energy consumption at a fast pace and in large amounts demands solutions that can help in efficient energy use. Buildings, being the largest energy consumer, need more focus from researchers. The HVAC system consumes a large percentage of a building's total energy; it needs energy for operation so that it can meet the Heating load and Cooling load requirements of the building. Heating and Cooling loads are greatly affected by various attributes of a building. This research implements certain well-known Machine Learning and Ensemble techniques to predict the Heating load and Cooling load of a building. The results of the experiments show that Ensemble techniques perform better than individual Machine Learning techniques.

REFERENCES

[1] J. C. Lam, K. K. Wan, C. L. Tsang, and L. Yang, "Building energy efficiency in different climates," Energy Conversion and Management, 49(8), pp. 2354-2366.
[2] R. K. Jain, K. M. Smith, P. J. Culligan, and J. E. Taylor, "Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy," Applied Energy, 123, 2014, pp. 168-178.
[3] H. Naganathan, W. O. Chong, and X. Chen, "Building energy modeling (BEM) using clustering algorithms and semi-supervised machine learning approaches," Automation in Construction, 72, 2016, pp. 187-194.
[4] M. W. Ahmad, M. Mourshed, and Y. Rezgui, "Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption," Energy and Buildings, 147, 2017, pp. 77-89.
[5] J. S. Chou and D. K. Bui, "Modeling heating and cooling loads by artificial intelligence for energy-efficient building design," Energy and Buildings, 82, 2014, pp. 437-446.
[6] P. Carreira, A. A. Costa, V. Mansu, and A. Arsénio, "Can HVAC really learn from users? A simulation-based study on the effectiveness of voting for comfort and energy use optimization," Sustainable Cities and Society, 2018.
[7] J. Drgoňa, D. Picard, M. Kvasnica, and L. Helsen, "Approximate model predictive building control via machine learning," Applied Energy, 218, 2018, pp. 199-216.
[8] S. S. Roy, R. Roy, and V. E. Balas, "Estimating heating load in buildings using multivariate adaptive regression splines, extreme learning machine, a hybrid model of MARS and ELM," Renewable and Sustainable Energy Reviews, 82, 2018, pp. 4256-4268.
[9] S. Kumar, S. K. Pal, and R. P. Singh, "A novel method based on extreme learning machine to predict heating and cooling load through design and structural attributes," Energy and Buildings, 176, 2018, pp. 275-286.
[10] N. T. Ngo, "Early predicting cooling loads for energy-efficient design in office buildings by machine learning," Energy and Buildings, 182, 2019, pp. 264-273.
[11] A. Tsanas and A. Xifara, "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools," Energy and Buildings, 49, 2012, pp. 560-567.
[12] C. Fan, F. Xiao, and S. Wang, "Development of prediction models for next-day building energy consumption and peak power demand using data mining techniques," Applied Energy, 127, 2014, pp. 1-10.
[13] X. Wei, A. Kusiak, M. Li, F. Tang, and Y. Zeng, "Multi-objective optimization of the HVAC (heating, ventilation, and air conditioning) system performance," Energy, 83, 2015, pp. 294-306.
[14] H. S. Park, M. Lee, H. Kang, T. Hong, and J. Jeong, "Development of a new energy benchmark for improving the operational rating system of office buildings using various data-mining techniques," Applied Energy, 173, 2016, pp. 225-237.
[15] L. M. Candanedo, V. Feldheim, and D. Deramaix, "Data driven prediction models of energy use of appliances in a low-energy house," Energy and Buildings, 140, 2017, pp. 81-97.
[16] D. Manjarres, A. Mera, E. Perea, A. Lejarazu, and S. Gil-Lopez, "An energy-efficient predictive control for HVAC systems applied to tertiary buildings based on regression techniques," Energy and Buildings, 152, 2017, pp. 409-417.
[17] Y. Peng, A. Rysanek, Z. Nagy, and A. Schlüter, "Using machine learning techniques for occupancy-prediction-based cooling control in office buildings," Applied Energy, 211, 2018, pp. 1343-1358.
[18] C. V. Gallagher, K. Bruton, K. Leahy, and D. T. O'Sullivan, "The suitability of machine learning to minimise uncertainty in the measurement and verification of energy savings," Energy and Buildings, 158, 2018, pp. 647-655.
[19] C. Deb, S. E. Lee, and M. Santamouris, "Using artificial neural networks to assess HVAC related energy saving in retrofitted office buildings," Solar Energy, 163, 2018, pp. 32-44.
[20] S. C. Nayak, "Escalation of Forecasting Accuracy through Linear Combiners of Predictive Models," EAI Endorsed Transactions on Scalable Information Systems, 6(22), 2019.
[21] J. K. Sethi and M. Mittal, "Ambient Air Quality Estimation using Supervised Learning Techniques," EAI Endorsed Transactions on Scalable Information Systems, 2019.
[22] M. Krzywinski and N. Altman, "Multiple linear regression: when multiple variables are associated with a response, the interpretation of a prediction equation is seldom simple," Nature Methods, 12(12), 2015, pp. 1103-1105.
[23] P. Cunningham and S. J. Delany, "k-Nearest neighbour classifiers," Multiple Classifier Systems, 34(8), 2007, pp. 1-17.
[24] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression," Statistics and Computing, 14(3), 2004, pp. 199-222.
[25] L. Breiman, "Random forests," Machine Learning, 45(1), 2001, pp. 5-32.
[26] J. H. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, 38(4), 2002, pp. 367-378.
[27] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, pp. 785-794.
[28] https://archive.ics.uci.edu/ml/datasets/Energy+efficiency
[29] J. Benesty, J. Chen, Y. Huang, and I. Cohen, "Pearson correlation coefficient," in Noise Reduction in Speech Processing, Springer, Berlin, Heidelberg, pp. 1-4.
