
Journal of Cleaner Production 203 (2018) 810–821


Predictive modelling for solar thermal energy systems: A comparison of support vector regression, random forest, extra trees and regression trees
Muhammad Waseem Ahmad*, Jonathan Reynolds, Yacine Rezgui
BRE Centre for Sustainable Engineering, School of Engineering, Cardiff University, Cardiff, CF24 3AA, United Kingdom

Article history: Received 30 April 2018; Received in revised form 17 July 2018; Accepted 19 August 2018; Available online 28 August 2018

Keywords: Artificial intelligence; Extra trees; Random forest; Decision trees; Ensemble algorithms; Solar thermal energy systems

Abstract

Predictive analytics play an important role in the management of decentralised energy systems. Prediction models of uncontrolled variables (e.g., renewable energy sources generation, building energy consumption) are required to optimally manage electrical and thermal grids, making informed decisions and for fault detection and diagnosis. The paper presents a comprehensive study to compare tree-based ensemble machine learning models (random forest – RF and extra trees – ET), decision trees (DT) and support vector regression (SVR) to predict the useful hourly energy from a solar thermal collector system. The developed models were compared based on their generalisation ability (stability), accuracy and computational cost. It was found that RF and ET have comparable predictive power and are equally applicable for predicting useful solar thermal energy (USTE), with root mean square error (RMSE) values of 6.86 and 7.12 on the testing dataset, respectively. Amongst the studied algorithms, DT is the most computationally efficient method as it requires significantly less training time. However, it is less accurate (RMSE = 8.76) than RF and ET. The training time of SVR was 1287.80 ms, which was approximately three times higher than the ET training time.

© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author.
E-mail addresses: [email protected] (M.W. Ahmad), [email protected] (J. Reynolds), [email protected] (Y. Rezgui).
https://doi.org/10.1016/j.jclepro.2018.08.207

1. Introduction

The existing building sector, which is one of the most substantial consumers of energy, contributes towards 40% of the world's total energy consumption and accounts for 30% of total CO2 emissions (Ahmad et al., 2016a). Currently, energy systems are predominantly based on fossil fuels. However, to reduce CO2 emissions and tackle the challenge of mitigating climate change, such systems need to include a combination of fluctuating renewable energy resources (RES) such as wind and solar energy, along with residual resources (e.g., biomass) (Lund et al., 2014). In recent years, more focus is being placed on increasing energy efficiency, incorporating renewable energy generation sources and optimally managing the fluctuation of energy supply (Mathiesen et al., 2015). Energy generation through direct harnessing of solar radiation is one of the largest renewable energy technologies currently exploited worldwide. Solar energy currently constitutes a significant proportion of renewable energy generation in the EU. The majority of this energy generation is currently harnessed through solar photovoltaic systems for producing electricity, accounting for around 4.3% of total installed renewable energy in the EU in 2016 (Eurostat, 2016). In contrast, solar thermal energy only accounts for around 2% of installed renewable generation. To ensure a renewable energy future, it is vital that heating and cooling demands are also met by renewable energy technologies. It is expected that solar thermal energy will continue to grow to play a significant future role in this endeavour. Solar thermal energy is most commonly harvested via glazed evacuated tube collectors or flat-plate collectors. In a typical flat-plate collector, solar radiation passes through a transparent cover. A large portion of this energy is absorbed by a blackened absorber surface, and is then transferred to a fluid in tubes (Kalogirou, 2004). Evacuated thermal collectors contain a heat pipe inside a vacuum-sealed tube. The heat pipe is attached to a black copper fin that fills the absorber plate. These collectors also contain a protruded metal tip on top of each tube, which is attached to the sealed pipe. A small amount of fluid, contained in the heat pipe, undergoes an evaporating-condensing cycle. The fluid evaporates and rises to the heat sink region, where it dissipates latent heat, and

condenses back to the collector to repeat the process (Kalogirou, 2004). Solar thermal energy is most commonly harvested on a smaller, residential scale. However, solar thermal generation is increasingly being integrated into larger scale projects in combination with supplementary generation as part of wider, district-scale energy systems (Sawin et al., 2017). Prediction models of solar thermal systems, a core component of smart grids, could be used for the following applications:

- The comparison of predicted performance with the actual performance of a system could be used as an indication of potential failure (e.g. shaded solar thermal collector, valve failure, solar collector fault, etc.). Models can be used to automatically activate an alarm in case of any problem so that any potential malfunction can be corrected promptly.

- Optimal control of decentralised energy systems can be achieved by using prediction models of uncontrolled variables (e.g., energy generation from RES, building heating demand, etc.). This allows building users, owners, mechanical and electrical (M&E) engineers, thermal-grid operators, etc. to make informed decisions such as shifting energy consumption to off-peak periods, increasing penetration of RES, etc.

- Prediction models could be used to analyse performance characteristics of different solar collector types (such as flat-plate and evacuated), different system configurations, etc. The models could be used by engineers, while designing a system, to achieve maximum efficiency with minimum cost and computational resources.

1.1. Related work

Prediction and modelling of solar thermal and renewable energy generation systems have been addressed in the existing body of literature. Broadly, two methods are available for modelling solar thermal systems: one is built upon an analytical understanding of the thermodynamic phenomena within the system; the second is a rapidly growing field based on computational intelligence techniques. This section gives an overview of the two methods by reviewing existing studies within the literature, as well as outlining the novelty and originality of the present work.

Calculation of the performance of a solar thermal system is highly complex when using an analytical modelling approach. An overview of the theoretical equations governing the thermal dynamics of solar thermal collectors can be found in Duffie and Beckman (2013). Often, computational models are required to capture the physical phenomena at the expense of a large amount of computational time and power. A combination of finite difference and electrical analogy models was used in (Notton et al., 2013; Motte et al., 2013) to calculate the outlet temperature of a building-integrated solar thermal collector. The accuracy of the numerical model was validated against experimental data, allowing the authors to simulate future geometric and material design alterations to improve the efficiency of the solar collector. A numerical modelling approach was applied to a building-integrated, aerogel-covered solar air collector in Dowson et al. (2012). From this, the authors were able to calculate outlet temperatures and collector efficiency from weather conditions. The model outputs were validated to within 5% of the measured values over a short measurement period. As a result, the authors could simulate much longer time periods to demonstrate the potential efficiency and financial payback of their proposed solution. A numerical modelling approach within the MATLAB environment applied to a v-groove solar collector was developed in Karim et al. (2014). The resulting model can predict the air temperature at any part of the solar collector, as well as the efficiency to within a 7% relative error. Whilst the described modelling approaches achieve accurate calculations of solar thermal performance, they do require highly complex mathematical modelling using thermodynamic principles. In these cases, the time and effort are justified due to the experimental nature of the solar collectors presented. However, in general, analytical models are computationally intensive, and in most cases, exhaustive exploration of the parametric space for online control is not feasible. Also, most consumers would not require such detailed modelling of solar thermal collector systems. Therefore, simpler and more generic modelling approaches are required to be able to forecast the key variables, namely outlet temperature and useful heat energy gain.

Data-driven models are often the preferred choice where fast responses are required (e.g., near real-time control applications) and where pertinent information for detailed simulation/numerical models is not available (Ahmad et al., 2016b). Data-driven models capture the underlying physical behaviour by identifying trends in the data and do not require detailed information about system characteristics. These techniques have been extensively applied to model or predict several parameters related to energy systems. For example, solar PV generation was predicted in (Kharb et al., 2014; Yap and Karri, 2015; Yona et al., 2007), wind energy in (Cadenas and Rivera, 2009; Catalão et al., 2011; Kusiak et al., 2009) and building energy demand in (Ahmad et al., 2017; Benedetti et al., 2016; Chae et al., 2016). They have proven accuracy and applicability to energy scheduling problems, with the significant advantage of simplicity and speed.

Application of machine learning algorithms to solar thermal collectors is so far limited, and most of the previous research studies are focused on using artificial neural networks. A recent article by Reynolds et al. (2018) provides an overview of different modelling techniques for solar thermal energy systems. An adaptive neuro-fuzzy inference system (ANFIS) modelling approach was applied to a solar thermal system in (Yaïci and Entchev, 2016). The model used time, ambient temperature, solar radiation, and stratification tank temperatures at the previous timestep to predict the heat input from the solar thermal collector and the tank temperature at the next timestep. The resulting predictions were compared with an ANN based on the same data, and both models were found to perform comparably. Similarly, Géczy-Víg and Farkas (2010) used an ANN to model the temperature at different layers of a solar-connected stratification tank, using temperatures from the previous timestep as an input, as well as mass flow rate and solar radiation. The model achieved accurate predictions with an average deviation of 0.2 °C but only predicted 5 minutes ahead. An ANN was used in (Kalogirou et al., 2014) to allow prediction of the daily energy gain and resulting thermal storage tank temperature of a large-scale solar thermal system. Several combinations of input data were trialled, including daily solar radiation, average ambient temperature, and storage tank initial conditions. Results on test data achieved an R2 value of around 0.93, although a total daily figure is less likely to be useful than a daily profile with hourly or sub-hourly resolution. Both (Caner et al., 2011; Esen et al., 2009) applied ANNs to calculate the efficiency of experimental solar air collectors. Both of these studies achieved high R2 values; however, both case studies had a limited amount of training data and therefore required many, potentially difficult to monitor, input features. Sözen et al. (2008) also aimed to calculate the efficiency of a solar thermal collector using an ANN. More generic inputs were used, such as solar radiation, surface temperature, and tilt angles. The model could accurately predict the efficiency of a solar thermal collector with a maximum deviation of 2.55%. The authors argued that the resulting, more generic, model can therefore be used throughout the region to calculate the efficiency of any similar flat-plate collector. Kalogirou et al. (2008)

utilised ANN prediction of solar thermal system temperatures to develop an automatic fault diagnosis module. The ANN models were trained using fault-free TRNSYS simulation data. The predictions of fault-free temperature resulting from the trained ANN were compared to the real system data, from which the likelihood of system failure could be determined. The fault detection system was shown to effectively detect three types of failure relating to the collector, the pipe insulation, and the storage tank. Liu et al. (2015) tested the applicability of two types of ANN, multi-layer feed-forward neural networks (MLFN) and general regression neural networks (GRNN), as well as a support vector machine (SVM) model, to calculate the heat collection rate and heat loss coefficient of solar thermal systems. They aimed to allow calculation using simple, portable test instruments rather than the current method, which requires deconstruction of the entire system. They found that the MLFN is best suited to predicting the heat collection rate, but the GRNN performed better at predicting the heat loss coefficient. Table 1 summarises previous work on modelling solar thermal energy systems.

1.2. Motivation, objectives and contributions

Thermal performance analysis of a solar thermal system is highly complex: analytical models are computationally intensive and require a considerable amount of computational time to accurately model these systems. On the other hand, data-driven approaches are seldom used, and most of the data-driven approaches that have been used are based on artificial neural networks or their variants. To the best of the authors' knowledge, there are no studies that have investigated the applicability of tree-based methods, and in particular tree-based ensemble methods, for modelling solar thermal systems. From the literature, it was also found that some of the most widely used machine learning algorithms (e.g. artificial neural networks, decision trees) are prone to be unreliable due to instability issues (Breiman et al., 1996). The instability of these algorithms may result in large variations in the model output due to small changes in the input data (Breiman et al., 1996; Wang et al., 2018). As highlighted in the above section, the models developed in this research could be used for real-time optimisation, fault detection and diagnosis. Therefore, instability of the models could cause failure of the prediction models, as these applications rely on the accuracy of the developed models. In the early 1990s, more advanced machine learning techniques, ensemble learning, were developed to overcome these instability issues (Wang et al., 2018; Hansen and Salamon, 1990). Ensemble-based methods generally perform better than the individual learners that construct them, as they overcome their limitations, and there might not be enough data available to train a single model with better generalisation capabilities (Dietterich, 2000; Fan et al., 2014). The paper compares the accuracy in predicting hourly useful solar thermal energy (USTE) of four different machine learning algorithms: random forest (RF), extremely randomised trees/extra trees (ET), decision trees (DT) and

Table 1
Review summary of solar thermal system modelling techniques.

Ref | Method | Input Parameters | Output Parameters | Model Accuracy | Location
--- | --- | --- | --- | --- | ---
(Notton et al., 2013) | Numerical Modelling | Thermodynamic parameters, weather conditions | Component temperatures | 5–10% Relative RMSE | France
(Dowson et al., 2012) | Numerical Modelling | Thermodynamic parameters, weather conditions, inlet temperature | Solar thermal outlet temperature | – | UK
(Karim et al., 2014) | Numerical Modelling | Thermodynamic parameters, weather conditions, inlet conditions | Component temperatures, air temperatures, efficiency | <7% Relative Error | –
(Yaïci and Entchev, 2016) | ANFIS | Ambient temperature, solar radiation, previous tank temperatures | Tank temperature, heat input, solar fraction | 1–9% Relative Error | Canada
(Géczy-Víg and Farkas, 2010) | ANN | Ambient temperature, solar radiation, mass flow rate, previous tank temperature | Tank temperature at 8 layers | 0.24 Average Error | Hungary
(Kalogirou et al., 2014) | ANN | Average daily temperature, total daily solar radiation, starting tank temperature | Daily energy output, final tank temperature | r = 95–96% | –
(Caner et al., 2011) | ANN | Date, time, inlet and outlet collector temperature, tank temperature, ambient and surface temperature, solar radiation | Collector efficiency | R2 = 0.9967, RMSE = 1.73% | Turkey
(Esen et al., 2009) | WNN, ANN | Ambient temperature, solar radiation, absorbing plate temperatures | Efficiency, outlet temperature | R2 = 0.9992/0.9994, RMSE = 0.0094/0.0034 | Turkey
(Sözen et al., 2008) | ANN | Date, time, surface temperature, solar radiation, declination, azimuth and tilt angles | Efficiency | R2 = 0.983 | Turkey
(Kalogirou et al., 2008) | ANN | Global radiation, beam radiation, ambient temperature, incidence angle, wind speed, humidity, flow availability, input temperature | Collector inlet and outlet temperature, storage inlet and outlet temperature | R2 = 0.9920, 0.9996, 0.8823, 0.9504 | Cyprus
(Liu et al., 2015) | MLFN, GRNN, SVM | Tube length, number and radius, hot water mass, collector area, tilt angle, final temperature | Heat collection rate, heat loss coefficient | RMSE = 0.14/0.73 (MLFN), 0.33/0.71 (GRNN), 0.29/0.73 (SVM) | China

Note - ANFIS (Adaptive Neuro-Fuzzy Inference System), ANN (Artificial Neural Network), WNN (Wavelet Neural Network), MLFN (Multi-Layer Feed-Forward Neural Network), GRNN (General Regression Neural Network), SVM (Support Vector Machine), RMSE (Root Mean Squared Error).

support vector regression (SVR). The work also does not take into account system control variables as input features, which increases the complexity of the problem. Furthermore, the models developed in this study can provide a 24-h ahead prediction of USTE at an hourly time-step, rather than the total daily sum or parameters with limited applicability such as efficiency.

The research presented in this paper mainly addresses the following aspects:

- the use of ensemble-based techniques for solar thermal systems, as current applications of machine learning algorithms are limited and most of the previous research work is focussed on artificial neural networks and their variants,

- the use of tree-based ensemble methods to provide insight into the analysis of the variable importance of each input feature, i.e. using them as feature selection tools. In most of the existing research, domain knowledge is widely used to reduce the input variable space. The presented analysis will allow researchers to gain a better understanding of the modelled systems, and,

- to demonstrate that tree-based ensemble methods can improve the prediction and stability of the developed model. They are also more computationally efficient as compared to the conventional methods used in the literature (for example, support vector regression in our case).

The rest of the paper is organised as follows: Section 2 describes the principles of random forest, extra trees, decision trees and support vector regression. The methodology of the developed prediction models is presented in Section 3, along with the feature selection process and results. Prediction results and discussion are detailed in Section 4, whereas concluding remarks and future research directions are presented at the end of the paper.

2. Machine learning methods

Four data-driven algorithms for predicting useful solar thermal energy are introduced in this section. These algorithms include extra trees (ET), random forest, decision trees, and support vector regression (SVR).

2.1. Support vector machines

Support vector machine is one of the most widely used computational intelligence techniques applied in building energy and renewable energy generation prediction and modelling applications. It provides a sparse pattern of solutions and flexible control of the model complexity (Deng et al., 2018), making it highly effective in solving non-linear problems even with a small sample of training datasets. SVM adopts the structural risk minimisation (SRM) principle which, instead of only minimising the training error (this is the principle of traditional empirical risk minimisation), minimises an upper bound of the generalisation error consisting of the sum of the training error and a confidence interval (Dong et al., 2005). SVM is commonly applied with different kernel functions to map the input space into a higher dimensional feature space, which introduces the non-linearity in the solution, and to perform a linear regression in that feature space (Li et al., 2009; Vapnik, 2013). Assume the normalised input variables form a vector X_i, and Y_i is the useful solar thermal energy (i represents the ith data-point in the dataset). A set of data points can then be defined as {(X_i, Y_i)}, i = 1, …, N, where N is the total number of samples. An SVM regression approximates the function using the form given in Equation (1) (Dong et al., 2005; Lin et al., 2006).

Y = f(X) = W · φ(X) + b   (1)

[Fig. 1. The parameters of the support vector regression. Source: (Dong et al., 2005; Li et al., 2009).]

In Equation (1), φ(X) denotes the mapping into the high-dimensional feature space. A regularised risk function, given in Equation (2), is used to estimate the coefficients W and b (Li et al., 2009).

Minimise:  (1/2)‖W‖² + (C/N) Σ_{i=1}^{N} L_ε(Y_i, f(X_i))   (2)

L_ε(Y_i, f(X_i)) = 0 if |Y_i − f(X_i)| ≤ ε, and |Y_i − f(X_i)| − ε otherwise   (3)

‖W‖² is known as the regularisation term and C is the penalty parameter that determines the flexibility of the model. The second term of Equation (2) is the empirical error and is measured by the ε-insensitive loss function (Equation (3)). This defines an ε-tube, shown in Fig. 1. If the predicted value is within the tube, the loss is zero; if it is outside the tube, the loss is the magnitude of the difference between the predicted value and the radius ε of the tube (Li et al., 2009). To estimate W and b, the above equation is transformed into the primal objective function given by Equation (4) (Li et al., 2009).

Minimise over W, b, ξ_i, ξ_i*:  (1/2)‖W‖² + (C/N) Σ_{i=1}^{N} (ξ_i + ξ_i*)   (4)

subject to:  Y_i − W·φ(x_i) − b ≤ ε + ξ_i;  W·φ(x_i) + b − Y_i ≤ ε + ξ_i*;  ξ_i ≥ 0, ξ_i* ≥ 0, for i = 1, 2, …, N

In the above equations, ξ_i and ξ_i* are the slack variables. By introducing a kernel function k(X_i, X_j), Equation (4) can be written in its dual form:

Minimise over α_i, α_i*:  (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} (α_i − α_i*)(α_j − α_j*) k(X_i, X_j) + ε Σ_{i=1}^{N} (α_i + α_i*) − Σ_{i=1}^{N} Y_i (α_i − α_i*)   (5)

subject to:  Σ_{i=1}^{N} (α_i − α_i*) = 0;  α_i, α_i* ∈ [0, C]

In Equation (5), α_i and α_i* are Lagrange multipliers, and i and j index different samples. Therefore, Equation (1) becomes (Li et al., 2009):

Y = f(X) = Σ_{i=1}^{N} (α_i − α_i*) k(X_i, X) + b   (6)

2.2. Random forest

A random forest (RF) is a tree-based ensemble method that was developed to address the shortcomings of the traditional Classification and Regression Tree (CART) method. An RF consists of a large number of weak decision tree learners, which are grown in parallel to reduce the bias and variance of the model at the same time (Breiman, 2001). To train a random forest, N bootstrapped sample sets are drawn from the original dataset. Each bootstrapped sample is then used to grow an unpruned regression (or classification) tree. Instead of using all available predictors in this step, only a small and fixed number of randomly sampled K predictors are selected as split candidates. These two steps are repeated until C such trees are grown, and new data is predicted by aggregating the predictions of the C trees. RF uses bagging to increase the diversity of the trees by growing them from different training datasets, hence reducing the overall variance of the model (Rodriguez-Galiano et al., 2015). An RF regression predictor can be expressed as:

f̂_RF(x) = (1/C) Σ_{i=1}^{C} T_i(x)   (7)

where x is the vector of input variables, C is the number of trees, and T_i(x) is a single regression tree constructed from a subset of input variables and the bootstrapped samples. RF can natively perform out-of-bag error estimation in the process of constructing the forest, by using the samples that are not selected during the training of the i-th tree in the bagging process. This subset is called out-of-bag, and it can be used to compute an unbiased estimate of the generalisation error without using an external test data subset (Breiman, 2001). RF also enables assessment of the relative importance of input features, which is useful for dimensionality reduction to improve a model's performance on high-dimensional datasets (Ahmad et al., 2017). RF permutes one of the input variables while keeping the remaining ones constant, and measures the mean decrease in the model's prediction accuracy, which is then used to assign a relative importance score to each input variable (Breiman, 2001). Fig. 2 shows the structure of the random forest algorithm.

2.3. Extra trees

The extremely randomised trees (or extra trees) algorithm (Geurts et al., 2006) is a relatively recent machine learning technique that was developed as an extension of the random forest algorithm, and is less likely to overfit a dataset (Geurts et al., 2006). Extra trees (ET) employs the same principle as random forest and uses a random subset of features to train each base estimator (John et al., 2016). However, it selects the split threshold for each candidate feature at random, rather than searching for the optimal cut-point, and then chooses the best of these randomised splits (John et al., 2016). ET also uses the whole training dataset to train each regression tree, whereas RF uses a bootstrap replica to train each tree.

Fig. 2. Structure of random forest.
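The operational difference between RF and ET described above (an exhaustive search for the best cut-point on a bootstrap sample versus a randomly drawn cut-point on the full training set) can be illustrated for a single split. The following is a minimal, self-contained sketch in plain Python, an illustration only and not the authors' implementation:

```python
import random

def variance(ys):
    """Population variance of a list of target values."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def split_score(xs, ys, t):
    """Variance reduction achieved by splitting feature values xs at threshold t."""
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    if not left or not right:
        return 0.0
    n = len(ys)
    return variance(ys) - len(left) / n * variance(left) - len(right) / n * variance(right)

def rf_style_threshold(xs, ys):
    """RF-style split: search all candidate cut-points for the best variance reduction."""
    candidates = sorted(set(xs))[:-1]  # the largest value cannot split the node
    return max(candidates, key=lambda t: split_score(xs, ys, t))

def et_style_threshold(xs, rng):
    """ET-style split: draw ONE cut-point uniformly at random over the feature's range."""
    return rng.uniform(min(xs), max(xs))

# Illustrative feature values and targets (hypothetical numbers, not the paper's data)
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
best = rf_style_threshold(xs, ys)            # exhaustive search finds the break at 3
rand = et_style_threshold(xs, random.Random(42))  # ET accepts a random cut-point
```

In a full implementation, both algorithms repeat such splits recursively over many trees and aggregate the tree outputs as in Equation (7); scikit-learn's RandomForestRegressor and ExtraTreesRegressor, used later in the paper, expose this behaviour through their bootstrap option and underlying tree splitters.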



2.4. Decision trees

A decision tree (DT) is an efficient algorithm for classification and regression problems. The basic idea of the decision tree algorithm is to split a complex problem into several simpler problems, which might lead to a solution that is easier to interpret (Xu et al., 2005). A DT represents a set of conditions, which are hierarchically organised and successively applied from the root to a leaf of the tree (Breiman et al., 1984). DTs are easy to interpret and their structure is transparent. DTs produce a trained model that can represent logical rules, which can then be used to predict new datasets through the repetitive process of splitting (Ahmad et al., 2017). According to Breiman et al. (1984), in a decision tree method, features of the data are referred to as predictor variables, whereas the class to be mapped is the target variable. For regression problems, the target variables are continuous.

To train a DT model, recursive partitioning and multiple regressions are performed on the training dataset. Starting from the root node of the tree, the data splitting process in each internal node of a rule of the tree is repeated until the stopping criterion is met (Rodriguez-Galiano et al., 2015). In the DT algorithm, each leaf node of the tree contains a simple regression model, which applies to that leaf only. After the induction process, pruning can be applied to improve the generalisation capability of the model by reducing the tree's complexity (Rodriguez-Galiano et al., 2015). For a solar thermal collector application, a simple example of a decision tree to predict USTE is depicted in Fig. 3. The output of the decision tree is the useful solar thermal energy. It is worth mentioning that this decision tree is for demonstration purposes only, and the actual DT used in the analysis is more complex (i.e., more than two features are considered when looking for the best split, and the tree is deeper). The decision tree shown in Fig. 3 only considers solar radiation and outdoor dry-bulb air temperature as input variables, and the maximum depth of the tree is restricted to 3.

[Fig. 3. Decision tree for predicting energy gain from solar collector. Note: Sol rad.: solar radiation, Out Temp.: outdoor dry-bulb air temperature.]

3. Material and methods

This section details the training and testing datasets, the feature selection process and results. The section also details the metrics used for assessing the models' predictive performance. The implementations of extra trees, random forest and support vector regression included in the scikit-learn (Pedregosa et al., 2011) library for the Python programming language were used for all developmental and experimental work. The work was carried out on a personal computer (Intel Core i5 2.50 GHz with 16 GB of RAM).

3.1. Data description

The studied solar thermal system is installed at an experimental facility in Chambéry, France and has a total area of 400 m². The solar loop contains a 60%/40% water-glycol mixture, with a density of 1044 kg/m³. The mass flow rate and the supply and return temperatures are monitored every minute. The building also has an on-site weather station, which monitors outdoor dry-bulb air temperature, solar radiation, wind speed and direction, relative humidity and atmospheric pressure. In total, after removing outliers and missing values, the training and testing datasets contained 5580 data samples. The data was collected from 1st April 2017 to 25th January 2018. Predicting USTE is a challenging task, as none of the system variables (i.e. mass flow rate, supply and return temperature) are considered as input variables. The system variables are not available in advance and therefore are not suitable for future predictions (unless separate models are developed for those variables). Also, USTE did not exhibit any clear pattern, as opposed to solar PV prediction (which is almost directly related to solar radiation), because it also depends on the energy load on the thermal storage. Training data is taken as 70% of the whole dataset, and the remaining data samples were used as the testing dataset. Fig. 4 displays the scatter plots for each of the input variables with USTE. It is clear that any

Fig. 4. Scatter plot demonstrating the relation between input and output variables.

relationship of input features and the output variable is not trivial, and simple learners may not be able to accurately predict USTE. It is also important to mention that features were normalised before applying SVR, to avoid features in greater numeric ranges dominating those in smaller numeric ranges. In this paper, we focus on developing machine learning models for useful solar thermal energy (Q_c) without using system controlled and uncontrolled variables (i.e. mass flow rate, and supply and return temperature). The absorption heat transfer rate, or USTE, Q̇_c, can be calculated using Equation (8) (Karsli, 2007):

Q̇_c = ṁ × C_p × (T_out − T_in)   (8)

In Equation (8), ṁ is the mass flow rate, C_p is the specific heat of the solar collector fluid, and T_in and T_out are the inlet and outlet temperatures of the solar collector.

3.2. Uncertainty analysis

To assess the performance of the developed models on the training and testing datasets, the root mean square error (RMSE), mean absolute error (MAE) and determination coefficient (R2) were calculated. The determination coefficient was adopted to measure the correlation between the actual and estimated USTE values. The former two indicators are defined as below:

RMSE = sqrt( (1/N) · Σ_{i=1}^{N} (y_i − ŷ_i)² )   (9)

MAE = (1/N) · Σ_{i=1}^{N} |ŷ_i − y_i|   (10)

where ŷ_i is the predicted value, y_i is the actual value, and N is the total number of samples. In this work, the root mean squared error (RMSE) is used as the primary metric.

3.3. Feature subset selection

Feature selection is an important step in the development of machine learning models. The number of input features may vary from two to hundreds, among which many may be unimportant or have low correlation with the target variables. Previous research works have demonstrated that prediction models are often affected by high variance in the training dataset (Neupane et al., 2017). Feature selection methods increase models'
performance on high-dimensional datasets by reducing training time, enhancing the model's generalisation capability, and improving the interpretability of the models (Ahmad et al., 2017). Random forest
and extra trees also allow the estimation of the importance of each feature in the model. Fig. 5 shows the results of the internal calculations carried out by the ET and RF algorithms, as well as each feature's Pearson correlation with the hourly useful thermal energy gain. It is interesting to notice that the machine learning models assign different variable importance scores to some of the input features. For example, for the ET model, outdoor relative humidity has a variable importance score of 0.072, whereas RF gives relative humidity a low score (i.e. 0.0058). Solar radiation was considered the most important feature by both algorithms. As expected, outdoor dry-bulb air temperature, solar radiation and hour of the day present a positive correlation with the useful solar thermal energy, as
demonstrated by their Pearson correlation coefficients. On the
other hand, outdoor relative humidity, wind speed, wind direction, the month of the year and atmospheric pressure are negatively related to the useful solar thermal energy. Later in the results, we will discuss that the prediction of USTE could be improved by integrating demand load prediction. The prediction could also be

Fig. 5. Feature importance and Pearson correlation for solar thermal useful energy prediction. Notes: Pres.: atmospheric pressure, Mon: month of the year, Day: day of the week, Hr: hour of the day, WD: wind direction, WS: wind speed, Rad: solar radiation, RH: outdoor air relative humidity, DBT: outdoor air dry-bulb temperature.

improved by considering the previous hour's useful solar thermal energy. However, in the current work, previous hour values are not considered and will need to be investigated in future.

4. Prediction results and discussion

This section details the prediction results obtained with the tree-based ensemble machine learning methods (random forest and extra trees), support vector regression and decision trees, which are described in Section 2. This section also details an assessment of the impact of different hyper-parameters on model performance.

4.1. Hyper-parametric tuning

A model's hyper-parameters have a great influence on its predictive performance, robustness and generalisation capability. This section details the selection of optimal hyper-parameters for the studied algorithms. For this purpose, a stepwise searching method is used to find optimal values of each model's hyper-parameters. In order to prevent over-fitting problems and analyse the models' performance on unknown data, a cross-validation approach is used to select optimal hyper-parameters. In k-fold cross-validation, the training dataset is divided into k subsets of equal size. Each of the k subsets is used once as a validation dataset, whereas the remaining k−1 subsets are used as the training dataset. In this study, five-fold validation is performed for selecting optimal hyper-parameters.

4.1.1. Support vector regression

Different factors affect the generalisation capability of support vector regression, i.e. its ability to predict unseen data after learning carried out on the training dataset. SVR needs the adjustment of (a) the kernel function: linear, polynomial, sigmoid or radial-basis (RBF); (b) the gamma of the kernel function, except for the linear kernel; (c) the degree of the polynomial kernel function; (d) the bias of the kernel function, only applicable to the sigmoid and polynomial kernels; (e) the penalty parameter (C) of the error term; and (f) the radius (ε). These parameters need to be tuned to make sure that the developed models do not under-fit or over-fit the data.

In the literature, the RBF kernel has been widely used for regression problems as it non-linearly maps samples into a high-dimensional space, and can easily handle non-linear relationships between class labels and attributes (Dong et al., 2005). A polynomial kernel function has more hyper-parameters to tune compared to RBF. Due to its wide use and lower complexity (fewer hyper-parameters to consider), RBF was selected for this study. For RBF, there are three hyper-parameters to tune, i.e., the kernel coefficient (g), the penalty parameter of the error term (C) and the radius (ε). According to the definition of the kernel coefficient by Chang and Lin (2011), g = 1/K, where K is the number of input features. Therefore, for this paper, g = 1/5 was used to estimate the outlet temperature and useful energy from the solar thermal heating system. The penalty parameter (C) of the error term is used to find the trade-off between the model complexity and the degree to which deviations larger than ε are tolerated in the optimisation formulation (represented by Equation (4)). A small value of C will place a small weight on the training data and will therefore result in an under-fit model. On the contrary, a too large value of C will only minimise the empirical risk, and hence will over-fit the training dataset. In this study, a stepwise search was used to find the optimal values of C and ε. Initially, ε was fixed at 0.1 while varying C over the range of 2^−7 to 2^7. From the results in Fig. 6, it is evident that initially there was a significant improvement in the performance of the model with an increase of C. However, it was found that higher values of C did not significantly improve the performance, and it was also computationally intensive to train SVR with larger C values. Therefore, a value of C = 2^6 was selected for further experiments. Too large values of the parameter ε also deteriorate the model's accuracy, as ε controls the width of the ε-insensitive zone (Dong et al., 2005). Values of ε were varied over the range of 2^−10 to 2^−5, while keeping C = 2^6. It is evident from Fig. 6 that larger values drastically reduce the accuracy of the model. From the results, a value of ε = 2^−7 was selected as it provided the best results.

4.1.2. Random forest, extra trees and decision trees

Tree-based ensemble methods (extra trees and random forest) need the adjustment of three hyper-parameters, i.e. the number of trees (M), the minimum number of samples required for splitting a node (n_min) and the attribute selection strength parameter (K). Parameter M represents the total number of trees in the forest and is directly related to the computational cost. Therefore, a reasonable number of trees needs to be selected to find a trade-off between predictive power and computational time. For this paper, 100 trees were used in the forest, as increasing the number of trees beyond 100 did not significantly improve the prediction results. K denotes the number of randomly selected features at each node during the tree growing process, and determines the strength of the variable selection process. For most regression problems, this parameter is set to p, where p is the dimension of the input feature vector (Geurts et al., 2006).

For ET, DT and RF, it was found that n_min did not significantly enhance the performance of the models and therefore the default value of 2 was selected for this parameter. K values were varied in the range of 1 to 9 (i.e. the total number of features selected for the model construction process). For ET and DT, K = 5 resulted in better results, whereas for RF, K = 2 produced optimal results. It is worth mentioning that for RF and ET,

Fig. 6. (a) The results of various C, where ε = 0.1 and (b) the results of various ε, where C = 2^6.
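The stepwise search behind Fig. 6 can be sketched as a simple loop with five-fold cross-validation. This is an assumed reconstruction on synthetic data, not the authors' code; it follows the paper's g = 1/5 choice and the stated C grid:

```python
# Assumed sketch of the stepwise SVR search: fix epsilon = 0.1 and vary C over
# 2^-7..2^7, scoring each candidate by five-fold cross-validated RMSE.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))    # five normalised input features (synthetic)
y = X @ np.array([1.0, 0.5, -0.3, 0.2, 0.1]) + rng.normal(0, 0.1, 200)

best_C, best_rmse = None, np.inf
for C in 2.0 ** np.arange(-7, 8):
    svr = SVR(kernel="rbf", gamma=1 / 5, C=C, epsilon=0.1)
    cv_rmse = -cross_val_score(svr, X, y, cv=5,
                               scoring="neg_root_mean_squared_error").mean()
    if cv_rmse < best_rmse:
        best_C, best_rmse = C, cv_rmse
print(best_C, round(best_rmse, 4))
```

The same loop is then repeated for ε (over 2^−10 to 2^−5) with C held at its selected value.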

Table 2
Results of various maximum tree depths (d_max) for ET, RF and DT.

        Extra trees                       Random forest                     Decision tree
d_max   R2 (-)   RMSE (kWh)  MAE (kWh)    R2 (-)   RMSE (kWh)  MAE (kWh)    R2 (-)   RMSE (kWh)  MAE (kWh)
1       0.7634   16.1055     10.5532      0.5226   22.8774     15.9858      0.7819   15.4619      8.6532
3       0.8944   10.7576      5.1266      0.8982   10.5639      5.9474      0.9081   10.0368      4.5962
5       0.9317    8.6529      3.8446      0.9397    8.1295      4.0182      0.9300    8.7570      3.4668
7       0.9443    7.8122      3.3374      0.9513    7.3028      3.2673      0.9248    9.0782      3.4126
9       0.9523    7.2327      2.9462      0.9552    7.0041      3.0218      0.9239    9.1292      3.3824
10      0.9538    7.1187      2.8737      0.9570    6.8647      2.9168      0.9239    9.1322      3.3420
11      0.9536    7.1287      2.8534      0.9570    6.8660      2.8971      0.9184    9.4566      3.4448
12      0.9537    7.1252      2.8249      0.9560    6.9443      2.9236      0.9201    9.3552      3.3994
13      0.9529    7.1854      2.8382      0.9567    6.8874      2.8882      0.9194    9.3982      3.4511
15      0.9524    7.2227      2.8225      0.9570    6.8680      2.8643      0.9173    9.5232      3.5334
20      0.9526    7.2080      2.8242      0.9553    7.0002      2.8986      0.9099    9.9370      3.6294

Notes: For RF: n_min = 2, K = 2, M = 100; for ET: n_min = 2, K = 5, M = 100; and for DT: n_min = 2, K = 5.
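The tuned configurations in the table's notes can be instantiated in scikit-learn (Pedregosa et al., 2011). The data here is a synthetic stand-in with nine features, as in Fig. 5; the mapping of K to `max_features` and d_max to `max_depth` is an assumption about the implementation:

```python
# Sketch: the tuned tree-based models from Table 2's notes, fitted on
# synthetic data (nine weather/time features standing in for the real inputs).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 9))
y = 5 * X[:, 0] + rng.normal(0, 1, 1000)

rf = RandomForestRegressor(n_estimators=100, min_samples_split=2,
                           max_features=2, max_depth=10)
et = ExtraTreesRegressor(n_estimators=100, min_samples_split=2,
                         max_features=5, max_depth=10)
dt = DecisionTreeRegressor(min_samples_split=2, max_features=5, max_depth=5)
for model in (rf, et, dt):
    model.fit(X, y)
    print(type(model).__name__, round(model.score(X, y), 3))  # training R^2
```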

Table 3
Comparison of models on full training and testing datasets.

        Training dataset                  Testing dataset                   Training time (ms)
Model   R2 (-)  RMSE (kWh)  MAE (kWh)     R2 (-)  RMSE (kWh)  MAE (kWh)
DT      0.957   6.780       2.908         0.930    8.758      3.467           16.00
ET      0.987   3.791       1.630         0.954    7.119      2.874          421.00
SVR     0.917   9.460       4.459         0.903   10.287      4.755         1287.80
RF      0.985   3.955       1.796         0.957    6.8651     2.917          491.60
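The metrics reported in Tables 2 and 3 (Equations (9) and (10), plus R2) can be computed directly; the numbers in this sketch are illustrative, not taken from the paper:

```python
# Sketch of the evaluation metrics: RMSE (Eq. 9), MAE (Eq. 10) and R^2.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_pred - y_true)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([10.0, 20.0, 30.0])   # illustrative values, kWh
y_pred = np.array([12.0, 18.0, 33.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```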

Fig. 7. Prediction results from DT, ET, RF and SVR models on testing data samples.
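A comparison like Fig. 7 can be produced by fitting on the training split and predicting the held-out samples. The sketch below uses a random 70/30 split on synthetic data; the paper does not state whether its split was chronological, so the random split is an assumption:

```python
# Sketch: 70/30 split (Section 3.1), fit RF, predict the held-out samples.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 9))               # synthetic stand-in features
y = 5 * X[:, 0] + rng.normal(0, 1, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100).fit(X_tr, y_tr)
pred = model.predict(X_te)                   # values to plot against y_te
print(pred.shape, y_te.shape)
```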

the K parameter did not drastically enhance the results. On the contrary, for DT, K significantly enhanced the prediction results, i.e. for K values of 1 and 5, the models have R2 values of 0.8577 and 0.9140, respectively. Table 2 shows the dependence of the models' performance on the maximum tree depth. Generally, deeper trees resulted in better performance. For ET and RF, trees deeper than 10 levels started to deteriorate, leading to over-fitting. A maximum depth of 5 levels produced marginally better results for DT. From the results, it is evident that for the studied tree-based ensemble algorithms, the default parameters are near-optimal and could result in a robust prediction model.

4.2. Model analytical results

Table 3 presents the RMSE, R2, and MAE on the training and testing datasets for predicting USTE. Generally, errors on the testing dataset show the generalisation capabilities of the developed models. On the other hand, errors on the training dataset show the goodness-of-fit of the developed models. Results in Table 3 suggest that RF and ET achieved the best performance across the training and testing datasets. RF achieved RMSE values of 3.96 and 6.86 on the training and testing datasets, respectively, whereas ET has RMSE values of 3.79 on training and 7.12 on testing. The results

showed that the tree-based ensemble methods have nearly comparable performance. SVR has the highest training and testing errors, while DT achieved marginally better performance compared to SVR. Fig. 7 illustrates the plot of hourly USTE values predicted by all studied machine learning models vs measured data. It can be concluded that both ET and RF showed strong non-linear mapping and generalisation ability, and can be effective in predicting hourly USTE. It was found that the best performing methods, RF and ET, over-predicted some of the values: even though the solar radiation values were higher and higher values of USTE were expected, the difference between supply and return temperature was small, and therefore the actual value of USTE was lower. In future work, this problem will need to be tackled, and it is envisaged that considering the thermal load on the storage tank as an input variable will further improve the models' accuracy. The SVR algorithm did not capture the peak values of USTE and therefore produced worse results compared to the other algorithms. RF closely followed the USTE pattern and therefore performed better on the testing dataset. Also, the ET algorithm had a lower training time (421 ms) than RF (491.60 ms) and SVR (1287.80 ms). Among all studied algorithms, DT was found to be the least computationally intensive. However, this comes at the expense of the model's accuracy, as DT has lower training and testing performance compared to ET and RF.

4.3. Number of training samples

The number of training samples has two impacts on machine learning algorithms: 1) with an increase in the number of training samples, it is expected that the training time and memory usage during the training phase will increase, and 2) it will increase the prediction accuracy of the model. It is worth mentioning here that the training time could also depend on many factors, e.g. the implementation of an algorithm in the programming library, the number of input features used, model complexity, feature extraction, input data representation and sparsity (Ahmad et al., 2017). For tree-based ensemble methods, it would also depend on other factors, e.g. the number of trees in the forest, the maximum depth of a tree, etc. (Ahmad et al., 2017). To demonstrate the sensitivity of the machine learning models to the training dataset size and the time required to construct a model, different experiments were performed. Fig. 8(a) shows the effect of the number of training data samples on the models' predictive performance. Generally, all developed models react in a gradual way to an increase in the training sample size. For all studied algorithms, it was found that increasing the number of samples increases the models' generalisation ability (i.e., increased performance on the unseen testing dataset). It can be seen in Fig. 8 that both RF and ET showed almost the same behaviour on the training and testing datasets. Their accuracy significantly increased between n = 100 and n = 500. SVR showed relatively lower accuracy on both training and testing datasets compared to ET, RF, and DT. It is also important to mention that for the tree-based algorithms, the accuracy on the training dataset reduced with an increase in the training dataset size. For SVR, initially there was a decrease in the accuracy on the training dataset, which started to increase after n = 500. Fig. 8(b) shows that SVR has a significantly higher training time compared to RF and ET. Please note that the DT training time is considerably small and therefore it is not included in Fig. 8(b). SVR training time increased exponentially with an increase in the number of training data samples. The ET and RF algorithms have comparable training times for lower numbers of training samples; however, RF has a marginally higher training time for n > 1500. In this work, we analysed the impact of the number of samples on the models' performance and training time. In future, the dependency of training time on each algorithm's hyper-parameters will be explored.

5. Conclusions

The paper details the feasibility of using machine learning algorithms to predict hourly useful solar thermal energy. For this purpose, a solar thermal system installed at Chambéry, France was used as a case study. Experiments were performed over the period of April 2017 through January 2018 to gather experimental data for training and testing machine learning models. Different statistical measures were used to appraise the models' prediction performance. The capability of decision tree-based ensemble methods for predicting the USTE has been verified, with better accuracy compared to decision trees and support vector regression. The results also demonstrated that the ET and RF algorithms have significantly lower training times, i.e., 421 ms and 491.60 ms, respectively, compared to 1287.80 ms for SVR.

The developed tree-based ensemble methods improved the prediction results and have RMSE values of 6.87 and 7.12 for RF and ET, respectively. Both of these methods were developed to overcome shortcomings of CART (e.g. the final tree is not guaranteed to be the optimal tree) and to generate a stable model. Simple regression trees are not effective for predicting hourly USTE; however, ensembles of these trees have significantly improved the models' performance. The tree-based ensemble methods discussed in this paper require fewer tuning parameters, and in most cases default hyper-parameters can result in satisfactory performance. The developed models only used weather and time information to predict hourly USTE. To the best of our knowledge, previous works also considered system control variables as inputs to the model. The system

Fig. 8. a) Effect of number of training data samples on prediction accuracy, b) Effect of number of training data samples on training time.
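The experiment behind Fig. 8 can be sketched by refitting each model on growing subsets while recording wall-clock training time and test RMSE. Synthetic data and an assumed subset grid stand in for the real setup:

```python
# Sketch of the training-size study: vary n, record fit time and test RMSE.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(2500, 9))
y = 5 * X[:, 0] + rng.normal(0, 1, 2500)
X_test, y_test = X[2000:], y[2000:]          # fixed held-out samples

for n in (100, 500, 1000, 2000):
    for model in (RandomForestRegressor(n_estimators=100), SVR(kernel="rbf")):
        t0 = time.perf_counter()
        model.fit(X[:n], y[:n])
        elapsed = time.perf_counter() - t0
        test_rmse = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
        print(type(model).__name__, n, round(elapsed, 3), round(test_rmse, 3))
```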

variables are not available in advance and therefore are not suitable for future predictions (unless separate models are developed for these variables). The developed tree-based ensemble methods can achieve accurate and reliable hourly predictions and could be used for fault detection and diagnosis (e.g., solar collector fault, shaded collector area, valve fault, etc.), making informed decisions, and the operational optimisation of multi-vector energy systems. In future work, another promising emerging technique, deep learning, will need to be investigated for solar thermal collectors. Machine learning models for different types of solar collectors and solar collector based systems will need to be developed to cover a wide range of systems. The performance of the models will be enhanced in future by incorporating storage load predictions.

Acknowledgement

The work was carried out in the framework of the Horizon 2020 project (Grant reference 731125) PENTAGON "Unlocking European grid local flexibility through augmented energy conversion capabilities at district-level". The authors acknowledge the financial support from the European Commission. The authors would also like to thank Michael Descamps (CEA-INES) for providing valuable experimental data.

Nomenclature

ξ_i, ξ_i*   slack variables
α_i, α_i*   Lagrange multipliers
‖w‖^2   Euclidean norm
k(X_i, X_j)   kernel function
ε   precision parameter/radius
C   penalty parameter of the error term
g   RBF kernel coefficient
M   number of trees in a forest
x   inputs
N   number of training samples
C_p   specific heat
T_out   outlet temperature of the solar collector
T_in   inlet temperature of the solar collector
b   bias term
n_min   number of minimum samples required for splitting a tree node
ϕ(x)   non-linear transformation
W   weight vector
T_i   regression tree
f̂_RF   random forest regression predictor
K   attribute selection parameter
y   outputs
ṁ   mass flow rate

Abbreviations
ANFIS   adaptive neuro-fuzzy inference system
ANN   artificial neural network
CART   classification and regression trees
DT   decision tree
ET   extra trees
GRNN   general regression neural network
MAE   mean absolute error
PV   photovoltaic
RBF   radial basis function
RES   renewable energy resources
RF   random forest
RMSE   root mean square error
SVR   support vector regression
USTE   useful solar thermal energy
WNN   wavelet neural network

References

Ahmad, M.W., Mourshed, M., Mundow, D., Sisinni, M., Rezgui, Y., 2016a. Building energy metering and environmental monitoring – a state-of-the-art review and directions for future research. Energy Build. 120, 85–102. https://doi.org/10.1016/j.enbuild.2016.03.059.
Ahmad, M.W., Mourshed, M., Yuce, B., Rezgui, Y., 2016b. Computational intelligence techniques for HVAC systems: a review. Building Simulation 9 (4), 359–398. https://doi.org/10.1007/s12273-016-0285-4.
Ahmad, M.W., Mourshed, M., Rezgui, Y., 2017. Trees vs Neurons: comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 147, 77–89. https://doi.org/10.1016/j.enbuild.2017.04.038.
Benedetti, M., Cesarotti, V., Introna, V., Serranti, J., 2016. Energy consumption control automation using Artificial Neural Networks and adaptive algorithms: proposal of a new methodology and case study. Appl. Energy 165, 60–71. https://doi.org/10.1016/j.apenergy.2015.12.066.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Breiman, L., et al., 1996. Heuristics of instability and stabilization in model selection. Ann. Stat. 24 (6), 2350–2383.
Cadenas, E., Rivera, W., 2009. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renew. Energy 34 (1), 274–278. https://doi.org/10.1016/j.renene.2008.03.014.
Caner, M., Gedik, E., Keçebaş, A., 2011. Investigation on thermal performance calculation of two type solar air collectors using artificial neural network. Expert Syst. Appl. 38 (3), 1668–1674. https://doi.org/10.1016/j.eswa.2010.07.090.
Catalão, J.P.d.S., Pousinho, H.M.I., Mendes, V.M.F., 2011. Short-term wind power forecasting in Portugal by neural networks and wavelet transform. Renew. Energy 36 (4), 1245–1251. https://doi.org/10.1016/j.renene.2010.09.016.
Chae, Y.T., Horesh, R., Hwang, Y., Lee, Y.M., 2016. Artificial neural network model for forecasting sub-hourly electricity usage in commercial buildings. Energy Build. 111, 184–194. https://doi.org/10.1016/j.enbuild.2015.11.045.
Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2 (3), 27.
Deng, H., Fannon, D., Eckelman, M.J., 2018. Predictive modeling for US commercial building energy use: a comparison of existing statistical and machine learning algorithms using CBECS microdata. Energy Build. 163, 34–43. https://doi.org/10.1016/j.enbuild.2017.12.031.
Dietterich, T.G., 2000. Ensemble methods in machine learning. In: International Workshop on Multiple Classifier Systems. Springer, pp. 1–15.
Dong, B., Cao, C., Lee, S.E., 2005. Applying support vector machines to predict building energy consumption in tropical region. Energy Build. 37 (5), 545–553. https://doi.org/10.1016/j.enbuild.2004.09.009.
Dowson, M., Pegg, I., Harrison, D., Dehouche, Z., 2012. Predicted and in situ performance of a solar air collector incorporating a translucent granular aerogel cover. Energy Build. 49, 173–187. https://doi.org/10.1016/j.enbuild.2012.02.007.
Duffie, J.A., Beckman, W.A., 2013. Solar Engineering of Thermal Processes. John Wiley & Sons.
Esen, H., Ozgen, F., Esen, M., Sengur, A., 2009. Artificial neural network and wavelet neural network approaches for modelling of a solar air heater. Expert Syst. Appl. 36 (8), 11240–11248. https://doi.org/10.1016/j.eswa.2009.02.073.
Eurostat, 2016. Primary Production of Renewable Energy by Type. http://ec.europa.eu/eurostat/web/energy/data/main-tables.
Fan, C., Xiao, F., Wang, S., 2014. Development of prediction models for next-day building energy consumption and peak power demand using data mining techniques. Appl. Energy 127, 1–10. https://doi.org/10.1016/j.apenergy.2014.04.016.
Géczy-Víg, P., Farkas, I., 2010. Neural network modelling of thermal stratification in a solar DHW storage. Sol. Energy 84 (5), 801–806. https://doi.org/10.1016/j.solener.2010.02.003.
Geurts, P., Ernst, D., Wehenkel, L., 2006. Extremely randomized trees. Mach. Learn. 63 (1), 3–42. https://doi.org/10.1007/s10994-006-6226-1.
Hansen, L.K., Salamon, P., 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12 (10), 993–1001. https://doi.org/10.1109/34.58871.
John, V., Liu, Z., Guo, C., Mita, S., Kidono, K., 2016. Real-time Lane Estimation Using Deep Features and Extra Trees Regression. Springer International Publishing, Cham, pp. 721–733. https://doi.org/10.1007/978-3-319-29451-3_57.
Kalogirou, S.A., 2004. Solar thermal collectors and applications. Prog. Energy Combust. Sci. 30 (3), 231–295. https://doi.org/10.1016/j.pecs.2004.02.001.

Kalogirou, S., Lalot, S., Florides, G., Desmet, B., 2008. Development of a neural network-based fault diagnostic system for solar thermal applications. Sol. Energy 82 (2), 164–172. https://doi.org/10.1016/j.solener.2007.06.010.
Kalogirou, S., Mathioulakis, E., Belessiotis, V., 2014. Artificial neural networks for the performance prediction of large solar systems. Renew. Energy 63, 90–97. https://doi.org/10.1016/j.renene.2013.08.049.
Karim, M., Perez, E., Amin, Z.M., 2014. Mathematical modelling of counter flow v-grove solar air collector. Renew. Energy 67, 192–201. https://doi.org/10.1016/j.renene.2013.11.027.
Karsli, S., 2007. Performance analysis of new-design solar air collectors for drying applications. Renew. Energy 32 (10), 1645–1660. https://doi.org/10.1016/j.renene.2006.08.005.
Kharb, R.K., Shimi, S., Chatterji, S., Ansari, M.F., 2014. Modeling of solar PV module and maximum power point tracking using ANFIS. Renew. Sustain. Energy Rev. 33, 602–612. https://doi.org/10.1016/j.rser.2014.02.014.
Kusiak, A., Zheng, H., Song, Z., 2009. Short-term prediction of wind farm power: a data mining approach. IEEE Trans. Energy Convers. 24 (1), 125–136. https://doi.org/10.1109/TEC.2008.2006552.
Li, Q., Meng, Q., Cai, J., Yoshino, H., Mochida, A., 2009. Applying support vector machine to predict hourly cooling load in the building. Appl. Energy 86 (10), 2249–2256. https://doi.org/10.1016/j.apenergy.2008.11.035.
Lin, J.-Y., Cheng, C.-T., Chau, K.-W., 2006. Using support vector machines for long-term discharge prediction. Hydrol. Sci. J. 51 (4), 599–612. https://doi.org/10.1623/hysj.51.4.599.
Liu, Z., Li, H., Zhang, X., Jin, G., Cheng, K., 2015. Novel method for measuring the heat collection rate and heat loss coefficient of water-in-glass evacuated tube solar water heaters based on artificial neural networks and support vector machine. Energies 8 (8), 8814–8834. https://doi.org/10.3390/en8088814.
Lund, H., Werner, S., Wiltshire, R., Svendsen, S., Thorsen, J.E., Hvelplund, F., Mathiesen, B.V., 2014. 4th Generation District Heating (4GDH): integrating smart thermal grids into future sustainable energy systems. Energy 68, 1–11. https://doi.org/10.1016/j.energy.2014.02.089.
Mathiesen, B.V., Lund, H., Connolly, D., Wenzel, H., Østergaard, P.A., Möller, B., Nielsen, S., Ridjan, I., Karnøe, P., Sperling, K., Hvelplund, F.K., 2015. Smart Energy Systems for coherent 100% renewable energy and transport solutions. Appl. Energy 145, 139–154. https://doi.org/10.1016/j.apenergy.2015.01.075.
Motte, F., Notton, G., Cristofari, C., Canaletti, J.-L., 2013. A building integrated solar collector: performances characterization and first stage of numerical calculation. Renew. Energy 49, 1–5. https://doi.org/10.1016/j.renene.2012.04.049.
Neupane, B., Woon, W.L., Aung, Z., 2017. Ensemble prediction model with expert selection for electricity price forecasting. Energies 10 (1).
Notton, G., Motte, F., Cristofari, C., Canaletti, J.-L., 2013. New patented solar thermal concept for high building integration: test and modeling. Energy Procedia 42, 43–52. https://doi.org/10.1016/j.egypro.2013.11.004.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Reynolds, J., Ahmad, M.W., Rezgui, Y., 2018. Holistic modelling techniques for the operational optimisation of multi-vector energy systems. Energy Build. 169, 397–416. https://doi.org/10.1016/j.enbuild.2018.03.065.
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., Chica-Rivas, M., 2015. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 71, 804–818. https://doi.org/10.1016/j.oregeorev.2015.01.001.
Sawin, J.L., Sverrisson, F., Seyboth, K., Adib, R., Murdock, H.E., Lins, C., Edwards, I., Hullin, M., Nguyen, L.H., Prillianto, S.S., et al. Renewables 2017 Global Status Report.
Sözen, A., Menlik, T., Ünvar, S., 2008. Determination of efficiency of flat-plate solar collectors using neural network approach. Expert Syst. Appl. 35 (4), 1533–1539. https://doi.org/10.1016/j.eswa.2007.08.080.
Vapnik, V., 2013. The Nature of Statistical Learning Theory. Springer Science & Business Media.
Wang, Z., Wang, Y., Zeng, R., Srinivasan, R.S., Ahrentzen, S., 2018. Random Forest based hourly building energy prediction. Energy Build. 171, 11–25. https://doi.org/10.1016/j.enbuild.2018.04.008.
Xu, M., Watanachaturaporn, P., Varshney, P.K., Arora, M.K., 2005. Decision tree regression for soft classification of remote sensing data. Rem. Sens. Environ. 97 (3), 322–336. https://doi.org/10.1016/j.rse.2005.05.008.
Yaïci, W., Entchev, E., 2016. Adaptive Neuro-Fuzzy Inference System modelling for performance prediction of solar thermal energy system. Renew. Energy 86, 302–315. https://doi.org/10.1016/j.renene.2015.08.028.
Yap, W.K., Karri, V., 2015. An off-grid hybrid PV/diesel model as a planning and design tool, incorporating dynamic and ANN modelling techniques. Renew. Energy 78, 42–50. https://doi.org/10.1016/j.renene.2014.12.065.
Yona, A., Senjyu, T., Saber, A.Y., Funabashi, T., Sekine, H., Kim, C.-H., 2007. Application of neural network to one-day-ahead 24 hours generating power forecasting for photovoltaic system. In: International Conference on Intelligent Systems Applications to Power Systems, 2007. ISAP 2007. IEEE, pp. 1–6.
