The methodology identifies 10 major ML models frequently used in energy systems, i.e., ANN, MLP, ELM, SVM, WNN, ANFIS, decision trees, deep learning, ensembles, and advanced hybrid ML models. The notable manuscripts are accordingly categorized into the relevant groups and further reviewed in this section. Note that, in the presented taxonomy, deep learning as an emerging modeling technique has been categorized under the ML models. Furthermore, it is worth mentioning that WNN and ANFIS are of a hybrid nature. However, the category of “advanced hybrid ML models” includes only recently developed algorithms.
3.2. MLP
MLP is an advanced version of ANN for engineering applications and energy systems; it is a feed-forward neural network trained with a supervised, back-propagation learning method [34,35,36]. It is a simple and popular method for modeling and predicting a process, and in many cases it is used as the control model. Table 2 lists some important papers in this field.
Ahmed et al. (2015) [37] performed a study on forecasting hourly solar irradiation for New Zealand. The ability to provide 24-h-ahead hourly global solar irradiation forecasts was assessed using several methods, with particular attention to autoregressive recurrent neural networks. Hourly time series were used for training and testing the forecasting methods. MLP, NARX, ARMA, and persistence methods were compared using RMSE. Figure 6 presents the related results. Based on the results, the NARX method had the lowest RMSE, with a precision about 49%, 22%, and 52% higher than that of the MLP, ARMA, and persistence methods, respectively.
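Since most of the studies reviewed here are ranked by RMSE, a minimal sketch of the metric may be useful. It assumes that the reported percentage gains are relative RMSE reductions with respect to a baseline, which the reviewed papers do not always state explicitly. The function names and the numbers are illustrative only and are not data from Ahmed et al.

```python
import math

def rmse(targets, predictions):
    """Root mean square error between two equal-length sequences."""
    if len(targets) != len(predictions):
        raise ValueError("sequences must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_reduction_percent(rmse_model, rmse_baseline):
    """Relative RMSE reduction of a model with respect to a baseline (assumed definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

def persistence_forecast(series, horizon):
    """Persistence baseline: the forecast for time t is the observation at t - horizon."""
    return series[:-horizon]

# Illustrative hourly irradiation values (W/m^2); not taken from any reviewed study.
observed = [0.0, 120.0, 340.0, 510.0, 480.0, 260.0, 40.0, 0.0]
horizon = 1
targets = observed[horizon:]
baseline = persistence_forecast(observed, horizon)
model = [110.0, 330.0, 500.0, 470.0, 280.0, 60.0, 5.0]  # hypothetical model output

rmse_model = rmse(targets, model)
rmse_baseline = rmse(targets, baseline)
improvement = rmse_reduction_percent(rmse_model, rmse_baseline)
```

Under this reading, an improvement of 50% means that the model halves the RMSE of the baseline.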
Chahkoutahi et al. (2017) [38] introduced a seasonal optimal hybrid model to forecast the electricity load. In this study, a direct optimum parallel hybrid (DOPH) model was built from a multi-layer perceptron neural network, a Seasonal Autoregressive Integrated Moving Average (SARIMA) model, and an Adaptive Network-based Fuzzy Inference System (ANFIS). The main reason for using this model was to combine the advantages of its component models for modeling complex systems. The validation of the presented model implies that it was more accurate than its components. Figure 7 presents the results of the proposed DOPH method against SARIMA, MLP, ANFIS, DE-based, and GA-based models. The output of each method was compared with target values using RMSE. Based on the results, the proposed method improved the prediction capability by 51.4%, 33.18%, 31.10%, 16.44%, and 12.8% compared with the SARIMA, MLP, ANFIS, DE-based, and GA-based models, respectively, in the test stage.
Kazem et al. (2013) [39] designed and installed a photovoltaic system for electricity production. The output of the system was measured for one year. The photovoltaic system output was simulated and predicted by self-organizing feature maps (SOFM), feed-forward networks (GFF), support vector machines (SVM), and multi-layer perceptron (MLP). Ambient temperature and solar radiation data were the models’ inputs, and the PV array current was the output. The output of each model was compared with the target values using RMSE. The results are presented in Figure 8. Based on the results, the SOFM model yields a lower RMSE than the MLP, GFF, and SVM models and is therefore the most suitable for this purpose.
Loutfi et al. (2017) [40] presented an analysis for the design of solar energy systems. In this study, a comparison between multilayer perceptron and nonlinear autoregressive with exogenous inputs (NARX) networks was presented. The proposed model has an excellent ability to produce hourly solar radiation forecasts from inexpensive data such as relative humidity and temperature. The results of the best model are presented in Table 3. The study proposes the NARX method in conjunction with the MLP method. As is clear from Table 3, the proposed method has the best prediction capability in terms of nRMSE and correlation coefficient values.
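The two criteria used in this comparison can be sketched as follows. The normalization of nRMSE varies between papers (mean, range, or maximum of the observations); the sketch assumes normalization by the mean of the observations, which may differ from the definition used by Loutfi et al. The correlation coefficient is taken to be the Pearson coefficient.

```python
import math

def nrmse(targets, predictions):
    """RMSE divided by the mean of the targets (one of several conventions; assumed here)."""
    n = len(targets)
    mse = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n
    mean_target = sum(targets) / n
    return math.sqrt(mse) / mean_target

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

A lower nRMSE and a correlation coefficient closer to 1 both indicate a better fit, but they are not interchangeable: a forecast with a constant bias can have a correlation of 1 and still a large nRMSE.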
Shimray et al. (2017) [41] performed a study on ranking hydropower plant installation sites using a Multi-layer Perceptron Neural Network. In this paper, a model was developed for decision makers to rank potential power plant sites based on water quality, air quality, energy delivery cost, natural hazard, ecological impact, and project duration. The case study ranked several potential plant sites in India.
3.10. Hybrid ML Models
Hybrid models benefit from multiple ML methods and/or other soft computing and artificial intelligence methods. Data preprocessing and optimization tools have now become common to produce high-accuracy hybrid models for improved prediction capabilities. In these models, usually, one part is for prediction or acts as an estimator, and the other part acts as an optimizer. These models are mainly employed when there is a need for an accurate estimation. ANFIS and WNN are among the early generation of hybrid models [36,79]. Table 12 presents some important papers in this field.
Deng et al. (2018) [80] presented a hybrid short-term load-prediction model optimized by switching delayed particle swarm optimization. The method combines switching delayed particle swarm optimization, an extreme learning machine with different kernels, and empirical mode decomposition. In the first stage, the load history was decomposed into independent intrinsic mode functions, and the sample entropy of each intrinsic mode function was computed. The intrinsic mode functions were then categorized into three groups, and the extreme learning machine was applied to predict each group. Lastly, the group predictions were aggregated to obtain the final prediction. The experimental results showed that the presented prediction model was robust. Figure 22 presents the RMSE values of the study for each load by the employed models. In this figure, L1, L2, and L3 are user-load datasets for three micro-grids located in Yanqing (Beijing), Dong’ao Island (Guangdong Province), and Turpan (Xinjiang), respectively.
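The decompose, predict, and aggregate pattern used in this kind of hybrid model can be sketched in a few lines. The sketch is not the method of Deng et al.: empirical mode decomposition is replaced by a trailing moving average that splits the series into a smooth part and a residual, and the kernel extreme learning machines are replaced by two trivial predictors. All function names are hypothetical; only the overall structure (one predictor per component, forecasts summed at the end) reflects the description above.

```python
def moving_average(series, window):
    """Trailing moving average; early points average whatever history is available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def decompose(series, window):
    """Stand-in for EMD: a smooth component and a residual that sum to the series."""
    smooth = moving_average(series, window)
    residual = [s - m for s, m in zip(series, smooth)]
    return smooth, residual

def forecast_last_value(component):
    """Trivial predictor for slowly varying components (placeholder for a trained model)."""
    return component[-1]

def forecast_mean(component):
    """Trivial predictor for fluctuating components (placeholder for a trained model)."""
    return sum(component) / len(component)

def hybrid_forecast(series, window=3):
    """Predict each component separately and sum the component forecasts."""
    smooth, residual = decompose(series, window)
    return forecast_last_value(smooth) + forecast_mean(residual)

# Illustrative load values; not data from the reviewed study.
load = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
next_load = hybrid_forecast(load)
```

The design rationale is that each component is more regular than the raw series, so a simpler model can be fitted to it; the price is that errors from the separate predictors add up in the final sum.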
Dou et al. (2016) [81] presented energy management strategies for a microgrid based on renewable energy source and load prediction. An energy management system based on a two-level multi-agent architecture was built. In the upper-level EMA, the energy management strategies were constructed with a PSO method based on probabilistic forecasts of the renewable energy sources and load. Ensemble empirical mode decomposition coupled with sparse Bayesian learning was used for forecasting by the lower-level renewable energy source and load agents. Simulation results supported the validity of the proposed method.
Peng et al. (2016) [82] introduced a method that hybridizes differential empirical mode decomposition and a quantum particle swarm optimization algorithm with support vector regression for electric load forecasting. The differential empirical mode decomposition method was applied to decompose the electric load into several high-frequency parts and an approximate low-frequency part. The quantum particle swarm optimization algorithm was used to optimize the parameters of the support vector regression. The validation of the method demonstrated that it could provide forecasts with good precision and interpretability.
Qu et al. (2016) [83] introduced a hybrid model for wind speed forecasting based on the fruit fly optimization algorithm and ensemble empirical mode decomposition. The original wind speed data were divided into a set of signal components using ensemble empirical mode decomposition. Then, the fruit fly optimization algorithm was used to optimize the parameters of the artificial intelligence prediction models. The final prediction values were acquired by reconstructing the refined series. The empirical results demonstrated that the presented hybrid model was better than some of the existing forecasting models. Figure 23 presents the RMSE values of the study by the employed models.
Yang et al. (2017) [84] presented a hybrid model for electricity price forecasting utilizing ARMA, wavelet transform, and KELM methods. SAPSO was applied to search for the optimal kernel parameters. After testing the wavelet decomposition components, the ARMA model was used to predict the stationary series, and the SAPSO-KELM model was used for the remaining series. The performance of the proposed method was validated using electricity price data from several cities. The real data demonstrated that the presented method was more accurate than the individual methods. Figure 24 presents the RMSE values of the study for each season by the employed models.
Renewable energy sources such as wind and solar are site-dependent and highly difficult to predict [7,87,88]. Prediction with hybrid ML models contributes effectively to increased solar energy production [89]. The economic and environmental advantages of solar photovoltaics as a renewable energy source have caused a significant rise in the number of PV panels in recent years. The availability of computational power and data has enabled ML models to make more precise predictions. Because predicting solar photovoltaic power output is important for decision makers in the energy industry, ML models are employed extensively. Table 13 lists some critical papers in this field.
David et al. (2016) [90] evaluated the performance of a combination of ARMA and GARCH models from econometrics for producing probabilistic solar irradiance forecasts. A testing procedure was used to evaluate both probabilistic and point forecasts. The results are presented in Table 14 and Figure 25.
As is clear from Table 14 and Figure 25, recursive ARMA has a low RMSE compared with the other models. It can therefore be claimed that the presented model is as accurate as other models based on machine learning techniques for both point and probabilistic forecasts.
Feng et al. (2017) [91] applied GRNN, RF, ELM, and optimized back-propagation GANN to estimate daily Hd for two stations in northern China. All presented artificial intelligence models were compared with the empirical model (Table 15 and Figure 26).
Based on Figure 26 and Table 15, the GANN model presents the best accuracy, with the lowest RMSE and the highest r values among the models for both the Beijing and Zhengzhou stations.
Hassan et al. (2017) [92] presented ensemble models for solar radiation modeling. Gradient boosting, RF, and bagging were developed to estimate radiation on hourly and daily time scales. These novel ensemble models were developed to generate synthetic radiation data for simulating the performance of solar energy systems with different configurations. Figure 27 presents the results of the study in detail. In this study, D1 is a daily model with day number and sunshine fraction as inputs and horizontal global irradiation as the output. D2 is a daily model with global clearness index and day number as inputs and diffuse fraction as the output. D3 is a daily model with global clearness index and day number as inputs and normal clearness index as the output. H1 is an hourly model with horizontal global irradiation, sunshine time, and day number as inputs and horizontal global irradiance as the output. H2 is an hourly model with global clearness index, sunshine fraction, and day number as inputs and diffuse fraction as the output. H3 is an hourly model with global clearness index, sunshine time, and day number as inputs and normal clearness index as the output.
Based on Figure 27, SVR generally has the best prediction ability, because it has a high correlation coefficient and a low average RMSE compared with the other models employed by Hassan et al. [14].
Salcedo-Sanz et al. (2018) [93] integrated the CRO with the ELM model in their study. The presented algorithm was applied in two stages: an ELM algorithm was used for the feature selection process, and solar radiation was then estimated from the optimally screened variables by the CRO-ELM model (Figure 28).
Based on Figure 28, the hybrid CRO-(ELM)-ELM model has the highest accuracy compared with the hybrid CRO-(ELM)-MLR, CRO-(ELM)-MARS, and CRO-(ELM)-SVR models and the GGA models. In the CRO-based hybrid system, the variables are carefully screened through a wrapper-based modeling system. The hybrid CRO-(ELM)-ELM model presents clear advantages over the alternative machine learning approaches.
Salcedo-Sanz et al. (2017) [94] studied the prediction of global solar radiation at a given point using a multilayer perceptron trained with extreme learning machines. A coral reefs optimization algorithm with species was used to reduce the number of significant predictive variables. The proposed model (CRO-SP) was tested on data from Toledo, Spain (Figure 28). The best average RMSE was 69.19 W/m², which indicates higher prediction accuracy than the other machine learning techniques. This is evident in Figure 29, which presents the average RMSE values for the four developed techniques.
Touati et al. (2017) [95] predicted the output power of photovoltaic panels under different atmospheric conditions. The goal of this study was to investigate photovoltaic performance in the harsh environmental conditions of Qatar. The ML model was used to relate various environmental factors such as irradiance, PV surface temperature, wind speed, temperature, relative humidity, dust, and cumulative dust to power production. Figure 30 presents the results of the analysis in terms of the correlation coefficient.
As is clear from Figure 30, linear regression and M5P decision tree algorithms were developed for prediction purposes and equipped with CFS and ReliefF to select subsets of relevant, high-quality features. Based on the results, the M5P model equipped with ReliefF gives more accurate predictions, as shown by its high correlation coefficient; in addition, the developed models are relatively simple and can be readily applied to predict PV power output.
Voyant et al. (2017) [96] proposed models based on the Kalman filter to forecast global radiation time series without utilizing historical data. These methodologies were compared with other data-driven models at different time steps using RMSE values. The results indicated that the proposed model improved the predictions.
Voyant et al. (2017) [97] presented a method to better understand the propagation of uncertainty in global radiation time series. In this study, a reliability index was defined to evaluate the validity of predictions. The presented method was applied to several meteorological stations in the Mediterranean area, and the comparisons were performed using RMSE. The results were promising for applying the method at these stations.
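To illustrate why a Kalman filter can forecast without a training set of historical data, the following minimal sketch runs a scalar filter that updates its estimate one observation at a time. It is not the formulation of Voyant et al.: the random-walk state model, the noise variances, and the function name are assumptions made for the example.

```python
def kalman_one_step_forecasts(observations, process_var, measurement_var,
                              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter with an assumed random-walk state model.

    Returns the one-step-ahead forecast issued before each observation
    and the final state estimate.
    """
    estimate, variance = initial_estimate, initial_var
    forecasts = []
    for z in observations:
        # Predict: a random walk keeps the estimate and inflates its uncertainty.
        variance += process_var
        forecasts.append(estimate)
        # Update: blend the forecast and the new observation using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
    return forecasts, estimate
```

With a measurement variance of zero the gain equals one and the filter reduces to the persistence forecast; larger measurement variances make the forecast smoother and slower to react.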
Many novel hybrid ML models have been proposed to forecast solar radiation. Hybrids of the ANN method have often been used for this purpose, and SVM and SVR are now being used more extensively. SVM and SVR usually have the same forecasting performance. Ensemble models were also reported to generally deliver higher performance. SVR, GP, and NN have better forecasting performance than AR in forecasting solar radiation. The RMSE values of ELM, GANN, RF, and GRNN [18] show that there is no meaningful difference between them in terms of forecasting performance.
To integrate highly volatile wind power into a power grid, precise forecasting of wind speed is crucial. Precise forecasts reduce the need to control the energy provided by wind, to devise battery loading strategies, and to plan reserve plants. ML models can forecast over horizons from seconds to hours and, as a result, are essential for energy grid balancing. Table 16 shows some critical papers in this field. The estimation of the total power collected from wind turbines in a wind farm depends on several factors such as the location, hub height, and season.
Cornejo-Bueno et al. (2017) [98] applied different machine learning regression techniques to predict wind power ramp events (WPREs). Variables from atmospheric reanalysis data were used as predictive inputs for the learning machine. RMSE was employed as the comparison factor among the developed models. The results are presented in Figure 31. In general, GPR followed by MLP has the lowest RMSE compared with SVR and ELM for each farm. This shows the high prediction capability of the GPR and MLP models, in line with the purpose of the study.
Accurate forecasting of WPREs is necessary for the efficient integration of a wind farm into an electricity system [103].
Khosravi et al. (2018) [99] developed models based on a group method of data handling-type neural network, an adaptive neuro-fuzzy inference system (ANFIS), ANFIS optimized with an ant colony, ANFIS optimized with a particle swarm optimization algorithm, ANFIS optimized with a genetic algorithm, and a multilayer feed-forward neural network. Day, month, average air temperature, minimum and maximum air temperature, air pressure, wind speed, relative humidity, latitude, longitude, and top-of-atmosphere insolation were used as the input variables. The group method of data handling-type neural network was the best of the developed models. Figure 32 shows the RMSE and correlation coefficient of each model for comparison.
Burlando et al. (2017) [100] compared a pure ANN model and a hybrid model. Both models were validated against wind farm SCADA data and had similar overall performance. However, the hybrid model made better predictions in the high and low ranges of wind speed, whereas the ANN better predicted medium wind speeds. The results were compared using the normalized root mean square error and the normalized mean absolute error. The best results (the lowest values of the comparison factors) were obtained for NWP heights of 100 and 200 m for both layouts 1 and 2.
Pandit and Infield (2018) [101] performed a study to reduce the operation and maintenance costs of wind turbines. Predictive condition monitoring based on SCADA data was applied to identify early failures, boost production, limit downtime, and lower the energy cost. A Gaussian process algorithm was presented to estimate operational curves, which can be used as a reference model to recognize critical failures of the wind turbine and enhance power performance. Figure 33 presents the correlation coefficient between the Gaussian process predictions and the target values for four variables. Based on Figure 33, the model estimated the power curve more successfully than the other variables.
Sharifian et al. (2018) [102] presented a new model based on the fuzzy neural network to forecast wind power under uncertain data conditions. The proposed model was trained using a particle swarm optimization algorithm and combines the learning ability of the neural network with the expert knowledge of the fuzzy system. The presented model was validated against a real wind farm. The results are presented in Figure 34 using RMSE values for each case study. As is evident, RMSE is lowest for the first case study and highest for the fifth. Therefore, it can be claimed that the precision of the employed model is higher for the first case study than for the other case studies.
For proper integration of wind power into the power grid, a high-performance model that forecasts wind speed within a reasonable computation time is needed. It can be concluded that the multilayer perceptron ANN model has better forecasting performance for wind speed than SVM and regression trees [25]. Also, hybrid models such as the ANFIS model have better performance than SVR models. For wind speed forecasting, the hybridization of ANFIS with GP has better performance than its hybridization with PSO and GA [26].
ML models can provide accurate energy consumption and demand prediction, and can be used at the managerial level such as by building commissioning project managers, utility companies, and facilities managers to introduce energy-saving policies.
Table 17 demonstrates some critical papers in this field.
Albert and Maasoumy (2016) [104] presented a predictive segmentation technique to create a highly interpretable segmentation and targeting process for energy companies. The presented model utilized demographics, consumption, and program enrollment data to build predictive patterns. This model produced homogeneous segments that were 2- to 3-fold more productive for targeting.
Alobaidi et al. (2018) [105] proposed an ensemble learning framework for household energy consumption forecasting. In this paper, a prediction framework was presented to predict the average daily energy consumption of individual households. The results showed the robustness of the proposed ensemble model in providing good prediction performance with limited data. Figure 35 presents the RMSE results for each model separately.
Benedetti et al. (2016) [106] introduced a new methodology for control automation of energy consumption utilizing adaptive algorithms and artificial neural networks. Three neural network structures were presented and trained to deal with an enormous amount of data. Three indicators were used to identify the best structure for creating a control tool for energy consumption. The accuracy of the model was investigated. Finally, the model was applied to a case study of a building in Rome, Italy.
Chen et al. (2018) [107] worked on a novel approach for predicting residential electricity consumption using ensemble learning. In this study, a data-driven framework was introduced to forecast the annual electricity consumption of households using an ensemble learning model. Ridge regression was used to combine feed-forward deep networks and an extreme gradient boosting forest. Figure 36 presents the results of the study in comparison with those of the other models. As is clear from Figure 36, the proposed models have the highest accuracy, with the lowest RMSE in comparison with the other models.
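Combining base learners with ridge regression, as described for this ensemble, amounts to fitting regularized weights on the base models' predictions. The following sketch shows the idea under simplifying assumptions that are not taken from Chen et al.: there is no intercept, the weights are obtained from the closed-form ridge solution, and the base predictions are made-up numbers standing in for the outputs of a deep network and a boosted forest.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def fit_ridge_combiner(base_predictions, targets, alpha=1.0):
    """Weights w minimizing ||P w - y||^2 + alpha ||w||^2, where each row of P
    holds the predictions of all base models for one sample (no intercept)."""
    m = len(base_predictions[0])
    gram = [[sum(row[i] * row[j] for row in base_predictions)
             + (alpha if i == j else 0.0) for j in range(m)] for i in range(m)]
    moment = [sum(row[i] * t for row, t in zip(base_predictions, targets))
              for i in range(m)]
    return solve_linear_system(gram, moment)

def combine(row, weights):
    """Ensemble prediction for one sample from its base-model predictions."""
    return sum(p * w for p, w in zip(row, weights))

# Hypothetical base-model predictions (columns: model A, model B) and targets.
base = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
truth = [0.3 * a + 0.7 * b for a, b in base]
weights = fit_ridge_combiner(base, truth, alpha=0.0)
```

With alpha set to zero the fit reduces to ordinary least squares; a positive alpha shrinks the weights, which helps when the base models are strongly correlated.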
Kuroha et al. (2018) [108] presented an operational planning model for residential air conditioners. In this study, the focus was on automatic air conditioner operation for thermal comfort improvement and electricity cost reduction. An energy management methodology was introduced to provide an air conditioner operation plan by learning the characteristics of the installation environment from historical operation data. Based on the results, the proposed model could reduce the electricity cost by about 39.7% compared with the benchmark method.
The type of data that is available today is continuously evolving. Some data already encode information that can be used as proxy metrics to predict energy consumption in buildings; for example, geometry, size, and height. Wang et al. [109] developed the Unique Building Identifier to link building energy and attribute data and to ease matching across datasets. Depecker et al. [110] related the heating consumption of buildings to their shape. In this study, a criterion for the shape of buildings was presented, and fourteen buildings were chosen based on their variety of shapes. The results demonstrated that the energy consumption of buildings is inversely proportional to the building’s compactness. Qi and Wang [111] introduced a novel model for calculating the shape coefficient of buildings utilizing Google Earth. Astronomy principles, geometry, and GIS slope analysis were used for the calculation. This new model can be used for energy-saving measures in existing buildings.
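As a worked illustration of how shape relates to compactness, the sketch below computes a shape coefficient for a box-shaped building. It assumes the common definition of the shape coefficient as the external envelope area divided by the enclosed volume; the reviewed studies may use a different definition, and whether the ground floor counts as part of the envelope is left as a hypothetical option.

```python
def box_shape_coefficient(length, width, height, include_ground=False):
    """Envelope area divided by volume (1/m) for a rectangular building.

    Assumed definition: walls plus roof, optionally plus the ground floor.
    """
    walls = 2.0 * (length + width) * height
    roof = length * width
    ground = length * width if include_ground else 0.0
    volume = length * width * height
    return (walls + roof + ground) / volume

# Two hypothetical buildings with the same volume of 1000 m^3.
compact = box_shape_coefficient(10.0, 10.0, 10.0)  # cube
flat = box_shape_coefficient(40.0, 10.0, 2.5)      # long, low slab
```

For equal volume, the cube exposes less envelope area than the slab, so it has the lower shape coefficient, that is, the higher compactness.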
ML and big data have led many to believe that personally identifiable information is released when predicting energy patterns, and so forth. However, this is not often the case. The following studies show how to protect data privacy while predicting and disclosing information about energy in commercial buildings. Livingston et al. [112] presented a solution to measure the impact of modifying the utility meter aggregation threshold on occupant privacy and on the number of buildings that qualify for energy usage reporting. As the threshold rises, fewer buildings qualify for disclosure of energy use data. This paper’s goal was to study the resemblance between whole-building totals and individual utility meters at various aggregation levels. Sweeney et al. [113] proposed a solution for data privacy. The solution included a formal protection model named k-anonymity and a set of accompanying policies. Under this definition of privacy, every record in a k-anonymized dataset is indistinguishable from at least k-1 other records. Machanavajjhala et al. [114] studied two problems with k-anonymity: little diversity in sensitive attributes and background knowledge of attackers. They introduced a new privacy criterion called l-diversity that can shield against such attacks.
The hybridization of ML models in the energy demand field demonstrated that the accuracy of energy demand forecasting could improve significantly. Also, an ensemble model has significantly higher generalization ability than ANN and SVM models, and it has a lower forecasting uncertainty [7,32].
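Returning to the privacy criteria discussed above, both k-anonymity and l-diversity can be checked by grouping records on their quasi-identifiers. The sketch below assumes the simplest variant of l-diversity (at least l distinct sensitive values per group); the field names and the records are hypothetical and do not come from the cited studies.

```python
from collections import defaultdict

def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same values on all quasi-identifier fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in quasi_identifiers)
        groups[key].append(record)
    return groups

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record shares its quasi-identifiers with at least k-1 others."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values (distinct l-diversity)."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len({r[sensitive] for r in group}) >= l for group in groups.values())

# Hypothetical, already generalized building records.
records = [
    {"zip": "100**", "type": "office", "usage_band": "high"},
    {"zip": "100**", "type": "office", "usage_band": "low"},
    {"zip": "200**", "type": "retail", "usage_band": "high"},
    {"zip": "200**", "type": "retail", "usage_band": "high"},
]
quasi = ["zip", "type"]
```

In this example the table is 2-anonymous but not 2-diverse: both retail records share the same usage band, so an attacker who knows a building is in that group learns its band, which is the weakness that l-diversity is meant to address.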