Predicting Hourly Solar Irradiance Using Machine Learning Methods
Abstract: Accurate prediction of solar irradiance for a steady supply of electrical energy has always been a difficult task in both physical simulation and artificial intelligence. If solar irradiance could be predicted with very high precision, it would be easier for power grid operators to take corrective actions in advance to ensure failsafe outcomes during periods of sudden or cascading drops in solar energy. In this paper, three machine learning techniques were used to forecast the hourly solar irradiance of Johannesburg city using historical meteorological data from two consecutive years. The goal was to build a model that would readily utilize historical meteorological data to predict highly accurate solar irradiance information on an hourly basis. The results showed that Support Vector Regression (SVR), with a normalized Root Mean Square Error (nRMSE) of 7.2%, gave the best overall performance. It was followed by Artificial Neural Network (ANN) and Random Forest (RF), in that order.
Keywords: Solar Irradiance prediction, machine learning, Artificial Neural Network, Random Forest, Support Vector Regression.
I. INTRODUCTION
The world's fossil energy reserve is depleting due to heavy reliance on it by energy consumers. As non-renewable energy sources, fossil fuels have a harmful environmental impact because they emit dangerous gases into the atmosphere, thereby causing pollution and climate change. Over-dependence on fossil fuels will decline only if reliable alternative sources of energy are developed to replace them. In recent times, there has been an upsurge in the use of solar energy globally [1]. This can be attributed to its accessibility and to improvements in Photovoltaic (PV) cell technologies. Solar energy is a clean source of energy and it is naturally abundant, with less environmental impact [2][3]. Globally, a lot of investment has gone into improving the efficiency of solar power, which has resulted in the emergence of technologies that produced new types of cell candidates to replace the traditional and expensive crystalline silicon cells [4]. Examples of these new cell candidates are the Thin-Film and the Multijunction cell technologies. This trend in technology, and the global quest to minimize greenhouse effects, position solar power as one of the most feasible renewable energies.
Solar irradiance is the principal term in the field of solar power production [5]. It is the amount of energy released by the sun, appearing as a wide spectrum of light waves. Simply put, it is the power per unit area (W/m²) received from the sun in the form of electromagnetic radiation. In brief, light waves are described as vibrations caused by electromagnetic fields. Atmospheric gases such as ozone, water vapour, carbon (IV) oxide and nitrous oxide are primarily responsible for filtering the solar radiation passing through the atmosphere [6]. The effects of these gases, which include absorption of radiation at varying parts of the light spectrum, directly influence the heating rate, which is vital in the study of global atmospheric circulation models and radiative balance [7]. The interaction, in the form of filtration, between radiation and atmospheric matter is called radiative transfer. Examples of atmospheric matter are cloud droplets, aerosols, and gases. These interactions result in absorption, emission, and scattering of the incident radiation from the sun by the atmospheric matter. Absorption of incident radiation reduces the radiative energy moving at the incident angle, while scattering redistributes the radiative energy in every possible direction. The energy that finally reaches the PV cells as a result of the interaction between radiation and matter is the solar radiation. In this paper, the intensity or amount of this solar radiation is predicted using machine learning methods.
Although solar energy is clean and abundant in nature, it is characterized by fluctuations in its radiative energy or solar radiation production, hence it is stochastic. This makes solar power generation somewhat volatile and sometimes discouraging to users. In view of this, accurate prediction of solar irradiance for efficient power generation continues to be a very difficult task in both physical simulation and artificial intelligence [2]. Generation of solar power is negatively affected by fluctuations in meteorological conditions, which greatly impact the amount of Global Horizontal Irradiance (GHI). Accurate forecasting of solar irradiance is essential for both grid operators and solar power supply companies because the process of solar power generation is intermittent [5]. Prediction data are required by grid operators to prepare supply-demand plans that prevent interruption of power supply. This can be achieved by using the prediction data to establish a prior electricity generation plan that will result in failsafe outcomes during cascading
Authorized licensed use limited to: University of East London. Downloaded on December 19,2022 at 07:22:52 UTC from IEEE Xplore. Restrictions apply.
power outages owing to fluctuations in solar radiation. Prior information on the power to be produced plays a critical role in ensuring the reliability of service and preventing losses that could otherwise have been incurred.
In this paper, historical meteorological data, obtained from the Meteoblue website [8] through subscription, were used as input parameters to predict hourly solar irradiance. The data comprise temperature, relative humidity, sunshine duration, and measured (actual) GHI. Data from two consecutive years were used. Three machine learning methods were used, namely Support Vector Regression (SVR), Random Forest (RF), and Artificial Neural Network (ANN). The results obtained from the simulations using the three methods were compared using performance evaluators such as normalized Root Mean Square Error (nRMSE) and Mean Absolute Error (MAE), in order to establish which machine learning method is more efficient for the task. Some of the related works carried out in the field of solar irradiance prediction are presented in section (II). Section (III) deals with the parameters and description of the data used, while section (IV) discusses the three machine learning models applied for prediction. Lastly, a performance comparison of the models is presented in section (V), while the last section discusses future work and areas for improvement.
II. RELATED WORK
Three classes of models are in use for predicting solar irradiance: statistical, physical, and hybrid models. According to [2], “Both physical and statistical methods have been developed for the task of forecasting solar power”. Statistical time series forecasting methods operate optimally where a very short-term forecast is required [9]. Both physical and statistical methods are used where satellite images are utilized to predict cloud movement, which in turn is used to predict the near-term solar radiation [10][11]. An example of a physical model is Numerical Weather Prediction (NWP). Multiple Linear Regression (MLR) and Autoregressive Integrated Moving Average (ARIMA) are good examples of statistical models. Any combination of two or more of these, and of other examples not mentioned, is a hybrid model.
Machine Learning (ML) is a branch of artificial intelligence (AI) that is concerned with the design and development of algorithms that enable computers to generate rules that depend on the patterns of raw data that have been fed into them. ML algorithms operate by evolving behaviour or intelligence into a machine, thereby giving it the capacity to learn and do better in the future from its own experience.
ANN was used for the first time to predict time series global solar radiation in 1999 by [12]. They used a method based on the Multilayer Perceptron (MLP) to forecast the irradiation for the next day. Their findings showed a prediction error of 18.5% in summer and 21.8% in winter using Mean Absolute Percentage Error (MAPE) as the evaluation criterion. Since then, machine learning forecasting models have been persistently improved to achieve higher accuracy. For instance, [13] used SVR and K-NN to predict one-hour-ahead solar irradiance in Germany. They obtained an nRMSE of 6.2%, with SVR performing better. In the same vein, an SVM model did better than NWP in the one-hour-ahead forecast research done by [14] in Canada. In the USA, [15] predicted solar irradiance using ANN-SVM and ARMA and got an nRMSE of 22% with the ANN-SVM model. Reference [16] predicted solar radiation intensity using extreme learning machine (ELM) and Multiple Linear Regression (MLR). The data were collected over a five-year period and their results showed that ELM performed far better than the conventional MLR, with an nRMSE of 15.29% and an MAE of 24.79%. According to [5], an extreme gradient boosting (XGB) model with an RMSE value of 6.63 performed better than Random Forest (RF) and ANN models, in that order.
III. EXPERIMENTAL SET UP
The source of the data and the composition and quality of the training data are important for accurate prediction. In this paper, meteorological weather information of Johannesburg city obtained from Meteoblue was used. Hourly data from two consecutive years (2017 and 2018) were used. Before use, the data were cleaned by removing all the night-hour records, all of which contained zero W/m² measured solar irradiance. Only data from 7am to 6pm daily were used. Data from each of these years were used separately to train the three models and compare the results. The data were split: 80% as training data and 20% as testing data. The dataset contains the following parameters, which were used as input:
a. Temperature [2 metres above ground]
b. Relative humidity [2 metres above ground]
c. Sunshine duration and
d. Measured solar radiation
The chart shown in figure 1 below illustrates the general process by which input data were pre-processed, split and fed into the three machine learning algorithms in order to generate a prediction.
Start
→ Data collection and pre-processing (i.e., data cleaning, normalization, splitting, etc.)
→ Design and development of models using ANN, RF, or SVR algorithms
→ Predicting solar irradiance using ANN, RF, or SVR
→ Evaluation of model accuracy using nRMSE and MAE
→ End
Figure 1: A chart showing the general process used to generate predictions using the three ML models
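
The data preparation described in the experimental set up (dropping night hours, normalizing, and splitting 80%/20%) could look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the records are synthetic rather than Meteoblue data, min-max scaling is an assumed form of normalization, a shuffled split is assumed because the paper does not say whether the split was random or chronological, and the names `is_daytime`, `min_max_normalize` and `split_train_test` are hypothetical.

```python
import random

def is_daytime(hour, first_hour=7, last_hour=18):
    """Return True for hours from 7am to 6pm inclusive (assumed bounds)."""
    return first_hour <= hour <= last_hour

def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly, then cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Synthetic stand-in for five days of hourly records (not Meteoblue data).
records = []
for day in range(5):
    for hour in range(24):
        sunlit = is_daytime(hour)
        records.append({
            "day": day,
            "hour": hour,
            "temperature": 12.0 + 0.6 * hour,
            "humidity": 70.0 - hour,
            "sunshine": 60.0 if sunlit else 0.0,
            # Night hours carry zero irradiance, as in the cleaned dataset.
            "ghi": 100.0 * (6.5 - abs(hour - 12.5)) if sunlit else 0.0,
        })

# Keep only the 7am to 6pm rows, then normalize the target and split 80/20.
daytime = [row for row in records if is_daytime(row["hour"])]
normalized_ghi = min_max_normalize([row["ghi"] for row in daytime])
train_rows, test_rows = split_train_test(daytime)
print(len(daytime), len(train_rows), len(test_rows))
```

For time series data, a chronological split (training on earlier hours, testing on later ones) is a common alternative to shuffling and may give a more conservative error estimate.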
Below is a brief description of the three machine learning models used to predict the hourly solar irradiance of Johannesburg city:
A. Artificial Neural Network (ANN)
ANNs are a class of models inspired by the structure of biological neural networks. Like kernel methods, they are good at solving problems involving pattern-matching techniques. In the review conducted by [18], 79% of AI techniques used in weather forecasting data are based on ANN. The figure below illustrates a mathematical model of an artificial neuron:
... used in pattern recognition, regression, classification, estimation, and operator inversion for difficult tasks. The application of this method to time series prediction tasks has been a success story [22]. The primary objective of the approach used in SVM for classification problems is to discover the hyperplane which effectively separates the classes of data. A hyperplane is a generalization of a line in 2-D and a plane in 3-D. When there are several hyperplanes to choose from, SVM selects the one whose distance from the nearest data points is the largest. The maximum margin linear hyperplane can be formed as soon as the support vectors have been identified, as shown in the figure below:
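
The maximum-margin idea described above can be illustrated with a small Python sketch. It assumes a toy 2-D dataset and a hand-written list of candidate hyperplanes, and simply picks the separating candidate whose nearest point is farthest away. A real SVM solves an optimization problem over all hyperplanes instead of choosing from a list, so this only illustrates the selection criterion; the function names and data are hypothetical.

```python
import math

def signed_score(point, w, b):
    """Value of w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(signed_score(point, w, b)) / norm

def separates(points, labels, w, b):
    """True if every point lies strictly on the side matching its label."""
    return all(y * signed_score(p, w, b) > 0 for p, y in zip(points, labels))

def margin(points, w, b):
    """Distance from the hyperplane to its nearest data point."""
    return min(distance_to_hyperplane(p, w, b) for p in points)

def pick_widest_margin(points, labels, candidates):
    """Among separating candidates, return the one with the largest margin."""
    valid = [(w, b) for w, b in candidates if separates(points, labels, w, b)]
    return max(valid, key=lambda wb: margin(points, wb[0], wb[1]))

# Toy data: three points per class, labelled +1 and -1.
points = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

# Candidate hyperplanes as (w, b); the last one fails to separate the classes.
candidates = [
    ((1.0, 0.0), -2.0),
    ((1.0, 1.0), -2.0),
    ((1.0, 1.0), -3.5),
    ((0.0, 1.0), -3.5),
]

best = pick_widest_margin(points, labels, candidates)
print(best, margin(points, best[0], best[1]))
```

In this toy case the chosen line x + y = 3.5 sits midway between the two classes, and the points nearest to it play the role of the support vectors.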
Mean square error, $\mathrm{MSE} = \frac{1}{n}\sum_{t}(A_t - F_t)^2$,
2nd Model 15.8% 3.893
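
The mean square error above, together with the nRMSE and MAE evaluators used in this paper, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It reads A_t as the actual value and F_t as the forecast value, which is an assumption because the symbols are not defined in the surviving text, and it normalizes RMSE by the mean of the actual values, which is also an assumption because the paper does not state its normalization (the range of the actual values is another common choice).

```python
import math

def mse(actual, forecast):
    """Mean of squared differences between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Square root of the mean square error."""
    return math.sqrt(mse(actual, forecast))

def nrmse(actual, forecast):
    """RMSE divided by the mean actual value (assumed normalization)."""
    return rmse(actual, forecast) / (sum(actual) / len(actual))

def mae(actual, forecast):
    """Mean of absolute differences between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Made-up hourly values in W/m^2, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]

print(mse(actual, forecast))    # 100.0
print(rmse(actual, forecast))   # 10.0
print(nrmse(actual, forecast))  # 0.04, i.e. 4%
print(mae(actual, forecast))    # 10.0
```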
C. SVR Model
The SVR model gave the best solar irradiance predictions in this paper.
It has the lowest nRMSE and MAE values among the three machine
learning models evaluated. Table III below presents the performance
evaluator values obtained using this method. Figures 5a and
5b show the regression plots.
Figure 5a: Regression plot for 2017 using SVR model
Figure 5b: Regression plot for 2018 using SVR model
TABLE III
SVR model    nRMSE    MAE
1st Model    7.2%     1.039
2nd Model    7.8%     1.459
VI. CONCLUSION
Over the past few years, the use of solar power has been gaining global acceptability as a clean, sustainable, and reliable alternative to fossil energy. However, this momentum can be sustained only if the efficiency of power generated through PV cells is continuously improved. Apart from technologies like the Maximum Power Point Tracking (MPPT) mechanism, which helps the system to generate maximum power from the radiative energy it receives, prediction of solar irradiance needs to be more accurate. By building a reliable forecasting model, grid operators will be able to overcome the uncertainties associated with fluctuations in the weather patterns and climatic conditions of the day, which greatly affect solar power generation.
This study suggests that, of the three models evaluated, Support Vector Regression (SVR) is the best model to select for solar irradiance forecasting. Although the accuracy of these methods depends on the forecast horizon and the quality of the training data, SVR has proven in many applications to be very good for both classification and regression problems. In this work, the SVR model gave the minimum nRMSE and MAE, of 7.2% and 1.039 respectively. In future work, the plan is to use a deep learning approach to improve on the results obtained in this paper. Recently, deep learning networks have proven to be an effective class of models, although they require a large volume of historical data and a high-speed Graphics Processing Unit (GPU) to implement. Further improvements can equally be made by combining the different machine learning methods analysed in this work to obtain higher accuracy.
REFERENCES
[7] T. Shimazaki and L. C. Helmle, “A simplified method for calculating the atmospheric heating rate by absorption of solar radiation in the stratosphere and mesosphere,” 1979.
[8] Ali Ahmad, N. Hasan Ali, and Tshilidzi Marwala, “Perturb and observe based on fuzzy logic controller maximum power point tracking (MPPT),” IEEE International Conference on Renewable Energy Research and Applications (ICRERA) 2014, October 2014.
[9] C. Yang and L. Xie, “A novel ARX-based multi-scale spatio-temporal solar power forecast model,” presented at the 2012 North American Power Symposium (NAPS), 2012, pp. 1–6.
[10] B. Goswami and G. Bhandari, “Automatically adjusting cloud movement prediction model from satellite infrared images,” presented at the 2011 Annual IEEE India Conference, 2011, pp. 1–4.
[11] A. Radovan and Ž. Ban, “Predictions of cloud movements and the sun cover duration,” presented at the 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014, pp. 1210–1215.
[12] Y. Kemmoku, S. Orita, S. Nakagawa, and T. Sakakibara, “Daily insolation forecasting using a multi-stage neural network,” Solar Energy, vol. 66, no. 3, pp. 193–199, 1999.
[13] B. Wolff, E. Lorenz, and O. Kramer, “Statistical learning for short-term photovoltaic power predictions,” in Computational sustainability, Springer, 2016, pp. 31–45.
[14] P. Krömer, P. Musílek, E. Pelikán, P. Krč, P. Juruš, and K. Eben, “Support vector regression of multiple predictive models of downward short-wave radiation,” presented at the 2014 International Joint Conference on Neural Networks (IJCNN), 2014, pp. 651–657.
[15] Z. Dong, D. Yang, T. Reindl, and W. M. Walsh, “A novel hybrid approach based on self-organizing maps, support vector regression and particle swarm optimization to forecast solar irradiance,” Energy, vol. 82, pp. 570–577, 2015.
[16] H. Suyono, H. Santoso, R. N. Hasanah, U. Wibawa, and I. Musirin, “Prediction of solar radiation intensity using extreme learning machine,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 12, no. 2, pp. 691–698, 2018.
[17] C. Voyant et al., “Machine learning methods for solar radiation forecasting: A review,” Renewable Energy, vol. 105, pp. 569–582, 2017.
[18] A. Mellit, S. A. Kalogirou, L. Hontoria, and S. Shaari, “Artificial intelligence techniques for sizing photovoltaic systems: A review,” Renewable and Sustainable Energy Reviews, vol. 13, no. 2, pp. 406–419, 2009.
[19] F. Rodrigues, C. Cardeira, and J. Calado, “Neural networks applied to short term load forecasting: A case study,” in Smart Energy Control Systems for Sustainable Buildings, Springer, 2017, pp. 173–197.
[20] A. Moncada, W. Richardson, and R. Vega-Avila, “Deep learning to forecast solar irradiance using a six-month UTSA skyimager dataset,” Energies, vol. 11, no. 8, p. 1988, 2018.
[21] V. Vapnik, The nature of statistical learning theory. Springer science & business media, 2013.
[22] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.