Electronics 09 01700 v3
Article
Sky Imager-Based Forecast of Solar Irradiance Using
Machine Learning
Anas Al-lahham 1, * , Obaidah Theeb 1 , Khaled Elalem 1 , Tariq A. Alshawi 1 and
Saleh A. Alshebeili 1,2
1 Electrical Engineering Department, King Saud University, Riyadh 11421, Saudi Arabia;
[email protected] (O.T.); [email protected] (K.E.); [email protected] (T.A.A.);
[email protected] (S.A.A.)
2 King Abdulaziz City for Science and Technology (KACST)-Technology Innovation Center (TIC) in Radio
Frequency and Photonics (RFTONICS), King Saud University, Riyadh 11421, Saudi Arabia
* Correspondence: [email protected]
Received: 11 September 2020; Accepted: 12 October 2020; Published: 16 October 2020
Abstract: Ahead-of-time forecasting of the output power of power plants is essential for the stability
of the electricity grid and for ensuring uninterrupted service. However, forecasting the output of renewable energy
sources is difficult due to the chaotic behavior of natural energy sources. This paper presents a
new approach to estimating short-term solar irradiance from sky images. The proposed algorithm
extracts features from sky images and uses learning-based techniques to estimate the solar irradiance.
The performance of the proposed machine learning (ML) algorithm is evaluated using two publicly
available datasets of sky images. The datasets contain over 350,000 images covering an interval of 16 years,
from 2004 to 2020, with the corresponding global horizontal irradiance (GHI) of each image as the
ground truth. Compared to the computationally heavy state-of-the-art algorithms proposed in the
literature, our approach achieves competitive results with much lower computational complexity for
both nowcasting and forecasting up to 4 h ahead of time.
Keywords: global horizontal irradiance (GHI); photovoltaics (PV); solar energy; solar
irradiance forecasting
1. Introduction
Photovoltaic (PV) systems have seen a rapid increase in popularity and utilization in the face of
the challenges of climate change and energy insecurity, as they offer a potential replacement for
fossil fuels owing to their pollution-free operation and their role in limiting global warming. However,
the volatility and uncertainty of solar power resources are among the main challenges affecting the
PV power output, which, along with inaccurate forecasting, may impact the stability of the power
grid [1,2]. Therefore, accurate irradiance forecasting may help power system operators perform
different actions in grid operation, such as load following, scheduling of spinning reserves, or unit
commitment [3].
PV power output mainly depends on the amount of solar irradiance on a collection plane.
However, the amount of solar irradiance is affected by various weather conditions such as clouds and
dust. Thus, solar irradiance may be prone to rapid fluctuations in various regions [4]. Various models
have been proposed to forecast solar irradiance; these forecasting models are classified into parametric
and statistical models. The main difference between these two models is the dependency on historical
data; the parametric, physical or “white box” models do not need any historical data to generate the
prediction of solar irradiance. They generate the prediction according to meteorological processes and
weather conditions, such as cloud formation, wind, and temperature. The most well-known physical
model is the numerical weather prediction (NWP), which, as the time horizon increases, offers greater
accuracy over statistical models. Hybrid methods are also popular as they combine a mix of both
models [5].
Several physical and statistical methods have been proposed in the literature for solar irradiance
forecasting. Larson et al. [6] proposed a methodology to generate a day-ahead power output forecast
of two PV plants using publicly available NWP outputs from two models; a PV physical model was then used to
obtain the power output using global horizontal irradiance (GHI) values obtained from the two models.
Statistical and machine learning (ML) models predict solar irradiance by extracting relations
from historical data used to train the model; therefore, an adequate training sample is essential in order to
produce an accurate model. Two well-known classes of statistical methods, artificial intelligence (AI)
techniques and regressive methods, are mostly used for short-term forecasting (less than 4 h).
In such cases, NWP does not perform well because it lacks the granularity necessary to
add future information to forecasting models.
Talha A. Siddiqui et al. [7] presented a deep neural network approach to forecast short-term solar
irradiance. The datasets in that work were collected from two different locations. The first location
was the Solar Radiation Research Laboratory (SRRL) (Golden, Colorado dataset), where an image
was recorded using a commercial Total-Sky Imager camera (TSI) every 10 min, with a mechanical
sun tracker to prevent saturation in the image. The dataset was collected from 2004 to 2016 and
totaled 304,309 images. The second location was in Tucson, Arizona, where the dataset
had been recorded at the Multiple Mirror Telescope Observatory (MMTO), spanning the
period from November 2015 to May 2016. That paper applied two types of irradiance
prediction to the datasets, namely nowcasting and forecasting, with forecasting for durations
up to 4 h. Air temperature, wind speed, relative humidity, and other auxiliary data were used to
improve the quality of the model. The work in [7] used the normalized mean absolute percentage error
(nMAPE) metric to quantify the prediction accuracy. The proposed algorithm uses computationally
heavy ML techniques.
Anto Ryu et al. [8] presented an approach for short-term solar irradiance forecasting, 5–20 min
ahead, using a TSI with two forecasting models. First, a convolutional neural network (CNN) model
was used with only the sky images taken by the TSI. Second, a CNN model using both sky images and lagged
GHI as input data was used. Moreover, the estimated GHI output of the first model was used as input
to the second model. A third, persistence model was used as a baseline against which to compare the forecasting
accuracy of the proposed CNN models.
Graeme Vanderstar et al. [9] proposed a method to forecast solar irradiance two hours ahead
using an artificial neural network (ANN). The use of different remote solar monitoring stations is
combined with ML concepts using genetic algorithms. The algorithm was used to find the
best selection of solar monitoring stations from the 20 available sites, and it retains its
forecasting capability with a small number of monitoring stations—as few as five.
Ke-Hung Lee et al. [10] presented a method for short-term solar irradiance forecasting using
electromagnetism-like neural networks. The results of the electromagnetism-like neural network
were compared with the backpropagation neural network. The comparison results showed that
the prediction of the electromagnetism-like neural network was better than the backpropagation
neural network.
M.Z. Hassan et al. [11] conducted research into forecasting day-ahead solar radiation using an
ML approach. That paper collected the dataset samples from a local solar power plant at Nadi Airport
in Fiji. The data samples contained the average values of the solar power over known durations.
The authors of [11] implemented two regression techniques: linear least squares and the
support vector machine (SVM). Multiple kernel functions were used with the SVM to
obtain good results on non-linearly separable data. Mean absolute error (MAE) and root mean square
error (RMSE) were considered as prediction accuracy metrics. The results showed that no forecasting
algorithm among the proposed models can be perfect for all conditions.
Electronics 2020, 9, 1700 3 of 14
• Proposing a prediction approach that does not rely on meteorological parameters, and encodes
an input sky image to take the form of a one-dimensional (1-D) vector to facilitate the use of less
complex ML regressors.
• Adopting Latent Semantic Analysis (LSA) to reduce the size of the regressor input vector,
without decreasing the prediction accuracy.
• Evaluating the performance of the proposed approach using a 350,000-sample dataset.
The results show that the proposed approach outperforms the more complex state-of-the-art
forecasting methodology presented in [7].
The development of algorithms that are computationally efficient and solely rely on sky images for
irradiance prediction will enable their implementation in inexpensive off-the-shelf hardware platforms.
The organization of this paper is as follows. The background about the data collection is given in
Section 2. Section 3 presents the proposed GHI prediction algorithms. The results and discussion are
given in Section 4. Our concluding remarks are outlined in Section 5.
2. Data Collection
Sky images are obtained from a wide-angle-lens sky imager, and the measured GHI is taken from
the MIDC. This dataset is used in this work to forecast GHI up to 4 h ahead of time. The proposed
algorithms are developed using two publicly available datasets of sky images captured in Golden,
Colorado (39.742° N, 105.18° W, Colorado, USA). Golden, located in north-central Colorado, U.S., lies
at an elevation of 1829 m and is surrounded by mountains. It has a warm climate with a significant
amount of rainfall during the year. The datasets were recorded at the SRRL [14,15]. Samples of the
obtained images are illustrated in Figure 1. The description of each dataset is as follows:
Figure 1. Sky images for three types of weather conditions from the TSI-880 (top) and
ASI-16 (bottom) datasets.
Each image was reshaped into the form of a one-dimensional (1-D) array of M² × 3 samples (pixels). For N
available images, the dataset's dimension becomes N × (M² × 3) samples.
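As a sketch of this reshaping step, with the array sizes following the paper (M = 32) but random stand-in data in place of real sky images:

```python
import numpy as np

# Hypothetical stack of N downsized RGB sky images, each M x M x 3 (M = 32 in the paper).
N, M = 100, 32
images = np.random.rand(N, M, M, 3)

# Flatten each image into a 1-D vector of M^2 * 3 pixel values,
# giving a dataset of dimension N x (M^2 * 3).
X = images.reshape(N, M * M * 3)
print(X.shape)  # (100, 3072)
```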
The feature vectors input to the ML algorithms consist of pixel values. The size of this vector is
quite large, as M = 32 in our development. In this paper, LSA is used to reduce the number of features.
LSA, also known as latent semantic indexing (LSI), introduced by Deerwester et al. [18], performs a
linear dimensionality reduction using the method of truncated singular value decomposition (SVD).
Given a rectangular matrix X of size Q × D, the SVD of X is:

X = U Σ V^T    (1)

where U ∈ C^(Q×Q) and V ∈ C^(D×D) are orthogonal matrices; the columns of U are called the left singular
vectors of X, while the columns of V are the right singular vectors of X. Σ ∈ R^(Q×D) is the matrix
containing the singular values of X along its diagonal. The truncated SVD produces a
low-rank approximation of X using the k largest singular values:

X ≈ X_k = U_k Σ_k V_k^T    (2)

where k < r (the number of non-zero singular values), U_k ∈ C^(Q×k), V_k ∈ C^(D×k), and Σ_k ∈ R^(k×k) [19–21].
If X is the training set with Q = L and D = M² × 3 × m, then the reduced-dimension training set is

X′ = U_k Σ_k    (3)

This transformed set contains L × k features. In the testing phase, the input feature matrix
T, of size l × (M² × 3 × m) pixels, is first transformed to its reduced form using

T′ = T V_k    (4)
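The reduction in Equations (1)–(4) corresponds to scikit-learn's `TruncatedSVD`; a minimal sketch with random stand-in data (the dimensions here are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_train = rng.random((200, 3072))  # L = 200 training vectors of M^2 * 3 pixels
T_test = rng.random((50, 3072))    # l = 50 testing vectors

# Keep only the k largest singular values (Eq. (2)); fit_transform returns
# X' = U_k * Sigma_k (Eq. (3)), and transform computes T' = T * V_k (Eq. (4)).
svd = TruncatedSVD(n_components=40, random_state=0)
X_red = svd.fit_transform(X_train)
T_red = svd.transform(T_test)
print(X_red.shape, T_red.shape)  # (200, 40) (50, 40)
```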
3.2.1. KNN
KNN is one of the simplest of all ML algorithms and can be used for both regression and
classification. KNN finds the closest neighbors of a set of testing points within the given feature space,
based on a user-defined number K. The neighbors are picked from a set of training points
whose target values are known, and the parameter K defines the number of nearest neighbors used for the
regression. KNN can be considered a lazy-learning, non-parametric algorithm. Choosing the
value of K is essential to avoid the risk of overfitting; without tuning this parameter, two noisy
data points that are close to each other could outvote the correct data points.
The value of K was fine-tuned to 2 through extensive experimentation with
cross-validation (CV) techniques [22], using the Euclidean distance as the distance function [23,24].
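A minimal sketch of such a KNN regressor in scikit-learn, with the K = 2, Euclidean-distance setting described above; the feature count, grid values, and random stand-in data are assumptions for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.random((500, 40))  # reduced feature vectors (k = 40 components, an assumption)
y = rng.random(500)        # corresponding GHI values (stand-ins)

# A grid search with cross-validation illustrates how K can be selected.
search = GridSearchCV(KNeighborsRegressor(metric="euclidean"),
                      {"n_neighbors": [1, 2, 4, 8, 16]}, cv=5)
search.fit(X, y)

# K = 2 with the Euclidean distance, as fine-tuned in the paper.
knn = KNeighborsRegressor(n_neighbors=2, metric="euclidean").fit(X, y)
pred = knn.predict(X[:5])
print(pred.shape)  # (5,)
```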
A key property of the random forest (RF) algorithm, distinguishing it from other tree-based
algorithms, is that it selects random subsets of features at each split. This is beneficial because, if one
or more features are powerful in predicting the output target value, those features would otherwise be selected
in building many of the trees, making the trees correlated. The RF algorithm avoids overfitting the decision trees to their
training set using the bagging technique, which selects random subsets of the training set to fit
each tree. This procedure leads to better performance, since it decreases the variance of the model
without increasing the bias [25,26]. Using extensive experimentation along with CV techniques,
the parameters of the RF algorithm were fine-tuned as follows: number of trees = 200, maximum
depth = 100, and number of features p = √k, where p is the number of features to consider when looking for the
best split.
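A sketch of an RF regressor with the fine-tuned parameters above, using scikit-learn on random stand-in data (the feature count is an assumption):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.random((500, 40))  # k = 40 reduced features (an assumption)
y = rng.random(500)        # GHI targets (stand-ins)

# Parameters as fine-tuned in the paper: 200 trees, maximum depth 100,
# and p = sqrt(k) features considered at each split ("sqrt" in scikit-learn).
rf = RandomForestRegressor(n_estimators=200, max_depth=100,
                           max_features="sqrt", random_state=0)
rf.fit(X, y)
pred = rf.predict(X[:5])
print(pred.shape)  # (5,)
```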
Nowcasting is the prediction of the solar irradiance at the instant the frame is captured [27].
Each raw image in the dataset is down-sized into an RGB three-dimensional array. This array is
directly reshaped into a 1-D input vector, which is applied to a regression model to predict the solar
irradiance (GHI).
In forecasting, the current image and m − 1 previous (look-back) images are used to form a
concatenated input vector. Because the resulting input vector is large, LSA is used to reduce its
dimensionality: k features are extracted from the input vector and applied to the regression
model. Algorithm 1 shows the pseudocode of the forecasting process.
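The forecasting pipeline described above (look-back concatenation, LSA reduction, regression) can be sketched as follows; the dimensions and KNN choice follow the paper, while the data and parameter values are random stand-ins and assumptions:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
N, M, m, k = 300, 32, 4, 40        # images, image size, look-back count, components
flat = rng.random((N, M * M * 3))  # flattened sky images (stand-ins)
ghi = rng.random(N)                # GHI measured at the forecast horizon (stand-ins)

# Concatenate the current image with its m - 1 look-back images into one vector.
X = np.array([flat[i - m + 1 : i + 1].ravel() for i in range(m - 1, N)])
y = ghi[m - 1:]

# LSA reduces each concatenated vector to k features before regression.
svd = TruncatedSVD(n_components=k, random_state=0)
model = KNeighborsRegressor(n_neighbors=2).fit(svd.fit_transform(X), y)
print(X.shape)  # (297, 12288)
```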
Three statistical metrics are used to assess the performance of the models using two different
datasets. These metrics are as follows.
The normalized mean absolute percentage error (nMAPE):

nMAPE = ( ∑_{i=1}^{l} |y_i − ŷ_i| / ∑_{i=1}^{l} y_i ) × 100    (5)

The root mean square error (RMSE):

RMSE = √( (1/l) ∑_{i=1}^{l} (y_i − ŷ_i)² )    (6)

The normalized RMSE (nRMSE):

nRMSE = RMSE / (y_max − y_min)    (7)

where l is the number of testing samples, and y_i and ŷ_i are the true and predicted values of GHI,
respectively.
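These metrics can be computed directly; a small sketch with toy values (the nRMSE is scaled to percent, as reported in the tables):

```python
import numpy as np

def nmape(y, y_hat):
    # Sum of absolute errors over the sum of true values, in percent.
    return np.sum(np.abs(y - y_hat)) / np.sum(y) * 100

def rmse(y, y_hat):
    # Root mean square error.
    return np.sqrt(np.mean((y - y_hat) ** 2))

def nrmse(y, y_hat):
    # RMSE normalized by the range of the true values, in percent.
    return rmse(y, y_hat) / (y.max() - y.min()) * 100

y = np.array([100.0, 200.0, 300.0, 400.0])      # measured GHI (toy values)
y_hat = np.array([110.0, 190.0, 310.0, 390.0])  # predicted GHI (toy values)
print(nmape(y, y_hat))  # 4.0
```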
The proposed algorithms are compared with the original VGG16 deep learning framework [28], as well as the approach of [7].
In nowcasting, the approach of [7] augmented the training of their model with auxiliary weather
parameters (average wind speed, relative humidity, barometric pressure, air temperature, sun position
(z), and clear-sky prediction). We observe that our model achieves comparable results for nowcasting
with respect to the state-of-the-art models, as shown in Table 1. Additionally, applying the proposed
prediction algorithms to the first 10 years (2004 to 2014), excluding the years 2015 and 2016,
produces superior results. The reason is the following: during the period from May
2015 to December 31, 2016, the sun tracker stopped working, as shown in Figure 4; therefore,
the captured images are greatly affected by the sun. In contrast, the approach of [7] uses more robust
techniques that cover a larger receptive region of sky images with cloud movement to extract
relevant features from an image, thereby mitigating the effect of the faulty sun tracker. This difference is
apparent in forecasting, since the proposed approach uses sample look-back for prediction. As shown
in Table 1, the proposed prediction algorithms did not perform well in comparison
with the approach of [7] when the sun tracker was inactive. On the other hand, when 2 years of data
are randomly selected for testing, the performance of the proposed approaches improves significantly.
[Figure 3 plots: nMAPE (%) vs. the number of components (k), for look-back intervals of 80, 100, 120, and 140 min.]
Figure 3. Normalized mean absolute percentage error (nMAPE) vs. k value for different
look-back intervals.
Figure 4. Images from 2015 and 2016, respectively, where the sun-tracker has stopped working.
Table 1. nMAPE for nowcasting and forecasting results using different methods.
Table 2. Root mean square error (RMSE) and normalized root mean square error (nRMSE) for
nowcasting and forecasting results using the proposed methods.
Dataset   Method   Test Period        Metric         Nowcasting   +1 h    +2 h    +3 h    +4 h
TSI-880   KNN      2 years (random)   RMSE (W/m²)        71.0     122.2   137.4   151.1   164.4
                                      nRMSE (%)           4.4       7.7     9.6    11.2    12.0
TSI-880   RF       2 years (random)   RMSE (W/m²)        64.7     141.8   158.9   171.2   183.2
                                      nRMSE (%)           4.0       8.9    11.1    12.7    13.5
ASI-16    KNN      1 year (random)    RMSE (W/m²)       112.3     116.7   127.6   132.3   143.8
                                      nRMSE (%)           8.5       8.9     9.3    10.2    11.3
ASI-16    RF       1 year (random)    RMSE (W/m²)       111.4     141.3   156.3   164.6   173.3
                                      nRMSE (%)           8.1      10.8    11.4    12.7    13.7
With reference to Table 1, for the TSI-880 dataset and the KNN model, the nMAPE values are
14.9%, 16.7%, 18.7%, and 21.1% for 1–4 h ahead forecasts, respectively, while, for the ASI-16 dataset,
the nMAPE values are 14.7%, 15.8%, 16.6%, and 18.4%, respectively. The results of the KNN model
for the two datasets are very close, which further confirms the effectiveness of the proposed prediction
approach. A second observation is that the RF algorithm performs better in nowcasting, while the KNN
algorithm is best in forecasting. Figure 5 shows the ahead-of-time forecasting errors on an hourly
basis for the KNN model. The error increases for larger forecast horizons as well as for later hours
in the day. Table 2 reports the prediction accuracy using RMSE and nRMSE. Figure 6 shows the RMSE
and nRMSE for both KNN and RF. Note that, for the TSI-880 dataset and the KNN model, the nRMSE
values are 7.7%, 9.6%, 11.2%, and 12.0% for 1–4 h ahead forecasts, respectively, while, for the ASI-16
dataset, the nRMSE values are 8.9%, 9.3%, 10.2%, and 11.3%, respectively.
[Figure 5 plot: hourly nMAPE (%) for the KNN model, from 11:00 to 18:00.]
Figure 6. RMSE and nRMSE prediction errors for K-nearest neighbors (KNN) and Random Forest (RF).
Three types of weather conditions are considered to compare the predicted solar irradiance of the
proposed models with the measured values. Figure 7a shows close agreement between the predicted
and measured GHI values for a sunny day. Figure 7b,c show the predicted and measured values
for cloudy and rainy days, with a noticeable discrepancy that is more pronounced for the rainy day.
This is due, in part, to the rapid changes in the
hourly irradiance values during the day. The nMAPE values for sunny, cloudy, and rainy days are
3.1%, 14.3%, and 20.5%, respectively. Furthermore, Figure 7d illustrates the effect of the rapid change
in hourly irradiance on the prediction as the weather shifts from sunny to cloudy.
Figure 7. Measured vs. forecasted hourly solar irradiance for three types of weather conditions.
It is relevant to mention here that the approach of [7] is computationally more expensive when
compared to our proposed prediction algorithms. In particular, the proposed architecture of [7] uses
a sky image of dimensions (64, 64, 3) as an input to a CNN-based model. This model is obtained
by performing the ablation of layers from the original VGG16 architecture [28], which, compared
to our model, is a very deep and computationally complex approach. This CNN stage, aided by
auxiliary data, is only used to predict a single value of solar irradiance (nowcasting). For forecasting,
a two-tier long short-term memory (LSTM) neural network was considered, which utilizes the model
of the CNN stage to obtain historical full-sky representations (look-back) and produce ahead-of-time
forecasts. The computational complexity of CNN convolution layers can be approximated by [29]:
O( ∑_{l=1}^{d} n_{l−1} × s_l² × n_l × m_l² )    (8)
where d is the number of convolution layers, l is the index of a convolution layer, n_l is the number
of filters, n_{l−1} is the number of input channels of the l-th layer, s_l is the spatial size of the filter, and m_l is
the spatial size of the output feature map. In [7], CNN and LSTM were used to obtain ahead-of-time
forecasting. The time complexity of the LSTM stage is considered to be O(1) [30]. The pooling
and fully connected layers (FCL) take about 10% of the computational time [29]. On the other hand,
the computational complexities of the RF and KNN regressors are O(n_trees × log N) and O(N × p),
respectively, where N is the number of training samples, p is the number of features, and n_trees is the
number of trees [31,32]. Using the values of the relevant parameters of the predictors under consideration
in Section 3.2, we find that the computational complexities of KNN and RF are reduced by 30% and
95% compared to those of the CNN-based approach, respectively.
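The comparison can be illustrated by evaluating Equation (8) for an assumed layer configuration; the layer sizes below are hypothetical and do not reproduce the actual architecture of [7]:

```python
import math

def cnn_ops(layers):
    # Eq. (8): each layer is (n_in, s, n_out, m) — input channels, filter size,
    # number of filters, and output feature-map size.
    return sum(n_in * s * s * n_out * m * m for n_in, s, n_out, m in layers)

# Hypothetical 3-layer CNN on a 64 x 64 x 3 input.
conv = cnn_ops([(3, 3, 32, 64), (32, 3, 64, 32), (64, 3, 128, 16)])

# Order-of-magnitude counts for the proposed regressors,
# using parameter values from Section 3.2 and an assumed dataset size.
N, p, n_trees = 300_000, 40, 200
knn_ops = N * p                   # O(N * p)
rf_ops = n_trees * math.log2(N)   # O(n_trees * log N)
print(conv, knn_ops, round(rf_ops))
```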
Author Contributions: Conceptualization, A.A.-l., O.T. and K.E.; methodology, A.A.-l., O.T. and K.E.; software,
A.A.-l., O.T. and K.E.; validation, A.A.-l. and O.T.; formal analysis, A.A.-l., O.T. and K.E.; investigation, A.A.-l., O.T.
and K.E.; resources, A.A.-l., O.T. and K.E.; data curation, A.A.-l.; writing—original draft preparation, A.A.-l., O.T.
and K.E.; writing—review and editing, A.A.-l., T.A.A. and S.A.A.; visualization, A.A.-l., O.T. and K.E.; supervision,
T.A.A. and S.A.A.; project administration, T.A.A. and S.A.A.; funding acquisition, S.A.A. All authors have read
and agreed to the published version of the manuscript.
Funding: This work was supported by the Researchers Supporting Project number (RSP-2020/46), King Saud
University, Riyadh, Saudi Arabia.
Acknowledgments: The authors would like to acknowledge the Researchers Supporting Project at King Saud
University.
Conflicts of Interest: The authors declare no conflicts of interest. The funders had no role in the design of the
study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to
publish the results.
References
1. Marcos, J.; Marroyo, L.; Lorenzo, E.; Alvira, D.; Izco, E. Power output fluctuations in large scale PV plants:
One year observations with one second resolution and a derived analytic model. Prog. Photovoltaics Res. Appl.
2011, 19, 218–227. [CrossRef]
2. Martinez-Anido, C.B.; Botor, B.; Florita, A.R.; Draxl, C.; Lu, S.; Hamann, H.F.; Hodge, B.M. The value of
day-ahead solar power forecasting improvement. Sol. Energy 2016, 129, 192–203. [CrossRef]
3. Sediqi, M.M.; Lotfy, M.E.; Ibrahimi, A.M.; Senjyu, T. Stochastic Unit Commitment and Optimal Power
Trading Incorporating PV Uncertainty. Sustainability 2019, 11, 4504. [CrossRef]
4. Kleissl, J. Solar Energy Forecasting and Resource Assessment; Academic Press: Cambridge, MA, USA, 2013.
5. Antonanzas, J.; Osorio, N.; Escobar, R.; Urraca, R.; Martinez-de Pison, F.J.; Antonanzas-Torres, F. Review of
photovoltaic power forecasting. Sol. Energy 2016, 136, 78–111. [CrossRef]
6. Larson, D.P.; Nonnenmacher, L.; Coimbra, C.F. Day-ahead forecasting of solar power output from
photovoltaic plants in the American Southwest. Renew. Energy 2016, 91, 11–20. [CrossRef]
7. Siddiqui, T.A.; Bharadwaj, S.; Kalyanaraman, S. A deep learning approach to solar-irradiance forecasting
in sky-videos. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision
(WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 2166–2174.
8. Ryu, A.; Ito, M.; Ishii, H.; Hayashi, Y. Preliminary Analysis of Short-term Solar Irradiance Forecasting by
using Total-sky Imager and Convolutional Neural Network. In Proceedings of the 2019 IEEE PES GTD
Grand International Conference and Exposition Asia (GTD Asia), Bangkok, Thailand, 20–23 March 2019;
pp. 627–631.
9. Vanderstar, G.; Musilek, P.; Nassif, A. Solar Forecasting Using Remote Solar Monitoring Stations and
Artificial Neural Networks. In Proceedings of the 2018 IEEE Canadian Conference on Electrical & Computer
Engineering (CCECE), Niagara Falls, ON, Canada, 13–16 May 2018; pp. 1–4.
10. Lee, K.H.; Hsu, M.W.; Leu, Y.G. Solar Irradiance Forecasting Based on Electromagnetism-like Neural
Networks. In Proceedings of the 2018 1st IEEE International Conference on Knowledge Innovation and
Invention (ICKII), Jeju Island, Korea, 23–27 July 2018; pp. 365–368.
11. Hassan, M.Z.; Ali, M.E.K.; Ali, A.S.; Kumar, J. Forecasting day-ahead solar radiation using machine learning
approach. In Proceedings of the 2017 4th Asia-Pacific World Congress on Computer Science and Engineering
(APWC on CSE), Nadi, Fiji, 10–12 December 2017; pp. 252–258.
12. Ahmed, R.; Sreeram, V.; Mishra, Y.; Arif, M. A review and evaluation of the state-of-the-art in PV solar
power forecasting: Techniques and optimization. Renew. Sustain. Energy Rev. 2020, 124, 109792. [CrossRef]
13. Huynh, A.N.L.; Deo, R.C.; An-Vo, D.A.; Ali, M.; Raj, N.; Abdulla, S. Near real-time global solar radiation
forecasting at multiple time-step horizons using the long short-term memory network. Energies 2020,
13, 3517. [CrossRef]
14. NREL Solar Radiation Research Laboratory (SRRL). TSI-880 Sky Imager Gallery. 2004. Available online:
https://midcdmz.nrel.gov/apps/imagergallery.pl?SRRL (accessed on 25 September 2019).
15. NREL Solar Radiation Research Laboratory (SRRL). Baseline Measurement System (BMS). 1981. Available
online: https://midcdmz.nrel.gov/srrl_bms/ (accessed on 25 September 2019).
16. Morris, V. Total Sky Imager (TSI). In Handbook; Citeseer: Richland, WA, USA, 2005.
17. NREL Solar Radiation Research Laboratory (SRRL). ASI-16 Sky Imager Gallery. 2017. Available online:
https://midcdmz.nrel.gov/apps/imagergallery.pl?SRRLASI (accessed on 22 February 2020).
18. Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by latent semantic
analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407. [CrossRef]
19. Mirzal, A. The limitation of the SVD for latent semantic indexing. In Proceedings of the 2013
IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia,
29 November–1 December 2013; pp. 413–416.
20. Cherkassky, V.; Mulier, F.M. Learning from Data: Concepts, Theory, and Methods; Wiley-IEEE Press: Hoboken,
NJ, USA, 2007.
21. Klema, V.; Laub, A. The singular value decomposition: Its computation and some applications. IEEE Trans.
Autom. Control 1980, 25, 164–176. [CrossRef]
22. Pal, K.; Patel, B.V. Data Classification with k-fold Cross Validation and Holdout Accuracy Estimation
Methods with 5 Different Machine Learning Techniques. In Proceedings of the 2020 Fourth International
Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 11–13 March 2020;
pp. 83–87.
23. Shalev-Shwartz, S.; Ben-David, S. Understanding Machine Learning: From Theory to Algorithms; Cambridge
University Press: Cambridge, UK, 2019.
24. Pedro, H.T.; Coimbra, C.F. Assessment of forecasting techniques for solar power production with no
exogenous inputs. Sol. Energy 2012, 86, 2017–2028. [CrossRef]
25. Russell, S.J. Artificial Intelligence: A Modern Approach; Pearson: New York, NY, USA, 2002.
26. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer:
Berlin/Heidelberg, Germany, 2013; Volume 112.
27. Xingjian, S.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A
machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems;
Curran Associates, Inc.: Red Hook, NY, USA, 2015; pp. 802–810.
28. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014,
arXiv:1409.1556.
29. He, K.; Sun, J. Convolutional neural networks at constrained time cost. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5353–5360.
30. Tsironi, E.; Barros, P.; Weber, C.; Wermter, S. An analysis of convolutional long short-term memory recurrent
neural networks for gesture recognition. Neurocomputing 2017, 268, 76–86. [CrossRef]
31. Deng, Z.; Zhu, X.; Cheng, D.; Zong, M.; Zhang, S. Efficient kNN classification algorithm for big data.
Neurocomputing 2016, 195, 143–148. [CrossRef]
32. Louppe, G. Understanding random forests: From theory to practice. arXiv 2014, arXiv:1407.7502.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional
affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).