Feature Selection and ANN Solar Power Prediction
Research Article
Received 8 May 2017; Revised 14 September 2017; Accepted 16 October 2017; Published 8 November 2017
Copyright © 2017 Daniel O’Leary and Joel Kubby. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
A novel method of solar power forecasting for individuals and small businesses is developed in this paper based on machine
learning, image processing, and acoustic classification techniques. Increases in the production of solar power at the consumer level
require automated forecasting systems to minimize loss, cost, and environmental impact for homes and businesses that produce
and consume power (prosumers). These new participants in the energy market, prosumers, require new artificial neural network
(ANN) performance tuning techniques to create accurate ANN forecasts. Input masking, an ANN tuning technique developed for
acoustic signal classification and image edge detection, is applied to prosumer solar data to improve prosumer forecast accuracy
over traditional macrogrid ANN performance tuning techniques. Time-of-day masks are tailored to the ANN inputs based on error clustering
in the time domain. Results show an improvement in prediction-to-target correlation (the $R^2$ value), lowering the inaccuracy of sample
predictions ($1 - R^2$) by 14.4%, with corresponding drops in mean absolute error of 5.37% and root mean squared error of 6.83%.
expectation and maximizing the value returned from a solar investment.
ANN’s success is strongly correlated to careful parameter tuning. Input masking, a parameter tuning technique used in
visual recognition systems, has been applied successfully in applications for audio signal classification and wind turbine
power forecasting. This paper details the use of input masking in ANNs unique to short-term solar power forecasting.

Table 1: A glossary of variables used in ANN error equations.
Variable       Definition
$m$            The number of samples in the evaluation
$t$            The sample index
$P_t$          Power produced at time $t$
$\hat{P}_t$    The forecasted power for time $t$
$\bar{P}_t$    Average power, $(1/m) \sum_{t=1}^{m} P_t$
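As a minimal sketch of how the Table 1 variables combine into the assessments reported later, the block below assumes the textbook definitions of MAE, RMSE, and the coefficient of determination $R^2$ rather than the authors' exact equations. The function names and the sample readings are illustrative only.

```python
import math


def mae(targets, predictions):
    """Mean absolute error: (1/m) * sum(|P_t - P_hat_t|)."""
    m = len(targets)
    return sum(abs(p - p_hat) for p, p_hat in zip(targets, predictions)) / m


def rmse(targets, predictions):
    """Root mean squared error: sqrt((1/m) * sum((P_t - P_hat_t)^2))."""
    m = len(targets)
    squared = sum((p - p_hat) ** 2 for p, p_hat in zip(targets, predictions))
    return math.sqrt(squared / m)


def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form of R^2)."""
    m = len(targets)
    mean_power = sum(targets) / m  # the average power from Table 1
    ss_res = sum((p - p_hat) ** 2 for p, p_hat in zip(targets, predictions))
    ss_tot = sum((p - mean_power) ** 2 for p in targets)
    return 1.0 - ss_res / ss_tot


# Hypothetical MPP readings and forecasts in watts (not data from the paper).
example_targets = [0.0, 50.0, 100.0, 150.0]
example_predictions = [10.0, 40.0, 110.0, 140.0]

print(mae(example_targets, example_predictions))
print(rmse(example_targets, example_predictions))
print(r_squared(example_targets, example_predictions))
```

Dividing the RMSE by a reference power, such as the panel rating, would give a normalized value comparable to the nRMSE figures quoted from the literature, although the normalization used in those works may differ.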
Table 2: ANN assessments from the related literature.
Assessment    Related literature values
MRE           1.5% [3]
$R^2$         95% [3], 85% [4]
MAE           53.49 kW [3], 10% [7]
nRMSE         15.82% [3], 7% [4]

Table 3: A table defining the variables used as inputs to the ANN.
Variable                                                         Definition
$t_0$                                                            Current time
$P_{\mathrm{MPP}}(t_0)$                                          Current MPP
$I_{\mathrm{PSP}}(t_0)$                                          Current PSP
$I_{\mathrm{NIP}}(t_0)$                                          Current NIP
$P_{\mathrm{MPP}}(t_{-1}) \cdots P_{\mathrm{MPP}}(t_{-24})$      Past two hours of MPP values
$I_{\mathrm{PSP}}(t_{-1}) \cdots I_{\mathrm{PSP}}(t_{-24})$      Past two hours of PSP values
$I_{\mathrm{NIP}}(t_{-1}) \cdots I_{\mathrm{NIP}}(t_{-24})$      Past two hours of NIP values
$S(-1)_{\mathrm{MPP}}$, $S(-2)_{\mathrm{MPP}}$                   Slope of MPP at five and ten minutes
$S(-1)_{\mathrm{PSP}}$, $S(-2)_{\mathrm{PSP}}$                   Slope of PSP at five and ten minutes
$S(-1)_{\mathrm{NIP}}$, $S(-2)_{\mathrm{NIP}}$                   Slope of NIP at five and ten minutes

4. Data

This work uses data that was collected at five-minute intervals continuously from May 2011 to August 2012. The data
was collected from measurements on a dual axis tracking
polycrystalline silicon photovoltaic module with 170-watt maximum power installed at the RE Lab at NASA Ames in the
Moffett Field Air Force Base in Mountain View, California. This data and the corresponding predictions include night
time readings, when energy production is zero. Including night data allows us to use the full day to identify time regions
of high and low prediction accuracy for input masking. The data samples consist of four measurements:
(i) Timestamp. The date and time when samples are taken.
(ii) Normal Incidence Pyrheliometer (NIP). A measure of the direct beam solar irradiance (W/m$^2$).
(iii) Precision Spectral Pyranometer (PSP). A measure of the total irradiance in the plane of the array (W/m$^2$).
(iv) Maximum Power Point (MPP). A measure of the power produced from the solar panel (W).
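To make the step from these four measurements to the ANN inputs of Table 3 concrete, the following is a minimal sketch rather than the authors' implementation. The `Sample` record, the function names, and the encoding of the current time as a fractional hour of day are all assumptions. Enumerating every Table 3 row this way gives 82 values rather than the seventy-eight input neurons reported in Section 5; the text does not say which entries account for the difference, so the exact layout should be treated as illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LAGS = 24          # 24 five-minute samples cover the previous two hours
STEP_MINUTES = 5   # sampling interval stated for the dataset


@dataclass
class Sample:
    timestamp: datetime
    nip: float  # direct beam irradiance, W/m^2
    psp: float  # plane-of-array irradiance, W/m^2
    mpp: float  # power at the maximum power point, W


def slope(series, i, x):
    """Magnitude of the slope between sample i and lagged sample i + x (x < 0)."""
    return abs(series[i] - series[i + x]) / (abs(x) * STEP_MINUTES)


def build_inputs(samples, i):
    """Assemble one input vector for the sample at index i, following Table 3."""
    if i < LAGS:
        raise ValueError("need two hours of history before index i")
    t0 = samples[i].timestamp
    features = [t0.hour + t0.minute / 60.0]  # assumed encoding of the current time
    channels = [
        [s.mpp for s in samples],
        [s.psp for s in samples],
        [s.nip for s in samples],
    ]
    for channel in channels:                 # current MPP, PSP, NIP
        features.append(channel[i])
    for channel in channels:                 # past two hours of each channel
        features.extend(channel[i - k] for k in range(1, LAGS + 1))
    for channel in channels:                 # slopes at five and ten minutes
        features.extend(slope(channel, i, x) for x in (-1, -2))
    return features


# Hypothetical readings: a steady ramp, five minutes apart.
start = datetime(2011, 5, 1, 0, 0)
demo_samples = [
    Sample(start + timedelta(minutes=STEP_MINUTES * n), 0.0, float(n), 2.0 * n)
    for n in range(30)
]
demo_inputs = build_inputs(demo_samples, 24)
print(len(demo_inputs))
```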
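Figure 1: The artificial neural network (ANN) with standard preprocessing contains two previous hours’ worth of measurements
and two slopes as input and a twenty-minute prediction window. [Diagram: an input layer with nodes $t_0$, $P_{\mathrm{MPP}}(0)$ to
$P_{\mathrm{MPP}}(-24)$, $I_{\mathrm{PSP}}(0)$ to $I_{\mathrm{PSP}}(-24)$, $I_{\mathrm{NIP}}(0)$ to $I_{\mathrm{NIP}}(-24)$, $S(-1)$, and $S(-2)$; a hidden layer
with neurons numbered 1 to 20; and an output layer with node $P_{\mathrm{MPP}}(5)$.]

To describe the inputs to the ANN calculated from the measurements above, allow $t_n$ to be the time of a particular
sample. Further, let $n$ be an index of the sample taken in relation to the sample under consideration. Thus, $n$ is 0 for
the sample in the dataset under investigation and $t_1$ is the timestamp of the next sample taken, five minutes later, and
$t_x$ is the $x$th sample taken $5x$ minutes after time $t_0$. Similarly, $t_{-1}$ is the sample taken 5 minutes prior to $t_0$ and $t_{-24}$ is the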
sample taken 2 hours earlier.
Further, allow the function $P_{\mathrm{MPP}}(t)$ to map time values to MPP samples. $P_{\mathrm{MPP}}(t_0)$ is the power produced for the
sample under consideration, $P_{\mathrm{MPP}}(t_{-4})$ is the MPP value for the sample twenty minutes prior to $t_0$, and $P_{\mathrm{MPP}}(t_4)$ is the
MPP value for the sample twenty minutes after $t_0$. Similarly, allow $I_{\mathrm{PSP}}(t)$ to be the irradiance measured by the PSP (total
plane of array irradiance) at time $t$ and $I_{\mathrm{NIP}}(t)$ to be the irradiance measured by the NIP (direct normal irradiance)
at time $t$.
Beyond the measurements themselves, the absolute value of the slope of the MPP curve is calculated using

$$S(x) = \left| \frac{P_{\mathrm{MPP}}(t_0) - P_{\mathrm{MPP}}(t_x)}{t_0 - t_x} \right| . \qquad (6)$$

Similar equations are used to find the magnitude of the slope of the PSP and NIP. The magnitude of the slopes of
measurement values gives an indication of variably cloudy days. As clouds pass over the irradiation measurement tools,
irradiance varies greatly. During sunny days, irradiance slopes remain low.
Knowing this nomenclature, we can now describe the inputs to the ANN in Table 3. Colloquially, each sample
includes the current measurements, the power for the previous two hours, and the slope of the previous two samples.

5. ANNs

Artificial neural networks are mathematical constructs based on the physical structures of a biological system, the brain.
Typically, a preprocessing step is performed on data before it enters the ANN. We have performed a min–max
normalization, so that all inputs, outputs, and targets fall between zero and one.
The artificial neural network (ANN) is made up of layers of neurons. Figure 1 depicts the version of the ANN structure
4 Journal of Renewable Energy
[Figure 2 plot: prediction (W) on the vertical axis versus target (W) on the horizontal axis.]
Figure 2: A model fitness graph for Max Power Point (MPP) target versus prediction analysis of a standard, normalized input ANN.
Black lines and grey boxes have been added to the diagram to identify and assess target/prediction outliers with high error values.
[Figure 3 pie chart: slices of 47.5% (<2 W), 24.6%, 12.5%, 11.1%, and 4.3%; the other recovered legend entries are 4–10 W, 10–40 W,
and >40 W.]
Figure 3: A pie chart of the proportion of predictions in different error categories. Nearly half (47.5%) of all predictions have an error
of less than 2 W.
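The error categories of Figure 3 amount to binning the absolute error of each target/prediction pair. The sketch below shows one way to compute such shares. The bin edges follow the legend entries that survive (<2 W, 4–10 W, 10–40 W, >40 W); the 2–4 W bin is an assumption made to close the gap between them, and the function names and sample values are hypothetical.

```python
from collections import Counter

# Upper edges (in watts) of the absolute-error bins. The "2-4 W" bin is assumed.
BIN_EDGES = [
    (2.0, "<2 W"),
    (4.0, "2-4 W"),
    (10.0, "4-10 W"),
    (40.0, "10-40 W"),
]
LAST_LABEL = ">40 W"
ALL_LABELS = [label for _, label in BIN_EDGES] + [LAST_LABEL]


def error_category(target, prediction):
    """Return the label of the bin holding the absolute error of one pair."""
    error = abs(target - prediction)
    for upper_edge, label in BIN_EDGES:
        if error < upper_edge:
            return label
    return LAST_LABEL


def category_shares(targets, predictions):
    """Percentage of target/prediction pairs that fall in each error bin."""
    counts = Counter(error_category(t, p) for t, p in zip(targets, predictions))
    total = len(targets)
    return {label: 100.0 * counts[label] / total for label in ALL_LABELS}


# Hypothetical target and prediction values in watts.
shares = category_shares([0, 0, 0, 0, 100], [1, 3, 5, 20, 150])
print(shares)
```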
used with the RE Lab data. This ANN consists of three layers: the input layer, the hidden layer, and the output layer. Each
neuron in the input layer takes in one data source. The output of each input layer neuron is input for each of the hidden layer
neurons. This ANN has seventy-eight input neurons; each neuron in the hidden layer will have seventy-eight inputs.
This ANN relationship occurs a second time between the output of a neuron in the hidden layer and the inputs of the
output layer. Thus, if you have twenty neurons in the hidden layer, you will have twenty inputs for each neuron in the
output layer.

The ANN model randomly divides the total set of data from the RE Laboratory into three main categories: training,
validation, and testing. The test set consists of 20% of the total dataset. Of the remaining 80% of the data, 85% is allocated
to training and 15% is allocated to validation (68% and 12% of the total, respectively). The training of an ANN involves
multiple cycles of training, referred to as epochs. Each epoch consists of the ANN training on the training dataset, after which
the resulting neural network is applied to the validation dataset. If the RMSE of the prediction on the validation set is lower
than the previous validation RMSE, indicating that the training of the ANN has improved accuracy, training continues. The ANN
will continue to train and validate in a cycle until the validation dataset RMSE does not improve for ten consecutive epochs.
While the ANN learns from the training dataset, it uses backpropagation to adjust the weights on each neuron based
on the error of the output layer, and each neuron’s output is dictated by a sigmoidal activation function.

6. ANNs with Standard Preprocessing

Prior to inputting the RE Lab data into the ANN, measurements were normalized, such that each input varies from 0 to
1. Figure 2 shows the correlation between the predictions of the ANN and our total dataset that was divided randomly into
three categories: test, validation, and training. The unmasked ANN was trained on the training data and validated on the
validation data, and when the validation showed that the training was done, the resulting ANN was tested on the test
data. We then returned to the original, total dataset, divided it randomly into three categories (training, test, and validation),
and developed the masked ANN on the training data; once it was validated that the training was done, it was tested on the
testing dataset. That is to say, each ANN pulls from the same dataset and uses the same ratio of training : validation : test;
however, the contents of each category are unique for each run of each ANN. Figure 2 correlates the predictions of the
ANN on the test dataset to the max power point (MPP) of the solar panel 20 minutes in the future (the prediction
value, found along the vertical axis) and the actual MPP that was measured twenty minutes later (the target value,
found along the horizontal axis). The accuracy measurements for the ANN, with an $R^2$ value of 90.88%, an RMSE of
16.98 W, and a MAE of 6.33 W (for a 170 W panel), are in line with other ANN forecasting accuracy correlation coefficients
under similar circumstances [3].

A perfect prediction in the model fitness chart occurs along the 45-degree line. Two black lines have been added to
the model fitness chart to indicate a twenty-watt error range, roughly 10%. Further, grey boxes have been added to indicate
regions of high error with similar characteristics.

A cursory look at Figure 2 could easily lead the reader to the conclusion that the ANN has a high forecasting
error, despite the strong correlation coefficient. However, the model fitness diagram points overlap in areas where
the prediction/target points cluster, at low and high power production times. Consequently, the vast majority (ninety
percent of points) fall in between the two black lines, indicating a target/prediction pair with an absolute error of
less than twenty watts. The pie chart in Figure 3 breaks out the percentage of samples in different absolute error ranges. Less
[Figure 4 panels: (a) Region 1, (b) Region 2, (c) Region 3, (d) Region 4. Each panel is an outlier error histogram of frequency versus
time of day (hour in mil. time), with an inset "Model fitness: outliers" graph of prediction (W) versus target (W).]
Figure 4: Outliers are filtered by region on the model fitness graph. The model fitness graph is reproduced in the inlay of each chart, with
the points within the region indicating the area of the graph the region contains. The histograms show the frequency of each outlier in that
region grouped by hour of day.
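The grouping behind histograms like those in Figure 4 can be sketched as follows. This is a simplified illustration, not the authors' procedure: instead of the four grey-box regions, whose bounds are not given in the text, it separates outliers into overpredictions and underpredictions beyond the twenty-watt band and counts them per hour of day. The function name, the record layout, and the sample values are hypothetical.

```python
from collections import Counter

ERROR_BAND_W = 20.0  # width of the error band drawn on the model fitness graph


def outlier_hours(records, band=ERROR_BAND_W):
    """Count outliers per hour of day.

    records: iterable of (hour_of_day, target_w, prediction_w) tuples.
    Returns two Counters keyed by hour: overpredictions and underpredictions.
    """
    over, under = Counter(), Counter()
    for hour, target, prediction in records:
        error = prediction - target
        if error > band:        # forecast too high by more than the band
            over[hour] += 1
        elif error < -band:     # forecast too low by more than the band
            under[hour] += 1
    return over, under


# Hypothetical test-set records: (hour, target in W, prediction in W).
demo_records = [
    (14, 50.0, 100.0),
    (15, 40.0, 90.0),
    (14, 60.0, 110.0),
    (6, 80.0, 20.0),
    (7, 90.0, 30.0),
    (12, 100.0, 105.0),  # inside the band, so not an outlier
]
over_by_hour, under_by_hour = outlier_hours(demo_records)
print(sorted(over_by_hour.items()))
print(sorted(under_by_hour.items()))
```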
than fifteen percent of all predictions are off by more than ten watts.

A closer examination of the high error forecasts shows a clustering of high error points around specific times of
the day. Dividing the model fitness outliers into groups based on their location in the model fitness graph yields a
correlation between high error and time of day. Figure 4 examines different regions and their corresponding time-of-day
frequency. Figures 4(a) and 4(c) identify regions where the ANN overpredicts. The histograms in Figures 4(a) and 4(c)
show the frequency of these forecasts throughout the day, while the inset model fitness graph indicates the region being
considered. Peaks in the fourteen- and fifteen-hour range (2:00 p.m. to 4:00 p.m.) indicate that these overpredictions are
consistent with the time that the panel begins to lose light due to the setting sun and local obstructions. Similarly, the
inset model fitness graphs in Figures 4(b) and 4(d) define underpredictions that, when reviewed by time of day in the
histogram, show a high frequency of these errors occurring in the six- and seven-hour range (6:00 a.m. to 8:00 a.m.),
corresponding to the time the solar panel first begins to get light in the mornings.

6.1. Masking Inputs. Analysis of the standard preprocessing ANN shows four distinct time frames, shown in Figure 5,
characterized by the error rates in Figure 4:
(i) Night: when solar energy production is essentially zero.
(ii) Sunrise: one of the two time frames with the highest error rate due to the high volatility of the solar energy
production data.
(iii) Day: when solar energy is consistent (on sunny days) and therefore more predictable than sunrise or sunset.
(iv) Sunset: one of the two time frames with the highest error rate due to the high volatility of the solar energy
production data.

Table 4: Error values for nonmasked versus masked ANN.
Assessment    Nonmasked    Masked
$R^2$         90.88%       92.2%
RMSE          16.98 W      15.82 W
MAE           6.33 W       5.99 W

[Figure 6 chart: titled "Masking inputs by time of day"; plotted series labeled "Dawn mask" and "PSP_dawn".]
Figure 6: The impact of masking on the PSP inputs. The same masking was applied to the NIP and MPP inputs.
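One plausible reading of time-of-day input masking, suggested by the "Dawn mask" and "PSP_dawn" labels in Figure 6, is that each measurement series is multiplied by a binary mask that is one inside a time frame and zero elsewhere. The sketch below illustrates that reading only. The hour boundaries are hypothetical, loosely chosen from the error peaks described above (6:00 to 8:00 a.m. and 2:00 to 4:00 p.m.); the source does not state the limits the authors used, and the function names are assumptions.

```python
# Hypothetical hour boundaries [start, end) for each daylight time frame.
# Any hour outside these ranges is treated as night.
TIME_FRAMES = {
    "sunrise": (6, 8),
    "day": (8, 14),
    "sunset": (14, 16),
}


def time_frame(hour):
    """Name of the time frame that contains the given hour of day."""
    for name, (start, end) in TIME_FRAMES.items():
        if start <= hour < end:
            return name
    return "night"


def mask_series(hours, values, frame):
    """Keep values whose hour falls inside `frame`; zero out all others."""
    return [v if time_frame(h) == frame else 0.0 for h, v in zip(hours, values)]


# Hypothetical PSP readings (W/m^2) at selected hours of one day.
demo_hours = [5, 6, 7, 9, 15, 20]
demo_psp = [0.0, 50.0, 200.0, 600.0, 300.0, 0.0]

psp_dawn = mask_series(demo_hours, demo_psp, "sunrise")
psp_sunset = mask_series(demo_hours, demo_psp, "sunset")
print(psp_dawn)
print(psp_sunset)
```

Under this reading, the masked series would be supplied to the ANN as inputs for the NIP and MPP channels as well, so the network can weight the volatile sunrise and sunset periods separately from the steadier daytime data.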
Conflicts of Interest
The authors declare that there are no conflicts of interest
regarding the publication of this paper.
Acknowledgments
This work is the direct result of the patience and mentorship
of Professor Joel Kubby. The authors would like to thank the
University of California, Santa Cruz (UCSC) and the National
Aeronautics and Space Administration for the opportunities
they have provided. Further thanks are due to Dr. Oscar
Azucena, Samuel Kahn, and Steve Willis for their help and
expertise. This material is based on work supported by the
National Science Foundation under Grant no. 0942439.
References
[1] Y. Zhang, M. Beaudin, H. Zareipour, and D. Wood, “Forecasting
Solar Photovoltaic power production at the aggregated system
level,” in Proceedings of the 2014 North American Power Sympo-
sium, NAPS ’14, pp. 1–6, 2014.
[2] A. Mellit, M. Benghanem, and M. Bendekhis, “Artificial neural
network model for prediction solar radiation data: application
for sizing stand-alone photovoltaic power system,” in Proceed-
ings of the IEEE Power Engineering Society General Meeting, vol.
1, pp. 40–44, June 2005.
[3] H. T. C. Pedro and C. F. M. Coimbra, “Assessment of forecasting
techniques for solar power production with no exogenous
inputs,” Solar Energy, vol. 86, no. 7, pp. 2017–2028, 2012.
[4] E. İzgi, A. Öztopal, B. Yerli, M. K. Kaymak, and A. D. Şahin,
“Short-mid-term solar power prediction by using artificial
neural networks,” Solar Energy, vol. 86, no. 2, pp. 725–733, 2012.
[5] Z. Wang and I. Koprinska, “Solar power prediction with data
source weighted nearest neighbors,” in Proceedings of the 2017
International Joint Conference on Neural Networks (IJCNN), pp.
1411–1418, IEEE, Anchorage, Alaska, USA, May 2017.
[6] W. Cabrera, D. Benhaddou, and C. Ordonez, “Solar power
prediction for smart community microgrid,” in Proceedings of
the 2nd IEEE International Conference on Smart Computing,
SMARTCOMP ’16, pp. 1–6, May 2016.
[7] R. Palma-Behnke, C. Benavides, E. Aranda, J. Llanos, and
D. Sáez, “Energy management system for a renewable based
microgrid with a demand side management mechanism,” in
Proceedings of the Symposium Series on Computational Intelli-
gence, IEEE SSCI 2011 - 2011 IEEE Symposium on Computational
Intelligence Applications in Smart Grid, CIASG 2011, pp. 1–8,
April 2011.
[8] M. Pourhomayoun, P. Dugan, M. Popescu, and C. Clark, Bioacoustic
signal classification based on continuous region processing,
grid masking and artificial neural network, 2013,
https://arxiv.org/abs/1305.3635.
[9] H. Liu, H.-Q. Tian, D.-F. Pan, and Y.-F. Li, “Forecasting models
for wind speed using wavelet, wavelet packet, time series and
Artificial Neural Networks,” Applied Energy, vol. 107, pp. 191–
208, 2013.