Intra-Hour Solar Irradiance Forecasting Using Topology Data Analysis and Physics-Driven Deep Learning
Renewable Energy
journal homepage: www.elsevier.com/locate/renene
ARTICLE INFO

Keywords:
Time series analysis
Topology data analysis
LSTM
DNI forecasting
TLDP

ABSTRACT

Real-time Direct Normal Irradiance (DNI) prediction is crucial for reliable and economic operation of Concentrated photothermal Solar Power (CSP) system in arid desert areas. However, the stochastic characteristics of short-term multidimensional meteorological time series make intra-hour DNI prediction a challenging task. In this study, we have proposed a deep learning model called TLD, which is combined with topological features captured by Topology Data Analysis (TDA) and temporal features captured by LSTM to address this challenge. Experimental results demonstrated that TLD outperformed the five latest models (Ridge, RF, C_GRU, BiLSTM, and GBRT) on seven solar radiation datasets in arid desert areas. Further analysis revealed that the proportion of cloudy days is a key factor affecting the model’s performance. To enhance the forecast ability of TLD, we developed a physics-informed hybrid model named TLDP based on TLD and a smart persistence model, which fully combines the DNI prediction ability of TLD under cloudy conditions and that of the smart persistence model under sunny conditions. Experimental results of eight datasets collected from real-world solar photothermal power stations indicated that TLDP outperformed existing models, which may lay a foundation for more economical and stable operation of CSP plants in arid desert areas.
* Corresponding author. Systems Engineering Institute, School of Automation, Xi’an Jiaotong University, Xi’an, Shaanxi, 710049, China.
** Corresponding author.
E-mail addresses: [email protected] (Y. Wang), [email protected] (Q. Peng).
https://doi.org/10.1016/j.renene.2024.120138
Received 31 May 2023; Received in revised form 12 November 2023; Accepted 11 February 2024
Available online 23 February 2024
0960-1481/© 2024 Elsevier Ltd. All rights reserved.
T. Han et al. Renewable Energy 224 (2024) 120138
Fig. 1. Operating principles of CSP plants.

to track the sun accurately, which ensures that maximum solar energy is captured and converted into electricity. Moreover, short-term DNI prediction enables operators to anticipate adverse weather conditions in order to take preventive measures to minimize downtime, and to forecast the plant’s energy production precisely [14,15]. However, due to the rapid changes in weather conditions, the complex interactions between various meteorological factors and the stochastic nature of these factors, short-term DNI prediction is recognized as more challenging than long-term and medium-term DNI prediction [16,17]. For instance, cloud cover is the main factor causing sudden DNI changes, and cloud motion is often accompanied by cloud generation and dissipation, which makes modeling cloud motion trajectories difficult and leads to a large number of stochastic features in DNI prediction. Additionally, since large-scale solar farms are mostly built in sparsely populated desert areas and plateau regions, solar radiation is influenced by ground and high-altitude factors [18,19]. Furthermore, factors such as sunshine duration and atmospheric moisture content exhibit a certain degree of randomness, posing challenges to accurate short-term DNI prediction.

Many researchers believed that machine learning methods with the ability to capture nonlinear characteristics are effective means to predict stochastic processes of solar radiation fluctuations [20]. Much effort has been devoted to developing short-term DNI prediction models from the perspective of data analysis, which can be divided into three categories: image-based models, time-series-based models, and hybrid models.

Since cloud cover is recognized as the main factor causing sudden changes in DNI, many image-based models have been proposed to capture cloud cover information based on all-sky images to predict short-term DNI [21]. Chu et al. developed a system for solar irradiance estimation by obtaining cloud features of all-sky images on both regional and global scales [22]. Bijan Nouri et al. developed a real-time capable nonparametric probabilistic quantile nowcasting method based on all-sky imager (ASI) systems [23]. Quentin Paletta et al. developed a single machine learning framework to improve intra-hour (up to 60-min ahead) irradiance forecasting based on all-sky cameras (up to 30-min ahead) and satellite observations [24]. To further improve prediction accuracy, scholars have developed hybrid models by adding meteorological time series data analysis to the image-based models. Caldas et al. proposed a hybrid model with real-time irradiance measurements and all-sky images to generate a 1 to 10 min-ahead forecast of 1-min averaged solar radiation [25]. Nouri et al. developed a hybrid solar irradiance nowcasting approach combining all-sky imager systems and persistence irradiance models [26]. Xiaoqiao Huang et al. developed a hybrid 3D ConvLSTM-CNN network combining all-sky-image data and meteorological data for forecasting solar irradiance [27]. However, the effectiveness of image-based models and hybrid models depends on the high spatial and temporal resolutions of satellite and ground-based images [28], and the high cost of image capture facilities limits the widespread use of image-based models for real-time short-term DNI prediction.

In contrast to image-based and hybrid models, time series-based models are easy to use for short-term DNI prediction, benefiting from the convenient acquisition of relevant meteorological time series data. These models can be divided into physical models, machine learning models and deep learning models [29]. Physical models typically combine terrain, weather, and other aerodynamic factors to establish a prediction model that calculates the changes of related meteorological factors through numerical weather prediction (NWP). However, the high calculation cost of this type of approach [30] makes it ill-suited for short-term, real-time solar irradiance prediction. Machine learning models, such as Auto-Regressive Moving Average (ARMA) models [31], Support Vector Machines (SVM) [32], and Artificial Neural Networks (ANN) [33], have also been proposed. However, conventional ML models can only extract shallow features instead of high-dimensional nonlinear features among multidimensional time series [34]. With the advantage of nonlinear feature extraction capabilities, deep learning models have been applied to solar radiation prediction and achieved good prediction performance. Fahim et al. proposed a system to forecast Bangladesh solar radiation using three different networks, including Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM),
and Gated Recurrent Unit (GRU), among which the GRU model showed the best prediction results with a MAPE score of 19.28% [35]. Anil Kumar et al. proposed a novel approach for forecasting solar energy by transforming multipoint time series into images and using a multi-column convolutional neural network (MCNN) to make predictions [36]. Bhatt et al. proposed three deep learning models to forecast solar radiation from 1 step (15 min) to 6 steps (1 h and 30 min) ahead [37]. However, the stochastic characteristics involved in multidimensional meteorological time series are key influencing factors for short-term DNI prediction. Although current deep learning algorithms have demonstrated outstanding ability in high-dimensional nonlinear feature extraction, they do not yet have the ability to capture stochastic features among multidimensional variables. Topological Data Analysis (TDA), a technology that combines computational topology and data science, was proposed to mine the valuable relationships hidden in big data clouds [38], and has been widely applied in many fields such as computer vision and understanding of protein folding [39–42]. Persistent Homology (PH) is a representative method in TDA to derive topological structures from a given data cloud. Compared with commonly used data analysis methods such as principal component analysis and clustering analysis, PH not only effectively captures topological information in high-dimensional data spaces, but also excels at discovering small stochastic characteristics that cannot be discovered using traditional methods. The topological features also have useful theoretical properties, such as robustness and scale invariance, which make them resilient to local perturbations. Incorporating TDA features into the construction of deep learning models may be helpful for short-term DNI prediction.

Therefore, in this study, considering the stochastic features obtained by TDA and multidimensional meteorological time series, we developed a physics-driven deep learning model called TDA-LSTM-DNN (TLD) for short-term DNI prediction. Firstly, PH was used to extract the topological features of meteorological time series including Global Horizontal Irradiance (GHI), Direct Normal Irradiance (DNI), Diffuse Horizontal Irradiance (DHI), Average Pressure (AP), Relative Humidity (RH), Wind Speed (WS), Wind Direction (WD), Solar Zenith Angle (SZA), Surface Albedo (SA), Temperature (Tem), Ozone, Clear sky GHI (CGHI), Clear sky DNI (CDNI) and Clear sky DHI (CDHI), on the time dimension and the feature dimension respectively, as the stochastic features. Then, considering the physical significance of these meteorological and stochastic features, the TLD was developed for intra-hour DNI forecasting. In constructing the neural network structure of TLD, we drew inspiration from the information processing of the human brain, partitioning and blocking the variables by considering their physical significance. The experimental results have shown that, compared with the existing DNI prediction models, TLD delivered the best DNI prediction ability on 8 representative datasets in arid desert areas, which benefited from the stochastic features abstracted by TDA and the well-designed physics-driven network structure. Furthermore, a novel physics-informed hybrid model combining TLD and a smart persistence model, called TLDP, was developed to enhance the predictive ability of the TLD under different weather conditions. The experimental results showed that TLDP’s DNI prediction ability was greatly improved in three weather conditions, which provided significant support for intra-hour DNI prediction in arid desert areas.

The contents of the whole paper are structured as follows: In Section 2, we provided a brief review of related materials and methods. Section 3 described the effectiveness of TDA in capturing stochastic features of multidimensional meteorological time series and the framework of TLD. In Section 4, we presented the experimental results and compared them with the advanced models, and constructed the framework of TLDP to improve TLD’s prediction performance. Finally, we concluded the paper in Section 5.

2. Related works

2.1. TDA

TDA was first proposed by Gunnar as a novel data analysis method that applied the principles of algebraic topology to statistical analysis [43]. This approach can analyze ultra-high dimensional data without performing data dimension reduction, making it useful for retaining raw data information during analysis. As such, TDA can effectively abstract the essential features within high-dimensional data under various metrics [44]. Persistent Homology (PH) is the most critical topology analysis method in TDA [45]. In PH, we start with a collection of data points forming a data cloud. An independent variable ε is defined as the radius of an imaginary ball centered at each data point. As ε gradually increases, the balls grow outwards and gradually touch each other. The overlap of these balls generates a unique topological characteristic specific to the dataset. We can use this unique topological characteristic to differentiate nuances in the topologies of different point clouds. The schematic diagram showing a data cloud and how the connection relationship between data points changes as ε increases can be seen in Fig. 2.

The geometric structure shown in the process of Persistent Homology is called a simplex. For example, we can decompose an arbitrary simplicial complex into 0-simplexes (nodes), 1-simplexes (links), 2-simplexes (faces) and 3-simplexes (tetrahedrons). The standard topological descriptors are the persistence diagram [46] and the persistence barcode [47], which give a multiscale representation of the homology of the geometric construction [48] (Fig. 3).

How to vectorize these topological features into data that can be recognized by machine learning is a key scientific problem. Persistence Landscapes was proposed by Peter Bubenik as a topological feature representation method and has been widely used in many fields [49]. The principle of Persistence Landscapes is to apply certain transformations to the persistence diagram. The landscape is defined by considering the set of functions created by tenting each point p = (d, b) representing a birth-death pair in the persistence diagram as follows:

$$
\psi_p(t) =
\begin{cases}
t - d, & t \in \left[ d, \frac{d+b}{2} \right] \\
b - t, & t \in \left( \frac{d+b}{2}, b \right] \\
0, & \text{otherwise}
\end{cases}
\qquad (1)
$$

The persistence landscape of the persistence diagram is the collection of functions $\{\psi_p\}_p$:

$$
\lambda_{\mathrm{dgm}}(k, t) = \operatorname{max}_k \left( \psi_p(t) \right), \quad t \in [0, T], \ k \in \mathbb{N} \qquad (2)
$$

where the parameter k represents the kth landscape of the persistence diagram, and max_k is the kth largest value in the set (Fig. 4).

Here uniform sampling of different persistence landscapes can be employed to characterize the topological features of a multidimensional meteorological time series.

2.2. LSTM

LSTM (Long Short-Term Memory) is a significant variant of RNN [50]. Traditional RNN has the problem of gradient vanishing or exploding. In order to overcome this issue, LSTM introduced an internal state called the ‘cell state’ that enables the network to selectively forget or remember previous information and capture long sequence dependencies effectively [51]. LSTM has three gating mechanisms, namely the forget gate, the input gate, and the output gate, which respectively control the forgetting and updating of cell states, the amount of new information, and the output content. These gating mechanisms are
Fig. 2. The schematic diagram showing a data cloud, and how the connection relationship between data points changes as ε increases. The number of 0-simplexes and 1-simplexes can be deduced from the subfigures to be roughly 8 → 3 → 1 → 0 and 0 → 4 → 6 → 13 respectively.
Fig. 3. An example of the persistence diagram (left) and persistence barcode (right). The points in the persistence diagram represent simplicial complexes, with different colors representing different dimensions. The X-axis of the persistence diagram represents the radius value at which a simplicial complex appears, and the Y-axis represents the radius value at which the corresponding simplicial complex disappears. Different lines in the persistence barcode represent different simplicial complexes. The X-axis of the persistence barcode represents the radius value, and the Y-axis represents the index of the different simplicial complexes.
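As an illustration of how a barcode arises from growing balls around data points, the following minimal sketch computes only the 0-dimensional bars (connected components) of a small point cloud with a union-find structure. It is not the authors' implementation: the function name `h0_barcode`, the toy point cloud, and the convention that two balls of radius ε touch once ε reaches half the distance between their centers are assumptions made for this example, and higher-dimensional features such as loops are not computed.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return 0-dimensional persistence bars (birth, death) of a point cloud.

    Assumption: every point is born at radius 0, and two balls of radius eps
    touch when eps equals half the Euclidean distance between their centers.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate merge events, ordered by the radius at which the balls touch.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for eps, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at radius eps, so one of them dies here.
            parent[root_i] = root_j
            bars.append((0.0, eps))

    # The last surviving component never dies.
    bars.append((0.0, math.inf))
    return bars


# Hypothetical toy cloud: three nearby points and one distant point.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
bars = h0_barcode(cloud)
for birth, death in bars:
    print(f"birth={birth:.3f}, death={death:.3f}")
```

Short bars correspond to components that merge quickly (the three nearby points), while the long finite bar reflects the distant point, which is the kind of multiscale information the barcode in Fig. 3 summarizes.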
Fig. 4. An example of a persistence landscape is shown in the figure on the right, which is associated with the persistence diagram shown on the left. The first landscape is shown in red, the second one in blue, and the remaining landscapes are all zeros.
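The tent functions and the kth-largest rule behind a persistence landscape can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the names `tent`, `landscape` and `sample_landscapes`, the toy diagram, and the sampling grid are hypothetical, and the sketch assumes each tent rises from the first coordinate of a pair, peaks at the midpoint, and returns to zero at the second coordinate.

```python
def tent(birth, death, t):
    """Tent function of one birth-death pair, evaluated at t.

    Assumption: the tent rises from `birth`, peaks at the midpoint,
    and falls back to zero at `death`.
    """
    mid = (birth + death) / 2.0
    if birth <= t <= mid:
        return t - birth
    if mid < t <= death:
        return death - t
    return 0.0


def landscape(diagram, k, t):
    """Value of the k-th landscape at t: the k-th largest tent value (k >= 1)."""
    values = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


def sample_landscapes(diagram, num_landscapes, grid):
    """Uniformly sample the first `num_landscapes` landscapes on `grid`."""
    return [
        [landscape(diagram, k, t) for t in grid]
        for k in range(1, num_landscapes + 1)
    ]


# Hypothetical persistence diagram with two nested birth-death pairs.
diagram = [(0.0, 4.0), (1.0, 3.0)]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
features = sample_landscapes(diagram, 3, grid)
for k, row in enumerate(features, start=1):
    print(f"landscape {k}: {row}")
```

With two pairs in the diagram, only the first two landscapes are non-zero, which matches the situation described in Fig. 4. Flattening the sampled rows yields a fixed-length vector that a learning model can consume.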
Fig. 5. The figures in the first row are the comparisons between the original time series (blue line) and the time series with added noise (red line). The figures in the
second row are the persistence barcodes, which are denoted as TDA features of the original time series. The figures in the third row are the persistence barcodes,
which are denoted as TDA features of the time series with added noise. The noise added in the original time series decreases gradually from left to right.
3.2. The framework of TLD

To predict short-term DNI accurately, TLD draws on the idea of model decomposition: a deep learning model first extracts the features of variables with similar physical meanings, and features with similar physical meanings are then combined to obtain advanced features. Considering that different variables have different physical meanings [56], we divided them into the following categories (Fig. 6a). For topological features of multidimensional solar radiation time series, we extracted spatial features and temporal features by using TDA respectively, which are named TDA_feature and TDA_time.

Taking the subnet that generates Solar Features as an example, we briefly explain the construction process of TLD. Firstly, the LSTM network was used to extract the temporal features of variables on the time scale, and then we used the DNN network to combine the extracted features to obtain a more global feature. The subnet for constructing Solar Features is shown in Fig. 6b.

TLD consists of two basic structures, the LSTM network and the DNN network. The LSTM network contains two LSTM units, which are used to extract the context features of time series. The DNN network is a fully connected neural network with two layers. The overall framework of TLD is shown in Fig. 7.

Fig. 6. (a) The categories of meteorological factors related to solar radiation. (b) The subnet for extracting solar features.

4. Experimental evaluation

In this section, we compared TLD with 8 DNI prediction models to evaluate the effectiveness of TLD.

4.1. Data description

Rich solar energy resources are a primary prerequisite for the development of CSP plants [57]. Based on the classification of global solar radiation and daylight hours, the regions with the highest intensity of solar radiation and the best amount of daylight hours worldwide include North Africa, the Middle East, Southwest United States, Mexico, Southern Europe, Australia, South Africa, Southeast South America and China [58]. Currently, almost all operating, under construction, and planned solar thermal power stations in the world are located in these countries and regions. Spain and the United States, with the largest solar thermal power capacity in the world, were the earliest to develop the
CSP plants. India, Morocco, South Africa and Chile started later, and their currently operating CSP capacity is relatively small. However, they are increasing their CSP generation capacity, with several solar thermal power projects currently under construction. Furthermore, many countries have announced the development of CSP projects, such as China, which started relatively late but whose CSP power generating capacity under construction is in a leading position in the world.

According to statistics provided by the International Renewable Energy Agency (IRENA) at the end of December 2022 [59], Spain ranked first globally with a total installed capacity of 23.0 GW for solar thermal power stations, accounting for nearly half of the global solar thermal power generation. The United States ranked second with a total installed capacity of 17.7 GW. Other countries with significant CSP capacity include India, South Africa, United Arab Emirates, Algeria, and Morocco. To study solar radiation and its potential for CSP generation, the datasets were downloaded from the US National Solar Radiation Database (NSRDB). The NSRDB is a serially complete collection of hourly and half-hourly values of meteorological data [60]. The collected datasets contain minute-level data from 8 regions, including Chile, the United States, Spain, South Africa, India, Morocco, and Qinghai and Tibet in China. These regions represent the working environment of more than 90% of the world’s CSP stations, so conducting DNI research on these datasets has practical significance. In addition, to obtain more comprehensive datasets, we integrated data for two datasets (the Tibet dataset and the Qinghai dataset) based on the collected data provided by our partner, National Power Investment Corporation. The details of the datasets are shown in Table 1.

4.2. The comparison models and standard metrics

To fully validate TLD, we divided the comparison models into four categories: statistical methods, machine learning methods, deep learning methods, and ensemble learning methods. The Ridge regression model (Ridge) represents statistical methods [61]. The Random Forest (RF) model, which has shown great effectiveness in solar radiation prediction, represents machine learning methods [62]. The recently proposed deep learning models for solar radiation prediction called C_GRU and BiLSTM represent deep learning methods [63,64]. The Gradient Boosted Regression Tree (GBRT) represents ensemble learning methods [65]. Additionally, to verify the effectiveness of topological features in DNI prediction, we conducted ablation experiments named GBRT + t, GBRT + f, and GBRT + ft based on the GBRT model, for which we added temporal topology features, spatial topology features, and spatio-temporal topology features, respectively, to the original time series as input data.

To evaluate the performance of TLD and the baseline models, three effective statistical metrics were used [66]. Let the original measured sequence be denoted as $S^m = \{s_1^m, s_2^m, \ldots, s_N^m\}$ and the predicted sequence
Table 1
The details of the datasets.
Position Latitude Longitude Time_scale Time_span
Table 2
The prediction performance of different models on the whole dataset. Paired sample t-tests were used to assess the significance of the predicted results. † means that the prediction performance of the comparison model is significantly worse than that of TLD. ◊ means that the prediction performance of TLD is significantly better than that of more than half of the comparison models.
Ridge RF C_GRU BiLSTM GBRT GBRT + t GBRT + f GBRT + ft TLD
Chile
nMAE 0.3501† 0.1046 0.1207† 0.1232† 0.1381† 0.1386† 0.1420† 0.1412† 0.1023◊
nRMSE 0.5500† 0.2683† 0.2960† 0.3065† 0.3052† 0.3009† 0.3085† 0.3031† 0.2454◊
R 0.8192† 0.9569† 0.9476† 0.9438† 0.9443† 0.9458† 0.9431† 0.9450† 0.9640◊
USA
nMAE 0.3284† 0.1221† 0.1360† 0.1134 0.1437† 0.1446† 0.1476† 0.1487† 0.1029◊
nRMSE 0.5260† 0.3000† 0.3239† 0.2754† 0.3227† 0.3221† 0.3273† 0.3261† 0.2429◊
R 0.8130† 0.9391† 0.9290† 0.9442† 0.9296† 0.9298† 0.9275† 0.9281† 0.9566◊
Spain
nMAE 0.4303† 0.1920† 0.1596† 0.1725† 0.2204† 0.2232† 0.2273† 0.2238† 0.1550◊
nRMSE 0.6532† 0.4338† 0.3490† 0.3655† 0.4482† 0.4490† 0.4534† 0.4508† 0.3346◊
R 0.7714† 0.8992† 0.9235 0.9161† 0.8924† 0.8920† 0.8899† 0.8911† 0.9297◊
SouthAfrica
nMAE 0.3176† 0.1292 0.1484† 0.1620† 0.1466† 0.1508† 0.1513† 0.1499† 0.1276◊
nRMSE 0.4974† 0.3188† 0.3386† 0.3807† 0.3227† 0.3236† 0.3268† 0.3236† 0.3004◊
R 0.8345† 0.9320† 0.9270† 0.9077† 0.9303† 0.9299† 0.9285† 0.9300† 0.9396◊
India
nMAE 0.4776† 0.1754 0.1822 0.2018† 0.2245† 0.2161† 0.2291† 0.2184† 0.1880◊
nRMSE 0.6826† 0.3779 0.3800 0.4239† 0.4155† 0.4036 0.4225† 0.4053 0.3939◊
R 0.7444† 0.9216 0.9208 0.9014† 0.9053† 0.9106† 0.9020† 0.9099† 0.9149◊
Morocco
nMAE 0.3962† 0.1620 0.1780† 0.1887† 0.1915† 0.1917† 0.1974† 0.1981† 0.1619◊
nRMSE 0.5960† 0.3507† 0.3789† 0.4029† 0.3734† 0.3727† 0.3788† 0.3777† 0.3416◊
R 0.7862† 0.9259† 0.9183 0.9076† 0.9160† 0.9163† 0.9136† 0.9141† 0.9297◊
Tibet
nMAE 0.4534† 0.2133† 0.2172† 0.2255† 0.2319† 0.2348† 0.2405† 0.2379† 0.2012◊
nRMSE 0.6781† 0.4460† 0.4584† 0.4752† 0.4508† 0.4529† 0.4604† 0.4567† 0.4209◊
R 0.7375† 0.8864 0.8868 0.8784† 0.8839† 0.8829† 0.8789† 0.8809† 0.8988◊
Qinghai
nMAE 0.4567† 0.2097 0.2487† 0.2533† 0.2391† 0.2346 0.2447† 0.2373† 0.2001◊
nRMSE 0.6888† 0.4148† 0.4905† 0.5002† 0.4397† 0.4310† 0.4473† 0.4338† 0.3984◊
R 0.7392† 0.9054† 0.8764† 0.8715† 0.8937† 0.8979† 0.8900† 0.8965† 0.9127◊
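For readers who want to reproduce scores of the kind reported in Table 2, the three metrics can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's exact definitions: it assumes that nMAE and nRMSE are the MAE and RMSE divided by the mean of the measured sequence and that R is the Pearson correlation coefficient; the function name `forecast_metrics` and the sample values are hypothetical.

```python
import math


def forecast_metrics(measured, predicted):
    """Return nMAE, nRMSE and R for two equally long sequences.

    Assumption: errors are normalized by the mean of the measured sequence,
    and R is the Pearson correlation coefficient.
    """
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n

    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "nMAE": mae / mean_m,
        "nRMSE": rmse / mean_m,
        "R": cov / math.sqrt(var_m * var_p),
    }


# Hypothetical DNI values in W/m^2, for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = forecast_metrics(measured, predicted)
print(scores)
```

Lower nMAE and nRMSE and a higher R indicate a better forecast, which is the reading used when comparing the columns of Table 2.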
Table 3
The prediction performance of TLD considering the seasonality.
TLD          Spring   Summer   Autumn   Winter   Whole
Chile
nMAE         0.0988   0.1694   0.1005   0.0759   0.1023
nRMSE        0.2485   0.4288   0.2305   0.1681   0.2454
R            0.9658   0.9361   0.9611   0.9714   0.9640
USA
nMAE         0.1181   0.1110   0.0893   0.1433   0.1029
nRMSE        0.2742   0.2292   0.2261   0.3602   0.2429
R            0.9424   0.9507   0.9656   0.9409   0.9566
Spain
nMAE         0.2326   0.1433   0.2122   0.2649   0.1550
nRMSE        0.4531   0.2869   0.4841   0.6238   0.3346
R            0.8951   0.9286   0.9027   0.8786   0.9297
SouthAfrica
nMAE         0.1656   0.1005   0.1439   0.1802   0.1276
nRMSE        0.3881   0.2425   0.3149   0.3929   0.3004
R            0.9170   0.9637   0.9300   0.9031   0.9396
India
nMAE         0.1457   0.2738   0.1722   0.1592   0.1880
nRMSE        0.3056   0.6010   0.3701   0.3378   0.3939
R            0.9387   0.8356   0.9287   0.9406   0.9149
Morocco
nMAE         0.1528   0.1765   0.1785   0.1568   0.1619
nRMSE        0.3055   0.3364   0.3942   0.3641   0.3416
R            0.9352   0.9245   0.9209   0.9333   0.9297
Tibet
nMAE         0.2518   0.3040   0.1445   0.2029   0.2012
nRMSE        0.4986   0.5991   0.3253   0.4161   0.4209
R            0.8647   0.8246   0.9338   0.9165   0.8988
Qinghai
nMAE         0.2097   0.2373   0.1750   0.2122   0.2001
nRMSE        0.4034   0.4614   0.3620   0.4530   0.3984
R            0.9029   0.8925   0.9308   0.9063   0.9127

Table 4
The statistical analysis of different datasets.
Dataset      Kt < 0.25         0.25 < Kt < 0.9   Kt > 0.9          Total number
Spring (ratio/number)
Chile        0.0882 (3508)     0.0701 (2787)     0.8416 (33,449)   39,744
USA          0.0556 (2210)     0.1217 (4840)     0.8226 (32,694)   39,744
Spain        0.1413 (3746)     0.2010 (5328)     0.6575 (17,422)   26,496
SouthAfrica  0.0775 (2054)     0.1311 (3475)     0.7913 (20,967)   26,496
India        0.0109 (578)      0.1066 (5650)     0.8824 (46,764)   52,992
Morocco      0.0475 (1259)     0.1517 (4021)     0.8007 (21,216)   26,496
Tibet        0.1220 (6466)     0.2840 (15,053)   0.5939 (31,473)   52,992
Qinghai      0.0872 (4622)     0.2464 (13,059)   0.6663 (35,311)   52,992
Summer (ratio/number)
Chile        0.1665 (6620)     0.0926 (3684)     0.7407 (29,440)   39,744
USA          0.0384 (1528)     0.0993 (3949)     0.8621 (34,267)   39,744
Spain        0.0391 (1037)     0.1186 (3145)     0.8421 (22,314)   26,496
SouthAfrica  0.0277 (736)      0.0457 (1211)     0.9265 (24,549)   26,496
India        0.0983 (5212)     0.3714 (19,684)   0.5301 (28,096)   52,992
Morocco      0.0655 (1737)     0.1623 (4301)     0.7721 (20,458)   26,496
Tibet        0.2103 (11,145)   0.2696 (14,289)   0.5200 (27,558)   52,992
Qinghai      0.1964 (21,158)   0.2102 (11,140)   0.5933 (31,443)   52,992
Autumn (ratio/number)
Chile        0.0793 (3120)     0.0865 (3401)     0.8341 (32,791)   39,312
USA          0.0257 (1013)     0.0641 (2521)     0.9101 (35,778)   39,312
Spain        0.1234 (3235)     0.1604 (4206)     0.7160 (18,767)   26,208
SouthAfrica  0.0637 (1670)     0.1291 (3385)     0.8071 (21,153)   26,208
India        0.0303 (1592)     0.1301 (6823)     0.8394 (44,001)   52,416
Morocco      0.0837 (2195)     0.1585 (4155)     0.7577 (19,858)   26,208
Tibet        0.0499 (2619)     0.1720 (9017)     0.7780 (40,780)   52,416
Qinghai      0.0618 (3243)     0.1832 (9605)     0.7548 (39,568)   52,416
Winter (ratio/number)
Chile        0.0294 (1145)     0.0620 (2410)     0.9085 (35,314)   38,869
USA          0.0900 (3500)     0.0963 (3746)     0.8135 (31,623)   38,869
Spain        0.1414 (3666)     0.1514 (3924)     0.7070 (18,319)   25,909
SouthAfrica  0.1407 (3646)     0.1580 (4095)     0.7012 (18,168)   25,909
India        0.0164 (854)      0.0824 (4273)     0.9010 (46,702)   51,829
Morocco      0.0588 (1525)     0.1167 (3024)     0.8244 (21,360)   25,909
Tibet        0.0686 (3560)     0.2456 (12,732)   0.6856 (35,537)   51,829
Qinghai      0.0556 (2884)     0.2215 (11,481)   0.7228 (37,464)   51,829

three categories based on their degree of overcast weather. In general, the clear-sky index (k) was used to determine how much the sun is blocked by cloud cover.

$$
k = \frac{I_m}{I_{clr}} \qquad (7)
$$

where $I_m$ is the measured DNI value and $I_{clr}$ is the clear-sky DNI value obtained from the clear-sky model. Therefore, we constructed an indicator called Kt to measure the degree of clear sky in multidimensional time series samples. It is defined as follows:

$$
K_t = \frac{1}{N} \sum_{i=1}^{N} k_i \qquad (8)
$$

where N is the length of the time window and $k_i$ is the value of the clear-sky index at the ith time point. Kt represents the average clear sky degree of each sample. The higher Kt is, the better the weather condition is. We divided all samples into three categories based on their Kt value, with Kt < 0.25, 0.25 < Kt < 0.9 and 0.9 < Kt representing extremely cloudy weather, relatively cloudy weather and sunny weather respectively.

The statistical analysis found that the proportion of cloudy weather days in the India summer dataset (0.4697) and the Tibet summer dataset (0.4799) is far higher than that of the other datasets (Table 4). We speculated that the prediction performance of the model is related to the proportion of cloudy days in different seasons. This means that as the proportion of cloudy samples increases, the predictive ability of the TLD would decrease.
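The Kt-based grouping of samples can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names and sample values are hypothetical, the window is assumed to contain only daytime points with a positive clear-sky DNI, and samples falling exactly on a threshold (0.25 or 0.9) are assigned to the middle class, which the text does not specify.

```python
def clear_sky_index(measured_dni, clear_sky_dni):
    """Eq. (7): ratio of measured DNI to clear-sky DNI at one time point.

    Assumption: clear_sky_dni is positive (daytime samples only).
    """
    return measured_dni / clear_sky_dni


def window_kt(measured, clear_sky):
    """Eq. (8): mean clear-sky index over one time window."""
    ratios = [clear_sky_index(m, c) for m, c in zip(measured, clear_sky)]
    return sum(ratios) / len(ratios)


def weather_class(kt):
    """Map Kt to one of the three weather categories.

    Assumption: values exactly equal to 0.25 or 0.9 fall in the middle class.
    """
    if kt < 0.25:
        return "extremely cloudy"
    if kt > 0.9:
        return "sunny"
    return "relatively cloudy"


# Hypothetical windows of measured and clear-sky DNI (W/m^2).
sunny_kt = window_kt([800.0, 820.0, 790.0], [850.0, 860.0, 840.0])
cloudy_kt = window_kt([100.0, 50.0, 150.0], [850.0, 860.0, 840.0])
mixed_kt = window_kt([400.0, 500.0, 300.0], [800.0, 800.0, 800.0])

for kt in (sunny_kt, cloudy_kt, mixed_kt):
    print(f"Kt={kt:.3f} -> {weather_class(kt)}")
```

Counting the samples in each class per season and dividing by the total number of samples would give ratios of the kind listed in Table 4.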
To further compare the predictive performance of different models
under different weather conditions, we applied TLD for DNI prediction
on three weather conditions sample sets of eight datasets comparing
Qinghai
nMAE         0.3399   0.3203   0.3118   0.3388   0.2424
nRMSE        0.4525   0.4397   0.4294   0.4759   0.3424
R            0.6009   0.6232   0.6068   0.5586   0.7125

Table 5
The prediction performance of different models on the extremely cloudy weather conditions of different datasets.
Kt < 0.25    GBRT     RF       C_GRU    BiLSTM   TLD
Chile
nMAE         1.8661   1.1802   1.3308   1.0446   0.9611
nRMSE        2.8290   2.0761   2.6002   2.2285   1.8930
R            −0.6227  0.1260   −0.3708  −0.0069  0.2733
USA
nMAE         1.3290   1.0932   0.9772   1.0656   1.2207
nRMSE        1.9896   1.8000   1.7330   1.9704   2.0121
R            −0.1449  0.0628   0.1312   −0.1230  −0.2329
Spain
nMAE         1.3997   1.2536   1.1048   1.0332   0.8552
nRMSE        2.0601   1.9548   1.8870   2.1131   1.6256
R            −0.0764  0.0308   0.0969   0.1137   0.3895
SouthAfrica
nMAE         1.5787   1.2330   1.0085   1.0022   1.0104
nRMSE        2.4101   2.1317   1.9201   1.9333   1.9956
R            0.1840   0.0737   0.1827   0.1714   0.1882
India
nMAE         0.6204   0.2441   0.3666   0.4024   0.3297
nRMSE        1.0891   0.7617   0.8016   0.9303   0.7968
R            0.3870   0.7001   0.6679   0.5528   0.6718
Morocco
nMAE         1.2423   1.0176   1.0769   0.8640   0.9112
nRMSE        1.9382   1.7515   1.8865   1.6737   1.6675
R            −0.0068  0.1778   0.0461   0.3072   0.2547
Tibet
nMAE         0.9826   0.8753   0.8070   0.7200   0.7405
nRMSE        1.3741   1.3014   1.2520   1.2379   1.1824
R            −0.0617  0.0476   0.1186   0.1883   0.2595
Qinghai
nMAE         1.3852   1.1625   1.1600   1.0150   0.8812
nRMSE        1.9742   1.7863   1.8187   1.7369   1.5132
R            −0.2068  0.0119   −0.0218  0.0659   0.3237

where Δt is the forecast horizon and $\hat{F}_p(t + \Delta t)$ is the persistent prediction. The framework of TLDP is shown in Fig. 8.

From Fig. 8 we can see that the prediction of TLDP is a linear weighted combination of the TLD prediction and the smart persistence model prediction. In TLDP, we first trained the TLD model to get the prediction results on the training set. Then we trained a linear regression model on the prediction results of TLD and the prediction results of the smart persistence model, using the gradient descent algorithm to obtain the weight coefficients. The experimental results indicated that the prediction performance of TLDP is much better than that of TLD on the eight datasets (Table 8). TLDP not only greatly improved the predictive ability under sunny conditions, but also improved the predictive ability under cloudy conditions, which may lay a solid foundation for real-time and accurate prediction of short-term DNI.

From Table 8 we can see that the linear regression coefficients (LR coefficient) of the TLD prediction and the smart persistence model prediction in TLDP are quite different across the DNI prediction tasks of different datasets. For instance, the coefficient ratio (TLD/persistence) is 0.3318/0.6508 for the Qinghai dataset, while it is 0.4924/0.4953 for the South Africa dataset. This indicates that the proposed algorithm can adapt to data with different distributions, which means that TLDP is more adaptive than the other algorithms.

5. Conclusion

In this study, we have proposed an intra-hour DNI prediction method called TLD based on TDA, LSTM and DNN to improve short-term DNI prediction accuracy by capturing the stochastic features in multidimensional meteorological time series. We incorporated spatial and temporal topological features obtained by TDA into the model
construction to take full account of the physical significance of the input variables and to align the computational process of the model with human cognitive processes, thereby improving forecast performance. The ablation experiments based on GBRT highlighted the impact of the correct use of topological features on prediction performance. Comparative experiments showed that TLD outperformed the other models in predicting short-term DNI on seven datasets, the exception being the India dataset. The proportion of cloudy days in the dataset was found to be a key factor affecting the performance of the model. However, TLD still performed better than the other models under cloudy conditions. For the India dataset, TLD was found to perform poorly under sunny conditions.

To address this issue, we developed a hybrid model named TLDP that combines TLD and a smart persistence model to enhance performance, especially under sunny conditions. The smart persistence model is a physical model that considers only current deviations from clear sky conditions. TLDP's adaptive learning ability balances the proportions of the two models' predictions according to the data distribution. Experimental results demonstrated that TLDP greatly improved DNI prediction performance not only under sunny conditions but also under cloudy conditions, which we attribute to the topological features obtained by TDA and the well-designed deep neural network structure. TLDP is capable of real-time DNI prediction, which may lay a foundation for more economical and stable operation of CSP plants. However, this study demonstrated the effectiveness of TLD and TLDP only in arid desert scenarios. Improving TLDP so that it adapts to more diverse application scenarios is the next research direction.

Funding

Scientific Research Plan Projects of Shaanxi Education Department, Grant/Award Numbers: 22JE010; Key Research and Development Program of Shaanxi, Grant/Award Numbers: 2022GY-186; Innovation Capability Support Program of Shaanxi, Grant/Award Numbers:

Table 7
The prediction performance of different models under sunny weather conditions on different datasets.
Kt > 0.9 GBRT RF C_GRU BiLSTM TLD
Chile
nMAE 0.0697 0.0499 0.0654 0.0740 0.0604
nRMSE 0.1506 0.1359 0.1498 0.1675 0.1433
R 0.9861 0.9886 0.9862 0.9828 0.9874
USA
nMAE 0.0691 0.0553 0.0717 0.0697 0.0729
nRMSE 0.1510 0.1427 0.1601 0.1644 0.1761
R 0.9834 0.9851 0.9813 0.9803 0.9796
Spain
nMAE 0.1112 0.0915 0.1002 0.1133 0.1230
nRMSE 0.2386 0.2274 0.2359 0.2865 0.2907
R 0.9660 0.9691 0.9668 0.9584 0.9617
SouthAfrica
nMAE 0.0892 0.0716 0.0901 0.0929 0.0792
nRMSE 0.2005 0.1915 0.2173 0.2291 0.1912
R 0.9728 0.9752 0.9695 0.9661 0.9753
India
nMAE 0.1665 0.1309 0.1409 0.1497 0.1454
nRMSE 0.3450 0.3121 0.3131 0.3507 0.3348
R 0.9480 0.9575 0.9572 0.9463 0.9511
Morocco
nMAE 0.1173 0.0951 0.1041 0.1158 0.1052
nRMSE 0.2486 0.2340 0.2466 0.2779 0.2396
R 0.9639 0.9680 0.9644 0.9586 0.9664
Tibet
nMAE 0.1163 0.0895 0.0938 0.1028 0.0848
nRMSE 0.2996 0.2795 0.2744 0.2910 0.2602
R 0.9640 0.9687 0.9698 0.9646 0.9717
Qinghai
nMAE 0.1344 0.1105 0.1307 0.1260 0.1077
nRMSE 0.2977 0.2756 0.3324 0.2975 0.2940
R 0.9620 0.9675 0.9610 0.9621 0.9780
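The per-condition tables report nMAE, nRMSE and R separately for three ranges of Kt. The sketch below illustrates one way such a breakdown could be computed. It rests on assumptions that are not confirmed by this section: errors are normalized by the mean observed value, R is the Pearson correlation coefficient, and samples that fall exactly on a boundary go to the middle range. The function names (`nmae`, `nrmse`, `pearson_r`, `kt_bin`, `score_by_kt`) are hypothetical.

```python
import math


def nmae(obs, pred):
    """Mean absolute error divided by the mean observation (assumed normalization)."""
    total = sum(obs)
    if total == 0:
        return float("nan")
    return sum(abs(p - o) for o, p in zip(obs, pred)) / total


def nrmse(obs, pred):
    """Root mean squared error divided by the mean observation (assumed normalization)."""
    mean_obs = sum(obs) / len(obs)
    if mean_obs == 0:
        return float("nan")
    mse = sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)
    return math.sqrt(mse) / mean_obs


def pearson_r(obs, pred):
    """Pearson correlation; NaN when either series has zero variance."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    if dev_o == 0 or dev_p == 0:
        return float("nan")
    return cov / (dev_o * dev_p)


def kt_bin(kt):
    """Map a Kt value to one of the three ranges used in the table headers."""
    if kt < 0.25:
        return "Kt < 0.25"
    if kt > 0.9:
        return "Kt > 0.9"
    return "0.25 <= Kt <= 0.9"  # boundary handling is an assumption


def score_by_kt(kt, obs, pred):
    """Group samples by Kt range and score each group separately."""
    groups = {}
    for k, o, p in zip(kt, obs, pred):
        o_list, p_list = groups.setdefault(kt_bin(k), ([], []))
        o_list.append(o)
        p_list.append(p)
    return {
        label: {
            "n": len(o_list),
            "nMAE": nmae(o_list, p_list),
            "nRMSE": nrmse(o_list, p_list),
            "R": pearson_r(o_list, p_list),
        }
        for label, (o_list, p_list) in groups.items()
    }
```

Under the assumed normalization, values above 1 are possible whenever the typical error exceeds the mean observed DNI in a group, which is what one would expect when the observed DNI in that group is small.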
Table 8
The prediction performance of different models under relatively cloudy weather conditions on different datasets. † indicates that the prediction performance of TLDP is significantly better than that of TLD; significance was assessed with paired-sample t-tests.
TLD TLDP Kt < 0.25 (TLDP/TLD) 0.25 < Kt < 0.9 (TLDP/TLD) Kt > 0.9 (TLDP/TLD) LR coefficient (TLD/Per)
Chile
nMAE 0.1023 0.0636 0.6551/0.9611† 0.2175/0.2921† 0.0319/0.0604† 0.2718/0.7209
nRMSE 0.2454 0.2084 1.5277/1.8930† 0.3866/0.4312† 0.1153/0.1433†
R 0.9640 0.9740 0.5267/0.2733† 0.7150/0.6454† 0.9918/0.9874†
USA
nMAE 0.1029 0.0693 0.7133/1.2207† 0.2734/0.3433† 0.0377/0.0729† 0.3211/0.6646
nRMSE 0.2429 0.2108 1.4314/2.0121† 0.4602/0.4946† 0.1204/0.1761†
R 0.9566 0.9673 0.4072/-0.2329† 0.6317/0.6027† 0.9894/0.9796†
Spain
nMAE 0.1550 0.1068 0.7327/0.8552† 0.2453/0.3295† 0.0669/0.1230† 0.4095/0.5777
nRMSE 0.3346 0.2922 1.4128/1.6256† 0.4410/0.4889† 0.1988/0.2907†
R 0.9297 0.9464 0.4937/0.3895† 0.6970/0.6878† 0.9764/0.9617†
SouthAfrica
nMAE 0.1276 0.1076 0.8273/1.0104† 0.3719/0.4299† 0.0551/0.0792† 0.4924/0.4953
nRMSE 0.3004 0.2892 1.7500/1.9956† 0.6020/0.6095† 0.1679/0.1912†
R 0.9396 0.9467 0.3757/0.1882† 0.5446/0.5331† 0.9809/0.9753†
India
nMAE 0.1880 0.0842 0.1791/0.3297† 0.1629/0.2941† 0.0528/0.1454† 0.2541/0.7361
nRMSE 0.3939 0.2560 0.6997/0.7968† 0.3647/0.4661† 0.1586/0.3348†
R 0.9149 0.9640 0.7469/0.6718† 0.7807/0.6418† 0.9890/0.9511†
Morocco
nMAE 0.1619 0.1130 0.6106/0.9112† 0.2756/0.3557† 0.0613/0.1052† 0.3210/0.6671
nRMSE 0.3416 0.3003 1.3413/1.6675† 0.4754/0.5287† 0.1830/0.2396†
R 0.9297 0.9486 0.5550/0.2547 0.7080/0.6698† 0.9820/0.9664†
Tibet
nMAE 0.2012 0.1702 0.6153/0.7405† 0.2960/0.3172† 0.0737/0.0848† 0.4849/0.4978
nRMSE 0.4209 0.4028 1.0697/1.1824† 0.4721/0.4578 0.2496/0.2602†
R 0.8988 0.9073 0.3566/0.2595† 0.6174/0.6593 0.9750/0.9717†
Qinghai
nMAE 0.2001 0.1446 0.6435/0.8812† 0.2303/0.2424† 0.0834/0.1077† 0.3318/0.6508
nRMSE 0.3984 0.3345 1.2787/1.5132† 0.3701/0.3424 0.2450/0.2940†
R 0.9127 0.9385 0.4948/0.3237† 0.7078/0.7125 0.9788/0.9780†
[14] Benjamin Kurtz, Kleissl Jan, Measuring diffuse, direct, and global irradiance using [41] Jessica L. Nielson, et al., Topological data analysis for discovery in preclinical
a sky imager, Sol. Energy 141 (2017) 311–322. spinal cord injury and traumatic brain injury, Nat. Commun. 6 (1) (2015) 8581.
[15] Hsu-Yung Cheng, Cloud tracking using clusters of feature points for accurate solar [42] Alexander D. Smith, Paweł Dłotko, Victor M. Zavala, Topological data analysis:
irradiance nowcasting, Renew. Energy 104 (2017) 281–289. concepts, computation, and applications in chemical engineering, Comput. Chem.
[16] Huaizhi Wang, et al., A review of deep learning for renewable energy forecasting, Eng. 146 (2021) 107202.
Energy Convers. Manag. 198 (2019) 111799. [43] Gunnar Carlsson, Topological methods for data modelling, Nature Reviews Physics
[17] Georg A. Grell, Saulo R. Freitas, A scale and aerosol aware stochastic convective 2 (12) (2020) 697–708.
parameterization for weather and air quality modeling, Atmos. Chem. Phys. 14 [44] Shusen Liu, et al., Visualizing high-dimensional data: advances in the past decade,
(10) (2014) 5233–5250. IEEE Trans. Visual. Comput. Graph. 23 (3) (2016) 1249–1268.
[18] Zhiyong Wu, et al., Environmental impacts of large-scale CSP plants in [45] Stefan Huber, Persistent homology in data science, in: Data Science–Analytics and
northwestern China, Environmental Science: Process. Impacts 16 (10) (2014) Applications: Proceedings of the 3rd International Data Science
2432–2441. Conference–iDSC2020, Springer Fachmedien Wiesbaden, 2021, pp. 81–88.
[19] Philippa Roddis, et al., What shapes community acceptance of large-scale solar [46] Letscher Edelsbrunner, Zomorodian, Topological persistence and simplification,
farms? A case study of the UK’s first ‘nationally significant’solar farm, Sol. Energy Discrete Comput. Geom. 28 (2002) 511–533.
209 (2020) 235–244. [47] David Cohen-Steiner, Edelsbrunner Herbert, John Harer, Stability of persistence
[20] Juan Du, et al., Short-term solar irradiance forecasts using sky images and radiative diagrams, in: Proceedings of the Twenty-First Annual Symposium on
transfer model, Energies 11 (5) (2018) 1107. Computational Geometry, 2005, pp. 263–271.
[21] Joaquın Alonso-Montesinos, Francisco Javier Batlles, Solar radiation forecasting in [48] Satya Deo, Algebraic Topology, Springer Singapore, 2018.
the short-and medium-term under all sky conditions, Energy 83 (2015) 387–393. [49] Peter Bubenik, Statistical topological data analysis using persistence landscapes,
[22] Tsai Chu, et al., Estimation of solar irradiance and solar power based on all-sky J. Mach. Learn. Res. 16 (1) (2015) 77–102.
images, Sol. Energy 249 (2023) 495–506. [50] Tao Song, et al., A deep learning method with merged LSTM neural networks for
[23] Bijan Nouri, et al., Probabilistic solar nowcasting based on all-sky imagers, Sol. SSHA prediction, IEEE J. Sel. Top. Appl. Earth Obs. Rem. Sens. 13 (2020)
Energy 253 (2023) 285–307. 2853–2860.
[24] Quentin Paletta, Guillaume Arbod, Joan Lasenby, Omnivision forecasting: [51] Zheng Chu, Jiong Yu, Askar Hamdulla, LPG-model: a novel model for throughput
combining satellite and sky images for improved deterministic and probabilistic prediction in stream processing, using a light gradient boosting machine,
intra-hour solar energy predictions, Appl. Energy 336 (2023) 120818. incremental principal component analysis, and deep gated recurrent unit network,
[25] M. Caldas, R. Alonso-Suárez, Very short-term solar irradiance forecast using all-sky Inf. Sci. 535 (2020) 107–129.
imaging and real-time irradiance measurements, Renew. Energy 143 (2019) [52] Kamilya Smagulova, Alex Pappachen James, A survey on LSTM memristive neural
1643–1658. network architectures and applications, The European Physical Journal Special
[26] Bijan Nouri, et al., A hybrid solar irradiance nowcasting approach: combining all Topics 228 (10) (2019) 2313–2324.
sky imager systems and persistence irradiance models for increased accuracy, Sol. [53] Farah Shahid, Aneela Zameer, Muhammad Muneeb, Predictions for COVID-19 with
RRL 6 (5) (2022) 2100442. deep learning models of LSTM, GRU and Bi-LSTM. Chaos, Solitons & Fractals 140
[27] Huang, et al., A 3D ConvLSTM-CNN network based on multi-channel color (2020) 110212.
extraction for ultra-short-term solar irradiance forecasting, Energy 272 (2023) [54] Weicong Kong, et al., Short-term residential load forecasting based on LSTM
127140. recurrent neural network, IEEE Trans. Smart Grid 10 (1) (2017) 841–851.
[28] Dhivya Sampath Kumar, et al., Solar irradiance resource and forecasting: a [55] Pratima Kumari, Durga Toshniwal, Deep learning models for solar irradiance
comprehensive review, IET Renew. Power Gener. 14 (10) (2020) 1641–1656. forecasting: a comprehensive review, J. Clean. Prod. 318 (2021) 128566.
[29] Mohtasin Golam, et al., A long short-term memory-based solar irradiance [56] Tian Han, et al., A deep leaming model with multi-scale skip connections for solar
prediction scheme using meteorological data, Geosci. Rem. Sens. Lett. IEEE 19 flare prediction combined with prior information, in: 2019 IEEE International
(2021) 1–5. Conference on Big Data (Big Data), IEEE, 2019, pp. 5829–5835.
[30] Alberto Dolara, Sonia Leva, Giampaolo Manzolini, Comparison of different [57] Furkan Dincer, The analysis on photovoltaic electricity generation status, potential
physical models for PV power output prediction, Sol. Energy 119 (2015) 83–99. and policies of the leading countries in solar energy, Renewable and sustainable
[31] Ines Sansa, Zina Boussaada, Najiba Mrabet Bellaaj, Solar radiation prediction using energy reviews 15 (1) (2011) 713–720.
a novel hybrid model of ARMA and NARX, Energies 14 (21) (2021) 6920. [58] Li Li, et al., Review and outlook on the international renewable energy
[32] R. Meenal, A. Immanuel Selvakumar, Assessment of SVM, empirical and ANN development, Energy and Built Environment 3 (2) (2022) 139–157.
based solar radiation prediction models with most influencing input parameters, [59] Md Abdullah-Al-Mahbub, Abu Reza Md Towfiqul Islam, Current status of running
Renew. Energy 121 (2018) 324–343. renewable energy in Bangladesh and future prospect, A global comparison (2023).
[33] Zhihong Pang, Fuxin Niu, O’Neill Zheng, Solar radiation prediction using recurrent [60] Grant Buster, et al., Physics-guided machine learning for improved accuracy of the
neural network and artificial neural network: a case study with comparisons, national solar radiation Database, Sol. Energy 232 (2022) 483–492.
Renew. Energy 156 (2020) 279–289. [61] Tatiane C. Carneiro, et al., Ridge regression ensemble of machine learning models
[34] Pratima Kumari, Durga Toshniwal, Long short term memory–convolutional neural applied to solar and wind forecasting in Brazil and Spain, Appl. Energy 314 (2022)
network based deep hybrid approach for solar irradiance forecasting, Appl. Energy 118936.
295 (2021) 117061. [62] Ramendra Prasad, et al., Designing a multi-stage multivariate empirical mode
[35] ANM Fahim Faisal, et al., Neural networks based multivariate time series decomposition coupled with ant colony optimization and random forest model to
forecasting of solar radiation using meteorological data of different cities of forecast monthly solar radiation, Appl. Energy 236 (2019) 778–792.
Bangladesh, Results in Engineering 13 (2022) 100365. [63] Qing Li, et al., A Multi-step ahead photovoltaic power forecasting model based on
[36] Anil Kumar, Yashwant Kashyap, Panagiotis Kosmopoulos, Enhancing solar energy TimeGAN, Soft DTW-based K-medoids clustering, and a CNN-GRU hybrid neural
forecast using multi-column convolutional neural network and multipoint time network, Energy Rep. 8 (2022) 10346–10362.
series approach, Rem. Sens. 15 (1) (2022) 107. [64] Tian Peng, et al., An integrated framework of Bi-directional long-short term
[37] Ankit Bhatt, Weerakorn Ongsakul, Jai Govind Singh, Sliding window approach memory (BiLSTM) based on sine cosine algorithm for hourly solar radiation
with first-order differencing for very short-term solar irradiance forecasting using forecasting, Energy 221 (2021) 119887.
deep learning models, Sustain. Energy Technol. Assessments 50 (2022) 101864. [65] Georgios Mitrentsis, Hendrik Lens, An interpretable probabilistic model for short-
[38] Larry Wasserman, Topological data analysis, Annual Review of Statistics and Its term solar power forecasting using natural gradient boosting, Appl. Energy 309
Application 5 (2018) 501–532. (2022) 118473.
[39] Yara Skaf, Reinhard Laubenbacher, Topological data analysis in biomedicine: a [66] Ricardo Marquez, Carlos FM. Coimbra, Proposed metric for evaluation of solar
review, J. Biomed. Inf. 130 (2022) 104082. forecasting models, J. Sol. Energy Eng. 135 (1) (2013) 011016.
[40] Guillaume Tauzin, et al., giotto-tda: a topological data analysis toolkit for machine [67] Dazhi Yang, A guideline to solar forecasting research practice: reproducible,
learning and data exploration, J. Mach. Learn. Res. 22 (1) (2021) 1834–1839. operational, probabilistic or physically-based, ensemble, and skill (ROPES),
J. Renew. Sustain. Energy 11 (2) (2019).