Applied Energy
Dynamic ensemble deep echo state network for significant wave height forecasting
Ruobin Gao a, Ruilin Li b, Minghui Hu b, Ponnuthurai Nagaratnam Suganthan b,c, Kum Fai Yuen a,∗

a School of Civil and Environmental Engineering, Nanyang Technological University, Singapore
b School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
c KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar
Keywords: Forecasting; Machine learning; Deep learning; Randomized neural networks; Echo state network

Abstract: Forecasts of the wave heights can assist in the data-driven control of wave energy systems. However, the dynamic properties and extreme fluctuations of the historical observations pose challenges to the construction of forecasting models. This paper proposes a novel dynamic ensemble deep echo state network (ESN) to learn the dynamic characteristics of the significant wave height. The dynamic ensemble ESN creates a profound representation of the input and trains an independent readout module for each reservoir. To begin, numerous reservoir layers are built in a hierarchical order, adopting a reservoir pruning approach to filter out the poorer representations. Finally, a dynamic ensemble block is used to integrate the forecasts of all readout layers. The suggested model has been tested on twelve available datasets and statistically outperforms state-of-the-art approaches.
1. Introduction

The consumption of non-renewable energy resources poses severe challenges in maintaining the rising global temperature and carbon dioxide levels, prompting the development of renewable energy sources such as wind [1], solar [2], and ocean energy [3]. In comparison to wind energy, wave energy has a higher level of assurance [4]. However, the dynamic characteristics of the waves affect the reliable and precise forecasts of wave energy.

One of the most important metrics describing ocean wave conditions is significant wave height (WVHT). Wave energy is directly and strongly connected with significant wave height. Hence, precise forecasts of significant wave height can provide solid recommendations for electricity generation. Numerical wave models are one way to estimate wave parameters, such as simulating waves nearshore (SWAN) [5,6]. The numerical wave propagation models can generate forecasts over the studied computation grid [5]. An alternative strategy is to train machine learning models which can precisely forecast wave time series in a data-driven fashion. Recently, researchers have found that machine learning outperforms the numerical methods on wave height forecasting [7,8]. Many researchers are concentrating their efforts on implementing and creating superior data-driven forecasting algorithms for this difficult and necessary task. These data-driven algorithms include decision trees [9,10], support vector regression (SVR) [11], artificial neural networks (ANN) [12,13], extreme learning machines (ELM) [14], etc. However, the existing methods do not fully utilize deep representations and ensemble learning [15].

The algorithmic learning of temporal patterns and intricate interactions among numerous variables, such as wind direction, wind speed, gust speed, wave periods, and wave heights at previous time steps, is complicated due to the dynamic and chaotic qualities of significant wave height. Researchers have devoted themselves to solving this challenging task. For instance, an ensemble ELM (EELM) is proposed to reduce the single model's uncertainty on wave height forecasting [16], but only external networks are considered in the ensemble pool. A genetic algorithm is implemented to select the most suitable input features for the ELM [17]. A hybrid evolutionary Takagi–Sugeno–Kang fuzzy system is proposed to forecast the significant wave height at a buoy off California's West Coast [18]. Besides generating one-step ahead forecasts, a multitask learning technique is employed to help the neural network forecast multiple prediction horizons [19]. Ali et al. [20] implemented the covariance-weighted least squares method to optimize the multiple linear regression model for wave height forecasting. The hyper-parameters of the forecasting methods significantly affect the performance, so researchers investigate different evolutionary optimizations to automatically design the forecasting methods. For instance, the genetic algorithm, particle swarm optimization (PSO), and the cuckoo search algorithm are utilized to optimize the forecasting models [21,22].
∗ Corresponding author.
E-mail addresses: [email protected] (R. Gao), [email protected] (R. Li), [email protected] (M. Hu), [email protected] (P.N. Suganthan),
[email protected] (K.F. Yuen).
https://doi.org/10.1016/j.apenergy.2022.120261
Received 20 April 2022; Received in revised form 2 October 2022; Accepted 27 October 2022
Available online 10 November 2022
Despite the fact that much effort has been dedicated to wave height forecasting and constructing ESN models, there are several significant limitations in the literature. First, there is no research on ensemble deep learning in the literature on wave height forecasting. Second, the random nature of each reservoir may deteriorate the performance, especially when the network becomes deep. Third, relying on only one readout layer using the global states is prone to overfitting, and the massive dimension of the global states may not ensure a reliable solution. Therefore, a novel ensemble deep ESN (edESN) is presented to address the aforementioned research gaps, inheriting the benefits of deep representations and ensemble learning. Unlike the shallow ESN, the edESN builds hierarchical reservoirs to enhance the input time series representations. Then, an independent readout layer is trained to precisely model each scale's representation. Moreover, a reservoir pruning strategy is proposed to filter out the inferior representations. Finally, a dynamic ensemble module is utilized to combine all readout layers' outputs, including the global readout layer. A detailed comparative experiment on twelve significant wave height time series is conducted to demonstrate the proposed edESN's superiority and suitability. In conclusion, this work adds to the literature from the following perspectives:

• This paper explores the deep ESN's forecasting ability on significant wave height time series for the first time, to the authors' best knowledge.
• A novel ensemble deep ESN forecasting model is proposed for the significant wave height. The proposed edESN uses ensemble learning to reduce model uncertainty while inheriting the representation ability of deep architectures.
• A straightforward reservoir pruning strategy is proposed to filter out the inferior reservoir states and facilitate building deep architectures. With the help of the proposed strategy, only the meaningful states are propagated to the deep layers.
• A dynamic ensemble module considering the evolving properties of the wave time series is proposed for the edESN. Instead of using a static ensemble module, the dynamic module assigns evolving combination weights to each readout layer based on the recent performance.

The following is a breakdown of the paper's structure. Section 2 reviews the related works. To make this research paper self-sufficient, Section 3 gives the preliminaries concerning the ESN and the DESN. Section 4 describes the proposed model in depth. Section 5 delves into the experiments in detail, including descriptions of the data, data pre-processing methods, comparative results, and ablation studies. Section 6 presents insights into the benefits and drawbacks of the proposed model, as well as possible future directions. Finally, conclusions are drawn. The algorithm is developed by the authors in the framework of this publication. The source codes of the algorithm are available at https://github.com/P-N-Suganthan/CODES.

2. Related works

Intelligent algorithms have achieved significant success in wave height prediction. For instance, Shamshirband et al. [8] compare ELM, SVR, and ANN in forecasting the significant wave height and find that the ELM slightly outperforms the other machine learning methods. To further improve the ELM's accuracy in wave height prediction, residual learning is conducted [32]. In addition to the ELM-based methods, a fuzzy system is investigated that utilizes only meteorological variables to estimate the wave energy, without spectral wave measurements [33]. Recently, the ensemble of MLP and gradient boosting decision trees is shown to outperform the non-ensemble models [34]. Besides the predictions on continuous values, the wave heights are discretized to formulate an ordinal classification problem, and the experimental results show that ordinal approaches outperform nominal approaches [35]. Additionally, the intelligent forecasting algorithms can be integrated with the predictive controller. For instance, a novel predictive controller based on a deep MLP, which is employed to forecast short-term wave forces, is proposed to assist in maximizing the energy absorption of the wave energy converter [36].

A critical step in constructing the forecasting method is the hyper-parameter optimization [37]. A proper selection of the hyper-parameters assists in achieving precise forecasts. The most common practice is to utilize grid search [35]. However, grid search relies on the fixed choices given by the researchers and cannot explore other configurations. Therefore, researchers have suggested utilizing Bayesian optimization (BO) and evolutionary algorithms to determine the hyper-parameters. For instance, BO is utilized to determine the hyper-parameters of a hybrid ELM whose input is determined by a grouping genetic algorithm; the comparative results demonstrate the proposed model's success in predicting the significant wave height and the wave energy flux [17]. Furthermore, the hybrid ELM, whose parameters are determined by BO, selects meaningful features for the prediction models, SVR and ELM. In addition to BO, the genetic algorithm, particle swarm optimization (PSO), and the cuckoo search algorithm are utilized to optimize ANNs for wave height prediction [21]. The genetic algorithm can not only optimize ANNs but also the parameters of the clustering part within a neuro-fuzzy system [22].

In addition to the hyper-parameter optimization, the determination of the input plays a crucial role in precise forecasts. For instance, a genetic algorithm is utilized to select features from nearby buoys for the ELM to estimate the significant wave height [38]. Besides genetic algorithms, the partial autocorrelation function determines the ELM's significant lags. This advanced ELM outperforms deep learning and conventional machine learning models on peak wave energy period forecasting [39]. Besides working on time series data, the SVR trained on X-band radar images successfully estimates the significant wave height and outperforms the MLP [40].

Since the patterns within the time series are dynamic, signal decomposition techniques are utilized to de-noise the wave data and extract features for the following modeling steps [41–43]. For instance, Duan et al. [42] implemented the empirical mode decomposition (EMD) to extract multi-scale modes fed into the SVR for forecasting. EMD is also proven to improve the long short-term memory network's (LSTM's) performance on wave height forecasting [44]. Discrete wavelet transformation is combined with ANN to forecast the wave heights in India [45]. A hybrid fuzzy system combined with wavelet transformation outperforms ANNs and statistical methods [46]. The improved complete ensemble empirical mode decomposition (ICEMD) is conducted to facilitate learning ELMs [41]. Although decomposition significantly boosts the forecasting accuracy, the data leakage problem during decomposition is not analyzed and addressed for forecasting. This problem is presently being studied in the literature [47–49].

3. Methodology

This section explains the preliminaries regarding the ESN and its deep variant to make this article self-contained. It begins with a brief introduction of the classical ESN. Then, the DESN, which stacks many reservoir layers to obtain multi-scale representations, is described. The input to the ESN-based models, u(t) ∈ R^{N_u}, is the wave time series consisting of five variables: the WVHT, WDIR, WSPD, APD, and GST. Hence, the input dimension N_u at time t equals five. The model aims to estimate the value of WVHT at time t + 1. The ESN processes the wave time series step by step until it reaches the end.

3.1. ESN

Randomized neural networks do not train the intermediate layers' connections and only optimize the weights of the output layers utilizing closed-form solutions [50]. The ESN is distinguished from other randomized neural networks by its recurrent connections [24,51]. The reservoir state x(t) ∈ R^{N_r} is updated by

x(t) = tanh(W_in u(t) + W_r x(t − 1)),    (1)
where u(t) ∈ R^{N_u} and x(t − 1) ∈ R^{N_r} denote the input wave time series and the reservoir state, respectively, W_in ∈ R^{N_r × N_u} represents the input layer's connections, and W_r ∈ R^{N_r × N_r} denotes the recurrent weight matrix. The input at time t, u(t), consists of the values at t of the time series of WVHT, WDIR, WSPD, APD, and GST. Hence, the input dimension N_u equals five. The ESN processes the wave time series step by step until it reaches the end.

In general, W_r is randomly initialized according to the uniform distribution. The random weights are usually re-scaled to ensure the desired spectral characteristics. The connections in the input layer are also created at random from a uniform distribution [−s_in, s_in], where s_in represents the input scaling.

The ESN calculates the output at time t by linearly combining the reservoir states,

y(t) = W_out x(t),    (2)

where W_out ∈ R^{N_o × N_r} denotes the weights of the readout layer.

3.2. DESN

The DESN stacks multiple reservoir layers. The first layer's reservoir state is computed from the input as in the classical ESN,

x(t)^1 = tanh(W_in^1 u(t) + W_r^1 x(t − 1)^1),    (3)

where W_in^1 ∈ R^{N_r × N_u} and W_r^1 ∈ R^{N_r × N_r} represent the first layer's input and recurrent connections.

The lth layer's reservoir state x(t)^l can be computed by

x(t)^l = tanh(W_in^l x(t)^{l−1} + W_r^l x(t − 1)^l),    (4)

where W_in^l ∈ R^{N_r × N_r} and W_r^l ∈ R^{N_r × N_r} denote the input and recurrent weights of the lth layer.

The W_out is the only part that requires learning in an ESN with a single reservoir layer. The DESN concatenates all hidden layers' reservoir states to formulate the global states x(t) ∈ R^{N_r N_L}; then y(t) is computed by

y(t) = W_out x(t),    (5)

where W_out ∈ R^{N_o × N_r N_L} represents the weights of the readout layer.
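A minimal NumPy sketch of Eqs. (1)–(2) is given below for concreteness. It is an illustration under assumed settings (reservoir size, spectral radius, input scaling, and ridge strength are placeholder values), not the authors' released implementation; the regularized readout anticipates the ridge solution of Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_reservoir(n_u, n_r, s_in=0.5, rho=0.9):
    """Random input and recurrent weights (Section 3.1); W_r is rescaled
    so that its spectral radius equals rho (assumed value)."""
    W_in = rng.uniform(-s_in, s_in, size=(n_r, n_u))
    W_r = rng.uniform(-1.0, 1.0, size=(n_r, n_r))
    W_r *= rho / np.max(np.abs(np.linalg.eigvals(W_r)))
    return W_in, W_r

def run_reservoir(U, W_in, W_r):
    """Eq. (1): x(t) = tanh(W_in u(t) + W_r x(t-1)), applied step by step
    to the wave series U of shape (T, N_u); returns states of shape (T, N_r)."""
    X = np.zeros((U.shape[0], W_r.shape[0]))
    x = np.zeros(W_r.shape[0])
    for t in range(U.shape[0]):
        x = np.tanh(W_in @ U[t] + W_r @ x)
        X[t] = x
    return X

def train_readout(X, y, lam=1e-6):
    """Closed-form ridge readout: W_out = (X^T X + lam I)^(-1) X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Usage with placeholder data: five inputs (WVHT, WDIR, WSPD, APD, GST),
# one-step-ahead WVHT as the target.
U = rng.standard_normal((1000, 5))
y = rng.standard_normal(1000)
W_in, W_r = init_reservoir(n_u=5, n_r=100)
X = run_reservoir(U, W_in, W_r)
W_out = train_readout(X, y)
y_hat = X @ W_out
```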
Third, there is only one readout layer that uses the global representations, resulting in a high level of prediction uncertainty. All reservoir states are concatenated into a single, massive vector, and a matrix inversion of size N_r N_L × N_r N_L is necessary to train the DESN's readout connections. Such an inversion generally requires O((N_r N_L)^3) time and O((N_r N_L)^2) memory [53]. The inversion of a huge matrix places a significant strain on the hardware's memory and may result in an out-of-memory error. As a result, a unique framework for training the DESN model is proposed, which splits the huge W_out into small W_out^l for each layer. Each layer's W_out^l is trained independently, and each layer may be thought of as a distinct ESN. In this manner, each layer necessitates only the inversion of a matrix of size N_r × N_r.

Fourth, a dynamic ensemble block is added to integrate all forecasts considering the evolving properties. The recent performance of each forecasting module offers the most valuable information about the evolving characteristics. Therefore, a dynamic ensemble block is especially intended to aggregate the edESN's outputs while taking into account each forecasting candidate's most recent performance.

The proposed edESN is explained in detail in the following lines. Fig. 3 presents the architecture of an edESN with N_L hidden layers. Unlike the DESN with a single readout layer depicted in Fig. 2, there are N_L + 1 readout layers, consisting of N_L layer-wise readouts and a global readout layer. The global readout layer utilizes all reservoir states. After training all readout layers, y(t) is computed by a dynamic ensemble module. The lth layer's reservoir states are calculated in the same way as in the DESN, based on Eqs. (3) and (4). The output y^l(t) of the lth readout layer can be calculated by

y^l(t) = W_out^l x(t)^l,    (6)

where W_out^l ∈ R^{N_o × N_r} is the lth layer's readout connections.

This article employs a linear reservoir pruning strategy because of its quick computation, excellent interpretability, and compatibility with the ESN's readout layer. This linear pruning strategy is appropriate because of the linear property of the readout layer. The reservoir importance can be computed by Eq. (7),

RI^l = abs((D^{lT} D^l)^{−1} D^{lT} Y),    (7)

where D^l = [x(t)^{l−1}, u(t)] ∈ R^{N_r + N_u}, Y collects the target outputs, and abs() denotes the absolute value. The magnitude of RI^l indicates the respective reservoir states' contribution to the particular output; larger values indicate a possibly greater effect on the forecasting performance. Then, the best reservoir states of D^l are used to train the successive layers. After reservoir pruning, the loss function of the lth readout layer is defined as

Loss^l = ‖D_r^l W_out^l − y(t)‖^2 + λ_l ‖W_out^l‖^2,    (8)

where D_r^l indicates the best reservoir states, W_out^l denotes the connections of the readout, and λ_l is this layer's regularization parameter. The readout connections of the lth layer, W_out^l, can be computed by the following equation,

W_out^l = (D_r^{lT} D_r^l + λ_l I)^{−1} D_r^{lT} y(t).    (9)

Then the deep layers' reservoir states are computed by

x(t)^l = tanh(W_in^l D_r^{l−1} + W_r^l x(t − 1)^l).    (10)

After collecting all readout layers' outputs, a dynamic ensemble is used to integrate them. The weight candidate ŵ_{i_M}(t) of readout i_M at time t is computed using Eq. (11),

ŵ_{i_M}(t) = θ(ŷ*_{i_M}(t) − y(t)) / Σ_{j_M=1}^{N_L+1} θ(ŷ*_{j_M}(t) − y(t)),    (11)

where θ() denotes the forecasting performance indicator. This paper utilizes the inverse of the mean squared error (IMSE), the inverse of the mean absolute error (IMAE), and Softmax for the weight calculations.

Because the performance of the upper layers is dependent on the lower ones, the entire model's configurations are modified layer by layer. The shallow layers' structures are fixed once they have been determined, and the cross-validation proceeds to the next layer. Layer-wise cross-validation assigns its own set of hyper-parameters to each layer. As a result, each readout layer owns its regularization strength, enabling the edESN to learn a variety of accurate readout layers. The training algorithm is presented in Algorithm 1.

Algorithm 1: Training algorithm for the edESN
Input: N_r, the reservoir dimension; N_L, the number of reservoir layers; λ_l, the lth layer's regularization parameter
Output: W_out = [W_out^1, ..., W_out^{N_L}], W_out^g
1  Initialize W_in^1 and W_r^1 randomly
2  l = 1
3  for l ≤ N_L do
4    if l == 1 then
5      Calculate the reservoir states x^1 using W_in^1 and W_r^1 as in Eq. (3)
6      Calculate RI^1 using Eq. (7)
7      Calculate the first layer's output connections W_out^1 using λ_1 as in Eq. (9)
8    else
9      Initialize W_in^l and W_r^l randomly
10     Calculate the reservoir states x^l using W_in^l and W_r^l as in Eq. (10)
11     Calculate RI^l using Eq. (7)
12     Calculate the lth layer's output connections W_out^l using λ_l as in Eq. (9)
13   end
14   l++
15 end
16 Train the global readout W_out^g using all reservoir states

5. Experiments

5.1. Data

The significant wave height data from five National Data Buoy Center stations, 46083, 46080, 46076, 46001, and 46077, for the years 2017, 2018, and 2019 are collected for experimental analysis [54]. The station identities are five-character alpha-numeric codes assigned according to the World Meteorological Organization (WMO): the first two numbers represent the continental or oceanic region, and the specific location is defined by the last three numbers. These buoy stations' information is summarized in Table 1, and the stations' locations are visualized in Fig. 4. Fig. 4 shows that these buoys are dispersed in or around the Gulf of Alaska, where extreme significant wave heights occur. In addition, these stations are not close to each other, which reflects distinct spatial characteristics. Four buoys are near coastlines, and buoy 46001 in the western Gulf of Alaska is at a deep oceanic site. The wind direction, the average wave period, the wind speed, and the peak gust speed are used as explanatory factors. The computations of these variables follow the official definitions of the National Data Buoy Center.¹ The WSPD (m/s) is the average speed over an eight-minute period. The WDIR represents the direction the wind is coming from, in degrees clockwise from true North. The highest third of all wave heights throughout a twenty-minute sample period is averaged to get the WVHT (m). The average wave period (in seconds) for all waves throughout a twenty-minute period is represented by the APD. The GST (m/s) represents the gust speed recorded during the eight-minute or two-minute period. The descriptive statistics of all variables are summarized in Table 2; the statistics of each buoy's time series are computed separately for the three years.

¹ https://www.ndbc.noaa.gov/measdes.shtml
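Before turning to the results, the per-layer steps of Algorithm 1 can be made concrete with a short NumPy sketch of the importance scores of Eq. (7), the pruned ridge readout of Eq. (9), and the inverse-MSE dynamic weights of Eq. (11). The keep ratio, the evaluation window, and all identifiers are illustrative assumptions; the authors' exact pruning threshold is not specified in this excerpt.

```python
import numpy as np

def reservoir_importance(D, y):
    """Eq. (7): RI^l = |(D^T D)^(-1) D^T y|, solved as a least-squares problem;
    one score per column of the augmented states D = [x^(l-1), u]."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return np.abs(coef)

def prune_states(D, y, keep=0.5):
    """Keep the columns with the largest importance (assumed keep ratio)."""
    idx = np.argsort(reservoir_importance(D, y))[::-1][: int(keep * D.shape[1])]
    return D[:, idx], idx

def ridge_readout(D_r, y, lam=1e-6):
    """Eq. (9): W_out^l = (D_r^T D_r + lam I)^(-1) D_r^T y."""
    return np.linalg.solve(D_r.T @ D_r + lam * np.eye(D_r.shape[1]), D_r.T @ y)

def dynamic_weights(recent_preds, recent_truth):
    """Eq. (11) with theta = inverse MSE: each of the N_L + 1 readouts is
    weighted by its inverse error on the most recent window, normalized to 1."""
    mse = np.mean((recent_preds - recent_truth) ** 2, axis=1)
    theta = 1.0 / (mse + 1e-12)
    return theta / theta.sum()

# Usage with placeholders: four readouts evaluated on a 24-hour window.
rng = np.random.default_rng(1)
recent_preds = rng.standard_normal((4, 24))   # each readout's recent forecasts
recent_truth = rng.standard_normal(24)        # recent observations
candidates = rng.standard_normal(4)           # each readout's forecast for t + 1
w = dynamic_weights(recent_preds, recent_truth)
y_ensemble = w @ candidates                   # dynamically combined forecast
```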
Table 1
Information of the studied buoys.
Station Longitude Latitude Water depth
46083 138.019 W 58.270 N 128.9 m
46080 150.042 W 57.947 N 254.5 m
46076 148.009 W 59.471 N 192.0 m
46001 148.057 W 56.291 N 4139.0 m
46077 154.211 W 57.869 N 200.0 m
Fig. 4. Locations of the investigated stations.

The mean WSPD is around 6 m/s for all buoys, and the maximum WSPD for all buoys is larger than 19 m/s. The maximum WVHT of 11.06 m is observed in 2017 at buoy 46001. A detailed definition of these explanatory factors is provided in [54]. The observations are recorded and maintained on an hourly basis. The highest wave height of 11.06 m is accompanied by a WDIR of 252 deg.

5.2. Data pre-processing

A proper data pre-processing strategy aids machine learning models in producing reliable results. We suppose that the training set's maximum and minimum values are x_max and x_min, respectively. The investigated data are normalized into the range [−1, 1] due to the tanh activation's output range, i.e., x_norm = 2(x − x_min)/(x_max − x_min) − 1.

To assess the accuracy of these models, four forecasting evaluation metrics are used. The root mean square error (RMSE) is the first evaluation metric, as shown in the following equation,

RMSE = sqrt( (1/L_test) Σ_{j=1}^{L_test} (x̂_j − x_j)^2 ),    (13)

where L_test is the length of the test set, and x_j and x̂_j are the ground truth and the forecasts. The second evaluation metric is the mean absolute scaled error (MASE) [55], which can be calculated by

MASE = mean( |x̂_j − x_j| / ( (1/(L_train − 1)) Σ_{t=2}^{L_train} |x_t − x_{t−1}| ) ),    (14)

where L_train represents the size of the training set. The mean absolute error of the in-sample Persistence forecast is the denominator of MASE. The mean absolute percentage error (MAPE) is the third error metric,

MAPE = mean( |x̂_j − x_j| / x_j ).    (15)
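As a sketch, the min–max normalization of Section 5.2 and Eqs. (13)–(15) translate directly into NumPy; the function names are illustrative, and y_train supplies the in-sample persistence scale of the MASE.

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    """Section 5.2: map x into [-1, 1] using the training-set extremes."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def rmse(y_hat, y):
    """Eq. (13): root mean square error over the test set."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mase(y_hat, y, y_train):
    """Eq. (14): MAE scaled by the in-sample persistence (naive) MAE."""
    return np.mean(np.abs(y_hat - y)) / np.mean(np.abs(np.diff(y_train)))

def mape(y_hat, y):
    """Eq. (15): mean absolute percentage error."""
    return np.mean(np.abs((y_hat - y) / y))
```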
Table 2
Descriptive statistics.
2017 2018 2019
Station Variable Mean Median Max Std Mean Median Max Std Mean Median Max Std
46083 WDIR 199.51 187.00 360.00 82.37 193.31 172.00 360.00 90.29 188.60 148.00 360.00 99.53
WSPD 6.44 5.60 23.40 4.20 6.61 5.70 21.60 4.25 6.32 5.60 20.90 3.92
GST 8.08 6.90 33.40 5.05 8.21 7.10 27.70 5.13 7.88 7.00 26.60 4.71
APD 6.65 6.50 11.32 1.32 6.67 6.60 11.46 1.26 6.74 6.65 11.44 1.26
WVHT 2.12 1.77 10.17 1.25 2.13 1.81 9.94 1.21 2.09 1.76 8.17 1.17
46080 WDIR 167.06 160.00 360.00 87.69 185.63 195.00 360.00 85.59 188.60 148.00 360.00 99.53
WSPD 6.52 5.90 20.30 3.41 6.66 6.20 21.80 3.34 6.32 5.60 20.90 3.92
GST 7.95 7.20 25.70 4.12 8.20 7.50 28.40 4.08 7.88 7.00 26.60 4.71
APD 5.72 5.62 9.53 1.03 6.46 6.35 11.10 1.15 6.74 6.65 11.44 1.26
WVHT 1.87 1.54 7.67 1.08 2.16 1.81 8.16 1.21 2.09 1.76 8.17 1.17
46076 WDIR 184.13 197.00 360.00 106.32 176.44 179.00 360.00 103.75 170.66 163.00 360.00 102.29
WSPD 6.25 5.60 21.70 3.71 6.43 5.80 22.80 3.72 6.25 5.70 20.10 3.68
GST 7.76 7.00 26.80 4.47 7.94 7.20 28.70 4.49 7.73 7.00 25.90 4.42
APD 6.33 6.17 10.53 1.16 6.46 6.31 11.13 1.19 6.42 6.30 10.71 1.08
WVHT 1.91 1.53 8.76 1.19 2.01 1.70 10.18 1.21 1.93 1.60 9.36 1.16
46001 WDIR 189.51 191.00 360.00 93.36 196.27 205.00 359.00 83.61 213.82 223.00 360.00 75.59
WSPD 7.58 7.20 20.00 3.54 7.56 7.10 21.00 3.52 7.93 7.60 20.10 3.78
GST 9.38 8.90 25.70 4.33 9.33 8.70 26.60 4.29 9.88 9.30 26.20 4.61
APD 6.35 6.17 12.44 1.32 6.23 6.10 12.36 1.26 6.97 6.87 13.97 1.26
WVHT 2.51 2.18 11.06 1.37 2.52 2.24 10.23 1.30 2.85 2.56 9.11 1.43
46077 WDIR 156.94 185.00 360.00 100.52 147.59 170.00 360.00 98.81 155.96 186.00 360.00 97.76
WSPD 6.69 6.50 21.10 3.73 7.17 6.90 21.90 3.85 7.16 7.00 19.90 3.59
GST 8.12 7.70 26.50 4.47 8.69 8.20 26.50 4.62 8.65 8.30 25.30 4.28
APD 4.57 4.45 16.91 1.03 4.65 4.54 8.96 0.89 4.58 4.46 10.16 0.95
WVHT 0.98 0.82 5.34 0.67 1.07 0.90 4.95 0.66 1.02 0.87 4.48 0.61
Table 4
Comparative results of one-step ahead forecasting.
Year Station Metric Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 RMSE 0.24859 0.24984 0.26899 0.23940 0.26942 0.24017 0.24391 0.25106 0.25384 0.24184 0.35952 0.27109 0.23578 0.23740
MASE 1.41322 1.48899 1.54900 1.35937 1.57080 1.38799 1.37987 1.43559 1.48490 1.39003 2.22196 1.57675 1.34807 1.36233
MAPE 0.06503 0.07640 0.07882 0.06241 0.07795 0.06686 0.06454 0.06696 0.07359 0.06650 0.11715 0.07654 0.06308 0.06408
46080 RMSE 0.20169 0.19632 0.20695 0.19720 0.19335 0.19105 0.19069 0.19282 0.19766 0.18500 0.19106 0.18902 0.18554 0.18414
MASE 1.10527 1.12968 1.17885 1.09868 1.08360 1.09580 1.04894 1.04830 1.11799 1.04288 1.09542 1.07708 1.03916 1.03851
MAPE 0.06626 0.07117 0.07397 0.06602 0.06676 0.06724 0.06342 0.06323 0.06959 0.06379 0.06796 0.06671 0.06346 0.06373
46076 RMSE 0.25217 0.24153 0.24307 0.25109 0.27928 0.23477 0.23858 0.26356 0.23739 0.23809 0.23996 0.26750 0.23449 0.23185
MASE 1.54416 1.52243 1.53729 1.52562 1.74492 1.47597 1.44396 1.52848 1.47029 1.43681 1.49563 1.69663 1.45652 1.43968
MAPE 0.06601 0.06942 0.07121 0.06485 0.08040 0.06638 0.06240 0.06441 0.06481 0.06201 0.06700 0.07871 0.06421 0.06336
46001 RMSE 0.36873 0.34045 0.34537 0.33844 0.81509 0.69448 0.97589 0.34544 0.34441 0.33067 0.96677 0.63167 0.34771 0.34502
MASE 1.56618 1.50970 1.59740 1.48264 3.70363 3.21757 4.00501 1.48179 1.52474 1.43095 4.25479 2.68681 1.51357 1.46509
MAPE 0.06433 0.06378 0.07237 0.06080 0.14731 0.14130 0.16693 0.06042 0.06535 0.05978 0.19323 0.10856 0.06293 0.06025
46077 RMSE 0.15979 0.14095 0.16808 0.15833 0.16034 0.14406 0.14781 0.15525 0.16182 0.15118 0.14169 0.17224 0.14215 0.13850
MASE 1.44464 1.33906 1.55592 1.45102 1.52607 1.37957 1.34309 1.40057 1.51554 1.39595 1.34706 1.64269 1.33407 1.31327
MAPE 0.08557 0.08301 0.09690 0.08868 0.09512 0.08645 0.08149 0.08276 0.09416 0.08459 0.08316 0.10418 0.08140 0.08074
2018 46083 RMSE 0.28614 0.27887 0.29846 0.28237 0.31618 0.27573 0.30088 0.31345 0.28228 0.28456 0.33639 0.32099 0.27447 0.27478
MASE 1.75679 1.68696 1.79547 1.69601 1.88296 1.67706 1.69461 1.76731 1.70141 1.67885 2.10664 1.92641 1.66090 1.66456
MAPE 0.06317 0.06091 0.06483 0.06089 0.06850 0.06046 0.06015 0.06200 0.06148 0.05981 0.08013 0.06988 0.05959 0.05965
46080 RMSE 0.28386 0.27161 0.29985 0.28037 0.30056 0.27158 0.27602 0.27777 0.27916 0.26531 0.29272 0.27404 0.26346 0.26442
MASE 1.61853 1.50792 1.74420 1.57514 1.72282 1.53362 1.53800 1.54551 1.55781 1.47812 1.65660 1.54311 1.49826 1.49120
MAPE 0.06867 0.06476 0.07769 0.06763 0.07563 0.06588 0.06519 0.06530 0.06644 0.06263 0.07152 0.06599 0.06404 0.06352
46076 RMSE 0.32167 0.31419 0.31846 0.32591 0.32955 0.30410 0.31127 0.33795 0.31251 0.31303 0.38467 0.31819 0.30592 0.30359
MASE 1.85914 1.80971 1.82240 1.82707 1.91545 1.73778 1.74962 1.84522 1.78497 1.74621 2.31531 1.83647 1.75385 1.73875
MAPE 0.07078 0.07178 0.07177 0.06931 0.07756 0.06669 0.06599 0.06793 0.06999 0.06642 0.09564 0.07096 0.06723 0.06634
46001 RMSE 0.33517 0.34262 0.34765 0.32742 0.35373 0.33649 0.36130 0.32962 0.33458 0.32414 0.34074 0.31714 0.31637 0.31730
MASE 1.61299 1.58561 1.64166 1.56530 1.63453 1.54810 1.58005 1.52961 1.56676 1.50667 1.63780 1.50463 1.49224 1.50513
MAPE 0.07028 0.06761 0.07111 0.06785 0.06964 0.06588 0.06604 0.06534 0.06697 0.06443 0.07205 0.06533 0.06462 0.06538
46077 RMSE 0.16257 0.14334 0.15137 0.15425 0.15722 0.14502 0.14041 0.14685 0.14633 0.14214 0.18607 0.16052 0.14208 0.14175
MASE 1.44480 1.30025 1.40574 1.36399 1.45088 1.32012 1.26208 1.30045 1.33321 1.28223 1.75709 1.47948 1.28024 1.26894
MAPE 0.09123 0.08471 0.09487 0.08612 0.09815 0.08688 0.08194 0.08247 0.08741 0.08245 0.12370 0.10071 0.08258 0.08169
2019 46083 RMSE 0.26634 0.25853 0.26379 0.25730 0.28113 0.25806 0.24998 0.26443 0.25661 0.25360 0.27550 0.26469 0.25534 0.25332
MASE 1.73494 1.68967 1.72040 1.68018 1.83740 1.69057 1.62198 1.71111 1.67387 1.65197 1.81602 1.73164 1.65520 1.64634
MAPE 0.06152 0.06120 0.06244 0.05947 0.06654 0.06053 0.05718 0.05989 0.05986 0.05838 0.06617 0.06219 0.05866 0.05812
46080 RMSE 0.30868 0.51632 0.28875 0.30588 0.63474 0.41289 0.53173 0.36993 0.38593 0.34743 0.45787 0.58912 0.35550 0.29828
MASE 1.76912 2.40389 1.67993 1.72004 2.76016 2.05596 2.26655 1.85837 2.04591 1.87059 2.33119 2.43200 1.86991 1.69222
MAPE 0.06543 0.08710 0.06460 0.06276 0.09819 0.07321 0.08251 0.06371 0.07356 0.06575 0.08644 0.08850 0.06883 0.06182
46076 RMSE 0.27466 0.25971 0.27986 0.26357 0.29387 0.25094 0.25375 0.27155 0.26549 0.25644 0.37865 0.27704 0.25694 0.25462
MASE 1.70686 1.66799 1.79870 1.65280 1.84393 1.59266 1.60124 1.65950 1.67286 1.60180 2.53186 1.76053 1.61079 1.60169
MAPE 0.06484 0.06597 0.06985 0.06351 0.07180 0.06082 0.06069 0.06199 0.06434 0.06070 0.10614 0.06834 0.06153 0.06094
46001 RMSE 0.33843 0.30603 0.35898 0.31758 0.31962 0.30420 0.32039 0.33017 0.30744 0.30923 0.32825 0.41794 0.30846 0.31249
MASE 1.46866 1.33732 1.60022 1.38553 1.39470 1.32748 1.37501 1.43394 1.33229 1.34367 1.42688 1.78485 1.33527 1.35623
MAPE 0.06362 0.05870 0.07354 0.06003 0.06100 0.05856 0.05928 0.06177 0.05873 0.05849 0.06237 0.07950 0.05812 0.05901
46077 RMSE 0.14961 0.13869 0.14813 0.15803 0.15611 0.14181 0.13683 0.16460 0.14219 0.14422 0.15599 0.15663 0.13898 0.13702
MASE 1.49769 1.39929 1.55257 1.50763 1.57759 1.44998 1.39218 1.51969 1.46550 1.44054 1.58623 1.59211 1.40635 1.38545
MAPE 0.07948 0.07405 0.08709 0.07797 0.08522 0.07787 0.07417 0.07757 0.07986 0.07614 0.08581 0.08565 0.07487 0.07383
models. The critical distance is defined by

CD = q_α sqrt( k(k + 1) / (6 N_d) ),    (17)

where q_α represents the critical value obtained from the studentized range statistic divided by √2, k is the number of methodologies, and N_d is the number of time series [60]. In this paper's experiments, q_α equals 3.35. Fig. 5 shows the Nemenyi test results based on the MASE for the three steps-ahead forecasting tasks. The models at the top outperform those at the bottom. Fig. 5 shows that the proposed models are always at the top, which indicates outstanding performance on all prediction horizons. Furthermore, the DESN does not guarantee an improvement over the ESN because of the large dimension of the global states, whereas the proposed edESN outperforms the ESN and DESN significantly, because the edESN makes full use of all layers' states as well as the global states based on the most recent performance. Besides, the DWTESN also outperforms the ESN and DESN significantly because of the multi-scale representation of the raw time series and the extracted features of signal decomposition. The ensemble of BMLP and LGBM achieves competitive performance and outperforms the individual BMLP and LGBM. In addition, the superiority of EELM over PSOELM emphasizes the importance of ensemble learning.

A pair-wise statistical Wilcoxon test is conducted for model comparison. Tables 10–12 summarize the p values of the Wilcoxon test results for the three investigated prediction horizons. p values smaller than 0.05 indicate that the proposed deep learning model outperforms the corresponding method significantly; p values larger than 0.05 are presented in bold. According to Table 10, the proposed edESN only slightly outperforms the LSTM and DWTESN, because the LSTM owns deep representations and sequential modeling capability, and the DWTESN's DWT block assists in temporal feature extraction. The proposed model generally outperforms EELM based on RMSE but significantly outperforms it in terms of MASE and MAPE. As the number of prediction steps grows, the proposed edESN significantly outperforms the other models according to Tables 11 and 12. The dynamic ensemble block, which decides each readout's contributions depending on the most recent
Table 5
Comparative results of two-steps ahead forecasting.
Year Station Metric Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 RMSE 0.31484 0.31485 0.34854 0.31652 0.34547 0.30815 0.31117 0.31429 0.32824 0.30722 0.40708 0.36200 0.29851 0.30020
MASE 1.79500 1.86974 2.06759 1.78544 2.09530 1.80655 1.79325 1.79039 1.94876 1.78151 2.50566 2.10897 1.72875 1.72834
MAPE 0.08402 0.09590 0.10584 0.08139 0.10613 0.08897 0.08601 0.08597 0.09780 0.08737 0.13237 0.10461 0.08352 0.08359
46080 RMSE 0.28523 0.25020 0.26203 0.26779 0.26065 0.24904 0.24955 0.24118 0.24959 0.23343 0.23990 0.27720 0.24573 0.22726
MASE 1.55825 1.44228 1.49604 1.52599 1.48891 1.42990 1.41450 1.36554 1.44241 1.34254 1.38860 1.57225 1.39059 1.30978
MAPE 0.09370 0.08997 0.09198 0.09412 0.09270 0.08860 0.08752 0.08439 0.09037 0.08301 0.08657 0.09850 0.08558 0.08162
46076 RMSE 0.34129 0.30539 0.31627 0.34718 0.34149 0.30028 0.30199 0.32733 0.30229 0.30160 0.30691 0.33462 0.30214 0.29396
MASE 2.05920 1.96751 2.06328 2.11686 2.16918 1.89975 1.85423 1.94615 1.91006 1.85172 1.92815 2.14245 1.89375 1.83622
MAPE 0.08911 0.09457 0.10162 0.09475 0.10116 0.08749 0.08253 0.08525 0.08624 0.08274 0.08798 0.10144 0.08574 0.08291
46001 RMSE 0.45213 0.49636 0.41580 0.41458 0.91288 0.73662 1.17003 0.40931 0.41365 0.39035 0.90476 0.77852 0.57386 0.40253
MASE 1.99767 2.27690 1.96898 1.87768 4.45944 3.50590 4.86670 1.84088 1.89226 1.76284 4.13822 3.30178 2.56493 1.75901
MAPE 0.08368 0.09301 0.08634 0.07809 0.19056 0.14622 0.21325 0.07638 0.08089 0.07430 0.18814 0.14912 0.10578 0.07346
46077 RMSE 0.26055 0.21408 0.24421 0.24959 0.23185 0.22360 0.22296 0.22312 0.23882 0.22031 0.21422 0.27499 0.21919 0.21951
MASE 2.25608 2.00926 2.23961 2.20093 2.13523 2.08302 2.04010 2.07168 2.18690 2.03165 2.01207 2.60961 2.02876 2.00179
MAPE 0.13211 0.12159 0.13704 0.12660 0.13387 0.12795 0.12332 0.12479 0.13378 0.12367 0.12328 0.16703 0.12242 0.11984
2018 46083 RMSE 0.39545 0.36834 0.39724 0.37753 0.39995 0.36445 0.38864 0.39619 0.37264 0.36836 0.39143 0.38412 0.36520 0.35773
MASE 2.35462 2.13885 2.31919 2.21950 2.33970 2.15266 2.16492 2.28394 2.17023 2.13855 2.31126 2.25675 2.12839 2.10026
MAPE 0.08376 0.07599 0.08237 0.07887 0.08435 0.07722 0.07607 0.08010 0.07780 0.07589 0.08399 0.08160 0.07551 0.07479
46080 RMSE 0.38727 0.35886 0.37509 0.39034 0.39966 0.35772 0.36900 0.35207 0.36565 0.34496 0.35803 0.44810 0.34418 0.34398
MASE 2.19120 1.97424 2.09774 2.17468 2.24421 1.98514 2.01136 1.96452 2.01879 1.89627 2.00341 2.46715 1.90201 1.88792
MAPE 0.09511 0.08436 0.09162 0.09126 0.09719 0.08626 0.08590 0.08470 0.08656 0.08157 0.08758 0.10494 0.08197 0.08157
46076 RMSE 0.44807 0.40196 0.41086 0.46117 0.42881 0.40790 0.41349 0.42348 0.41097 0.40444 0.46457 0.41722 0.40860 0.39437
MASE 2.51549 2.27779 2.32113 2.54757 2.47208 2.28630 2.27419 2.32469 2.31069 2.24050 2.72594 2.38461 2.28930 2.22194
MAPE 0.09550 0.09027 0.09023 0.09461 0.09852 0.08757 0.08548 0.08611 0.08980 0.08488 0.11006 0.09293 0.08816 0.08559
46001 RMSE 0.44070 0.42270 0.43614 0.42295 0.42567 0.40989 0.45837 0.42007 0.41427 0.40798 0.43278 0.40738 0.40229 0.39930
MASE 2.09267 1.96368 2.06328 2.01670 1.98095 1.91059 2.00479 1.91795 1.94707 1.88423 2.05560 1.90177 1.86494 1.86284
MAPE 0.09121 0.08406 0.08983 0.08875 0.08515 0.08183 0.08281 0.08139 0.08370 0.08024 0.09010 0.08241 0.07991 0.08059
46077 RMSE 0.25749 0.21979 0.23524 0.23830 0.23335 0.22219 0.21614 0.22127 0.22227 0.21737 0.25162 0.23630 0.21778 0.21681
MASE 2.27741 1.95643 2.11805 2.09811 2.14671 2.00929 1.94914 1.97376 2.01415 1.94656 2.33381 2.16339 1.93681 1.91677
MAPE 0.14460 0.12645 0.13671 0.13427 0.14466 0.13315 0.12798 0.12717 0.13220 0.12610 0.15994 0.14658 0.12538 0.12341
2019 46083 RMSE 0.34205 0.32589 0.34274 0.33066 0.34437 0.32553 0.31777 0.32880 0.32545 0.31856 0.35192 0.32939 0.32684 0.31991
MASE 2.20782 2.09979 2.29670 2.12409 2.24776 2.08186 2.01686 2.11775 2.09082 2.04513 2.29267 2.11912 2.09728 2.04874
MAPE 0.07812 0.07574 0.08754 0.07533 0.08275 0.07418 0.07131 0.07413 0.07476 0.07230 0.08375 0.07559 0.07409 0.07229
46080 RMSE 0.40252 0.68321 0.52878 0.39960 0.53417 0.67879 0.79497 0.42850 0.44957 0.41469 0.59681 0.90229 0.40276 0.35740
MASE 2.36706 3.08712 2.78291 2.30240 2.78262 3.00095 3.22740 2.28767 2.47436 2.29709 2.74571 3.52865 2.24952 2.05108
MAPE 0.08951 0.11204 0.10472 0.08448 0.09968 0.10633 0.11538 0.08094 0.08976 0.08214 0.10097 0.12909 0.08394 0.07662
46076 RMSE 0.37270 0.34201 0.37121 0.35299 0.36861 0.32889 0.33015 0.34080 0.34649 0.33043 0.42082 0.37647 0.33185 0.32285
MASE 2.36057 2.17082 2.39676 2.24527 2.33000 2.07676 2.10223 2.12825 2.20562 2.09032 2.78764 2.40297 2.10340 2.05205
MAPE 0.08922 0.08394 0.09269 0.08696 0.08920 0.07981 0.07973 0.08026 0.08497 0.07963 0.11433 0.09324 0.08063 0.07853
46001 RMSE 0.43079 0.37361 0.43306 0.41151 0.38752 0.37700 0.39613 0.38832 0.38079 0.37252 0.39907 0.53268 0.37616 0.37388
MASE 1.87034 1.59329 1.90369 1.75844 1.66006 1.62293 1.68883 1.67076 1.62784 1.58529 1.71513 2.26199 1.60785 1.59921
MAPE 0.08186 0.07112 0.08719 0.07575 0.07355 0.07265 0.07281 0.07294 0.07341 0.07036 0.07529 0.10166 0.07065 0.07035
46077 RMSE 0.23271 0.21333 0.22099 0.23835 0.22553 0.21666 0.21223 0.23929 0.21275 0.21739 0.22528 0.23146 0.21061 0.20729
MASE 2.31550 2.12702 2.26445 2.27363 2.29321 2.19310 2.14223 2.25746 2.14983 2.15115 2.25231 2.34183 2.12160 2.08071
MAPE 0.12295 0.11297 0.12667 0.11782 0.12343 0.11864 0.11385 0.11578 0.11609 0.11330 0.12069 0.12637 0.11309 0.11062
Fig. 5. Nemenyi test results using MASE for the predictions of (a) One-step ahead (𝑝 value = 9.70e−28), (b) Two-steps ahead (𝑝 value = 3.99e−25) and (c) Four-steps ahead (𝑝
value = 1.26e−27). The CD is 5.12, and 𝑞𝛼 is 3.35.
Table 6
Comparative results of four-steps ahead forecasting.
Year Station Metric Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 RMSE 0.46820 0.44416 0.48421 0.44575 0.48144 0.46495 0.46813 0.46051 0.46568 0.44337 0.51745 0.46935 0.43842 0.43540
MASE 2.69001 2.58089 2.85550 2.60299 2.90298 2.75321 2.72345 2.59653 2.76388 2.56646 3.13821 2.76310 2.55611 2.52160
MAPE 0.12751 0.13106 0.14410 0.12795 0.14785 0.13960 0.13425 0.12767 0.13949 0.12795 0.16494 0.14024 0.12801 0.12421
46080 RMSE 0.45008 0.36058 0.39075 0.41762 0.36579 0.38035 0.36489 0.35493 0.35146 0.33751 0.36536 0.42087 0.35952 0.33160
MASE 2.38202 2.04278 2.15307 2.22718 2.10000 2.16739 2.04188 2.03132 1.98788 1.90981 2.07136 2.36782 2.02325 1.87967
MAPE 0.14044 0.12445 0.12776 0.13352 0.13075 0.13286 0.12722 0.12601 0.12131 0.11737 0.12702 0.14698 0.12287 0.11493
46076 RMSE 0.51370 0.44296 0.47527 0.52331 0.46419 0.44506 0.45151 0.44297 0.45519 0.43259 0.43739 0.49077 0.43769 0.41337
MASE 3.09443 2.78515 2.98348 3.12832 2.98067 2.81167 2.72624 2.71861 2.84612 2.66426 2.74233 3.16122 2.74031 2.55846
MAPE 0.13445 0.13132 0.13587 0.14220 0.14118 0.13144 0.12237 0.12078 0.13123 0.12088 0.12715 0.15315 0.12604 0.11662
46001 RMSE 0.63409 0.67511 0.57900 0.58567 0.72258 1.06158 1.50565 0.55179 0.56835 0.53342 0.91976 0.99061 0.58242 0.53703
MASE 2.92952 3.16270 2.75539 2.73014 3.28712 5.04033 6.43734 2.57775 2.66947 2.49477 4.34897 4.35922 2.75580 2.44696
MAPE 0.12493 0.13020 0.11797 0.11910 0.13263 0.21782 0.28800 0.11023 0.11794 0.10880 0.19486 0.20283 0.12281 0.10467
46077 RMSE 0.43115 0.36030 0.39345 0.42980 0.36552 0.38106 0.32987 0.35039 0.38278 0.35462 0.34912 0.43951 0.36340 0.35065
MASE 3.58472 3.16413 3.49941 3.50680 3.29386 3.32805 3.08301 3.23432 3.36404 3.18619 3.13871 4.04530 3.19523 3.07799
MAPE 0.20880 0.18693 0.21850 0.20479 0.20295 0.20106 0.19062 0.19838 0.20513 0.19411 0.19193 0.25634 0.19204 0.18475
2018 46083 RMSE 0.60370 0.53671 0.55847 0.56728 0.55890 0.54134 0.57994 0.54315 0.53251 0.52225 0.54217 0.58846 0.53868 0.53145
MASE 3.57898 3.07359 3.26544 3.34246 3.24003 3.15905 3.10895 3.19995 3.10640 3.07111 3.17879 3.45803 3.11763 3.03726
MAPE 0.12567 0.10677 0.11382 0.11724 0.11459 0.11142 0.10792 0.11120 0.11006 0.10754 0.11267 0.12455 0.10906 0.10614
46080 RMSE 0.60021 0.55642 0.57241 0.59087 0.54953 0.54225 0.54633 0.52593 0.54433 0.51790 0.53270 0.70023 0.51409 0.51585
MASE 3.39093 3.06204 3.24157 3.24070 3.04798 2.99861 2.96097 2.91925 2.99285 2.83147 2.90441 3.79565 2.78958 2.78795
MAPE 0.14859 0.12854 0.14201 0.13945 0.13132 0.12811 0.12557 0.12391 0.12520 0.11917 0.12553 0.15932 0.11868 0.11931
46076 RMSE 0.67829 0.59256 0.61151 0.68775 0.61504 0.60141 0.60509 0.61267 0.58504 0.58142 0.63062 0.61291 0.58880 0.56868
MASE 3.83255 3.29776 3.39542 3.80923 3.56527 3.42221 3.29266 3.39754 3.29367 3.21396 3.60546 3.47052 3.29294 3.16416
MAPE 0.14575 0.12680 0.12825 0.13820 0.14075 0.13281 0.12309 0.12558 0.12800 0.12153 0.14160 0.13501 0.12766 0.12228
46001 RMSE 0.66276 0.59891 0.61204 0.62649 0.60683 0.65174 0.66623 0.57841 0.58269 0.57124 0.61285 0.59539 0.57678 0.57444
MASE 3.18283 2.79519 2.90029 3.01912 2.86907 2.95094 2.95323 2.74912 2.78305 2.71544 2.92614 2.83482 2.74154 2.71005
MAPE 0.13910 0.11803 0.12448 0.13171 0.12260 0.12356 0.12148 0.11762 0.11973 0.11606 0.12720 0.12202 0.11743 0.11648
46077 RMSE 0.41965 0.36531 0.36860 0.38326 0.37376 0.35813 0.35297 0.36878 0.36085 0.35786 0.37856 0.38555 0.35579 0.35273
MASE 3.70604 3.20577 3.31791 3.37942 3.40333 3.24569 3.18266 3.31013 3.25224 3.20936 3.49388 3.52404 3.18724 3.13261
MAPE 0.23654 0.20647 0.22165 0.21573 0.23046 0.22030 0.21483 0.21634 0.22010 0.21307 0.24073 0.24331 0.21155 0.20663
2019 46083 RMSE 0.48961 0.46144 0.46707 0.48343 0.47584 0.46263 0.44878 0.45914 0.45684 0.44857 0.47239 0.47127 0.46058 0.44444
MASE 3.16465 2.97860 3.12005 3.09753 3.12213 2.98955 2.87063 2.95578 2.98851 2.89868 3.06199 3.04885 2.95466 2.84624
MAPE 0.11365 0.10703 0.11649 0.11101 0.11439 0.10770 0.10299 0.10480 0.10828 0.10358 0.11093 0.11015 0.10523 0.10142
46080 RMSE 0.61977 0.98762 0.61099 0.62750 0.77130 0.82789 1.67215 0.58588 0.63179 0.58144 0.81772 1.47990 0.61912 0.54462
MASE 3.64709 4.48141 3.49540 3.57802 4.03184 3.92267 6.00303 3.27948 3.61704 3.34017 3.90203 5.83426 3.42232 3.14437
MAPE 0.13889 0.16177 0.13165 0.13013 0.14813 0.14429 0.21922 0.11855 0.13582 0.12274 0.14449 0.21480 0.12863 0.11747
46076 RMSE 0.56820 0.50861 0.55023 0.53483 0.54439 0.50828 0.49902 0.49615 0.51164 0.48691 0.54625 0.51474 0.49158 0.47271
MASE 3.67144 3.18791 3.48720 3.37748 3.40950 3.19792 3.20143 3.16911 3.24121 3.08933 3.59672 3.28330 3.12652 2.98899
MAPE 0.13883 0.12040 0.13293 0.12840 0.12977 0.12168 0.12207 0.12159 0.12402 0.11803 0.14284 0.12618 0.11923 0.11343
46001 RMSE 0.64707 0.55637 0.60379 0.60187 0.59668 0.58513 0.59368 0.57548 0.56282 0.55310 0.57686 0.71994 0.55617 0.54349
MASE 2.84639 2.37702 2.59552 2.62117 2.59268 2.52655 2.51350 2.43231 2.40169 2.35109 2.50974 3.10070 2.39822 2.32782
MAPE 0.12527 0.10552 0.11698 0.11652 0.11561 0.11368 0.10815 0.10570 0.10886 0.10439 0.11080 0.13978 0.10592 0.10256
46077 RMSE 0.36666 0.34853 0.34132 0.35412 0.34898 0.33918 0.35188 0.35488 0.33995 0.33628 0.34314 0.39346 0.33038 0.32534
MASE 3.66819 3.46019 3.47041 3.47971 3.52794 3.43001 3.37723 3.42924 3.48298 3.35887 3.38891 3.92587 3.27252 3.20385
MAPE 0.19289 0.17944 0.19226 0.18252 0.19126 0.18568 0.17778 0.17822 0.18897 0.17857 0.17934 0.21129 0.17343 0.16878
performance, may be responsible for the significant improvement. The most recent performance is a direct reflection of which readout is appropriate for the future phases.

The scatter-plots of the different horizons' forecasts against the raw data at station 46083 for the different years, as well as the related coefficients of determination R², are shown in Fig. 6. The larger the R², the more precise the forecasts. The R² decreases as the number of prediction steps increases. For the one-step ahead forecast, the R² of all years is larger than 0.96, which further demonstrates the suitability of the proposed ESN model.

We only present the forecasts and raw observations at station 46083 for 2017, 2018, and 2019 for brevity. Fig. 7 visualizes the comparisons between the proposed model's forecasts and the ground truth. Fig. 7 demonstrates that the proposed DRPedESN precisely captures and anticipates the future variations, trends, and cycles on the three prediction horizons.

5.6. Ablation study

The comparative study demonstrates the superiority of the proposed model over the baselines. This section researches the impact of each component in the proposed model and the effect of time lags.

5.6.1. Analysis of each component

An ablation study is conducted to investigate the necessity of each component in the proposed model. We assess the following four variants on three prediction horizons:

• DnedESN: the dynamic ensemble edESN without reservoir pruning and without the direct link connecting each reservoir layer to the input layer.
• DedESN: the dynamic ensemble edESN without reservoir pruning.
• MRPedESN: the edESN with reservoir pruning, which utilizes the mean as the ensemble operator.
Table 7
Slope comparisons of one-step ahead forecasting.
Year Station Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 0.98183 0.98176 0.97855 0.98327 0.97862 0.98296 0.98239 0.98135 0.98118 0.98276 0.95516 0.97736 0.98353 0.98330
46080 0.98031 0.98140 0.97957 0.98106 0.98220 0.98256 0.98257 0.98193 0.98115 0.98350 0.98219 0.98272 0.98335 0.98354
46076 0.98381 0.98556 0.98516 0.98469 0.98066 0.98604 0.98550 0.98284 0.98597 0.98607 0.98540 0.98165 0.98603 0.98636
46001 0.97085 0.97561 0.97543 0.97535 0.88922 0.91970 0.82532 0.97411 0.97492 0.97662 0.85847 0.90999 0.97427 0.97433
46077 0.96907 0.97562 0.96533 0.97032 0.96840 0.97463 0.97332 0.97045 0.96798 0.97201 0.97544 0.96416 0.97547 0.97663
2018 46083 0.97489 0.97580 0.97240 0.97521 0.96929 0.97640 0.97202 0.96941 0.97524 0.97490 0.96496 0.96897 0.97675 0.97666
46080 0.97405 0.97664 0.97140 0.97466 0.97167 0.97619 0.97535 0.97503 0.97509 0.97733 0.97209 0.97571 0.97758 0.97739
46076 0.97433 0.97550 0.97500 0.97346 0.97312 0.97687 0.97593 0.97153 0.97562 0.97562 0.96329 0.97472 0.97660 0.97696
46001 0.96629 0.96597 0.96418 0.96760 0.96328 0.96645 0.96286 0.96806 0.96779 0.96964 0.96515 0.96975 0.96988 0.96980
46077 0.97588 0.98111 0.97898 0.97888 0.97724 0.98075 0.98213 0.98022 0.98030 0.98144 0.96464 0.97654 0.98143 0.98151
2019 46083 0.97728 0.97856 0.97763 0.97870 0.97453 0.97848 0.97989 0.97743 0.97874 0.97922 0.97540 0.97740 0.97894 0.97929
46080 0.97687 0.93928 0.97997 0.97847 0.90759 0.96070 0.92932 0.97079 0.96748 0.97495 0.94980 0.91176 0.96878 0.97916
46076 0.97887 0.98153 0.97893 0.98035 0.97700 0.98267 0.98191 0.97953 0.98082 0.98207 0.96005 0.97854 0.98141 0.98176
46001 0.96856 0.97469 0.96514 0.97213 0.97257 0.97444 0.97198 0.97042 0.97391 0.97390 0.97058 0.95002 0.97375 0.97318
46077 0.97265 0.97657 0.97329 0.96939 0.97110 0.97517 0.97696 0.96739 0.97502 0.97477 0.96948 0.96943 0.97617 0.97685
Average 0.97504 0.97504 0.97473 0.97624 0.96377 0.97293 0.96383 0.97470 0.97608 0.97765 0.96081 0.96458 0.97760 0.97845
Median 0.97489 0.97657 0.97543 0.97535 0.97257 0.97640 0.97593 0.97411 0.97524 0.97662 0.96515 0.97472 0.97675 0.97739
Minimum 0.96629 0.93928 0.96418 0.96760 0.88922 0.91970 0.82532 0.96739 0.96748 0.96964 0.85847 0.90999 0.96878 0.96980
Maximum 0.98381 0.98556 0.98516 0.98469 0.98220 0.98604 0.98550 0.98284 0.98597 0.98607 0.98540 0.98272 0.98603 0.98636
Table 8
Slope comparisons of two-steps ahead forecasting.
Year Station Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 0.97086 0.97071 0.96412 0.97274 0.96463 0.97186 0.97115 0.97074 0.96851 0.97216 0.94584 0.96056 0.97346 0.97324
46080 0.96048 0.97042 0.96787 0.96518 0.96675 0.97071 0.97039 0.97157 0.96963 0.97337 0.97200 0.96325 0.97051 0.97484
46076 0.97031 0.97675 0.97474 0.97142 0.97147 0.97717 0.97663 0.97325 0.97713 0.97748 0.97619 0.97110 0.97692 0.97815
46001 0.95618 0.95229 0.96248 0.96304 0.87456 0.90679 0.74244 0.96345 0.96438 0.96753 0.86824 0.86548 0.92886 0.96507
46077 0.91792 0.94311 0.92559 0.92554 0.93328 0.93819 0.93975 0.93915 0.92908 0.93995 0.94388 0.91327 0.94145 0.94056
2018 46083 0.95217 0.95763 0.95063 0.95568 0.95020 0.95895 0.95326 0.95059 0.95645 0.95748 0.95252 0.95458 0.95879 0.96029
46080 0.95169 0.95967 0.95365 0.95478 0.94947 0.95841 0.95599 0.95977 0.95717 0.96171 0.95858 0.93028 0.96202 0.96212
46076 0.95017 0.95968 0.95778 0.94728 0.95436 0.95806 0.95719 0.95563 0.95801 0.95967 0.94725 0.95608 0.95809 0.96101
46001 0.94159 0.94738 0.94380 0.94493 0.94558 0.94925 0.93926 0.94804 0.94943 0.95132 0.94439 0.95017 0.95168 0.95229
46077 0.93945 0.95497 0.94954 0.94753 0.94907 0.95416 0.95718 0.95481 0.95395 0.95610 0.93889 0.94823 0.95592 0.95625
2019 46083 0.96253 0.96573 0.96258 0.96476 0.96153 0.96563 0.96739 0.96511 0.96567 0.96723 0.95985 0.96503 0.96545 0.96689
46080 0.96069 0.89533 0.93362 0.96440 0.93837 0.89395 0.84798 0.96170 0.95698 0.96513 0.90825 0.81280 0.96133 0.97001
46076 0.96111 0.96892 0.96466 0.96470 0.96350 0.97028 0.96932 0.96774 0.96730 0.97025 0.95187 0.95996 0.96908 0.97086
46001 0.94906 0.96237 0.94982 0.95608 0.96017 0.96074 0.95742 0.95946 0.95991 0.96229 0.95694 0.91893 0.96152 0.96181
46077 0.93379 0.94387 0.93867 0.92920 0.93705 0.94120 0.94373 0.92889 0.94340 0.94164 0.93610 0.93216 0.94468 0.94649
Average 0.95187 0.95525 0.95330 0.95515 0.94800 0.95169 0.93661 0.95799 0.95847 0.96155 0.94405 0.93346 0.95865 0.96266
Median 0.95217 0.95967 0.95365 0.95608 0.95020 0.95841 0.95718 0.95977 0.95801 0.96229 0.94725 0.95017 0.96133 0.96212
Minimum 0.91792 0.89533 0.92559 0.92554 0.87456 0.89395 0.74244 0.92889 0.92908 0.93995 0.86824 0.81280 0.92886 0.94056
Maximum 0.97086 0.97675 0.97474 0.97274 0.97147 0.97717 0.97663 0.97325 0.97713 0.97748 0.97619 0.97110 0.97692 0.97815
Table 9
Slope comparisons of four-steps ahead forecasting.
Year Station Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 0.93556 0.94028 0.93157 0.94057 0.93141 0.93570 0.93364 0.93611 0.93509 0.94082 0.91685 0.93483 0.94185 0.94353
46080 0.90063 0.93584 0.93518 0.91334 0.93326 0.93089 0.93927 0.93803 0.93862 0.94357 0.93376 0.91848 0.93575 0.94566
46076 0.93245 0.95216 0.94573 0.92824 0.94685 0.94980 0.94719 0.95223 0.94812 0.95451 0.95162 0.93788 0.95155 0.95694
46001 0.91382 0.90944 0.92943 0.92415 0.89748 0.80887 0.61493 0.93340 0.93090 0.93924 0.85072 0.80420 0.92723 0.93759
46077 0.77689 0.83356 0.79845 0.78064 0.82602 0.81384 0.86985 0.84638 0.81138 0.83922 0.85055 0.77055 0.83605 0.84572
2018 46083 0.88892 0.90950 0.90265 0.89860 0.90128 0.90883 0.89494 0.90571 0.90930 0.91284 0.90817 0.89470 0.90934 0.91101
46080 0.88380 0.89862 0.88892 0.88192 0.90099 0.90222 0.90409 0.90911 0.90271 0.91245 0.90810 0.83634 0.91565 0.91378
46076 0.88570 0.91218 0.90949 0.88845 0.90460 0.90721 0.90752 0.90674 0.91298 0.91580 0.90134 0.90366 0.91163 0.91796
46001 0.86768 0.89369 0.88656 0.87846 0.88862 0.86807 0.86745 0.90072 0.89797 0.90314 0.88781 0.89240 0.89974 0.90197
46077 0.83878 0.87171 0.86804 0.85928 0.86398 0.87633 0.88151 0.86904 0.87430 0.87639 0.85943 0.85771 0.87774 0.87997
2019 46083 0.92326 0.93143 0.92746 0.92303 0.92572 0.92993 0.93454 0.93202 0.93192 0.93493 0.92721 0.92759 0.93089 0.93573
46080 0.90692 0.78750 0.91410 0.90944 0.86559 0.83909 0.61216 0.92510 0.91094 0.92727 0.83940 0.66794 0.90852 0.93072
46076 0.90968 0.93071 0.91942 0.91766 0.92151 0.92912 0.92958 0.93047 0.92942 0.93534 0.92006 0.92504 0.93232 0.93789
46001 0.88504 0.91567 0.90203 0.89555 0.90594 0.90469 0.90500 0.91373 0.91177 0.91785 0.91037 0.85487 0.91584 0.91931
46077 0.83555 0.84582 0.84783 0.83766 0.84038 0.84980 0.83665 0.83653 0.84926 0.85409 0.84709 0.79514 0.85975 0.86437
Average 0.88565 0.89788 0.90046 0.89180 0.89691 0.89029 0.86522 0.90902 0.90631 0.91383 0.89416 0.86142 0.91026 0.91614
Median 0.88892 0.90950 0.90949 0.89860 0.90128 0.90469 0.90409 0.91373 0.91177 0.91785 0.90810 0.89240 0.91565 0.91931
Minimum 0.77689 0.78750 0.79845 0.78064 0.82602 0.80887 0.61216 0.83653 0.81138 0.83922 0.83940 0.66794 0.83605 0.84572
Maximum 0.93556 0.95216 0.94573 0.94057 0.94685 0.94980 0.94719 0.95223 0.94812 0.95451 0.95162 0.93788 0.95155 0.95694
Table 10
Wilcoxon test results of one-step ahead forecasting.
Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN
RMSE 0.0001 0.0054 0.0002 0.0004 0.0001 0.0151 0.0026 0.0001 0.0006 0.0215 0.0001 0.0001 0.0730
MASE 0.0001 0.0003 0.0001 0.0001 0.0001 0.0015 0.0067 0.0001 0.0001 0.0946 0.0001 0.0001 0.0946
MAPE 0.0001 0.0002 0.0001 0.0012 0.0001 0.0004 0.1514 0.0004 0.0001 0.4887 0.0001 0.0001 0.1354
Table 11
Wilcoxon test results of two-steps ahead forecasting.
Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN
RMSE 0.0001 0.0004 0.0001 0.0001 0.0001 0.0001 0.0003 0.0001 0.0001 0.0353 0.0001 0.0001 0.0009
MASE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0001 0.0001 0.0006 0.0001 0.0001 0.0001
MAPE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0009 0.0001 0.0001 0.0084 0.0001 0.0001 0.0006
Table 12
Wilcoxon test results of four-steps ahead forecasting.
Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN
RMSE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0003 0.0001 0.0001 0.0067 0.0001 0.0001 0.0001
MASE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
MAPE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0009 0.0001 0.0001 0.0001
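The statistical comparisons above can be reproduced in outline with NumPy and SciPy. With q_α = 3.35, k = 14 compared methods, and N_d = 15 station-year series (values inferred from the experiment; they reproduce the reported CD of 5.12), Eq. (17) gives the Nemenyi critical distance, and scipy.stats.wilcoxon performs the pair-wise signed-rank test. The per-series scores below are placeholders.

```python
import numpy as np
from scipy.stats import wilcoxon

def critical_distance(q_alpha, k, n_d):
    """Eq. (17): Nemenyi critical distance for k methods on n_d series."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_d))

cd = critical_distance(q_alpha=3.35, k=14, n_d=15)   # ~5.12, as reported

# Pair-wise Wilcoxon test between two models' per-series MASE scores.
mase_a = np.array([1.36, 1.04, 1.44, 1.47, 1.31])    # placeholder scores
mase_b = np.array([1.41, 1.10, 1.54, 1.57, 1.44])    # placeholder scores
stat, p = wilcoxon(mase_a, mase_b)
print(f"CD = {cd:.2f}, Wilcoxon p = {p:.4f}")        # p < 0.05 => significant
```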
horizons, emphasizing the importance of the direct link [61]. The Lag = 1 Lag = 4 Lag = 8 Lag = 16
clean information is transmitted to guide the reservoir states creation RMSE 0.32 ± 0.07 0.62 ± 0.31 0.71 ± 0.24 0.96 ± 0.33
in deep structures via the direct link connecting the input layer to MASE 1.90 ± 0.23 3.94 ± 1.85 4.69 ± 1.76 6.52 ± 2.04
MAPE 0.09 ± 0.02 0.21 ± 0.13 0.27 ± 0.14 0.39 ± 0.17
each reservoir layer. Third, across all prediction horizons, the DR-
PedESN outperforms the MRPedESN, demonstrating the superiority of
the dynamic ensemble module over the static ensemble. Unlike the Table 15
Comparative results of different time lags for four-steps ahead forecasting.
static ensemble, the dynamic ensemble adjusts the combination weights
Lag = 1 Lag = 4 Lag = 8 Lag = 16
based on each forecasting candidate’s most recent performance. Such
an update facilitates capturing the time series evolving properties and RMSE 0.46 ± 0.09 0.70 ± 0.17 0.87 ± 0.26 1.11 ± 0.40
MASE 2.79 ± 0.38 4.67 ± 1.48 5.87 ± 1.93 7.56 ± 2.52
filtering out the inferior models by assigning low weights. Finally,
MAPE 0.13 ± 0.03 0.26 ± 0.15 0.35 ± 0.17 0.46 ± 0.20
the DRPedESN outperforms the DedESN on one-step, and two-steps
ahead forecasts while maintaining similar performance on four-steps
ahead forecasting. As a result, we argue that reservoir pruning is
necessary, but it has the smallest impact on forecasting performance edESN on hourly wave energy time series. Therefore, the ensemble deep
when compared to other factors. structure is proven to boost the performance of a deep neural network.
5.6.2. Analysis of time lags

As a recurrent neural network, the ESN processes the wave time series hour by hour. The original wave time series is transformed into the ESN's hidden states, and then the output layer is trained using the hidden states. The ESN generally utilizes only the latest hidden states for decision making and output layer training. This section analyzes the performance of utilizing different time lags. Tables 13–15 summarize the average performance and the respective standard deviations for different time lags. Utilizing only the latest information achieves the best average performance because the ESN recurrently processes the wave time series, so the latest hidden states already summarize the relevant history. Furthermore, utilizing more lags increases the input dimension of the output layer, which may deteriorate the performance.
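A minimal sketch of the lag construction analyzed here is given below: the readout input is formed by stacking the current hidden state with the previous lag - 1 states. The function name and shapes are our own illustration, not code from the study.

```python
import numpy as np

def lagged_states(H, lag):
    """Stack the latest `lag` reservoir states as readout input.

    H   : (T, N) matrix of hidden states over T hours.
    lag : number of consecutive states fed to the output layer.
    Returns a (T - lag + 1, lag * N) design matrix; lag = 1 recovers
    the usual readout trained on the latest state only.
    """
    T, _ = H.shape
    return np.hstack([H[lag - 1 - k : T - k] for k in range(lag)])
```

With lag = 16 the readout input becomes sixteen times wider, which is consistent with the degradation reported in Tables 13–15: the extra columns enlarge the least-squares problem without adding information beyond what the recurrent states already encode.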
6. Discussion: findings, limitations and future directions

The ensemble deep architecture within one neural network is less explored and researched in the literature on wave energy forecasting. The experimental studies have demonstrated the appropriateness of the edESN on hourly wave energy time series. Therefore, the ensemble deep structure is proven to boost the performance of a deep neural network.

Fig. 6. Scatter-plots of predictions and raw observations at station 46083 of year 2017, 2018 and 2019. The coefficients of determination R² are displayed.

Unlike traditional deep neural networks with a single output layer used in the energy forecasting literature, the proposed edESN has multiple readouts to avoid overfitting. The various readout layers of each reservoir are unaffected by the huge size of the global states and compensate for the poor performance of particular readouts. Although ensemble learning improves performance [62], designing the combination scheme for multiple outputs within a single deep network is crucial. The combination scheme should assign large weights to the readout layer that is likely to forecast the next step precisely. However, the evolving and chaotic nature of the significant wave height time series imposes critical challenges on determining the precise readout layers for the future. In time series, the observations change continuously: observations with similar time indices are more likely to share patterns than those with time indices far apart on the time axis. As a result, the most recent forecasting accuracy provides trustworthy direction and insights for the construction of the dynamic ensemble scheme. Based on this intuition, this work proposes to use the most recent predictive performance to estimate the dynamic ensemble weights. These ensemble weights differ across time steps, creating an evolving ensemble module. The diversity of all reservoir layers ensures the possibility of satisfactory forecasts under various scenarios. The dynamic ensemble model can determine the suitable combination of outputs considering the latest performance under different scenarios. When there are extreme heights, the dynamic ensemble module assists in assigning large weights to the layers that generate accurate estimations for the latest steps. The ablation study compares the dynamic ensemble with a static ensemble approach, the mean operator. Although some forecasting literature claims that the simple equal-weight combination is a reliable option [63], the ablation study's results prove that the dynamic ensemble is more suitable for significant wave height time series.
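One simple way to realize such performance-based weighting is sketched below: each readout's absolute errors over a short trailing window are turned into normalized weights, so the weights evolve as the window slides forward. This softmax-style rule is our illustration; the exact weighting formula of the proposed model is the one defined in the methodology section.

```python
import numpy as np

def dynamic_weights(recent_abs_errors):
    """Combination weights from each readout's recent accuracy.

    recent_abs_errors : (K, w) array of absolute errors of K readouts
    over the last w steps. A lower recent error yields a larger weight;
    the weights are recomputed at every forecast origin, so inferior
    readouts are filtered out with near-zero weights as time evolves.
    """
    score = -recent_abs_errors.mean(axis=1)   # negate: small error is good
    w = np.exp(score - score.max())           # numerically stable softmax
    return w / w.sum()

# Combined forecast: weights (K,) dotted with the K candidate forecasts.
# y_hat = dynamic_weights(errs) @ candidate_forecasts
```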
Besides the dynamic ensemble block, the reservoir pruning strategy is necessary for accurate forecasts. Randomized neural networks have achieved significant success in energy forecasting [47]. However, one defect of randomized neural networks is the naive hidden states generated by the untrained weights, which degrade the representation ability. In general, randomized neural networks compensate for this defect with a large number of hidden nodes. This paper proposes a straightforward linear pruning strategy to filter out the shallow layers' inferior random features and propagate the more desirable features to the deep layers. The use of a linear pruning technique is justified for two reasons. First, the linear readout layer can only learn linear relationships between the reservoir and the outputs. Second, linear pruning is not computationally intensive. After linear pruning, the remaining features are fed into the corresponding readout and the subsequent hidden layers. The ablation study shows that employing the dynamic ensemble and the pruning method can marginally increase performance. Furthermore, the pruning approach is simple to integrate with other deep randomized neural networks.
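The sketch below shows one linear criterion of the kind described above: reservoir features are ranked by the absolute correlation between each feature and the training target, and only the strongest fraction is kept for the readout and for propagation to the next layer. The criterion and the keep ratio are illustrative choices, not the study's exact rule.

```python
import numpy as np

def linear_prune(H, y, keep_ratio=0.5):
    """Keep the reservoir features with the strongest linear relation to y.

    H : (T, N) reservoir states, y : (T,) training targets.
    Returns the pruned state matrix and the retained column indices.
    """
    # Absolute Pearson correlation of each reservoir feature with the target.
    Hc = H - H.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Hc.T @ yc) / (np.linalg.norm(Hc, axis=0)
                                * np.linalg.norm(yc) + 1e-12)
    keep = np.argsort(corr)[::-1][: max(1, int(keep_ratio * H.shape[1]))]
    return H[:, keep], keep
```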
Although the current study demonstrates the superiority of the proposed edESN on significant wave height forecasting, there are several limitations. First, the long-term prediction of wave height is still challenging based on the comparative results; designing a forecasting model which can mine long-term dependencies is critical. Second, the forecasting candidates in this paper reside within one network, which discourages diversity among the candidates. Hence, researchers can include more diverse candidates or utilize different activation functions for each hidden layer to foster diversity among the multiple readouts. Third, the automatic design of the deep network is crucial for forecasting. Evolutionary algorithms are potential solutions for the automatic and data-driven design of deep randomized networks [64,65].
Fig. 7. Comparisons between predictions and true data at station 46083 of year 2017, 2018 and 2019.
Fig. 8. The barplot in terms of RMSE with a certain component purposefully removed.
Fig. 9. The barplot in terms of MASE with a certain component purposefully removed.

Fig. 10. The barplot in terms of MAPE with a certain component purposefully removed.

Finally, Table 16 records the computational cost in terms of the average simulation time. Optimization refers to the hyper-parameter tuning by cross-validation, and testing refers to the evaluation of the optimized model. All the experiments are conducted on an Intel i7-10700K CPU. First, randomized neural networks take less training time than gradient-based neural networks. Second, the training of recurrent neural networks is slower than that of feed-forward neural networks; for instance, the ESN-based models take longer than the ELM-based models. Third, the proposed model's testing takes 0.33 s, indicating that its computational cost is not heavy. The proposed model takes more time than the DESN because of the reservoir pruning and the training of multiple output layers. However, the additional 0.08 s is not a heavy burden because the added components are all linear with non-iterative solutions.

Table 16
Average simulation time.

Model      Optimization   Testing
SVR        27.92 s        0.54 s
MLP        86.00 s        10.42 s
LSTM       868.34 s       1032.97 s
BRF        336.67 s       0.75 s
PSOELM     16.00 s        25.58 ms
EELM       75.17 s        1.83 s
AELM       8.75 s         64.17 ms
LGBM       2.08 s         2.75 ms
BMLP       804.17 s       80.25 s
ESN        14.67 s        0.13 s
DESN       229.42 s       0.25 s
Proposed   61.42 s        0.33 s

In addition, the training speed of randomized neural networks is high when they are trained by non-iterative methods. The non-iterative methods directly compute the closed-form solution of the output layer's weights. Hence, they need to collect all historical observations so that the closed-form solution can be computed precisely, which means the memory footprint grows with the number of observations. The proposed method and the ELM-based and ESN-based models utilize non-iterative training, so they consume more memory as the number of training observations increases. In contrast, gradient-based methods such as the MLP, BMLP, and LSTM train neural networks iteratively in batch mode; each iteration uses a batch of data, which takes less memory than the full dataset. When the number of training samples becomes large, it is practical to train randomized neural networks with iterative methods.
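As a concrete illustration of this point, the ridge-regularized least-squares solution for a linear readout is a single linear solve over the full state matrix, which is why all T observations must be held in memory at once. The function name and the ridge strength are illustrative placeholders, assuming a standard ridge-regression readout.

```python
import numpy as np

def train_readout(H, Y, ridge=1e-6):
    """Non-iterative readout training: solve (H'H + rI) W = H'Y.

    H : (T, N) hidden states collected for ALL T observations.
    Y : (T, 1) targets. Memory grows with T because H is materialized
    in full before the single closed-form solve.
    """
    N = H.shape[1]
    return np.linalg.solve(H.T @ H + ridge * np.eye(N), H.T @ Y)
```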
7. Conclusion

DRPedESN, an ensemble deep learning model for significant wave height prediction, is proposed in this research. The DRPedESN uses deep representations to train multiple readout layers. As a result, the ensemble readouts limit the danger of overfitting while the deep structure collects multi-scale characteristics. A linear reservoir pruning approach removes the inferior information from each reservoir layer, allowing only vital information to reach the deep levels. A direct link connecting each reservoir layer to the input layer is established to overcome the excessive randomness of the deep reservoir layers. A static ensemble may suffer from the dynamic and chaotic characteristics of the significant wave height time series. Therefore, a dynamic ensemble module is designed to handle the ever-changing combination of forecasts. This ensemble module determines the combination weights according to the most recent performance of each candidate. Finally, the dynamic combination of all readouts is the forecast of the significant wave height.

A detailed experiment on twelve significant wave height time series across three years is conducted. Three prediction horizons are evaluated. The proposed model is compared with nine forecasting methods to show its superiority. The performance is evaluated using three forecasting metrics. Then, statistical tests are conducted to further differentiate the forecasting methods. The following conclusions are drawn from the experimental studies:

• The prediction performance decreases as the prediction horizon increases.
• Ensemble approaches outperform the respective single models on significant wave height forecasting.
• The ablation study demonstrates the necessity of the reservoir pruning, the direct connection, and the dynamic ensemble block.
• The dynamic ensemble, which assigns different weights at different time steps, outperforms the static combination.
• The ensemble deep variant of the canonical ESN significantly outperforms the shallow ESN.

This study proposes a novel ensemble deep learning network for significant wave height forecasting. Advanced feature extraction algorithms can be coupled with the suggested network in the future to further improve performance. In addition, testing the proposed methodology on additional renewable energy sources, such as solar and wind, is practical and promising.

CRediT authorship contribution statement

Ruobin Gao: Theoretical development, Empirical study, Literature review and finishing the manuscript. Ruilin Li: Development of the proposed model and revision of the manuscript. Minghui Hu: Development of the proposed model and revision of the manuscript. Ponnuthurai Nagaratnam Suganthan: Empirical study, Revision of the manuscript and making suggestions on the comparison in the experiments. Kum Fai Yuen: Revision of the manuscript.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Ruobin Gao reports a relationship with Nanyang Technological University that includes: employment. Kum Fai Yuen reports a relationship with Nanyang Technological University that includes: employment. Ponnuthurai Nagaratnam Suganthan reports a relationship with Nanyang Technological University that includes: employment.

Data availability

Data will be made available on request.

Acknowledgment

We express our sincere gratitude to the National Data Buoy Center for the data provided.

References

[1] Crippa P, Alifa M, Bolster D, Genton MG, Castruccio S. A temporal model for vertical extrapolation of wind speed and wind energy assessment. Appl Energy 2021;301:117378.
[2] Ma Q, Wang P, Fan J, Klar A. Underground solar energy storage via energy piles: An experimental study. Appl Energy 2022;306:118042.
[3] Gao H, Xiao J. Effects of power take-off parameters and harvester shape on wave energy extraction and output of a hydraulic conversion system. Appl Energy 2021;299:117278.
[4] Reikard G, Pinson P, Bidlot J-R. Forecasting ocean wave energy: The ECMWF wave model and time series methods. Ocean Eng 2011;38(10):1089–99.
[5] Anastasiou S, Sylaios G. Nearshore wave field simulation at the lee of a large island. Ocean Eng 2013;74:61–71.
[6] Soukissian TH, Prospathopoulos AM, Diamanti C. Wind and wave data analysis for the Aegean Sea - preliminary results. Glob Atmos Ocean Syst 2002;8(2–3):163–89.
[7] Fan S, Xiao N, Dong S. A novel model to predict significant wave height based on long short-term memory network. Ocean Eng 2020;205:107298.
[8] Shamshirband S, Mosavi A, Rabczuk T, Nabipour N, Chau K-w. Prediction of significant wave height; comparison between nested grid numerical model, and machine learning models of artificial neural networks, extreme learning and support vector machines. Eng Appl Comput Fluid Mech 2020;14(1):805–17.
[9] Mahjoobi J, Etemad-Shahidi A. An alternative approach for the prediction of significant wave heights based on classification and regression trees. Appl Ocean Res 2008;30(3):172–7.
[10] Ajeesh K, Deka PC. Forecasting of significant wave height using support vector regression. In: 2015 fifth international conference on advances in computing and communications (ICACC). IEEE; 2015, p. 50–3.
[11] Berbić J, Ocvirk E, Carević D, Lončar G. Application of neural networks and support vector machine for significant wave height prediction. Oceanologia 2017;59(3):331–49.
[12] Deo MC, Jha A, Chaphekar A, Ravikant K. Neural networks for wave forecasting. Ocean Eng 2001;28(7):889–98.
[13] Londhe S, Shah S, Dixit P, Nair TB, Sirisha P, Jain R. A coupled numerical and artificial neural network model for improving location specific wave forecast. Appl Ocean Res 2016;59:483–91.
[14] Kaloop MR, Kumar D, Zarzoura F, Roy B, Hu JW. A wavelet-particle swarm optimization-extreme learning machine hybrid modeling for significant wave height prediction. Ocean Eng 2020;213:107777.
[15] Cuadra L, Salcedo-Sanz S, Nieto-Borge J, Alexandre E, Rodríguez G. Computational intelligence in wave energy: Comprehensive review and case study. Renew Sustain Energy Rev 2016;58:1223–46.
[16] Kumar NK, Savitha R, Al Mamun A. Ocean wave height prediction using ensemble of extreme learning machine. Neurocomputing 2018;277:12–20.
[17] Cornejo-Bueno L, Nieto-Borge J, García-Díaz P, Rodríguez G, Salcedo-Sanz S. Significant wave height and energy flux prediction for marine energy applications: A grouping genetic algorithm–extreme learning machine approach. Renew Energy 2016;97:380–9.
[18] Cornejo-Bueno L, Rodríguez-Mier P, Mucientes M, Nieto-Borge J, Salcedo-Sanz S. Significant wave height and energy flux estimation with a genetic fuzzy system for regression. Ocean Eng 2018;160:33–44.
[19] Gómez-Orellana A, Guijo-Rubio D, Gutiérrez P, Hervás-Martínez C. Simultaneous short-term significant wave height and energy flux prediction using zonal multi-task evolutionary artificial neural networks. Renew Energy 2022;184:975–89.
[20] Ali M, Prasad R, Xiang Y, Deo RC. Near real-time significant wave height forecasting with hybridized multiple linear regression algorithms. Renew Sustain Energy Rev 2020;132:110003.
[21] Yang S, Xia T, Zhang Z, Zheng C, Li X, Li H, Xu J. Prediction of significant wave heights based on CS-BP model in the South China Sea. IEEE Access 2019;7:147490–500.
[22] Zanaganeh M, Mousavi SJ, Shahidi AFE. A hybrid genetic algorithm–adaptive network-based fuzzy inference system in prediction of wave parameters. Eng Appl Artif Intell 2009;22(8):1194–202.
[23] Roulston MS, Ellepola J, von Hardenberg J, Smith LA. Forecasting wave height probabilities with numerical weather prediction models. Ocean Eng 2005;32(14–15):1841–63.
[24] Jaeger H, Haas H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 2004;304(5667):78–80.
[25] Gao R, Du L, Duru O, Yuen KF. Time series forecasting based on echo state network and empirical wavelet transformation. Appl Soft Comput 2021;102:107111.
[26] Chouikhi N, Ammar B, Rokbani N, Alimi AM. PSO-based analysis of echo state network parameters for time series forecasting. Appl Soft Comput 2017;55:211–25.
[27] Gallicchio C, Micheli A. Architectural and markovian factors of echo state networks. Neural Netw 2011;24(5):440–56.
[28] Gallicchio C, Micheli A, Pedrelli L. Deep reservoir computing: A critical experimental analysis. Neurocomputing 2017;268:87–99.
[29] Hu H, Wang L, Lv S-X. Forecasting energy consumption and wind power generation using deep echo state network. Renew Energy 2020;154:598–613.
[30] Song Z, Wu K, Shao J. Destination prediction using deep echo state network. Neurocomputing 2020;406:343–53.
[31] Bai K, Yi Y, Zhou Z, Jere S, Liu L. Moving toward intelligence: Detecting symbols on 5G systems through deep echo state network. IEEE J Emerg Sel Top Circuits Syst 2020;10(2):253–63.
[32] Wang T, Gao S, Bi F, Li Y, Guo D, Ren P. Residual learning with multifactor extreme learning machines for waveheight prediction. IEEE J Ocean Eng 2020;46(2):611–23.
[33] Özger M. Prediction of ocean wave energy from meteorological variables by fuzzy logic modeling. Expert Syst Appl 2011;38(5):6269–74.
[34] Gracia S, Olivito J, Resano J, Martin-del Brio B, de Alfonso M, Álvarez E. Improving accuracy on wave height estimation through machine learning techniques. Ocean Eng 2021;236:108699.
[35] Fernández JC, Salcedo-Sanz S, Gutiérrez PA, Alexandre E, Hervás-Martínez C. Significant wave height and energy flux range forecast with machine learning classifiers. Eng Appl Artif Intell 2015;43:44–53.
[36] Li L, Yuan Z, Gao Y. Maximization of energy absorption for a wave energy converter using the deep machine learning. Energy 2018;165:340–9.
[37] Bergmeir C, Benítez JM. On the use of cross-validation for time series predictor evaluation. Inform Sci 2012;191:192–213.
[38] Alexandre E, Cuadra L, Nieto-Borge J, Candil-García G, Del Pino M, Salcedo-Sanz S. A hybrid genetic algorithm–extreme learning machine approach for accurate significant wave height reconstruction. Ocean Model 2015;92:115–23.
[39] Ali M, Prasad R, Xiang Y, Sankaran A, Deo RC, Xiao F, Zhu S. Advanced extreme learning machines vs. deep learning models for peak wave energy period forecasting: A case study in Queensland, Australia. Renew Energy 2021;177:1031–44.
[40] Salcedo-Sanz S, Borge JN, Carro-Calvo L, Cuadra L, Hessner K, Alexandre E. Significant wave height estimation using SVR algorithms and shadowing information from simulated and real measured X-band radar images of the sea surface. Ocean Eng 2015;101:244–53.
[41] Ali M, Prasad R. Significant wave height forecasting via an extreme learning machine model integrated with improved complete ensemble empirical mode decomposition. Renew Sustain Energy Rev 2019;104:281–95.
[42] Duan W, Han Y, Huang L, Zhao B, Wang M. A hybrid EMD-SVR model for the short-term prediction of significant wave height. Ocean Eng 2016;124:54–73.
[43] Huang W, Dong S. Improved short-term prediction of significant wave height by decomposing deterministic and stochastic components. Renew Energy 2021;177:743–58.
[44] Zhou S, Bethel BJ, Sun W, Zhao Y, Xie W, Dong C. Improving significant wave height forecasts using a joint empirical mode decomposition–long short-term memory network. J Mar Sci Eng 2021;9(7):744.
[45] Deka PC, Prahlada R. Discrete wavelet neural network approach in significant wave height forecasting for multistep lead time. Ocean Eng 2012;43:32–42.
[46] Özger M. Significant wave height forecasting using wavelet fuzzy logic approach. Ocean Eng 2010;37(16):1443–51.
[47] Del Ser J, Casillas-Perez D, Cornejo-Bueno L, Prieto-Godino L, Sanz-Justo J, Casanova-Mateo C, Salcedo-Sanz S. Randomization-based machine learning in renewable energy prediction problems: critical literature review, new results and perspectives. Appl Soft Comput 2022;108526.
[48] Huang Y, Deng Y. A new crude oil price forecasting model based on variational mode decomposition. Knowl-Based Syst 2021;213:106669.
[49] Gao R, Du L, Yuen KF, Suganthan PN. Walk-forward empirical wavelet random vector functional link for time series forecasting. Appl Soft Comput 2021;108:107450.
[50] Suganthan PN, Katuwal R. On the origins of randomization-based feedforward neural networks. Appl Soft Comput 2021;105:107239.
[51] Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comp Sci Rev 2009;3(3):127–49.
[52] Kim T, King BR. Time series prediction using deep echo state networks. Neural Comput Appl 2020;32(23):17769–87.
[53] Shi Q, Katuwal R, Suganthan P, Tanveer M. Random vector functional link neural network based ensemble deep learning. Pattern Recognit 2021;117:107978.
[54] NDBC. National data buoy center. 2022, URL: https://www.ndbc.noaa.gov/.
[55] Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecast 2006;22(4):679–88.
[56] Mahjoobi J, Mosabbeb EA. Prediction of significant wave height using regressive support vector machines. Ocean Eng 2009;36(5):339–47.
[57] Wang H, Lei Z, Zhang X, Zhou B, Peng J. A review of deep learning for renewable energy forecasting. Energy Convers Manage 2019;198:111799.
[58] Wang H, Lei Z, Liu Y, Peng J, Liu J. Echo state network based ensemble approach for wind power forecasting. Energy Convers Manage 2019;201:112188.
[59] Sylaios G, Bouchette F, Tsihrintzis VA, Denamiel C. A fuzzy inference system for wind-wave modeling. Ocean Eng 2009;36(17–18):1358–65.
[60] Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006;7:1–30.
[61] Wang Y, Wang L, Yang F, Di W, Chang Q. Advantages of direct input-to-output connections in neural networks: The Elman network for stock index forecasting. Inform Sci 2021;547:1066–79.
[62] Ren Y, Suganthan P, Srikanth N. Ensemble methods for wind and solar power forecasting – A state-of-the-art review. Renew Sustain Energy Rev 2015;50:82–91.
[63] Hsiao C, Wan SK. Is there an optimal forecast combination? J Econometrics 2014;178:294–309.
[64] Lynn N, Ali MZ, Suganthan PN. Population topologies for particle swarm optimization and differential evolution. Swarm Evol Comput 2018;39:24–35.
[65] Rajasekhar A, Lynn N, Das S, Suganthan PN. Computing with the collective intelligence of honey bees – a survey. Swarm Evol Comput 2017;32:25–48.