
Applied Energy 329 (2023) 120261

Dynamic ensemble deep echo state network for significant wave height forecasting

Ruobin Gao a, Ruilin Li b, Minghui Hu b, Ponnuthurai Nagaratnam Suganthan b,c, Kum Fai Yuen a,*

a School of Civil and Environmental Engineering, Nanyang Technological University, Singapore
b School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
c KINDI Center for Computing Research, College of Engineering, Qatar University, Doha, Qatar

ARTICLE INFO

Keywords: Forecasting; Machine learning; Deep learning; Randomized neural networks; Echo state network

ABSTRACT

Forecasts of the wave heights can assist in the data-driven control of wave energy systems. However, the dynamic properties and extreme fluctuations of the historical observations pose challenges to the construction of forecasting models. This paper proposes a novel dynamic ensemble deep echo state network (ESN) to learn the dynamic characteristics of the significant wave height. The dynamic ensemble ESN creates a profound representation of the input and trains an independent readout module for each reservoir. To begin, numerous reservoir layers are built in a hierarchical order, adopting a reservoir pruning approach to filter out the poorer representations. Finally, a dynamic ensemble block is used to integrate the forecasts of all readout layers. The suggested model has been tested on twelve available datasets and statistically outperforms state-of-the-art approaches.

1. Introduction

The consumption of non-renewable energy resources poses severe challenges in maintaining the rising global temperature and carbon dioxide levels, prompting the development of renewable energy sources such as wind [1], solar [2], and ocean energy [3]. In comparison to wind energy, wave energy has a higher level of assurance [4]. However, the dynamic characteristics of the waves affect the reliable and precise forecasts of wave energy.

One of the most important metrics describing ocean wave conditions is the significant wave height (WVHT). Wave energy is directly and strongly connected with significant wave height. Hence, precise forecasts of significant wave height can provide solid recommendations for electricity generation. Numerical wave models are one way to estimate wave parameters, such as simulating waves nearshore (SWAN) [5,6]. The numerical wave propagation models can generate forecasts over the studied computation grid [5]. An alternative strategy is to train machine learning models which can precisely forecast wave time series in a data-driven fashion. Recently, researchers have found that machine learning outperforms the numerical methods on wave height forecasting [7,8]. Many researchers are concentrating their efforts on implementing and creating superior data-driven forecasting algorithms for this difficult and necessary task. These data-driven algorithms include decision trees [9,10], support vector regression (SVR) [11], artificial neural networks (ANN) [12,13], extreme learning machines (ELM) [14], etc. However, the existing methods do not fully utilize deep representations and ensemble learning [15].

The algorithmic learning of temporal patterns and intricate interactions among numerous variables, such as wind direction, wind speed, gust speed, wave periods, and wave heights at previous time steps, is complicated due to the dynamic and chaotic qualities of significant wave height. Researchers have devoted considerable effort to this challenging task. For instance, an ensemble ELM (EELM) is proposed to reduce the single model's uncertainty in wave height forecasting [16], but only external networks are considered in the ensemble pool. A genetic algorithm is implemented to select the most suitable input features for the ELM [17]. A hybrid evolutionary Takagi–Sugeno–Kang fuzzy system is proposed to forecast the significant wave height at a buoy off California's West Coast [18]. Besides generating one-step-ahead forecasts, a multitask learning technique is employed to help the neural network forecast multiple prediction horizons [19]. Ali et al. [20] implemented the covariance-weighted least squares method to optimize the multiple linear regression model for wave height forecasting. The hyper-parameters of the forecasting methods significantly affect the performance. Researchers investigate different evolutionary optimizations to automatically design the forecasting methods. For instance, the genetic algorithm, particle swarm optimization (PSO) and the cuckoo search algorithm are utilized to optimize ANNs for wave height prediction [21]. The number and architecture of fuzzy rules of the forecasting model are determined by a genetic algorithm [22]. Explanatory variables, such as numerical weather predictions, have been shown to boost the forecasting accuracy of the significant wave height [23].

∗ Corresponding author.
E-mail addresses: [email protected] (R. Gao), [email protected] (R. Li), [email protected] (M. Hu), [email protected] (P.N. Suganthan),
[email protected] (K.F. Yuen).

https://doi.org/10.1016/j.apenergy.2022.120261
Received 20 April 2022; Received in revised form 2 October 2022; Accepted 27 October 2022
Available online 10 November 2022
0306-2619/© 2022 Elsevier Ltd. All rights reserved.

Nomenclature

ε: The ε-tube within which no penalty is associated in the loss function
ŵ_{i_M}(i): The dynamic ensemble weight for candidate i_M
λ_l: The l-th layer's regularization parameter
D^l: Concatenation of u(t) and x(t)^(l-1)
D_r^l: The best reservoir states of the l-th layer
u(t): Input to the network at time t
W_in: Echo state network's input layer's weights
W_in^l: The l-th layer's input weights
W_out: Echo state network's readout layer's weights
W_out^l: The l-th readout layer's weights
W_r: Echo state network's recurrent layer's weights
W_r^l: The l-th recurrent layer's weights
x(t): Echo state network's reservoir states at time t
x(t)^l: The l-th layer's reservoir states at time t
y_g(t): The global readout layer's output at time t
y_l(t): The l-th readout layer's output at time t
θ(): The forecast performance indicators
abs: Absolute values
C: Support vector regression's regularization
L_test: The length of the test set
L_train: The length of the training set
N_L: Number of hidden layers
N_o: Output vector's dimension
N_r: Reservoir states' dimension
N_u: Input vector's dimension
RI: Reservoir states' importance
s_in: Input scaling
x_max: Maximum of the data
x_min: Minimum of the data
x_normalized: Normalized data
AELM: Advanced extreme learning machine
ANN: Artificial neural network
APD: Average wave period
BMLP: Bagging multi-layer perceptron
BO: Bayesian optimization
DedESN: Dynamic ensemble deep echo state network
DESN: Deep echo state network
DRPedESN: Dynamic ensemble deep echo state network based on reservoir pruning
DWTESN: Ensemble echo state network based on the discrete wavelet transform
edESN: Ensemble deep echo state network
EELM: Ensemble extreme learning machine
ELM: Extreme learning machine
EMD: Empirical mode decomposition
ESN: Echo state network
GST: Peak gust speed
ICEMD: Improved complete ensemble empirical mode decomposition
IMAE: The inverse of mean absolute error
IMSE: The inverse of mean squared error
LGBM: Light gradient boosting machine
LSTM: Long short-term memory
MAPE: Mean absolute percentage error
MASE: Mean absolute scaled error
MLP: Multi-layer perceptron
PSO: Particle swarm optimization
PSOELM: Extreme learning machine optimized by particle swarm optimization
RMSE: Root mean square error
RNN: Recurrent neural network
Std: Standard deviation
SVR: Support vector regression
SWAN: Simulating waves nearshore
WDIR: Wind direction
WSPD: Wind speed
WVHT: Significant wave height

Among the fruitful forecasting literature, the echo state network (ESN) has shown its superiority on various forecasting tasks [24,25]. The ESN utilizes a recurrent architecture to memorize and summarize the sequential information [24]. The recurrent connections are randomly initialized and fixed without training. Then, only a linear readout layer is trained with a closed-form solution [24]. Such a design of recurrent neural networks (RNN) avoids the computational burden of gradient propagation through time and the unstable training of RNNs. Researchers have tried to enrich the representation ability of the canonical ESN. For instance, empirical wavelet transformation is utilized to determine the inputs for the ESN [25]. Chouikhi et al. [26] pre-train the random weights using PSO. Recently, with the resurgence of deep learning, which enriches the representations via multiple layers, Gallicchio and Micheli [27] proposed a deep ESN (DESN) model which stacks multiple reservoir layers to enforce a recurrent deep representation of the input time series. The deep reservoir states are constructed using the shallow ones, and the recurrent and input connections between reservoir layers are randomized without training [28]. Then, all reservoir states are concatenated to formulate the global states, which are used to train a linear readout layer. Since the birth of the DESN, it has succeeded in various tasks, such as energy consumption forecasting [29], destination prediction [30] and symbol detection in 5G systems [31].

This paper employs time series forecasting algorithms to estimate the significant wave heights utilizing wave data collected by ocean wave buoys. Therefore, developing a novel ensemble deep ESN model for significant wave height forecasting is the main objective. The input variables investigated include wind speed (WSPD in m/s), wind direction (WDIR in degT), average wave period (APD in seconds), peak gust speed (GST in m/s), and the WVHT (meters), which are also considered as inputs in [7,14,18]. The wind direction, wind speed, and gust speed are wind-related parameters [14]. The average wave periods and wave heights are wave-related parameters [14,18]. The output of the forecasting model is the significant wave height. The developed forecasting model aims to learn the mapping from the input variables to the significant wave heights from the historical observations. This paper proposes a novel ensemble deep ESN method with a pruning component to eliminate the inferior features. Finally, a dynamic ensemble block, which combines multiple estimations of the wave heights, is specially designed.


Despite the fact that much effort has been dedicated to wave height forecasting and to constructing ESN models, there are several significant limitations in the literature. First, there is no research on ensemble deep learning in the literature on wave height forecasting. Second, the random nature of each reservoir may deteriorate the performance, especially when the network becomes deep. Third, relying on only one readout layer using the global states is prone to overfitting, and the massive dimension of the global states may not ensure a reliable solution. Therefore, a novel ensemble deep ESN (edESN) is presented to address the aforementioned research gaps, inheriting the benefits of deep representations and ensemble learning. Unlike the shallow ESN, the edESN builds hierarchical reservoirs to enhance the input time series representations. Then, an independent readout layer is trained to precisely model each scale's representation. Moreover, a reservoir pruning strategy is proposed to filter out the inferior representations. Finally, a dynamic ensemble module is utilized to combine all readout layers' outputs, including the global readout layer. A detailed comparative experiment on twelve significant wave height time series is conducted to demonstrate the proposed novel edESN's superiority and suitability. In conclusion, this work adds to the literature from the following perspectives:

• This paper explores the deep ESN's forecasting ability on significant wave height time series for the first time, to the authors' best knowledge.
• A novel ensemble deep ESN forecasting model is proposed for the significant wave height. The proposed edESN uses ensemble learning to reduce model uncertainty while inheriting the representation ability of deep architectures.
• A straightforward reservoir pruning strategy is proposed to filter out the inferior reservoir states and facilitate building deep architectures. With the help of the proposed strategy, only the meaningful states are propagated to the deep layers.
• A dynamic ensemble module considering the evolving properties of the wave time series is proposed for the edESN. Instead of using a static ensemble module, the dynamic module assigns evolving combination weights to each readout layer based on the recent performance.

The following is a breakdown of the paper's structure. Section 2 reviews the related works. To make this research paper self-sufficient, Section 3 gives the preliminaries concerning the ESN and DESN. Section 4 describes the proposed model in depth. Section 5 delves into the experiments in detail, including descriptions of the data, data pre-processing methods, comparative results, and ablation studies. Section 6 presents insights into the benefits and drawbacks of the proposed model, as well as possible future directions. Finally, conclusions are drawn. The algorithm is developed by the authors in the framework of this publication. The source codes of the algorithm are available on https://github.com/P-N-Suganthan/CODES.

2. Related works

Intelligent algorithms have achieved significant success in wave height prediction. For instance, Shamshirband et al. [8] compare ELM, SVR, and ANN in forecasting the significant wave height. The authors find that the ELM slightly outperforms the other machine learning methods. To further improve the ELM's accuracy in wave height prediction, residual learning is conducted [32]. In addition to the ELM-based methods, a fuzzy system is investigated that only utilizes meteorological variables to estimate the wave energy without spectral wave measurements [33]. Recently, the ensemble of MLP and gradient boosting decision trees is shown to outperform the non-ensemble models [34]. Besides the predictions on continuous values, the wave heights are discretized to formulate an ordinal classification problem, and the experimental results show that ordinal approaches outperform nominal approaches [35]. Additionally, the intelligent forecasting algorithms can be integrated with a predictive controller. For instance, a novel predictive controller based on a deep MLP, which is employed to forecast short-term wave forces, is proposed to assist in maximizing the energy absorption for the wave energy converter [36].

A critical step of constructing the forecasting method is the hyper-parameter optimization [37]. A proper selection of the hyper-parameters assists in achieving precise forecasts. The most common practice is to utilize grid search [35]. However, grid search relies on the fixed choices given by the researchers and cannot explore other configurations. Therefore, researchers have suggested utilizing Bayesian optimization (BO) and evolutionary algorithms to determine the hyper-parameters. For instance, BO is utilized to determine the hyper-parameters of a hybrid ELM whose input is determined by a grouping genetic algorithm. The comparative results demonstrate the proposed model's success in predicting the significant wave height and the wave energy flux [17]. Furthermore, the hybrid ELM, whose parameters are determined by BO, selects meaningful features for the prediction models, SVR and ELM. In addition to BO, the genetic algorithm, particle swarm optimization (PSO), and the cuckoo search algorithm are utilized to optimize ANNs for wave height prediction [21]. The genetic algorithm can not only optimize ANNs, but also optimize the parameters of the clustering part within a neuro-fuzzy system [22].

In addition to the hyper-parameter optimization, the determination of the input plays a crucial role in precise forecasts. For instance, a genetic algorithm is utilized to select features from nearby buoys for the ELM to estimate the significant wave height [38]. Besides genetic algorithms, the partial autocorrelation function determines the ELM's significant lags. This advanced ELM outperforms deep learning and conventional machine learning models on peak wave energy period forecasting [39]. Besides working on time series data, the SVR trained on X-band radar images successfully estimates the significant wave height and outperforms the MLP [40].

Since the patterns within the time series are dynamic, signal decomposition techniques are utilized to de-noise the wave data and extract features for the following modeling steps [41–43]. For instance, Duan et al. [42] implemented the empirical mode decomposition (EMD) to extract multi-scale modes fed into the SVR for forecasting. EMD is also proven to improve the long short-term memory network's (LSTM's) performance on wave height forecasting [44]. Discrete wavelet transformation is combined with ANN to forecast the wave heights in India [45]. A hybrid fuzzy system combined with wavelet transformation outperforms ANNs and statistical methods [46]. The improved complete ensemble empirical mode decomposition (ICEMD) is conducted to facilitate learning ELMs [41]. Although decomposition significantly boosts the forecasting accuracy, the data leakage problem during decomposition is not analyzed and addressed for forecasting. This problem is presently being studied in the literature [47–49].

3. Methodology

This section explains the preliminaries regarding the ESN and its deep variant to make this article self-contained. It begins with a brief introduction of the classical ESN. Then, the DESN, which stacks many reservoir layers to obtain multi-scale representations, is described. The input to the ESN-based models, u(t) ∈ R^(N_u), is the wave time series consisting of five variables, the WVHT, WDIR, WSPD, APD, and GST. Hence, the input dimension N_u at time t equals five. The model aims to estimate the value of WVHT at time t + 1. The ESN processes the wave time series step by step until it reaches the end.
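For concreteness, a minimal sketch of how these supervised samples can be assembled is given below. It assumes the hourly buoy records sit in a pandas DataFrame with hypothetical column names matching the variable abbreviations; the actual preprocessing pipeline of the paper may differ.

```python
import numpy as np
import pandas as pd


def build_samples(df: pd.DataFrame):
    """Pair u(t) = [WVHT, WDIR, WSPD, APD, GST] at time t with WVHT at time t + 1."""
    features = ["WVHT", "WDIR", "WSPD", "APD", "GST"]
    u = df[features].to_numpy()[:-1]   # inputs u(t), t = 0, ..., T - 2
    y = df["WVHT"].to_numpy()[1:]      # one-step-ahead targets WVHT(t + 1)
    return u, y
```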


Fig. 1. The canonical ESN's architecture.

3.1. ESN

Randomized neural networks do not train the intermediate layers' connections and only optimize the weights of the output layers utilizing closed-form solutions [50]. The ESN is distinguished from other randomized neural networks by its recurrent connections [24,51]. The reservoir layer with recurrent information propagation is the ESN's distinguishing feature. The recurrent connections and input weights are randomly defined without training, while the input signals are represented by a suffix-based Markovian model [27]. Lastly, a linear layer calculates the output using the reservoir states. Fig. 1 shows the architecture of the canonical ESN. For clarity, the bias is not included in the formulas. The reservoir state is characterized by the following equation:

x(t) = tanh(W_in u(t) + W_r x(t − 1)),  (1)

where u(t) ∈ R^(N_u) and x(t − 1) ∈ R^(N_r) denote the input wave time series and the reservoir state, respectively, W_in ∈ R^(N_r × N_u) represents the input layer's connections and W_r ∈ R^(N_r × N_r) denotes the recurrent weight matrix. The input at time t, u(t), consists of the values at t of the time series of WVHT, WDIR, WSPD, APD, and GST. Hence, the input dimension N_u equals five. The ESN processes the wave time series step by step until it reaches the end.

In general, W_r is randomly initialized according to the uniform distribution. The random weights are usually re-scaled to ensure the desired spectral characteristics. The connections in the input layer are also created at random from a uniform distribution [−s_in, s_in], where s_in represents the input scaling.

The ESN calculates the output at time t by linearly combining the reservoir states,

y(t) = W_out x(t),  (2)

where y(t) ∈ R^(N_o) represents the output and W_out ∈ R^(N_o × N_r) denotes the readout connections. A common approach is to utilize ridge regression to train W_out.

3.2. Deep ESN

Fig. 2. The DESN's architecture.

Inspired by the outstanding performance of deep neural networks, several pioneers have stacked many reservoirs to construct a deep version of the ESN, the DESN [28]. Fig. 2 presents a DESN's architecture with N_L reservoir layers. Each reservoir layer contains N_r recurrent hidden units. Such a deep architecture facilitates extracting hierarchical temporal patterns from the input sequential data. To fully utilize the features from all scales, all reservoir layers' states are concatenated to train a single global readout layer. We utilize W_r^l ∈ R^(N_r × N_r) to denote the recurrent connections of the l-th layer. The l-th layer's reservoir state at time t is denoted as x(t)^l. The first layer's reservoir states at time t are calculated using the following equation:

x(t)^1 = tanh(W_in^1 u(t) + W_r^1 x(t − 1)^1),  (3)

where W_in^1 ∈ R^(N_r × N_u) and W_r^1 ∈ R^(N_r × N_r) represent the first layer's input and recurrent connections.

The l-th layer's reservoir state x(t)^l can be computed by

x(t)^l = tanh(W_in^l x(t)^(l−1) + W_r^l x(t − 1)^l),  (4)

where W_in^l ∈ R^(N_r × N_r) and W_r^l ∈ R^(N_r × N_r) denote the input and recurrent weights of the l-th layer.

As in the ESN with a single reservoir layer, W_out is the only part that requires learning. The DESN concatenates all hidden layers' reservoir states to formulate the global states, x(t) ∈ R^(N_r N_L), and then y(t) is computed by

y(t) = W_out x(t),  (5)

where W_out ∈ R^(N_o × N_r N_L) represents the weights of the readout layer.
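To make Eqs. (1)–(5) concrete, the following NumPy sketch builds a small stack of fixed random reservoirs and trains one ridge readout on the concatenated (global) states, in the spirit of the DESN. The hyper-parameter values are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_reservoir(n_in, n_r, s_in=0.1, rho=0.9):
    """Random input and recurrent weights; W_r is rescaled to spectral radius rho."""
    W_in = rng.uniform(-s_in, s_in, (n_r, n_in))
    W_r = rng.uniform(-1.0, 1.0, (n_r, n_r))
    W_r *= rho / np.max(np.abs(np.linalg.eigvals(W_r)))
    return W_in, W_r


def run_reservoir(U, W_in, W_r):
    """Collect x(t) = tanh(W_in u(t) + W_r x(t - 1)) for every time step, as in Eq. (1)."""
    X = np.zeros((U.shape[0], W_r.shape[0]))
    x = np.zeros(W_r.shape[0])
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W_r @ x)
        X[t] = x
    return X


def deep_esn_states(U, n_r=100, n_layers=3):
    """Layer 1 reads u(t) (Eq. (3)); deeper layers read the layer below (Eq. (4))."""
    states, inp = [], U
    for _ in range(n_layers):
        W_in, W_r = make_reservoir(inp.shape[1], n_r)
        X = run_reservoir(inp, W_in, W_r)
        states.append(X)
        inp = X
    return np.hstack(states)  # global states used by the single readout of Eq. (5)


def ridge_readout(X, y, lam=1e-6):
    """Closed-form ridge regression for the readout weights (Eqs. (2) and (5))."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

A global forecast is then simply X = deep_esn_states(U) followed by y_hat = X @ ridge_readout(X, y); washing out the initial transients (the "Transients" entry of Table 3) is omitted here for brevity.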

4. The proposed model

Although the DESN demonstrates its success in some forecasting tasks [52], the architecture has a number of flaws. This section presents a novel forecasting model based on the ESN, the dynamic ensemble deep ESN based on reservoir pruning (DRPedESN), to overcome the limitations of the DESN. The proposed model aims to improve the deep ESN from the following perspectives.

First, each reservoir layer is generated based on the reservoir states of the shallow levels. However, the reservoir's random nature may degrade performance as the networks go deeper. As a result, a direct link is established between each reservoir layer and the input layer to guide reservoir creation.

Second, there are inferior representations among each layer's reservoir states due to the random nature of the reservoir connections. These inferior reservoir states hamper the performance of the deep layers and the global representations. When growing the deep architectures, a reservoir pruning step is proposed to filter out these inferior representations. Consequently, only the reservoir states with strong representation ability are propagated to the deep layers.

Third, there is only one readout layer that uses the global representations, resulting in a high level of prediction uncertainty. All reservoir states are concatenated into a single and massive vector. A matrix inversion of size N_r N_L × N_r N_L is necessary to train the DESN's readout connections. Mostly, a matrix inversion of size N_r N_L × N_r N_L requires O((N_r N_L)^3) time and O((N_r N_L)^2) memory [53]. The inversion of a huge matrix places a significant strain on the hardware's memory and may result in an out-of-memory error. As a result, a unique framework for training the DESN model is proposed, which splits the huge W_out into small W_out^l for each layer. Each layer's W_out^l is trained independently, and each layer may be thought of as a distinct ESN. In this manner, each layer necessitates only the inversion of a matrix of size N_r × N_r.

Fourth, a dynamic ensemble block is added to integrate all forecasts considering the evolving properties. The recent performance of each forecasting module offers the most valuable information about the evolving characteristics. Therefore, a dynamic ensemble block is specially designed to aggregate the edESN's outputs, while taking into account each forecasting candidate's most recent performance.

The proposed edESN is explained in detail in the following lines. Fig. 3 presents the architecture of an edESN with N_L hidden layers. Unlike the DESN with a single readout layer depicted in Fig. 2, there are N_L + 1 readout layers, consisting of N_L layer-wise readout layers and a global readout layer. The global readout layer utilizes all reservoir states. After training all readout layers, y(t) is computed by a dynamic ensemble module. The l-th layer's reservoir states are calculated in the same way as in the DESN, based on Eqs. (3) and (4). The output y_l(t) of the l-th readout layer can be calculated by

y_l(t) = W_out^l x(t)^l,  (6)

where W_out^l ∈ R^(N_o × N_r) is the l-th layer's readout connections.

Fig. 3. Architecture of the DRPedESN.

This article employs a linear reservoir pruning strategy because of its quick computation, excellent interpretability, and compatibility with the ESN's readout layer. This linear pruning strategy is appropriate because of the linear property of the readout layer. The reservoir importance can be computed by Eq. (7):

RI^l = abs(((D^l)^T D^l)^(−1) (D^l)^T Y),  (7)

where D^l = [x(t)^(l−1), u(t)] ∈ R^(N_r + N_u) and abs() denotes the absolute values. The magnitude of RI indicates the respective reservoir states' contribution to the particular output. Larger values indicate a possibly greater effect on the forecasting performance. Then, the best reservoir states D_r^l are used to train the successive layers. After reservoir pruning, the loss function of the l-th readout layer is defined as

Loss^l = ‖D_r^l W_out^l − y(t)‖^2 + λ_l ‖W_out^l‖^2,  (8)

where D_r^l indicates the best reservoir states, W_out^l denotes the connections of the readout and λ_l is this layer's regularization parameter. The readout connections of the l-th layer, W_out^l, can be computed by the following equation:

W_out^l = ((D_r^l)^T D_r^l + λ_l I)^(−1) (D_r^l)^T y(t).  (9)

Then the deep layers' reservoir states are computed by

x(t)^l = tanh(W_in^l D_r^(l−1) + W_r^l x(t − 1)^l).  (10)

After collecting all readout layers' outputs, a dynamic ensemble is used to integrate them. The weight candidate ŵ_{i_M}(i) of readout i_M for time i is computed using Eq. (11),

ŵ_{i_M}(i) = θ(ŷ*_{i_M}(t) − y(t)) / Σ_{j_M=1}^{N_L+1} θ(ŷ*_{j_M}(t) − y(t)),  (11)

where θ() denotes the forecasting performance indicators. This paper utilizes the inverse of mean squared error (IMSE), the inverse of mean absolute error (IMAE) and Softmax for the weight calculations.

Because the performance of the upper layers is dependent on the lower ones, the entire model's configurations are tuned layer by layer. The shallow layers' structures are fixed once they have been determined, and cross-validation is then conducted for the next layer. Layer-wise cross-validation assigns its own set of hyper-parameters to each layer. As a result, each readout layer owns its regularization strength, enabling the edESN to learn a variety of accurate readout layers. The training algorithm is presented in Algorithm 1.

Algorithm 1: Training algorithm for the edESN
Input: N_r, the reservoir dimension; N_L, the number of reservoir layers; λ_l, the l-th layer's regularization parameter
Output: W_out = [W_out^1, ..., W_out^(N_L)], W_out^g
1  Initialize W_in^1 and W_r^1 randomly
2  l = 1
3  for l ≤ N_L do
4      if l == 1 then
5          Calculate the reservoir states x^1 using W_in^1 and W_r^1 as in Eq. (3)
6          Calculate RI^1 using Eq. (7)
7          Calculate the first layer's output connections W_out^1 using λ_1 as in Eq. (9)
8      else
9          Initialize W_in^l and W_r^l randomly
10         Calculate the reservoir states x^l using W_in^l and W_r^l as in Eq. (10)
11         Calculate RI^l using Eq. (7)
12         Calculate the l-th layer's output connections W_out^l using λ_l as in Eq. (9)
13     end
14     l++
15 end
16 Train the global readout W_out^g using all reservoir states

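As a rough illustration of Eqs. (7), (9) and (11) and of the layer-wise flow of Algorithm 1, the sketch below scores reservoir units by the magnitude of a linear fit, keeps the highest-ranked ones, trains one ridge readout per layer, and weights the per-layer forecasts by their recent inverse mean squared error. It is a simplified approximation of the proposed DRPedESN, not the authors' reference implementation.

```python
import numpy as np


def reservoir_importance(D, y):
    """RI = |(D^T D)^-1 D^T y|: linear relevance of each column of D, as in Eq. (7)."""
    return np.abs(np.linalg.pinv(D.T @ D) @ D.T @ y)


def prune_states(X, U, y, keep=0.8):
    """Rank the reservoir columns of D = [X, U] and keep the best fraction of them."""
    D = np.hstack([X, U])
    scores = reservoir_importance(D, y)[: X.shape[1]]   # scores of reservoir columns only
    idx = np.argsort(scores)[::-1][: int(keep * X.shape[1])]
    return X[:, idx]


def ridge(D, y, lam=1e-6):
    """Per-layer readout weights, the closed form of Eq. (9)."""
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)


def dynamic_weights(recent_errors):
    """Eq. (11) with theta chosen as the inverse MSE over a recent window of errors."""
    inv_mse = 1.0 / (np.array([np.mean(e ** 2) for e in recent_errors]) + 1e-12)
    return inv_mse / inv_mse.sum()
```

The combined forecast is then the weighted sum of the layer-wise and global readout outputs, y_hat = sum_k w_k * y_hat_k, with the weights refreshed from the most recent errors.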

5. Experiments

5.1. Data

The significant wave height data from five National Data Buoy Center stations, 46083, 46080, 46076, 46001 and 46077, for the years 2017, 2018 and 2019 are collected for experimental analysis [54]. Their station identities are five-character alpha-numeric codes according to the World Meteorological Organization (WMO). The first two numbers represent the continental or oceanic region. The specific location is defined using the last three numbers. These buoy stations' information is summarized in Table 1 and the stations' locations are visualized in Fig. 4, which shows that these buoys are dispersed in or around the Gulf of Alaska, a region that exhibits extreme significant wave heights. In addition, these stations are not close to each other, which reflects the distinct spatial characteristics. Four buoys are near coastlines, and one buoy, 46001, in the western Gulf of Alaska is at a deep oceanic site. The wind direction, the average wave period, the wind speed, and the peak gust speed are used as explanatory factors. The computations of these variables follow the official definitions by the National Data Buoy Center (https://www.ndbc.noaa.gov/measdes.shtml). The WSPD (m/s) is the average speed over an eight-minute period. The WDIR represents the direction the wind is coming from, in degrees clockwise from true North. The highest third of all wave heights throughout a twenty-minute sample period are averaged to get the WVHT (m). The average wave period (in seconds) for all waves throughout a twenty-minute period is represented by the APD. The GST (m/s) represents the gust speed recorded during the eight-minute or two-minute period. The descriptive statistics of all variables are summarized in Table 2. The statistics of each buoy's time series of three years are computed separately. The mean WSPD is around 6 m/s for all buoys, and the maximums of WSPD for all buoys are larger than 19 m/s. The maximum WVHT of 11.06 m is observed in 2017 at buoy 46001. A detailed definition of these explanatory factors is provided in [54]. The observations are recorded and maintained on an hourly basis. The WDIR of 252 deg corresponds to the highest wave height of 11.06 m.

Table 1
Information of the studied buoys.
Station    Longitude    Latitude    Water depth
46083      138.019 W    58.270 N    128.9 m
46080      150.042 W    57.947 N    254.5 m
46076      148.009 W    59.471 N    192.0 m
46001      148.057 W    56.291 N    4139.0 m
46077      154.211 W    57.869 N    200.0 m

Fig. 4. Locations of the investigated stations.

5.2. Data pre-processing

A proper data pre-processing strategy aids machine learning models in producing reliable results. We suppose that the training set's maximum and minimum values are x_max and x_min, respectively. The investigated data are normalized into the range [−1, 1], due to the tanh activation, using the following equation:

x_normalized = 2 * (x − x_min) / (x_max − x_min) − 1,  (12)

where x_normalized and x denote the normalized and raw observations, respectively.

5.3. Evaluation metrics

To assess the accuracy of these models, four forecasting evaluation metrics are used. The root mean square error (RMSE) is the first evaluation metric, as shown in the following equation:

RMSE = sqrt((1 / L_test) Σ_{j=1}^{L_test} (x̂_j − x_j)^2),  (13)

where L_test is the length of the test set, and x_j and x̂_j are the ground truth and the forecasts. The second evaluation metric is the mean absolute scaled error (MASE) [55]. MASE can be calculated by

MASE = mean(|x̂_j − x_j| / ((1 / (L_train − 1)) Σ_{t=2}^{L_train} |x_t − x_{t−1}|)),  (14)

where L_train represents the size of the training set. The mean absolute error of the in-sample Persistence forecast is the denominator of MASE. The mean absolute percentage error (MAPE) is the third error metric, and its definition is as follows:

MAPE = (1 / L_test) Σ_{j=1}^{L_test} |(x̂_j − x_j) / x_j|.  (15)

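The normalization of Eq. (12) and the error measures of Eqs. (13)–(15) translate directly into code; the helpers below are a straightforward sketch, with the MASE denominator taken as the in-sample mean absolute error of the persistence forecast.

```python
import numpy as np


def minmax_scale(x, x_min, x_max):
    """Eq. (12): map raw observations into [-1, 1] using the training-set extremes."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))        # Eq. (13)


def mase(y_true, y_pred, y_train):
    scale = np.mean(np.abs(np.diff(y_train)))              # persistence MAE, Eq. (14)
    return np.mean(np.abs(y_pred - y_true)) / scale


def mape(y_true, y_pred):
    return np.mean(np.abs((y_pred - y_true) / y_true))     # Eq. (15)
```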

Table 2
Descriptive statistics.
2017 2018 2019
Station Variable Mean Median Max Std Mean Median Max Std Mean Median Max Std
46083 WDIR 199.51 187.00 360.00 82.37 193.31 172.00 360.00 90.29 188.60 148.00 360.00 99.53
WSPD 6.44 5.60 23.40 4.20 6.61 5.70 21.60 4.25 6.32 5.60 20.90 3.92
GST 8.08 6.90 33.40 5.05 8.21 7.10 27.70 5.13 7.88 7.00 26.60 4.71
APD 6.65 6.50 11.32 1.32 6.67 6.60 11.46 1.26 6.74 6.65 11.44 1.26
WVHT 2.12 1.77 10.17 1.25 2.13 1.81 9.94 1.21 2.09 1.76 8.17 1.17
46080 WDIR 167.06 160.00 360.00 87.69 185.63 195.00 360.00 85.59 188.60 148.00 360.00 99.53
WSPD 6.52 5.90 20.30 3.41 6.66 6.20 21.80 3.34 6.32 5.60 20.90 3.92
GST 7.95 7.20 25.70 4.12 8.20 7.50 28.40 4.08 7.88 7.00 26.60 4.71
APD 5.72 5.62 9.53 1.03 6.46 6.35 11.10 1.15 6.74 6.65 11.44 1.26
WVHT 1.87 1.54 7.67 1.08 2.16 1.81 8.16 1.21 2.09 1.76 8.17 1.17
46076 WDIR 184.13 197.00 360.00 106.32 176.44 179.00 360.00 103.75 170.66 163.00 360.00 102.29
WSPD 6.25 5.60 21.70 3.71 6.43 5.80 22.80 3.72 6.25 5.70 20.10 3.68
GST 7.76 7.00 26.80 4.47 7.94 7.20 28.70 4.49 7.73 7.00 25.90 4.42
APD 6.33 6.17 10.53 1.16 6.46 6.31 11.13 1.19 6.42 6.30 10.71 1.08
WVHT 1.91 1.53 8.76 1.19 2.01 1.70 10.18 1.21 1.93 1.60 9.36 1.16
46001 WDIR 189.51 191.00 360.00 93.36 196.27 205.00 359.00 83.61 213.82 223.00 360.00 75.59
WSPD 7.58 7.20 20.00 3.54 7.56 7.10 21.00 3.52 7.93 7.60 20.10 3.78
GST 9.38 8.90 25.70 4.33 9.33 8.70 26.60 4.29 9.88 9.30 26.20 4.61
APD 6.35 6.17 12.44 1.32 6.23 6.10 12.36 1.26 6.97 6.87 13.97 1.26
WVHT 2.51 2.18 11.06 1.37 2.52 2.24 10.23 1.30 2.85 2.56 9.11 1.43
46077 WDIR 156.94 185.00 360.00 100.52 147.59 170.00 360.00 98.81 155.96 186.00 360.00 97.76
WSPD 6.69 6.50 21.10 3.73 7.17 6.90 21.90 3.85 7.16 7.00 19.90 3.59
GST 8.12 7.70 26.50 4.47 8.69 8.20 26.50 4.62 8.65 8.30 25.30 4.28
APD 4.57 4.45 16.91 1.03 4.65 4.54 8.96 0.89 4.58 4.46 10.16 0.95
WVHT 0.98 0.82 5.34 0.67 1.07 0.90 4.95 0.66 1.02 0.87 4.48 0.61

Besides the above forecasting errors, the slope between the forecasts and the raw observations in the scatter plot is utilized to evaluate the performance, fitting a straight regression line between the forecasts and the raw observations based on the following equation:

x_j = γ x̂_j.  (16)

The slope γ = 1 represents the best fit. When γ < 1, x̂_j > x_j, indicating the model overestimates the observations. When γ > 1, x̂_j < x_j, indicating the model underestimates the observations.

5.4. Hyper-parameter optimization

This section presents the hyper-parameter settings for the comparative study. We compare the proposed edESN with the methods from the literature, including SVR [56], deep multi-layer perceptron (MLP) [57], deep LSTM [7], ELM optimized by PSO (PSOELM) [14], ensemble ELM (EELM) [16], advanced ELM (AELM) [39], bagging MLP (BMLP) [34], light gradient boosting machine (LGBM) [34], the ensemble of BMLP and LGBM (BMLP + LGBM) [34], ensemble ESN based on the discrete wavelet transform (DWTESN) [58] and DESN [28]. A cross-validation procedure is implemented to optimize all models' hyper-parameters. These twelve time series are divided into three parts, the training, validation and test sets, to implement the cross-validation for time series methods [37]. For all variants of the deep ESN, the number of reservoir layers is set as five for a fair comparison. The validation and test sets make up 10% and 20% of the total time series, respectively. The training set is made up of the remaining data. Table 3 presents the search space of the hyper-parameters of all forecasting methods.

Table 3
Forecasting methods' hyper-parameter search space.
Forecasting method    Hyper-parameter              Space
SVR                   C                            [2^-5, 2^5]
                      ε                            [2^-5, 2^-1]
                      Kernel radius                [0.001, 0.01, 0.1]
MLP                   Hidden dimension             [12, 24, 36, 48]
                      Hidden layers                [1, 2, 4]
                      Optimizer                    Adam
                      Activation function          ReLU
LSTM                  Hidden dimension             [12, 24, 36, 48]
                      Hidden layers                [1, 2, 4]
                      Optimizer                    Adam
                      Activation function          Tanh
ELM-based models      Hidden dimension             [100, 600, 100]
                      Selection ratio              [0.1, 0.2, 0.3]
LGBM                  Estimators                   [200, 400, 600]
                      Number of leaves             [2, 4, 8, 16]
ESN/DESN/edESN        Reservoir dimension          [2^3, 2^4, ..., 2^9]
                      Transients                   12, 24
                      Regularization parameter     [10^-9, 10^-11, 10^-13]
                      Input scalings               [1, 0.1, 0.01, 0.001]
                      Inferior states              [1, 2, 3]

5.5. Comparative results

Tables 4–6 summarize the forecasting performance for the one-step, two-step and four-step prediction horizons, respectively. The last two columns represent the two proposed forecasting methods. The numbers in bold denote that the corresponding model outperforms the others for the given metric and dataset. Based on these forecasting performances, the authors conclude that the forecasting accuracy drops when the prediction steps become large, which coincides with the findings in [59].

The slopes between all models' forecasts and observations are summarized in Tables 7, 8, and 9. The goodness of fit improves as the slope approaches one. Tables 7, 8, and 9 show that the models slightly under-estimate the wave heights. The slopes' average, median, minimum, and maximum are computed to demonstrate the superiority of the proposed model.

After presenting the forecasting metrics, the statistical Nemenyi test is conducted to differentiate the methodologies [60]. The Nemenyi test utilizes the critical distance to characterize the difference among models.

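The Nemenyi critical distance used here (Eq. (17) below) depends only on the number of compared methods k, the number of series N_d and the studentized-range quantile q_α. A minimal sketch follows; for instance, with the reported q_α = 3.35, the 14 compared methods and an assumed 15 station-year series, it returns roughly 5.12, the CD quoted in Fig. 5.

```python
import math


def nemenyi_cd(q_alpha: float, k: int, n_datasets: int) -> float:
    """Critical distance CD = q_alpha * sqrt(k (k + 1) / (6 N_d)), as in Eq. (17)."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Example: nemenyi_cd(3.35, 14, 15) ≈ 5.12
```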

Table 4
Comparative results of one-step ahead forecasting.
Year Station Metric Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 RMSE 0.24859 0.24984 0.26899 0.23940 0.26942 0.24017 0.24391 0.25106 0.25384 0.24184 0.35952 0.27109 0.23578 0.23740
MASE 1.41322 1.48899 1.54900 1.35937 1.57080 1.38799 1.37987 1.43559 1.48490 1.39003 2.22196 1.57675 1.34807 1.36233
MAPE 0.06503 0.07640 0.07882 0.06241 0.07795 0.06686 0.06454 0.06696 0.07359 0.06650 0.11715 0.07654 0.06308 0.06408

46080 RMSE 0.20169 0.19632 0.20695 0.19720 0.19335 0.19105 0.19069 0.19282 0.19766 0.18500 0.19106 0.18902 0.18554 0.18414
MASE 1.10527 1.12968 1.17885 1.09868 1.08360 1.09580 1.04894 1.04830 1.11799 1.04288 1.09542 1.07708 1.03916 1.03851
MAPE 0.06626 0.07117 0.07397 0.06602 0.06676 0.06724 0.06342 0.06323 0.06959 0.06379 0.06796 0.06671 0.06346 0.06373

46076 RMSE 0.25217 0.24153 0.24307 0.25109 0.27928 0.23477 0.23858 0.26356 0.23739 0.23809 0.23996 0.26750 0.23449 0.23185
MASE 1.54416 1.52243 1.53729 1.52562 1.74492 1.47597 1.44396 1.52848 1.47029 1.43681 1.49563 1.69663 1.45652 1.43968
MAPE 0.06601 0.06942 0.07121 0.06485 0.08040 0.06638 0.06240 0.06441 0.06481 0.06201 0.06700 0.07871 0.06421 0.06336

46001 RMSE 0.36873 0.34045 0.34537 0.33844 0.81509 0.69448 0.97589 0.34544 0.34441 0.33067 0.96677 0.63167 0.34771 0.34502
MASE 1.56618 1.50970 1.59740 1.48264 3.70363 3.21757 4.00501 1.48179 1.52474 1.43095 4.25479 2.68681 1.51357 1.46509
MAPE 0.06433 0.06378 0.07237 0.06080 0.14731 0.14130 0.16693 0.06042 0.06535 0.05978 0.19323 0.10856 0.06293 0.06025

46077 RMSE 0.15979 0.14095 0.16808 0.15833 0.16034 0.14406 0.14781 0.15525 0.16182 0.15118 0.14169 0.17224 0.14215 0.13850
MASE 1.44464 1.33906 1.55592 1.45102 1.52607 1.37957 1.34309 1.40057 1.51554 1.39595 1.34706 1.64269 1.33407 1.31327
MAPE 0.08557 0.08301 0.09690 0.08868 0.09512 0.08645 0.08149 0.08276 0.09416 0.08459 0.08316 0.10418 0.08140 0.08074
2018 46083 RMSE 0.28614 0.27887 0.29846 0.28237 0.31618 0.27573 0.30088 0.31345 0.28228 0.28456 0.33639 0.32099 0.27447 0.27478
MASE 1.75679 1.68696 1.79547 1.69601 1.88296 1.67706 1.69461 1.76731 1.70141 1.67885 2.10664 1.92641 1.66090 1.66456
MAPE 0.06317 0.06091 0.06483 0.06089 0.06850 0.06046 0.06015 0.06200 0.06148 0.05981 0.08013 0.06988 0.05959 0.05965

46080 RMSE 0.28386 0.27161 0.29985 0.28037 0.30056 0.27158 0.27602 0.27777 0.27916 0.26531 0.29272 0.27404 0.26346 0.26442
MASE 1.61853 1.50792 1.74420 1.57514 1.72282 1.53362 1.53800 1.54551 1.55781 1.47812 1.65660 1.54311 1.49826 1.49120
MAPE 0.06867 0.06476 0.07769 0.06763 0.07563 0.06588 0.06519 0.06530 0.06644 0.06263 0.07152 0.06599 0.06404 0.06352

46076 RMSE 0.32167 0.31419 0.31846 0.32591 0.32955 0.30410 0.31127 0.33795 0.31251 0.31303 0.38467 0.31819 0.30592 0.30359
MASE 1.85914 1.80971 1.82240 1.82707 1.91545 1.73778 1.74962 1.84522 1.78497 1.74621 2.31531 1.83647 1.75385 1.73875
MAPE 0.07078 0.07178 0.07177 0.06931 0.07756 0.06669 0.06599 0.06793 0.06999 0.06642 0.09564 0.07096 0.06723 0.06634

46001 RMSE 0.33517 0.34262 0.34765 0.32742 0.35373 0.33649 0.36130 0.32962 0.33458 0.32414 0.34074 0.31714 0.31637 0.31730
MASE 1.61299 1.58561 1.64166 1.56530 1.63453 1.54810 1.58005 1.52961 1.56676 1.50667 1.63780 1.50463 1.49224 1.50513
MAPE 0.07028 0.06761 0.07111 0.06785 0.06964 0.06588 0.06604 0.06534 0.06697 0.06443 0.07205 0.06533 0.06462 0.06538

46077 RMSE 0.16257 0.14334 0.15137 0.15425 0.15722 0.14502 0.14041 0.14685 0.14633 0.14214 0.18607 0.16052 0.14208 0.14175
MASE 1.44480 1.30025 1.40574 1.36399 1.45088 1.32012 1.26208 1.30045 1.33321 1.28223 1.75709 1.47948 1.28024 1.26894
MAPE 0.09123 0.08471 0.09487 0.08612 0.09815 0.08688 0.08194 0.08247 0.08741 0.08245 0.12370 0.10071 0.08258 0.08169
2019 46083 RMSE 0.26634 0.25853 0.26379 0.25730 0.28113 0.25806 0.24998 0.26443 0.25661 0.25360 0.27550 0.26469 0.25534 0.25332
MASE 1.73494 1.68967 1.72040 1.68018 1.83740 1.69057 1.62198 1.71111 1.67387 1.65197 1.81602 1.73164 1.65520 1.64634
MAPE 0.06152 0.06120 0.06244 0.05947 0.06654 0.06053 0.05718 0.05989 0.05986 0.05838 0.06617 0.06219 0.05866 0.05812

46080 RMSE 0.30868 0.51632 0.28875 0.30588 0.63474 0.41289 0.53173 0.36993 0.38593 0.34743 0.45787 0.58912 0.35550 0.29828
MASE 1.76912 2.40389 1.67993 1.72004 2.76016 2.05596 2.26655 1.85837 2.04591 1.87059 2.33119 2.43200 1.86991 1.69222
MAPE 0.06543 0.08710 0.06460 0.06276 0.09819 0.07321 0.08251 0.06371 0.07356 0.06575 0.08644 0.08850 0.06883 0.06182

46076 RMSE 0.27466 0.25971 0.27986 0.26357 0.29387 0.25094 0.25375 0.27155 0.26549 0.25644 0.37865 0.27704 0.25694 0.25462
MASE 1.70686 1.66799 1.79870 1.65280 1.84393 1.59266 1.60124 1.65950 1.67286 1.60180 2.53186 1.76053 1.61079 1.60169
MAPE 0.06484 0.06597 0.06985 0.06351 0.07180 0.06082 0.06069 0.06199 0.06434 0.06070 0.10614 0.06834 0.06153 0.06094

46001 RMSE 0.33843 0.30603 0.35898 0.31758 0.31962 0.30420 0.32039 0.33017 0.30744 0.30923 0.32825 0.41794 0.30846 0.31249
MASE 1.46866 1.33732 1.60022 1.38553 1.39470 1.32748 1.37501 1.43394 1.33229 1.34367 1.42688 1.78485 1.33527 1.35623
MAPE 0.06362 0.05870 0.07354 0.06003 0.06100 0.05856 0.05928 0.06177 0.05873 0.05849 0.06237 0.07950 0.05812 0.05901

46077 RMSE 0.14961 0.13869 0.14813 0.15803 0.15611 0.14181 0.13683 0.16460 0.14219 0.14422 0.15599 0.15663 0.13898 0.13702
MASE 1.49769 1.39929 1.55257 1.50763 1.57759 1.44998 1.39218 1.51969 1.46550 1.44054 1.58623 1.59211 1.40635 1.38545
MAPE 0.07948 0.07405 0.08709 0.07797 0.08522 0.07787 0.07417 0.07757 0.07986 0.07614 0.08581 0.08565 0.07487 0.07383

The critical distance's definition is presented by:

CD = q_α sqrt(k(k + 1) / (6 N_d)),  (17)

where q_α represents the critical value obtained from the studentized range statistic divided by √2, k is the number of methodologies and N_d is the number of time series [60]. In this paper's experiments, q_α equals 3.35. Fig. 5 shows the Nemenyi test results based on the MASE for the three steps-ahead forecasting tasks. The models at the top outperform those at the bottom. Fig. 5 shows the proposed models are always at the top, which indicates outstanding performance on all prediction horizons. Furthermore, the DESN does not guarantee an improvement over the ESN because of the large dimension of the global states, whereas the proposed edESN outperforms the ESN and DESN significantly, because the edESN makes full use of all layers' states as well as the global states based on the most recent performance. Besides, the DWTESN also outperforms the ESN and DESN significantly because of the multi-scale representation of the raw time series and the extracted features of signal decomposition. The ensemble of BMLP and LGBM achieves competitive performance and outperforms the BMLP and LGBM. In addition, the superiority of EELM over PSOELM emphasizes the importance of ensemble learning.

A Wilcoxon statistical test is conducted for the pair-wise comparisons. Tables 10–12 summarize the p values of the Wilcoxon test results for the investigated three prediction horizons. The p values smaller than 0.05 indicate that the proposed deep learning model outperforms the corresponding method significantly. The p values larger than 0.05 are presented in bold. According to Table 10, the proposed edESN only slightly outperforms the LSTM and DWTESN, because the LSTM owns deep representations and sequential modeling capability, and the DWTESN's DWT block assists in temporal feature extraction. The proposed model generally outperforms EELM based on RMSE but significantly outperforms it in terms of MASE and MAPE. As the number of prediction steps grows, the proposed edESN significantly outperforms the other models according to Tables 11 and 12. The dynamic ensemble block, which decides each readout's contributions depending on the most recent performance, may be responsible for the significant improvement.

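The pair-wise comparison described above can be reproduced with a standard signed-rank test; a minimal SciPy sketch is given below (the paper does not state the exact implementation used).

```python
from scipy.stats import wilcoxon


def compare_models(errors_proposed, errors_baseline, alpha=0.05):
    """Wilcoxon signed-rank test on paired per-series error scores (e.g. MASE)."""
    statistic, p_value = wilcoxon(errors_proposed, errors_baseline)
    return p_value, p_value < alpha   # significant improvement if p < alpha, as in Tables 10-12
```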

Table 5
Comparative results of two-steps ahead forecasting.
Year Station Metric Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 RMSE 0.31484 0.31485 0.34854 0.31652 0.34547 0.30815 0.31117 0.31429 0.32824 0.30722 0.40708 0.36200 0.29851 0.30020
MASE 1.79500 1.86974 2.06759 1.78544 2.09530 1.80655 1.79325 1.79039 1.94876 1.78151 2.50566 2.10897 1.72875 1.72834
MAPE 0.08402 0.09590 0.10584 0.08139 0.10613 0.08897 0.08601 0.08597 0.09780 0.08737 0.13237 0.10461 0.08352 0.08359

46080 RMSE 0.28523 0.25020 0.26203 0.26779 0.26065 0.24904 0.24955 0.24118 0.24959 0.23343 0.23990 0.27720 0.24573 0.22726
MASE 1.55825 1.44228 1.49604 1.52599 1.48891 1.42990 1.41450 1.36554 1.44241 1.34254 1.38860 1.57225 1.39059 1.30978
MAPE 0.09370 0.08997 0.09198 0.09412 0.09270 0.08860 0.08752 0.08439 0.09037 0.08301 0.08657 0.09850 0.08558 0.08162

46076 RMSE 0.34129 0.30539 0.31627 0.34718 0.34149 0.30028 0.30199 0.32733 0.30229 0.30160 0.30691 0.33462 0.30214 0.29396
MASE 2.05920 1.96751 2.06328 2.11686 2.16918 1.89975 1.85423 1.94615 1.91006 1.85172 1.92815 2.14245 1.89375 1.83622
MAPE 0.08911 0.09457 0.10162 0.09475 0.10116 0.08749 0.08253 0.08525 0.08624 0.08274 0.08798 0.10144 0.08574 0.08291

46001 RMSE 0.45213 0.49636 0.41580 0.41458 0.91288 0.73662 1.17003 0.40931 0.41365 0.39035 0.90476 0.77852 0.57386 0.40253
MASE 1.99767 2.27690 1.96898 1.87768 4.45944 3.50590 4.86670 1.84088 1.89226 1.76284 4.13822 3.30178 2.56493 1.75901
MAPE 0.08368 0.09301 0.08634 0.07809 0.19056 0.14622 0.21325 0.07638 0.08089 0.07430 0.18814 0.14912 0.10578 0.07346

46077 RMSE 0.26055 0.21408 0.24421 0.24959 0.23185 0.22360 0.22296 0.22312 0.23882 0.22031 0.21422 0.27499 0.21919 0.21951
MASE 2.25608 2.00926 2.23961 2.20093 2.13523 2.08302 2.04010 2.07168 2.18690 2.03165 2.01207 2.60961 2.02876 2.00179
MAPE 0.13211 0.12159 0.13704 0.12660 0.13387 0.12795 0.12332 0.12479 0.13378 0.12367 0.12328 0.16703 0.12242 0.11984
2018 46083 RMSE 0.39545 0.36834 0.39724 0.37753 0.39995 0.36445 0.38864 0.39619 0.37264 0.36836 0.39143 0.38412 0.36520 0.35773
MASE 2.35462 2.13885 2.31919 2.21950 2.33970 2.15266 2.16492 2.28394 2.17023 2.13855 2.31126 2.25675 2.12839 2.10026
MAPE 0.08376 0.07599 0.08237 0.07887 0.08435 0.07722 0.07607 0.08010 0.07780 0.07589 0.08399 0.08160 0.07551 0.07479

46080 RMSE 0.38727 0.35886 0.37509 0.39034 0.39966 0.35772 0.36900 0.35207 0.36565 0.34496 0.35803 0.44810 0.34418 0.34398
MASE 2.19120 1.97424 2.09774 2.17468 2.24421 1.98514 2.01136 1.96452 2.01879 1.89627 2.00341 2.46715 1.90201 1.88792
MAPE 0.09511 0.08436 0.09162 0.09126 0.09719 0.08626 0.08590 0.08470 0.08656 0.08157 0.08758 0.10494 0.08197 0.08157

46076 RMSE 0.44807 0.40196 0.41086 0.46117 0.42881 0.40790 0.41349 0.42348 0.41097 0.40444 0.46457 0.41722 0.40860 0.39437
MASE 2.51549 2.27779 2.32113 2.54757 2.47208 2.28630 2.27419 2.32469 2.31069 2.24050 2.72594 2.38461 2.28930 2.22194
MAPE 0.09550 0.09027 0.09023 0.09461 0.09852 0.08757 0.08548 0.08611 0.08980 0.08488 0.11006 0.09293 0.08816 0.08559

46001 RMSE 0.44070 0.42270 0.43614 0.42295 0.42567 0.40989 0.45837 0.42007 0.41427 0.40798 0.43278 0.40738 0.40229 0.39930
MASE 2.09267 1.96368 2.06328 2.01670 1.98095 1.91059 2.00479 1.91795 1.94707 1.88423 2.05560 1.90177 1.86494 1.86284
MAPE 0.09121 0.08406 0.08983 0.08875 0.08515 0.08183 0.08281 0.08139 0.08370 0.08024 0.09010 0.08241 0.07991 0.08059

46077 RMSE 0.25749 0.21979 0.23524 0.23830 0.23335 0.22219 0.21614 0.22127 0.22227 0.21737 0.25162 0.23630 0.21778 0.21681
MASE 2.27741 1.95643 2.11805 2.09811 2.14671 2.00929 1.94914 1.97376 2.01415 1.94656 2.33381 2.16339 1.93681 1.91677
MAPE 0.14460 0.12645 0.13671 0.13427 0.14466 0.13315 0.12798 0.12717 0.13220 0.12610 0.15994 0.14658 0.12538 0.12341
2019 46083 RMSE 0.34205 0.32589 0.34274 0.33066 0.34437 0.32553 0.31777 0.32880 0.32545 0.31856 0.35192 0.32939 0.32684 0.31991
MASE 2.20782 2.09979 2.29670 2.12409 2.24776 2.08186 2.01686 2.11775 2.09082 2.04513 2.29267 2.11912 2.09728 2.04874
MAPE 0.07812 0.07574 0.08754 0.07533 0.08275 0.07418 0.07131 0.07413 0.07476 0.07230 0.08375 0.07559 0.07409 0.07229

46080 RMSE 0.40252 0.68321 0.52878 0.39960 0.53417 0.67879 0.79497 0.42850 0.44957 0.41469 0.59681 0.90229 0.40276 0.35740
MASE 2.36706 3.08712 2.78291 2.30240 2.78262 3.00095 3.22740 2.28767 2.47436 2.29709 2.74571 3.52865 2.24952 2.05108
MAPE 0.08951 0.11204 0.10472 0.08448 0.09968 0.10633 0.11538 0.08094 0.08976 0.08214 0.10097 0.12909 0.08394 0.07662

46076 RMSE 0.37270 0.34201 0.37121 0.35299 0.36861 0.32889 0.33015 0.34080 0.34649 0.33043 0.42082 0.37647 0.33185 0.32285
MASE 2.36057 2.17082 2.39676 2.24527 2.33000 2.07676 2.10223 2.12825 2.20562 2.09032 2.78764 2.40297 2.10340 2.05205
MAPE 0.08922 0.08394 0.09269 0.08696 0.08920 0.07981 0.07973 0.08026 0.08497 0.07963 0.11433 0.09324 0.08063 0.07853

46001 RMSE 0.43079 0.37361 0.43306 0.41151 0.38752 0.37700 0.39613 0.38832 0.38079 0.37252 0.39907 0.53268 0.37616 0.37388
MASE 1.87034 1.59329 1.90369 1.75844 1.66006 1.62293 1.68883 1.67076 1.62784 1.58529 1.71513 2.26199 1.60785 1.59921
MAPE 0.08186 0.07112 0.08719 0.07575 0.07355 0.07265 0.07281 0.07294 0.07341 0.07036 0.07529 0.10166 0.07065 0.07035

46077 RMSE 0.23271 0.21333 0.22099 0.23835 0.22553 0.21666 0.21223 0.23929 0.21275 0.21739 0.22528 0.23146 0.21061 0.20729
MASE 2.31550 2.12702 2.26445 2.27363 2.29321 2.19310 2.14223 2.25746 2.14983 2.15115 2.25231 2.34183 2.12160 2.08071
MAPE 0.12295 0.11297 0.12667 0.11782 0.12343 0.11864 0.11385 0.11578 0.11609 0.11330 0.12069 0.12637 0.11309 0.11062

Fig. 5. Nemenyi test results using MASE for the predictions of (a) One-step ahead (𝑝 value = 9.70e−28), (b) Two-steps ahead (𝑝 value = 3.99e−25) and (c) Four-steps ahead (𝑝
value = 1.26e−27). The CD is 5.12, and 𝑞𝛼 is 3.35.


Table 6
Comparative results of four-steps ahead forecasting.
Year Station Metric Naive SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRPedESN
2017 46083 RMSE 0.46820 0.44416 0.48421 0.44575 0.48144 0.46495 0.46813 0.46051 0.46568 0.44337 0.51745 0.46935 0.43842 0.43540
MASE 2.69001 2.58089 2.85550 2.60299 2.90298 2.75321 2.72345 2.59653 2.76388 2.56646 3.13821 2.76310 2.55611 2.52160
MAPE 0.12751 0.13106 0.14410 0.12795 0.14785 0.13960 0.13425 0.12767 0.13949 0.12795 0.16494 0.14024 0.12801 0.12421

46080 RMSE 0.45008 0.36058 0.39075 0.41762 0.36579 0.38035 0.36489 0.35493 0.35146 0.33751 0.36536 0.42087 0.35952 0.33160
MASE 2.38202 2.04278 2.15307 2.22718 2.10000 2.16739 2.04188 2.03132 1.98788 1.90981 2.07136 2.36782 2.02325 1.87967
MAPE 0.14044 0.12445 0.12776 0.13352 0.13075 0.13286 0.12722 0.12601 0.12131 0.11737 0.12702 0.14698 0.12287 0.11493

46076 RMSE 0.51370 0.44296 0.47527 0.52331 0.46419 0.44506 0.45151 0.44297 0.45519 0.43259 0.43739 0.49077 0.43769 0.41337
MASE 3.09443 2.78515 2.98348 3.12832 2.98067 2.81167 2.72624 2.71861 2.84612 2.66426 2.74233 3.16122 2.74031 2.55846
MAPE 0.13445 0.13132 0.13587 0.14220 0.14118 0.13144 0.12237 0.12078 0.13123 0.12088 0.12715 0.15315 0.12604 0.11662

46001 RMSE 0.63409 0.67511 0.57900 0.58567 0.72258 1.06158 1.50565 0.55179 0.56835 0.53342 0.91976 0.99061 0.58242 0.53703
MASE 2.92952 3.16270 2.75539 2.73014 3.28712 5.04033 6.43734 2.57775 2.66947 2.49477 4.34897 4.35922 2.75580 2.44696
MAPE 0.12493 0.13020 0.11797 0.11910 0.13263 0.21782 0.28800 0.11023 0.11794 0.10880 0.19486 0.20283 0.12281 0.10467

46077 RMSE 0.43115 0.36030 0.39345 0.42980 0.36552 0.38106 0.32987 0.35039 0.38278 0.35462 0.34912 0.43951 0.36340 0.35065
MASE 3.58472 3.16413 3.49941 3.50680 3.29386 3.32805 3.08301 3.23432 3.36404 3.18619 3.13871 4.04530 3.19523 3.07799
MAPE 0.20880 0.18693 0.21850 0.20479 0.20295 0.20106 0.19062 0.19838 0.20513 0.19411 0.19193 0.25634 0.19204 0.18475
2018 46083 RMSE 0.60370 0.53671 0.55847 0.56728 0.55890 0.54134 0.57994 0.54315 0.53251 0.52225 0.54217 0.58846 0.53868 0.53145
MASE 3.57898 3.07359 3.26544 3.34246 3.24003 3.15905 3.10895 3.19995 3.10640 3.07111 3.17879 3.45803 3.11763 3.03726
MAPE 0.12567 0.10677 0.11382 0.11724 0.11459 0.11142 0.10792 0.11120 0.11006 0.10754 0.11267 0.12455 0.10906 0.10614

46080 RMSE 0.60021 0.55642 0.57241 0.59087 0.54953 0.54225 0.54633 0.52593 0.54433 0.51790 0.53270 0.70023 0.51409 0.51585
MASE 3.39093 3.06204 3.24157 3.24070 3.04798 2.99861 2.96097 2.91925 2.99285 2.83147 2.90441 3.79565 2.78958 2.78795
MAPE 0.14859 0.12854 0.14201 0.13945 0.13132 0.12811 0.12557 0.12391 0.12520 0.11917 0.12553 0.15932 0.11868 0.11931

46076 RMSE 0.67829 0.59256 0.61151 0.68775 0.61504 0.60141 0.60509 0.61267 0.58504 0.58142 0.63062 0.61291 0.58880 0.56868
MASE 3.83255 3.29776 3.39542 3.80923 3.56527 3.42221 3.29266 3.39754 3.29367 3.21396 3.60546 3.47052 3.29294 3.16416
MAPE 0.14575 0.12680 0.12825 0.13820 0.14075 0.13281 0.12309 0.12558 0.12800 0.12153 0.14160 0.13501 0.12766 0.12228

46001 RMSE 0.66276 0.59891 0.61204 0.62649 0.60683 0.65174 0.66623 0.57841 0.58269 0.57124 0.61285 0.59539 0.57678 0.57444
MASE 3.18283 2.79519 2.90029 3.01912 2.86907 2.95094 2.95323 2.74912 2.78305 2.71544 2.92614 2.83482 2.74154 2.71005
MAPE 0.13910 0.11803 0.12448 0.13171 0.12260 0.12356 0.12148 0.11762 0.11973 0.11606 0.12720 0.12202 0.11743 0.11648

46077 RMSE 0.41965 0.36531 0.36860 0.38326 0.37376 0.35813 0.35297 0.36878 0.36085 0.35786 0.37856 0.38555 0.35579 0.35273
MASE 3.70604 3.20577 3.31791 3.37942 3.40333 3.24569 3.18266 3.31013 3.25224 3.20936 3.49388 3.52404 3.18724 3.13261
MAPE 0.23654 0.20647 0.22165 0.21573 0.23046 0.22030 0.21483 0.21634 0.22010 0.21307 0.24073 0.24331 0.21155 0.20663
2019 46083 RMSE 0.48961 0.46144 0.46707 0.48343 0.47584 0.46263 0.44878 0.45914 0.45684 0.44857 0.47239 0.47127 0.46058 0.44444
MASE 3.16465 2.97860 3.12005 3.09753 3.12213 2.98955 2.87063 2.95578 2.98851 2.89868 3.06199 3.04885 2.95466 2.84624
MAPE 0.11365 0.10703 0.11649 0.11101 0.11439 0.10770 0.10299 0.10480 0.10828 0.10358 0.11093 0.11015 0.10523 0.10142

46080 RMSE 0.61977 0.98762 0.61099 0.62750 0.77130 0.82789 1.67215 0.58588 0.63179 0.58144 0.81772 1.47990 0.61912 0.54462
MASE 3.64709 4.48141 3.49540 3.57802 4.03184 3.92267 6.00303 3.27948 3.61704 3.34017 3.90203 5.83426 3.42232 3.14437
MAPE 0.13889 0.16177 0.13165 0.13013 0.14813 0.14429 0.21922 0.11855 0.13582 0.12274 0.14449 0.21480 0.12863 0.11747

46076 RMSE 0.56820 0.50861 0.55023 0.53483 0.54439 0.50828 0.49902 0.49615 0.51164 0.48691 0.54625 0.51474 0.49158 0.47271
MASE 3.67144 3.18791 3.48720 3.37748 3.40950 3.19792 3.20143 3.16911 3.24121 3.08933 3.59672 3.28330 3.12652 2.98899
MAPE 0.13883 0.12040 0.13293 0.12840 0.12977 0.12168 0.12207 0.12159 0.12402 0.11803 0.14284 0.12618 0.11923 0.11343

46001 RMSE 0.64707 0.55637 0.60379 0.60187 0.59668 0.58513 0.59368 0.57548 0.56282 0.55310 0.57686 0.71994 0.55617 0.54349
MASE 2.84639 2.37702 2.59552 2.62117 2.59268 2.52655 2.51350 2.43231 2.40169 2.35109 2.50974 3.10070 2.39822 2.32782
MAPE 0.12527 0.10552 0.11698 0.11652 0.11561 0.11368 0.10815 0.10570 0.10886 0.10439 0.11080 0.13978 0.10592 0.10256

46077 RMSE 0.36666 0.34853 0.34132 0.35412 0.34898 0.33918 0.35188 0.35488 0.33995 0.33628 0.34314 0.39346 0.33038 0.32534
MASE 3.66819 3.46019 3.47041 3.47971 3.52794 3.43001 3.37723 3.42924 3.48298 3.35887 3.38891 3.92587 3.27252 3.20385
MAPE 0.19289 0.17944 0.19226 0.18252 0.19126 0.18568 0.17778 0.17822 0.18897 0.17857 0.17934 0.21129 0.17343 0.16878

performance, may be responsible for the significant improvement. The most recent performance is a direct reflection of which readout is appropriate for the future phases.

The scatterplots of different horizons' forecasts and raw data at station 46083 of different years, as well as the related coefficients of determination 𝑅2, are shown in Fig. 6. The larger the 𝑅2, the more precise the forecasts. The 𝑅2 decreases with the increase of prediction steps. For the one-step ahead forecast, the 𝑅2 of all years is larger than 0.96, which further demonstrates the suitability of the proposed ESN model.

For brevity, we only present the forecasts and raw observations at station 46083 of 2017, 2018, and 2019. Fig. 7 visualizes the comparisons between the proposed model's forecasts and the ground truth and demonstrates that the proposed DRPedESN precisely captures and anticipates the future variations, trends, and cycles on the three prediction horizons.

5.6. Ablation study

The comparative study demonstrates the superiority of the proposed model over the baselines. This section investigates the impact of each component in the proposed model and the effect of time lags.

5.6.1. Analysis of each component
An ablation study is conducted to investigate the necessity of each component in the proposed model. We assess the following four variants on three prediction horizons:

• DnedESN: The dynamic ensemble edESN without reservoir pruning and without the direct link connecting each reservoir layer to the input layer.
• DedESN: The dynamic ensemble edESN without reservoir pruning.
• MRPedESN: The edESN with reservoir pruning which utilizes the mean as the ensemble operator.
• DRPedESN: The proposed edESN which combines the dynamic ensemble, reservoir pruning and the direct link.


Table 7
Slope comparisons of one-step ahead forecasting.
Year Station Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRIedESN
2017 46083 0.98183 0.98176 0.97855 0.98327 0.97862 0.98296 0.98239 0.98135 0.98118 0.98276 0.95516 0.97736 0.98353 0.98330
46080 0.98031 0.98140 0.97957 0.98106 0.98220 0.98256 0.98257 0.98193 0.98115 0.98350 0.98219 0.98272 0.98335 0.98354
46076 0.98381 0.98556 0.98516 0.98469 0.98066 0.98604 0.98550 0.98284 0.98597 0.98607 0.98540 0.98165 0.98603 0.98636
46001 0.97085 0.97561 0.97543 0.97535 0.88922 0.91970 0.82532 0.97411 0.97492 0.97662 0.85847 0.90999 0.97427 0.97433
46077 0.96907 0.97562 0.96533 0.97032 0.96840 0.97463 0.97332 0.97045 0.96798 0.97201 0.97544 0.96416 0.97547 0.97663
2018 46083 0.97489 0.97580 0.97240 0.97521 0.96929 0.97640 0.97202 0.96941 0.97524 0.97490 0.96496 0.96897 0.97675 0.97666
46080 0.97405 0.97664 0.97140 0.97466 0.97167 0.97619 0.97535 0.97503 0.97509 0.97733 0.97209 0.97571 0.97758 0.97739
46076 0.97433 0.97550 0.97500 0.97346 0.97312 0.97687 0.97593 0.97153 0.97562 0.97562 0.96329 0.97472 0.97660 0.97696
46001 0.96629 0.96597 0.96418 0.96760 0.96328 0.96645 0.96286 0.96806 0.96779 0.96964 0.96515 0.96975 0.96988 0.96980
46077 0.97588 0.98111 0.97898 0.97888 0.97724 0.98075 0.98213 0.98022 0.98030 0.98144 0.96464 0.97654 0.98143 0.98151
2019 46083 0.97728 0.97856 0.97763 0.97870 0.97453 0.97848 0.97989 0.97743 0.97874 0.97922 0.97540 0.97740 0.97894 0.97929
46080 0.97687 0.93928 0.97997 0.97847 0.90759 0.96070 0.92932 0.97079 0.96748 0.97495 0.94980 0.91176 0.96878 0.97916
46076 0.97887 0.98153 0.97893 0.98035 0.97700 0.98267 0.98191 0.97953 0.98082 0.98207 0.96005 0.97854 0.98141 0.98176
46001 0.96856 0.97469 0.96514 0.97213 0.97257 0.97444 0.97198 0.97042 0.97391 0.97390 0.97058 0.95002 0.97375 0.97318
46077 0.97265 0.97657 0.97329 0.96939 0.97110 0.97517 0.97696 0.96739 0.97502 0.97477 0.96948 0.96943 0.97617 0.97685
Average 0.97504 0.97504 0.97473 0.97624 0.96377 0.97293 0.96383 0.97470 0.97608 0.97765 0.96081 0.96458 0.97760 0.97845
Median 0.97489 0.97657 0.97543 0.97535 0.97257 0.97640 0.97593 0.97411 0.97524 0.97662 0.96515 0.97472 0.97675 0.97739
Minimum 0.96629 0.93928 0.96418 0.96760 0.88922 0.91970 0.82532 0.96739 0.96748 0.96964 0.85847 0.90999 0.96878 0.96980
Maximum 0.98381 0.98556 0.98516 0.98469 0.98220 0.98604 0.98550 0.98284 0.98597 0.98607 0.98540 0.98272 0.98603 0.98636

Table 8
Slope comparisons of two-steps ahead forecasting.
Year Station Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRIedESN
2017 46083 0.97086 0.97071 0.96412 0.97274 0.96463 0.97186 0.97115 0.97074 0.96851 0.97216 0.94584 0.96056 0.97346 0.97324
46080 0.96048 0.97042 0.96787 0.96518 0.96675 0.97071 0.97039 0.97157 0.96963 0.97337 0.97200 0.96325 0.97051 0.97484
46076 0.97031 0.97675 0.97474 0.97142 0.97147 0.97717 0.97663 0.97325 0.97713 0.97748 0.97619 0.97110 0.97692 0.97815
46001 0.95618 0.95229 0.96248 0.96304 0.87456 0.90679 0.74244 0.96345 0.96438 0.96753 0.86824 0.86548 0.92886 0.96507
46077 0.91792 0.94311 0.92559 0.92554 0.93328 0.93819 0.93975 0.93915 0.92908 0.93995 0.94388 0.91327 0.94145 0.94056
2018 46083 0.95217 0.95763 0.95063 0.95568 0.95020 0.95895 0.95326 0.95059 0.95645 0.95748 0.95252 0.95458 0.95879 0.96029
46080 0.95169 0.95967 0.95365 0.95478 0.94947 0.95841 0.95599 0.95977 0.95717 0.96171 0.95858 0.93028 0.96202 0.96212
46076 0.95017 0.95968 0.95778 0.94728 0.95436 0.95806 0.95719 0.95563 0.95801 0.95967 0.94725 0.95608 0.95809 0.96101
46001 0.94159 0.94738 0.94380 0.94493 0.94558 0.94925 0.93926 0.94804 0.94943 0.95132 0.94439 0.95017 0.95168 0.95229
46077 0.93945 0.95497 0.94954 0.94753 0.94907 0.95416 0.95718 0.95481 0.95395 0.95610 0.93889 0.94823 0.95592 0.95625
2019 46083 0.96253 0.96573 0.96258 0.96476 0.96153 0.96563 0.96739 0.96511 0.96567 0.96723 0.95985 0.96503 0.96545 0.96689
46080 0.96069 0.89533 0.93362 0.96440 0.93837 0.89395 0.84798 0.96170 0.95698 0.96513 0.90825 0.81280 0.96133 0.97001
46076 0.96111 0.96892 0.96466 0.96470 0.96350 0.97028 0.96932 0.96774 0.96730 0.97025 0.95187 0.95996 0.96908 0.97086
46001 0.94906 0.96237 0.94982 0.95608 0.96017 0.96074 0.95742 0.95946 0.95991 0.96229 0.95694 0.91893 0.96152 0.96181
46077 0.93379 0.94387 0.93867 0.92920 0.93705 0.94120 0.94373 0.92889 0.94340 0.94164 0.93610 0.93216 0.94468 0.94649
Average 0.95187 0.95525 0.95330 0.95515 0.94800 0.95169 0.93661 0.95799 0.95847 0.96155 0.94405 0.93346 0.95865 0.96266
Median 0.95217 0.95967 0.95365 0.95608 0.95020 0.95841 0.95718 0.95977 0.95801 0.96229 0.94725 0.95017 0.96133 0.96212
Minimum 0.91792 0.89533 0.92559 0.92554 0.87456 0.89395 0.74244 0.92889 0.92908 0.93995 0.86824 0.81280 0.92886 0.94056
Maximum 0.97086 0.97675 0.97474 0.97274 0.97147 0.97717 0.97663 0.97325 0.97713 0.97748 0.97619 0.97110 0.97692 0.97815

Table 9
Slope comparisons of four-steps ahead forecasting.
Year Station Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN DRIedESN
2017 46083 0.93556 0.94028 0.93157 0.94057 0.93141 0.93570 0.93364 0.93611 0.93509 0.94082 0.91685 0.93483 0.94185 0.94353
46080 0.90063 0.93584 0.93518 0.91334 0.93326 0.93089 0.93927 0.93803 0.93862 0.94357 0.93376 0.91848 0.93575 0.94566
46076 0.93245 0.95216 0.94573 0.92824 0.94685 0.94980 0.94719 0.95223 0.94812 0.95451 0.95162 0.93788 0.95155 0.95694
46001 0.91382 0.90944 0.92943 0.92415 0.89748 0.80887 0.61493 0.93340 0.93090 0.93924 0.85072 0.80420 0.92723 0.93759
46077 0.77689 0.83356 0.79845 0.78064 0.82602 0.81384 0.86985 0.84638 0.81138 0.83922 0.85055 0.77055 0.83605 0.84572
2018 46083 0.88892 0.90950 0.90265 0.89860 0.90128 0.90883 0.89494 0.90571 0.90930 0.91284 0.90817 0.89470 0.90934 0.91101
46080 0.88380 0.89862 0.88892 0.88192 0.90099 0.90222 0.90409 0.90911 0.90271 0.91245 0.90810 0.83634 0.91565 0.91378
46076 0.88570 0.91218 0.90949 0.88845 0.90460 0.90721 0.90752 0.90674 0.91298 0.91580 0.90134 0.90366 0.91163 0.91796
46001 0.86768 0.89369 0.88656 0.87846 0.88862 0.86807 0.86745 0.90072 0.89797 0.90314 0.88781 0.89240 0.89974 0.90197
46077 0.83878 0.87171 0.86804 0.85928 0.86398 0.87633 0.88151 0.86904 0.87430 0.87639 0.85943 0.85771 0.87774 0.87997
2019 46083 0.92326 0.93143 0.92746 0.92303 0.92572 0.92993 0.93454 0.93202 0.93192 0.93493 0.92721 0.92759 0.93089 0.93573
46080 0.90692 0.78750 0.91410 0.90944 0.86559 0.83909 0.61216 0.92510 0.91094 0.92727 0.83940 0.66794 0.90852 0.93072
46076 0.90968 0.93071 0.91942 0.91766 0.92151 0.92912 0.92958 0.93047 0.92942 0.93534 0.92006 0.92504 0.93232 0.93789
46001 0.88504 0.91567 0.90203 0.89555 0.90594 0.90469 0.90500 0.91373 0.91177 0.91785 0.91037 0.85487 0.91584 0.91931
46077 0.83555 0.84582 0.84783 0.83766 0.84038 0.84980 0.83665 0.83653 0.84926 0.85409 0.84709 0.79514 0.85975 0.86437
Average 0.88565 0.89788 0.90046 0.89180 0.89691 0.89029 0.86522 0.90902 0.90631 0.91383 0.89416 0.86142 0.91026 0.91614
Median 0.88892 0.90950 0.90949 0.89860 0.90128 0.90469 0.90409 0.91373 0.91177 0.91785 0.90810 0.89240 0.91565 0.91931
Minimum 0.77689 0.78750 0.79845 0.78064 0.82602 0.80887 0.61216 0.83653 0.81138 0.83922 0.83940 0.66794 0.83605 0.84572
Maximum 0.93556 0.95216 0.94573 0.94057 0.94685 0.94980 0.94719 0.95223 0.94812 0.95451 0.95162 0.93788 0.95155 0.95694
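The slope values in Tables 7–9 quantify how closely the forecasts scale with the observations. Assuming the slope is obtained from a least-squares fit of the predictions against the observations (the precise definition is given with the evaluation metrics earlier in the paper), a minimal sketch of such a computation, with purely illustrative data, is:

```python
import numpy as np

def forecast_slope(y_true, y_pred):
    """Least-squares slope of predictions regressed on observations.

    A slope close to 1 indicates that the forecasts track the observed
    significant wave heights without a systematic scaling bias.
    """
    slope, _intercept = np.polyfit(y_true, y_pred, deg=1)
    return slope

# Toy usage with synthetic data (not the buoy records used in the paper).
rng = np.random.default_rng(0)
obs = rng.uniform(0.5, 4.0, size=200)            # observed heights (m)
pred = 0.97 * obs + rng.normal(0.0, 0.1, 200)    # imperfect forecasts
print(round(forecast_slope(obs, pred), 5))
```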


Table 10
Wilcoxon test results of one-step ahead forecasting.
Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN
RMSE 0.0001 0.0054 0.0002 0.0004 0.0001 0.0151 0.0026 0.0001 0.0006 0.0215 0.0001 0.0001 0.0730
MASE 0.0001 0.0003 0.0001 0.0001 0.0001 0.0015 0.0067 0.0001 0.0001 0.0946 0.0001 0.0001 0.0946
MAPE 0.0001 0.0002 0.0001 0.0012 0.0001 0.0004 0.1514 0.0004 0.0001 0.4887 0.0001 0.0001 0.1354

Table 11
Wilcoxon test results of two-steps ahead forecasting.
Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN
RMSE 0.0001 0.0004 0.0001 0.0001 0.0001 0.0001 0.0003 0.0001 0.0001 0.0353 0.0001 0.0001 0.0009
MASE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0001 0.0001 0.0006 0.0001 0.0001 0.0001
MAPE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0009 0.0001 0.0001 0.0084 0.0001 0.0001 0.0006

Table 12
Wilcoxon test results of four-steps ahead forecasting.
Persistence SVR MLP LSTM PSOELM EELM AELM LGBM BMLP BMLP+LGBM ESN DESN DWTESN
RMSE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0003 0.0001 0.0001 0.0067 0.0001 0.0001 0.0001
MASE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
MAPE 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0009 0.0001 0.0001 0.0001
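For readers who want to reproduce comparisons of this kind, p-values such as those in Tables 10–12 can be obtained with a paired Wilcoxon signed-rank test over the fifteen year–station cases. A minimal sketch using SciPy is shown below; the error values are illustrative placeholders, not the numbers reported above.

```python
from scipy.stats import wilcoxon

# Paired RMSE values of a baseline and the proposed model over the
# fifteen year-station cases (illustrative numbers only).
rmse_baseline = [0.51, 0.63, 0.43, 0.60, 0.60, 0.68, 0.66, 0.42, 0.49,
                 0.62, 0.57, 0.65, 0.36, 0.55, 0.48]
rmse_proposed = [0.41, 0.54, 0.35, 0.53, 0.52, 0.57, 0.57, 0.35, 0.44,
                 0.54, 0.47, 0.54, 0.33, 0.51, 0.44]

# Small p-values indicate a statistically significant difference
# between the paired error distributions.
stat, p_value = wilcoxon(rmse_baseline, rmse_proposed)
print(f"Wilcoxon statistic = {stat:.1f}, p-value = {p_value:.4f}")
```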

The forecasting performance is normalized to highlight the difference among these variants. The bar plots of the ablation studies in terms of RMSE, MASE and MAPE are presented in Figs. 8–10, respectively. The following findings are based on the results of the ablation studies. First, as the prediction horizons grow longer, the forecasting performance decreases. Second, the DnedESN performs poorly across all prediction horizons, emphasizing the importance of the direct link [61]. The clean input information is transmitted via the direct link connecting the input layer to each reservoir layer and guides the creation of the reservoir states in the deep structures. Third, across all prediction horizons, the DRPedESN outperforms the MRPedESN, demonstrating the superiority of the dynamic ensemble module over the static ensemble. Unlike the static ensemble, the dynamic ensemble adjusts the combination weights based on each forecasting candidate's most recent performance. Such an update facilitates capturing the evolving properties of the time series and filtering out inferior models by assigning them low weights. Finally, the DRPedESN outperforms the DedESN on one-step and two-steps ahead forecasts while maintaining similar performance on four-steps ahead forecasting. As a result, we argue that reservoir pruning is necessary, but it has the smallest impact on forecasting performance when compared to the other components.

5.6.2. Analysis of time lags
As a recurrent neural network, the ESN processes the wave time series hour by hour. The original wave time series is transformed into the ESN's hidden states, and then the output layer is trained using the hidden states. The ESN generally only utilizes the latest hidden states for decision making and output-layer training. This section analyzes the performance when utilizing different lags. Tables 13–15 summarize the average performance and the respective standard deviations for different time lags. Utilizing only the latest information achieves the best averaged performance because the ESN recurrently processes the wave time series, and the latest hidden states already carry the relevant information. Furthermore, utilizing more lags increases the input dimension of the output layer, which may deteriorate the performance.

Table 13
Comparative results of different time lags for one-step ahead forecasting.
Lag = 1 Lag = 4 Lag = 8 Lag = 16
RMSE 0.25 ± 0.06 0.50 ± 0.28 0.63 ± 0.22 0.86 ± 0.31
MASE 1.46 ± 0.19 3.15 ± 1.84 4.02 ± 1.32 5.78 ± 1.85
MAPE 0.07 ± 0.01 0.16 ± 0.12 0.22 ± 0.11 0.33 ± 0.14

Table 14
Comparative results of different time lags for two-steps ahead forecasting.
Lag = 1 Lag = 4 Lag = 8 Lag = 16
RMSE 0.32 ± 0.07 0.62 ± 0.31 0.71 ± 0.24 0.96 ± 0.33
MASE 1.90 ± 0.23 3.94 ± 1.85 4.69 ± 1.76 6.52 ± 2.04
MAPE 0.09 ± 0.02 0.21 ± 0.13 0.27 ± 0.14 0.39 ± 0.17

Table 15
Comparative results of different time lags for four-steps ahead forecasting.
Lag = 1 Lag = 4 Lag = 8 Lag = 16
RMSE 0.46 ± 0.09 0.70 ± 0.17 0.87 ± 0.26 1.11 ± 0.40
MASE 2.79 ± 0.38 4.67 ± 1.48 5.87 ± 1.93 7.56 ± 2.52
MAPE 0.13 ± 0.03 0.26 ± 0.15 0.35 ± 0.17 0.46 ± 0.20
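To make the lag analysis above concrete, the following minimal sketch (hypothetical helper names, not the authors' implementation) shows how stacking the last few reservoir states enlarges the readout input, which is the effect studied in Tables 13–15:

```python
import numpy as np

def readout_inputs(states, lag):
    """Stack the last `lag` reservoir states as the readout input.

    states: array of shape (T, n_reservoir), one state per hour.
    Returns an array of shape (T - lag + 1, lag * n_reservoir); with
    lag = 1 only the most recent state is used, while larger lags simply
    enlarge the input dimension of the linear output layer.
    """
    T, _ = states.shape
    rows = [states[t - lag + 1:t + 1].ravel() for t in range(lag - 1, T)]
    return np.asarray(rows)

# Toy reservoir trajectory: 100 hourly states of a 50-unit reservoir.
states = np.random.randn(100, 50)
print(readout_inputs(states, lag=1).shape)   # (100, 50)
print(readout_inputs(states, lag=8).shape)   # (93, 400)
```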
Fig. 6. Scatter-plots of predictions and raw observations at station 46083 of year 2017, 2018 and 2019. The coefficients of determination 𝑅2 are displayed.

6. Discussion: findings, limitations and future directions

The ensemble deep architecture within one neural network is less explored and researched in the literature on wave energy forecasting. The experimental studies have demonstrated the appropriateness of the edESN on hourly wave energy time series. Therefore, the ensemble deep structure is proven to boost the performance of a deep neural network. Unlike traditional deep neural networks with a single output layer used in the energy forecasting literature, the proposed edESN has multiple readouts to avoid overfitting. The various readout layers of each reservoir are unaffected by the huge size of the global states and compensate for the poor performance of particular readouts. Although ensemble learning improves the performance [62], designing combination schemes for multiple outputs within one single deep network is crucial for performance. The combination scheme should assign large weights to the readout layer which is likely to forecast the next step precisely. However, the evolving and chaotic nature of the significant wave height time series imposes critical challenges on determining the precise readout layers for the future. In time series, the observations change continuously. Observations with similar time indices are more likely to have similar patterns than those with time indices far apart on the time axis. As a result, the most recent forecasting accuracy provides trustworthy direction and insights into the construction of the dynamic ensemble scheme. Based on this intuition, this work proposes to use the most recent forecasting performance to estimate the dynamic ensemble weights. These ensemble weights are different at different time steps, creating an evolving ensemble module. The diversity of the reservoir layers ensures the possibility of satisfactory forecasts in various scenarios. The dynamic ensemble model can determine the suitable combinations of outputs considering the latest performance under different scenarios. When there are extreme heights, the dynamic ensemble module assists in assigning large weights to the layers which generate accurate estimations for the latest steps. The ablation study compares the dynamic ensemble with a static ensemble approach, the mean operator. Although some forecasting literature claims that the simple equal-weight combination is a reliable option [63], the ablation study's results prove that the dynamic ensemble is more suitable on significant wave height time series.
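A minimal sketch of such a performance-driven weighting is given below. It uses a softmax of the negative recent errors, which is only one plausible realization of the idea; the exact weighting formula of the proposed edESN is defined in the methodology section, and all names here are illustrative.

```python
import numpy as np

def dynamic_weights(recent_errors, temperature=1.0):
    """Combination weights derived from each readout's recent errors.

    recent_errors: shape (n_readouts,), e.g. mean absolute errors over
    the latest steps. A smaller recent error yields a larger weight, and
    the weights are recomputed at every time step, so the ensemble evolves.
    """
    scores = -np.asarray(recent_errors, dtype=float) / temperature
    scores -= scores.max()                 # numerical stability
    w = np.exp(scores)
    return w / w.sum()

def dynamic_ensemble_forecast(readout_preds, recent_errors):
    """Weighted combination of the per-reservoir readout forecasts."""
    w = dynamic_weights(recent_errors)
    return float(np.dot(w, readout_preds))

# Three readouts predict the next significant wave height (m); the second
# readout was the most accurate over the latest steps, so it dominates.
preds = np.array([2.10, 2.35, 1.90])
errors = np.array([0.40, 0.05, 0.55])
print(round(dynamic_ensemble_forecast(preds, errors), 3))
```

An equal-weight (mean) combination corresponds to ignoring the recent errors altogether, which is exactly the static MRPedESN baseline examined in the ablation study.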
Besides the dynamic ensemble block, the reservoir pruning strategy is necessary for accurate forecasts. Randomized neural networks have achieved significant success in energy forecasting [47]. However, one defect of the randomized neural networks is the naive hidden states generated by the untrained weights, which degrades the representation ability. In general, randomized neural networks overcome such defects by many hidden nodes. This paper proposes a straightforward linear pruning strategy to filter out shallow layers' inferior random features and propagate the more desirable features to the deep layers. The use of a linear pruning technique is justified for two reasons. First, only linear relationships between the reservoir and outputs can be learned by the linear readout layer. Second, linear pruning is not computationally intensive. After linear pruning, the remaining features are fed into the corresponding readout and subsequent hidden layers. The ablation study shows that employing a dynamic ensemble and the pruning method can marginally increase performance. Furthermore, the pruning approach is simple to integrate with other deep randomized neural networks.
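One plausible way to realize such a linear pruning criterion is to rank the reservoir units of a layer by the strength of their linear relationship with the training target and to keep only the best-ranked units, as in the sketch below. The ranking rule and the retained fraction are illustrative assumptions; the exact pruning rule of the proposed model is specified in the methodology section.

```python
import numpy as np

def prune_reservoir_states(states, targets, keep_ratio=0.5):
    """Keep the reservoir units most linearly related to the target.

    states:  (T, n_reservoir) reservoir states collected during training.
    targets: (T,) training targets (significant wave heights).
    Returns the indices of the retained units, ranked by the absolute
    Pearson correlation between each unit and the target.
    """
    corr = np.array([np.corrcoef(states[:, j], targets)[0, 1]
                     for j in range(states.shape[1])])
    corr = np.nan_to_num(corr)             # guard against constant units
    n_keep = max(1, int(keep_ratio * states.shape[1]))
    return np.argsort(-np.abs(corr))[:n_keep]

# Toy usage: prune a 100-unit reservoir before training its readout and
# before feeding the surviving states to the next reservoir layer.
states = np.random.randn(500, 100)
targets = states[:, :5].sum(axis=1) + 0.1 * np.random.randn(500)
kept = prune_reservoir_states(states, targets)
pruned_states = states[:, kept]
print(pruned_states.shape)                 # (500, 50)
```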


Fig. 7. Comparisons between predictions and true data at station 46083 of year 2017, 2018 and 2019.

Fig. 8. The barplot in terms of RMSE with a certain component purposefully removed.

Although the current study demonstrates the superiority of the proposed edESN on significant wave height forecasting, there are several limitations. First, the long-term predictions of wave height are still challenging based on the comparative results. Designing a forecasting model which can mine long-term dependency is critical. Second, the forecasting candidates in this paper are within one network, which discourages the diversity among the candidates. Hence, researchers can include more different candidates or utilize different activation functions for each hidden layer to foster diversity among multiple readouts. Third, the automatic design of the deep network is crucial for forecasting. Evolutionary algorithms are potential solutions for the automatic and data-driven design of the deep randomized networks [64,65].


Fig. 9. The barplot in terms of MASE with a certain component purposefully removed.

Fig. 10. The barplot in terms of MAPE with a certain component purposefully removed.

Finally, Table 16 records the computational cost in terms of the average simulation time. The optimization refers to the hyper-parameter tuning by cross-validation. Testing represents the evaluation of the optimized model. All the experiments are conducted using an Intel i7-10700K CPU. First, randomized neural networks take less training time than gradient-based neural networks. Second, the training of recurrent neural networks is slower than that of feed-forward neural networks. For instance, the ESN-based models show a longer time than the ELM-based models. Third, the proposed model's testing takes 0.33 s, indicating that its computational cost is not heavy. The proposed model takes more time than the DESN because of the reservoir pruning and the training of multiple output layers. However, the increased computational burden of 0.08 s is not heavy because the added components are all linear with non-iterative solutions.

In addition, randomized neural networks' training speed is high when they are trained by non-iterative methods. The non-iterative methods directly compute the closed-form solution of the output layer's weights. Hence, they need to collect all historical observations so that the closed-form solutions can be precisely calculated, indicating that memory increases when the number of observations increases. The proposed method and the ELM-based and ESN-based models utilize non-iterative training methods, so they consume more memory when the number of training observations increases. However, gradient-based methods, such as MLP, BMLP, and LSTM, iteratively train neural networks in a batch mode. Each iteration utilizes a batch of data, taking less memory than all data. When the number of training samples becomes large, it is practical to train randomized neural networks with iterative methods.

Table 16
Average simulation time.
Model Optimization Testing
SVR 27.92 s 0.54 s
MLP 86.00 s 10.42 s
LSTM 868.34 s 1032.97 s
BRF 336.67 s 0.75 s
PSOELM 16.00 s 25.58 ms
EELM 75.17 s 1.83 s
AELM 8.75 s 64.17 ms
LGBM 2.08 s 2.75 ms
BMLP 804.17 s 80.25 s
ESN 14.67 s 0.13 s
DESN 229.42 s 0.25 s
Proposed 61.42 s 0.33 s
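As a reference for the non-iterative training mentioned above, the closed-form (ridge-regularized least-squares) solution of a linear readout can be computed as in the following sketch; the regularization value and array shapes are illustrative, not the settings used in the experiments.

```python
import numpy as np

def train_readout(H, y, ridge=1e-6):
    """Closed-form (ridge) solution of a linear readout layer.

    H: (T, n_features) hidden/reservoir states collected over the whole
       training set, which is why memory grows with the number of samples.
    y: (T,) training targets.
    Returns weights W such that the predictions are H @ W.
    """
    n = H.shape[1]
    A = H.T @ H + ridge * np.eye(n)
    b = H.T @ y
    return np.linalg.solve(A, b)

# Toy usage: one non-iterative readout fit, no gradient iterations.
H = np.random.randn(1000, 200)
y = H @ np.random.randn(200) + 0.05 * np.random.randn(1000)
W = train_readout(H, y)
print(np.sqrt(np.mean((H @ W - y) ** 2)))   # training RMSE
```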
7. Conclusion

DRPedESN, an ensemble deep learning model for significant wave height predictions, is proposed in this research. The DRPedESN uses deep representations to train multiple readout layers. As a result, the ensemble of readouts limits the danger of overfitting while the deep structures collect multi-scale characteristics. A linear reservoir pruning approach removes the inferior information from each reservoir layer, allowing only vital information to reach the deep levels. A direct link connecting each reservoir layer to the input layer is established to overcome the excessive randomness of the deep reservoir layers. A static ensemble may suffer from the dynamic and chaotic characteristics of the significant wave height time series. Therefore, a dynamic ensemble module is designed to handle the ever-changing combination of forecasts. This ensemble module determines the combination weights according to the most recent performance of each candidate. Finally, the dynamic combination of all readouts is the forecast for the significant wave height.

A detailed experiment on twelve significant wave height time series across three years is conducted. Three prediction horizons are evaluated. The proposed model is compared with nine forecasting methods to show its superiority. The performance is evaluated using three forecasting metrics. Then statistical tests are conducted to differentiate the forecasting methods further. The following conclusions are drawn from the experimental studies:

• The prediction performance decreases with the increase of prediction horizons.


• Ensemble approaches outperform the respective single model on significant wave height forecasting.
• The ablation study demonstrates the necessity of reservoir pruning, direct connection, and the dynamic ensemble block.
• The dynamic ensemble, which assigns different weights at different time steps, outperforms the static combination.
• The ensemble deep variation of the canonical ESN outperforms the shallow ESN significantly.

This study proposes a novel ensemble deep learning network for significant wave height forecasting. Advanced feature extraction algorithms can be coupled with the suggested network in the future to improve performance even more. In addition, testing the proposed methodology on additional renewable energy sources, such as solar and wind, is practical and promising.

CRediT authorship contribution statement

Ruobin Gao: Theoretical development, Empirical study, Literature review and finishing the manuscript. Ruilin Li: Development of the proposed model and revision of the manuscript. Minghui Hu: Development of the proposed model and revision of the manuscript. Ponnuthurai Nagaratnam Suganthan: Empirical study, Revision of the manuscript and making suggestions on the comparison in the experiments. Kum Fai Yuen: Revision of the manuscript.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Ruobin Gao reports a relationship with Nanyang Technological University that includes: employment. Kum Fai Yuen reports a relationship with Nanyang Technological University that includes: employment. Ponnuthurai Nagaratnam Suganthan reports a relationship with Nanyang Technological University that includes: employment.

Data availability

Data will be made available on request.

Acknowledgment

We express our sincere gratitude to the National Data Buoy Center for the data provided.

References

[1] Crippa P, Alifa M, Bolster D, Genton MG, Castruccio S. A temporal model for vertical extrapolation of wind speed and wind energy assessment. Appl Energy 2021;301:117378.
[2] Ma Q, Wang P, Fan J, Klar A. Underground solar energy storage via energy piles: An experimental study. Appl Energy 2022;306:118042.
[3] Gao H, Xiao J. Effects of power take-off parameters and harvester shape on wave energy extraction and output of a hydraulic conversion system. Appl Energy 2021;299:117278.
[4] Reikard G, Pinson P, Bidlot J-R. Forecasting ocean wave energy: The ECMWF wave model and time series methods. Ocean Eng 2011;38(10):1089–99.
[5] Anastasiou S, Sylaios G. Nearshore wave field simulation at the lee of a large island. Ocean Eng 2013;74:61–71.
[6] Soukissian TH, Prospathopoulos AM, Diamanti C. Wind and wave data analysis for the aegean sea-preliminary results. Glob Atmos Ocean Syst 2002;8(2–3):163–89.
[7] Fan S, Xiao N, Dong S. A novel model to predict significant wave height based on long short-term memory network. Ocean Eng 2020;205:107298.
[8] Shamshirband S, Mosavi A, Rabczuk T, Nabipour N, Chau K-w. Prediction of significant wave height; comparison between nested grid numerical model, and machine learning models of artificial neural networks, extreme learning and support vector machines. Eng Appl Comput Fluid Mech 2020;14(1):805–17.
[9] Mahjoobi J, Etemad-Shahidi A. An alternative approach for the prediction of significant wave heights based on classification and regression trees. Appl Ocean Res 2008;30(3):172–7.
[10] Ajeesh K, Deka PC. Forecasting of significant wave height using support vector regression. In: 2015 fifth international conference on advances in computing and communications (ICACC). IEEE; 2015, p. 50–3.
[11] Berbić J, Ocvirk E, Carević D, Lončar G. Application of neural networks and support vector machine for significant wave height prediction. Oceanologia 2017;59(3):331–49.
[12] Deo MC, Jha A, Chaphekar A, Ravikant K. Neural networks for wave forecasting. Ocean Eng 2001;28(7):889–98.
[13] Londhe S, Shah S, Dixit P, Nair TB, Sirisha P, Jain R. A coupled numerical and artificial neural network model for improving location specific wave forecast. Appl Ocean Res 2016;59:483–91.
[14] Kaloop MR, Kumar D, Zarzoura F, Roy B, Hu JW. A wavelet-particle swarm optimization-extreme learning machine hybrid modeling for significant wave height prediction. Ocean Eng 2020;213:107777.
[15] Cuadra L, Salcedo-Sanz S, Nieto-Borge J, Alexandre E, Rodríguez G. Computational intelligence in wave energy: Comprehensive review and case study. Renew Sustain Energy Rev 2016;58:1223–46.
[16] Kumar NK, Savitha R, Al Mamun A. Ocean wave height prediction using ensemble of extreme learning machine. Neurocomputing 2018;277:12–20.
[17] Cornejo-Bueno L, Nieto-Borge J, García-Díaz P, Rodríguez G, Salcedo-Sanz S. Significant wave height and energy flux prediction for marine energy applications: A grouping genetic algorithm–extreme learning machine approach. Renew Energy 2016;97:380–9.
[18] Cornejo-Bueno L, Rodríguez-Mier P, Mucientes M, Nieto-Borge J, Salcedo-Sanz S. Significant wave height and energy flux estimation with a genetic fuzzy system for regression. Ocean Eng 2018;160:33–44.
[19] Gómez-Orellana A, Guijo-Rubio D, Gutiérrez P, Hervás-Martínez C. Simultaneous short-term significant wave height and energy flux prediction using zonal multi-task evolutionary artificial neural networks. Renew Energy 2022;184:975–89.
[20] Ali M, Prasad R, Xiang Y, Deo RC. Near real-time significant wave height forecasting with hybridized multiple linear regression algorithms. Renew Sustain Energy Rev 2020;132:110003.
[21] Yang S, Xia T, Zhang Z, Zheng C, Li X, Li H, Xu J. Prediction of significant wave heights based on CS-BP model in the south China sea. IEEE Access 2019;7:147490–500.
[22] Zanaganeh M, Mousavi SJ, Shahidi AFE. A hybrid genetic algorithm–adaptive network-based fuzzy inference system in prediction of wave parameters. Eng Appl Artif Intell 2009;22(8):1194–202.
[23] Roulston MS, Ellepola J, von Hardenberg J, Smith LA. Forecasting wave height probabilities with numerical weather prediction models. Ocean Eng 2005;32(14–15):1841–63.
[24] Jaeger H, Haas H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 2004;304(5667):78–80.
[25] Gao R, Du L, Duru O, Yuen KF. Time series forecasting based on echo state network and empirical wavelet transformation. Appl Soft Comput 2021;102:107111.
[26] Chouikhi N, Ammar B, Rokbani N, Alimi AM. PSO-based analysis of echo state network parameters for time series forecasting. Appl Soft Comput 2017;55:211–25.
[27] Gallicchio C, Micheli A. Architectural and markovian factors of echo state networks. Neural Netw 2011;24(5):440–56.
[28] Gallicchio C, Micheli A, Pedrelli L. Deep reservoir computing: A critical experimental analysis. Neurocomputing 2017;268:87–99.
[29] Hu H, Wang L, Lv S-X. Forecasting energy consumption and wind power generation using deep echo state network. Renew Energy 2020;154:598–613.
[30] Song Z, Wu K, Shao J. Destination prediction using deep echo state network. Neurocomputing 2020;406:343–53.
[31] Bai K, Yi Y, Zhou Z, Jere S, Liu L. Moving toward intelligence: Detecting symbols on 5g systems through deep echo state network. IEEE J Emerg Sel Top Circuits Syst 2020;10(2):253–63.
[32] Wang T, Gao S, Bi F, Li Y, Guo D, Ren P. Residual learning with multifactor extreme learning machines for waveheight prediction. IEEE J Ocean Eng 2020;46(2):611–23.
[33] Özger M. Prediction of ocean wave energy from meteorological variables by fuzzy logic modeling. Expert Syst Appl 2011;38(5):6269–74.
[34] Gracia S, Olivito J, Resano J, Martin-del Brio B, de Alfonso M, Álvarez E. Improving accuracy on wave height estimation through machine learning techniques. Ocean Eng 2021;236:108699.
[35] Fernández JC, Salcedo-Sanz S, Gutiérrez PA, Alexandre E, Hervás-Martínez C. Significant wave height and energy flux range forecast with machine learning classifiers. Eng Appl Artif Intell 2015;43:44–53.
[36] Li L, Yuan Z, Gao Y. Maximization of energy absorption for a wave energy converter using the deep machine learning. Energy 2018;165:340–9.
[37] Bergmeir C, Benítez JM. On the use of cross-validation for time series predictor evaluation. Inform Sci 2012;191:192–213.
[38] Alexandre E, Cuadra L, Nieto-Borge J, Candil-García G, Del Pino M, Salcedo-Sanz S. A hybrid genetic algorithm—extreme learning machine approach for accurate significant wave height reconstruction. Ocean Model 2015;92:115–23.


[39] Ali M, Prasad R, Xiang Y, Sankaran A, Deo RC, Xiao F, Zhu S. Advanced extreme learning machines vs. deep learning models for peak wave energy period forecasting: A case study in Queensland, Australia. Renew Energy 2021;177:1031–44.
[40] Salcedo-Sanz S, Borge JN, Carro-Calvo L, Cuadra L, Hessner K, Alexandre E. Significant wave height estimation using SVR algorithms and shadowing information from simulated and real measured X-band radar images of the sea surface. Ocean Eng 2015;101:244–53.
[41] Ali M, Prasad R. Significant wave height forecasting via an extreme learning machine model integrated with improved complete ensemble empirical mode decomposition. Renew Sustain Energy Rev 2019;104:281–95.
[42] Duan W, Han Y, Huang L, Zhao B, Wang M. A hybrid EMD-SVR model for the short-term prediction of significant wave height. Ocean Eng 2016;124:54–73.
[43] Huang W, Dong S. Improved short-term prediction of significant wave height by decomposing deterministic and stochastic components. Renew Energy 2021;177:743–58.
[44] Zhou S, Bethel BJ, Sun W, Zhao Y, Xie W, Dong C. Improving significant wave height forecasts using a joint empirical mode decomposition–long short-term memory network. J Mar Sci Eng 2021;9(7):744.
[45] Deka PC, Prahlada R. Discrete wavelet neural network approach in significant wave height forecasting for multistep lead time. Ocean Eng 2012;43:32–42.
[46] Özger M. Significant wave height forecasting using wavelet fuzzy logic approach. Ocean Eng 2010;37(16):1443–51.
[47] Del Ser J, Casillas-Perez D, Cornejo-Bueno L, Prieto-Godino L, Sanz-Justo J, Casanova-Mateo C, Salcedo-Sanz S. Randomization-based machine learning in renewable energy prediction problems: critical literature review, new results and perspectives. Appl Soft Comput 2022;108526.
[48] Huang Y, Deng Y. A new crude oil price forecasting model based on variational mode decomposition. Knowl-Based Syst 2021;213:106669.
[49] Gao R, Du L, Yuen KF, Suganthan PN. Walk-forward empirical wavelet random vector functional link for time series forecasting. Appl Soft Comput 2021;108:107450.
[50] Suganthan PN, Katuwal R. On the origins of randomization-based feedforward neural networks. Appl Soft Comput 2021;105:107239.
[51] Lukoševičius M, Jaeger H. Reservoir computing approaches to recurrent neural network training. Comp Sci Rev 2009;3(3):127–49.
[52] Kim T, King BR. Time series prediction using deep echo state networks. Neural Comput Appl 2020;32(23):17769–87.
[53] Shi Q, Katuwal R, Suganthan P, Tanveer M. Random vector functional link neural network based ensemble deep learning. Pattern Recognit 2021;117:107978.
[54] NDBC. National data buoy center. 2022, URL: https://www.ndbc.noaa.gov/.
[55] Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecast 2006;22(4):679–88.
[56] Mahjoobi J, Mosabbeb EA. Prediction of significant wave height using regressive support vector machines. Ocean Eng 2009;36(5):339–47.
[57] Wang H, Lei Z, Zhang X, Zhou B, Peng J. A review of deep learning for renewable energy forecasting. Energy Convers Manage 2019;198:111799.
[58] Wang H, Lei Z, Liu Y, Peng J, Liu J. Echo state network based ensemble approach for wind power forecasting. Energy Convers Manage 2019;201:112188.
[59] Sylaios G, Bouchette F, Tsihrintzis VA, Denamiel C. A fuzzy inference system for wind-wave modeling. Ocean Eng 2009;36(17–18):1358–65.
[60] Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006;7:1–30.
[61] Wang Y, Wang L, Yang F, Di W, Chang Q. Advantages of direct input-to-output connections in neural networks: The Elman network for stock index forecasting. Inform Sci 2021;547:1066–79.
[62] Ren Y, Suganthan P, Srikanth N. Ensemble methods for wind and solar power forecasting—A state-of-the-art review. Renew Sustain Energy Rev 2015;50:82–91.
[63] Hsiao C, Wan SK. Is there an optimal forecast combination? J Econometrics 2014;178:294–309.
[64] Lynn N, Ali MZ, Suganthan PN. Population topologies for particle swarm optimization and differential evolution. Swarm Evol Comput 2018;39:24–35.
[65] Rajasekhar A, Lynn N, Das S, Suganthan PN. Computing with the collective intelligence of honey bees–a survey. Swarm Evol Comput 2017;32:25–48.
