Pinball-Huber Boosted Extreme Learning Machine Regression: A Multiobjective Approach To Accurate Power Load Forecasting
https://doi.org/10.1007/s10489-024-05651-3
Abstract
Power load data frequently display outliers and an uneven distribution of noise. To tackle this issue, we present a forecasting model based on an improved extreme learning machine (ELM). Specifically, we introduce the novel Pinball-Huber robust loss function as the objective function in training. The loss function enhances precision by assigning distinct penalties to errors based on their directions. We employ a genetic algorithm, combined with a fast nondominated sorting technique, for multiobjective optimization in the ELM-Pinball-Huber context. This method simultaneously reduces training errors while streamlining the model structure. We practically apply the integrated model to forecast power load data in Taixing City, which is situated in the southern part of Jiangsu Province. The empirical findings confirm the method's effectiveness.
Keywords Load forecasting · Robust loss function · Multi-objective optimization · Neural networks · Extreme learning machine
the input weights, hidden layer thresholds, and hidden node numbers as input parameters, the optimized ELM model can achieve minimized training errors and a more streamlined network structure.

(3) The superiority of the proposed NSGA-II-ELM-Pinball-Huber model is validated by comparing it with benchmarks (LSTM, GRU, CNN-BiLSTM-Attention) using power load data from Taixing City. This validation has emphasized the effectiveness of the Pinball-Huber loss, highlighting the enhanced performance of the multiobjective optimization algorithm NSGA-II.

The rest of the current paper is organized as follows: Section 2 contains the literature review. Section 3 presents definitions of some terms. Section 4 gives the methodology. Section 5 goes over the results. Section 6 discusses the conclusions and future research areas.

2 Literature review

Over the past few decades, researchers have proposed numerous short-term load forecasting methods [6], which can be broadly categorized into physical, statistical, and intelligent methods [7]. Physical methods establish the mathematical relationships between historical data and physical characteristics to achieve power load forecasting. Statistical methods perform mathematical statistics on historical data, establishing the correlations between load and time to make predictions [8]. These models typically include linear regression (LR) [9], autoregressive integrated moving average (ARIMA) [10], gray models (GM) [11], and seasonal exponential smoothing (SES) [7]. However, these methods fail to capture the nonlinear characteristics present in load data.

Compared with traditional physical and statistical methods, intelligent methods exhibit greater potential in handling the nonlinear fluctuations and complex relationships within power load data, hence demonstrating higher accuracy in the field of power load forecasting [12, 13]. Intelligent methods such as artificial neural networks (ANN) [14], support vector regression (SVR) [15], and ELM [16] have found extensive applications in recent power forecasting studies. Among these, ANN is adept at modeling more intricate relationships between the power load and correlated variables compared with other methods, hence leading to its widespread usage in power load forecasting [2, 17]. ANN, which is akin to the structure of the human brain, can interpret vast amounts of data and transform it into actionable knowledge [18]. ELM, an enhanced single-hidden-layer feedforward neural network, has been widely employed in forecasting tasks [19, 20]. Unlike traditional artificial neural networks, ELM's input weights and biases in the hidden layer are randomly assigned. ELM derives hidden weights through the least squares method, eliminating the need for adjusting hidden layer weights through iterative backpropagation [21]. As a result, the ELM model demonstrates faster learning and more pronounced generalization with minimal preset parameters [22]. Numerous ELM-based predictive models have been proposed, showcasing their exceptional regression capabilities in forecasting. Ni et al. [23] employed an ensemble method using ELM and lower upper bound estimation (LUBE) for short-term power prediction. Han et al. [24] developed seasonal multimodels based on ELM by considering the seasonal distribution of power features. The effectiveness of the proposed methods was validated through a comparison with other approaches. Thus, compared with shallow learning systems, ELM exhibits higher efficiency, lower computational costs, and stronger generalization.

The loss function reflects the disparity between the predicted values and actual values during the optimization process, significantly impacting the learning model's generalization and accuracy [25]. Chen et al. [26] utilized ELM enhanced with an L2-norm loss function for feature selection. Most neural network methods adopt the mean squared error (MSE), or L2, loss function. Unfortunately, the MSE loss function relies on Gaussian assumptions, making it sensitive to outliers and challenging to precisely evaluate nonlinear errors. Yang et al. [27] suggested employing the Huber loss function as the model's training objective. The Huber loss treats errors of different magnitudes differently. However, it lacks consideration for the direction of errors. Power load data are nonlinear and often exhibit various asymmetric noise distributions [5], necessitating the development of a new loss function that comprehensively considers both error magnitude and direction.

In conventional ANNs, including ELM, certain parameters are set randomly, leading to a degree of error and variability in the predictive outcomes. Artificial intelligence also exhibits drawbacks such as slow convergence, susceptibility to local optima, and overfitting [11, 28]. Hence, several intelligent optimization algorithms have been proposed to alleviate these limitations. Optimization algorithms applied to machine learning algorithms have further improved their regression capabilities to some extent [22]. For instance, Niu et al. [29] utilized a cooperative search algorithm that can explore the optimal hyperparameters of support vector machines (SVM), using this algorithm to predict electricity consumption in four Chinese provinces. Niu et al. [29] optimized BPNN parameters using a genetic algorithm (GA). Shang et al. [30] established a prediction model combining least squares support vector machines (LSSVM)
with generalized regression neural networks and optimized the weight coefficients by using the whale optimization algorithm (WOA). Xie et al. [31] proposed a short-term power load forecasting method combining the Elman neural network (ENN) and particle swarm optimization (PSO). Differing from traditional random initialization, PSO was employed to search for the optimal learning rate of the ENN. Addressing the issue of model parameter determination, arithmetic optimization algorithms (AOA) [32], gene expression programming (GEP) [33], and the chimp optimization algorithm (ChOA) [34], among others, have been utilized. Many studies on power load forecasting solely employed single-objective algorithms to optimize one criterion. However, in practical applications, meeting multiple constraints is often necessary [7, 35].

The present paper introduces a novel power load forecasting model to address the aforementioned issues. Named NSGA-II-ELM-Pinball-Huber, this model is based on an enhanced Pinball-Huber loss function and the multiobjective optimization algorithm NSGA-II. To effectively handle errors and anomalies in power load data, we introduced an asymmetric and robust Pinball-Huber loss function. Within the ELM framework, this loss function is employed as the objective, and the iteratively reweighted least squares (IRLS) method is utilized to determine the output weight vector. The present paper conducts global multiobjective optimization of the ELM model by employing the NSGA-II algorithm to simultaneously optimize training errors and output weights. The experimental results demonstrate that the proposed load forecasting model significantly enhances predictive performance.

3 Preliminaries

3.1 Loss functions

Within various enhanced algorithms, the role of the loss function is to assess the merits and drawbacks of the improved model by computing its minimum value within the improved function. Yet during practical application, because of factors such as the loss function's objective, the nature of the application, data attributes, and the desired level of confidence in the forecasted values, a single loss function cannot be universally applied to all model experiments. Thus, a range of loss functions needs to be explored to optimize the treatment of target-type data and achieve optimal evaluation results [36].

3.1.1 L2-norm loss

The L2-norm loss is a smooth function that is derivable in the whole domain and simplifies the calculation. When the error increases, the error is squared because of the L2-norm loss, so the error obtained is amplified. The L2-norm loss function can be described as follows:

L_2(r) = \frac{1}{2} r^2,   (1)

where r = y − ŷ is the residual, y represents the expected results, and ŷ represents the forecasting results.

3.1.2 L1-norm loss

In the regression problem, the L1-norm loss measures the absolute value of the difference between the forecasting value and the true value. The L1-norm loss function can be described as follows:

L_1(r) = |r|.   (2)

The L1-norm loss function is commonly used in regression problems.

3.1.3 Huber loss

The Huber loss function was proposed in 1964. It absorbs the advantages of the L1-norm and L2-norm loss functions and makes up for their shortcomings. Concerning outliers in the data, the Huber loss performs more robustly. Not only is it more robust to outliers, but the Huber loss is also derivable in the whole domain, which greatly simplifies the calculation. The Huber loss function can be described as follows:

H_\delta(r) =
\begin{cases}
\frac{1}{2} r^2, & |r| \le \delta \\
|r|\delta - \frac{\delta^2}{2}, & |r| > \delta.
\end{cases}   (3)

3.1.4 Pinball loss

The Pinball loss function is asymmetric. It not only imposes certain penalties on outliers in the data, but it also imposes additional penalties according to the different situations of the outliers. In addition, because of the introduction of the quantile distance, the Pinball loss function improves the insensitivity to characteristic noise and resampling. The expression of the Pinball loss function is as follows:

P_\tau(r) =
\begin{cases}
\tau r, & r \ge 0 \\
(1 - \tau)|r|, & r < 0.
\end{cases}   (4)

The parameter τ ∈ [0, 1].
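For concreteness, the four basic losses above translate directly into code. The following is a minimal NumPy sketch (our illustrative implementation, not code from the original study); r denotes the residual vector y − ŷ as in (1)–(4), and the default parameter values are placeholders.

```python
import numpy as np

def l2_loss(r):
    """Squared (L2) loss, Eq. (1): 0.5 * r^2."""
    return 0.5 * r ** 2

def l1_loss(r):
    """Absolute (L1) loss, Eq. (2): |r|."""
    return np.abs(r)

def huber_loss(r, delta=1.345):
    """Huber loss, Eq. (3): quadratic near zero, linear in the tails."""
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    np.abs(r) * delta - 0.5 * delta ** 2)

def pinball_loss(r, tau=0.5):
    """Pinball (quantile) loss, Eq. (4): asymmetric penalty on the error sign."""
    return np.where(r >= 0, tau * r, (1.0 - tau) * np.abs(r))
```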
When the parameter τ = 1, the Pinball loss behaves like the L1-norm loss, so the Pinball loss function can be considered a generalized L1-norm loss function. In addition, the Pinball loss also absorbs the advantages of the L1-norm loss and can handle the deviation of outliers.

3.1.5 Biweight loss

Tukey's Biweight loss function is also a non-convex loss function, which can overcome the interference and influence caused by outlier samples and noise samples in regression tasks, hence showing strong robustness in regression tasks [38, 39]. The Biweight loss function is defined as follows:

B_c(r) =
\begin{cases}
\frac{c^2}{6}\left[1 - \left(1 - \left(\frac{r}{c}\right)^2\right)^3\right], & |r| \le c \\
\frac{c^2}{6}, & \text{otherwise},
\end{cases}   (5)

where c is a tuning constant, which is generally specified as 4.685. With this choice, Tukey's Biweight achieves a regression effect close to that of the L2-norm loss function (95% asymptotic efficiency) in minimizing the variance under the normal distribution [40]. Tukey's Biweight suppresses the influence of outliers during backpropagation by reducing the gradient size to near zero. Another interesting feature of this loss function is that it imposes a soft constraint between inliers and outliers without setting a hard threshold for the residual.

3.1.6 Lncosh loss

Lncosh is a loss function commonly used in regression tasks, with high smoothness. It is defined as [41, 42]:

L(r) = \ln(\cosh(r)).   (6)

For a smaller residual r, \ln(\cosh(r)) is approximately equal to \frac{r^2}{2}; for a larger residual r, it is roughly equal to |r| - \ln 2. This means that the working principle of Lncosh is very similar to that of the mean squared error to a large extent, but it is not greatly affected by the occasional wrong forecast. It has all the advantages of the Huber loss function, but unlike the Huber loss, it is twice differentiable everywhere.

3.2 Extreme learning machine

In ELM, the input weights are randomly generated along with the hidden layer thresholds, without any further adjustments during algorithm execution. With no need for additional parameter settings, ELM offers simplicity in its usage. As highlighted by Huang et al. [43], ELM commonly employs the Moore-Penrose generalized inverse to determine the key node weights. This methodology involves only a single calculation step (a linear equation operation) to establish the weight matrix between the hidden and output layers [44]. Unlike backpropagation, there is no gradient operation, significantly reducing computational demands and enhancing speed. Furthermore, ELM demonstrates superior generalization compared with alternative algorithms. The structural diagram of ELM is shown in Fig. 1.

A typical ELM network structure consists of an input layer, a hidden layer, and an output layer, with n, L, and m nodes, respectively. For a data set (x_i, y_i) (i = 1, 2, \ldots, N) with N samples, x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T is the input vector, y_i = [y_{i1}, y_{i2}, \ldots, y_{im}]^T is the output vector, and the output of ELM can be described as:

y_i = \sum_{j=1}^{L} \beta_j G(w_j \cdot x_i + b_j), \quad i = 1, 2, \ldots, N,   (7)

where w_j is the weight vector from the input layer to the j-th hidden layer node, b_j is the threshold of the j-th hidden layer node, \beta_j is the output layer weight connecting the j-th hidden layer node, and G(\cdot) represents the activation function.

Equation (7) can be simplified as H\beta = Y. The objective function of the ELM model can be written as follows:

\min \|H\beta - Y\|.   (8)

Using the least squares method to solve (8), the solution \beta is the following:

\beta = (H^T H)^{-1} H^T Y = H^+ Y,   (9)

where H^+ is the Moore-Penrose generalized inverse of the hidden layer output matrix H.
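Because training reduces to the single linear solve in (9), a working ELM regressor fits in a few lines. The sketch below is an illustrative implementation under stated assumptions (a sigmoid activation as one common choice of G, Gaussian random initialization), not the authors' code.

```python
import numpy as np

class SimpleELM:
    """Minimal ELM regressor: random hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=50, seed=0):
        self.L = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # G(w_j . x_i + b_j) with a sigmoid activation, Eq. (7)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.L))  # random input weights
        self.b = self.rng.normal(size=self.L)                # random thresholds
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y                    # beta = H^+ Y, Eq. (9)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```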
3.3 Multiobjective optimization

The concept behind a multiobjective optimization algorithm is to identify a collection of Pareto-optimal solutions, where each solution fulfills the fundamental criteria of the multiple optimization objectives and showcases an optimal state holistically. Within the Pareto-optimal solution set, no other solution surpasses a given solution in all optimization objectives [45]. Achieving this demands that the optimization algorithm extensively explore the solution set, guarantee a global optimization outcome, and avoid being trapped in local optima.

Derived from biological genetic theory, the genetic algorithm has evolved and found applications across diverse domains [46]. By incorporating the genetic algorithm, the drawbacks associated with traditional multiobjective optimization approaches, such as the risk of converging to local optima, are circumvented. This integration ensures that the solutions' diversity is effectively maintained.

The general multiobjective optimization problem can be described as follows:

\min F(x) = [f_1(x), f_2(x), \ldots, f_n(x)]
\text{s.t. } x \in C,   (10)

where f_i(x) is an optimization objective, x is the solution, and C is the constraint set.

NSGA-II is an advanced multiobjective optimization algorithm that was improved by Deb et al. [47]. NSGA-II introduces the concepts of fast nondominated sorting, crowding-distance sorting, and the elitist strategy, which greatly enhance its practical applicability. In NSGA-II, we can initialize a certain population P and use the genetic algorithm to select, cross, and mutate the parent population P to produce the offspring population Q. After fast nondominated sorting and crowding-distance sorting of the combined population R = P ∪ Q, the new population and its Pareto-optimal solution set are obtained by using the elitist strategy. The flow diagram of NSGA-II is shown in Fig. 2. The specific steps are as follows:

Step 1: Set the population size and the number of iterations, and initialize the parent population P.

Step 2: For the parent population P, conduct fast nondominated sorting and crowding-distance sorting, and assign each individual its rank.

Step 3: Generate the offspring population Q_0 through tournament selection, simulated binary crossover, and polynomial mutation.

Step 4: Combine the parent P_t and offspring Q_t to get the population R_t = P_t ∪ Q_t, where t is the number of iterations; a new parent population P_{t+1} is selected through the elitist strategy.

Step 5: When the iteration reaches the specified number or the termination condition is met, the final population and its Pareto solution set are obtained; otherwise, let t = t + 1 and go to Step 2.
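To make the sorting machinery of Steps 2 and 4 concrete, here is a compact, illustrative implementation of fast nondominated sorting (a sketch for two or more minimization objectives; crowding-distance sorting and the genetic operators are omitted). It is not the authors' code.

```python
import numpy as np

def fast_nondominated_sort(F):
    """F: (n, m) array of objective values (minimization).
    Returns an array of front ranks (0 = first Pareto front)."""
    n = F.shape[0]
    dominates = [[] for _ in range(n)]     # indices that solution i dominates
    n_dominators = np.zeros(n, dtype=int)  # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if np.all(F[i] <= F[j]) and np.any(F[i] < F[j]):
                dominates[i].append(j)
            elif np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                n_dominators[i] += 1
    rank = np.zeros(n, dtype=int)
    front = [i for i in range(n) if n_dominators[i] == 0]
    k = 0
    while front:
        nxt = []
        for i in front:
            rank[i] = k
            for j in dominates[i]:
                n_dominators[j] -= 1
                if n_dominators[j] == 0:
                    nxt.append(j)
        front, k = nxt, k + 1
    return rank
```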
4 The proposed method

4.1 Proposed Pinball-Huber loss

Section 3.1 provides an overview of six fundamental loss functions: L2-norm, L1-norm, Huber, Pinball, Biweight, and Lncosh. The strengths, weaknesses, and suitable application contexts of each loss function are analyzed. Upon examination and consolidation, it is evident that these loss functions often cannot combine robustness with accuracy across diverse evaluation models, measurement approaches, and forecasting experiments. Additionally, they tend to inadequately address standard positive and negative errors and outliers in machine learning challenges. As a result, this may lead to suboptimal evaluation levels and reduced accuracy in forecasting outcomes.

To address the aforementioned challenges, we propose a solution by merging the Pinball loss with the Huber loss. The Pinball loss function offers the ability to adapt to positive and negative errors during forecasting computations, displaying self-adjusting asymmetry. On the other hand, the Huber loss function demonstrates remarkable robustness and effectively handles outliers; however, it treats positive and negative errors identically in the algorithmic process, leading to a reduction in forecasting precision. Our innovation lies in the development of a novel loss function combining the attributes of Huber and Pinball. This allows distinct measures to be applied to diverse errors within the training procedure, significantly enhancing the model's performance. The proposed Pinball-Huber loss function is presented as follows:

PH_{\delta,\tau}(r) =
\begin{cases}
\frac{1}{2} r^2 \tau, & 0 \le r \le \delta \\
\frac{1}{2} r^2 (1 - \tau), & -\delta \le r \le 0 \\
\left(|r|\delta - \frac{\delta^2}{2}\right)\tau, & r > \delta \\
\left(|r|\delta - \frac{\delta^2}{2}\right)(1 - \tau), & r < -\delta.
\end{cases}   (11)

The newly introduced Pinball-Huber loss function comprises two adjustable parameters: δ and τ. Notably, these parameters originate from the Huber and Pinball loss functions and are merged to leverage their distinct roles. Their combined utilization allows for tailored actions based on the magnitude and direction of training errors. Beyond refining the accuracy of the foundational loss function, the Pinball-Huber approach introduces a novel perspective by categorizing training errors based on their directional attributes. This presents a fresh approach for addressing outliers. In the context of the power system, where power load data are influenced by variables like weather, season, and market demand, volatility and the presence of outliers and asymmetric noise are common. Our proposed Pinball-Huber loss function addresses these intricacies by meticulously dissecting errors and handling positive and negative scenarios in distinct ways.
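The piecewise definition in (11) translates directly into code. The sketch below is our illustrative version (the default δ and τ are placeholders, since in practice both are tuned, as described in Section 5):

```python
import numpy as np

def pinball_huber_loss(r, delta=1.345, tau=0.5):
    """Pinball-Huber loss, Eq. (11): Huber's quadratic/linear regimes,
    weighted by tau for positive errors and (1 - tau) for negative ones."""
    r = np.asarray(r, dtype=float)
    quad = 0.5 * r ** 2                          # inner (quadratic) branch
    lin = np.abs(r) * delta - 0.5 * delta ** 2   # outer (linear) branch
    core = np.where(np.abs(r) <= delta, quad, lin)
    weight = np.where(r >= 0, tau, 1.0 - tau)    # direction-dependent penalty
    return weight * core
```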
4.2 NSGA-II-ELM-Pinball-Huber

In the practical application of ELM, there exists a fundamental trade-off between forecasting accuracy and network structure complexity. Achieving higher accuracy demands a network that can tailor its modeling to the data, which often results in an intricate network structure, particularly within the hidden layer, potentially harboring numerous unnecessary nodes. While pursuing a simplified neural network structure, it is not prudent to directly designate a minimal number of nodes and related parameters. Subjectively determining the appropriate count of neurons for the network to accurately capture the input-to-output relationship is not feasible. What is required is a rational and efficient algorithm to assist in identifying the optimal number of nodes for ELM.

Utilizing the minimization of the training error and of the output weight within the ELM-Pinball-Huber model as the dual optimization objectives, we employ the multiobjective optimization algorithm NSGA-II to enhance the ELM model. From the derived set of Pareto front solutions, we carefully choose the most suitable solution to execute the forecasting task. The objective function for this multiobjective optimization endeavor is presented as follows:

\min \sum_{i=1}^{N} PH(r_i), \quad \min \sum_{i=1}^{L} |\beta_i|,   (12)

where r_i is the training error and N represents the number of samples; we use the training error based on the newly proposed Pinball-Huber loss function as one optimization objective. β is the output weight vector of the output layer in the ELM model, and L is the number of hidden layer nodes; we take the L1 norm of β as the other optimization objective.

Following the multiobjective optimization process, we arrive at the set of Pareto solutions. By plotting the two optimization objectives along the horizontal and vertical axes, respectively, we observe that the solutions within the Pareto set form a U-shaped distribution. This pattern highlights the inherent trade-off between the training error and the output weight as optimization objectives. The solution situated at the inflection point simultaneously possesses a lower training error and a lower output weight norm, rendering it the optimal choice for our multiobjective optimization.

Furthermore, to validate the effectiveness of the Pinball-Huber loss function, we conducted a separate comparative test. Employing various loss functions in conjunction with ELM and integrating a lasso penalty term, the composite ELM-loss function models were employed for power load forecasting under identical experimental parameters. Elaborate insights into the model's objective function and its solution procedure are provided in Appendix A.

4.2.1 The overall steps

In this section, we provide the combined load forecasting model NSGA-II-ELM-Pinball-Huber. The model is an ELM based on the Pinball-Huber loss function, which is then optimized by the multiobjective optimization algorithm NSGA-II. The steps of the model can be found in pseudocode Algorithm 1.

Algorithm 1 The NSGA-II-ELM-Pinball-Huber model.
Require: Hidden layer nodes L, input weight ω, and hidden layer threshold b
Ensure: Training error and output weight of ELM
1: Set the population and iteration times
2: Initialize the population:
3: Randomly generate multiple groups of ELMs with different numbers of hidden layer nodes
4: for each ELM do
5:   Select the input weight ω and hidden layer threshold b randomly
6:   Take the proposed Pinball-Huber loss function as the objective function and solve the output weight vector β = (H^T W H)^{-1} H^T W Y by IRLS
7:   Get the training error Σ_{i=1}^{N} PH(r_i) based on the Pinball-Huber loss function and the output weight Σ_{i=1}^{L} |β_i| of the ELM as the two optimization objectives
8: end for
9: Fast nondominated sort and crowding-distance sort the population and take it as the Parent
10: for i = 1, 2, 3, ..., gen do
11:   Tournament selection, simulated binary crossover, and polynomial mutation to produce the Offspring
12:   Merge Parent and Offspring
13:   Fast nondominated sort and crowding-distance sort
14:   Generate the new Parent by the elitist strategy
15: end for
16: Get the optimized Pareto solution set of all the groups
17: Select the best sample from the set, that is, the ELM model with the best parameters L, ω, b
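Line 7 of Algorithm 1 scores every candidate ELM on the two objectives of (12). A minimal sketch of that evaluation (our illustration; loss_fn would be a pointwise loss such as the pinball_huber_loss sketched in Section 4.1) is:

```python
import numpy as np

def nsga2_objectives(H, y, beta, loss_fn):
    """Objective values of Eq. (12) for one candidate ELM.
    H: hidden-layer output matrix, y: targets, beta: output weights,
    loss_fn: pointwise loss, e.g. the Pinball-Huber loss."""
    r = y - H @ beta              # training residuals
    f1 = np.sum(loss_fn(r))       # total Pinball-Huber training error
    f2 = np.sum(np.abs(beta))     # L1 norm of the output weights
    return np.array([f1, f2])
```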
Table 1 The loss functions L(r), their derivatives ψ(r), IRLS weight functions w(r), and tuning parameters

- L2 loss: L(r) = r²/2; ψ(r) = r; w(r) = 1; parameter: none.
- L1 loss: L(r) = |r|; ψ(r) = sign(r); w(r) = 1/max(|r|, ε) with ε = 10⁻⁶; parameter: none.
- Huber loss: L(r) = r²/2 for |r| ≤ δ, |r|δ − δ²/2 for |r| > δ; ψ(r) = r for |r| ≤ δ, δ·sign(r) for |r| > δ; w(r) = min(1, δ/|r|); parameter: δ = 1.345.
- Biweight loss: L(r) = (c²/6)[1 − (1 − (r/c)²)³] for |r| ≤ c, c²/6 otherwise; ψ(r) = r(1 − (r/c)²)² for |r| ≤ c, 0 otherwise; w(r) = (1 − (r/c)²)² for |r| ≤ c, 0 otherwise; parameter: c = 4.685.
- Lncosh loss: L(r) = ln(cosh(r)); ψ(r) = tanh(r); w(r) = tanh(r)/r; parameter: none.
- Pinball-Huber loss: L(r) as in Eq. (11); ψ(r) = τr for 0 ≤ r ≤ δ, (1 − τ)r for −δ ≤ r ≤ 0, δτ·sign(r) for r > δ, δ(1 − τ)·sign(r) for r < −δ; w(r) = τ for 0 ≤ r ≤ δ, 1 − τ for −δ ≤ r ≤ 0, δτ/|r| for r > δ, δ(1 − τ)/|r| for r < −δ; parameters: δ and τ.
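The ψ(r) and w(r) entries of Table 1 are what the IRLS solver in Appendix A consumes. As an example, the Pinball-Huber weight function from the last row can be sketched as follows (illustrative code; the ε guard mirrors the ε = 10⁻⁶ used for the L1 loss):

```python
import numpy as np

def pinball_huber_weight(r, delta=1.345, tau=0.5, eps=1e-6):
    """IRLS weight w(r) for the Pinball-Huber loss (last row of Table 1)."""
    r = np.asarray(r, dtype=float)
    direction = np.where(r >= 0, tau, 1.0 - tau)   # tau or (1 - tau)
    inner = np.abs(r) <= delta
    # w(r) = direction inside [-delta, delta]; direction * delta / |r| outside
    return np.where(inner, direction,
                    direction * delta / np.maximum(np.abs(r), eps))
```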
5 Case study

This section employs the power load dataset from Taixing City in southern Jiangsu Province to validate the efficacy of the integrated NSGA-II-ELM-Pinball-Huber forecasting model within the power load system.

The hyperparameter τ in the Pinball loss is the target quantile, which is used to handle the directional errors in forecasting. For the robust Pinball-Huber loss function proposed by us, both parameters δ and τ need to be set. The proper values are determined using the time series cross-validation approach, with τ and δ selected from the intervals [0, 1] and [1, 2], respectively. Taking the training error in single-step forecasting as an example, as shown in Fig. 4, the training errors have deviations and are asymmetrically distributed.

Fig. 4 Training error distribution diagram of ELM-Pinball-Huber in the Taixing data set

The hyperparameters of the Pinball-Huber loss function obtained through the time series cross-validation method are shown in Table 3.

Table 3 The hyperparameters of the Pinball-Huber loss function

| Steps | Single-step | Three-step | Five-step | Seven-step |
|-------|-------------|------------|-----------|------------|
| δ     | 1.25        | 1.60       | 1.20      | 1.50       |
| τ     | 0.45        | 0.30       | 0.50      | 0.30       |
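A rolling-origin (time series) cross-validation of the kind used to obtain Table 3 might be sketched as follows. This is our illustrative code: fit_fn, score_fn, the candidate grids, and the number of splits are hypothetical placeholders rather than the authors' settings.

```python
import numpy as np
from itertools import product

def rolling_origin_select(X, y, fit_fn, score_fn,
                          deltas=(1.0, 1.25, 1.5, 1.75, 2.0),
                          taus=(0.3, 0.4, 0.5, 0.6, 0.7),
                          n_splits=5):
    """Pick (delta, tau) minimizing the average validation error over
    expanding-window (rolling-origin) splits of a time series."""
    n = len(y)
    fold = n // (n_splits + 1)
    best, best_err = None, np.inf
    for delta, tau in product(deltas, taus):
        errs = []
        for k in range(1, n_splits + 1):
            train = slice(0, k * fold)               # history up to the origin
            valid = slice(k * fold, (k + 1) * fold)  # next block as validation
            model = fit_fn(X[train], y[train], delta=delta, tau=tau)
            errs.append(score_fn(y[valid], model.predict(X[valid])))
        if np.mean(errs) < best_err:
            best, best_err = (delta, tau), np.mean(errs)
    return best
```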
The experimental analysis is as follows:

5.2.1 Comparisons among ELM-Pinball-Huber and ELM with other loss functions

This section presents the comparative experiments conducted on the ELM model using the six distinct loss functions, aimed at substantiating the benefits of the newly introduced Pinball-Huber loss function. Based on the three evaluation metrics (MAE, RMSE, and MAPE) outlined in Table 4, it can be deduced that the forecasting outcomes of ELM utilizing our Pinball-Huber loss function surpass those achieved with the other loss functions, in both single-step and multistep forecasting scenarios. Figure 5 provides a visual representation of the ELM's performance in multistep forecasting across the six different loss functions. Hence, adopting the proposed Pinball-Huber loss function as the objective function for ELM can lead to enhanced forecasting capabilities in power load prediction.
Table 4 Multistep forecasting results of the ELM-loss function models in the Taixing data set

Single-step and three-step results:

| Models | MAE | RMSE | MAPE | β(%) | MAE | RMSE | MAPE | β(%) |
|---|---|---|---|---|---|---|---|---|
| ELM-L2 | 115.337 | 157.701 | 0.055 | 48.0 | 130.577 | 183.425 | 0.063 | 56.0 |
| ELM-L1 | 274.207 | 329.957 | 0.125 | 52.5 | 233.836 | 269.626 | 0.107 | 60.5 |
| ELM-Huber | 274.858 | 330.423 | 0.126 | 34.0 | 222.722 | 285.020 | 0.103 | 26.5 |
| ELM-Biweight | 85.986 | 112.454 | 0.041 | 38.0 | 132.345 | 186.186 | 0.064 | 31.0 |
| ELM-Lncosh | 258.347 | 317.961 | 0.119 | 50.5 | 143.788 | 192.219 | 0.069 | 58.0 |
| ELM-Pinball-Huber | 73.946 | 96.879 | 0.036 | 73.0 | 121.818 | 167.292 | 0.060 | 72.0 |

Five-step and seven-step results:

| Models | MAE | RMSE | MAPE | β(%) | MAE | RMSE | MAPE | β(%) |
|---|---|---|---|---|---|---|---|---|
| ELM-L2 | 161.549 | 214.583 | 0.077 | 55.5 | 163.955 | 216.922 | 0.079 | 51.5 |
| ELM-L1 | 258.779 | 316.949 | 0.119 | 58.5 | 273.857 | 331.715 | 0.125 | 61.0 |
| ELM-Huber | 273.836 | 330.600 | 0.125 | 25.5 | 274.930 | 332.128 | 0.126 | 31.0 |
| ELM-Biweight | 162.277 | 208.288 | 0.077 | 27.0 | 168.125 | 221.927 | 0.081 | 27.0 |
| ELM-Lncosh | 273.831 | 331.458 | 0.125 | 61.0 | 234.569 | 296.230 | 0.108 | 57.5 |
| ELM-Pinball-Huber | 258.779 | 316.949 | 0.119 | 69.0 | 163.492 | 222.261 | 0.077 | 75.5 |
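For reference, the three error metrics reported in Table 4 are standard and can be computed as in the following sketch (y are the observed loads, y_hat the forecasts; this is illustrative code, not the authors' evaluation script):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error; assumes y has no zeros,
    as is typical for power load series."""
    return np.mean(np.abs((y - y_hat) / y))
```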
Furthermore, an intriguing observation emerged from the comparison among the six loss functions. The L2-norm and L1-norm loss functions exhibited subpar and unstable performance in multistep forecasting. The Huber, Biweight, and Lncosh loss functions demonstrated favorable performance, but their stability in the multistep experiments displayed notable fluctuations. Conversely, ELM-Pinball-Huber consistently delivered the best forecasting results while maintaining a relatively stable performance throughout the experiments.

5.2.2 Comparisons of ELM with loss functions with/without multiobjective optimization

In the preceding section's comparative experiments, ELM utilizing the Pinball-Huber loss function exhibited consistent advantages in forecasting accuracy. Nonetheless, a noteworthy observation was that achieving higher precision often resulted in elevated output weights within the ELM model. This outcome could lead to intricate network structures and even overfitting issues. To ascertain the enhanced forecasting performance of ELM-Pinball-Huber through NSGA-II optimization, a comparative validation was conducted.

Table 5 presents the outcomes of multiobjective optimization for ELM using the diverse loss functions. Upon comparison with the results of the experiments before multiobjective optimization, illustrated in Table 4, the ELM after multiobjective optimization not only attains heightened forecasting precision, but it also substantially diminishes the output weight within the ELM model. The distribution of solutions within the Pareto solution set, along with the curve showing the change in the two optimization objectives with the number of iterations, is depicted in Fig. 6. Notably, as the number of iterations increases, the values of the two optimization objectives consistently decrease. Ultimately, a Pareto solution is discernible in the figure that maintains commendable values for both optimization objectives, thereby achieving elevated forecasting accuracy while concurrently preserving smaller output weights. This simplification considerably reduces the intricacy of the model network. The streamlined ELM necessitates fewer hidden layer nodes, enhancing its generalization capabilities. Moreover, we have also observed that NSGA-II can enhance the performance of the various ELM-loss function combinations, indicating its wide applicability for ELM. Notably, the combination of the Pinball-Huber loss function and ELM, following NSGA-II optimization, demonstrates the best performance. The multistep-ahead forecasting curves of the NSGA-II-ELM-Pinball-Huber model are displayed in Fig. 7.

5.2.3 Comparisons among NSGA-II-ELM-Pinball-Huber and comparative models

To assess the predictive performance of the NSGA-II-ELM-Pinball-Huber model, we conduct comparative experiments with three models: LSTM, GRU, and CNN-BiLSTM-Attention. Brief descriptions of these models are as follows:
(1) LSTM model: LSTM, an improved variant of the traditional RNN, effectively captures the semantic relationships in long sequences, mitigating gradient vanishing or exploding issues. LSTM features a more complex structure [48].

(2) GRU model: Introduced by Cho et al. [49] in 2014, the GRU neural network addresses the gradient vanishing problem in standard recurrent neural networks and shares a similar design philosophy with LSTM.

(3) CNN-BiLSTM-Attention model [50]: This model employs the convolutional and pooling layers of a convolutional neural network (CNN) to extract the spatial features of the input variables. The multi-head attention layer minimizes the impact of irrelevant features, enhancing the extracted features. The BiLSTM layer models trend information in the time series, generating a probability model for the prediction distribution.

We adjust the hyperparameters of each model to achieve optimal performance, as shown in Table 6.

Table 6 Parameters of the comparative models

| Model | Parameter name | Parameter value |
|---|---|---|
| LSTM | number of layers, units | [3, 64] |
| GRU | number of layers, units | [3, 64] |
| CNN-BiLSTM-Attention | number of layers, kernel size, units | [9, 1, 64] |
The evaluation metrics for forecasting performance are presented in Table 7. The predictive performance of LSTM and GRU is similar, showing close values for the RMSE and the output weight. Comparing the predictive performance evaluation metrics of LSTM, GRU, and CNN-BiLSTM-Attention with those of NSGA-II-ELM-Pinball-Huber, the RMSE of the proposed model was always lower than that of the three comparison models. Particularly in single-step forecasting, the prediction error of NSGA-II-ELM-Pinball-Huber (RMSE = 84.27) was significantly smaller than that of the three comparative models (RMSE = 184.87, 184.58, 273.91). These results indicate that the proposed model effectively captured the changing trends in the power load data in both the spatial and temporal dimensions. Furthermore, the proposed model maintained a stable structure in multistep forecasting.

Finally, to verify whether the NSGA-II-ELM-Pinball-Huber model significantly improves predictive accuracy in power load forecasting compared with the other models, we conducted the Wilcoxon signed-rank test [51]. The significance level for the one-tailed test was set at α = 0.05. The null hypothesis posited that there would be no significant difference in the predictive results between our model and the comparative models in power load forecasting. If the p-value is less than 0.05, the null hypothesis is rejected (h = 1). The predicted values of the proposed model and the three comparison models were submitted to Wilcoxon signed-rank tests separately for multistep forecasting from 1 to 7 steps. The results are shown in Table 8.
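The test itself is available in SciPy. The following sketch illustrates the one-tailed comparison described above with synthetic stand-in error arrays, since the real forecasts are not reproduced here.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical absolute forecast errors of the proposed and a baseline model
rng = np.random.default_rng(0)
err_proposed = np.abs(rng.normal(80, 20, size=200))
err_baseline = np.abs(rng.normal(180, 40, size=200))

# One-tailed paired test: are the proposed model's errors significantly smaller?
stat, p_value = wilcoxon(err_proposed, err_baseline, alternative="less")
h = int(p_value < 0.05)  # h = 1 rejects the null hypothesis at alpha = 0.05
```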
6 Conclusion and future work

In the current paper, we have introduced a robust Pinball-Huber loss function that demonstrates remarkable resistance to outliers and substantially reduces the likelihood of overfitting. This loss function effectively manages outliers and asymmetrical noise within the dataset, serving as the objective function for training the ELM model. Given the ELM's susceptibility to the influence of its preset parameters, and aiming to ensure forecasting accuracy while simplifying the ELM network structure, as well as preventing the squandering of training time and the emergence of overfitting because of an excessive number of hidden layer nodes, we employed the NSGA-II algorithm for the optimization of both training errors and output weights within the ELM model. The combined NSGA-II-ELM-Pinball-Huber model was then employed for power load forecasting in the context of Taixing City. By employing the multiobjective optimization algorithm NSGA-II, we acquired the Pareto optimal solution set for the number of hidden layer nodes in the ELM model, enabling an in-depth analysis of the forecasting outcomes. Our analysis of the experimental outcomes revealed that the performance of the suggested Pinball-Huber loss function within the ELM framework surpassed that of the other loss functions. Moreover, the NSGA-II algorithm effectively enhanced the performance of the diverse ELM-loss function combinations. The innovative combined NSGA-II-ELM-Pinball-Huber model can be seen as a promising and effective method for power load forecasting, offering a novel solution to this domain.
Multiobjective optimization greatly improves the predictive performance of the model, but it takes up a significant amount of computational resources. In the future, we aim to delve deeper into reducing the computational resources and time required for training the proposed method, which is crucial for the widespread applicability of the model. Furthermore, this forecasting model can only provide point predictions of future power loads, while recent research has focused on uncertainty-aware predictions. Future studies will delve deeper into probabilistic predictions of the model, which holds significant value for practical applications in power systems [52, 53].

Appendix

A The ELM-loss function model

The combined ELM-loss function is a single-objective model, and its mathematical model can be written as follows:

\min C \sum_{i=1}^{N} PH(r_i) + \sum_{j=1}^{L} |\beta_j|
\text{s.t. } h(x_i)\beta = y_i - r_i, \quad i = 1, 2, \ldots, N,   (16)

where r_i represents the training error of a sample and \sum_{i=1}^{N} PH(r_i) is the total error under the Pinball-Huber loss function over the N training samples, here representing the empirical risk. \sum_{j=1}^{L} |\beta_j| is a lasso penalty term, representing the complexity of the model. C > 0 is called the regularization parameter, or the penalty parameter, and is used to balance the empirical risk and the model complexity.

Lagrangian multipliers are introduced for each equality constraint in the model, and the Lagrangian function is constructed to transform it into an unconstrained optimization problem:

L(\beta, r, \alpha) = C \sum_{i=1}^{N} PH(r_i) + \sum_{j=1}^{L} |\beta_j| - \sum_{i=1}^{N} \alpha_i \left( h(x_i)\beta - y_i + r_i \right).   (17)

Setting the partial derivatives of (17) to zero and solving by IRLS yields the output weight vector of Eq. (18) (the update used in line 6 of Algorithm 1), where W_N is the sample weight matrix and W_L is the weight matrix of the hidden nodes. The details of the weights w_i for each loss function can be found in Table 1. Their specific forms are as follows:

W_N = \mathrm{diag}\left( w(r_1), w(r_2), \ldots, w(r_N) \right),
W_L = \mathrm{diag}\left( \frac{1}{\max(|\beta_1|, \epsilon)}, \frac{1}{\max(|\beta_2|, \epsilon)}, \ldots, \frac{1}{\max(|\beta_L|, \epsilon)} \right).

In general, the specific steps of the ELM-Pinball-Huber model are as follows:

Step 1: Initialize the relevant parameters w, b, and L of the ELM-Pinball-Huber model.

Step 2: Calculate the output weight vector β by (18).

Step 3: Update the sample weight matrix W_N and the hidden nodes' weight matrix W_L.

Step 4: Repeat Steps 2-3 until β converges; then, obtain the trained ELM-Pinball-Huber model.

Step 5: Substitute the test set into the trained model to get the forecasting results.

Similar to the above ELM-Pinball-Huber model, we can combine each loss function in Table 1 with ELM to compare their performance.
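Steps 1-4 amount to an IRLS loop. The following is an illustrative sketch of one plausible implementation; the exact update is the paper's Eq. (18), so here we form a reweighted, penalized normal-equation solve consistent with Eq. (16) and Algorithm 1, with the convergence tolerance and ε guard as our own assumptions.

```python
import numpy as np

def irls_fit(H, y, weight_fn, C=1.0, max_iter=100, tol=1e-6, eps=1e-6):
    """Sketch of Steps 1-4: IRLS for the penalized objective in Eq. (16).
    weight_fn maps residuals to the sample weights w(r) of Table 1."""
    beta = np.linalg.pinv(H) @ y                            # initial LS estimate
    for _ in range(max_iter):
        r = y - H @ beta                                    # current residuals
        W_N = np.diag(weight_fn(r))                         # sample weight matrix
        W_L = np.diag(1.0 / np.maximum(np.abs(beta), eps))  # lasso weight matrix
        # One plausible reweighted update balancing fit (C) and penalty:
        beta_new = np.linalg.solve(C * H.T @ W_N @ H + W_L,
                                   C * H.T @ W_N @ y)
        if np.linalg.norm(beta_new - beta) < tol:           # converged (Step 4)
            return beta_new
        beta = beta_new                                     # update and repeat
    return beta
```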
Acknowledgements The work is supported by the Australian Research Council project (grant number DP160104292), the National Natural Science Foundation of China under Grant 61873130 and Grant 61833011, the Natural Science Foundation of Jiangsu Province under Grant BK20191377, the 1311 Talent Project of Nanjing University of Posts and Telecommunications, and the "Chunhui Program" Collaborative Scientific Research Project (202202004).

Author Contributions Yang Yang: Writing - review & editing, Funding acquisition. Hao Lou: Software, Visualization, Formal analysis, Writing - original draft. Zijin Wang: Writing - review & editing. Jinran Wu: Supervision, Formal analysis, Writing - original draft, Writing - review & editing.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Data Availability The data are available with a reasonable request.

References

1. Li K, Huang W, Hu G, Li J (2023) Ultra-short term power load forecasting based on CEEMDAN-SE and LSTM neural network. Energy Build 279:112666
2. Wen L, Zhou K, Yang S, Lu X (2019) Optimal load dispatch of community microgrid with deep learning based solar power and load forecasting. Energy 171:1053–1065
3. Lebotsa ME, Sigauke C, Bere A, Fildes R, Boylan JE (2018) Short term electricity demand forecasting using partially linear additive quantile regression with an application to the unit commitment problem. Appl Energy 222:104–118
4. He F, Zhou J, Mo L, Feng K, Liu G, He Z (2020) Day-ahead short-term load probability density forecasting method with a decomposition-based quantile regression forest. Appl Energy 262:114396
5. Gupta D, Hazarika BB, Berlin M (2020) Robust regularized extreme learning machine with asymmetric Huber loss function. Neural Comput Appl 32(16):12971–12998
6. Zhang J, Siya W, Zhongfu T, Anli S (2023) An improved hybrid model for short term power load prediction. Energy 268:126561
7. Wang J, Zhang L, Li Z (2022) Interval forecasting system for electricity load based on data pre-processing strategy and multi-objective optimization algorithm. Appl Energy 305:117911
8. Wu F, Cattani C, Song W, Zio E (2020) Fractional ARIMA with an improved cuckoo search optimization for the efficient short-term power load forecasting. Alex Eng J 59(5):3111–3118
9. Dudek G (2016) Pattern-based local linear regression models for short-term load forecasting. Electr Power Syst Res 130:139–147
10. Lee CM, Ko CN (2011) Short-term load forecasting using lifting scheme and ARIMA models. Expert Syst Appl 38(5):5902–5911
11. Wang J, Xing Q, Zeng B, Zhao W (2022) An ensemble forecasting system for short-term power load based on multi-objective optimizer and fuzzy granulation. Appl Energy 327:120042
12. Voyant C, Notton G, Kalogirou S, Nivet ML, Paoli C, Motte F et al (2017) Machine learning methods for solar radiation forecasting: A review. Renew Energy 105:569–582
13. Kim MK, Kim YS, Srebric J (2020) Predictions of electricity consumption in a campus building using occupant rates and weather elements with sensitivity analysis: Artificial neural network vs. linear regression. Sustain Cities Soc 62:102385
14. Hopfield JJ (1988) Artificial neural networks. IEEE Circ Devices Mag 4(5):3–10
15. Awad M, Khanna R (2015) Support vector regression. In: Efficient learning machines: Theories, concepts, and applications for engineers and system designers, pp 67–80
16. Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541), vol 2. IEEE, pp 985–990
17. Biswas MR, Robinson MD, Fumo N (2016) Prediction of residential building energy consumption: A neural network approach. Energy 117:84–92
18. Trairat P, Banjerdpongchai D (2022) Multi-objective optimal operation of building energy management systems with thermal and battery energy storage in the presence of load uncertainty. Sustainability 14(19):12717
19. Tian X, Zou Y, Wang X, Tseng M, Li H, Zhang H (2022) Improving the efficiency and sustainability of intelligent electricity inspection: IMFO-ELM algorithm for load forecasting. Sustainability 14(21):13942
20. Sajjadi S, Shamshirband S, Alizamir M, Yee L, Mansor Z, Manaf AA et al (2016) Extreme learning machine for prediction of heat load in district heating systems. Energy Build 122:222–227
21. Ding S, Xu X, Nie R (2014) Extreme learning machine and its applications. Neural Comput Appl 25(3):549–556
22. Zhou Y, Zhou N, Gong L, Jiang M (2020) Prediction of photovoltaic power output based on similar day analysis, genetic algorithm and extreme learning machine. Energy 204:117894
23. Ni Q, Zhuang S, Sheng H, Kang G, Xiao J (2017) An ensemble prediction intervals approach for short-term PV power forecasting. Solar Energy 155:1072–1083
24. Han Y, Wang N, Ma M, Zhou H, Dai S, Zhu H (2019) A PV power interval forecasting based on seasonal model and nonparametric estimation algorithm. Solar Energy 184:515–526
25. Chen X, Yu R, Ullah S, Wu D, Li Z, Li Q et al (2022) A novel loss function of deep learning in wind speed forecasting. Energy 238:121808
26. Chen J, Zeng Y, Li Y, Huang GB (2020) Unsupervised feature selection based extreme learning machine for clustering. Neurocomputing 386:198–207
27. Yang Y, Tao Z, Qian C, Gao Y, Zhou H, Ding Z et al (2022) A hybrid robust system considering outliers for electric load series forecasting. Applied Intelligence, pp 1–23
28. Wang J, Zhu H, Cheng F, Zhou C, Zhang Y, Xu H et al (2023) A novel wind power prediction model improved with feature enhancement and autoregressive error compensation. J Clean Prod 420:138386
29. Niu WJ, Feng ZK, Li SS, Wu HJ, Wang JY (2021) Short-term electricity load time series prediction by machine learning model via feature selection and parameter optimization using hybrid cooperation search algorithm. Environ Res Lett 16(5):055032
30. Shang Z, He Z, Song Y, Yang Y, Li L, Chen Y (2020) A novel combined model for short-term electric load forecasting based on whale optimization algorithm. Neural Process Lett 52:1207–1232
31. Xie K, Yi H, Hu G, Li L, Fan Z (2020) Short-term power load forecasting based on Elman neural network with particle swarm optimization. Neurocomputing 416:136–142
32. Abualigah L, Diabat A, Mirjalili S, Abd Elaziz M, Gandomi A (2020) The arithmetic optimization algorithm. Comput Methods Appl Mech Eng 376:113609
33. Kaboli SHA, Fallahpour A, Selvaraj J, Rahim N (2017) Long-term electrical energy consumption formulating and forecasting via optimized gene expression programming. Energy 126:144–164
34. Khishe M, Mosavi MR (2020) Chimp optimization algorithm. Expert Syst Appl 149:113338
35. Luo L, Li H, Wang J, Hu J (2021) Design of a combined wind speed forecasting system based on decomposition-ensemble and multi-objective optimization approach. Appl Math Model 89:49–72
36. Yang Y, Zhou H, Gao Y, Wu J, Wang YG, Fu L (2022) Robust penalized extreme learning machine regression with applications in wind speed forecasting. Neural Comput Appl 34(1):391–407
37. Huber PJ (1973) Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, pp 799–821
38. Wang K, Zhong P (2014) Robust non-convex least squares loss function for regression with outliers. Knowl-Based Syst 71:290–302
39. Yang X, Tan L, He L (2014) A robust least squares support vector machine for regression and classification with noise. Neurocomputing 140:41–52
40. Beaton AE, Tukey JW (1974) The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics 16(2):147–185
41. Wang X, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643
42. Karal O (2017) Maximum likelihood optimal and robust Support Vector Regression with lncosh loss function. Neural Netw 94:1–12
43. Huang GB, Siew CK (2005) Extreme learning machine with randomly assigned RBF kernels. Int J Inf Technol 11(1):16–24
44. Huang G, Huang GB, Song S, You K (2015) Trends in extreme learning machines: A review. Neural Netw 61:32–48
45. Konak A, Coit DW, Smith AE (2006) Multi-objective optimization using genetic algorithms: A tutorial. Reliab Eng Syst Saf 91(9):992–1007
46. Sampson JR: Adaptation in natural and artificial systems (John H. Holland). Society for Industrial and Applied Mathematics
47. Deb K, Agrawal S, Pratap A, Meyarivan T (2000) A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In: International conference on parallel problem solving from nature. Springer, pp 849–858
48. Wang J, Zhu H, Zhang Y, Cheng F, Zhou C (2023) A novel prediction model for wind power based on improved long short-term memory neural network. Energy 265:126283
49. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H et al (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078
50. Zhang YM, Wang H (2023) Multi-head attention-based probabilistic CNN-BiLSTM for day-ahead wind speed forecasting. Energy 278:127865
51. Li D, Jiang MR, Li MW, Hong WC, Xu RZ (2023) A floating offshore platform motion forecasting approach based on EEMD hybrid ConvLSTM and chaotic quantum ALO. Appl Soft Comput, p 110487
52. Hong T, Fan S (2016) Probabilistic electric load forecasting: A tutorial review. Int J Forecast 32(3):914–938
53. Zhang Y, Wang J, Wang X (2014) Review on probabilistic forecasting of wind power generation. Renew Sust Energ Rev 32:255–270