0% found this document useful (0 votes)

10 views19 pages

Double Level Optimal Fuzzy Association Rules Prediction - 2024 - Expert Systems

Information granulation theory has been widely used in short-term time-series forecasting research and holds significant weight. However, the error accumulation due to the lack of granular accuracy, along with information redundancy or deficiency in predictions, significantly affects short-term prediction accuracy. To compensate for these shortcomings, this paper proposes a double-level optimal fuzzy association rules prediction model for short-term time-series forecasting, which can strengthen

Uploaded by

mvictorrajmechanical

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views19 pages

Double Level Optimal Fuzzy Association Rules Prediction - 2024 - Expert Systems

Uploaded by

mvictorrajmechanical

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Expert Systems With Applications 251 (2024) 123959

Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa

Double-level optimal fuzzy association rules prediction model for time series
based on DTW-i𝐿1 fuzzy C-means
Sidong Xian a,b ,∗, Chaozheng Li a , Miaomiao Feng b , Yonghong Li b
a
Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China
b
Key Laboratory of Intelligent Analysis and Decision on Complex Systems, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China

ARTICLE INFO ABSTRACT

Keywords: Information granulation theory has been widely used in short-term time-series forecasting research and
Granular time series holds significant weight. However, the error accumulation due to the lack of granular accuracy, along with
Linear fuzzy information granules information redundancy or deficiency in predictions, significantly affects short-term prediction accuracy.
Breakpoint
To compensate for these shortcomings, this paper proposes a double-level optimal fuzzy association rules
Granule-suited fuzzy c-means
prediction model for short-term time-series forecasting, which can strengthen the performance of information
Short-term forecasting
granulation in prediction. Firstly, this paper proposes a concept of breakpoints, which can accurately segment
complex linear trends in time series and thus obtain a granular time series with highly accurate linear
fuzzy information granules (LFIGs). Secondly, a improved distance is proposed to more accurately reflect the
similarity between LFIGs by addressing counter-intuitive problems in the original distance. Theoretical analysis
shows that the improved distance can effectively reduce errors in granular calculation. Then, a granule-suited
fuzzy c-means algorithm is proposed for clustering LFIGs. Finally, this paper proposes a double-level optimal
fuzzy association rules prediction model, which establishes the optimal rules for each cluster and selects the
optimal two rules for prediction by the contribution of the clusters. The experimental results show that the
prediction method effectively avoids the problems of information redundancy and information deficiency,
and increases forecast accuracy. The model’s exceptional performance is demonstrated through comparative
analysis with existing models in experimental investigations.

1. Introduction In the early research work, traditional statistical methods such as

autoregressive moving average (ARMA) (Júnior, de Oliveira, & de
Time series forecasting has long received extensive attention in Mattos Neto, 2019; Khashei, Bijari, & Hejazi, 2012; Tian, 2019) and
various fields. In recent decades, the increase in computing power and kalman filter (Guo, Li, Zhao, & Zhao, 2016; Qiao, Han, Zhu, Shu,
the development of big data have driven tremendous progress in the Zheng, & Yuan, 2018) have dominated for a long time (Okutani &
field of time series forecasting, and a variety of models have been Stephanedes, 1984; Singh, Mohapatra, et al., 2019; Swaraj et al., 2021).
constructed (Cheng, Yang, Xiang, & Liu, 2022; Du, Gao, Suganthan, & They formulate a mathematical model for time series using statistical
Wang, 2022; Reza & Pulugurtha, 2019; Soon, Lim, & Parthiban, 2019; analysis of past data and utilize the statistical properties of the model
Tian, Zhang, Li, Lin, & Yang, 2018; Wu, Cattani, Song, & Zio, 2020). But to make predictions. This type of traditional statistical method has the
the construction of a generalized multi-domain applicable model is still advantages of theoretical maturity, stability and reliability. However,
an urgent problem and challenge in the field. For example, in the area
there are limitations such as difficulty in dealing with non-linear time
of production control in manufacturing, existing models sometimes fail
series and strict assumptions about data distribution.
to meet the need for accurate short-term forecasts of critical yields,
As machine learning theory advances, support vector regression
which can lead to production losses. In the area of financial investment,
(SVR) (Yang, Huang, King, & Lyu, 2009) and neural network mod-
for example, the prediction of short-term movements in securities may
els (Han, Zhao, Leung, Ma, & Wang, 2019) have been introduced to
occasionally be inaccurate, resulting in investment risks. Therefore,
there is an urgent need to develop a short-term forecasting model that time series forecasting. SVR has the ability to handle complex non-
is applicable to multiple domains. linear relationships, strong robustness to outliers, and many kernel

∗ Corresponding author at: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing
400065, PR China.
E-mail addresses: [email protected] (S.D. Xian), [email protected] (C.Z. Li), [email protected] (M.M. Feng), [email protected] (Y.H. Li).

https://doi.org/10.1016/j.eswa.2024.123959
Received 28 November 2023; Received in revised form 9 March 2024; Accepted 8 April 2024
Available online 21 April 2024
0957-4174/© 2024 Elsevier Ltd. All rights reserved.
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

functions that can be selected for different data characteristics (Fan linear information granules by introducing the improved FCM and
& Sharma, 2021). However, SVR is highly sensitive to parameters, improved distance formulas, respectively. All the models mentioned
requires precise tuning, and may face high computational complexity above rely on time series granulation with unequal sizes, which can
when dealing with large-scale datasets (Guajardo, Weber, & Crone, well characterize the trend in each granular data.
2010; Li, Fang, & Liu, 2018). Neural network models are widely used Currently, 𝑙1 trend filtering (Li et al., 2021; Yang et al., 2022)
for prediction, such as long short-term memory networks (LSTM), gated and turning point (Chen & Gao, 2020; Yin, Si, & Gong, 2011) are
recurrent units (GRU) and convolutional neural networks (CNN). They the main methods for the partitioning of linear trend window in time
have strong feature extraction and generalization capabilities and do series. 𝑙1 trend filtering method is too dependent on parameter set-
not require many assumptions about the data (Ma, Tao, Wang, Yu, & tings, and improper parameter settings can cause incorrect division.
Wang, 2015; Niu, Yu, Tang, Wu, & Reformat, 2020; Zhang & Dong, It also does not identify and reflect the change well when there is
2020). Among them, LSTM and its various improved versions (Fang, a significant step change in data. In addition, the roughness of turn-
Ma, Pan, Yang, & Arce, 2023; Lihong & Qian, 2020; Zha, Liu, Wan, ing point segmentation can lead to redundant segmentation points,
Luo, Li, Yang, & Xu, 2022) have shown excellent performance in resulting in overly detailed results that make it difficult to accurately
dealing with highly temporally correlated data, and have become the capture true segmentation points or significant trend changes. Thus,
preferred forecasting model in the current era of deep learning. How- LFIGs constructed based on the existing 𝑙1 trend filtering and turning
ever, these models have the disadvantages of being data-dependent, points segmentation methods cannot accurately capture and reflect the
parameter-sensitive and poorly interpretable, and are unable to provide trend variation in data. At the same time, there is a lack of effective
generalized solutions in all cases. For short-term forecasts, the accu- algorithms to measure the similarity between LFIGs of unequal size.
mulation of errors in the one-step forecasts of these models remains Duan et al. (Duan et al., 2018) improved the distance calculation for
problematic and does not allow a good fit for future trends. non-uniform size LFIGs compared to the original equal size method,
In order to make forecasts more relevant to actual trends, more but the method cannot accurately reflect the trend similarity of two
accurate capture and modeling of trend information is essential. Some granules. Yang et al. (Yang et al., 2022) improves the counterintuitive
time series models based on fuzzy information granules (FIGs)(Wang,
drawback to a certain extent, but some of the original distance-related
Liu, & Chen, 2020; Wang, Liu, Pedrycz, & Shao, 2014; Wang, Pedrycz,
issues still endure.
& Liu, 2015; Zhao, Han, Pedrycz, & Wang, 2015) have been proposed,
In addition, a forecasting approach using fuzzy inference system
which show better stability in short-term forecasting. FIGs are formed
and multi-order fuzzy logic relationships was proposed by Yu Chuan
by extracting essential information from data, which can better handle
et al. (Chang, Chen, & Liau, 2008). Yang et al. (Yang et al., 2017)
uncertainty and ambiguity through fuzzy set theory and linguistic
performed weighted predictions by combining LFIGs and this method;
descriptions, and its inherent fuzziness makes it more robust to missing
Wang et al. (Wang et al., 2015) performed weighted predictions by
and noisy data. The model constructed by FIGs is simpler and has a
combining granular time series(GTS) after FCM clustering and this
shorter reliance on historical data, which helps to reduce cumulative
method. In the current study, too many historical factors and too many
errors in model predictions. Meanwhile, this method can extract trends
weighted averages are incorporated into the forecasts, and this kind of
and correlations from the data, making it more sensitive to recent
fuzzy weighted forecasts will lead to a smooth trend in the forecasts,
changes in patterns and enabling more accurate predictions in the short
and cannot predict the future trend well. Finding an algorithm that
term (Yang, Yu, & Pedrycz, 2017). In contrast, neural network models
such as LSTM have a strong dependence on specific historical values, reduces the weighting error is an important direction of research.
so they perform poorly at predicting flexibility. In addition, the LSTM Therefore, to better capture the features of trends with increased
models are more sensitive to data noise and anomalies (Liu, Xiang, & precision and leverage them for more precise forecasting, this paper
Elangovan, 2023). Training the model demands an extensive volume proposes a new concept of breakpoints for segmenting linear trends
of data (Cao, Zhu, Sun, Wang, Zheng, Xiong, Hou, & Tian, 2018; Qian in time series, thus enabling accurate granulation of linear trends for
& Chen, 2019), and if the available data is limited, it will result in time series. Meanwhile, we propose a new method for calculating the
poor model performance. So models based on FIGs work better in short- distance between granules by improving and optimizing the original
term forecasting. These methods help to reduce error accumulation by inter-granule distance calculation, as an effective tool for measuring
constructing fuzzy sets with equal-size granules to express the overall the similarity between granules. Based on this new distance measure-
trend of the time series (Li, Yang, Yu, Wang, & Wang, 2019; Lu, ment method, we devise a granule-suited FCM algorithm, offering
Pedrycz, Liu, Yang, & Li, 2014; Yang et al., 2017). Since these equal- robust assistance in granule clustering tasks. In order to obtain cor-
size FIGs are limited by the fixed-size property, they cannot accurately relations between linear trends and more comprehensive forecasts,
represent trend information of different sizes and thus suffer from a we propose a multilevel fuzzy association rules (FARs) optimization
lack of precision. This also makes it difficult for them to meet the strict extraction algorithm. Compared with the former FARs, the multilevel
requirements in semantic representation. Therefore, these equal-sized FARs the algorithm constructs can capture the multilevel effects of past
FIGs models still need to be improved in terms of accuracy and trend trends on subsequent trends. On this basis, we construct a double-level
representation ability. optimal fuzzy association rules prediction model to achieve short-
To precisely grasp the features of the time series and enhance the term forecasting of time series. Experimentally proven, our model
predictive precision, many scholars have established unequal-size FIGs, has demonstrated broad applicability in a number of fields, and in
which are mainly divided into non-linear trend FIGs and linear trend particular has shown compelling and excellent potential for forecasting
FIGs. For non-linear FIGs, Xian et al. (Xian, Feng, & Cheng, 2023) in areas such as demography, meteorology, physics and electricity. The
added time-varying core lines to generate non-linear trend granules novel contributions of this study can be outlined as follows:
based on linear trend granules, combined with DeeAR networks for
forecasting. Cheng et al. (Cheng, Xing, Pedrycz, Xian, & Liu, 2023) • A concept of breakpoints has been proposed to capture linear
constructed a nonlinear trend fuzzy granulation method for dynamic trend changes in time series, thus achieving precise granulation
traffic flow prediction. Wang et al. (Wang et al., 2015) combined of time series.
non-linear trend granulation with improved fuzzy c-means (FCM) for • An improved distance is proposed, which can measure the simi-
forecasting. For linear FIGs, Li et al. (Li et al., 2021) constructed a larity among LFIGs more accurately.
multilinear trend fuzzy information granulation for prediction using • Granule-suited fuzzy c-means clustering is proposed.
linear information granules of unequal sizes. Yang (Yang et al., 2022) • A double-level optimal fuzzy association rules prediction model
and Duan et al. (Duan, Yu, Pedrycz, Wang, & Yang, 2018) clustered is proposed for short-term forecasting.

2
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

The remainder of the paper follows this structure. Section 2 briefly 2.1.3. Gaussian linear Fuzzy information granules
reviews some related concepts. Section 3 proposes a new concept of Linear fuzzy information granules are a continuation of Gaussian
breakpoints and establishes a breakpoint-based linear granulation. To fuzzy numbers, namely Gaussian linear fuzzy information granules. It
evaluate the similarity between these granules more accurately and extends the core feature 𝜇 of Gaussian fuzzy numbers from a static
classify them, Section 4 makes improvements to the original distance value to a linear time-dependent function 𝜇(𝑡) = 𝑘𝑡 + 𝑏 to better fit
between granules and proposed a fuzzy c-means clustering suitable the variation in trend and the span of fluctuation in original data.
for granules. On this basis, in Section 5, a double-level optimal fuzzy Similar to Gaussian fuzzy numbers, LFIGs are represented in the
association rules prediction model for short-term forecasting is con- form of 𝐺(𝑘, 𝑏, 𝜎, 𝑇 ), where 𝑘, 𝑏, 𝜎, and 𝑇 correspond to the charac-
structed. Section 6 gives four experiments to confirm the efficacy of teristic parameters of the Gaussian function 𝑓 . 𝑓 (𝑥; 𝑘𝑡 + 𝑏) represents
the suggested model. Finally, Section 7 concludes this paper. the membership degree of 𝑥 at time 𝑡 belonging to 𝐺(𝑘, 𝑏, 𝜎, 𝑇 ). The
membership function of Gaussian LFIGs is(Yang et al., 2017):
2. Related works (𝑥 − (𝑘𝑡 + 𝑏))2
𝑓 (𝑥; 𝑘𝑡 + 𝑏) = 𝑒𝑥𝑝(− ), 𝑡 ∈ [0, 𝑇 ] (3)
2𝜎 2
In this section, we will briefly describe the concept of Gaussian
where 𝜇(𝑡) = 𝑘𝑡 + 𝑏 is the time-varying core line, 𝑘 and 𝑏 ∈ 𝑅 are
Linear Fuzzy Information Granules, and introduce the distance calcula-
the slope and intercept of the core line respectively. 𝑥 is the real value
tion method between Gaussian LFIGs and Sequences Scale Equalization,
of time series and 𝜎 is the dispersion of the data around 𝜇(𝑡), and 𝑇
which will provide important theoretical background for the study.
is the length of the corresponding line partition. LFIG is expressed as
𝐺(𝑘, 𝑏, 𝜎, 𝑇 ). The parameters are generated by linear regression.
2.1. Linear Fuzzy information granules

2.2. Distance between Gaussian LFIGs

2.1.1. Granules and information granules
Human cognition rests upon three fundamental concepts: granu-
Consider two Gaussian 𝐿𝐹 𝐼𝐺𝑠 of the same size 𝑇 , denoted 𝐺1 and
lation, organization, and causation (Zadeh, 1997). In simpler terms,
𝐺2 , whose parameters are 𝐺1 (𝑘1 , 𝑏1 , 𝜎1 , 𝑇 ) and 𝐺2 (𝑘2 , 𝑏2 , 𝜎2 , 𝑇 ), respec-
granulation refers to breaking down wholes into parts; organization
tively. The type 𝐿1 Hausdorff distance between these two Gaussian
involves assembling parts into wholes; and causation relates to linking
𝐿𝐹 𝐼𝐺𝑠(Yang et al., 2017), denoted 𝐿1 (𝐺1 , 𝐺2 ), can be reformulated as:
causes with their effects. Granules are a cluster of objects formed by
their equivalence, resemblance, nearness, or utility. For example, the
granules of the human head include the forehead, nose, cheeks, ears, ⎧1 √
2𝜋
eyes and so on. ⎪ 2 𝛥𝑘𝑇 2 + (𝛥𝑏 + 2 𝛥𝜎)𝑇 , 𝑡∗ < 0
⎪ √
Information granules (IGs) are entities that result from the extrac- 𝐿1 (𝐺1 , 𝐺2 ) = ⎨−𝛥𝑘𝑡 + 2𝛥𝑏𝑡 + 𝑇 (−𝛥𝑏 + 𝑇 𝛥𝑘 +
∗2 ∗ 2𝜋
𝛥𝜎), 𝑡∗ ∈ [0, 𝑇 ]
⎪ 1 √ 2 2
tion of data and derived knowledge from information (Mencar, 2005). 2𝜋
⎪− 2 𝛥𝑘𝑇 2 + (𝛥𝑏 + 2 𝛥𝜎)𝑇 , 𝑡∗ > 𝑇
The purpose of IG is to break down complex problems into simpler ⎩
ones, thereby removing redundant or irrelevant information. Zadeh
(4)
first proposed information granulation, offering a general form for
𝑏1 −𝑏2
it (Zadeh, 1979): where 𝑡∗ = 𝑘1 −𝑘2
represents the intersection of the two time-varying
core lines 𝜇1 (𝑡) = 𝑘1 𝑡 + 𝑏1 and 𝜇2 (𝑡) = 𝑘2 𝑡 + 𝑏2 , and 𝛥𝑘 = |𝑘1 − 𝑘2 |,
𝑔 ≜ 𝑋 𝑖𝑠 𝐺 𝑖𝑠 𝜆 (1)
𝛥𝑏 = |𝑏1 − 𝑏2 |, 𝛥𝜎 = |𝜎1 − 𝜎2 |.
where 𝑋 is a variable in a universe of discourse 𝑈 , 𝐺 represents The above distance measures the similarity between two same
a convex fuzzy subset of 𝑈 , and 𝜆 represents the probability of 𝑋 size LFIGs. Duan et al. (Duan et al., 2018) extended the distance of
belonging to subset 𝐺. unequal length on the basis of the above. Given two Gaussian 𝐿𝐹 𝐼𝐺𝑠
𝐺1 (𝑘1 , 𝑏1 , 𝜎1 , 𝑇1 ) and 𝐺2 (𝑘2 , 𝑏2 , 𝜎2 , 𝑇2 ) satisfying 𝑇1 < 𝑇2 , the distance
2.1.2. Fuzzy information granules between 𝐺1 and 𝐺2 is defined as follows:
In the information granulation mode, granules are crisp. However,
𝐿1 (𝐺1 , 𝐺2 ) = 𝐿11 (𝐺1 , 𝐺21 ) + 𝐿21 (𝐺3 , 𝐺22 ) (5)
crisp IG overlooks the fact that granules tend to be fuzzy in most human
notion. For example, the granules of the human head are fuzzy because where 𝐺21 (𝑘2 , 𝑏2 , 𝜎2 , 𝑇1 )
and 𝐺22 (𝑘2 , 𝑏′2 , 𝜎2 , 𝑇2 − 𝑇1 ),
𝐺3 (𝑘3 , 𝑏3 , 𝜎3 , 𝑇2 − 𝑇1 ) is
there are no clear definitions of boundaries between features such an auxiliary granule defined by the end points of 𝐺1 and 𝐺2 with 𝜎3 = 0,
as cheeks, nose, forehead, ears, and so on. The fuzziness of granules used to help calculate the distance between two 𝐿𝐹 𝐼𝐺𝑠 of unequal
(𝑘 ⋅𝑇 +𝑏 )−(𝑘 ⋅𝑇 +𝑏 )
is a characteristic of how humans process information. This is more length, where 𝑘3 = 2 2 𝑇2 −𝑇 1 1 1 , 𝑏′2 = 𝑘2 𝑇1 + 𝑏2 , 𝑏3 = 𝑘1 𝑇1 + 𝑏1
2 1
reasonable than crisp IG method for granulation (Zadeh, 1997). and 𝜎3 = 0.
Compared to crisp information granules, fuzzy information granules
introduce membership functions based on information granules. They 2.3. Sequences scale equalization based on the dynamic time warping
can be used to measure the matching degree between the information
granules and the original data. However, different granulation methods Guo et al. (Guo, Pedrycz, & Liu, 2018) proposes to adjust the length
can result in different granules. Common membership functions in- of the sequence by introducing the Dynamic Time Warping, whose
clude triangular, trapezoidal and Gaussian. Here, we mainly introduce two theorems provide theoretical support for the method. First, let us
Gaussian fuzzy information granulation. Gaussian fuzzy numbers (Yang start by introducing the DTW algorithm, which forms the foundation of
et al., 2017) are a common FIG, where the membership function 𝑓 is the equilibrium method. Then, we will delve into the primary concept
as follows: behind this equilibrium approach.
(𝑥 − 𝜇)2
𝑓 (𝑥; 𝜇, 𝜎) = 𝑒𝑥𝑝(− ) (2)
2𝜎 2 2.3.1. Dynamic time warping
where 𝜇 and 𝜎 represent the center and spread of the fuzzy number. 𝐷𝑊 𝑇 is a frequently utilized technique for measuring the similarity
A Gaussian fuzzy number with membership function 𝑓 (𝑥) is denoted of numerical sequences. It determines the best alignment between the
as 𝐺(𝜇, 𝜎), where 𝑓 (𝑥) represents the membership degree of value 𝑥 provided sequences and quantifies their similarity by matching the
belonging to 𝐺(𝜇, 𝜎). coordinates of the two sequences.

3
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959
{ } { }
Given two sequences 𝑋 = 𝑥1 , 𝑥2 , … , 𝑥𝑛𝑥 and 𝑌 = 𝑦1 , 𝑦2 , … , 𝑦𝑛𝑦 ,
{ }
𝐻 is a matrix represented as 𝐻 = ℎ𝑖𝑗 𝑛 ×𝑛 , and its elements are
𝑥 𝑦
computed as follows:
{ }
ℎ𝑖𝑗 = |𝑥𝑖 − 𝑦𝑗 | + 𝑚𝑖𝑛 ℎ𝑖−1,𝑗 , ℎ𝑖−1,𝑗−1 , ℎ𝑖,𝑗−1 . (6)

The matrix 𝐻 is designed to obtain the DTW distance, which is used to

quantify the similarity between 𝑋 and 𝑌 . DTW distance is

𝐷𝑇 𝑊 (𝑋, 𝑌 ) = ℎ𝑛𝑥 ,𝑛𝑦 . (7)

2.3.2. DTW-based sequences scale equalization

As mentioned before, the 𝐷𝑇 𝑊 (𝑋, 𝑌 ) distance can reflect the de-
gree of similarity between the sequence 𝑋 and the sequence 𝑌 . There-
fore, for a given sequence 𝑋, if its length is to be adjusted to 𝐿, the goal
is to seek a sequence 𝑌 of length 𝐿, such that 𝐷𝑇 𝑊 (𝑋, 𝑌 ) is minimized. Fig. 1. Usual Forms (a)Peak point. (b)Trough point. (c)Slow uptrend to fast uptrend.
(d)Fast uptrend to slow uptrend. Special forms (e)Transition points of a stable
This is expressed in the following expression:
fluctuating trend. (f)Transition points of a fluctuating uptrend.
𝐸𝑞-𝐷𝑇 𝑊 (𝑋, 𝐿) = arg min DTW(𝑋, 𝑌 ). (8)
length (𝑌 )=𝐿
{ }
(1) Increasing length: Supposing 𝑋 = 𝑥1 , 𝑥2 , … , 𝑥𝑛 is less than 𝐿 (3) A transition point from slow uptrend (downtrend) to fast uptrend
in length, the goal is to increase 𝑋 without altering its key character- (downtrend).
istics. To achieve this goal, a suitable (𝑛 + 1)-size sequence 𝑌 can be (4) A transition point from a fast uptrend (downtrend) to a slow
constructed by recurring 𝑥𝑖 between the 𝑖𝑡ℎ and (𝑖 + 1)th element of 𝑋. uptrend (downtrend).
Such an approach ensures that DTW(X,Y) is zero while maintaining the
The special forms of breakpoints are as follows:
core features of 𝑋. So the sequence 𝑌 can be constructed by recurring
𝑥𝑖 between the 𝑖th and (𝑖 + 1)th element of 𝑋. (1) A transition point from an uptrend(downtrend) to a stable fluc-
{ }
(2) Decreasing length: Supposing 𝑋 = 𝑥1 , 𝑥2 , … , 𝑥𝑛 is more than tuating trend.
𝐿 in length, the goal is to decrease 𝑋 without altering its key charac- (2) A transition point from an stable trend to a fluctuating up-
teristics. An additional element is used to substitute two consecutive trend(downtrend).
elements with the smallest absolute difference in a sequence, and
Usual forms and special forms are shown in Fig. 1.
DTW(X,Y) will be more or equal than the minimum distance between
the two consecutive elements. Specifically, when the additional element
falls within the interval constructed by that two contiguous elements, Remark 1. It merits underscoring that the stable trend of the data
the DTW distance is minimized. requires a time span threshold 𝑘 to be determined in the process of
Increasing length and Decreasing length effectively balance the length obtaining breakpoints. Specifically, we consider a trend in a time series
of the sequence 𝑋 without changing its essential features, thus main- to be stable if it occurs over a time span greater than 𝑘. Conversely,
taining the consistency and similarity of the data. if the time span of the trend is less than 𝑘, it is considered to be a
non-stationary trend.
3. Granulating time series based on breakpoints
Remark 2. For the uptrend, we set thresholds 𝑝 for hierarchical
In this section we present a method for granulating time series based division to reflect more comprehensively and accurately the changing
on breakpoints. First, Section 3.1 proposes a new concept for partition patterns of data. Specifically, 𝑝 further classifies the uptrend into two
time series, breakpoints. Then, Section 3.2 shows how to obtain the categories, ‘‘slow uptrend’’ and ‘‘fast uptrend’’. This hierarchical par-
tition can facilitate a deeper understanding of the upward pattern of
breakpoints and introduce a particle swarm optimization algorithm to
the time series and identify complex local features. The downtrend is
obtain the optimal breakpoints for a better granulation of time series.
similar.
3.1. Breakpoints
3.2. Getting optimal breakpoints

The linear trend granulation method works better when used for This subsection first discusses how breakpoints are obtained and
forecasting, but traditional time series partition methods do not capture then looks at how they can be optimized to get the best breakpoints.
linear trends well. Therefore, we propose a novel method for partition-
ing the linear trend of a time series, which ensures that the granule 3.2.1. Getting breakpoints { }
maximally matches the trend information of the time series. Given a time series 𝑋 = 𝑥{ 1 , 𝑥2 , … , 𝑥𝑛 , compute the first-order
}
{ } difference of 𝑋, denoted 𝐷 = 𝑑1 , 𝑑2 , … , 𝑑𝑛−1 . For the uptrend of
Definition 1 (Breakpoints). For given a time series 𝑋 = 𝑥1 , 𝑥2 , … , 𝑥𝑛 , the time series, we classify it into two types, ‘‘slow uptrend’’ and ‘‘fast
{ }
if there is a stable slow uptrend 𝑥𝑖 , 𝑥𝑖+1 , … , 𝑥𝑗 transforms to another uptrend’’, and set the corresponding threshold 𝑝 to distinguish these
{ }
stable fast uptrend 𝑥𝑗+1 , 𝑥𝑗+2 , … , 𝑥𝑘 , then the transition point 𝑥𝑗 is two upward patterns. Specifically, the origin and endpoint of a slow
called a breakpoint. Similarly, if there exists a stable slow downtrend uptrend are obtained in the form 0 ≤ 𝑑𝑖 ≤ 𝑝; the origin and endpoint
{ } {
𝑥𝑖 , 𝑥𝑖+1 , … , 𝑥𝑗 transforms to another stable fast uptrend 𝑥𝑗+1 , 𝑥𝑗+2 , of a fast uptrend are obtained in the form 𝑑𝑖 > 𝑝. Combining both, the
} { }
… , 𝑥𝑘 , or a steady slow uptrend 𝑥𝑖 , 𝑥𝑖+1 , … , 𝑥𝑗 to another steady uptrend is marked as
{ } { }
fast downtrend 𝑥𝑗+1 , 𝑥𝑗+2 , … , 𝑥𝑘 , etc, then the transition point 𝑥𝑗 is
𝑈 𝑝 = (𝑢𝑝𝑆𝑡1 , 𝑢𝑝𝐸𝑑1 ), (𝑢𝑝𝑆𝑡2 , 𝑢𝑝𝐸𝑑2 ), … , (𝑢𝑝𝑆𝑡𝑛1 , 𝑢𝑝𝐸𝑑𝑛1 ) .
called a breakpoint. The usual forms of a breakpoint are as follows:
For a downward trend, we can also use a symmetric negative threshold
(1) A transition point from uptrend to downtrend, i.e., a peak point.
−𝑝 for similar hierarchical partitioning, obtaining
(2) A transition point from downtrend to uptrend, i.e., a trough { }
point. 𝐷𝑛 = (𝑑𝑛𝑆𝑡1 , 𝑑𝑛𝐸𝑑1 ), (𝑑𝑛𝑆𝑡2 , 𝑑𝑛𝐸𝑑2 ), … , (𝑑𝑛𝑆𝑡𝑛2 , 𝑑𝑛𝐸𝑑𝑛2 ) .

4
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 3. Consecutive non-stable window. (a)Stable fluctuating trend. (b)Stable

Fig. 2. Consecutive upward non-stable window. fluctuating uptrend.

For a smooth trend, ascertain the origin and endpoint of the trend with In addition, for merging mode 2: if the overall length exceeds the
𝑑𝑖 = 0 and greater than or equal to 𝑘 in a row, and ascertain the origin stability threshold, the entire AS window will be merged; otherwise,
and endpoint of the stable smooth trend merging mode 1 will be used. It is obvious that merging mode 2
{ } converges.
𝑃 𝑙 = (𝑝𝑙𝑆𝑡1 , 𝑝𝑙𝐸𝑑1 ), (𝑝𝑙𝑆𝑡2 , 𝑝𝑙𝐸𝑑2 ), … , (𝑝𝑙𝑆𝑡𝑛3 , 𝑝𝑙𝐸𝑑𝑛3 ) . In summary, the merging algorithm converges. □
In addition, for non-stationary trends in the series, which have few
data points and too short a time span, they also increase the complexity Algorithm 1 Optimal Merger Algorithm.
of the model. They are merged as follows { }
Input: 𝑈 𝑑 𝑢𝑑1 , 𝑢𝑑2 , ..., 𝑢𝑑𝑛𝑢 , stable trend threshold 𝑘.
(1) Step 1, dealing with non-stationary trends of continuous up- Output: Breakpoints 𝐵𝑝.
trend and continuous downtrend. 1: Calculate the length of each segment within 𝑈 𝑑 to get 𝑤𝑖𝑑𝑡.
2: Find consecutive subscript partitions in 𝑤𝑖𝑑𝑡 less than or equal to 𝑘 to the
In an uptrend 𝑈 𝑝, if there exists a set of continuous non-
array 𝑖𝑛𝑑𝑖𝑐𝑒𝑠.
stationary trends with a sum of time spans greater than or equal
3: for each indice in 𝑖𝑛𝑑𝑖𝑐𝑒𝑠, from back to front:
to 𝑘, the continuous window is merged and treated as a special
4: Calculate the length of each indice as 𝑙𝑒𝑛.
stable upward sequence. As shown in Fig. 2. Downtrend 𝐷𝑛 5: for each subinterval, from back to front:
ditto. Then, the starting points of the uptrend 𝑈 𝑝 and down- 6: 𝑟1 is 𝑅2 value merged with the previous interval.
trend 𝐷𝑛, 𝑢𝑝𝑆𝑡 and 𝑑𝑛𝑆𝑡, are combined
{ with the stable
} smooth 7: 𝑟2 is 𝑅2 value merged with the latter interval.
trend 𝑃 𝑙 and denoted as 𝑈 𝑑 = 𝑢𝑑1 , 𝑢𝑑2 , … , 𝑢𝑑𝑛𝑢 , in prepa- 8: if 𝑟1 ≥ 𝑟2 , then
ration for handling of the non-stationary trends of continuous 9: Select to merge with the previous interval.
up-downtrend later. 10: if Length of the merged interval ≥ 𝑘, then
(2) Step 2, dealing with non-stationary trends of continuous up- 11: Move two places forward in this 𝑓 𝑜𝑟.
12: endif
downtrend { and single non-stationary
} trend.
13: elseif 𝑟1 < 𝑟2 , then
For 𝑈 𝑑 = 𝑢𝑑1 , 𝑢𝑑2 , … , 𝑢𝑑𝑛𝑢 , with 𝑢𝑑𝑖 as the splitting point, 14: Select to merge with the latter interval.
if there are single non-stationary trends or consecutive non- 15: endif
stationary up-downtrends with length sum less than 𝑘, merge 16: endfor
them with a front or back window based on the 𝑅2 value. If 17: Calculate of the average value of 𝑅2 for the merged segment, 𝑅1 , and
the sum of the lengths of the consecutive non-stationary up- returns 𝑈 𝑑1 .
downtrends is greater than or equal to 𝑘, determine whether it 18: if 𝑙𝑒𝑛 ≥ 𝑘, then
is appropriate to form a special stable trend by local 𝑅2 aver- 19: Merge all subintervals of the partition and compute the partition
aging, as shown in Fig. 3, and if not, merge the non-stationary 𝑅2 noted as 𝑅2 , and returns 𝑈 𝑑2 .
20: endif
window with the front or back window. { Details in Algorithm
} 1. 21: if 𝑅1 ≥ 𝑅2 , then
Eventually we get the breakpoints 𝐵𝑝 = 𝑏𝑝1 , 𝑏𝑝2 , … , 𝑏𝑝𝑛𝑏 . 22: 𝐵𝑝 is updated from 𝑈 𝑑1 .
23: endif
Theorem 3.1 (Convergence of Optimal Merger Algorithm). For any con- 24: else, then
tinuous non-stationary window series 𝐴𝑆 = [𝑎1 , 𝑎2 , … , 𝑎𝑛1 ] in the window 25: 𝐵𝑝 is updated from 𝑈 𝑑2 .
{ } 26: endif
series segmented by 𝑈 𝑑 𝑢𝑑1 , 𝑢𝑑2 , … , 𝑢𝑑𝑛𝑢 , the number of non-stationary 27: endfor
windows will converge to 0 after the algorithm runs. 28: return Breakpoint 𝐵𝑝.

Proof. For the Optimal Merger Algorithm, we can decompose it

into two merging modes and demonstrate the overall convergence of 3.2.2. Optimize and get optimal breakpoints
the algorithm by proving the convergence of these two modes. Let In the process of obtaining breakpoints, it is crucial to choose
𝐴𝑆 = [𝑎1 , 𝑎2 , … , 𝑎𝑛1 ] represent a continuous window, where 𝑎𝑖 (𝑖 ≤ 𝑛1 ) appropriate thresholds 𝑘 and 𝑝, as different values lead to different
represents a non-stationary window with a length less than the stable segmentation effects. Specifically, Fig. 4 shows the effect of three
threshold. different values of 𝑘, the appropriate value, too large and too small
First, for merging mode 1: decreasing from 𝑖 = 𝑛1 , we define 𝑅1 (𝑎𝑖 ) anomalies, from which it can be seen that 𝑘 directly affects the number
to be the 𝑅2 obtained by merging 𝑎𝑖 forward in the window, and and quality of breakpoints. Similarly, the three different values of 𝑝 are
𝑅2 (𝑎𝑖 ) to be the 𝑅2 obtained by merging 𝑎𝑖 backward in the window. shown in Fig. 5, and it is clear that a right 𝑝 allows a more precise
We compare 𝑅1 (𝑎𝑖 ) and 𝑅2 (𝑎𝑖 ) and choose the larger one to merge, delineation of the trend in the time series. Therefore, how these values
i.e. 𝑎𝑖 = 𝑎𝑖 + 𝑎𝑖+1 or 𝑎𝑖 = 𝑎𝑖 + 𝑎𝑖−1 . Repeat this process until the length are determined is critical.
of 𝑎𝑖 is greater than or equal to the stability threshold. As 𝑖 tends to 1, To partition data more accurately, this paper introduces Particle
it is clear that the non-stationary window of the AS tends to 0. Thus, Swarm Optimization(PSO) algorithm in the process of obtaining break-
merging mode 1 is convergent. points. The details are as follows. For the time series data as 𝑋 =

5
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 4. Three types of 𝑘 get breakpoints. (a)Suitable 𝑘. (b)Too small 𝑘. (c)Too large 𝑘.

Fig. 5. Three types of 𝑝 get breakpoints. (a)Suitable 𝑝. (b)Too small 𝑝. (c)Too large 𝑝.

{ }
𝑥1 , 𝑥2 , … ..𝑥𝑛 , partition the trend based on the initial thresholds 𝑘 Table 1
{ } Comparison of the results of the three division methods.
and 𝑝, obtaining 𝑚 windows 𝑇 𝑆𝑊 𝑊1 , 𝑊2 , … , 𝑊𝑚 . Next, calculate
the coefficient of determination of linear regression for each window 𝑙𝑚 𝑟𝑚 𝑙𝑟
{ }
𝑊𝑖 , denoted as 𝑇 𝑆𝑅 𝑅21 , 𝑅22 , … , 𝑅2𝑚 . Using the mean value of 𝑇 𝑆𝑅 as Turning Points 2.94 0.96 2.81
𝑙1 Trend Filter 3.03 0.63 1.91
the objective function 𝐽 (𝑘, 𝑝) for particle swarm optimization.
Breakpoints 5.88 0.73 4.28
𝑅21 + 𝑅22 + ⋯ + 𝑅2𝑚 Note: A value in bold is the optimal index.
𝐽 (𝑘, 𝑝) = (9)
𝑚
The thresholds 𝑘 and 𝑝 are optimized iteratively to maximize the
objective function, thus obtaining the best window partition results. method is able to segment the time series better, avoiding the problem
This method harnesses the complete global search potential of PSO of redundant segmentation points and showing better overall results
and can effectively find the optimal breakpoints of the time series. The and metrics.
formula is as follows: The steps to granulate the time series are as follows:
𝑘, 𝑝 = arg max 𝐽 (𝑖𝑘 , 𝑖𝑝 ) (10)
𝑖𝑘 ∈[𝑎,𝑏],𝑖𝑝 ∈[𝑐,𝑑] (1) Determine the optimal threshold 𝑘, 𝑝 by the PSO to obtain the
optimal breakpoints.
where [𝑎, 𝑏] and [𝑐, 𝑑] are the constraint boundaries for 𝑖𝑘 and 𝑖𝑝 , (2) Partition the time series into several windows based on the
respectively.
breakpoints.
The breakpoints optimized by PSO provide better linear partition.
(3) Construct a LFIG on each window to obtain a granular time
As shown in Fig. 6. Compared with the 𝑙1 trend filter partition and
series.
turning points partition on a same random dataset, it can be seen that
the partition based on breakpoints can segment the trend of data more
accurately. To better illustrate the benefits of breakpoints, we use 𝑙𝑚 4. DTW-i𝑳𝟏 Fuzzy C-means
and 𝑟𝑚 as the main evaluation metrics, where 𝑙𝑚 is the window length
mean and 𝑟𝑚 is the 𝑅2 mean. It should be noted that there is no fixed
In the Fuzzy C-means(FCM) clustering, distance calculations are
standard for 𝑙𝑚, it should be judged according to the effect diagram.
critical to the quality of the results. It should be noted that the size
𝑟𝑚 is bigger, the better the effect. The value of 𝑟𝑚 depends on the
of each LFIGs is not uniform. Although previous research has proposed
value of 𝑙𝑚, 𝑙𝑚 is smaller, 𝑟𝑚 tends to be larger. So to make a more
a method for calculating distances between unequal size LFIGs, it can
accurate comparison, we use 𝑙𝑚 ∗ 𝑟𝑚(𝑙𝑟) as a new comparison metric,
sometimes yield counter-intuitive results. To attain a more reasonable
and obviously the larger the 𝑙𝑟, the better the partitioning effect.
clustering, we need to improve the calculation of the LFIGs distance to
produce clustering results that are more in line with expectations. In
Definition 2 (lr). Given a time series 𝑋 = {𝑥1 , 𝑥2 , … , 𝑥𝑛 }, 𝑛 ∈ R, let
Section 4.1, we improve the intergranular distance calculation method
the window series be denoted as 𝑊 𝑆 = {𝑊1 , 𝑊2 , … , 𝑊𝑚 }, 𝑚 ∈ R, by
partitioning, and 𝐿𝑆 = {𝑙1 , 𝑙2 , … , 𝑙𝑚 } and 𝑅𝑆 = {𝑅1 , 𝑅2 , … , 𝑅𝑚 } are and solve the previous problem by proposing a completely new method
length series and coefficient of determination of linear regression 𝑅2 for calculating the distance. In Section 4.2, we present a DTW-i𝐿1 Fuzzy
series of 𝑊 𝑆, respectively. The product of the length mean (𝑙𝑚) and C-means clustering by combining FCM with the new distance.
the 𝑅2 mean (𝑟𝑚), denoted as 𝑙𝑟, is calculated as

𝑙𝑟 = 𝑟𝑚 × 𝑙𝑚, (11) 4.1. DTW-i𝐿1 distance

𝑙1 +𝑙2 +⋯+𝑙𝑚 𝑅1 +𝑅2 +⋯+𝑅𝑚
where 𝑙𝑚 = and 𝑟𝑚 = .
𝑚 𝑚 We first point out the problems with the original distances be-
As shown in Table 1 and Fig. 6, the length of the window divided tween granules and improve them. Based on this improvement, we
by the two compared methods is significantly too small, indicating the then propose a completely new method for calculating intergranular
presence of redundant segmentation points. In contrast, the breakpoint distances.

6
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 6. Three results partitions.(a)Turning points partitions. (b)𝑙1 trend filter partitions. (c)Breakpoints partitions.

Fig. 8. Geometric form of 𝐿1 distance. (a)The geometric form of 𝛥𝑏𝑇 . (b)The

geometric form of 𝛥𝑘𝑇 2 .

Fig. 7. Distance calculation of two kinds of equal-size LFIGs.

4.1.2. Unequal-size LFIGs computation problems and countermeasures

Given two new LFIGs 𝐺1 and 𝐺2 of unequal sizes, as shown in
4.1.1. Equal-size LFIGs computation problems and countermeasures Fig. 9(a)(b). By adding a fictional 𝐺3 to the end of the smaller 𝐺1 ,
Given three core lines of Gaussian LFIGs, as shown in Fig. 7, the combined size of 𝐺1 + 𝐺3 equals that of 𝐺2 . Then, the 𝐿1 distance
𝐺1∗ (4, 30, 1, 5), 𝐺2∗ (5, 15, 2, 5) and 𝐺3∗ (−3, 45, 0, 5). Intuitively, 𝐺1∗ is more between 𝐺1 and 𝐺2 , 𝐿1 (𝐺1 , 𝐺2 ) = 𝐿1 (𝐺1 , 𝐺21 ) + 𝐿1 (𝐺3 , 𝐺22 ). In this
similar to 𝐺2∗ than 𝐺3∗ in terms of trend characteristics. Therefore, method, by adding a 𝐺3 at the end of 𝐺1 that is trending differently
the distance between 𝐺1∗ and 𝐺2∗ should be smaller than the distance from 𝐺1 , it is possible to make 𝐺1 the same length as 𝐺2 , but it does not
between 𝐺1∗ and 𝐺3∗ . However, this outcome is not achieved through accurately represent the trendiness of the original 𝐺1 . This adjustment
the original 𝐿1 Hausdorff distance calculation (refer to Table 2). There leads to a large difference between the trend of 𝐺1 with the addition of
is no doubt that the similarity of trends is largely influenced by the 𝐺3 and the trend of the original 𝐺1 , thus increasing the distance error
slope of the core line. However, in the above case even if 𝛥𝑏 and 𝑇 between 𝐺1 and 𝐺2 . Therefore, this approach may not be the optimal
and 𝛥𝜎 take equal values respectively, the distance is closer for larger choice.
𝛥𝑘, which reveals an obvious counter-intuitive problem with the light To better solve the problem of similarity measurement of LFIGs, this
paper proposes a new way to measure the similarity between unequal
weighting of 𝛥𝑘 in Eq. (4). Similarly, Li’s (Yang et al., 2022) literature
size granules. The method introduces the DTW equalization algorithm,
also points out that the distance is susceptible to excessive influence of
denoted as 𝐸𝑞-𝐷𝑇 𝑊 , by minimizing the loss to equalize unequal
𝑏 value, i.e., the proportion of 𝛥𝑏 is too high. Therefore, we can infer
sized granules into equal sized ones, maintaining the trendiness of the
that there is a contradiction between 𝛥𝑘 and 𝛥𝑏. In order to improve
granules to the fullest extent. Specifically, the steps of the method are
the accuracy of the distance, a reasonable idea is to increase the weight as follows:
of 𝛥𝑘. Step 1, Through the DTW equalization algorithm, the granular
In this regard, we propose a new idea. It is to increase the weight time series are equalized to achieve equal { size. }
from 12 𝑇 2 and − 12 𝑇 2 in Eq. (4) to a larger value. Specifically in terms Given a granular time series 𝐺𝑇 𝑆 𝐺1 , 𝐺2 , … , 𝐺𝑛 , where 𝐺𝑖 is
of geometry, given two 𝐺1 , 𝐺2 as shown in Fig. 8(a), the 𝛥𝑏𝑇 in the 𝐺𝑖 (𝑘𝑖 , 𝑏𝑖 , 𝜎𝑖 , 𝑇𝑖 ), 𝑖 ≤ 𝑛; firstly, { the average} length within the 𝐺𝑇 𝑆 is
distance formula (4) represents the area of a parallelogram with length calculated as 𝐿; then 𝐺𝑇 𝑆 𝐺1 , 𝐺2 , … , 𝐺𝑛 equalized to the equal size
𝛥𝑏 and width 𝑇 . Adjusting the starting points of the core lines of 𝐺1 and 𝐿 by 𝐸𝑞-𝐷𝑇 𝑊 (𝑋𝐺𝑖 , 𝐿) from Eq. {(8), where 𝑋𝐺𝑖 is the } original win-
𝐺2 , causes them to align, illustrating in Fig. 8(b). The 𝛥𝑘𝑇 reflects the dow data of 𝐺𝑖 , obtaining 𝑁𝐺𝑇 𝑆 𝑁𝐺1 , 𝑁𝐺2 , … , 𝑁𝐺𝑛 , where 𝑁𝐺𝑖 is
extent of difference between the two core lines regarding 𝑘 within the 𝐺𝑖 (𝑘′𝑖 , 𝑏′𝑖 , 𝜎𝑖′ , 𝑇𝑖′ ), 𝑖 ≤ 𝑛. Fig. 9(c) is a graphical illustration which clearly
period 𝑇 . To further strengthen the role of 𝛥𝑘, we refer to the expression shows that the DTW-equilibrated granules better retain their original
of 𝛥𝑏𝑇 and add a new parallelogram area of length 𝛥𝑘𝑇 and width 𝑇 , characteristics in terms of morphology and trend.
𝛥𝑘𝑇 2 , to increase the weight of 𝛥𝑘. The improved distance formula 𝑖𝐿1 Step 2, Compute the distance between LFIGs by the 𝑖𝐿1 distance
is specifically expressed as: for equilibrium equal size granular time series.
√
For equal-sized granular time series 𝑁𝐺𝑇 𝑆, the distance between
⎧3 2 2𝜋 granules 𝐺𝑖 and 𝐺𝑗 is calculated by 𝑁𝐺𝑖 and 𝑁𝐺𝑗 corresponding to 𝐺𝑖
⎪ 2 𝛥𝑘𝑇 + (𝛥𝑏 + 2 𝛥𝜎)𝑇 , 𝑖𝑓 𝑡∗ < 0
⎪3 √ and 𝐺𝑗 according to the following equation
𝑖𝐿1 = ⎨ 𝛥𝑘𝑇 2 + 2𝜋 𝛥𝜎𝑇 + 𝛥𝑏(𝑡∗ − 𝑇 ), 𝑖𝑓 𝑡∗ ∈ [0, 𝑇 ] (12)
⎪2 2
√ 𝐷(𝐺𝑖 , 𝐺𝑗 ) = 𝑖𝐿1 (𝑁𝐺𝑖 , 𝑁𝐺𝑗 ). (13)
⎪ 1 𝛥𝑘𝑇 2 + (𝛥𝑏 + 2𝜋 𝛥𝜎)𝑇 , 𝑖𝑓 𝑡∗ > 𝑇
⎩2 2
4.1.3. DTW-i𝐿1 definition and properties
𝑏 −𝑏
where 𝑡∗ = 𝑘1 −𝑘2 represents the intersection of the two core lines, and Combining the discussions in the above two subsections, we pro-
1 2
𝛥𝑘 = |𝑘1 − 𝑘2 |, 𝛥𝑏 = |𝑏1 − 𝑏2 |, 𝛥𝜎 = |𝜎1 − 𝜎2 |. poses an novel distance, DTW-i𝐿1 .

7
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 9. Distance calculation of two unequal-size LFIGs. (a) Core lines of two unequal-size LFIGs. (b) Adding a fictional LFIG 𝐺3 in two unequal-size LFIGs. (c) DTW equilibrium
transformation of unequal-size LFIGs to NG.

{ }
Definition 3 (DTW-i𝐿1 Distance). Given a 𝐺𝑇 𝑆 𝐺1 , 𝐺2 , … , 𝐺𝑛 and Theorem 4.4 (Monotonicity). The monotonicity of this distance is specified
obtain the mean size 𝑇 of the granules, use Eq. (8) to equalize the as follows.
granular time series to obtain a new granular time series 𝑁𝐺𝑇 𝑆
{ } • For any 𝑏, 𝜎, 𝑇 equal three granules 𝐺𝑖 (𝑘𝑖 , 𝑏𝑖 , 𝜎𝑖 , 𝑇𝑖 ), 𝐺𝑗 (𝑘𝑗 , 𝑏𝑗 , 𝜎𝑗 , 𝑇𝑗 )
𝑁𝐺1 , 𝑁𝐺2 , … , 𝑁𝐺𝑛 , where 𝑁𝐺𝑖 = 𝐺(𝑘′𝑖 , 𝑏′𝑖 , 𝜎𝑖′ , 𝑇 ). The DTW-i𝐿1 dis-
and 𝐺𝑝 (𝑘𝑝 , 𝑏𝑝 , 𝜎𝑝 , 𝑇𝑝 ) with unequal k-values in the three partitioned
tance between any two granules 𝐺𝑖 and 𝐺𝑗 is obtained as follows:
domains of 𝑡∗ , there is |𝑘𝑖 − 𝑘𝑗 | ≤ |𝑘𝑖 − 𝑘𝑝 |, i.e. 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝 , and
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) ≤ 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ).
⎧3 √
2𝜋 • For any 𝑘, 𝜎, 𝑇 equal three granules 𝐺𝑖 (𝑘𝑖 , 𝑏𝑖 , 𝜎𝑖 , 𝑇𝑖 ), 𝐺𝑗 (𝑘𝑗 , 𝑏𝑗 , 𝜎𝑗 , 𝑇𝑗 )
⎪ 2 𝛥𝑘𝑇 2 + (𝛥𝑏 + 2 𝛥𝜎)𝑇 , 𝑖𝑓 𝑡∗ < 0
⎪3 √ and 𝐺𝑝 (𝑘𝑝 , 𝑏𝑝 , 𝜎𝑝 , 𝑇𝑝 ) with unequal b-values, there is |𝑏𝑖 − 𝑏𝑗 | ≤
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) = ⎨ 𝛥𝑘𝑇 2 + 2𝜋 𝛥𝜎𝑇 + 𝛥𝑏(𝑡∗ − 𝑇 ), 𝑖𝑓 𝑡∗ ∈ [0, 𝑇 ] |𝑏𝑖 − 𝑏𝑝 |, i.e. 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝 , and in the case of 𝑡∗ < 0 or 𝑡∗ > 𝑇 ,
⎪ 21 2 √
2𝜋
⎪ 2 𝛥𝑘𝑇 2 + (𝛥𝑏 + 2 𝛥𝜎)𝑇 , 𝑖𝑓 𝑡∗ > 𝑇 then 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) < 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ). At 0 ≤ 𝑡∗ ≤ 𝑇 ,
⎩ 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) ≥ 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ).
(14) • For any 𝑘, 𝑏, 𝑇 equal three granules 𝐺𝑖 (𝑘𝑖 , 𝑏𝑖 , 𝜎𝑖 , 𝑇𝑖 ), 𝐺𝑗 (𝑘𝑗 , 𝑏𝑗 , 𝜎𝑗 , 𝑇𝑗 )
𝑏′𝑖 −𝑏′𝑗 and 𝐺𝑝 (𝑘𝑝 , 𝑏𝑝 , 𝜎𝑝 , 𝑇𝑝 ) with unequal 𝜎-values in the three partitioned
where 𝑡∗ = 𝑘′𝑖 −𝑘′𝑗
, 𝛥𝑘 denotes the absolute value of 𝑘 for 𝑁𝐺𝑖 and 𝑁𝐺𝑗 domains of 𝑡∗ , there is |𝜎𝑖 − 𝜎𝑗 | ≤ |𝜎𝑖 − 𝜎𝑝 |, i.e., 𝛥𝜎𝑖𝑗 ≤ 𝛥𝜎𝑖𝑝 , and
i.e. |𝑘′𝑖 − 𝑘′𝑗 |, and similarly, 𝛥𝑏 = |𝑏′𝑖 − 𝑏′𝑗 |, 𝛥𝜎 = |𝜎𝑖′ − 𝜎𝑗′ |. then 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) ≤ 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ).
• For any 𝑘, 𝑏, 𝜎 equal three granules 𝐺𝑖 (𝑘𝑖 , 𝑏𝑖 , 𝜎𝑖 , 𝑇𝑖 ), 𝐺𝑗 (𝑘𝑗 , 𝑏𝑗 , 𝜎𝑗 , 𝑇𝑗 )
Theorem 4.1 (Non-negative). For any two granules 𝐺𝑖 and 𝐺𝑗 , satisfy and 𝐺𝑝 (𝑘𝑝 , 𝑏𝑝 , 𝜎𝑝 , 𝑇𝑝 ) with unequal T-values in the three partitioned
𝑇𝑖 +𝑇𝑗 𝑇𝑖 +𝑇𝑝
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) ≥ 0, ‘‘=’’ holds if and only if 𝐺𝑖 = 𝐺𝑗 . domains of 𝑡∗ , there is 2
≤ 2
, and then 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) ≤
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ).
Theorem 4.2 (Symmetry). For any two granules 𝐺𝑖 and 𝐺𝑗 , satisfy
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) = 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑗 , 𝐺𝑖 ). Proof. In the three partition fields of 𝑡∗ , for any 𝑏, 𝜎, 𝑇 equal three gran-
ules 𝐺𝑖 (𝑘𝑖 , 𝑏𝑖 , 𝜎𝑖 , 𝑇𝑖 ), 𝐺𝑗 (𝑘𝑗 , 𝑏𝑗 , 𝜎𝑗 , 𝑇𝑗 ) and 𝐺𝑝 (𝑘𝑝 , 𝑏𝑝 , 𝜎𝑝 , 𝑇𝑝 ) with unequal
Proof. The computational part of DTW-i𝐿1 is the addition of a 𝛥𝑘𝑇 2 k-values, the distances between 𝐺𝑖 , 𝐺𝑗 and 𝐺𝑖 , 𝐺𝑝 are respectively,
to the original 𝐿1 distance, which obviously has the same properties as √
3 2𝜋
the original distance of 𝐿1 , and thus it still obeys non-negativity and 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ) = 𝑇 2 𝛥𝑘𝑖𝑝 + (𝛥𝑏𝑖𝑝 + 𝛥𝜎𝑖𝑝 )𝑇 ,
2 2
symmetry. □ √
3 2𝜋
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) = 𝑇 2 𝛥𝑘𝑖𝑗 + (𝛥𝑏𝑖𝑗 + 𝛥𝜎𝑖𝑗 )𝑇 ,
Theorem 4.3 (Triangle Inequality). For any three granules 𝐺𝑖 , 𝐺𝑗 and 𝐺𝑝 , 2 2
satisfy 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) + 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑗 , 𝐺𝑝 ) ≥ 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ). The then
‘‘=’’ holds when 𝐺𝑗 = 𝐺𝑖 or 𝐺𝑗 = 𝐺𝑝 . 1
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ) − 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) = 𝑇 2 (𝛥𝑘𝑖𝑝 − 𝛥𝑘𝑖𝑗 ).
2
Proof. For any three particles 𝐺𝑖 , 𝐺𝑗 and 𝐺𝑝 , DTW-i𝐿1 obtains If |𝑘𝑖 − 𝑘𝑗 | ≤ |𝑘𝑖 − 𝑘𝑝 |, i.e. 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝 , then
𝑁𝐺𝑖 (𝑘′𝑖 , 𝑏′𝑖 , 𝜎𝑖′ , 𝑇𝑖′ ), 𝑁𝐺𝑗 (𝑘′𝑗 , 𝑏′𝑗 , 𝜎𝑗′ , 𝑇𝑗′ ) and 𝑁𝐺𝑝 (𝑘′𝑝 , 𝑏′𝑝 , 𝜎𝑝′ , 𝑇𝑝′ ) by equilib-
rium and adds an additional term 𝛥𝑘𝑇 2 to the original 𝐿1 distance to 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ) − 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 ) ≥ 0.
compute the distance, so we only need to show the triangular inequality Thus, for any 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝 ,
for 𝛥𝑘𝑇 2 , i.e.
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑝 ) ≤ 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝐺𝑗 )
𝛥𝑘′𝑖𝑝 𝑇 2 ≤ 𝛥𝑘′𝑖𝑗 𝑇 2 + 𝛥𝑘′𝑗𝑝 𝑇 2 .
holds. Similarly, the monotonicity of 𝑏, 𝜎, 𝑇 can be proved. □
Simplifying for
For this algorithm, first we improve the equal-size 𝐿1 Hausdorff
𝛥𝑘′𝑖𝑝 𝑇 2 𝑎𝑛𝑑 𝛥𝑘′𝑖𝑗 𝑇 2 + 𝛥𝑘′𝑗𝑝 𝑇 2 distance, which solves the original counterintuitive problem; then, we
introduced the DTW equalization algorithm to improve the unequal
yields
granule distances, making it more in line with trend patterns. Com-
𝛥𝑘′𝑖𝑝 𝑎𝑛𝑑 𝛥𝑘′𝑖𝑗 + 𝛥𝑘′𝑗𝑝 . bining the advantages of the 𝑖𝐿1 distance and the DTW equalization
algorithm, DTW-i𝐿1 can more accurately reflect the inter-granule sim-
There is
ilarity and is more practical. As shown in Table 2, the novel DTW-i𝐿1
|𝑘′𝑖 − 𝑘′𝑗 | + |𝑘′𝑗 − 𝑘′𝑝 | ≥ |𝑘′𝑖 − 𝑘′𝑗 + 𝑘′𝑗 − 𝑘′𝑝 | = |𝑘′𝑖 − 𝑘′𝑝 |, distance is more intuitive.

so 4.2. DTW-i𝐿1 Fuzzy C-means

𝛥𝑘′𝑖𝑝 𝑇 2 ≤ 𝛥𝑘′𝑖𝑗 𝑇 2 + 𝛥𝑘′𝑗𝑝 𝑇 2 ,
The FCM presents the fuzzy relationships of data by quantifying
and DTW-i𝐿1 satisfies the triangle inequality property. □ the memberships of data points to various clusters, making it more

8
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Table 2 Table 3
Distance Comparison of LFIGs. Comparison of Silhouette Coefficient and Dunn Index of granular time series.
𝐺1∗ , 𝐺2∗ 𝐺1∗ , 𝐺3∗ 𝐺2∗ , 𝐺3∗ Model Clusters:5 Clusters:6 Clusters:7 Clusters:8 Clusters:9 Clusters:10
𝐿1 68.77 50.91 75.03 Si Du Si Du Si Du Si Du Si Du Si Du
𝐷𝑇 𝑊 -𝑖𝐿1 93.77 225.91 275.03
OFCM 0.38 0.39 0.33 0.41 0.34 0.44 0.33 0.49 0.33 0.43 0.29 0.18
DFCM 0.42 0.44 0.44 0.59 0.45 0.94 0.45 0.73 0.44 0.68 0.41 0.71

Note: A value in bold is the optimal index.

suitable for comprehensively capturing the potential laws of complex
problems. DTW-i𝐿1 is a method of measuring the distance between
linear fuzzy information granules of equal or unequal size, which
more intuitively reflects the similarity between granules. Therefore, we
propose a DTW-i𝐿1 fuzzy c-means(DFCM) for clustering granules.
{ }
Suppose 𝐺𝑇 𝑆 = 𝐺1 , 𝐺2 , … .𝐺𝑁 denotes the granular time series to
be divided into c clusters. The algorithm minimizes the objective func-
tion 𝐽 of the FCM, which is defined as the sum of squared membership
weighted distances from the data point to the cluster center:
∑
𝑁 ∑
𝑐
2
𝐽= 𝑢𝑚
𝑖,𝑗 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝑉𝑗 ), (15)
𝑖=1 𝑗=1

accompanied by the following restrictions: Fig. 10. Clustering evaluation indicators. (a)Silhouette coefficient. (b)Dunn index.

∑
𝑐
𝑢𝑖,𝑗 = 1, 0 ≤ 𝑢𝑖,𝑗 ≤ 1, 𝑖 = 1, 2, … , 𝑁, (16)
𝑗=1 between the sample and its cluster to assess the accuracy of the clus-
where 𝑢𝑖,𝑗 denotes the membership of granule 𝐺𝑖 in cluster j, 𝑉𝑗 denotes tering. The dunn index calculates the ratio of the minimum inter-cluster
the clustering center of the 𝑗th cluster, m is a fuzzy weighting index, distance to the maximum intra-cluster compactness to quantify the
usually greater than 1, and 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑗 , 𝑉𝑖 ) is the distance between effectiveness of clustering. An aspect warranting accentuation is that
granule 𝐺𝑖 and clustering center 𝑉𝑗 . The computation of the member- the higher the values of silhouette coefficient and dunn index, the better
ship 𝑢𝑖,𝑗 and the clustering center 𝑉𝑗 follows the following formula: the clustering effect.
DFCM and OFCM clustering was performed on the same set of
∑𝑐
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝑉𝑗 ) 2 −1 grain time series, with the number of clusters set between 5 and 10.
𝑢𝑖,𝑗 = ( ( ) 𝑚−1 ) , 1 ≤ 𝑖 ≤ 𝑁, 1 ≤ 𝑗 ≤ 𝑐. (17) According to the evaluation indices in Table 3, the DFCM outperforms
𝑘=1
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝑉𝑘 )
∑𝑁 𝑚 the OFCM in both silhouette coefficient and dunn index, showing a
𝑖=1 𝑢𝑖,𝑗 𝐺𝑖 better performance. It is also clear from Fig. 10 that DFCM is superior
𝑉𝑗 = ∑ 𝑁 , 1 ≤ 𝑗 ≤ 𝑐. (18)
𝑚 to OFCM in terms of metrics. Thus, our study shows that the DFCM
𝑗=1 𝑢𝑖,𝑗
clustering method is more effective in classifying granular time series
The steps to implement the DFCM algorithm are as follows:
compared to OFCM.
(1) Step 1: Set the fuzzy weighting index 𝑚, the number of clusters 𝑐,
the threshold for iteration stopping 𝜖, the initial cluster centroid 5. Double-level optimal fuzzy association rules prediction model
value 𝑉𝑖 (𝑖 = 0, 1, 2....𝑐), the stochastic membership matrix 𝑈 and based on DFCM
the upper bound for the number of iterations 𝑚𝑎𝑥_𝑖𝑡𝑒𝑟.
(2) Step 2: Compute the 𝑐 clusters 𝑉𝑗 (𝑗 = 1, 2....𝑐) using Eq. (18).
When dealing with fuzzy association rules obtained from granular
(3) Step 3: Calculate the objective function 𝐽 according to Eq. (15). time series, different approaches may produce different predictive per-
If the advancement of the objective function relative to the last formance. Therefore, digging deeper into the intrinsic patterns of these
time is less than the threshold of 𝑒𝑝𝑠𝑖𝑙𝑜𝑛 or the maximum number rules is a problem for further research. In this regard, we first propose
of iterations is reached, the algorithm stops.
multilevel FARs optimization extraction based on DFCM algorithm in
(4) Step 4: Calculate the new 𝑈 matrix using Eq. (17). Return to step
Section 5.1, and then establish a double-level optimal fuzzy associa-
2.
tion rules prediction model based on DFCM in Section 5.2. Then, we
(5) Step 5: The maximum membership cluster of each granule will
introduce the entire short-term forecasting process in Section 5.3.
be considered as the final assigned cluster for that granule.

Theorem 4.5 (Convergence of DTW-i𝐿1 Fuzzy C-means). For any given 5.1. Multilevel FARs optimization extraction based on DFCM
initial value of a cluster center 𝑉 , the objective function 𝐽 of the DFCM will
definitely converge. Fuzzy inference systems (FIS) are constructed using a collection
of 𝑖𝑓 − 𝑡ℎ𝑒𝑛 rules and logical operations, where the 𝑖𝑓 part and the
Proof. Since the FCM algorithm using other distance metrics, such 𝑡ℎ𝑒𝑛 part are called the antecedent and the consequent of the rule,
as Euclidean distance and Manhattan distance, is convergent (Saha & respectively. Specifically, FAR can be abbreviated as 𝐴𝑖−1 → 𝐴𝑖 , where
Das, 2018). So according to the equivalence of distance metric, the the antecedent and consequent parts of the association rule are linear
DFCM algorithm using the DTW-𝑖𝐿1 distance metric is also guaranteed granules within the corresponding time window. Different association
to converge. □ rules have varying effects on the prediction process. It is crucial to
To demonstrate the superiority of DFCM, we compare DFCM with select appropriate association rules and accurately quantify their role
original distance-based FCM (OFCM). We introduced the silhouette in prediction to enhance the accuracy and logic of prediction. In this re-
coefficient(Si) and the dunn index(Du) as metrics to evaluate clustering gard, we consider a set of FARs consisting of 𝑁 granules, 𝐴1 , 𝐴2 , … , 𝐴𝑁 .
performance(Hartama & Anjelita, 2022; Ncir, Hamza, & Bouaguel, We propose a multilevel FARs optimization extraction based on DFCM
2021). The silhouette coefficient calculates the degree of correlation for extracting optimal rules of FARs hierarchically.

9
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Definition 4 (Multilevel FARs Optimization Extraction based on DFCM, Similarly, we can get the prediction results of 𝐴𝑁 in all clusters,
{ }
𝑀𝐹 𝑂𝐸). For a given sequence 𝐺𝑇 𝑆 = 𝐴1 , 𝐴2 , … , 𝐴𝑁 , we construct
{ } i.e. 𝐹 𝐴𝑅𝑠𝑜𝑝𝑡𝑖𝑐𝑎𝑙 is
ordinary FARs to obtain 𝐴1 → 𝐴2 , 𝐴2 → 𝐴3 , … , 𝐴𝑁−1 → 𝐴𝑁 . Next, { }
we set the number of DFCM clusters to c, perform DFCM clustering on 𝑟𝑢𝑙𝑒1𝑜𝑝𝑡𝑖𝑐𝑎𝑙 , 𝑟𝑢𝑙𝑒2𝑜𝑝𝑡𝑖𝑐𝑎𝑙 , … , 𝑟𝑢𝑙𝑒𝑐𝑜𝑝𝑡𝑖𝑐𝑎𝑙
{ }
GTS, and then construct FARs within each cluster, obtaining c clusters’ = 𝐴1𝑖1 → 𝐴𝑖1+1 , 𝐴2𝑖2 → 𝐴𝑖2+1 , … , 𝐴𝑐𝑖𝑐 → 𝐴𝑖𝑐+1 ,
FARs
{ } where 𝐴1𝑖1 is the most similar granule to 𝐴𝑁 in cluster 1, and 𝑖1 is the
𝐹 𝐴𝑅𝑠1 = 𝐴1𝑖 → 𝐴𝑖1 +1 , … , 𝐴1𝑖 → 𝐴𝑖𝑛 +1 , … , index and different from 𝑖1 .
1 𝑛1 1
{ }
𝐹 𝐴𝑅𝑠𝑐 = 𝐴𝑐𝑗 → 𝐴𝑗1 +1 , … , 𝐴𝑐𝑗 → 𝐴𝑗𝑛 +1 .
1 𝑛𝑐 𝑐 Remark 3. Since the size of each granule is unequal, in order to make
Assuming the prediction is executed in the (𝑁 + 1)th segment, we the prediction size equal, the prediction result 𝐴𝑖+1 size is uniformly
discretize the antecedent 𝐴𝑁 into c-clusters based on the DFCM, i.e. used as the mean size of the granules in this paper.
⎧𝐴 1 , 𝑢 For any current linear trend granule, MFOE’s goal is to select the
𝑁,1
⎪ 𝑁
⎪𝐴2𝑁 , 𝑢𝑁,2 optimal association rule for each level from the FARs of all levels, and
𝐴𝑁 =⎨
the selection criterion is the optimal value of the match between the
⎪...
⎪𝐴 𝑐 , 𝑢 current granule and the antecedent granule of each FAR in the level,
⎩ 𝑁 𝑁,𝑐
which is determined based on the DTW-i𝐿1 distance. Based on this,
where 𝑐 represents the number of clusters, 𝐴𝑘𝑁 denotes that 𝐴𝑁 is in we propose a Multilevel FARs Optimization Extraction based on DFCM
the 𝑘th cluster, 𝑘 ∈ [1, 𝑐], and 𝑢𝑁,𝑘 denotes the membership of 𝐴𝑁 in Algorithm(Algorithm 2), which outputs optimal rules based on 𝐴𝑁 for
cluster 𝑘. We define Multilevel FARs Optimization Extraction based on all levels.
DFCM as extracting an optimal FAR of 𝐴𝑁 in FARs of each cluster
through 𝐴𝑖𝑁 (𝑖 = 1, 2, … , 𝑐), resulting in c multilevel optimal rules Theorem 5.1 (Convergence of Multilevel FARs Optimization Extraction
{ 1 } { }
𝐴𝑖1 → 𝐴𝑖1+1 , … , 𝐴𝑐𝑖𝑐 → 𝐴𝑖𝑐+1 , denoted as based on DFCM). For a given granular time series 𝐺𝑇 𝑆 = 𝐴1 , 𝐴2 , … , 𝐴𝑁 ,
{ } the result of the algorithm will converge to a set of optimal 𝐹 𝐴𝑅𝑠 =
𝐹 𝐴𝑅𝑠𝑜𝑝𝑡𝑖𝑐𝑎𝑙 = 𝑟𝑢𝑙𝑒1𝑜𝑝𝑡𝑖𝑐𝑎𝑙 , 𝑟𝑢𝑙𝑒2𝑜𝑝𝑡𝑖𝑐𝑎𝑙 , … , 𝑟𝑢𝑙𝑒𝑐𝑜𝑝𝑡𝑖𝑐𝑎𝑙 . { 1 }
𝐴𝑖1 → 𝐴𝑖1+1 , 𝐴2𝑖2 → 𝐴𝑖2+1 , … , 𝐴𝑐𝑖𝑐 → 𝐴𝑖𝑐+1 , where 𝑐 is the number of
Specifically, we will illustrate this with the example of 𝐴𝑘𝑁 (0 ≤ 𝑘 ≤ DFCM and 𝑖1 is the index.
𝑐). The set of granules of cluster 𝑘 is expressed as { }
{ } Proof. For a given granular time series 𝐺𝑇 𝑆 = 𝐴1 , 𝐴2 , … , 𝐴𝑁 ,
𝐶𝑘 = 𝐴𝑘𝑗 | 𝑗𝑖 ∈ [1, 𝑁] , obviously, the number of elements in each cluster after FCM must be
𝑖
greater than or equal to 1. According to the formula
where 𝐶𝑘 denotes the set of granules belonging to the 𝑘th cluster, and
𝐴𝑘𝑗 denotes the 𝑖th granule in the 𝑘th cluster, with subscript 𝑗𝑖 . Forms ⎧ 𝑘
𝑖
of FIS of cluster 𝑘 are as below: ⎪𝐴 → 𝐴𝑗𝑞 +1
𝑟𝑢𝑙𝑒𝑘𝑜𝑝𝑡𝑖𝑐𝑎𝑙 = ⎨ 𝑗𝑞 ,
′ ′ 𝑘 𝑘
Rule 1 : if 𝐴𝑘𝑁 is 𝐴𝑘𝑗 , then 𝐴𝑘𝑁+1 is 𝐴𝑗1 +1 ⎪𝜔𝑞 = 𝑎𝑟𝑔𝑚𝑎𝑥 𝜔𝑖 (𝐴𝑁 , 𝐴𝑗𝑖 ), 𝑖 ∈ [1, 𝑝]
1
⎩
Rule 2 : if 𝐴𝑘𝑁 is𝐴𝑘𝑗 , then 𝐴𝑘𝑁+1 is 𝐴𝑗2 +1
2
⋮ there will definitely be an optimal granule in each cluster, thereby
Rule 𝑛𝑘 : if 𝐴𝑘𝑁 is 𝐴𝑘𝑗 , then 𝐴𝑘𝑁+1 is 𝐴𝑗𝑛 +1 forming an optimal FAR belonging to that cluster. Therefore, this
𝑛𝑘 𝑘
Observation : 𝐴𝑘𝑁 is 𝐴𝑘𝑗 algorithm is guaranteed to converge. □
Conclusion : 𝐴𝑁+1 𝑖𝑠 𝐴𝑗+1
The input to the FIS takes the form of the last granule of a granular
time series, i.e. 𝐴𝑘𝑁 = 𝐴𝑘𝑗 . For a fuzzy rule 𝑖, if the premise Algorithm 2 Multilevel Optimization Extraction based on DFCM.
{ }
Input: Granular Time Series 𝐺𝑇 𝑆 𝐴1 , 𝐴2 , ..., 𝐴𝑁 , the number of clusters 𝑐
𝐴𝑘𝑁 𝑖𝑠 𝐴𝑘𝑗 , Output: 𝐹 𝐴𝑅𝑠 𝑜𝑝𝑡𝑖𝑐𝑎𝑙
𝑖
1: Each granule 𝐴𝑖 in 𝐺𝑇 𝑆 is DFCM clustered.
is true to certain degree, which is defined by a matching degree
2: Each granule 𝐴𝑖 gets a different membership to different clusters.
between observations 𝐴𝑘𝑁 and the antecedent 𝐴𝑘𝑗 , i.e.
3: The maximum membership cluster is taken as its attributed cluster.
𝜔′𝑖 = 𝜔′𝑖 (𝐴𝑘𝑁 , 𝐴𝑘𝑗 ), 4: for cluster 𝑖 in all clusters:
𝑖 5: for 𝐴𝑖𝑗 in cluster 𝑖:
then the conclusion part of rule 𝑖 6: Building the 𝐴𝑖𝑗 → 𝐴𝑗+1 rule.
7: endfor
𝐴𝑁+1 = 𝐴𝑗𝑖 +1 , 8: endfor
9: for cluster 𝑖 in all clusters:
applies as well. Based on this, the optimal rule of 𝐴𝑁 in class k is
10: for 𝐴𝑖𝑗 in cluster 𝑖:
selected from the fuzzy rules of this cluster 𝑟𝑢𝑙𝑒 1, 𝑟𝑢𝑙𝑒 2, … , 𝑟𝑢𝑙𝑒 𝑛𝑘 , 11:
′
Calculate the similarity 𝜔𝑖 (𝐴𝑁 , 𝐴𝑖𝑗 )
which are chosen from among them, as shown in the following formula: 12: Extract the rule with the highest similarity to 𝐹 𝐴𝑅𝑠𝑜𝑝𝑡𝑖𝑐𝑎𝑙
⎧ 𝑘 13: endfor
⎪𝐴 → 𝐴𝑗𝑞 +1 14: endfor
𝑟𝑢𝑙𝑒𝑘𝑜𝑝𝑡𝑖𝑐𝑎𝑙 = ⎨ 𝑗𝑞 (19) 15: Return 𝐹 𝐴𝑅𝑠𝑜𝑝𝑡𝑖𝑐𝑎𝑙
′ ′ 𝑘 𝑘
⎪𝜔𝑞 = 𝑎𝑟𝑔𝑚𝑎𝑥 𝜔𝑖 (𝐴𝑁 , 𝐴𝑗𝑖 ), 𝑖 ∈ [1, 𝑛𝑘 ]
⎩
where 𝜔′𝑖 (𝐴𝑘𝑁 , 𝐺𝑗𝑘 ) is the matching degree of the observation 𝐴𝑘𝑁 with 5.2. Double-level optimal Fuzzy association rules prediction model based on
𝑖
the antecedent 𝐴𝑘𝑗 of the 𝑟𝑢𝑙𝑒 𝑖. Obviously, the larger the distance DFCM
𝑖
between the two granules, the smaller the similarity between the two
granules. Therefore, 𝜔′𝑖 (𝐴𝑘𝑁 , 𝐴𝑘𝑗 ) can be expressed in this form:
𝑖 On the basis of the MFOE algorithm, we construct a double-level
1 optimal fuzzy association rules prediction model based on DFCM for
𝜔′𝑖 (𝐴𝑘𝑁 , 𝐴𝑘𝑗 ) = . (20)
𝑖 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐴𝑘𝑁 , 𝐴𝑘𝑗 ) short-term forecasting.
𝑖

10
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 11. Compare participating with different clusters on US daily consumptive load.(a)one top cluster. (b)double top clusters. (c)three top clusters. (d)all clusters.

Fig. 12. Compare participating with different clusters on ZhenJiang power consumption.(a)one top cluster. (b)double top clusters. (c)three top clusters. (d)all clusters.

{ }
For a given 𝐺𝑇 𝑆 𝐴1 , 𝐴2 , … , 𝐴𝑁 by DFCM clustering into c clus- FARs decreases and their probability of being noise increases. And the
ters, the MFOE algorithm is used to generate the optimal association presence of noise can bias the predicted trend, so we believe that more
rule accurate predictions are produced when noise is minimized.
{ } We performed three sets of experimental data validation, and
𝐹 𝐴𝑅𝑠𝑜𝑝𝑡𝑖𝑐𝑎𝑙 = 𝐴1𝑖1 → 𝐴𝑖1+1 , 𝐴2𝑖2 → 𝐴𝑖2+1 , … , 𝐴𝑐𝑖𝑐 → 𝐴𝑖𝑐+1 .
Figs. 11, 12, and 13 show four comparisons of three different data
Select top 𝑁𝑚 available association rules for prediction from 𝐹 𝐴𝑅𝑠𝑜𝑝𝑡𝑖𝑐𝑎𝑙 under the same conditions, respectively. The number of clusters for the
based on the descending order of membership, and suppose 𝑁𝑚 avail- three datasets is set to 8. We considered the results for one optimal
able association rules are cluster, double-level optimal clusters, three-level optimal clusters and
{ 𝑘 𝑘𝑁
} all clusters. It can be seen that when an optimal cluster is selected for
𝑘
𝐴𝑖𝑘1 → 𝐴𝑖𝑘1 +1 , … , 𝐴𝑖𝑘𝑗 → 𝐴𝑖𝑘𝑗 +1 , … , 𝐴𝑖𝑘 𝑚 → 𝐴𝑖𝑘𝑁 +1 ,
1 𝑗 𝑁𝑚 𝑚 prediction, there is a deviation between the predicted and actual re-
1 ≤ 𝑘𝑗 ≤ 𝑐, 1 ≤ 𝑗 ≤ 𝑁𝑚 , 𝑖𝑘𝑗 < 𝑁. sults, and this deviation can be clearly seen from their regression lines.
When the number of clusters is 2, the results improve significantly;
The prediction is expressed in the following equation, when the number of clusters is increased to 3, the prediction bias begins
𝑁𝑚
∑ to appear; when all clusters are considered, it is clear that the presence
𝑥∗ (𝑞) = 𝑤𝑗 ∗ 𝐴𝑖𝑘𝑗 +1 (𝑞), 1 ≤ 𝑞 ≤ 𝑇 + 1 (21) of noise causes the prediction bias to increase significantly, and the
𝑗=1 prediction tends toward a flat and sluggish trend. Therefore, we believe
𝑢𝑁,𝑘𝑗 that the prediction is better when 𝑁𝑚 is 2, and propose the double-level
where 𝑤𝑗 = ∑𝑁𝑚 , 𝑢𝑁,𝑘𝑗 is the membership of 𝐴𝑁 in the 𝑘𝑗 th cluster
𝑢
𝑘=1 𝑁,𝑘𝑗 optimal fuzzy association rules prediction model based on DFCM.
and 𝑁𝑚 > 1 is known as the multilevel fuzzy number, which indicates The double-level optimal fuzzy association rules prediction model
the number of selected clusters for prediction, 𝑇 is the mean length of based on DFCM is expressed in the following equation,
all granules.
The size of the number of clusters 𝑁𝑚 used for prediction is related 𝑥∗ (𝑞) = 𝑤1 ∗ 𝐴𝑖𝑘1 +1 (𝑞) + 𝑤2 ∗ 𝐴𝑖𝑘2 +1 (𝑞), 1 ≤ 𝑞 ≤ 𝑇 + 1 (22)
to the complexity and generality of the model. In general, the size 𝑢𝑁,𝑘
1
where 𝑤1 = , 𝑢𝑁,𝑘1 is the membership of 𝐴𝑁 in the 𝑘1 th
of 𝑁𝑚 will vary depending on different rule styles and data condi- 𝑢𝑁,𝑘 +𝑢𝑁,𝑘
1 2
tions to achieve the optimal prediction result. According to Occam’s cluster, 𝑇 is the mean length of all granules.
razor, the model should be kept as simple as possible to avoid over- To clearly illustrate the performance of our model, we introduce the
parameterization and to ensure that the model performs well in diverse following three evaluation metrics.
and intricate scenarios. So we are looking for a 𝑁𝑚 with universal and
(1) Root Mean Square Error (RMSE):
generalized properties. √
In the current rule-based forecasting, if too many FARs are intro- ∑𝑛 ∗ 2
𝑡=1 (𝑥 (𝑡) − 𝑥(𝑡))
duced, the forecasting results may tend to flatten out, making the RMSE = (23)
𝑛
forecasts stagnant. Conversely, if only a single optimal FAR is consid-
ered, important information may be lost, leading to instability in the (2) Mean Absolute Percentage Error (MAPE):
prediction. Thus, among the rules generated by the MFOE algorithm,
1 ∑ |𝑥∗ (𝑡) − 𝑥(𝑡)|
𝑛
the best prediction rules tend to be distributed among the optimal MAPE = × 100 (24)
𝑁𝑚 (1 < 𝑁𝑚 ≤ 𝑐) clusters. We take the rule of an optimal cluster 𝑛 𝑡=1 𝑥(𝑡)
as the basic prediction rule. When 𝑁𝑚 increases to 2, the probability
(3) Mean Absolute Error (MAE):
that a added cluster FAR is available is the highest, which also means
1 ∑| ∗
𝑛
that it is the least likely to act as a rule that interferes with the
MAE = 𝑥 (𝑡) − 𝑥(𝑡)|| (25)
prediction, i.e., noise. As 𝑁𝑚 increases, the value of the added cluster 𝑛 𝑡=1 |

11
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 13. Compare participating with different clusters on a power load of a Chinese mathematical modeling.(a)one top cluster. (b)double top clusters. (c)three top clusters. (d)all
clusters.

(2) Part 2. Clustering granular time series by DFCM.

Using DFCM clustering method, the granules 𝐴1 , 𝐴2 , … , 𝐴𝑁 were
clustered to c clusters. The maximum membership cluster of
each granule is considered as the final belonging cluster, then
{ } { }
𝐺𝑇 𝑆 𝐴1 , 𝐴2 , … .𝐴𝑁 ∈ 𝐶 𝐶1 , 𝐶2 , … .𝐶𝑐 ,
∑
where 𝑐𝑖=1 𝐶𝑖 = 𝐺𝑇 𝑆. The step is given in Section 4.
(3) Part 3. Constructing FARs and extracting multilevel optimal
rules for 𝐴𝑁 .
The granular fuzzy logic relation is of the form 𝐴𝑖 → 𝐴𝑖+1 , where
𝐴(𝑡 − 1) = 𝐴𝑖 , 𝐴(𝑡) = 𝐴𝑖+1 . Based on this, we construct the fuzzy
association rule
{ }
𝐴1 → 𝐴2 , 𝐴2 → 𝐴3 , … , 𝐴𝑁−1 → 𝐴𝑁 .

Taking the prediction of 𝑥𝑚 as an example, let 𝑚 = 𝑇 ∗ 𝑁 + 𝑗, 1 ≤

𝑗 ≤ 𝑇 + 1, which implies that 𝑥𝑚 falls in the (𝑁 + 1)th interval,
Fig. 14. Framework of the entire short-term forecasting. i.e., the prediction of the (𝑁 + 1)th interval. In this case, we say
that 𝑥𝑚 corresponds to the (𝑁 + 1)th window. So the granule
𝐴𝑁 is used as the prediction antecedent, which is discretized
where 𝑥(𝑡) is the actual value and 𝑥∗ (𝑡)(𝑡 = 1, 2, … , 𝑛) is the correspond- into 𝑐 clusters, and for fuzzy association rules of 𝐴𝑁 in each
ing predicted value. It merits underscoring that lesser values of the cluster, the multilevel optimal rule of 𝐴𝑁 is extracted by the
RMSE, MAPE, and MAE variables connote heightened fidelity in the MFOE algorithm, obtaining
forecasting endeavor. { }
𝐹 𝐴𝑅𝑠𝑐 = 𝐴1𝑖1 → 𝐴𝑖1+1 , 𝐴2𝑖2 → 𝐴𝑖2+1 , … , 𝐴𝑐𝑖𝑐 → 𝐴𝑖𝑐+1 .

5.3. The entire short-term forecasting process The step is given in Section 5.
(4) Part 4. Double-level optimal fuzzy association rules prediction
The short-term forecasting framework is shown in Fig. 14, and to model based on DFCM for short-term forecasting.
understand the model clearly, the following steps have been devised to Select two rules based on membership. Suppose the first two
describe it in detail. clusters with the largest membership in MFOE are cluster 1 and
To ensure prediction quality, it is necessary to pre-tune the relevant cluster 3, then the predicted double-level available association
hyperparameter settings to enable the model to perform optimally in rules are 𝐴1𝑖1 → 𝐴𝑖1+1 and 𝐴3𝑖3 → 𝐴𝑖3+1 , and the final prediction
predictions. In order to determine the hyperparameters of the model, result of 𝑥∗𝑚 is
the training set is divided into two parts: a training subset and a testing
subset, with the latter being an average granule size. On the testing 𝑥∗𝑚 = 𝜔1 𝐴𝑖1+1 (𝑗) + 𝜔3 𝐴𝑖3+1 (𝑗), 1 ≤ 𝑗 ≤ 𝑇 + 1
subset, we evaluate the performance of varying cluster numbers on 𝑢𝑁,1 𝑢𝑁,3
where 𝜔1 = , 𝜔3 = . The step is given in
prediction, with each number of cluster predicted repeatedly. Based on 𝑢𝑁,1 +𝑢𝑁,3 𝑢𝑁,1 +𝑢𝑁,3
three evaluation indicators, we select the optimal number of clusters Section 5.
and initial cluster center, and apply them to the subsequent actual
The proposed method is an iterative prediction model. In each
prediction process of future data.
prediction iteration, it first makes a prediction based on the existing
(1) Part 1. Linear granulation based data, and then incorporates the prediction result into the original data
{ on breakpoints
} and get LFIGs.
For given a time series 𝑋 = 𝑥1 , 𝑥2 , … , 𝑥𝑛 , we firstly get the to form an extended new data as the input for the next prediction. Due
breakpoints of the time series. Based on the breakpoints, 𝑋 is to the introduction of new data, the breakpoints, granular series, and
divided into 𝑁 subintervals to obtain 𝑁 subsequences. Each sub- granule clustering results will change in each iteration. Consequently,
sequence is granulated into one LFIG by Eq. (3). Finally we get the last LFIG used as the prediction antecedent will also change, result-
the granular time series ing in different multilevel optimal FARs. This process will continue,
{ } with Part 1 to 4 being executed in a loop until the preset predicted
𝐺𝑇 𝑆 𝐴1 , 𝐴2 , … , 𝐴𝑁 .
horizon is reached. The hyperparameters for breakpoints and clustering
The step is given in Section 3. remain unchanged to minimize the differences between iterations.

12
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

where 𝛷𝑝 (𝐵) is a non-seasonal autoregressive polynomial, 𝛷𝑃 (𝐵 𝑠 )

is seasonal autoregressive polynomial, 𝜃𝑞 (𝐵) is non-seasonal
moving average polynomial, 𝛩𝑄 (𝐵 𝑠 ) is seasonal moving average
polynomial, ∇𝑑 ∇𝐷 𝑠 𝑦𝑡 is the seasonally differenced series, 𝐵 is the
back-shift operator, and 𝑒𝑡 is random error.
(6) LSTM: It is a special recurrent neural network, which consists of
memory units, gating units and hidden states, etc. Through the
mechanism of gates, it effectively solves the gradient problem
on long sequences, and is often used to process and predict time
series data.

6.2. CA-Births

To evaluate the effectiveness of our prediction model, we conducted

experiments on the time series presented in Fig. 15(a). We used the first
310 data as the training set and the last 30 data as the testing set. Before
conducting the experiments, we applied the Savitzky–Golay filter to the
data, which could reduce noise and enhance data quality, thus ensuring
the accuracy of the experimental results. For a fair comparison with
six different models, the Savitzky–Golay filtering parameters window
length and order were set to 7 and 2, and the rest of the experiments
are similar.
Fig. 15. Four time series.(a)CA-Births. (b)MT. (c)SMN. (d)USDD. One should underscore that the prediction model based on the if-
then rule in this paper is different from the neural network models such
as LSTM. The LSTM and SVR in the compared models have a look-back
6. Experimental studies value parameter, which is usually determined based on the period of
the data, but the proposed model does not have an explicitly specified
The proposed granule prediction model is used to predict California look-back parameter, but rather uses the last LFIG of the data as an
female daily births (CA-Births), Melbourne temperature (MT) (Yang antecedent in the if-then to search for its multilevel optimal FARs for
et al., 2017), monthly sunspot number (SMN) (Li et al., 2021) and U.S. the prediction, so the look-back can be regarded as the last size of the
electricity daily demand(USDD) as shown in Fig. 15. We performed LFIG.
an empirical analysis to confirm the preliminary effectiveness of the For a comprehensive comparative analysis, we set the prediction
proposed model through a comparison of predictive results with six range 𝐿 to 10, 20 and 30. Taking the prediction of data point 𝑥312 (312 =
alternative models presented in Section 6.1. Finally, in Section 6.6, 𝑁 ∗ 𝑇 + 2) as an example, we can decompose the prediction process of
to further validate the effectiveness of the model, further comparative 𝑥312 into the following key steps.
experiments were conducted on the M3 Competition(3003 series), M4
Competition(100,000 series), and M5 Competition time series datasets (1) Step 1. Linear granulation based on breakpoints and get LFIGs.
(MC345). For the time series of CA-Birth data, the PSO is used to optimize
the segmentation of breakpoints. The optimal breakpoints are
6.1. Comparison models achieved by setting the swarm size to 10, the inertia weighting
coefficient to 0.5, and both acceleration constants to 1, with 100
The following six classic models are selected in this paper. iterations. The range of stable trend time span threshold 𝑘 is set
2 and 5, and the range of fast uptrend threshold 𝑝 is set 0.5 and
(1) LFIG-based prediction model: All association rules play a pre- 1.5. Finally we got the optimal breakpoints, {0, 2, 6, … , 305, 309}.
dictive role, the only difference being their degree of influence, The time series is segmented using breakpoints into multiple
which is determined by the similarity of the rules’ antecedents sub-sequences and granulated into LFIG by applying Eq. (3) to
to the current data. generate the granular time series
(2) LGKP: Using k-medoids clustering, available association rules { }
with high similarity to the current data antecedents are selected 𝐺𝑇 𝑆 = 𝐴1 , 𝐴2 , … ..𝐴68 .
to improve the diversity and accuracy of the predictions.
(2) Step 2. Clustering granular time series by DFCM.
(3) Rational granulation prediction based on FCM(RGP): A refined
The granular time series 𝐺𝑇 𝑆 is clustered based on DTW-i𝐿1
fuzzy c-means approach is introduced, which eliminates the
distance, and the number of clusters is set to 8. After clustering
requirement for uniform dimensionality across the data, and
model is built by combining the improved fuzzy cluster and { } { } ∑8
𝐴1 , 𝐴2 , … , 𝐴68 ∈ 𝐶 𝐶1 , 𝐶2 , … , 𝐶8 , 𝐶𝑖 = 𝐺𝑇 𝑆.
information granulation.
𝑖=1
(4) SVR: The goal of SVR is to find a regression function 𝑓 (𝑥) that fits
the training data with the lowest possible generalization error (3) Step 3. Constructing FARs and extracting multilevel optimal
rules for 𝐴68 .
𝑓 (𝑥) = ⟨𝑤, 𝜙(𝑥)⟩ + 𝑏 (26) 𝑥312 is within the 69th interval. So our goal is to forecast the
where 𝜙(𝑥) maps 𝑥 into a higher dimensional feature space, 𝑤 69th interval. The 68th granule 𝐴68 is used as the prediction
is the parameter vector to be optimized, 𝑏 is the bias term, and antecedent, and 𝐴68 is discretized into 8 clusters based on the
⟨𝑤, 𝜙(𝑥)⟩ denotes the inner product of 𝑤 and 𝜙(𝑥). membership, and the optimal association rules for the 8 levels
(5) SARIMA: A model for processing time series data with seasonal are obtained by the MFOE algorithm
patterns and the expression is {
𝐹 𝐴𝑅𝑠8 = 𝐴51 → 𝐴2 , 𝐴864 → 𝐴65 , 𝐴28 → 𝐴9 , 𝐴158 → 𝐴59 ,
}
𝜙𝑝 (𝐵)𝛷𝑃 (𝐵 𝑠 )∇𝑑 ∇𝐷 𝑠
𝑠 𝑦𝑡 = 𝜃𝑞 (𝐵)𝛩𝑄 (𝐵 )𝑒𝑡 (27) 𝐴463 → 𝐴64 , 𝐴757 → 𝐴58 , 𝐴342 → 𝐴43 , 𝐴656 → 𝐴57 .

13
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959

Fig. 16. Evaluations of predictive precision for CA-Births. (a) Comparisons of RMSE, MAE and MAPE. (b) Comparisons of 𝑑𝑖 .

Table 4
Comparison of RMSE, MAE and MAPE of CA-Births time series.
Model Forecasting horizon:10 Forecasting horizon:20 Forecasting horizon:30
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LFIG 5.18 4.60 11.61 4.48 3.78 9.15 5.96 5.02 12.27
LGKP 4.64 3.97 9.41 6.20 5.45 12.07 5.96 5.22 11.65
RGP 12.08 11.28 28.70 9.48 8.00 19.58 10.29 8.81 21.79
SVR 5.95 4.67 12.23 4.71 3.43 8.64 7.02 4.99 12.76
SARIMA 5.74 5.19 13.28 4.84 4.01 9.81 5.14 4.27 10.60
LSTM 6.27 5.26 13.69 5.38 4.58 11.08 5.08 4.33 10.50
Proposed 2.94 2.30 5.98 2.94 2.40 5.71 2.90 2.45 5.83