Double Level Optimal Fuzzy Association Rules Prediction - 2024 - Expert Systems
Double-level optimal fuzzy association rules prediction model for time series
based on DTW-i𝐿1 fuzzy C-means
Sidong Xian a,b,∗, Chaozheng Li a, Miaomiao Feng b, Yonghong Li b
a Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China
b Key Laboratory of Intelligent Analysis and Decision on Complex Systems, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China
Keywords: Granular time series; Linear fuzzy information granules; Breakpoint; Granule-suited fuzzy c-means; Short-term forecasting

Abstract: Information granulation theory has been widely used in short-term time-series forecasting research and holds significant weight. However, the error accumulation due to the lack of granular accuracy, along with information redundancy or deficiency in predictions, significantly affects short-term prediction accuracy. To compensate for these shortcomings, this paper proposes a double-level optimal fuzzy association rules prediction model for short-term time-series forecasting, which can strengthen the performance of information granulation in prediction. Firstly, this paper proposes a concept of breakpoints, which can accurately segment complex linear trends in time series and thus obtain a granular time series with highly accurate linear fuzzy information granules (LFIGs). Secondly, an improved distance is proposed to more accurately reflect the similarity between LFIGs by addressing counter-intuitive problems in the original distance. Theoretical analysis shows that the improved distance can effectively reduce errors in granular calculation. Then, a granule-suited fuzzy c-means algorithm is proposed for clustering LFIGs. Finally, this paper proposes a double-level optimal fuzzy association rules prediction model, which establishes the optimal rules for each cluster and selects the optimal two rules for prediction by the contribution of the clusters. The experimental results show that the prediction method effectively avoids the problems of information redundancy and information deficiency, and increases forecast accuracy. The model's exceptional performance is demonstrated through comparative analysis with existing models in experimental investigations.
∗ Corresponding author at: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing
400065, PR China.
E-mail addresses: [email protected] (S.D. Xian), [email protected] (C.Z. Li), [email protected] (M.M. Feng), [email protected] (Y.H. Li).
https://doi.org/10.1016/j.eswa.2024.123959
Received 28 November 2023; Received in revised form 9 March 2024; Accepted 8 April 2024
Available online 21 April 2024
0957-4174/© 2024 Elsevier Ltd. All rights reserved.
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959
functions that can be selected for different data characteristics (Fan & Sharma, 2021). However, SVR is highly sensitive to parameters, requires precise tuning, and may face high computational complexity when dealing with large-scale datasets (Guajardo, Weber, & Crone, 2010; Li, Fang, & Liu, 2018). Neural network models are widely used for prediction, such as long short-term memory networks (LSTM), gated recurrent units (GRU) and convolutional neural networks (CNN). They have strong feature extraction and generalization capabilities and do not require many assumptions about the data (Ma, Tao, Wang, Yu, & Wang, 2015; Niu, Yu, Tang, Wu, & Reformat, 2020; Zhang & Dong, 2020). Among them, LSTM and its various improved versions (Fang, Ma, Pan, Yang, & Arce, 2023; Lihong & Qian, 2020; Zha, Liu, Wan, Luo, Li, Yang, & Xu, 2022) have shown excellent performance in dealing with highly temporally correlated data, and have become the preferred forecasting models in the current era of deep learning. However, these models have the disadvantages of being data-dependent, parameter-sensitive and poorly interpretable, and are unable to provide generalized solutions in all cases. For short-term forecasts, the accumulation of errors in the one-step forecasts of these models remains problematic and does not allow a good fit for future trends.

In order to make forecasts more relevant to actual trends, more accurate capture and modeling of trend information is essential. Some time series models based on fuzzy information granules (FIGs) (Wang, Liu, & Chen, 2020; Wang, Liu, Pedrycz, & Shao, 2014; Wang, Pedrycz, & Liu, 2015; Zhao, Han, Pedrycz, & Wang, 2015) have been proposed, which show better stability in short-term forecasting. FIGs are formed by extracting essential information from data; they can better handle uncertainty and ambiguity through fuzzy set theory and linguistic descriptions, and their inherent fuzziness makes them more robust to missing and noisy data. The model constructed by FIGs is simpler and has a shorter reliance on historical data, which helps to reduce cumulative errors in model predictions. Meanwhile, this method can extract trends and correlations from the data, making it more sensitive to recent changes in patterns and enabling more accurate predictions in the short term (Yang, Yu, & Pedrycz, 2017). In contrast, neural network models such as LSTM have a strong dependence on specific historical values, so they perform poorly at predicting flexibly. In addition, LSTM models are more sensitive to data noise and anomalies (Liu, Xiang, & Elangovan, 2023). Training the model demands an extensive volume of data (Cao, Zhu, Sun, Wang, Zheng, Xiong, Hou, & Tian, 2018; Qian & Chen, 2019), and if the available data is limited, model performance will be poor. So models based on FIGs work better in short-term forecasting. These methods help to reduce error accumulation by constructing fuzzy sets with equal-size granules to express the overall trend of the time series (Li, Yang, Yu, Wang, & Wang, 2019; Lu, Pedrycz, Liu, Yang, & Li, 2014; Yang et al., 2017). Since these equal-size FIGs are limited by the fixed-size property, they cannot accurately represent trend information of different sizes and thus suffer from a lack of precision. This also makes it difficult for them to meet strict requirements in semantic representation. Therefore, these equal-size FIG models still need to be improved in terms of accuracy and trend representation ability.

To precisely grasp the features of the time series and enhance predictive precision, many scholars have established unequal-size FIGs, which are mainly divided into non-linear trend FIGs and linear trend FIGs. For non-linear FIGs, Xian et al. (Xian, Feng, & Cheng, 2023) added time-varying core lines to generate non-linear trend granules based on linear trend granules, combined with DeepAR networks for forecasting. Cheng et al. (Cheng, Xing, Pedrycz, Xian, & Liu, 2023) constructed a nonlinear trend fuzzy granulation method for dynamic traffic flow prediction. Wang et al. (Wang et al., 2015) combined non-linear trend granulation with improved fuzzy c-means (FCM) for forecasting. For linear FIGs, Li et al. (Li et al., 2021) constructed a multilinear trend fuzzy information granulation for prediction using linear information granules of unequal sizes. Yang (Yang et al., 2022) and Duan et al. (Duan, Yu, Pedrycz, Wang, & Yang, 2018) clustered linear information granules by introducing an improved FCM and improved distance formulas, respectively. All the models mentioned above rely on time series granulation with unequal sizes, which can well characterize the trend in each granular datum.

Currently, 𝑙1 trend filtering (Li et al., 2021; Yang et al., 2022) and turning points (Chen & Gao, 2020; Yin, Si, & Gong, 2011) are the main methods for partitioning linear trend windows in time series. The 𝑙1 trend filtering method is too dependent on parameter settings, and improper parameter settings can cause incorrect division. It also does not identify and reflect the change well when there is a significant step change in the data. In addition, the roughness of turning point segmentation can lead to redundant segmentation points, resulting in overly detailed results that make it difficult to accurately capture true segmentation points or significant trend changes. Thus, LFIGs constructed based on the existing 𝑙1 trend filtering and turning points segmentation methods cannot accurately capture and reflect the trend variation in data. At the same time, there is a lack of effective algorithms to measure the similarity between LFIGs of unequal size. Duan et al. (Duan et al., 2018) improved the distance calculation for non-uniform-size LFIGs compared to the original equal-size method, but the method cannot accurately reflect the trend similarity of two granules. Yang et al. (Yang et al., 2022) alleviate the counterintuitive drawback to a certain extent, but some of the original distance-related issues still endure.

In addition, a forecasting approach using fuzzy inference systems and multi-order fuzzy logic relationships was proposed by Chang et al. (Chang, Chen, & Liau, 2008). Yang et al. (Yang et al., 2017) performed weighted predictions by combining LFIGs with this method; Wang et al. (Wang et al., 2015) performed weighted predictions by combining granular time series (GTS) after FCM clustering with this method. In the current studies, too many historical factors and too many weighted averages are incorporated into the forecasts, and this kind of fuzzy weighted forecast leads to an over-smoothed trend and cannot predict the future trend well. Finding an algorithm that reduces the weighting error is an important direction of research.

Therefore, to better capture the features of trends with increased precision and leverage them for more precise forecasting, this paper proposes a new concept of breakpoints for segmenting linear trends in time series, thus enabling accurate granulation of linear trends. Meanwhile, we propose a new method for calculating the distance between granules by improving and optimizing the original inter-granule distance calculation, as an effective tool for measuring the similarity between granules. Based on this new distance measurement method, we devise a granule-suited FCM algorithm, offering robust assistance in granule clustering tasks. In order to obtain correlations between linear trends and more comprehensive forecasts, we propose a multilevel fuzzy association rules (FARs) optimization extraction algorithm. Compared with former FARs, the multilevel FARs the algorithm constructs can capture the multilevel effects of past trends on subsequent trends. On this basis, we construct a double-level optimal fuzzy association rules prediction model to achieve short-term forecasting of time series. Experiments show that our model has broad applicability in a number of fields and, in particular, compelling potential for forecasting in areas such as demography, meteorology, physics and electricity. The novel contributions of this study can be outlined as follows:

• A concept of breakpoints is proposed to capture linear trend changes in time series, thus achieving precise granulation of time series.
• An improved distance is proposed, which can measure the similarity among LFIGs more accurately.
• A granule-suited fuzzy c-means clustering is proposed.
• A double-level optimal fuzzy association rules prediction model is proposed for short-term forecasting.
The remainder of the paper follows this structure. Section 2 briefly reviews some related concepts. Section 3 proposes a new concept of breakpoints and establishes a breakpoint-based linear granulation. To evaluate the similarity between these granules more accurately and classify them, Section 4 improves the original distance between granules and proposes a fuzzy c-means clustering suitable for granules. On this basis, in Section 5, a double-level optimal fuzzy association rules prediction model for short-term forecasting is constructed. Section 6 gives four experiments to confirm the efficacy of the suggested model. Finally, Section 7 concludes this paper.

2. Related works

In this section, we will briefly describe the concept of Gaussian linear fuzzy information granules, and introduce the distance calculation method between Gaussian LFIGs and sequence scale equalization, which provide important theoretical background for the study.

2.1. Linear fuzzy information granules

2.1.3. Gaussian linear fuzzy information granules

Linear fuzzy information granules are a continuation of Gaussian fuzzy numbers, namely Gaussian linear fuzzy information granules. They extend the core feature 𝜇 of Gaussian fuzzy numbers from a static value to a linear time-dependent function 𝜇(𝑡) = 𝑘𝑡 + 𝑏 to better fit the variation in trend and the span of fluctuation in the original data.

Similar to Gaussian fuzzy numbers, LFIGs are represented in the form 𝐺(𝑘, 𝑏, 𝜎, 𝑇), where 𝑘, 𝑏, 𝜎, and 𝑇 correspond to the characteristic parameters of the Gaussian function 𝑓. 𝑓(𝑥; 𝑘𝑡 + 𝑏) represents the membership degree of 𝑥 at time 𝑡 belonging to 𝐺(𝑘, 𝑏, 𝜎, 𝑇). The membership function of Gaussian LFIGs is (Yang et al., 2017):

𝑓(𝑥; 𝑘𝑡 + 𝑏) = exp(−(𝑥 − (𝑘𝑡 + 𝑏))² / (2𝜎²)), 𝑡 ∈ [0, 𝑇], (3)

where 𝜇(𝑡) = 𝑘𝑡 + 𝑏 is the time-varying core line, and 𝑘, 𝑏 ∈ 𝑅 are the slope and intercept of the core line, respectively. 𝑥 is the real value of the time series, 𝜎 is the dispersion of the data around 𝜇(𝑡), and 𝑇 is the length of the corresponding linear partition. The LFIG is expressed as 𝐺(𝑘, 𝑏, 𝜎, 𝑇), and its parameters are generated by linear regression.
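As a concrete illustration (not code from the paper: the names `fit_lfig` and `membership` are our own, and an ordinary least-squares fit is one standard way to realize the linear regression mentioned above), a window can be turned into a Gaussian LFIG 𝐺(𝑘, 𝑏, 𝜎, 𝑇) and Eq. (3) evaluated directly:

```python
import numpy as np

def fit_lfig(window):
    """Fit a Gaussian linear fuzzy information granule G(k, b, sigma, T)
    to one window of a time series via linear regression."""
    window = np.asarray(window, dtype=float)
    T = len(window)
    t = np.arange(T)
    # Least-squares fit of the time-varying core line mu(t) = k*t + b.
    k, b = np.polyfit(t, window, 1)
    # sigma: dispersion of the data around the core line.
    sigma = np.std(window - (k * t + b))
    return k, b, sigma, T

def membership(x, t, k, b, sigma):
    """Membership degree of value x at time t in G(k, b, sigma, T), Eq. (3)."""
    return np.exp(-(x - (k * t + b)) ** 2 / (2 * sigma ** 2))

# Example: a noisy upward window whose core line is roughly mu(t) = 2t + 1.
rng = np.random.default_rng(0)
w = 2.0 * np.arange(10) + 1.0 + rng.normal(0, 0.1, 10)
k, b, sigma, T = fit_lfig(w)
```

On the core line itself the membership is exactly 1, and it decays with the Gaussian profile as 𝑥 moves away from 𝜇(𝑡).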
Given two sequences 𝑋 = {𝑥1, 𝑥2, …, 𝑥𝑛𝑥} and 𝑌 = {𝑦1, 𝑦2, …, 𝑦𝑛𝑦}, 𝐻 is a matrix represented as 𝐻 = {ℎ𝑖𝑗}𝑛𝑥×𝑛𝑦, and its elements are computed as follows:

ℎ𝑖𝑗 = |𝑥𝑖 − 𝑦𝑗| + min{ℎ𝑖−1,𝑗, ℎ𝑖−1,𝑗−1, ℎ𝑖,𝑗−1}. (6)
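Eq. (6) fills the accumulated DTW cost matrix cell by cell; a minimal sketch (the paper gives no code, and we treat out-of-range entries as infinite, the usual DTW boundary condition):

```python
import numpy as np

def dtw_matrix(x, y):
    """Accumulated DTW cost matrix H for sequences x and y, Eq. (6):
    h[i][j] = |x_i - y_j| + min(h[i-1][j], h[i-1][j-1], h[i][j-1])."""
    nx, ny = len(x), len(y)
    H = np.full((nx, ny), np.inf)
    for i in range(nx):
        for j in range(ny):
            cost = abs(x[i] - y[j])
            if i == 0 and j == 0:
                H[i, j] = cost
            else:
                prev = min(
                    H[i - 1, j] if i > 0 else np.inf,
                    H[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                    H[i, j - 1] if j > 0 else np.inf,
                )
                H[i, j] = cost + prev
    return H

# The second sequence repeats a value; DTW aligns it at zero total cost.
H = dtw_matrix([1, 2, 3], [1, 2, 2, 3])
```

The bottom-right entry 𝐻[𝑛𝑥−1, 𝑛𝑦−1] is the total alignment cost used by the equalization step later in the paper.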
The linear trend granulation method works better when used for forecasting, but traditional time series partition methods do not capture linear trends well. Therefore, we propose a novel method for partitioning the linear trend of a time series, which ensures that the granules maximally match the trend information of the time series.

Definition 1 (Breakpoints). For a given time series 𝑋 = {𝑥1, 𝑥2, …, 𝑥𝑛}, if a stable slow uptrend {𝑥𝑖, 𝑥𝑖+1, …, 𝑥𝑗} transforms to another stable fast uptrend {𝑥𝑗+1, 𝑥𝑗+2, …, 𝑥𝑘}, then the transition point 𝑥𝑗 is called a breakpoint. Similarly, if a stable slow downtrend {𝑥𝑖, 𝑥𝑖+1, …, 𝑥𝑗} transforms to another stable fast uptrend {𝑥𝑗+1, 𝑥𝑗+2, …, 𝑥𝑘}, or a steady slow uptrend {𝑥𝑖, 𝑥𝑖+1, …, 𝑥𝑗} to another steady fast downtrend {𝑥𝑗+1, 𝑥𝑗+2, …, 𝑥𝑘}, etc., then the transition point 𝑥𝑗 is called a breakpoint. The usual forms of a breakpoint are as follows:

(1) A transition point from uptrend to downtrend, i.e., a peak point.
(2) A transition point from downtrend to uptrend, i.e., a trough point.

This subsection first discusses how breakpoints are obtained and then looks at how they can be optimized to get the best breakpoints.

3.2.1. Getting breakpoints

Given a time series 𝑋 = {𝑥1, 𝑥2, …, 𝑥𝑛}, compute the first-order difference of 𝑋, denoted 𝐷 = {𝑑1, 𝑑2, …, 𝑑𝑛−1}. For the uptrend of the time series, we classify it into two types, "slow uptrend" and "fast uptrend", and set a corresponding threshold 𝑝 to distinguish these two upward patterns. Specifically, the origin and endpoint of a slow uptrend are obtained from the condition 0 ≤ 𝑑𝑖 ≤ 𝑝; the origin and endpoint of a fast uptrend are obtained from the condition 𝑑𝑖 > 𝑝. Combining both, the uptrend is marked as

𝑈𝑝 = {(𝑢𝑝𝑆𝑡1, 𝑢𝑝𝐸𝑑1), (𝑢𝑝𝑆𝑡2, 𝑢𝑝𝐸𝑑2), …, (𝑢𝑝𝑆𝑡𝑛1, 𝑢𝑝𝐸𝑑𝑛1)}.

For a downward trend, we can also use a symmetric negative threshold −𝑝 for similar hierarchical partitioning, obtaining

𝐷𝑛 = {(𝑑𝑛𝑆𝑡1, 𝑑𝑛𝐸𝑑1), (𝑑𝑛𝑆𝑡2, 𝑑𝑛𝐸𝑑2), …, (𝑑𝑛𝑆𝑡𝑛2, 𝑑𝑛𝐸𝑑𝑛2)}.
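The threshold test above can be sketched as follows. This is a simplified illustration with hypothetical names such as `trend_segments`; smooth trends (𝑑𝑖 = 0 for 𝑘 or more points in a row) are handled separately in the paper and are folded into the `slow_up` label here:

```python
import numpy as np

def trend_segments(x, p):
    """Label each first-difference step and group consecutive equal labels
    into runs: 'slow_up' (0 <= d <= p), 'fast_up' (d > p),
    'slow_down' (-p <= d < 0), 'fast_down' (d < -p)."""
    d = np.diff(np.asarray(x, dtype=float))

    def label(di):
        if di > p:
            return "fast_up"
        if di >= 0:
            return "slow_up"
        if di >= -p:
            return "slow_down"
        return "fast_down"

    labels = [label(di) for di in d]
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))  # step indices [start, i)
            start = i
    return runs

# A slow rise followed by a fast rise: the run boundary is a breakpoint.
runs = trend_segments([0, 1, 2, 3, 8, 13, 18], p=2)
```

The boundary between the `slow_up` run and the `fast_up` run is exactly the kind of transition point that Definition 1 declares a breakpoint.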
For a smooth trend, ascertain the origin and endpoint of the trend where 𝑑𝑖 = 0 holds for 𝑘 or more points in a row, giving the origin and endpoint of the stable smooth trend

𝑃𝑙 = {(𝑝𝑙𝑆𝑡1, 𝑝𝑙𝐸𝑑1), (𝑝𝑙𝑆𝑡2, 𝑝𝑙𝐸𝑑2), …, (𝑝𝑙𝑆𝑡𝑛3, 𝑝𝑙𝐸𝑑𝑛3)}.

In addition, non-stationary trends in the series, which have few data points and too short a time span, increase the complexity of the model. They are merged as follows.

(1) Step 1: dealing with non-stationary trends of continuous uptrend and continuous downtrend. In an uptrend 𝑈𝑝, if there exists a set of continuous non-stationary trends with a sum of time spans greater than or equal to 𝑘, the continuous window is merged and treated as a special stable upward sequence, as shown in Fig. 2. The downtrend 𝐷𝑛 is handled in the same way. Then, the starting points of the uptrend 𝑈𝑝 and downtrend 𝐷𝑛, 𝑢𝑝𝑆𝑡 and 𝑑𝑛𝑆𝑡, are combined with the stable smooth trend 𝑃𝑙 and denoted as 𝑈𝑑 = {𝑢𝑑1, 𝑢𝑑2, …, 𝑢𝑑𝑛𝑢}, in preparation for handling the non-stationary trends of continuous up-downtrend later.

(2) Step 2: dealing with non-stationary trends of continuous up-downtrend and single non-stationary trends. For 𝑈𝑑 = {𝑢𝑑1, 𝑢𝑑2, …, 𝑢𝑑𝑛𝑢}, with 𝑢𝑑𝑖 as the splitting point, if there are single non-stationary trends or consecutive non-stationary up-downtrends with length sum less than 𝑘, merge them with a front or back window based on the 𝑅² value. If the sum of the lengths of the consecutive non-stationary up-downtrends is greater than or equal to 𝑘, determine whether it is appropriate to form a special stable trend by local 𝑅² averaging, as shown in Fig. 3; if not, merge the non-stationary window with the front or back window. Details are given in Algorithm 1. Eventually we get the breakpoints 𝐵𝑝 = {𝑏𝑝1, 𝑏𝑝2, …, 𝑏𝑝𝑛𝑏}.

Theorem 3.1 (Convergence of Optimal Merger Algorithm). For any continuous non-stationary window series 𝐴𝑆 = [𝑎1, 𝑎2, …, 𝑎𝑛1] in the window series segmented by 𝑈𝑑 = {𝑢𝑑1, 𝑢𝑑2, …, 𝑢𝑑𝑛𝑢}, the number of non-stationary windows converges to 0 after the algorithm runs.

In addition, for merging mode 2: if the overall length exceeds the stability threshold, the entire 𝐴𝑆 window will be merged; otherwise, merging mode 1 will be used. It is obvious that merging mode 2 converges. In summary, the merging algorithm converges. □

Algorithm 1 Optimal Merger Algorithm.
Input: 𝑈𝑑 = {𝑢𝑑1, 𝑢𝑑2, …, 𝑢𝑑𝑛𝑢}, stable trend threshold 𝑘.
Output: Breakpoints 𝐵𝑝.
1: Calculate the length of each segment within 𝑈𝑑 to get 𝑤𝑖𝑑𝑡.
2: Find consecutive subscript partitions in 𝑤𝑖𝑑𝑡 less than or equal to 𝑘 and store them in the array 𝑖𝑛𝑑𝑖𝑐𝑒𝑠.
3: for each index in 𝑖𝑛𝑑𝑖𝑐𝑒𝑠, from back to front:
4:   Calculate the length of the index partition as 𝑙𝑒𝑛.
5:   for each subinterval, from back to front:
6:     𝑟1 is the 𝑅² value when merged with the previous interval.
7:     𝑟2 is the 𝑅² value when merged with the latter interval.
8:     if 𝑟1 ≥ 𝑟2 then
9:       Merge with the previous interval.
10:      if length of the merged interval ≥ 𝑘 then
11:        Move two places forward in this for loop.
12:      end if
13:    else
14:      Merge with the latter interval.
15:    end if
16:  end for
17:  Calculate the average value of 𝑅² for the merged segments, 𝑅1, and record the result as 𝑈𝑑1.
18:  if 𝑙𝑒𝑛 ≥ 𝑘 then
19:    Merge all subintervals of the partition, compute the partition's 𝑅² noted as 𝑅2, and record the result as 𝑈𝑑2.
20:  end if
21:  if 𝑅1 ≥ 𝑅2 then
22:    𝐵𝑝 is updated from 𝑈𝑑1.
23:  else
24:    𝐵𝑝 is updated from 𝑈𝑑2.
25:  end if
26: end for
27: return Breakpoints 𝐵𝑝.
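The core merge decision of Algorithm 1, attaching a too-short window to whichever neighbour gives the higher 𝑅² after merging, can be sketched as below. This is a simplified illustration under stated assumptions (it omits the special stable trends, the two-place skip, and the 𝑅1/𝑅2 comparison), with our own helper names:

```python
import numpy as np

def r_squared(y):
    """Coefficient of determination of a linear fit to window y."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    k, b = np.polyfit(t, y, 1)
    ss_res = np.sum((y - (k * t + b)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

def merge_short_windows(windows, k_min):
    """Merge each window shorter than k_min into the neighbour (previous or
    next) that yields the higher R^2 after merging, scanning back to front."""
    windows = [list(w) for w in windows]
    i = len(windows) - 1
    while i >= 0 and len(windows) > 1:
        if len(windows[i]) < k_min:
            r_prev = r_squared(windows[i - 1] + windows[i]) if i > 0 else -np.inf
            r_next = (r_squared(windows[i] + windows[i + 1])
                      if i < len(windows) - 1 else -np.inf)
            if r_prev >= r_next:
                windows[i - 1] += windows.pop(i)
            else:
                windows[i] += windows.pop(i + 1)
        i -= 1
    return windows

# The short middle window [4] continues the linear trend of the first window,
# so merging backward gives the higher R^2.
merged = merge_short_windows([[0, 1, 2, 3], [4], [0, 0, 0, 0]], k_min=3)
```

The back-to-front scan mirrors the direction of the loop in Algorithm 1, so a merge never invalidates an index that has not yet been visited.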
Fig. 4. Breakpoints obtained with three types of 𝑘: (a) suitable 𝑘; (b) too small 𝑘; (c) too large 𝑘.
Fig. 5. Breakpoints obtained with three types of 𝑝: (a) suitable 𝑝; (b) too small 𝑝; (c) too large 𝑝.
Given a time series 𝑋 = {𝑥1, 𝑥2, …, 𝑥𝑛}, partition the trend based on the initial thresholds 𝑘 and 𝑝, obtaining 𝑚 windows 𝑇𝑆𝑊 = {𝑊1, 𝑊2, …, 𝑊𝑚}. Next, calculate the coefficient of determination of linear regression for each window 𝑊𝑖, denoted as 𝑇𝑆𝑅 = {𝑅²1, 𝑅²2, …, 𝑅²𝑚}. The mean value of 𝑇𝑆𝑅 is used as the objective function 𝐽(𝑘, 𝑝) for particle swarm optimization:

𝐽(𝑘, 𝑝) = (𝑅²1 + 𝑅²2 + ⋯ + 𝑅²𝑚) / 𝑚. (9)

The thresholds 𝑘 and 𝑝 are optimized iteratively to maximize the objective function, thus obtaining the best window partition results. This method harnesses the full global search potential of PSO and can effectively find the optimal breakpoints of the time series. The formula is as follows:

𝑘, 𝑝 = arg max_{𝑖𝑘 ∈ [𝑎,𝑏], 𝑖𝑝 ∈ [𝑐,𝑑]} 𝐽(𝑖𝑘, 𝑖𝑝), (10)

where [𝑎, 𝑏] and [𝑐, 𝑑] are the constraint boundaries for 𝑖𝑘 and 𝑖𝑝, respectively.

The breakpoints optimized by PSO provide a better linear partition, as shown in Fig. 6. Compared with the 𝑙1 trend filter partition and the turning points partition on the same random dataset, the partition based on breakpoints segments the trend of the data more accurately. To better illustrate the benefits of breakpoints, we use 𝑙𝑚 and 𝑟𝑚 as the main evaluation metrics, where 𝑙𝑚 is the window length mean and 𝑟𝑚 is the 𝑅² mean. It should be noted that there is no fixed standard for 𝑙𝑚; it should be judged according to the effect diagram. The larger 𝑟𝑚 is, the better the effect. The value of 𝑟𝑚 depends on the value of 𝑙𝑚: the smaller 𝑙𝑚 is, the larger 𝑟𝑚 tends to be. So, to make a more accurate comparison, we use 𝑙𝑚 ∗ 𝑟𝑚 (𝑙𝑟) as a new comparison metric; obviously, the larger 𝑙𝑟 is, the better the partitioning effect.

Definition 2 (lr). Given a time series 𝑋 = {𝑥1, 𝑥2, …, 𝑥𝑛}, 𝑛 ∈ 𝑅, let the window series obtained by partitioning be denoted as 𝑊𝑆 = {𝑊1, 𝑊2, …, 𝑊𝑚}, 𝑚 ∈ 𝑅, and let 𝐿𝑆 = {𝑙1, 𝑙2, …, 𝑙𝑚} and 𝑅𝑆 = {𝑅1, 𝑅2, …, 𝑅𝑚} be the length series and the series of coefficients of determination 𝑅² of 𝑊𝑆, respectively. The product of the length mean (𝑙𝑚) and the 𝑅² mean (𝑟𝑚), denoted as 𝑙𝑟, is calculated as 𝑙𝑟 = 𝑙𝑚 ⋅ 𝑟𝑚.

Table 1
Comparison of the results of the three division methods.

                  𝑙𝑚     𝑟𝑚     𝑙𝑟
Turning Points    2.94   0.96   2.81
𝑙1 Trend Filter   3.03   0.63   1.91
Breakpoints       5.88   0.73   4.28

Note: A value in bold is the optimal index.

As Table 1 shows, the breakpoint-based method is able to segment the time series better, avoiding the problem of redundant segmentation points and showing better overall results and metrics.

The steps to granulate the time series are as follows:

(1) Determine the optimal thresholds 𝑘, 𝑝 by PSO to obtain the optimal breakpoints.
(2) Partition the time series into several windows based on the breakpoints.
(3) Construct a LFIG on each window to obtain a granular time series.

4. DTW-i𝐿1 Fuzzy C-means

In Fuzzy C-means (FCM) clustering, distance calculations are critical to the quality of the results. It should be noted that the size of each LFIG is not uniform. Although previous research has proposed a method for calculating distances between unequal-size LFIGs, it can sometimes yield counter-intuitive results. To attain a more reasonable clustering, we need to improve the calculation of the LFIG distance to produce clustering results that are more in line with expectations. In Section 4.1, we improve the inter-granular distance calculation method and solve the previous problem by proposing a completely new method for calculating the distance. In Section 4.2, we present a DTW-i𝐿1 Fuzzy C-means clustering by combining FCM with the new distance.
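The objective of Eq. (9) and the 𝑙𝑟 metric of Definition 2 are straightforward to compute for any candidate partition. A minimal sketch with our own function names; in the paper the pair (𝑘, 𝑝) is searched by PSO, which is not reproduced here — any maximizer over [𝑎, 𝑏] × [𝑐, 𝑑] can be plugged in:

```python
import numpy as np

def r_squared(y):
    """Coefficient of determination of a linear fit to window y."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    k, b = np.polyfit(t, y, 1)
    ss_res = np.sum((y - (k * t + b)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

def objective_J(windows):
    """Eq. (9): mean coefficient of determination over the m windows."""
    return float(np.mean([r_squared(w) for w in windows]))

def lr_metric(windows):
    """Definition 2: product of mean window length (lm) and mean R^2 (rm)."""
    lm = float(np.mean([len(w) for w in windows]))
    rm = objective_J(windows)
    return lm * rm

# Two perfectly linear windows: J = 1, lm = 4.5, so lr = 4.5.
windows = [[0, 1, 2, 3, 4], [4, 3, 2, 1]]
J = objective_J(windows)
lr = lr_metric(windows)
```

Because 𝑟𝑚 alone rewards over-segmentation (short windows fit lines almost perfectly), weighting it by 𝑙𝑚 penalizes partitions with redundant breakpoints, which is exactly the trade-off Table 1 measures.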
Fig. 6. Partition results of the three methods: (a) turning points partition; (b) 𝑙1 trend filter partition; (c) breakpoints partition.
Fig. 9. Distance calculation of two unequal-size LFIGs. (a) Core lines of two unequal-size LFIGs. (b) Adding a fictional LFIG 𝐺3 to two unequal-size LFIGs. (c) DTW equilibrium transformation of unequal-size LFIGs to NG.
Definition 3 (DTW-i𝐿1 Distance). Given a 𝐺𝑇𝑆 = {𝐺1, 𝐺2, …, 𝐺𝑛}, obtain the mean size 𝑇 of the granules and use Eq. (8) to equalize the granular time series, obtaining a new granular time series 𝑁𝐺𝑇𝑆 = {𝑁𝐺1, 𝑁𝐺2, …, 𝑁𝐺𝑛}, where 𝑁𝐺𝑖 = 𝐺(𝑘′𝑖, 𝑏′𝑖, 𝜎′𝑖, 𝑇). The DTW-i𝐿1 distance between any two granules 𝐺𝑖 and 𝐺𝑗 is obtained as follows:

𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) =
  (3/2)𝛥𝑘𝑇² + (𝛥𝑏 + (√(2𝜋)/2)𝛥𝜎)𝑇,        if 𝑡∗ < 0,
  (3/2)𝛥𝑘𝑇² + (√(2𝜋)/2)𝛥𝜎𝑇 + 𝛥𝑏(𝑡∗ − 𝑇),  if 𝑡∗ ∈ [0, 𝑇],
  (3/2)𝛥𝑘𝑇² + (𝛥𝑏 + (√(2𝜋)/2)𝛥𝜎)𝑇,        if 𝑡∗ > 𝑇,  (14)

where 𝑡∗ = (𝑏′𝑖 − 𝑏′𝑗)/(𝑘′𝑖 − 𝑘′𝑗), and 𝛥𝑘 denotes the absolute difference of 𝑘 for 𝑁𝐺𝑖 and 𝑁𝐺𝑗, i.e. |𝑘′𝑖 − 𝑘′𝑗|; similarly, 𝛥𝑏 = |𝑏′𝑖 − 𝑏′𝑗| and 𝛥𝜎 = |𝜎′𝑖 − 𝜎′𝑗|.

Theorem 4.1 (Non-negativity). For any two granules 𝐺𝑖 and 𝐺𝑗, 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≥ 0, where "=" holds if and only if 𝐺𝑖 = 𝐺𝑗.

Theorem 4.2 (Symmetry). For any two granules 𝐺𝑖 and 𝐺𝑗, 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) = 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑗, 𝐺𝑖).

Proof. The computational part of DTW-i𝐿1 adds a 𝛥𝑘𝑇² term to the original 𝐿1 distance, which obviously has the same properties as the original 𝐿1 distance, and thus it still obeys non-negativity and symmetry. □

Theorem 4.3 (Triangle Inequality). For any three granules 𝐺𝑖, 𝐺𝑗 and 𝐺𝑝, 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) + 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑗, 𝐺𝑝) ≥ 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝), where "=" holds when 𝐺𝑗 = 𝐺𝑖 or 𝐺𝑗 = 𝐺𝑝.

Proof. For any three granules 𝐺𝑖, 𝐺𝑗 and 𝐺𝑝, DTW-i𝐿1 obtains 𝑁𝐺𝑖(𝑘′𝑖, 𝑏′𝑖, 𝜎′𝑖, 𝑇′𝑖), 𝑁𝐺𝑗(𝑘′𝑗, 𝑏′𝑗, 𝜎′𝑗, 𝑇′𝑗) and 𝑁𝐺𝑝(𝑘′𝑝, 𝑏′𝑝, 𝜎′𝑝, 𝑇′𝑝) by equalization and adds an additional term 𝛥𝑘𝑇² to the original 𝐿1 distance to compute the distance, so we only need to show the triangle inequality for 𝛥𝑘𝑇², i.e.

𝛥𝑘′𝑖𝑝𝑇² ≤ 𝛥𝑘′𝑖𝑗𝑇² + 𝛥𝑘′𝑗𝑝𝑇².

Simplifying, it suffices to compare 𝛥𝑘′𝑖𝑝 with 𝛥𝑘′𝑖𝑗 + 𝛥𝑘′𝑗𝑝. Since

|𝑘′𝑖 − 𝑘′𝑗| + |𝑘′𝑗 − 𝑘′𝑝| ≥ |𝑘′𝑖 − 𝑘′𝑗 + 𝑘′𝑗 − 𝑘′𝑝| = |𝑘′𝑖 − 𝑘′𝑝|,

the triangle inequality holds for 𝛥𝑘𝑇², and hence for DTW-i𝐿1. □

Theorem 4.4 (Monotonicity). The monotonicity of this distance is specified as follows.

• For any three granules 𝐺𝑖(𝑘𝑖, 𝑏𝑖, 𝜎𝑖, 𝑇𝑖), 𝐺𝑗(𝑘𝑗, 𝑏𝑗, 𝜎𝑗, 𝑇𝑗) and 𝐺𝑝(𝑘𝑝, 𝑏𝑝, 𝜎𝑝, 𝑇𝑝) with equal 𝑏, 𝜎, 𝑇 and unequal 𝑘-values, in the three partitioned domains of 𝑡∗, if |𝑘𝑖 − 𝑘𝑗| ≤ |𝑘𝑖 − 𝑘𝑝|, i.e. 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝, then 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≤ 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝).
• For any three granules 𝐺𝑖(𝑘𝑖, 𝑏𝑖, 𝜎𝑖, 𝑇𝑖), 𝐺𝑗(𝑘𝑗, 𝑏𝑗, 𝜎𝑗, 𝑇𝑗) and 𝐺𝑝(𝑘𝑝, 𝑏𝑝, 𝜎𝑝, 𝑇𝑝) with equal 𝑘, 𝜎, 𝑇 and unequal 𝑏-values, if |𝑏𝑖 − 𝑏𝑗| ≤ |𝑏𝑖 − 𝑏𝑝|, i.e. 𝛥𝑏𝑖𝑗 ≤ 𝛥𝑏𝑖𝑝, then in the case of 𝑡∗ < 0 or 𝑡∗ > 𝑇, 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) < 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝); at 0 ≤ 𝑡∗ ≤ 𝑇, 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≥ 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝).
• For any three granules 𝐺𝑖(𝑘𝑖, 𝑏𝑖, 𝜎𝑖, 𝑇𝑖), 𝐺𝑗(𝑘𝑗, 𝑏𝑗, 𝜎𝑗, 𝑇𝑗) and 𝐺𝑝(𝑘𝑝, 𝑏𝑝, 𝜎𝑝, 𝑇𝑝) with equal 𝑘, 𝑏, 𝑇 and unequal 𝜎-values, in the three partitioned domains of 𝑡∗, if |𝜎𝑖 − 𝜎𝑗| ≤ |𝜎𝑖 − 𝜎𝑝|, i.e. 𝛥𝜎𝑖𝑗 ≤ 𝛥𝜎𝑖𝑝, then 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≤ 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝).
• For any three granules 𝐺𝑖(𝑘𝑖, 𝑏𝑖, 𝜎𝑖, 𝑇𝑖), 𝐺𝑗(𝑘𝑗, 𝑏𝑗, 𝜎𝑗, 𝑇𝑗) and 𝐺𝑝(𝑘𝑝, 𝑏𝑝, 𝜎𝑝, 𝑇𝑝) with equal 𝑘, 𝑏, 𝜎 and unequal 𝑇-values, in the three partitioned domains of 𝑡∗, if (𝑇𝑖 + 𝑇𝑗)/2 ≤ (𝑇𝑖 + 𝑇𝑝)/2, then 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≤ 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝).

Proof. In the three partitioned domains of 𝑡∗, for any three granules 𝐺𝑖(𝑘𝑖, 𝑏𝑖, 𝜎𝑖, 𝑇𝑖), 𝐺𝑗(𝑘𝑗, 𝑏𝑗, 𝜎𝑗, 𝑇𝑗) and 𝐺𝑝(𝑘𝑝, 𝑏𝑝, 𝜎𝑝, 𝑇𝑝) with equal 𝑏, 𝜎, 𝑇 and unequal 𝑘-values, the distances between 𝐺𝑖, 𝐺𝑗 and between 𝐺𝑖, 𝐺𝑝 are, respectively,

𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝) = (3/2)𝑇²𝛥𝑘𝑖𝑝 + (𝛥𝑏𝑖𝑝 + (√(2𝜋)/2)𝛥𝜎𝑖𝑝)𝑇,
𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) = (3/2)𝑇²𝛥𝑘𝑖𝑗 + (𝛥𝑏𝑖𝑗 + (√(2𝜋)/2)𝛥𝜎𝑖𝑗)𝑇,

and then

𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝) − 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) = (3/2)𝑇²(𝛥𝑘𝑖𝑝 − 𝛥𝑘𝑖𝑗).

If |𝑘𝑖 − 𝑘𝑗| ≤ |𝑘𝑖 − 𝑘𝑝|, i.e. 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝, then

𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝) − 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≥ 0.

Thus, for any 𝛥𝑘𝑖𝑗 ≤ 𝛥𝑘𝑖𝑝,

𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑗) ≤ 𝐷𝑇𝑊-𝑖𝐿1(𝐺𝑖, 𝐺𝑝)

holds. Similarly, the monotonicity in 𝑏, 𝜎 and 𝑇 can be proved. □

For this algorithm, we first improve the equal-size 𝐿1 Hausdorff distance, which solves the original counterintuitive problem; then, we introduce the DTW equalization algorithm to improve the unequal-size granule distances, making them more in line with trend patterns. Combining the advantages of the i𝐿1 distance and the DTW equalization algorithm, DTW-i𝐿1 can more accurately reflect inter-granule similarity and is more practical. As shown in Table 2, the novel DTW-i𝐿1 distance is more intuitive.
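Under the reconstruction of Eq. (14) above (the 3/2 coefficient and the reading of 𝑡∗ as the crossing time of the two core lines are assumptions recovered from the garbled print; treat this as a hedged sketch, not the authors' reference implementation), the distance between two equalized granules can be computed directly:

```python
import math

def dtw_il1(g1, g2):
    """DTW-iL1 distance between two equalized LFIGs (k, b, sigma, T) sharing
    a common length T, following the piecewise form of Eq. (14)."""
    k1, b1, s1, T = g1
    k2, b2, s2, T2 = g2
    assert T == T2, "granules must be equalized to a common T first"
    dk, db, ds = abs(k1 - k2), abs(b1 - b2), abs(s1 - s2)
    c = math.sqrt(2 * math.pi) / 2
    if k1 == k2:  # parallel core lines never cross
        return 1.5 * dk * T**2 + (db + c * ds) * T
    t_star = (b1 - b2) / (k2 - k1)  # time where the two core lines intersect
    if 0 <= t_star <= T:
        return 1.5 * dk * T**2 + c * ds * T + db * (t_star - T)
    return 1.5 * dk * T**2 + (db + c * ds) * T

# Symmetry and monotonicity in k (Theorems 4.2 and 4.4):
g_i = (1.0, 0.0, 1.0, 10)
g_j = (2.0, 0.0, 1.0, 10)
g_p = (4.0, 0.0, 1.0, 10)
```

With 𝑏 and 𝜎 held equal, only the (3/2)𝛥𝑘𝑇² term survives, so the distance grows linearly in 𝛥𝑘, which is exactly the first monotonicity case of Theorem 4.4.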
Table 2
Distance comparison of LFIGs.

            𝐺1∗, 𝐺2∗   𝐺1∗, 𝐺3∗   𝐺2∗, 𝐺3∗
𝐿1           68.77      50.91      75.03
𝐷𝑇𝑊-𝑖𝐿1      93.77     225.91     275.03

Table 3
Comparison of silhouette coefficient (Si) and Dunn index (Du) of granular time series.

Model   Clusters:5   Clusters:6   Clusters:7   Clusters:8   Clusters:9   Clusters:10
        Si    Du     Si    Du     Si    Du     Si    Du     Si    Du     Si    Du
OFCM    0.38  0.39   0.33  0.41   0.34  0.44   0.33  0.49   0.33  0.43   0.29  0.18
DFCM    0.42  0.44   0.44  0.59   0.45  0.94   0.45  0.73   0.44  0.68   0.41  0.71
accompanied by the following restrictions: Fig. 10. Clustering evaluation indicators. (a)Silhouette coefficient. (b)Dunn index.
∑
𝑐
𝑢𝑖,𝑗 = 1, 0 ≤ 𝑢𝑖,𝑗 ≤ 1, 𝑖 = 1, 2, … , 𝑁, (16)
𝑗=1 between the sample and its cluster to assess the accuracy of the clus-
where 𝑢𝑖,𝑗 denotes the membership of granule 𝐺𝑖 in cluster j, 𝑉𝑗 denotes tering. The dunn index calculates the ratio of the minimum inter-cluster
the clustering center of the 𝑗th cluster, m is a fuzzy weighting index, distance to the maximum intra-cluster compactness to quantify the
usually greater than 1, and 𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑗 , 𝑉𝑖 ) is the distance between effectiveness of clustering. An aspect warranting accentuation is that
granule 𝐺𝑖 and clustering center 𝑉𝑗 . The computation of the member- the higher the values of silhouette coefficient and dunn index, the better
ship 𝑢𝑖,𝑗 and the clustering center 𝑉𝑗 follows the following formula: the clustering effect.
DFCM and OFCM clustering was performed on the same set of
∑𝑐
𝐷𝑇 𝑊 -𝑖𝐿1 (𝐺𝑖 , 𝑉𝑗 ) 2 −1 grain time series, with the number of clusters set between 5 and 10.
u_{i,j} = 1 / [ Σ_{k=1}^{c} ( DTW-iL1(G_i, V_j) / DTW-iL1(G_i, V_k) )^{1/(m−1)} ], 1 ≤ i ≤ N, 1 ≤ j ≤ c. (17)

V_j = Σ_{i=1}^{N} u_{i,j}^m G_i / Σ_{i=1}^{N} u_{i,j}^m, 1 ≤ j ≤ c. (18)

The steps to implement the DFCM algorithm are as follows:

(1) Step 1: Set the fuzzy weighting index m, the number of clusters c, the threshold for iteration stopping ϵ, the initial cluster centroid values V_i (i = 1, 2, …, c), the stochastic membership matrix U and the upper bound for the number of iterations max_iter.
(2) Step 2: Compute the c cluster centroids V_j (j = 1, 2, …, c) using Eq. (18).
(3) Step 3: Calculate the objective function J according to Eq. (15). If the improvement of the objective function relative to the previous iteration is less than the threshold ϵ, or the maximum number of iterations is reached, the algorithm stops.
(4) Step 4: Calculate the new U matrix using Eq. (17). Return to Step 2.
(5) Step 5: The maximum-membership cluster of each granule is taken as the final assigned cluster for that granule.

Theorem 4.5 (Convergence of DTW-iL1 Fuzzy C-means). For any given initial value of a cluster center V, the objective function J of the DFCM will definitely converge.

Proof. The FCM algorithm is convergent under other distance metrics, such as the Euclidean and Manhattan distances (Saha & Das, 2018). By the equivalence of distance metrics, the DFCM algorithm using the DTW-iL1 distance metric is therefore also guaranteed to converge. □

To demonstrate the superiority of DFCM, we compare DFCM with the original distance-based FCM (OFCM). We introduce the silhouette coefficient (Si) and the Dunn index (Du) as metrics to evaluate clustering performance (Hartama & Anjelita, 2022; Ncir, Hamza, & Bouaguel, 2021). The silhouette coefficient calculates the degree of correlation between a sample and its own cluster relative to the other clusters, while the Dunn index is the ratio of the minimum inter-cluster distance to the maximum intra-cluster diameter.

According to the evaluation indices in Table 3, the DFCM outperforms the OFCM in both silhouette coefficient and Dunn index, showing better performance. It is also clear from Fig. 10 that DFCM is superior to OFCM in terms of these metrics. Thus, our study shows that the DFCM clustering method is more effective than OFCM in classifying granular time series.

5. Double-level optimal fuzzy association rules prediction model based on DFCM

When dealing with fuzzy association rules obtained from granular time series, different approaches may produce different predictive performance. Therefore, digging deeper into the intrinsic patterns of these rules is a problem for further research. In this regard, we first propose multilevel FARs optimization extraction based on the DFCM algorithm in Section 5.1, and then establish a double-level optimal fuzzy association rules prediction model based on DFCM in Section 5.2. Finally, we introduce the entire short-term forecasting process in Section 5.3.

5.1. Multilevel FARs optimization extraction based on DFCM

Fuzzy inference systems (FIS) are constructed using a collection of if–then rules and logical operations, where the if part and the then part are called the antecedent and the consequent of the rule, respectively. Specifically, a FAR can be abbreviated as A_{i−1} → A_i, where the antecedent and consequent parts of the association rule are linear granules within the corresponding time windows. Different association rules have varying effects on the prediction process. It is crucial to select appropriate association rules and accurately quantify their role in prediction to enhance the accuracy and logic of prediction. In this regard, we consider a set of FARs built from N granules, A_1, A_2, …, A_N. We propose a multilevel FARs optimization extraction based on DFCM for extracting the optimal rules of the FARs hierarchically.
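To make the DFCM iteration concrete, the sketch below alternates the membership update of Eq. (17) and the centroid update of Eq. (18) until the objective stops improving. It is a hedged illustration rather than the paper's implementation: granules are plain vectors, a plain L1 distance stands in for DTW-iL1 (any distance function can be passed via `dist`), and the deterministic centroid initialization is our assumption.

```python
import numpy as np

def dfcm_sketch(G, c, m=2.0, eps=1e-6, max_iter=100, dist=None):
    """Minimal fuzzy c-means sketch over granule vectors G (N x d).
    `dist` stands in for the paper's DTW-iL1 distance; by default a
    plain L1 distance is used as an illustrative assumption."""
    if dist is None:
        dist = lambda a, b: float(np.abs(a - b).sum())
    G = np.asarray(G, dtype=float)
    N = len(G)
    # Step 1: simple deterministic centroid initialization (an assumption).
    V = G[np.linspace(0, N - 1, c).round().astype(int)].copy()
    J_prev = np.inf
    for _ in range(max_iter):
        # Pairwise granule-centroid distances (clamped away from zero).
        D = np.array([[max(dist(g, v), 1e-12) for v in V] for g in G])
        # Membership update, Eq. (17).
        R = (D[:, :, None] / D[:, None, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / R.sum(axis=2)
        # Centroid update, Eq. (18).
        W = U ** m
        V = (W.T @ G) / W.sum(axis=0)[:, None]
        # Step 3: stop when the objective J barely improves.
        J = (W * D).sum()
        if abs(J_prev - J) < eps:
            break
        J_prev = J
    labels = U.argmax(axis=1)  # Step 5: max-membership cluster
    return U, V, labels
```

On two well-separated groups of granules, the memberships converge toward 0/1 and the centroids approach the group means.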
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959
Definition 4 (Multilevel FARs Optimization Extraction based on DFCM, MFOE). For a given sequence GTS = {A_1, A_2, …, A_N}, we construct ordinary FARs to obtain {A_1 → A_2, A_2 → A_3, …, A_{N−1} → A_N}. Next, we set the number of DFCM clusters to c, perform DFCM clustering on the GTS, and then construct FARs within each cluster, obtaining the c clusters' FARs

FARs_1 = {A^1_{i_1} → A_{i_1+1}, …, A^1_{i_{n_1}} → A_{i_{n_1}+1}}, …,
FARs_c = {A^c_{j_1} → A_{j_1+1}, …, A^c_{j_{n_c}} → A_{j_{n_c}+1}}.

Assuming the prediction is executed in the (N+1)th segment, we discretize the antecedent A_N into the c clusters based on the DFCM, i.e.

A_N = { (A^1_N, u_{N,1}), (A^2_N, u_{N,2}), …, (A^c_N, u_{N,c}) },

where c represents the number of clusters, A^k_N denotes that A_N is in the kth cluster, k ∈ [1, c], and u_{N,k} denotes the membership of A_N in cluster k. We define Multilevel FARs Optimization Extraction based on DFCM as extracting one optimal FAR of A_N from the FARs of each cluster through A^i_N (i = 1, 2, …, c), resulting in c multilevel optimal rules {A^1_{i_1} → A_{i_1+1}}, …, {A^c_{i_c} → A_{i_c+1}}, denoted as

FARs_optimal = {rule^1_optimal, rule^2_optimal, …, rule^c_optimal}.

Specifically, we illustrate this with the example of A^k_N (1 ≤ k ≤ c). The set of granules of cluster k is expressed as

C_k = {A^k_{j_i} | j_i ∈ [1, N]},

where C_k denotes the set of granules belonging to the kth cluster, and A^k_{j_i} denotes the ith granule in the kth cluster, with subscript j_i. The FIS of cluster k takes the following form:

Rule 1: if A^k_N is A^k_{j_1}, then A^k_{N+1} is A_{j_1+1}
Rule 2: if A^k_N is A^k_{j_2}, then A^k_{N+1} is A_{j_2+1}
⋮
Rule n_k: if A^k_N is A^k_{j_{n_k}}, then A^k_{N+1} is A_{j_{n_k}+1}
Observation: A^k_N is A^k_j
Conclusion: A_{N+1} is A_{j+1}

The input to the FIS is the last granule of the granular time series, i.e. A^k_N = A^k_j. For a fuzzy rule i, the premise

A^k_N is A^k_{j_i}

is true to a certain degree, defined by the matching degree between the observation A^k_N and the antecedent A^k_{j_i}, i.e.

ω′_i = ω′_i(A^k_N, A^k_{j_i});

then the conclusion part of rule i,

A_{N+1} = A_{j_i+1},

applies as well. Based on this, the optimal rule of A_N in cluster k is selected from the fuzzy rules rule 1, rule 2, …, rule n_k of this cluster, as shown in the following formula:

rule^k_optimal = { A^k_{j_q} → A_{j_q+1},  ω′_q = argmax_i ω′_i(A^k_N, A^k_{j_i}), i ∈ [1, n_k] }  (19)

where ω′_i(A^k_N, A^k_{j_i}) is the matching degree of the observation A^k_N with the antecedent A^k_{j_i} of rule i. Obviously, the larger the distance between two granules, the smaller the similarity between them. Therefore, ω′_i(A^k_N, A^k_{j_i}) can be expressed in this form:

ω′_i(A^k_N, A^k_{j_i}) = 1 / DTW-iL1(A^k_N, A^k_{j_i}).  (20)

Similarly, we can get the prediction rules of A_N in all clusters, i.e. FARs_optimal is

{rule^1_optimal, rule^2_optimal, …, rule^c_optimal} = {A^1_{i_1} → A_{i_1+1}, A^2_{i_2} → A_{i_2+1}, …, A^c_{i_c} → A_{i_c+1}},

where A^1_{i_1} is the most similar granule to A_N in cluster 1, and i_1 is its index.

Remark 3. Since the sizes of the granules are unequal, in order to make the prediction sizes equal, the size of the prediction result A_{i+1} is uniformly taken as the mean size of the granules in this paper.

For any current linear trend granule, the goal of MFOE is to select the optimal association rule at each level from the FARs of all levels. The selection criterion is the optimal match between the current granule and the antecedent granule of each FAR in the level, which is determined by the DTW-iL1 distance. Based on this, we propose the Multilevel FARs Optimization Extraction based on DFCM algorithm (Algorithm 2), which outputs the optimal rules based on A_N for all levels.

Theorem 5.1 (Convergence of Multilevel FARs Optimization Extraction based on DFCM). For a given granular time series GTS = {A_1, A_2, …, A_N}, the result of the algorithm converges to a set of optimal FARs = {A^1_{i_1} → A_{i_1+1}, A^2_{i_2} → A_{i_2+1}, …, A^c_{i_c} → A_{i_c+1}}, where c is the number of DFCM clusters and i_k is the index of the optimal antecedent in cluster k.

Proof. For a given granular time series GTS = {A_1, A_2, …, A_N}, the number of elements in each cluster after FCM is obviously greater than or equal to 1. According to the formula

rule^k_optimal = { A^k_{j_q} → A_{j_q+1},  ω′_q = argmax_i ω′_i(A^k_N, A^k_{j_i}), i ∈ [1, p] },

there will definitely be an optimal granule in each cluster, thereby forming an optimal FAR belonging to that cluster. Therefore, this algorithm is guaranteed to converge. □

Algorithm 2 Multilevel FARs Optimization Extraction based on DFCM.
Input: Granular time series GTS {A_1, A_2, …, A_N}, the number of clusters c
Output: FARs_optimal
1: Each granule A_i in GTS is DFCM clustered.
2: Each granule A_i gets a different membership to each cluster.
3: The maximum-membership cluster is taken as its attributed cluster.
4: for cluster i in all clusters:
5:   for A^i_j in cluster i:
6:     Build the A^i_j → A_{j+1} rule.
7:   endfor
8: endfor
9: for cluster i in all clusters:
10:   for A^i_j in cluster i:
11:     Calculate the similarity ω′(A_N, A^i_j)
12:   endfor
13:   Extract the rule with the highest similarity into FARs_optimal
14: endfor
15: Return FARs_optimal

5.2. Double-level optimal fuzzy association rules prediction model based on DFCM

On the basis of the MFOE algorithm, we construct a double-level optimal fuzzy association rules prediction model based on DFCM for short-term forecasting.
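The rule selection of Eqs. (19) and (20) and Algorithm 2 reduces to an argmax of similarity within each cluster. The sketch below assumes granules are plain vectors and cluster labels are already available (e.g., from a DFCM run); the L1 stand-in for DTW-iL1 and the helper name `mfoe_sketch` are our assumptions.

```python
import numpy as np

def mfoe_sketch(granules, labels, a_n, dist=None):
    """Per cluster, pick the antecedent granule most similar to the
    current granule a_n (similarity = 1 / distance, Eq. (20)) and
    return the optimal rule (antecedent index, consequent index)."""
    if dist is None:
        dist = lambda a, b: float(np.abs(np.asarray(a) - np.asarray(b)).sum())
    rules = {}
    for k in sorted(set(labels)):
        best_sim, best_i = -1.0, None
        # Only granules with an existing consequent A_{i+1} qualify.
        for i in range(len(granules) - 1):
            if labels[i] != k:
                continue
            sim = 1.0 / max(dist(a_n, granules[i]), 1e-12)  # matching degree
            if sim > best_sim:
                best_sim, best_i = sim, i
        if best_i is not None:
            rules[k] = (best_i, best_i + 1)  # optimal FAR A^k_i -> A_{i+1}
    return rules

# Toy example: two clusters of 1-D granules; the current granule is 0.1.
rules = mfoe_sketch([[0.0], [1.0], [10.0], [11.0], [0.2]],
                    labels=[0, 0, 1, 1, 0], a_n=[0.1])
```

Here `rules` comes out as `{0: (0, 1), 1: (2, 3)}`: in each cluster, the antecedent closest to the current granule is chosen, and its successor granule is the rule consequent.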
Fig. 11. Comparison with different numbers of participating clusters on US daily consumptive load. (a) one top cluster. (b) double top clusters. (c) three top clusters. (d) all clusters.
Fig. 12. Comparison with different numbers of participating clusters on ZhenJiang power consumption. (a) one top cluster. (b) double top clusters. (c) three top clusters. (d) all clusters.
For a given GTS {A_1, A_2, …, A_N}, clustered by DFCM into c clusters, the MFOE algorithm is used to generate the optimal association rules

FARs_optimal = {A^1_{i_1} → A_{i_1+1}, A^2_{i_2} → A_{i_2+1}, …, A^c_{i_c} → A_{i_c+1}}.

Select the top N_m available association rules for prediction from FARs_optimal in descending order of membership, and suppose the N_m available association rules are

{A^{k_1}_{i_{k_1}} → A_{i_{k_1}+1}, …, A^{k_j}_{i_{k_j}} → A_{i_{k_j}+1}, …, A^{k_{N_m}}_{i_{k_{N_m}}} → A_{i_{k_{N_m}}+1}}, 1 ≤ k_j ≤ c, 1 ≤ j ≤ N_m, i_{k_j} < N.

The prediction is expressed in the following equation,

x*(q) = Σ_{j=1}^{N_m} w_j · A_{i_{k_j}+1}(q), 1 ≤ q ≤ T + 1, (21)

where w_j = u_{N,k_j} / Σ_{l=1}^{N_m} u_{N,k_l}, u_{N,k_j} is the membership of A_N in the k_j th cluster, N_m > 1 is known as the multilevel fuzzy number, which indicates the number of clusters selected for prediction, and T is the mean length of all granules.

The number of clusters N_m used for prediction is related to the complexity and generality of the model. In general, N_m varies with the rule styles and data conditions required to achieve the optimal prediction result. According to Occam's razor, the model should be kept as simple as possible to avoid over-parameterization and to ensure that it performs well in diverse and intricate scenarios. We therefore look for an N_m with universal, generalizable properties.

In current rule-based forecasting, if too many FARs are introduced, the forecasting results may tend to flatten out, making the forecasts stagnant. Conversely, if only a single optimal FAR is considered, important information may be lost, leading to instability in the prediction. Thus, among the rules generated by the MFOE algorithm, the best prediction rules tend to be distributed among the optimal N_m (1 < N_m ≤ c) clusters. We take the rule of one optimal cluster as the basic prediction rule. When N_m increases to 2, the probability that the added cluster's FAR is available is the highest, which also means that it is the least likely to act as a rule that interferes with the prediction, i.e., noise. As N_m increases further, the value of the added clusters' FARs decreases and their probability of being noise increases. Since the presence of noise can bias the predicted trend, we believe that more accurate predictions are produced when noise is minimized.

We performed validation on three sets of experimental data; Figs. 11, 12, and 13 each show four comparisons on the three different datasets under the same conditions. The number of clusters for the three datasets is set to 8. We considered the results for one optimal cluster, double-level optimal clusters, three-level optimal clusters and all clusters. It can be seen that when a single optimal cluster is selected for prediction, there is a deviation between the predicted and actual results, clearly visible from the regression lines. When the number of participating clusters is 2, the results improve significantly; when it is increased to 3, prediction bias begins to appear; and when all clusters are considered, the noise clearly causes the prediction bias to increase significantly, and the prediction tends toward a flat and sluggish trend. Therefore, we believe that the prediction is best when N_m is 2, and we propose the double-level optimal fuzzy association rules prediction model based on DFCM.

The double-level optimal fuzzy association rules prediction model based on DFCM is expressed in the following equation,

x*(q) = w_1 · A_{i_{k_1}+1}(q) + w_2 · A_{i_{k_2}+1}(q), 1 ≤ q ≤ T + 1, (22)

where w_1 = u_{N,k_1} / (u_{N,k_1} + u_{N,k_2}), u_{N,k_1} is the membership of A_N in the k_1 th cluster, and T is the mean length of all granules.

To clearly illustrate the performance of our model, we introduce the following three evaluation metrics.

(1) Root Mean Square Error (RMSE):

RMSE = sqrt( (1/n) Σ_{t=1}^{n} (x*(t) − x(t))^2 ) (23)

(2) Mean Absolute Percentage Error (MAPE):

MAPE = (1/n) Σ_{t=1}^{n} |x*(t) − x(t)| / x(t) × 100 (24)

(3) Mean Absolute Error (MAE):

MAE = (1/n) Σ_{t=1}^{n} |x*(t) − x(t)| (25)
Fig. 13. Comparison with different numbers of participating clusters on a power load from a Chinese mathematical modeling contest. (a) one top cluster. (b) double top clusters. (c) three top clusters. (d) all clusters.

5.3. The entire short-term forecasting process

The short-term forecasting framework is shown in Fig. 14, and to explain the model clearly, the following steps describe it in detail.

To ensure prediction quality, it is necessary to pre-tune the relevant hyperparameter settings so that the model performs optimally in prediction. To determine the hyperparameters of the model, the training set is divided into two parts: a training subset and a testing subset, the latter being one average granule in size. On the testing subset, we evaluate the effect of varying cluster numbers on prediction, with each number of clusters predicted repeatedly. Based on the three evaluation indicators, we select the optimal number of clusters and the initial cluster centers, and apply them to the subsequent actual prediction of future data.

(1) Part 1. Linear granulation based on breakpoints to get LFIGs.
For a given time series X = {x_1, x_2, …, x_n}, we first obtain the breakpoints of the time series. Based on the breakpoints, X is divided into N subintervals to obtain N subsequences. Each subsequence is granulated into one LFIG by Eq. (3). Finally we get the granular time series GTS {A_1, A_2, …, A_N}. The step is given in Section 3.

(4) Part 4. Double-level optimal fuzzy association rules prediction model based on DFCM for short-term forecasting.
Select two rules based on membership. Suppose the two clusters with the largest membership in MFOE are cluster 1 and cluster 3; then the predicted double-level available association rules are A^1_{i_1} → A_{i_1+1} and A^3_{i_3} → A_{i_3+1}, and the final prediction result x*_m is

x*_m(j) = ω_1 A_{i_1+1}(j) + ω_3 A_{i_3+1}(j), 1 ≤ j ≤ T + 1,

where ω_1 = u_{N,1}/(u_{N,1} + u_{N,3}) and ω_3 = u_{N,3}/(u_{N,1} + u_{N,3}). The step is given in Section 5.

The proposed method is an iterative prediction model. In each prediction iteration, it first makes a prediction based on the existing data, and then incorporates the prediction result into the original data to form extended new data as the input for the next prediction. Due to the introduction of new data, the breakpoints, granular series, and granule clustering results change in each iteration. Consequently, the last LFIG used as the prediction antecedent also changes, resulting in different multilevel optimal FARs. This process continues, with Parts 1 to 4 executed in a loop until the preset prediction horizon is reached. The hyperparameters for breakpoints and clustering remain unchanged to minimize the differences between iterations.
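The iterative loop of Parts 1–4 can be sketched as follows. This is a hedged outline, not the paper's implementation: the whole granulation, DFCM clustering and MFOE pipeline is abstracted into a hypothetical callback `next_two_rules` that returns the two top clusters' rule consequents and memberships, and the blend is Eq. (22).

```python
import numpy as np

def double_level_predict(cons1, cons2, u1, u2):
    """Eq. (22): membership-weighted blend of the two top clusters'
    rule consequents."""
    w1, w2 = u1 / (u1 + u2), u2 / (u1 + u2)
    return w1 * np.asarray(cons1, float) + w2 * np.asarray(cons2, float)

def iterative_forecast(series, horizon, next_two_rules):
    """Predict one granule ahead, append it to the data, and repeat
    until `horizon` new points exist. `next_two_rules` is a
    hypothetical stand-in for Parts 1-3 (granulation, DFCM, MFOE);
    it returns (cons1, u1, cons2, u2) for the current data."""
    data = list(series)
    while len(data) < len(series) + horizon:
        cons1, u1, cons2, u2 = next_two_rules(data)
        step = double_level_predict(cons1, cons2, u1, u2)
        data.extend(float(v) for v in np.atleast_1d(step))
    return data[len(series):len(series) + horizon]
```

Because each predicted granule is appended before the next iteration, the breakpoints and clusters would be recomputed from the extended data inside `next_two_rules`, exactly as the text describes.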
6.2. CA-Births
Fig. 16. Evaluations of predictive precision for CA-Births. (a) Comparisons of RMSE, MAE and MAPE. (b) Comparisons of 𝑑𝑖 .
Table 4
Comparison of RMSE, MAE and MAPE of CA-Births time series.
Model Forecasting horizon:10 Forecasting horizon:20 Forecasting horizon:30
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LFIG 5.18 4.60 11.61 4.48 3.78 9.15 5.96 5.02 12.27
LGKP 4.64 3.97 9.41 6.20 5.45 12.07 5.96 5.22 11.65
RGP 12.08 11.28 28.70 9.48 8.00 19.58 10.29 8.81 21.79
SVR 5.95 4.67 12.23 4.71 3.43 8.64 7.02 4.99 12.76
SARIMA 5.74 5.19 13.28 4.84 4.01 9.81 5.14 4.27 10.60
LSTM 6.27 5.26 13.69 5.38 4.58 11.08 5.08 4.33 10.50
Proposed 2.94 2.30 5.98 2.94 2.40 5.71 2.90 2.45 5.83
Note: A value in bold is the optimal index among all the models.
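The adjacent Step 4 worked example (x*_312 for the CA-Births series) can be checked in a few lines; the memberships u_{68,3} = 0.162 and u_{68,6} = 0.212 and the consequent values 49.285 and 43.571 are quoted from the text.

```python
# Reproduce the Step 4 worked example for x*_312 on CA-Births.
u3, u6 = 0.162, 0.212              # memberships of A_68 in clusters 3 and 6
w3 = u3 / (u3 + u6)                # weight of cluster 3's rule, ~0.433
w6 = u6 / (u3 + u6)                # weight of cluster 6's rule, ~0.567
x_312 = w3 * 49.285 + w6 * 43.571  # second points of consequents A_43, A_57
print(round(w3, 3), round(w6, 3), round(x_312, 2))  # 0.433 0.567 46.05
```

The weighted sum lands on 46.05, matching the value reported in the experiment.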
(4) Step 4. Double-level optimal fuzzy association rules prediction model based on DFCM for short-term forecasting.
Selecting the first two rules with the largest membership, i.e., those of cluster 3 and cluster 6, the available rules for prediction are

{A^3_{42} → A_{43}, A^6_{56} → A_{57}}.

Finally, the x_312 prediction result is

x*_312 = ω_3 A_{43}(2) + ω_6 A_{57}(2),

where ω_3 = u_{68,3}/(u_{68,3} + u_{68,6}) = 0.162/(0.162 + 0.212) = 0.433 and ω_6 = 0.567. Therefore the final forecast result is

x*_312 = 0.433 × 49.285 + 0.567 × 43.571 = 46.05.

Merge the predicted results with the existing data into a new dataset, still using the initial hyperparameters for the next round of prediction. Repeat this process until the target prediction horizon of 30 is reached. Subsequent experimental predictions follow the same process.

Table 4 and Fig. 16(a) show the models' results. As the table shows, our model achieves lower values on the three evaluation metrics of RMSE, MAE and MAPE than the other models, and performs optimally. Specifically, when the prediction horizon is 10, the RMSE, MAE and MAPE of the proposed model are 2.94, 2.30 and 5.98 respectively, improving on the best comparison model, LGKP, by 36.64%, 45.62%, and 36.45%, respectively. When the prediction horizon is extended to 20 and 30, the proposed model still has significant advantages over the comparative models.

To visually assess the prediction capability of the models, the predicted trend lines of the proposed model and the six compared models are shown in Fig. 17. The prediction curves of our model fit the actual observations most closely, indicating that the model accurately captures the intrinsic pattern of the data.

To assess the models' predictive precision, we introduce the absolute prediction error index d_i, calculated as d_i = |x_i − x*_i|, the absolute difference between the observed value x_i and the predicted value x*_i. Fig. 16(b) clearly shows that the distribution range of d_i for this model is the narrowest, and the disparity between its greatest and smallest values is minimal, indicating that the stability of the prediction results is the best. In contrast to the other models, the proposed model maintains a controllable error level, resulting in predicted values that closely align with the actual data. This confirms that the model possesses merits in improving near-future prediction capability and provides an effective technical solution.

In summary, the proposed model shows better predictive effects than the other comparative models in terms of both quantitative evaluation indices and prediction curves.

6.3. MT

The MT time series consists of the daily minimum temperatures in Melbourne from 1 January to 20 August 1981. We consider a cycle to be 30 days and use the data from the last 30 days as the testing set, while using the previous data as the training set. To comprehensively evaluate the model's performance in predicting one cycle into the future, we set the forecast horizon to 10, 20 and 30 days; Table 5 and Fig. 19(a) summarize the assessment metrics for each model under these forecast horizons. Compared with the better-performing models in this experiment, LSTM, LGKP and RGP, the proposed model overall has obvious advantages in the indicators. Specifically, when the prediction horizon was 10, its indicators were not optimal, but it was significantly ahead in reducing prediction error in the subsequent experiments with prediction horizons of 20 and 30. Overall our model shows a clear predictive advantage.

In addition, the prediction curves in Fig. 18 show that the resultant curves generated by the model are able to capture the trend of the target time series very accurately. It also outperforms the other models in terms of the stability and compactness of the distribution of d_i in Fig. 19(b). Specifically, the spread between the highest and lowest values of d_i is smaller for this model than for all the other models. This indicates that the proposed model predicts with better stability and consistency on the same data. This provides additional evidence that the proposed model can offer more precise and trustworthy short-term forecasts.
Fig. 17. The prediction fitting diagram for CA-Births time series.
Table 5
Comparison of RMSE, MAE and MAPE of MT time series.
Model Forecasting horizon:10 Forecasting horizon:20 Forecasting horizon:30
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LFIG 1.77 1.65 25.43 1.79 1.53 23.81 2.03 1.76 25.44
LGKP 2.22 1.81 32.04 2.02 1.67 28.73 1.94 1.63 26.52
RGP 1.81 1.44 25.39 1.69 1.40 23.04 1.85 1.56 22.59
SVR 1.21 0.95 17.32 1.59 1.29 20.03 2.09 1.72 23.16
SARIMA 2.07 1.77 23.96 1.83 1.59 21.73 2.76 2.24 27.20
LSTM 1.80 1.24 24.76 1.81 1.39 25.00 1.88 1.55 25.13
Proposed 1.88 1.62 25.16 1.55 1.27 19.97 1.49 1.22 18.86
Note: A value in bold is the optimal index among all the models.
Combining the quantitative indicators and the prediction curves clearly shows that the suggested model performs well compared with the other models, providing more accurate and reliable results, which verifies the strong adaptability and prediction ability of the model.

6.4. SMN

The SMN time series was selected from the monthly sunspot numbers from January 1749 to April 1772. The SMN time series is characterized by obvious periodic variations and short-term trends, and its distribution differs from that of the sample data in the two previous experiments. We used the initial 250 data points as the training set and the last 50 as the testing set. For a detailed comparison, we set the prediction horizons to 10, 30 and 50 in the SMN experiments. From the quantitative results in Table 6 and the visual representation in Fig. 20(a), it can be observed that on this nonlinear, non-stationary time series, the RMSE and MAE evaluation metrics of the model proposed in this paper obtain better values than those of the other comparative models. The specific forecast trend plots and d_i are shown in Figs. 21 and 20(b), which show that the resultant curves generated by our model are able to fit the real trends very accurately, especially in the intervals of large fluctuation. Compared with the curves generated by the other models, our model has the highest degree of fit and the smoothest changes.

This experiment is based on a real monthly sunspot number time series, and demonstrates that the learned feature expressions and prediction mechanisms of the proposed model are robust across various categories of time series. Despite the complexity of the nonlinear trend data, the proposed model still provides relatively low prediction errors.

6.5. USDD

The USDD time series was selected from U.S. daily consumptive load data from 27 March 2018 to 1 July 2019. We used the first 400 data points for training and the last 60 for testing. To describe the prediction performance in detail, we set the prediction horizons to 15, 30, and 60 days. Table 7 and Fig. 22(a) show the indicator values and bar charts for our model and the comparison models. It is clear that our model significantly outperforms the comparison models on all three metrics, indicating that it predicts outcomes accurately. Fig. 23 shows the prediction curves of our model and the six compared
Fig. 19. Evaluations of predictive precision for MT time series. (a) Comparisons of RMSE, MAE and MAPE. (b) Comparisons of 𝑑𝑖 .
Table 6
Comparison of RMSE, MAE and MAPE of SMN time series.
Model Forecasting horizon:10 Forecasting horizon:30 Forecasting horizon:50
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LFIG 46.38 40.87 37.05 34.68 26.67 26.39 28.56 20.80 27.90
LGKP 43.90 39.29 39.66 32.86 26.60 32.14 28.33 22.15 38.70
RGP 71.15 63.87 74.14 78.23 72.92 96.91 93.57 88.42 171.85
SVR 54.76 52.00 50.76 61.33 58.29 63.17 52.59 47.75 63.94
SARIMA 36.77 32.47 38.24 60.79 52.45 73.77 96.55 84.46 180.95
LSTM 25.44 20.84 20.29 39.48 34.15 36.80 32.68 26.98 38.51
Proposed 22.23 18.62 71.26 18.34 15.22 18.60 19.81 16.24 33.70
Note: A value in bold is the optimal index among all the models.
Fig. 20. Evaluations of predictive precision for SMN. (a) Comparisons of RMSE, MAE and MAPE. (b) Comparisons of 𝑑𝑖 .
Fig. 21. The prediction fitting diagram for SMN time series.
Table 7
Comparison of RMSE, MAE and MAPE of USDD time series.
Model Forecasting horizon:15 Forecasting horizon:30 Forecasting horizon:60
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LFIG 0.73 0.65 6.74 0.74 0.61 6.07 0.93 0.74 6.89
LGKP 0.36 0.28 2.95 0.71 0.53 5.02 1.08 0.89 7.85
RGP 0.56 0.49 5.04 0.82 0.69 6.62 1.32 1.11 9.86
SVR 1.01 0.97 9.98 0.94 0.88 8.67 1.09 1.01 9.37
SARIMA 0.66 0.59 6.10 0.64 0.55 5.37 1.05 0.86 7.62
LSTM 0.57 0.52 5.41 0.47 0.39 3.90 0.69 0.53 4.74
Proposed 0.44 0.41 4.25 0.43 0.38 3.66 0.56 0.44 3.94
Note: A value in bold is the optimal index among all the models.
Fig. 22. Evaluations of predictive precision for USDD. (a) Comparisons of RMSE, MAE and MAPE. (b) Comparisons of 𝑑𝑖 .
Fig. 23. The prediction fitting diagram for USDD time series.
models, and it clearly shows that our curve fits the true values more closely. Meanwhile, the boxplot in Fig. 22(b) also highlights that our model exhibits superior stability in prediction.

6.6. MC345

To further verify the efficacy of the proposed model, it was tested on the M3, M4, and M5 datasets. To verify and compare more efficiently, three models with better performance were selected for comparison, namely LSTM, SVR, and LGKP. Meanwhile, 80, 120 and 100 time series were randomly selected for the experiments on the M3, M4 and M5 datasets, respectively. Moreover, the prediction horizons of the experiments were based on those of the M competitions, which for M3, M4, and M5 were 18, 14, and 28, respectively.

For the 80 time series of the M3 dataset, the forecast horizon was divided into 7, 14 and 18. Table 8 shows the average values of the three evaluation metrics; it can be seen that the RMSE and MAE indicators show better results at the forecast horizons of 7, 14 and 18. Although the MAPE indicator did not achieve the best performance, the difference between its value and the results of the comparison models was small.

For the 120 time series of the M4 dataset, we divided the forecast horizons into 1, 7 and 14. Table 9 shows that the proposed model performs poorly when the forecast horizon is 1, but performs better when the forecast horizons are 7 and 14.

For the 100 time series of the M5 dataset, the forecast horizon was divided into 7, 14 and 28. Table 10 shows that the proposed model works better when the forecast horizon is 7. When the forecast horizons were extended to 14 and 28, although the model fell short of the best benchmark, its index values were very close to the best results. As a result, the proposed model performs well on short-term prediction tasks and shows good generalization ability.

7. Conclusion

This paper focuses on the short-term forecasting problem of time series based on trend and information mining. In order to effectively capture and portray trend information in time series, we propose a breakpoint-based linear granulation approach. In order to accurately
Table 8
Comparison of average RMSE, MAE and MAPE for M3 data.
Model Forecasting horizon:7 Forecasting horizon:14 Forecasting horizon:18
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LGKP 293.41 289.99 8.48 283.89 277.46 7.99 281.12 273.17 7.86
SVR 284.77 280.37 7.77 308.13 300.02 8.39 319.26 309.57 8.71
LSTM 295.86 291.63 6.27 332.83 325.23 7.00 348.36 339.02 7.34
Proposed 265.58 261.57 9.35 272.10 265.18 9.46 274.95 266.47 9.49
Note: A value in bold is the optimal index among all the models.
Table 9
Comparison of average RMSE, MAE and MAPE for M4 data.
Model Forecasting horizon:1 Forecasting horizon:7 Forecasting horizon:14
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LGKP 744.90 744.90 21.03 745.89 740.62 20.99 750.74 738.39 21.04
SVR 623.95 623.95 16.12 724.53 717.20 18.52 820.81 806.12 20.66
LSTM 676.57 676.57 14.63 720.35 713.72 15.69 771.82 756.07 16.81
Proposed 719.77 719.77 17.50 706.82 700.04 16.99 693.72 679.27 16.51
Note: A value in bold is the optimal index among all the models.
Table 10
Comparison of average RMSE, MAE and MAPE for M5 data.
Model Forecasting horizon:7 Forecasting horizon:14 Forecasting horizon:28
RMSE MAE MAPE(%) RMSE MAE MAPE(%) RMSE MAE MAPE(%)
LGKP 1.58 1.26 94.25 1.74 1.31 94.07 1.85 1.34 94.25
SVR 1.98 1.79 90.03 1.99 1.75 91.22 2.11 1.83 92.92
LSTM 1.49 1.25 85.45 1.63 1.30 85.54 1.74 1.34 86.41
Proposed 1.47 1.09 87.19 1.73 1.25 90.12 1.91 1.36 92.64
Note: A value in bold is the optimal index among all the models.
measure granule similarity and classify granules, we modified the inter-granule distance calculation method and proposed a DFCM clustering suitable for granules. Note that each granule is correlated with all clusters after clustering by DFCM, which means that there is some degree of similarity between the samples in each cluster. Therefore we propose a multilevel FARs optimization extraction based on DFCM algorithm. Unlike past rules, this association rule can dig deeper into the relationship that exists between forward and backward trends. On this basis, we propose a double-level optimal fuzzy association rules prediction model based on DFCM for short-term forecasting.

Finally, experiments show that our proposed model exhibits excellent performance on publicly available time series data. The model has demonstrated outstanding performance in forecasting demographics, meteorology, astronomy and electricity, assisting decision-makers in devising more precise plans and strategies, which in turn increases efficiency and reduces costs and risks.

We argue that the choice of association rules plays an important role in prediction. In the current model, the current trend is used as the antecedent rule for multilevel prediction. In the future, multilevel prediction with multi-order trends could also be considered, thus improving the efficiency of the selection process.

CRediT authorship contribution statement

Sidong Xian: Conceptualization, Formulation or evolution of overarching research goals and aims, Writing – review & editing, Project administration, Supervision, Funding acquisition. Chaozheng Li: Methodology, Creation of models, Programming, Software, Formal analysis, Writing – original draft. Miaomiao Feng: Programming, Formal analysis. Yonghong Li: Synthesize study data, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Acknowledgments

This work was supported by the Graduate Teaching Reform Research Program of Chongqing Municipal Education Commission, PR China (No. YJG212022, No. YKCSZ23121), the Science and Technology Project of Chongqing Market Supervision and Administration Bureau, PR China (No. CQSCJG202401), the Chongqing Research and Innovation Project of Graduate Students, PR China (No. CYS23436) and the Information Industry Cooperation Research Center project of CQUPT, PR China (No. P2023-55).

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

References

Cao, Z., Zhu, Y., Sun, Z., Wang, M., Zheng, Y., Xiong, P., et al. (2018). Improving prediction accuracy in LSTM network model for aircraft testing flight data. In 2018 IEEE international conference on smart cloud (pp. 7–12). IEEE.
Chang, Y.-C., Chen, S.-M., & Liau, C.-J. (2008). Fuzzy interpolative reasoning for sparse fuzzy-rule-based systems based on the areas of fuzzy sets. IEEE Transactions on Fuzzy Systems, 16(5), 1285–1301.
Chen, H., & Gao, X. (2020). A new time series similarity measurement method based on fluctuation features. Tehnički vjesnik, 27(4), 1134–1141.
Cheng, Y., Xing, W., Pedrycz, W., Xian, S., & Liu, W. (2023). NFIG-X: Non-linear fuzzy information granule series for long-term traffic flow time series forecasting. IEEE Transactions on Fuzzy Systems, http://dx.doi.org/10.1109/TFUZZ.2023.3261893.
Cheng, D., Yang, F., Xiang, S., & Liu, J. (2022). Financial time series forecasting with multi-modality graph neural network. Pattern Recognition, 121, Article 108218. http://dx.doi.org/10.1016/j.patcog.2021.108218.
Du, L., Gao, R., Suganthan, P. N., & Wang, D. Z. (2022). Bayesian optimization based dynamic ensemble for time series forecasting. Information Sciences, 591, 155–175. http://dx.doi.org/10.1016/j.ins.2022.01.010.
S.D. Xian et al. Expert Systems With Applications 251 (2024) 123959
Duan, L., Yu, F., Pedrycz, W., Wang, X., & Yang, X. (2018). Time-series clustering based on linear fuzzy information granules. Applied Soft Computing, 73, 1053–1067. http://dx.doi.org/10.1016/j.asoc.2018.09.032.
Fan, M., & Sharma, A. (2021). Design and implementation of construction cost prediction model based on SVM and LSSVM in industries 4.0. International Journal of Intelligent Computing and Cybernetics, 14(2), 145–157.
Fang, Z., Ma, X., Pan, H., Yang, G., & Arce, G. R. (2023). Movement forecasting of financial time series based on adaptive LSTM-BN network. Expert Systems with Applications, 213, Article 119207. http://dx.doi.org/10.1016/j.eswa.2022.119207.
Guajardo, J., Weber, R., & Crone, S. (2010). A study on the ability of support vector regression and neural networks to forecast basic time series patterns. International Federation for Information Processing Digital Library, 217(1), 149–157.
Guo, L., Li, L., Zhao, Y., & Zhao, Z. (2016). Pedestrian tracking based on CamShift with Kalman prediction for autonomous vehicles. International Journal of Advanced Robotic Systems, 13(3), 120.
Guo, H., Pedrycz, W., & Liu, X. (2018). Hidden Markov models based approaches to long-term prediction for granular time series. IEEE Transactions on Fuzzy Systems, 26(5), 2807–2817.
Han, Z., Zhao, J., Leung, H., Ma, K. F., & Wang, W. (2019). A review of deep learning models for time series prediction. IEEE Sensors Journal, 21(6), 7833–7848.
Hartama, D., & Anjelita, M. (2022). Analysis of silhouette coefficient evaluation with euclidean distance in the clustering method (case study: Number of public schools in Indonesia). Jurnal Mantik, 6(3), 3667–3677.
Júnior, D. S. d. O. S., de Oliveira, J. F., & de Mattos Neto, P. S. (2019). An intelligent hybridization of ARIMA with machine learning models for time series forecasting. Knowledge-Based Systems, 175, 72–86. http://dx.doi.org/10.1016/j.knosys.2019.03.011.
Khashei, M., Bijari, M., & Hejazi, S. R. (2012). Combining seasonal ARIMA models with computational intelligence techniques for time series forecasting. Soft Computing, 16, 1091–1105. http://dx.doi.org/10.1007/s00500-012-0805-9.
Li, S., Fang, H., & Liu, X. (2018). Parameter optimization of support vector regression based on sine cosine algorithm. Expert Systems with Applications, 91, 63–77. http://dx.doi.org/10.1016/j.eswa.2017.08.038.
Li, F., Tang, Y., Yu, F., Pedrycz, W., Liu, Y., & Zeng, W. (2021). Multilinear-trend fuzzy information granule-based short-term forecasting for time series. IEEE Transactions on Fuzzy Systems, 30(8), 3360–3372.
Li, F., Yang, H., Yu, F., Wang, F., & Wang, X. (2019). A one-factor granular fuzzy logical relationship based multi-point ahead prediction model. In 2019 IEEE 14th international conference on intelligent systems and knowledge engineering (pp. 1133–1138). IEEE, http://dx.doi.org/10.1109/ISKE47853.2019.9170339.
Lihong, D., & Qian, X. (2020). Short-term electricity price forecast based on long short-term memory neural network. Journal of Physics: Conference Series, 1453, Article 012103.
Liu, S., Xiang, W., & Elangovan, V. (2023). A GPS distance error forecast model based on IIR filter de-noising and LSTM. IEEE Transactions on Instrumentation and Measurement.
Lu, W., Pedrycz, W., Liu, X., Yang, J., & Li, P. (2014). The modeling of time series based on fuzzy information granules. Expert Systems with Applications, 41(8), 3799–3808.
Ma, X., Tao, Z., Wang, Y., Yu, H., & Wang, Y. (2015). Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C (Emerging Technologies), 54, 187–197. http://dx.doi.org/10.1016/j.trc.2015.03.014.
Mencar, C. (2005). Theory of fuzzy information granulation: Contributions to interpretability issues. University of BARI, 3–8.
Ncir, C.-E. B., Hamza, A., & Bouaguel, W. (2021). Parallel and scalable Dunn index for the validation of big data clusters. Parallel Computing, 102, Article 102751.
Niu, Z., Yu, Z., Tang, W., Wu, Q., & Reformat, M. (2020). Wind power forecasting using attention-based gated recurrent unit network. Energy, 196, Article 117081. http://dx.doi.org/10.1016/j.energy.2020.117081.
Okutani, I., & Stephanedes, Y. J. (1984). Dynamic prediction of traffic volume through Kalman filtering theory. Transportation Research, Part B (Methodological), 18(1), 1–11.
Qian, F., & Chen, X. (2019). Stock prediction based on LSTM under different stability. In 2019 IEEE 4th international conference on cloud computing and big data analysis (pp. 483–486). IEEE.
Qiao, S.-J., Han, N., Zhu, X.-W., Shu, H.-P., Zheng, J.-L., & Yuan, C.-A. (2018). A dynamic trajectory prediction algorithm based on Kalman filter. Acta Electronica Sinica, 46(2), 418.
Reza, R. Z., & Pulugurtha, S. S. (2019). Forecasting short-term relative changes in travel time on a freeway. Case Studies on Transport Policy, 7(2), 205–217.
Saha, A., & Das, S. (2018). Stronger convergence results for the center-based fuzzy clustering with convex divergence measure. IEEE Transactions on Cybernetics, 49(12), 4229–4242.
Singh, S., Mohapatra, A., et al. (2019). Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renewable Energy, 136, 758–768. http://dx.doi.org/10.1016/j.renene.2019.01.031.
Soon, K. L., Lim, J. M.-Y., & Parthiban, R. (2019). Extended pheromone-based short-term traffic forecasting models for vehicular systems. Engineering Applications of Artificial Intelligence, 82, 60–75. http://dx.doi.org/10.1016/j.engappai.2019.03.017.
Swaraj, A., Verma, K., Kaur, A., Singh, G., Kumar, A., & de Sales, L. M. (2021). Implementation of stacking based ARIMA model for prediction of Covid-19 cases in India. Journal of Biomedical Informatics, 121, Article 103887. http://dx.doi.org/10.1016/j.jbi.2021.103887.
Tian, F. (2019). Autoregressive moving average model based relationship identification between exchange rate and export trade. Cluster Computing, 22, 4971–4977. http://dx.doi.org/10.1007/s10586-018-2448-9.
Tian, Y., Zhang, K., Li, J., Lin, X., & Yang, B. (2018). LSTM-based traffic flow prediction with missing data. Neurocomputing, 318, 297–305. http://dx.doi.org/10.1016/j.neucom.2018.08.067.
Wang, W., Liu, W., & Chen, H. (2020). Information granules-based BP neural network for long-term prediction of time series. IEEE Transactions on Fuzzy Systems, 29(10), 2975–2987.
Wang, L., Liu, X., Pedrycz, W., & Shao, Y. (2014). Determination of temporal information granules to improve forecasting in fuzzy time series. Expert Systems with Applications, 41(6), 3134–3142.
Wang, W., Pedrycz, W., & Liu, X. (2015). Time series long-term forecasting model based on information granules and fuzzy clustering. Engineering Applications of Artificial Intelligence, 41, 17–24. http://dx.doi.org/10.1016/j.engappai.2015.01.006.
Wu, F., Cattani, C., Song, W., & Zio, E. (2020). Fractional ARIMA with an improved cuckoo search optimization for the efficient short-term power load forecasting. Alexandria Engineering Journal, 59(5), 3111–3118.
Xian, S., Feng, M., & Cheng, Y. (2023). Incremental nonlinear trend fuzzy granulation for carbon trading time series forecast. Applied Energy, 352, Article 121977. http://dx.doi.org/10.1016/j.apenergy.2023.121977.
Yang, H., Huang, K., King, I., & Lyu, M. R. (2009). Localized support vector regression for time series prediction. Neurocomputing, 72(10–12), 2659–2669.
Yang, Z., Jiang, S., Yu, F., Pedrycz, W., Yang, H., & Hao, Y. (2022). Linear fuzzy information-granule-based fuzzy 𝐶-means algorithm for clustering time series. IEEE Transactions on Cybernetics, http://dx.doi.org/10.1109/TCYB.2022.3184999.
Yang, X., Yu, F., & Pedrycz, W. (2017). Long-term forecasting of time series based on linear fuzzy information granules and fuzzy inference system. International Journal of Approximate Reasoning, 81, 1–27. http://dx.doi.org/10.1016/j.ijar.2016.10.010.
Yin, J., Si, Y.-W., & Gong, Z. (2011). Financial time series segmentation based on turning points. In Proceedings 2011 international conference on system science and engineering (pp. 394–399). IEEE, http://dx.doi.org/10.1109/ICSSE.2011.5961935.
Zadeh, L. A. (1979). Fuzzy sets and information granularity. Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers, 433–448.
Zadeh, L. A. (1997). Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, 90(2), 111–127.
Zha, W., Liu, Y., Wan, Y., Luo, R., Li, D., Yang, S., et al. (2022). Forecasting monthly gas field production based on the CNN-LSTM model. Energy, Article 124889.
Zhang, Z., & Dong, Y. (2020). Temperature forecasting via convolutional recurrent neural networks based on time-series data. Complexity, 2020, 1–8. http://dx.doi.org/10.1155/2020/3536572.
Zhao, J., Han, Z., Pedrycz, W., & Wang, W. (2015). Granular model of long-term prediction for energy system in steel industry. IEEE Transactions on Cybernetics, 46(2), 388–400.