Time Series Forecasting Using Clustering With Periodic Pattern
Jan Kostrzewa
Instytut Podstaw Informatyki Polskiej Akademii Nauk, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland
Keywords: Time Series, Forecasting, Data Mining, Subseries, Clustering, Periodic Pattern.
Abstract: Time series forecasting has attracted a great deal of attention from various research communities. One of
the methods that improve the accuracy of forecasting is time series clustering. The contribution of this work
is a new clustering method which relies on finding a periodic pattern by splitting the time series into two
subsequences (clusters) with a lower potential prediction error than the whole series. Having such subsequences,
we predict their values separately, with methods customized to the specificities of each subsequence, and then
merge the results according to the pattern to obtain the prediction of the original time series. In order to check the efficiency
of our approach we perform an analysis of various artificial data sets. We also present a real data set for which
the application of our approach gives a more than 300% improvement in prediction accuracy. We show that for
artificially created series we obtain an even more pronounced accuracy improvement. Additionally, our approach
can be used for noise filtering. In this work we consider noise with a periodic repetitive pattern, and we present a
simulation in which we recover the correct series from data where 50% of the elements are random noise.
3.4 Time Series Forecasting

In order to predict the value of $x_{t+1}$ we predict the value of $b_{t+1}$ in the series $S$ (refer to the pseudo-code in Table 2). Then, if $b_{t+1} = 1$, we take the prediction $x^1_{t+1}$ calculated on the subsequence $X^1$; otherwise we choose the prediction $x^0_{k+1}$ calculated on the subsequence $X^0$, where $X^0$ is the subsequence complementary to $X^1$.

4 COMPLEXITY OF THE PROPOSED APPROACH

Our goal is to prove that clustering with our approach has time complexity equal to $O(\log_k n \cdot \mathrm{MSE}(n))$, where $\mathrm{MSE}$ is an arbitrarily chosen prediction function, $k$ is a constant multiplicity parameter and $n$ is the length of the series. We assume that the prediction function has complexity not less than $O(n)$. Firstly, we determine the complexities of every part of the algorithm.

4.1 Time Complexity of the Algorithm Which Creates the Corresponding Subsequence

The algorithm which creates the vectors $X^1$ and $X^0$ from the original series using the pattern $S$ is described in Section 3.1. The complexity of that algorithm is $O(n)$.

4.2 Time Complexity of the Algorithm Which Extends the Binary Vector

The algorithm which extends the binary vector (refer to the pseudo-code in Table 2) consists of two parts. Firstly, we have to create a vector dictionary which is able to extend the binary sequence. We notice that the maximal number of vectors in the dictionary cannot be larger than $2^d$, where $d$ is the vector length. At the same time, however, the dictionary cannot contain more than $c$ elements, where $c$ is the length of the vector on which we build the dictionary. Due to that, we can say that in every step the dictionary size is not larger than $\min(2^d, c)$. Moreover, we know that the algorithm will produce not more than $c$ such dictionaries, so the number of operations is equal to

$$\sum_{i=1}^{c} \min(2^d, c) \le c \cdot c = c^2 \qquad (2)$$

and we can say that the time complexity of that algorithm is $O(c^2)$. The other part of the algorithm extends the binary vector using the created dictionary. Importantly, we create the dictionary only once and then use it during the whole process of clustering. Finding the proper vector in the dictionary costs not more than $O(\log_2 c)$. This is why extending the binary vector by $n$ elements costs $O(n \log_2 c)$.

4.3 Time Complexity of the Algorithm Which Finds the Best Pattern S

The algorithm which finds the best pattern $S$ is described in Subsection 3.3. We choose some arbitrary length $c$ of the first subsequence and a multiplicity parameter $k$. We start with a subsequence of length $c$ and then in every step we extend this subsequence $k$ times. We also discard all $S$ proposals except the fraction $1/k$ of them with the lowest corresponding MSE. The number of operations can be approximated by:

$$2^{c-1}\mathrm{MSE}(c) + 2^{c-1}c^2 + c\log_2(c) + \left\lceil \frac{1}{k}\,2^{c-1} \right\rceil \mathrm{MSE}(kc) + kc\log_2(c) + \left\lceil \left(\frac{1}{k}\right)^{2} 2^{c-1} \right\rceil \mathrm{MSE}(k^2 c) + k^2 c\log_2(c) + \dots + \left\lceil \left(\frac{1}{k}\right)^{\log_k(n/c)} 2^{c-1} \right\rceil \mathrm{MSE}(n) \qquad (3)$$

which is equal to

$$2^{c-1}c^2 + \sum_{i=0}^{\log_k(n/c)} \left\lceil \left(\frac{1}{k}\right)^{i} 2^{c-1} \right\rceil \left( \mathrm{MSE}(k^i c) + k^i c \log_2 c \right) \qquad (4)$$

where $2^{c-1}$ is the number of all proposals of $S$ generated in the first step, $c^2$ is the maximal cost of creating the binary vector dictionary (refer to Section 4.2), $\log_k(n/c)$ is the maximal number of steps after which the length of $S$ reaches $n$, $\mathrm{MSE}(k^i c)$ is the cost of approximating the prediction error at every step for every $S$ proposal, and $k^i c \log_2 c$ is the cost of extending the binary vector $S$ $k$ times. Taking this equation into account, we can say that the number of operations in our approach is definitely smaller than

$$2^c c^2 + \log_k\left(\frac{n}{c}\right) 2^c \left( \mathrm{MSE}(n) + n \log_2 c \right) \qquad (5)$$

After taking into consideration that the complexity of $\mathrm{MSE}(n)$ is not less than $O(n)$ and omitting constants, we can say that the complexity of our approach with respect to $n$ is equal to

$$O(\log_k n \cdot \mathrm{MSE}(n)) \qquad (6)$$

In contrast, the time complexity with respect to $c$ is equal to

$$O(2^c) \qquad (7)$$
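To make the forecasting step of Section 3.4 concrete, the following minimal Python sketch splits a series into the two clusters according to a given pattern and routes the one-step-ahead forecast through the cluster selected by the predicted pattern bit. The helper names and the trivial mean predictor are illustrative assumptions, not the pseudo-code of Table 2; any prediction method can be plugged in.

# Sketch of the forecasting step from Section 3.4 (illustrative names,
# trivial mean predictor; the paper's pseudo-code in Table 2 may differ).

def split_by_pattern(series, pattern):
    # Split `series` into cluster X1 (pattern bit 1) and cluster X0 (bit 0).
    x1 = [v for v, b in zip(series, pattern) if b == 1]
    x0 = [v for v, b in zip(series, pattern) if b == 0]
    return x0, x1

def predict_next(subseq):
    # Placeholder predictor: simply the mean of the subsequence. In the
    # paper, any method (e.g. a neural network) can be used here instead.
    return sum(subseq) / len(subseq)

def forecast(series, pattern, next_bit):
    # Predict x_{t+1}: use the prediction made on X1 if the predicted
    # pattern bit b_{t+1} is 1, otherwise the prediction made on X0.
    x0, x1 = split_by_pattern(series, pattern)
    return predict_next(x1) if next_bit == 1 else predict_next(x0)

# Usage: a series alternating a constant level (bit 1) with noise (bit 0).
series = [5.0, 0.3, 5.1, 0.9, 4.9, 0.1, 5.0, 0.7]
pattern = [1, 0, 1, 0, 1, 0, 1, 0]
print(forecast(series, pattern, next_bit=1))  # about 5.0, predicted on X1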
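The pattern search analysed in Section 4.3 can likewise be sketched in a few lines of Python, under stated assumptions: the $2^{c-1}$ initial proposals are obtained by fixing the first pattern bit to 1 (to skip mirror duplicates), survivors are extended periodically (tiled) to $k$ times their length, and a per-cluster mean deviation stands in for the MSE of an arbitrary prediction method. The actual algorithm of Subsection 3.3 may differ in detail.

from itertools import product
from math import ceil

def split_by_pattern(series, pattern):
    x1 = [v for v, b in zip(series, pattern) if b == 1]
    x0 = [v for v, b in zip(series, pattern) if b == 0]
    return x0, x1

def mse_of_split(series, pattern):
    # Proxy for the paper's MSE(.): mean squared deviation of each cluster
    # around its own mean (any real predictor could be plugged in here).
    err = 0.0
    for cluster in split_by_pattern(series, pattern):
        if len(cluster) > 1:
            mean = sum(cluster) / len(cluster)
            err += sum((v - mean) ** 2 for v in cluster) / len(cluster)
    return err

def find_pattern(series, c=4, k=2):
    n = len(series)
    # all 2^(c-1) initial proposals of length c (first bit fixed to 1)
    proposals = [(1,) + bits for bits in product((0, 1), repeat=c - 1)]
    length = c
    while True:
        scored = sorted(proposals, key=lambda p: mse_of_split(series[:length], p))
        if length >= n:
            return scored[0]
        keep = max(1, ceil(len(scored) / k))   # keep the best ~1/k fraction
        length = min(length * k, n)
        # extend every survivor periodically (tile it) to the new length
        proposals = [tuple(p[i % len(p)] for i in range(length))
                     for p in scored[:keep]]

series = [5.0, 0.3, 5.1, 0.9, 4.9, 0.1, 5.0, 0.7] * 4
print(find_pattern(series, c=4, k=2))  # recovers the periodic pattern (1, 0, 1, 0, ...)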
Table 1: Plots of the subsequences $X^0$ and $X^1$ obtained after clustering different time series with our approach.
Time series: IceTargets with random noise. Pattern proposed by our approach: S = (101010101010101010101001...).
(Plots shown: the original time series, the subsequence $X^0$ found by our approach, and the subsequence $X^1$ found by our approach.)
5.4 Series Predicted With Different Methods

In all previous simulations we used our approach to split the time series into subsequences and then predicted their values with the same method, a neural network. However, our approach also makes it possible to use completely different prediction methods for each subsequence. Due to that, we can choose the methods according to the specific prediction properties of each subsequence and take advantage of both of them.
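As a minimal sketch of this mixed-method setup (with assumed toy data and off-the-shelf scikit-learn models, not the exact configuration of the experiment below), one can fit a linear regression to the linearly growing cluster and a small neural network to the stationary one:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
trend = np.arange(1, 41, dtype=float)        # stand-in for the linear series
flat = rng.normal(5.0, 0.1, size=40)         # stand-in for IceTargets-like data

# merge the two series with the pattern S = (1, 1, 0, 0, 1, 1, 0, 0, ...)
S = np.tile([1, 1, 0, 0], 20)
series = np.empty(80)
series[S == 1] = trend
series[S == 0] = flat
x1, x0 = series[S == 1], series[S == 0]

def lagged(x, p=3):
    # turn a subsequence into (last p values -> next value) training pairs
    X = np.array([x[i:i + p] for i in range(len(x) - p)])
    return X, x[p:]

X1, y1 = lagged(x1)
X0, y0 = lagged(x0)
lin = LinearRegression().fit(X1, y1)             # suits the linear cluster
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                   random_state=0).fit(X0, y0)   # suits the stationary cluster

# one-step-ahead forecast per cluster; the pattern S decides which of the
# two predictions becomes the forecast of the original series.
print(lin.predict(x1[-3:].reshape(1, -1)))   # about the next trend value, 41
print(net.predict(x0[-3:].reshape(1, -1)))   # about 5.0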
Table 2: Comparison with other methods for time series, based on MSE.

                         IceTargets merged with noise   IceTargets merged with cos   QuarterlyGrossFarmProduct
Our approach             0.62                           0.0170                       0.007
Single neural network    0.93                           0.5144                       0.0211
Efficiency increase      1.5 times                      30.25 times                  3.014 times
To show that this is possible, we merged two series with completely different prediction properties into one time series. We chose a simple series which grows linearly with time, and the statistical data set IceTargets, whose expected value does not seem to change over time. We merged them with the pattern:

X = (1, 2, IceTargets(1), IceTargets(2), 3, 4, IceTargets(3), IceTargets(4), 5, 6, IceTargets(5), IceTargets(6), ...)

The pattern is described by the vector S = (110011001100110011001100...). We use our approach, which splits the time series into two subsequences (see Table 1). To predict $X^1$ we use linear regression, and to predict $X^0$ we use a neural network. Thanks to that, we exploit the advantages of both methods and get MSE = 0.0101. When using a single neural network we get MSE = 2535.45, and when using only a single linear regression, MSE = 30.35 (refer to Table 3). Our approach thus yields a prediction error over 250,000 times smaller than using only a neural network and 3,000 times smaller than using only linear regression.

... significant improvement of accuracy. Moreover, we have proven that the generated overhead is asymptotically logarithmic with respect to the time series length. The low computational overhead of our approach suggests that it can be useful regardless of the time series length. Moreover, the algorithm can be parallelized, and therefore we can decrease the computation time by implementing it on multiple processors.

Our solution opens up broad prospects for further work. First of all, our approach uses strict partitioning clustering, where every element belongs to exactly one cluster. Future research may design and examine our approach with overlapping clustering, where a single element may belong to many clusters. The efficiency of our approach with such a modification should be investigated on real data. Another open question is the influence of the choice of the maximal searched pattern period and the minimal acceptable subseries length on the prediction efficiency of our approach. A further area of future research could be the design and implementation of an automated method of selecting different prediction methods for the proposed subseries.