
Time Series Forecasting using Clustering with Periodic Pattern

Jan Kostrzewa
Instytut Podstaw Informatyki Polskiej Akademii Nauk, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland

Keywords: Time Series, Forecasting, Data Mining, Subseries, Clustering, Periodic Pattern.

Abstract: Time series forecasting has attracted a great deal of attention from various research communities. One of
the methods which improves the accuracy of forecasting is time series clustering. The contribution of this work
is a new method of clustering which relies on finding a periodic pattern by splitting the time series into two
subsequences (clusters) with a lower potential error of prediction than the whole series. Having such subsequences,
we predict their values separately with methods customized to the specificities of the subsequences and then
merge the results according to the pattern, obtaining a prediction of the original time series. In order to check the efficiency
of our approach we perform an analysis of various artificial data sets. We also present a real data set for which
the application of our approach gives more than a 300% improvement in prediction accuracy. We show that on
artificially created series we obtain an even more pronounced accuracy improvement. Additionally, our approach
can be used for noise filtering. In this work we consider noise with a periodic repetitive pattern and we present a
simulation where we recover the correct series from data in which 50% of the elements are random noise.

1 INTRODUCTION

Time series forecasting is a rich and dynamically growing scientific field and its methods are applied in numerous areas such as medicine, economics, finance, engineering and many other crucial fields [(Huanmei Wu, 2005), (Zhang, 2007), (Zhang, 2003), (Tong, 1983)]. Currently there are many popular and well developed methods of time series forecasting, such as ARIMA models, Neural Networks or Fuzzy Cognitive Maps [(S. Makridakis, 1997), (J. Han, 2003), (Song and Miao, 2010)]. Clustering is the process of grouping into one cluster elements "by some natural criterion of similarity" (Duda and Hart, 1973). This vague definition is one of the reasons why there are so many different clustering algorithms (Estivill-Castro, 2002). Although different clustering methods group elements according to completely different criteria of similarity, there always has to be a mathematically defined similarity metric. Every algorithm using this metric groups together elements which are closer to each other than to those in other clusters. A classical example of the use of time series clustering is the classification of the ECG of a particular patient into a cluster of normal or dysfunctional ECGs. Another type of time series clustering is presented in partition methods such as the SAX algorithm (Jessica Lin, 2007). The goal of this type of algorithm is the discretization of numerical data, which exposes some of its features and compresses the data at the same time. However, the use of knowledge gained by clustering in time series forecasting is very limited. This results from the simple fact that even if we are able to group elements into clusters with specific forecasting properties, we do not know to which clusters future elements will belong.

We would like to bypass this problem and present a usage of time series clustering for time series forecasting. Our assumption is that there exists a periodic pattern in the time series based on which we are able to create a subsequence with a much lower potential error of prediction than the whole series. Elements which are not included in the chosen subsequence are grouped into a second subsequence. Due to the periodicity of the pattern we can assume to which cluster future elements should belong. Because of that we are able to predict the values of every subsequence separately and then merge them according to the periodic pattern to get a prediction of the original series X. The main problem with this idea is that the number of possible periodic patterns increases exponentially with the time series length. This means that in practice evaluating the potential error for every periodic pattern is impossible, but using our approach we can find a proposal of the best pattern in reachable time.

This paper is organized as follows. Section 2 reviews related work. The proposed approach is described in detail in Section 3. In Section 4 we estimate the complexity overhead of our approach with respect to the time series length. Simulations on different series are presented in Section 5. The last section, Section 6, concludes the paper.


2 RELATED WORKS

In the book "Data Mining: Concepts and Techniques" (J. Han, 2001), five major categories of clustering are discussed: partitioning methods (for example the k-means algorithm (MacQueen, 1967)), hierarchical methods (for example the Chameleon algorithm (G. Karypis, 1999)), density-based methods (for example the DBSCAN algorithm (M. Ester, 1996)), grid-based methods (for example the STING algorithm (W. Wang, 1997)) and model-based methods (for example the AutoClass algorithm (P. Cheeseman, 1996)). The main property of all of these categories is that they group into one cluster elements from one interval, in contrast to our approach, which groups into one cluster elements scattered across the whole time series. Another type of clustering is Hybrid Dimensionality Reduction and Extended Hybrid Dimensionality Reduction [(Moon S, 2012), (S. Uma, 2012)]. These methods consist of clustering all elements with a specific type of value. The algorithm can group into one cluster elements scattered across the whole time series, however it does not suggest a pattern. Because of that we are not able to assume to which cluster future elements should belong, which is a significant difference between our approach and the methods described above.
3 THE PROPOSED APPROACH

The main idea of our approach is to find a pattern S, which is a periodic binary vector, and according to it split the time series X into two subsequences (clusters): X^1 and its complement X^0. After that we apply prediction methods to the X^1 and X^0 subsequences separately. Then we merge the results according to the pattern S and obtain a prediction of the original time series. As the measurement of error we use the mean square error (MSE):

    MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (1)

where n is the number of all predicted values, y_i is the real value and ŷ_i is the predicted value. The MSE can be treated as the similarity measure according to which we group elements into clusters. Because every element belongs to exactly one subsequence, we can say that our approach uses strict partitioning clustering. In order to describe our approach in more detail, we split the algorithm into simple functions and describe them separately.
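Before describing those functions, the error measure from Equation (1) can be written directly in Python; this is only a convenience helper (the name mse is ours) reused by the later sketches.

    import numpy as np

    def mse(y_true, y_pred):
        # mean square error over all predicted values, as in Equation (1)
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.mean((y_true - y_pred) ** 2)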
3.1 Create Corresponding Subsequence

We have a time series X = (x_1, x_2, ..., x_n) and a binary vector S = (b_1, b_2, ..., b_n). We create a subseries X^1 = (x^1_1, x^1_2, ..., x^1_j) which contains all elements x_i such that the corresponding b_i is equal to 1. Analogously, we create a subseries X^0 which contains all elements x_i such that the corresponding b_i is equal to 0. For example:
X = (x_1, x_2, ..., x_n), S = (1, 1, 0, 0, 1, 1, 0, 0, ..., 0, 0, 1)
X^1 = (x_1, x_2, x_5, x_6, ..., x_n), X^0 = (x_3, x_4, ..., x_{n-2}, x_{n-1})
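As an illustration, this splitting step can be written in a few lines of Python. It is only a sketch of the subsequence creation described above; the function name split_by_pattern is ours, not the paper's.

    import numpy as np

    def split_by_pattern(x, s):
        # x: time series values, s: binary pattern of the same length
        x = np.asarray(x)
        s = np.asarray(s)
        x1 = x[s == 1]   # subsequence X^1: elements whose pattern bit is 1
        x0 = x[s == 0]   # subsequence X^0: the complementary elements
        return x1, x0

    # example from the text: S = (1, 1, 0, 0, 1, 1, 0, 0, ...)
    x = np.arange(1, 13)
    s = np.tile([1, 1, 0, 0], 3)
    x1, x0 = split_by_pattern(x, s)   # x1 = [1 2 5 6 9 10], x0 = [3 4 7 8 11 12]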
3.2 Extend Binary Vector

The first step needed to extend the binary vector is the creation of a vector dictionary. A pseudo-code definition of creating the vector dictionary is given in Figure 1.

create_dictionary: gets the binary vector S which we would like to extend. Initially we set the variable d to 1 and create an empty dictionary set. We take the vector which is the interval of length d+1 starting from the first position. We check if the dictionary contains a vector which coincides with our vector on the first d positions but differs on the (d+1)-th position. If such a vector occurs, it means that our dictionary words are too short to predict the binary vector S unequivocally, so we clear the dictionary, increase d by 1 and repeat the whole process from the first position. Otherwise, if the dictionary does not contain the vector, we add it to the dictionary. Then we increase the interval starting position by 1 and repeat the whole process until the end of the interval exceeds the end of the S vector. The function returns the dictionary when the end of the interval exceeds the end of the S vector.

When we have the dictionary we can start extending the S vector. A pseudo-code definition of extending the binary vector is given in Figure 2.

extend_binary_vector: gets the binary vector S which we would like to extend and the expected length value. Thanks to the function create_dictionary we have a dictionary for the binary vector S. Let d be the length of every vector in that dictionary and n be the length of the S vector. In the dictionary we try to find a vector which on every position but the last is equal to S(b_{n-d+2}, ..., b_n). If there is no such vector, we extend S by a random binary number. Otherwise we extend S by the last value of the vector found in the dictionary. We repeat this process until the length of S reaches the expected new length.
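For readers who prefer working code, the two routines above can be sketched in Python as follows. This is a minimal reimplementation of the idea, assuming the pattern is a list of 0/1 values; the names mirror the pseudo-code in Figures 1 and 2, but the implementation details (a dict of tuples, the restart logic) are ours.

    import random

    def create_dictionary(s):
        # map every observed window of length d to the bit that follows it;
        # grow d until the mapping is unambiguous over the whole pattern
        d = 1
        while d < len(s):
            dictionary = {}
            unambiguous = True
            for i in range(len(s) - d):
                key, nxt = tuple(s[i:i + d]), s[i + d]
                if key in dictionary and dictionary[key] != nxt:
                    unambiguous = False   # same prefix, different continuation -> d too short
                    break
                dictionary[key] = nxt
            if unambiguous:
                return d, dictionary
            d += 1
        return d, {}

    def extend_binary_vector(s, new_length):
        s = list(s)
        d, dictionary = create_dictionary(s)
        while len(s) < new_length:
            key = tuple(s[-d:])
            # continue with the stored bit, or a random one if the context is unseen
            s.append(dictionary.get(key, random.randint(0, 1)))
        return s

    # the pattern from Section 3.1 extended to length 16
    print(extend_binary_vector([1, 1, 0, 0, 1, 1, 0, 0], 16))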
3.3 Find Proposition of Best Pattern S

We have to find a binary vector which:
1. Splits the vector X into a subsequence X^1 and its complement X^0 such that the MSE of those subsequences is lower than the MSE of the time series X.
2. Contains regularity such that it is possible to correctly predict new values of the binary vector.


FUNCTION create_dictionary(S)
  d = 1
  dict = []
  i = 1
  WHILE i <= (size(S)-d)
    window = S(i:i+d)
    IF ismember(window(1:end-1), dict(:,1:end-1))
       && !ismember(window, dict)
      d = d+1
      i = 0
      dict = []
    ELSE
      dict = dict.add_new_row(window)
    ENDIF
    i = i+1
  ENDWHILE
  RETURN dict

Figure 1: Pseudo-code of the algorithm which creates the vector dictionary for pattern S.

FUNCTION extend_binary_vector(S, new_length)
  dict = create_dictionary(S)
  i = size(S)-length_of_row(dict)+1
  WHILE length(S) < new_length
    small_win = S(i:i+length_of_row(dict)-1)
    index = index_of_element(small_win, dict(:,1:end-1))
    IF index > 0
      S.add(dict.elementAt(index).elementAt(end))
    ELSE
      S.add(randomly_0_or_1())
    ENDIF
    i = i+1
  ENDWHILE
  RETURN S

Figure 2: Pseudo-code of the algorithm which extends the binary vector to a given length.
A pseudo-code definition of the algorithm for finding a proposal of the best pattern binary vector is given in Figure 3.

FUNCTION find_best_subsequence(time_series, c, k)
  S = cob(c)  //cob returns all binary combinations
              //of length c with 1 on at least c/2 positions
  FOR i=1; i++; i<=number_of_rows(S)
    X1(i,:) = create_subseq(S(i,:), time_series)
    Xtrain = X1(1:0.7*size(X))
    Xtest  = X1(0.7*size(X):end)
    MSE = chosen_prediction_method(Xtrain, Xtest)
    S(i,end+1) = MSE
  ENDFOR
  S = sort_ascending_by_last_column(S)
  S = S(1:ceiling(end/k),:)
  WHILE c < size(time_series)
    c = k*c
    IF c > size(time_series)
      c = size(time_series)
    ENDIF
    FOR j=1; j++; j<=number_of_rows(S)
      S(j,:) = extend_binary_vector(S(j,:), c)
      X(j,:) = create_subseq(S(j,:), time_series)
      Xtrain = X(1:0.7*size(X))
      Xtest  = X(0.7*size(X):end)
      MSE = chosen_prediction_method(Xtrain, Xtest)
      S(j,end+1) = MSE
    ENDFOR
    S = sort_ascending_by_last_column(S)
    S = S(1:ceiling(end/k),:)
  ENDWHILE
  //return S with lowest MSE
  RETURN S(1,:)

Figure 3: Pseudo-code of the algorithm which finds a proposal of the best subsequence.
find_best_subsequence: gets a time series and the arbitrarily chosen constants c and the multiplicity number k. We create all possible different binary vectors of length c such that 1s appear on at least ⌈c/2⌉ positions. We save these vectors as the rows of a matrix S. This means that S has m rows, where for odd c we get m = 2^{c-1} and for even c we get m = 2^{c-1} + \frac{1}{2}\binom{c}{c/2}. For every i-th row of the S matrix we create the vector X^1_i as described in Section 3.1. Every subsequence X^1_i is split in such a way that 0.7 of that series is the training set X^1_{i,train} and 0.3 is the test set X^1_{i,test}, where 0.7 and 0.3 are arbitrarily chosen constants. Then, using an arbitrarily chosen prediction method, we calculate the MSE of prediction. It is worth noting that we can create m processes and calculate the MSE for the vectors X^1_1, X^1_2, ..., X^1_m in parallel. Parallel computing can in practice significantly decrease the computation time. The number of possible S subsequences increases exponentially with c. This is why this is the most time-consuming part of the algorithm.

At this point of the algorithm we have a set of pairs (S_1, MSE_1), (S_2, MSE_2), (S_3, MSE_3), ..., (S_m, MSE_m). We then reject all rows of S except the ⌈m/k⌉ rows with the lowest MSE. We extend the remaining rows of S using the function extend_binary_vector (refer to the pseudo-code in Figure 2) to get binary vectors of length k·c. Now we have a set S_1, S_2, ..., S_{⌈m/k⌉} where every row of S has length k·c. For every row S_i we create the vector X^1_i as described in Section 3.1. We repeat the process of calculating the MSE, selecting rows and extending the length of the rows of S while the length does not exceed the training set length. As the result we return the row of S with the lowest corresponding MSE.
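The size of the initial candidate set can be verified numerically. The short Python snippet below (not part of the paper's algorithm) enumerates all binary vectors of length c with at least ⌈c/2⌉ ones and compares the count with the closed form for m given above.

    from itertools import product
    from math import comb, ceil

    def count_candidates(c):
        # brute-force count of binary vectors of length c with at least ceil(c/2) ones
        return sum(1 for bits in product((0, 1), repeat=c)
                   if sum(bits) >= ceil(c / 2))

    for c in (5, 6, 12):
        closed_form = 2 ** (c - 1) + (comb(c, c // 2) // 2 if c % 2 == 0 else 0)
        print(c, count_candidates(c), closed_form)
    # for c = 12 (the value used in Section 5) both give 2510 candidate patterns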


3.4 Time Series Forecasting

In order to predict the value of x_{t+1} we predict the value of b_{t+1} in the S series (refer to the pseudo-code in Figure 2). Then, if b_{t+1} = 1, we take the prediction x^1_{t+1} calculated on the subsequence X^1; otherwise we take the prediction x^0_{t+1} calculated on the subsequence X^0, where X^0 is the subsequence of X complementary to X^1.
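This merging step can be sketched as follows. Assuming we already have per-subsequence forecasts and the extended pattern, the original series is reassembled by interleaving them; the helper name merge_by_pattern is ours.

    def merge_by_pattern(pred_x1, pred_x0, s_future):
        # pred_x1 / pred_x0: forecasts for the two subsequences, in order
        # s_future: predicted pattern bits for the forecast horizon
        it1, it0 = iter(pred_x1), iter(pred_x0)
        return [next(it1) if b == 1 else next(it0) for b in s_future]

    # e.g. with pattern bits (1, 0, 1, 0) the merged forecast alternates
    print(merge_by_pattern([10.0, 11.0], [0.5, 0.4], [1, 0, 1, 0]))
    # -> [10.0, 0.5, 11.0, 0.4]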
4 COMPLEXITY OF THE PROPOSED APPROACH

Our goal is to prove that clustering with our approach has time complexity equal to O(\log_k(n) \cdot MSE(n)), where MSE is the arbitrarily chosen prediction function, k is the constant multiplicity parameter and n is the length of the series. We assume that the prediction function has complexity not less than O(n). Firstly we determine the complexities of every part of the algorithm.

4.1 Time Complexity of the Algorithm Which Creates the Corresponding Subsequence

The algorithm which creates the vectors X^1 and X^0 from the original series using the pattern S is described in Section 3.1. The complexity of that algorithm is O(n).

4.2 Time Complexity of the Algorithm Which Extends the Binary Vector

The algorithm which extends the binary vector (refer to the pseudo-code in Figure 2) contains two parts. Firstly we have to create the vector dictionary which is able to extend the binary sequence. We notice that the maximal number of vectors in the dictionary cannot be larger than 2^d, where d is the vector length. At the same time the dictionary cannot contain more than c elements, where c is the length of the vector on which we build the dictionary. Due to that we can say that at every step the dictionary length is not larger than min(2^d, c). Moreover, we know that the algorithm will produce not more than c such dictionaries. The number of operations is therefore

    \sum_{i=1}^{c} \min(2^d, c) \cdot c < c^2    (2)

so we can say that the time complexity of that algorithm is O(c^2). The other part of the algorithm is extending the binary vector using the created dictionary. Importantly, we create the dictionary only once and then use it during the whole process of clustering. Finding the proper vector in the dictionary costs not more than O(\log_2 c). This is why extending the binary vector by n elements costs O(n \log_2 c).
4.3 Time Complexity of the Algorithm Which Finds the Best Pattern S

The algorithm which finds the best pattern S is described in Section 3.3. We choose some arbitrary length of the first subsequence c and the multiplicity parameter k. We start with a subsequence of length c and then in every step we extend this subsequence k times. We also remove all S proposals except the 1/k fraction of them with the lowest corresponding MSE. The number of operations can be approximated by

    2^{c-1} MSE(c) + 2^{c-1} c^2 + c\log_2(c)
    + \lceil \tfrac{1}{k} 2^{c-1} \rceil MSE(kc) + kc\log_2(c)
    + \lceil (\tfrac{1}{k})^2 2^{c-1} \rceil MSE(k^2 c) + k^2 c\log_2(c)
    + \ldots + \lceil (\tfrac{1}{k})^{\log_k(n/c)} 2^{c-1} \rceil MSE(n)    (3)

which is equal to

    2^{c-1} c^2 + \sum_{i=0}^{\log_k(n/c)} \lceil (\tfrac{1}{k})^i 2^{c-1} \rceil \left( MSE(k^i c) + k^i c \log_2 c \right)    (4)

where 2^{c-1} is the number of all proposals of S generated in the first step, c^2 is the maximal cost of creating the binary vector dictionary (refer to Section 4.2), \log_k(n/c) is the maximal number of steps after which the length of S reaches n, MSE(k^i c) is the cost of approximating the prediction error at every step for every S proposal, and k^i c \log_2 c is the cost of extending the binary vector S k times. Taking this equation into account, we can say that the number of operations in our approach is definitely smaller than

    2^c c^2 + \log_k(n/c) \, 2^c \left( MSE(n) + n\log_2 c \right)    (5)

After taking into consideration that the complexity of MSE(n) is not less than O(n) and omitting constants, we can say that the complexity of our approach with respect to n is equal to

    O(\log_k(n) \cdot MSE(n))    (6)

On the other hand, the time complexity with respect to c is equal to

    O(2^c)    (7)

It is worth noticing that the algorithm can be processed in parallel, and consequently the computation time in practice can decrease significantly.
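To get a feel for how the number of model evaluations behaves, the sketch below counts one prediction-error evaluation per surviving proposal per stage, following the structure of Equation (3); it ignores constant factors and the cost of each individual evaluation, and the function name is ours.

    import math

    def prediction_evaluations(n, c, k):
        # number of times a prediction method is trained/evaluated during the search
        proposals = 2 ** (c - 1)        # initial candidate patterns (Section 3.3)
        length = c
        evaluations = proposals
        while length < n:
            proposals = math.ceil(proposals / k)   # keep the best 1/k of the proposals
            length = min(length * k, n)
            evaluations += proposals
        return evaluations

    # with the settings used in Section 5 (c = 12, k = 2) and n = 219:
    print(prediction_evaluations(219, 12, 2))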


Figure 4: IceTargets series plot.


Figure 5: Diagram of neural network used in simulations.
5 SIMULATIONS

In order to check the efficiency of our approach we performed several simulations. In our simulations we used neural networks with one hidden layer and a delay equal to 2 (refer to the diagram in Figure 5). As the neural network training method we used the Levenberg-Marquardt backpropagation algorithm (Marquardt, 1963). In the comparative simulation we used neural networks with the same structure, training rate, training method and number of iterations as in our approach. The only difference was that the neural networks in our approach were trained on the subsequences chosen by our algorithm, whereas the neural network used in the comparative simulation was trained on the whole training set. In every simulation we used 12 as the constant c and 2 as the multiplicity parameter k. In order to avoid random bias we repeated every simulation 10 times and used the mean values. We also used the IceTargets data, which contains a time series of 219 scalar values representing measurements of global ice volume over the last 440,000 years (see Figure 4). The time series is available at (http://lib.stat.cmu.edu/datasets/) or in the standard Matlab library as the ice dataset.
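As a rough illustration of the comparative setup, the sketch below builds a one-hidden-layer network on lag-2 inputs in Python with scikit-learn. This is only an analogue of the Matlab networks used in the paper: MLPRegressor does not offer Levenberg-Marquardt training, and the hidden-layer size and iteration count here are placeholder assumptions.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_lagged(series, delay=2):
        # turn a 1-D series into (inputs of `delay` past values, next value) pairs
        x = np.array([series[i:i + delay] for i in range(len(series) - delay)])
        y = np.array(series[delay:])
        return x, y

    def fit_and_score(series, split=0.7):
        x, y = make_lagged(series, delay=2)
        cut = int(split * len(x))
        model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
        model.fit(x[:cut], y[:cut])
        pred = model.predict(x[cut:])
        return np.mean((y[cut:] - pred) ** 2)   # MSE on the held-out 30%

    # usage: mse_whole = fit_and_score(whole_series)
    #        mse_x1    = fit_and_score(x1)   # subsequence chosen by the pattern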
5.1 IceTargets with Random Noise

We modified the IceTargets series by adding random numbers generated from the uniform distribution on [-1.81, 2.12], where -1.81 is the minimum value of the IceTargets series and 2.12 is its maximum value. The random numbers occur according to the following scheme:
X = (rand(1), IceTargets(1), rand(2), IceTargets(2), rand(3), ..., IceTargets(219), rand(220))
Our approach finds the vector S = (0, 1, 0, 1, ..., 1, 0), which is the correct pattern, and splits the time series according to it (refer to Table 1). Due to that, the neural networks separately predict the IceTargets series and the random noise. Our approach has a mean MSE equal to 0.62, whereas the neural network trained on the whole set gives an MSE equal to 0.93. The results are presented in Table 2.
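The construction of this noisy series can be reproduced with a few lines of Python; the array name ice_targets stands for the 219-point IceTargets series loaded from the source mentioned above, and the interleaving follows the scheme given in the text.

    import numpy as np

    rng = np.random.default_rng(0)
    # ice_targets: 1-D array of the 219 IceTargets values (assumed already loaded)
    noise = rng.uniform(low=ice_targets.min(), high=ice_targets.max(),
                        size=len(ice_targets) + 1)

    # interleave: noise(1), ice(1), noise(2), ice(2), ..., ice(219), noise(220)
    x = np.empty(2 * len(ice_targets) + 1)
    x[0::2] = noise          # even positions (0-based) hold the random values
    x[1::2] = ice_targets    # odd positions hold the original series
    s = (np.arange(len(x)) % 2).astype(int)   # the pattern S = (0, 1, 0, 1, ...)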
5.2 Cosine with IceTargets

We created a time series by merging a cosine series and the IceTargets time series using the pattern:
X = (cos(0.1), IceTargets(1), cos(0.2), cos(0.3), IceTargets(2), IceTargets(3), cos(0.4), IceTargets(4), cos(0.5), cos(0.6), IceTargets(5), IceTargets(6), ...)
So the pattern can be described by the vector
S = (101100101100101100101100...)
Our approach finds the correct pattern, which splits the time series into the proper subsequences (refer to Table 1). The neural networks trained on the subsequences give a mean MSE equal to 0.0170, whereas the neural network trained on the whole training set gives an MSE equal to 0.5144 (refer to Table 2).

5.3 Quarterly Australian Gross Farm Product

In this simulation we used real statistical data: the Quarterly Australian Gross Farm Product in $m at 1989/90 prices. The time series is built from 135 data points representing values measured between September 1959 and March 1993. The data is available at (https://datamarket.com/data/set/22xn/quarterly-australian-gross-farm-product-m-198990-prices-sep-59-mar 93#!ds=22xn&display=line). The data was rescaled to the 0-1 range. One of the proposed subsequences is presented in Table 1. The average MSE of forecasting this time series calculated using our approach was equal to 0.007, while the average MSE achieved by a single neural network was equal to 0.0211 (refer to Table 2).


Table 1: Plots of the subsequences X^0 and X^1 obtained for the different time series after clustering with our approach (the plots themselves are omitted here; for each series the original table shows the full time series, the subsequence X^0 and the subsequence X^1 found by our approach).

  IceTargets with random noise — pattern proposed by our approach: S = (101010101010101010101001...)
  Cosine with IceTargets — pattern proposed by our approach: S = (101100101100101100101100...)
  Quarterly Australian Gross Farm Product — pattern proposed by our approach: S = (1011001100110011001100...)
  Series predicted with different methods — pattern proposed by our approach: S = (110011001100110011001100...)

5.4 Series Predicted with Different Methods

In all previous simulations we used our approach to split the time series into subsequences and then predicted their values with the same method, a neural network. However, our approach gives the possibility to use completely different prediction methods for each subsequence. Due to that we can choose different methods according to the specific prediction properties of each subsequence and take advantage of both methods. To show that this is possible, we merged two series with completely different prediction properties into one time series. We chose a simple series which grows linearly with time and the statistical data IceTargets, whose expected value does not seem to change in time. We merged them with the pattern:
X = (1, 2, IceTargets(1), IceTargets(2), 3, 4, IceTargets(3), IceTargets(4), 5, 6, IceTargets(5), IceTargets(6), ...)
The pattern is described by the vector
S = (110011001100110011001100...).
We use our approach, which splits the time series into two subsequences (see Table 1). To predict X^1 we use linear regression and to predict X^0 we use a neural network. Thanks to that we use the advantages of both methods and get an MSE of 0.0101. When using a single neural network we get an MSE of 2535.45, and when using only a single linear regression an MSE of 30.35 (refer to Table 3). Our approach provides a prediction error over 250000 times smaller than using only the neural network and 3000 times smaller than using only linear regression.
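A minimal sketch of this mixed-model setup in Python is shown below, continuing the scikit-learn analogue used earlier; which model goes with which subsequence is decided by the user, and the helper names are ours, not the paper's.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    def forecast_subsequence(series, model, delay=2):
        # fit `model` on lag-`delay` inputs and return its one-step-ahead forecast
        x = np.array([series[i:i + delay] for i in range(len(series) - delay)])
        y = np.array(series[delay:])
        model.fit(x, y)
        return model.predict(np.array(series[-delay:]).reshape(1, -1))[0]

    # x1: the linearly growing subsequence, x0: the IceTargets-like subsequence
    # (both assumed to be 1-D arrays produced by the clustering step)
    # next_x1 = forecast_subsequence(x1, LinearRegression())
    # next_x0 = forecast_subsequence(x0, MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000))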


Table 2: Comparison with a single neural network for the different time series, based on MSE.

                           IceTargets merged with noise   IceTargets merged with cos   QuarterlyGrossFarmProduct
  Our approach             0.62                           0.0170                       0.007
  Single Neural Network    0.93                           0.5144                       0.0211
  Increase in efficiency   1.5 times                      30.25 times                  3.014 times

Table 3: Comparison of the MSE calculated with different methods for the time series created by merging a linear function and IceTargets.

  Method   Neural Network   Linear regression   Our approach
  MSE      2535.45          30.35               0.0101

6 CONCLUSIONS

In the presented work we proposed a novel method for time series forecasting. Our approach is based on splitting the series into a subsequence and its complement, which can result in a much lower potential prediction error. Moreover, it allows the application of different prediction methods to both subsequences and therefore combines their benefits. The proposed approach is not associated with any specific time series forecasting method and can be applied as a generic solution in time series preprocessing. Moreover, we show that our approach allows noise filtering. In order to validate the efficiency of the introduced solution we conducted a series of experiments. The obtained results showed that using our approach results in a significant improvement of accuracy. Moreover, we have shown that the generated overhead is asymptotically logarithmic with respect to the time series length. The low computational overhead caused by our approach suggests that it can be useful regardless of the time series length. Moreover, the algorithm can be processed in parallel and therefore we can decrease the computation time by implementing it on multiple processors.

Our solution opens up broad prospects for further work. First of all, our approach uses strict partitioning clustering, where every element belongs to exactly one cluster. Future research may design and examine our approach with overlapping clustering, where a single element may belong to many clusters. The efficiency of our approach with such a modification should be investigated on real data. Another open question is the influence of the choice of the maximal searched pattern period and the minimal acceptable subseries length on the prediction efficiency of our approach. One future area of research could also be the design and implementation of an automated method of selecting different prediction methods for the proposed subseries.

REFERENCES

Duda, R. and Hart, P. (1973). Pattern classification and scene analysis. In John Wiley and Sons, NY, USA, 1973.
Estivill-Castro, V. (20 June 2002). Why so many clustering algorithms: a position paper. In ACM SIGKDD Explorations Newsletter 4 (1): 65-75. doi:10.1145/568574.568575.
G. Karypis, E.-H. Han, V. K. (1999). Chameleon: hierarchical clustering using dynamic modeling. In Computer, pp. 68-75.
http://lib.stat.cmu.edu/datasets/.
https://datamarket.com/data/set/22xn/quarterly-australian-gross-farm-product-m-198990-prices-sep-59-mar 93#!ds=22xn&display=line.
Huanmei Wu, Betty Salzberg, G. C. S.-S. B. J.-H. S. D. K. (2005). Subsequence matching on structured time series data. In SIGMOD.
J. Han, M. K. (2001). Data mining: Concepts and techniques, Morgan Kaufmann. In San Francisco, 2001, pp. 346-389.
J. Han, M. K. (2003). Application of neural networks to an emerging financial market: forecasting and trading the Taiwan stock index. In Computers & Operations Research 30, pp. 901-923.


Jessica Lin, Eamonn Keogh, L. W. S. L. (2007). Experiencing SAX: a novel symbolic representation of time series. In Data Mining and Knowledge Discovery, Volume 15, Issue 2, pp. 107-144.
M. Ester, H.-P. Kriegel, J. S. X. X. (1996). A density-
based algorithm for discovering clusters in large spa-
tial databases. In Proceedings of the 1996 Interna-
tional Conference on Knowledge Discovery and Data
Mining (KDD96).
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations, in: L. M. LeCam, J. Neyman (Eds.). In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297.
Marquardt, D. (June 1963). An algorithm for least-squares
estimation of nonlinear parameters. In SIAM Journal
on Applied Mathematics, Vol. 11, No. 2, pp. 431-441.
Moon S, Q. H. (2012). Hybrid dimensionality reduction
method based on support vector machine and inde-
pendent component analysis. In IEEE Trans Neu-
ral Netw Learn Syst. 2012 May;23(5):749-61. doi:
10.1109/TNNLS.2012.2189581.
P. Cheeseman, J. S. (1996). Bayesian classification (AutoClass): theory and results, in: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Cambridge, MA.
S. Makridakis, S. Wheelwright, R. H. (1997). Forecasting:
Methods and applications. In Wiley.
S. Uma, A. C. (Jan 2012). Pattern recognition using en-
hanced non-linear time-series models for predicting
dynamic real-time decision making environments. In
Int. J. Business Information Systems, Vol. 11, Issue 1,
pp. 69-92.
Song, H. J., S. Z. Q. and Miao, C. Y. M. (2010). Fuzzy cog-
nitive map learning based on multi-objective particle
swarm optimization. In IEEE Transactions on Fuzzy
Volume 18 Issue 2 233-250. IEEE Press Piscataway.
Tong, H. (1983). Threshold models in non-linear time series
analysis. In Springer-Verlag.
W. Wang, J. Yang, R. M. R. (1997). Sting: a statistical
information grid approach to spatial data mining. In
Proceedings of the 1997 International Conference on
Very Large Data Base (VLDB97).
Zhang, G. (2003). Time series forecasting using a hybrid
arima and neural network model. In Neurocomputing
50 pages: 159-175.
Zhang, G. (2007). A neural network ensemble method with
jittered training data for time series forecasting. In
Information Sciences 177 pages: 5329-5346.

