Encoding Time Series As Images For Visual Inspection and Classification Using Tiled Convolutional Neural Networks
data input is typically represented by concatenating Mel-frequency cepstral coefficients (MFCCs) or perceptual linear predictive coefficients (PLPs) (Hermansky 1990), typical time series data are not likely to benefit from the transformations commonly applied to speech or acoustic data.
In this paper, we present two new representations for encoding time series as images that we call the Gramian Angular Field (GAF) and the Markov Transition Field (MTF). We select the same twelve "hard" time series datasets used by Oates et al., and apply deep Tiled Convolutional Neural Networks (Tiled CNNs) with a pretraining stage that exploits local orthogonality by Topographic ICA (Ngiam et al. 2010) to "visually" represent the time series. We report our classification performance on GAF and MTF separately, and on GAF-MTF, which results from combining the GAF and MTF representations into a single image. By comparing our results with five previous and current state-of-the-art hand-crafted representation and classification methods, we show that our approach in practice achieves competitive performance with the state of the art while exploring a relatively small parameter space. We also find that our Tiled CNN based deep learning method works well with small time series datasets, while the traditional CNN may not work well on such small datasets (Zheng et al. 2014). In addition to exploring the high-level features learned by Tiled CNNs, we provide an in-depth analysis in terms of the duality between time series and images within our frameworks that more precisely identifies the reasons why our approaches work.

Figure 1: Illustration of the proposed encoding map of the Gramian Angular Field. X is a typical time series from the 'SwedishLeaf' dataset. After X is rescaled by eq. (1) and optionally smoothed by PAA, we transform it into the polar coordinate system by eq. (2) and finally calculate its GAF image with eq. (4). In this example, we build the GAF without PAA smoothing, so the GAF has a high resolution of 128 × 128.

Encoding Time Series to Images

We first introduce our two frameworks for encoding time series as images. The first type of image is the Gramian Angular Field (GAF), in which we represent the time series in a polar coordinate system instead of the typical Cartesian coordinates. In the Gramian matrix, each element is actually the cosine of the summation of two angles. Inspired by previous work on the duality between time series and complex networks (Campanharo et al. 2011), the main idea of the second framework, the Markov Transition Field (MTF), is to build the Markov matrix of quantile bins after discretization and encode the dynamic transition probabilities in a quasi-Gramian matrix.

Gramian Angular Field

Given a time series X = {x_1, x_2, ..., x_n} of n real-valued observations, we rescale X so that all values fall in the interval [−1, 1]:

    x̃_i = ((x_i − max(X)) + (x_i − min(X))) / (max(X) − min(X))    (1)

Thus we can represent the rescaled time series X̃ in polar coordinates by encoding the value as the angular cosine and the time stamp as the radius with the equation below:

    φ = arccos(x̃_i),  −1 ≤ x̃_i ≤ 1,  x̃_i ∈ X̃
    r = t_i / N,  t_i ∈ ℕ    (2)

In the equation above, t_i is the time stamp and N is a constant factor that regularizes the span of the polar coordinate system. This polar coordinate based representation is a novel way to understand time series. As time increases, the corresponding values warp among different angular points on the spanning circles, like water rippling. The encoding map of equation (2) has two important properties. First, it is bijective, as cos(φ) is monotonic when φ ∈ [0, π]. Given a time series, the proposed map produces one and only one result in the polar coordinate system, with a unique inverse function. Second, as opposed to Cartesian coordinates, polar coordinates preserve absolute temporal relations. In Cartesian coordinates, if the area is defined by S_{i,j} = ∫_{x(i)}^{x(j)} f(x(t)) dx(t), we have S_{i,i+k} = S_{j,j+k} if f(x(t)) has the same values on [i, i+k] and [j, j+k]. However, in polar coordinates, if the area is defined as S′_{i,j} = ∫_{φ(i)}^{φ(j)} r[φ(t)]² d(φ(t)), then S′_{i,i+k} ≠ S′_{j,j+k}. That is, the corresponding area from time stamp i to time stamp j depends not only on the time interval |i − j|, but also on the absolute values of i and j. We will discuss this in more detail in another work.

After transforming the rescaled time series into the polar coordinate system, we can easily exploit the angular perspective by considering the trigonometric sum between each pair of points to identify the temporal correlation within different time intervals. The GAF is defined as follows:

        ⎡ cos(φ_1 + φ_1)  ···  cos(φ_1 + φ_n) ⎤
    G = ⎢ cos(φ_2 + φ_1)  ···  cos(φ_2 + φ_n) ⎥    (3)
        ⎢       ⋮          ⋱         ⋮        ⎥
        ⎣ cos(φ_n + φ_1)  ···  cos(φ_n + φ_n) ⎦

      = X̃′ · X̃ − (√(I − X̃²))′ · √(I − X̃²)    (4)

I is the unit row vector [1, 1, ..., 1]. After transforming to the polar coordinate system, we take the time series at each time step as a 1-D metric space. By defining the inner product ⟨x, y⟩ = x·y − √(1 − x²)·√(1 − y²), G is a Gramian matrix:
        ⎡ ⟨x̃_1, x̃_1⟩  ···  ⟨x̃_1, x̃_n⟩ ⎤
    G = ⎢ ⟨x̃_2, x̃_1⟩  ···  ⟨x̃_2, x̃_n⟩ ⎥    (5)
        ⎢      ⋮        ⋱        ⋮      ⎥
        ⎣ ⟨x̃_n, x̃_1⟩  ···  ⟨x̃_n, x̃_n⟩ ⎦

The GAF has several advantages. First, it provides a way to preserve the temporal dependency, since time increases as the position moves from top-left to bottom-right.

Figure 2 (panels: "Typical Time Series", "Markov Transition Matrix", "Markov Transition Field"). The Markov transition matrix over quantile bins A–D shown in the figure is:

         A      B      C      D
    A  0.917  0.083  0      0
    B  0.083  0.583  0.334  0
    C  0      0.260  0.522  0.218
    D  0      0.083  0.167  0.75
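The GAF construction in eqs. (1)–(4) is compact enough to sketch directly. The snippet below is an illustrative NumPy reimplementation, not the authors' code; the `np.clip` guard against floating-point values slightly outside [−1, 1] is our own addition:

```python
import numpy as np

def gaf(x):
    """Gramian Angular Field of a 1-D series, following eqs. (1)-(4)."""
    x = np.asarray(x, dtype=float)
    # eq. (1): rescale so all values fall in [-1, 1]
    x_tilde = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    # eq. (2): encode each value as an angle via arccos
    # (clip guards against tiny floating-point overshoot outside [-1, 1])
    phi = np.arccos(np.clip(x_tilde, -1.0, 1.0))
    # eqs. (3)-(4): G[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

x = np.sin(np.linspace(0, 2 * np.pi, 128))
G = gaf(x)
print(G.shape)  # (128, 128)
```

Expanding cos(φ_i + φ_j) recovers eq. (4) term by term: G is exactly the Gramian of the inner product ⟨x, y⟩ = x·y − √(1 − x²)·√(1 − y²).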
Above, W ∈ R^{p×q} and V ∈ R^{p×p}, where p is the number of hidden units in a layer and q is the size of the input. V is a logical matrix (V_ij = 1 or 0) that encodes the topographic structure of the hidden units by a contiguous 3 × 3 block. The orthogonality constraint WW^T = I provides diversity among the learned features.

Neither GAF nor MTF images are natural images; they have no natural concepts such as "edges" and "angles". Thus, we propose to exploit the benefits of unsupervised pretraining with TICA to learn many diverse features with local orthogonality. In addition, Ngiam et al. empirically demonstrate that tiled CNNs perform well with limited labeled data, because the partial weight tying requires fewer parameters and reduces the need for a large amount of labeled data. Because our data from the UCR Time Series Repository (Keogh et al. 2011) tends to have few instances (e.g., the "yoga" dataset has 300 labeled instances in the training set and 3,000 unlabeled instances in the test set), tiled CNNs are well suited to our learning task.

Typically, tiled CNNs are trained with two hyperparameters, the tiling size k and the number of feature maps l. In our experiments, we directly fixed the network structure without tuning these hyperparameters, for several reasons. First, our goal is to explore the expressive power of the high-level features learned from GAF and MTF images. We have already achieved competitive results with the default deep network structures that Ngiam et al. used for image classification on the NORB image classification benchmark. Although tuning the parameters would surely enhance performance, doing so may cloud our understanding of the power of the representation. Another consideration is computational efficiency. All of the experiments on the 12 "hard" datasets could be done in one day on a laptop with an Intel i7-3630QM CPU and 8GB of memory (our experimental platform). Thus, the results in this paper are a preliminary lower bound on the potential best performance. Thoroughly exploring the deep network structures and parameters will be addressed in future work. The structure and parameters of the tiled CNN used in this paper are illustrated in Figure 3.

Figure 3: Structure of the tiled convolutional neural network (pipeline: Convolutional I → TICA Pooling I → Convolutional II → TICA Pooling II → Linear SVM). We fix the size of the receptive field to 8 × 8 in the first convolutional layer and 3 × 3 in the second convolutional layer. Each TICA pooling layer pools over a block of 3 × 3 input units in the previous layer, without wrapping around the borders, to optimize for sparsity of the pooling units. The number of pooling units in each map is exactly the same as the number of input units. The last layer is a linear SVM for classification. We construct this network by stacking two Tiled CNNs, each with 6 maps (l = 6) and tiling size k = 2.

Table 1: Training and test error rates of Tiled CNNs on the GAF and MTF images

    DATASET      GAF-TRAIN  GAF-TEST  MTF-TRAIN  MTF-TEST
    beef         0.633      0.4       0.533      0.233
    coffee       0          0         0          0
    ECG200       0.16       0.11      0.15       0.21
    faceall      0.121      0.244     0.102      0.259
    lighting2    0.2        0.18      0.167      0.361
    lighting7    0.329      0.397     0.386      0.411
    oliveoil     0.2        0.2       0.033      0.3
    OSULeaf      0.415      0.463     0.43       0.483
    SwedishLeaf  0.134      0.104     0.206      0.176
    yoga         0.183      0.177     0.193      0.243

Classifying Time Series Using GAF/MTF

We apply Tiled CNNs to classify GAF and MTF representations on twelve tough datasets, on which the classification error rate is above 0.1 with the state-of-the-art SAX-BoP approach (Lin, Khade, and Li 2012; Oates et al. 2012). More detailed statistics are summarized in Table 2. The datasets are pre-split into training and testing sets for experimental comparisons. For each dataset, the table gives its name, the number of classes, the number of training and test instances, and the length of the individual time series.

Experimental Setting

In our experiments, the size of the GAF image is regulated by the number of PAA bins S_GAF. Given a time series X of size n, we divide the time series into S_GAF adjacent, non-overlapping windows along the time axis and extract the mean of each bin. This enables us to construct the smaller GAF matrix G_{S_GAF × S_GAF}. MTF requires the time series to be discretized into Q quantile bins to calculate the Q × Q Markov transition matrix, from which we construct the raw MTF image M_{n×n} afterwards. Before classification, we shrink the MTF image size to S_MTF × S_MTF with the blurring kernel {1/m²}_{m×m}, where m = ⌈n / S_MTF⌉. The Tiled CNN is trained with image sizes {S_GAF, S_MTF} ∈ {16, 24, 32, 40, 48} and quantile size Q ∈ {8, 16, 32, 64}. At the last layer of the Tiled CNN, we use a linear soft margin SVM (Fan et al. 2008) and select C by 5-fold cross-validation over {10⁻⁴, 10⁻³, ..., 10⁴} on the training set.

For each input of image size S_GAF or S_MTF and quantile size Q, we pretrain the Tiled CNN with the full unlabeled dataset (both training and test set) to learn the initial weights W through TICA. Then we train the SVM at the last layer by selecting the penalty factor C with cross-validation.
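The MTF pipeline just described can be sketched in the same style: discretize the series into Q quantile bins, estimate the Q × Q Markov transition matrix from successive points, spread it into the raw n × n field M, and shrink with the averaging kernel {1/m²}_{m×m}, m = ⌈n/S_MTF⌉. This is an illustrative sketch rather than the authors' code, and the edge-padding for lengths not divisible by the target size is our own assumption, since the text does not specify how remainders are handled:

```python
import numpy as np

def mtf(x, Q=8, size=16):
    """Markov Transition Field with average-blur downsampling (a sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # assign each point to one of Q quantile bins
    edges = np.quantile(x, np.linspace(0, 1, Q + 1)[1:-1])
    q = np.searchsorted(edges, x)            # bin index in {0, ..., Q-1}
    # Q x Q Markov transition matrix estimated from successive points
    W = np.zeros((Q, Q))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize
    # raw n x n MTF: M[i, j] is the transition probability q_i -> q_j
    M = W[q[:, None], q[None, :]]
    # shrink to size x size with the averaging kernel {1/m^2}_{m x m}
    m = int(np.ceil(n / size))
    pad = m * size - n                       # assumption: pad by edge values
    Mp = np.pad(M, ((0, pad), (0, pad)), mode="edge")
    return Mp.reshape(size, m, size, m).mean(axis=(1, 3))

x = np.sin(np.linspace(0, 4 * np.pi, 96))
img = mtf(x, Q=8, size=16)
print(img.shape)  # (16, 16)
```

For GAF, by contrast, the size reduction happens before encoding: the series is PAA-averaged down to S_GAF points and then mapped through eqs. (1)–(4).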
Table 2: Summary statistics of the standard datasets and comparative results

DATASET  CLASS  TRAIN  TEST  LENGTH  1NN-EUCLIDEAN  1NN-DTW  FAST-SHAPELET  BOP  SAX-VSM  GAF-MTF
50words 50 450 455 270 0.369 0.242 0.4429 0.466 N/A 0.284
Adiac 37 390 391 176 0.389 0.391 0.514 0.432 0.381 0.307
Beef 5 30 30 470 0.467 0.467 0.447 0.433 0.033 0.3
Coffee 2 28 28 286 0.25 0.18 0.067 0.036 0 0
ECG200 2 100 100 96 0.12 0.23 0.227 0.14 0.14 0.08
FaceAll 14 560 1,690 131 0.286 0.192 0.402 0.219 0.207 0.223
Lightning2 2 60 61 637 0.246 0.131 0.295 0.164 0.196 0.18
Lightning7 7 70 73 319 0.425 0.274 0.403 0.466 0.301 0.397
OliveOil 4 30 30 570 0.133 0.133 0.213 0.133 0.1 0.167
OSULeaf 6 200 242 427 0.483 0.409 0.359 0.236 0.107 0.446
SwedishLeaf 15 500 625 128 0.213 0.21 0.27 0.198 0.251 0.093
Yoga 2 300 3,000 426 0.17 0.164 0.249 0.17 0.164 0.16
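The model selection described in the Experimental Setting reduces to a small grid search over S, Q, and the SVM penalty C. The schematic below uses a dummy `cv_error` as a hypothetical stand-in for pretraining the Tiled CNN and scoring the linear SVM by 5-fold cross-validation; the tie-break toward larger S and Q mirrors the preference stated in the text:

```python
import itertools

# hypothetical stand-in for pretraining the Tiled CNN and scoring an SVM
# with penalty C by 5-fold cross-validation on the training set
def cv_error(S, Q, C):
    return abs(S - 32) / 100 + abs(Q - 16) / 100 + abs(C - 1) / 10  # dummy

sizes = [16, 24, 32, 40, 48]                   # candidate image sizes S
quantiles = [8, 16, 32, 64]                    # candidate quantile counts Q
penalties = [10.0 ** k for k in range(-4, 5)]  # C in {1e-4, ..., 1e4}

best = min(
    ((cv_error(S, Q, C), S, Q, C)
     for S, Q, C in itertools.product(sizes, quantiles, penalties)),
    key=lambda t: (t[0], -t[1], -t[2]),  # break ties toward larger S and Q
)
print(best[1:])  # chosen (S, Q, C)
```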
Finally, we classify the test set using the optimal hyperparameters {S, Q, C} with the lowest error rate on the training set. If two or more models tie, we prefer the larger S and Q, because a larger S helps preserve more information through the PAA procedure and a larger Q encodes the dynamic transition statistics in more detail. Our model selection approach provides generalization without being overly expensive computationally.

Results and Discussion

We use Tiled CNNs to classify the GAF and MTF representations separately on the 12 datasets. The training and test error rates are shown in Table 1. Generally, our approach is not prone to overfitting, as seen by the relatively small difference between training and test set errors. One exception is the Olive Oil dataset with the MTF approach, where the test error is significantly higher.

In addition to the risk of potential overfitting, MTF generally has higher error rates than GAF. This is most likely because of uncertainty in the inverse image of MTF. Note that the encoding functions from time series to GAF and MTF are both surjective: with fixed S and Q, each map produces only one image for a given time series X, but the inverse image of each mapping function is not fixed. As shown in a later section, we can approximately reconstruct the raw time series from GAF, but it is very hard to even roughly recover the signal from MTF. GAF has smaller uncertainty in the inverse image of its mapping function because such randomness comes only from the ambiguity of cos(φ) when φ ∈ [0, 2π]. MTF, on the other hand, has a much larger inverse image space, which results in large variation when we try to recover the signal. Although MTF encodes the transition dynamics, which are important features of time series, such features seem not to be sufficient for recognition/classification tasks.

Note that each pixel G_ij denotes the superposition of the directions at t_i and t_j, while M_ij is the transition probability from the quantile at t_i to the quantile at t_j. GAF encodes static information while MTF depicts information about dynamics. From this point of view, we consider them as two "orthogonal" channels, like different colors in the RGB image space. Thus, we can combine GAF and MTF images of the same size (i.e., S_GAF = S_MTF) to construct a double-channel image (GAF-MTF). Since GAF-MTF combines both the static and dynamic statistics embedded in the raw time series, we posit that it will be able to enhance classification performance. In the next experiment, we pretrain and train the Tiled CNN on the compound GAF-MTF images. Then, we report the classification error rate on the test sets.

Table 2 compares the classification error rate of our approach with previously published performance results of five competing methods: two state-of-the-art 1NN classifiers based on Euclidean distance and DTW, the recently proposed Fast-Shapelets based classifier (Rakthanmanon and Keogh 2013), the classifier based on Bag-of-Patterns (BoP) (Lin, Khade, and Li 2012; Oates et al. 2012), and the most recent SAX-VSM approach (Senin and Malinchik 2013). Our approach outperforms 1NN-Euclidean, Fast-Shapelets, and BoP, and is competitive with 1NN-DTW and SAX-VSM. In addition, by comparing the results in Table 2 with those in Table 1, we verified our assumption that the combined GAF-MTF images have better expressive power than GAF or MTF alone for classification. GAF-MTF achieves the lower test error rate on ten of the twelve datasets (the exceptions being Adiac and Beef). On the Olive Oil dataset, the training error rate is 6.67% and the test error rate is 16.67%. This demonstrates that integrating both types of images into one compound image decreases the risk of overfitting as well as enhancing the overall classification accuracy.

Analysis of Features and Weights Learned through Tiled CNNs

In contrast to the cases in which CNNs are applied to natural image recognition tasks, neither GAF nor MTF has natural interpretations of visual concepts like "edges" or "angles". In this section we analyze the features and weights learned through Tiled CNNs to explain why our approach works.

As mentioned earlier, the mapping function from time series to GAF is surjective and the uncertainty in its inverse image comes from the ambiguity of cos(φ) when φ ∈ [0, 2π]. The main diagonal of GAF, i.e. {G_ii} = {cos(2φ_i)}, allows us to approximately reconstruct the original time series, ignoring the signs, by

    cos(φ) = √((cos(2φ) + 1) / 2)    (8)

MTF has much larger uncertainty in its inverse image, making it hard to reconstruct the raw data from MTF alone. However, the diagonal {M_ij : |i − j| = k} represents the transition probability among the quantiles in temporal order, considering the time interval k. We construct the self-transition probability along the time axis from the main diagonal of MTF, as we do for GAF. Although such reconstructions capture the morphology of the raw time series less accurately, they provide another perspective on how Tiled CNNs capture the transition dynamics embedded in MTF.

Figure 4 illustrates the reconstruction results from the six feature maps learned before the last SVM layer on GAF and MTF. The Tiled CNN extracts the color patch, which is essentially a moving average that enhances several receptive fields within the nonlinear units by different trained weights. It is not a simple moving average but a synthetic integration that considers the 2D temporal dependencies among different time intervals, a benefit from the Gramian matrix structure that helps preserve the temporal information. By observing the rough orthogonal reconstructions from each layer of the feature maps, we can clearly observe that the Tiled CNN extracts multi-frequency dependencies through its convolution and pooling architecture on the GAF and MTF images, preserving the trend while addressing more details in different subphases. As shown in Figures 4(b) and 4(d), the high-level feature maps learned by the Tiled CNN are equivalent to a multi-frequency approximator of the original curve.

Figure 4: (a) Original GAF and its six learned feature maps before the SVM layer in the Tiled CNN (top left); (b) raw time series and approximate reconstructions based on the main diagonals of the six feature maps (top right), on the '50Words' dataset; (c) original MTF and its six learned feature maps before the SVM layer in the Tiled CNN (bottom left); (d) curve of the self-transition probability along the time axis (main diagonal of MTF) and approximate reconstructions based on the main diagonals of the six feature maps (bottom right), on the 'SwedishLeaf' dataset.

Figure 5 demonstrates the learned sparse weight matrix W with the constraint WW^T = I, which makes effective use of local orthogonality. The TICA pretraining provides the built-in advantage that the function over the parameter space is not likely to be ill-conditioned, as WW^T = I. As shown in Figure 5 (right), the weight matrix W is quasi-orthogonal and its entries approach 0 without very large magnitudes. This implies that the condition number of W approaches 1, which helps the system be well-conditioned.

Figure 5: Learned sparse weights W for the last SVM layer in the Tiled CNN (left) and its orthogonality constraint WW^T = I (right).

Conclusions and Future Work

We created a pipeline for converting time series data into novel representations, GAF and MTF images, and extracted high-level features from them using Tiled CNNs. The features were subsequently used for classification. We demonstrated that our approach yields competitive results compared to state-of-the-art methods while searching a relatively small parameter space. We found that GAF-MTF multi-channel images are scalable to larger numbers of quasi-orthogonal features that yield more comprehensive images. Our analysis of the high-level features learned from Tiled CNNs suggested that the Tiled CNN works like a multi-frequency moving average that benefits from the 2D temporal dependency preserved by the Gramian matrix.

Important future work will involve applying our method to massive amounts of data and searching a more complete parameter space to solve real-world problems. We are also quite interested in how different deep learning architectures perform on GAF and MTF images. Another interesting direction is to model time series through GAF and MTF images; we aim to apply learned time series models in regression/imputation and anomaly detection tasks. To extend our methods to streaming data, we plan to design an online learning approach with recurrent network structures.

References

Abdel-Hamid, O.; Mohamed, A.-r.; Jiang, H.; and Penn, G. 2012. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 4277–4280. IEEE.

Abdel-Hamid, O.; Deng, L.; and Yu, D. 2013. Exploring convolutional neural network structures and optimization techniques for speech recognition. In INTERSPEECH, 3366–3370.
Campanharo, A. S.; Sirer, M. I.; Malmgren, R. D.; Ramos, F. M.; and Amaral, L. A. N. 2011. Duality between time series and networks. PloS ONE 6(8):e23378.

Deng, L.; Abdel-Hamid, O.; and Yu, D. 2013. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 6669–6673. IEEE.

Deng, L.; Li, J.; Huang, J.-T.; Yao, K.; Yu, D.; Seide, F.; Seltzer, M.; Zweig, G.; He, X.; Williams, J.; et al. 2013. Recent advances in deep learning for speech research at Microsoft. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 8604–8608. IEEE.

Deng, L.; Hinton, G.; and Kingsbury, B. 2013. New types of deep neural network learning for speech recognition and related applications: An overview. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 8599–8603. IEEE.

Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.-A.; Vincent, P.; and Bengio, S. 2010. Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research 11:625–660.

Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research 9:1871–1874.

Hermansky, H. 1990. Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America 87(4):1738–1752.

Hinton, G.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.-r.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T. N.; et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Processing Magazine, IEEE 29(6):82–97.

Hinton, G.; Osindero, S.; and Teh, Y.-W. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18(7):1527–1554.

Hubel, D. H., and Wiesel, T. N. 1962. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology 160(1):106.

Kavukcuoglu, K.; Sermanet, P.; Boureau, Y.-L.; Gregor, K.; Mathieu, M.; and Cun, Y. L. 2010. Learning convolutional feature hierarchies for visual recognition. In Advances in Neural Information Processing Systems, 1090–1098.

Keogh, E. J., and Pazzani, M. J. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 285–289. ACM.

Keogh, E.; Xi, X.; Wei, L.; and Ratanamahatana, C. A. 2011. The UCR time series classification/clustering homepage. URL: http://www.cs.ucr.edu/~eamonn/time_series_data.

Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097–1105.

Lawrence, S.; Giles, C. L.; Tsoi, A. C.; and Back, A. D. 1997. Face recognition: A convolutional neural-network approach. Neural Networks, IEEE Transactions on 8(1):98–113.

LeCun, Y., and Bengio, Y. 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361.

LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.

LeCun, Y.; Kavukcuoglu, K.; and Farabet, C. 2010. Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, 253–256. IEEE.

Leggetter, C. J., and Woodland, P. C. 1995. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech & Language 9(2):171–185.

Lin, J.; Khade, R.; and Li, Y. 2012. Rotation-invariant similarity in time series using bag-of-patterns representation. Journal of Intelligent Information Systems 39(2):287–315.

Mohamed, A.-r.; Dahl, G. E.; and Hinton, G. 2012. Acoustic modeling using deep belief networks. Audio, Speech, and Language Processing, IEEE Transactions on 20(1):14–22.

Ngiam, J.; Chen, Z.; Chia, D.; Koh, P. W.; Le, Q. V.; and Ng, A. Y. 2010. Tiled convolutional neural networks. In Advances in Neural Information Processing Systems, 1279–1287.

Oates, T.; Mackenzie, C. F.; Stein, D. M.; Stansbury, L. G.; Dubose, J.; Aarabi, B.; and Hu, P. F. 2012. Exploiting representational diversity for time series classification. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on, volume 2, 538–544. IEEE.

Rakthanmanon, T., and Keogh, E. 2013. Fast shapelets: A scalable algorithm for discovering time series shapelets. In Proceedings of the Thirteenth SIAM Conference on Data Mining (SDM). SIAM.

Reynolds, D. A., and Rose, R. C. 1995. Robust text-independent speaker identification using Gaussian mixture speaker models. Speech and Audio Processing, IEEE Transactions on 3(1):72–83.

Senin, P., and Malinchik, S. 2013. SAX-VSM: Interpretable time series classification using SAX and vector space model. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, 1175–1180. IEEE.

Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; and Zhao, J. L. 2014. Time series classification using multi-channels deep convolutional neural networks. In Web-Age Information Management. Springer. 298–310.