Deep Multivariate Time Series Embedding Clustering
Abstract
Clustering is an optimization problem and an iterative process. Through clustering, observations of a given
data set are clustered into distinct groups. The optimization goal is to maximize the similarity of data items clustered
in the same group while minimizing the similarity of data objects grouped in separate clusters. Depending on
the complexity of the features captured for the given data set, a simple clustering task can turn into a multi-objective
clustering process in which more than one feature is utilized to cluster data objects. Hence, the complexity of cluster
analysis heavily depends on the dimensionality of the data and thus the number of features to be considered. While
the dimensionality of given data is observable, the identification of possible features involved in the analysis is
a challenging task. A more daunting problem is the identification of hidden features in the given data set. More
specifically, the detection of such hidden features is a non-trivial task that needs more advanced algorithmic and
mathematical techniques and solutions. An example of such data sets with complex structures and known and
hidden features is time series data. As a special case, Time Series Clustering (TSC) inherently is a complex
and multi-objective problem where the exact and complete set of features and their significance is unknown and
hidden. As a result, conventional data mining-based clustering algorithms may miss these hidden features and thus
not effectively perform clustering for time series data.
Machine learning and in particular deep learning algorithms are the emerging approaches to data analysis.
These techniques have transformed traditional data mining-based analysis radically into a learning-based model in
which existing data sets along with their cluster labels (i.e., train set) are learned to build a supervised learning
model and predict the cluster labels of unseen data (i.e., test set). In particular, deep learning techniques are capable
of capturing and learning hidden features in a given data set and thus building a more accurate prediction model
for clustering and labeling problems. However, the major problem is that time series data are often unlabeled and
thus supervised learning-based deep learning algorithms cannot be directly adapted to solve the clustering problems
for these special and complex types of data sets. To address this problem, this paper introduces a two-stage method
for clustering time series data. First, a novel technique is introduced to utilize the characteristics (e.g., volatility) of
given time series data in order to create labels and thus be able to transform the problem from unsupervised learning
into supervised learning. Second, an autoencoder-based deep learning model is built to learn and model both known
and hidden features of time series data along with their created labels to predict the labels of unseen time series
data. The paper reports a case study in which financial and stock time series data of 70 selected stock indices
are clustered into distinct groups using the introduced two-stage procedure. The results show that the proposed
procedure is capable of achieving 87.5% accuracy in clustering and predicting the labels for unseen time series
data.
Keywords: KMeans Clustering, Financial Data Analysis, Time-Series Clustering, Deep Learning, Encoder-
Decoder, Unsupervised Learning, Supervised Learning.
I. INTRODUCTION
N. Tavakoli is with the Department of Computer Science, Georgia Institute of Technology, Atlanta, GA, 30332 USA e-mail:
[email protected]
S. Siami-Namini is with the Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, 79409 USA e-mail: sima.siami-
[email protected]
M. Adl Khangha and F. Mirza Soltani are with the Department of Computer Science, University of Debrecen, Debrecen, Hungary, e-mail:
[email protected] and [email protected]
A. Siami Namin is with the Department of Computer Science, Texas Tech University, Lubbock, TX, 79409 USA e-mail: ak-
[email protected]
A pre-print of an accepted paper for publication in the Springer Nature (SN) Applied Sciences journal (March 2020).
An important step prior to performing any detailed data analysis is to understand the nature and
characteristics of a given data set. There are several statistical techniques that help in creating
different levels of abstraction, each representing the data set from a different angle. A very basic and
preliminary technique is descriptive statistics, such as the mean and standard deviation, which are often utilized
by data analysts in order to grasp the trend and variation of observations and thus capture a big picture of
the data. These types of metadata can describe and, more specifically, “featurize” data. Hence, in practice
and theory data analysis refers to the identification, selection, and analysis of the known features of data
sets.
As a very important and prevalent data type, time series data play a key role in different
application domains, including the social and human sciences (e.g., psychology, economics, business, and
finance) as well as engineering, quality control, monitoring, and security. What makes time series data
unique is the addition of another dimension to the data (i.e., time) and the elevation of the complexity
involved in analysis. The high dimensionality and additional complexity introduced by time make analysis
of such data types very challenging. Therefore, it raises the concern of whether conventional data analysis
techniques are suitable for exploring all “possible” features, factors, and their causalities in such a data type.
A popular data analysis task is the traditional clustering problem, in which the given dataset is divided
into subgroups with the goal of maximizing the similarity of the data observations grouped together
while maximizing the dissimilarity of the observations clustered in distinct groups. A simple and typical
clustering algorithm (e.g., KMeans) takes as input a numerical vector representing the original data and
measures the distance between data items using a simple distance metric (e.g., Euclidean). The assignment
of data observations to different groups is then iteratively adjusted and optimized. It is also possible to extend
the basic clustering algorithm to address a more general multi-objective problem where more than one feature,
or characteristic, of the dataset is taken into account for clustering. However, the problem of clustering time
series data is undeniably more daunting and challenging and needs further analysis. There are several challenges associated with
the problem of Time Series Clustering (TSC) including:
1) Unlabeled Data. It is hard to find or automatically label time series data. As a result, existing
clustering techniques in supervised learning are less applicable. While there are great benefits
associated with unsupervised learning (e.g., no need to know the exact number of desired clusters),
the absence of labels in time series data means overlooking more accurate clustering techniques
based on supervised learning, in which the labels of time series are known and thus it is easier to
predict the cluster labels of time series data.
2) High Dimensionality. Due to the inclusion of the time factor, in addition to some other features, the
dimensionality of the time series data and thus the number of features is increasingly high. It
is therefore of utmost importance to identify salient features that contribute significantly to the
characteristics of the time series data as well as reduce the effects of nuisances that exhibit themselves
as false features.
3) Hidden Features. The most critical issue is the possibility of existence of some hidden features
that may not be apparent and thus are missed from the direct data analysis. Examples of such
hidden factors might be exogenous and even some endogenous factors in the given data sets. The
conventional data mining and even machine learning techniques are less effective in capturing these
existing but hidden features. As a result, more advanced and rigorous methods and techniques are
needed to take into account these possible features when modeling the clustering solutions.
To address the aforementioned challenges, this paper introduces a two-stage methodology for time series
clustering. The first stage of the introduced methodology targets the problem of “unlabeled data” in time
series. The goal of this stage is to transform an unsupervised learning problem into a supervised learning
problem and then perform clustering and prediction of labels using supervised learning techniques. The
basic idea is to derive the prospective cluster labels through utilization of characteristics and features
of the given time series. Once the characteristics and features of time series data are “vectorized” (i.e.,
numerically calculated and represented), a conventional K-means clustering can be used to cluster the
feature vector data. The generated clusters and the label for each cluster can then be utilized to label the
original time series data and thus the problem can be transformed to supervised learning.
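As a minimal sketch of this idea (the feature values and variable names below are purely illustrative, not data from the case study), the vectorized characteristics can be clustered with scikit-learn's KMeans and the resulting assignments reused as supervised labels:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per time series, with the
# columns <volatility, return> computed from the raw observations.
features = np.array([
    [0.12,  0.30],
    [0.45, -0.10],
    [0.11,  0.28],
    [0.47, -0.05],
])

# Cluster the feature vectors; the number of clusters is a design choice.
kmeans = KMeans(n_clusters=2, random_state=7).fit(features)

# The cluster assignments become the labels of the original (previously
# unlabeled) time series, turning the task into supervised learning.
labels = kmeans.labels_
print(labels)   # e.g., [0 1 0 1] (cluster numbering may differ)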
The second stage targets the problems of “high dimensionality” and dealing with “hidden features” of
time series data. The proposed approach is to utilize deep learning to capture and take into account the
effects of hidden layers. Deep learning is capable of optimizing a prediction model by iteratively learning
new features through various internal neural network layers and the neurons incorporated in each layer. To
address both problems simultaneously, an autoencoder-based deep learning algorithm is utilized, in which
the autoencoder not only takes into account the hidden features but also preserves the features that are
salient for computation and prediction. Through changing the architecture of neural networks including
the shape of the input data, the number of internal layers, the number of neurons on each layer, and the
activation and optimization functions, it is possible to enhance the accuracy of the prediction and, more
specifically, of supervised learning-based clustering through deep learning. The key contributions
of this paper are:
– Introduce a two-stage methodology to address time series clustering. In the first stage, we introduce a
methodology to create cluster labels and thus enable transforming unsupervised learning to supervised
learning for time series data. In the second stage, an autoencoder-based deep learning algorithm is
built to model the clustering of time series data.
– Demonstrate the performance of the proposed two-stage methodology through a case study performed
on clustering time series data of 70 stock indices. The results show an accuracy of 87.5%
in correctly predicting the cluster labels of time series data.
This article is organized as follows: Section II reviews the state of the art in time series clustering.
Section III highlights the key characteristics of time series and in particular financial time series data.
A brief overview of artificial neural networks is presented in Section IV. The general picture of the
two-stage model is presented in Section V, and the encoder-decoder deep learning model is described in
Section VI. The algorithms are detailed in Section VII. The autoencoder-based model is evaluated and the
results are reported in Section VIII. Section IX concludes the paper and highlights future research directions.
A large number of time-series clustering techniques have been proposed over the last few decades. Most of them
critically depend on the choice of distance (i.e., similarity) measure among time-series.
In general, there are three different approaches to cluster whole time-series: 1) shape-based, 2) feature-
based, and 3) model-based. On the other hand, with respect to the length of the time-series, whole
time-series clustering can be classified into two categories: 1) shape-level, and 2) structure-level. In the
shape-based approach, clustering is performed based on shape similarity, where the shapes of two time-series
are matched using non-linear stretching and contracting of the time axes. Conventional clustering
methods are used in shape-based clustering.
In the feature-based clustering methods, feature extraction approaches are used. Feature extraction refers
to the methods that transform the raw time-series into a set of features. Feature extraction is used to
compress large data sets using dimensionality reduction. Hence, in feature-based clustering, raw time-series
are transformed into a feature vector of lower dimension (i.e., for each time-series a fixed-length and an
equal-length feature vector is created). Then, a conventional clustering algorithm is applied on the lower
dimension feature vector. The extracted features are usually application dependent, which implies that one
set of features that is useful for one application might not be relevant or useful for another. In some
studies other feature selection methods are performed to further reduce the number of feature dimensions
after feature extraction [7]. As the notion of shape cannot be precisely defined, dozens of similarity (i.e.,
distance) measures have been proposed [1], [2], [3], [4], [8], [9]. In this paper, we also employ similar
techniques in order not only to cluster but also to utilize cluster labels for further analysis and more
specifically for supervised learning.
Model-based clustering approaches assume a model for each cluster and attempt to fit the data into the
assumed model. Then, each raw time-series data is transformed into either model parameters (one model
for each time-series) or into a mixture of underlying probability distributions. One of the major problems
of model-based approaches is scalability [10]; their performance deteriorates when
the clusters are very similar.
Whole time-series clustering contains four major components: 1) dimensionality reduction (or time-
series representation), 2) distance measurement (or similarity), 3) the clustering algorithm, and 4) prototype
definition and evaluation, where the prototype refers to a summarization of the time-series. Depending on
the application, time-series clustering uses some or all of these components. The reason for having each
component is as follows: dimensionality reduction is usually used to fit the data in memory. Afterwards,
a clustering algorithm is performed on the data using a similarity (distance) measure, and as a result a
prototype is created which shows a summarization of the time-series. Finally, the created prototype is
evaluated using different criteria.
In the rest of this review, the main components of whole time-series clustering are discussed in more
detail.
To date, several time-series representation methods have been proposed to improve the performance of
time-series clustering [13], [14]. Ding et al. [2] have provided a comprehensive study of eight different
representation methods evaluated on 38 datasets.
In general, there are four types of time-series representation methods: 1) data adaptive, 2) non-data
adaptive, 3) model-based, and 4) data-dictated (or clipped data) based approaches [15], [11], [16], [17]
which are explained as follows:
1) Data adaptive representation methods aim to minimize the global reconstruction error [4], and can
be applied on all types of time-series.
2) Non-data adaptive approaches are only appropriate for those time-series that have fixed-size and
equal-length segmentation.
3) Model-based approaches are a special kind of time-series representation methods that are used to
represent a time-series in a stochastic way such as Hidden Markov Model (HMM) [18], statistical
models, time-series Bitmaps [19], and Auto-Regressive Moving Average (ARMA) [20].
4) Data-dictated (clipped data) time-series representation approaches are the least known type of
representation, where the feature reduction ratio is automatically defined according to the raw time-series.
The most well-known method of this type of representation is called clipping (bit-level) representa-
tion [16]; a brief sketch of the idea is given below.
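As a rough sketch of the clipping idea (not necessarily the exact procedure of [16]), each observation of a raw series can be reduced to a single bit indicating whether it lies above the series mean:

import numpy as np

def clipped_representation(series):
    # Bit-level (clipped) representation: 1 where a value exceeds the
    # series mean, 0 otherwise.
    series = np.asarray(series, dtype=float)
    return (series > series.mean()).astype(np.uint8)

# Toy example: the mean is 2.3, so values above it are clipped to 1.
print(clipped_representation([1.0, 3.0, 2.0, 5.0, 0.5]))  # -> [0 1 0 1 0]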
– Chaos. This feature exists when a dynamic system exhibits sensitivity to initial conditions and
thus unpredictable long-term behavior. There is only weak evidence of low-dimensional
chaos in financial markets.
Classical data mining techniques and even machine learning algorithms might not be able to capture
all these features, and thus the generated model might not be accurate. As discussed and demonstrated in this
paper, deep learning approaches are better positioned to formulate these features through layers of
learning.
Chung and Chuwongananat [40] found that market volatility affects returns through stock liquidity, suggesting that liquidity providers play an important role
in the market-return relationship in the United States. Sen and Bandhopadhyay [41] evaluated dynamic
return and volatility spillover from the US stock market into the Indian stock market. These conflicting results
warrant further estimation using appropriate techniques and algorithms.
Volatility clustering is a main feature of the volatility of asset prices, and volatility shocks can affect
the expectation of future volatility [41]. Volatility clustering means that large changes in prices (variance
of returns) tend to be followed by large changes, and small changes by small changes, over a period.
There is a dual relationship between volatility and returns in equity markets. Long-run fluctuations
of volatility reflect risk premiums and therefore establish a positive relation to returns. On the other hand,
short-run volatility indicates news effects and shocks to leverage, and thus produces a negative volatility-
return relation. The leverage effect explains how volatility rises when asset prices fall. While
long-run volatility is associated with higher returns, the opposite holds for short-run volatility.
B. Encoder-Decoder
Autoencoders are a type of neural network that transforms input data into its output. An autoencoder
uses two parts in this transformation [42]:
1) Encoder by which it transforms its high dimensional inputs into a smaller set of dimensions while
keeping the most important features, and
2) Decoder by which the reduced set of features is used to reconstruct the initial input data.
The output of the encoder, referred to as “latent-space representation”, is a compressed form of the
input data in which the most influential and important features are kept. The output of the encoder is then
utilized to reconstruct the initial input data given to the autoencoder.
From a mathematical point of view, an autoencoder network is a composition of functions (g ∘ f)(x). More
specifically, an encoder is a function f that takes x as input and maps x into h, the latent-space
representation (i.e., h = f(x)). On the other hand, a decoder is a function g that takes the output of the
encoder (i.e., h) and produces r (i.e., r = g(h)). The objective is to make r as close as possible to x.
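In other words, training an autoencoder amounts to searching for an encoder f and a decoder g that minimize the reconstruction error over the n training samples, commonly measured as a squared error:

\min_{f,\, g} \; \frac{1}{n} \sum_{i=1}^{n} \left\lVert x_i - g\big(f(x_i)\big) \right\rVert^2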
The key objective of autoencoders is not just to copy the input into the output. In fact, through the
training of an autoencoder and transformation of the input into the output, it is aimed that the produced
latent-space representation (i.e., h) holds only unique and important properties and features of the dataset
that can be used for further analysis. In order to extract only the important features of the given dataset
in the form of the latent-space representation, a set of constraints can be defined on the function that generates
h so that the resulting compressed form of the dataset has smaller dimensions than the initial dataset x. As
a result, the quality of detecting the most salient features of the dataset x heavily depends on the constraints
defined on h. There are different variations of autoencoders [42]:
1) Basic autoencoder, in which there are three layers: a) an input layer of size |x|, b) a hidden layer
of size |h| (i.e., |h| < |x|), and c) an output layer of size |r| (i.e., |r| = |x|), in which size refers to
the number of nodes incorporated and designed in the underlying layer.
2) Multilayer autoencoder, in which the number of hidden layers is increased to more than one. This
type of autoencoders is useful when additional internal hidden layers are required to extract the
hidden features and train the model.
3) Convolutional autoencoder, in which the input data is filtered for the goal of extracting only
some parts of it. These types of autoencoders are particularly effective in image processing
applications and conversions from 3-D data into smaller dimensions of filtered images.
4) Regularized autoencoder, in which the extraction and training stages are performed in accordance
with other factors, such as loss functions, rather than solely based on defining hidden layers.
In the following sections, we provide an in-depth description of the synergistic methodology proposed
for clustering time series data. First, we focus on the architecture of the designed autoencoder and then
provide in-depth discussion of the algorithms developed.
The purpose is to detect and take into account hidden features that might exist even within volatility and return
when modeling the deep learning-based clustering.
Since an autoencoder is a symmetric neural network, the number of layers and nodes on each layer of
the decoder should be symmetric with the number of layers and neurons in the encoder side. As a result,
there are three layers on the decoder side (i.e., L4, L5, and L6) with the number of nodes of 20, 50, and
100, respectively, in which an ascending order of the number of neurons is apparent. The decoder side
explicitly reconstructs the original inputs using the reduced features with exact shape for both input and
output. The layer h is exactly where the number of prospective clusters for clustering data is taken into
consideration. In our financial case study, the optimal number of clusters is four (See Section VIII) and
thus the number of nodes on this layer (i.e., h) is also considered four.
As it is apparent from Figure 3, through the number of nodes and layers defined for the encoder part,
the most salient features are captured and through the decoder side, which is symmetric to the encoder
side, the exact shape of the input data is reconstructed. The adjustment of weights for the internal layers
and their nodes is decided and optimized in an iterative manner, where the loss function on the output
(i.e., the reconstructed input) is used as a means to measure the accuracy of the clustering.
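A minimal Keras sketch of this symmetric architecture is given below; the layer sizes and activation functions follow the description in this section and in Algorithm 2, while the variable names, the use of the tf.keras API, and the linear output activation are illustrative assumptions rather than the authors' exact code:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

no_clusters = 4   # number of nodes on the bottleneck layer h
InCol = 2         # input shape: <volatility, return>
OuCol = 1         # output shape: a single floating value

# Encoder: decreasing number of neurons (100 -> 50 -> 20 -> 4).
x = Input(shape=(InCol,))
e = Dense(100, activation="relu")(x)
e = Dense(50, activation="relu")(e)
e = Dense(20, activation="relu")(e)
h = Dense(no_clusters, activation="sigmoid")(e)   # latent-space representation

# Decoder: symmetric, increasing number of neurons (20 -> 50 -> 100).
d = Dense(20, activation="relu")(h)
d = Dense(50, activation="relu")(d)
d = Dense(100, activation="relu")(d)
r = Dense(OuCol)(d)                               # reconstructed/predicted output

autoencoder = Model(inputs=x, outputs=r)
autoencoder.summary()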
The activation function incorporated on the layer h is a “sigmoid” function. A sigmoid function,
σ(z) = 1/(1 + e^{-z}), is used to predict probability values, since its output ranges between 0 and 1.
Once the model is trained on the training set, the model (i.e., the output layer) produces a “floating” value
in the range [1 − C, C − 1], where C is the number of desired clusters (i.e., four in our case study).
A simple application of the rounding (i.e., np.rint function in Python) and absolute value (i.e., np.absolute
function in Python) functions to the output generated by the model will produce a “positive integer” value
in the range [0, C − 1] that represents the class label of the underlying stock index data for which the output
has been generated.
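Concretely, using the two NumPy functions named above, the conversion from the model's floating outputs to cluster labels could be done as follows (the prediction values shown are only illustrative):

import numpy as np

# Illustrative floating outputs produced by the trained model.
predictions = np.array([2.87, -0.12, 1.44, 3.09])

# Absolute value, then rounding, yields integer labels in [0, C-1].
labels = np.rint(np.absolute(predictions)).astype(int)
print(labels)  # -> [3 0 1 3]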
VII. THE ALGORITHMS
The introduced autoencoder-based deep learning methodology for time series clustering is realized
through two algorithms: 1) transforming unsupervised data into supervised data by building a feature vector
and characterizing time series using descriptive metadata (i.e., volatility and return), and 2) building an
autoencoder-based deep learning model to predict the cluster labels of the supervised stock data. In the following
sections, we describe each algorithm in further detail.
A. Algorithm 1. Enabling Supervised Learning through Characterizing Time Series and Utilizing Metadata
as Labels
Algorithm 1 lists the steps of transforming time series data (i.e., unsupervised data) into labeled
data. The algorithm utilizes two descriptive concepts to characterize financial time series data: 1) volatility,
and 2) return. Therefore, a vector of <volatility, return> is computed and constructed for each array of stock prices captured for
each stock index over a given period of time. Let us review Algorithm 1 in further detail.
Algorithm 1 takes as inputs: 1) a URL to scrape and enumerate stock indices, 2) the desired number of
clusters for clustering stock indices, 3) the number of stock indices to analyze and cluster, and 4) the start
date of stock prices. The algorithm then labels each stock index with respect to the cluster the underlying
stock index belongs to by utilizing characteristics of the time series data (i.e., volatility and return), thus
converting the unsupervised time series data into supervised data.
In Algorithm 1, first the setting variables are initialized (lines 1 - 5), followed by declaring a few
data structures to hold the captured data (lines 6 - 9). The algorithm then proceeds with scraping the given
URL and listing the stock indices (i.e., tickers) in order to perform cluster analysis (lines 10 - 11). The
case study performed and reported in this paper focuses only on the first 70 stock indices. Once the list
of stock tickers is prepared, the “Adjusted Close” price of each stock index is retrieved for a given time
period. For our case study, we retrieved data for the period of January 1, 2019 to April 15, 2019 (the
day of running this experiment and capturing the data) (lines 12 - 15).
The constructed and filled data structure (i.e., TP_DF: <ticker, prices[]>) is then sorted in order
to enable one-to-one cluster assignment of each stock index (line 16). As two of the key characteristic
features of time series, the volatility and return values are computed for each time series of prices for each
stock index, and the computed values along with the stock indices are preserved in a data structure (i.e.,
TVR_DF) (lines 17 - 22). Then, a projection of the <index, volatility, return> data saved in TVR_DF
is created for the purpose of KMeans clustering and creating labels based on <volatility, return> (line
23).
A KMeans-based clustering model with respect to the number of desired clusters (i.e., no_clusters)
is then built (line 24), and the captured <volatility, return> values are then given to the clustering model in
order to cluster and then create cluster labels (line 25). The centroids of the clusters are optimized and the
silhouette value for the clustering is computed (lines 26 - 27). The labels for each stock index are created,
representing the cluster it belongs to (i.e., 0, 1, 2, 3). The data are then saved in a file that will be used
by Algorithm 2 to build an autoencoder.
B. Algorithm 2. Predicting Cluster Labels of Time Series Data through Autoencoder-based Deep Learning
The second part of the methodology builds an autoencoder-based deep learning model for clustering stock
indices. The algorithm takes as inputs: 1) training labeled time series data, 2) testing unlabeled data, 3) the
number of clusters (i.e., neurons or nodes) to encode, 4) the shape of the input data (i.e., 2 in our case,
<volatility, return>), 5) the shape of the output data (i.e., 1 in our case, a floating value), and 6) the number
of iterations or epochs. The details of Algorithm 2 are given in Listing 2.
The algorithm starts with initializing the setting variables, including: 1) the number of clusters to project
(i.e., no_clusters), 2) the batch size to retrieve and feed the autoencoder (i.e., BatchSize), 3)
the shape of the input data (i.e., 2 in our case, which is the number of input columns entered to the
model (<volatility, return>)), 4) the output shape (i.e., 1 in our case, an output with one column,
which is a floating variable representing the cluster label), 5) the test size (i.e., 33% for testing and 67%
for training), and 6) the number of epochs for iterative training (lines 1 - 8). The algorithm then loads the
previously saved data that were captured through Algorithm 1 and stores the data in a data structure
TVR_DF (line 9). The loaded data are then split into a training set (67%) and a test set (33%)
(lines 10 - 12).
The actual building of the autoencoder starts with specifying the shape of the input data (line 13). In our
case, the shape of the input data (i.e., InCol) is a vector with two columns, <volatility, return>. The
creation of the different layers of the autoencoder starts at line 14, where the input shape is given to build the
x part of the autoencoder model (as specified in Figure 3). The input layers of the autoencoder are built
through lines 14 – 17, where the shape of the input (i.e., input_dim) is given to the first layer (line 14)
with 100 neurons, the built first layer and its output are given to the second layer (line 15) with 50 nodes
or neurons, and then to the third layer (line 16) with 20 neurons or nodes. The activation function for building these
layers is “relu”, which returns the non-negative value max(0, x). The h part of the autoencoder (Figure 3) is built
by line 17, where the number of cluster labels is specified. The activation function here is “sigmoid.”
The encoding part of the autoencoder and the encoding layers are built through lines 14 - 17 in which
a decreasing number of neurons or nodes on each layer indicates focusing on important features of data
and preserving them for further analysis by the next layer (i.e., feature reduction).
Conversely and in a similar manner, the decoder part of the autoencoder intends to reconstruct the
initial input data using the encoded data (lines 18 - 21). The first layer of the decoder takes the output of
the “encoder” with 20 neurons (line 18). The additional decoder layers are then built symmetrically with
respect to the layers incorporated in the encoder part (lines 18 - 21) with similar activation functions.
Eventually, the r part of the autoencoder (see Figure 3) is built, where a floating value is estimated to
represent the cluster label of the input data <volatility, return>.
Listing 1. Enabling supervised learning through utilization of metadata of time series as labels.
Algorithm 1 (Transforming Unsupervised Learning to Supervised Learning):
Description: Transforming Unsupervised Data to Labeled Data and Clustering
Stock Labeled Data Using the Optimal KMeans Algorithm.
Inputs: 1) URL to scrape, 2) Number of Clusters,
3) Number of Stock Tickers, 4) The Start Date of Collecting Stock Prices
Outputs: The Cluster Label of the Scrapped Stock Tickers
# Setting
1. numpy.random.seed(7) # For reproducibility purpose
2. sp500_URL = ’https://en.wikipedia.org/wiki/List_of_S%26P_500_companies’ # Web page to scrape
3. no_clusters = 4 # Number of desired clusters obtained through experiments
4. no_tickers = 70 # Number of tickers to scrape and analyze
5. start_date = '01/01/2019' # The start date to scrape the tickers' data
# Assign the cluster tag regarding each ticker (<index, voly, ret, cluster>)
28. for each ticker in TP_DF do
29. TVR_DF.cluster[index == ticker] = pd.DataFrame(predicts[index == ticker])
30. end for
# Save the data to a file to be used by Algorithm 2: (<ticker, vol, ret, cluster>)
31. TVR_DF.to_csv("/.../k-means-StockData.csv")
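Listing 1 elides the intermediate steps (lines 6 - 27). A hedged sketch of what those steps could look like, reusing the setting variables above, is given next; get_adjusted_close is a hypothetical helper standing in for the price-retrieval routine, the Wikipedia column name is assumed, and the annualization here is computed over daily percentage returns rather than raw prices:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Scrape the list of tickers from the S&P 500 Wiki page (lines 10 - 11).
tickers = pd.read_html(sp500_URL)[0]["Symbol"].tolist()[:no_tickers]

# Retrieve the "Adjusted Close" prices per ticker (lines 12 - 15);
# get_adjusted_close is a hypothetical price-retrieval helper.
TP_DF = {ticker: get_adjusted_close(ticker, start_date) for ticker in tickers}

# Compute annualized volatility and return per ticker (lines 16 - 22).
rows = []
for ticker, prices in sorted(TP_DF.items()):
    daily = pd.Series(prices).pct_change().dropna()
    volatility = daily.std() * np.sqrt(252)
    returns = daily.mean() * np.sqrt(252)
    rows.append((ticker, volatility, returns))
TVR_DF = pd.DataFrame(rows, columns=["ticker", "volatility", "returns"])

# KMeans on <volatility, return> and cluster quality (lines 23 - 27).
X = TVR_DF[["volatility", "returns"]].values
kmeans = KMeans(n_clusters=no_clusters, random_state=7).fit(X)
predicts = kmeans.predict(X)
print("silhouette:", silhouette_score(X, predicts))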
Listing 2. Deep learning-based (Encoder-Decoder) supervised learning for predicting cluster labels of stock indices.
Algorithm 2 (Supervised Learning through Autoencoder-based Deep Learning):
Description: Building An Autoencoder to Predict Cluster Label of Stock Indices
Inputs: 1) Training labeled set, 2) Testing unlabeled set.
Outputs: An Autoencoder-based Deep Learning Model to Predict Cluster Labels
# Setting
1. seed = 7
2. numpy.random.seed(seed) # For reproducibility purpose
3. no_clusters = 4 # Number of desired clusters (i.e., # of Neurons or Nodes)
4. BatchSize = 1024 # data batch size retrieved by the learner in each iteration
5. InCol = 2 # The shape of the input data used regarding training <vol, ret>
6. OuCol = 1 # The shape of the output data, A floating value
7. TestSize = 0.33 # the percentage of the test data of the splitting data
8. noEpochs = 1000 # Number of epochs (learning round) regarding training the model
# Loading labeled data (train and test): (<ticker, volatility, returns, cluster>)
9. TVR_DF = pd.read_csv("/.../k-means-StockData.csv") # Created by Algorithm 1
# Predict the cluster tag of the test data set using the autoencoder model
25. predicts = autoencoder.predict(X_test)
The built autoencoder maps the input to the decoded and reconstructed output and the model itself is
built (line 22). The model is then compiled using “adam” optimizer and mean square error (i.e., mse) as
a metric to assess the precision of the prediction (line 23). The built model is then given the training data
set (line 24) with a given number of epochs and batch size, and eventually the test data are provided to
the model for the purpose of predicting their cluster labels (line 25). In the end, the absolute and
rounded value of the floating output is reported as the predicted cluster label (line 26).
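Listing 2 elides lines 10 - 24. A hedged sketch of those steps, following the description above and assuming the autoencoder model sketched in Section VI (the loss function is assumed to be MSE as well), might look as follows:

from sklearn.model_selection import train_test_split

# Split the labeled data into training and test sets (lines 10 - 12).
X = TVR_DF[["volatility", "returns"]].values
y = TVR_DF["cluster"].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TestSize, random_state=seed)

# Compile the autoencoder built in lines 13 - 22 with the "adam"
# optimizer and mean squared error (line 23).
autoencoder.compile(optimizer="adam", loss="mse", metrics=["mse"])

# Train on the labeled training set (line 24) ...
autoencoder.fit(X_train, y_train, epochs=noEpochs, batch_size=BatchSize, verbose=0)

# ... and predict the cluster labels of the test set (line 25).
predicts = autoencoder.predict(X_test)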
A. Development Platform
The authors implemented the algorithms in Python 2.7.13 (the Anaconda distribution). The deep learning
portion of the algorithms was developed using TensorFlow and Keras, the open-source Python implementations
of deep learning and neural networks. The experiments were executed on a Mac computer running the
OS X El Capitan 10.11.2 operating system with a 2.8 GHz Intel Core i7 and 16 GB of 1600 MHz DDR3 memory.
B. Data Collection
The authors collected the indexes and ticker symbols of 70 companies listed in the S&P 500. The ticker
symbols were scraped from the URL of the Wiki page of the S&P 500. The read_html Python function
was used to automatically scrape and extract the required data from the given Web page. Once the tickers
and symbols of the selected companies were identified, a Python script captured the time series data, and
more specifically the “Adjusted Close” prices, for the selected stock symbols. The adjusted close value
data were captured for the period of January 1, 2019 to April 15, 2019 on a daily basis.
a) Annualized Stock’s Volatility. To calculate the annualized stock volatility, the standard deviation of
daily prices is multiplied by the square root of 252, assuming that there are 252 trading days in a given
year.
b) Annualized Stock’s Return. The annualized stock return is computed in a similar fashion; however,
instead of the standard deviation, the mean value of daily prices is multiplied by the square root of 252
(see the formulas below).
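In formula form (following the description above, with 252 trading days per year, where σ_daily and r̄_daily denote the standard deviation and the mean of the daily series, respectively):

\sigma_{\text{annual}} = \sigma_{\text{daily}} \times \sqrt{252}, \qquad r_{\text{annual}} = \bar{r}_{\text{daily}} \times \sqrt{252}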
TABLE I
THE RESULT OF KMEANS CLUSTERING BASED ON FOUR CLUSTERS.
Since the number of clusters is four, the figures show the exact time series of the members of each cluster for the period of January 1, 2019
to April 15, 2019.
Let us take a look at the clusters and the stock indices grouped together.
Figures 9 - 12 illustrate the range of volatility and return values of stock indices that are clustered
together. The stock indices are clustered with respect to two descriptive variables < volatility, return >.
As a result, the trends of these two variables are similar among the stock indices clustered in the same
group. The figures visualize the range of volatility and returns computed for each member of the clusters
produced by KMeans clustering for the period of January 1, 2019 to April 15, 2019.
Fig. 5. KMeans clustering: Cluster ”0”
Fig. 6. KMeans clustering: Cluster ”1”
Fig. 7. KMeans clustering: Cluster ”3”
Fig. 8. KMeans clustering: Cluster ”2”
TABLE II
THE INPUT AND OUTPUT SHAPES ALONG WITH THE NUMBER OF PARAMETERS TRAINED.
We trained the model with different numbers of repetitions (i.e., epochs) in order to understand the
performance of estimating the parameter values in detail. We repeated the training of the model for 1,000
epochs. Figure 13 illustrates the relationship between the number of epochs and the error (i.e., loss). As the
figure indicates, the loss value is approximately zero when the number of epochs is greater than 316.
We kept the number of epochs at 1,000, even though 316 epochs were sufficient. Once the network is
trained using the training dataset, it is given the test dataset to predict the labels. The prediction is in the
form of a numerical value that needs to be rounded. The computed numerical output of the program and
the rounded absolute values, along with the actual labels for the test data set, are reported in Table III.
The total number of data items was 70 (i.e., the time series data of 70 stock indices were captured), of which
46 time series were used for training the network and the remaining 24 were used for testing the model.
Fig. 9. KMeans clustering: The range of volatility and returns for Cluster ”0”
Fig. 10. KMeans clustering: The range of volatility and returns for Cluster ”1”
Fig. 11. KMeans clustering: The range of volatility and returns for Cluster ”2”
Fig. 12. KMeans clustering: The range of volatility and returns for Cluster ”3”
As Table III lists, the network was able to predict the cluster labels of 21 out of 24 test data items correctly,
achieving an accuracy of 87.5% in prediction. The misclassified stock indices are ALXN, APA, and AMZN,
which are colored in red in the figures.
To help interpret the results of the autoencoder-based deep learning time series classification model, we
visualize the results. Figures 14 - 17 show the results of the prediction of cluster labels for the test set,
in which the time series in black (i.e., except for Cluster “2”) show the misclassifications made
by the prediction. The prediction results in three instances of mislabeling, colored in black in the figures.
TABLE III
NUMERICAL PREDICTION OF TIME SERIES’ CLUSTER LABELS.
Fig. 14. Prediction of time series data for Cluster ”0”
Fig. 15. Prediction of time series data for Cluster ”1”
Fig. 16. Prediction of time series data for Cluster ”3”
Fig. 17. Prediction of time series data for Cluster ”2”
The findings indicate that these three stock indices (i.e., ALXN, APA, and AMZN) are on the borderline
of clusters (see Figures 18 and 19). Even though the figures may imply that the clustering produced
by KMeans has been performed reasonably well, they may also indicate that the clustering performed by the
autoencoder has taken into account some other hidden factors. Hence, since the deep learning-based
approach discovers and takes into account more hidden features beyond these two values (i.e.,
volatility and return), the clustering performed by the autoencoder is actually providing more insights
about these stock indices and their relationships. More precisely, it might indicate that there are some
other hidden features discovered by the autoencoder that are missed and not formulated by the conventional
KMeans clustering algorithm.
Conventional time series analysis techniques typically rely on observable features and trends in order to build
prediction models. Although these features and techniques have acceptable accuracy, their performance might
deteriorate, mainly due to the existence of other, hidden features that are not part of the prediction or
classification models.
Deep learning-based techniques are emerging approaches to data science and analysis. These techniques
are capable of detecting hidden features of data and thus taking these features into account when
building prediction models. In particular, these techniques are expected to outperform traditional statistical
data analysis in the context of time series, due to the additional complexity added by the time factor to
the data.
This paper introduces a deep learning-based approach to model the time series clustering problem,
in which the given time series data are clustered into groups with respect to some features. The time
series clustering solution presented in this paper first generates labels for time series data using KMeans
clustering, thereby enabling supervised learning. Once the cluster labels are generated, they are given
to an encoder-decoder-based deep learning neural network in order to build a clustering and prediction
model. The most important advantage of building such a neural network is that it models hidden features
and takes them into account in the prediction. The case study conducted in the context of
financial time series data shows an accuracy of 87.5% in clustering such data. More importantly, we
observed that the deep learning-based model outperforms conventional KMeans clustering.
The application of deep learning approaches to time series analysis and in particular financial time series
data is in its early stages. Several other classical problems in time series analysis can be formulated using
deep learning techniques such as shock and anomaly detection, seasonal effects as well as clustering and
prediction at different levels of data abstraction. Neural network-based techniques such as Long Short-Term
Memory (LSTM) [43], [44], [45], [46], [47], Generative Adversarial Networks (GANs), and many
others need to be further explored for formulating classical problems in time series data analysis.
FUNDING STATEMENT
This work is partially supported by the National Science Foundation under grant numbers
1564293, 1723765, and 1821560.
REFERENCES
[1] Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. Tung, “Spade: On shape-based pattern detection in streaming time series,” in 2007
IEEE 23rd International Conference on Data Engineering, pp. 786–795, IEEE, 2007.
[2] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: experimental comparison
of representations and distance measures,” Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1542–1552, 2008.
[3] A. Stefan, V. Athitsos, and G. Das, “The move-split-merge metric for time series,” IEEE transactions on Knowledge and Data
Engineering, vol. 25, no. 6, pp. 1425–1438, 2013.
[4] X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. Keogh, “Experimental comparison of representation methods
and distance measures for time series data,” Data Mining and Knowledge Discovery, vol. 26, no. 2, pp. 275–309, 2013.
[5] S. Aghabozorgi, A. S. Shirkhorshidi, and T. Y. Wah, “Time-series clustering–a decade review,” Information Systems, vol. 53, pp. 16–38,
2015.
[6] E. Keogh and J. Lin, “Clustering of time-series subsequences is meaningless: implications for previous and future research,” Knowledge
and information systems, vol. 8, no. 2, pp. 154–177, 2005.
[7] T. W. Liao, “Clustering of time series data - a survey,” Pattern Recognition, vol. 38, no. 11, pp. 1857–1874, 2005.
[8] N. Tavakoli, D. Dai, and Y. Chen, “Client-side straggler-aware i/o scheduler for object-based parallel file systems,” Parallel Computing,
vol. 82, pp. 3–18, 2019.
[9] N. Tavakoli, D. Dai, and Y. Chen, “Log-assisted straggler-aware i/o scheduler for high-end computing,” in 2016 45th International
Conference on Parallel Processing Workshops (ICPPW), pp. 181–189, IEEE, 2016.
[10] M. Vlachos and D. Gunopulos, “Indexing time series under condition of noise. data mining in time series database: Series in machine
perception and artificial intelligence,” 2004.
[11] C. Ratanamahatana, E. Keogh, A. J. Bagnall, and S. Lonardi, “A novel bit level time series representation with implication of similarity
search and clustering,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 771–777, Springer, 2005.
[12] E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: a survey and empirical demonstration,” Data Mining
and knowledge discovery, vol. 7, no. 4, pp. 349–371, 2003.
[13] J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing sax: a novel symbolic representation of time series,” Data Mining and
knowledge discovery, vol. 15, no. 2, pp. 107–144, 2007.
[14] I. Popivanov and R. J. Miller, “Similarity search over time-series data using wavelets,” in Proceedings 18th international conference
on data engineering, pp. 212–221, IEEE, 2002.
[15] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A symbolic representation of time series, with implications for streaming algorithms,” in
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp. 2–11, ACM, 2003.
[16] A. Bagnall, E. Keogh, S. Lonardi, G. Janacek, et al., “A bit level representation for time series data mining with shape based similarity,”
Data Mining and Knowledge Discovery, vol. 13, no. 1, pp. 11–40, 2006.
[17] J. Shieh and E. Keogh, “iSAX: indexing and mining terabyte sized time series,” in Proceedings of the 14th ACM SIGKDD international
conference on Knowledge discovery and data mining, pp. 623–631, ACM, 2008.
[18] D. Minnen, C. L. Isbell, I. Essa, and T. Starner, “Discovering multivariate motifs using subsequence density estimation and greedy
mixture learning,” in Proceedings of the National Conference on Artificial Intelligence, vol. 22, p. 615, Menlo Park, CA; Cambridge,
MA; London; AAAI Press; MIT Press; 1999, 2007.
[19] N. Kumar, V. N. Lolla, E. Keogh, S. Lonardi, C. A. Ratanamahatana, and L. Wei, “Time-series bitmaps: a practical visualization tool
for working with large time series databases,” in Proceedings of the 2005 SIAM international conference on data mining, pp. 531–535,
SIAM, 2005.
[20] K. Kalpakis, D. Gada, and V. Puttagunta, “Distance measures for effective clustering of arima time-series,” in Proceedings 2001 IEEE
international conference on data mining, pp. 273–280, IEEE, 2001.
[21] A. Bagnall and G. Janacek, “Clustering time series with clipped data,” Machine Learning, vol. 58, no. 2-3, pp. 151–178, 2005.
[22] S. Chu, E. Keogh, D. Hart, and M. Pazzani, “Iterative deepening dynamic time warping for time series,” in Proceedings of the 2002
SIAM International Conference on Data Mining, pp. 195–212, SIAM, 2002.
[23] X. Wang, K. Smith, and R. Hyndman, “Characteristic-based clustering for time series data,” Data mining and knowledge Discovery,
vol. 13, no. 3, pp. 335–364, 2006.
[24] L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons, 2009.
[25] E. J. Keogh and M. J. Pazzani, “An enhanced representation of time series which allows fast and accurate classification, clustering and
relevance feedback.,” in Kdd, vol. 98, pp. 239–243, 1998.
[26] L. Gupta, D. L. Molfese, R. Tammana, and P. G. Simos, “Nonlinear alignment and averaging for estimating the evoked potential,”
IEEE Transactions on Biomedical Engineering, vol. 43, no. 4, pp. 348–356, 1996.
[27] V. Hautamaki, P. Nykanen, and P. Franti, “Time-series clustering by approximate prototypes,” in 2008 19th International Conference
on Pattern Recognition, pp. 1–4, IEEE, 2008.
[28] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the fifth Berkeley
symposium on mathematical statistics and probability, vol. 1, pp. 281–297, Oakland, CA, USA, 1967.
[29] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., “A density-based algorithm for discovering clusters in large spatial databases with
noise.,” in Kdd, vol. 96, pp. 226–231, 1996.
[30] W. Wang, J. Yang, R. Muntz, et al., “Sting: A statistical information grid approach to spatial data mining,” in VLDB, vol. 97, pp. 186–
195, 1997.
[31] G. Sheikholeslami, S. Chatterjee, and A. Zhang, “Wavecluster: A multi-resolution clustering approach for very large spatial databases,”
in VLDB, vol. 98, pp. 428–439, 1998.
[32] T.-c. Fu, F.-l. Chung, V. Ng, and R. Luk, “Pattern discovery from stock time series using self-organizing maps,” in Workshop Notes of
KDD2001 Workshop on Temporal Data Mining, pp. 26–29, 2001.
[33] S. Aghabozorgi, T. Ying Wah, T. Herawan, H. A. Jalab, M. A. Shaygan, and A. Jalali, “A hybrid algorithm for clustering of time series
data based on affinity search technique,” The Scientific World Journal, vol. 2014, 2014.
[34] C.-P. Lai, P.-C. Chung, and V. S. Tseng, “A novel two-level clustering method for time series data analysis,” Expert Systems with
Applications, vol. 37, no. 9, pp. 6319–6326, 2010.
[35] M. Sewell, “Characterization of financial time series,” 2011.
[36] D. Mamtha and K. S. Srinivasan, “Stock market volatility: Conceptual perspective through literature survey,” Mediterranean Journal
of Social Sciences, vol. 7, no. 1, pp. 208 – 212, 2016.
[37] G. Bekaert and G. Wu, “Asymmetric volatility and risk in equity markets,” Review of Financial Studies, vol. 13, no. 1, pp. 1 – 42, 2000.
[38] R. Whitelaw, “Stock market risk and return: An empirical equilibrium approach,” Review of financial Studies, vol. 13, no. 3, pp. 521
– 547, 2000.
[39] H. A. Shawky and A. Marathe, “Expected stock returns and volatility in a two regime market,” The Journal of Economics and Business,
vol. 47, no. 5, pp. 409 – 422, 1995.
[40] K. Chung and C. Chuwongananat, “Market volatility and stock returns: The role of liquidity providers,” Journal of Financial Markets,
vol. 37, pp. 17 – 34, 2018.
[41] Sen and Bandhopadhyay, “On the return and volatility spillover between us and indian stock market,” International Journal of Financial
Management, vol. 1, no. 3, 2012.
[42] N. Hubens, “Deep inside: Autoencoders,” 2019.
[43] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “A comparison of ARIMA and LSTM in forecasting time series,” in 17th IEEE
International Conference on Machine Learning and Applications, ICMLA 2018, Orlando, FL, USA, December 17-20, 2018, pp. 1394–
1401, 2018.
[44] N. Tavakoli, “Modeling genome data using bidirectional LSTM,” in The 1st IEEE International Workshop on Deep Analysis of Data-
Driven Applications (DADA) in conjunction with COMPSAC, 2019.
[45] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The performance of LSTM and BiLSTM in forecasting time series,” in IEEE Big
Data, Los Angeles, California, USA, 2019.
[46] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “A comparative analysis of forecasting financial time series using arima, lstm, and
bilstm,” arXiv preprint arXiv:1911.09512, 2019.
[47] F. Abri, S. Siami-Namini, M. A. Khanghah, F. M. Soltani, and A. S. Namin, “Can machine/deep learning classifiers detect zero-day
malware with high accuracy?,” in IEEE Big Data, Los Angeles, California, USA, 2019.