
Large-Scale Unusual Time Series Detection

Rob J Hyndman, Monash University, Victoria, Australia. Email: [email protected]
Earo Wang, Monash University, Victoria, Australia. Email: [email protected]
Nikolay Laptev, Yahoo Research, California, USA. Email: [email protected]

Abstract—It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. We wish to identify servers that are behaving unusually.

We compute a vector of features on each time series, measuring characteristics of the series. The features may include lag correlation, strength of seasonality, spectral entropy, etc. Then we use a principal component decomposition of the features, and apply various bivariate outlier detection methods to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and α-hulls.

Keywords—Feature Space, Multivariate Anomaly Detection, Outliers, Time Series Characteristics

I. INTRODUCTION

In the past decade a lot of work has been done on finding the most similar time series efficiently [24], [14]. In this paper we focus on finding the least similar time series in a large set. We shall refer to such time series as unusual or anomalous. Figure 1 gives a visual motivation for our approach. Each graph in the left column shows a collection of 100 time series, two of which are outliers having an abnormal trend or seasonality. The second column shows the first two principal components, which we use to identify unusual time series. Some unusual time series are not easy to identify (e.g., the seasonality anomalies in Figure 1); for this reason, a robust, accurate and automated solution is critical.

Fig. 1. Different types of anomalies and the corresponding first two principal components, which our method uses for unusual time series detection. These types of anomalous time series may be due to an abnormal server or a malicious user.

An important motivation for efficiently finding anomalous time series comes from large internet companies. At such companies, thousands of servers power user services, providing an uninterrupted and secure user experience. It is therefore critical to monitor the server metrics (e.g., latency, cpu), represented by time series, for any unusual behavior.

We are interested in the time series that are anomalous relative to the other time series in the same cluster or, more generally, in the same set. This type of anomaly detection is different from univariate anomaly detection, or even from multivariate point anomaly detection [6], because we are interested in identifying entire time series that are behaving unusually in the context of other metrics. Early detection of these anomalous time series is critical for taking preemptive action to protect users and provide a better user experience. The solution presented in this paper has been deployed at scale within Yahoo, and the open-source version of the proposed method is released [9] as an R package [18]. As shown in Section IV, the proposed method has impressive performance for a wide variety of anomalies present in the time series, making it applicable to other use-cases such as identifying anomalous users, database transactions, retail sales and many others.

We make three fundamental contributions. First, we introduce a novel and accurate method of using PCA with α-convex hulls for finding anomalous time series. Second, we perform a study of possible features that are useful for the types of time series dynamics seen in web-traffic time series. Lastly, we perform experiments on both synthetic and real-world data and demonstrate the usefulness and wide applicability of our method for finding interesting time series in a large collection of time series.

In Section II we present our approach, which uses PCA and α-convex hulls. In Section III we look at the features used for explaining the variance in different scenarios. Experiments with the method are described in Section IV. The demonstration is described in Section V. Related work and conclusions are presented in Sections VI and VII respectively.
Fig. 2. Scree plots showing that on our real dataset, a significant proportion of the variation can be captured using the first three to five components. For
unusual time series detection we found that the first 2 components are sufficient.

II. APPROACH

We first extract n features (see Section III) from m time series. We then use Principal Component Analysis (PCA) (similar to [24]) to identify the patterns (i.e., principal components). The first two principal components (PCs) are then selected, and a two-dimensional outlier detection algorithm is used to find the top k ≪ m outliers.

PCA is a tool for dimension reduction in high-dimensional data. A principal component is a linear combination of the original variables. For example, the first principal component captures the maximum variation in the rows of the m × n matrix. Therefore, loosely speaking, the first k principal components capture the k most prevalent patterns in the data.

Figure 2 shows the fraction of the variance captured by the first k principal components from real time series. We found that using the first two principal components was sufficient for our use-cases. To find anomalies in the first two PCs, we use a two-dimensional outlier detection algorithm. We have implemented two such algorithms: one density-based and one α-hull based.

The density-based anomaly detection algorithm [7] finds the points in the first two principal components with the lowest density. The α-hull method [17] is a generalization of the convex hull [6], which is a bounding region of a point set. The α parameter in the α-hull method defines a generalized disk of radius α. When α is sufficiently large, the α-hull is equivalent to the convex hull. Given α, an edge of the α-shape is drawn between two members of the finite point set if there exists a generalized disk of radius α that contains the entire point set and has the two points on its boundary.
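To make the pipeline concrete, the following minimal R sketch runs PCA on an m × n feature matrix and ranks series by a bivariate kernel-density score on the first two PCs. This is an illustration, not the released Anomalous package: the simple product-kernel density score stands in for the HDR and α-hull detectors described above, and the names feat_mat and pca_outliers are ours.

# Rank time series by how unusual their feature vectors are.
# feat_mat: m x n matrix, one row of features per time series
# (assumes no constant feature columns, so scaling is well defined).
pca_outliers <- function(feat_mat, k = 10) {
  pc <- prcomp(feat_mat, scale. = TRUE)   # PCA on the scaled feature matrix
  scores <- pc$x[, 1:2]                   # keep the first two principal components
  # Bivariate product-kernel density at each observation (up to a constant
  # factor, which does not affect the ranking); low density = unusual.
  h <- apply(scores, 2, bw.nrd0)          # rule-of-thumb bandwidths
  z1 <- outer(scores[, 1], scores[, 1], "-") / h[1]
  z2 <- outer(scores[, 2], scores[, 2], "-") / h[2]
  dens <- rowMeans(dnorm(z1) * dnorm(z2))
  order(dens)[seq_len(k)]                 # indices of the k lowest-density series
}

# Example: flag the 5 most unusual of the m series.
# unusual <- pca_outliers(feat_mat, k = 5)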
III. FEATURES

We now describe the time series features we use in the PCA. While we focus on our use-case of identifying anomalous servers in a large internet company, we attempt to make our approach general and applicable to other use-cases where finding anomalous time series is critical.

The features should capture the global information of the time series. The features identified in our research add to an already existing set of well-established features that describe time series [4], including measures of trend, seasonality and serial correlation [22], and spectral entropy [5]. Some of the features have been specifically selected to address our use-case. For example, we divide a series into blocks of 24 observations to remove any daily seasonality; the variances of each block are then computed, and the variance of these variances across blocks measures the "lumpiness" of the series. Some of our features rely on a robust STL decomposition [3]. For example, the size and location of the peaks and troughs in the seasonal component are used, and the spikiness feature is the variance of the leave-one-out variances of the remainder component. Other features measure structural changes over time: the "level shift" is defined as the maximum difference in mean between consecutive blocks of 24 observations; "variance change" is computed similarly using variances; and the Kullback-Leibler (KL) score is the maximum difference in KL divergence (measured using kernel density estimation) between consecutive blocks of 48 observations. "Flat spots" are computed by dividing the sample space of a time series into ten equal-sized intervals and computing the maximum run length within any single interval. Finally, "crossing points" are defined as the number of times a time series crosses its mean line. Sketches of several of these features are given after Table I below.

A more detailed look at the features will be presented in the longer version of our paper.

TABLE I. SUMMARY OF FEATURES USED FOR DETECTING UNUSUAL TIME SERIES.

Feature      Description
Mean         Mean.
Var          Variance.
ACF1         First-order autocorrelation.
Trend        Strength of trend.
Linearity    Strength of linearity.
Curvature    Strength of curvature.
Season       Strength of seasonality.
Peak         Strength of peaks.
Trough       Strength of troughs.
Entropy      Spectral entropy.
Lumpiness    Changing variance in the remainder.
Spikiness    Strength of spikiness.
Lshift       Level shift using a rolling window.
Vchange      Variance change.
Fspots       Flat spots using discretization.
Cpoints      The number of crossing points.
KLscore      Kullback-Leibler score.
Change.idx   Index of the maximum KL score.
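The following minimal R sketch implements four of the block-based features in Table I, following the verbal definitions above (block width 24; ten intervals for flat spots). These are our own illustrative implementations and may differ in detail from the released code; for instance, Lshift is described in Table I as using a rolling window, while this sketch uses non-overlapping blocks.

# Variance of the block variances ("lumpiness"); width 24 removes daily seasonality.
lumpiness <- function(x, width = 24) {
  blocks <- split(x, ceiling(seq_along(x) / width))
  var(sapply(blocks, var), na.rm = TRUE)
}

# Maximum difference in mean between consecutive blocks ("level shift").
level_shift <- function(x, width = 24) {
  means <- sapply(split(x, ceiling(seq_along(x) / width)), mean)
  max(abs(diff(means)))
}

# Longest run inside any of ten equal-sized intervals of the sample space ("flat spots").
flat_spots <- function(x) {
  max(rle(as.integer(cut(x, breaks = 10)))$lengths)
}

# Number of times the series crosses its mean line ("crossing points").
crossing_points <- function(x) {
  above <- x > mean(x)
  sum(above[-1] != above[-length(above)])
}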
Baseline Method   Description
Baseline 1        Computes the mean absolute difference between time series.
Baseline 2        Computes similarity between time series using the discrete wavelet transform (DWT) [10].
Baseline 3        Uses PCA to extract raw time series features and uses K-means for clustering; the time series in the smallest cluster are labeled as outliers [20].

TABLE II. SUMMARY OF THE BASELINE METHODS.
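For concreteness, here is a small R sketch of one plausible reading of Baseline 1; the function name, the Manhattan-distance shortcut and the choice of k are our assumptions.

# Baseline 1: mean absolute difference between series; the series with the
# lowest average similarity (largest average distance) are labeled unusual.
# series_mat: m x T matrix, one time series per row.
baseline1 <- function(series_mat, k = 10) {
  d <- as.matrix(dist(series_mat, method = "manhattan")) / ncol(series_mat)
  avg <- rowSums(d) / (nrow(series_mat) - 1)   # exclude the zero self-distance
  order(avg, decreasing = TRUE)[seq_len(k)]
}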

IV. EXPERIMENTS

We now evaluate the effectiveness of our anomaly detection method using real-world and synthetic data comprising normal and anomalous time series. Our goal is to detect anomalous time series accurately.

The real dataset comes from Yahoo and represents various server metrics (e.g., memory usage, latency, cpu). The unusual time series in the real dataset are due to malicious activity, new feature deployments or traffic shifts. The synthetic dataset was generated by varying various time series parameters such as the trend, seasonality and noise (a minimal generator in this style is sketched below). Both the synthetic and real datasets contain approximately 1500 time series with labeled anomalies.
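As an illustration of this kind of generator, here is a hedged R sketch; the exact parameter values, series length and anomaly type below are our assumptions, not the paper's setup.

# Build one synthetic set: m series with trend, daily seasonality and noise;
# n_anom of them get an injected anomaly (unusually strong seasonality).
make_synthetic <- function(m = 1500, len = 14 * 24, n_anom = 5) {
  t <- seq_len(len)
  base <- function(amp) 0.01 * t + amp * sin(2 * pi * t / 24) + rnorm(len, sd = 0.5)
  series <- cbind(replicate(m - n_anom, base(amp = 1)),   # normal series
                  replicate(n_anom, base(amp = 5)))       # anomalous series
  list(series = series,                                   # one series per column
       labels = rep(c(FALSE, TRUE), c(m - n_anom, n_anom)))
}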
A. Overall Detection Accuracy

Here we evaluate the average performance of our method relative to the baseline methods. Recall that our approach first extracts the two most significant principal components (PCs) from all time series and then determines the outliers in the new 2D "feature space". For PC extraction, we have tested both regular PCA and Robust PCA (RPCA). For multidimensional outlier detection on the PC space, we show results for the density-based method (HDR) and for the α-hull method.

The baselines are described in Table II. Because our method has no direct competitor, we use time series similarity and clustering techniques as baselines to detect unusual time series. We label a time series as unusual if it has a low average similarity score or if it belongs to the smallest cluster.

For this experiment, both the real and synthetic datasets were used. For the synthetic dataset, 10 sets of time series were created. Each set consists of 1500 time series, 5 of which were created with unusual features (e.g., unusually high seasonality). All methods were evaluated in terms of the average accuracy, defined as #correct/#total, across both the real and synthetic datasets.

Fig. 3. Average accuracy of our method compared to baseline approaches.

Figure 3 shows that our PCA + α-hull approach performs best. While it is not surprising that our technique outperformed the baselines, given that we use a well-researched feature space, it is surprising that the Robust PCA method did not perform well. This, however, can be explained by looking at the optimization equation of Robust PCA [2], which ignores outliers and thereby potentially misses the principal component that explains the variance better.

B. Performance

Here we evaluate the runtime performance of our algorithms compared to the baseline methods. The performance is measured in seconds as the number of total time series increases. Note that the number of unusual time series also increases proportionally, to constitute roughly 2% of all time series. We chose to cap the performance comparison at 1000 time series because 90% of our users have 10k time series or less, and our benchmark therefore represents a good prediction of the expected performance for these users. We can observe from Figure 4 that our approach performs favorably compared to the others. Note that we were not able to run Baseline 2, due to its extremely slow performance above 100 time series; therefore we do not include it in the comparison. Also note that the runtime of feature extraction and anomaly detection for the PCA + α-hull approach increases only slightly as the number of time series is increased by an order of magnitude.

Fig. 4. Scalability performance.

V. DEMONSTRATION DESCRIPTION

The demonstration will be organized in two phases: (i) a brief introduction, and (ii) a "hands-on" phase. In (i), the main features of the Anomalous package will be explained, and the system interface shown in Figure 5, which uses the Anomalous package to filter out the "uninteresting" time series, will be described. In the second part of the demo, the public is invited to directly interact with the system and test its capabilities by visually inspecting the results produced by the package on both synthetic and real [15] data. Specifically, the conference participant will use the Anomalous package to rank the time series by how "interesting" they are, using the slider shown in Figure 5, thereby focusing on only a few (instead of thousands or millions of) potentially important series.

Fig. 5. A sample UI that uses the Anomalous package. By dragging the slider, the conference participant is able to filter out less "interesting" time series and focus on only a few important metrics (out of thousands or millions of time series) that may have caused a server outage, loss of data or another anomaly.
VI. RELATED WORK

While our approach of identifying entire anomalous time series is novel, there are some parallels with existing work. For example, the authors of [11], [12], [13], [1] look at unusual subsequences within a single time series. The authors of [23] detect hosts that deviate from a common pattern at a given time. PCA has also been used for detecting anomalous functions in a sample of functions [8], and for detecting univariate anomalies [19], [21]. In addition to anomaly detection, PCA has been employed as a similarity measure for clustering [24], [14]. The authors of [16] use PCA for a multi-dimensional visualization of a large collection of time series. None of the above methods, however, address our problem of finding unusual time series in a large collection of time series.
VII. CONCLUSION

We propose using Principal Component Analysis (PCA) together with multi-dimensional anomaly detection to identify unusual time series in a large collection of time series. Our method is robust and accurate, as demonstrated by the experiments on synthetic and real data from Yahoo. Our approach achieves a detection accuracy of over 80% (compared to 42% for the baseline methods) and requires less than 0.5 seconds to process 1000 time series, which is at least 3x faster than the baseline algorithms. More experiments, such as the effect on performance as the number of principal components used by the outlier detection method increases, are to be presented in our full paper. Our method requires no a priori labeling or tuning of parameters other than the user-acceptable sensitivity threshold, and it incorporates a thoughtful selection of features that measure the types of anomalous behavior likely to occur in the time series collection. The presented approach is open-sourced [9] and is already deployed at scale within Yahoo.

REFERENCES

[1] J. D. Brutlag. Aberrant behavior detection in time series for network monitoring. In LISA '00: Proc. 14th USENIX Conf. on System Administration, 2000.
[2] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 2011.
[3] R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning. STL: A seasonal-trend decomposition procedure based on loess. J. Official Stat., 6(1):3–73, 1990.
[4] B. D. Fulcher and N. S. Jones. Highly comparative feature-based time-series classification. IEEE Trans. Knowl. Data Eng., 26(12):3026–3037, Dec. 2014.
[5] G. Goerg. Forecastable component analysis. In Proc. 30th Int. Conf. on Machine Learning, 2013.
[6] V. Hodge and J. Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 22(2):85–126, 2004.
[7] R. J. Hyndman. Computing and graphing highest density regions. Amer. Statist., 50(2):120–126, 1996.
[8] R. J. Hyndman and H. L. Shang. Rainbow plots, bagplots, and boxplots for functional data. J. Comp. Graph. Stat., 19(1):29–45, 2010.
[9] R. J. Hyndman, E. Wang, and N. Laptev. Anomalous package. http://github.com/robjhyndman/anomalous-acm.
[10] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263–286, 2001.
[11] E. Keogh, J. Lin, and A. W. Fu. HOT SAX: Efficiently finding the most unusual time series subsequence. In ICDM, 2005.
[12] E. Keogh, J. Lin, A. W. Fu, and H. Van Herle. Finding unusual medical time-series subsequences: Algorithms and applications. 2006.
[13] E. Keogh, S. Lonardi, and B. Y. Chiu. Finding surprising patterns in a time series database in linear time and space. In SIGKDD, 2002.
[14] E. J. Keogh and M. J. Pazzani. A simple dimensionality reduction technique for fast similarity search in large time series databases. In Knowledge Discovery & Data Mining: Current Issues & New Applications. Springer, 2000.
[15] N. Laptev and S. Amizadeh. Yahoo anomaly detection dataset S5. http://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70.
[16] D. T. Nhon and L. Wilkinson. TimeSeer: detecting interesting distributions in multiple time series data. In Proc. Visual Information Communication and Interaction. ACM.
[17] B. Pateiro-López and A. Rodríguez-Casal. Generalizing the convex hull of a sample: The R package alphahull. J. Stat. Soft., 34(5):1–28, 2010.
[18] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015.
[19] M.-L. Shyu, S.-C. Chen, K. Sarinnapakorn, and L. Chang. Principal component-based anomaly detection scheme. In Foundations and Novel Approaches in Data Mining, 2006.
[20] A. Singhal and D. E. Seborg. Clustering multivariate time-series data. J. Chemometrics, 19(8):427–438, 2005.
[21] B. Viswanath, M. A. Bashir, M. Crovella, S. Guha, K. P. Gummadi, B. Krishnamurthy, and A. Mislove. Towards detecting anomalous user behavior in online social networks. In USENIX Security Symposium, pages 223–238, Aug. 2014.
[22] X. Wang, K. Smith, and R. J. Hyndman. Characteristic-based clustering for time series data. Data Min. Knowl. Discov., 13(3), 2006.
[23] H. Xiao, J. Gao, D. S. Turaga, L. H. Vu, and A. Biem. Temporal multi-view inconsistency detection for network traffic analysis. In WWW, 2015.
[24] K. Yang and C. Shahabi. A PCA-based similarity measure for multivariate time series. In Proc. Workshop on Multimedia Databases, 2004.
