THIS IS A PREPRINT VERSION OF THE WORK 10.1109/TKDE.2020.3035685 PUBLISHED IN THE IEEE TKDE BY ©IEEE

Developing an Unsupervised Real-time Anomaly Detection Scheme for Time Series with Multi-seasonality
Wentai Wu, Student Member, IEEE, Ligang He, Member, IEEE, Weiwei Lin, Yi Su, Yuhua Cui, Carsten Maple,
and Stephen Jarvis, Member, IEEE
arXiv:1908.01146v3 [cs.LG] 23 Apr 2021

Corresponding author: Ligang He. W. Wu, L. He and Y. Su are with the Department of Computer Science, University of Warwick. W. Lin is with the School of Computer Science and Engineering at the South China University of Technology. Y. Cui is with the Research Institute of Worldwide Byte Information Security. C. Maple is with the Warwick Manufacturing Group, University of Warwick. S. Jarvis is with the College of Engineering and Physical Sciences, University of Birmingham.

Abstract—On-line detection of anomalies in time series is a key technique used in various event-sensitive scenarios such as robotic system monitoring, smart sensor networks and data center security. However, the increasing diversity of data sources and the variety of demands make this task more challenging than ever. Firstly, the rapid increase in unlabeled data means supervised learning is becoming less suitable in many cases. Secondly, a large portion of time series data have complex seasonality features. Thirdly, on-line anomaly detection needs to be fast and reliable. In light of this, we have developed a prediction-driven, unsupervised anomaly detection scheme, which adopts a backbone model combining the decomposition and the inference of time series data. Further, we propose a novel metric, Local Trend Inconsistency (LTI), and an efficient detection algorithm that computes LTI in a real-time manner and scores each data point robustly in terms of its probability of being anomalous. We have conducted extensive experimentation to evaluate our algorithm with several datasets from both public repositories and production environments. The experimental results show that our scheme outperforms existing representative anomaly detection algorithms in terms of the commonly used metric, Area Under Curve (AUC), while achieving the desired efficiency.

Index Terms—time series, seasonality, anomaly detection, unsupervised learning

I. INTRODUCTION

Time series data sources have been of interest in a vast variety of areas for many years – the nature of time series data was examined in a seminal study by Yule [1], and the techniques have been applied to areas such as econometric [2] and oceanographic data [3] since the 1930s. However, in an era of hyperconnectivity, big data and machine intelligence, new technical scenarios are emerging such as autonomous driving, edge computing and Internet of Things (IoT). Analysis of such systems poses new challenges to the detection of anomalies in time series data. Further, for a wide range of systems which require 24/7 monitoring services, it has become crucial to have detection techniques that can provide early, reliable reports of anomalies. In cloud data centers, for example, a distributed monitoring system usually collects a variety of log data from the virtual machine level to the cluster level on a regular basis and sends them to a central detection module, which then analyzes the aggregated time series to detect any anomalous events including hardware failures, unavailability of services and cyber attacks. This requires a reliable on-line detector with strong sensitivity and specificity. Otherwise, inefficient detection may cause unnecessary maintenance costs.

Several classes of schemes have been applied to the problem of anomaly detection for time series data. In certain cases decent results can be achieved by traditional methods such as outlier detection [4][8][9][10], pattern (segment) extraction [12][13][14][15] and sequence mapping [18][20][21]. However, we are facing a growing number of new scenarios and applications which produce large volumes of time series data with unprecedented complexity, posing challenges that traditional anomaly detection methods cannot address effectively. First, more and more time series data are being produced without labels since data labeling/annotation is usually very time-consuming and costly. Sometimes it is also unrealistic or impossible to acquire reliable labels when their correctness has to be guaranteed. Second, some applications may produce multi-channel series with complex features such as multi-period seasonality (i.e., multiple seasonal patterns, such as yearly or monthly, within one channel), long periodicity, fairly unpredictable channels and different seasonality between channels. As a result, learning these patterns requires effective seasonality discovery and strong generalization ability. Third, the process is commonly required to be fast enough to support instant reporting or alarming once an unexpected situation occurs. The capability of on-line detection is especially important in a wide range of event-sensitive scenarios such as medical and industrial process control systems.

In this paper, we propose a predictive solution to detecting anomalies effectively in time series with complex seasonality. The fundamental idea is to inspect the data samples as they arrive and match them with an ensemble of forecasts made chronologically. Specifically, our solution comprises an augmented forecasting model and a novel detection algorithm that exploits the predictions of local sequences made by the underlying forecasting model. We built a frame-to-sequence Gated Recurrent Unit (GRU) network while extending its input with seasonal terms extracted by decomposing the time series of each sample channel. The integration of the seasonal features can alleviate the negative impact of anomalous samples in the training data since the anomalous samples have minor
impact on the long-term periodic patterns. For these reasons, our prediction framework requires neither labels (specifying which data are normal or abnormal) nor uncontaminated training data (i.e., our solution tolerates polluted/abnormal training samples).

After predicting local sequences (i.e., the output of the forecasting model), we use a novel method to weight the ensemble of different forecasts based on the reliability of their forecast sources and make it a chronological process to fit the on-line detection. The weight of each forecast is determined dynamically during the process of detection by scoring each forecast source (i.e., the frame on which a forecast is based), which reflects how likely the predictions made by a forecast source are to be trustworthy. Based on the above ensemble, we propose a new metric, termed Local Trend Inconsistency (LTI), for measuring the deviation of an actual sequence from the predictions in real-time, and assign an anomaly score to each of the newly arrived data points (which we also call frames) in order to quantify the probability that a frame is anomalous.

We also propose a method to map the LTI value of a frame to its Anomaly Score (AS) by a logistic-shaped function. The mapping further differentiates anomalies and normal data. In order to determine the logistic mapping function, we propose a method to automatically determine the optimal values of the fitting parameters in the logistic mapping function. The AS value of a frame in turn becomes the weight of its impact on the detection of future frames. This makes our LTI metric robust to the anomalous frames in the course of detection and significantly mitigates the potential impact of anomalous samples on the detection results of the future frames. This feature also enables our algorithm to work chronologically without maintaining a large reference database or caching too many historical data frames. To the best of our knowledge, the existing prediction-driven detection schemes do not take into account the reliability of the forecast sources.

The main contributions of our work are as follows:
• We designed a frame-to-sequence forecasting model integrating a GRU network with time series decomposition (using Prophet, an additive time series model developed by Facebook [29]) to enable contamination-tolerant training on multi-seasonal time series data without any labels.
• We propose a new metric termed Local Trend Inconsistency (LTI), and based on this metric we further propose an unsupervised detection algorithm to score the probability of data anomaly. A practical method is also proposed for fitting the scoring function.
• We mathematically present the computation of LTI in the form of matrix operations and prove the possibility of parallelization for further speeding up the detection procedure.
• We conducted extensive experiments to evaluate the proposed scheme on two public datasets from the UCI data repository and a more complex dataset from a production environment. The results show that our solution outperforms the existing algorithms significantly with low detection overhead.

The rest of this paper is organized as follows: Section II discusses a number of studies related to anomaly detection. In Section III, we introduce Local Trend Inconsistency as the key metric in our unsupervised anomaly detection scheme. We then systematically present our unsupervised anomaly detection solution in Section IV, including the backbone model for prediction and a scoring algorithm for anomaly detection. We present and analyze the experimental results in Section V, and finally conclude this paper in Section VI.

II. RELATED WORK

The term anomaly refers to a data point that significantly deviates from the rest of the data, which are assumed to follow some distribution or pattern. There are two main categories of approaches for anomaly detection: novelty detection and outlier detection. While novelty detection (e.g., classification methods [39][40][41][42]) requires the training data to be classified, outlier detection (e.g., clustering, principal component analysis [20] and feature mapping methods [43][44]) does not need prior knowledge of classes (i.e., labels) and thus is also known as unsupervised anomaly detection. The precise terminology and definitions may vary across sources. We use the same taxonomy as Ahmed et al. did in reference [45], whilst in the survey presented by Hodge and Austin [38] unsupervised detection is classified as a subtype of outlier detection. The focus of our work is on unsupervised anomaly detection since we aim to design a more generic scheme and thus do not assume that labels are available.

In the detection of time series anomalies, we are interested in discovering abnormal, unusual or unexpected records. In a time series, an anomaly can be detected within the scope of a single record or as a subsequence/pattern. Many classical algorithms can be applied to detect a single-record anomaly as an outlier, such as the One Class Support Vector Machine (OCSVM) [4], a variant of SVM that exploits a hyperplane to separate normal and anomalous data points. Zhang et al. [5] implemented a network performance anomaly detector using OCSVM with the Radial Basis Function (RBF), which is a commonly used kernel for SVM. Maglaras and Jiang [6] developed an intrusion detection module based on K-OCSVM, the core of which is an algorithm that performs K-means clustering iteratively on detected anomalies. Shang et al. [7] applied Particle Swarm Optimization (PSO) to find the optimal parameters for OCSVM, which they applied to detect the abnormalities in TCP traffic. In addition, Radovanović et al. [9] investigated the correlation between hub points and outliers, providing useful guidance on using reverse nearest-neighbor counts to detect anomalies. Liu et al. [8] found that anomalies are susceptible to the property of "isolation" and thus proposed Isolation Forest (iForest), an anomaly detection algorithm based on the structure of random forest. Taking advantage of iForest's flexibility, Calheiros et al. [10] adapted it to dynamic failure detection in large-scale data centers.

For anomalous sequence or pattern detection, there are a number of classical methods available such as box modeling [11], symbolic sequence matching [18] and pattern extraction
[14][15]. For example, Huang et al. [19] proposed a scheme to identify the anomalies in VM live migrations by combining the extended Local Outlier Factor (LOF) and Symbolic Aggregate ApproXimation (SAX).

Recent advances in machine learning techniques have inspired prediction-driven solutions for intelligent surveillance and detection systems (e.g., [48][49]). A prediction-driven anomaly detection scheme is often a sliding window-based scheme, in which future data values are predicted and then the predictions are compared against the actual values when the data arrive. This type of anomaly detection scheme has been attracting much attention recently thanks to the remarkable performance of recurrent neural networks (RNNs) in prediction/forecasting tasks. Filonov et al. [33] proposed a fault detection framework that relies on a Long Short Term Memory (LSTM) network to make predictions. The set of predictions along with the measured values of data are then used to compute the error distribution, based on which anomalies are detected. Similar methodologies are used by [34] and [24]. LSTM-AD [34] is also a prediction scheme based on multiple forecasts. In LSTM-AD the abnormality of data samples is evaluated by analyzing the prediction error and the corresponding probability in the context of an estimated Gaussian error distribution obtained from the training data. However, the drawback of LSTM-AD is that it is prone to the contamination of training data. Therefore, when the training data contains both normal and anomalous data, the accuracy of the prediction model is likely to be affected, which consequently makes the anomaly detection less reliable.

Malhotra et al. [23] adopt a different architecture named encoder-decoder, which is based on the notion that only normal sequences can be reconstructed by a well-trained encoder-decoder network. A major limitation of their model is that an unpolluted training set must be provided. As revealed by Pascanu et al. [25], RNNs may struggle in learning complex seasonal patterns in time series, particularly when some channels of the series have long periodicity (e.g., monthly and yearly). A possible solution is to decompose the series before feeding it into the network. Shi et al. [35] proposed a wavelet-BP (Back Propagation) neural network model for predicting wind power. They decompose the input time series into frequency components using the wavelet transform and build a prediction network for each of them. To forecast time series with complex seasonality, De Livera et al. [37] adopt a novel state space modeling framework that incorporates seasonal decomposition methods such as the Fourier representation. A similar model was implemented by Gould et al. [36] to fit hourly and daily patterns in utility loads and traffic flow data.

Ensuring low overhead is essential for real-time anomaly detection. For example, Gu et al. [16] proposed an efficient motif (frequently repeated patterns) discovery framework incorporating an improved SAX indexing method as well as a trivial match skipping algorithm. Their experimental results on the CPU host load series show excellent time efficiency. Zhu et al. [17] propose a new method for locating similar sub-sequences as well as a parallel approach using GPUs to accelerate Dynamic Time Warping (DTW) for time series pattern discovery. Similarly, parallel algorithms (e.g., [50][51][52]) have been applied to several forms of machine learning models to boost efficiency.

III. LOCAL TREND INCONSISTENCY

In this section, we first introduce a series of basic notions and frequently-used symbols, then define a couple of distance metrics, and finally present the core concept in our anomaly detection scheme: Local Trend Inconsistency (LTI).

In some systems, more than one data collection device is deployed to gather information from multiple variables relating to a common entity simultaneously, which consequently generates multi-variate time series. In this paper we call them multi-channel time series.

Definition 1: A channel is the full-length sequence of a single variable that comprises the feature space of a time series.

For the sake of convenience, we define a frame as follows. This concept of a frame is inspired by, but is more general than, a frame in video processing (since a video clip can be regarded as a time series of images).

Definition 2: A frame is the data record at a particular point of time in a series. A frame is a vector in a multi-channel time series, or a scalar value in a single-channel time series.

Most previous schemes detect anomalies by analyzing the data items in a time series as separate frames. However, in our approach we attempt to conduct the analysis from the perspective of local sequences.

Definition 3: A local sequence is a fragment of the target time series; a local sequence at frame $x$ is defined as a fragment of the series spanning from a previous frame to frame $x$.

For clarity, we list all the symbols frequently used in this paper in Table I.

TABLE I
LIST OF SYMBOLS

Symbol          Description
$X$             A time series $X$
$X(t)$          The $t$-th frame of time series $X$
$X^{(c)}$       The $c$-th channel of time series $X$
$X^{(c)}(t)$    The $c$-th component of the $t$-th frame of time series $X$
$x^{(i)}$       The $i$-th feature of frame $x$
$\hat{x}_k$     The forecast of the frame $x$ predicted by frame $k$
$S$             An actual local sequence from the target time series
$S_k$           A local sequence predicted by frame $k$
$S(i)$          The $i$-th frame in local sequence $S$
$S(i, j)$       An actual local sequence spanning from frame $i$ to $j$
$S_k(i, j)$     A local sequence predicted by $k$ spanning from frame $i$ to $j$

Euclidean Distance and Dynamic Time Warping (DTW) Distance are commonly used to measure the distance between two vectors. However, the scale of Euclidean Distance largely depends on the dimensionality, i.e., vector length. DTW distance can measure sequence similarity, but cannot produce length-independent results. With its relatively high time complexity ($O(n^2 m)$ for $m$-dimensional sequences of length $n$), DTW is often applied to sequence-level analysis, in which the target is a sequence of frames or a pattern of varying length. However, our work aims to perform the frame-wise,
on-line detection, i.e., detect whether a frame is anomalous as the frame arrives.

Therefore, in this paper we use a modified form of Euclidean distance, called Dimension-independent Frame Distance (DFDist) as formulated in Eq. (1), to measure the distance between two frames $x$ and $y$:

$$\mathrm{DFDist}(x, y) = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} - y^{(i)} \right)^2 \tag{1}$$

where $m$ is the number of dimensions (i.e., number of channels) and $x^{(i)}$ and $y^{(i)}$ are the $i$-th components of frame $x$ and frame $y$, respectively. We do not take the square root of the result. This does not impact the effectiveness of our approach, but makes it easier to handle when we transform all computations into matrix operations at the later stage of the processing. Also, the desired scale (i.e., $\mathrm{DFDist} \in [0, 1]$) of the distance still holds for normalized data.

With DFDist, we can further measure the distance between two local sequences of the same length. The desired metric for sequence distance should be independent of the length of the sequences, as we want to have a unified scale for any pair of sequences. We formulate the Length-independent Sequence Distance (LSDist) between two sequences $S_X$ and $S_Y$ of the same length in Eq. (2), where $L$ is the length of the two local sequences.

$$\mathrm{LSDist}(S_X, S_Y) = \frac{1}{L} \sum_{i=1}^{L} \mathrm{DFDist}\left( S_X(i), S_Y(i) \right) \tag{2}$$

Although the definition of LSDist already provides a unified scale of distance, the temporal information of the time series data is neglected. Assuming we are detecting the anomaly of the event at time $t$, we need to compare the local sequence at frame $t$ with a ground truth sequence (assume there is one) to see if anything goes wrong in the latest time window. If we use LSDist as the metric, then every time point is regarded as being equally important. However, this does not comply with the rule of time decay, namely, the most recent data point typically has the greatest reference value and also the greatest impact on what will happen at the next time point. Therefore, we refine LSDist by weighting each term and adding a normalization factor. The Weighted Length-independent Sequence Distance (WLSDist) is defined in Eq. (3), where $d_i$ is the weight of time decay for frame $i$ and $D_L$ is the normalization factor (so that WLSDist remains on the same scale as LSDist).

$$\mathrm{WLSDist}(S_X, S_Y) = \frac{\sum_{i=1}^{L} d_i \cdot \mathrm{DFDist}\left( S_X(i), S_Y(i) \right)}{D_L} \tag{3}$$

Time decay is applied on the basis that the two sequences are chronologically aligned. In this paper, we use exponentially decaying weights, which is similar to the exponential moving average method [46]:

$$d_i = e^{-(L-i)}, \quad i = 1, 2, \ldots, L \tag{4}$$

where $i$ denotes the frame index and $L - i$ is the temporal distance (with $i = L$ being the current frame). Hence, the corresponding normalization factor $D_L$ in Eq. (3) is the summation of a geometric series of length $L$:

$$D_L = \sum_{i=1}^{L} e^{-(L-i)} = \frac{1 - e^{-L}}{1 - e^{-1}} \tag{5}$$

where $L$ is the sequence length.

Ideally it is easy to identify the anomalies by calculating WLSDist between the target (such as a local sequence or frame) and the ground truth. However, this approach is not feasible if the labels are unavailable (i.e., there is no ground truth). A possible solution is to replace the ground truth with an expectation, which is typically obtained by using time series forecasting methods [22][34]. This is the basic idea of the so-called prediction-driven anomaly detection schemes. However, a critical problem with such a prediction-driven scheme is the reliability of the forecast. On the one hand, prediction error is inevitable. On the other hand, the predictions made based on the historical frames, which may include anomalous frames, can be unreliable. This poses a great challenge for prediction-driven anomaly detection schemes.

To address the above problems, we propose a novel, reliable prediction scheme, which makes use of multi-source forecasting. Unlike previous studies that use frame-to-frame predictors, our scheme makes a series of forecasts at different time points (i.e., from different sources) by building a frame-to-sequence predictor. The resulting collection of forecasts forms a common expectation from multiple sources for the target. When the target arrives, if it deviates from the common expectation, the target is deemed likely to be an anomaly. This is the underlying principle of our unsupervised anomaly detection.

In order to quantitatively measure how far the target deviates from the collection of expectations obtained from multiple sources, we propose a metric we term the Local Trend Inconsistency (LTI). LTI takes into account the second challenging issue discussed above (i.e., there may exist anomalous frames in history) by weighting the prediction made based on a source (i.e., a frame at a previous time point) with the probability of the source being normal.

For a frame $t$ (by which we refer to the frame arriving at time point $t$), $\mathrm{LTI}(t)$ is formally defined in Eq. (6), where $S(i+1, t)$ is the actual sequence from frame $i+1$ to frame $t$, and $S_i(i+1, t)$ is the sequence of the same span predicted by frame $i$ (i.e., the prediction made when frame $i$ arrives). $L$ is the length of the prediction window, which is a hyper-parameter determining the maximum length of the predicted sequence and also the number of sources that make the predictions (i.e., the number of predictions/expectations) of the same target. $P(i)$ denotes the probability of frame $i$ being normal.

$Z_t$ is the normalization factor for frame $t$, defined as the sum of all the probabilistic weights as shown in Eq. (7). $Z_t$ is used to normalize the value of $\mathrm{LTI}(t)$ to the range of $[0, 1]$.
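As a hedged illustration, the distance metrics of Eqs. (1) to (5) can be sketched in plain Python. The function names (`dfdist`, `lsdist`, `decay_weights`, `wlsdist`) and the toy frames are assumptions made for this example only and are not taken from the authors' implementation; each frame is represented as a list of normalized channel values.

```python
import math


def dfdist(x, y):
    """Eq. (1): mean squared difference over channels (no square root)."""
    if len(x) != len(y):
        raise ValueError("frames must have the same number of channels")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def lsdist(sx, sy):
    """Eq. (2): plain average of frame distances over two aligned sequences."""
    if len(sx) != len(sy):
        raise ValueError("sequences must have the same length")
    return sum(dfdist(a, b) for a, b in zip(sx, sy)) / len(sx)


def decay_weights(length):
    """Eq. (4): d_i = exp(-(L - i)), i = 1..L; the latest frame gets weight 1."""
    return [math.exp(-(length - i)) for i in range(1, length + 1)]


def wlsdist(sx, sy):
    """Eq. (3): time-decayed average, normalized by D_L from Eq. (5)."""
    if len(sx) != len(sy):
        raise ValueError("sequences must have the same length")
    weights = decay_weights(len(sx))
    d_l = sum(weights)  # equals (1 - e^-L) / (1 - e^-1), see Eq. (5)
    return sum(w * dfdist(a, b) for w, a, b in zip(weights, sx, sy)) / d_l


# Toy two-channel sequences (hypothetical values in [0, 1]); only the
# most recent frame differs between the actual and predicted sequence.
actual = [[0.2, 0.4], [0.3, 0.5], [0.9, 0.1]]
predicted = [[0.2, 0.4], [0.3, 0.5], [0.5, 0.5]]

print("LSDist :", lsdist(actual, predicted))
print("WLSDist:", wlsdist(actual, predicted))
```

Because the only deviation in this toy case sits in the most recent frame, WLSDist comes out larger than LSDist, which is the effect the time-decay weighting is meant to produce.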
$$\mathrm{LTI}(t) = \frac{1}{Z_t} \sum_{i=t-L}^{t-1} P(i) \cdot \mathrm{WLSDist}\left( S(i+1, t),\, S_i(i+1, t) \right) \tag{6}$$

$$Z_t = \sum_{i=t-L}^{t-1} P(i) \tag{7}$$

Fig. 1. An example demonstrating the calculation of Local Trend Inconsistency with the max probe length L equal to 3.

Fig. 1 illustrates how LTI is calculated in a case where $L = 3$ (i.e., the length of the prediction window is 3). Based on the actual data arriving at $t_0$ (the actual data are represented by circles), our scheme predicts the frames at three future time points, i.e., $t_1$, $t_2$ and $t_3$, which are depicted as green triangles in the left part of Fig. 1. When the time elapses to $t_1$, the data at $t_1$ arrives and our scheme predicts the data at the time points of $t_2$, $t_3$ and $t_4$ (in the figure we only plot the predictions up to the time point $t_3$), which are colored blue. Similarly, when the time elapses to $t_2$, the data at $t_2$ arrives and our scheme forecasts the data at the time points of $t_3$, $t_4$ and $t_5$ (colored orange).

Now assume we want to calculate $\mathrm{LTI}(t_3)$ to gauge the abnormality of the data arriving at time $t_3$. As shown in Fig. 1, at time $t_3$, we know the actual local sequence from $t_0$ to $t_3$, i.e., $S(t_0, t_3)$ (corresponding to the term $S(i+1, t)$ in Eq. (6)), and also we have made the following three predictions, which are the forecasts at three different time points:
• $S_0(t_1, t_3)$: the predicted local sequence from $t_1$ to $t_3$, which is predicted at time $t_0$;
• $S_1(t_2, t_3)$: the predicted local sequence from $t_2$ to $t_3$, which is predicted at time $t_1$;
• $S_2(t_3)$: the prediction of frame $t_3$ made at time $t_2$.

$\mathrm{LTI}(t_3)$ is then obtained by i) calculating the weighted distances (i.e., WLSDist in Eq. (3)) between the predicted sequences and the corresponding actual sequence up to time $t_3$, i.e., the distances between $S_0(t_1, t_3)$ and $S(t_1, t_3)$ (shown at the bottom right of Fig. 1), between $S_1(t_2, t_3)$ and $S(t_2, t_3)$ (middle right of Fig. 1), and between $S_2(t_3)$ and $S(t_3)$ (top right of Fig. 1); ii) calculating the weighted sum (the weight is $P(i)$) of the distances obtained in the previous step; and iii) normalizing the weighted sum (i.e., dividing it by $Z_t$ in Eq. (7)).

This multi-source prediction establishes the common expectation for the data values. How far the actual data deviates from the predicted data, which is measured by the distance between them, is used to quantify the abnormality of the given data.

The whole process can be formulated using matrix operations. Assume we are detecting an anomaly at frame $t$ and the size of the prediction window is $L$. For brevity let $\mathit{df}_k(t)$ denote the distance between frame $t$ and a forecast of the frame made at time $k$ (i.e., $\mathrm{DFDist}(t, \hat{t}_k)$). We first define the frame-distance matrix $\mathbf{D}_F$:

$$\mathbf{D}_F = \begin{bmatrix} \mathbf{D}_F^{(t-L)} \\ \mathbf{D}_F^{(t-L+1)} \\ \vdots \\ \mathbf{D}_F^{(t-1)} \end{bmatrix}
\quad \text{where} \quad
\mathbf{D}_F^{(u)} = \begin{bmatrix} \mathit{df}_u(u+1) \\ \mathit{df}_u(u+2) \\ \vdots \\ \mathit{df}_u(t) \end{bmatrix}^{T}$$

Then we define two diagonal normalization matrices $\mathbf{N}_1$ and $\mathbf{N}_2$ as follows:

$$\mathbf{N}_1 = \begin{bmatrix} \frac{1}{D_L} & & & 0 \\ & \frac{1}{D_{L-1}} & & \\ & & \ddots & \\ 0 & & & \frac{1}{D_1} \end{bmatrix}
\qquad
\mathbf{N}_2 = \begin{bmatrix} \frac{1}{Z_t} & & & 0 \\ & \frac{1}{Z_t} & & \\ & & \ddots & \\ 0 & & & \frac{1}{Z_t} \end{bmatrix}$$

where $D_L$ and $Z_t$ are defined in (5) and (7), respectively. For convenience let $\mathit{ds}_k(t)$ denote $\mathrm{WLSDist}\left( S(k+1, t),\, S_k(k+1, t) \right)$. Hence we can derive the matrix of weighted local sequence distances, denoted as $\mathbf{D}_S$:

$$\mathbf{D}_S = \begin{bmatrix} \mathit{ds}_{t-L}(t) \\ \mathit{ds}_{t-L+1}(t) \\ \vdots \\ \mathit{ds}_{t-1}(t) \end{bmatrix} = \mathbf{N}_1 \mathbf{D}_F \mathbf{T}$$

where $\mathbf{T}$ is the time decay vector defined as:

$$\mathbf{T} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_L \end{bmatrix}$$

where $d_i$ is computed via Eq. (4). Now we assume the probability of being normal is already known for each of frame $t$'s predecessors (i.e., $P(t-1), P(t-2), \ldots$), and we put them together into a $1 \times L$ matrix $\mathbf{P}$:

$$\mathbf{P} = \begin{bmatrix} P(t-L) & P(t-L+1) & \cdots & P(t-1) \end{bmatrix}$$

Then we can reformulate $\mathrm{LTI}(t)$ as below:

$$\mathrm{LTI}(t) = \mathbf{P} \mathbf{N}_2 \mathbf{D}_S = \mathbf{P} \mathbf{N}_2 \mathbf{N}_1 \mathbf{D}_F \mathbf{T} \tag{8}$$
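The equivalence between the summation form in Eq. (6) and the matrix form in Eq. (8) can be checked with a small plain-Python sketch. Everything named here is hypothetical: `lti_direct`, `lti_matrix`, the dictionary layout `df[(k, j)]` (distance between actual frame j and its forecast made at source k) and the toy numbers are illustration choices, not the authors' code. One assumption deserves emphasis: since the rows of D_F have different lengths, this sketch right-aligns each row and pads it with zeros on the left so that the newest frame always meets the weight d_L = 1; the text does not spell out the padding.

```python
import math


def decay_weights(length):
    """Eq. (4): d_i = exp(-(L - i)) for i = 1..L."""
    return [math.exp(-(length - i)) for i in range(1, length + 1)]


def lti_direct(df, p, t, window):
    """Eq. (6): probability-weighted average of WLSDist over all sources."""
    sources = range(t - window, t)
    total = 0.0
    for k in sources:
        dists = [df[(k, j)] for j in range(k + 1, t + 1)]
        weights = decay_weights(len(dists))
        ds_k = sum(w * d for w, d in zip(weights, dists)) / sum(weights)
        total += p[k] * ds_k
    return total / sum(p[k] for k in sources)  # divide by Z_t, Eq. (7)


def lti_matrix(df, p, t, window):
    """Eq. (8): P N2 N1 D_F T, with diagonal matrices kept as vectors."""
    sources = list(range(t - window, t))
    # D_F: one row per source, right-aligned and zero-padded (assumption).
    d_f = []
    for k in sources:
        row = [df[(k, j)] for j in range(k + 1, t + 1)]
        d_f.append([0.0] * (window - len(row)) + row)
    t_vec = decay_weights(window)
    # Diagonal of N1: 1/D_L, 1/D_{L-1}, ..., 1/D_1.
    n1 = [1.0 / sum(decay_weights(window - r)) for r in range(window)]
    # D_S = N1 D_F T (an L x 1 column).
    d_s = [n1[r] * sum(a * b for a, b in zip(d_f[r], t_vec))
           for r in range(window)]
    z_t = sum(p[k] for k in sources)
    # P N2 D_S, where N2 scales every entry by 1/Z_t.
    return sum(p[k] * d_s[r] for r, k in enumerate(sources)) / z_t


# Toy setting mirroring Fig. 1: t = 3, L = 3, sources 0, 1 and 2.
df = {(0, 1): 0.01, (0, 2): 0.02, (0, 3): 0.30,
      (1, 2): 0.01, (1, 3): 0.25,
      (2, 3): 0.40}
p = {0: 1.0, 1: 0.9, 2: 0.8}  # hypothetical probabilities of being normal

print(lti_direct(df, p, t=3, window=3))
print(lti_matrix(df, p, t=3, window=3))
```

Both functions return the same value up to floating-point rounding, and the result stays within [0, 1] whenever every frame distance does. In practice the matrix form would normally be evaluated with a linear algebra library rather than nested Python loops.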
Formulating the calculation of LTI with matrices shows that it can be performed efficiently in parallel. The Degree of Parallelism (DoP) of its calculation can be higher than $L$. This is because the DoP for calculating the $L$ terms in Eq. (6) can evidently be $L$ (the calculation of every term is independent of the others). The calculation of each term can be further accelerated (including the calculations of WLSDist and DFDist) by parallelizing the matrix multiplication. For example, with $L \times L$ processes (i.e., a grid of processes) and exploiting the Scalable Universal Matrix Multiplication Algorithm (SUMMA) [47], we can achieve a roughly $L^2$ speedup in the multiplication of any two matrices with the dimension size of $L$, which helps reduce the time complexity of computing $\mathbf{N}_1 \mathbf{D}_F$ from $O(L^3)$ to $O(L)$. Further, with the resulting $\mathbf{N}_1 \mathbf{D}_F$, the computation of $\mathbf{N}_1 \mathbf{D}_F \mathbf{T}$ and $\mathbf{P} \mathbf{N}_2$ can be performed in parallel, as both of them are vector-matrix multiplications requiring only $L$ processes and have a time complexity of $O(L^2 / L) = O(L)$. Finally, multiplying the resulting matrices of $\mathbf{P} \mathbf{N}_2$ (dimension $1 \times L$) and $\mathbf{N}_1 \mathbf{D}_F \mathbf{T}$ (dimension $L \times 1$) consumes $O(L)$. Note that the matrix $\mathbf{D}_F$ contains $L \times L$ entries of frame distance, each of which is calculated using Eq. (1). Therefore, updating $\mathbf{D}_F$ (when a new frame arrives) is an operation with the complexity of $O(L^2 m / L^2) = O(m)$, where $m$ is the frame dimension. Consequently, the time complexity of computing $\mathrm{LTI}(t)$ in parallel is $O(m + L)$ in theory.

IV. ANOMALY DETECTION WITH LTI

Our anomaly detection scheme is based on LTI (Local Trend Inconsistency), as LTI can effectively indicate how significantly the series deviates locally from the common expectation established by multi-source prediction.

As can be seen from Eq. (6), there are still two problems to be solved in calculating LTI. First, a mechanism is required to make reliable predictions of local sequences. Second, we need an algorithm to quantify the probabilistic factors (in matrix $\mathbf{P}$) as they are not known a priori.

In this section, we first introduce the backbone model we build for achieving accurate frame-to-sequence forecasting. The model is designed to learn the complex patterns in multi-

limitation of them is the difficulty in learning complex seasonal patterns in multi-seasonal time series. Even though the accuracy may be improved by stacking more hidden layers and increasing the back propagation distance (through time) during training, this could cause prohibitive training cost.

In view of this, we propose to include the seasonal features of the input data explicitly as the input of the neural network. This is achieved by conducting time series decomposition before running the prediction model, which is the purpose of the decomposition module. The resulting seasonal features can be regarded as the outcome of feature engineering. Technically speaking, seasonal features are essentially the "seasonal terms" decomposed from each channel of the target time series. We use Prophet [29], a framework based on the decomposable time series model [28], to extract the channel-wise seasonal terms. Let $X^{(c)}$ denote the $c$-th channel of time series $X$, and $X^{(c)}(t)$ the $t$-th record of the channel. The outcome of time series decomposition for channel $c$ is formulated as below:

$$X^{(c)}(t) = g_c(t) + s_c(t) + h_c(t) + \epsilon \tag{9}$$

where $g_c(t)$ is the trend term that models non-periodic changes, $s_c(t)$ represents the seasonal term that quantifies the seasonal effects, $h_c(t)$ reflects the effects of special occasions such as holidays, and $\epsilon$ is the error term that is not accommodated by the model. For simplicity, in this paper we only consider daily and weekly seasonal terms as additional features for the inference module of our model. Prophet relies on Fourier series to model multi-period seasonality, which enables the flexible approximation of periodic patterns of arbitrary length. Further details can be found in [29].

Separating seasonal terms from the original frame values and using them as additional features improves the RNN in the following respects. First, explicit input of seasonal terms helps reduce the difficulty of learning complex seasonal terms in the RNN. The extracted seasonal terms quantify seasonal effects. Second, the time cost of training is expected to decrease, as we can apply Truncated Back Propagation Through Time (TBPTT) with a distance much shorter than the length of periodicity. Besides, the series decomposition
seasonal time series with tolerance to pollution in the training process is very efficient, which will be demonstrated later by
data. Then we illustrate how to make use of the predictions experiments. The top part of Fig. 2 shows the architecture
(from multiple source frames) made to compute LTI. Finally, of our backbone prediction model. In the prediction model, a
we propose an anomaly scoring algorithm that uses a scoring stacked GRU network is implemented as the inference module,
function to chronologically calculate anomaly probability for which takes as input the raw features of a frame concatenated
each frame based on LTI. with its seasonality features. We demonstrate the effectiveness
of this backbone model in Section V-A.
A. Prediction Model
To effectively learn and accurately predict local sequences in B. Computing LTI based on Predictions
multi-seasonal time series, we adopt a combinatorial backbone When we calculate Local Trend Inconsistency (LTI) in
model composed of a decomposition module and an inference Eq. (6), we are actually measuring the distance between a
module. local sequence and an ensemble of its predictions by a well
Recurrent Neural Network (RNN) is an ideal network to trained backbone prediction model. The workflow of our on-
implement the inference module of our prediction model. line anomaly detection method includes three main steps:
RNNs (including mutations such as Long Short Term Memory i) feed every arriving frame into the prediction model and
(LSTM) and Gated Recurrent Unit (GRU)) are usually applied continuously gather its output of predicting future frames, ii)
as end-to-end models (e.g., [26] [27]). However, a major organize the frame predictions by their sources (i.e., the frames
which made the forecast) and concatenate them into local sequences, and iii) compute LTI of the newly arrived frame according to Eq. (6). Fig. 2 demonstrates the entire process, in which the LTI of a frame is converted to a score of abnormality using the algorithms to be introduced later.

C. Anomaly Scoring

In theory, the values of $LTI(t)$ can be directly used to score frame $t$ in terms of its abnormality. However, the range of this metric is application-specific. So we further develop a measure that can represent the probability of data anomaly. Specifically, we define a logistic mapping function to convert the value of $LTI(t)$ to a probabilistic value:

$$\Phi(x) = \frac{1}{1 + e^{-k(x - x_0)}} \tag{10}$$

where $k$ is the logistic growth rate and $x_0$ the $x$-value of the function's midpoint.

The left part of Fig. 3 shows the shapes of $\Phi(\cdot)$ with different values of $k$ when $x_0$ is set to 0.5. The shape of $\Phi(\cdot)$ becomes steeper as $k$ increases. We will introduce how to determine the optimal values of $k$ and $x_0$ later.

Now we define the probabilistic anomaly score of frame $t$ as below:

$$AS(t) = \Phi(LTI(t)) \tag{11}$$

The reasons why we use Eq. (10) to map $LTI(t)$ to $AS(t)$ are threefold. First, we find that the $LTI(t)$ values are clustered together closely (top right of Fig. 3), which means that the difference in $LTI(t)$ values between normal and abnormal frames is not significant. This makes it difficult to differentiate them in practice although we can do so in theory. The right part of Fig. 3 illustrates the situation where we map raw $LTI(t)$ values to $AS(t)$. It can be seen from the figure that the anomaly scores are better dispersed, leaving a clearer divide between normal data and (potential) anomalies. For example, the red line we draw separates out roughly 10 percent of potential anomalies with high scores. Second, as discussed in the previous section, our scheme makes a series of forecasts from different sources for the target, which establishes a common expectation for the target. The challenge is that there may exist anomalous sources, from which the forecast made is unreliable. Thus we have to differentiate the quality of the predictions by specifying large weights (i.e., the $P(i)$ in Eq. (6)) for normal sources and small weights for the sources that are likely to be abnormal. With the function $\Phi(\cdot)$ to disperse the $LTI(t)$ values (by mapping them into $AS(t)$), the impact difference between normal and abnormal frames is magnified. Last but not least, we find that the actual values of $LTI(t)$ depend on the particular application that our detection scheme is applied to. After mapping, the $AS(t)$ values become less application-dependent, making it possible to set a universal anomaly threshold. This is similar to the scenario of determining unusual events when the samples follow the normal distribution: the values lying beyond two standard deviations from the mean are often regarded as unusual.

Considering the second reason discussed above, we replace $P(i)$ in Eq. (6) with $1 - AS(i)$ where $i = t-L, t-L+1, \ldots, t-1$. Consequently, $LTI(t)$ is reformulated as:

$$LTI(t) = \frac{1}{Z_t} \sum_{i=t-L}^{t-1} \big(1 - AS(i)\big) \cdot \mathrm{WLSDist}\big(S(i+1, t),\, S_i(i+1, t)\big) \tag{12}$$

where $Z_t$ is the normalization factor reformulated as $\sum_{i=t-L}^{t-1} (1 - AS(i))$, and $1 - AS(i)$ represents the probability that frame $i$ is normal.

The function $\Phi(\cdot)$ contains two parameters, $k$ and $x_0$. The values of these two parameters need to be set before the function can be used to calculate the anomaly score. Since $x_0$ is supposed to be the midpoint of $x$, we set $x_0$ to $mean(LTI)$. We set $k$ to $c/stdev(LTI)$ ($stdev(LTI)$ is the standard deviation of $LTI$, and $c$ is a constant multiplier). The purpose of the mapping function is to disperse the LTI values that are densely clustered. On the one hand, the standard deviation $stdev(LTI)$ can be used to represent how densely the LTI values reside around the mean. The lower the value of $stdev(LTI)$, the more closely the LTI values are clustered. On the other hand, $k$ represents how steep the middle slope of the logistic mapping function is. The greater $k$ is, the steeper the logistic mapping function is. The more densely clustered the LTI values are, the steeper the logistic function needs to be in order to disperse those values. Therefore, for a set of LTI values with lower deviation, a bigger value should be set for $k$.

Instead of setting the values of $k$ and $x_0$ manually, we propose an automated approach in this work to determine their values. More specifically, we design an iterative algorithm. The algorithm runs on a reference time series which is a portion of the training data. The algorithm is outlined in Algorithm 1.

Algorithm 1: Iterative procedure for unparameterizing $\Phi(\cdot)$
  Input : prediction span $L$, reference series length $r$, predicted local sequences $S_i(i+1, i+L)$ for $i \in [0, r-1]$
  Output: $k$, $x_0$
  $k \leftarrow 1.0$, $x_0 \leftarrow 0.5$
  $AS(i) \leftarrow 0$ for all $i \in [0, r-1]$
  while convergence criterion is not satisfied do
      for $t \leftarrow L$ to $r-1$ do
          compute $LTI(t)$ via Eq. (12)
          compute $AS(t)$ via Eq. (11)
      end
      $k \leftarrow c / stdev(LTI)$, $x_0 \leftarrow mean(LTI)$
  end

In Algorithm 1, parameters $k$ and $x_0$ are initially set to 1.0 and 0.5, respectively. Note that it does not matter much what the initial values of $k$ and $x_0$ are. When Algorithm 1 is run on the reference time series, LTI for each frame of the reference series is calculated. The values of $k$ and $x_0$ will converge to $c/stdev(LTI)$ and $mean(LTI)$ eventually. In the

Fig. 2. An overview of the proposed prediction-driven anomaly detection framework for the time series, which uses a seasonality augmented GRU network
as the backbone model to support the abnormality scoring based on Local Trend Inconsistency (LTI).
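
The iterative calibration of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the WLSDist values are supplied by a caller-provided function `dist(i, t)` (hypothetical name), the multiplier `c=3.0` and the iteration cap `max_iter` are arbitrary choices, the population standard deviation stands in for stdev, and the equal-weight fallback for an all-zero weight sum is our own addition.

```python
import math
from statistics import mean, pstdev


def phi(x, k, x0):
    """Logistic mapping of Eq. (10); the exponent is clamped to avoid overflow."""
    z = max(-60.0, min(60.0, -k * (x - x0)))
    return 1.0 / (1.0 + math.exp(z))


def lti(t, L, scores, dist):
    """Eq. (12): distances from the L sources, weighted by 1 - AS(i)."""
    sources = range(t - L, t)
    weights = [1.0 - scores[i] for i in sources]
    z = sum(weights)
    if z == 0.0:  # assumption: equal weights if every source looks anomalous
        weights, z = [1.0] * L, float(L)
    return sum(w * dist(i, t) for w, i in zip(weights, sources)) / z


def calibrate(r, L, dist, c=3.0, tol=1e-3, max_iter=100):
    """Algorithm 1: repeat until k and x0 both change by less than 0.1%."""
    k, x0 = 1.0, 0.5
    scores = [0.0] * r                      # AS(i) <- 0 for all i
    for _ in range(max_iter):
        ltis = []
        for t in range(L, r):               # chronological pass over the reference series
            value = lti(t, L, scores, dist)
            scores[t] = phi(value, k, x0)
            ltis.append(value)
        spread = pstdev(ltis)
        if spread == 0.0:                   # identical LTI values: nothing to disperse
            break
        new_k, new_x0 = c / spread, mean(ltis)
        converged = (abs(new_k - k) <= tol * abs(k)
                     and abs(new_x0 - x0) <= tol * abs(x0))
        k, x0 = new_k, new_x0
        if converged:
            break
    return k, x0


# Toy distances (hypothetical): large when target t or source i is anomalous.
anomalous = {20, 21}


def toy_dist(i, t):
    return (4.0 if t in anomalous else 1.0) + (2.0 if i in anomalous else 0.0)


k, x0 = calibrate(r=40, L=5, dist=toy_dist)
```

Because the weights in Eq. (12) depend on the scores from the previous pass, the loop is a fixed-point iteration, which is why the sketch caps the number of passes rather than assuming convergence.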

Algorithm 2: Anomaly Detection based on LTI
  Input : current frame $t$, prediction span $L$, previous frames from $t-L$ to $t-1$, $AS(i)$ for $i \in [t-L, t-1]$
  Output: $AS(t)$
  for $i \leftarrow t-L$ to $t-1$ do
      use the proposed prediction model to forecast $S_i(i+1, t)$
      compute $\mathrm{WLSDist}\big(S(i+1, t),\, S_i(i+1, t)\big)$
  end
  compute $LTI(t)$ according to Eq. (12)
  compute $AS(t)$ according to Eq. (11)
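
A single run of Algorithm 2 might look as follows in Python. The sketch assumes the forecaster and the sequence distance are passed in as functions; `persistence_forecast` and `mean_abs_dist` are hypothetical stand-ins for the GRU backbone and for WLSDist (defined in Section III), used only so that the example runs on its own. The values of `k` and `x0` are also illustrative.

```python
import math


def anomaly_score(lti_value, k, x0):
    """Eqs. (10)-(11): logistic mapping of LTI(t) to AS(t)."""
    z = max(-60.0, min(60.0, -k * (lti_value - x0)))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))


def detect(t, L, frames, prev_scores, predict, wls_dist, k, x0):
    """One run of Algorithm 2 for frame t (requires t >= L); returns AS(t)."""
    weighted_sum, plain_sum, z = 0.0, 0.0, 0.0
    for i in range(t - L, t):
        predicted = predict(frames, i, t)   # forecast of frames i+1..t from source i
        actual = frames[i + 1:t + 1]        # observed local sequence S(i+1, t)
        d = wls_dist(actual, predicted)
        w = 1.0 - prev_scores[i]            # probability that source i is normal
        weighted_sum += w * d
        plain_sum += d
        z += w
    # Eq. (12); assumption: fall back to the unweighted mean if all weights are zero.
    lti_value = weighted_sum / z if z > 0.0 else plain_sum / L
    return anomaly_score(lti_value, k, x0)


# Hypothetical stand-ins, for illustration only.
def persistence_forecast(frames, i, t):
    return [frames[i]] * (t - i)            # repeat the source frame


def mean_abs_dist(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


frames = [1.0] * 10 + [9.0]                 # single-channel series with a jump at t = 10
scores = [0.0] * 10                         # AS(i) of the earlier frames
normal = detect(9, 5, frames, scores, persistence_forecast, mean_abs_dist, k=2.0, x0=1.0)
spike = detect(10, 5, frames, scores, persistence_forecast, mean_abs_dist, k=2.0, x0=1.0)
```

In this toy series the flat frame at t = 9 receives a low score and the jump at t = 10 a score close to 1, since every one of the five sources predicted a flat continuation.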

algorithm, we set a convergence criterion, in which both $k$ and $x_0$ change by less than 0.1% since the last update. In each loop, the algorithm computes $LTI(t)$ and $AS(t)$ along the reference series for each frame $t$. After each loop, we update $k$ and $x_0$ and check if the criterion is met.

Fig. 3. The mapping function $\Phi(\cdot)$ we use for anomaly scoring (left), and the dispersion effect by mapping $LTI$ values (top right) to anomaly scores (bottom right) with $\Phi(\cdot)$.

With the anomaly scoring function $AS(\cdot)$ and the trained backbone model for the target series, we now present our Anomaly Detection based on Local Trend Inconsistency (AD-LTI). Assuming we are detecting the anomaly for frame $t$, the pseudo-code of our on-line detection procedure is described in Algorithm 2.

The information required for detection at frame $t$ includes frame $t$ itself, anomaly scores of previous frames, and the predicted local sequences ending at $t$, which are the output of our backbone prediction model. To analyze the time complexity of Algorithm 2, let $m$ denote the number of dimensions of a frame (i.e., channels of the time series) and $L$ the prediction span, which is a hyper-parameter shared by the backbone prediction model and the detection algorithm. Without parallelization, it takes $O(m)$ to calculate DFDist between each pair of frames, so the time cost for obtaining WLSDist between two local sequences is $O(Lm)$. Therefore, the time complexity of detection at a single frame $t$ is $O(L^2 m)$ since $L$ sources of forecast are used (see Eq. (12)). As analyzed in Section III, the complexity can be reduced to $O(m + L)$ with proper parallelization.

V. EXPERIMENTS

In this section, we first evaluate the effectiveness of our backbone prediction model. Then we compare AD-LTI with the existing anomaly detection algorithms in sensitivity and specificity (using the AUC metric).
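
For readers unfamiliar with the metric, the AUC used in this comparison can be computed directly from anomaly scores and ground-truth labels without choosing a threshold. The sketch below uses the standard pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve; the function name and the toy data are ours, not part of the evaluation code of the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both normal and anomalous frames are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 marks an anomalous frame, scores are AS(t)-like values.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)  # 3 of the 4 (anomaly, normal) pairs are ordered correctly
```

The quadratic pairwise loop is chosen for clarity; a sort-based implementation gives the same value in O(n log n).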

We set up our experiments on a machine equipped with a dual-core CPU (model: Intel Core i5-8500, 3.00 GHz), a GPU (model: GTX 1050Ti) and 32GB memory. The inference module of our backbone model is implemented on the PyTorch (version: 1.0.1) platform and the decomposition module is implemented using Prophet (version: 0.4) released by Facebook. We select three datasets for evaluation. CalIt2 and Dodgers Loop Sensor are two public datasets published by the University of California Irvine (UCI) and available in the UCI machine learning repository. Another dataset we use is from the private production environment of a cyber-security company, which is the collaborator of this project. This dataset collects the server logs from a number of clusters (owned by other third-party enterprises) on a regular basis. The dataset is referred to as the Server Log dataset in this paper.

CalIt2 Dataset

CalIt2 is a multivariate time series dataset containing 10080 observations of two data streams corresponding to the counts of in-flow and out-flow of a building on the UCI campus. The purpose is to detect the presence of an event such as a conference or seminar held in the building. The timestamps are contained in the dataset. The original data span 15 weeks (2520 hours) and are half-hourly aggregated. We truncated the last 120 hours and conducted simple processing on the remaining 2400 hours of data by making it hourly-aggregated. The CalIt2 dataset is provided with annotations that label the date, start time and end time of events over the entire period. There are 115 anomalous frames (4.56% contamination ratio) in total. In our experiment, labels are omitted during training (because our prediction model forecasts local sequences of frames) and are only used for evaluating detection results.

Server Log Dataset

The Server Log dataset is a multi-channel time series with a fixed interval between two consecutive frames. The dataset spans from June 29th to September 4th, 2018 (1620 hours in total). The raw data is provided to us in the form of separate log files, each of which stores the counts of a Linux server event on an hourly basis. The log files record the invocations of five different processes, which include CROND, RSYSLOGD, SESSION, SSHD and SU. Each process represents a channel of observing the server. We pre-processed the data by aggregating all the files to form a five-channel time series. Fig. 4 shows the time series of all five channels.

Fig. 4. The Server Log time series dataset

Currently, the company relies on security technicians to observe the time series and spot potential anomalies, which might be caused by security attacks. The aim of this project is to develop an automated method to spot potential anomalies and quantify them in real time as the process invocations are being logged on the server. Anomalous events such as external cyber attacks exist in the Server Log dataset, but the labels are not available. We acquired manual annotations for the test set from the technicians in the company. In total, 76 frames are labeled as anomalies in the test set, equivalent to a contamination ratio of 14.6%.

Dodgers Loop Sensor Dataset

Dodgers Loop Sensor is also a public dataset available in the UCI data repository. The data were collected at the Glendale on-ramp for the 101 North freeway in Los Angeles. The sensor is close enough to the stadium for detecting unusual traffic after a Dodgers game, but not so close that it is heavily used by the game traffic. Traffic observations were taken over 25 weeks (from Apr. 10 to Oct. 01, 2005) with date and timestamps provided for both data records and events (i.e., the start and end time of games). The raw dataset contains 50400 records in total. We pre-processed the data to make it an hourly time series dataset.

A. Evaluating Backbone Model

We trained our prediction model on the datasets separately to evaluate its accuracy as well as the impact of seasonal terms extracted by the decomposition module. We split the datasets into training, validation and test sets. On CalIt2, the first 1900 frames were used for training and the following 500 for testing. On the Server Log dataset, 1100 frames were used for training and 520 for testing. On Dodgers Loop, 3000 records were used for training and 1000 for testing. 300, 300, and 500 frames were used for validation on CalIt2, Server Log and Dodgers Loop, respectively.

The proposed model uses Prophet to implement the decomposition module and a stacked GRU network to implement the prediction module. We extracted daily and weekly terms for each channel. More specifically, for each channel we generated two mapping lists after fitting the data by Prophet. One list contains the readings at each of 24 hours in a day, while the other list includes the readings at each of 7 days in a week. Fig. 5 shows an example of the mapping lists.

The values of seasonal terms are different for the CalIt2, Server Log and Dodgers datasets, but the resulting mapping lists share the same format as the example shown in Fig. 5.

Based on the mapping lists and the timestamp field provided in the data, we build our prediction network with seasonal features as additional input. Table II shows the network structures adopted for each of the datasets, where $L$ is the maximum length of local sequences as a hyper-parameter. tanh is used as the activation function and Mean Square Error (MSE) loss as the loss function. Dropout is not enabled and
we set a weight decay of 6e-6 during training to prevent over-fitting. We use Adam [30] as the optimizer with the initial learning rate set to 0.001.

Fig. 5. An example of seasonal terms mapping in which the numerical values quantify seasonal impacts

TABLE II: Network structures of the inference part for predicting local sequences of length L

| Dataset | Type | # of features (raw + seasonal) | Topology |
|---|---|---|---|
| CalIt2 | GRU | 2 + 4 | [6, 20×2, 2L] |
| Server Log | GRU | 5 + 10 | [15, 20×3, 5L] |
| Dodgers | GRU | 1 + 2 | [3, 20×2, L] |

In order to evaluate the impact of the concatenated seasonal features, we also implemented a baseline GRU network with the same structure and hyper-parameters as our inference module except that the seasonal features are not included. We also consider the impact of a critical hyper-parameter, time_steps, in training the inference networks. The larger the time_steps, the longer the gradients back-propagate through time and the more time-consuming the training process becomes. We set different values of time_steps for the training of both our inference network and the baseline network to investigate the impact of seasonal features. The prediction span $L$ is fixed to 5 (hours). The results are summarized in Table III.

In Table III, the decomposition time and training time refer to the fitting/training time spent by the decomposition module and the inference module, respectively. We evaluated three cases where time_steps takes different values of 24 (daily seasonality length), 72 or 168 (weekly seasonality length). Mean squared error (MSE) is calculated on the normalized test data to reflect the model quality. From the results we can first observe that it only takes the decomposition module of our model a few seconds to extract the seasonal terms from all the channels. More importantly, we find that augmenting the GRU model with seasonal terms (ST) makes the backbone model (GRU+ST) more complicated in structure, but it does not increase the training cost while resulting in much better accuracy: it outperforms the baseline GRU network (without seasonal terms) significantly in accuracy (i.e., lower error). The test MSE decreases by more than 20 percent on CalIt2 and by 35 to nearly 50 percent on the Server Log dataset.

B. Evaluating AD-LTI

In this section we evaluate our unsupervised anomaly detection algorithm AD-LTI. We also implement a number of representative related algorithms for comparison. These baseline algorithms include One Class Support Vector Machine (OCSVM) [4], Isolation Forest (iForest) [8], Piecewise Median Anomaly Detection [32], LSTM-based Fault Detection (LSTM-FD) [33] and LSTM-AD, which is an LSTM-based anomaly detection scheme using multiple forecasts [34].

OCSVM is a variant of SVM for unsupervised outlier detection. OCSVM shares the same theoretical basis as SVM while using an additional argument $\nu$ as an anomaly ratio-related parameter. Isolation forest is an outlier detection approach based on random forest in which isolation trees are built instead of decision trees. An a priori parameter cr is required to indicate the contamination ratio. Both OCSVM and Isolation Forest are available in the Scikit-learn package [31]. Piecewise Median Anomaly Detection is a window-based algorithm that splits the series into fixed-size windows within which anomalies are detected based on a decomposable series model. LSTM-FD is a typical prediction-driven approach that detects anomalies in time series by simply analyzing the (prediction) error distribution. It adopts a frame-to-frame LSTM network as its backbone model. Similar to our approach, LSTM-AD also uses a multi-source prediction scheme (we discussed how it works in Section II).

We use the AUC metric to measure the effectiveness. Area Under the Curve, abbreviated as AUC, is a commonly used metric for comprehensively assessing the performance of binary classifiers. "The curve" refers to the Receiver Operating Characteristic (ROC) curve, which is generated by plotting the true positive rate (y-axis) against the false positive rate (x-axis) based on the dynamics of decisions made by the target classifier (the anomaly detector in our case). The concept of ROC and AUC can reveal the effectiveness of a detection algorithm from the perspectives of both specificity and sensitivity. Another reason why we choose AUC is that it is a threshold-independent metric. AD-LTI does not perform classification but presents the detection results in the form of probability. Hence metrics such as precision and recall cannot be calculated unless we consider the threshold as an extra parameter, which violates our aim of designing a generic scheme.

We evaluate AD-LTI and the baseline algorithms on these three datasets. Parameters for baseline algorithms are set to the default or the same as in the original papers if they were suggested. For LSTM-FD, LSTM-AD and AD-LTI, time_steps is set to 72 (hours).

As shown in Fig. 6, Fig. 7 and Fig. 8, we draw three groups of 1-D heatmaps to compare the detection decisions made by each algorithm (labelled on the y-axis) with the ground truth on each test dataset. Normal and anomalous frames are marked in green and red, respectively, on the map of ground truth. Frames are also marked by each anomaly detection algorithm with scores, which are reflected using a range of colors from green to red. Anomaly events are sparse in the CalIt2 dataset (Fig. 6) while comparatively more anomalous data points exist in the Server Log dataset (Fig. 7). From the figures, we can see that most of the "hotspots" are captured by our scheme and its false alarm rate is comparatively low. Notably we also observe that OCSVM produces a large number of false alarms on CalIt2

TABLE III: Comparing GRU+ST (the proposed backbone model augmented with seasonal features) with the vanilla GRU in accuracy, which is indicated by the lowest test MSE (Mean Square Error) achieved under different training settings of time_steps (ts). In each group of comparison, both models have converged and trained for the same number of epochs.

| Setting | Metric | CalIt2 GRU+ST | CalIt2 GRU | Server Log GRU+ST | Server Log GRU | Dodgers Loop GRU+ST | Dodgers Loop GRU |
|---|---|---|---|---|---|---|---|
| - | Seasonal term decomp. time | 2.7s | - | 6.6s | - | 2.9s | - |
| ts=24 | Test MSE | 0.0068 | 0.0092 | 0.0020 | 0.0039 | 0.0098 | 0.0113 |
| ts=24 | Training time to converge | 173.7s | 176.7s | 460.4s | 464.9s | 550.2s | 552.3s |
| ts=72 | Test MSE | 0.0066 | 0.0089 | 0.0013 | 0.0020 | 0.0066 | 0.0085 |
| ts=72 | Training time to converge | 180.2s | 185.6s | 468.3s | 436.9s | 628.6s | 632.9s |
| ts=168 | Test MSE | 0.0067 | 0.0085 | 0.0018 | 0.0033 | 0.0072 | 0.0086 |
| ts=168 | Training time to converge | 169.8s | 174.5s | 421.0s | 446.5s | 709.9s | 752.9s |
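
The relative gain from seasonal terms can be read off Table III with a short calculation. The snippet below is only a worked check of the arithmetic: the MSE pairs are copied from the table, and `error_reduction` is a helper name of our own.

```python
# (GRU+ST, GRU) test MSE pairs copied from Table III, keyed by time_steps.
table_iii = {
    "CalIt2": {24: (0.0068, 0.0092), 72: (0.0066, 0.0089), 168: (0.0067, 0.0085)},
    "Server Log": {24: (0.0020, 0.0039), 72: (0.0013, 0.0020), 168: (0.0018, 0.0033)},
    "Dodgers Loop": {24: (0.0098, 0.0113), 72: (0.0066, 0.0085), 168: (0.0072, 0.0086)},
}


def error_reduction(mse_st, mse_plain):
    """Relative drop in test MSE when seasonal terms are added."""
    return (mse_plain - mse_st) / mse_plain


# Percentage reduction per dataset and time_steps setting.
reductions = {
    dataset: {ts: 100.0 * error_reduction(*pair) for ts, pair in rows.items()}
    for dataset, rows in table_iii.items()
}

for dataset, rows in reductions.items():
    print(dataset, {ts: round(value, 1) for ts, value in rows.items()})
```

The reductions come out at roughly 21 to 26 percent on CalIt2, 35 to 49 percent on the Server Log dataset, and 13 to 22 percent on Dodgers Loop.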

but fails to spot most of the anomalous frames on the Server Log dataset. The Piecewise method misses a lot of anomalies, while the iForest method tends to mistakenly label a large portion of normal data as anomalies. LSTM-AD produced results close to our method on the Dodgers dataset, but rendered a large portion of false alarms on the other two datasets. To give a more intuitive view, we plot the ROC curves of AD-LTI and the baseline algorithms on the test data in Fig. 9, Fig. 10 and Fig. 11.

Fig. 6. Heatmaps of detection decisions made by AD-LTI and baseline algorithms compared with the ground truth on CalIt2 dataset

Fig. 7. Heatmaps of detection decisions made by AD-LTI and baseline algorithms compared with the ground truth on Server Log dataset

Fig. 8. Heatmaps of decision results by AD-LTI and baseline algorithms compared with the ground truth on Dodgers Loop dataset

Fig. 9. ROC curves of anomaly detection algorithms on CalIt2 dataset

Fig. 10. ROC curves of anomaly detection algorithms on Server Log dataset

From the ROC curves we can observe that AD-LTI produced the most reliable decisions, as its curve is the closest to the top-left corner for all three datasets, especially on the Server Log dataset (see Fig. 10), which features complex seasonality in each channel. Detection on the Server Log dataset appears to be harder for the other existing algorithms (the reason is explained later): none of them achieves a high true positive rate at a low false positive rate. We further calculate the corresponding AUC for each algorithm on all three datasets. The resulting AUC values are shown in Table IV.

As shown in Table IV, AD-LTI achieves the highest AUC values of 0.935, 0.977 and 0.923 on the CalIt2, Server Log and Dodgers Loop datasets, respectively. On CalIt2, the AUC values of the baseline algorithms are between 0.8 and 0.9 with the only exception of OCSVM when nu is set to 0.05, which is approximately the actual anomaly rate (0.046, precisely) for CalIt2. This to some degree indicates that OCSVM is sensitive
to parameters. Anomaly detection is much more challenging on the Server Log dataset due to the increase in the number of channels, and the complexity in seasonality and uncertainty (e.g., channel SU is fairly unpredictable). As the result shows, the AUC values for all existing algorithms drop below 0.8: the best of them, LSTM-AD, reaches 0.793, and Isolation Forest reaches 0.761 (with the contamination ratio cr set to 0.15), a value that could fluctuate as it is a randomized algorithm. However, the actual contamination ratio is hardly a priori knowledge in practical scenarios. We also observed that prediction-driven approaches (LSTM-FD, LSTM-AD and AD-LTI) significantly outperformed the others on the Dodgers Loop dataset; this is mainly because of the presence of strong noise in the traffic data. The proposed AD-LTI algorithm makes the most reliable decisions in all of the tested scenarios. The main reasons are two-fold. First, the underlying backbone model for AD-LTI is very accurate with the complement of seasonal features that effectively capture complex seasonality and mitigate the noise in raw data. Second, AD-LTI is robust in scoring each frame because we leverage multi-source forecasting and weight each prediction based on the confidence of the prediction source.

Fig. 11. ROC curves of anomaly detection algorithms on Dodgers Loop dataset

TABLE IV: Comparing the AUC values of anomaly detection algorithms on the CalIt2, Server Log and Dodgers Loop datasets, wherein actual contamination ratios (CR) are approximately 0.05, 0.15 and 0.10, respectively.

| Algorithm | CalIt2 | Server Log | Dodgers Loop |
|---|---|---|---|
| OCSVM [4] (default) | 0.876 | 0.677 | 0.591 |
| OCSVM (nu = CR) | 0.708 | 0.672 | 0.525 |
| iForest [8] (default) | 0.891 | 0.756 | 0.535 |
| iForest (cr = CR) | 0.877 | 0.761 | 0.518 |
| Piecewise AD [32] | 0.833 | 0.721 | 0.751 |
| LSTM-FD [33] | 0.847 | 0.755 | 0.829 |
| LSTM-AD [34] (L = L*) | 0.900 | 0.793 | 0.859 |
| AD-LTI (L = L*) | 0.935 | 0.977 | 0.923 |

AD-LTI has an important hyper-parameter $L$, which determines both the prediction length for the backbone model and the maximum probe length for computing LTI. We evaluated our algorithm against LSTM-AD (which is also based on multiple forecasts) with different $L$ values to investigate the impact of $L$ on detection reliability and time efficiency. The result is summarized in Table V.

From Table V we can see that our method outperformed LSTM-AD, and we also observe different impacts of the probe window length $L$ on different datasets. On CalIt2 and Dodgers, the impact of $L$ on the detection reliability (revealed by AUC) is subtle, while on the Server Log dataset very large $L$ values show an obvious negative effect on our scheme. The reasons behind these results are partly that, as $L$ becomes bigger ($L$ is set to 20 or above), the prediction made by the backbone model becomes less accurate, and partly the dilution of local information. In comparison, LSTM-AD is much more susceptible to the hyper-parameter $L$. Besides, as expected, a longer probe length leads to increased overhead in detection, which can be mitigated by running the scheme in parallel. Empirically, we recommend setting $L$ to a value between 5 and 20 considering both detection reliability and efficiency.

VI. CONCLUSION

On-line detection of anomalies in time series has been crucial in a broad range of information and control systems that are sensitive to unexpected events. In this paper, we propose an unsupervised, prediction-driven approach to reliably detecting anomalies in time series with complex seasonality. We first present our backbone prediction model, which is composed of a time series decomposition module for seasonal feature extraction, and an inference module implemented using a GRU network. Then we define Local Trend Inconsistency, a novel metric that measures abnormality by weighting local expectations from previous records. We then use a scoring function along with a detection algorithm to convert the $LTI$ value into a probability that indicates a record's likelihood of being anomalous. The whole process can leverage matrix operations for parallelization. We evaluated the proposed detection algorithm on three different datasets. The result shows that our scheme outperformed several representative anomaly detection schemes commonly used in practice.

In the future we plan to focus on extending our work to address new challenges in large-scale, information-intensive distributed systems such as edge computing and IoT. We aim to refine our method with scenario-oriented designs, for instance, detection in asynchronous streams sent by distributed sensors, and build a robust monitoring mechanism in order to support intelligent decision-making in these types of systems.

ACKNOWLEDGEMENT

This work is partially supported by Worldwide Byte Security Co. LTD, and is supported by National Natural Science Foundation of China (Grant Nos. 61772205, 61872084), Guangdong Science and Technology Department (Grant No. 2017B010126002), Guangzhou Science and Technology Program key projects (Grant Nos. 201802010010, 201807010052, 201902010040 and 201907010001), and the Fundamental Research Funds for the Central Universities, SCUT (Grant No. 2019ZD26).

REFERENCES

[1] Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between Time-Series?–a study in sampling and the nature of time-series. Journal of the royal statistical society, 89(1), 1-63.

TABLE V: AUC values and detection overheads (in ms per frame) using LSTM-AD and AD-LTI under different settings of probe length L. Both methods use multiple forecasts with each frame being predicted L times.

| Algorithm | L | CalIt2 AUC | CalIt2 overhead (ms/frame) | Server Log AUC | Server Log overhead (ms/frame) | Dodgers Loop AUC | Dodgers Loop overhead (ms/frame) |
|---|---|---|---|---|---|---|---|
| LSTM-AD [34] | 5 | 0.900 | 0.126 | 0.793 | 0.129 | 0.859 | 0.123 |
| LSTM-AD [34] | 10 | 0.883 | 0.142 | 0.753 | 0.148 | 0.778 | 0.142 |
| LSTM-AD [34] | 20 | 0.847 | 0.174 | 0.596 | 0.183 | 0.813 | 0.174 |
| LSTM-AD [34] | 30 | 0.813 | 0.206 | 0.505 | 0.215 | 0.815 | 0.205 |
| AD-LTI | 5 | 0.911 | 0.189 | 0.977 | 0.282 | 0.912 | 0.196 |
| AD-LTI | 10 | 0.912 | 0.353 | 0.925 | 0.399 | 0.923 | 0.316 |
| AD-LTI | 20 | 0.935 | 0.706 | 0.845 | 0.784 | 0.906 | 0.707 |
| AD-LTI | 30 | 0.912 | 1.125 | 0.784 | 1.461 | 0.915 | 1.110 |

[2] Frisch, R., & Waugh, F. V. (1933). Partial time regressions as compared with individual trends. Econometrica: Journal of the Econometric Society, 387-401.
[3] Seiwell, H. R. (1949). The principles of time series analyses applied to ocean wave data. Proceedings of the National Academy of Sciences of the United States of America, 35(9), 518.
[4] Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., & Platt, J. C. (2000). Support vector method for novelty detection. Proceedings of the 12th International Conference on Neural Information Processing Systems (NIPS’99), pp. 582-588.
[5] Zhang, R., Zhang, S., Lan, Y., & Jiang, J. (2008). Network anomaly detection using one class support vector machine. In Proceedings of the International MultiConference of Engineers and Computer Scientists (Vol. 1).
[6] Maglaras, L. A., & Jiang, J. (2014, August). Ocsvm model combined with k-means recursive clustering for intrusion detection in scada systems. In 10th International conference on heterogeneous networking for quality, reliability, security and robustness (pp. 133-134). IEEE.
[7] Shang, W., Zeng, P., Wan, M., Li, L., & An, P. (2016). Intrusion detection algorithm based on OCSVM in industrial control system. Security and Communication Networks, 9(10), 1040-1049.
[8] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2012). Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1).
[9] Radovanović, M., Nanopoulos, A., & Ivanović, M. (2014). Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE transactions on knowledge and data engineering, 27(5), 1369-1382.
[10] Calheiros, R. N., Ramamohanarao, K., Buyya, R., Leckie, C., & Versteeg, S. (2017). On the effectiveness of isolation-based anomaly detection in cloud data centers. Concurrency and Computation: Practice and Experience, 29(2017)e4169. doi: 10.1002/cpe.4169
[11] Chan, P. K., & Mahoney, M. V. (2005, November). Modeling multiple time series for anomaly detection. In Fifth IEEE International Conference on Data Mining (ICDM’05) (pp. 8-pp). IEEE.
[12] Ye, L., & Keogh, E. (2009, June). Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 947-956). ACM.
[13] Zakaria, J., Mueen, A., & Keogh, E. (2012, December). Clustering time series using unsupervised-shapelets. In 2012 IEEE 12th International Conference on Data Mining (pp. 785-794). IEEE.
[14] Yeh, C. C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., ... & Keogh, E. (2018). Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile. Data Mining and Knowledge Discovery, 32(1), 83-123.
[15] Hou, L., Kwok, J. T., & Zurada, J. M. (2016, February). Efficient learning of time series shapelets. In 13th AAAI Conference on Artificial Intelligence.
[16] Gu, Z., He, L., Chang, C., Sun, J., Chen, H., & Huang, C. (2017). Developing an efficient pattern discovery method for CPU utilizations of computers. International Journal of Parallel Programming, 45(4), 853-878.
[17] Zhu, H., Gu, Z., Zhao, H., Chen, K., Li, C. T., & He, L. (2018). Developing a pattern discovery method in time series data and its GPU acceleration. Big Data Mining and Analytics, 1(4), 266-283.
[18] Wei, L., Kumar, N., Lolla, V. N., Keogh, E. J., Lonardi, S., & Chotirat (Ann) Ratanamahatana. (2005, June). Assumption-Free Anomaly Detection in Time Series. In SSDBM (Vol. 5, pp. 237-242).
[19] Huang, T., Zhu, Y., Wu, Y., Bressan, S., & Dobbie, G. (2016). Anomaly detection and identification scheme for VM live migration in cloud infrastructure. Future Generation Computer Systems, 56, 736-745.
[20] Hyndman, R. J., Wang, E., & Laptev, N. (2015, November). Large-scale unusual time series detection. In 2015 IEEE international conference on data mining workshop (ICDMW) (pp. 1616-1619). IEEE.
[21] Li, J., Pedrycz, W., & Jamal, I. (2017). Multivariate time series anomaly detection: A framework of Hidden Markov Models. Applied Soft Computing, 60, 229-240.
[22] Chauhan, Sucheta and Vig, Lovekesh. Anomaly detection in ECG time signals via deep long short-term memory networks. In Data Science and Advanced Analytics (DSAA), 2015. 36678 2015. IEEE International Conference on, pp. 1–7. IEEE, 2015.
[23] Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.
[24] Ahmad, S., Lavin, A., Purdy, S., & Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262, 134-147.
[25] Pascanu, R., Mikolov, T., & Bengio, Y. (2013, February). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310-1318).
[26] Tang, X. (2019). Large-Scale Computing Systems Workload Prediction Using Parallel Improved LSTM Neural Network. IEEE Access, 7, 40525-40533.
[27] Chen, S., Li, B., Cao, J., & Mao, B. (2018). Research on Agricultural Environment Prediction Based on Deep Learning. Procedia computer science, 139, 33-40.
[28] Harvey, A. & Peters, S. (1990), Estimation procedures for structural time series models, Journal of Forecasting, Vol. 9, 89-108.
[29] Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
[30] Kingma, D. and Ba, J. (2015) Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).
[31] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct), 2825-2830.
[32] Vallis, O., Hochenbaum, J., & Kejariwal, A. (2014). A novel technique for long-term anomaly detection in the cloud. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[33] P. Filonov, A. Lavrentyev, A. Vorontsov, Multivariate Industrial Time Series with Cyber-Attack Simulation: Fault Detection Using an LSTM-based Predictive Data Model, NIPS Time Series Workshop 2016, Barcelona, Spain, 2016.
[34] Malhotra, P., Vig, L., Shroff, G., & Agarwal, P. (2015, April). Long short term memory networks for anomaly detection in time series. In Proceedings of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 15’), pp. 89-94.
[35] Shi, H., Yang, J., Ding, M., & Wang, J. (2011). A short-term wind power prediction method based on wavelet decomposition and BP neural network. Automation of Electric Power Systems, 35(16), 44-48.
[36] Gould, P. G., Koehler, A. B., Ord, J. K., Snyder, R. D., Hyndman, R. J., & Vahid-Araghi, F. (2008). Forecasting time series with multiple seasonal patterns. European Journal of Operational Research, 191(1), 207-222.
[37] De Livera, A. M., Hyndman, R. J., & Snyder, R. D. (2011). Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association, 106(496), 1513-1527.

[38] Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial intelligence review, 22(2), 85-126.
[39] Janssens, O., Slavkovikj, V., Vervisch, B., Stockman, K., Loccufier, M., Verstockt, S., ... & Van Hoecke, S. (2016). Convolutional neural network based fault detection for rotating machinery. Journal of Sound and Vibration, 377, 331-345.
[40] Ince, T., Kiranyaz, S., Eren, L., Askar, M., & Gabbouj, M. (2016). Real-time motor fault detection by 1-D convolutional neural networks. IEEE Transactions on Industrial Electronics, 63(11), 7067-7075.
[41] Sabokrou, M., Fayyaz, M., Fathy, M., Moayed, Z., & Klette, R. (2018). Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes. Computer Vision and Image Understanding, 172, 88-97.
[42] Zheng, Y., Liu, Q., Chen, E., Ge, Y., & Zhao, J. L. (2014, June). Time series classification using multi-channels deep convolutional neural networks. In International Conference on Web-Age Information Management (pp. 298-310). Springer, Cham.
[43] Rajan, J. J., & Rayner, P. J. (1995). Unsupervised time series classification. Signal processing, 46(1), 57-74.
[44] Längkvist, M., Karlsson, L., & Loutfi, A. (2014). A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters, 42, 11-24.
[45] Ahmed, M., Mahmood, A. N., & Hu, J. (2016). A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60, 19-31.
[46] Holt, C. C. (2004). Forecasting seasonals and trends by exponentially weighted moving averages. International journal of forecasting, 20(1), 5-10.
[47] Van De Geijn, R. A., & Watts, J. (1997). SUMMA: Scalable universal matrix multiplication algorithm. Concurrency: Practice and Experience, 9(4), 255-274.
[48] Chen, J., Li, K., Deng, Q., Li, K., & Philip, S. Y. (2019). Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing. IEEE Transactions on Industrial Informatics.
[49] Chen, J., Li, K., Bilal, K., Metwally, A. A., Li, K., & Yu, P. (2018). Parallel protein community detection in large-scale PPI networks based on multi-source learning. IEEE/ACM transactions on computational biology and bioinformatics.
[50] Chen, J., Li, K., Bilal, K., Li, K., & Philip, S. Y. (2018). A bi-layered parallel training architecture for large-scale convolutional neural networks. IEEE transactions on parallel and distributed systems, 30(5), 965-976.
[51] Duan, M., Li, K., Liao, X., & Li, K. (2017). A parallel multiclassification algorithm for big data using an extreme learning machine. IEEE transactions on neural networks and learning systems, 29(6), 2337-2351.
[52] Chen, C., Li, K., Ouyang, A., Tang, Z., & Li, K. (2017). Gpu-accelerated parallel hierarchical extreme learning machine on flink for big data. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47(10), 2740-2753.
