
Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective
Zexin Wang∗ — Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Changhua Pei† — Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Minghua Ma — Microsoft, Beijing, China
Xin Wang — Stony Brook University, New York, USA
Zhihan Li — Kuaishou Technology, Beijing, China
Dan Pei — Tsinghua University, Beijing, China
Saravan Rajmohan — Microsoft, Redmond, USA
Dongmei Zhang — Microsoft, Beijing, China
Qingwei Lin — Microsoft, Beijing, China
Haiming Zhang — Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Jianhui Li — Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Gaogang Xie — Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

arXiv:2402.02820v1 [cs.LG] 5 Feb 2024

ABSTRACT
Time series Anomaly Detection (AD) plays a crucial role for web systems. Various web systems rely on time series data to monitor and identify anomalies in real time, as well as to initiate diagnosis and remediation procedures. Variational Autoencoders (VAEs) have gained popularity in recent decades due to their superior denoising capabilities, which are useful for anomaly detection. However, our study reveals that VAE-based methods face challenges in capturing long-periodic heterogeneous patterns and detailed short-periodic trends simultaneously. To address these challenges, we propose the Frequency-enhanced Conditional Variational Autoencoder (FCVAE), a novel unsupervised AD method for univariate time series. To ensure an accurate AD, FCVAE exploits an innovative approach to concurrently integrate both the global and local frequency features into the condition of a Conditional Variational Autoencoder (CVAE) to significantly increase the accuracy of reconstructing the normal data. Together with a carefully designed "target attention" mechanism, our approach allows the model to pick the most useful information from the frequency domain for better short-periodic trend construction. Our FCVAE has been evaluated on public datasets and a large-scale cloud system, and the results demonstrate that it outperforms state-of-the-art methods. This confirms the practical applicability of our approach in addressing the limitations of current VAE-based anomaly detection models.

CCS CONCEPTS
• Computing methodologies → Machine learning; • Security and privacy → Systems security.

KEYWORDS
Univariate time series, Anomaly detection, Conditional variational autoencoder, Frequency information

ACM Reference Format:
Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, Qingwei Lin, Haiming Zhang, Jianhui Li, and Gaogang Xie. 2024. Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective. In Proceedings of ACM Web Conference 2024 (WWW '24). ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX

∗ Also with University of Chinese Academy of Sciences.
† Corresponding author. Email: [email protected]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WWW '24, May 13–17, 2024, Singapore
© 2024 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06 . . . $15.00
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
Time series anomaly detection (AD) is ubiquitous for web systems [9, 11, 13, 14, 17, 21, 33, 36, 49, 50]. Numerous web systems, such as online advertising systems, are monitored using a vast array of time series data (e.g., conversion rate) [52]. Deploying time series AD algorithms is essential for timely detecting anomalies and initiating subsequent diagnosis and remediation processes.

Anomalies are rare in real-world time series data [41], making it difficult to label them and train a supervised model for anomaly detection [50]. Instead, unsupervised machine learning techniques are commonly used [5, 7, 16, 18, 27, 34, 50, 54, 57–59]. These techniques can be divided into two categories: prediction-based [16, 18, 59] and construction-based [5, 27, 34, 50, 54]. Both types aim to identify normal values and compare them to actual values to detect anomalies. Prediction-based methods were originally developed for forecasting future data points, regardless of whether they were normal or anomalous. However, these methods may overfit to anomalous patterns and underperform.
On the other hand, Variational AutoEncoders (VAEs) [23], the leading construction-based approach, encode raw time series into a lower-dimensional latent space and then reconstruct them back to their original dimensions. VAEs are well-suited for detecting anomalies, but existing VAE-based anomaly detection models have not yet reached theoretically optimal performance. In this paper, we aim to re-examine the VAE model and improve its effectiveness in anomaly detection.

In order to more effectively demonstrate the challenges associated with VAE-based techniques, we provide an example in Figure 1. The original curve is displayed in the first sub-figure, with anomalies highlighted in red ③. The subsequent four sub-figures represent curves reconstructed by four distinct VAE-based methods, including our proposed method (referred to as FCVAE). The reconstruction error is indicated by the green shaded area ⑤. To achieve superior AD performance, the reconstructed result should closely resemble the original curve for normal points, while deviating significantly for anomalous points ③. As evident in the figure, all VAE-based methods successfully disregard the anomalies during reconstruction. However, the reconstruction results for some normal points, particularly those marked by a blue rectangle ① and ellipse ④, are not satisfactory. This substantially impacts the overall performance, leading us to identify three key challenges that we address in the subsequent sections.

Figure 1: Comparison of four KPI reconstruction methods presented in our paper, highlighting anomalies in red ③. The green shade ⑤ represents the difference between the reconstructed values and the original values, the red shade ② represents a long period, the blue ellipse ④ indicates peaks and valleys that are not properly reconstructed, and the blue rectangle ① is magnified in Figure 2 for detailed comparison.

Figure 2: A detailed view of the region enclosed by the blue rectangle ① in Figure 1, where the shaded area represents the value range before applying a sliding window average. (a) Original values; (b) transformed spectrogram.

Challenge 1: Capturing similar yet heterogeneous periodic patterns. From Figure 1, periodic patterns can be observed in the curves, with one such period emphasized by the red shaded area ②. However, the shapes across different periods vary. As demonstrated by the blue ellipse, existing VAE-based methods (as shown in the second sub-figure) are unable to capture these heterogeneous patterns effectively. This observation naturally leads to the idea of utilizing a conditional VAE to map data into distinct Gaussian spaces by considering the timestamp as a condition. Unfortunately, as illustrated in the fourth sub-figure (Time CVAE [27]), the results are unsatisfactory, which we will further discuss below.

Challenge 2: Capturing detailed trends. Reconstructing monotonous patterns (i.e., trends) might appear straightforward at first glance. However, upon a closer examination of the local window (highlighted in a blue rectangle ① in Figure 1 and magnified in Figure 2(a)), it becomes evident that existing methods fail to restore detailed patterns within this time frame. In Figure 2(a), the two green lines initially overestimate the ground truth (purple curve) but subsequently underestimate it for the remainder of the window. This is primarily because existing methods aim to minimize the overall reconstruction error without focusing on "point-to-point" dependencies, e.g., the precise upward and downward ranges following a specific point. This omission results in fluctuating reconstruction outcomes (as seen in the second sub-figure). Although CNN attempts to model point-to-point dependencies within the window, it still produces coarse-grained fluctuations (visible in the third sub-figure in Figure 1). The reason for the unsatisfactory result of CNN-CVAE lies in Figure 2(b). Upon converting the curves reconstructed by various methods into the frequency domain (Figure 2(b)), it becomes evident that the primary cause of these phenomena is the absence of some frequencies (smaller amplitude of certain frequencies) in existing methods, hindering the reconstruction of detailed patterns. This observation logically suggests the possibility of employing frequency as the conditional factor in a Conditional Variational Autoencoder (CVAE). Nonetheless, employing frequency as the condition in CVAE presents a new challenge.
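The "missing frequency" diagnosis above can be reproduced numerically by comparing per-frequency amplitudes of a window and its reconstruction. A minimal numpy sketch (the toy signal, window length, and bin indices are our illustrative choices, not the paper's data):

```python
import numpy as np

def amplitude_spectrum(window):
    """Per-bin amplitude of a 1-D window, normalized by window length."""
    return np.abs(np.fft.rfft(window)) / len(window)

t = np.arange(128)
# Toy window: a long period (32 points) plus a short-period detail (8 points).
original = np.sin(2 * np.pi * t / 32) + 0.3 * np.sin(2 * np.pi * t / 8)
# A "reconstruction" that keeps the long period but loses the detail.
reconstructed = np.sin(2 * np.pi * t / 32)

orig_amp = amplitude_spectrum(original)
recon_amp = amplitude_spectrum(reconstructed)

# Bin 16 (= 128 / 8) carries the short-period component; its amplitude
# collapses in the reconstruction -- the "missing frequency" symptom.
print(orig_amp[16], recon_amp[16])
```

A detector built on such a reconstruction will smooth over exactly the detailed trends that Challenge 2 describes.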
Challenge 3: A large number of sub-frequencies make the signal in the condition of CVAE noisy and difficult to use. Directly converting the entire window into the frequency domain results in numerous sub-frequencies, adding noise and obstructing effective VAE-based reconstruction. To address these challenges, we sub-divide the entire window into smaller ones and propose a target attention method to select the most useful sub-window frequencies.

In this paper, we introduce a novel unsupervised anomaly detection algorithm, named FCVAE (Frequency-enhanced Conditional Variational AutoEncoder). Different from current VAE-based anomaly detection methods, FCVAE innovatively incorporates both global and local frequency information to guide the encoding-decoding procedures, so that both heterogeneous periodic and detailed trend patterns can be effectively captured. This in turn enables more accurate anomaly detection. Our paper's contributions can be summarized as follows:

• Our analysis of the widely-used VAE model for anomaly detection reveals that existing VAE-based models fail to capture both heterogeneous periodic patterns and detailed trend patterns. We attribute this failure to the absence of some frequency-domain information, which current methods fail to reconstruct.
• Our study systematically improves the long-standing VAE by focusing on frequency. Our proposed FCVAE makes the VAE-based approach the state of the art in anomaly detection once more. This is significant because VAE-based methods can inherently handle mixed anomaly-normal training data, while prediction-based methods cannot.
• Evaluations demonstrate that our FCVAE substantially surpasses state-of-the-art methods (∼40% on public datasets and 10% in a real-world web system in terms of F1 score). Comprehensive ablation studies provide an in-depth analysis of the model, revealing the reasons behind its superior performance.

The replication package for this paper, including all our data, source code, and documentation, is publicly available online at https://github.com/CSTCloudOps/FCVAE.

2 PRELIMINARIES
2.1 Problem Statement
Given a UTS x = [x_0, x_1, x_2, · · · , x_t] and a label series L = [l_0, l_1, l_2, · · · , l_t], where x_i ∈ R, l_i ∈ {0, 1}, and t ∈ N, x represents the entire time series data array, x_i signifies the metric value at time i, and L denotes the labels of time series x. We define the UTS anomaly detection task as follows: given a UTS x = [x_0, x_1, x_2, · · · , x_t], the objective of UTS anomaly detection is to utilize the data [x_0, x_1, · · · , x_{i−1}] preceding each point x_i to predict l_i.

2.2 VAEs and CVAEs
VAE is composed of an encoder q_φ(z|x) and a decoder p_θ(x|z). VAE can be trained using the reparameterization trick. SGVB [39] is a commonly used training method for VAE because of its simplicity and effectiveness. It maximizes the evidence lower bound (ELBO) to simultaneously train the reconstruction and generation capabilities of VAE.

DONUT [50] proposed the modified ELBO (M-ELBO) to weaken the impact of abnormal and missing data in the window on the reconstruction. M-ELBO is defined in (1), where α_w is an indicator: α_w = 1 indicates that x_w is not anomalous or missing, and α_w = 0 otherwise. β is defined as (Σ_{w=1}^{W} α_w) / W.

L = E_{q_φ(z|x)} [ Σ_{w=1}^{W} α_w log p_θ(x_w|z) + β log p_θ(z) − log q_φ(z|x) ]    (1)

The overall structure of CVAE [43] is similar to that of VAE; it combines conditional generative models with VAE to achieve stronger control over the generated data. The training objective of CVAE, similar to that of VAE, is defined in (2), where c is the condition. FCVAE, which will be elaborated on later, extends the CVAE framework by incorporating frequency information.

L = E_{q_φ(z|x,c)} [ log p_θ(x|z, c) + log p_θ(z) − log q_φ(z|x, c) ]    (2)

3 METHODOLOGY
3.1 Framework Overview
The proposed algorithm for anomaly detection is illustrated in Figure 3 and comprises three main components: data preprocessing, training, and testing.

Figure 3: Overall Framework.

3.2 Data Preprocessing
Data preprocessing encompasses standardization, filling missing and anomaly points, and the newly introduced method of data augmentation. The efficacy of data standardization and of filling missing and anomaly points has been substantiated in prior studies [27, 30, 50]. Therefore, we directly incorporate these techniques into our approach.

Previous data augmentation methods [26, 47, 53] often added normal samples, such as variations of data from the time domain or frequency domain. However, for our method, we train the model by incorporating all the time series from the dataset together, which provides sufficient pattern diversity. Furthermore, FCVAE has the ability to extract pattern information due to the addition of frequency information, so it can handle new patterns well. Nonetheless, even with the introduction of frequency information, anomalies are often challenging to address effectively. For the model to learn how to handle anomalies, we primarily focus on abnormal data augmentation.
In time series data, anomalies are mostly manifested as pattern mutations or value mutations (shown in Figure 6), so our data augmentation mainly targets these two aspects. The augmentation for pattern mutation is generated by combining two windows from different curves, with the junction acting as the anomaly. Value mutation refers to changing some points in the window to randomly assigned abnormal values. With the augmented anomaly data, M-ELBO in CVAE, which will be introduced in detail later, can perform well even in an unsupervised setting without true labels.

Figure 6: Examples of the two most frequent anomalies, where the red shaded area denotes the abnormal segments. (a) Pattern anomaly; (b) value anomaly.

3.3 Network Architecture
The proposed FCVAE model is illustrated in Figure 4. It comprises three main components: an encoder, a decoder, and a condition extraction block that includes a global frequency information extraction module (GFM) and a local frequency information extraction module (LFM). Equation (3) illustrates how our model works.

Figure 4: FCVAE Model Architecture.

μ, σ = Encoder(x, LFM(x), GFM(x))
z = Sample(μ, σ)    (3)
μ_x, σ_x = Decoder(z, LFM(x), GFM(x))

3.3.1 GFM. The GFM module (Figure 7) extracts the global frequency information using the FFT transformation (F). However, not all frequency information is useful. The frequencies resulting from the noise and anomalies in the time series data appear as long tails in the frequency domain. Therefore, we employ a linear layer after the FFT to filter out the useful frequency information that can represent the current window pattern. Moreover, we incorporate a dropout layer following FEDformer [60] to enhance the model's ability to learn the missing frequency information.

Figure 7: Architecture of GFM.

f_global ∈ R^{1×d} is calculated as in (4), where d is the embedding dimension of the global frequency information and F means FFT.

f_global = Dropout(Dense(F(x)))    (4)

3.3.2 LFM. The attention mechanism [46] has been widely adopted in time series data processing due to its ability to dynamically process dependencies between different time steps and focus on important ones. Target attention, which is developed based on attention, is widely used in the field of recommendation [4]. Specifically, target attention can weigh the features of the target domain, leading to more accurate domain adaptation.

Figure 5: Architecture of LFM.
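For reference, the GFM computation of (4) can be sketched in numpy as follows. This is a sketch under our assumptions: the use of the rFFT amplitude spectrum, the layer sizes, and the dropout rate are illustrative choices, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gfm(x, weight, bias, drop_p=0.1, training=True):
    """Eq. (4): f_global = Dropout(Dense(F(x))).

    x      : 1-D window of length W
    weight : (d, W // 2 + 1) dense layer over the rFFT amplitude bins
    """
    amp = np.abs(np.fft.rfft(x))            # F(x): frequency amplitudes
    f_global = weight @ amp + bias          # Dense: keep useful frequencies
    if training:                            # inverted dropout
        keep = rng.random(f_global.shape) >= drop_p
        f_global = f_global * keep / (1.0 - drop_p)
    return f_global

# A d = 8-dimensional global frequency condition for a 64-point window.
W, d = 64, 8
weight = rng.normal(scale=0.1, size=(d, W // 2 + 1))
f = gfm(np.sin(2 * np.pi * np.arange(W) / 16), weight, np.zeros(d))
print(f.shape)  # (8,)
```

The learned `weight` plays the role of the frequency-selection linear layer; dropout randomly removes frequency features so the model learns to cope with missing frequency information.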
The GFM module extracts the frequency information from the entire window, proving to be effective in reconstructing the data within the whole window. However, we use a window to detect whether the last point is abnormal, which poses a challenge because the GFM module does not provide sufficient attention to the last point. This can result in a situation where the reconstruction is satisfactory for part of the window but not for another part, especially when changes in system services lead to concept drift in the time series data. Even in the absence of concept drift, GFM cannot capture local changes, as it extracts the average frequency information from the entire window; hence, the reconstruction of the last key point may be unsatisfactory. Nonetheless, as previously mentioned, target attention can effectively address this issue, as it captures the frequency information of the entire window while paying greater attention to the latest time point. Therefore, we propose the LFM, which incorporates the target attention.

As depicted in Figure 5, the LFM module operates by sliding over the entire window x to obtain several small windows x_sw. Subsequently, FFT and frequency information extraction are applied to each small window. The most recent small window is used as the query Q because it contains the last point that we want to detect. The remaining small windows are utilized as keys K and values V for target attention. Finally, a linear layer is employed to facilitate the model in learning to extract the most important and useful part of the local frequency information, and dropout is also applied to enhance the model's ability to reconstruct some of the local frequency information, as in GFM.

x_sw = SlidingWindow(x)
Q = Select(Dense(F(x_sw)))    (5)
K, V = Dense(F(x_sw))
f_local = Dropout(FeedForward(σ(Q · K^⊤) · V))

The calculation of f_local ∈ R^{1×d} in LFM is given by (5), where d is the embedding dimension of the local frequency information, which is the same as that of GFM. Here, x_sw ∈ R^{n×k} represents the group of small windows extracted from the original window, where k is the dimension of the small windows and n is the number of small windows. The Select function is employed to select the latest window as the query Q, and the Dense function denotes a dense neural network. The softmax function σ is used to calculate the attention weights for the small windows.

3.4 Training and Testing
The training process of FCVAE incorporates three key techniques: the CVAE-based modified evidence lower bound (CM-ELBO), missing data injection, and the newly proposed masking of the last point. As shown in (6), CM-ELBO is obtained by applying M-ELBO [50] to CVAE. Missing data injection [30, 50] is a commonly used technique in VAE that we directly apply. We observed that an anomalous point in time series data manifests as an outlier value in the time domain. However, when the data are transformed into the frequency domain, all frequency information is shifted, leading to a challenge. The impact of this issue is amplified when the last point is abnormal, as we specifically aim to detect the last point given the whole window. While we use the frequency enhancement method and frequency selection to mitigate this problem to some extent, we further mask the last point as zero during the extraction of the frequency condition to address this issue.

L = E_{q_φ(z|x,c)} [ Σ_{w=1}^{W} α_w log p_θ(x_w|z, c) + β log p_θ(z) − log q_φ(z|x, c) ]    (6)

While testing, FCVAE adopts the Markov Chain Monte Carlo (MCMC)-based missing imputation algorithm proposed in [39] and applied in [30] to mitigate the impact of missing data. Since our goal is to detect the last point of a window, the last point is set to missing for MCMC to obtain a normal value. This also allows for a better adaptation to the last-point mask mentioned earlier. FCVAE further utilizes reconstruction probabilities as anomaly scores, which are defined in (7).

AnomalyScore = −E_{q_φ(z|x,c)} [ log p_θ(x|z, c) ]    (7)

4 EXPERIMENTS
4.1 Experiment Settings
4.1.1 Datasets. To evaluate the effectiveness of our proposed algorithm, we conducted experiments on four datasets. Yahoo [2] is an open dataset for anomaly detection released by Yahoo Labs. KPI [28] is collected from five large Internet companies (Sogou, eBay, Baidu, Tencent, and Ali). WSD [1], the Web service dataset, contains real-world KPIs collected from three top-tier Internet companies providing large-scale Web services: Baidu, Sogou, and eBay. NAB [25], the Numenta Anomaly Benchmark, is an open dataset created by the Numenta company for evaluating the performance of time series anomaly detection algorithms.

4.1.2 Baseline Methods. To benchmark our model FCVAE against existing methods, we chose the following approaches for evaluation: SPOT [42], SRCNN [38], TFAD [53], DONUT [50], Informer [59], Anomaly-Transformer [51], AnoTransfer [54], and VQRAE [22]. SPOT represents a traditional statistical method rooted in extreme value theory. SRCNN and TFAD are supervised methods relying on high-quality labels. DONUT, VQRAE, and AnoTransfer are unsupervised reconstruction-based methods utilizing VAE for normal value reconstruction. Informer is an unsupervised prediction-based method that endeavors to predict normal values using an attention mechanism. Anomaly-Transformer is an unsupervised anomaly detection method leveraging the transformer architecture.

4.1.3 Evaluation Metrics. In practical applications, operators tend to be less concerned with point-wise anomaly detection, i.e., whether each individual point is classified as anomalous or not, and focus more on detecting continuous anomalous segments in time series data. Moreover, due to the substantial impact of anomalous segments, operators aim to identify such segments as early as possible. To address these requirements, we adopt two metrics, best F1 and delay F1, which are based on the works of DONUT [50] and SRCNN [38], respectively.

Figure 8: Illustration of the adjustment strategy.

Best F1 is obtained by traversing all possible thresholds for anomaly scores and subsequently applying a point adjustment strategy to the prediction in order to compute the F1 score. Delay F1 is similar to best F1 but employs a delay point adjustment strategy to transform the prediction. The adjustment strategies are illustrated in Figure 8, with the delay set to 1 as an example. The detector misses the second anomalous segment because it takes two time intervals to detect this segment, exceeding the maximum delay threshold we established.
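Our reading of the delay point-adjustment strategy illustrated in Figure 8 can be sketched as follows (a hypothetical helper, not the benchmark code): a ground-truth anomaly segment is fully credited only if some alert fires within `delay` points of its onset, and is fully zeroed otherwise.

```python
import numpy as np

def delay_adjust(pred, label, delay):
    """Delay point adjustment of point-wise predictions `pred`
    against ground-truth `label` (both 0/1 arrays)."""
    pred = np.asarray(pred).copy()
    label = np.asarray(label)
    i = 0
    while i < len(label):
        if label[i] == 1:
            j = i
            while j < len(label) and label[j] == 1:
                j += 1                                   # segment is [i, j)
            hit = pred[i:min(i + delay + 1, j)].any()    # alert within delay?
            pred[i:j] = 1 if hit else 0
            i = j
        else:
            i += 1
    return pred

label = np.array([0, 1, 1, 1, 0, 1, 1, 0])   # two true segments
pred  = np.array([0, 0, 1, 0, 0, 0, 0, 1])   # raw point-wise alerts
print(delay_adjust(pred, label, delay=1))    # → [0 1 1 1 0 0 0 1]
```

With delay = 1, the first segment is detected at offset 1 and fully credited, while the second segment's first alert never arrives in time, so it is zeroed out, matching the missed segment in Figure 8. False positives outside true segments are left untouched.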
Table 1: Performance on test data. P = precision, R = recall, F1 = best F1, F1* = delay F1. For each dataset, the first three columns report P, R, and best F1; the next three report P, R, and delay F1.

Method | Yahoo (P R F1 | P R F1*) | KPI (P R F1 | P R F1*) | WSD (P R F1 | P R F1*) | NAB (P R F1 | P R F1*)
SPOT[42] 0.572 0.328 0.417 0.572 0.328 0.417 0.966 0.221 0.360 0.911 0.077 0.143 0.947 0.315 0.472 0.887 0.137 0.237 0.992 0.713 0.829 0.992 0.713 0.829
SRCNN[38] 0.268 0.236 0.251 0.219 0.181 0.198 0.673 0.944 0.786 0.617 0.753 0.678 0.093 0.903 0.170 0.028 0.361 0.053 0.825 0.832 0.828 0.460 0.766 0.575
DONUT[50] 0.381 0.150 0.215 0.381 0.150 0.215 0.378 0.569 0.454 0.328 0.407 0.364 0.263 0.195 0.224 0.199 0.131 0.158 0.933 0.937 0.935 0.821 0.774 0.797
VQRAE[22] 0.706 0.399 0.510 0.691 0.381 0.492 0.202 0.418 0.272 0.167 0.117 0.137 0.518 0.233 0.312 0.241 0.066 0.103 0.990 0.882 0.933 0.806 1.000 0.893
Anotransfer[54] 0.902 0.413 0.567 0.575 0.437 0.496 0.815 0.591 0.685 0.557 0.394 0.461 0.695 0.654 0.674 0.331 0.444 0.379 0.962 0.968 0.965 0.837 0.908 0.871
Informer[59] 0.747 0.671 0.707 0.731 0.619 0.671 0.927 0.910 0.918 0.801 0.845 0.822 0.532 0.583 0.557 0.402 0.385 0.393 0.971 0.974 0.973 0.878 0.907 0.892
TFAD[53] 0.884 0.739 0.805 0.883 0.734 0.802 0.684 0.834 0.752 0.650 0.714 0.680 0.541 0.750 0.628 0.431 0.482 0.455 0.749 0.719 0.734 0.265 0.233 0.248
Anomaly-Transformer[51] 0.588 0.179 0.274 0.054 0.020 0.029 0.930 0.814 0.868 0.622 0.240 0.346 0.861 0.630 0.728 0.144 0.129 0.137 0.944 1.000 0.971 0.891 0.932 0.911
FCVAE 0.897 0.821 0.857 0.897 0.792 0.842 0.930 0.924 0.927 0.906 0.772 0.835 0.786 0.881 0.831 0.705 0.571 0.631 0.953 1.000 0.976 0.925 0.909 0.917

We configure the delay for all datasets to be 7, except for Yahoo, where it is set to 3, and NAB, where it is set to 150. This is because the anomaly segments in Yahoo are very short, while in NAB they are typically much longer, often spanning several hundred data points.

4.1.4 Implementation Details. To guarantee widespread applicability, all the experiments described below were conducted under entirely unsupervised conditions, without employing any actual labels (all labels are set to zero). For consistency across all methods, we trained a single model for all curves within a dataset. Regarding hyperparameters, we conducted a grid search to identify the most effective parameters for different datasets. Additionally, we later evaluated the sensitivity of these parameters to ensure robust performance.

4.2 Overall Performance
The performance of FCVAE and the baseline methods across the four datasets is depicted in Table 1. Our method surpasses all baselines on the four datasets regarding best F1 by 6.45%, 0.98%, 14.14%, and 0.31%. In terms of delay F1, our method outperforms all baselines on the four datasets by 4.98%, 1.58%, 38.68%, and 0.65%.

The performance of the various baseline methods on the datasets exhibits considerable variation. For instance, SPOT [42] does not excel on most datasets, as it erroneously treats extreme values as anomalies, whereas anomalies are not always manifested as such. SRCNN [38] is a reasonably proficient classifier, yet its performance falls short compared to most other models. This underscores the fact that implicitly extracting abnormal features is challenging. Informer [59] outperforms most other baselines across different datasets, as many anomalies exhibit notable value jumps, and prediction-based methods can effectively manage this situation. However, it struggles with anomalies induced by frequency changes. Anomaly-Transformer [51] attains commendable results on most datasets in terms of best F1 but demonstrates a low delay F1. It detects anomalies based on the relationships with nearby points, and only when the anomalous point is relatively central within the window can it easily capture the correlation. Conversely, TFAD [53] achieves favorable results on various datasets but exhibits a certain delay in detection.

Moreover, our method surpasses DONUT [50] and VQRAE [22] in terms of reconstruction-based methods. Although VQRAE [22] introduces numerous modifications to the VAE, employing an RNN to capture temporal relationships, our method still outperforms it. This finding implies that for UTS anomaly detection, it is imperative to incorporate only key information while avoiding overloading the model with superfluous data.

4.3 Different Types of Conditions in CVAE
We conduct experiments under identical settings to evaluate different types of conditions. The chosen conditions encompass information potentially useful for time series anomaly detection within the scope of our understanding, including timestamps [54], time domain information, and frequency domain information. To ensure consistency, we apply the same operation on the time domain information as we do on the frequency domain information.

As illustrated in Figure 9(a), the performance of employing the frequency information as a condition surpasses that of using the timestamp or time domain information. This can be readily comprehended, as timestamps carry limited information and typically require one-hot encoding, resulting in sparse data representation. Time domain information is already incorporated in VAE, and utilizing it as a condition may lead to redundant information without significantly benefiting the reconstruction. Conversely, frequency information is a valuable and complementary prior, rendering it a more effective condition for anomaly detection.

4.4 Frequency VAE and FCVAE
Is CVAE the optimal strategy for harnessing the frequency information in anomaly detection? In this study, we compare FCVAE with an improved frequency-based VAE (FVAE) model, in which the frequency information is integrated into VAE along with the input to reconstruct the original time series. As depicted in Figure 9(b), FCVAE surpasses FVAE. This outcome can be attributed to two primary reasons. Firstly, CVAE, due to its unique architecture that incorporates conditional information, intrinsically outperforms VAE in numerous applications. Secondly, FVAE does not fully exploit frequency information. Although it incorporates this additional information, it still lacks efficient utilization in practice, particularly in the decoder. Consequently, the CVAE that incorporates the frequency information as a condition represents the most effective structure known to date.

4.5 GFM and LFM
We propose GFM and LFM to extract global and local frequency information, respectively. However, do these two modules achieve our intended effects through their designs? Additionally, it is worth noting that GFM and LFM may overlap to some degree. Thus, we would like to determine whether combining the two can further enhance the performance.
Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective WWW ’24, May 13–17, 2024, Singapore

(a) Performance of CVAE using different (b) Performance of different ways using (c) Performance of different model struc- (d) Performance of whether using atten-
conditions. frequency information. ture. tion mechanism.

Figure 9: Delay F1 score of different settings.

either LFM or GFM in FCVAE outperforms the VAE model under identical conditions, with the exception of NAB, where the frequent oscillation of the data causes an inconsistency between the information extracted by GFM and the data value at the current time. For all datasets, when both the LFM and GFM modules are utilized concurrently, they synergistically enhance each other, resulting in superior performance. Consequently, both global and local frequency information play a crucial role in detecting anomalies.

4.6 Attention Mechanism
It is crucial to discern whether the enhancement in LFM stems from the reduced window size or the attention mechanism. Thus, we perform experiments that exclude the attention operation from LFM while keeping GFM unaltered. Specifically, we utilize frequency information either from the latest small window in LFM (Latest) or from the average pooling of frequency information across all small windows in LFM (Average Pooling).

The findings in Figure 9(d) demonstrate that without attention, it is impossible to attain the original performance of FCVAE, since the specific weight of each small window cannot be determined in advance. The attention mechanism effectively addresses this issue by assigning higher weights to more informative windows.

We present a comprehensive explanation of the attention mechanism in LFM using a case. A specific data segment, denoted by the black dashed box in Figure 10(b), is selected, and all small windows produced by LFM's sliding window module are transformed into the frequency domain to obtain their spectra. As illustrated in Figure 10(a), the 5-th (green) and the 8-th (red) windows exhibit the highest similarity, where the 8-th window serves as the query Q for attention. Upon examining Figure 10(b), it can be observed that the heat value of the 5-th window is the highest, which corresponds with the findings in Figure 10(a).

4.7 Key Techniques in Framework
In this section, we evaluate the effectiveness of our novel data augmentation technique, masking the last point, and the application of CM-ELBO on four distinct datasets. The results are presented in Table 2. Based on the results, it is clear that CM-ELBO plays the most crucial role on most datasets, which aligns with our expectations, because it can tolerate abnormal or missing data to a certain extent. Furthermore, masking the last point has a substantial impact on the results: when an anomaly occurs at the last point of the window, it affects the entire frequency information, and effectively masking this point resolves the issue and improves the detection accuracy. Data augmentation, on the other hand, introduces some artificial anomalies to boost the performance of CM-ELBO, particularly in unsupervised settings.

Table 2: Delay F1 of different settings.

Variants              Yahoo   KPI     WSD     NAB
w/o data augment      0.841   0.825   0.626   0.904
w/o mask last point   0.835   0.830   0.534   0.877
w/o CM-ELBO           0.690   0.757   0.435   0.897
FCVAE                 0.842   0.835   0.631   0.917

Figure 10: An example of the attention mechanism in LFM. (a) Spectrum of small windows for the data in the black dashed box on the right. (b) Heatmap of LFM attention in a batch; the 8-th window is the latest window.

4.8 Parameter Sensitivity
The stability of a model to different parameters is an important aspect to consider, and therefore we test the sensitivity of our model parameters on two datasets, KPI and WSD. We examine four aspects: the dimension of the condition, the window size, the proportion of missing data injection, and the proportion of data augmentation.
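The target attention of Section 4.6, in which the latest small window's spectrum serves as the query Q over all small-window spectra, can be sketched as scaled dot-product attention. Omitting learned Q/K/V projections is a simplifying assumption:

```python
import numpy as np

def target_attention(specs: np.ndarray):
    """Pool small-window spectra, weighting each by similarity to the latest one.

    specs: (num_windows, dim) amplitude spectra; the last row is the latest
    small window and acts as the query. No learned projections (assumption).
    """
    q = specs[-1]
    scores = specs @ q / np.sqrt(specs.shape[1])  # scaled dot-product scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # softmax over windows
    return w @ specs, w                           # pooled spectrum, weights

rng = np.random.default_rng(0)
specs = rng.random((8, 17))
specs[4] = specs[-1]  # make the 5-th window identical to the query (cf. Figure 10)
pooled, w = target_attention(specs)
print(pooled.shape, w[4] == w[7])  # (17,) True: identical windows share one weight
```

A window whose spectrum matches the query receives exactly the query's own weight, which mirrors the heatmap behavior described for Figure 10.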
Figure 11: Delay F1 score of different settings. (a) Window Size. (b) Embedding Dimension. (c) Missing Data Injection Rate. (d) Data Augment Rate.
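The last-point masking evaluated in Section 4.7 can be sketched as below. Replacing the last point with its predecessor is an illustrative assumption; any imputation that hides the candidate point from the spectrum serves the same purpose:

```python
import numpy as np

def freq_condition(window: np.ndarray, mask_last: bool) -> np.ndarray:
    """Amplitude spectrum of a window, optionally masking the last point."""
    w = window.copy()
    if mask_last:
        w[-1] = w[-2]  # hide the point currently being detected (assumed rule)
    return np.abs(np.fft.rfft(w))

t = np.arange(64, dtype=float)
clean = np.sin(2 * np.pi * t / 16)
spiky = clean.copy()
spiky[-1] += 10.0  # anomaly at the very last point of the window

err_unmasked = np.abs(freq_condition(spiky, False) - freq_condition(clean, False)).max()
err_masked = np.abs(freq_condition(spiky, True) - freq_condition(clean, True)).max()
print(err_masked < err_unmasked)  # True: masking keeps the spectrum uncontaminated
```

Without masking, the single spike leaks into every frequency bin of the condition; with masking, the spiky and clean windows yield the same spectrum.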

The results, shown in Figure 11, indicate that our model can achieve stable and excellent results under different parameter settings.

5 PRODUCTION IMPACT AND EFFICIENCY
Our FCVAE approach has been incorporated as a crucial component in a large-scale cloud system that caters to millions of users globally [6, 19, 20]. The system generates billions of time series data points on a daily basis. FCVAE detects anomalies in the cloud system, with the primary goal of identifying any potential regressions in the system that may indicate the occurrence of an incident.

Table 3: Online performance of FCVAE in production compared to the legacy detector. F1 and F1∗ are defined in Table 1.

Baseline        FCVAE           Improvement     Inference efficiency
F1      F1∗     F1      F1∗     F1      F1∗     [points/second]
0.66    0.63    0.73    0.69    10.9%   11.1%   1195.7

Table 3 presents the online performance improvement achieved by employing FCVAE over a period of one year. The experiments were conducted on an NVIDIA RTX 3090 GPU with 24 GB of memory. The results demonstrate substantial enhancements in both Best F1 and Delay F1 compared to the legacy detector, which underscores the effectiveness and robustness of our proposed method. Furthermore, our model is lightweight and highly efficient, capable of processing over 1000 data points within 1 second. This far exceeds the speed at which the system generates new temporal points.

6 RELATED WORK
Traditional statistical methods [32, 35, 37, 40, 44, 45, 56] are widely used in time series anomaly detection because of their great advantages in processing time series data. For example, [37] finds the high-frequency abnormal part of the data through the FFT [45] and verifies it twice. Twitter [44] uses STL [8] to detect anomalous points. SPOT [42] considers some extreme values to be abnormal and therefore detects them through Extreme Value Theory [10].

Supervised methods [24, 31, 38, 57] mostly learn the features of anomalies and identify them through classifiers based on the learned features. Opprentice [31] efficiently combines the results of many detectors through a random forest. SRCNN [38] builds a classifier through spectral residual [15] and a CNN. Some methods [3, 53] obtain pseudo-labels through data augmentation to enhance the learning ability.

Unsupervised methods are mainly divided into reconstruction-based and prediction-based methods. Reconstruction-based methods [5, 22, 27, 29, 50] learn low-dimensional representations, reconstruct the "normal patterns" of the data, and detect anomalies according to the reconstruction error. DONUT [50] proposed the modified ELBO to enhance the capability of the VAE in reconstructing normal data. Buzz [5] is the first to propose a deep generative model. ACVAE [29] adds active learning and contrastive learning on the basis of the VAE. Prediction-based methods [18, 59] try to predict the normal values of metrics based on historical data and detect anomalies according to the prediction error. Informer [59] changes the self-attention mechanism to achieve better prediction quality and efficiency. In recent years, transformer-based methods have been widely proposed; Anomaly Transformer [51] detects anomalies by comparing the Kullback-Leibler (KL) divergence between two distributions. Some methods [48, 60] have begun to solve practical problems from the frequency domain. Moreover, many transfer learning methods have been proposed [12, 27, 54, 55].

7 CONCLUSION
Our paper presents a novel unsupervised method for detecting anomalies in UTS, termed FCVAE. At the model level, we introduce frequency domain information as a condition to work with CVAE. To capture the frequency information more accurately, we propose utilizing both GFM and LFM to concurrently capture features from the global and local frequency domains, and employing target attention to more effectively extract local information. At the architecture level, we propose several new techniques, including CM-ELBO, data augmentation, and masking the last point. We carry out experiments on four datasets and an online cloud system to evaluate our approach's accuracy, and comprehensive ablation experiments to demonstrate the effectiveness of each module.

8 ACKNOWLEDGMENTS
This work was supported in part by the National Key Research and Development Program of China (No. 2021YFE0111500), in part by the National Natural Science Foundation of China (No. 62202445), and in part by the State Key Program of National Natural Science of China under Grant 62072264.
REFERENCES
[1] [n. d.]. WSD dataset. Available: https://github.com/anotransfer/AnoTransfer-data/.
[2] [n. d.]. Yahoo dataset. Available: https://webscope.sandbox.yahoo.com/.
[3] Chris U Carmona, François-Xavier Aubet, Valentin Flunkert, and Jan Gasthaus. 2021. Neural contextual anomaly detection for time series. arXiv preprint arXiv:2107.07702 (2021).
[4] Qiwei Chen, Changhua Pei, Shanshan Lv, Chao Li, Junfeng Ge, and Wenwu Ou. 2021. End-to-end user behavior retrieval in click-through rate prediction model. arXiv preprint arXiv:2108.04468 (2021).
[5] Wenxiao Chen, Haowen Xu, Zeyan Li, Dan Pei, Jie Chen, Honglin Qiao, Yang Feng, and Zhaogang Wang. 2019. Unsupervised anomaly detection for intricate kpis via adversarial training of vae. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 1891–1899.
[6] Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, et al. 2024. Automatic Root Cause Analysis via Large Language Models for Cloud Incidents. (2024).
[7] Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, Qingwei Lin, and Dongmei Zhang. 2023. Imdiffusion: Imputed diffusion models for multivariate time series anomaly detection. VLDB (2023).
[8] Robert B Cleveland, William S Cleveland, Jean E McRae, and Irma Terpenning. 1990. STL: A seasonal-trend decomposition. J. Off. Stat 6, 1 (1990), 3–73.
[9] Liang Dai, Tao Lin, Chang Liu, Bo Jiang, Yanwei Liu, Zhen Xu, and Zhi-Li Zhang. 2021. SDFVAE: Static and Dynamic Factorized VAE for Anomaly Detection of Multivariate CDN KPIs. In Proceedings of the Web Conference 2021 (Ljubljana, Slovenia) (WWW '21). Association for Computing Machinery, New York, NY, USA, 3076–3086. https://doi.org/10.1145/3442381.3450013
[10] L de Haan and A Ferreira. 2006. Extreme Value Theory: An Introduction. Springer Science+Business Media, LLC, New York (2006).
[11] Shohreh Deldari, Daniel V. Smith, Hao Xue, and Flora D. Salim. 2021. Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding. In Proceedings of the Web Conference 2021 (Ljubljana, Slovenia) (WWW '21). Association for Computing Machinery, New York, NY, USA, 3124–3135. https://doi.org/10.1145/3442381.3449903
[12] XiaoYan Duan, NingJiang Chen, and YongSheng Xie. 2019. Intelligent detection of large-scale KPI streams anomaly based on transfer learning. In Big Data: 7th CCF Conference, BigData 2019, Wuhan, China, September 26–28, 2019, Proceedings 7. Springer, 366–379.
[13] Vaibhav Ganatra, Anjaly Parayil, Supriyo Ghosh, Yu Kang, Minghua Ma, Chetan Bansal, Suman Nath, and Jonathan Mace. 2023. Detection Is Better Than Cure: A Cloud Incidents Perspective. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1891–1902.
[14] Nikou Günnemann, Stephan Günnemann, and Christos Faloutsos. 2014. Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings. In Proceedings of the 23rd International Conference on World Wide Web (Seoul, Korea) (WWW '14). Association for Computing Machinery, New York, NY, USA, 361–372. https://doi.org/10.1145/2566486.2568008
[15] Xiaodi Hou and Liqing Zhang. 2007. Saliency detection: A spectral residual approach. In 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
[16] Siteng Huang, Donglin Wang, Xuehan Wu, and Ao Tang. 2019. Dsanet: Dual self-attention network for multivariate time series forecasting. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2129–2132.
[17] Tao Huang, Pengfei Chen, and Ruipeng Li. 2022. A Semi-Supervised VAE Based Active Anomaly Detection Framework in Multivariate Time Series for Online Systems. In Proceedings of the ACM Web Conference 2022 (Virtual Event, Lyon, France) (WWW '22). Association for Computing Machinery, New York, NY, USA, 1797–1806. https://doi.org/10.1145/3485447.3511984
[18] Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. 2018. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 387–395.
[19] Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, et al. 2024. Xpert: Empowering Incident Management with Query Recommendations via Large Language Models. (2024).
[20] Pengxiang Jin, Shenglin Zhang, Minghua Ma, Haozhe Li, Yu Kang, Liqun Li, Yudong Liu, Bo Qiao, Chaoyun Zhang, Pu Zhao, et al. 2023. Assess and Summarize: Improve Outage Understanding with Large Language Models. (2023).
[21] Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodriguez, Chao Zhang, and B Aditya Prakash. 2022. CAMul: Calibrated and Accurate Multi-View Time-Series Forecasting. In Proceedings of the ACM Web Conference 2022 (Virtual Event, Lyon, France) (WWW '22). Association for Computing Machinery, New York, NY, USA, 3174–3185. https://doi.org/10.1145/3485447.3512037
[22] Tung Kieu, Bin Yang, Chenjuan Guo, Razvan-Gabriel Cirstea, Yan Zhao, Yale Song, and Christian S Jensen. 2022. Anomaly detection in time series with robust variational quasi-recurrent autoencoders. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 1342–1354.
[23] Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
[24] Nikolay Laptev, Saeed Amizadeh, and Ian Flint. 2015. Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1939–1947.
[25] Alexander Lavin and Subutai Ahmad. 2015. Evaluating real-time anomaly detection algorithms – the Numenta anomaly benchmark. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). IEEE, 38–44.
[26] Arthur Le Guennec, Simon Malinowski, and Romain Tavenard. 2016. Data augmentation for time series classification using convolutional neural networks. In ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data.
[27] Zeyan Li, Wenxiao Chen, and Dan Pei. 2018. Robust and unsupervised kpi anomaly detection based on conditional variational autoencoder. In 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 1–9.
[28] Zeyan Li, Nengwen Zhao, Shenglin Zhang, Yongqian Sun, Pengfei Chen, Xidao Wen, Minghua Ma, and Dan Pei. 2022. Constructing Large-Scale Real-World Benchmark Datasets for AIOps. arXiv preprint arXiv:2208.03938 (2022).
[29] Zhihan Li, Youjian Zhao, Yitong Geng, Zhanxiang Zhao, Hanzhang Wang, Wenxiao Chen, Huai Jiang, Amber Vaidya, Liangfei Su, and Dan Pei. 2022. Situation-Aware Multivariate Time Series Anomaly Detection Through Active Learning and Contrast VAE-Based Models in Large Distributed Systems. IEEE Journal on Selected Areas in Communications 40, 9 (2022), 2746–2765.
[30] Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, and Dan Pei. 2021. Multivariate time series anomaly detection and interpretation using hierarchical inter-metric and temporal embedding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3220–3230.
[31] Dapeng Liu, Youjian Zhao, Haowen Xu, Yongqian Sun, Dan Pei, Jiao Luo, Xiaowei Jing, and Mei Feng. 2015. Opprentice: Towards practical and automatic anomaly detection through machine learning. In Proceedings of the 2015 Internet Measurement Conference. 211–224.
[32] Wei Lu and Ali A Ghorbani. 2008. Network anomaly detection based on wavelet analysis. EURASIP Journal on Advances in Signal Processing 2009 (2008), 1–16.
[33] Xiaofeng Lu, Xiaoyu Zhang, and Pietro Lio. 2023. GAT-DNS: DNS Multivariate Time Series Prediction Model Based on Graph Attention Network. In Companion Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW '23 Companion). Association for Computing Machinery, New York, NY, USA, 127–131. https://doi.org/10.1145/3543873.3587329
[34] Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, and Dan Pei. 2021. Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 413–426.
[35] Ajay Mahimkar, Zihui Ge, Jia Wang, Jennifer Yates, Yin Zhang, Joanne Emmons, Brian Huntley, and Mark Stockert. 2011. Rapid detection of maintenance induced changes in service performance. In Proceedings of the Seventh Conference on Emerging Networking Experiments and Technologies. 1–12.
[36] Oded Ovadia, Oren Elisha, and Elad Yom-Tov. 2022. Detection of Infectious Disease Outbreaks in Search Engine Time Series Using Non-Specific Syndromic Surveillance with Effect-Size Filtering. In Companion Proceedings of the Web Conference 2022 (Virtual Event, Lyon, France) (WWW '22). Association for Computing Machinery, New York, NY, USA, 924–929. https://doi.org/10.1145/3487553.3524672
[37] Faraz Rasheed, Peter Peng, Reda Alhajj, and Jon Rokne. 2009. Fourier transform based spatial outlier mining. In Intelligent Data Engineering and Automated Learning - IDEAL 2009: 10th International Conference, Burgos, Spain, September 23-26, 2009. Proceedings 10. Springer, 317–324.
[38] Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and Qi Zhang. 2019. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3009–3017.
[39] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning. PMLR, 1278–1286.
[40] Bernard Rosner. 1983. Percentage points for a generalized ESD many-outlier procedure. Technometrics 25, 2 (1983), 165–172.
[41] Lifeng Shen, Zhuocong Li, and James Kwok. 2020. Timeseries anomaly detection using temporal hierarchical one-class network. Advances in Neural Information Processing Systems 33 (2020), 13016–13026.
[42] Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, and Christine Largouet. 2017. Anomaly detection in streams with extreme value theory. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1067–1075.
[43] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems 28 (2015).
[44] Owen Vallis, Jordan Hochenbaum, and Arun Kejariwal. 2014. A novel technique for long-term anomaly detection in the cloud. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[45] Charles Van Loan. 1992. Computational frameworks for the fast Fourier transform. SIAM.
[46] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[47] Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue Wang, and Huan Xu. 2020. Time series data augmentation for deep learning: A survey. arXiv preprint arXiv:2002.12478 (2020).
[48] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations.
[49] Sihong Xie, Guan Wang, Shuyang Lin, and Philip S. Yu. 2012. Review Spam Detection via Time Series Pattern Discovery. In Proceedings of the 21st International Conference on World Wide Web (Lyon, France) (WWW '12 Companion). Association for Computing Machinery, New York, NY, USA, 635–636. https://doi.org/10.1145/2187980.2188164
[50] Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, et al. 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal kpis in web applications. In Proceedings of the 2018 World Wide Web Conference. 187–196.
[51] Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2022. Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. In International Conference on Learning Representations. https://openreview.net/forum?id=LzQQ89U1qm_
[52] Zhiqiang Xu, Dong Li, Weijie Zhao, Xing Shen, Tianbo Huang, Xiaoyun Li, and Ping Li. 2021. Agile and Accurate CTR Prediction Model Training for Massive-Scale Online Advertising Systems. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD '21). Association for Computing Machinery, New York, NY, USA, 2404–2409. https://doi.org/10.1145/3448016.3457236
[53] Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2022. TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2497–2507.
[54] Shenglin Zhang, Zhenyu Zhong, Dongwen Li, Qiliang Fan, Yongqian Sun, Man Zhu, Yuzhi Zhang, Dan Pei, Jiyan Sun, Yinlong Liu, et al. 2022. Efficient kpi anomaly detection through transfer learning for large-scale web services. IEEE Journal on Selected Areas in Communications 40, 8 (2022), 2440–2455.
[55] Xu Zhang, Qingwei Lin, Yong Xu, Si Qin, Hongyu Zhang, Bo Qiao, Yingnong Dang, Xinsheng Yang, Qian Cheng, Murali Chintalapati, et al. 2019. Cross-dataset Time Series Anomaly Detection for Cloud Systems. In USENIX Annual Technical Conference. 1063–1076.
[56] Yin Zhang, Zihui Ge, Albert Greenberg, and Matthew Roughan. 2005. Network anomography. In Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement. 30–30.
[57] Chenyu Zhao, Minghua Ma, Zhenyu Zhong, Shenglin Zhang, Zhiyuan Tan, Xiao Xiong, LuLu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, et al. 2023. Robust Multimodal Failure Detection for Microservice Systems. In 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2023).
[58] Nengwen Zhao, Jing Zhu, Yao Wang, Minghua Ma, Wenchi Zhang, Dapeng Liu, Ming Zhang, and Dan Pei. 2019. Automatic and generic periodicity adaptation for kpi anomaly detection. IEEE Transactions on Network and Service Management 16, 3 (2019), 1170–1183.
[59] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence. 11106–11115.
[60] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning. PMLR, 27268–27286.
