Revisiting VAE For Unsupervised Time Series Anomaly Detection: A Frequency Perspective
Zexin Wang∗, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Changhua Pei†, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Minghua Ma, Microsoft, Beijing, China
effective VAE-based reconstruction. To address these challenges, we sub-divide the entire window into smaller ones and propose a target attention method to select the most useful sub-window frequencies.

In this paper, we introduce a novel unsupervised anomaly detection algorithm, named FCVAE (Frequency-enhanced Conditional Variational AutoEncoder). Different from current VAE-based anomaly detection methods, FCVAE innovatively incorporates both global and local frequency information to guide the encoding-decoding procedure, so that both heterogeneous periodic patterns and detailed trend patterns can be effectively captured. This in turn enables more accurate anomaly detection. Our paper's contributions can be summarized as follows:
• Our analysis of the widely-used VAE model for anomaly detection reveals that existing VAE-based models fail to capture both heterogeneous periodic patterns and detailed trend patterns. We attribute this failure to the loss of some frequency-domain information, which current methods fail to reconstruct.
• Our study systematically improves the long-standing VAE by focusing on frequency. Our proposed FCVAE makes the VAE-based approach the state of the art in anomaly detection once more. This is significant because VAE-based methods can inherently handle mixed anomaly-normal training data, while prediction-based methods cannot.
• Evaluations demonstrate that our FCVAE substantially surpasses state-of-the-art methods (by ∼40% on public datasets and 10% in a real-world web system in terms of F1 score). Comprehensive ablation studies provide an in-depth analysis of the model, revealing the reasons behind its superior performance.
The replication package for this paper, including all our data, source code, and documentation, is publicly available online at https://github.com/CSTCloudOps/FCVAE.

2 PRELIMINARIES
2.1 Problem Statement
Given a UTS x = [x_0, x_1, x_2, · · · , x_t] and label series L = [l_0, l_1, l_2, · · · , l_t], where x_i ∈ R, l_i ∈ {0, 1}, and t ∈ N, x represents the entire time series data array, while x_i signifies the metric value at time i. L denotes the labels of time series x. We define the UTS anomaly detection task as follows: given a UTS x = [x_0, x_1, x_2, · · · , x_t], the objective of UTS anomaly detection is to utilize the data [x_0, x_1, · · · , x_{i−1}] preceding each point x_i to predict l_i.

2.2 VAEs and CVAEs
VAE is composed of an encoder q_φ(z|x) and a decoder p_θ(x|z). VAE can be trained by using the reparameterization trick. SGVB [39] is a commonly used training method for VAE because of its simplicity and effectiveness. It maximizes the evidence lower bound (ELBO) to simultaneously train the reconstruction and generation capabilities of VAE.

DONUT [50] proposed the modified ELBO (M-ELBO) to weaken the impact of abnormal and missing data in the window on the reconstruction. M-ELBO is defined in (1), where α_w is defined as an indicator: α_w = 1 indicates that x_w is not anomalous or missing, and α_w = 0 otherwise. β is defined as (Σ_{w=1}^{W} α_w)/W.

L = E_{q_φ(z|x)} [ Σ_{w=1}^{W} α_w log p_θ(x_w|z) + β log p_θ(z) − log q_φ(z|x) ]   (1)

The overall structure of CVAE [43] is similar to VAE; it combines conditional generative models with VAE to achieve stronger control over the generated data. The training objective of CVAE is defined as (2), where c is the condition; otherwise it is similar to that of VAE. FCVAE, which will be elaborated on later, extends the CVAE framework by incorporating frequency information.

L = E_{q_φ(z|x,c)} [ log p_θ(x|z, c) + log p_θ(z) − log q_φ(z|x, c) ]   (2)

3 METHODOLOGY
3.1 Framework Overview

Figure 3: Overall Framework.

The proposed algorithm for anomaly detection is illustrated in Figure 3 and comprises three main components: data preprocessing, training, and testing.

3.2 Data Preprocessing
Data preprocessing encompasses standardization, filling missing and anomaly points, and the newly introduced method of data augmentation. The efficacy of data standardization and of filling missing and anomaly points has been substantiated in prior studies [27, 30, 50]. Therefore, we directly incorporate these techniques into our approach.

Previous data augmentation methods [26, 47, 53] often added normal samples, such as variations of data from the time domain or frequency domain. However, for our method, we train the model by incorporating all the time series from the dataset together, which provides sufficient pattern diversity. Furthermore, FCVAE has the ability to extract pattern information due to the addition of frequency information, so it can handle new patterns well. Nonetheless, even with the introduction of frequency information, anomalies are often challenging to address effectively. For the model to learn how to handle anomalies, we primarily focus on abnormal data augmentation. In time series data, anomalies mostly manifest as pattern mutations or value mutations (shown in Figure 6),
WWW ’24, May 13–17, 2024, Singapore Wang et al.
(a) Pattern Anomaly   (b) Value Anomaly

Figure 6: Examples of the two most frequent anomalies, where the red shaded area denotes the abnormal segments.

so our data augmentation mainly targets these two aspects. The augmentation for pattern mutation is generated by combining two windows from different curves, with the junction acting as the anomaly. Value mutation refers to changing some points in the window to randomly assigned abnormal values. With the augmented anomaly data, M-ELBO in CVAE, which will be introduced in detail later, can perform well even in an unsupervised setting without true labels.

3.3 Network Architecture
The proposed FCVAE model is illustrated in Figure 4. It comprises three main components: encoder, decoder, and a condition extraction block that includes a global frequency information extraction module (GFM) and a local frequency information extraction module (LFM). Equation (3) illustrates how our model works.

μ, σ = Encoder(x, LFM(x), GFM(x))
z = Sample(μ, σ)                                   (3)
μ_x, σ_x = Decoder(z, LFM(x), GFM(x))

3.3.1 GFM. The GFM module (Figure 7) extracts the global frequency information using the FFT transformation (F). However, not all frequency information is useful. The frequencies resulting from the noise and anomalies in the time series data appear as long tails in the frequency domain. Therefore, we employ a linear layer after the FFT to filter out the useful frequency information that can represent the current window pattern. Moreover, we incorporate a dropout layer following FEDformer [60] to enhance the model's ability to learn the missing frequency information.

The f_global ∈ R^{1×d} is calculated as (4), where d is the embedding dimension of the global frequency information and F means FFT.

f_global = Dropout(Dense(F(x)))   (4)

3.3.2 LFM. The attention mechanism [46] has been widely adopted in time series data processing due to its ability to dynamically process dependencies between different time steps and focus on important ones. Target attention, which is developed based on attention, is widely used in the field of recommendation [4]. Specifically, target attention can weigh the features of the target domain, leading to more accurate domain adaptation.

The GFM module extracts the frequency information from the entire window, proving to be effective in reconstructing the data within the whole window. However, we use a window to detect whether the last point is abnormal, which poses a challenge because the GFM module does not pay sufficient attention to the last point. This can result in a situation where the reconstruction is satisfactory for part of the window but not for another part, especially when changes in system services lead to concept drift in the time series data. Even in the absence of concept drift, GFM cannot capture local changes, as it extracts the average frequency information from the entire window; hence, the reconstruction of the last, key point may be unsatisfactory. Nonetheless, as previously mentioned, target attention can effectively address this issue, as
Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective WWW ’24, May 13–17, 2024, Singapore
Table 1: Performance on test data. P means precision, R means recall, F1 means best F1 and F1* means delay F1.
is because the anomaly segments in Yahoo are very short, while in NAB they are typically much longer, often spanning several hundred data points.

4.1.4 Implementation Details. To guarantee widespread applicability, all the experiments described below were conducted under entirely unsupervised conditions, without employing any actual labels (all labels are set to zero). For consistency across all methods, we trained a single model for all curves within a dataset. Regarding hyperparameters, we conducted a grid search to identify the most effective parameters for different datasets. Additionally, we later evaluated the sensitivity of these parameters to ensure robust performance.

4.2 Overall Performance
The performance of FCVAE and baseline methods across the four datasets is depicted in Table 1. Our method surpasses all baselines on the four datasets regarding best F1 by 6.45%, 0.98%, 14.14% and 0.31%. In terms of delay F1, our method outperforms all baselines on the four datasets by 4.98%, 1.58%, 38.68% and 0.65%.

The performance of various baseline methods on the datasets exhibits considerable variation. For instance, SPOT [42] does not excel on most datasets, as it erroneously treats extreme values as anomalies, whereas anomalies are not always manifested as such. SRCNN [38] is a reasonably proficient classifier, yet its performance falls short compared to most other models. This underscores the fact that implicitly extracting abnormal features is challenging. Informer [59] outperforms most other baselines across different datasets, as many anomalies exhibit notable value jumps, and prediction-based methods can effectively manage this situation. However, it struggles with anomalies induced by frequency changes. Anomaly-Transformer [51] attains commendable results on most datasets in terms of best F1 but demonstrates a low delay F1. It detects anomalies based on the relationships with nearby points, and only when the anomalous point is relatively central within the window can it easily capture the correlation. Conversely, TFAD [53] achieves favorable results on various datasets but exhibits a certain delay in detection.

Moreover, our method surpasses the reconstruction-based methods DONUT [50] and VQRAE [22]. Although VQRAE [22] introduces numerous modifications to the VAE, employing an RNN to capture temporal relationships, our method still outperforms it. This finding implies that for UTS anomaly detection, it is imperative to incorporate only key information while avoiding overloading the model with superfluous data.

4.3 Different Types of Conditions in CVAE
We conduct experiments under identical settings to evaluate different types of conditions. The chosen conditions encompass information potentially useful for time series anomaly detection within the scope of our understanding, including timestamps [54], time domain information, and frequency domain information. To ensure consistency, we apply the same operation on the time domain information as we do on the frequency domain information.

As illustrated in Figure 9(a), the performance of employing the frequency information as a condition surpasses that of using the timestamp or time domain information. This can be readily comprehended, as timestamps carry limited information and typically require one-hot encoding, resulting in a sparse data representation. Time domain information is already incorporated in VAE, and utilizing it as a condition may lead to redundant information without significantly benefiting the reconstruction. Conversely, frequency information is a valuable and complementary prior, rendering it a more effective condition for anomaly detection.

4.4 Frequency VAE and FCVAE
Is CVAE the optimal strategy for harnessing the frequency information in anomaly detection? In this study, we compare FCVAE with an improved frequency-based VAE (FVAE) model, in which the frequency information is integrated into VAE along with the input to reconstruct the original time series. As depicted in Figure 9(b), FCVAE surpasses FVAE. This outcome can be attributed to two primary reasons. Firstly, CVAE, due to its unique architecture that incorporates conditional information, intrinsically outperforms VAE in numerous applications. Secondly, FVAE does not fully exploit frequency information. Although it incorporates this additional information, it still lacks efficient utilization in practice, particularly in the decoder. Consequently, the CVAE that incorporates the frequency information as a condition represents the most effective structure known to date.

4.5 GFM and LFM
We propose GFM and LFM to extract global and local frequency information, respectively. However, do these two modules achieve our intended effects through their designs? Additionally, it is worth noting that GFM and LFM may overlap to some degree. Thus, we would like to determine if combining the two can further enhance the performance.

We conduct experiments and the results are depicted in Figure 9(c). It can be observed that, across the four datasets, employing
Figure 9: (a) Performance of CVAE using different conditions. (b) Performance of different ways of using frequency information. (c) Performance of different model structures. (d) Performance with and without the attention mechanism.
either LFM or GFM in FCVAE outperforms the VAE model under otherwise identical settings, except on NAB, where the frequent oscillation of the data results in inconsistency between the information extracted by GFM and the data value at the current time. For all datasets, when both LFM and GFM modules are utilized concurrently, they synergistically enhance each other, resulting in superior performance. Consequently, both global and local frequency information play a crucial role in detecting anomalies.

4.6 Attention Mechanism
It is crucial to discern whether the enhancement in LFM stems from the reduced window size or the attention mechanism. Thus, we perform experiments by excluding the attention operation from LFM while keeping GFM unaltered. Specifically, we utilized frequency information either from the latest small window in LFM (Latest) or from the average pooling of frequency information across all small windows in LFM (Average Pooling).

The findings in Figure 9(d) demonstrate that without attention, it is impossible to attain the original performance of FCVAE, since it is not feasible to determine the specific weight of each small window in advance. However, the attention mechanism effectively addresses this issue by assigning higher weights to more informative windows.

We present a comprehensive explanation of the attention mechanism in LFM using a case. A specific data segment, denoted by the black dashed box in Figure 10(b), is selected, and all small windows produced by LFM's sliding window module are transformed into the frequency domain to obtain their spectra. As illustrated in Figure 10(a), the 5th (green) and the 8th (red) windows exhibit the highest similarity, where the 8th window serves as the query Q for our attention. Upon examining Figure 10(b), it can be observed that the heat value of the 5th window is the highest, which corresponds with the findings in Figure 10(a).

4.7 Key Techniques in Framework
In this section, we evaluate the effectiveness of our novel data augmentation technique, masking the last point, and the application of CM-ELBO on four distinct datasets. The results are presented in Table 2. Based on the results, it is clear that CM-ELBO plays the most crucial role on most datasets, which aligns with our expectations. This is because it can tolerate abnormal or missing data to a certain extent. Furthermore, masking the last point has a substantial impact on the results, as when an anomaly occurs at the last point of the window, it affects the entire frequency information. Effectively masking this point resolves the issue and improves the detection accuracy. Data augmentation, on the other hand, introduces some artificial anomalies to boost the performance of CM-ELBO, particularly in unsupervised settings.
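The two augmentation operations evaluated above can be sketched as follows: pattern mutation splices windows from two different curves, with the junction acting as the anomaly, and value mutation overwrites a few points with randomly assigned extreme values. This is a minimal numpy sketch; the function names, window size, and mutation magnitude are illustrative assumptions, not the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def pattern_mutation(window_a, window_b):
    """Splice the head of window_a onto the tail of window_b.

    The junction index marks the injected pattern anomaly."""
    junction = int(rng.integers(1, len(window_a)))  # random split point
    spliced = np.concatenate([window_a[:junction], window_b[junction:]])
    return spliced, junction

def value_mutation(window, n_points=1, scale=5.0):
    """Overwrite a few points with extreme values (value anomalies)."""
    mutated = window.copy()
    idx = rng.choice(len(mutated), size=n_points, replace=False)
    mutated[idx] += scale * mutated.std() * rng.choice([-1.0, 1.0], size=n_points)
    return mutated

# Toy windows taken from two different "curves".
t = np.linspace(0.0, 1.0, 64)
w1 = np.sin(2 * np.pi * 4 * t)
w2 = 0.5 * np.cos(2 * np.pi * 2 * t)

spliced, junction = pattern_mutation(w1, w2)  # pattern anomaly at `junction`
spiked = value_mutation(w1, n_points=2)       # two value anomalies
```

During training, the injected positions can be treated as known-abnormal so that the indicator α_w in the modified ELBO excludes them from the reconstruction term.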
Figure 11: Parameter sensitivity results: (a) Window Size, (b) Embedding Dimension, (c) Missing Data Injection Rate, (d) Data Augment Rate.
The results, shown in Figure 11, indicate that our model can achieve stable and excellent results under different parameter settings.

5 PRODUCTION IMPACT AND EFFICIENCY
Our FCVAE approach has been incorporated as a crucial component in a large-scale cloud system that caters to millions of users globally [6, 19, 20]. The system generates billions of time series data points on a daily basis. FCVAE detects anomalies in the cloud system, with the primary goal of identifying any potential regressions in the system that may indicate the occurrence of an incident.

Table 3: Online performance of FCVAE in production compared to the legacy detector. F1 and F1∗ are defined in Table 1.

  Baseline      FCVAE         Improvement     Inference efficiency
  F1     F1∗    F1     F1∗    F1      F1∗     [points/second]
  0.66   0.63   0.73   0.69   10.9%   11.1%   1195.7

Table 3 presents the online performance improvement achieved by employing FCVAE over a period of one year. The experiments were conducted on a 3090 GPU with 24 GB of memory. The results demonstrate substantial enhancements in both best F1 and delay F1 compared to the legacy detector. This underscores the effectiveness and robustness of our proposed method. Furthermore, our model is lightweight and highly efficient, capable of processing over 1000 data points within 1 second. This far exceeds the speed at which the system generates new temporal points.

6 RELATED WORK
Traditional statistical methods [32, 35, 37, 40, 44, 45, 56] are widely used in time series anomaly detection because of their great advantages in time series data processing. For example, [37] finds the high-frequency abnormal part of the data through the FFT [45] and verifies it twice. Twitter [44] uses STL [8] to detect anomaly points. SPOT [42] considers some extreme values to be abnormal and therefore detects them through Extreme Value Theory [10].

Supervised methods [24, 31, 38, 57] mostly learn the features of anomalies and identify them through classifiers based on the features learned. Opprentice [31] efficiently combines the results of many detectors through a random forest. SRCNN [38] builds a classifier through spectral residual [15] and CNN. Some methods [3, 53] obtain pseudo-labels through data augmentation to enhance the learning ability.

Unsupervised methods are mainly divided into reconstruction-based and prediction-based methods. Reconstruction-based methods [5, 22, 27, 29, 50] learn low-dimensional representations, reconstruct the "normal patterns" of data, and detect anomalies according to the reconstruction error. DONUT [50] proposed the modified ELBO to enhance the capability of VAE in reconstructing normal data. Buzz [5] is the first to propose a deep generative model. ACVAE [29] adds active learning and contrastive learning on the basis of VAE. Prediction-based methods [18, 59] try to predict the normal values of metrics based on historical data and detect anomalies according to the prediction error. Informer [59] changes the relevant mechanism of self-attention to achieve better prediction effectiveness and efficiency. In recent years, transformer-based methods have been widely proposed. Anomaly-Transformer [51] detects anomalies by comparing the Kullback-Leibler (KL) divergence between two distributions. Some methods [48, 60] have begun to solve practical problems from the frequency domain. Moreover, many transfer learning methods have been proposed [12, 27, 54, 55].

7 CONCLUSION
Our paper presents a novel unsupervised method for detecting anomalies in UTS, termed FCVAE. At the model level, we introduce the frequency domain information as a condition to work with CVAE. To capture the frequency information more accurately, we propose utilizing both GFM and LFM to concurrently capture the features from the global and local frequency domains, and employing the target attention to more effectively extract local information. At the architecture level, we propose several new techniques, including CM-ELBO, data augmentation, and masking the last point. We carry out experiments on four datasets and an online cloud system to evaluate our approach's accuracy, and comprehensive ablation experiments to demonstrate the effectiveness of each module.

8 ACKNOWLEDGMENTS
This work was supported in part by the National Key Research and Development Program of China (No. 2021YFE0111500), in part by the National Natural Science Foundation of China (No. 62202445), and in part by the State Key Program of National Natural Science of China under Grant 62072264.
REFERENCES
[1] [n. d.]. WSD dataset. Available: https://github.com/anotransfer/AnoTransfer-data/.
[2] [n. d.]. Yahoo dataset. Available: https://webscope.sandbox.yahoo.com/.
[3] Chris U Carmona, François-Xavier Aubet, Valentin Flunkert, and Jan Gasthaus. 2021. Neural contextual anomaly detection for time series. arXiv preprint arXiv:2107.07702 (2021).
[4] Qiwei Chen, Changhua Pei, Shanshan Lv, Chao Li, Junfeng Ge, and Wenwu Ou. 2021. End-to-end user behavior retrieval in click-through rate prediction model. arXiv preprint arXiv:2108.04468 (2021).
[5] Wenxiao Chen, Haowen Xu, Zeyan Li, Dan Pei, Jie Chen, Honglin Qiao, Yang Feng, and Zhaogang Wang. 2019. Unsupervised anomaly detection for intricate kpis via adversarial training of vae. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 1891–1899.
[6] Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, et al. 2024. Automatic Root Cause Analysis via Large Language Models for Cloud Incidents. (2024).
[7] Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, Qingwei Lin, and Dongmei Zhang. 2023. Imdiffusion: Imputed diffusion models for multivariate time series anomaly detection. VLDB (2023).
[8] Robert B Cleveland, William S Cleveland, Jean E McRae, and Irma Terpenning. 1990. STL: A seasonal-trend decomposition. J. Off. Stat 6, 1 (1990), 3–73.
[9] Liang Dai, Tao Lin, Chang Liu, Bo Jiang, Yanwei Liu, Zhen Xu, and Zhi-Li Zhang. 2021. SDFVAE: Static and Dynamic Factorized VAE for Anomaly Detection of Multivariate CDN KPIs. In Proceedings of the Web Conference 2021 (Ljubljana, Slovenia) (WWW '21). Association for Computing Machinery, New York, NY, USA, 3076–3086. https://doi.org/10.1145/3442381.3450013
[10] L de Haan and A Ferreira. 2006. Extreme Value Theory: An Introduction. Springer Science+Business Media, LLC, New York (2006).
[11] Shohreh Deldari, Daniel V. Smith, Hao Xue, and Flora D. Salim. 2021. Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding. In Proceedings of the Web Conference 2021 (Ljubljana, Slovenia) (WWW '21). Association for Computing Machinery, New York, NY, USA, 3124–3135. https://doi.org/10.1145/3442381.3449903
[12] XiaoYan Duan, NingJiang Chen, and YongSheng Xie. 2019. Intelligent detection of large-scale KPI streams anomaly based on transfer learning. In Big Data: 7th CCF Conference, BigData 2019, Wuhan, China, September 26–28, 2019, Proceedings 7. Springer, 366–379.
[13] Vaibhav Ganatra, Anjaly Parayil, Supriyo Ghosh, Yu Kang, Minghua Ma, Chetan Bansal, Suman Nath, and Jonathan Mace. 2023. Detection Is Better Than Cure: A Cloud Incidents Perspective. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1891–1902.
[14] Nikou Günnemann, Stephan Günnemann, and Christos Faloutsos. 2014. Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings. In Proceedings of the 23rd International Conference on World Wide Web (Seoul, Korea) (WWW '14). Association for Computing Machinery, New York, NY, USA, 361–372. https://doi.org/10.1145/2566486.2568008
[15] Xiaodi Hou and Liqing Zhang. 2007. Saliency detection: A spectral residual approach. In 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
[16] Siteng Huang, Donglin Wang, Xuehan Wu, and Ao Tang. 2019. Dsanet: Dual self-attention network for multivariate time series forecasting. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2129–2132.
[17] Tao Huang, Pengfei Chen, and Ruipeng Li. 2022. A Semi-Supervised VAE Based Active Anomaly Detection Framework in Multivariate Time Series for Online Systems. In Proceedings of the ACM Web Conference 2022 (Virtual Event, Lyon, France) (WWW '22). Association for Computing Machinery, New York, NY, USA, 1797–1806. https://doi.org/10.1145/3485447.3511984
[18] Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. 2018. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 387–395.
[19] Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, et al. 2024. Xpert: Empowering Incident Management with Query Recommendations via Large Language Models. (2024).
[20] Pengxiang Jin, Shenglin Zhang, Minghua Ma, Haozhe Li, Yu Kang, Liqun Li, Yudong Liu, Bo Qiao, Chaoyun Zhang, Pu Zhao, et al. 2023. Assess and Summarize: Improve Outage Understanding with Large Language Models. (2023).
[21] Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodriguez, Chao Zhang, and B Aditya Prakash. 2022. CAMul: Calibrated and Accurate Multi-View Time-Series Forecasting. In Proceedings of the ACM Web Conference 2022 (Virtual Event, Lyon, France) (WWW '22). Association for Computing Machinery, New York, NY, USA, 3174–3185. https://doi.org/10.1145/3485447.3512037
[22] Tung Kieu, Bin Yang, Chenjuan Guo, Razvan-Gabriel Cirstea, Yan Zhao, Yale Song, and Christian S Jensen. 2022. Anomaly detection in time series with robust variational quasi-recurrent autoencoders. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 1342–1354.
[23] Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
[24] Nikolay Laptev, Saeed Amizadeh, and Ian Flint. 2015. Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1939–1947.
[25] Alexander Lavin and Subutai Ahmad. 2015. Evaluating real-time anomaly detection algorithms – the Numenta anomaly benchmark. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). IEEE, 38–44.
[26] Arthur Le Guennec, Simon Malinowski, and Romain Tavenard. 2016. Data augmentation for time series classification using convolutional neural networks. In ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data.
[27] Zeyan Li, Wenxiao Chen, and Dan Pei. 2018. Robust and unsupervised kpi anomaly detection based on conditional variational autoencoder. In 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 1–9.
[28] Zeyan Li, Nengwen Zhao, Shenglin Zhang, Yongqian Sun, Pengfei Chen, Xidao Wen, Minghua Ma, and Dan Pei. 2022. Constructing Large-Scale Real-World Benchmark Datasets for AIOps. arXiv preprint arXiv:2208.03938 (2022).
[29] Zhihan Li, Youjian Zhao, Yitong Geng, Zhanxiang Zhao, Hanzhang Wang, Wenxiao Chen, Huai Jiang, Amber Vaidya, Liangfei Su, and Dan Pei. 2022. Situation-Aware Multivariate Time Series Anomaly Detection Through Active Learning and Contrast VAE-Based Models in Large Distributed Systems. IEEE Journal on Selected Areas in Communications 40, 9 (2022), 2746–2765.
[30] Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, and Dan Pei. 2021. Multivariate time series anomaly detection and interpretation using hierarchical inter-metric and temporal embedding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3220–3230.
[31] Dapeng Liu, Youjian Zhao, Haowen Xu, Yongqian Sun, Dan Pei, Jiao Luo, Xiaowei Jing, and Mei Feng. 2015. Opprentice: Towards practical and automatic anomaly detection through machine learning. In Proceedings of the 2015 Internet Measurement Conference. 211–224.
[32] Wei Lu and Ali A Ghorbani. 2008. Network anomaly detection based on wavelet analysis. EURASIP Journal on Advances in Signal Processing 2009 (2008), 1–16.
[33] Xiaofeng Lu, Xiaoyu Zhang, and Pietro Lio. 2023. GAT-DNS: DNS Multivariate Time Series Prediction Model Based on Graph Attention Network. In Companion Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW '23 Companion). Association for Computing Machinery, New York, NY, USA, 127–131. https://doi.org/10.1145/3543873.3587329
[34] Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, and Dan Pei. 2021. Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 413–426.
[35] Ajay Mahimkar, Zihui Ge, Jia Wang, Jennifer Yates, Yin Zhang, Joanne Emmons, Brian Huntley, and Mark Stockert. 2011. Rapid detection of maintenance induced changes in service performance. In Proceedings of the Seventh Conference on Emerging Networking Experiments and Technologies. 1–12.
[36] Oded Ovadia, Oren Elisha, and Elad Yom-Tov. 2022. Detection of Infectious Disease Outbreaks in Search Engine Time Series Using Non-Specific Syndromic Surveillance with Effect-Size Filtering. In Companion Proceedings of the Web Conference 2022 (Virtual Event, Lyon, France) (WWW '22). Association for Computing Machinery, New York, NY, USA, 924–929. https://doi.org/10.1145/3487553.3524672
[37] Faraz Rasheed, Peter Peng, Reda Alhajj, and Jon Rokne. 2009. Fourier transform based spatial outlier mining. In Intelligent Data Engineering and Automated Learning – IDEAL 2009: 10th International Conference, Burgos, Spain, September 23-26, 2009. Proceedings 10. Springer, 317–324.
[38] Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and Qi Zhang. 2019. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3009–3017.
[39] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning. PMLR, 1278–1286.
[40] Bernard Rosner. 1983. Percentage points for a generalized ESD many-outlier procedure. Technometrics 25, 2 (1983), 165–172.
[41] Lifeng Shen, Zhuocong Li, and James Kwok. 2020. Timeseries anomaly detection using temporal hierarchical one-class network. Advances in Neural Information Processing Systems 33 (2020), 13016–13026.
[42] Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, and Christine Largouet. 2017. Anomaly detection in streams with extreme value theory. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1067–1075.
[43] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems 28 (2015).
[44] Owen Vallis, Jordan Hochenbaum, and Arun Kejariwal. 2014. A novel technique for long-term anomaly detection in the cloud. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14).
[45] Charles Van Loan. 1992. Computational frameworks for the fast Fourier transform. SIAM.
[46] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[47] Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue Wang,
'21). Association for Computing Machinery, New York, NY, USA, 2404–2409. https://doi.org/10.1145/3448016.3457236
[53] Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2022. TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2497–2507.
[54] Shenglin Zhang, Zhenyu Zhong, Dongwen Li, Qiliang Fan, Yongqian Sun, Man Zhu, Yuzhi Zhang, Dan Pei, Jiyan Sun, Yinlong Liu, et al. 2022. Efficient kpi anomaly detection through transfer learning for large-scale web services. IEEE Journal on Selected Areas in Communications 40, 8 (2022), 2440–2455.
[55] Xu Zhang, Qingwei Lin, Yong Xu, Si Qin, Hongyu Zhang, Bo Qiao, Yingnong Dang, Xinsheng Yang, Qian Cheng, Murali Chintalapati, et al. 2019. Cross-dataset
and Huan Xu. 2020. Time series data augmentation for deep learning: A survey. Time Series Anomaly Detection for Cloud Systems.. In USENIX Annual Technical
arXiv preprint arXiv:2002.12478 (2020). Conference. 1063–1076.
[48] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng [56] Yin Zhang, Zihui Ge, Albert Greenberg, and Matthew Roughan. 2005. Network
Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series anomography. In Proceedings of the 5th ACM SIGCOMM conference on Internet
Analysis. In International Conference on Learning Representations. Measurement. 30–30.
[49] Sihong Xie, Guan Wang, Shuyang Lin, and Philip S. Yu. 2012. Review Spam [57] Chenyu Zhao, Minghua Ma, Zhenyu Zhong, Shenglin Zhang, Zhiyuan Tan, Xiao
Detection via Time Series Pattern Discovery. In Proceedings of the 21st Inter- Xiong, LuLu Yu, Jiayi Feng, Yongqian Sun, Yuzhi Zhang, et al. 2023. Robust
national Conference on World Wide Web (Lyon, France) (WWW ’12 Compan- Multimodal Failure Detection for Microservice Systems. 29th ACM SIGKDD
ion). Association for Computing Machinery, New York, NY, USA, 635–636. Conference on Knowledge Discovery and Data Mining (KDD) (2023).
https://doi.org/10.1145/2187980.2188164 [58] Nengwen Zhao, Jing Zhu, Yao Wang, Minghua Ma, Wenchi Zhang, Dapeng Liu,
[50] Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ming Zhang, and Dan Pei. 2019. Automatic and generic periodicity adaptation
Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, et al. 2018. Unsupervised anomaly for kpi anomaly detection. IEEE Transactions on Network and Service Management
detection via variational auto-encoder for seasonal kpis in web applications. In 16, 3 (2019), 1170–1183.
Proceedings of the 2018 world wide web conference. 187–196. [59] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong,
[51] Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2022. Anomaly and Wancai Zhang. 2021. Informer: Beyond efficient transformer for long se-
Transformer: Time Series Anomaly Detection with Association Discrepancy. In quence time-series forecasting. In Proceedings of the AAAI conference on artificial
International Conference on Learning Representations. https://openreview.net/ intelligence. 11106–11115.
forum?id=LzQQ89U1qm_ [60] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin.
[52] Zhiqiang Xu, Dong Li, Weijie Zhao, Xing Shen, Tianbo Huang, Xiaoyun Li, 2022. Fedformer: Frequency enhanced decomposed transformer for long-term
and Ping Li. 2021. Agile and Accurate CTR Prediction Model Training for series forecasting. In International Conference on Machine Learning. PMLR, 27268–
Massive-Scale Online Advertising Systems. In Proceedings of the 2021 Inter- 27286.
national Conference on Management of Data (Virtual Event, China) (SIGMOD