Graph Anomaly Detection with Bi-level Optimization
Yongdong Zhang
[email protected]
University of Science and Technology
of China
Hefei, China
ABSTRACT
Graph anomaly detection (GAD) has various applications in finance, healthcare, and security. Graph Neural Networks (GNNs) are now the primary method for GAD, treating it as a task of semi-supervised node classification (normal vs. anomalous). However, most traditional GNNs aggregate and average embeddings from all neighbors, without considering their labels, which can hinder detecting actual anomalies. To address this issue, previous methods try to selectively aggregate neighbors. However, the same selection strategy is applied to both the normal and anomalous classes, which does not fully solve this issue. This study discovers that nodes with different classes yet similar neighbor label distributions (NLD) tend to have opposing loss curves, which we term “loss rivalry”. By introducing the Contextual Stochastic Block Model (CSBM) and defining the NLD distance, we explain this phenomenon theoretically and, based on these observations, propose a Bi-level optimization Graph Neural Network (BioGNN). In a nutshell, the lower level of BioGNN segregates nodes based on their classes and NLD, while the upper level trains the anomaly detector using the separation outcomes. Our experiments demonstrate that BioGNN outperforms state-of-the-art methods and effectively mitigates “loss rivalry”. Codes are available at https://github.com/blacksingular/Bio-GNN.

CCS CONCEPTS
• Security and privacy → Web application security; • Computing methodologies → Semi-supervised learning settings; Neural networks.

KEYWORDS
Graph Neural Networks, Anomaly Detection, Bi-level Optimization

ACM Reference Format:
Yuan Gao, Junfeng Fang, Yongduo Sui, Yangyang Li, Xiang Wang, Huamin Feng, and Yongdong Zhang. 2024. Graph Anomaly Detection with Bi-level Optimization. In Proceedings of the ACM Web Conference 2024 (WWW ’24), May 13–17, 2024, Singapore, Singapore. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3589334.3645673

∗ Corresponding authors.
† Xiang Wang is also affiliated with Institute of Dataspace, Hefei Comprehensive National Science Center.

1 INTRODUCTION
Graph anomaly detection (GAD) is a learning-to-detect task. The objective is to differentiate anomalies from normal ones, assuming that the anomalies are generated from a distinct distribution that diverges from the normal nodes [26]. As demonstrated by [33], GAD has various real-world applications, including detecting spam reviews in user-rating-product graphs [17], finding misinformation and fake news in social networks [9], and identifying fraud in financial transaction graphs [34, 51].

A primary method is to consider GAD as a semi-supervised node classification problem, where the edges are crucial. By examining the edges, we can divide an ego node’s neighbors into two groups: (1) homophilous neighbors that have the same labels as the ego node, and (2) heterophilous neighbors whose labels are different from the ego node’s label. For instance, in the case of an anomaly ego node, its interactions with anomaly neighbors display homophily, while its anomaly–normal edges demonstrate heterophily. Both homophily and heterophily are prevalent in nature. In transaction networks, fraudsters have heterophilous connections with their customers, while their connections with accomplices are homophilous.
Figure 2: Illustration of the ‘loss rivalry’ phenomenon on the YelpChi and Amazon datasets with BWGNN [51]. From the same-color circles around the maxima and minima, we observe that the two loss curves in the same dataset move in opposite directions across the epochs.
Table 2: Summary of the dataset statistics and the neighbor label distributions.
The larger the distance ||𝜇𝑢 − 𝜇𝑣||₂, the more expressive the representation and the better the capability of the downstream linear detector. From (4) and (5), we observe two things: (1) the minimum value of ||𝜇𝑢 − 𝜇𝑣||₂ is achieved when 𝑑(𝑢, 𝑣) is 0; (2) using second-order polynomial graph filtering can improve the ability to distinguish between nodes, especially when the NLDs of nodes from different classes are similar. This finding aligns with previous research [64, 69] in this area.
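For reference, the derivation in Appendix A (equation (25)) makes this relation explicit under the CSBM assumptions, tying the representation distance directly to the NLD distance 𝑑(𝑢, 𝑣):

\[
\|\mu_u - \mu_v\|_2 \;=\; \Big[\,1 + \frac{d(u,v)}{\sqrt{2}} + \frac{d(u,v)^{2}}{2}\,\Big]\cdot\|\mu_1 - \mu_0\|_2 ,
\]

so the distance attains its minimum value \(\|\mu_1 - \mu_0\|_2\) at \(d(u,v) = 0\) and grows with both a linear and a quadratic term in \(d(u,v)\).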
Figure 4: BioGNN framework. The mask generator 𝜃(·) identifies subsets of nodes according to equation (10). Two projection heads 𝜑1(·) and 𝜑2(·) and two spectral filters 𝑔1(𝐿) and 𝑔2(𝐿) assign labels to the corresponding subsets of nodes. The mask generator and the filters are optimized iteratively according to equations (11) and (12).
In this section, we introduce our bi-level optimization graph neural network, BioGNN. To begin with, we introduce the learning objectives in §4.1, and present the parameterization process in §4.2 and the initialization in §4.3. In §4.4, we validate the effectiveness of the framework on golden-separated graphs.

4.1 The Learning Objectives
To start with, we introduce Lemma 4.1, which is widely agreed upon in the literature [6, 8, 32]:

Lemma 4.1. The prediction performance of a spectral filter is better when the spectral label energy distribution concentrates more on the pass band of the filter.

A more detailed analysis of Lemma 4.1 can be found in [8]. Building on Lemma 4.1, we can identify nodes according to the performance of different spectral filters through bi-level optimization. As shown in Figure 4, our learning objective is twofold: (1) optimize the encoders {Φ(·), 𝜑1(·), 𝜑2(·)} to maximize the probability of correctly classifying the nodes separated by 𝜃(·); (2) optimize the encoder 𝜃(·), which predicts the NLD of nodes and separates the nodes into two sets. All the encoders are learnable and set as MLPs. Concretely, the learning objective of BioGNN is defined as follows:

\[
\min_{\varphi_1,\Phi,M_1} \mathcal{R}\big(\Phi(g_1(\mathcal{L})\,\varphi_1(M_1 \circ X)),\, Y\big)
\;+\; \min_{\varphi_2,\Phi,M_2} \mathcal{R}\big(\Phi(g_2(\mathcal{L})\,\varphi_2(M_2 \circ X)),\, Y\big), \tag{8}
\]
\[
s.t.\;\; M_1 + M_2 = \mathbf{1},
\]

where 𝑀1 and 𝑀2 are hard masks given by the learnable encoder 𝜃(·), 𝟏 is an all-one vector, 𝑔1(𝐿) and 𝑔2(𝐿) are spectral filters, and ◦ denotes element-wise multiplication.

4.2 Instantiation of BioGNN
Given the two-fold objective, we propose to parameterize the encoder 𝜃(·) and {Φ(·), 𝜑1(·), 𝜑2(·)}.

Parameterizing 𝜃(·). The encoder 𝜃(·) serves as a separator that predicts the NLD of nodes and feeds nodes into different branches of filters. Consequently, to obtain informative input for 𝜃, we employ a label-wise message passing layer [11], which aggregates the labeled neighbors of each node label-wise. Concretely, for node 𝑢, the aggregated feature ℎ𝑢,𝑐 for class 𝑐 is

\[
h_{u,c} = \frac{1}{|\mathcal{N}_{l,c}(u)|} \sum_{v \in \mathcal{N}_{l,c}(u)} x_v , \tag{9}
\]

where N𝑙,𝑐(𝑢) is the set of neighbors labeled with 𝑐. When there are no labeled neighbors belonging to class 𝑐, we assign a zero embedding to ℎ𝑢,𝑐. Then we set

\[
M_1(u) = \operatorname{argmax}\big(\mathrm{MLP}_{\theta}([x_u;\, h_{u,0};\, h_{u,1}])\big). \tag{10}
\]

To ensure smooth and well-defined gradients ∂𝑦/∂𝜃, we apply a straight-through (ST) gradient estimator [2] to make the model differentiable. Note that BioGNN is trained in an iterative fashion: in this phase, the encoders {Φ(·), 𝜑1(·), 𝜑2(·)} are fixed as {Φ∗(·), 𝜑1∗(·), 𝜑2∗(·)}, and the objective function is

\[
\min_{M_1} \mathcal{R}\big(\Phi^{*}(g_1(\mathcal{L})\,\varphi_1^{*}(M_1 \circ X)),\, Y_{sep}\big)
\;+\; \min_{M_2} \mathcal{R}\big(\Phi^{*}(g_2(\mathcal{L})\,\varphi_2^{*}(M_2 \circ X)),\, Y_{sep}\big), \tag{11}
\]
\[
s.t.\;\; M_1 + M_2 = \mathbf{1}.
\]
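To make the separator concrete, the following is a minimal PyTorch-style sketch of equations (9) and (10); the class and argument names (`LabelwiseMaskGenerator`, `edge_index`, `label_mask`) are our own illustration under standard message-passing conventions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelwiseMaskGenerator(nn.Module):
    """Sketch of the separator theta(.): label-wise neighbor aggregation (Eq. 9)
    followed by an MLP and a straight-through hard mask (Eq. 10)."""

    def __init__(self, in_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.num_classes = num_classes
        # Input: the node feature concatenated with one aggregate per class.
        self.mlp = nn.Sequential(
            nn.Linear(in_dim * (1 + num_classes), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # two branches (two spectral filters)
        )

    def label_wise_aggregate(self, x, edge_index, labels, label_mask):
        """h_{u,c}: mean feature of u's labeled neighbors with label c (Eq. 9);
        nodes without such neighbors keep a zero embedding."""
        src, dst = edge_index  # messages flow src -> dst
        parts = [x]
        for c in range(self.num_classes):
            sel = label_mask[src] & (labels[src] == c)
            summed = torch.zeros_like(x).index_add_(0, dst[sel], x[src[sel]])
            count = torch.zeros(x.size(0), device=x.device).index_add_(
                0, dst[sel], torch.ones(int(sel.sum()), device=x.device))
            parts.append(summed / count.clamp(min=1).unsqueeze(-1))
        return torch.cat(parts, dim=-1)  # [x_u ; h_{u,0} ; h_{u,1}]

    def forward(self, x, edge_index, labels, label_mask):
        logits = self.mlp(self.label_wise_aggregate(x, edge_index, labels, label_mask))
        soft = F.softmax(logits, dim=-1)
        hard = F.one_hot(soft.argmax(dim=-1), num_classes=2).to(soft.dtype)
        # Straight-through estimator [2]: hard masks in the forward pass,
        # gradients of the soft probabilities in the backward pass.
        masks = hard + soft - soft.detach()
        return masks[:, 0], masks[:, 1]  # M1, M2 with M1 + M2 = 1
```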
Parameterizing {Φ(·), 𝜑1(·), 𝜑2(·)}. These three encoders serve as a predictor that assigns labels to input nodes. As we aim to distinguish between different spectral label distributions, which are closely related to the performance of filters with the corresponding pass bands, we adopt low-pass and high-pass filters as 𝑔1(𝐿) and 𝑔2(𝐿), respectively. Here, we choose to use two branches and leave the multi-branch framework for future work. Therefore, 𝑀1 and 𝑀2 become the masks for nodes with high-frequency and low-frequency ego-graphs, respectively. In this iterative training phase, we freeze the masks as 𝑀1∗ and 1 − 𝑀1∗, and set the objective function as

\[
\min_{\Phi,\varphi_1} \mathcal{R}\big(\Phi(g_1(\mathcal{L})\,\varphi_1(M_1^{*} \circ X)),\, Y\big)
\;+\; \min_{\Phi,\varphi_2} \mathcal{R}\big(\Phi(g_2(\mathcal{L})\,\varphi_2((\mathbf{1} - M_1^{*}) \circ X)),\, Y\big). \tag{12}
\]

A similar training process has also been used in graph contrastive learning [50]. For the choice of 𝑔1(L) and 𝑔2(L), we adopt Bernstein polynomial-based filters [27, 51] for their convenience in decomposing low-pass and high-pass filters:

\[
g(\mathcal{L}) = U\,\beta_{a,b}(\Lambda)\,U^{T}
= \frac{(\mathcal{L}/2)^{a}\,(I - \mathcal{L}/2)^{b}}{2\int_{0}^{1} t^{a-1}(1-t)^{b-1}\,\mathrm{d}t}, \tag{13}
\]

where 𝛽𝑎,𝑏 is the standard beta distribution parameterized by 𝑎 and 𝑏. When 𝑎 → 0, we acquire 𝑔(L) as a low-pass filter; similarly, 𝑔(L) acts as a high-pass filter when 𝑏 → 0. For the choices of 𝑎 and 𝑏 on each specific benchmark and more training details, please refer to Appendices B.1 and B.2.
4.3 Initialization of BioGNN
To achieve a more stable bi-level optimization process, we initialize the encoders before iterative training.

Initialization of 𝜃(·). 𝜃(·) is initialized in a supervised fashion, where the supervision signal is obtained by counting the labeled inter-class neighbors:

\[
Y_{sep}(u) = \mathrm{round}\Big(\frac{1}{|\mathcal{N}_L(u)|}\sum_{v \in \mathcal{N}_L(u)} |\{\,y_u \neq y_v\,\}|\Big), \tag{14}
\]

and then the cross-entropy between 𝜃's prediction and 𝑌𝑠𝑒𝑝 is minimized.
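As an illustration, a small sketch of how the supervision signal in equation (14) and the corresponding initialization loss could be computed; the helper names and the exact form of the cross-entropy call are our assumptions, not the released code.

```python
import torch
import torch.nn.functional as F


def separation_targets(edge_index, labels, label_mask):
    """Y_sep(u): rounded fraction of labeled neighbors whose label differs
    from u's label (Eq. 14). Nodes without labeled neighbors get target 0."""
    src, dst = edge_index
    sel = label_mask[src] & label_mask[dst]          # edges between labeled nodes
    diff = (labels[src[sel]] != labels[dst[sel]]).float()
    num_nodes = labels.size(0)
    inter = torch.zeros(num_nodes).index_add_(0, dst[sel], diff)
    deg = torch.zeros(num_nodes).index_add_(0, dst[sel], torch.ones_like(diff))
    return torch.round(inter / deg.clamp(min=1))     # values in {0, 1}


# Initialization of theta(.): fit the separator to Y_sep on labeled nodes,
# e.g. with a cross-entropy loss on its two-way branch prediction (hypothetical names):
# logits = mask_generator_mlp(features)                               # shape (N, 2)
# y_sep = separation_targets(edge_index, labels, label_mask).long()
# loss = F.cross_entropy(logits[label_mask], y_sep[label_mask])
```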
4.4 Validation on Golden-separated Graphs
From an omniscient perspective, where we know all the labels of the nodes, we have access to the accurate NLD of all the nodes. In this case, we can separate the nodes ideally and validate the effectiveness of BioGNN while excluding the impact of false NLD predictions.

From Figure 5a, we observe that the loss decreases smoothly, supporting our argument that mixed nodes are the main cause of the “loss rivalry” phenomenon. Based on this finding, BioGNN can alleviate the problem and boost the performance of GAD. We discovered that the training order is significant in achieving better performance: training nodes with high-frequency ego-graphs before those with low-frequency ones leads to better results. One possible reason for this is the shared linear classifier Φ between the two branches. Embeddings learned from the high-pass filter are noisier, and a classifier that performs well on noisy embeddings would most likely perform well on the whole dataset [28]. We consider this to be an intriguing discovery, yet leave a comprehensive theoretical examination for future research.
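Putting §4.2–§4.4 together, training alternates the two objectives: with the masks frozen, the encoders and classifier are updated (equation (12)); with the encoders frozen, the mask generator is updated against 𝑌𝑠𝑒𝑝 (equation (11)). The loop below is a hedged sketch under these assumptions; the module interfaces, optimizer settings, and attribute names (`data.x`, `data.train_mask`, etc.) are illustrative rather than the released implementation.

```python
import torch


def train_biognn(theta, phi1, phi2, Phi, g1, g2, data, y_sep, epochs=100):
    """Alternating (bi-level) optimization sketch: Eq. (12) then Eq. (11)."""
    opt_enc = torch.optim.Adam(
        list(phi1.parameters()) + list(phi2.parameters()) + list(Phi.parameters()), lr=1e-3)
    opt_sep = torch.optim.Adam(theta.parameters(), lr=1e-3)
    ce = torch.nn.CrossEntropyLoss()

    for _ in range(epochs):
        # Upper level (Eq. 12): freeze the masks, train encoders and classifier.
        with torch.no_grad():
            m1, m2 = theta(data.x, data.edge_index, data.y, data.label_mask)
        # Branch order matters (Section 4.4 reports training the high-frequency
        # branch first works better); adjust the tuple order accordingly.
        for g, phi, m in ((g2, phi2, m2), (g1, phi1, m1)):
            opt_enc.zero_grad()
            logits = Phi(g(phi(m.unsqueeze(-1) * data.x), data.edge_index))
            loss = ce(logits[data.train_mask], data.y[data.train_mask])
            loss.backward()
            opt_enc.step()

        # Lower level (Eq. 11): encoders fixed, update the mask generator against
        # Y_sep; the straight-through masks keep theta differentiable.
        opt_sep.zero_grad()
        m1, m2 = theta(data.x, data.edge_index, data.y, data.label_mask)
        sep_loss = 0.0
        for g, phi, m in ((g1, phi1, m1), (g2, phi2, m2)):
            logits = Phi(g(phi(m.unsqueeze(-1) * data.x), data.edge_index))
            sep_loss = sep_loss + ce(logits[data.train_mask], y_sep[data.train_mask])
        sep_loss.backward()
        opt_sep.step()
```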
Table 3: Performance Results. The best results are in boldface, and the 2nd-best are underlined.
• ChebyNet [12]: ChebyNet generalizes CNN to graph data in the context of spectral graph theory.
• GWNN [62]: GWNN leverages graph wavelet transform to address the shortcomings of spectral graph CNN methods that depend on graph Fourier transform.
• JKNet [64]: The jumping-knowledge network, which concatenates or max-pools the hidden representations.
• Care-GNN [17]: Care-GNN is a camouflage-resistant GNN that adaptively samples neighbors according to the feature similarity.
• PC-GNN [34]: PC-GNN consists of two modules, “pick” and “choose”, and maintains a balanced label frequency around fraudsters by downsampling and upsampling.
• H2GCN [69]: H2GCN is a tailored heterophily GNN which identifies three useful designs.
• BWGNN [51]: BWGNN is a spectral filter addressing the “right-shift” phenomenon in anomaly detection.
• GDN [24]: GDN deals with heterophily by leveraging constraints on original node features.
• GHRN [23]: GHRN calculates a post-aggregation score and modifies the graph to make the downstream model better at handling heterophily issues.
• MixHop [1]: MixHop repeatedly mixes feature representations of neighbors at various distances to learn relationships.
• GPRGNN [10]: GPR-GNN learns a polynomial filter by directly performing gradient descent on the polynomial coefficients.

5.2 Performance Comparison
The main results are reported in Table 3. Note that we search for the best threshold to achieve the best F1-macro on the validation set for all methods. In general, BioGNN achieves the best F1-macro score except on YelpChi, empirically verifying that it has a larger distance between predictions and the decision boundary, benefiting from measuring the NLD distance. For AUC, BioGNN performs poorly on T-Social. We suppose the reason is that T-Social has a complex frequency composition, since its best performance is achieved when the frequency order is high according to BWGNN [51]. We believe this issue could be alleviated if multi-branch filters are adopted, which we leave for future work. Furthermore, some methods achieve high AUC while maintaining a low F1-Macro, indicating that the instances can be distinguished but sit tightly together in the space. In such cases, we cannot regard these methods as effective [44].
Figure 7: The ego-graphs of some yellow-circled ego nodes classified as high-frequency by BioGNN in YelpChi. The anomalies are represented in red, while normal nodes are represented in blue.
H2GCN, MixHop, and GPRGNN are three state-of-the-art heterophilous GNNs that shed light on the relationship between the ego node and neighbor labels. We observe that they consistently outperform other groups of methods, including some tailored GAD methods. We ascribe this large performance gap to two reasons: (1) the harmfulness of heterophily, where vast normal neighborhoods attenuate the suspiciousness of the anomalies; (2) the superiority of spectral filters in distinguishing nodes with different NLD. However, BioGNN outperforms these methods, especially in F1-Macro, where the improvement ranges from 2.7% to 25.8%. This supports our analysis that nodes of different classes with similar NLD should be treated separately to alleviate “loss rivalry”. Furthermore, among the tailored GNN methods, BWGNN and BioGNN are polynomial-based filters that perform better than the others, further suggesting that spectral filtering is more promising for GAD.

On several datasets, MLP outperforms some GNN-based methods, indicating that blindly mixing neighbors can sometimes degrade the prediction performance. Therefore, structural information should be used with care, especially when the neighborhood label distributions of nodes are complex.

5.3 Analysis of BioGNN
In this section, we take a closer look at BioGNN. We first verify the smoothness of the BioGNN loss curve to demonstrate its effectiveness in alleviating “loss rivalry”. Then we plot the distribution of the separated nodes to show that our model can successfully discriminate nodes with different NLD and set them apart. To make it clearer, we also visualize some high-frequency ego-graphs.

Addressing Loss Rivalry. To answer the question of whether BioGNN can alleviate “loss rivalry”, we plot the training loss of BioGNN in Figure 5b. Similar to Section 4.4, the two separated sets of nodes are trained in a specific order: high-frequency nodes are trained first, followed by low-frequency nodes. Comparing Figures 2, 5a, and 5b, we find that the smoothness of BioGNN’s training curve lies between golden-separated and mixed training, indicating that the new framework is effective in alleviating “loss rivalry” and improves the overall performance of GAD.

Distribution of the separated nodes. The core of BioGNN is node separation. To further validate its effectiveness, we report the empirical histogram of the NLD on four benchmarks in Figure 6. The x-axis represents the edge homophily, which explicitly represents the NLD around the ego node. The y-axis denotes the density, and the distribution curves are shown as dashed lines. From Figure 6, we observe that the two histograms seldom overlap, and the means of the two curves maintain a separable distance, demonstrating that BioGNN successfully sets the nodes apart.
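For reference, the per-node edge homophily used on the x-axis of Figure 6 can be computed directly from the labeled graph; the snippet below is an illustrative sketch with our own function name, not the plotting code used for the figure.

```python
import torch


def edge_homophily_per_node(edge_index, labels):
    """Fraction of each node's neighbors sharing its label; used here as a
    proxy for the NLD around the ego node (x-axis of Figure 6)."""
    src, dst = edge_index
    same = (labels[src] == labels[dst]).float()
    num_nodes = labels.size(0)
    agree = torch.zeros(num_nodes).index_add_(0, dst, same)
    degree = torch.zeros(num_nodes).index_add_(0, dst, torch.ones_like(same))
    return agree / degree.clamp(min=1)
```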
Visualization. To show the results in an intuitive way, we report the ego-graphs of some nodes in Figure 7. These nodes are assigned to the high-pass filter by 𝜃(·). As observed from the figure, where color denotes the class of the nodes, the ego node (red-circled) has more inter-class neighbors compared to the nodes assigned to the low-pass filter. This finding provides support for Equation 7 and verifies the effectiveness of our novel framework.

Time complexity analysis. The time complexity of BioGNN is 𝑂(𝐶|E|), where 𝐶 is a constant and |E| denotes the number of edges in the graph. This is due to the fact that the BernNet-based filter is a polynomial function that can be computed recursively [51].
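To illustrate why the cost stays linear in the number of edges, a polynomial filter 𝑔(L)𝑥 of order 𝐾 can be evaluated with 𝐾 sparse matrix–vector products and never needs to materialize 𝑔(L). The sketch below assumes a normalized Laplacian stored as a SciPy sparse matrix; the coefficient values are placeholders, not the paper's learned filters.

```python
import numpy as np
import scipy.sparse as sp


def polynomial_filter_apply(L, x, coeffs):
    """Compute sum_k coeffs[k] * L^k x with repeated sparse mat-vecs, O(K |E|)."""
    out = coeffs[0] * x
    Lx = x
    for c in coeffs[1:]:
        Lx = L @ Lx          # one sparse mat-vec: O(|E|)
        out = out + c * Lx
    return out


# Toy usage on a random graph with placeholder coefficients.
A = sp.random(100, 100, density=0.05, format="csr")
A = A + A.T
deg = np.asarray(A.sum(axis=1)).ravel()
D_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
L = sp.eye(100) - D_inv_sqrt @ A @ D_inv_sqrt     # normalized Laplacian
x = np.random.randn(100)
y = polynomial_filter_apply(L, x, coeffs=[0.2, 0.5, 0.3])
```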
6 LIMITATION AND CONCLUSION
Limitation. Although we propose a novel network that treats nodes separately, it has some limitations. Our work only separates the nodes into two sets, and we hope to extend it to more fine-grained multi-branch neural networks in the future. Furthermore, our theoretical result largely relies on CSBM’s assumptions; hence our model may fail in some cases where the graph generation process doesn’t follow these assumptions.

Conclusion. This work starts with “loss rivalry”, the phenomenon that some nodes tend to have loss curves opposite to those of others. We argue that it is caused by the mixed training of different-class nodes with similar NLD. Furthermore, we discover that spectral filters are superior in addressing the problem. To this end, we propose BioGNN, which essentially discriminates nodes that share similar NLD but are likely to be in different classes and feeds them into different filters to prevent “loss rivalry”. Although the datasets, experiments, and analysis of this work are based on graph anomaly detection, this two-branch method could be further deployed to more downstream tasks in the future, such as graph adversarial learning [52–54], graph prompt learning [66–68], graph OOD [47–49], and graph explanation [19, 20, 22].

7 ACKNOWLEDGMENTS
This research is supported by the National Natural Science Foundation of China (9227010114).
REFERENCES
[1] Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, and Aram Galstyan. 2019. MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In ICML. 21–29.
[2] Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. 2013. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. CoRR abs/1308.3432 (2013).
[3] Deyu Bo, Xiao Wang, Chuan Shi, and Huawei Shen. 2021. Beyond low-frequency information in graph convolutional networks. In AAAI. 3950–3957.
[4] Ziwei Chai, Siqi You, Yang Yang, Shiliang Pu, Jiarong Xu, Haoyang Cai, and Weihao Jiang. 2022. Can Abnormality be Detected by Graph Neural Networks?. In IJCAI. 1945–1951.
[5] Sudhanshu Chanpuriya and Cameron Musco. 2022. Simplified Graph Convolution with Heterophily. In NeurIPS.
[6] Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. 2020. Simple and Deep Graph Convolutional Networks. In ICML. 1725–1735.
[7] Yu Chen, Lingfei Wu, and Mohammed J. Zaki. 2020. Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings. In NeurIPS.
[8] Zhixian Chen, Tengfei Ma, and Yang Wang. 2022. When Does A Spectral Graph Neural Network Fail in Node Classification? CoRR abs/2202.07902 (2022).
[9] Lu Cheng, Ruocheng Guo, Kai Shu, and Huan Liu. 2021. Causal understanding of fake news dissemination on social media. In KDD. 148–157.
[10] Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. 2021. Adaptive Universal Generalized PageRank Graph Neural Network. In ICLR.
[11] Enyan Dai, Zhimeng Guo, and Suhang Wang. 2021. Label-Wise Message Passing Graph Neural Network on Heterophilic Graphs. CoRR abs/2110.08128 (2021).
[12] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In NIPS. 3837–3845.
[13] Yash Deshpande, Subhabrata Sen, Andrea Montanari, and Elchanan Mossel. 2018. Contextual Stochastic Block Models. In NeurIPS. 8590–8602.
[14] Kaize Ding, Jundong Li, and Huan Liu. 2019. Interactive anomaly detection on attributed networks. In WSDM. 357–365.
[15] Kaize Ding, Zhe Xu, Hanghang Tong, and Huan Liu. 2022. Data Augmentation for Deep Graph Learning: A Survey. SIGKDD Explor. 24, 2 (2022), 61–77.
[16] Yushun Dong, Kaize Ding, Brian Jalaian, Shuiwang Ji, and Jundong Li. 2021. AdaGNN: Graph neural networks with adaptive frequency response filter. In CIKM. 392–401.
[17] Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. 2020. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In CIKM. 315–324.
[18] Lun Du, Xiaozhou Shi, Qiang Fu, Xiaojun Ma, Hengyu Liu, Shi Han, and Dongmei Zhang. 2022. GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily. In WWW. ACM, 1550–1558.
[19] Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xiang Wang, and Xiangnan He. 2024. EXGC: Bridging Efficiency and Explainability in Graph Condensation. In WWW. ACM.
[20] Junfeng Fang, Wei Liu, Yuan Gao, Zemin Liu, An Zhang, Xiang Wang, and Xiangnan He. 2023. Evaluating Post-hoc Explanations for Graph Neural Networks via Robustness Analysis. In NeurIPS.
[21] Junfeng Fang, Wei Liu, An Zhang, Xiang Wang, Xiangnan He, Kun Wang, and Tat-Seng Chua. 2022. On Regularization for Explaining Graph Neural Networks: An Information Theory Perspective. (2022).
[22] Junfeng Fang, Xiang Wang, An Zhang, Zemin Liu, Xiangnan He, and Tat-Seng Chua. 2023. Cooperative Explanations of Graph Neural Networks. In WSDM. ACM, 616–624.
[23] Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, and Yongdong Zhang. 2023. Addressing Heterophily in Graph Anomaly Detection: A Perspective of Graph Spectrum. In WWW. ACM, 1528–1538.
[24] Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, and Yongdong Zhang. 2023. Alleviating Structural Distribution Shift in Graph Anomaly Detection. In WSDM.
[25] Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, and Yongdong Zhang. 2023. Alleviating Structural Distribution Shift in Graph Anomaly Detection. In WSDM.
[26] Douglas M Hawkins. 1980. Identification of outliers. Vol. 11. Springer.
[27] Mingguo He, Zhewei Wei, Zengfeng Huang, and Hongteng Xu. 2021. BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation. In NeurIPS. 14239–14251.
[28] Weihua Hu, Kaidi Cao, Kexin Huang, Edward W. Huang, Karthik Subbian, and Jure Leskovec. 2022. TuneUp: A Training Strategy for Improving Generalization of Graph Neural Networks. CoRR abs/2210.14843 (2022).
[29] Mengda Huang, Yang Liu, Xiang Ao, Kuan Li, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. 2022. AUC-oriented Graph Neural Network for Fraud Detection. In WWW. 1311–1321.
[30] Wei Jin, Yao Ma, Xiaorui Liu, Xianfeng Tang, Suhang Wang, and Jiliang Tang. 2020. Graph Structure Learning for Robust Graph Neural Networks. In KDD. ACM, 66–74.
[31] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
[32] Runlin Lei, Zhen Wang, Yaliang Li, Bolin Ding, and Zhewei Wei. 2022. EvenNet: Ignoring Odd-Hop Neighbors Improves Robustness of Graph Neural Networks. In NeurIPS.
[33] Kay Liu, Yingtong Dou, Yue Zhao, Xueying Ding, Xiyang Hu, Ruitong Zhang, Kaize Ding, Canyu Chen, Hao Peng, Kai Shu, et al. 2022. BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs. In NeurIPS Datasets and Benchmarks Track.
[34] Yang Liu, Xiang Ao, Zidi Qin, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. 2021. Pick and choose: a GNN-based imbalanced learning approach for fraud detection. In WWW. 3168–3177.
[35] Zhiwei Liu, Yingtong Dou, Philip S Yu, Yutong Deng, and Hao Peng. 2020. Alleviating the inconsistency problem of applying graph neural network to fraud detection. In SIGIR. 1569–1572.
[36] Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, and Tat-Seng Chua. 2024. Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction. CoRR abs/2401.15603 (2024).
[37] Dongsheng Luo, Wei Cheng, Wenchao Yu, Bo Zong, Jingchao Ni, Haifeng Chen, and Xiang Zhang. 2021. Learning to Drop: Robust Graph Neural Network via Topological Denoising. In WSDM. ACM, 779–787.
[38] Xiaoxiao Ma, Jia Wu, Shan Xue, Jian Yang, Chuan Zhou, Quan Z Sheng, Hui Xiong, and Leman Akoglu. 2021. A comprehensive survey on graph anomaly detection with deep learning. TKDE (2021).
[39] Yao Ma, Xiaorui Liu, Neil Shah, and Jiliang Tang. 2022. Is homophily a necessity for graph neural networks?. In ICLR.
[40] Yao Ma, Xiaorui Liu, Tong Zhao, Yozen Liu, Jiliang Tang, and Neil Shah. 2021. A Unified View on Graph Neural Networks as Graph Signal Denoising. In CIKM. ACM, 1202–1211.
[41] Julian John McAuley and Jure Leskovec. 2013. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In WWW. 897–908.
[42] Shebuti Rayana and Leman Akoglu. 2016. Collective opinion spam detection using active inference. In SDM. 630–638.
[43] Fengzhao Shi, Yanan Cao, Yanmin Shang, Yuchen Zhou, Chuan Zhou, and Jia Wu. 2022. H2-FDetector: A GNN-based Fraud Detector with Homophilic and Heterophilic Connections. In WWW. ACM, 1486–1494.
[44] Wentao Shi, Jiawei Chen, Fuli Feng, Jizhi Zhang, Junkang Wu, Chongming Gao, and Xiangnan He. 2023. On the Theories Behind Hard Negative Sampling for Recommendation. In WWW. ACM, 812–822.
[45] Wentao Shi, Junkang Wu, Xuezhi Cao, Jiawei Chen, Wenqiang Lei, Wei Wu, and Xiangnan He. 2023. FFHR: Fully and Flexible Hyperbolic Representation for Knowledge Graph Completion. CoRR abs/2302.04088 (2023).
[46] Yongduo Sui, Tianlong Chen, Pengfei Xia, Shuyao Wang, and Bin Li. 2022. Towards robust detection and segmentation using vertical and horizontal adversarial training. In IJCNN. 1–8.
[47] Yongduo Sui, Xiang Wang, Tianlong Chen, Meng Wang, Xiangnan He, and Tat-Seng Chua. 2023. Inductive Lottery Ticket Learning for Graph Neural Networks. Journal of Computer Science and Technology (2023).
[48] Yongduo Sui, Xiang Wang, Jiancan Wu, Min Lin, Xiangnan He, and Tat-Seng Chua. 2022. Causal attention for interpretable and generalizable graph classification. In KDD. 1696–1705.
[49] Yongduo Sui, Qitian Wu, Jiancan Wu, Qing Cui, Longfei Li, Jun Zhou, Xiang Wang, and Xiangnan He. 2023. Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift. In NeurIPS.
[50] Susheel Suresh, Pan Li, Cong Hao, and Jennifer Neville. 2021. Adversarial Graph Augmentation to Improve Graph Contrastive Learning. In NeurIPS. 15920–15933.
[51] Jianheng Tang, Jiajin Li, Ziqi Gao, and Jia Li. 2022. Rethinking Graph Neural Networks for Anomaly Detection. In ICML. 21076–21089.
[52] Shuchang Tao, Qi Cao, Huawei Shen, Liang Hou, and Xueqi Cheng. 2021. Adversarial Immunization for Certifiable Robustness on Graphs. In WSDM.
[53] Shuchang Tao, Qi Cao, Huawei Shen, Junjie Huang, Yunfan Wu, and Xueqi Cheng. 2021. Single Node Injection Attack against Graph Neural Networks. In CIKM. 1794–1803.
[54] Shuchang Tao, Huawei Shen, Qi Cao, Yunfan Wu, Liang Hou, and Xueqi Cheng. 2023. Graph Adversarial Immunization for Certifiable Robustness. TKDE (2023).
[55] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
[56] Daixin Wang, Jianbin Lin, Peng Cui, Quanhui Jia, Zhen Wang, Yanming Fang, Quan Yu, Jun Zhou, Shuang Yang, and Yuan Qi. 2019. A semi-supervised graph attentive network for financial fraud detection. In ICDM. 598–607.
[57] Jianyu Wang, Rui Wen, Chunming Wu, Yu Huang, and Jian Xion. 2019. FdGars: Fraudster detection via graph convolutional networks in online app review system. In WWW (Companion Volume). 310–316.
[58] Shuyao Wang, Yongduo Sui, Jiancan Wu, Zhi Zheng, and Hui Xiong. 2024. Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation. In WSDM. ACM.
[59] Xiyuan Wang and Muhan Zhang. 2022. How Powerful are Spectral Graph Neural Networks. In ICML. 23341–23362.
[60] Yanling Wang, Jing Zhang, Shasha Guo, Hongzhi Yin, Cuiping Li, and Hong Chen. 2021. Decoupling representation learning and classification for gnn-based anomaly detection. In SIGIR. 1239–1248.
[61] Qitian Wu, Hengrui Zhang, Junchi Yan, and David Wipf. 2022. Handling Distribution Shifts on Graphs: An Invariance Perspective. In ICLR.
[62] Bingbing Xu, Huawei Shen, Qi Cao, Yunqi Qiu, and Xueqi Cheng. 2019. Graph Wavelet Neural Network. In ICLR (Poster). OpenReview.net.
[63] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks?. In ICLR.
[64] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. 2018. Representation Learning on Graphs with Jumping Knowledge Networks. In ICML. 5449–5458.
[65] Zhe Xu, Boxin Du, and Hanghang Tong. 2022. Graph Sanitation with Application to Node Classification. In WWW. ACM, 1136–1147.
[66] Xingtong Yu, Yuan Fang, Zemin Liu, and Xinming Zhang. 2023. HGPROMPT: Bridging Homogeneous and Heterogeneous Graphs for Few-shot Prompt Learning. CoRR abs/2312.01878 (2023).
[67] Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen, and Xinming Zhang. 2023. Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs. CoRR abs/2311.15317 (2023).
[68] Xingtong Yu, Chang Zhou, Yuan Fang, and Xinming Zhang. 2023. MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs. CoRR abs/2312.03731 (2023).
[69] Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. 2020. Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. In NeurIPS.

A PROOFS
In this section, the proofs of propositions are listed.

A.1 Proof of Proposition 1
Proof. In the spectral domain, the hidden representation of the spectral filter can be expressed as:

\[
H = \sum_{k} \alpha_k \tilde{\mathcal{L}}^{\,k} X = \sum_{k} \alpha_k \big(I - D^{-1/2} A D^{-1/2}\big)^{k} X. \tag{18}
\]

Taking the second-order spectral filter as an example,

\[
H_2 = \alpha_0 X + \alpha_1 \big(I - D^{-1/2} A D^{-1/2}\big) X + \alpha_2 \big(I - D^{-1/2} A D^{-1/2}\big)^{2} X. \tag{19}
\]

𝑥𝑢 ∼ 𝑁(𝜇1, I) and 𝑥𝑣 ∼ 𝑁(𝜇0, I), hence we know ℎ𝑢 and ℎ𝑣 should obey Gaussian distributions, whose means can be acquired as:

\[
\begin{aligned}
\mu_u &= \mu_1 - (p_1\mu_0 + q_1\mu_1) + p_1(p_0\mu_0 + q_0\mu_1) + q_1(p_1\mu_0 + q_1\mu_1) \\
      &= \mu_1 + p_1(p_0\mu_0 + q_0\mu_1 - p_1\mu_0 - q_1\mu_1) \\
      &= \mu_1 + p_1\big[(p_0 - p_1)\mu_0 + (q_0 - q_1)\mu_1\big] \\
\mu_v &= \mu_0 - q_0\big[(p_0 - p_1)\mu_0 + (q_0 - q_1)\mu_1\big]
\end{aligned} \tag{21}
\]

Hence the distance between the means of these two distributions is:

\[
\begin{aligned}
\|\mu_u - \mu_v\|_2 &= \|\mu_1 - \mu_0\|_2 + (p_1 + q_0)\,\|(p_0 - p_1)\mu_0 + (q_0 - q_1)\mu_1\|_2 \\
&= \|\mu_1 - \mu_0\|_2 + (1 + q_0 - q_1)\cdot|q_0 - q_1|\cdot\|\mu_1 - \mu_0\|_2 \\
&= \big[\,1 + |q_0 - q_1| + |(p_0 - p_1)(q_0 - q_1)|\,\big]\cdot\|\mu_1 - \mu_0\|_2
\end{aligned} \tag{22}
\]

Similarly, since |𝑞0 − 𝑞1| = |𝑝0 − 𝑝1|, we have:

\[
\|\mu_u - \mu_v\|_2 = \big[\,1 + |p_0 - p_1| + |(p_0 - p_1)(q_0 - q_1)|\,\big]\cdot\|\mu_1 - \mu_0\|_2 . \tag{23}
\]

In our paper, we adopt the Euclidean distance as the NLD distance:

\[
d(u, v) = \sqrt{(p_0 - p_1)^2 + (q_0 - q_1)^2} . \tag{24}
\]

Joining equations (23) and (24), we can rewrite the distance between the distribution mean values as:

\[
\|\mu_u - \mu_v\|_2 = \Big[\,1 + \frac{d(u,v)}{\sqrt{2}} + \frac{d(u,v)^{2}}{2}\,\Big]\cdot\|\mu_1 - \mu_0\|_2 . \tag{25}
\]

Likewise, the mean values of the hidden representation given by a 2-layer vanilla GCN are:

\[
\begin{aligned}
\mu_u &= p_1(p_0\mu_0 + p_1\mu_1) + q_1(p_1\mu_0 + p_0\mu_1) = \mu_0 + p_1^{2}(\mu_1 - \mu_0) + q_1 p_0(\mu_1 - \mu_0) \\
\mu_v &= p_0(p_0\mu_0 + p_1\mu_1) + q_0(p_1\mu_0 + p_0\mu_1) = \mu_0 + p_1 p_0(\mu_1 - \mu_0) + q_0 p_0(\mu_1 - \mu_0)
\end{aligned} \tag{26}
\]

Hence we have the distance between them:
Figure 9: The ego-graphs of some yellow-circled ego nodes classified as high-frequency by BioGNN in Amazon.
Table 5: Performance with limited label information and/or a small percentage of abnormal nodes.
Table 6: Performance with standard deviations.

Dataset      YelpChi                       Amazon                        T-Finance
Metric       F1-Macro      AUC             F1-Macro      AUC             F1-Macro      AUC
CAREGNN      0.5921±0.054  0.7617±0.018    0.8850±0.015  0.9092±0.021    0.7508±0.025  0.9161±0.006
PCGNN        0.6499±0.030  0.7985±0.011    0.8662±0.029  0.9571±0.020    0.5390±0.093  0.9162±0.039
BWGNN        0.7583±0.002  0.9011±0.004    0.9188±0.002  0.9724±0.002    0.8793±0.011  0.9517±0.008
GPRGNN       0.6005±0.030  0.7928±0.027    0.7253±0.067  0.9265±0.021    0.8283±0.022  0.9500±0.014
Ours         0.7632±0.006  0.8920±0.003    0.9368±0.009  0.9748±0.002    0.9047±0.001  0.9639±0.003

anomaly detection into a decision-making problem; DCI [60] decouples representation learning and classification with a self-supervised learning task. Recent methods realize the necessity of leveraging multi-relation graphs in GAD. FdGars [57] and GraphConsis [35] construct a single homo-graph with multiple relations. Likewise, Semi-GNN [56], CARE-GNN [17], and PC-GNN [34] construct multiple homo-graphs based on node relations. In addition, some works discover that heterophily should be addressed properly in GAD. Semi-GNN employs hierarchical attention mechanisms for interpretable prediction, while, based on camouflage behaviors and imbalanced problems, CARE-GNN, PC-GNN, and AO-GNN [29] prune edges adaptively according to the neighbor distribution. GDN [25] and H2-FDetector [43] adopt different strategies for anomalies and normal nodes.

D.2 Graph Spectral Filtering
Spectral GNNs simulate filters with different passbands in the spectral domain, enabling GNNs to work on both homophilic and heterophilic graphs [36, 59]. GPRGNN [10] adaptively learns the Generalized PageRank weights, regardless of whether the node labels are homophilic or heterophilic. FAGCN [3] adaptively fuses different signals in the process of message passing by employing a self-gating mechanism. BernNet [27] expresses the filtering operation with Bernstein polynomials. BWGNN [51] designs a band-pass filter to aggregate different frequency signals simultaneously. AMNet [4] aims to capture both low-frequency and high-frequency signals, and adaptively combines signals of different frequencies. GHRN [23] designs an edge indicator to distinguish homophilous and heterophilous edges. GBK-GNN [18] utilizes two kernels to aggregate homophilous and heterophilous neighbors, respectively. These methods can alleviate the heterophily problem; however, they train all the nodes as a whole, which suffers from “loss rivalry”.

D.3 Graph Sanitation
Graph sanitation aims to learn a modified graph G to boost the performance of the corresponding mining model; that is, these methods have a converged classifier and modify the graph topology to fit this model [15]. GASOLINE [65] formulates the graph sanitation problem as a bi-level optimization problem, and further instantiates it with semi-supervised node classification. Pro-GNN [30] jointly learns a structural graph and a robust graph neural network model from the perturbed graph. IDGL [7] iteratively learns a better graph structure based on better node embeddings, and learns better node embeddings based on a better graph structure. PTDNet [37] prunes task-irrelevant edges by penalizing the number of edges in the sparsified graph with parameterized networks. Ada-UGNN [40] solves a graph denoising problem with a smoothness assumption, and handles graphs with adaptive smoothness across nodes. Although there are similarities between BioGNN and graph sanitation methods, BioGNN is trained from scratch without a converged classifier.