Graph Diffusion Models for Anomaly Detection

Zekuan Liu, Huijun Yu, Yao Yan, Ziqing Hu, Pankaj Rajak, Amila Weerasinghe,
Olcay Boz, Deepayan Chakrabarti, Fei Wang
{lzekuan,huijuyu,ynyao,ziqinghu,rajakpan,weera,olcayboz,deepayc,fiwan}@amazon.com
Amazon
Seattle, Washington, USA

ABSTRACT

Anomaly detection on graphs focuses on identifying irregular patterns or anomalous nodes within graph-structured data that deviate significantly from the norm. This domain is important because of its wide applicability in fields such as spam detection, anti-money laundering, and network security. When applying anomaly detection to graphs, tackling the challenges posed by label imbalance and data insufficiency is essential. The recent proliferation of generative models, especially diffusion models, offers a promising path. In this paper, we introduce a graph diffusion model in latent space, designed to alleviate the label imbalance problem prevalent in anomaly detection on graphs. The proposed model is capable of multitask generation of graph structures and node features, and is further endowed with conditional generative capabilities to produce only positive examples, thereby mitigating label imbalance issues. We extend the diffusion model so that it applies to both homogeneous and heterogeneous graphs. Through extensive experiments, we demonstrate that our proposed method offers notable improvements over conventional techniques.

CCS CONCEPTS

• Computing methodologies → Anomaly detection; • Mathematics of computing → Graph algorithms; • Computer systems organization → Neural networks.

KEYWORDS

Anomaly Detection, Graph Neural Networks, Diffusion Models

ACM Reference Format:
Zekuan Liu, Huijun Yu, Yao Yan, Ziqing Hu, Pankaj Rajak, Amila Weerasinghe, Olcay Boz, Deepayan Chakrabarti, Fei Wang. 2018. Graph Diffusion Models for Anomaly Detection. In Proceedings of Make sure to enter the correct conference title from your rights confirmation emai (Conference acronym ’XX). ACM, New York, NY, USA, 6 pages. https://doi.org/XXXXXXX.XXXXXXX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-XXXX-X/18/06
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION

Anomaly detection has always been a critical task in machine learning and data mining applications, due to its significance in identifying irregular patterns that deviate from the norm in large datasets. These anomalies often correspond to critical, actionable insights in various domains such as spam detection, network security, and health monitoring. The importance of accurate and efficient anomaly detection on graphs is further underscored in the web era, where interconnected data proliferates at an unprecedented rate. The interconnected nature of these data sources, often represented as graphs, amplifies the complexity of detecting anomalies. Graphs, which effectively capture relationships and structural information, are pivotal in understanding the interactions in diverse applications ranging from social networks to biological networks.

However, the task of anomaly detection in graphs presents unique challenges. The complexity of graph structures, which include nodes, edges, and their attributes, requires sophisticated algorithms capable of capturing both local and global irregularities. Traditional methods, which predominantly focus on point anomalies in tabular data, are insufficient for graph data due to its intrinsic relational information and high dimensionality. Furthermore, the dynamic nature of graphs, where nodes and edges can evolve over time, necessitates algorithms that can adapt and detect anomalies in a continuously changing environment.

Recent advancements in graph mining techniques, particularly in the context of deep learning, have opened new avenues for addressing these challenges. Graph neural networks (GNNs) learn complex patterns and dependencies within graph data, and have shown promising results in identifying anomalies on graphs. Yet the application of these advanced techniques raises additional considerations. As with anomaly detection on tabular data, anomaly detection on graphs faces the challenge of label imbalance, where the number of anomaly examples is significantly lower than the number of benign examples. Label imbalance can degrade the performance of anomaly detection.

The advent of diffusion models in the field of generative modeling marks a significant milestone, particularly in synthesizing high-quality data representations. Originating from the domain of image generation, these models have demonstrated an unprecedented ability to capture intricate data distributions. In essence, a diffusion model operates through a gradual process of noising and denoising data, where the original data distribution is incrementally corrupted with noise, followed by a learned reverse process that denoises this data back to its original distribution. This iterative approach enables the model to capture complex, high-dimensional data distributions effectively. While their application has been predominantly in image synthesis, the potential of diffusion models in other domains, such as anomaly detection, especially on graph data, remains unexplored.

To bridge this gap, this study proposes a novel diffusion model based graph generator. Our generator uses a graph autoencoder to encode the graph and applies a diffusion model in the latent space. The
generator generates graph structure and node features simultaneously in a multitask fashion. It offers a unique capability to generate positive (anomaly) examples, thus effectively alleviating the problem of label imbalance. It is also able to generate heterogeneous graphs with a heterogeneous generator. Adding the generated positive examples to the training set provides a relatively balanced dataset for training downstream anomaly detection, and alleviates the label imbalance problem in anomaly detection on graphs.

The contributions of this project can be summarized in the following three parts:

• We propose a diffusion model based graph generator that is capable of generating heterogeneous graph structure and node features simultaneously, conditioned on the positive label.
• We further alleviate the label imbalance problem in the general framework of anomaly detection on graphs.
• Experimental results on multiple datasets show that our generator outperforms baselines. An ablation study demonstrates the effectiveness of each component of our generator.

The remainder of the paper is organized as follows: after this introduction, we delineate previous research, followed by the methodology, featuring the architecture of our diffusion model based graph generator and its components. This is followed by rigorous empirical evaluations, including ablation studies to assess the effectiveness of each component and case studies for an in-depth understanding of generated features. Finally, we summarize our contributions and explore avenues for future research.

2 RELATED WORKS

Anomaly Detection on Graphs has emerged as a critical area of research, distinguished by unique challenges associated with the complex nature of graph structures. Early methods focused on clustering and proximity-based techniques, which were adept at handling anomalies in simpler graphs but faltered when applied to the intricate relationships inherent in complex graph data [15]. The introduction of GNNs marked a significant shift, offering a more nuanced approach to modeling relational data. GNNs, through their capacity to encapsulate both node-level and graph-wide patterns, have demonstrated considerable success in detecting anomalies across a range of applications, from fake news detection to anti-money laundering [1, 8, 9, 14]. However, the label imbalance challenge persists, particularly in adapting these models to heterogeneous graphs [17], where the graph contains different types of nodes and edges. The problem is more complicated if the graph is dynamic.

Graph Generators have gained attention as a potent tool for addressing the data scarcity and synthetic data generation challenges in graph-based applications. Conventional methods adopt an autoregressive paradigm to generate graphs [16]. Recently, the development of generative models, particularly diffusion models, has been a noteworthy advancement. Originating in the domain of image synthesis, these models have demonstrated a unique capability in capturing complex data distributions through iterative noising and denoising processes [3]. Their application to graph data is relatively new but promising, offering a method to generate realistic graph structures and node features that mirror the intricacies of real-world graphs [13]. This capability is particularly valuable in exploring the structural variability of graphs and understanding how anomalies manifest in different graph contexts.

Label Imbalance in anomaly detection presents a significant hurdle, particularly in graph data where anomalies are naturally rare and often overshadowed by the majority of normal instances. This imbalance leads to models that are biased towards the majority class, thereby diminishing their effectiveness in identifying true anomalies. Some works adopt interpolation based methods to address label imbalance [18], and others use data augmentation [7]. The integration of graph generators offers a novel solution to this problem. By generating synthetic anomalies, these models can improve datasets, creating a more balanced landscape for model training and evaluation.

3 METHODOLOGY

To alleviate the label imbalance problem in anomaly detection, this study proposes a diffusion model based graph generator to generate synthetic anomaly nodes within a heterogeneous graph structure. The overview of the proposed generator is shown in Figure 1. During the training phase, the generator is trained solely on the real training data. The generator can then generate synthetic graphs with fraudulent nodes. In the downstream application, these synthetic data, in conjunction with the real training data, are used to train the anomaly detection model. This section covers the problem definition, how we generate the graph, and the details of the diffusion model.

3.1 Problem Definition

Our primary objective is to enrich a dataset with anomaly nodes for robust training of anomaly detection systems. These synthetic nodes encompass four components:

• Node Feature: Attributes associated with the node within the graph.
• Node Timestamp: A temporal stamp related to the node.
• Positive Label: The positive label denoting the anomalous nature of the node.
• Heterogeneous Graph Structure: Different types of nodes and edge connections.

Each of these components is generated through a sequence of systematic steps, elaborated in the subsequent subsections.

3.2 Encoding with Graph Autoencoder

In order to encode our anomaly detection graphs into a latent space for the diffusion model, we adopt a graph autoencoder. For graph data, which inherently possesses a complex and multifaceted nature, including node attributes and graph topology, a specialized design is imperative. This complexity arises from the interconnectedness and the structural dependencies within the graph, which are not present in traditional tabular data. Therefore, the design of the graph autoencoder needs to address these unique characteristics.

3.2.1 Generating Node Attributes. We first apply a traditional autoencoder to handle node attribute generation. An autoencoder is a type of artificial neural network used to learn efficient representations (encodings) of data, typically for the purpose of
dimensionality reduction. Its architecture is characterized by a two-part structure: the encoder and the decoder. The encoder of an autoencoder is responsible for transforming the input data into a lower-dimensional latent space. This process involves a series of layers that gradually compress the input data, extracting and retaining the most salient features. The encoder’s primary objective is to learn a compressed latent representation of the input data that encapsulates its essential characteristics, thereby reducing its dimensionality while preserving its significant attributes.

[Figure 1 diagram. Labels: Input Graph, GNN Encoder, Diffusion Process, Diffusion Model, MLP (× T−1 times), MLPs Decoder, Generated Graph, Graph Autoencoder; regions: Graph Space, Latent Space.]

Figure 1: An overview of the proposed diffusion model based graph generator. We first encode the input graph to latent space with a GNN encoder. The diffusion process gradually adds noise to the embedding over 𝑇 steps, while the MLP denoises the embedding 𝑇 times. Finally, the decoder reconstructs the graph from the latent embedding.

Following the encoding process, the decoder reconstructs the input data from the condensed representation in the latent space. The goal of the decoder is to generate an output that closely approximates the original input data, using the compressed information encoded by the encoder. This reconstruction process is crucial, as it ensures that the learned representations in the latent space are meaningful and informative, capturing the intrinsic patterns and structures of the input data.

The training of an autoencoder involves adjusting the weights of the network to minimize the difference between the original input and its reconstruction, typically using a loss function like mean squared error (MSE). Through this process, the autoencoder learns to prioritize the most significant features in the input data, effectively learning a compressed but informative representation.

As an enhanced variant of the traditional autoencoder, the Variational Autoencoder (VAE) introduces a probabilistic approach, which improves generalization. Specifically, in a VAE, the encoder predicts two parameters of a Gaussian distribution over the latent variable (the approximate posterior): mean (𝜇) and standard deviation (𝜎). During the forward pass, a sample is drawn from the distribution defined by these parameters, which is then decoded to reconstruct the input data. The VAE loss function is a sum of two terms: a reconstruction term, which is the MSE between the input and the reconstructed output, and a regularization term, which is the KL divergence between the learned distribution and a standard Gaussian distribution.

3.2.2 Conditional Generation for Only Nodes with Positive Label. In our approach, we aim to enhance the control over the generation of node types, particularly differentiating between anomalous and normal nodes. To achieve this, we have incorporated conditional variables into the VAE. This enables the model to factor in the additional label information during the generation process.

During the training phase of the VAE, we concatenate the labels indicating the node type (anomalous or normal) with their respective feature vectors. This concatenated data is then fed into the encoder of the VAE. The encoder thus learns a latent representation that encapsulates not only the features of the nodes but also their corresponding labels.

In the generation phase, the decoder of the VAE is explicitly conditioned on these labels. This conditioning is pivotal as it directs the decoder to generate feature vectors that are aligned with the specified labels. Consequently, when the label indicates ’fraudulent’, the decoder is steered to produce feature vectors that are characteristic of fraudulent nodes.

This methodology allows for a more controlled generation of nodes, enhancing the model’s ability to differentiate and generate distinct types of nodes based on their underlying characteristics. Such an approach is particularly beneficial in scenarios like fraud detection, where the distinction between normal and anomalous behavior is crucial for effective model performance.

3.2.3 Generating Heterogeneous Graph Structure. To generate the graph structure, we use the Variational Graph Autoencoder (VGAE) [6]. This approach adapts the architecture of a VAE to graph-based data, enabling the encoder to capture both topological and feature information of the graph in a latent representation. In turn, the decoder focuses on reconstructing the graph structure. Our model is specifically tailored for multi-task learning, leveraging a shared GNN as the encoder. This encoder is central to our approach, as it processes the graph structure and node features simultaneously, embedding them into a latent space.

The decoder is split into two separate parts, each with a distinct role. The first decoder is dedicated to feature reconstruction, while the second focuses on graph structure. This split is pivotal in addressing the distinct aspects of our graph data, namely, the node features and the graph topology. The VGAE model is
inherently aligned with the framework of a traditional VAE, where the loss function comprises two components: the reconstruction loss and the Kullback-Leibler (KL) divergence. These components are critical as they collectively cater to both feature and graph structure generation tasks within our model.

However, our research addresses a graph that is inherently heterogeneous, encompassing multiple types of nodes and edges. This complexity necessitates a modification of the VGAE framework. Consequently, we substitute the standard VGAE encoder with a Heterogeneous Graph Transformer (HGT), as introduced by [4]. The HGT is adept at handling the diverse and complex nature of our graph, allowing for a more nuanced understanding and processing of the heterogeneous elements.

Additionally, we implement separate decoders for different types of edges. This differentiation is essential for accurately generating adjacency matrices specific to each edge type in our heterogeneous graph. The use of distinct decoders enables us to tailor the reconstruction process for each edge type, thereby enhancing the fidelity of the reconstructed heterogeneous graph.

3.2.4 Timestamp Generation. In the proposed methodology, a distinct temporal generator is designed and deployed specifically for the generation of timestamps. This component shares the architecture of the feature generator; however, its output is one-dimensional. The output of this temporal generator is a continuous variable representing the time aspect of the data.

To train the temporal generator, a Mean Squared Error (MSE) loss function is employed. This choice of loss function is well suited to regression tasks, where the goal is to minimize the average squared difference between the estimated values and the actual values. In the context of this research, the MSE loss function guides the temporal generator to produce timestamps that closely align with the true temporal characteristics of the dataset.

3.3 Diffusion Model in Latent Space

With the graph autoencoder described above, we are able to generate synthetic anomalies. However, owing to the complexity of the graph structure, the divergence between the prior distribution and the data distribution can be large, resulting in suboptimal performance in the downstream anomaly detection. As such, we propose to further apply a diffusion model in latent space for better generation quality.

In this paper, we adopt DDPM [3]. It is mathematically formulated as:

$$p_\theta(\mathbf{z}_0) = \int p_\theta(\mathbf{z}_{0:T}) \, d\mathbf{z}_{1:T}, \tag{1}$$

where $\mathbf{z}_1, \dots, \mathbf{z}_T$ represent latent variables of the same dimensionality as the data $\mathbf{z}_0 \sim q(\mathbf{z}_0)$. The joint distribution $p_\theta(\mathbf{z}_{0:T})$ is known as the reverse process. This process is a Markov chain initiated from a standard Gaussian distribution $p(\mathbf{z}_T) = \mathcal{N}(\mathbf{z}_T; \mathbf{0}, \mathbf{I})$ and evolves with learned Gaussian transitions, defined as:

$$p_\theta(\mathbf{z}_{0:T}) = p(\mathbf{z}_T) \prod_{t=1}^{T} p_\theta(\mathbf{z}_{t-1} \mid \mathbf{z}_t), \tag{2}$$

$$p_\theta(\mathbf{z}_{t-1} \mid \mathbf{z}_t) = \mathcal{N}(\mathbf{z}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{z}_t, t), \boldsymbol{\Sigma}_\theta(\mathbf{z}_t, t)). \tag{3}$$

A distinct characteristic of diffusion models is the forward diffusion process, an approximate posterior $q(\mathbf{z}_{1:T} \mid \mathbf{z}_0)$. This process is a fixed Markov chain that incrementally introduces Gaussian noise into the data based on a variance schedule $\beta_1, \dots, \beta_T$:

$$q(\mathbf{z}_{1:T} \mid \mathbf{z}_0) = \prod_{t=1}^{T} q(\mathbf{z}_t \mid \mathbf{z}_{t-1}), \tag{4}$$

$$q(\mathbf{z}_t \mid \mathbf{z}_{t-1}) = \mathcal{N}(\mathbf{z}_t; \sqrt{1 - \beta_t}\, \mathbf{z}_{t-1}, \beta_t \mathbf{I}). \tag{5}$$

The training of diffusion models involves optimizing the variational bound $L$ on the negative log likelihood:

$$\mathbb{E}[-\log p_\theta(\mathbf{z}_0)] \le \mathbb{E}_q\left[ -\log p(\mathbf{z}_T) - \sum_{t \ge 1} \log \frac{p_\theta(\mathbf{z}_{t-1} \mid \mathbf{z}_t)}{q(\mathbf{z}_t \mid \mathbf{z}_{t-1})} \right] =: L. \tag{6}$$

The variance parameters $\beta_t$ of the forward process can either be learned through reparameterization or set as fixed hyperparameters. The reverse process is made expressive, particularly by the choice of Gaussian conditionals in $p_\theta(\mathbf{z}_{t-1} \mid \mathbf{z}_t)$, especially when the $\beta_t$ values are small. A notable aspect of the forward process is the ability to sample $\mathbf{z}_t$ at any timestep $t$ in closed form. This is facilitated by the definitions $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. The sampling is:

$$q(\mathbf{z}_t \mid \mathbf{z}_0) = \mathcal{N}(\mathbf{z}_t; \sqrt{\bar{\alpha}_t}\, \mathbf{z}_0, (1 - \bar{\alpha}_t) \mathbf{I}). \tag{7}$$

Efficient training is feasible by optimizing random terms of $L$ using stochastic gradient descent. Additional improvements in training are achieved through variance reduction. This is done by reformulating $L$ (from Eq. 6) as:

$$\mathbb{E}_q\Big\{ \underbrace{\mathrm{KL}[q(\mathbf{z}_T \mid \mathbf{z}_0) \,\|\, p(\mathbf{z}_T)]}_{L_T} + \sum_{t > 1} \underbrace{\mathrm{KL}[q(\mathbf{z}_{t-1} \mid \mathbf{z}_t, \mathbf{z}_0) \,\|\, p_\theta(\mathbf{z}_{t-1} \mid \mathbf{z}_t)]}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{z}_0 \mid \mathbf{z}_1)}_{L_0} \Big\}. \tag{8}$$

Equation 8 uses the Kullback-Leibler (KL) divergence to compare $p_\theta(\mathbf{z}_{t-1} \mid \mathbf{z}_t)$ against the tractable forward process posteriors conditioned on $\mathbf{z}_0$. The expressions for the posteriors are:

$$q(\mathbf{z}_{t-1} \mid \mathbf{z}_t, \mathbf{z}_0) = \mathcal{N}(\mathbf{z}_{t-1}; \tilde{\boldsymbol{\mu}}_t(\mathbf{z}_t, \mathbf{z}_0), \tilde{\beta}_t \mathbf{I}), \tag{9}$$

where

$$\tilde{\boldsymbol{\mu}}_t(\mathbf{z}_t, \mathbf{z}_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, \mathbf{z}_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, \mathbf{z}_t \quad \text{and} \quad \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t.$$

Since all KL divergences in Equation 8 compare Gaussians, they can be computed in a Rao-Blackwellized manner using closed-form expressions instead of Monte Carlo estimates with high variance.

4 EXPERIMENTS

4.1 Experimental Setup

To comprehensively evaluate the performance of our proposed graph generator, we compared it against baseline methods, including simple reweighting and the graph autoencoder (GAE) alone, without the diffusion model. The baselines are built on
top of various GNNs, including GCN [5], GraphSAGE [2], GAT [12], and Graph Transformer (GT) [10]. We opted for three metrics that are robust to label imbalance: Area Under the Receiver Operating Characteristic Curve (AUROC), Area Under the Precision Recall Curve (AUPRC), and Recall@𝑘, where 𝑘 is the number of anomalies in the ground truth label.

4.2 Datasets

Following previous work [11], we conducted experiments on five distinct datasets, characterized by varying scales and label imbalances. The statistics of the datasets are presented in Table 2.

Table 2: The statistics of the datasets.

Dataset    Nodes      Edges       Attr.  Ratio   Etypes  Time
reddit     10,984     168,016     64     3.30%   1       ×
weibo      8,405      407,963     400    10.30%  1       ×
tfinance   39,357     21,222,543  10     4.60%   1       ×
amazon     11,944     4,398,392   25     9.50%   3       ×
dgraph     3,700,550  4,300,999   17     1.30%   11      ✓

4.3 Experimental Results

Table 1: Anomaly Detection Performance Comparison Over Different Datasets.

Metric    Dataset   GCN     GraphSAGE  GAT     GT      GAE     Diffusion
AUROC     reddit    0.6085  0.6550     0.6671  0.6436  0.6719  0.6735
AUROC     weibo     0.9820  0.9655     0.9406  0.9651  0.9841  0.9882
AUROC     amazon    0.8237  0.8802     0.9778  0.9086  0.9624  0.9745
AUROC     tfinance  0.9361  0.8042     0.9412  0.8409  0.9471  0.9700
AUROC     dgraph    0.7562  0.7559     0.7603  0.7582  0.7594  0.7604
AUPRC     reddit    0.0439  0.0622     0.0641  0.0593  0.0681  0.0698
AUPRC     weibo     0.9344  0.8926     0.9025  0.9042  0.9479  0.9489
AUPRC     amazon    0.3347  0.6061     0.8846  0.7656  0.8584  0.8922
AUPRC     tfinance  0.7652  0.1468     0.6745  0.2676  0.6968  0.8874
AUPRC     dgraph    0.0385  0.0375     0.0390  0.0386  0.0390  0.0390
Recall@K  reddit    0.0476  0.0884     0.0612  0.1020  0.1088  0.1088
Recall@K  weibo     0.8963  0.8646     0.8674  0.8530  0.9107  0.9222
Recall@K  amazon    0.3750  0.6413     0.8424  0.7554  0.8315  0.8587
Recall@K  tfinance  0.7143  0.1678     0.6976  0.3204  0.7074  0.8294
Recall@K  dgraph    0.0679  0.0735     0.0752  0.0774  0.0765  0.0739

As shown in Table 1, our proposed method outperforms the established baselines in a majority of the test scenarios. Particularly noteworthy is the performance on the tfinance dataset, where our generator achieves a significant improvement in AUROC even though the baselines already perform well: the AUROC rises to 0.97, up from 0.94 for GAT.

Further analysis reveals the integral role of the diffusion model in our framework. Comparing against GAE, which omits the diffusion component but keeps the variational autoencoder parts, we observe a marked improvement in performance when the diffusion model is included. This comparison not only highlights the quantitative improvements afforded by our method but also qualitatively underscores the necessity of the diffusion model in our graph generator.

4.4 Case Study

[Figure 2 plot: histogram; y-axis: Frequency; legend: Real, Diffusion, GAE.]

Figure 2: Distribution comparison of a one-dimensional node feature between real data and generated data in the DGraph dataset.

In Figure 2, we visualize the data generated by GAE and by our generator, compared with the real data. From the figure, we can observe that the data generated by our diffusion model based generator is closer to the real data in distribution, while GAE can only generate a Gaussian distribution.

5 CONCLUSION

In this research, we propose a novel diffusion model based graph generator operating in latent space, specifically engineered to address the challenges posed by label imbalance in graph-based anomaly detection. The model generates graph structures and node features simultaneously in a multitask fashion. Furthermore, it is equipped with conditional generation, enabling the selective generation of positive examples, which counteracts the prevalent issue of label imbalance.
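As a concrete illustration of the latent diffusion described in Section 3.3, the following is a minimal sketch of the closed-form forward sampling (Eq. 7) and the forward-process posterior parameters (Eq. 9). It is not the authors' implementation: the linear variance schedule, its endpoint values, the number of steps, and the function names `make_schedule`, `q_sample`, and `posterior_params` are all assumptions made for illustration, and the latent vector is a plain Python list standing in for an encoder output.

```python
import math
import random


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T (assumed values; requires T >= 2)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]          # alpha_t = 1 - beta_t
    alpha_bars = []                            # alpha_bar_t = prod_{s<=t} alpha_s
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def q_sample(z0, t, alpha_bars, rng):
    """Draw z_t ~ q(z_t | z_0) in closed form (Eq. 7); t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    return [
        math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for z in z0
    ]


def posterior_params(z0, zt, t, betas, alphas, alpha_bars):
    """Mean and scalar variance of q(z_{t-1} | z_t, z_0) (Eq. 9), for t >= 2."""
    a_bar_t = alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2]
    beta_t = betas[t - 1]
    coef_z0 = math.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_zt = math.sqrt(alphas[t - 1]) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    mean = [coef_z0 * a + coef_zt * b for a, b in zip(z0, zt)]
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    return mean, var


# Hypothetical usage: noise a toy 4-dimensional latent to step t = 500.
rng = random.Random(0)
betas, alphas, alpha_bars = make_schedule()
z0 = [0.5, -1.0, 2.0, 0.0]                     # stand-in for an encoded node
zt = q_sample(z0, 500, alpha_bars, rng)
mean, var = posterior_params(z0, zt, 500, betas, alphas, alpha_bars)
```

Because Eq. 7 gives $\mathbf{z}_t$ directly from $\mathbf{z}_0$, a training step can pick a random timestep per sample without simulating the whole chain, which is what makes optimizing random terms of $L$ practical.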
REFERENCES

[1] Yingtong Dou, Kai Shu, Congying Xia, Philip S Yu, and Lichao Sun. 2021. User preference-aware fake news detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2051–2055.
[2] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. Advances in neural information processing systems 30 (2017).
[3] Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851.
[4] Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. 2020. Heterogeneous graph transformer. In Proceedings of the web conference 2020. 2704–2710.
[5] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[6] Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
[7] Fanzhen Liu, Xiaoxiao Ma, Jia Wu, Jian Yang, Shan Xue, Amin Beheshti, Chuan Zhou, Hao Peng, Quan Z Sheng, and Charu C Aggarwal. 2022. Dagad: Data augmentation for graph anomaly detection. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 259–268.
[8] Kay Liu, Yingtong Dou, Yue Zhao, Xueying Ding, Xiyang Hu, Ruitong Zhang, Kaize Ding, Canyu Chen, Hao Peng, Kai Shu, et al. 2022. Bond: Benchmarking unsupervised outlier node detection on static attributed graphs. Advances in Neural Information Processing Systems 35 (2022), 27021–27035.
[9] Kay Liu, Yingtong Dou, Yue Zhao, Xueying Ding, Xiyang Hu, Ruitong Zhang, Kaize Ding, Canyu Chen, Hao Peng, Kai Shu, et al. 2022. Pygod: A python library for graph outlier detection. arXiv preprint arXiv:2204.12095 (2022).
[10] Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. 2020. Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509 (2020).
[11] Jianheng Tang, Fengrui Hua, Ziqi Gao, Peilin Zhao, and Jia Li. 2023. GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection. arXiv preprint arXiv:2306.12251 (2023).
[12] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[13] Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. 2022. Digress: Discrete denoising diffusion for graph generation. arXiv preprint arXiv:2209.14734 (2022).
[14] Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I Weidele, Claudio Bellei, Tom Robinson, and Charles E Leiserson. 2019. Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv preprint arXiv:1908.02591 (2019).
[15] Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas AJ Schweiger. 2007. Scan: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 824–833.
[16] Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. 2018. Graphrnn: Generating realistic graphs with deep auto-regressive models. In International conference on machine learning. PMLR, 5708–5717.
[17] Jianan Zhao, Xiao Wang, Chuan Shi, Zekuan Liu, and Yanfang Ye. 2020. Network schema preserving heterogeneous information network embedding. In International joint conference on artificial intelligence (IJCAI).
[18] Tianxiang Zhao, Xiang Zhang, and Suhang Wang. 2021. Graphsmote: Imbalanced node classification on graphs with graph neural networks. In Proceedings of the 14th ACM international conference on web search and data mining. 833–841.

Received 20 February 2007; revised 12 March 2009; accepted 5 June 2009
