Graph Anomaly Detection
Levels of analysis range from the whole network (e.g., patterns, laws, connectivity) down to individual nodes/links. Anomaly detection sits (mostly) at the node/link level.
• Level 1: diameter, connectivity, graph-level classification, graph-level embedding, graph kernel, graph structure learning, graph generator,…
• Level 2: frequent subgraphs, clustering, community detection, motif, teams, dense subgraphs, subgraph matching, NetFair, …
• Level 3: node proximity, node classification, link prediction, anomaly detection, node embedding, network alignment, NetFair, …
• Beyond: network of X, …
An Atlas of Network Science
Lecture Topics: Network of Networks, Network Alignment, Anomaly Detection, Knowledge Graphs, Matrix & Tensor, NetFair, GNNs, GCOs, …
MAP (Meta Approaches): consistency principle, data compression, spectrum, perturbative analysis, sub-modularity, bilevel optimization, alternating optimization, message passing
Outliers vs. Graph anomalies (our focus)

Applications
Healthcare fraud, network intrusion, malware, investment fraud, click fraud, spyware, insurance fraud, malicious cargo, insider threat, web spam, image/video surveillance, and many more…
Anomaly detection: definition
◼ Hawkins’ definition of outlier (1980): “An outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism.”
Anomaly detection: definition
◼ for practical purposes, a record/point/graph-node/graph-edge is flagged as anomalous if a rarity/likelihood/outlierness score exceeds a user-defined threshold
◼ anomalies are:
→ rare (e.g., a rare combination of categorical attribute values)
→ isolated points in n-dimensional spaces
→ surprising (they don’t fit well in our mental/statistical model, i.e., need too many bits under MDL)
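The three characterizations above (rare, isolated, surprising) can be made concrete with a simple distance-based outlierness score. A minimal sketch, where the toy 2-d data, the choice of k, and the threshold are all illustrative and not from the slides:

```python
# Toy distance-based outlier scoring: a point's outlierness is its
# distance to its k-th nearest neighbor; flag scores above a
# user-defined threshold, as in the practical definition above.

def knn_outlier_scores(points, k=2):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbor
    return scores

# A tight cluster plus one isolated point in 2-d.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(data, k=2)
flagged = [i for i, s in enumerate(scores) if s > 1.0]  # illustrative threshold
```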
Challenges
◼ Unbalanced: fraud is rare
◼ Unlabeled: no ground truth
◼ Large data (Volume)
◼ Temporal records (Velocity)
◼ Categorical, numerical, relational, … (Variety)
Why graph-based detection?
◼ Powerful representation
❑ Interdependent instances
❑ Long-range relations
❑ Node/Edge attributes (data complexity)
❑ Hard to fake/alter (adversarial robustness)
◼ Abundant relational data
❑ Web, email, phone call, …
◼ Nature of applications
❑ organized fraud (group activity, e.g., fraudsters +
accomplice)
Real graphs (1)
Internet map, food web, terrorist network, biological networks, blog networks, web graph
Real graphs (2)
Retail networks, protein-protein interaction, social networks, power grid, dating networks
Problem revisited for graphs
◼ Different problem settings along two dimensions
❑ Plain vs. attributed graphs
❑ Static vs. dynamic graphs
Taxonomy
Graph Anomaly Detection
❑ Plain graphs
◼ Feature based: structural features, recursive features
◼ Structure based: community based
❑ Attributed graphs
◼ Structure based: substructures, subgraphs
◼ Distance based: feature distance, structure distance
◼ Community based: “phase transition”
❑ Learning models: RMNs, PRMs, RDNs, MLNs
❑ Inference: iterative classification, belief propagation, relational network classification
Goal of this lecture
◼ Introduce various problem formulations
❑ Definitions change by application/representation
◼ Applications of problem settings
❑ Intrusion, fraud, spam
◼ Introduce existing techniques
❑ Model fitting, factorization, relational inference
◼ Pros and Cons
❑ Parameters, scalability, robustness
Outline
◼ Motivation, applications, challenges
◼ Part I: Anomaly detection in static data
❑ Overview: Outliers in clouds of points
❑ Anomaly detection in graph data
Part I: Outline
◼ Overview: Outliers in clouds of points
❑ Outliers in numerical data points
◼ distance-based, density-based, …
❑ Outliers in categorical data points
◼ model-based
Outlier detection (see Chapter 11 of the DM textbook)
◼ Anomalies in multi-dimensional data points
❑ Density-based
❑ Distance-based
❑ Depth-based
❑ Distribution-based
❑ Clustering-based
❑ Classification-based
❑ Information theory-based
❑ Spectrum-based
❑ …
◼ No relational links between points
Part I: References (outliers)
◼ M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF:
Identifying density-based local outliers. SIGMOD, 2000.
◼ S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C.
Faloutsos. LOCI: Fast outlier detection using the local
correlation integral. ICDE, 2003.
◼ C. C. Aggarwal and P. S. Yu. Outlier detection for high
dimensional data. SIGMOD, 2001.
◼ A. Ghoting, S. Parthasarathy and M. Otey, Fast Mining of
Distance Based Outliers in High-Dimensional Datasets.
DAMI, 2008.
◼ Y. Wang, S. Parthasarathy and S. Tatikonda, Locality
Sensitive Outlier Detection. ICDE, 2011.
◼ Kaustav Das, Jeff Schneider. Detecting Anomalous
Records in Categorical Datasets. KDD 2007.
Part I: References (outliers)
◼ Müller E., Schiffer M., Seidl T. Adaptive Outlierness for
Subspace Outlier Ranking. CIKM, 2010.
◼ Müller E., Assent I., Iglesias P., Mülle Y., Böhm K.
Outlier Ranking via Subspace Analysis in Multiple Views
of the Data. ICDM, 2012.
◼ L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast
and Reliable Anomaly Detection in Categoric Data.
CIKM, 2012.
◼ A. Chaudhary, A. S. Szalay, and A. W. Moore. Very fast
outlier detection in large multidimensional data sets.
DMKD, 2002.
◼ Survey: V. Chandola, A. Banerjee, V. Kumar: Anomaly
Detection: A Survey. ACM Computing Surveys, Vol.
41(3), Article 15, July 2009.
Akoglu et al. ’10
Anomalies in Weighted Graphs
◼ Problem:
Q1. Given a weighted and unlabeled graph, how can we spot strange, abnormal, extreme nodes?
OddBall: approach
1) For each node:
1.1) Extract its “ego-net” (= 1-step neighborhood)
1.2) Extract features (#edges, total weight, etc.)
→ features that could yield “laws”
→ features fast to compute and interpret
2) Detect patterns → regularities
3) Detect anomalies → “distance” to the patterns
What is odd?
• In terms of topology:
• A near-star: telemarketer, port scanner, people adding friends indiscriminately
• A near-clique: terrorists, a tightly connected group of people, an ambitious political discussion group in a post network
• In terms of weights:
• Different weight distributions on edges are expected: a person has close friends, closer friends, a closest friend, etc.
• A single heavy link: a tight pair, single-minded, a tight company
• Uniform weights: robot-like behavior
Which features to compute?
▪ Ni: number of neighbors (degree) of ego i
▪ Ei: number of edges in egonet i
▪ Wi: total weight of egonet i
▪ λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i

OddBall: pattern#1
Egonet Density Power Law: Ei ∝ Ni^α, 1 ≤ α ≤ 2 (observed slope 1.35)
→ near slope 2 (near-clique): discussion group, “rank boosting”, etc.
→ near slope 1 (near-star): telemarketer, spammer, port scanner, “popularity contests”, etc.
OddBall: pattern#2
Egonet Weight Power Law: Wi ∝ Ei^β, β ≥ 1 (observed slope 1.08)
→ high total weight W vs. #edges E: high $ vs. #accounts, high $ vs. #donors, etc.
→ slope 1: uniform, robot-like behavior
OddBall: pattern#3
Egonet λw Power Law: λw,i ∝ Wi^γ, 0.5 ≤ γ ≤ 1
→ slope 1 (λw,i ≈ W): a single edge carries (nearly) all the weight
OddBall: anomaly detection
score_dist = distance to the fitting line
score_outl = outlier-ness score
score = func(score_dist, score_outl)
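The OddBall pipeline above can be illustrated end to end. A minimal sketch on an unweighted toy graph: the residual from the log-log fit stands in for the paper’s distance-to-fitting-line score, and the hypothetical graph (a star plus a clique) is made up for illustration:

```python
import math

# Sketch of the OddBall pipeline on a plain graph: extract each node's
# egonet features (N_i = #neighbors, E_i = #egonet edges), fit the power
# law E ~ C * N^alpha by least squares in log-log space, and score nodes
# by their residual from the fitted line (a simplified stand-in for the
# paper's distance-to-fitting-line score).

def egonet_features(adj):
    feats = {}
    for v, nbrs in adj.items():
        ego = set(nbrs) | {v}
        # count edges with both endpoints inside the egonet
        e = sum(1 for u in ego for w in adj[u] if w in ego and u < w)
        feats[v] = (len(nbrs), e)
    return feats

def powerlaw_residuals(feats):
    xs = [math.log(n) for n, _ in feats.values()]
    ys = [math.log(e) for _, e in feats.values()]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    c = my - alpha * mx
    return {v: abs(math.log(e) - (alpha * math.log(n) + c))
            for v, (n, e) in feats.items()}

# A star around node 0 plus a 4-clique: both the hub and the clique
# members deviate from the leaf majority's trend.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0},
       5: {6, 7, 8}, 6: {5, 7, 8}, 7: {5, 6, 8}, 8: {5, 6, 7}}
scores = powerlaw_residuals(egonet_features(adj))
```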
OddBall at work (Posts)
[Scatter plot of # of edges (E) vs. # of nodes (N) on the POSTS network: 223K posts, 217K citations]
OddBall at work (DBLP)
[Scatter plot over #publications (weights W): “extremely focused” authors stand out]
Henderson et al. ’11
Recursive structural features
◼ Main idea: recursively combine “local” (node-
based) and neighbor (egonet-based) features
❑ Recursive feature: any aggregate computed over
any feature (including recursive) value among a
node’s neighbors
Structural information at increasing scope: local (node), neighborhood (egonet), regional (recursive)
Recursive structural features
local → egonet → recursive
• K. Ding, Q. Zhou, H. Tong and H. Liu: Few-shot Network Anomaly Detection via Cross-network Meta-learning. TheWebConf 2021
• H. Qiao, H. Tong, B. An, I. King, C. Aggarwal, G. Pang: Deep Graph Anomaly Detection: A Survey and New Perspectives. arXiv preprint arXiv:2409.09957, 2024
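The recursive-feature construction above can be sketched as follows. The choice of degree as the only base feature and of mean/sum as the aggregators is illustrative (the full method also uses egonet features and prunes correlated columns):

```python
# Sketch of recursive structural features: start from a local feature
# (here, degree) and repeatedly append neighbor aggregates (mean and sum)
# of all current features, so each round widens the feature vector.

def recursive_features(adj, depth=2):
    feats = {v: [float(len(nbrs))] for v, nbrs in adj.items()}  # local: degree
    for _ in range(depth):
        width = len(next(iter(feats.values())))
        new = {}
        for v, nbrs in adj.items():
            row = list(feats[v])
            for i in range(width):
                vals = [feats[u][i] for u in nbrs]
                row.append(sum(vals) / len(vals))  # mean over neighbors
                row.append(float(sum(vals)))       # sum over neighbors
            new[v] = row
        feats = new
    return feats

# Tiny path-star graph: node 0 is connected to nodes 1 and 2.
adj = {0: [1, 2], 1: [0], 2: [0]}
f = recursive_features(adj, depth=1)
```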
Sun et al. ’05
Anomalies in Bipartite Graphs
◼ Problem: given a bipartite graph with node sets V1 and V2,
Q1. Neighborhood formation (NF)
❑ Given a query node q in V1, what are the relevance scores of all other nodes in V1 to q?
Q2. Anomaly detection (AD)
❑ Given a query node q, compute how anomalous it is from the pairwise relevance (“normality”) of its neighbors
Applications of problem setting
◼ Publication network
❑ (similar) authors vs. (unusual) papers
◼ P2P network
❑ (similar) users vs. (“cross-border”) files
◼ Financial trading network
❑ (similar) stocks vs. (cross-sector) traders
◼ Collaborative filtering
❑ (similar) users vs. (“cross-border”) products
1) Neighborhood formation
◼ Main idea:
❑ Random Walk with Restart (RWR) from q
❑ Steady-state probabilities over V1 serve as relevance scores
❑ Approximation: run RWR only within the graph partition containing q
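The RWR computation behind neighborhood formation can be sketched by power iteration. The toy graph and the restart probability c = 0.15 below are illustrative choices:

```python
# Random walk with restart from a query node, by power iteration:
# at each step, (1 - c) of the probability mass follows the
# row-normalized adjacency and c restarts at the query node q.
# The steady-state probabilities serve as relevance scores.

def rwr(adj, q, c=0.15, iters=100):
    nodes = list(adj)
    r = {v: (1.0 if v == q else 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (c if v == q else 0.0) for v in nodes}
        for u in nodes:
            share = (1 - c) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        r = nxt
    return r

# Small bipartite-style graph (a, b, c, d on one side; x, y on the other):
# 'b' shares a neighbor with the query 'a', while 'd' is two hops further.
adj = {'a': ['x'], 'b': ['x'], 'c': ['x', 'y'], 'd': ['y'],
       'x': ['a', 'b', 'c'], 'y': ['c', 'd']}
scores = rwr(adj, 'a')
```

As expected, the score decays with distance from the query: nearer nodes on the same side get higher relevance.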
2) Anomaly detection
◼ Main idea: function of (e.g., average) pairwise “normality” scores of t’s neighbors
❑ (1) Find the set S of nodes connected to t
❑ (2) Compute the |S|×|S| normality matrix R of pairwise relevance scores
◼ asymmetric, diagonal reset to 0
❑ (3) Apply a score function f(R), e.g., f(R) = mean(R)
Tong et al. ’11
Graph Anomalies by NrMF
◼ Low-rank adjacency matrix factorization of a
(sparse) graph reveals communities and anomalies
Low-rank matrices capture communities; the residual matrix captures anomalies
Non-negativity constraints
◼ For improved interpretability
◼ A typical procedure: interpretation by non-negativity
❑ Non-negative Matrix Factorization (for community detection): A = F × G + R with F ≥ 0, G ≥ 0; the non-negative factors reveal communities
❑ Non-negative Residual Matrix Factorization (for anomaly detection): additionally require R(i,j) ≥ 0 wherever A(i,j) > 0; the residual reveals anomalies
Optimization formulation
min ‖A − F × G − R‖² (common in matrix factorization)
s.t. R(i,j) ≥ 0 wherever A(i,j) > 0 (non-negative residual)
Experiments
◼ NNrMF can spot 4 types of anomalies
[Figure: NrMF residuals vs. SVD residuals (top-k edges) on the four anomaly types]
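The low-rank-plus-residual idea can be illustrated with a rank-1 approximation obtained by power iteration. Flagging the largest-residual edge is a simplified stand-in for NrMF, which additionally constrains the residual to be non-negative on observed edges; the toy matrix is made up:

```python
# Sketch of factorization-based anomaly spotting: approximate the
# adjacency matrix by a rank-1 factor (power iteration on A^T A), then
# flag the existing edge with the largest reconstruction residual.

def rank1_residual_edge(A, iters=50):
    n, m = len(A), len(A[0])
    v = [1.0] * m
    for _ in range(iters):                 # power iteration on A^T A
        u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(A[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
    # residual per existing edge; the rank-1 part absorbs the community
    res = {(i, j): abs(A[i][j] - u[i] * v[j])
           for i in range(n) for j in range(m) if A[i][j] != 0}
    return max(res, key=res.get)

# A dense 3x3 community block plus one lone edge at (3, 3):
A = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 1, 1, 0],
     [0, 0, 0, 1]]
worst_edge = rank1_residual_edge(A)
```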
Part I: References (plain graphs)
Noble & Cook. ’03
Anomalies in labeled graphs
◼ Problem:
Q1. Given a graph in which nodes and edges carry (non-unique) labels, what are unusual substructures?
Background
◼ Subdue*: An algorithm for detecting repetitive
patterns (substructures) within graphs.
◼ Substructure: A connected subgraph of the
overall graph.
◼ Compressing a graph: Replacing each
instance of the substructure with a new
vertex representing that substructure.
◼ Description Length (DL): Number of bits
needed to encode a piece of data
* http://ailab.wsu.edu/subdue/
Background
◼ Subdue uses the following heuristic:
❑ The best substructure is the one that minimizes
F1(S,G) = DL(G | S) + DL(S)
◼ G: Entire graph, S: The substructure,
◼ DL(G|S) is the DL of G after compressing it using S,
◼ DL(S) is the description length of the substructure.
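The heuristic F1(S, G) can be illustrated with a toy description-length model. The bits-per-element encoding below is a crude stand-in for Subdue’s real encoding, and every count is made up for illustration:

```python
# Toy illustration of Subdue's MDL heuristic F1(S, G) = DL(G|S) + DL(S):
# description lengths are crudely approximated as
# (#vertices + #edges) * bits_per_elem, and compressing replaces every
# instance of the substructure S with a single new vertex.

def dl(n_vertices, n_edges, bits_per_elem=8):
    return (n_vertices + n_edges) * bits_per_elem

def f1(sub_v, sub_e, n_instances, g_v, g_e):
    # after compression, each instance of S collapses to one vertex
    gs_v = g_v - n_instances * (sub_v - 1)
    gs_e = g_e - n_instances * sub_e
    return dl(gs_v, gs_e) + dl(sub_v, sub_e)

# Graph with 30 vertices / 40 edges containing 5 instances of a
# triangle substructure (3 vertices, 3 edges): the repetitive triangle
# compresses the graph better than a single repeated edge would.
score_triangle = f1(3, 3, 5, 30, 40)
score_single_edge = f1(2, 1, 5, 30, 40)
```

Under this heuristic the substructure with the lower F1 is the better one, which is why *anomalous* substructures are the ones that compress poorly.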
Akoglu et al. ’12
PICS: clustering attributed graphs by compression
[Figure: people-to-people connectivity matrix and people-to-group membership matrix, reordered into homogeneous blocks]
Total cost L(M) + L(D|M): encoding length of the clustering + encoding length of the blocks
Good clustering implies good compression
Problem formulation details
[Example: subjects with attributes (title: grad, business) connected by phone calls and device scans]
PICS at work (YouTube)
[Clusters found: familiar strangers, anime lovers, bridges]
Part II: Outline
◼ Overview: Events in point sequences
❑ Change detection in time series
❑ Learning under concept drift
Event detection
◼ Anomaly detection in time series of multi-
dimensional data points
❑ Exponentially Weighted Moving Average
❑ CUmulative SUM Statistics
❑ Regression-based
❑ Box-Jenkins models, e.g., ARMA, ARIMA
❑ Wavelets
❑ Hidden Markov Models
❑ Model-based hypothesis testing
❑ …
◼ This part: time series of graphs
Part II: References (data series)
◼ Montgomery, D. C. Introduction to Statistical Quality Control. John
Wiley and Sons, Inc., 2001.
◼ Box, George and Jenkins, Gwilym. Time series analysis: Forecasting
and control, San Francisco: Holden-Day, 1970.
◼ Gama J., Medas P., Castillo G., Rodrigues P.P.: Learning with Drift
Detection. SBIA 2004: 286-295.
◼ O. A. Grigg, V. T. Farewell, and D. J. Spiegelhalter. The Use of Risk-Adjusted CUSUM and RSPRT Charts for Monitoring in Medical Contexts. Statistical Methods in Medical Research 12(2): 147–170.
◼ Bay, S. D., and Pazzani, M. J., Detecting change in categorical data:
Mining contrast sets. KDD, pages 302–306, 1999.
◼ M. Van Leeuwen, A. Siebes. StreamKrimp: Detecting Change in Data
Streams. ECML PKDD, 2008.
◼ Wong, W.-K., Moore, A., Cooper, G. and Wagner, M. WSARE: An
Algorithm for the Early Detection of Disease Outbreaks. JML, 2005.
◼ Tutorial: D. B. Neill and W.-K. Wong. A tutorial on event detection.
KDD, 2009.
Events in time-evolving graphs
◼ Problem: Given a sequence of graphs,
Q1. change detection: find time points at which the graph changes significantly
Events in time-evolving graphs
◼ Main framework
❑ Compute graph similarity/distance scores between consecutive snapshots over time
Graph distance – 10 metrics [& more] (Shoubridge et al. ’02, Dickinson et al. ’04)
◼ (1) Weight distance: normalized sum of absolute differences of edge weights between the two graphs
Graph distance – 10 metrics
◼ (4) Maximum Common Subgraph (MCS) node distance; variants incorporate edge weights and non-linear cost functions
Graph distance – 10 metrics
◼ (6) Median graph distance (Dickinson et al. ’04): distance to the median graph of the sequence
◼ (7) Modality distance: difference between the Perron (principal eigen-) vectors of the two graphs
Graph distance – 10 metrics
◼ (8) Diameter distance (Gaston et al. ’06): difference in graph diameters (longest shortest path)
◼ (9) Entropy distance: difference in graph entropy
Pincombe ’05
Graph distance to time series
◼ Build a time series of graph distances, one per distance function
◼ Fit an ARMA model to each time series
❑ assumes a stationary series, by construction
◼ Anomalous time points: where the model residuals exceed a threshold
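The framework above can be sketched with one concrete distance (the weight distance) and a simple flagging rule. Pincombe fits ARMA models to the distance series, so the mean-plus-two-standard-deviations rule here is only an illustrative simplification, and the snapshots are made up:

```python
# Change detection over a graph sequence: compute the weight distance
# (normalized sum of absolute edge-weight differences) between
# consecutive snapshots, then flag time points whose distance deviates
# far above the series mean (simplified stand-in for ARMA residuals).

def weight_distance(g1, g2):
    edges = set(g1) | set(g2)
    num = sum(abs(g1.get(e, 0) - g2.get(e, 0)) for e in edges)
    den = sum(max(g1.get(e, 0), g2.get(e, 0)) for e in edges)
    return num / den if den else 0.0

def change_points(snapshots, k=2.0):
    d = [weight_distance(a, b) for a, b in zip(snapshots, snapshots[1:])]
    mu = sum(d) / len(d)
    sd = (sum((x - mu) ** 2 for x in d) / len(d)) ** 0.5
    return [t + 1 for t, x in enumerate(d) if x > mu + k * sd]

# A stable graph with an abrupt rewiring at t = 4:
g = {('a', 'b'): 1, ('b', 'c'): 1, ('a', 'c'): 1}
h = {('a', 'b'): 1, ('c', 'd'): 5, ('d', 'e'): 5}
snapshots = [g, g, g, g, h, h, h]
events = change_points(snapshots)
```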
Ide et al. ’04
Eigen-space-based events
◼ Given a time-evolving graph (e.g., services s1, …, sN calling each other), identify faulty vertices
◼ Challenges
❑ Large number of nodes; impractical to monitor each
❑ Edge weights are highly dynamic
❑ Anomaly defined collectively (different from the “others”)
Activity feature
◼ Why “activity”? (intuition)
❑ If D12 is large, then u1 and u2 should be large because of the argmax (note: D is a positive matrix)
❑ So, if s1 actively links to other nodes at time t, the “activity” of s1 should be large
❑ u = (u1, u2, …, uN)ᵀ, the principal eigenvector of D, can also be interpreted as a “stationary state”: the probability that a node is holding the “control token”
Anomaly detection
◼ Problem reduced from a sequence of graphs to a sequence of (activity) vectors
❑ activity vector u(t): principal eigenvector of the adjacency matrix at time t
❑ summary vector r(t−1): top singular vector (via SVD) of the past W activity vectors u(t−W), …, u(t−1)
❑ track the angle between u(t) and r(t−1) to detect change
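The pipeline above can be sketched as follows. As a simplification, the summary vector here is just the previous activity vector rather than the top singular vector of a window of past vectors, and the two toy snapshots are made up:

```python
# Eigen-space change tracking: per snapshot, compute the principal
# eigenvector ("activity vector") of the symmetric adjacency matrix by
# power iteration, then score each step by 1 - |cos(angle)| between
# consecutive activity vectors.

def activity_vector(A, iters=100):
    n = len(A)
    u = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        u = [x / norm for x in w]
    return u

def change_scores(snapshots):
    us = [activity_vector(A) for A in snapshots]
    return [1 - abs(sum(a * b for a, b in zip(u, v)))
            for u, v in zip(us, us[1:])]

A1 = [[0, 1, 1, 0],
      [1, 0, 1, 0],
      [1, 1, 0, 0],
      [0, 0, 0, 0]]   # activity concentrated on nodes 0-2
A2 = [[0, 0, 0, 0],
      [0, 0, 1, 1],
      [0, 1, 0, 1],
      [0, 1, 1, 0]]   # activity shifts to nodes 1-3
z = change_scores([A1, A1, A2])
```

The stable step scores near zero, while the shift of activity produces a clearly larger angle-based score.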
Experiment
◼ Time evolution of
activity scores
effectively visualizes
malfunction
Reconstruction-based events
◼ Network forensics
❑ Sparsification ➔ load shedding
❑ Matrix decomposition ➔ summarization
❑ Error Measure ➔ anomaly detection
Sun+ ICDM’07, modified with permission
Matrix decomposition
◼ Goal: summarize a given graph by decomposing its adjacency matrix into smaller components
1. Singular Value Decomposition (SVD): 1800s; PCA, LSI, …
2. CUR decomposition (Drineas et al. ’05): uses actual columns/rows of the matrix instead of singular vectors
Sun et al. ’07
Error measure: reconstruction
◼ accuracy = 1 − RSSE, the relative sum-square-error of the reconstruction against the original matrix

Volume monitoring
◼ Monitor accuracy over time: volume monitoring alone cannot detect anomalies that consist of structural changes of link patterns
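The accuracy measure can be written directly; the 2×2 matrices below are illustrative stand-ins for an adjacency matrix and its low-rank (e.g., SVD/CUR) reconstruction:

```python
# accuracy = 1 - RSSE, where RSSE is the sum of squared reconstruction
# errors relative to the total squared mass of the original matrix;
# a sudden drop in accuracy over time signals a structural change.

def accuracy(A, A_hat):
    sse = sum((a - b) ** 2
              for row, rhat in zip(A, A_hat) for a, b in zip(row, rhat))
    total = sum(a * a for row in A for a in row)
    return 1 - sse / total

A     = [[2.0, 0.0], [0.0, 0.0]]
A_hat = [[1.0, 0.0], [0.0, 0.0]]   # imperfect reconstruction
acc = accuracy(A, A_hat)
```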
Part III: Outline
◼ Online auction fraud
◼ Fake review spam
◼ Web spam
Taxonomy
Graph Anomaly Detection
Applications
❑ Fraud detection
❑ Spam detection
Chau et al. ’06
(1) Online auction fraud
◼ Auction sites: attractive target for fraud
◼ 63% of complaints to the U.S. Internet Crime Complaint Center in 2006
◼ Average loss per incident: ≈ $385
◼ Often non-delivery fraud: the buyer pays ($$$), but the seller never delivers
Online auction fraud detection
◼ Insufficient solution:
❑ Look at individual features, geographic locations,
login times, session history, etc.
BP in action
◼ Initialize the prior beliefs of known fraudsters to P(fraud) = 1; initialize all other nodes as unbiased
◼ At each iteration, for each node, compute the messages to its neighbors
Computing beliefs → roles
[2-D belief plot: nodes separate into honest, accomplice, and fraudster roles by P(honest), P(accomplice), P(fraudster)]
Chau+ PKDD’06, modified with permission
(2) Fake review spam
◼ Review sites: attractive target for spam
◼ Often hype/defame spam
◼ Paid spammers
Fake review spam detection
◼ Behavioral analysis [Jindal & Liu’08]
❑ individual features, geographic locations, login
times, session history, etc.
◼ Language analysis [Ott et al. ’11]
❑ use of superlatives, frequent self-references, rate of misspellings, many agreement words, …
[Wang et al. ’11]
Graph-based detection
◼ Reviewer r’s trustiness T(r) is computed from the honesty of r’s reviews (v_i: r’s i-th review; n_r: total # of reviews by r)
Graph-based detection
◼ Store s’s reliability R(s) is computed from the trustiness of the authors of its reviews and how their ratings compare to the median rating (3)
Graph-based detection
◼ Review v’s honesty H(v) is computed from the trustiness of the authors of agreeing and disagreeing reviews
Graph-based detection
◼ Algorithm: iterate trustiness, reliability, and honesty scores in a mutual recursion
❑ similar to Kleinberg’s HITS algorithm, but with non-linear relations
◼ Challenges:
❑ Convergence not guaranteed
❑ Cannot use attribute info
❑ Parameters: agreement time window ∆t, review
similarity threshold (for dis/agreement)
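A toy mutual recursion in the spirit of this algorithm (and of HITS): the linear, normalized updates below are a stand-in for Wang et al.’s actual non-linear trustiness/honesty/reliability formulas, and the review data are made up:

```python
# Toy HITS-style mutual recursion on a reviewer-store graph: reviewer
# trustiness and store reliability reinforce each other through reviews,
# where each review carries an agreement value in [-1, 1] (agreement
# with the other reviews of that store).

def score_reviewers_stores(reviews, iters=50):
    reviewers = {r for r, _, _ in reviews}
    stores = {s for _, s, _ in reviews}
    T = {r: 1.0 for r in reviewers}   # trustiness
    R = {s: 1.0 for s in stores}      # reliability
    for _ in range(iters):
        R = {s: sum(T[r] * a for r, s2, a in reviews if s2 == s)
             for s in stores}
        T = {r: sum(max(R[s], 0) * a for r2, s, a in reviews if r2 == r)
             for r in reviewers}
        # normalize to unit max magnitude so scores stay comparable
        mT = max(abs(x) for x in T.values()) or 1.0
        mR = max(abs(x) for x in R.values()) or 1.0
        T = {k: v / mT for k, v in T.items()}
        R = {k: v / mR for k, v in R.items()}
    return T, R

# Two honest reviewers agree with the crowd (+1); one spammer disagrees (-1).
reviews = [('u1', 's1', 1), ('u2', 's1', 1), ('spam', 's1', -1),
           ('u1', 's2', 1), ('spam', 's2', -1)]
T, R = score_reviewers_stores(reviews)
```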
(3) Web spam
◼ Spam pages: pages designed to trick search engines into directing traffic to their websites
Web spam
◼ Challenges:
❑ pages are not independent
❑ what features are relevant?
❑ small training set
❑ noisy labels (consensus is hard)
❑ content very dynamic
Web spam
◼ Many graph-based solutions
❑ TrustRank [Gyöngyi et al. ’04]
❑ SpamRank [Benczur et al. ’05]
❑ Anti-trustRank [Krishnan et al. ’06]
❑ Propagating trust and distrust [Wu et al. ’06]
❑ Know your neighbors [Castillo et al. ’07]
❑ Guilt-by-association [Kang et al. ’11]
❑ …
Web spam
◼ Main idea: exploit homophily and reachability
[Gyöngyi et al. ’04]
TrustRank: combating web spam
◼ Main steps:
❑ Find seed set S of “good” pages
(e.g. using oracle)
❑ Compute trust scores by biased
(personalized) PageRank from
good pages
◼ Intuition: spam pages are
hardly reachable from
trustworthy pages
❑ Hard to acquire direct inlinks
from good pages
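TrustRank’s biased PageRank can be sketched by power iteration. The tiny web graph, seed set, and damping factor below are illustrative:

```python
# TrustRank as biased (personalized) PageRank: the random jump goes to
# a seed set of known-good pages instead of uniformly to all pages, so
# trust flows only along links out of trustworthy pages.

def trustrank(links, seeds, beta=0.85, iters=100):
    nodes = list(links)
    d = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    t = dict(d)
    for _ in range(iters):
        nxt = {v: (1 - beta) * d[v] for v in nodes}
        for u in nodes:
            if links[u]:
                share = beta * t[u] / len(links[u])
                for v in links[u]:
                    nxt[v] += share
            else:                              # dangling page: jump to seeds
                for v in nodes:
                    nxt[v] += beta * t[u] * d[v]
        t = nxt
    return t

# good1 <-> good2 form a trusted cluster; the spam pages link *to* good
# pages to look legitimate, but receive no links back from them.
links = {'good1': ['good2'], 'good2': ['good1'],
         'spam1': ['good1', 'spam2'], 'spam2': ['spam1']}
scores = trustrank(links, seeds={'good1'})
```

This illustrates the intuition on the slide: spam pages are hardly reachable from trustworthy pages, so they accumulate (almost) no trust.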
details
TrustRank mathematically
◼ Recall the PageRank score of a page p: r(p) = β · Σ_{q: q→p} r(q)/ω(q) + (1 − β) · 1/N, where ω(q) is q’s out-degree
◼ In closed form: r = (1 − β) · (I − β·T)⁻¹ · (1/N)·1_N, where T is the transition matrix
◼ TrustRank replaces the uniform jump vector (1/N)·1_N with the normalized seed distribution d over good pages: t = (1 − β) · (I − β·T)⁻¹ · d
Other references
◼ B. T. Messmer and H. Bunke. A new algorithm for error-tolerant
subgraph isomorphism detection. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 20(5):493–504, 1998.
◼ P. J. Dickinson, H. Bunke, A. Dadej, M. Kraetzl: Matching graphs with
unique node labels. Pattern Anal. Appl. 7(3): 243-254 (2004)
◼ Peter J. Dickinson, Miro Kraetzl, Horst Bunke, Michel Neuhaus, Arek
Dadej: Similarity Measures For Hierarchical Representations Of
Graphs With Unique Node Labels. IJPRAI 18(3): 425-442 (2004)
◼ Kraetzl, M. and Wallis, W. D., Modality Distance between Graphs.
Utilitas Mathematica, 69, 97–102, 2006.
◼ Gaston, M. E., Kraetzl, M. and Wallis, W. D., Graph Diameter as a
Pseudo-Metric for Change Detection in Dynamic
Networks, Australasian Journal of Combinatorics, 35, 299—312, 2006.
◼ B. Pincombe. Detecting changes in time series of network graphs using
minimum mean squared error and cumulative summation. ANZIAM J.
48, pp.C450–C473, 2007.
Conclusions
◼ Graphs are powerful tools to detect
❑ Anomalies
❑ Events
❑ Fraud/Spam
Open challenges: practice
◼ What makes the results better in practice?
❑ better priors?
❑ better parameter learning?
❑ more data?
❑ …
◼ Graph construction
❑ If no network, what to use to build one?
❑ If one network,
◼ more latent edges? (e.g., review similarity)
◼ fewer edges? (e.g., domain knowledge)
❑ If more than one network, how to exploit all?
Key Papers
Core Papers
◼ L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.
◼ Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.
Further Reading
◼ K. Ding, Q. Zhou, H. Tong and H. Liu: Few-shot Network Anomaly Detection via Cross-network Meta-learning.
TheWebConf 2021
◼ Minji Yoon, Bryan Hooi, Kijung Shin, Christos Faloutsos: Fast and Accurate Anomaly Detection in Dynamic
Graphs with a Two-Pronged Approach. KDD 2019: 647-657
◼ Bryan Perozzi, Leman Akoglu: Scalable Anomaly Ranking of Attributed Neighborhoods. SDM 2016: 207-215
◼ Duen Horng Chau, Carey Nachenberg, Jeffrey Wilhelm, Adam Wright, Christos Faloutsos: Large Scale Graph
Mining and Inference for Malware Detection. SDM 2011: 131-142
◼ Si Zhang, Dawei Zhou, Mehmet Yigit Yildirim, Scott Alcorn, Jingrui He, Hasan Davulcu, Hanghang Tong: HiDDen:
Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection. SDM 2017: 570-578
◼ Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, Christos Faloutsos: FRAUDAR: Bounding Graph
Fraud in the Face of Camouflage. KDD 2016: 895-904
◼ Leman Akoglu, Hanghang Tong, Danai Koutra: Graph based anomaly detection and description: a survey. Data
Min. Knowl. Discov. 29(3): 626-688 (2015)
◼ Gao, Xinbo and Xiao, Bing and Tao, Dacheng and Li, Xuelong. A survey of graph edit distance. Pattern Anal. and
App.s 13 (1), pp. 113-129. 2010.
◼ Leman Akoglu, Hanghang Tong, Brendan Meeder, Christos Faloutsos. PICS: Parameter-free Identification of
Cohesive Subgroups in large attributed graphs. SDM, 2012.
◼ G. Wang, S. Xie, B. Liu, P. S. Yu. Review Graph based Online Store Review Spammer Detection. ICDM, 2011.
◼ H Qiao, H Tong, B An, I King, C Aggarwal, G Pang: Deep Graph Anomaly Detection: A Survey and New
Perspectives. arXiv preprint arXiv:2409.09957, 2024
◼ Appendix
Relational Markov Nets
◼ Undirected dependencies
◼ Potentials on cliques of size 1 (label-attribute)
◼ Potentials on cliques of size 2 (label-observed label, label-label)
◼ For pairwise RMNs, the max clique size is 2
pairwise Markov Random Field
◼ For an assignment y to all unobserved variables Y, a pMRF defines the probability distribution
P(y) = (1/Z) · Π_i φ_i(y_i) · Π_{(i,j)∈E} ψ_{ij}(y_i, y_j)
where φ_i are node (prior) potentials, ψ_{ij} are edge (compatibility) potentials, and Z is the normalization constant
Loopy belief propagation
◼ Invented in 1982 [Pearl] to calculate marginals
in Bayes nets.
◼ Also used to estimate marginals (=beliefs), or
most likely states (e.g. MAP) in MRFs
◼ Iterative process in which neighbor variables
“talk” to each other, passing messages
“I (variable x1) believe
you (variable x2) belong
in these states with
various likelihoods…”
◼ When consensus reached, calculate belief
details
Loopy belief propagation
1) Initialize all messages to 1
2) Repeat for each node i and each neighbor j:
m_{i→j}(y_j) ∝ Σ_{y_i} φ_i(y_i) · ψ_{ij}(y_i, y_j) · Π_{k∈N(i)\{j}} m_{k→i}(y_i)
3) When the messages “stabilize”, compute beliefs:
b_i(y_i) ∝ φ_i(y_i) · Π_{k∈N(i)} m_{k→i}(y_i)
Example (two states SH, CH):
m_{1→2}(SH) = (0.0096·0.9 + 0.0216·0.1) / Z ≈ 0.35
m_{1→2}(CH) = (0.0096·0.1 + 0.0216·0.9) / Z ≈ 0.65
where Z is the sum of the two unnormalized messages
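The worked numbers above fall out of the standard message update; the code below reproduces them, with psi encoding the homophily potential (0.9 for the same label, 0.1 for different labels):

```python
# One BP message: node 1's message to node 2 multiplies node 1's local
# evidence (prior times incoming messages, excluding node 2's) by the
# edge potential, sums over node 1's states, and normalizes.

def message(pre, psi):
    # pre[x1]: node 1's prior times all incoming messages except from node 2
    raw = [sum(pre[x1] * psi[x1][x2] for x1 in range(len(pre)))
           for x2 in range(len(psi[0]))]
    z = sum(raw)
    return [x / z for x in raw]

psi = [[0.9, 0.1],
       [0.1, 0.9]]              # homophily edge potential
m12 = message([0.0096, 0.0216], psi)   # values from the worked example
```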
Loopy belief propagation
Advantages:
◼ Easy to program & parallelize
◼ General: can apply to any graphical model w/ any
form of potentials (higher order than pairwise)
Challenges:
◼ Convergence is not guaranteed (when to stop)
❑ esp. if many closed loops
◼ Potential functions (parameters)
❑ require training to estimate
❑ learning by gradient-based optimization:
convergence issues during training