
CS 514 Advanced Topics in Network Science

Lecture 5. Graph Anomaly


Hanghang Tong, Computer Science, Univ. Illinois at Urbana-Champaign, 2024

Slides credit: Leman Akoglu @ CMU


Network Science: An Overview

• network (e.g., patterns, laws, connectivity, etc.)
• subgraph (e.g., clusters, communities, dense subgraphs, etc.)
• node/link (e.g., ranking, link prediction, embedding, etc.) ← we are (mostly) here
• Level 1: diameter, connectivity, graph-level classification, graph-level embedding, graph kernel, graph structure learning, graph generator,…
• Level 2: frequent subgraphs, clustering, community detection, motif, teams, dense subgraphs, subgraph matching, NetFair, …
• Level 3: node proximity, node classification, link prediction, anomaly detection, node embedding, network alignment, NetFair, …
• Beyond: network of X, …

2
An Atlas of Network Science

Lecture topics: Network of Networks, Network Alignment, Anomaly Detection, Knowledge Graphs, Matrix & Tensor, Graph & LLMs, Optimal DGL, Proximity, NetFair, GNNs, GCOs
MAP (Meta Approaches): consistency principle, data compression, 'real data' science, spectrum, perturbative analysis, sub-modularity, bilevel optimization, alternating optimization, message passing

3
Outliers vs. Graph anomalies
• Outliers: clouds of points (multi-dimensional)
• Graph anomalies: inter-linked objects (graph/network) ← our focus
4
Anomaly detection: Applications
Tax evasion, credit card fraud, healthcare fraud, network intrusion
5
Applications
Malware, investment fraud, click fraud, spyware, insurance fraud, malicious cargo, auction fraud, damage detection, fake reviews, medical diagnosis, email spam, false advertising, performance monitoring, insider threat, web spam, image/video surveillance, and many more…
6
Anomaly detection: definition
◼ (Hawkins’ Definition of Outlier, 1980)
“An outlier is an observation that differs
so much from other observations as to
arouse suspicion that it was generated
by a different mechanism.”

No unique definition → many definitions in various contexts:
outlier, anomaly, outbreak, event, fraud, …

7
Anomaly detection: definition
◼ for practical purposes,
a record/point/graph-node/graph-edge
is flagged as anomalous
if a rarity/likelihood/outlierness score
exceeds a user-defined threshold

◼ anomalies:
→ rare (e.g., rare combination of
categorical attribute values)
→ isolated points in n-d spaces
→ surprising (don't fit well in our mental/statistical
model == need too many bits under MDL)
8
Challenges
• Data volume (Volume)
• Unbalanced: fraud is rare
• Unlabeled: no ground truth
• Temporal: records arrive over time (Velocity)
• Categorical, numerical, relational, … attributes (Variety)

9
Why graph-based detection?
◼ Powerful representation
❑ Interdependent instances
❑ Long-range relations
❑ Node/Edge attributes (data complexity)
❑ Hard to fake/alter (adversarial robustness)
◼ Abundant relational data
❑ Web, email, phone call, …
◼ Nature of applications
❑ organized fraud (group activity, e.g., fraudsters +
accomplice)

10
Real graphs (1): Internet map, food web, terrorist network, biological networks, blog networks, Web graph

11
Real graphs (2): retail networks, protein-protein interaction, social network, power grid, dating network

12
Problem revisited for graphs
◼ Three different problem settings
❑ Plain/Attributed Graphs

❑ Static/Dynamic Graphs

❑ Un-/Semi-/Supervised graph techniques

13
Taxonomy
Graph Anomaly Detection
• Static graphs
  ❑ Plain: feature based (structural features, recursive features); structure based (community based)
  ❑ Attributed: structure based (substructures, subgraphs); community based
• Dynamic graphs
  ❑ Plain: distance based (feature distance, structure distance); "phase transition"
• Graph algorithms
  ❑ Learning models: RMNs, PRMs, RDNs, MLNs
  ❑ Inference: iterative classification, belief propagation, relational network classification
14
Goal of this lecture
◼ Introduce various problem formulations
❑ Definitions change by application/representation
◼ Applications of problem settings
❑ Intrusion, fraud, spam
◼ Introduce existing techniques
❑ Model fitting, factorization, relational inference
◼ Pros and Cons
❑ Parameters, scalability, robustness

15
Outline
◼ Motivation, applications, challenges
◼ Part I: Anomaly detection in static data
❑ Overview: Outliers in clouds of points
❑ Anomaly detection in graph data

◼ Part II: Event detection in dynamic data


❑ Overview: Change detection in time series
❑ Event detection in graph sequences

◼ Part III: Graph-based apps [optional]


❑ fraud and spam detection

16
Part I: Outline
◼ Overview: Outliers in clouds of points
❑ Outliers in numerical data points
◼ distance-based, density-based, …
❑ Outliers in categorical data points
◼ model-based

◼ Anomaly detection in graph data


❑ Anomalies in unlabeled, plain graphs
❑ Anomalies in node-/edge-labeled, attributed
graphs

17
Outlier detection (see Chapter 11 of the DM textbook)
◼ Anomalies in multi-dimensional data points
❑ Density-based
❑ Distance-based
❑ Depth-based
❑ Distribution-based
❑ Clustering-based
❑ Classification-based
❑ Information theory-based
❑ Spectrum-based
❑ …
◼ No relational links between points
18
Part I: References (outliers)
◼ M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF:
Identifying density-based local outliers. SIGMOD, 2000.
◼ S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C.
Faloutsos. LOCI: Fast outlier detection using the local
correlation integral. ICDE, 2003.
◼ C. C. Aggarwal and P. S. Yu. Outlier detection for high
dimensional data. SIGMOD, 2001.
◼ A. Ghoting, S. Parthasarathy and M. Otey, Fast Mining of
Distance Based Outliers in High-Dimensional Datasets.
DAMI, 2008.
◼ Y. Wang, S. Parthasarathy and S. Tatikonda, Locality
Sensitive Outlier Detection. ICDE, 2011.
◼ Kaustav Das, Jeff Schneider. Detecting Anomalous
Records in Categorical Datasets. KDD 2007.

19
Part I: References (outliers)
◼ Müller E., Schiffer M., Seidl T. Adaptive Outlierness for
Subspace Outlier Ranking. CIKM, 2010.
◼ Müller E., Assent I., Iglesias P., Mülle Y., Böhm K.
Outlier Ranking via Subspace Analysis in Multiple Views
of the Data. ICDM, 2012.
◼ L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast
and Reliable Anomaly Detection in Categoric Data.
CIKM, 2012.
◼ A. Chaudhary, A. S. Szalay, and A. W. Moore. Very fast
outlier detection in large multidimensional data sets.
DMKD, 2002.
◼ Survey: V. Chandola, A. Banerjee, V. Kumar: Anomaly
Detection: A Survey. ACM Computing Surveys, Vol.
41(3), Article 15, July 2009.
20
Part I: Outline
◼ Overview: Outliers in clouds of points
❑ Outliers in numerical data points
◼ distance-based, density-based, …
❑ Outliers in categorical data points
◼ model-based

◼ Anomaly detection in graph data


❑ Anomalies in unlabeled, plain graphs
❑ Anomalies in node-/edge-labeled, attributed
graphs

21
Taxonomy (recap)
Graph Anomaly Detection → static graphs (plain | attributed), dynamic graphs, graph algorithms (learning models | inference); see slide 14.

22
Akoglu et al. ’10
Anomalies in Weighted Graphs
◼ Problem:
Q1. Given a weighted
and unlabeled graph,
how can we spot
strange, abnormal,
extreme nodes?

Q2. Can we explain why


the spotted nodes are
anomalous?
23
Problem sketch

24
OddBall: approach
1) For each node,
1.1) Extract “ego-net” (=1-step neighborhood)
1.2) Extract features (#edges, total weight, etc.)
→ features that could yield “laws”
→ features fast to compute and interpret
2) Detect patterns → regularities
3) Detect anomalies → "distance" to patterns

25
What is odd?

• In terms of topology
• A near-star: telemarketer, port scanner, people adding friends indiscriminately
• A near-clique: terrorists, tightly connected group of people, ambitious political
discussion group in a post network
• In terms of weights:
• different weight distribution on edges is acceptable – a person having close friends,
closer friends, closest friend etc.
• Single heavy link – tight pair – single-minded, tight company
• Uniform weights – robot-like behavior
Which features to compute?
▪ Ni: number of neighbors (degree) of ego i
▪ Ei: number of edges in egonet i

▪ Wi: total weight of egonet i


▪ λw,i: principal eigenvalue of the weighted
adjacency matrix of egonet i
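A minimal Python sketch of extracting these four features for one ego (assuming an undirected NetworkX graph with optional 'weight' edge attributes; the function name is illustrative):

import networkx as nx
import numpy as np

def egonet_features(G, node):
    """OddBall-style egonet features for one node of an undirected weighted graph."""
    ego = nx.ego_graph(G, node, radius=1)          # node plus its 1-step neighborhood
    N = G.degree(node)                             # number of neighbors
    E = ego.number_of_edges()                      # number of edges in the egonet
    W = ego.size(weight="weight")                  # total weight of the egonet
    A = nx.to_numpy_array(ego, weight="weight")    # weighted adjacency of the egonet
    lam_w = float(np.max(np.linalg.eigvalsh(A)))   # principal eigenvalue
    return N, E, W, lam_w

# features = {v: egonet_features(G, v) for v in G.nodes()}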
details
Weighted principal eigenvalue λw,i of the egonet (bounding cases):
• λw,i = √N = √E = √W (star with unit edge weights)
• λw,i > √N, but still ≈ √E, √W (heavier edge weights)
• λw,i = N ≈ √W (clique with unit edge weights)
• λw,i ≈ W (single dominant heavy edge)
N: #neighbors, W: total weight


OddBall: pattern #1 (Egonet Density Power Law)
#edges E vs. #neighbors N follows a power law with slope between 1 and 2 (slope ≈ 1.35 on this dataset):
• near slope = 1 (near-star): telemarketer, spammer, port scanner, "popularity contests", etc.
• near slope = 2 (near-clique): discussion group, "rank boosting", etc.
29
OddBall: pattern #2 (Egonet Weight Power Law)
Total weight W vs. #edges E follows a power law with slope ≥ 1 (slope ≈ 1.08 on this dataset):
• slope = 1: uniform, robot-like behavior
• points far above the fit: e.g., high $ vs. #accounts, high $ vs. #donors
30
OddBall: pattern #3 (Eigenvalue Power Law)
Largest weighted eigenvalue λ1,w vs. total weight W follows a power law with slope between 0.5 and 1 (slope ≈ 0.64 on this dataset).
31
OddBall: anomaly detection
score_dist = distance to the fitting line
score_outl = (point-)outlierness score (e.g., a density-based score such as LOF)
score = func(score_dist, score_outl)

✓ can tell what type of anomaly a node belongs to
✓ can quantify the "anomalous-ness" of nodes using the score
32
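A rough sketch of the scoring step in Python (assuming the feature pairs have already been extracted as above; score_dist follows the flavor of OddBall's distance-to-fit score, while the point-outlier score is left to an off-the-shelf detector):

import numpy as np

def oddball_scores(x, y):
    """Fit y ~ C * x^theta in log-log space and score deviation from the fit.
    x, y: 1-D arrays of a positive feature pair (e.g., N_i and E_i)."""
    logx, logy = np.log(x), np.log(y)
    theta, logC = np.polyfit(logx, logy, 1)      # least-squares power-law fit
    y_fit = np.exp(logC) * x ** theta
    # deviation from the fitting line (larger = more anomalous)
    score_dist = (np.maximum(y, y_fit) / np.minimum(y, y_fit)) * np.log(np.abs(y - y_fit) + 1)
    return score_dist

# combine with a point-outlier score (e.g., LOF on the log-log scatter) as
# score = func(score_dist, score_outl), e.g., their product or an average of ranks.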
OddBall: datasets

Bipartite graphs (|V|, |E|):
1. FEC Don2Com: 1.6M, 2M
2. FEC Com2Cand: 6K, 125K
3. DBLP Auth2Conf: 21K, 1M

Unipartite graphs (|V|, |E|):
4. BlogNet: 27K, 126K
5. PostNet: 223K, 217K
6. Enron: 36K, 183K
7. AS peering: 11K, 8K

33
OddBall at work (Posts)
Scatter plot of #edges (E) vs. #nodes (N) for the egonets in PostNet (223K posts, 217K citations); flagged anomalies deviate strongly from the fitting line.

34
OddBall at work (DBLP)
Scatter plot on DBLP Auth2Conf over #publications (weights W); flagged authors include extremely focused ones (e.g., many papers in a single conference) and ones with many papers in many conferences.
36
Henderson et al. ’11
Recursive structural features
◼ Main idea: recursively combine “local” (node-
based) and neighbor (egonet-based) features
❑ Recursive feature: any aggregate computed over
any feature (including recursive) value among a
node’s neighbors
Structural information spans three scales: local (node), neighborhood (egonet), and regional (recursive features).

37
Recursive structural features
• local: in- and out-degree, weighted versions
• egonet: #within-, incoming-, and outgoing-egonet edges, weighted versions
• recursive: any aggregate (e.g., max/min/avg degree) computed over a node's neighbors' features
38
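A minimal sketch of one round of recursive aggregation (illustrative only; the full ReFeX additionally prunes correlated features and applies vertical logarithmic binning):

import networkx as nx
import numpy as np

def recursive_round(G, feats):
    """One recursion: for every existing feature, append its mean and sum
    aggregated over each node's neighbors.
    feats: dict node -> 1-D numpy array of current feature values."""
    new_feats = {}
    for v in G.nodes():
        nbr_vals = np.array([feats[u] for u in G.neighbors(v)])
        if len(nbr_vals) == 0:
            agg = np.zeros(2 * len(feats[v]))
        else:
            agg = np.concatenate([nbr_vals.mean(axis=0), nbr_vals.sum(axis=0)])
        new_feats[v] = np.concatenate([feats[v], agg])
    return new_feats

# start from local/egonet features, e.g. feats = {v: np.array([G.degree(v)]) for v in G},
# and repeat recursive_round a few times to obtain regional features.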
ReFeX: Recursive Feature eXtraction

◼ Recursive features proved effective in transfer


learning, identity resolution
(yet to be studied for anomaly detection)
39
Graph Deviation Networks (GDN)
◼ GNNs as Node-feature Extractor?
❑ Not really

◼ A Better Solution: GDN

Z-Score-based deviation loss

• K. Ding, Q. Zhou, H. Tong and H. Liu: Few-shot Network Anomaly Detection via Cross-network Meta-learning. TheWebConf 2021
• H. Qiao, H. Tong, B. An, I. King, C. Aggarwal, G. Pang: Deep Graph Anomaly Detection: A Survey and New Perspectives. arXiv preprint arXiv:2409.09957, 2024.
40
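A minimal sketch of the z-score-based deviation loss (following the general deviation-loss idea; the GNN score head, the exact reference-score construction, and the cross-network meta-learning part of the paper are omitted, and all names are illustrative):

import numpy as np

def deviation_loss(scores, labels, margin=5.0, n_ref=5000, rng=None):
    """scores: anomaly scores s(v) produced by a GNN for each node.
    labels: 1 for the few labeled anomalies, 0 for unlabeled/normal nodes.
    Reference scores are drawn from a standard normal prior."""
    rng = rng or np.random.default_rng(0)
    ref = rng.standard_normal(n_ref)
    dev = (scores - ref.mean()) / (ref.std() + 1e-12)      # z-score style deviation
    # push normal nodes toward zero deviation, anomalies beyond the margin
    loss = (1 - labels) * np.abs(dev) + labels * np.maximum(0.0, margin - dev)
    return loss.mean()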
Taxonomy (recap)
Graph Anomaly Detection → static graphs (plain | attributed), dynamic graphs, graph algorithms (learning models | inference); see slide 14.

41
Sun et al. ’05
Anomalies in Bipartite Graphs
◼ Problem:
Q1. Neighborhood formation (NF)
❑ Given a query node q in V1, what are the relevance scores of all the nodes in V1 to q?
Q2. Anomaly detection (AD)
❑ Given a query node q in V1, what are the normality scores for nodes in V2 that link to q?

42
Applications of problem setting
◼ Publication network
❑ (similar) authors vs. (unusual) papers
◼ P2P network
❑ (similar) users vs. (“cross-border”) files
◼ Financial trading network
❑ (similar) stocks vs. (cross-sector) traders
◼ Collaborative filtering
❑ (similar) users vs. (“cross-border”) products

43
1) Neighborhood formation
◼ Main idea:
❑ Random Walk with Restart (RWR) from q
❑ Steady-state probabilities of the V1 nodes serve as relevance scores
❑ (1) Construct the transition matrix P
❑ (2) Set the fly-back (restart) probability c to return to q
❑ (3) Solve for the steady state: u(t+1) = (1−c)·P·u(t) + c·q
Approximation: run RWR only on the graph partition containing q
44
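A minimal power-iteration sketch of RWR (assuming a column-stochastic transition matrix P has already been built; names are illustrative):

import numpy as np

def rwr(P, q_idx, c=0.15, n_iter=100, tol=1e-9):
    """Random walk with restart.
    P: column-stochastic transition matrix (n x n) of the (bipartite) graph.
    q_idx: index of the query node; c: fly-back (restart) probability."""
    n = P.shape[0]
    q = np.zeros(n)
    q[q_idx] = 1.0
    u = q.copy()
    for _ in range(n_iter):
        u_new = (1 - c) * P @ u + c * q
        if np.abs(u_new - u).sum() < tol:
            return u_new
        u = u_new
    return u   # steady-state scores; restrict to V1 nodes for relevance to q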
2) Anomaly detection
◼ Main idea:
❑ Compute pairwise "normality" (relevance) scores among the set S of neighbors of a node t in V2
❑ The normality score of t is a function (e.g., the average) of these pairwise scores
❑ (1) Find the set S of nodes connected to t
❑ (2) Compute the |S|×|S| normality matrix R (asymmetric, diagonal reset to 0)
❑ (3) Apply a score function f(R), e.g., f(R) = mean(R)

45
Tong et al. ’11
Graph Anomalies by NrMF
◼ A low-rank factorization of the (sparse) adjacency matrix reveals communities and anomalies:
A = F × G + R
❑ the low-rank matrices F, G capture communities
❑ the residual matrix R captures anomalies

49
Non-negativity constraints
◼ For improved interpretability, impose non-negativity constraints
◼ A typical procedure: interpretation by non-negativity
❑ Non-negative Matrix Factorization (for community detection): A = F × G + R, with F ≥ 0 and G ≥ 0 (the factors reveal communities)
◼ An example:
❑ Non-negative Residual Matrix Factorization (for anomaly detection): A = F × G + R, with R(i,j) ≥ 0 for A(i,j) > 0 (the residual reveals anomalies)

50
Optimization formulation

min over F, G of ‖A − F·G‖²_F   (the objective common in matrix factorization)
subject to (A − F·G)(i,j) ≥ 0 for all (i,j) with A(i,j) > 0   (non-negative residual)

◼ Q: How to find 'optimal' F and G?
❑ D1: Quality → C1: the objective is non-convex
❑ D2: Scalability → C2: large graph size

51
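As a rough illustration of the residual idea only: the sketch below uses an unconstrained truncated SVD instead of the constrained, non-convex NrMF formulation above, and simply reports the edges with the largest positive residuals.

import numpy as np

def top_residual_edges(A, rank=5, k=20):
    """Low-rank reconstruction of adjacency matrix A; edges with the largest
    positive residual A - A_lowrank are reported as candidate anomalies."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_low = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    R = np.where(A > 0, A - A_low, 0.0)           # residuals on existing edges only
    order = np.argsort(R, axis=None)[::-1][:k]    # largest residuals first
    return [np.unravel_index(o, R.shape) for o in order]   # top-k (i, j) pairs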
Experiments
◼ NrMF can spot 4 types of anomalies (illustrated by comparing NrMF residuals against SVD residuals on the top-k edges)
52
Part I: References (plain graphs)
Community mining Feature mining

◼ L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting


Anomalies in Weighted Graphs. PAKDD, 2010.
◼ K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad,
H. Tong, C. Faloutsos. It's Who You Know: Graph Mining
Using Recursive Structural Features. KDD, 2011.
◼ J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos.
Neighborhood formation and anomaly detection in bipartite
graphs. ICDM, 2005.
◼ Hanghang Tong, Ching-Yung Lin: Non-Negative Residual
Matrix Factorization with Application to Graph Anomaly
Detection. SDM, pages 143-153, 2011.
◼ Q. Ding, N. Katenka, P. Barford, E. Kolaczyk, and M.
Crovella. Intrusion as (Anti)social Communication:
Characterization and Detection. KDD, 2012.
53
Part I: Outline
◼ Overview: Outliers in clouds of points
❑ Outliers in numerical data points
◼ distance-based, density-based, …
❑ Outliers in categorical data points
◼ model-based

◼ Anomaly detection in graph data


❑ Anomalies in unlabeled, plain graphs
❑ Anomalies in node-/edge-labeled, attributed
graphs

54
Taxonomy (recap)
Graph Anomaly Detection → static graphs (plain | attributed), dynamic graphs, graph algorithms (learning models | inference); see slide 14.

55
Noble & Cook. ’03
Anomalies in labeled graphs
◼ Problem:
Q1. Given a graph in which nodes and edges
contain (non-unique) labels, what are
unusual substructures?

56
Background
◼ Subdue*: An algorithm for detecting repetitive
patterns (substructures) within graphs.
◼ Substructure: A connected subgraph of the
overall graph.
◼ Compressing a graph: Replacing each
instance of the substructure with a new
vertex representing that substructure.
◼ Description Length (DL): Number of bits
needed to encode a piece of data

* http://ailab.wsu.edu/subdue/

57
Background
◼ Subdue uses the following heuristic:
❑ The best substructure is the one that minimizes
F1(S,G) = DL(G | S) + DL(S)
◼ G: Entire graph, S: The substructure,
◼ DL(G|S) is the DL of G after compressing it using S,
◼ DL(S) is the description length of the substructure.

◼ Iterations after compressing at each step


58
Background
Given a database D and a set of candidate models for D, Minimum Description Length (MDL) selects the model M that minimizes
L(M) + L(D|M)
where L(M) is the length in bits of the description of model M, and L(D|M) is the length in bits of the data encoded by M.
Example (polynomial fitting; cf. Bishop, PR&ML): a degree-1 model a1x + a0 plus the encoded deltas (d = 1) vs. a degree-9 model a9x^9 + … + a1x + a0 with an empty set of deltas (d = 9).
59
1) Anomalous Substructures
◼ Main idea: anomalies (by def.) occur infrequently,
they are roughly opposite to “best substructures”
❑ Find substructures S that maximize F1(S,G)?
◼ Nope, it flags all single nodes as anomalies!
❑ Instead, find those that minimize
F2(S, G) = Size(S) * Instances(S,G)
◼ Approximate inverse of F1(S,G)

◼ Intuition: larger substructures are expected to occur only a few times; the smaller the substructure, the less likely it is to be rare.
60
Example
◼ F2(S, G) = Size(S) * Instances(S,G)
❑ For node D, F2 = 1 * 1 = 1

❑ For A→C and D→A, it is 2 * 1 = 2

❑ For G (whole graph), it is 9 * 1 = 9

◼ Hence D is considered the most anomalous.

◼ Note: Usually a threshold for F2 is used and


anomalies are ranked by their scores.
61
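A toy sketch of the F2 score (assuming the substructure instances have already been enumerated, which is the expensive part Subdue handles):

def f2_score(size, num_instances):
    """Noble & Cook anomalous-substructure score: smaller and rarer is more anomalous."""
    return size * num_instances

# for the example above:
# f2_score(1, 1) == 1   (node D: most anomalous)
# f2_score(2, 1) == 2   (substructures A->C and D->A)
# f2_score(9, 1) == 9   (the whole graph G)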
Akoglu et al. ’12
Cohesive groups in attributed graphs
◼ Problem:
Given a graph with node attributes (features)
❑ social networks + user interests
❑ phone call networks + customer demographics
❑ gene interaction networks + gene expression info
Find cohesive clusters, bridges, anomalies

Note: a cohesive cluster has similar connectivity & attributes


62
Problem sketch
Given the (people × people) adjacency matrix A and the (people × binary features) feature matrix F,
find homogeneous blocks (node-group × node-group blocks in A, node-group × feature-group blocks in F)
* parameter-free
* scalable
63
Problem formulation
1. How many node- & attribute-clusters?
2. How to assign nodes and attributes to clusters?

Main idea: employ Minimum Description Length

L(M) + L(D|M)
(encoding length of the clustering + encoding length of the blocks)
A good clustering implies good compression.

64
Problem formulation details

▪ L(M): model description cost
1. n: #nodes, f: #attributes
2. k: #node-clusters, l: #attribute-clusters (encoded with the universal integer code log* x)
3. the sizes of each node-cluster i and each attribute-cluster j

▪ L(D|M): data description cost given the model
1. for each block in A and F, encode the number of 1s
2. then encode the block content at (close to) its entropy cost

A related problem (column re-ordering for minimum total run length) is shown to be NP-hard [Johnson+] (reduction from Hamiltonian Path).
• log* x: universal code length for integers; log* x ≈ log2 x + log2 log2 x + ⋯ (keep only the positive terms; assumes the range of x is unknown beforehand)
65
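A minimal sketch of the per-block data cost, encoding a binary block by its number of ones (with the log* code) plus an entropy-style cost for the cells; this simplifies the exact code lengths used by PICS:

import numpy as np

def log_star(x):
    """Approximate universal code length for a positive integer."""
    bits, v = 0.0, np.log2(max(x, 1))
    while v > 0:
        bits += v
        v = np.log2(v) if v > 1 else 0
    return bits

def block_cost(block):
    """Bits to describe one binary block: #ones (log* code) + entropy of the cells."""
    n = block.size
    n1 = int(block.sum())
    p1 = n1 / n if n else 0.0
    ent = 0.0
    if 0 < p1 < 1:
        ent = -(n1 * np.log2(p1) + (n - n1) * np.log2(1 - p1))
    return log_star(n1 + 1) + ent

# total cost L(M) + L(D|M) sums the model terms above plus block_cost over all blocks.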
Algorithm sketch

The algorithm is iterative and monotonic – it will converge to a local optimum.
66
PICS at work (Political books)
Book co-purchase network clustered into book groups: "core" liberal vs. conservative books, bridging 'conservative' books, and a core-and-periphery structure.


67
PICS at work (Reality mining)
Subjects × subjects matrices of phone calls and device scans, with the "title" attribute; the recovered groups align with roles such as call-center, casual, business, and grad.

68
PICS at work (YouTube)
YouTube users (77K) × groups (30K); the recovered blocks reveal "familiar strangers", anime lovers, and bridge users.
69
Part I: References (attribute graphs)
Substructures

◼ C. C. Noble and D. J. Cook. Graph-based anomaly


detection. KDD, pages 631–636, 2003.
◼ W. Eberle and L. B. Holder. Discovering structural
anomalies in graph-based data. ICDM Workshops, pages
393–398, 2007.
◼ Michael Davis, Weiru Liu, Paul Miller, George Redpath: Detecting anomalies in graphs with numeric labels. CIKM 2011, pp. 1197-1202.
Community mining


◼ Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun,
Jiawei Han: On community outliers and their efficient
detection in information networks. KDD 2010: 813-822.
◼ Leman Akoglu, Hanghang Tong, Brendan Meeder, Christos
Faloutsos. PICS: Parameter-free Identification of Cohesive
Subgroups in large attributed graphs. SDM, 2012.
70
Tutorial Outline
◼ Motivation, applications, challenges
◼ Part I: Anomaly detection in static data
❑ Overview: Outliers in clouds of points
❑ Anomaly detection in graph data

◼ Part II: Event detection in dynamic data


❑ Overview: Change detection in time series
❑ Event detection in graph sequences

◼ Part III: Graph-based apps


❑ fraud and spam detection

71
Part II: Outline
◼ Overview: Events in point sequences
❑ Change detection in time series
❑ Learning under concept drift

◼ Events in graph sequences


❑ Change by graph distance
❑ Change by graph connectivity

72
Event detection
◼ Anomaly detection in time series of multi-
dimensional data points
❑ Exponentially Weighted Moving Average
❑ CUmulative SUM Statistics
❑ Regression-based
❑ Box-Jenkins models, e.g., ARMA, ARIMA
❑ Wavelets
❑ Hidden Markov Models
❑ Model-based hypothesis testing
❑ …
◼ This part: time series of graphs
73
Part II: References (data series)
◼ Montgomery, D. C. Introduction to Statistical Quality Control. John
Wiley and Sons, Inc., 2001.
◼ Box, George and Jenkins, Gwilym. Time series analysis: Forecasting
and control, San Francisco: Holden-Day, 1970.
◼ Gama J., Medas P., Castillo G., Rodrigues P.P.: Learning with Drift
Detection. SBIA 2004: 286-295.
◼ Grigg et al.; Farewell, VT; Spiegelhalter, DJ. The Use of Risk-Adjusted
CUSUM and RSPRT Charts for Monitoring in Medical
Contexts. Statistical Methods in Medical Research 12 (2): 147–170.
◼ Bay, S. D., and Pazzani, M. J., Detecting change in categorical data:
Mining contrast sets. KDD, pages 302–306, 1999.
◼ M. Van Leeuwen, A. Siebes. StreamKrimp: Detecting Change in Data
Streams. ECML PKDD, 2008.
◼ Wong, W.-K., Moore, A., Cooper, G. and Wagner, M. WSARE: An
Algorithm for the Early Detection of Disease Outbreaks. JML, 2005.
◼ Tutorial: D. B. Neill and W.-K. Wong. A tutorial on event detection.
KDD, 2009.
74
Part II: Outline
◼ Overview: Events in point sequences
❑ Change detection in time series
❑ Learning under concept drift

◼ Events in graph sequences


❑ Change by graph distance
◼ feature-based
◼ structure-based
❑ Change by graph connectivity

75
Events in time-evolving graphs
◼ Problem: Given a sequence of graphs,
Q1. change detection: find time points at
which graph changes significantly

Q2. attribution: find (top k) nodes / edges /


regions that change the most

76
Events in time-evolving graphs
◼ Main framework
❑ Compute graph similarity/distance scores between consecutive snapshots over time
❑ Find unusual occurrences in the resulting time series

◼ *Note: scalability is a desired property


77
Taxonomy (recap)
Graph Anomaly Detection → static graphs (plain | attributed), dynamic graphs, graph algorithms (learning models | inference); see slide 14.

78
Graph distance – 10 metrics [& more]
Shoubridge et al. '02; Dickinson et al. '04
◼ (1) Weight distance
◼ (2) Maximum Common Subgraph (MCS) weight distance
◼ (3) MCS edge distance

79
Graph distance – 10 metrics
◼ (4) MCS Node distance

◼ (5) Graph Edit distance Gao et al. ’10 (survey)

❑ Total cost of sequence of edit operations, to make


two graphs isomorphic (costs may vary)
❑ Unique labeling of nodes reduces computation
◼ otherwise an NP-complete problem
❑ Alternatives for weighted graphs
80
Graph distance – 10 metrics
◼ (5.5) Weighted Graph Edit distance    Kapsabelis et al. '07
❑ edit costs depend on the edge weights; non-linear cost functions are also investigated

81
Graph distance – 10 metrics
◼ (6) Median Graph distance    Dickinson et al. '04
❑ compute the median graph of the sequence
❑ measure the distance of each graph in the sequence to the median graph
❑ free to choose any distance function d
◼ (7) Modality distance    Kraetzl et al. '06
❑ based on the difference between the Perron (principal eigen-) vectors of the two graphs
82
Graph distance – 10 metrics
◼ (8) Diameter distance    Gaston et al. '06
❑ difference in graph diameter (longest shortest-path distance)
◼ (9) Entropy distance
❑ difference in (edge-weight-based) graph entropy
◼ (10) Spectral distance
❑ difference between the largest positive eigenvalues of the two graph Laplacians

83
Graph distance – 10 metrics

84
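A minimal sketch of two of the simpler metrics for graphs with unique node labels, with graphs represented as edge-to-weight dictionaries (illustrative simplifications, not the exact normalizations from the papers):

def weight_distance(w1, w2):
    """Normalized sum of absolute edge-weight differences.
    w1, w2: dicts mapping an edge (u, v) -> weight, nodes uniquely labeled."""
    edges = set(w1) | set(w2)
    num = sum(abs(w1.get(e, 0.0) - w2.get(e, 0.0)) for e in edges)
    den = sum(max(w1.get(e, 0.0), w2.get(e, 0.0)) for e in edges)
    return num / den if den else 0.0

def mcs_edge_distance(w1, w2):
    """1 - |shared edges| / max(|E1|, |E2|); with unique labels the MCS is just
    the intersection of the two edge sets."""
    e1, e2 = set(w1), set(w2)
    denom = max(len(e1), len(e2))
    return 1.0 - len(e1 & e2) / denom if denom else 0.0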
Pincombe ’05
Graph distance to time series
◼ Build a time series of graph distances for each distance function
◼ Fit an ARMA model to each time series (assumes a stationary series, due to construction)
◼ Anomalous time points: where the model residuals exceed a threshold (plot: residuals per day)
85
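A minimal sketch of the residual-thresholding step, using a plain least-squares AR(p) fit as a stand-in for the ARMA models of the paper:

import numpy as np

def ar_residual_events(d, p=2, k=3.0):
    """d: 1-D numpy array of graph distances over time.
    Fit AR(p) by least squares and flag times where |residual| > k * std(residual)."""
    X = np.column_stack([d[i:len(d) - p + i] for i in range(p)] + [np.ones(len(d) - p)])
    y = d[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    flagged = np.where(np.abs(resid) > k * resid.std())[0] + p
    return flagged   # indices of anomalous time points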
Part II: Outline
◼ Overview: Events in point sequences
❑ Change detection in time series
❑ Learning under concept drift

◼ Events in graph sequences


❑ Change by graph distance
◼ feature-based
◼ structure-based
❑ Change by graph connectivity
◼ phase transition

86
Ide et al. ’04
Eigen-space-based events
◼ Given a time-evolving graph, identify faulty vertices

◼ Challenges
❑ Large number of nodes,
impractical to monitor each
❑ Edge weights are highly dynamic
❑ Anomaly defined collectively (different than “others”)

Event: a “phase transition” of the graph


(in overall relation between the edge weights)
87
“Summary feature” extraction
◼ Definition of the "activity" vector:
u(t) = argmax over ‖u‖ = 1 of uᵀ D(t) u
where u(t) is the activity vector at time t and D(t) is the adjacency (dependency) matrix at t (symmetric, non-negative)

◼ This maximization reduces to an eigen-problem:
→ the principal eigenvector of D(t) gives the summary of node "activity"

88
Activity feature
◼ Why “activity”? (intuition)
❑ If D12 is large, then u1 and u2 should be large
because of argmax (note: D is a positive matrix).
❑ So, if s1 actively links to other nodes at t, then the
“activity” of s1 should be large.
❑ Also interpreted as a "stationary state": u = (u1, u2, …, uN)ᵀ, where ui is the probability that node i is holding the "control token"

89
Anomaly detection
◼ The problem is reduced from a sequence of graphs to a sequence of (activity) vectors:
adjacency matrix D(t) → principal eigenvector → activity vector u(t)
◼ From the past window of activity vectors u(t−W), …, u(t−1), compute a "typical activity" (summary) vector r(t−1) as their principal left singular vector (via SVD)
◼ Track the angle between u(t) and r(t−1) for change: a large deviation signals an event
90
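A minimal NumPy sketch of the pipeline (principal eigenvector per snapshot, SVD-based typical pattern over a past window of size W, angle-based score; window size and thresholding are illustrative):

import numpy as np

def activity_vector(D):
    """Principal eigenvector (made non-negative) of a symmetric dependency matrix D."""
    vals, vecs = np.linalg.eigh(D)
    return np.abs(vecs[:, -1])                 # entries give node "activity"

def event_scores(mats, W=5):
    """mats: list of symmetric adjacency/dependency matrices over time.
    Score at time t: 1 - cosine between u(t) and the typical pattern r(t-1)."""
    us = [activity_vector(D) for D in mats]
    scores = []
    for t in range(W, len(us)):
        window = np.column_stack(us[t - W:t])
        r = np.abs(np.linalg.svd(window)[0][:, 0])     # principal left singular vector
        scores.append(1.0 - float(r @ us[t]))
        # flag t as an event when the score exceeds an (online) threshold
    return scores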
Experiment
◼ Time evolution of
activity scores
effectively visualizes
malfunction

◼ Anomaly measure and


online thresholding
dynamically capture
activity change

◼ Nodes changing most


can be attributed

91
Reconstruction-based events

◼ Network forensics
❑ Sparsification ➔ load shedding
❑ Matrix decomposition ➔ summarization
❑ Error Measure ➔ anomaly detection

Sun+ICDM’07
92 modified with permission
Matrix decomposition
◼ Goal: summarize a given graph
decompose the adjacency matrix into smaller components:
1. Singular Value Decomposition (SVD)   (1800's; PCA, LSI, …)
2. CUR decomposition   (Drineas et al. '05)
3. Compact Matrix Decomposition (CMD)   (Sun et al. '07)
4. Colibri   (Tong et al. '08)


93
1. Singular Value Decomposition
A = U Σ Vᵀ
input data A = [x(1) x(2) … x(M)]; left singular vectors U = [u1 u2 … uk]; singular values Σ = diag(σ1, …, σk); right singular vectors V = [v1 v2 … vk]

+ Optimal low-rank approximation
− Lack of sparsity: U and Vᵀ are dense even when A is sparse
94
Drineas et al. ’05
2. CUR decomposition
Find C, U, R such that ‖A − C·U·R‖ is small, where the basis vectors in C and R are actual columns and rows of A (rather than singular vectors)

+ Provably good approximation to SVD
+ Sparse basis (since A is sparse)
− Space overhead (duplicate basis columns/rows)
Sun+ICDM’07
95 modified with permission
Sun et al. ’07
3. Compact Matrix Decomposition
Find C, U, R such that ‖A − C·U·R‖ is small, with no duplicates in C and R (CUR's Cd, Rd with duplicates become CMD's Cs, Rs without)

+ Sparse basis (since A is sparse)
+ Efficient in space and computation time
96
Reconstruction-based events

◼ Network forensics
❑ Sparsification ➔ load shedding
❑ Matrix decomposition ➔ summarization
❑ Error Measure ➔ anomaly detection

97
Sun et al. ’07
Error measure: reconstruction
◼ accuracy = 1 − Relative Sum-Squared Error (RSSE), where RSSE = ‖A − Ã‖²_F / ‖A‖²_F for the reconstruction Ã
◼ Monitor accuracy over time: a drop reveals structural changes in the link patterns that volume monitoring alone cannot detect
◼ Also, rows/columns with high reconstruction error in a static snapshot point to anomalies
98
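A minimal sketch of accuracy monitoring using a rank-k SVD reconstruction per snapshot (a stand-in for CMD, which computes a comparable sparse summary much more cheaply on large graphs):

import numpy as np

def accuracy(A, rank=10):
    """accuracy = 1 - RSSE for a rank-k reconstruction of adjacency matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    rsse = np.sum((A - A_hat) ** 2) / np.sum(A ** 2)
    return 1.0 - rsse

# acc_t = [accuracy(A_t) for A_t in snapshots]
# a sudden drop in acc_t signals a structural change in the link patterns.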
Part II: References (graph series)
◼ Shoubridge P., Kraetzl M., Wallis W. D., Bunke H. Detection of Abnormal
Change in a Time Series of Graphs. Journal of Interconnection Networks
(JOIN) 3(1-2):85-101, 2002.
◼ Shoubridge, P. Kraetzl, M. Ray, D. Detection of abnormal change in
dynamic networks. Information, Decision and Control, 1999.
◼ Kelly Marie Kapsabelis, Peter John Dickinson, Kutluyil Dogancay.
Investigation of graph edit distance cost functions for detection of network
anomalies. ANZIAM J. 48 (CTAC2006) pp.436–449, 2007.
◼ Panagiotis Papadimitriou, Ali Dasdan, Hector Garcia-Molina. Web graph
similarity for anomaly detection. J. Internet Services and Applications (JISA)
1(1):19-30 (2010)
◼ B. Pincombe. Anomaly Detection in Time Series of Graphs using ARMA
Processes. ASOR BULLETIN, 24(4):2, 2005.
◼ Gao, Xinbo and Xiao, Bing and Tao, Dacheng and Li, Xuelong. A survey of
graph edit distance. Pattern Anal. and App.s 13 (1), pp. 113-129. 2010.
◼ Horst Bunke, Peter J. Dickinson, Andreas Humm, Christophe Irniger, Miro
Kraetzl: Computer Network Monitoring and Abnormal Event Detection Using
Graph Matching and Multidimensional Scaling. Industrial Conference on
Data Mining 2006:576-590
99
Part II: References (graph series) (2)
◼ C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park. Scan Statistics on
Enron Graphs. Computational & Mathematical Organization Theory,
11(3):229–247, 2005.
◼ Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer
Systems. KDD, 2004.
◼ L. Akoglu, M. McGlohon, C. Faloutsos. Event Detection in Time Series of
Mobile Communication Graphs. Army Science Conference, 2010.
◼ Sun, Jimeng and Xie, Yinglian and Zhang, Hui and Faloutsos, Christos. Less
is more: Compact matrix representation of large sparse graphs. ICDM 2007.
◼ Sun, Jimeng and Tao, Dacheng and Faloutsos, Christos. Beyond streams
and graphs: dynamic tensor analysis. KDD 2006: 374-383
◼ Sun J., Faloutsos C., Papadimitriou S., Yu P. S. GraphScope: parameter-free
mining of large time-evolving graphs. KDD, 2007.
◼ R. Rossi, B. Gallagher, J. Neville, and K. Henderson. Role-Dynamics: Fast
Mining of Large Dynamic Networks. 1st Workshop on Large Scale Network
Analysis, WWW, 2012.
◼ Cemal Cagatay Bilgin , Bülent Yener . Dynamic Network Evolution: Models,
Clustering, Anomaly Detection. Survey, 2008.
100
Tutorial Outline
◼ Motivation, applications, challenges
◼ Part I: Anomaly detection in static data
❑ Overview: Outliers in clouds of points
❑ Anomaly detection in graph data

◼ Part II: Event detection in dynamic data


❑ Overview: Change detection in time series
❑ Event detection in graph sequences

◼ Part III: Graph-based apps [optional]


❑ fraud and spam detection

101
Part III: Outline
◼ Online auction fraud
◼ Fake review spam
◼ Web spam

102
Taxonomy (recap)
Graph Anomaly Detection → static graphs (plain | attributed), dynamic graphs, graph algorithms (learning models: RMNs, PRMs, RDNs, MLNs | inference: relational network classification, iterative classification, Gibbs sampling, belief propagation); see slide 14.
Applications: fraud detection, spam detection

103
Chau et al. ’06
(1) Online auction fraud
◼ Auction sites: attractive target for fraud
◼ 63% of complaints to the U.S. Federal Internet Crime Complaint Center in 2006
◼ Average loss per incident: ≈ $385
◼ Often non-delivery fraud: the buyer pays ($$$) but the seller never delivers

104
Online auction fraud detection
◼ Insufficient solution:
❑ Look at individual features, geographic locations,
login times, session history, etc.

◼ Harder to fake: graph structure


◼ Capture relationships between users

◼ Q: How do fraudsters interact with other


users and among each other?
→ in addition to buy/sell relations, there is a
feedback mechanism
105
Feedback mechanism
◼ Each user has a reputation score
◼ Users rate each other via feedback

(e.g., after a transaction, positive feedback raises one user's reputation from 70 to 71, while negative feedback drops the other's from 15 to 14)

◼ Q: How do fraudsters game the feedback


system?
106
Auction “roles”
◼ Do they boost each
other’s reputation?

◼ They form near-bipartite


cores (2 roles)
❑ accomplice: trades with honest users, looks legit
❑ fraudster: trades with accomplices; commits fraud with honest users
107
Detecting online fraud
◼ How to find near-bipartite cores? How to find
roles (honest, accomplice, fraudster)?
❑ Use Belief Propagation!

◼ How to set BP parameters (potentials)?

❑ prior beliefs: prior knowledge, unbiased if none


❑ compatibility potentials: by insight

108
BP in action
1. Initialize prior beliefs: known fraudsters get P(fraud) = 1; all other nodes are initialized as unbiased
2. At each iteration, for each node, compute messages to its neighbors
3. Continue until "convergence"
4. Compute beliefs and assign each node its most likely state

109
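A minimal sketch of the message-passing loop with the three states (fraudster, accomplice, honest); the compatibility values below are illustrative placeholders, not the potentials from the NetProbe paper:

import numpy as np

STATES = ["fraud", "accomplice", "honest"]
# compatibility PSI[s1, s2]: how likely two neighbors are to be in states s1, s2
# (illustrative: fraudsters link to accomplices, accomplices trade with both)
PSI = np.array([[0.05, 0.80, 0.15],
                [0.80, 0.05, 0.15],
                [0.15, 0.15, 0.70]])

def loopy_bp(adj, priors, n_iter=20):
    """adj: dict node -> list of neighbors (undirected); priors: dict node -> length-3 array."""
    msgs = {(i, j): np.ones(3) / 3 for i in adj for j in adj[i]}
    for _ in range(n_iter):
        new = {}
        for i in adj:
            for j in adj[i]:
                prod = priors[i].copy()
                for k in adj[i]:
                    if k != j:
                        prod *= msgs[(k, i)]
                m = PSI.T @ prod                 # sum over the states of node i
                new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in adj:
        b = priors[i].copy()
        for k in adj[i]:
            b *= msgs[(k, i)]
        beliefs[i] = b / b.sum()                 # argmax gives the most likely role
    return beliefs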
Computing beliefs → roles
Each node's final beliefs (P(honest), P(accomplice), P(fraudster)) determine its role.

Chau+PKDD’06
110 modified with permission
(2) Fake review spam
◼ Review sites: attractive target for spam
◼ Often hype/defame spam
◼ Paid spammers

111
Fake review spam detection
◼ Behavioral analysis [Jindal & Liu’08]
❑ individual features, geographic locations, login
times, session history, etc.
◼ Language analysis [Ott et al.’11]
❑ use of superlatives, heavy self-referencing, rate of misspellings, many agreement words, …

◼ Harder to fake: graph structure


◼ Capture relationships between
reviewers, reviews, stores

112
[Wang et al. ’11]
Graph-based detection
Reviewer r's trustiness T(r) is a function of the honesty scores H of r's reviews vᵢ (i = 1, …, nᵣ, where nᵣ is the total number of reviews written by r).

113
Graph-based detection
Store s's reliability R(s) is computed from all reviews that store s receives from reviewers with positive trustiness scores: each such review v contributes its author's trustiness, weighted by v's rating relative to the median rating (3).

Intuition: a store is more reliable if more trustworthy reviewers say good things about it.

114
Graph-based detection
Review v's honesty H(v) is based on (1) the reliability of the store and (2) the agreement between v and the surrounding reviews (other reviews about the same store within a given time window), weighted by the trustiness of their authors: surrounding reviews that are similar to v support it, while the remaining surrounding reviews count against it.
115
Graph-based detection
Reviewer r trustiness T(r)

Store s reliability R(s)

Review v honesty H(v)

116
Graph-based detection
◼ Algorithm: iterate trustiness, reliability, and
honesty scores in a mutual recursion
❑ similar to Kleinberg’s HITS algorithm
❑ non-linear relations

◼ Challenges:
❑ Convergence not guaranteed
❑ Cannot use attribute info
❑ Parameters: agreement time window ∆t, review
similarity threshold (for dis/agreement)

117
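A structural sketch of the mutual recursion (the scoring functions of Wang et al. are more involved; the tanh-based updates below are simplified placeholders that only mirror the dependencies among the three scores):

import numpy as np

def iterate_scores(reviews, n_iter=20):
    """reviews: list of dicts with keys 'id', 'reviewer', 'store', 'rating',
    'agree', 'disagree' (pre-computed #similar / #dissimilar surrounding reviews
    in the time window). All update rules here are simplified placeholders."""
    reviewers = {v['reviewer'] for v in reviews}
    stores = {v['store'] for v in reviews}
    T = {r: 0.0 for r in reviewers}
    R = {s: 0.0 for s in stores}
    H = {}
    for _ in range(n_iter):                        # convergence is not guaranteed
        for v in reviews:                          # review honesty
            H[v['id']] = np.tanh(R[v['store']] + v['agree'] - v['disagree'])
        for r in reviewers:                        # reviewer trustiness
            T[r] = np.tanh(sum(H[v['id']] for v in reviews if v['reviewer'] == r))
        for s in stores:                           # store reliability
            R[s] = np.tanh(sum(max(T[v['reviewer']], 0) * (v['rating'] - 3)
                               for v in reviews if v['store'] == s))
    return T, R, H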
(3) Web spam
◼ Spam pages: pages designed to trick search
engines to direct traffic to their websites

118
Web spam
◼ Challenges:
❑ pages are not independent
❑ what features are relevant?
❑ small training set
❑ noisy labels (consensus is hard)
❑ content very dynamic

119
Web spam
◼ Many graph-based solutions
❑ TrustRank [Gyöngyi et al. ’04]
❑ SpamRank [Benczur et al. ’05]
❑ Anti-trustRank [Krishnan et al. ’06]
❑ Propagating trust and distrust [Wu et al. ’06]
❑ Know your neighbors [Castillo et al. ’07]
❑ Guilt-by-association [Kang et al. ’11]
❑ …

120
Web spam
◼ Main idea: exploit homophily and reachability

121
[Gyöngyi et al. ’04]
TrustRank: combating web spam
◼ Main steps:
❑ Find seed set S of “good” pages
(e.g. using oracle)
❑ Compute trust scores by biased
(personalized) PageRank from
good pages
◼ Intuition: spam pages are
hardly reachable from
trustworthy pages
❑ Hard to acquire direct inlinks
from good pages

122
details
TrustRank mathematically
◼ Recall the PageRank score of a page p:
r(p) = α · Σ_{q → p} r(q) / ω(q) + (1 − α) · 1/N,   where ω(q) is the out-degree of q
◼ In closed (matrix) form:
r = α · T · r + (1 − α) · (1/N) · 1_N
with damping factor α and transition matrix T
◼ Personalized PageRank (TrustRank): replace the uniform vector by d, where d(p) = 1/|S| for the seed pages of interest S and 0 otherwise:
r = α · T · r + (1 − α) · d
123
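A minimal power-iteration sketch of TrustRank-style personalized PageRank (seed handling and dangling-node treatment simplified; names are illustrative):

import numpy as np

def trustrank(T, seeds, alpha=0.85, n_iter=100, tol=1e-10):
    """T: column-stochastic transition matrix (T[p, q] = 1/outdeg(q) if q links to p).
    seeds: indices of trusted seed pages."""
    n = T.shape[0]
    d = np.zeros(n)
    d[list(seeds)] = 1.0 / len(seeds)           # personalization (seed) vector
    r = d.copy()
    for _ in range(n_iter):
        r_new = alpha * T @ r + (1 - alpha) * d
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r   # low trust scores indicate likely spam pages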
Part III: References (alg.s and app.s)
◼ P. Sen,G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T.
Eliassi-Rad. Collective Classification in Network Data. AI
Magazine, 29(3):93-106, 2008.
◼ S. A. Macskassy and F. Provost. A Simple Relational
Classifier. KDD Workshops, 2003.
◼ S. Pandit, D. H. Chau, S. Wang, C. Faloutsos. NetProbe: A
Fast and Scalable System for Fraud Detection in Online
Auction Networks. WWW, 2007.
◼ M. McGlohon, S. Bay, M. G. Anderle, D. M. Steier, C.
Faloutsos: SNARE: a link analytic system for graph labeling
and risk detection. KDD, 2009.
◼ Zhongmou Li, Hui Xiong, Yanchi Liu, Aoying Zhou. Detecting
Blackhole and Volcano Patterns in Directed Networks. ICDM,
pp 294-303, 2010.
124
Part III: References (alg.s and app.s) (2)
◼ G. Wang, S. Xie, B. Liu, P. S. Yu. Review Graph based
Online Store Review Spammer Detection. ICDM, 2011.
◼ Zoltán Gyöngyi , Hector Garcia-molina , Jan Pedersen.
Combating web spam with TrustRank. VLDB, 2004.
◼ Andras A. B., Karoly C., Tamas S., Mate U. SpamRank -
Fully Automatic Link Spam Detection. AIRWeb, 2005.
◼ Vijay Krishnan, Rashmi Raj: Web Spam Detection with
Anti-Trust Rank. AIRWeb, pp. 37-40, 2006.
◼ Baoning W., Vinay G., and Brian D. D.. Propagating Trust
and Distrust to Demote Web Spam. WWW, 2006.
◼ Castillo, C. and Donato, D. and Gionis, A. and Murdock, V.
and Silvestri, F. Know your neighbors: web spam detection
using the web topology. SIGIR, pp. 423-430, 2007.
125
Other references
◼ Matthew J. Rattigan, David Jensen: The case for anomalous link
discovery. SIGKDD Explorations 7(2): 41-47 (2005)
◼ Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, Philip S. Yu. Mining
Behavior Graphs for "Backtrace" of Noncrashing Bugs. SDM, 2005.
◼ Shetty, J. and Adibi, J. Discovering Important Nodes through Graph
Entropy: The Case of Enron Email Database. KDD, workshop, 2005.
◼ Boden B., Günnemann S., Hoffmann H., Seidl T. Mining Coherent
Subgraphs in Multi-Layer Graphs with Edge Labels. KDD 2012.
◼ K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L.
Akoglu, D. Koutra, L. Li, C. Faloutsos. RolX: Structural Role Extraction
& Mining in Large Graphs. KDD, 2012.
◼ J. Neville, O. Simsek, D. Jensen, J. Komoroske, K. Palmer, and H.
Goldberg. Using Relational Knowledge Discovery to Prevent Securities
Fraud. KDD, pp. 449–458, 2005.
◼ H. Bunke and K. Shearer. A graph distance metric based on the
maximal common subgraph. Pattern Rec. Let., 19(3-4):255–259, 1998.

126
Other references
◼ B. T. Messmer and H. Bunke. A new algorithm for error-tolerant
subgraph isomorphism detection. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 20(5):493–504, 1998.
◼ P. J. Dickinson, H. Bunke, A. Dadej, M. Kraetzl: Matching graphs with
unique node labels. Pattern Anal. Appl. 7(3): 243-254 (2004)
◼ Peter J. Dickinson, Miro Kraetzl, Horst Bunke, Michel Neuhaus, Arek
Dadej: Similarity Measures For Hierarchical Representations Of
Graphs With Unique Node Labels. IJPRAI 18(3): 425-442 (2004)
◼ Kraetzl, M. and Wallis, W. D., Modality Distance between Graphs.
Utilitas Mathematica, 69, 97–102, 2006.
◼ Gaston, M. E., Kraetzl, M. and Wallis, W. D., Graph Diameter as a
Pseudo-Metric for Change Detection in Dynamic
Networks, Australasian Journal of Combinatorics, 35, 299—312, 2006.
◼ B. Pincombe. Detecting changes in time series of network graphs using
minimum mean squared error and cumulative summation. ANZIAM J.
48, pp.C450–C473, 2007.

127
Conclusions
◼ Graphs are powerful tools to detect
❑ Anomalies

❑ Events

❑ Fraud/Spam

in complex real-world data (attributes,


(noisy) side information, weights, …)
◼ Nature of the problem highly dependent on
the application domain
◼ Each problem formulation needs a
different approach
128
Open challenges: research
◼ Anomalies in dynamic graphs
❑ dynamic attributed graphs (definitions,
formulations, real-world scenarios)
❑ temporal effects: node/edge history (not only
updates)
◼ Fraud/spam detection: system perspective
❑ adversarial robustness
❑ cost (to the system in measurement, to the adversary to fake, to the user in exposure)
❑ detection timeliness and other system design
aspects; e.g. dynamicity, latency

129
Open challenges: practice
◼ What makes the results better in practice?
❑ better priors?
❑ better parameter learning?
❑ more data?
❑ …
◼ Graph construction
❑ If no network, what to use to build one?
❑ If one network,
◼ more latent edges? (e.g. review similarity)
◼ less edges? (e.g. domain knowledge)
❑ If more than one network, how to exploit all?
130
Key Papers
Core Papers
◼ L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.
◼ Ide, T. and Kashima, H., Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.

Further Reading
◼ K. Ding, Q. Zhou, H. Tong and H. Liu: Few-shot Network Anomaly Detection via Cross-network Meta-learning.
TheWebConf 2021
◼ Minji Yoon, Bryan Hooi, Kijung Shin, Christos Faloutsos: Fast and Accurate Anomaly Detection in Dynamic
Graphs with a Two-Pronged Approach. KDD 2019: 647-657
◼ Bryan Perozzi, Leman Akoglu: Scalable Anomaly Ranking of Attributed Neighborhoods. SDM 2016: 207-215
◼ Duen Horng Chau, Carey Nachenberg, Jeffrey Wilhelm, Adam Wright, Christos Faloutsos: Large Scale Graph
Mining and Inference for Malware Detection. SDM 2011: 131-142
◼ Si Zhang, Dawei Zhou, Mehmet Yigit Yildirim, Scott Alcorn, Jingrui He, Hasan Davulcu, Hanghang Tong: HiDDen:
Hierarchical Dense Subgraph Detection with Application to Financial Fraud Detection. SDM 2017: 570-578
◼ Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, Christos Faloutsos: FRAUDAR: Bounding Graph
Fraud in the Face of Camouflage. KDD 2016: 895-904
◼ Leman Akoglu, Hanghang Tong, Danai Koutra: Graph based anomaly detection and description: a survey. Data
Min. Knowl. Discov. 29(3): 626-688 (2015)
◼ Gao, Xinbo and Xiao, Bing and Tao, Dacheng and Li, Xuelong. A survey of graph edit distance. Pattern Anal. and
App.s 13 (1), pp. 113-129. 2010.
◼ Leman Akoglu, Hanghang Tong, Brendan Meeder, Christos Faloutsos. PICS: Parameter-free Identification of
Cohesive Subgroups in large attributed graphs. SDM, 2012.
◼ G. Wang, S. Xie, B. Liu, P. S. Yu. Review Graph based Online Store Review Spammer Detection. ICDM, 2011.
◼ H Qiao, H Tong, B An, I King, C Aggarwal, G Pang: Deep Graph Anomaly Detection: A Survey and New
Perspectives. arXiv preprint arXiv:2409.09957, 2024

131
◼ Appendix

132
Relational Markov Nets
◼ Undirected dependencies
◼ Potentials on cliques of size 1
◼ Potentials on cliques of size 2
❑ (label-attribute)

❑ (label-observed label)

❑ (label-label)

For pairwise
RMNs max
clique size is 2
133
pairwise Markov Random Field
◼ For an assignment y to all unobserved variables Y, a pMRF defines the probability distribution
P(y | x) ∝ Π_i φ_i(y_i) · Π_{(i,j) ∈ E} ψ_{ij}(y_i, y_j)
where the node labels are the random variables, the φ_i are the prior-belief / observed potentials (1-clique potentials: label-attribute, label-observed label, with "known" labels clamped), and the ψ_{ij} are the compatibility potentials (label-label).
134
135
pMRF interpretation
◼ Defines a joint pdf of all unknown labels
◼ P(y | x) is the probability of a given world y
◼ Best label yi for Yi is the one with highest
marginal probability
◼ Computing one marginal probability P(Yi = yi)
requires summing over exponential # terms

◼ #P problem → approximate inference →


loopy belief propagation

136
Loopy belief propagation
◼ Invented in 1982 [Pearl] to calculate marginals
in Bayes nets.
◼ Also used to estimate marginals (=beliefs), or
most likely states (e.g. MAP) in MRFs
◼ Iterative process in which neighbor variables
“talk” to each other, passing messages
“I (variable x1) believe
you (variable x2) belong
in these states with
various likelihoods…”
◼ When consensus reached, calculate belief
137
details
Loopy belief propagation
1) Initialize all messages to 1
2) Repeat, for each node i and each neighbor j: update the message
m_{i→j}(y_j) ∝ Σ_{y_i} φ_i(y_i) · ψ_{ij}(y_i, y_j) · Π_{k ∈ N(i)\{j}} m_{k→i}(y_i)
3) When the messages "stabilize", compute the beliefs
b_i(y_i) ∝ φ_i(y_i) · Π_{k ∈ N(i)} m_{k→i}(y_i)

138
m1→2(SH) = (0.0096*0.9+0.0216*0.1) / (m1→2(SH) + m1→2(CH)) ~0.35
m1→2(CH) = (0.0096*0.1+0.0216*0.9) / (m1→2(SH) + m1→2(CH)) ~0.65
139
Loopy belief propagation
Advantages:
◼ Easy to program & parallelize
◼ General: can apply to any graphical model w/ any
form of potentials (higher order than pairwise)
Challenges:
◼ Convergence is not guaranteed (when to stop)
❑ esp. if many closed loops
◼ Potential functions (parameters)
❑ require training to estimate
❑ learning by gradient-based optimization:
convergence issues during training
140
