Lecture 1: Introduction
Graph Analytics for Big Data
▪ Coursework:
▪ One programming project
Course Outline
▪ Machine Learning and Representation Learning for graph data:
Thanh H. Nguyen 5/7/2025
Course Schedule (Tentative)
Lecture Topics
1 (May 07th) Introduction, Traditional Methods
2 (May 09th) Node Embedding, Link Analysis
3 (May 12th) Graph Neural Nets: Part 1
4 (May 14th) Graph Neural Nets: Part 2
5 (May 16th) Label Propagation, Heterogeneous Graphs
6 (May 19th) Knowledge Graphs
7 (May 21st) Subgraph Matching, GNN for Recommendations
8 (May 23rd) Deep Generative Models, Advanced Topics
9 (May 26th) Graph Transformer, Scaling Up GNNs
10 (May 28th) Selective Topics
11 (June 9th) Class Projects: Presentations
12 (June 9th) Class Projects: Presentations
Prerequisites
▪ Background
▪ Machine learning
▪ Algorithms
▪ Probability and statistics
▪ Programming
▪ Write non-trivial Python programs
▪ Familiar with Pytorch
Graph Machine Learning Tools
▪ PyG
▪ Link: https://www.pyg.org
▪ A library for Graph Neural Networks
▪ GraphGym
▪ Link: https://github.com/snap-stanford/GraphGym
▪ Platform for designing Graph Neural Networks.
▪ This platform is now supported in PyG
Course Logistics
▪ All lecture slides will be posted, and class discussions will be held, on Slack.
▪ Lecture structures
▪ Two parts per class: approximately 90 minutes each part
▪ 30 minutes for coffee break, Q&A
Course Logistics
▪ Readings
Grading
▪ Class participation (10%)
▪ Attendance: each student is allowed to be absent from class at most two times
▪ Engagement (actively join discussions, Q&A)
Class Projects:
Real-world Applications of GNNs
▪ Determine a specific use case (e.g., fraud detection)
Class Projects: Tasks
▪ Identify an appropriate public dataset
▪ Formulate the use case as a clear graph ML problem
▪ Demonstrate how GNNs can be used to solve this problem.
▪ Important note:
▪ It is not enough to simply apply an existing GNN model to a new dataset
▪ Students are expected to provide some form of novelty, for example:
▪ Develop a new GNN method for a specific problem, improving on existing methods in a non-trivial way
▪ Or provide comprehensive analyses (ablation studies, comparisons between multiple model architectures) for the project
Class Projects: Examples
▪ Example graphs and datasets
▪ Open Graph Benchmark: https://ogb.stanford.edu
▪ Datasets available in PyG
▪ Application examples:
▪ Recommender systems
▪ Fraud detection in transaction graphs
▪ Friend recommendation
▪ Paper citation graphs
▪ Author collaboration networks
▪ Heterogeneous academic graphs
▪ Knowledge graphs
▪ Drug-drug interaction networks
Class Projects: Components
▪ Project proposal (10%)
Class Projects: Project Proposal (10%)
▪ Application domain
▪ Which dataset are you planning to use?
▪ Describe the dataset/task/metric.
▪ Why did you choose the dataset?
▪ Submission:
▪ Deadline: by June 8th.
▪ Format: NeurIPS 2025 LaTeX style
▪ Message me the file on the class Slack channel.
Machine Learning with Graphs:
Why Graphs?
Many Types of Data are Graphs
3D Shapes, Code Graphs, Molecules
Graphs: Machine Learning
▪ Complex domains have a rich relational structure, which can be
represented as a relational graph
Today: Modern ML Toolbox
(Figure: example modalities: images, text/speech.)
Hot Subfield in Machine Learning
ICLR 2023 keywords
Why is Graph Deep Learning Hard?
Networks are complex
▪ Arbitrary size and complex topological structure (i.e., no spatial
locality like grids)
Components of A Network
Graphs: A Common Language
(Figure: the same graph structure describes both a movie-actor network, with actors Peter, Mary, Tom, and John, and a protein-protein interaction network, with Proteins 1-7.)
Choosing a Proper Representation
▪ If you connect individuals who work with
each other, you will explore a professional
network
How to Define a Graph
▪ How to build a graph
▪ What are nodes?
▪ What are edges?
Directed and Undirected Graphs
▪ Undirected
▪ Links: undirected (symmetrical, reciprocal)
▪ Examples: collaborations, friendships on Facebook
▪ Directed
▪ Links: directed
▪ Examples: phone calls, following on Twitter (X)
▪ Other considerations: weights, types, properties, attributes
Representing Graphs: Adjacency Matrix
Adjacency Matrices are Sparse
Networks are Sparse Graphs
Most real-world networks are sparse
Representing Graphs: Adjacency List
▪ Adjacency list
▪ Easier to work with if network is
▪ Large
▪ Sparse
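The trade-off between the two representations can be sketched in plain Python; the 4-node graph below is a hypothetical example:

```python
# Sketch: adjacency matrix vs. adjacency list for the same undirected graph.
# Hypothetical example graph: edges 0-1, 0-2, 1-2, 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: O(n^2) memory, even when the graph is sparse.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = 1
    A[v][u] = 1  # undirected: the matrix is symmetric

# Adjacency list: memory proportional to #edges, so it suits large sparse graphs.
adj = {i: [] for i in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(A[0][1], adj[2])  # 1 [0, 1, 3]
```

The matrix costs O(n²) memory regardless of the edge count, which is why adjacency lists are easier to work with for large, sparse networks.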
Heterogeneous Graphs
▪ A heterogeneous graph is defined as 𝐺 = (𝑉, 𝐸, 𝑅, 𝑇): nodes 𝑉 with node types 𝑇, and edges 𝐸 with relation types 𝑅
Many Graphs are Heterogeneous
Biomedical Knowledge Graphs
▪ Example node: Migraine
▪ Example edge: (fulvestrant, Treats, Breast Neoplasms)
▪ Example edge type (relation): Causes
Academic Graphs
▪ Example node: ICML
▪ Example edge: (GraphSAGE, NeurIPS)
▪ Example edge type (relation): pubYear
Bipartite Graph
▪ Nodes can be divided into two disjoint sets U and V
▪ Every link connects a node in U to one in V
▪ U and V are independent sets
▪ Examples:
▪ Authors-to-Papers (they authored)
▪ Actors-to-Movies (they appeared in)
▪ Users-to-Movies (they rated)
▪ Recipes-to-Ingredients (they contain)
More Types of Graphs
▪ Unweighted ▪ Weighted
More Types of Graphs
▪ Self-edges (self-loop) ▪ Multi-graph
Connectivity of Undirected Graphs
▪ Connected undirected graph
▪ Any two vertices can be joined by a path
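Connectivity of an undirected graph is easy to check with a breadth-first search over the adjacency list; a minimal sketch (the example graphs are hypothetical):

```python
from collections import deque

def is_connected(adj):
    """Check whether an undirected graph (given as an adjacency list) is
    connected: BFS from any node must reach every other node."""
    nodes = list(adj)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)

# Hypothetical examples: a path graph is connected; two disjoint edges are not.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
two_parts = {0: [1], 1: [0], 2: [3], 3: [2]}
print(is_connected(path), is_connected(two_parts))  # True False
```

Running the same BFS from every unvisited node would enumerate the connected components that the next slides discuss.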
Connectivity Example
▪ The adjacency matrix of a network with several components can be written in block-diagonal form: nonzero elements are confined to square blocks along the diagonal, and all other elements are zero
Connectivity of Directed Graphs
▪ Strongly connected directed graphs
▪ Have a path from each node to every other node
Connectivity of Directed Graphs
▪ Strongly connected components (SCCs): not every node is part of a
non-trivial SCC
Applications of Graph ML
Different Types of Tasks
Node level
Edge level
Graph level
Node-Level Tasks
(Figure: a machine learning model classifies the nodes of a graph.)
Node-level Network Structure
▪ Goal: Characterize the structure and position of a node in the network
▪ Node degree
▪ Node importance and position
▪ E.g., Number of shortest paths passing through a node
▪ E.g., Avg. shortest path length to other nodes
Example (1): Anomaly Detection
▪ Computer network
▪ Nodes: computers/machines
▪ Edges: connection/communication between computers
▪ Task: detect compromised computers
Example (1): Anomaly Detection
Zhuo, Ming, Leyuan Liu, Shijie Zhou, and Zhiwen Tian. "Survey on security issues of routing and anomaly detection for space information networks." Scientific Reports 11, no. 1 (2021): 22261.
Example (2): Research Paper Topics
▪ Citation networks
▪ Node: research papers
▪ Edges: citations
▪ Task: predict topics of papers
Valmarska, Anita, and Janez Demšar. "Analysis of citation networks." Diploma Thesis, Faculty of Computer and Information Science, University of Ljubljana, 2014.
Link-level Prediction Task
▪ The task is to predict new/missing/unknown links based on the
existing links.
▪ At test time, node pairs (with no existing links) are ranked, and
top 𝐾 node pairs are predicted.
▪ Task: Make a prediction for a pair of nodes.
Link Prediction as a Task
▪ Links missing at random
▪ Remove a random set of links and then aim to
predict them
Example (1): Recommender Systems
▪ Users interact with items
▪ Watch movies, buy merchandise, listen to music
▪ Nodes: Users and items
▪ Edges: User-item interactions
Example (2): Drug Side Effects
Many patients take multiple drugs to treat
complex or co-existing diseases
▪ 46% of people ages 70-79 take more than 5 drugs
▪ Many patients take more than 20 drugs to treat heart
disease, depression, insomnia, etc.
Graph-level Tasks
Graph-level Features
▪ Goal: We want to make a prediction for an entire graph or a subgraph of the graph.
▪ Example
Example (1): Traffic Prediction
Road Network as a Graph
▪ Nodes: Road segments
▪ Edges: Connectivity between road segments
▪ Prediction: Estimated Time of Arrival (ETA)
Stokes, Jonathan M., et al. "A deep learning approach to antibiotic discovery." Cell 180.4 (2020): 688-702.
Summary
Node level
Edge level
Graph level
Traditional ML Methods for Graphs
Traditional ML Pipeline
▪ Design features for nodes/links/graphs
▪ Obtain features for all training data
Traditional ML Pipeline
▪ Train an ML model
▪ Logistic regression
▪ Random forest
▪ Neural network, etc.
▪ Apply the model
▪ Given a new node/link/graph, obtain its features and make a prediction
This Lecture: Feature Design
▪ Use effective features 𝑥 over graphs
▪ Traditional ML pipeline uses hand-designed features
Machine Learning in Graphs
▪ Goal: Make predictions for a set of objects
▪ Design choices
▪ Features: d-dimensional vectors 𝑥
▪ Objects: Nodes, edges, sets of nodes, entire graphs
▪ Objective functions: What tasks are we aiming to solve?
Machine Learning in Graphs
▪ Example: Node-level prediction
▪ Given: 𝐺 = (𝑉, 𝐸)
▪ Learn a function 𝑓: 𝑉 → ℝ
Node-level Tasks and Features
Node-Level Tasks
(Figure: a machine learning model classifies the nodes of a graph.)
ML needs features
Node-level Features: Overview
▪ Goal: Characterize the structure and position of a node in the
network
▪ Node degree
▪ Node centrality
▪ Clustering coefficient
▪ Graphlets
Node Features: Node Degree
▪ The degree 𝑘𝑣 of node 𝑣 is the number of edges (neighboring
nodes) the node has
▪ Treat all neighboring nodes equally
Node Features: Node Centrality
▪ Node degree counts the neighboring nodes without capturing their
importance
Node Centrality: Eigenvector Centrality
▪ A node 𝑣 is important if it is surrounded by important neighboring nodes 𝑢 ∈ 𝑁(𝑣)
▪ We model the centrality of node 𝑣 as the sum of the centralities of its neighboring nodes:
𝑐_𝑣 = (1/𝜆) Σ_{𝑢∈𝑁(𝑣)} 𝑐_𝑢
where 𝜆 is a normalizing constant (it will turn out to be the largest eigenvalue of the adjacency matrix 𝐴)
Node Centrality: Eigenvector Centrality
▪ Rewrite the recursive equation in matrix form:
𝑐_𝑣 = (1/𝜆) Σ_{𝑢∈𝑁(𝑣)} 𝑐_𝑢   ⟺   𝜆𝒄 = 𝐴𝒄
▪ 𝐴: adjacency matrix, with 𝐴_𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)
▪ 𝒄: centrality vector
▪ 𝜆: eigenvalue (the normalizing constant; the largest eigenvalue of 𝐴)
▪ Example:
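Since 𝜆𝒄 = 𝐴𝒄 is an eigenvector equation, the centralities can be approximated by power iteration; a minimal sketch on a hypothetical toy graph (function and variable names are illustrative):

```python
def eigenvector_centrality(adj, iters=100):
    """Power iteration: repeatedly set c <- A c and renormalize, which
    converges to the eigenvector of A's largest eigenvalue."""
    nodes = list(adj)
    c = {v: 1.0 for v in nodes}
    for _ in range(iters):
        new = {v: sum(c[u] for u in adj[v]) for v in nodes}     # c <- A c
        norm = sum(x * x for x in new.values()) ** 0.5
        c = {v: x / norm for v, x in new.items()}               # renormalize
    return c

# Hypothetical graph: triangle 0-1-2 plus pendant node 3 attached to 0.
# Node 0 sits in the triangle AND touches the pendant, so it scores highest.
g = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
c = eigenvector_centrality(g)
print(max(c, key=c.get))  # 0
```

Power iteration is one simple way to get the dominant eigenvector; any linear-algebra eigensolver would give the same centrality ranking.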
Node Centrality: Closeness Centrality
▪ A node is important if it has small shortest path lengths to all other nodes:
𝑐_𝑣 = 1 / Σ_{𝑢≠𝑣} (shortest path length between 𝑢 and 𝑣)
▪ Example:
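A sketch of closeness centrality, using breadth-first search for the shortest-path lengths (the path graph is a hypothetical example):

```python
from collections import deque

def closeness(adj, v):
    """Closeness centrality of v: 1 / (sum of shortest-path lengths
    from v to all other nodes), with distances found by BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return 1 / sum(dist.values())

# Hypothetical path graph 0-1-2: the middle node is closest to everyone.
path = {0: [1], 1: [0, 2], 2: [1]}
print(closeness(path, 1), closeness(path, 0))  # 0.5 0.333...
```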
Node Features: Clustering Coefficient
▪ Measure how connected 𝑣's neighboring nodes are:
𝑒_𝑣 = #(edges among neighboring nodes) / (𝑘_𝑣 choose 2), with 𝑒_𝑣 ∈ [0, 1]
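A sketch, assuming the standard definition (#edges among 𝑣's neighbors divided by the number of neighbor pairs); the example graph is hypothetical:

```python
def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected:
    e_v = #(edges among N(v)) / (k_v choose 2)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return links / (k * (k - 1) / 2)

# Hypothetical example: node 0 has neighbors 1, 2, 3, but only the
# pair (1, 2) is connected, so 1 of 3 neighbor pairs forms a triangle.
g = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(clustering_coefficient(g, 0))  # 0.333...
```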
Node Features: Graphlets
▪ Observation: Clustering coefficient counts the number of triangles
in the ego-network
▪ Analogy:
▪ Degree counts #edges that a node touches
▪ Clustering coefficient: counts #triangles that a node touches
▪ Graphlet Degree Vector (GDV): Graph-based features for nodes
▪ Count #graphlets that a node touches
Node Features: Graphlets
▪ Induced subgraph: a graph formed from a subset of vertices and all of the edges connecting vertices in that subset
Node Features: Graphlets
▪ Graphlet Degree Vector (GDV): a vector with the frequency of the
node in each orbit position
Possible graphlets on up to 3 nodes
▪ Example:
Graphlet Degree Vector (GDV)
▪ Count #graphlets that a node touches at a particular orbit
Graphlet Degree Vector: Example
▪ GDV of node A:
▪ The 𝑖-th element of GDV(A): #graphlets that touch A at orbit 𝑖
▪ Highlighted are graphlets that touch A at orbits 15, 19, 27, 35.
Node-Level Feature: Summary
▪ Importance based features
▪ Node degree
▪ Different node centrality measures (eigenvector, betweenness, closeness)
▪ Structure-based features
▪ Node degree
▪ Clustering coefficient
▪ Graphlet count vector
Node-Level Feature: Summary
▪ Importance-based features
▪ Node degree: count #neighboring nodes
▪ Node centrality:
▪ Model importance of neighboring nodes in a graph
▪ Different modeling choices: eigenvector centrality, betweenness centrality, closeness
centrality
Node-Level: Summary
▪ Structure-based features: capture topological properties of local
neighborhood around a node
▪ Node degree: count #neighboring nodes
▪ Clustering coefficient: measure how connected neighboring nodes are
▪ Graphlet count vector: count the occurrences of different graphlets
Link Prediction Task and Features
Link-level Prediction Task: Recap
▪ The task is to predict new/missing/unknown links based on the
existing links.
▪ At test time, node pairs (with no existing links) are ranked, and
top 𝐾 node pairs are predicted.
▪ Task: Make a prediction for a pair of nodes.
Link Prediction as a Task
▪ Links missing at random
▪ Remove a random set of links and then aim to
predict them
Link Prediction via Proximity
▪ Methodology:
▪ For each pair of nodes (𝑥, 𝑦), compute a score 𝑐(𝑥, 𝑦)
▪ For example: #common neighbors of 𝑥 and 𝑦
Link-Level Feature: Overview
▪ Distance-based feature
▪ Local neighborhood overlap
▪ Global neighborhood overlap
Distance-based Feature
▪ Shortest path distance between two nodes
▪ Example
Local Neighborhood Overlap
▪ Capture #neighboring nodes shared between two nodes
▪ Common neighbors: 𝑁 𝑣1 ∩ 𝑁 𝑣2
▪ Example: 𝑁 𝐴 ∩ 𝑁 𝐵 =1
𝑁 𝑣1 ∩𝑁 𝑣2
▪ Jaccard’s coefficient:
𝑁 𝑣1 ∪𝑁 𝑣2
𝑁 𝐴 ∩𝑁 𝐵 1
▪ Example: =2
𝑁 𝐴 ∪𝑁 𝐵
1
▪ Adamic-Adar index: σ𝑢∈𝑁 𝑣1 ∩𝑁 𝑣2
log 𝑘𝑢
1 1
▪ Example: =
log 𝑘 𝐶 log 4
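The three scores can be sketched together; the small graph below is a hypothetical example chosen to reproduce the slide's numbers, and the natural log is assumed for Adamic-Adar:

```python
import math

def overlap_scores(adj, x, y):
    """Three local neighborhood overlap scores for a candidate link (x, y)."""
    nx, ny = set(adj[x]), set(adj[y])
    common = nx & ny
    cn = len(common)                                     # common neighbors
    jaccard = cn / len(nx | ny)                          # Jaccard's coefficient
    aa = sum(1 / math.log(len(adj[u])) for u in common)  # Adamic-Adar index
    return cn, jaccard, aa

# Hypothetical graph consistent with the slide's example:
# A and B share exactly one neighbor, C, and C has degree 4.
adj = {"A": ["C"], "B": ["C", "D"], "C": ["A", "B", "D", "E"],
       "D": ["B", "C"], "E": ["C"]}
cn, jac, aa = overlap_scores(adj, "A", "B")
print(cn, jac)  # 1 0.5
```

Adamic-Adar down-weights shared neighbors that are themselves high-degree hubs, which is why C's degree appears inside the log.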
Global Neighborhood Overlap
▪ Limitation of local neighborhood overlap
▪ The metric is always zero if the two nodes do not have any neighbors in common: |𝑁(𝐴) ∩ 𝑁(𝐸)| = 0
▪ However, the two nodes may still potentially be connected in the future
▪ Compute #walks:
▪ Use powers of the graph adjacency matrix
Intuition: Powers of Adj Matrices
▪ Compute #walks between two nodes
▪ Recall: 𝐴_𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)
▪ Let 𝑃^(𝑘)_𝑢𝑣 = #walks of length 𝑘 between 𝑢 and 𝑣
▪ We will show 𝑃^(𝑘) = 𝐴^𝑘
▪ 𝑃^(1)_𝑢𝑣 = 𝐴_𝑢𝑣 = #walks of length 1 (direct neighborhood) between 𝑢 and 𝑣
Intuition: Powers of Adj Matrices
▪ How to compute 𝑃^(2)_𝑢𝑣?
▪ Step 1: Compute #walks of length 1 between each neighbor of 𝑢 and 𝑣
▪ Step 2: Sum up these #walks across 𝑢's neighbors
𝑃^(2)_𝑢𝑣 = Σ_𝑖 𝐴_𝑢𝑖 ∗ 𝑃^(1)_𝑖𝑣 = Σ_𝑖 𝐴_𝑢𝑖 ∗ 𝐴_𝑖𝑣 = 𝐴²_𝑢𝑣
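A quick sanity check that squaring the adjacency matrix counts length-2 walks, using a plain matrix multiply (the graph is a hypothetical example):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical graph: triangle 0-1-2 plus pendant edge 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
A2 = matmul(A, A)  # entry [u][v] counts walks of length 2 between u and v
print(A2[0][1])  # 1: the single 2-walk 0 -> 2 -> 1
print(A2[0][0])  # 2: 2-walks back to node 0, which equals its degree
```

Note the diagonal of 𝐴² recovers node degrees, since every edge gives a there-and-back walk of length 2.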
Global Neighborhood Overlap
▪ Katz index between 𝑣1 and 𝑣2: count the number of walks of all lengths between the two nodes, discounting longer walks by a factor 0 < 𝛽 < 1:
𝑆_𝑣1𝑣2 = Σ_{𝑙=1}^{∞} 𝛽^𝑙 𝐴^𝑙[𝑣1, 𝑣2]
▪ In matrix form: 𝑆 = Σ_{𝑖=1}^{∞} 𝛽^𝑖 𝐴^𝑖 = (𝐼 − 𝛽𝐴)^{−1} − 𝐼
Link-Level Features: Summary
▪ Distance-based feature
▪ Use the shortest path length
▪ Does not capture the degree of neighborhood overlap
Graph-Level Features and Graph Kernels
Graph-Level Features
▪ Goal: characterize structure of an entire graph
▪ Example:
Graph-Level Features: Overview
▪ Graph kernels: measure similarity between two graphs
▪ Graphlet kernel
▪ Weisfeiler-Lehman kernel
Graph Kernel: Ideas
▪ Goal: design a graph feature vector 𝜙(𝐺)
▪ Key idea: Bag-of-Words (BoW) for a graph
▪ Recall: BoW uses word counts as features for documents (no ordering)
▪ Naïve extension to a graph: treat nodes as words
▪ Limitation:
▪ Since both graphs have 4 nodes, we get the same feature vector for two different graphs
Graph Kernel: Key Ideas
▪ What if we use a bag of node degrees?
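A degree histogram can distinguish two graphs with the same node count, which a plain bag of nodes cannot; a minimal sketch (the example graphs are hypothetical):

```python
from collections import Counter

def degree_histogram(adj):
    """Bag-of-node-degrees feature: count how many nodes have each degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

# Two hypothetical 4-node graphs: same #nodes, different structure.
g1 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path: degrees 1, 2, 2, 1
g2 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # star: degrees 3, 1, 1, 1
print(sorted(degree_histogram(g1).items()))  # [(1, 2), (2, 2)]
print(sorted(degree_histogram(g2).items()))  # [(1, 3), (3, 1)]
```

A bag of nodes gives both graphs the identical feature "4 nodes", while the degree histograms differ.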
Graph-Level Graphlet Features
▪ Key idea: count #different graphlets in a graph
▪ Two differences from node-level graphlet features:
▪ Nodes in graphlets here do not need to be connected
▪ Graphlets here are not rooted
Graph-Level Graphlet Features
▪ Let 𝒢𝑘 = (𝑔1, 𝑔2, … , 𝑔𝑛𝑘) be a list of graphlets of size 𝑘
▪ For 𝑘 = 3, there are 4 graphlets
Graph-Level Graphlet Features
▪ Given a graph 𝐺 and a graphlet list 𝒢𝑘 = (𝑔1, 𝑔2, … , 𝑔𝑛𝑘), define the graphlet count vector 𝑓𝐺 ∈ ℝ^𝑛𝑘 as:
𝑓𝐺[𝑖] = #(𝑔𝑖 ⊆ 𝐺), ∀𝑖 = 1, 2, … , 𝑛𝑘
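For 𝑘 = 3, the count vector can be computed by classifying every 3-node subset by the number of edges it induces (0, 1, 2, or 3 edges give the four size-3 graphlets); a sketch on a hypothetical example graph:

```python
from itertools import combinations

def graphlet_count_3(adj):
    """Graphlet count vector for k = 3: for every 3-node subset, classify the
    induced subgraph by its #edges. Matches the slide's conventions: the
    graphlets need not be connected and are not rooted."""
    counts = [0, 0, 0, 0]  # [#empty, #one-edge, #path, #triangle]
    nodes = list(adj)
    for trio in combinations(nodes, 3):
        edges = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        counts[edges] += 1
    return counts

# Hypothetical 4-node graph: triangle 0-1-2 plus pendant edge 2-3.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(graphlet_count_3(g))  # [0, 1, 2, 1]
```

Enumerating all subsets like this is exactly the O(n^k) cost the graphlet-kernel slides warn about.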
Graph-Level Graphlet Features
▪ Example: k = 3
Graph-Level Graphlet Kernel
▪ Given two graphs 𝐺 and 𝐺′, the graphlet kernel is computed as:
𝐾(𝐺, 𝐺′) = 𝑓_𝐺ᵀ 𝑓_𝐺′
▪ Problem: if 𝐺 and 𝐺′ have different sizes, the raw counts will greatly skew the value
▪ Solution: normalize each count vector:
ℎ_𝐺 = 𝑓_𝐺 / 𝑠𝑢𝑚(𝑓_𝐺),   so that 𝐾(𝐺, 𝐺′) = ℎ_𝐺ᵀ ℎ_𝐺′
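A sketch of the normalized kernel; the count vectors below are hypothetical, chosen so the two graphs differ in size but not in graphlet proportions:

```python
def normalize(f):
    """h_G = f_G / sum(f_G): turn raw graphlet counts into proportions."""
    s = sum(f)
    return [x / s for x in f]

def graphlet_kernel(fG, fGp):
    """Normalized graphlet kernel: inner product of normalized count vectors."""
    return sum(a * b for a, b in zip(normalize(fG), normalize(fGp)))

# Hypothetical count vectors: the second graph is 10x larger but has
# the same graphlet proportions, so normalization makes them match.
fG = [0, 1, 2, 1]
fGp = [0, 10, 20, 10]
print(graphlet_kernel(fG, fGp))  # 0.375, same as graphlet_kernel(fG, fG)
```

Without normalization, the raw inner product would scale with graph size rather than structural similarity.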
The Graphlet Kernel
▪ Limitation: counting graphlets is expensive
▪ Counting size-𝑘 graphlets in a graph of size 𝑛 by enumeration takes 𝑂(𝑛^𝑘) time
Weisfeiler-Lehman Kernel
▪ Goal: Design an efficient graph feature descriptor 𝜙(𝐺)
Color Refinement
▪ Given: a graph 𝐺 with a set of nodes 𝑉
▪ Assign an initial color 𝑐^(0)(𝑣) to each node 𝑣
▪ Iteratively refine node colors by:
𝑐^(𝑘+1)(𝑣) = 𝐻𝐴𝑆𝐻(𝑐^(𝑘)(𝑣), {𝑐^(𝑘)(𝑢)}_{𝑢∈𝑁(𝑣)})
where 𝐻𝐴𝑆𝐻 maps different inputs to different colors
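A sketch of color refinement, using a lookup table as an injective stand-in for HASH (the example graph and names are illustrative):

```python
def wl_color_refinement(adj, steps=2):
    """K-step WL color refinement: each node's new color hashes its own
    color together with the sorted multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj}  # uniform initial color
    table = {}                    # (own color, neighbor colors) -> fresh color id
    for _ in range(steps):
        new = {}
        for v in adj:
            signature = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            if signature not in table:
                table[signature] = len(table)  # injective "HASH"
            new[v] = table[signature]
        colors = new
    return colors

# Hypothetical path graph 0-1-2-3: refinement separates endpoints
# (degree 1) from interior nodes (degree 2), giving two colors.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wl_color_refinement(path))
```

Counting how many nodes end up with each color then gives the WL feature vector described on the following slides.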
Color Refinement: Example
▪ Assign initial colors
Color Refinement: Example
▪ Aggregate neighboring colors
Color Refinement: Example
▪ Aggregate neighboring colors
Weisfeiler-Lehman Graph Features
▪ After color refinement, WL kernel counts #nodes with a given
color
Weisfeiler-Lehman Kernel
▪ The WL kernel is computed by the inner product of the color count
vectors
Weisfeiler-Lehman Kernel
▪ Computationally efficient
▪ Time complexity for color refinement at each step is linear in #edges.
Graph-Level Features: Summary
▪ Graphlet kernel
▪ Graph is represented as Bag-of-graphlets
▪ Computationally expensive
▪ Weisfeiler-Lehman kernel
▪ Apply K-step color refinement algorithm to enrich node colors
▪ Different colors capture different K-hop neighborhood structures
▪ Graph is represented as Bag-of-colors
▪ Computationally efficient
▪ Closely related to Graph Neural Nets (will study later)
Summary
▪ Traditional ML pipeline
▪ Hand-crafted (structural) features + ML models