
Front. Comput. Sci., 2025, 19(6): 196323
https://doi.org/10.1007/s11704-024-3853-2

REVIEW ARTICLE

A survey of dynamic graph neural networks

Yanping ZHENG, Lu YI, Zhewei WEI (✉)

Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China

© The Author(s) 2024. This article is published with open access at link.springer.com and journal.hep.com.cn

Abstract  Graph neural networks (GNNs) have emerged as a powerful tool for effectively mining and learning from graph-structured data, with applications spanning numerous domains. However, most research focuses on static graphs, neglecting the dynamic nature of real-world networks where topologies and attributes evolve over time. By integrating sequence modeling modules into traditional GNN architectures, dynamic GNNs aim to bridge this gap, capturing the inherent temporal dependencies of dynamic graphs for a more authentic depiction of complex networks. This paper provides a comprehensive review of the fundamental concepts, key techniques, and state-of-the-art dynamic GNN models. We present the mainstream dynamic GNN models in detail and categorize models based on how temporal information is incorporated. We also discuss large-scale dynamic GNNs and pre-training techniques. Although dynamic GNNs have shown superior performance, challenges remain in scalability, handling heterogeneous information, and lack of diverse graph datasets. The paper also discusses possible future directions, such as adaptive and memory-enhanced models, inductive learning, and theoretical analysis.

Keywords  graph neural networks, dynamic graph, temporal modeling, large-scale

Received October 30, 2023; accepted April 23, 2024
E-mail: [email protected]

1 Introduction

With the remarkable advances in deep learning across various application domains, Graph Neural Networks (GNNs) have steadily emerged as a prominent solution for addressing problems involving complex graph-structured data. In the real world, crucial information is often represented in the form of graphs, such as relationships within social networks, roads and intersections in transportation systems, and protein interaction networks in bioinformatics. Compared to static graphs, dynamic graphs provide a more realistic representation of real-world systems and networks, as they can model structural and attribute information that evolves over time [1].

The nodes and edges of dynamic graphs may change over time, making traditional static GNNs difficult to apply directly, which introduces new challenges for the analysis and learning of dynamic graphs. Recently, researchers have integrated GNNs with sequence learning to develop dynamic GNN models, such as the well-known TGAT [2], TGN [3], and ROLAND [4]. These models aggregate features from neighboring nodes and incorporate a time series module, enabling the modeling of both structural features and temporal dependencies within dynamic graphs. As a result, dynamic GNNs can generate evolving node representations that accurately capture the progression of time.

Although some work has surveyed methods for dynamic graph representation learning [1,5,6], with the continuous emergence of new methods and applications, the content of existing reviews has become outdated and cannot reflect the latest research developments and technological trends. Therefore, this paper offers a comprehensive review of recent developments in dynamic GNN models, encompassing representation and modeling techniques for dynamic graphs, along with a comparison of existing models. The taxonomy of existing methods is illustrated in Fig. 1. We also explore new research directions, including large-scale applications and pre-training strategies, summarize the current state of research, and present an outlook on future trends.

Influenced by the available public datasets, node classification and link prediction are widely used as evaluation tasks for dynamic GNNs in current research. They are extensions of traditional static graph learning tasks to dynamic graphs, where features and labels may involve changes in the temporal dimension. Some studies also focus on domain-specific graph data to accomplish tasks such as knowledge graph completion [7], stock price prediction [8], and traffic flow prediction [9]. It is important to note that there exists a category of dynamic graphs in the literature where the graph structure remains static, but the attributes of the nodes change over time. Such graphs are referred to as spatio-temporal graphs, which are considered to be outside of the scope of this survey. The main contributions are summarized as follows:

● We provide a comprehensive review of the fundamental concepts and characteristics of dynamic GNNs, construct the knowledge system of dynamic GNNs, and systematically investigate the key techniques and models.
● We present in detail the current mainstream dynamic GNN models and analyze in depth their advantages and disadvantages.
● We discuss the challenges facing this field, highlighting potential future research directions and technological trends.

Fig. 1  The taxonomy of dynamic graph representation learning

The remainder of this paper is organized as follows. In Section 2 we begin with important definitions and background knowledge about dynamic graphs. The problem statement of dynamic graph representation learning is explicated in Section 3, which also provides a comprehensive review of dynamic GNNs following the taxonomy demonstrated in Fig. 1. Section 4 focuses on models and systems designed for large-scale dynamic graphs. Section 5 summarizes the commonly used datasets, prediction tasks, and benchmarks. We find that new techniques such as transfer learning and pre-training are recently being applied to the problem of dynamic graph learning, and we briefly review them in Section 6. We then discuss possible future directions in Section 7 and conclude the paper in Section 8.

2 Notation and background

2.1 Notation
Table 1 summarizes the necessary notations used in this paper. We consider the undirected graph G = (V, E), where V and E are the vertex and edge set, respectively. The node feature matrix is denoted as X ∈ R^{n×d}, where n is the number of nodes in the graph, and d is the dimension of the feature vector of each node. The connection between nodes in the graph is represented by the adjacency matrix A. D is the diagonal degree matrix, and P = D^{−1/2} A D^{−1/2} denotes the normalized adjacency matrix, with L = I − P representing the Laplacian matrix. σ indicates the nonlinear activation function.

Table 1  Notations and the corresponding definitions
Notation | Description
G | The graph
V | The vertex set
E | The edge set
n | The number of nodes
X ∈ R^{n×d} | The feature matrix
x_i ∈ R^d | The feature vector of node i
A ∈ R^{n×n} | The adjacency matrix
D ∈ R^{n×n} | The degree matrix
P ∈ R^{n×n} | The normalized adjacency matrix
L ∈ R^{n×n} | The Laplacian matrix
σ | The nonlinear activation function
𝒢 | The dynamic graph

2.2 Dynamic graphs
Dynamic graphs refer to intricate graph structures that involve temporal evolution. In these graphs, the nodes, edges, and properties within the graph demonstrate a state of continuous change. Dynamic graphs are commonly observed in various domains, including social networks [10], citation networks [11], biological networks [12], and so on. Taking social networks as an example, the registration of a new user results in the addition of new nodes to the graph, while the deactivation of existing users leads to a decrease in nodes. Moreover, when a user follows or unfollows another user, it represents the creation or disappearance of edges in the graph. Additionally, attributes like a user's age, location, and hobbies can change over time, indicating that node attributes are also dynamically evolving. Based on the granularity of the time step, dynamic graphs can be categorized into Discrete-Time Dynamic Graphs (DTDGs) and Continuous-Time Dynamic Graphs (CTDGs).

Definition 1 (Discrete-time dynamic graphs). Taking snapshots of the dynamic graph at equal intervals results in a discrete sequence of network evolution, defined as 𝒢 = {G_0, G_1, ..., G_T}, where G_t = (V_t, E_t) (0 ⩽ t ⩽ T) is the t-th snapshot.

As shown in Fig. 2(c), DTDGs decompose the dynamic graph into a series of static networks, capturing the graph structure at various selected time points. However, some intricate details might be lost within the chosen intervals. Moreover, determining time intervals that balance accuracy and efficiency presents a significant challenge.

Fig. 2  Comparative illustration of different graph representations. (a) Representation of a static graph where the structure remains unchanged over time, without temporal information; (b) illustration of a continuous-time dynamic graph (CTDG) where interactions between nodes are labeled by timestamps, and multiple interactions are allowed; (c) representation of a discrete-time dynamic graph (DTDG), which captures the evolution of relationships in discrete time intervals
Definition 2 (Continuous-time dynamic graphs). A CTDG is typically defined as 𝒢 = {G, S}, where G denotes the initial state of the graph at time t_0. This initial state might be empty or include initial structures (associated edges) but does not contain any history of graph events. The set S = {event_1, event_2, ..., event_T} represents a collection of graph events. Each graph event is defined by a triple, which consists of the nodes participating in the event, its associated event type, and a timestamp. The event type signifies the various possible modifications occurring within a graph, including the addition or deletion of edges or nodes, as well as updates to node features. Fundamentally, these modifications may result in changes to the structure or attributes of the graph.

As shown in Fig. 2(b), the initial graph G is gradually updated according to the guidance of S. The structure of the graph at time t = t_9 is consistent with Fig. 2(a), and the complete process of change is recorded in S. It should be noted that an edge previously existed between node v_0 and node v_4, which quickly disappeared. However, the DTDG illustrated in Fig. 2(c) does not capture this dynamic.

Remark  To ensure clarity and consistency in this paper, unless otherwise specified, the graph at time t refers to the t-th version of the graph. This could mean either the t-th graph snapshot in a DTDG or a graph updated with events up to time t from its initial state in a CTDG.
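To make Definitions 1 and 2 concrete, the following minimal Python sketch stores a DTDG as an ordered list of snapshots and a CTDG as an initial graph plus a timestamped event stream. The class and field names (Snapshot, Event, snapshot_at) are illustrative choices rather than notation from this survey, and events are assumed to be sorted by timestamp.

from dataclasses import dataclass, field
from typing import List, Set, Tuple

# A DTDG is simply an ordered list of static snapshots G_0, ..., G_T.
@dataclass
class Snapshot:
    nodes: Set[int]
    edges: Set[Tuple[int, int]]

DTDG = List[Snapshot]  # index t gives the t-th snapshot

# A CTDG is an initial graph plus a timestamped event stream S.
@dataclass
class Event:
    nodes: Tuple[int, ...]   # nodes participating in the event
    event_type: str          # e.g., "EdgeAddition", "EdgeDeletion", "NodeUpdate"
    timestamp: float

@dataclass
class CTDG:
    initial: Snapshot
    events: List[Event] = field(default_factory=list)  # assumed sorted by timestamp

    def snapshot_at(self, t: float) -> Snapshot:
        """Replay all events up to time t to recover the graph state; this is
        also how a CTDG can be converted into DTDG snapshots."""
        nodes, edges = set(self.initial.nodes), set(self.initial.edges)
        for ev in self.events:
            if ev.timestamp > t:
                break
            if ev.event_type == "EdgeAddition":
                u, v = ev.nodes
                nodes.update((u, v))
                edges.add((u, v))
            elif ev.event_type == "EdgeDeletion":
                edges.discard(tuple(ev.nodes))
        return Snapshot(nodes, edges)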
Definition 3 (Temporal neighbors). At time t, the temporal neighbors of node v are represented as N^t(v). Based on the traditional neighbor set, each neighbor node w in the set is associated with a timestamp t′ to form a (node, timestamp) pair. It indicates that at time t′, node v is connected to node w. It is required that the timestamp t′ corresponding to a neighboring node is smaller than the current timestamp t. In other words, N^t(v) represents the collection of adjacent nodes resulting from events occurring prior to time t.

2.3 Hawkes processes
Introduced by Hawkes in 1971 [13], the Hawkes process is a mathematical model for time-sequenced events. Based on the assumption that past events influence current events, the Hawkes process simulates a point process with an underlying self-exciting pattern. It is appropriate for modeling the evolution of discrete sequences. Given all historical events up to time t, the probability of an event occurring in a short time interval [t, t + ∆t) at time t can be defined by the conditional intensity function of the Hawkes process, denoted as

λ*(t) = lim_{∆t→0} E[N(t + ∆t) − N(t) | H(t)] / ∆t = µ + ∫_0^t g(t − s) dN(s),  (1)

where H(t) represents historical events prior to time t, N(t) denotes the number of historical events that occurred before time t, and ∆t indicates the window size. The first equation describes the probability that an event will occur in the region beyond the observed data H(t). Specifically, it represents the probability that an event will occur at time t + ∆t based on prior observations up to time t. The background intensity is represented by µ, and the excitation function is denoted by g(·). Typically, g(·) is defined as an exponential decay function, such as e^{−βt} (β > 0), which describes the phenomenon where the influence of an event diminishes as the time interval increases. According to the second equation, λ*(·) is a flexible function that depends only on the background intensity and the excitation function. Therefore, for an observed event sequence {t_1, t_2, t_3, ...}, the intensity function of the Hawkes process is defined as

λ*(t) = µ + Σ_{t_i < t} α_i g(t − t_i),  (2)

where α_i represents the weight of the impact of each event at time t.

Initially, the Hawkes process was utilized for sequence modeling [14,15]. Following the development of graph representation learning, HTNE [16] first introduced the Hawkes process into dynamic graph representation learning. It models the evolutionary incentive of historical behaviors on current actions, thereby simulating the graph generation process. Lu et al. [15] proposed a dynamic graph representation model that combines macroscopic graph evolution with microscopic node edge prediction, thereby capturing the dynamic nature of the graph structure and the rising trend of graph scale.
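As a concrete reading of Eqs. (1) and (2), the sketch below evaluates the conditional intensity with the exponential-decay excitation g(t) = e^{−βt} discussed above; the shared weight α and the example parameter values are assumptions made purely for illustration.

import math
from typing import Sequence

def hawkes_intensity(t: float,
                     event_times: Sequence[float],
                     mu: float = 0.1,     # background intensity μ
                     alpha: float = 0.8,  # event weight α_i (shared here)
                     beta: float = 1.0) -> float:
    """Conditional intensity λ*(t) = μ + Σ_{t_i < t} α_i · exp(-β (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

# The influence of each past event decays as the gap t - t_i grows.
history = [0.5, 1.2, 2.0]
print(hawkes_intensity(2.1, history))  # recent events dominate
print(hawkes_intensity(5.0, history))  # intensity relaxes back toward μ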
3 Graph representation learning

The objective of graph representation learning is to build low-dimensional vector representations of nodes while preserving important information such as structural, evolutionary, or semantic features [17]. This has emerged as a crucial concern in graph data mining and analysis in recent years. The learned representations have extensive applications in various tasks, including node classification, link prediction, and recommendation.

It serves as a bridge between raw graph data and graph analysis tasks by encoding each node into a vector space. This approach transforms graph learning tasks into corresponding problems in machine learning applications. For example, we can effectively and efficiently address these graph analysis challenges by using the learned representations as the input feature for traditional machine learning algorithms. Additionally, we can further enhance this process by leveraging suitable classification and clustering techniques for tasks such as node classification and community detection.

Definition 4 (Graph representation learning). Given a graph G = (V, E), the task of graph representation learning is to learn an efficient mapping function F(v) → z_v, where z_v ∈ R^d represents the low-dimensional representation of node v, and d ≪ n, n is the number of nodes in the graph.

Graph representation learning has been extensively studied in recent years. Earlier methods attempted to decompose data matrices into lower-dimensional forms while preserving the manifold structures and topological properties inherent to the original matrices. Prominent examples of this approach include techniques such as Local Linear Embedding (LLE) [18] and Laplacian Eigenmaps (LE) [19]. These matrix factorization-based methods were further developed to incorporate higher-order adjacency matrices, thereby preserving the structural information of graphs, such as GraRep [20] and HOPE [21]. Another strategy for graph embedding creates multiple paths by traversing from random initial nodes to capture contextual information about global and local structural information of neighboring nodes, namely random walk-based methods. Probabilistic models such as Skip-gram [22] and Bag-of-Words [23] are then employed to sample these paths and learn node representations randomly, and typical examples include DeepWalk [24] and Node2vec [25]. The success of deep learning techniques has paved the way for an abundance of deep neural network-based graph representation learning techniques. Graph embedding methods based on deep learning typically apply deep neural networks directly to the entire graph (or its adjacency matrix), encoding it into a lower-dimensional space. In this field, notable methods include SDNE [26], DNGR [27], ChebNet [28], GCN [29], GAT [30], and GraphSAGE [31].

Unlike traditional static graphs, which focus solely on maintaining structural features, dynamic graphs also incorporate evolutionary features. Dynamic graph representation learning employs graph representation techniques for dynamic graphs and tracks the evolution of node representations. As the graph evolves over time, the representations of the nodes adapt accordingly. A time-dependent function expresses this evolution, and we denote the updated representation vector following an event involving node v at time t as z_v^t.

Definition 5 (Dynamic graph representation learning). Given a dynamic graph 𝒢 evolving over time, the task of dynamic graph representation learning is to learn an efficient mapping function F(v, t) → z_v^t, where z_v^t ∈ R^d represents the time-dependent low-dimensional representation of node v at time t. Here, d ≪ n_t, and n_t is the number of nodes in the graph at time t.

3.1 Models for dynamic graphs
Corresponding to the different types of dynamic graphs described in Definitions 1 and 2, dynamic graph representation learning models can be divided into two main categories: discrete-time dynamic graph representation learning and continuous-time dynamic graph representation learning. The former divides the dynamic graph into multiple snapshots, which can be regarded as multiple static graphs. Consequently, methods designed for static graphs can be used to learn representations for each snapshot, while also analyzing interrelationships and evolutionary trends between successive snapshots. In continuous-time dynamics, interactions between graph nodes are tagged with timestamps, effectively differentiating each edge based on its timestamp. Continuous-time dynamic graph representation learning refers to the methods that utilize this edge information with accompanying timestamps for learning representations.

By introducing matrix perturbation theory, the matrix decomposition-based representation learning methods designed for static graphs can be extended to the learning of dynamic graphs. A few assumptions and operations form the basis for matrix perturbation theory. First, it is assumed that the adjacency matrix of the current time step is a perturbed version of the matrix from the previous time step. Then, a common low-rank decomposition is performed on these two matrices, ensuring that the representation matrix of the current time step is updated smoothly from the previous time step. By constraining the magnitude of the perturbation, it is possible to regulate the smoothness of the representation between time steps [32,33]. However, this method is more suited to dealing with DTDGs than CTDGs, as it may face challenges such as high computational complexity, difficulties in real-time updates, and inadequacies in capturing temporal dependencies on frequently changing CTDGs.

By requiring that random walks on graphs strictly follow the order of timestamps, it is possible to perform temporal walks on dynamic graphs, which are particularly suitable for CTDGs. For example, CTDNE [34] incorporates a biased sampling mechanism that exploits temporal information during the sampling process, ensuring a higher probability of sampling the edges with smaller time intervals. EvoNRL [35], on the other hand, designs strategies for four distinct scenarios in dynamic graphs, i.e., the addition and removal of edges or nodes, updating node representations with each change of the graph.

Traditional methods rely more frequently on manually crafted features and customized models, resulting in limited learning and adaptability. Rooted in deep learning principles, GNNs have shown a remarkable ability to learn complex patterns and features in static graphs, exhibiting improved representation learning performance as well as superior generalization abilities. As a result, there has been a growing interest in utilizing GNN-inspired methods in the field of dynamic graph learning. These approaches leverage the powerful learning capabilities of GNNs to enhance the comprehension of dynamic graphs that exhibit complex temporal evolution patterns, and are known as Dynamic Graph Neural Networks (Dynamic GNNs). Recently, there has been an explosion of work on the design of Dynamic GNNs. Therefore, we introduce Dynamic GNNs that deal with DTDGs and CTDGs in Sections 3.2 and 3.3, respectively. These two types of methods capture temporal evolution patterns in various dynamic graphs and have achieved excellent performance in several dynamic graph analysis tasks. To highlight the strengths and weaknesses of each method, we provide an in-depth comparison in Table 2, focusing on the base models used, the types of graphs supported, and the graph events handled.
Table 2  Comparison of graph neural network methods in the literature review. √ represents that the method supports the corresponding graph event, × means it does not support that event, and □* denotes that it considers only static attributes

Method | Models | Graph | Node insertion | Edge insertion | Node deletion | Edge deletion | Node attribute | Edge attribute
WD-GCN [36] | GCN with skip connections and LSTM | DTDG | × | √ | × | √ | √ | √
DySAT [37] | GAT and Transformer | DTDG | √ | √ | √ | √ | √ | √
TNDCN [38] | GNNs and spatio-temporal convolutions | DTDG | × | √ | × | √ | √ | √
TEDIC [39] | GNNs and spatio-temporal convolutions | DTDG | × | √ | × | √ | √ | √
GC-LSTM [40] | GCN and LSTM | DTDG | × | √ | × | √ | √ | √
LRGCN [41] | R-GCN and LSTM | DTDG | × | √ | × | √ | √ | √
RE-Net [42] | integrate R-GCN within RNNs | DTDG | × | √ | × | √ | × | ×
WinGNN [43] | GNN and randomized sliding-window | DTDG | √ | √ | √ | √ | □* | ×
ROLAND [4] | GNNs and RNNs | DTDG | √ | √ | √ | √ | √ | √
EvolveGCN [44] | GCN and LSTM/GRU | DTDG | √ | √ | √ | √ | √ | √
SEIGN [45] | GCN and LSTM/GRU | DTDG | √ | √ | √ | √ | √ | √
dyngraph2vec [46] | AE and LSTM | DTDG | √ | √ | √ | √ | × | ×
Know-Evolve [47] | RNN-parameterized TPP | CTDG | × | √ | × | × | × | ×
DyREP [48] | RNN-parameterized TPP | CTDG | √ | √ | × | × | × | ×
LDG [49] | RNN-parameterized TPP and self attention | CTDG | √ | √ | × | × | × | ×
GHNN [50] | LSTM-parameterized TPP | CTDG | × | √ | × | × | × | ×
GHT [51] | LSTM-parameterized TPP | CTDG | × | √ | × | × | × | ×
TREND [52] | Hawkes process-based GNN | CTDG | √ | √ | × | × | □* | ×
DGNN [53] | GNNs and LSTM | CTDG | √ | √ | × | × | × | ×
JODIE [54] | GNNs and RNNs | CTDG | √ | √ | × | × | □* | √
TGAT [2] | GAT, time encode and RNNs | CTDG | √ | √ | × | × | □* | √
TGN [3] | GNNs, time encode and RNNs | CTDG | √ | √ | × | × | □* | √
APAN [55] | GNNs and RNNs | CTDG | √ | √ | × | × | □* | √
CAW-N [56] | causal anonymous walk based GNN | CTDG | √ | √ | × | × | × | √
Zebra [57] | temporal PPR based GNN | CTDG | √ | √ | × | × | √ | √
EARLY [58] | GCN and temporal random walk | CTDG | √ | √ | √ | √ | √ | √
Zheng et al. [59] | decoupled GNNs and sequential models | CTDG | √ | √ | √ | √ | √ | √
SDG [60] | APPNP and dynamic propagation | CTDG | × | √ | × | √ | √ | √

3.2 Dynamic GNNs for DTDGs
DTDGs are composed of multiple snapshots that are chronologically organized and can be modeled as sequential data, as described in Definition 1. Therefore, temporal patterns in DTDGs are measured by the sequential relationships between different snapshots. The prevailing research methodology involves employing static graph neural network models to represent each graph snapshot individually. These snapshots are subsequently organized chronologically and treated as sequential data, which can be input into sequence models to comprehensively determine the interplay between different graph snapshots and learn temporal patterns. As a representative sequence model, Recurrent Neural Networks (RNNs) are often combined with GNNs to form the mainstream dynamic GNNs for DTDGs. Based on how they are combined, they can be broadly classified as stacked architectures or integrated architectures.

Fig. 3  Different model architectures for dynamic graphs. (a) Stacked dynamic GNN for DTDGs; (b) integrated dynamic GNN for DTDGs; (c) dynamic GNN for CTDGs

Stacked dynamic GNNs combine spatial GNNs with temporal RNNs in a modular fashion to model discrete sequences of dynamic graphs. As shown in Fig. 3(a), the distinct components are dedicated to spatial and temporal modeling, respectively. Specifically, these models use distinct GNNs to aggregate spatial information on each graph snapshot individually. The outputs from the different time steps are then fed into a temporal module for sequence modeling. Therefore, given a graph G_t at time t, the representation is computed as follows:

Z_t = f(G_t),   H_t = g(H_{t−1}, Z_t),  (3)

where f is a GNN model used to encode G_t and obtain the graph representation Z_t at this time step, and g is a sequence model such as an RNN, used to integrate the previous hidden state H_{t−1} and current graph representation Z_t to update the current hidden state H_t. Manessi et al. [36] use this architecture to propose the CD-GCN and WD-GCN models, where f is a GCN model and g uses a standard Long Short-Term Memory (LSTM). Notably, each node utilizes a distinct LSTM, though the weights can be shared. The difference between WD-GCN and CD-GCN is that CD-GCN adds skip connections to GCN.

RNNs are not the only available option for learning time series. In recent studies, GNNs have also been used in combination with other kinds of deep time series models. For instance, DySAT [37] introduces a stacked architecture made up entirely of self-attention blocks. They use attention mechanisms in both the spatial and temporal dimensions, employing Graph Attention Networks (GAT) [30] for the spatial dimension and Transformers [14] for the temporal dimension. Similarly, models like TNDCN [38] and TEDIC [39] combine GNNs with one-dimensional spatio-temporal convolutions.
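A minimal sketch of the stacked architecture in Eq. (3): a GCN layer plays the role of the spatial encoder f on each snapshot, and a GRU cell plays the role of the sequence model g that carries the hidden state across snapshots. The specific modules (torch_geometric's GCNConv, a single weight-shared GRUCell) and dimensions are assumptions for illustration, not a reference implementation of WD-GCN or CD-GCN.

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv  # assumes torch_geometric is installed

class StackedDynamicGNN(nn.Module):
    """Z_t = f(G_t);  H_t = g(H_{t-1}, Z_t), applied snapshot by snapshot."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.gnn = GCNConv(in_dim, hid_dim)      # spatial module f
        self.rnn = nn.GRUCell(hid_dim, hid_dim)  # temporal module g (weights shared across nodes)

    def forward(self, snapshots):
        # snapshots: list of (x, edge_index) pairs ordered by time
        h = None
        for x, edge_index in snapshots:
            z = torch.relu(self.gnn(x, edge_index))  # per-snapshot encoding Z_t
            h = self.rnn(z, h)                       # hidden state H_t per node
        return h  # node states after the last snapshot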
Integrated dynamic GNNs merge GNNs and RNNs within a single network layer, thus amalgamating spatial and temporal modeling within the same architectural level, as shown in Fig. 3(b).
GC-LSTM [40] uses the adjacency matrix of a given time as input to the LSTM and performs spectral graph convolution [28,61] on the hidden layer. The model takes a sequence of adjacency matrices as input and returns node representations that encode both temporal and spatial information. Within this framework, the GNN learns the topological features of the LSTM's cell and hidden states, which are used to maintain long-term relationships and extract input information, respectively. This implies that the node representations provided by the GNN are distinct from the node history captured by the LSTM. Let f_i, f_f, f_u, f_o, and f_c represent the five GNN models acting on the input gate, forget gate, update, output gate, and cell state, respectively. They possess the same structure but different parameters. The parameters of f_i, f_f, f_u, and f_o are initialized by the hidden state H_{t−1} from the previous time step, while f_c is initialized based on the memory C_{t−1}:

i_t = σ(A_t W_i + f_i(G_{t−1}) + b_i),
f_t = σ(A_t W_f + f_f(G_{t−1}) + b_f),
C_t = f_t ⊗ f_c(G_{t−1}) + i_t ⊙ tanh(A_t W_c + f_u(G_{t−1}) + b_c),
o_t = σ(A_t W_o + f_o(G_{t−1}) + b_o),
H_t = o_t ⊙ tanh(C_t),  (4)

where {W_i, W_f, W_o, W_c} and {b_i, b_f, b_c, b_o} are the weight and bias matrices, respectively, and A_t is the adjacency matrix of G_t. Other integrated approaches often follow a similar framework, differing perhaps in the GNN or RNN models they employ, the use cases, or the types of graphs they focus on. The Long Short-Term Memory R-GCN (LRGCN) [41] adopts a similar strategy, incorporating topological variations directly into its computations. Specifically, LRGCN [41] integrates a Relational GCN (R-GCN) [41] within the LSTM framework. The computations for the input gate, forget gate and output gate are the results of the R-GCN model acting on the input node representations and embeddings computed from the previous step. Meanwhile, RE-Net [42] integrates R-GCN [41] into multiple RNNs, enabling it to learn from dynamic knowledge graphs. WinGNN [43] utilizes the idea of randomized gradient aggregation over sliding windows to model temporal patterns without extra temporal encoders. Specifically, WinGNN introduces a stochastic gradient aggregation mechanism based on sliding windows, gathering gradients across multiple consecutive snapshots to attain a global optimum. In addition, an adaptive gradient aggregation strategy is designed to assign different weights to gradients from different snapshots, overcoming the influence of local optima. It also introduces a snapshot random dropout mechanism to further enhance the robustness of gradient aggregation.

ROLAND [4] extends the GNN-RNN framework by utilizing hierarchical node states. Specifically, You et al. [4] propose to stack multiple GNN layers and interleave them with a sequence model such as an RNN, where the node embeddings of different GNN layers are regarded as hierarchical node states that are continuously updated over time. In contrast to traditional GNNs, ROLAND uses not only the embedding information H_t^{(ℓ−1)} from the previous layer but also the information H_{t−1}^{(ℓ)} from the previous moment of the current layer when updating the embedding matrix H_t^{(ℓ)} at time t. Thus, the ℓ-th layer of the framework is

H_t^{(ℓ)} = f^{(ℓ)}(H_t^{(ℓ−1)}),   H_t^{(ℓ)} = g(H_{t−1}^{(ℓ)}, H_t^{(ℓ)}),  (5)

where f^{(ℓ)} is the ℓ-th GNN layer and g is the sequence encoder.
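The hierarchical update of Eq. (5) can be sketched as follows: at every layer ℓ, a GNN transforms the previous layer's embeddings, and a sequence encoder fuses the result with the same layer's state from the previous snapshot. The module choices (GCNConv, GRUCell), the assumption that node features already have width dim, and the fixed node count across snapshots are simplifications for illustration, not the ROLAND implementation.

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class HierarchicalLayer(nn.Module):
    """One layer ℓ of Eq. (5): apply f^(ℓ), then fuse with H_{t-1}^(ℓ) via g."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = GCNConv(dim, dim)      # GNN layer f^(ℓ)
        self.g = nn.GRUCell(dim, dim)   # sequence encoder g

    def forward(self, h_prev_layer, h_prev_time, edge_index):
        h = torch.relu(self.f(h_prev_layer, edge_index))  # spatial update at time t
        return self.g(h, h_prev_time)                     # fuse with the state from time t-1

def roll_forward(layers, snapshots, dim):
    # snapshots: list of (x, edge_index); x is assumed to already have width dim,
    # and the number of nodes is assumed constant across snapshots.
    states = None
    for x, edge_index in snapshots:
        new_states, h = [], x
        for l, layer in enumerate(layers):
            prev = states[l] if states is not None else torch.zeros(x.size(0), dim)
            h = layer(h, prev, edge_index)
            new_states.append(h)
        states = new_states
    return states  # hierarchical node states after the last snapshot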
Bonner et al. [62] propose stacking a GCN layer, a Gated Recurrent Unit (GRU) layer [63], and a linear layer to form a Temporal Neighborhood Aggregation (TNA) layer, and employ a two-layer model to implement 2-hop convolution combined with variational sampling for link prediction.

Instead of updating node representations with RNNs as in previous methods, EvolveGCN [44] uses RNNs to update the weight parameters of GCNs between graph snapshots, facilitating dynamic adaptation and model performance improvement. Building on this, they introduced two variants: EvolveGCN-H, employing a GRU to enhance temporal stability, and EvolveGCN-O, using an LSTM to capture longer dependencies. To improve scalability, SEIGN [45] focuses on evolving parameters associated with the graph filter, specifically the convolutional kernel, rather than the entire GNN model.
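To contrast this family with the stacked and integrated models above, the sketch below mimics the core idea behind EvolveGCN-style weight evolution: a GRU evolves the GCN weight matrix itself from snapshot to snapshot, while node embeddings are recomputed at every step. Feeding the previous weights in as both the GRU input and hidden state is a simplification made here for brevity; EvolveGCN-H and EvolveGCN-O differ precisely in how this recurrent update is driven.

import torch
import torch.nn as nn

class EvolvingGCNLayer(nn.Module):
    """EvolveGCN-style layer: the GCN weights W_t are the recurrent state."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) * 0.01)  # W_0
        self.evolve = nn.GRUCell(dim, dim)  # updates each row of W between snapshots

    def forward(self, x, adj, w_prev=None):
        w_prev = self.weight if w_prev is None else w_prev
        w_t = self.evolve(w_prev, w_prev)  # W_t from W_{t-1}; input reused as a simplification
        z_t = torch.relu(adj @ x @ w_t)    # standard GCN propagation with the evolved weights
        return z_t, w_t

layer = EvolvingGCNLayer(dim=4)
x = torch.randn(5, 4)        # 5 nodes, 4 features
adj = torch.eye(5)           # stand-in for a normalized adjacency matrix
z1, w1 = layer(x, adj)       # snapshot t = 1
z2, w2 = layer(x, adj, w1)   # snapshot t = 2 reuses the evolved weights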
Autoencoder-based approaches are trained distinctively compared to the dynamic graph neural network models discussed above. They leverage dynamic graph embeddings to capture both the topology and its temporal changes. DynGEM [64], for instance, employs an autoencoder to learn the embeddings from each snapshot of the DTDG. Interestingly, the autoencoder at time t initializes its parameters based on the autoencoder from time t − 1. The architecture of this autoencoder, including the number of layers and neurons in each layer, is adjusted based on the differences between the current snapshot and its predecessor. The Net2Net [65] technique is utilized to maintain the mapping relationships across layers when modifications are made to the autoencoder. To effectively capture the relationships and changes within dynamic graphs, dyngraph2vec [46] employs a series of adjacency matrices {A_{t−w+1}, ..., A_{t−1}} within a time window as input, where w represents the size of the time window. An autoencoder is then utilized to obtain the adjacency matrix A_t for the current timestep t.

3.3 Dynamic GNNs for CTDGs
As defined in Definition 2, CTDGs are not stored as snapshots with fixed intervals; instead, each event is recorded as a triplet (node, event type, timestamp), e.g., ((u, v), EdgeAddition, t) denotes an edge insertion event between nodes u and v at time t. Therefore, each event in a continuous dynamic graph has a timestamp, which smoothly tracks the evolution of the graph structure and allows for more accurate analysis of temporal relations and dependencies between events. In addition, discrete graphs synchronously aggregate time-slice information and cannot distinguish the sequence of events, whereas continuous graphs specify the temporal order of events and support modeling of asynchronous time intervals. Approaches to learning continuous dynamic graphs revolve around how to effectively capture and characterize the evolution of graph structure over time, as shown in Fig. 3(c). There are currently three classes of approaches to continuous-time dynamic graph neural networks: (1) temporal point process-based approaches, in which the temporal point process is parameterized by a neural network; (2) RNN-based approaches, in which node representations are maintained by an RNN-based architecture; and (3) temporal random walk-based approaches, whose essence is to capture both temporal and structural properties of graphs through temporal random walks.

Temporal point process-based methods. Know-Evolve [47] models knowledge graphs as interactive networks, where temporal point processes are used to model the occurrence of events. It defines a bilinear relationship function for representing multi-relational interactions between nodes. The central tenet of the method is to model the occurrence of facts as a multivariate temporal point process, whose conditional intensity function is modulated by the score of the relevant relationship. This relationship score is derived from embeddings of nodes that evolve dynamically. In addition, two custom RNNs are used to update node embeddings as new graph events occur.

Based on a similar strategy, Trivedi et al. proposed DyRep [48], a method for modeling multiple types of graphs. DyRep develops a pairwise intensity function that quantifies the strength of relationships between nodes. It further employs self-attention mechanisms to construct a temporal point process, which enables the model to prioritize relevant information for the current task, filtering out irrelevant details. Therefore, the structural embedding for node u is defined as h^t_struct(u) = max({σ(q^t̄(u, i) · h^t̄(i)), ∀i ∈ N^t̄(u)}), where h^t̄(i) = W_h z^t̄(i) + b_h, z^t̄(i) is the most recent embedding of node i, t̄ represents the most recent time node i was active, and q(u, i) denotes the aggregation weight corresponding to the neighbor node i. Specifically, the model computes the relationship between node u and its neighbor node i ∈ N^t(u) by

q^t(u, i) = exp(S^t(u, i)) / Σ_{i′∈N^t(u)} exp(S^t̄(u, i′)),  (6)

where S^t ∈ R^{n×n} is the attention matrix, which represents the fraction of attention between pairs of nodes at time t. S is computed and maintained by the adjacency matrix A and the conditional intensity function, and the conditional intensity function between node u and node v is defined as

λ^t(u, v) = f(g^t̄(u, v)),  (7)

where g^t̄(u, v) is an internal function that computes the compatibility between the two most recently updated nodes: g^t̄(u, v) = w^T · [z^t̄(u) : z^t̄(v)], where : denotes concatenation and w^T is the learnable scale-specific compatibility model parameter. f(·) uses the modified softplus function f(x) = ψ log(1 + exp(x/ψ)), where x represents g^t(·), and ψ > 0 is the learnable scalar time parameter corresponding to the proportion of relevant events occurring during the process.

Latent Dynamic Graph (LDG) [49] is an enhanced version of DyREP [48] that integrates Neural Relational Inference (NRI) [66] to encode temporal interactions, thereby optimizing the attention mechanism and resulting in improved performance. Graph Hawkes Neural Network (GHNN) [50] and Graph Hawkes Transformer (GHT) [51] model temporal dependencies between nodes using Hawkes processes, and estimate the intensity functions using a continuous-time LSTM and a Transformer, respectively, to learn time-evolving node representations. TREND [52] also designs a Hawkes process-based GNN to learn representations for CTDGs, where the parameters of the conditional intensity are learned through a Fully Connected Layer (FCL), and Feature-wise Linear Modulation (FiLM) is used to learn the parameters of the FCL.

RNN-based methods. These models use RNNs to maintain node embeddings in a continuous manner, which allows them to respond to graph changes in real time and update the representations of affected nodes. Specifically, such models define an RNN unit to track the evolution of each node embedding. When the graph changes, the RNN unit can calculate the new hidden state at the current time t based on the current input and the hidden state at the previous moment. Therefore, one of the main differences between these methods lies in how they define the embedding function and customize the RNN.
DGNN [53] consists of an update component and a propagation component. When introducing a new edge, the update component maintains the freshness of node information by capturing the order of edge additions and the time interval between interactions, implemented by an LSTM model. The propagation component propagates the new interaction information to the affected nodes by considering the intensity of the influence.

JODIE [54] is composed of two operations: the update operation and the projection operation. The former is used to update the embedding information of nodes, while the latter is used to predict future embedding trajectories. Moreover, each node has a static embedding to represent its fixed attributes, and a dynamic embedding to reflect its current state. JODIE focuses on interaction network data. In the update operation, the embeddings of the user node u and the item node i are updated by two RNNs that have the same structure but different parameters:

z^t(u) = σ(W_{u,1} z^t̄(u) + W_{u,2} z^t̄(i) + W_{u,3} e_{u,i} + W_{u,4} ∆_u),
z^t(i) = σ(W_{i,5} z^t̄(i) + W_{i,6} z^t̄(u) + W_{i,7} e_{u,i} + W_{i,8} ∆_i),  (8)

where z^t̄(u) and z^t̄(i) represent the previous embeddings of node u and node i, respectively. ∆_u and ∆_i represent the time difference from the last interaction. e_{u,i} is the feature vector corresponding to the current interaction between node u and node i. The sets {W_{u,1}, ..., W_{u,4}} and {W_{i,5}, ..., W_{i,8}} represent the parameters for the two RNNs, RNN_u and RNN_i, respectively. σ is the sigmoid function.

The embedding projection operation is designed to generate the future embedding trajectory of a user, and this embedding can be applied to downstream tasks. The projection operation takes two inputs: (1) the embedding of node u at time t, denoted as z^t(u), and (2) the time difference ∆. By projecting z^t(u), the representation of node u over a certain time interval can be learned as

ẑ^{t+∆}(u) = (1 + W_p ∆) ∗ z^t(u),  (9)

where W_p is the learned matrix. As ∆ increases, the offset of the projected embedding becomes larger. Assume that item j interacts with user u at time t + ∆; the embedding of node j at time t + ∆ is predicted based on a fully connected linear layer:

z^{t+∆}(j) = W_1 ẑ^{t+∆}(u) + W_2 z̄(u) + W_3 z^{t+∆}(i) + W_4 z̄(i) + B,  (10)

where z̄(u) and z̄(i) represent the static embeddings of nodes u and i, respectively. W_1, W_2, W_3, W_4 and B are the parameters of this linear layer.
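The projection step of Eq. (9) is simple enough to write out directly. The sketch below applies the elementwise drift (1 + W_p ∆) to a dynamic embedding; the embedding dimension, batch shapes, and the use of a bias-free linear layer for W_p are assumptions for illustration.

import torch
import torch.nn as nn

class JodieProjection(nn.Module):
    """Project a dynamic embedding forward by Δ: ẑ^{t+Δ}(u) = (1 + W_p Δ) * z^t(u)."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_p = nn.Linear(1, dim, bias=False)  # maps the scalar Δ to a per-dimension offset

    def forward(self, z_u: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
        # z_u: (batch, dim) current embeddings; delta: (batch, 1) elapsed time
        return (1.0 + self.w_p(delta)) * z_u  # larger Δ ⇒ larger drift away from z^t(u)

proj = JodieProjection(dim=8)
z = torch.randn(4, 8)
print(proj(z, torch.full((4, 1), 0.5)).shape)  # torch.Size([4, 8])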
Inspired by Time2Vec [67], TGAT [2] introduces a time encoding function based on the Bochner theorem [68], which can map continuous time to a vector space. This method presents a viable substitute for position encoding within self-attention mechanisms, effectively leveraging temporal information. By utilizing the improved self-attention framework, TGAT [2] presents a temporal graph attention layer specifically designed to incorporate feature data from historical neighbors of a target node.
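A functional time encoding of the kind TGAT builds on can be sketched in a few lines: each time gap is mapped to a vector of cosines with learnable frequencies and phases, which can then stand in for positional encodings inside self-attention. The dimension and initialization below are illustrative assumptions, not TGAT's released implementation.

import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Bochner-style encoding: Φ(t) = [cos(ω_1 t + φ_1), ..., cos(ω_d t + φ_d)]."""
    def __init__(self, dim: int):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(dim))   # learnable frequencies ω
        self.phase = nn.Parameter(torch.zeros(dim))  # learnable phases φ

    def forward(self, delta_t: torch.Tensor) -> torch.Tensor:
        # delta_t: (batch,) time gaps between an interaction and its neighbors
        return torch.cos(delta_t.unsqueeze(-1) * self.freq + self.phase)

enc = TimeEncoder(dim=16)
print(enc(torch.tensor([0.0, 1.5, 12.0])).shape)  # torch.Size([3, 16])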
TGN [3] presents a unified approach by integrating the methodological frameworks of TGAT [2], JODIE [54], and DyREP [48], and is composed of five core modules: Memory, Message Function, Message Aggregator, Memory Updater, and Embedding. The Memory module stores historical information about nodes, and the Embedding module produces real-time representations. Whenever an event involving a node occurs, the corresponding memory is updated. This update function can be a learning model such as an RNN or a non-learning method such as averaging or aggregation, and thus takes advantage of previous messages to refresh the memory during training, facilitating the generation of node embeddings. Furthermore, by ingeniously architecting each module, TGN can encapsulate a variety of models, including the likes of JODIE [54], DyRep [48], and TGAT [2].

During online training, it cannot be guaranteed that graph update events arrive in timestamp order, which makes RNN-based embedding generation unstable. APAN [55] addresses this issue by employing an asynchronous propagation mechanism. It allows messages to propagate through the graph asynchronously by specifying the number of propagation steps and thus ensures events are stored in order of their timestamps.

Temporal random walk-based methods. Inspired by models such as DeepWalk [24] and Node2vec [25], Nguyen et al. [34] first proposed a temporal random walk based on the properties of dynamic graphs, where a random walk consists of a sequence of edges with non-decreasing timestamps. Therefore, the movement from one vertex to another is dictated not only by the adjacency of the nodes but also by temporal constraints, ensuring that the walks align with both the graph's structure and the sequence of time.
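A temporal walk as described above can be sampled with a few lines of Python: from the current node, only interactions whose timestamps are no earlier than the time of arrival are eligible, so edge timestamps along the walk are non-decreasing. The adjacency format and the uniform choice among candidates are assumptions; CTDNE additionally biases this choice toward smaller time gaps.

import random
from collections import defaultdict

def temporal_walk(edges, start, t_start, length=5):
    """Sample a walk whose edge timestamps are non-decreasing.

    edges: iterable of (u, v, t) interactions; start: source node;
    t_start: earliest timestamp the walk may use.
    """
    adj = defaultdict(list)
    for u, v, t in edges:
        adj[u].append((v, t))
        adj[v].append((u, t))          # undirected interaction
    walk, node, now = [start], start, t_start
    for _ in range(length):
        candidates = [(v, t) for v, t in adj[node] if t >= now]
        if not candidates:
            break                      # no temporally valid continuation
        node, now = random.choice(candidates)  # CTDNE would bias toward small t - now
        walk.append(node)
    return walk

events = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (2, 3, 3.0)]
print(temporal_walk(events, start=0, t_start=0.0))  # e.g., [0, 1, 2, 3]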
Starting from links of interest, Causal Anonymous Walk (CAW) [56] traces back several neighboring links to encode the latent causal relationships in network dynamics. During walks, CAW [56] removes node identities to facilitate inductive learning and instead encodes relative node identities based on their frequencies in a set of sampled walks. This method of relative node identification ensures that pattern structures and their correlations are preserved after node identities are removed, achieving set-based anonymization. To predict temporal links between two nodes, the CAW-Network (CAW-N) model is proposed. It samples some CAWs associated with the two nodes, then encodes and aggregates these CAWs using RNNs and set pooling, respectively, to make predictions.

Zebra [57] introduces the Temporal Personalized PageRank (T-PPR) metric, which uses an exponential decay model to estimate the influence of nodes in dynamic graphs. Given a source node (i, τ_1) and a target node (j, τ_2), where (i, τ_1) represents node i at time τ_1, the T-PPR value π_{i,τ_1}(j, τ_2) of the target node (j, τ_2) with respect to the source node (i, τ_1) is defined as the probability of starting a temporal random walk from (i, τ_1) with a parameter α (0 < α < 1) and eventually stopping at (j, τ_2). T-PPR operates under the premise that recent neighboring nodes hold paramount importance. Therefore, by applying an exponential decay model parameterized by β (0 < β < 1), it ensures that the probability of traversing the most distant neighbors is the lowest among all the historical neighbors of a given node. For example, suppose the source node (v_1, t_6) has three historical neighbor nodes (v_4, t_2), (v_2, t_4) and (v_3, t_5), assigned the transition probabilities β³, β², and β, respectively. According to the definition, the T-PPR value of (v_3, t_5) can be computed as π_{v_1,t_6}(v_3, t_5) = β/(β + β² + β³) × α(1 − α). Based on the T-PPR values, Zebra [57] can discern the top-k temporal neighbors with the most significant impact on the target node. It directly aggregates these k high-influence temporal neighbor nodes to generate the target node's embedding, avoiding complex recursive temporal neighbor aggregation.
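The worked example above can be checked numerically. The sketch below assigns a source node's historical neighbors the decay weights β, β², β³ (most recent first), normalizes them into transition probabilities, and multiplies by the one-hop factor α(1 − α) used in the example; the concrete α and β values are arbitrary choices for illustration.

def t_ppr_one_hop(neighbor_times, target_index, alpha=0.2, beta=0.5):
    """One-hop T-PPR value for one historical neighbor of a source node.

    neighbor_times: timestamps of the historical neighbors, oldest first;
    the most recent neighbor receives weight beta, the oldest beta**k.
    """
    k = len(neighbor_times)
    weights = [beta ** (k - i) for i in range(k)]      # [β^k, ..., β², β]
    transition = weights[target_index] / sum(weights)  # normalized decay weight
    return transition * alpha * (1 - alpha)            # take one step, then stop

# Example from the text: neighbors (v4, t2), (v2, t4), (v3, t5) of source (v1, t6).
times = [2, 4, 5]
print(t_ppr_one_hop(times, target_index=2))  # π_{v1,t6}(v3, t5) = β/(β+β²+β³) · α(1−α)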
4 Processing and learning on large-scale dynamic graphs

Several strategies are available for managing large-scale static graphs. First, to improve processing efficiency, distributed parallel training can be implemented by dividing the graph across multiple devices for simultaneous processing, which ensures data integrity. Second, to enhance throughput and optimize resource use, techniques such as pipeline parallelism, data parallelism, and model parallelism are widely employed. Third, to address computational complexity, sampling methods and neighbor truncation are utilized while preserving the integrity of critical data. Meanwhile, efficient communication methods, such as peer-to-peer exchanges, are often implemented to reduce data transfer times.

However, managing large-scale graphs becomes considerably more challenging when dealing with dynamic graphs. For example, if we employ mini-batch training techniques, the model must maintain a sufficient history horizon, optimize the memory update mechanism for nodes, and transfer information using memory synchronization after partitioning into subgraphs. For temporal sequential processing, the main approaches include using time encoding to capture temporal information, generating batches of data in chronological order, and ensuring that all data are traversed in an overall sequential manner. Collectively, these methodologies and fundamental prerequisites contribute to enhancing parallelism, scalability, and efficient processing capabilities for large dynamic graphs.

Existing approaches can be grouped into two primary lines. One approach gives precedence to scalability in the design of dynamic GNNs, resulting in various algorithms aimed at efficient learning on dynamic graphs. The second focuses on the development of scalable training frameworks that enable the adaptation of existing algorithms to large-scale dynamic graphs.

4.1 Scalable GNNs for dynamic graphs
Instead of adapting training frameworks to fit existing algorithms for large-scale dynamic graph learning, several studies have suggested developing algorithms specifically designed for large-scale dynamic graphs. To ensure the efficiency and scalability of the model, techniques like sampling and parallel training are often integrated for optimization. Following this framework, SEIGN [45] employs a parameter-free messaging strategy for preprocessing and uses the Inception architecture to obtain multi-scale node representations. Similarly to EvolveGCN [44], SEIGN updates the convolutional kernel weights with GRU modules for each graph snapshot to maintain their freshness. However, instead of evolving the entire GNN model, SEIGN refines only the parameters of the convolutional kernel, thereby enhancing the method's scalability. Furthermore, SEIGN introduces the evolution of node representations, increasing the expressiveness of the final node representation. Additionally, it facilitates efficient graph mini-batch training, further enhancing the model's scalability.

EARLY [58] reduces computational costs by updating only the top-k nodes most susceptible to event impacts to meet the frequent changes in CTDGs. The selection of these top-k nodes is guided by two primary metrics: local and global scores. Local scores emphasize the theoretical analysis of graph convolution networks, addressing both the effect of node updates and the variations in random walk probabilities. On the other hand, global scores emphasize the node influence. Ultimately, an information gain function integrates these scores to identify the most influential top-k nodes.

Inspired by scalable GNNs on static graphs [69], Zheng et al. [59] proposed a decoupled GNN model that separates the graph propagation process and the training process for the downstream tasks, where the unified dynamic propagation process can efficiently handle both continuous and discrete dynamic graphs. By leveraging the identity equation of the graph propagation process, the model locates affected nodes and quantifies the impact on their representations from graph changes. Other node representations are adjusted in the next step. This approach allows incremental updates from prior results, avoiding costly recalculations. Additionally, since computations related to the graph structure only occur during propagation, subsequent training for downstream tasks can be performed separately without involving expensive graph operations. Therefore, arbitrary sequence learning models can be plugged in. Similarly, SDG [60] employs a dynamic propagation matrix based on personalized PageRank, replacing the static probability transition matrix of APPNP [70]. Specifically, SDG first extracts hidden node features using a model-agnostic neural network. These features are then multiplied with the dynamic propagation matrix to disseminate neighborhood node information. The dynamic propagation matrix tracks the steady-state distribution of random walks following graph topology changes using the push-out and add-back algorithm, eliminating redundant propagation computations in dynamic graphs.

4.2 Scalable training frameworks for dynamic GNNs
Frameworks designed for DTDGs. Memory management optimization to support efficient model parallelism is a prevalent method for enhancing the training efficacy of GNNs for DTDGs. To optimize the end-to-end training performance of dynamic graph neural networks on a single GPU, PiPAD [71] introduces a pipeline execution framework and a multi-snapshot parallel processing manner.
10 Front. Comput. Sci., 2025, 19(6): 196323

a slice-based graph representation to efficiently extract techniques are utilized to mitigate the associated costs of batch
overlapping sections of the graph, enabling parallel formation and to facilitate overlap with GPU training. The
computations for multiple snapshots simultaneously. DGNN- DistTGL [75] framework offers notable enhancements in
Booster [72] uses the Field-Programmable Gate Array terms of convergence speed and training throughput compared
(FPGA) to enhance the inference speed of dynamic graph to TGL [74]. This enables the efficient scaling of memory-
neural networks in a single-machine setting. It also introduces based dynamic GNNs training on distributed GPU clusters,
two distinct dataflow designs to support mainstream DGNN achieving near-linear speedup ratios. Guidance on the
models. Chakaravarthy et al. [73] employed optimization identification of the most effective training configurations is
techniques such as gradient checkpointing and graph- provided by employing a variety of parallelization algorithms
difference-based CPU-GPU data transfer to enhance the that are tailored to the specific dataset and hardware attributes.
training of dynamic GNNs on single-node multi-GPU In contrast, the SPEED [76] framework introduces a
systems. Additionally, they introduced an algorithm that uses streaming edge partitioning module that integrates temporal
snapshot partitioning to train large-scale dynamic GNNs on information through an exponential decay technique.
distributed multi-node multi-GPU systems efficiently. Additionally, it effectively minimizes the replication ratio by
Experimental results indicate that this method provides regulating the number of shared nodes. A vertex partitioning
notable improvements over traditional methods. method based on interval properties is also employed,
combined with joint and asynchronous pipelining techniques
Frameworks designed for CTDGs To overcome the between mini-batches. This ensures efficient parallel training
limitations of single-machine configurations and further on multiple GPUs for large-scale dynamic graphs. The design
improve the scalability of dynamic graph neural network speeds up the training and reduces memory consumption on a
models for processing large dynamic graphs, more prevalent single GPU significantly.
strategies involve partitioning the graph across multiple devices for parallel training. In this context, random division and vertex-based partitioning are frequently employed. TGL [74] proposes a generalized framework for training different dynamic GNNs, including snapshot-based, time encoding-based, and memory-based methods. Temporal-CSR data structures are designed to enable fast access to temporal edges, while parallel samplers support efficient implementation of various temporal neighbor sampling algorithms. To overcome the timeliness issue of node memory when training with large batches, a random chunk scheduling technique is proposed, which enables efficient multi-GPU training on large-scale dynamic graphs. Numerous dynamic GNNs maintain node-level memory vectors to summarize historical information about each node. To maintain and update these node memories efficiently, TGL designs a mailbox to cache the latest messages. In each training iteration, the latest messages are read from the mailbox and aggregated by a combiner; the memory state of the node is then updated by a sequential model such as a GRU. This design circumvents the direct generation of mail using the most recent memory state, effectively avoiding potential information leakage. Additionally, it incorporates an asynchronous mechanism for updating node states, making it better aligned with the streaming or batch training process.

To further enhance efficiency, DistTGL [75] implements a mechanism where memory operations are serialized and executed asynchronously through a separate daemon process, circumventing intricate synchronization requirements. DistTGL also incorporates an additional static node memory within the dynamic GNN, which not only enhances the model's accuracy but also accelerates its convergence. Two novel training methodologies, namely time period parallelism and memory parallelism, are proposed; they capture a number of graph event dependencies on multiple GPUs comparable to that achieved in single-GPU training. Prefetching and pipelining are further employed to overlap mini-batch generation with training.

Xia et al. [77] proposed a method that emphasizes hierarchical pipeline parallelism, integrating data prefetching, joint pipelining across mini-batches, and asynchronous pipelining. Additionally, to enhance communication efficiency, they introduced a strategy for partitioning graphs and vertices to eliminate redundancy. Another framework, STEP [78], provides an unsupervised pruning method for large-scale dynamic graphs. Node and edge representations are acquired by self-supervised adversarial learning after each graph update. Moreover, it selects edges for sampling depending on their significance, resulting in a graph known as the undermine graph. STEP uses this graph to train a basic pruning network, which enables the framework to remove unnecessary elements from dynamic graphs during both the training and inference stages, with a particular focus on newly added edges.

In general, these advanced methods jointly enhance the effective parallel training of dynamic graph neural networks in distributed settings, each with its distinct benefits and potential use cases, as shown in Table 3.
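To make the mailbox mechanism concrete, the following is a minimal sketch of a mailbox-style memory update in PyTorch. The class, the mean combiner, and the toy usage are illustrative assumptions for exposition, not TGL's actual interface.

```python
import torch
import torch.nn as nn

class MailboxMemory(nn.Module):
    """Minimal mailbox-style node memory: cache incoming messages,
    combine them with a mean combiner, and update per-node states
    with a GRU cell (an illustrative simplification, not TGL's API)."""

    def __init__(self, num_nodes, msg_dim, mem_dim):
        super().__init__()
        self.memory = torch.zeros(num_nodes, mem_dim)   # one state vector per node
        self.mailbox = [[] for _ in range(num_nodes)]   # cached raw messages per node
        self.updater = nn.GRUCell(msg_dim, mem_dim)

    def deliver(self, dst_nodes, messages):
        # Cache raw messages instead of generating mail from the *current*
        # memory state, which is what avoids information leakage.
        for v, m in zip(dst_nodes.tolist(), messages):
            self.mailbox[v].append(m)

    @torch.no_grad()  # this sketch does not backpropagate through the memory
    def update(self, nodes):
        for v in nodes.tolist():
            if not self.mailbox[v]:
                continue
            mail = torch.stack(self.mailbox[v]).mean(dim=0)      # combiner
            self.memory[v] = self.updater(mail.unsqueeze(0),
                                          self.memory[v].unsqueeze(0)).squeeze(0)
            self.mailbox[v].clear()

# toy usage: two events whose messages are delivered, then consumed
mem = MailboxMemory(num_nodes=10, msg_dim=8, mem_dim=16)
mem.deliver(torch.tensor([0, 3]), torch.randn(2, 8))
mem.update(torch.tensor([0, 3]))
print(mem.memory[0].shape)
```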
Table 3 Scalable dynamic GNNs training framework


Name Graph Type Hardware Software
PiPAD [71] DTDG Single-node single-GPU with multi-core CPU PyG Temporal
DGNN-Booster [72] DTDG Single-node single-GPU with multi-core CPU PyTorch
Chakaravarthy et al. [73] DTDG Multi-node multi-GPU PyTorch
TGL [74] CTDG Single-node multi-GPU DGL
DistTGL [75] CTDG Multi-node multi-GPU DGL
SPEED [76] CTDG Single-node multi-GPU PyTorch
Xia et al. [77] CTDG Single-node multi-GPU PyTorch
STEP [78] CTDG Single-node single-GPU with multi-core CPU PyTorch

Table 4 The statistics of datasets, where n represents the number of nodes, m denotes the number of edges, |T | indicates the number of timestamps or
snapshots, and de refers to the dimension of edge features
Types Dataset Domain Category n m |T | de
CTDG
Social Evolution [79] Proximity General 74 2,099,519 565,932 1†
Enron [80] Social General 184 125,235 22,632 −
Contact [81] Proximity General 692 2,426,279 8,065 1†
UCI [82] Social General 1,899 59,835 58,911 100
LastFM [54] Interaction Bipartite 1,980 1,293,103 1,283,614 −
Bitcoin-Alpha [83,84] Finance General 3,783 24,186 24,186 −
Bitcoin-OTC [83,84] Finance General 5,881 35,592 35,592 −
MOOC [54] Interaction Bipartite 7,144 411,749 345,600 4‡
Wikipedia [54] Social Bipartite 9,227 157,474 152,757 172
Reddit [54] Social Bipartite 10,984 672,447 669,065 172
GDELT [74,85] Event General 16,682 191,290,882 170,522 186
eBay-Small [86] E-commerce Bipartite 38,427 384,677 − −
Taobao [87,88] E-commerce Bipartite 82,566 77,436 − 4‡
YouTube-Reddit-Small [89] Social Bipartite 264,443 297,732 − −
eBay-Large [86] E-commerce Bipartite 1,333,594 1,119,454 − −
Taobao-Large [87,88] E-commerce Bipartite 1,630,453 5,008,745 − 4‡
DGraphFin [90] Social General 3,700,550 4,300,999 − −
Youtube-Reddit-Large [89] Social Bipartite 5,724,111 4,228,523 − −
DTDG
UN Vote [91] Politics General 201 1,035,742 72 1†
US Legislative [92,93] Politics General 225 60,396 12 1†
UN Trade [94] Finance General 255 507,497 32 1†
Canadian Parliament [92] Politics General 734 74,478 14 −
Twitter-Tennis [95] Social General 1,000 40,839 120 −
Autonomous systems [96] Communication General 7,716 13,895 733 −
Flights [97] Transport General 13,169 1,927,145 122 1†
HEP-TH [96,98] Citation General 27,770 352,807 3,487 −
Elliptic [99] Finance General 203,769 234,355 49 −
MAG [11,74] Citation General 121,751,665 1,297,748,926 120 −
† Represents the edge weight.
‡ Indicates the number of different behaviors.

5 Datasets and benchmarks overview

5.1 Datasets
In recent years, the quantity and variety of dynamic graph datasets made available to the public have increased significantly. These datasets provide valuable resources for researchers to validate and compare algorithms, thereby facilitating the exploration of new research directions. Table 4 contains basic characteristics of commonly used dynamic graph datasets. We can divide these datasets into CTDGs and DTDGs according to how they are collected and stored; however, with appropriate processing, they can be transformed into one another (a conversion sketch follows the list below). From these datasets, we can observe the following characteristics:

● Domains and applications. These datasets cover a diverse range of domains, including social, music, political, transportation, economic, and e-commerce networks. This diversity enables researchers to test and validate algorithms on various real-world applications.
● Scale. The datasets vary in graph sizes, ranging from small-scale graphs with thousands of nodes and edges (e.g., UCI [82] and Contact [81]) to large-scale graphs with millions or even billions of nodes (e.g., Taobao [87,88] and MAG [74]). This indicates that existing datasets can support research and applications at different scales.
● Types. Many datasets are bipartite graphs. For example, datasets such as MOOC [54], LastFM [54], eBay [86], and Taobao [87,88] fall into this category, with nodes categorized into two distinct sets and edges representing interactions between these sets. In contrast, some datasets, like Enron [80] and SocialEvo [100], are general graphs.
● Features. Although these datasets provide rich information about nodes and edges, many of them do not provide features of nodes or edges. Only a few datasets like Reddit [54], Wikipedia [54] and UCI [82] provide edge features, which can help researchers gain deeper insights into interaction patterns between nodes. Datasets offering node features are even rarer, with GDELT [74], Elliptic [99], and MAG [74] being notable exceptions. Conversely, datasets such as LastFM [54], Enron [80], and Flights [97] lack node feature details, potentially due to privacy concerns or the sensitive nature of the raw data. Moreover, in practical applications, identifying or defining appropriate features can also be challenging.
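Because the two storage formats can be converted into one another, the following minimal sketch shows the common direction of bucketing a CTDG event stream into DTDG snapshots. The equal-width time windows and the toy field layout are illustrative assumptions rather than a fixed standard.

```python
import numpy as np

def ctdg_to_dtdg(src, dst, ts, num_snapshots):
    """Bucket a timestamped edge stream (CTDG) into equal-width
    snapshots (DTDG): each snapshot keeps the edges whose timestamp
    falls into the corresponding time window."""
    t_min, t_max = ts.min(), ts.max()
    # window index of each event, clipped to [0, num_snapshots - 1]
    bins = np.minimum(
        ((ts - t_min) / (t_max - t_min + 1e-9) * num_snapshots).astype(int),
        num_snapshots - 1,
    )
    snapshots = []
    for k in range(num_snapshots):
        mask = bins == k
        snapshots.append(np.stack([src[mask], dst[mask]], axis=0))
    return snapshots

# toy event stream: 6 interactions with raw timestamps
src = np.array([0, 1, 2, 0, 3, 1])
dst = np.array([1, 2, 3, 2, 0, 3])
ts = np.array([10, 12, 25, 40, 41, 59], dtype=float)
for k, snap in enumerate(ctdg_to_dtdg(src, dst, ts, num_snapshots=3)):
    print(f"snapshot {k}: {snap.shape[1]} edges")
```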
Overall, the currently available public datasets provide diverse resources for representation learning on dynamic graphs. However, these datasets also exhibit some common characteristics, such as the lack of node and edge features, which pose both challenges and opportunities for researchers: collecting features in real-world applications is often costly, and learning patterns solely from graph structure changes can improve model generalization. It would be very encouraging if new high-quality public datasets with extensive features were to emerge in the future, as this would facilitate more research projects.

5.2 Prediction problems
Graph learning tasks fall into three levels: node-level, edge-level, and graph-level [11]. In node-level tasks, the primary focus is on the attributes of individual nodes, which include node classification, regression, clustering, and so on. Edge-level tasks emphasize information about the edge. For example, link prediction determines whether an edge exists between two nodes, while link classification identifies the type or attributes of an edge. Graph-level tasks analyze the entire structure, such as classifying the category of the graph (graph classification) or predicting its continuous values (graph regression). In addition, there are specialized tasks such as graph generation, whose objective is to generate entirely new graph structures based on specific criteria or patterns observed in existing data.

The three levels of tasks also exist in dynamic graphs. The primary difference from the static context is that the graph changes over time, resulting in simultaneous alterations to the properties of nodes, edges, and the entire graph. For node-level tasks, attributes and connection patterns of nodes may change, potentially impacting the results of classification or regression analyses. In edge-level tasks, the presence and attributes of edges may shift over time; for example, new connections can emerge while old ones disappear. For graph-level tasks, the topology of the graph might become more intricate with the addition of new nodes and edges, which then impacts results in graph classification, regression, and generation. Next, we explore specific application scenarios of dynamic graph analytics using link prediction, anomaly detection, and community detection as examples.

Link prediction in dynamic graphs aims to predict potential edges in future graphs based on connections from previous time steps, and is also known as temporal link prediction. This requires not only the structural features of the current graph but also identifying trends from its dynamic changes. Link prediction has a wide range of applications. In citation networks, by examining both authors' evolving citation behaviors and changes in research focus, we can predict the authors they might reference in future publications.
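As a concrete illustration of the temporal link prediction setup described above, the sketch below performs a chronological train/validation/test split over a timestamped edge list and draws one random negative destination per test edge. The 70/15/15 ratio and the uniform negative sampler are common evaluation choices, not requirements of any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

def chronological_split(src, dst, ts, val_ratio=0.15, test_ratio=0.15):
    """Sort events by time and split them into train/val/test periods, so the
    model only ever predicts edges later than the ones it has observed."""
    order = np.argsort(ts)
    src, dst, ts = src[order], dst[order], ts[order]
    n = len(ts)
    n_train = int(n * (1 - val_ratio - test_ratio))
    n_val = int(n * val_ratio)
    split = lambda a: (a[:n_train], a[n_train:n_train + n_val], a[n_train + n_val:])
    return tuple(zip(split(src), split(dst), split(ts)))

def sample_negatives(dst, num_nodes):
    """One uniformly sampled negative destination per positive edge."""
    return rng.integers(0, num_nodes, size=len(dst))

# toy usage with a synthetic interaction stream of 1000 events
src = rng.integers(0, 100, size=1000)
dst = rng.integers(0, 100, size=1000)
ts = np.sort(rng.random(1000))
train, val, test = chronological_split(src, dst, ts)
test_src, test_dst, test_ts = test
neg_dst = sample_negatives(test_dst, num_nodes=100)
print(len(train[0]), len(val[0]), len(test[0]), neg_dst[:5])
```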
Anomaly detection in dynamic graphs can be categorized into edge and node anomaly detection. Edge anomaly detection aims to identify edges in a dynamic graph that deviate from historical evolution patterns. Such irregularities may arise from connections that rogue users have introduced or manipulated maliciously. Conversely, node anomaly detection focuses on identifying nodes that distinctly differ from their peers or exhibit atypical behaviors. Anomaly detection has broad applications, particularly in fraud detection and e-commerce. For example, on e-commerce sites, users might manipulate the system by repeatedly clicking on specific products to increase their popularity artificially. They could also simultaneously click on a product and another trendy item, deceiving the recommendation system into overestimating the similarity between the two items. Such connections are considered anomalous edges, and the deceptive users are considered anomalous nodes.

Community detection is a fundamental task in understanding complex graphs. Community structures are commonly observed in real-world networks from various domains, such as social, biological, or technological networks. Based on the density of connections among nodes, the graph can be divided into various communities. Typically, nodes within a community have denser connections, while connections between nodes from different communities are less frequent. This community structure is also present in dynamic graphs, but the community configurations can evolve as the graph structure changes. Identifying and understanding dynamic community structures and their evolutionary patterns can provide a deeper comprehension of the graph and its community formations, which facilitates the prediction of potential changes in the graph. Taking social networks as an example, users with similar likes and follows can be grouped into the same interest clusters. By identifying these clusters, the system is able to recommend new connections or content that aligns with their interests.
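One minimal way to track evolving communities, assuming a snapshot (DTDG) view of the graph, is to run a static community detection algorithm on each snapshot and match communities across consecutive snapshots by overlap. The Jaccard-based matching below is a simple heuristic sketch, not a method proposed by any of the surveyed works.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def snapshot_communities(snapshots):
    """Detect communities independently in each snapshot (list of edge lists)."""
    result = []
    for edges in snapshots:
        g = nx.Graph(edges)
        result.append([set(c) for c in greedy_modularity_communities(g)])
    return result

def match_communities(prev, curr):
    """Match each current community to the previous community with the
    highest Jaccard overlap, a simple way to trace community evolution."""
    matches = []
    for i, c in enumerate(curr):
        scores = [len(c & p) / len(c | p) for p in prev]
        j = max(range(len(prev)), key=lambda k: scores[k]) if prev else None
        matches.append((i, j, scores[j] if j is not None else 0.0))
    return matches

# toy usage: two snapshots of a small social graph (two triangles that merge)
snap_0 = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
snap_1 = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
comms = snapshot_communities([snap_0, snap_1])
print(match_communities(comms[0], comms[1]))
```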
Transductive / inductive setting. In the transductive setting, the model attempts to make accurate predictions on the graph on which it was initially trained. It trains on a specific portion of the graph while being fully aware of the entire structure, including the unlabeled portions. In contrast, in the inductive setting, the model learns from a designated training graph to make reliable predictions on independent, completely unknown graphs. In dynamic graphs, where nodes and edges frequently change over time, the essence of the inductive setting is the model's ability to generalize to unseen nodes and dynamically evolving structures.

Interpolation / extrapolation. When predicting the value of a function in an unknown region based on existing data and models, predictions within the range of existing data are called interpolations, while those outside are termed extrapolations. Taking dynamic node classification as an example, extrapolation involves predicting future states (> t) based on observations up to time t, such as in weather forecasting. The interpolation setting involves predicting states within the observed range (t_0, t) based on observations up to time t, like completing missing values. Similarly, in link prediction, extrapolation predicts whether future edges (v_i, v_j) will exist, while interpolation determines whether edges (v_i, v_j) exist within the observed range (t_0, t), as in knowledge graph completion. In summary, interpolation involves predicting unobserved values within existing data, whereas extrapolation predicts beyond the observed range. For dynamic graphs, extrapolation is used for future predictions, while interpolation focuses on imputing missing information within observations, depending on whether the target concerns the observed time span or unobserved future states.
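The distinction can be made concrete by how evaluation queries are drawn from a timestamped edge list. In the sketch below, the boundary time and the uniform sampling of queries are illustrative choices; an analogous split over node identities (seen versus unseen) would produce the transductive versus inductive settings discussed above.

```python
import numpy as np

def build_queries(src, dst, ts, t_boundary, num_queries, rng):
    """Split link queries into interpolation queries (timestamps inside the
    observed range, up to t_boundary) and extrapolation queries (after it)."""
    observed = ts <= t_boundary
    future = ~observed
    interp_idx = rng.choice(np.where(observed)[0], size=num_queries, replace=False)
    extrap_idx = rng.choice(np.where(future)[0], size=num_queries, replace=False)
    interpolation = list(zip(src[interp_idx], dst[interp_idx], ts[interp_idx]))
    extrapolation = list(zip(src[extrap_idx], dst[extrap_idx], ts[extrap_idx]))
    return interpolation, extrapolation

# toy usage: 500 timestamped edges, 80% of the timeline treated as observed
rng = np.random.default_rng(1)
src = rng.integers(0, 50, size=500)
dst = rng.integers(0, 50, size=500)
ts = np.sort(rng.random(500))
interp_q, extrap_q = build_queries(src, dst, ts, t_boundary=ts[399],
                                   num_queries=20, rng=rng)
print(len(interp_q), len(extrap_q))
```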
5.3 Benchmarks
In the past, different methods had their own configurations and implementations, complicating the direct comparison of model performances. Recent work has tried to establish unified frameworks for training and evaluation to rectify this. By providing standardized datasets, synchronized preprocessing stages, and fair training strategies, these frameworks make sure that all methods are tested in the same way. Researchers can then easily compare different models for the same tasks, pinpointing the merits and shortcomings of each approach with greater clarity. These efforts demonstrate a prevailing tendency towards heightened standardization in the field of dynamic graphs, which benefits the evaluation and comparison of different methods. Table 5 provides links to these benchmarks.

Table 5 Links of benchmarks.


Benchmark Code
PyTorch Geometric Temporal github.com/benedekrozemberczki/pytorch_geometric_temporal
TGL github.com/amazon-science/tgl
Dynamic graph library (DyGLib) github.com/yule-BUAA/DyGLib
Temporal graph benchmark (TGB) github.com/shenyangHuang/TGB
BenchTemp github.com/qianghuangwhu/benchtemp
6 Advanced techniques in graph processing

6.1 Dynamic transfer learning
Transfer learning employs the knowledge of a model trained on one task (the source task) to facilitate the learning of another distinct but related task (the target task). The core idea is to share specific knowledge between the source and the target task. This concept is based on the observation that various tasks frequently share underlying features or patterns; consequently, insights from one task can potentially benefit another [101,102]. Dynamic transfer learning is a transfer learning setting proposed for scenarios where both the source and target domains evolve. In this context, the current state of the target domain depends on its historical state, which cannot be transferred straightforwardly. Thus, dynamic transfer learning requires continuous real-time adjustments to the shifting data distributions in both the source and target domains. It also demands the prompt transfer of pertinent knowledge from the evolving source domain to the concurrently changing target domain, aiming to optimize performance on the target tasks. The primary challenge lies in effectively modeling inter-domain evolution and managing temporal interdependencies across various time steps. Dynamic transfer learning was first introduced to address diverse real-world challenges in the areas of computer vision and natural language processing. Recently, researchers have tried to explore its potential in graph learning. As of now, research into dynamic transfer learning for graphs is still in its early stages. It mainly involves applying ideas from fields other than graphs, like using LSTM or attention mechanisms to model the graph's temporal pattern [103]. In domains outside of graph learning, substantial research has been conducted on dynamic source-domain non-stationarity, target-domain concept drift, and related topics. Future investigations should delve deeper into the specific attributes of graph data, aiming to design more suitable dynamic transfer learning methods. This will further position dynamic transfer learning as a pivotal tool in addressing various time-sensitive practical challenges.

6.2 Pretraining techniques for dynamic GNNs
Typically, pre-trained models are trained on large-scale datasets to obtain universal feature representations, which are then fine-tuned on specific downstream tasks to enhance their performance. Originating from the fields of computer vision and natural language processing, pre-trained models have achieved significant successes in many tasks, such as semantic parsing, image classification and image segmentation. With the increase in the volume of graph data, pre-training models have recently been introduced to the domain of graph learning. Related work usually employs self-supervised strategies to design pre-training tasks such as edge generation [104], attribute masking [104] and subgraph comparison [105]. Since nodes and edges in dynamic graphs vary over time, these models face the issue of describing the temporal dependence. Therefore, PT-DGNN [106] uses a time-based masking strategy to predict the most recent edges or attributes, learning the temporal evolution law of the graph. On the other hand, CPDG [107] contrasts pairs of positive and negative samples, capturing patterns both temporally and structurally by enhancing the similarity between positive samples and diminishing it among negative samples. Additionally, CPDG stores memory states from different time steps to help the model preserve long-term evolutionary information, further supporting subsequent downstream tasks.
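The contrastive idea can be sketched with a generic InfoNCE-style objective over two temporal views of the same nodes. This is an illustration of the general principle rather than CPDG's actual loss, and a time-based masking objective in the spirit of PT-DGNN could be set up analogously.

```python
import torch
import torch.nn.functional as F

def temporal_infonce(z_early, z_late, temperature=0.2):
    """InfoNCE-style contrastive loss between two temporal views of the same
    nodes: embeddings of node i at an earlier and a later time are treated as
    a positive pair, embeddings of different nodes as negatives."""
    z1 = F.normalize(z_early, dim=1)
    z2 = F.normalize(z_late, dim=1)
    logits = z1 @ z2.t() / temperature      # [N, N] similarity matrix
    labels = torch.arange(z1.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage: embeddings of 64 nodes produced by any dynamic GNN encoder
z_t0 = torch.randn(64, 32, requires_grad=True)
z_t1 = torch.randn(64, 32, requires_grad=True)
loss = temporal_infonce(z_t0, z_t1)
loss.backward()
print(float(loss))
```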
However, numerous challenges still require additional research. Scalability is a major issue in this field: the key question is how to pre-train models so that they can efficiently deal with dynamic graphs at different temporal granularities while remaining flexible enough to serve many different downstream tasks. In addition, dynamic graphs contain both long-term stable patterns and short-term variable patterns, which means the model needs to account for both. Furthermore, the structural neighbor information of each node is essential, so the model is required to preserve this structural information effectively. In real-world applications, pre-training for large-scale dynamic graphs requires efficient sampling and training mechanisms. Simultaneously, the temporal patterns learned by the pre-trained models should be promptly incorporated into subsequent tasks.

In summary, current research and applications primarily focus on designing appropriate pre-training tasks, sampling strategies, and contrastive learning methods for dynamic graphs. The objective is to obtain resilient temporal and structural representations that improve the adaptability of the model to various tasks, while capturing both long-term and short-term trends effectively and improving the efficiency of model training and transfer.

7 Future trends and challenges

7.1 More complex dynamic graphs
Although some large-scale dynamic graph datasets have emerged in recent years, dynamic graph datasets still lack diversity compared to static graph datasets. This may lead to insufficient model performance in diverse, real-world dynamic graph applications.

● Firstly, the results of EdgeBank [100] indicate that in some cases, using only historical information can achieve good results. This suggests that the complexity of current datasets may be insufficient to distinguish between different models significantly. Therefore, we need to construct more challenging datasets containing richer structural information and temporal evolution patterns to evaluate models' modeling capabilities better (a sketch of this memorization baseline follows the list).
● Secondly, operations like users unfollowing others are common in real applications, but currently there is a lack of real-world datasets containing such edge-deletion data. Expanding datasets could involve collecting and integrating data with real edge-deletion operations to simulate the dynamics of graph data in the real world. This can help assess models' robustness and performance when handling edge deletion.
● Finally, some special graphs are rarely mentioned in current works, such as dynamic signed networks [108] and dynamic heterophilic graphs. In the future, dynamic graph datasets containing different network topologies (e.g., small-world, scale-free) could be constructed, and more special graphs could be collected and generated.
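The strength of purely historical baselines noted in the first item is easy to reproduce: the sketch below implements an EdgeBank-style memorization predictor (the unlimited-memory variant), which scores a query pair positively whenever the same pair has appeared before the query time.

```python
def edgebank_predict(history, queries):
    """EdgeBank-style memorization baseline (unlimited memory variant):
    score 1.0 if the queried node pair appeared earlier in the history,
    else 0.0. Scores follow the time-sorted query order."""
    seen = set()
    history = sorted(history, key=lambda e: e[2])   # events are (src, dst, t)
    queries = sorted(queries, key=lambda q: q[2])
    scores, h = [], 0
    for src, dst, t in queries:
        # insert every historical edge that happened strictly before the query
        while h < len(history) and history[h][2] < t:
            seen.add((history[h][0], history[h][1]))
            h += 1
        scores.append(1.0 if (src, dst) in seen else 0.0)
    return scores

history = [(0, 1, 1.0), (1, 2, 2.0), (0, 1, 3.0)]
queries = [(0, 1, 4.0), (2, 3, 4.5)]               # repeated pair vs. new pair
print(edgebank_predict(history, queries))          # [1.0, 0.0]
```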
In summary, datasets that are larger in scale, richer in content, and more complex in structure are needed to push dynamic graph learning towards handling real-world complex networks.

7.2 Interpretability of dynamic GNNs
Frameworks and methods for deep model interpretability are designed to help users understand and trust these models, enabling them to be applied more effectively. There have been some advances in interpretability for static GNN models [109], while research on interpretability for dynamic GNNs appears to still be in its early stages. Xie et al. [110] proposed DGExplainer, which aims to provide reliable interpretation of dynamic GNNs. DGExplainer redistributes the output activation scores of a dynamic GNN to the neurons of the previous layer to calculate the relevance scores of the input neurons, thereby identifying important nodes in link prediction and node regression tasks. The DynBrainGNN [111] model, designed to provide both prediction and explanation, contains built-in interpretable GNNs with dynamic functional connectivity for psychiatric disease analysis. These studies show that although the interpretability of dynamic GNNs is a relatively new and less discussed area, there are already some studies attempting to address this issue.
structure datasets are needed to facilitate the development of ● Lack of structural modeling. LLMs lack the capability
dynamic graph learning towards handling real-world complex to explicitly model graph structures in the same way
networks. that GNN does. Merely relying on contextual
descriptions to acquire information may not be adequate
7.2 Interpretability of dynamic GNNs and direct. Therefore, additional investigation is
The frameworks and methods related to deep model required to improve LLMs’ ability to model graph
interpretability are designed to help users understand and trust topology.
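Prompting an LLM with a dynamic graph mainly requires serializing timestamped edges into text. The sketch below builds a “when link”-style prompt; the wording of the template is an illustrative assumption and not the exact prompt used by LLM4DyG.

```python
def build_when_link_prompt(events, u, v):
    """Serialize a timestamped edge list into a plain-text prompt that asks a
    'when link'-style question about a dynamic graph. The template wording is
    illustrative, not the exact prompt used by LLM4DyG."""
    lines = ["You are given a dynamic graph as a list of timestamped edges."]
    for src, dst, t in events:
        lines.append(f"At time {t}, node {src} connected to node {dst}.")
    lines.append(
        f"Question: at which time(s) are node {u} and node {v} directly linked? "
        "Answer with the timestamps only."
    )
    # A worked in-context example could be prepended here; providing even a
    # single example has been reported to improve accuracy on such tasks.
    return "\n".join(lines)

events = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (0, 1, 4)]
print(build_when_link_prompt(events, 0, 1))
```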
8 Conclusion
Dynamic graph neural networks, a recent development in representation learning, can process changing graph structures and have potential applications across various fields. In this paper, we conduct a comprehensive survey on the key issues of dynamic GNNs, including prediction tasks, temporal modeling methods, and the handling of large-scale dynamic graphs. We have summarized the main advancements in current research and pointed out some promising key directions, such as expanding the scope of existing models and researching the expressive capability of dynamic GNNs. Dynamic GNNs still face challenges, such as scalability, theoretical guidance, and dataset construction. We hope that this paper can provide valuable references for researchers in this field, and promote the further development of theories and methods related to dynamic GNNs. An important direction for future research is to continue expanding the applicability of dynamic graph representation learning, handling more complex and diverse dynamic graph structures and evolution patterns, and making it a unified representation learning framework with both breadth and depth. As more and more practical applications involve dynamic graph analysis, dynamic GNNs will undoubtedly become a continuously active and increasingly important research area.

Acknowledgements This research was supported in part by National Science and Technology Major Project (2022ZD0114802), by National Natural Science Foundation of China (Grant Nos. U2241212, 61932001), by Beijing Natural Science Foundation (No. 4222028), by Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), by Alibaba Group through Alibaba Innovative Research Program, and by Huawei-Renmin University joint program on Information Retrieval. We also wish to acknowledge the support provided by the fund for building world-class universities (disciplines) of Renmin University of China, by Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education, Intelligent Social Governance Interdisciplinary Platform, Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative, Public Policy and Decision-making Research Lab, and Public Computing Cloud, Renmin University of China. The work was partially done at Beijing Key Laboratory of Big Data Management and Analysis Methods, MOE Key Lab of Data Engineering and Knowledge Engineering, and Pazhou Laboratory (Huangpu), Guangzhou, Guangdong 510555, China.

Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
1. Kazemi S M, Goel R, Jain K, Kobyzev I, Sethi A, Forsyth P, Poupart P. Representation learning for dynamic graphs: a survey. The Journal of Machine Learning Research, 2020, 21(1): 70
2. Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K. Inductive representation learning on temporal graphs. In: Proceedings of the 8th International Conference on Learning Representations. 2020
3. Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M. Temporal graph networks for deep learning on dynamic graphs. 2020, arXiv preprint arXiv: 2006.10637
4. You J, Du T, Leskovec J. ROLAND: graph learning framework for dynamic graphs. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 2358−2366
5. Zhu Y, Lyu F, Hu C, Chen X, Liu X. Encoder-decoder architecture for supervised dynamic graph learning: a survey. 2022, arXiv preprint arXiv: 2203.10480
6. Barros C D T, Mendonca M R F, Vieira A B, Ziviani A. A survey on embedding dynamic graphs. ACM Computing Surveys, 2021, 55(1): 10
7. Cai B, Xiang Y, Gao L, Zhang H, Li Y, Li J. Temporal knowledge graph completion: a survey. In: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023, 6545−6553
8. Liu C, Paterlini S. Stock price prediction using temporal graph model with value chain data. 2023, arXiv preprint arXiv: 2303.09406
9. Wang X, Ma Y, Wang Y, Jin W, Wang X, Tang J, Jia C, Yu J. Traffic flow prediction via spatial temporal graph neural network. In: Proceedings of Web Conference 2020. 2020, 1082−1092
10. Gao Y, Wang X, He X, Feng H, Zhang Y. Rumor detection with self-supervised learning on texts and social graph. Frontiers of Computer Science, 2023, 17(4): 174611
11. Hu W, Fey M, Ren H, Nakata M, Dong Y, Leskovec J. OGB-LSC: a large-scale challenge for machine learning on graphs. In: Proceedings of the 1st Neural Information Processing Systems Track on Datasets and Benchmarks. 2021
12. Fu D, He J. DPPIN: a biological repository of dynamic protein-protein interaction network data. In: Proceedings of 2022 IEEE International Conference on Big Data. 2022, 5269−5277
13. Hawkes A G. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 1971, 58(1): 83−90
14. Zuo S, Jiang H, Li Z, Zhao T, Zha H. Transformer Hawkes process. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 11692−11702
15. Lu Y, Wang X, Shi C, Yu P S, Ye Y. Temporal network embedding with micro- and macro-dynamics. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019, 469−478
16. Zuo Y, Liu G, Lin H, Guo J, Hu X, Wu J. Embedding temporal network via neighborhood formation. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018, 2857−2866
17. Chen F, Wang Y C, Wang B, Kuo C C J. Graph representation learning: a survey. APSIPA Transactions on Signal and Information Processing, 2020, 9: e15
18. Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323−2326
19. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003, 15(6): 1373−1396
20. Cao S, Lu W, Xu Q. GraRep: learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015, 891−900
21. Luo X, Yuan J, Huang Z, Jiang H, Qin Y, Ju W, Zhang M, Sun Y. HOPE: high-order graph ODE for modeling interacting dynamics. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 23124−23139
22. Bartunov S, Kondrashkin D, Osokin A, Vetrov D. Breaking sticks and ambiguities with adaptive skip-gram. In: Proceedings of the 19th
International Conference on Artificial Intelligence and Statistics. 2016, 130−138
23. Joachims T. Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the 10th European Conference on Machine Learning. 1998, 137−142
24. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, 701−710
25. Grover A, Leskovec J. Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 855−864
26. Wang D, Cui P, Zhu W. Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 1225−1234
27. Cao S, Lu W, Xu Q. Deep neural networks for learning graph representations. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016, 1145−1152
28. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 3844−3852
29. Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations. 2017
30. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. In: Proceedings of the 6th International Conference on Learning Representations. 2018
31. Hamilton W L, Ying Z, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 1025−1035
32. Zhu D, Cui P, Zhang Z, Pei J, Zhu W. High-order proximity preserved embedding for dynamic networks. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(11): 2134−2144
33. Li J, Dani H, Hu X, Tang J, Chang Y, Liu H. Attributed network embedding for learning in a dynamic environment. In: Proceedings of 2017 ACM on Conference on Information and Knowledge Management. 2017, 387−396
34. Nguyen G H, Lee J B, Rossi R A, Ahmed N K, Koh E, Kim S. Continuous-time dynamic network embeddings. In: Proceedings of Web Conference 2018. 2018, 969−976
35. Heidari F, Papagelis M. EvoNRL: evolving network representation learning based on random walks. In: Proceedings of the 7th International Conference on Complex Networks and Their Applications (COMPLEX NETWORKS 2018). 2019, 457−469
36. Manessi F, Rozza A, Manzo M. Dynamic graph convolutional networks. Pattern Recognition, 2020, 97: 107000
37. Sankar A, Wu Y, Gou L, Zhang W, Yang H. DySAT: deep neural representation learning on dynamic graphs via self-attention networks. In: Proceedings of the 13th International Conference on Web Search and Data Mining. 2020, 519−527
38. Wang Y, Li P, Bai C, Subrahmanian V S, Leskovec J. Generic representation learning for dynamic social interaction. In: Proceedings of KDD'20: Knowledge Discovery in Databases. 2020
39. Wang Y, Li P, Bai C, Leskovec J. TEDIC: neural modeling of behavioral patterns in dynamic social interaction networks. In: Proceedings of Web Conference 2021. 2021, 693−705
40. Chen J, Wang X, Xu X. GC-LSTM: graph convolution embedded LSTM for dynamic network link prediction. Applied Intelligence, 2022, 52(7): 7513−7528
41. Li J, Han Z, Cheng H, Su J, Wang P, Zhang J, Pan L. Predicting path failure in time-evolving graphs. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 1279−1289
42. Jin W, Qu M, Jin X, Ren X. Recurrent event network: autoregressive structure inference over temporal knowledge graphs. In: Proceedings of 2020 Conference on Empirical Methods in Natural Language Processing. 2020, 6669−6683
43. Zhu Y, Cong F, Zhang D, Gong W, Lin Q, Feng W, Dong Y, Tang J. WinGNN: dynamic graph neural networks with random gradient aggregation window. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023, 3650−3662
44. Pareja A, Domeniconi G, Chen J, Ma T, Suzumura T, Kanezashi H, Kaler T, Schardl T, Leiserson C. EvolveGCN: evolving graph convolutional networks for dynamic graphs. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 5363−5370
45. Qin X, Sheikh N, Lei C, Reinwald B, Domeniconi G. SEIGN: a simple and efficient graph neural network for large dynamic graphs. In: Proceedings of the 39th IEEE International Conference on Data Engineering. 2023, 2850−2863
46. Goyal P, Chhetri S R, Canedo A. Dyngraph2vec: capturing network dynamics using dynamic graph representation learning. Knowledge-Based Systems, 2020, 187: 104816
47. Trivedi R, Dai H, Wang Y, Song L. Know-evolve: deep temporal reasoning for dynamic knowledge graphs. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 3462−3471
48. Trivedi R, Farajtabar M, Biswal P, Zha H. DyRep: learning representations over dynamic graphs. In: Proceedings of the 7th International Conference on Learning Representations. 2019
49. Knyazev B, Augusta C, Taylor G W. Learning temporal attention in dynamic graphs with bilinear interactions. PLoS One, 2021, 16(3): e0247936
50. Han Z, Ma Y, Wang Y, Gunnemann S, Tresp V. Graph Hawkes neural network for forecasting on temporal knowledge graphs. In: Proceedings of the Automated Knowledge Base Construction. 2020
51. Sun H, Geng S, Zhong J, Hu H, He K. Graph Hawkes transformer for extrapolated reasoning on temporal knowledge graphs. In: Proceedings of 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 7481−7493
52. Wen Z, Fang Y. TREND: temporal event and node dynamics for graph representation learning. In: Proceedings of ACM Web Conference 2022. 2022, 1159−1169
53. Ma Y, Guo Z, Ren Z, Tang J, Yin D. Streaming graph neural networks. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020, 719−728
54. Kumar S, Zhang X, Leskovec J. Predicting dynamic embedding trajectory in temporal interaction networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 1269−1278
55. Wang X, Lyu D, Li M, Xia Y, Yang Q, Wang X, Wang X, Cui P, Yang Y, Sun B, Guo Z Y. APAN: asynchronous propagation attention network for real-time temporal graph embedding. In: Proceedings of 2021 International Conference on Management of Data. 2021, 2628−2638
56. Wang Y, Chang Y Y, Liu Y, Leskovec J, Li P. Inductive representation learning in temporal networks via causal anonymous walks. In: Proceedings of the 9th International Conference on Learning Representations. 2021
57. Li Y, Shen Y, Chen L, Yuan M. Zebra: when temporal graph neural networks meet temporal personalized PageRank. Proceedings of the
VLDB Endowment, 2023, 16(6): 1332−1345
58. Li H, Chen L. EARLY: efficient and reliable graph neural network for dynamic graphs. Proceedings of the ACM on Management of Data, 2023, 1(2): 163
59. Zheng Y, Wei Z, Liu J. Decoupled graph neural networks for large dynamic graphs. Proceedings of the VLDB Endowment, 2023, 16(9): 2239−2247
60. Fu D, He J. SDG: a simplified and dynamic graph neural network. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021, 2273−2277
61. Liu H, Xu X, Lu J A, Chen G, Zeng Z. Optimizing pinning control of complex dynamical networks based on spectral properties of grounded Laplacian matrices. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, 51(2): 786−796
62. Bonner S, Atapour-Abarghouei A, Jackson P T, Brennan J, Kureshi I, Theodoropoulos G, McGough A S, Obara B. Temporal neighbourhood aggregation: predicting future links in temporal graphs via recurrent variational graph convolutions. In: Proceedings of 2019 IEEE International Conference on Big Data. 2019, 5336−5345
63. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. 2014, arXiv preprint arXiv: 1412.3555
64. Goyal P, Kamra N, He X, Liu Y. DynGEM: deep embedding method for dynamic graphs. 2018, arXiv preprint arXiv: 1805.11273
65. Chen T, Goodfellow I, Shlens J. Net2Net: accelerating learning via knowledge transfer. In: Proceedings of the 4th International Conference on Learning Representations. 2016
66. Kipf T, Fetaya E, Wang K C, Welling M, Zemel R. Neural relational inference for interacting systems. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 2688−2697
67. Kazemi S M, Goel R, Eghbali S, Ramanan J, Sahota J, Thakur S, Wu S, Smyth C, Poupart P, Brubaker M. Time2Vec: learning a vector representation of time. 2019, arXiv preprint arXiv: 1907.05321
68. Loomis L H. Introduction to Abstract Harmonic Analysis. New York: Dover Publications, 2013
69. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K. Simplifying graph convolutional networks. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 6861−6871
70. Gasteiger J, Bojchevski A, Günnemann S. Predict then propagate: graph neural networks meet personalized PageRank. In: Proceedings of the 7th International Conference on Learning Representations. 2019
71. Wang C, Sun D, Bai Y. PiPAD: pipelined and parallel dynamic GNN training on GPUs. In: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. 2023, 405−418
72. Chen H, Hao C. DGNN-Booster: a generic FPGA accelerator framework for dynamic graph neural network inference. In: Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. 2023, 195−201
73. Chakaravarthy V T, Pandian S S, Raje S, Sabharwal Y, Suzumura T, Ubaru S. Efficient scaling of dynamic graph neural networks. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2021, 77
74. Zhou H, Zheng D, Nisa I, Ioannidis V, Song X, Karypis G. TGL: a general framework for temporal GNN training on billion-scale graphs. Proceedings of the VLDB Endowment, 2022, 15(8): 1572−1580
75. Zhou H, Zheng D, Song X, Karypis G, Prasanna V. DistTGL: distributed memory-based temporal graph neural network training. In: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. 2023, 39
76. Chen X, Liao Y, Xiong Y, Zhang Y, Zhang S, Zhang J, Sun Y. SPEED: streaming partition and parallel acceleration for temporal interaction graph embedding. 2023, arXiv preprint arXiv: 2308.14129
77. Xia Y, Zhang Z, Wang H, Yang D, Zhou X, Cheng D. Redundancy-free high-performance dynamic GNN training with hierarchical pipeline parallelism. In: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing. 2023, 17−30
78. Li J, Tian S, Wu R, Zhu L, Zhao W, Meng C, Chen L, Zheng Z, Yin H. Less can be more: unsupervised graph pruning for large-scale dynamic graphs. 2023, arXiv preprint arXiv: 2305.10673
79. Madan A, Cebrian M, Moturu S, Farrahi K, Pentland A. Sensing the “health state” of a community. IEEE Pervasive Computing, 2012, 11(4): 36−45
80. Shetty J, Adibi J. The Enron email dataset database schema and brief statistical report. Information Sciences Institute Technical Report, University of Southern California, 2004, 4(1): 120−128
81. Sapiezynski P, Stopczynski A, Lassen D D, Lehmann S. Interaction data from the Copenhagen networks study. Scientific Data, 2019, 6(1): 315
82. Panzarasa P, Opsahl T, Carley K M. Patterns and dynamics of users' behavior and interaction: network analysis of an online community. Journal of the American Society for Information Science and Technology, 2009, 60(5): 911−932
83. Kumar S, Spezzano F, Subrahmanian V S, Faloutsos C. Edge weight prediction in weighted signed networks. In: Proceedings of the 16th IEEE International Conference on Data Mining. 2016, 221−230
84. Kumar S, Hooi B, Makhija D, Kumar M, Faloutsos C, Subrahmanian V S. REV2: fraudulent user prediction in rating platforms. In: Proceedings of the 11th ACM International Conference on Web Search and Data Mining. 2018, 333−341
85. Leetaru K, Schrodt P A. GDELT: global data on events, location, and tone, 1979−2012. In: Proceedings of ISA Annual Convention. 2013, 1−49
86. Huang Q, Jiang J, Rao X S, Zhang C, Han Z, Zhang Z, Wang X, He Y, Xu Q, Zhao Y, Hu C, Shang S, Du B. BenchTemp: a general benchmark for evaluating temporal graph neural networks. 2023, arXiv preprint arXiv: 2308.16385
87. Jin M, Li Y F, Pan S. Neural temporal walks: motif-aware representation learning on continuous-time dynamic graphs. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1445
88. Zhu H, Li X, Zhang P, Li G, He J, Li H, Gai K. Learning tree-based deep model for recommender systems. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018, 1079−1088
89. Jin Y, Lee Y C, Sharma K, Ye M, Sikka K, Divakaran A, Kumar S. Predicting information pathways across online communities. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023, 1044−1056
90. Huang X, Yang Y, Wang Y, Wang C, Zhang Z, Xu J, Chen L, Vazirgiannis M. DGraph: a large-scale financial dataset for graph anomaly detection. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1654
91. Bailey M A, Strezhnev A, Voeten E. Estimating dynamic state preferences from United Nations voting data. Journal of Conflict Resolution, 2017, 61(2): 430−456
92. Huang S, Hitti Y, Rabusseau G, Rabbany R. Laplacian change point detection for dynamic graphs. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020, 349−358
93. Fowler J H. Legislative cosponsorship networks in the US House and Senate. Social Networks, 2006, 28(4): 454−465
94. MacDonald G K, Brauman K A, Sun S, Carlson K M, Cassidy E S,
Gerber J S, West P C. Rethinking agricultural trade relationships in an era of globalization. BioScience, 2015, 65(3): 275−289
95. Béres F, Pálovics R, Oláh A, Benczúr A A. Temporal walk based centrality metric for graph streams. Applied Network Science, 2018, 3(1): 32
96. Leskovec J, Kleinberg J, Faloutsos C. Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 177−187
97. Schäfer M, Strohmeier M, Lenders V, Martinovic I, Wilhelm M. Bringing up OpenSky: a large-scale ADS-B sensor network for research. In: Proceedings of the 13th International Symposium on Information Processing in Sensor Networks. 2014, 83−94
98. Gehrke J, Ginsparg P, Kleinberg J. Overview of the 2003 KDD Cup. ACM SIGKDD Explorations Newsletter, 2003, 5(2): 149−151
99. Weber M, Domeniconi G, Chen J, Weidele D K I, Bellei C, Robinson T, Leiserson C E. Anti-money laundering in Bitcoin: experimenting with graph convolutional networks for financial forensics. 2019, arXiv preprint arXiv: 1908.02591
100. Poursafaei F, Huang S, Pelrine K, Rabbany R. Towards better evaluation for dynamic link prediction. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 2386
101. Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345−1359
102. Neyshabur B, Sedghi H, Zhang C. What is being transferred in transfer learning? In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 44
103. Wang H, Mao Y, Sun J, Zhang S, Zhou D. Dynamic transfer learning across graphs. 2023, arXiv preprint arXiv: 2305.00664
104. Hu Z, Dong Y, Wang K, Chang K W, Sun Y. GPT-GNN: generative pre-training of graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020, 1857−1867
105. Qiu J, Chen Q, Dong Y, Zhang J, Yang H, Ding M, Wang K, Tang J. GCC: graph contrastive coding for graph neural network pre-training. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020, 1150−1160
106. Chen K J, Zhang J, Jiang L, Wang Y, Dai Y. Pre-training on dynamic graph neural networks. Neurocomputing, 2022, 500: 679−687
107. Bei Y, Xu H, Zhou S, Chi H, Zhang M, Li Z, Bu J. CPDG: a contrastive pre-training method for dynamic graph neural networks. 2023, arXiv preprint arXiv: 2307.02813
108. Sharma K, Raghavendra M, Lee Y C, Kumar M A, Kumar S. Representation learning in continuous-time dynamic signed networks. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023, 2229−2238
109. Dai E, Wang S. Towards self-explainable graph neural network. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021, 302−311
110. Xie J, Liu Y, Shen Y. Explaining dynamic graph neural networks via relevance back-propagation. 2022, arXiv preprint arXiv: 2207.11175
111. Zheng K, Ma B, Chen B. DynBrainGNN: towards spatio-temporal interpretable graph neural network based on dynamic brain connectome for psychiatric diagnosis. In: Proceedings of the 14th International Workshop on Machine Learning in Medical Imaging. 2023, 164−173
112. Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 159
113. Zhuang Y, Yu Y, Wang K, Sun H, Zhang C. ToolQA: a dataset for LLM question answering with external tools. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 36
114. Mendonça J, Pereira P, Moniz H, Carvalho J P, Lavie A, Trancoso I. Simple LLM prompting is state-of-the-art for robust and multilingual dialogue evaluation. In: Proceedings of the 11th Dialog System Technology Challenge. 2023, 133−143
115. Zhang Z, Wang X, Zhang Z, Li H, Qin Y, Wu S, Zhu W. LLM4DyG: can large language models solve problems on dynamic graphs? 2023, arXiv preprint arXiv: 2310.17110
116. Tang J, Yang Y, Wei W, Shi L, Su L, Cheng S, Yin D, Huang C. GraphGPT: graph instruction tuning for large language models. 2023, arXiv preprint arXiv: 2310.13023

Yanping Zheng is a PhD candidate at the Gaoling School of Artificial Intelligence, Renmin University of China, China, advised by Professor Zhewei Wei. She received her master's degree in engineering from Beijing Technology and Business University, China in 2020. Her research focuses on graph learning algorithms. She is particularly interested in efficient algorithms for graph neural networks and dynamic graph representation learning.

Lu Yi is currently a PhD student at the Gaoling School of Artificial Intelligence, Renmin University of China, China, advised by Professor Zhewei Wei. She received her B.E. degree in Computer Science and Technology from the School of Computer Science, Beijing University of Posts and Telecommunications, China in June 2022. Her research lies in the field of graph-related machine learning and efficient graph algorithms.

Zhewei Wei is currently a professor at the Gaoling School of Artificial Intelligence, Renmin University of China, China. He obtained his PhD degree at the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology (HKUST), China in 2012. He received his BSc degree from the School of Mathematical Sciences at Peking University, China in 2008. His research interests include graph algorithms, massive data algorithms, and streaming algorithms. He was the Proceedings Chair of SIGMOD/PODS 2020 and ICDT 2021, and the Area Chair of ICML 2022/2023, NeurIPS 2022/2023, ICLR 2023, and WWW 2023. He is also a PC member of various top conferences, such as VLDB, KDD, ICDE, ICML, and NeurIPS.
