
Artificial Intelligence Review

https://doi.org/10.1007/s10462-022-10321-2


A survey of graph neural networks in various learning paradigms: methods, applications, and challenges

Lilapati Waikhom1 · Ripon Patgiri1

© The Author(s), under exclusive licence to Springer Nature B.V. 2022

Abstract
In the last decade, deep learning has reinvigorated the machine learning field. It has solved
many problems in computer vision, speech recognition, natural language processing, and
other domains with state-of-the-art performances. In these domains, the data is gener-
ally represented in the Euclidean space. Various other domains conform to non-Euclidean
space, for which a graph is an ideal representation. Graphs are suitable for representing the
dependencies and inter-relationships between various entities. Traditionally, handcrafted
features for graphs are incapable of providing the necessary inference for various tasks
from this complex data representation. Recently, there has been an emergence of employ-
ing various advances in deep learning for graph-based tasks (called Graph Neural Net-
works (GNNs)). This article introduces preliminary knowledge regarding GNNs and com-
prehensively surveys GNNs in different learning paradigms—supervised, unsupervised,
semi-supervised, self-supervised, and few-shot or meta-learning. The taxonomy of each
graph-based learning setting is provided with logical divisions of methods falling in the
given learning setting. The approaches for each learning task are analyzed from theoreti-
cal and empirical standpoints. Further, we provide general architecture design guidelines
for building GNN models. Various applications and benchmark datasets are also provided,
along with open challenges still plaguing the general applicability of GNNs.

Keywords Graph · Neural network · Deep learning · Graph neural network

1 Introduction

A graph is an ordered pair G = (V, E) where V is the set of nodes and E is the set of edges.
We observe graph structures everywhere, from social networks (Qiu et al. 2018; Zhang and Chen 2018; Yu et al. 2020) and physical interactions (Chen et al. 2015) to physical networks (Deng et al. 2019; Zhou et al. 2017). Graphs can also be used to represent

* Ripon Patgiri
[email protected]
Lilapati Waikhom
[email protected]
1
Department of Computer Science & Engineering, National Institute of Technology Silchar, Silchar,
Assam 788010, India


structures at vastly different scales, like atoms, molecules, ecosystems, living creatures, and planetary


systems (Duvenaud et al. 2015; Kearnes et al. 2016; Gilmer et al. 2017; Fout 2017; Li
et al. 2018; Cao and Kipf 2018; You et al. 2018), and so on. Evidently, graph structures
are everywhere in our surroundings and our world perception (a thought is connected to
other thoughts). Our world perception comprises entities and inter-relationships to estab-
lish concepts like reasoning, communication, relations, marketing, etc. These entities and
their relationship perfectly fit in the framework of graph representation. Further, graph-
structured representations are common on the rapidly growing Internet (a giant graph) and
other massive graphs like Social Networks, Knowledge databases of search engines, Street
Maps, Chemical Compounds, High-energy Physics particles, and Bio-environments. These
massive graphs pose unique problems from computational and algorithms perspectives.
Several tasks are defined based on the observable scope of a graph, such as at node level,
edge levels, sub-graph, and graph-level tasks. Effective and novel techniques are required
to solve tasks at the different observable levels of scopes in graph structures.
Many traditional Machine Learning (ML) techniques have been proposed on top of the
extracted features using various predefined processes from the raw data. The extracted fea-
tures could be pixel statistics in images or word occurrence statistics in natural language
data. In the last decade, Deep Learning (DL) techniques have gained massive popular-
ity, tackling the learning problems efficiently, learning representation from raw data, and
using the learned representation to predict simultaneously. Usually, this is accomplished
by exploring many different non-linear transformations (performed by layers) and end-to-
end training of such models using the gradient descent-based optimization method. Even
though DL has recently advanced in several fields such as Computer Vision, Natural Lan-
guage Processing, Biomedical Imaging, Bio-informatics, and so on, it still lacks relational
and causal reasoning, intellectual abstraction, and various other human abilities. Structur-
ing the computation and representation of a Deep Neural Network (DNN) in the form of a
graph is one of the ways to address these problems. These types of approaches are charac-
terized as Graph Neural networks (GNNs).
GNNs are successful on graph-based datasets in various domains employing differ-
ent—supervised, semi-supervised, self-supervised, and unsupervised—learning settings.
Most graph-based methods fall under unsupervised learning and are often based on Auto-
Encoder (AE), Contrastive Learning, or Random Walk concepts. Recent work by Cao et al. (2020) extracts features for hyperspectral classification using a graph AE. Yang et al. (2020) prevent over-smoothing of message passing, while Park et al. (2021) use a message passing AE for hyperbolic representation learning. Wu et al. (2021) employ a graph AE to address a limitation of current methods for link prediction. Recently, contrastive learning-based methods have also been successful, as many research works indicate. Okuda et al. (2021) use the contrastive learning setting to extract unsupervised graph representations for discovering common objects and localizing particular objects in images. On the other hand, Random Walks have been coupled with current
representation learning methods for language modeling to provide exceptional representa-
tions of nodes. The learned representation can be utilized for downstream learning tasks
like node classification and edge prediction, as shown in Du et al. (2018) and Perozzi et al.
(2014). Subgraph embeddings are also captured using expanded Random Walks in work
done by Adhikari et al. (2018), and node representations in heterogeneous graphs as in
work done by Dong et al. (2017).


Table 1  Existing surveys on Graph Neural Networks and their comparison with our work
Papers  Difference and novelty of our article

Wu et al. (2021) The authors presented a new taxonomy by dividing existing GNNs into four catego-
ries—Recurrent, Convolutional, Spatial-Temporal, and Graph Auto-encoders GNNs.
Nevertheless, the paper has not explained each learning setting separately. On the
contrary, our article presents the new taxonomies for each type of GNNs’ learning
setting. We further provide many more available datasets related to various fields
and also the current popular applications of GNN
Sato (2020) The authors focus more on the power of GNNs and present a comprehensive overview
of the powerful variants of GNNs, but have not focused on taxonomy. Therefore, we
present the taxonomy and diverse features of GNN
Zhou et al. (2020) The authors provided a broad design pipeline of the GNNs and discussed each of
the module’s variants of GNNs. The article analyzed the GNNs both theoretically
and empirically. The paper presented applications of GNNs by dividing them into
structural scenarios and non-structural scenarios. The paper also introduced four
open problems of GNNs and gave future directions. Nevertheless, the paper has not
provided a separate taxonomy for each learning setting. Our article provides new
taxonomies for each type of GNNs’ learning setting. We introduced various applica-
tions of GNNs in the existing works and discussed several datasets for GNNs in
many domains
Abdal et al. (2021) The authors presented a comprehensive review of GNNs from the computing per-
spective. The paper also provides a detailed examination of current software and
hardware acceleration schemes, from which a graph-aware, hardware-software, and
communication-centric vision for GNN accelerators is derived. On the contrary, our
paper focuses on different learning settings of GNNs by providing proper taxonomy
for each of the setting types
Zhang et al. (2020) The authors provided a comprehensive overview of many types of graph-based DL
approaches. Based on the model architectures, the authors classified the existing
methods into five categories—Graph Recurrent Neural Networks, Graph Convolu-
tional Networks, Graph Auto-encoders, Graph Reinforcement Learning, and Graph
Adversarial methods. On the other hand, our paper presented a new taxonomy
of GNN based on different types of learning settings. Our paper also presented a
comprehensive review of self-supervised Learning, one of the modern and robust
methods. Besides that, our paper also discussed various graph-based datasets and
many applications of GNNs

Most existing survey papers on GNNs focus either on the single learning setting or
general GNNs, as shown in Table 1. These surveys have not explained each learning set-
ting separately. One of the most recent works is done by Song et al. (2021) that focused
on semi-supervised learning settings. Similarly, Zhou et al. (2020) focused on various
machine learning algorithms on graphs. Our article classifies graph semi-supervised learning methods based on their embedding characteristics, namely Shallow Graph Embedding and Deep Graph Embedding. We divide Shallow Graph Embedding into Factorization and Random Walk methods, and Deep Graph Embedding into Auto-encoder embedding and GNN methods. A further explanation of each method is provided, along with the categories of GNNs.
pretext tasks and training strategies. Furthermore, we explored each graph-based learn-
ing setting and divided it into logical categories.


1.1 Contributions

The key contributions of the article are outlined below.

1. Preliminary knowledge of graphs and their variants and different hierarchical levels of
graph-based tasks are introduced to help readers acquire the required understanding of
the given domain.
2. Theoretical and empirical aspects of graph neural networks are presented to give the
ideas behind the evolution of GNN-based methods.
3. A comprehensive review of GNN-based methods is presented with a focus on different learning settings, in contrast to existing surveys that concentrate on a single learning setting.
4. Further, a detailed description of each graph-based learning setting is given and divided
into required logical categories.
5. A general design guideline for designing GNN architecture models for different graph-
based tasks is presented to help readers understand the practical designing part of the
GNN model and the applicability of these models to real-world problems.
6. We provide many GNN resources, including state-of-the-art models, popular graph-
based datasets, and various applications. This helps readers find the domains and refer-
ences to build their systems upon.
7. Further, we assess the challenges of current techniques and possible future research
avenues for researchers, such as model depth, scalability, higher-order complex struc-
tures, and robustness of the methods.

1.2 Organization

Section 2 defines the variants of graph and tasks based on graph-structured data in Sects. 2.1
and 2.2 respectively. Section 3 introduces the basic terminologies and concepts of GNN and
also describes the message passing system in GNN. Section 4 analyzes the GNN methods
in both theoretical and empirical aspects. Section 5 discusses some of the common practi-
cal aspects of GNNs. Section 6 explains the GNN-based methods for each learning setting
and further breakdown of methods and learning settings into logical divisions. Section 6.1
briefly explains the existing graph supervised learning methods. Then we present the graph
semi-supervised learning methods in Section 6.2 along with subdivision of the methods by
the embedding methods. Section 6.3 explains graph self-supervised learning methods, dividing them in terms of pretext tasks and training strategies. Graph-based unsu-
pervised learning methods are explained in Sect. 6.4 along with a subdivision of the exist-
ing techniques in terms of learning methods. In Section 6.5, we present the graph few-shot
and meta-learning based methods. Section 7 gives the general stepwise architecture of the GNN and discusses each part of the architecture, followed by Sect. 8, which presents various popular applications of GNNs. In Sect. 9, we present several commonly available datasets used in research on GNNs. Section 10 summarises the unresolved issues still plaguing GNN-based solutions for graph-based tasks. Finally, we conclude this work in Sect. 11.


2 Preliminaries

2.1 Variants of graph

Various graphs are found in nature and multiple domains, as shown in Table 2. Undi-
rected graphs are the most commonly found type. Problem settings are defined based on
the structure, scale, and graph types present in the data. Below we present the variants
of graphs.

2.1.1 Undirected/directed graph

An undirected graph G = (V, E) has undirected edges, indicating that two nodes have a link with no directional information. Contrarily, a directed graph G_D is defined as an ordered pair G_D = (V, E_D), where V represents the set of nodes of the given graph and E_D is the set of ordered pairs of directed links. A directed graph provides more information about the relationships among the nodes. For example, suppose we have a class hierarchy representing food chains. In this case, we can describe that hierarchy using a directed graph, where the head of an edge points toward the dominating species and the tail represents the herbivore or prey.

2.1.2 Heterogeneous graph

A heterogeneous graph is a variant of graphs that consists of different node or edge types, given as G_n = (V_i, E_i), where V_i represents nodes belonging to a particular node type and E_i represents edges belonging to a particular edge type. The computation in this category
is achieved by altering the data using one-hot encoding to standardize node representa-
tion. The Graph Inception method groups the nodes based on the attributes and treats
them as single special nodes to form a smaller graph. These clusters, called subgraphs,
are used to do parallel calculations. The Heterogeneous Graph Attention Networks (Lin-
mei et al. 2019) have been developed with the same heterogeneous characteristics.

Table 2  The Real World Networks/Graphs with their important characteristics


Network | Nodes | Links | Directed/undirected
Internet | Routers | Internet connections | Undirected
Power Grid | Power plants, transformers | Cables | Undirected
WWW | Webpages | Links | Directed
Phone Calls | Subscribers | Calls | Directed
Science Collaboration | Scientists | Co-authorship | Undirected
Email | Email addresses | Emails | Directed
Actor-Network | Actors | Co-acting | Undirected
E.Coli Metabolism | Metabolites | Chemical reactions | Directed
Citation Network | Papers | Citations | Directed
Protein Interactions | Proteins | Binding interactions | Undirected


2.1.3 Dynamic graph

A dynamic graph is a type of graph which changes over time. Its input attributes might
also change. The nodes and edges are updated over time in this variant of graphs. Nodes
are added or deleted, and the corresponding edges between the nodes are either cre-
ated or updated. It allows adaptive structures or algorithms that require internal struc-
tures to be dynamic to be applied to graphs. The dynamic graph is formally defined as
G_t := (V_t, E_t) ≠ G_{t+1} := (V_{t+1}, E_{t+1}), where G_t represents the graph at time t, and G_{t+1} represents the graph at time t + 1 (Fig. 1).

2.1.4 Attributed graph

Graphs whose edges carry additional information, such as weights or types, are called attributed graphs. The attributed graph is represented by G_A = (V, E, L), where L denotes
the function that assigns attributes to the nodes and edges such that the attributes are rep-
resented as L(V) and L(E) respectively. The additional information of attributed graphs can
aid in developing architectures such as Graph-to-Sequence (G2S encoders) (Marcheggiani
and Perez-Beltrachini 2018) and Relational-GCN (R-GCN) (Schlichtkrull et al. 2018).
As a result, a graph whose edges store extra information, such as the relationship type between nodes, makes working with relational data more manageable.
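To make these structural variants concrete, the short sketch below (plain NumPy, with made-up toy values) stores a directed, attributed graph as an adjacency matrix A together with a node feature matrix and per-edge attribute vectors; an undirected version is obtained by symmetrizing A. It is only an illustrative data layout, not a prescribed format.

```python
import numpy as np

# Toy directed, attributed graph with 4 nodes (hypothetical example data).
num_nodes = 4
A = np.zeros((num_nodes, num_nodes), dtype=np.float32)        # adjacency matrix
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]                       # ordered pairs (u, v): u -> v
for u, v in edges:
    A[u, v] = 1.0

X = np.random.rand(num_nodes, 8).astype(np.float32)            # node feature matrix (one row per node)
edge_attr = {(u, v): np.random.rand(3) for (u, v) in edges}    # L(E): per-edge attribute vectors

# An undirected version of the same graph is obtained by symmetrizing A.
A_undirected = np.maximum(A, A.T)

print(A)
print(A_undirected)
```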

2.2 Hierarchical task classification

Graph-based data has knowledge embedded in various hierarchies of the structure. There-
fore at the different hierarchical levels, different types of tasks are defined. Tasks are
defined at node-level and edge-level. Furthermore, graph-level tasks encompassing whole
graphs or sub-graph are also defined based on various applications. Figure 2 represents
various hierarchical graph-based tasks.

2.2.1 Node level task

Tasks like node classification, node clustering, and node regression are defined at the node level. Node classification attempts to classify nodes, whereas the regression task

Fig. 1  Variant of graphs found in nature and science. Directed VS Undirected Graphs, Static VS Dynamic
Graphs, Homogeneous VS Heterogeneous Graphs, and Attributed VS Non-attributed Graphs


Fig. 2  Various hierarchies of tasks in a graph representing edge, node, subgraph, and graph

for nodes predicts a real value for each node. The node clustering intends to divide nodes
into many distinct classes, with related nodes grouped together. One famous example of
node-level tasks is the protein folding problem, where amino acids in a protein sequence
are treated as nodes and the proximity between amino acids as edges. High-level node rep-
resentations can be extracted using information propagation/graph convolution, recurrent
GNNs, and Convolutional GNNs. These different GNNs can execute node-level tasks end-
to-end using a multi-layer perceptron or softmax layer as the output layer.

2.2.2 Edge level task

The link prediction and edge classification tasks are edge-level tasks. In short, link predic-
tion and edge classification are the tasks that require the model to predict whether or not
there is an edge between two nodes or categorize edge types. One of the primary examples
of edge-level tasks is the recommendation system (recommend the items users might like),
which can be represented as a link prediction task. Here, items and users act as the nodes
and user-item interactions as edges. Another important example is biomedical graph link
prediction. Here, drugs and proteins represent the nodes, and their interactions represent
the edges. The task is to predict how likely it is that a drug pair, such as Simvastatin and Ciprofloxacin, will break down muscle tissue. Using the hidden representations of two nodes from a GNN as inputs, a similarity function or a DNN is used to determine the connection strength of an edge.
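As a minimal illustration of this edge-level scoring step (and not of any specific model cited above), the sketch below scores a candidate edge from two node embeddings, once with a dot-product similarity and once with a tiny feed-forward network; the embeddings are random placeholders standing in for GNN outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
z_u, z_v = rng.standard_normal(16), rng.standard_normal(16)   # hidden node representations (placeholders)

# Option 1: similarity function (dot product squashed to a probability).
score_dot = 1.0 / (1.0 + np.exp(-z_u @ z_v))

# Option 2: a tiny feed-forward scorer on the concatenated pair.
W1 = rng.standard_normal((32, 8)) * 0.1
w2 = rng.standard_normal(8) * 0.1
h = np.tanh(np.concatenate([z_u, z_v]) @ W1)
score_mlp = 1.0 / (1.0 + np.exp(-(h @ w2)))

print(score_dot, score_mlp)
```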

2.2.3 Graph level task

Graph classification, regression, and graph matching tasks require modeling of graph-level
representation. The graph classification task depends on the results of subgraph level tasks.
Some popular graph-level tasks are

(a) Drugs discovery (such as antibiotics are treated as small molecular graphs). Here, atoms
represent nodes and chemical bonds between them as edges.
(b) Physics simulation where nodes represent the particles and the edges represent interac-
tions between particles.


(c) The graph level tasks can be at the subgraph level. One of the most famous examples
of subgraph-level tasks is traffic prediction by considering the road network as a graph.
Road segments represent the nodes, and connectivity between road segments is the
link/edges.

GNNs are frequently coupled with pooling and readout operations to get a compact repre-
sentation on the graph level.
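A minimal sketch of such a readout, assuming node embeddings have already been produced by some GNN: mean or sum pooling collapses the node dimension into a single graph-level vector.

```python
import numpy as np

H = np.random.rand(5, 16)       # hidden representations of 5 nodes (placeholder GNN output)
graph_mean = H.mean(axis=0)     # mean readout -> one 16-dim vector for the whole graph
graph_sum = H.sum(axis=0)       # sum readout (size-sensitive alternative)
print(graph_mean.shape, graph_sum.shape)
```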

3 Graph neural networks

Graph Neural Network is an extension and evolution of deep learning-based methods for
analyzing graph data. Table 3 shows the mathematical notations used by us throughout
this article. As stated previously, a graph is an ordered pair of a set of V nodes and a set
of E edges. A graph is represented as G = (V, E), where a node v ∈ V, and a directed edge from node u to node v (represented by an arrow u → v) forms an ordered pair of nodes (u, v) ∈ V × V. Undirected edges are assumed to have equal weightage from both
directions. An instance of a graph G where nodes are associated with feature vectors xu
with or without edges associated with edge feature vectors x(u,v) is fed to a GNN as the
input. hu and h(u,v) are the hidden representations for nodes and edges, respectively. As an
initial node representation, we use hu = xu and h(u,v) = x(u,v). The graph’s structure dictates
the message passing updates to get updated node and edge representations h′u and h�(u,v).

Table 3  Important mathematical notations used in this article


Notation Description

G A Graph
V Set of nodes
E Set of edges
|V| Total number of nodes
N(u) Neighboring nodes of node u
A Adjacency matrix
v Node v; v ∈ V
e(u,v) Edge between nodes u and v; e(u,v) ∈ E
L Labels of the nodes
xu Feature vector for node u
X Node feature matrix
x(u,v) Feature vector for e(u,v) edge
XE Edge feature matrix
hu Hidden representation vector for node u
h(u,v) Hidden representation vector for e(u,v) edge
h_u^(n) Updated hidden representation vector for node u at nth step
h_(u,v)^(n) Updated hidden representation vector for edge e_(u,v) at nth step
zu Embedding for node u
Z Embedding matrix


The message passing operation updates the hidden representation h_u of each node u in every iteration. It entails aggregating a message M_{N(u)}^{(n)}, based on some pooling strategy (max, mean, etc.), from the neighboring nodes N(u) of node u, and updating the previous hidden representation vector h_u by passing it through a non-linear transformation. The message passing aggregation and update operations are given as:

h_u^{(n+1)} = \mathrm{Update}\left( h_u^{(n)}, \mathrm{Aggregate}\left( \{ h_v^{(n)}, \forall v \in N(u) \} \right) \right)

h_u^{(n+1)} = \mathrm{Update}\left( h_u^{(n)}, M_{N(u)}^{(n)} \right); \quad M_{N(u)}^{(n)} = \mathrm{Aggregate}\left( \{ h_v^{(n)}, \forall v \in N(u) \} \right) \quad (1)

where both the Update and Aggregate functions are arbitrary differentiable functions (usually neural networks), and M_{N(u)}^{(n)} represents the message aggregated from the neighboring nodes N(u) of node u at the nth step of message passing. The message passing updates of the hidden states do not require the parameters of the Update and Aggregate functions to be shared, but sharing them reduces the total number of learnable parameters and, on some graph types, helps reduce overfitting (Hamilton et al. 2017). Figure 3 shows an illustration of the message passing operation (aggregation and update of the hidden representation of a node).
The Aggregate function takes the set of representations (embeddings) of the neighboring nodes N(u) of node u and creates a message M_{N(u)}^{(n)} based on the aggregated neighbourhood information at the nth iteration. The Update function then combines the message M_{N(u)}^{(n)} with the prior embedding h_u^{(n)} of node u to produce the updated embedding h_u^{(n+1)}. The initial embeddings are set to the input features for all nodes at n = 0, i.e., h_u^{(0)} = x_u, ∀u ∈ V. After performing n iterations of GNN message passing, we may use the output of the last layer to define the embedding of each node:

z_u = h_u^{(n)}, \quad \forall u \in V \quad (2)

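The sketch below instantiates Eqs. 1 and 2 with deliberately simple, assumed choices: Aggregate is a mean over neighbor states and Update is a single dense layer followed by a tanh non-linearity, with random weights and a toy graph rather than any trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}    # N(u) for a toy 4-node graph
d = 4
h = {u: rng.standard_normal(d) for u in neighbors}           # h_u^(0) = x_u
W_self = rng.standard_normal((d, d)) * 0.5                    # parameters of Update (shared across nodes)
W_msg = rng.standard_normal((d, d)) * 0.5

def aggregate(states):                       # Aggregate({h_v : v in N(u)}): here a simple mean
    return np.mean(states, axis=0)

def update(h_u, m_u):                        # Update(h_u, M_N(u)): dense layer + non-linearity
    return np.tanh(W_self @ h_u + W_msg @ m_u)

for step in range(2):                        # two rounds of message passing (Eq. 1)
    h = {u: update(h[u], aggregate([h[v] for v in neighbors[u]])) for u in neighbors}

z = h                                        # final embeddings z_u = h_u^(n) (Eq. 2)
print(z[0])
```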
Fig. 3  Illustration of the message passing update in a GNN on a fully connected five-node graph. In the left figure, a node representation h_u^(n) is assigned to each node. In the message passing step (right figure), intermediate edge representations h_(u,v)^(n) are produced from the nearby node representations h_u^(n), the initial node features x_u, and the edge features x_(u,v) (if present). Edge representations for incoming edges are then aggregated to obtain the updated node representation h_u^(n+1)

To transform the abstract GNN concept in Eq. 1 into something we can implement, we need concrete forms of the Update and Aggregate functions. We begin with the most fundamental GNN framework, which simplifies the original GNN models proposed by Scarselli et al. (2008) and Merkwirth and Lengauer (2005).


The message passing of the basic GNN is given as:

h_u^{(n+1)} = \sigma\left( W_u^{(n+1)} h_u^{(n)} + W_{N(u)}^{(n+1)} \sum_{v \in N(u)} h_v^{(n)} + b^{(n+1)} \right) \quad (3)

where 𝜎 is an element-wise non-linearity, and W_u^{(n+1)}, W_{N(u)}^{(n+1)} ∈ ℝ^{d^{(n+1)} × d^{(n)}} are the trainable parameter matrices for node u and its neighboring nodes N(u). For notational simplicity, the bias term b^{(n+1)} ∈ ℝ^{d^{(n+1)}} is frequently omitted; however, incorporating the bias term can be essential for achieving high performance.

Kipf and Welling (2016) generalized and unified prior approaches into an approach known as the Graph Convolutional Network (GCN). Similar approaches were also presented by Battaglia et al. (2016) (the Interaction Network) and Gilmer et al. (2017) (the message passing neural network). Gori et al. (2005) are credited with the first GNN model. The model incorporates several basic principles, such as a recurrent neural network trained with back-propagation through time, and relies on a contraction map to obtain a convergent, general description of GNNs. Moreover, an explicit hidden edge representation h_(u,v) is not learned by this type of GNN, and the node update function Update was based on the adjacent states h_v, v ∈ N(u) (along with the initial node feature vectors x_u). Scarselli et al. (2008) expanded this work by conditioning the messages on the initial edge features x_(u,v).

4 Theoretical and empirical aspects

Distinct motivations influenced advances in GNNs. Generalizing convolutions operation in


Euclidian space to non-Euclidian space led to significant efforts. On the other hand, gener-
alization of message passing approaches for probabilistic inference in the graphical domain
and research work in graph isomorphism test also shaped this domain.

4.1 Theoretical aspect

Here we overview the various concepts and foundations of applying deep learning to
graphs from diverse viewpoints and theoretical underpinnings.

4.1.1 Graph signal processing

We can view the message passing in GNNs with nodes as signals propagating between
nodes until an equilibrium is achieved. Fourier transform is a vital analysis approach for
spectral analysis of signals. One crucial property of the convolution operation ⋆ (Eq. 4) is that it can be computed by performing an elementwise product ◦ of the Fourier transforms F of the two convolved functions p and q, as shown in Eq. 5.

(p \star q)(x) = \int_{\mathbb{R}^d} p(y)\, q(x - y)\, dy \quad (4)


(p \star q)(x) = F^{-1}\left( F(p(x)) \circ F(q(x)) \right) \quad (5)


Another essential property of convolutions is translation (or shift) equivariance, as shown
in Eq. 6.
p(t + a) ⋆ q(t) = p(t) ⋆ q(t + a) = (p ⋆ q)(t + a) (6)
Hence, translating a signal and then convolving it is equal to convolving the signal first and then applying the translation. Further, because the difference operator can be expressed through shifts, convolutions are also equivariant to the difference operation, as shown in Eq. 7.
Δp(t) ⋆ q(t) = p(t) ⋆ Δq(t) = Δ(p ⋆ q)(t) (7)
where
Δp(t) = p(t + 1) − p(t) (8)
Δ is the difference (Laplace) operator on discrete univariate signals. These ideas of filter-
ing and equivariance properties are essential to digital signal processing (DSP) and are the
main ideas of the convolutional neural network (CNN).
These ideas can be translated to the graph domain by viewing time-varying signals
propagating from one node to another (chain graph). This perspective allows us to repre-
sent time-shift operations on graph signals in terms of adjacency and the Laplacian matrix
of the graph. As a result, we can represent convolution operation on graphs using matrix
multiplication of a filter q with a vector p as shown in Eq. 9.
(p \star q)(t) = \sum_{i=0}^{N-1} p(i)\, q(i - t) = Q_q\, p \quad (9)

where Qq ∈ ℝN×N represents the matrix form of convolution operation by filter q. The Q
matrix must satisfy the translation equivariance property.
The connection of the Fourier transform and eigenfunctions of the Laplace opera-
tor guides the generalization of the Fourier transform to random graphs. The convolution
operations of the GNNs are done in the spectral domain that follows the signal processing
theory for graphs and is performed over the input data. The convolutional operation on a
graph operates on the spectral representation of that graph. In this approach, the Graph
Fourier Transform F transform is used to perform the convolutional operation on a graph
signal s. The Inverse Graph Fourier transform F −1 converts the graph signal s back to its
original form.

F(s) = U^{T} s, \quad \text{and} \quad F^{-1}(s) = U s \quad (10)


where U is the eigenvector matrix of the normalized graph Laplacian. Suppose the adjacency matrix of the graph is A and the degree matrix is D; then the normalized graph Laplacian is L = I_n - D^{-1/2} A D^{-1/2}. We can factorize L as U 𝕍 U^{T}, where 𝕍 is the diagonal matrix of the eigenvalues. The spectral convolutional propagation operator is given by Eq. 11, which is based on the convolution theorem given by Mallet (1999).

f * s = F^{-1}(F(f) \odot F(s)) = U (U^{T} f \odot U^{T} s) \quad (11)

Here U^{T} f is the spectral filter, and ⊙ denotes the element-wise product. We can further simplify the spectral filter by replacing it with a learnable diagonal matrix f_w:


f_w * s = U f_w U^{T} s \quad (12)
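The following sketch walks through Eqs. 10–12 directly on a small random graph: it builds the normalized Laplacian, obtains U from its eigendecomposition, moves a signal into the spectral domain, applies a diagonal filter f_w, and transforms back. It is intended only to illustrate the mathematics; practical spectral methods such as ChebNet and GCN avoid the explicit eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                              # random undirected adjacency matrix
deg = A.sum(axis=1) + 1e-8
D_inv_sqrt = np.diag(deg ** -0.5)
L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt                 # normalized Laplacian L = I - D^-1/2 A D^-1/2

eigvals, U = np.linalg.eigh(L)                              # L = U diag(eigvals) U^T
s = rng.standard_normal(n)                                  # a graph signal (one scalar per node)
f_w = np.diag(rng.standard_normal(n))                       # learnable diagonal spectral filter f_w

s_spec = U.T @ s                                            # graph Fourier transform (Eq. 10)
filtered = U @ (f_w @ s_spec)                               # f_w * s = U f_w U^T s   (Eq. 12)
print(filtered)
```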

4.1.2 Generalization

Recent attention has also been paid to the generalization capacity of GNNs. Scarselli et al. (2018) derive the Vapnik-Chervonenkis (VC) dimension of a limited class of GNNs. Garg et al. (2020) offer tighter generalization bounds for GNNs based on Rademacher complexity. Verma and Zhang (2019) examine the stability and generalization properties of single-layer GNNs with various convolutional filters and find that GNN stability depends on the largest eigenvalue of the graph filter. Knyazev et al. (2019) concentrate on the generalization capacity of the attention mechanism in GNNs; their results indicate that attention helps GNNs generalize to larger and noisier graphs.

4.1.3 Expressivity

Concerning the expressiveness of GNNs, Xu et al. (2018) and Morris et al. (2019) have shown that GCNs and GraphSAGE are less discriminative than the Weisfeiler-Leman (WL) test for graph isomorphism. Xu et al. (2019) propose a more expressive GNN called GIN. Barcelo et al. (2019) explore whether GNNs can express FOC2, a fragment of first-order logic that goes beyond the WL test; existing GNNs turn out to be hardly adequate for this fragment. Garg et al. (2020) show that locally dependent GNN variants cannot learn global graph properties, including diameters, longest/shortest cycles, or motifs. Loukas (2019) and Dehmamy et al. (2019) argue that existing results consider expressiveness only when GNNs have infinitely many units and layers; their work instead examines how depth and width affect the representational power of GNNs. Oono and Suzuki (2019) discuss the asymptotic behavior of GNNs as the model depth grows, characterizing them as dynamical systems.

4.1.4 Invariance

Since graphs have no canonical node ordering, the output embeddings of a GNN should be permutation invariant or permutation equivariant with respect to the input features. Maron et al. (2018) characterize permutation-invariant and permutation-equivariant linear layers for constructing invariant GNNs. Maron et al. (2019) further prove that universal invariant GNNs can be obtained with higher-order tensorization. Keriven and Peyre (2019) provide an alternative proof and extend the result to the equivariant case. Chen et al. (2019) establish connections between graph isomorphism testing and permutation-invariant function approximation, using sigma-algebra to characterize the expressiveness of GNNs and demonstrate their equivalence.

4.1.5 Transferability

One distinctive feature of GNNs is that their learned parameterization is not tied to a particular graph. This indicates the possibility of transferring learning between graphs with performance guarantees (transferability). Levie et al. (2019) investigate the transferability of spectral filters, and it is


shown that such filters are transferable across graphs that approximate the same underlying domain. The behavior of GNNs is further analyzed by Ruiz et al. (2020), who conclude that GNNs are transferable across graphs of different sizes sampled from the same graphon.

4.1.6 Label efficiency

Training GNNs in a (semi-)supervised manner requires a significant quantity of labeled data to succeed. Improving labeling efficiency has therefore been explored through active learning, in which informative nodes are chosen, with the help of an oracle, to be labeled for training GNNs. Cai et al. (2017), Gao et al. (2018), and Hu et al. (2020) show that labeling efficiency can be substantially increased by choosing informative nodes such as high-degree nodes and uncertain nodes.

4.2 Empirical aspect

Empirical analyses of GNNs are also necessary to compare and assess models properly. Here we cover empirical studies and benchmarks for GNN assessment. Various open-source codes and frameworks are available for experiments on GNN models (shown in Table 8).

4.2.1 Evaluation

A crucial stage in research is the evaluation of machine learning models. Over the years, concerns have been expressed regarding experimental reproducibility and replicability. Which GNN models work, and to what extent? Which parts of the models contribute to the final performance? Studies on proper assessment techniques are critically needed to examine such fundamental issues. Shchur et al. (2018) examine how GNN models perform under the same training techniques and hyperparameter tuning in semi-supervised node classification tasks. Different dataset splits lead to significantly different model rankings; in addition, simple models can outperform sophisticated ones under suitable conditions. A thorough assessment of graph classification further shows that structural information is not fully exploited. You et al. (2020) study the architectural design space of GNN models, such as the number of layers and the aggregation function, and provide extensive guidelines for designing GNNs for different purposes based on a large number of experiments.

4.2.2 Benchmarks

In machine learning research, high-quality and large benchmark datasets such as ImageNet are essential. However, widely adopted benchmarks are hard to come by in graph learning. For example, most node classification datasets are tiny compared to real-world graphs, with just 3,000 to 20,000 nodes. In addition, the experimental procedures are not harmonized across studies, which harms the literature. This problem is alleviated by providing
scalable and reliable graph learning benchmarks, as shown by Dwivedi et al. (2020) and
Hu et al. (2020). In various domains and tasks, Dwivedi et al. (2020) construct medium-
scale datasets, whereas Hu et al. (2020) offers large-scale datasets. In addition, these two
papers analyze existing GNN models and provide guidelines for further comparison.


5 GNN architectures in practice

In this Section, we define how GNNs are optimized in practice by taking some of the
robust approaches of GNNs, such as Convolutional Graph Neural Network, Recurrent Neu-
ral Network, Graph Auto-encoder Network, and Graph Adversarial Network.

5.1 Convolutional graph neural network

Several convolutional propagation methods have been developed using filters f_p, where p ∈ ℝ^N is the parameter. Bruna et al. (2013) presented the Spectral Network, which uses a learnable diagonal matrix as the filter. This technique, however, is computationally inefficient, and the filter is not spatially localized. Henaff et al. (2015) aim to localize spectral filters using a parameterization with smooth coefficients. According to Hammond et al. (2011), f_p may be estimated by a truncated expansion in terms of Chebyshev polynomials T_k(s) up to the kth order. Based on this result, Defferrard et al. (2016) proposed ChebNet. Further, to avoid overfitting, Kipf and Welling (2016) restricted the ChebNet convolution to a first-order (k = 1) approximation and proposed the technique called GCN.
Li et al. (2018) use graph convolution, which performs Laplacian smoothing over the
feature matrix, resulting in similar hidden representations of neighboring nodes. Laplacian
smoothing is based on the homophily assumption that nearby nodes have similar labels. The smoothing acts as a low-pass filter applied over the input feature matrix. Wu et al. (2019) confirmed this by removing the non-linearities between layers and collapsing the weight matrices, showing that GNNs work largely because of this smoothing.
The idea of a low-pass filter over the feature matrix has been used by many researchers to obtain new insights by applying different filters (Zhang et al. 2019; Cui et al. 2020). Nt and Maehara (2019) state that graph convolution is essentially a process of denoising the input features, and that model performance depends significantly upon the quantity of noise in the feature matrix. Chen et al. (2020) present two metrics to measure the smoothness of node representations and the over-smoothness of GNN models in order to relieve the over-smoothing problem. The authors conclude that the key factor in over-smoothing is the noise-to-information ratio.
For example, the convolution matrix H of GCN (Kipf and Welling 2016) is given as

H = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X W \quad (13)

where X is the feature matrix, and W is the node’s trainable parameter matrix. The GCN
uses symmetric-normalized aggregation with the self-loop updating technique. It is one of
the most common baseline GNN models. Again, following Eq. 3, the GCN model specifies the message passing function as

h_u^{(n+1)} = \sigma\left( W^{(n+1)} \sum_{v \in N(u) \cup \{u\}} \frac{h_v^{(n)}}{\sqrt{|N(u)|\,|N(v)|}} \right) \quad (14)

where u is the current node and v ranges over its neighbors, W^{(n+1)} is the parameter matrix in the (n+1)th iteration, and N(u) denotes the neighboring nodes of u.
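A minimal NumPy sketch of the GCN propagation rule in matrix form (Eq. 13), with self-loops added and symmetric normalization; the adjacency matrix, features, and weights are random placeholders rather than a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 8, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 0.0)            # undirected adjacency without self-loops
X = rng.standard_normal((n, d_in))                          # node feature matrix
W = rng.standard_normal((d_in, d_out)) * 0.3                # trainable weight matrix

A_tilde = A + np.eye(n)                                     # add self-loops
D_tilde_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
H = D_tilde_inv_sqrt @ A_tilde @ D_tilde_inv_sqrt @ X @ W   # Eq. 13 (one propagation step)
H = np.maximum(H, 0.0)                                      # element-wise non-linearity (ReLU)
print(H.shape)
```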


5.2 Recurrent graph neural network

Recurrent GNNs are among the pioneering works in GNN research. In recurrent GNNs, the same weights are applied recurrently over the nodes of a graph to extract high-level node representations. Due to computational constraints, previous research mostly concentrated on directed acyclic graphs (Sperduti and Starita 1997; Micheli et al. 2004). Scarselli et al. (2008) introduced a GNN model by extending the earlier recurrent models so that they can work on various kinds of graphs, such as cyclic, acyclic, directed, and undirected graphs. The model updates node states by sharing neighborhood information recurrently until a stable equilibrium is established, based on an information diffusion process.
The GNN can be applied to all nodes because of the sum operation, even if the number of neighbors varies and no neighborhood ordering is known. To ensure convergence, the recurrent function must be a contraction mapping, which decreases the distance between two points after projecting them into a latent space. When the recurrent function is a neural network, a penalty term must be applied to the Jacobian matrix of parameters. When the convergence condition is satisfied, the nodes' hidden states are fed to a readout layer. The message passing function of the basic recurrent GNN model is given by Eq. 15:
h_u^{(n+1)} = \mathrm{RNN}\left( h_u^{(n)}, M_{N(u)}^{(n+1)} \right) \quad (15)

where RNN represents any variant of a Recurrent Neural Network, such as a Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM), and M_{N(u)}^{(n+1)} represents the message aggregated from the neighboring nodes N(u) of node u in the (n+1)th iteration.
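A rough PyTorch sketch of Eq. 15, assuming a GRU cell as the RNN variant and a sum over neighbor states as the aggregated message; the toy graph and dimensions are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}          # toy graph
d = 8
h = torch.randn(len(neighbors), d)                          # h_u^(0) = x_u for every node
gru = nn.GRUCell(input_size=d, hidden_size=d)               # shared recurrent update function

for step in range(3):                                       # a few recurrent message passing steps
    messages = torch.stack([
        torch.stack([h[v] for v in neighbors[u]]).sum(dim=0)  # M_N(u): sum of neighbor states
        for u in neighbors
    ])
    h = gru(messages, h)                                     # h_u^(n+1) = RNN(h_u^(n), M_N(u))  (Eq. 15)

print(h.shape)
```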

5.3 Graph auto‑encoder network

The Auto-Encoder (AE) and its variants have been widely used in unsupervised learning tasks
(Vincent et al. 2008) and are well suited for learning graph node representations. The implicit
assumption is that graphs have a low-rank structure that is possibly nonlinear. A variation
of the Auto-encoder called Variational Auto-Encoder (VAE) (Kingma and Welling 2013) is
one of the most widely used methods for developing deep generative models for graph data.
The central concept behind using a VAE on graphs is to train a probabilistic decoder model
D(A ∣ ℤ) from which we may sample the adjacency matrices A by conditioning on a latent
variable Z . VAE is used to learn a conditional distribution across adjacency matrices in a
probabilistic manner, with the distribution being conditioned on some latent variable.
A VAE is trained by combining the probabilistic decoder model D(A ∣ ℤ) with a probabilistic encoder model E(ℤ ∣ G). The encoder model converts an input graph G into a posterior distribution over a latent variable ℤ. The objective is to train the encoder and the decoder together so that the decoder can reconstruct training graphs given a latent variable ℤ sampled from the encoder. After training, the encoder may be discarded, and new graphs can be constructed by sampling latent variables ℤ from some prior distribution and feeding them to the decoder. By training the AE-based models to reconstruct a set of training graphs, one can sample node embeddings from a standard normal distribution and use the decoder to produce a graph. However, these approaches have limited capacity, for example, when a simple dot-product decoder is utilized. The fundamental difficulty is that the


decoder has no parameters; therefore, the model cannot construct non-trivial graph topolo-
gies without a training graph.
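The following condensed sketch is in the spirit of the variational graph auto-encoder of Kipf and Welling (2016), though the layer sizes and toy graph are assumptions made here for illustration: a single graph-convolution step parameterizes a Gaussian posterior per node, a reparameterized sample Z is drawn, and an inner-product decoder reconstructs the adjacency matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d_in, d_z = 6, 10, 4
A = (torch.rand(n, n) < 0.3).float()
A = torch.maximum(A, A.t()); A.fill_diagonal_(1.0)          # adjacency with self-loops
deg_inv_sqrt = torch.diag(A.sum(1).pow(-0.5))
A_hat = deg_inv_sqrt @ A @ deg_inv_sqrt                     # normalized propagation matrix
X = torch.randn(n, d_in)

lin_mu, lin_logvar = nn.Linear(d_in, d_z), nn.Linear(d_in, d_z)

# Encoder E(Z | G): graph-convolved features parameterize a Gaussian posterior per node.
h = A_hat @ X
mu, logvar = lin_mu(h), lin_logvar(h)
Z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)     # reparameterization trick

# Decoder D(A | Z): inner-product decoder gives edge probabilities.
A_rec = torch.sigmoid(Z @ Z.t())

recon = F.binary_cross_entropy(A_rec, (A > 0).float())      # reconstruction term of the ELBO
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL[q(Z|X,A) || p(Z)]
loss = recon + kl
print(float(loss))
```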

5.4 Graph adversarial network

VAE models have a well-defined probabilistic foundation, and numerous publications exploit and evaluate the structure of the latent spaces learned by VAE models. However, VAEs have significant limitations, such as their tendency to produce distorted reconstructions. As a result, to overcome these limitations, many state-of-the-art graph generative approaches use generative adversarial techniques instead.
The key idea of the generative adversarial-based approach is to define a trainable genera-
tor network g_𝜃 : ℝ^d → X. Using a random seed z ∈ ℝ^d as input, the generator network is trained to create fake but realistic-looking graph data samples x′ ∈ X. Simultaneously, a discrimi-
nator network d𝜙 ∶ X → [0, 1] is also defined. The discriminator’s objective is to differentiate
between actual graph data samples x ∈ X and samples created by the generator x′ ∈ X. In this
case, we suppose that the discriminator outputs the probability that particular input is fake.
Both generator and discriminator networks are optimized in an adversarial game to train an
adversarial-based method, as shown in Eq. 16.
\min_{\theta} \max_{\phi} \; \mathbb{E}_{x \sim p_{data}(x)}\left[\log(1 - d_{\phi}(x))\right] + \mathbb{E}_{z \sim p_{seed}(z)}\left[\log d_{\phi}(g_{\theta}(z))\right] \quad (16)

One significant advantage of the adversarial-based approach is that it eliminates the need
to define a node ordering in the loss computation. The adversarial-based method does not
require any node order to be provided as long as the discriminator model is permutation
invariant, which is the case for practically every GNN. If the discriminator is permuta-
tion invariant, the ordering of the adjacency matrix formed by the generator is irrelevant.
Despite this significant advantage, adversarial-based graph creation techniques have gained
less attention and success than variational counterparts. It is most likely due to the diffi-
culty associated with the min-max optimization required by adversarial-based techniques.
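A bare-bones training loop following the convention of Eq. 16, in which the discriminator outputs the probability that its input is fake. Here a simple MLP generator emits adjacency-matrix logits and the discriminator operates on flattened adjacency matrices; in practice the discriminator would itself be a (permutation-invariant) GNN, so this sketch only illustrates the optimization structure, not a published model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d_seed = 6, 16
G = nn.Sequential(nn.Linear(d_seed, 64), nn.ReLU(), nn.Linear(64, n * n))   # generator g_theta
D = nn.Sequential(nn.Linear(n * n, 64), nn.ReLU(), nn.Linear(64, 1))        # discriminator d_phi
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

real_A = (torch.rand(32, n * n) < 0.3).float()   # a batch of "real" flattened adjacency matrices (toy data)

for step in range(5):
    z = torch.randn(32, d_seed)
    fake_A = torch.sigmoid(G(z))

    # Discriminator step: per Eq. 16, d_phi estimates the probability that the input is FAKE,
    # so it is pushed towards 1 on generated graphs and towards 0 on real graphs.
    p_fake = torch.sigmoid(D(fake_A.detach()))
    p_real = torch.sigmoid(D(real_A))
    loss_d = -(torch.log(p_fake + 1e-8).mean() + torch.log(1 - p_real + 1e-8).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: minimize log d_phi(g_theta(z)), i.e. make generated graphs look "not fake".
    loss_g = torch.log(torch.sigmoid(D(fake_A)) + 1e-8).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(float(loss_d), float(loss_g))
```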

6 GNNs in various learning paradigms

Based on the literature, we divide graph-based learning methods into five distinct training settings from the perspective of supervision: 1. Supervised Graph Learning; 2. Semi-supervised Graph Learning; 3. Self-supervised Graph Learning; 4. Unsupervised Graph Learning; 5. Few-shot and Meta Graph Learning.

6.1 Supervised graph learning

Every supervised learning method begins with a dataset D = (xi , yi ), i ∈ {1, 2, … , n}, where
xi is a d-dimensional feature vector, and yi is the corresponding label. We presume that these
data points are selected from a distribution P, so that (xi , yi ) ∼ P. In this learning paradigm,
we want each (xi , yi ) to be identically distributed and independent, which is defined as

D = (x1 , y1 ), … , (xn , yn ) ⊆ ℝd × S (17)


where n is the dataset’s size, ℝd is the d-dimensional feature space, xi is the feature vector
of the ith node, yi is the label of the ith node, and S is the space of all labels.
The aim of supervised learning may be summarized as developing a function 𝔽 such that
for every new input and output pair (x, y) sampled from P, we have 𝔽 (x) ≈ y, where 𝔽 is the
function with parameters and gives the output yi for each input vector xi:

\mathbb{F}(x) = \begin{cases} y_i & \text{if there exists } (x_i, y_i) \in D \text{ such that } x = x_i \\ 0 & \text{otherwise} \end{cases} \quad (18)
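To ground this definition in the graph setting, the sketch below trains a single-layer graph-convolution classifier with full supervision, i.e., every node carries a label y_i; the graph, features, and labels are synthetic placeholders rather than a benchmark dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d, num_classes = 20, 16, 3
A = (torch.rand(n, n) < 0.2).float()
A = torch.maximum(A, A.t()); A.fill_diagonal_(1.0)
A_hat = torch.diag(A.sum(1).pow(-0.5)) @ A @ torch.diag(A.sum(1).pow(-0.5))
X = torch.randn(n, d)                                  # feature vectors x_i
y = torch.randint(0, num_classes, (n,))                # labels y_i for every node (fully supervised)

model = nn.Linear(d, num_classes)                      # weights of a single graph-convolution layer
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(50):
    logits = model(A_hat @ X)                          # propagate features, then classify each node
    loss = F.cross_entropy(logits, y)                  # supervised loss over all labeled nodes
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```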

The prevailing graph construction techniques are unsupervised during the building phase,
i.e., unsupervised learning does not employ any specific label information. However, the
labeled samples are utilized to improve the graph created for downstream learning tasks
as a type of prior knowledge. Dhillon et al. (2010) investigate the use of labeled points to
determine the similarities between node pairs. Rohban et al. (2012) provide another super-
vised technique of graph building, demonstrating that the best solution for a neighborhood
graph is considered a subgraph of a KNN graph as long as the manifold sampling rate is
big enough.
Based on earlier research presented by Ozaki et al. (2011), a novel technique called
graph based on the informativeness of labeled instances (GBILI) is proposed by Berton
et al. (2014), which also uses the label information. GBILI achieves a reasonable clas-
sification accuracy level but also has quadratic time complexity. Furthermore, Lilian
et al. (2017) have improved the method for creating more resilient graphs by addressing
an optimization issue with the Robust graph that considers labeled instances (RGCLI)
algorithm, which is based on GBILI (Berton et al. 2014). Recently, a low-rank semi-
supervised representation by Zhuang et al. (2017) has been suggested as a novel semi-
supervised learning technique that includes labeled data in the Low-rank representation
(LRR). Taherkhani et al. (2019) published a follow-up study. The produced similarity
graph can significantly aid the subsequent label inference process by including extra
supervised information. Zhang et al. (2022) presented a novel dynamic GNN model
called Dynamic Graph Recommendation Network to extract the preference items of
the users from the dynamic users-items graph. Huang et al. (2021) proposed a knowl-
edge-aware GNN model called Knowledge-aware Coupled Graph Neural Network. This
network integrates interdependent knowledge from users and items to provide recom-
mendations. The network enables high-order user- and item-wise relation encoding by
utilizing mutual information for global graph structure awareness. The authors also
enhance the proposed network by giving it the capability to capture dynamic, multi-
typed user-item interactive patterns.
Recently, numerous research domains have begun to pay more attention to hypergraphs,
which are expressive graph structures and flexible to represent the higher-order correlations
among entities. Feng et al. (2019) proposed a hypergraph neural networks framework that
can encode high-order data correlation in a hypergraph structure to learn data representa-
tion. Sun et al. (2021) presented a heterogeneous hypergraph-based representation learning
framework for conventional graphs that can effectively characterise various non-pairwise
relations. Hu et al. (2021) presented an adaptive hypergraph Auto-encoder to generate low-
dimensional node embeddings that utilize the high-order relation to provide embeddings
for clustering.
Hyperbolic spaces have recently gained increasing popularity in graph data process-
ing with tree-like structure or power-law distribution, owing to their exponential growth
property for datasets with highly non-Euclidean latent anatomy. Liu et al. (2019) presented


a GNN model for learning representations on Riemannian manifolds with differentiable


exponential and logarithmic maps. Liu et al. (2021) introduced a multi-relational hyper-
bolic GNN-based approach for building a disease-predictive model. Chami et al. (2019)
presented an inductive hyperbolic GCN that leverages both the expressiveness of GCNs
and hyperbolic geometry to learn inductive node representations for hierarchical and
scale-free graphs. Xu et al. (2022) presented a hyperbolic GNN approach that combines hyperbolic and Euclidean geometry in contrastive GNNs, learning layer-wise optimal combinations of Euclidean and hyperbolic geometries to encode diverse and complex graph structures effectively. Sun et al. (2021) presented a hyperbolic GNN model to
learn dynamic graph representation in hyperbolic space that aims to infer stochastic node
representations.
Supervised learning has various advantages and limitations. In this learning approach,
one can easily understand the computational/learning procedures of the model. The user
has exact knowledge about the classes of the data before training. Therefore, the model
is trained to have a perfect decision boundary to classify the different classes accurately.
However, supervised learning has several limitations: 1. It does not work for some complex tasks in machine learning. 2. Unlike unsupervised learning, the trained model cannot reveal unknown information; it cannot discover or cluster the data's features on its own. 3. The classifier needs to be trained on many representative examples from each class; otherwise, the model's accuracy will be low, so a large amount of training data must be collected and handled. 4. Training a model usually takes a lot of computation time, especially for huge datasets.

6.2 Semi‑supervised graph learning

Semi-supervised learning has been around for many years. For this type of learning, p
nodes where 0 < p < n have corresponding labels Lp, and the labels Lq of the remaining
q = n − p nodes are missing. The main objective of the semi-supervised learning task is
to learn a predictive function f ∶ G, X; Lp → Lq to derive the missing labels Lq for the
unlabeled nodes. It applies to the scenario where only a few labeled samples are available,
and the rest of the data samples are unlabeled. Obtaining the actual labels of the unlabeled samples is expensive; therefore, the unlabeled samples must be utilized in novel ways to solve the problem.
The manifold assumption in this field of the study suggests that nodes closer to each
other in the low-dimensional manifold are similar and should have the same label. Over
the years, many methods have been employed to do semi-supervised learning in various
domains. Semi-supervised graph learning is an emerging subfield that is a good fit as graph
structure fits manifold assumptions of semi-supervised learning. The node represents the
data samples in this learning paradigm, and edges give the similarity between the nodes.
Nodes having large edge weights represent high similarity and belong to the same cate-
gory, following the manifold assumptions.
Graph structures are intuitive and highly expressive, leading to successful semi-super-
vised graph learning-based methods falling under the manifold assumption. Several semi-
supervised survey articles (Zhu 2005; Pise et al. 2008; Prakash and Nithya 2014) have
focused on traditional ways of dealing with semi-supervised settings. A few of the new
works, such as Ouali et al. (2020), Van Engelen and Hoos (2020), study graph construction
and graph regularization, which focus on the current general overview of semi-supervised


learning. In this Section, we focus on the advances in the semi-supervised graph learning
paradigm, which explicitly advances in graph embeddings. Table 4 shows the most recent
methods for semi-supervised graph learning.

6.2.1 Graph embedding

Embedding is generally the focus of semi-supervised graph learning tasks. Two levels
of embedding are observed in semi-supervised graph learning methods. One is com-
puted for the entire graph, while the second is for a single node (Hamilton et al. 2017).
The objective of both embedding levels is to represent the given object in a low-dimen-
sional embedding space where the local structure of the given object is preserved.
Given a graph G, the node embedding is denoted as a mapping
hz ∶ v → zv ∈ ℝd , ∀v ∈ V such that d ≪ |V|. The proximity measure for nodes in graph
G is preserved by hz . The loss function for graph embedding methods can be defined by
the generalized Eq. 19.
\mathbb{C}(f) = \sum_{(x_i, y_i) \in \mathbb{D}_l} \mathbb{C}_s\big(h(h_z(x_i)), y_i\big) + \mu \sum_{x_i \in \mathbb{D}_l + \mathbb{D}_u} \mathbb{C}_r\big(h(h_z(x_i))\big) \quad (19)

Here hz represents the embedding function. This function is similar to the one used in
graph regularization. The only difference is that models are trained using node embedding
results instead of node attributes in graph regularization. All the graph embedding meth-
ods are generalized under the framework of encoder–decoder. The encoder generates low
dimensional embeddings of input nodes, while the decoder reconstructs the original infor-
mation related to each node from the embeddings.

• Encoder: The encoder is formally defined as mapping v ∈ V nodes into zv ∈ ℝd vec-


tor embedding space. The generated embeddings in the latent space with additional
dimensions are more discriminatory. In addition, the following decoder module
can easily convert the embedding vector back to the original feature vector. From a
mathematical point of view, we have an encoder: V → ℝd .
• Decoder: The principal aim of the decoder module is to rebuild specific graph statistics
from the nodes embedded in the previous phase. For example, if zu is a node embedding
of a node u, the decoder might try to predict the neighbor N(u) or its row A[u] in the
adjacency matrix of that node. Often Decoders are defined in a pair form that is shown
to predict the similarity of each node. Mathematically, a Decoder maps ℝd × ℝd → ℝ+.
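A toy sketch combining this encoder–decoder view with the loss template of Eq. 19: the encoder is a plain embedding table (a shallow encoder), the decoder is an inner product, ℂ_s is a cross-entropy over the few labeled nodes, and ℂ_r is a structure-reconstruction term over all nodes. The sizes, data, and the particular choice of terms are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d, num_classes, mu = 12, 8, 2, 0.5
A = (torch.rand(n, n) < 0.3).float()
A = torch.maximum(A, A.t()); A.fill_diagonal_(0.0)        # toy similarity/adjacency graph

Z = nn.Embedding(n, d)                                     # encoder: one free vector z_v per node (shallow)
clf = nn.Linear(d, num_classes)                            # h(.): simple classifier on top of embeddings
labeled = torch.arange(4)                                  # only 4 nodes carry labels (semi-supervised)
y = torch.randint(0, num_classes, (4,))
opt = torch.optim.Adam(list(Z.parameters()) + list(clf.parameters()), lr=0.05)

for epoch in range(100):
    z = Z(torch.arange(n))
    sup = F.cross_entropy(clf(z[labeled]), y)              # C_s: supervised term on labeled nodes
    dec = torch.sigmoid(z @ z.t())                         # decoder: pairwise similarity from embeddings
    reg = F.binary_cross_entropy(dec, A)                   # C_r: reconstruct graph structure for all nodes
    loss = sup + mu * reg                                  # Eq. 19-style combined objective
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```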

Shallow graph embedding  Matrix factorization methods are used as a deterministic means of solving the embedding optimization problem. The task is to develop a low-dimensional approximation of a matrix S using matrix factorization techniques, where S encodes the information of the original adjacency matrix or another matrix-based proximity measure. In contrast to these deterministic factorization techniques, various methods instead optimize embeddings over stochastic measures of node neighborhoods. The essential idea is that the embeddings of two nodes are optimized to be similar whenever the two nodes co-occur with high probability on short random walks over the graph (Spitzer 2013).
Table 4  Summary of existing GNNs-based semi-supervised learning methods

| Methods | Embedding architecture | Embedding methods | Loss function | Decoder | Similarity measure |
| Roweis et al. (2000) | Shallow | Fact | Σ_p ||z_p − Σ_q A_pq z_q||² | Σ_q A_pq z_q | A_pq |
| Belkin et al. (2001) | Shallow | Fact | Dec(z_p, z_q) · S[p, q] | ||z_p − z_q||₂² | A_pq |
| Ahmed et al. (2013) | Shallow | Fact | ||Dec(z_p, z_q) − S[p, q]||₂² | z_pᵀ z_q | A_pq |
| Cao et al. (2015) | Shallow | Fact | ||Dec(z_p, z_q) − S[p, q]||₂² | z_pᵀ z_q | A_pq, …, Aᵏ_pq |
| Ou et al. (2016) | Shallow | Fact | ||Dec(z_p, z_q) − S[p, q]||₂² | z_pᵀ z_q | Similarity matrix S |
| Perozzi et al. (2014) | Shallow | RW | −S[p, q] log(Dec(z_p, z_q)) | e^(z_pᵀ z_q) / Σ_{k∈V} e^(z_pᵀ z_k) | P_G(p|q) |
| Yang et al. (2016) | Shallow | RW | 𝔼_{q_n∼P_n(V)}[log(−σ(z_pᵀ z_{q_n}))] | e^(z_pᵀ z_q) / Σ_{k∈V} e^(z_pᵀ z_k) | P_G(p|q) |
| Grover et al. (2016) | Shallow | RW | Σ_{(p,q)∈D} −log(σ(z_pᵀ z_q)) − γ 𝔼_{q_n∼P_n(V)}[log(−σ(z_pᵀ z_{q_n}))] | e^(z_pᵀ z_q) / Σ_{k∈V} e^(z_pᵀ z_k) | P_G(p|q) |
| Tang et al. (2015) | Shallow | RW | Σ_{(p,q)∈D} −log(σ(z_pᵀ z_q)) − γ 𝔼_{q_n∼P_n(V)}[log(−σ(z_pᵀ z_{q_n}))] | 1 / (1 + e^(−z_pᵀ z_q)) | P_G(p|q) |
| Tang et al. (2015) | Shallow | RW | −S[p, q] log(P_G(p|q)) | 1 / (1 + e^(−z_pᵀ z_q)) | P_G(p|q) |
| Wang et al. (2016) | Deep | AE | Σ_{p∈V} ||Dec(z_p) − s_p||₂² | MLP | s_p |
| Cao et al. (2016) | Deep | AE | Σ_{p∈V} ||Dec(z_p) − s_p||₂² | MLP | s_p |
| Taheri et al. (2019) | Deep | AE | Σ_{p∈V} ||Dec(z_p) − s_p||₂² | LSTM | s_p |
| Tu et al. (2018) | Deep | AE | Σ_{p∈V} ||z_p − Σ_{q∈N(p)} LSTM(z_q)||₂² | LSTM | s_p |
| Kipf & Welling (2016) | Deep | AE | 𝔼_{q(Z|X,A)}[log p(A|Z)] − KL[q(Z|X,A) || p(Z)] | z_pᵀ z_q | A_pq |
| Pan et al. (2019) | Deep | AE | min_G max_D 𝔼_{z∼p_z}[log D(Z)] + 𝔼_{x∼p(x)}[log(1 − D(G(X, A)))] | z_pᵀ z_q | A_pq |

Factorization A matrix that specifies the relationship between each node pair is factorized
to get the node embedding. This matrix generally includes certain basic structural information,
such as an adjacency matrix or a normalized Laplacian matrix, regarding the built similarity graph.
Different matrix characteristics lead to factorizing these matrices in different ways. The
normalized Laplacian matrix is positive semi-definite; hence, its eigenvalue decomposition is a
natural match.
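As a generic illustration of the factorization family (not the exact procedure of any one cited method), the sketch below embeds nodes using the smallest non-trivial eigenvectors of the normalized Laplacian, in the spirit of Laplacian-eigenmaps-style approaches.

```python
import numpy as np

def laplacian_embedding(A, dim):
    """Embed nodes with the `dim` smallest non-trivial eigenvectors of the normalized Laplacian."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap_norm = np.eye(A.shape[0]) - d_inv_sqrt @ A @ d_inv_sqrt   # L = I - D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(lap_norm)                   # symmetric PSD matrix -> eigh
    # Skip the trivial eigenvector (eigenvalue ~ 0) and keep the next `dim` columns.
    return eigvecs[:, 1:dim + 1]

# Toy undirected graph: a 6-node ring.
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0
Z = laplacian_embedding(A, dim=2)
print(Z.shape)   # (6, 2): one 2-dimensional embedding per node
```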
Random walk Random Walks are a valuable tool for obtaining approximate results for specific
characteristics of a graph, such as node centrality (Newman 2005) and similarity
(Fouss et al. 2007). Random-Walk-based node embedding methods are therefore especially useful
when only part of the graph is accessible, or when the graph is too large to be managed efficiently
as a whole.
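A minimal sketch of the Random-Walk idea follows: sample short walks from every node and collect co-occurring nodes as positive context pairs, which DeepWalk-style methods would then feed to a Skip-Gram objective; the walk length, window size, and helper names are illustrative assumptions.

```python
import numpy as np

def sample_walks(A, walk_length=5, walks_per_node=2, seed=0):
    """Sample uniform random walks starting from every node of adjacency matrix A."""
    rng = np.random.default_rng(seed)
    neighbors = [np.flatnonzero(A[v]) for v in range(A.shape[0])]
    walks = []
    for v in range(A.shape[0]):
        for _ in range(walks_per_node):
            walk = [v]
            while len(walk) < walk_length and len(neighbors[walk[-1]]) > 0:
                walk.append(int(rng.choice(neighbors[walk[-1]])))
            walks.append(walk)
    return walks

def context_pairs(walks, window=2):
    """Positive (node, context) pairs: nodes co-occurring within `window` steps of a walk."""
    pairs = []
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if i != j:
                    pairs.append((u, walk[j]))
    return pairs

# Toy graph; the resulting pairs would be fed to a Skip-Gram objective with negative
# sampling (as in DeepWalk or node2vec) to obtain the node embeddings.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(context_pairs(sample_walks(A))[:5])
```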
Limitations of shallow embedding Although shallow embedding methods have demonstrated
remarkable performance on various semi-supervised tasks, they also have significant
disadvantages that researchers have found difficult to overcome. Some of the limitations are
as follows.

• Lack of shared parameters: Parameters are not shared among nodes in the encoder module,
  as the encoder directly creates a unique vector for each node. The lack of shared parameters
  means the number of parameters grows as O(|V|), which is intractable for massive graphs.
• No use of node features: Another critical issue with shallow embedding methods is that they
  exclude node features, even though the nodes may carry rich feature information that the
  encoding procedure could exploit. This is particularly relevant for semi-supervised learning
  tasks, where node features provide essential signals.
• Failure in inductive applications: Shallow embedding methods are intrinsically transductive
  (Hamilton et al. 2017). It is impossible to generate embeddings for additional nodes discovered
  after the training phase. Because of this constraint, shallow embedding methods cannot be
  employed for inductive applications.

6.2.2 Deep graph embedding

In recent years, many deep embedding techniques have been developed to address the limitations
highlighted above. Deep embedding methods differ from shallow ones in that the encoder module
considers both the graph's structural and attribute information. For semi-supervised learning
tasks under the transductive setup, a top-level classifier is trained on the node embeddings to
predict class labels for unlabelled nodes.
Auto-encoder Besides using DL models, Auto-encoder based methods differ from shallow
embedding methods in that they use a unary decoder rather than a pairwise one. In Auto-encoder
based techniques, every node i is represented by a high-dimensional vector derived from a row of
the similarity matrix, specifically,

$$s_i = i\text{-th row of } S, \quad \text{where } S_{i,j} = s_G(i, j) \tag{20}$$

The Auto-encoder based techniques seek to encode each node from its associated vector s_i and
then reconstruct it from the embedding, with the requirement that the reconstruction is as similar
to the original as possible. As Eq. 20 shows, the encoder module relies on the provided s_i vector.
This enables Auto-encoder based deep embedding techniques to embed local structural information
inside the encoder, which shallow embedding methods simply cannot do.


Several recent deep embedding techniques aim to address the primary disadvantages of
shallow embedding methods by building functions that depend on a node's neighborhood. The GNN
is a generic framework for building DNNs on graph-structured data and is widely used in
state-of-the-art deep embedding methods. The fundamental concept is the node representation
vector, which depends on the graph topology and any node feature information. Unlike the
previously discussed techniques, GNNs create the necessary node embeddings from node attributes,
such as the node information in a citation network, or even basic statistics like node degree or
one-hot vectors. As with the previously mentioned deep node embedding methods, in GNN-based
models a classifier is trained, explicitly or implicitly, on top of the node embeddings created
by the final hidden state, and it can subsequently label the unlabeled nodes in semi-supervised
learning tasks. A GNN consists of two primary operations, the update operation and the aggregate
operation, and most models can be seen as revising and enhancing the basic GNN from the
perspective of one of these operations.
Semi-supervised learning combines supervised and unsupervised learning. This approach enables a
developer to train a machine using their domain-specific knowledge, and the model's performance
can be enhanced during the training phase with relatively little labeled data. It is also
significantly faster than supervised learning in terms of annotation effort and can be applied to
many different fields; in particular, it offers great benefits for automatic data labeling
pipelines. However, some limitations exist: (1) semi-supervised learning is mostly applicable to
relatively simple problems and is not appropriate for complex ones; (2) it is not advised for
tasks that still require a lot of manual labeling effort; and (3) the approach may not work if the
labeled sample is not representative of the distribution as a whole, in which case the result of
the iterations is unstable.

6.3 Self‑supervised graph learning

For a graph G = (V, E, X), V is the set of nodes, E is the set of edges between the nodes, and X
is the feature matrix of the nodes (with x_u the features of node u). The adjacency matrix is
A ∈ {0, 1}^{|V|×|V|}, where |V| is the total number of nodes and A_uv = 1 indicates that there is
an edge between nodes u and v, otherwise A_uv = 0. Therefore, we can also represent the graph as
G = (A, X). In this learning paradigm, we aim to build a self-supervised pretext task with a
corresponding loss L_slf that can be combined with the task-specific loss L_tsk to learn a GNN
f_θ that generalizes better on unlabeled data.

$$\min_{\theta} L_{tsk}(\theta, A, X, d_p) = \sum_{(v, y) \in d_p} l\big(f_{\theta}(G)_v, y\big) \tag{21}$$

where d_p = (V_l, Y_l) denotes the labeled data, in which the nodes V_l are associated with
corresponding labels Y_l, and d_q denotes the unlabeled data; θ denotes the parameters of f_θ,
and the prediction for node v is denoted by f_θ(G)_v. The function l(·,·) is the loss function
measuring the difference between the true and predicted labels.
When we have the degree of nodes, let us say the degree of a node v is represented as
D_v = Σ_{u=1}^{|V|} A_vu, then the associated loss of the self-supervised pretext task is given as:

$$L_{slf}(\theta, A, X, d_p) = \frac{1}{|d_p|} \sum_{v \in d_p} \big(f_{\theta}(G)_v - D_v\big)^2 \tag{22}$$

where θ denotes the parameters of the GNN model f_θ, and f_θ(G)_v denotes the predicted local node
property (here, the degree) of node v.
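As a sketch of how Eqs. 21 and 22 might be combined in practice, the snippet below (PyTorch) adds a degree-prediction pretext term, weighted by a trade-off coefficient, to the cross-entropy task loss; the model interface (returning class logits and a degree prediction), the index tensors, and the weight lam are assumptions of this illustration.

```python
import torch
import torch.nn.functional as F

def joint_loss(model, X, A, labeled_idx, y, lam=0.5):
    """L_tsk (cross-entropy on the labeled nodes, Eq. 21) + lam * L_slf (degree pretext, Eq. 22)."""
    class_logits, degree_pred = model(X, A)       # per-node class logits and scalar degree prediction
    l_tsk = F.cross_entropy(class_logits[labeled_idx], y)
    degree = A.sum(dim=1)                         # D_v = sum_u A_vu
    # Pretext regression loss; computed over all nodes here for simplicity.
    l_slf = F.mse_loss(degree_pred.squeeze(-1), degree)
    return l_tsk + lam * l_slf
```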
Self-supervised learning is a recent development in the advances of DL. It addresses the reliance
of semi-supervised learning on manual labels, the computationally heavy access to ground-truth
labels, overfitting, and poor performance against adversarial attacks (Liu et al. 2021).
Self-supervised learning overcomes these shortcomings by training a model to solve well-designed
“Pretext Tasks”. It learns more generalized representations from unlabelled data, which perform
better on the desired “Downstream Tasks” (You et al. 2020) such as node-, edge-, and graph-level
tasks. Table 5 shows the existing methods for graph self-supervised learning.

6.3.1 Pretext tasks

The pretext task is critical in self-supervised learning, and its construction is crucial to
the model’s performance for the downstream tasks. We divide the pretext task into the
following categories: Masked Feature Regression (MFR), Auxiliary Property Predic-
tion (APP), Same-Scale Contrasting (SSC), Cross-Scale Contrasting (CSC), and Hybrid
Self-Supervised Learning (HSL).
Masked feature regression
The Masked Feature Regression (MFR) category of pretext tasks comes from the field of computer
vision; more precisely, it is inspired by the task of image inpainting (Yu et al. 2018). The idea
behind the method is to mask the features of nodes/edges with zero or a specific value; the
pretext task is then to use a GNN to recover the original information of the masked nodes/edges.
You et al. (2020) provided a node-based MFR technique that lets a GNN extract features from the
surrounding environment information. Jin et al. (2020) and Hu et al. (2019) provided MFR
techniques for edge-level tasks. The method proposed by Jin et al. (2020) (AttributeMask) focuses
on reconstructing the attribute feature matrix for edges, which is dimensionally reduced using
Principal Component Analysis (PCA). Similarly, Hu et al. (2019) designed a method called
AttrMasking that replaces the attributes of nodes and edges with unique masks and then uses GNNs
to reconstruct the original feature matrices for both nodes and edges simultaneously.
Furthermore, Manessi and Rozza (2021) introduced new reconstruction tasks by
training a joint learning strategy for encoders. The tasks, namely reconstructing raw fea-
tures from noisy input data, raw reconstruction of features from ideal input data, and
reconstruction of feature embeddings from noisy feature embeddings, are used as pre-
text tasks for learning well-generalized representations.
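The masking-and-reconstruction recipe behind MFR can be sketched as follows; the masking ratio, the zero mask value, and the separate encoder/decoder modules are illustrative choices rather than those of any specific paper.

```python
import torch
import torch.nn.functional as F

def masked_feature_regression_loss(encoder, decoder, X, A, mask_ratio=0.15):
    """Mask a random subset of node features, encode the corrupted graph, reconstruct the originals."""
    mask = torch.rand(X.shape[0]) < mask_ratio    # nodes whose features will be hidden
    X_corrupted = X.clone()
    X_corrupted[mask] = 0.0                       # replace masked features with a constant value
    H = encoder(X_corrupted, A)                   # any GNN encoder: (X, A) -> node embeddings
    X_reconstructed = decoder(H)                  # e.g., a small MLP reconstruction head
    return F.mse_loss(X_reconstructed[mask], X[mask])   # regression loss only on the masked nodes
```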
Auxiliary property prediction
Besides the methods mentioned falling in the category of MFR, other methods
explore the underlying attribute information of nodes/edges or even the graph structure
to design new pretext tasks for providing better learning signals to the self-supervision
models. These methods, both regression and classification, fall into the category of aux-
iliary property prediction (Liu et al. 2021).
Regression-based technique The regression-based technique is similar to MFR, but its pretext
tasks focus on predicting numerical structural and attribute properties of the graph.

Table 5  Summary of existing GNNs-based self-supervised learning methods
Methods Pretext Task Category Training Scheme Technique Task Key functionality

Perozzi et al. (2014) CSSC URL Shallow NN NC Network classification and Anomaly detection
Jin et al. (2020) MFR, RAPP, RAPP PT &FT /JL GCN NC AttributeMask for establishing a pretext task by focusing on attribute
information. PairwiseAttrSim for accomplishing the similarity of a
pair of nodes to be near to the node with similar feature on a collec-
tion of representative node pairs
You et al. (2020) MFR PT &FT /JL GCN NC Improving the generality and robustness of GCNs
Manessi and Rozza (2021) MFR JL GCN NC Train GNN models in a multi-task fashion
Hu et al. (2019) MFR PT &FT GCN NC A novel strategy for pre-training GNNs on both the individual nodes
and on the entire graph
Sun et al. (2020) CAPP JL GCN NC Iteratively train an encoder architecture for assigning pseudo labels to
unlabeled nodes
You et al. (2020) CAPP PT &FT /JL GCN NC Node clustering by using pre-computed cluster index
Hu et al. (2019) CAPP PT &FT GCN NC, LP, GC A generic structural feature extraction by pretrain the GNNs
Zhu et al. (2020) CAPP URL GCN NC Maps pseudo label from to the cluster indices and then optimize the
edges of the inter-clusters
Rong et al. (2020) CAPP PT &FT NN NC, LP, GC Learning rich structural and semantic information of molecules from
enormous unlabelled molecular data
Grover and Leskovec (2016) CSSC URL Shallow NN NC Generating embeddings by exploring the distinct neighbors as context
Hamilton et al. (2017) CSSC URL SAGE# NC Low-dimensional nodes embedding for the large graphs
Tang et al. (2015) CSSC URL Shallow NN NC Solving the problem of node embedding into low-dimensional vector
spaces for arbitrary networks’ information
Kipf and Welling (2016) CSSC URL GCN LP Able to learn interpretable latent representations in the undirected
graphs
Jin et al. (2020) CSSC PT &FT /JL GCN NC Alleviating the limitation of DL (requiring larger amounts of costly
annotated data) by building domain-specific pretext tasks on
unlabeled data
Peng et al. (2020a) CSSC URL GCN NC, LP Learning of graph representations using global context prediction
Kim and Oh (2020) CSSC JL GAT​ NC Graph attention model for noisy graphs

Hwang et al. (2020) CSSC JL NN NC, LP Introducing a subordinate task to predict meta-paths by employing
node embeddings
Qiu et al. (2020) ASSC URL/ PT &FT GIN NC, GC Using Random Walk as augmentations over subgraphs and artificially
designed positional node embeddings are used as node features
Zhu et al. (2020) ASSC URL GCN NC Leveraging a contrastive objective at the node level
Zhu et al. (2021) ASSC URL GCN NC Contrastive Learning of the graph with adaptive augmentation
Zeng and Xie (2020) ASSC URL/ PT& FT/JL NN GC Alleviating overfitting in graph classification
Zhang et al. (2020) ASSC URL/JL GCN/GIN GC Iteratively performed self-distillation with graph augmentations
Jiao et al. (2020) CSC URL GCN NC Capturing the regional structure information in graph representation
learning by using the strong interaction between the central nodes
and their sampled subgraphs
Velickovic et al. (2019) CSC URL GCN NC Learning node representations within the graph-structured data
Hassani and Khasahmadi, (2020) CSC URL GCN NC,GC Graph level representation learning by maximizing the mutual infor-
mation between the nodes
Hu et al. (2019) CSC PT &FT GCN NC, GC Pretrain the GNN at both the level of individual nodes as well as
entire graphs

Mavromatis and Karypis (2020) CSC URL GCN NC Leveraging the cluster level node information for the graph represen-
tation learning
Sun et al. (2019) CSC URL NN GC Elevating the graph level tasks by augmenting the mutual information
between substructures of different levels and graph representations
Subramonian (2021) CSC URL GCN GC Finding pattern motifs iteratively and try to enhance the similarity of
the embeddings between motifs and a graph
Sun et al. (2021) CSC JL GCN GC Employing the representation of local subgraph and global graph to
discriminate the representation of the subgraph between graph
Ren et al. (2020) CSC URL GCN/GAT​ NC Maximized combination of local and global mutual information for
representation learning in heterogeneous graphs
Wang et al. (2021) CSC PT &FT Shallow NN LP Prediction task of contextual nodes in Heterogeneous Networks

Park et al. (2020) CSC URL GCN NC,LP Attributed multiplex network embedding
Cao et al. (2021) CSC URL NN NC,LP Bipartite graph embedding by maximizing the mutual information between the graphs
Opolka et al. (2019) CSC URL GCN NC Tackling the challenging task of node level regression by training
embeddings
Hu et al. (2020) CSSC PT &FT HGT NC,LP Dropped edge prediction task to a graph generation task for pre-
training the GNNs
Peng et al. (2020) Hybrid URL GCN NC,LP Maximizing feature mutual information between the raw feature of
neighboring node and node embedding
Zhang et al. (2020) CSSC PT &FT NN NC Graph structure recovery is used to pretrain a transformer based
model for graphs
Wan et al. (2020) Hybrid JL GCN/ HGCN NC Contrastive and generative graph convolutional network
Jin et al. (2021) Hybrid JL GCN/ HGCN NC, GC Automatically leveraging multiple pretext tasks effectively
Wu et al. (2021) ASSC PT &FT GCN LC Improving the accuracy and robustness of GCNs for recommendation
Bui et al. (2021) Hybrid PT &FT CNN NC, LP, GC Code Representations learning by Predicting Subtrees
Che et al. (2021) ASSC PT &FT GCN NC, LP, GC Graph representation learning
Jin et al. (2021) ASSC URL GCN NC, LP, GC Automatically leveraging multiple pretext tasks effectively
Lin et al. (2021) Hybrid URL GCN NC, LP, GC Multi-label classification of fundus images
Li et al. (2021) Hybrid URL GCN NC Ethereum Phishing Scam Detection
Choudhary et al. (2021) ASSC URL GCN NC, LP, GC Automatically leveraging multiple pretext tasks effectively
Gu et al. (2022) CAPP URL GCN GC Graph collaborative filtering model for multi-behavior recommenda-
tion that considers the discrepancies and commonalities of multiple
behaviors

NC node classification, LP link prediction, GC graph classification, CSSC context-based, ASSC augmentation-based, MFR masked feature regression, RAPP regression-based,
CAPP classification-based, PT &FT pre-training and fine-tuning, JL joint learning, URL unsupervised representation learning

A node degree-based, local structure-aware pretext task called NodeProperty was introduced by
Jin et al. (2020); global structure information is also considered in the computation.
Jin et al. (2020) proposed a method called Distance2Cluster, to
compute the distance between pre-defined clusters in the graph to unlabeled nodes. This
technique forces the node representation to consider the global positioning for training.
PairwiseAttrSim, another method proposed by Jin et al. (2020), focuses on closing the gap between
the similarity of a pair of nodes in the representation distribution and the feature similarity of
that pair. It stems from the idea of strengthening the feature transformation for local structures
while keeping over-smoothing in check.
Classification-based technique Contrary to the Regression-based approach, the meth-
ods based on classification take the task of constructing pseudo labels to help model
training. Sun et al. (2020) present a method called Multi-Stage self-supervised (M3S)
by leveraging the DeepCluster (Caron et al. 2018) to iteratively train an encoder archi-
tecture for assigning pseudo labels to unlabeled nodes during each iteration of the train-
ing process. Similarly, You et al. (2020) introduced a method (node clustering) for node
clustering by using pre-computed cluster index as self-supervised labels.
Other similar techniques include that of Hu et al. (2019), which focuses on cluster preservation,
and that of You et al. (2020), which is based on intrinsic topology properties and minimizes
connections across subsets, followed by assigning partition indices as labels for the different
subsets, similar to node clustering. Zhu et al. (2020), taking advantage of both structure- and
attribute-based clustering, presented Cluster-Aware GNN (CAGNN), which maps cluster indices to
pseudo labels and then optimizes the graph by minimizing inter-cluster edges. Gu et al. introduced
a graph collaborative filtering model for multi-behavior recommendation that considers the
discrepancies and commonalities of multiple behaviors.
Apart from the techniques based on graph clustering, property prediction of nodes or
edges is another helpful technique for self-supervision. Rong et al. (2020) presented a
method called Graph Representation from a self-supervised message passing transformer
(GROVER), which takes motif prediction and contextual properties for designing pretext
tasks. In the first task, pre-determined motif phrases are identified from graph embedding.
While in the second task, statistical properties (e.g., node-edge counts) of anchor nodes’
subgraph are predicted. Hu et al. (2019) leverage node centrality measures (e.g., eigencentrality)
to calculate node centrality score ranks; the relative rank of any two nodes is estimated to
preserve various hierarchical information in the graph.
Same-scale contrasting Unlike the two types of methods mentioned above, which operate on a single
element (e.g., a single node), methods based on Contrastive Learning learn by training on the
agreement between two graph elements. The agreement of positive pairs, i.e., samples with similar
semantic data, is maximized, while the agreement of negative pairs, i.e., samples with unrelated
semantic data, is minimized. Same-Scale Contrasting (SSC) contrasts two graph elements at the same
scale, e.g., graph-graph contrasting or node-node contrasting. SSC-based techniques are further
divided into categories based on how positive and negative pairs are defined.
Context-based The theory behind the Context-based Same-Scale Contrasting (CSSC)
gives closer locations in the embedding space to the contextual nodes. The contextual
nodes are mostly adjacent positions in the structure of the graph. The intuition behind
the idea of interconnected entities with similar semantic data is based on the Homoph-
ily hypothesis (McPherson et al. 2001). A practical method to define context is using a
Random Walk to generate sets of nodes with similar semantic data. Closer node pairs in

13
L. Waikhom, R. Patgiri

the walk are denoted as positive pairs, while those acquired using negative sampling are
denoted as negative pairs.
Perozzi et al. (2014) gave DeepWalk, a node embedding technique applicable for unat-
tributed graphs. This technique used the Skip-Gram model (Mikolov et al. 2013, 2013)
to generate a low dimensional node representation for samples of Random Walks (taken
as sequences). Grover and Leskovec (2016) proposed a technique, Node2vec, based on
DeepWalk, which uses the procedure of biased Random Walk for generating embeddings
by exploring the distinct neighbors as context. The methods mentioned above use shallow
neural networks to encode the input space of unattributed graphs. Techniques like Graph-
SAGE, proposed by Hamilton et al. (2017), used sophisticated methods for learning the
representation of attributed graphs. GraphSAGE is a novel GNN architecture designed to
get node encoding using Random Walk-based sampling mentioned above. Graph structure
could be employed directly instead of using Random Walk sampling of contextual nodes.
Nodes in k-hop distance away from a given node are considered positive samples, while
nodes farther away than k-hop are considered negative samples. This technique is presented
by Liu et al. (2021) for reconstructing the graph structure. Tang et al. (2015) presented a
large-scale information network embedding (LINE) method by taking both first- and sec-
ond-order neighbor nodes as contextual information for generating node embedding.
In EdgeMask, proposed by Jin et al. (2020), the authors use only one-hop adjacent nodes as
positive contextual information and generate negative samples by random sampling. Similarly,
Hu et al. (2019) use the same learning objective to denoise link reconstruction. Peng et al.
(2020) presented S2GRL, which constructs a decoder over pairs of nodes in a graph and learns by
simultaneously contrasting nodes from one hop up to k hops away. SuperGAT (self-supervised graph
attention network), a technique by Kim and Oh (2020) for learning correct attention weights of the
graph attention units, uses structural reconstruction of the latent output of each layer. Hwang
et al. (2020) presented a method called Self-supervised Auxiliary Learning (SELAR) for
heterogeneous graphs: it first generates meta-paths and then promotes the pretext task by
introducing a subordinate task that predicts meta-paths from node embeddings.
Augmentation-based Contrastive visual feature learning has witnessed many advances
in the last few years (He et al. 2020; Chen et al. 2020). Augmentation-based Same-Scale
Contrasting (ASSC) is also motivated by these advances, generating new augmented exam-
ples for actual data samples. Defining the data augmentation process is a crucial factor
in ASSC. Two augmented samples from actual data are considered positive pairs, while
augmented samples from different actual data are treated as negative pairs. The techniques
falling in this category are based on mutual information (MI) estimation (Hjelm 2018) and
using InfoNCE for estimation (Oord et al. 2018).
Qiu et al. (2020) presented a method called Graph Contrastive Coding (GCC) for node-
level tasks, focusing on universal unattributed graphs. This technique uses Random Walk
as augmentations over subgraphs with a restart for each node, and then artificially designed
positional node embeddings are used as node features. Zhu et al. (2020) presented a technique
called Graph Contrastive Representation Learning (GRACE) that employs two augmentation strategies,
masking node features and removing edges, to generate augmented views of the graph. For the
contrastive objective, both inter- and intra-view negative pairs are considered. Zhu et al. (2021)
presented a method called Graph Contrastive Learning with Adaptive augmentation (GCA), which
further improves GRACE by incorporating adaptive augmentation.
You et al. (2020) provided Graph Contrastive Learning (GraphCL) method for graph-
level tasks; this method considers four augmentations, namely, node dropping, attribute


masking, edge perturbation, and subgraph sampling. Chen et al. (2020) utilized a simple
framework for Contrastive Learning of visual representations (SimCLR) for contrast-
ing examples of graphs using different augmentations. Zeng and Xie (2020) augmented examples in
their method, Contrastive self-supervised learning (CSSL), by add/delete operations on nodes/edges
and examined the performance of the method under different training schemes. Graph diffusion
(Klicpera et al. 2019) over input graphs is used by
Zhang et al. (2020) in their method, Iterative graph self-distillation (IGSD) for generat-
ing augmented views. It leverages the concept of Bootstrap your own latent (BYOL) Grill
et al. (2020) for contrasting anchor graphs from others in GNNs. Graph diffusion is a use-
ful structural augmentation strategy for providing a global representation of the underlying
structure. The rich global information of such structures is encoded and further used by
self-supervised learning.
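An augmentation-based contrastive objective of the kind used in GRACE- or GraphCL-style methods can be sketched as an InfoNCE loss between the node embeddings of two augmented views; the temperature value and the (commented) augmentation/encoder calls are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE over two views: node i of view 1 and node i of view 2 form the positive pair;
    every other node of the opposite view acts as a negative."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # [N, N] cosine-similarity logits
    targets = torch.arange(z1.shape[0], device=z1.device)
    # Symmetric loss: view 1 -> view 2 and view 2 -> view 1.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Typical usage (encoder and augment are assumed components):
#   z1 = encoder(*augment(X, A)); z2 = encoder(*augment(X, A)); loss = info_nce(z1, z2)
```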
Cross-scale contrasting The Cross-Scale Contrasting (CSC) technique differs from SSC in that it
learns representations by contrasting graph elements at different scales (e.g., node-subgraph or
node-graph contrasting). The summary of the graph/subgraph is generally acquired by adopting a
readout function. These techniques also inherit the idea of mutual information maximization, as in
ASSC; Hjelm et al. (2018) proposed a method using the Jensen-Shannon divergence as a mutual
information estimator.
Velickovic et al. (2019) presented Deep Graph Infomax (DGI) to learn the representa-
tions of the nodes by maximizing the mutual information between the top-level summary
of graphs and the corresponding patch representations. DGI corrupts the original graph by randomly
shuffling its node features to obtain negative samples for each graph, while the structure of the
graph is preserved. Similarly, Hassani and Khasahmadi (2020) devised a
method called Multi-view representation learning on graphs (MVGRL) to consider multi-
view contrasting. This method takes graph diffusion and the original graph structure as two
different views. The goal is to maximize the mutual information between large-scale graphs
and cross-view representations. Jiao et al. (2020) presented another method, called a Sub-
graph Contrast (Subg-Con), to contrast the subgraphs context and the node embeddings for
learning the local structure information of the graph. Hu et al. (2019) focus on context pre-
diction to enhance the agreement between context subgraphs and nodes. Mavromatis and
Karypis (2020) proposed a method called Graph InfoClust (GIC) to maximize the mutual
information of intra-cluster with the help of a differentiable clustering layer. The agreement
between cluster centroids and nodes is represented in mutual information. Sun et al. (2019)
proposed a method, InfoGraph, to elevate the graph level tasks by maximizing the mutual
information between substructures of different levels and graph representations. Subramo-
nian (2021) proposed a method, Motif-driven Contrastive Learning of Graph representa-
tions (MICRO-Graph), by using Expectation-Maximization Clustering to find pattern
motifs iteratively. The method aims to enhance the similarity of the embeddings between
motifs and a graph. Sun et al. (2021) proposed SUGAR, a novel subgraph neural network with
reinforcement pooling and a self-supervised mutual information mechanism, which employs
representations of both local subgraphs and the global graph. Subgraph and graph representations
are pulled closer together to discriminate subgraph representations across graphs, which in turn
supports the reinforcement pooling mechanism.
Different types of graphs can benefit from the CSC techniques. Ren et al. (2020) gave
a maximized combination of local and global mutual information for representation learn-
ing in heterogeneous graphs. This method uses an attention-based combination of nodes
for local representation within a meta-path; the local representations are then summarized into a
global representation. Similarly, Wang et al. (2020) proposed a method
for the prediction task of contextual nodes, which maximizes the occurrence probability


between a randomly masked node and a context graph within the given subgraph. Park
et al. (2020) gave a method, Deep multiplex graph Infomax (DMGI), that increases the
agreements between the summary representation of each relation graph at the global graph
level and specific node embedding. Cao et al. (2021) also used local-global mutual infor-
mation in their method BiGI for bipartite graphs. Global representation in BiGi combines
two prototype representations and k hop subgraph aggregation of nodes comprised in an
edge for local representation based on attention. Opolka et al. (2019) presented a method
called Spatio-temporal deep graph Infomax (STDGI), which uses spatial-temporal graphs
extended from Contrastive Learning. The node embeddings at the t time step and raw fea-
tures at the t + k time step of the same node are maximized for agreement, and then the
relevant information is captured by the encoder for future feature predictions.
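A cross-scale (node-graph) contrast in the spirit of DGI can be sketched as follows: a mean readout summarizes the graph, a bilinear discriminator scores node-summary pairs, and feature-shuffled nodes serve as negatives; the particular corruption, readout, and discriminator forms (and the learnable matrix W) are simplifying assumptions of this sketch.

```python
import torch

def dgi_style_loss(encoder, W, X, A):
    """Contrast node embeddings against a graph-level summary (cross-scale contrasting)."""
    H = encoder(X, A)                                     # positive node embeddings
    H_neg = encoder(X[torch.randperm(X.shape[0])], A)     # corruption: shuffle node features
    s = torch.sigmoid(H.mean(dim=0))                      # readout: mean-pooled graph summary
    pos_scores = torch.sigmoid(H @ W @ s)                 # bilinear discriminator D(h_v, s)
    neg_scores = torch.sigmoid(H_neg @ W @ s)
    # Binary cross-entropy: real (node, summary) pairs versus corrupted ones.
    return -(torch.log(pos_scores + 1e-8).mean() + torch.log(1.0 - neg_scores + 1e-8).mean())
```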
Hybrid self-supervised learning
Few proposed methods combine pretext tasks from different categories in multitask
learning strategies to leverage their advantages. Hu et al. (2020) proposed the MFR-based
Generative pretraining GNN (GPT-GNN), which turns the dropped-edge prediction task (CSSC) into a
graph generation task for pretraining the GNN. Peng et al. (2020) gave the Graphical Mutual
Information (GMI) method, which jointly maximizes the feature mutual information between the raw
features of neighboring nodes and the node embedding. It also maximizes the edge mutual
information (between the node embeddings of two adjacent nodes) for learning graph representation.
Node feature reconstruction (MFR) is utilized by Zhang et al. (2020) in their proposed method,
Bert. Along with MFR, graph structure recovery (CSSC) is used to pre-train a transformer-based
model for graphs. Wan et al. (2020) took context-based SSC and augmentation as self-supervised
learning signals and learned them simultaneously with the downstream node classification task.

6.3.2 Self‑supervised training strategies

Depending on the relationship between pretext tasks, downstream tasks, and graph encod-
ers, training schemes for self-supervised learning methods are categorized into three types:
Joint Learning (JL), Pre-training, Fine-tuning (PT & FT), and Unsupervised Representa-
tion Learning (URL).
Joint learning
This scheme trains the pretext and downstream tasks jointly with the encoder. The combined loss
function takes errors from both the downstream task loss and the self-supervised learning process,
and a trade-off hyper-parameter controls each error's contribution to the total error. This can be
viewed as multitask learning, or as regularization of the downstream task by the pretext task.
Pre-training and fine-tuning In this scheme, pretext tasks along with the encoder are
pre-trained at the start. It is treated as the initialization of parameters in the encoder. Fur-
ther, the prediction head and pre-trained encoder are fine-tuned simultaneously under the
guidance of particular downstream tasks.
Unsupervised Representation Learning Like PT & FT, this scheme also performs the
pretext tasks, and the encoder is pre-trained at the start. However, the second stage differs;
the encoder parameters are locked when training the model using the downstream task.
URL is more challenging than other training schemes as encoder training is performed
without supervision.
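The difference between the three training schemes can be summarized in a small sketch; here pretext_loss and downstream_loss are assumed to be user-supplied closures returning scalar losses, and the optimizer choices are illustrative.

```python
import torch

def train_joint(params, pretext_loss, downstream_loss, lam=0.5, epochs=100):
    """Joint Learning (JL): a single optimizer minimizes downstream + lam * pretext."""
    opt = torch.optim.Adam(params)
    for _ in range(epochs):
        opt.zero_grad()
        (downstream_loss() + lam * pretext_loss()).backward()
        opt.step()

def train_pt_ft(encoder, pretext_params, task_params, pretext_loss, downstream_loss, epochs=100):
    """Pre-training & Fine-tuning (PT & FT): pretext stage first, then encoder + task head fine-tuned."""
    stage1 = torch.optim.Adam(list(encoder.parameters()) + list(pretext_params))
    for _ in range(epochs):
        stage1.zero_grad(); pretext_loss().backward(); stage1.step()
    stage2 = torch.optim.Adam(list(encoder.parameters()) + list(task_params))
    for _ in range(epochs):
        stage2.zero_grad(); downstream_loss().backward(); stage2.step()

def train_url(encoder, pretext_params, task_params, pretext_loss, downstream_loss, epochs=100):
    """Unsupervised Representation Learning (URL): like PT & FT, but the encoder is frozen in stage two."""
    stage1 = torch.optim.Adam(list(encoder.parameters()) + list(pretext_params))
    for _ in range(epochs):
        stage1.zero_grad(); pretext_loss().backward(); stage1.step()
    for p in encoder.parameters():
        p.requires_grad = False                   # lock the encoder; only the task head is trained
    stage2 = torch.optim.Adam(list(task_params))
    for _ in range(epochs):
        stage2.zero_grad(); downstream_loss().backward(); stage2.step()
```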
Self-supervised learning addresses issues persistent in the other learning settings by eliminating
the need for manual data labeling. As in supervised learning, the model is trained on a dataset
with labels; however, self-supervised learning does not need manual labeling since it produces the
labels on its own. Both self-supervised and unsupervised learning work with unlabelled datasets,
and self-supervised learning can be considered a subset of unsupervised learning. The difference
is that unsupervised learning focuses on clustering and dimensionality reduction, whereas
self-supervised learning aims to solve regression and classification tasks. Self-supervised
learning therefore reduces the cost and time of preparing data labels, and the framework brings
machine cognition closer to embedded human cognition. However, self-supervised learning has some
limitations: (1) it is crucial to choose the right pretext task for the given purpose, and
sometimes the pretext task can do more harm than good; (2) the time taken to train a model is much
greater than in supervised learning because of the multiple stages of training; and (3) existing
self-supervised learning approaches need significant data to reach accuracy comparable to their
supervised counterparts. Although the basic idea of self-supervised learning is to avoid using
labeled data, the drawback of that strategy is that one either needs a lot of data to produce
accurate pseudo labels or has to compromise on accuracy.

6.4 Unsupervised graph learning

Unsupervised learning means training the machine using unlabeled data that does not indi-
cate what it sees. For example, we have a task of binary image classification for dogs and
cats. Here, it cannot identify the difference between cats and dogs since it does not know
which photos are of them. Instead, it may learn the similarities between the images shown
to it. That does not help with image classification (this neural network will never tell you
when a picture contains a dog or a cat). However, it is helpful for a variety of other tasks.
It can advise when a fresh image is so different from what it has already seen that it is
confident it does not include any dogs or cats. It can take large images of cats or dogs and
reduce them to lists of features (such as ’pointy ears’ or ’soft’) that take up less storage
space and then expand them back to pictures. It can even generate fresh photos of cats and
dogs.
The ground truth labels and the corresponding data samples are present in supervised
and semi-supervised learning settings. There are no labeled samples in the unsupervised
setting; thus, loss functions must rely on information extracted from the graph, such as
nodes, edge attributes, and graph topology. In this part, we mostly describe unsupervised
training variations. Most graph unsupervised learning methods are based on the concepts of
Contrastive Learning, Auto-encoders, or Random Walks, as shown in Table 6.

6.4.1 Graph‑based auto‑encoders

For the graph-structured data, Kipf and Welling (2016) used the extended Auto-encoder
and called it Graph Auto-encoder (GAE). GAE (Kipf and Welling 2016) gets the initial
representation of nodes using GCNs. The training is driven by a loss that measures the similarity
between the reconstructed adjacency matrix and the original one. The learning process follows the
variational training style called Variational Graph Auto-encoder (VGAE). Wang et al. (2017) and
Park et al. (2019) attempt to reconstruct the feature matrix rather than the adjacency matrix. To
provide resilient node representations, the Marginalized Graph Auto-encoder (MGAE)
(Wang et al. 2017) uses a marginalization denoising Auto-encoder.

Table 6  Summary of existing GNNs-based unsupervised learning methods
Papers Feature extraction Technique Task Key functionality

Cao et al. (2020) GAE CNN NC Hyperspectral Classification
Yang et al. (2020) GAE GCN& GAT​ LP Prevent over smoothing of message passing
Okuda et al. (2021) RW CNN NC Discover common object and localization method for a set of particular object images
Velickovic et al. (2019) Contrastive CNN NC Maximizing the mutual information between the graph representation and node repre-
sentation
Sun et al. (2019) Contrastive K-layer GCN NC, LP, GC Graph level representations
Hassani et al. (2020) Contrastive GCN NC,GC Learning node and graph level representations
Hjelm et al. (2018) Contrastive NN NC New avenue for unsupervised learning of representations
Perozzi et al. (2014) RW NN NC Predicting the local vicinity of nodes
Tang et al. (2015) RW NN NC, LP Low dimensional node embeddings in the huge graphs
Grover & Leskovec (2016) RW NN NC, LP Learning continuous feature representations for nodes in networks
Hamilton et al. (2017) RW GCN & LSTM NC, GC Low dimensional node embeddings in the huge graphs
Ribeiro et al. (2017) RW - NC Structural identity of nodes in the networks
You et al. (2019) RW NN NC Learning node embeddings
Dong et al. (2017) RW NN NC Node Representation Learning for Heterogeneous Networks
Adhikari et al. (2018) RW NN NC Formulate subgraph embedding problem
Park et al. (2021) GAE NN NC, LP Hyperbolic Representation Learning via Message Passing Auto-encoders
Wu et al. (2021) GAE GCN LP Addressed a limitation of current methods for link prediction
Sun et al. (2021) GAE DL NC, LP Discover the dialogue structure for the task-oriented dialogue systems
Sakhuja et al. (2021) GAE DL LP Framework for learning latent layers in the heterogeneous, multi-relational, or multilayer
networks
Li et al. (2021) GAE NN NC Learning prerequisite chains in both the known and unknown domains for acquiring
knowledge
Kipf & Welling (2016) GAE GCN LP Learning the interpretable latent representations for the undirected graphs
Wang et al. (2017) GAE GCN GC Marginalized graph Auto-encoder algorithm for graph clustering
Park et al. (2019) GAE GCN NC, LP, GC Extracted low-dimensional latent representations from a graph in irregular domains

Cui et al. (2020) GAE GCN NC, LP Graph embedding for learning the vector representations from node features and graph
topology
Liu et al. (2020) SGT RP NC, LP Heterogeneous network embedding framework that automatically preserve both attribute
semantics and multi-type relations
Huang et al. (2021) RW NN NC, LP Heterogeneous information network embedding by learning for low-dimensional
representation of multi-type nodes by considering all temporal dynamics during the
evolution
Ji et al. (2021) HPN NN NC Heterogeneous graph Propagation Network to alleviate the semantic confusion
Hu et al. (2021) GAE NN Node clustering Generating low-dimensional node embeddings for heterogeneous graphs

NC node classification, LP link prediction, GC graph classification, GCN graph convolutional network, NN neural network, GAT​ graph attention network, GAE graph auto-
encoder, RW random walk, HPN heterogeneous graph propagation network

Park et al. (2019) proposed a technique called Graph Convolutional Auto-encoder using Laplacian
smoothing and sharpening (GALA); its Laplacian sharpener performs the inverse operation of
smoothing when decoding hidden states, yielding a symmetric graph Auto-encoder. This technique
addresses the issue of over-smoothing in GNN training. In contrast to the previous methods,
Adaptive Graph Encoder (AGE) (Cui et al. 2020) argues that reconstruction losses are incompatible
with downstream tasks. As a result, it uses adaptive learning to assess pairwise node similarity
and achieves state-of-the-art link prediction and node clustering performance.
The recent works on the graph Auto-encoders are: feature extraction in hyperspectral
classification by Cao et al. (2020); for preventing over smoothing of message passing by
Yang et al. (2020); a hyperbolic representation learning via Message Passing Auto-encod-
ers by Park et al. (2021); for addressing a limitation of current methods for link prediction
by Wu et al. (2021); for discovering the dialogue structure for the task-oriented dialogue
systems by Sun et al. (2021); a generative framework for learning latent layers in the het-
erogeneous, multi-relational, or multilayer networks by Sakhuja et al. (2021); for learning
prerequisite chains in both the known and unknown domains for acquiring knowledge by
Li et al. (2021). Hu et al. (2021) recently introduced a GNN model for generating low-
dimensional node embeddings for heterogeneous graphs.
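To ground the Auto-encoder recipe described above, here is a minimal sketch in the spirit of GAE (Kipf and Welling 2016): a one-layer graph-convolution encoder, an inner-product decoder over node embeddings, and a binary cross-entropy reconstruction loss on the adjacency matrix; the layer sizes and the pre-computed normalized adjacency A_norm are simplifying assumptions.

```python
import torch
import torch.nn as nn

class TinyGAE(nn.Module):
    """One-layer GCN-style encoder plus an inner-product decoder, trained to reconstruct A."""
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, emb_dim, bias=False)

    def encode(self, X, A_norm):
        return torch.relu(A_norm @ self.lin(X))           # Z = ReLU(A_hat X W)

    def decode(self, Z):
        return torch.sigmoid(Z @ Z.t())                    # reconstructed edge probabilities

def gae_loss(model, X, A, A_norm):
    # A is the (float) adjacency matrix; A_norm would typically be D^{-1/2}(A + I)D^{-1/2}.
    A_rec = model.decode(model.encode(X, A_norm))
    return nn.functional.binary_cross_entropy(A_rec, A)
```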

6.4.2 Contrastive learning

In addition to GAE, Contrastive Learning is used for graph representation learning in the
unsupervised learning setting. Velickovic et al. (2019) proposed Deep Graph Infomax
(DGI), which is extended from deep InfoMax presented by Hjelm et al. (2018). DGI maxi-
mizes the mutual information across graph and node representations.
Infograph presented by Sun et al. (2019) maximizes the mutual information between
graph level representations and subgraph level representations of various sizes, such as
nodes, edges, and triangles, to learn graph representation. The multi-view method of Hassani and
Khasahmadi (2020) contrasts a first-order adjacency matrix view with a graph diffusion view,
achieving state-of-the-art results on numerous graph learning challenges. Okuda et al. (2021) is a
recent unsupervised graph representation learning method for discovering common objects and
localizing them in a set of particular object images. Other works include Cao et al. (2020) for
hyperspectral classification and Yang et al. (2020) for preventing over-smoothing of message
passing.

6.4.3 Random walk

Random Walks have been proven to scale to large networks while capturing the graph structure
efficiently, as in the Deepwalk method proposed by Perozzi et al. (2014). Moreover, Random Walks
were demonstrated to be capable of capturing structural equivalence (nodes with comparable local
structures obtain similar embeddings) as well as homophily (nodes with similar embeddings belong
to the same communities) (Du et al. 2018). Random Walks have been coupled with current
language-modeling representation learning methods to provide high-quality node representations
that are utilized for downstream learning tasks like node and edge prediction, as shown in
Du et al. (2018) and Perozzi et al. (2014). In addition, Random Walk-based techniques have been
expanded to capture subgraph embeddings in Adhikari et al. (2018) and node representations in
heterogeneous graphs as in Dong et al. (2017).


Furthermore, matrix factorization has been proven to be fundamentally linked to Random Walk-based
techniques in Qiu et al. (2018).
Random Walks flatten graphs into sequences by conducting Random Walks across
nodes and using language models to learn node representations, as shown in Perozzi et al.
(2014), Tang et al. (2015), Grover et al. (2016), and Hamilton et al. (2017). Ribeiro et al.
(2017) show that such methods place an excessive emphasis on proximity information at the expense
of structural information. Random Walks are also restricted to transductive settings and cannot take
advantage of node features, as shown by You et al. (2019). Recently, Wu et al. (2021) pre-
sented an enhanced GNN model via auxiliary training for a Semi-supervised node clas-
sification task based on Random Walk to find the most confident nodes for each class to
add them to the labeled set. Huang et al. (2021) introduced a heterogeneous information
network embedding approach by learning for low-dimensional representation of multi-type
nodes by considering all temporal dynamics during the evolution. Another novel approach
was presented by Liu et al. (2020) which maps the units from different modalities into
the same latent space in an efficient way for heterogeneous network representation learn-
ing. The proposed approach by Liu et al. (2020) integrates the scalable spectral graph
transformation (SGT) and sparse random projection (RP) to preserve attribute semantics
and multi-type relations in learned embeddings automatically. Ji et al. (2021) introduced
a novel heterogeneous graph propagation network (HPN) in which its semantic propaga-
tion mechanism absorbs node’s local semantics and injects distinct semantics into node
embedding in node-level aggregation, whereas its semantic fusion mechanism learns the
importance of meta-paths and properly fuses them. Hu et al. (2021) presented an adaptive
auto-encoder framework for both graph-structured and hypergraph-structured data that learns node
embeddings of relational data for clustering tasks.
The unsupervised learning algorithms operate without human supervision. Without prior training
data, the model aims to group ungrouped information based on similarities and differences; in
other words, the model is supposed to discover the structure and hidden patterns in the unlabeled
data on its own. Unsupervised learning has some general advantages over other learning approaches.
In contrast to supervised algorithms, it does not require anyone to comprehend and label the input
data beforehand, which makes the approach less complex, and it is considerably simpler to add
labels once the data has been grouped. It greatly helps in finding patterns in data that are
impossible to find using standard techniques, and it makes dimensionality reduction simple. The
ability to analyze raw data makes unsupervised learning an ideal tool for data scientists;
additionally, we can determine how closely the data are related, which can be achieved using
probabilistic techniques. Despite these advantages, there are still some disadvantages: (1) due to
the lack of prior knowledge, the outcome may be less accurate, as the model learns entirely from
raw data; (2) the learning phase may take a long time because it evaluates and computes every
possibility; (3) it is difficult to validate the analysis's conclusions, since the model has no
knowledge of how many classes are present in the input data; (4) for tasks that use live data, it
might be necessary to continuously feed data to the model, leading to time-consuming and
inaccurate outcomes; and (5) the complexity of the model rises as the number of features grows.


6.5 Few-shot and meta-learning

Few-shot learning (FSL) has drawn tremendous attention due to the expensive cost of data
annotation. FSL is a deep learning paradigm in which a model automatically acquires new knowledge
and skills by learning from a limited number of samples with supervised information. Despite
GNNs' recent breakthroughs in various applications, the
FSL setting is still challenging as existing GNN models constantly need to re-learn their
parameters to incorporate the new data when new classes are encountered. If the num-
ber of samples in each new class is minimal, the performance of the models drastically
decreases. Few-shot learning methods operate under a restricted set of assumptions, which
may make them difficult to employ in practical contexts. The selection of the training set
greatly impacts achieving high performance. Furthermore, FSL benchmarks often assume
that the training set is uniformly sampled from a single distribution. In real-world applica-
tions, however, the training set is more likely to be provided progressively over time, con-
tain varying samples per class, and come from a highly correlated data source.
Therefore, the drastic decrease in model performance when samples are very few is one of the major
issues for existing GNN models in FSL. A few recent works have concentrated on tackling the
few-shot learning problem on graph data, where
models are trained either through co-training and self-training (Li et al. 2018) or by stack-
ing transposed graph convolutional layers for reconstructing input features (Zhang et al.
2018). Many related machine learning approaches have been proposed, such as meta-learn-
ing (Finn et al. 2017; Ravi and Larochelle 2016; Santoro et al. 2016), embedding learning
(Bertinetto et al. 2016; Sung et al. 2018; Vinyals et al. 2016), and generative modeling
(Edwards and Storkey 2016; Fei-Fei et al. 2006; Salakhutdinov et al. 2012) to address this
problem of FSL. Among them, meta-learning is the most common approach to address this
problem of FSL because of its fast adapting capability for new tasks and learning transfer-
able knowledge between tasks with minimal samples. Meta-learning systems are trained by
being introduced to various tasks, and their capability to learn new tasks is evaluated. In
contrast to many standard GNN models, which train on a single task and test on held-out
samples from that task, this approach uses multiple tasks simultaneously.
GNN is widely used in research to capture the intra and inter-class links in meta-learn-
ing. The work by Garcia et al. (2017) suggested constructing a complete graph network,
where each node feature is concatenated with the relevant class label. Then, update the
node features via the graph network’s attention mechanism to transmit the label informa-
tion. Unlike in Garcia and Bruna (2017), which modeled the few-shot learning as a node
classification problem to determine whether the related two nodes belong to the same class,
the work by Kim et al. (2019) modeled the few-shot learning as an edge labeling prob-
lem. Similarly, Liu et al. (2018) created a transductive setting to simultaneously spread
one-hot encoded labels across all labeled and unlabeled instances. Yang et al. (2020), and
Zhang et al. (2021) built a dual complete graph network for few-shot learning to incorpo-
rate the distribution-level relations with instance-level relations among all cases. Recently,
Yu et al. (2022) presented a novel hybrid GNN model that combines a prototype GNN
with an instance GNN. They perform as feature embedding adaptation modules for rapid
meta-learned feature embedding adaption to new tasks rather than label propagation. Liu
et al. (2022) suggested using a prototype-based approach to initialize parameters in meta-
learning and develop an effective technique for learning expressive node representations
even on heterogeneous graphs.


Fig. 4  General architecture of graph neural network

7 General architecture design of GNNs

As observed in the previous Section, various GNN approaches are given by researchers in
multiple domains for different tasks employing varying learning settings. A unified archi-
tecture design of GNNs would solidify the practical understanding of GNNs. So in this
Section, we present the architectural design of GNNs from a designer’s perspective.
Figure 4 shows the overall design workflow for creating a GNN model for a node-
level, edge-level, and graph or subgraph-level task under different learning settings. We
go through each step in depth in Sects. 7.1, 7.2, 7.3, 7.4, and 7.5. The GNN architecture
design process for a given task on a specific graph type consists of five steps:

(a) Identifying the underlying graph structure in the given application domain,
(b) Determining the type and scale of the underlying graph present in the given application
domain,
(c) Designing an appropriate loss function for the given task
(d) Creating the model using the chosen loss function, sampling strategy, and computa-
tional modules, and
(e) Applying regularization module to tackle the overfitting.

Furthermore, we cover broad design concepts and background information relevant to each
step. The design specification of these steps is covered in the subsequent Sections.

7.1 Graph structure

The design of a GNN architecture starts by determining the graph's structure in the given
application domain. Graph representation in most application domains is either structural or
non-structural. In the structural scenario, graph structures are explicit, as in physical systems,
molecules, knowledge networks, etc. In the non-structural setting, graphs are not explicit but are
implicitly present in the data. Therefore, the first step is to form the graph representation for
the specific task, whether structural or non-structural, such as generating a fully connected
“word” graph for text or a scene graph for a picture. After representing the underlying graph, the
subsequent process involves finding the best GNN architecture for this graph.

7.2 Graph type and scale

The next step after discovering the underlying graph’s structure in the application under
consideration is to determine the type and scale of the underlying graph data. Complex
graphs inherently hold more information about the nodes and edges that can help solve
the given task. The graph categories described in Sect. 2.1 are orthogonal rather than mutually
exclusive, so they can be combined, resulting, for example, in a dynamic directed heterogeneous
graph. Other graph forms tailored for specific applications, such as signed graphs and
hypergraphs, are also possible. All possible forms of graphs cannot be listed here; the main
takeaway is to examine the additional information offered by these graphs and employ it in the
design process.
In terms of scale, there is no apparent distinction between “small” and “large” graphs.
The criterion is still evolving with the advancement of computing devices, significant
improvements in GPUs, and main memory. A widely accepted standard is that if the Lapla-
cian or adjacency matrix of a graph (O(n2 ) space complexity) cannot be processed and
stored on the device memory, it is considered large scale. So, if the scale of the underlying
graph is large and does not fit in the main memory, sharding of the graph and even GNN
architecture has to be performed on multiple machines to train large graphs or large GNN
models.
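As a rough illustration of this criterion, the sketch below estimates the memory required by a dense adjacency matrix; the 16 GiB budget used in the example is an arbitrary assumption standing in for the available device memory.

def dense_adjacency_gib(num_nodes, bytes_per_entry=4):
    # Memory needed for a dense n x n adjacency matrix (O(n^2) entries), in GiB.
    return num_nodes ** 2 * bytes_per_entry / 2 ** 30

DEVICE_MEMORY_GIB = 16  # assumed device memory budget, purely illustrative

for n in (10_000, 100_000, 1_000_000):
    need = dense_adjacency_gib(n)
    verdict = "fits on device" if need < DEVICE_MEMORY_GIB else "treat as large scale"
    print(f"n={n:>9,}: {need:>10,.2f} GiB -> {verdict}")

In practice, large graphs are stored in sparse formats, so this estimate is only a conservative upper bound that motivates sampling or sharding.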

7.3 Task

After determining the variant and scale of the underlying graph, the next step is identifying the type of task involved in the given application. Based on its scope, the task is classified into the following levels.

• Node level task: The nodes are the focus at this level. Various tasks are defined at this
level, including node classification, node clustering, node generation, etc.
• Edge level task: At the edge level, usually, two tasks are defined: edge classification and
edge prediction. Edge prediction involves predicting the existence of an edge between
two nodes or subgraphs, while edge classification indicates the edge type.
• Graph level task: At the graph level, tasks such as graph classification, graph matching,
and graph generation are performed.

The hierarchy level of the task guides the choice of loss function, computational approach, data sampling strategy, and inference procedure.


7.4 Loss function

The type of task and the training setting guide the design of the loss function. As dis-
cussed in Sect. 7.3, there are typically three types of graph learning tasks. Node classifica-
tion is one of the most prominent benchmark tasks for GNN-based approaches. From the
supervision point of view, the graph learning tasks can also be divided into three distinct
training settings.

(a) In the supervised scenario, labeled data is provided for training. This setting is limited in scale because very few large graphs are labeled in full; labels are typically available only for a subset of the nodes.
(b) The semi-supervised setup provides many unlabeled nodes and a small number of labeled nodes for training. In the transductive setting, the model predicts the labels of the unlabeled nodes of the employed graph at inference time. In contrast, the inductive setting can label nodes from an unseen graph by learning from the distribution of additional unlabeled nodes. The vast majority of semi-supervised tasks are node and edge classification tasks.
(c) The unsupervised learning setting lets the model discover patterns from unlabeled data.
Unsupervised learning tasks such as node clustering are common.

Supervised and semi-supervised training with a softmax classification function and the negative log-likelihood loss is the traditional way of applying GNNs to the node classification task.

$$\mathcal{L} = \sum_{v_i \in V_{\text{training}}} -\log\big(\mathrm{softmax}(z_{v_i}, L_{v_i})\big) \tag{23}$$

where $L_{v_i} \in \mathbb{Z}^c$ is a one-hot vector that indicates the class of training node $v_i \in V_{\text{training}}$. For example, in the citation networks (McCallum et al. 2000; Giles et al. 1998; Sen et al. 2008), $L_{v_i}$ represents the topic of article $v_i$. The predicted probability that node $v_i$ belongs to its class, as computed by the softmax function, is denoted by $\mathrm{softmax}(z_{v_i}, L_{v_i})$:

$$\mathrm{softmax}(z_{v_i}, L_{v_i}) = \sum_{r=1}^{c} L_{v_i}[r] \, \frac{e^{z_{v_i}^{T} p_r}}{\sum_{q=1}^{c} e^{z_{v_i}^{T} p_q}} \tag{24}$$

This is the most common optimization setup for GNNs under supervised training, based on the loss in Eq. 23, where $p_r \in \mathbb{R}^d$, $r = 1, \ldots, c$, are the trainable parameters.
A customized loss function is designed according to the task type and the learning set-
ting. For example, the cross-entropy loss is used for the labeled nodes in the semi-super-
vised node classification training. For the link prediction task, pairwise loss functions are
generally employed, inspired by shallow embedding methods. Finally, graph- or subgraph-level tasks are often regression tasks, such as predicting molecular properties or chemical solubility. Here, real-valued loss functions such as the mean squared error or the mean absolute error are better suited.
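For concreteness, the following PyTorch sketch implements the node-classification objective of Eqs. 23 and 24 with a linear classifier over node embeddings; the embedding matrix, labels, and training mask are placeholder assumptions rather than values tied to a specific dataset.

import torch
import torch.nn.functional as F

num_nodes, dim, num_classes = 2708, 64, 7            # Cora-sized, for illustration
z = torch.randn(num_nodes, dim)                       # node embeddings from some GNN
labels = torch.randint(0, num_classes, (num_nodes,))
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[:140] = True                               # assumed set of labeled nodes

classifier = torch.nn.Linear(dim, num_classes, bias=False)  # weight rows play the role of p_r
logits = classifier(z)                                # z_v^T p_r for every class r
log_probs = F.log_softmax(logits, dim=-1)             # Eq. 24 in log space
loss = F.nll_loss(log_probs[train_mask], labels[train_mask], reduction="sum")  # Eq. 23
loss.backward()                                       # gradients w.r.t. the p_r parameters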


7.5 Message passing computation

Before training the GNN model, we have to choose certain critical components of the architecture that can affect the performance of the designed model. Each of these components offers multiple options, and making a suitable choice depends on the task and the learning setting employed; a minimal sketch combining the modules follows the list below.

1. Sampling module: Huge graphs do not fit into the main memory of computing machines, so portions of the graph must be processed one at a time while still reflecting the full graph in the GNN weight updates. This is achieved by sampling a portion (a fixed number of nodes) and recursively proceeding with the neighbors of the selected nodes. Sampling is the precursor to the aggregation and update modules; an augmentation module can be incorporated after the sampling module when appropriate.
2. Aggregation module: The propagation module transmits data across nodes, allowing the aggregated data to incorporate both feature and structural information of the graph. Convolution and recurrent operators are often employed in propagation modules for collecting neighbors' information, while skip connections are typically used to retain information from previous node representations and alleviate the problem of over-smoothing.
3. Update module: Pooling modules extract information when an abstract representation of high-level subgraphs or whole graphs is required.
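A minimal sketch of how these modules can be combined is given below; the mean-aggregation operator and the skip connection are illustrative choices and do not correspond to any single published model.

import random
import torch

def sample_neighbors(adj_list, node, k):
    # Sampling module: keep at most k neighbors of the given node.
    nbrs = adj_list[node]
    return nbrs if len(nbrs) <= k else random.sample(nbrs, k)

class MeanAggregationLayer(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_self = torch.nn.Linear(dim, dim)  # transforms the node's own state
        self.w_nbr = torch.nn.Linear(dim, dim)   # transforms the aggregated neighbors

    def forward(self, h, adj_list, k=10):
        rows = []
        for v in range(h.size(0)):
            nbrs = sample_neighbors(adj_list, v, k)
            # Aggregation module: mean of the sampled neighbors' representations.
            agg = h[nbrs].mean(dim=0) if nbrs else torch.zeros_like(h[v])
            # Update module: combine self and neighbor information; adding h[v]
            # acts as a skip connection to the previous representation.
            rows.append(torch.relu(self.w_self(h[v]) + self.w_nbr(agg)) + h[v])
        return torch.stack(rows)

h = torch.randn(4, 8)                                # 4 nodes with 8-dim features
adj_list = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
print(MeanAggregationLayer(8)(h, adj_list).shape)    # torch.Size([4, 8])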

7.6 Regularization

Regularization techniques from deep learning for popular domains such as computer
vision, natural language processing, etc., can be employed here to overcome overfit-
ting and weight normalization problems. Some regularization techniques are specific to
GNNs. The following describes some of the widely used regularization modules in the
GNN setting.

7.6.1 Dropout

Dropout, a widely used regularization technique, can also be utilized to regularize GNN models. It can be incorporated in the aggregation step, where some percentage of the messages from neighboring nodes are dropped randomly, and the idea applies to tasks at different hierarchy levels. Another strategy, edge dropout (Schlichtkrull et al. 2018), randomly removes links (or edges) from the graph by masking values in the adjacency matrix.
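A small sketch of edge dropout on a dense adjacency matrix is shown below; the dense-matrix formulation and the dropout probability are simplifying assumptions for illustration.

import torch

def edge_dropout(adj, p=0.2):
    # Zero out each existing edge independently with probability p (training only).
    keep = (torch.rand_like(adj) > p).float()
    return adj * keep

adj = (torch.rand(5, 5) < 0.5).float()   # a random dense adjacency matrix
adj.fill_diagonal_(0)
print(edge_dropout(adj, p=0.3))

For undirected graphs, the random mask would typically be symmetrized so that both directions of an edge are dropped together.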

7.6.2 Regularization and parameter sharing

Other regularization techniques such as weight initialization, weight regularization, and weight sharing are also helpful for GNN models. The weight-sharing concept from convolutional neural networks is also borrowed in message-passing neural networks.


The weight and bias matrix can be shared among the message computation of the nodes or
edges.
Usually, a GNN model combines the modules mentioned above; a general GNN architecture is shown in Fig. 4, where the sampling, aggregation, and update modules are composed into a single model. Layers often include residual connections to improve the backpropagation of the training signal in deep models and mitigate the problem of vanishing gradients.

8 Applications

Standard neural networks work on grid data, whereas GNN works on graphs. In recent
years, graphs have gained tremendous popularity because of their ability to represent real-
world problems in connected ways. GNNs have several applications in various activities
and areas. Tasks are addressed according to the task category they fall in, such as node
classification, graph classification, graph generation, network embedding, and spatial-tem-
poral graph forecasting. GNNs can also manage other basic graph tasks such as link pre-
diction (Zhang and Chen 2018), node clustering (Wang et al. 2017), and graph partitioning
(Kawamoto et al. 2019). A GNN takes graphs as input and maintains state information to capture neighborhood information. Structural data, such as a social network, has an explicit graph structure, whereas non-structured data, such as text and images, must first be transformed into a graph representation before a GNN can be employed. We outline several applications based on the given fields of research.

8.1 Natural language processing

Text categorization is one of the most popular applications of GNNs in natural language processing, and GNNs have recently gained considerable attention for this task (Yao et al. 2019; Zhang et al. 2020). Except for a recent paper, Li et al. (2020), which utilizes meta-learning and GNNs for cross-lingual sentiment classification but employs the GNN only as a tool for meta-learning, most previous work focuses on monolingual text classification. GNNs infer document labels (Kipf and Welling 2016; Hamilton et al. 2017; Veličković et al. 2017) from the interrelationships of documents or words. The biggest challenge is to bridge the semantic and syntactic gap between languages. Although natural language data is sequential, it can also have an internal graph structure such as a syntactic dependency tree. A syntactic dependency tree defines the syntactic
relationships between words in a sentence. Most available techniques look for semantic
similarities between languages and learn a language-agnostic representation for documents
written in many languages (Chen et al. 2018; Zhang et al. 2020). These include state-of-
the-art multilingual pre-trained language models (Devlin et al. 2019; Lample and Conneau
2019), which use large-scale multilingual corpora to pre-train transformer-based neural
networks.
In several tasks, the pre-trained language model techniques demonstrate improved
cross-lingual transfer capacity (Wu and Dredze 2019). However, these models do not explicitly account for syntactic differences between languages, which results in poor generalization performance on target
languages (Ahmad et al. 2019; Hu et al. 2020). On the other hand, there are often enough
unlabeled target language texts that naturally provide rich linguistic and task information.


However, only a few prior studies have taken advantage of the unlabeled data (Wan 2009;
Dong and Melo 2019). The graph-based framework Cross-Lingual Heterogeneous GCN (CLHG) of Wang et al. (2021) integrates semantic and syntactic information within and across languages. A syntactic GCN approach is proposed by Marcheggiani et al. (2017), which operates on top of a CNN/RNN sentence encoder and aggregates hidden word representations over the syntactic dependency tree of a sentence. The approach is further used by Bastings et al. (2017) for neural machine translation. To handle the semantic dependency graph of a sentence, Marcheggiani et al. (2018) use the same model as Bastings et al. (2017). Cai et al. (2020) work on graph-to-sequence learning that generates sentences from a semantic graph of abstract concepts with the same meaning (called Abstract Meaning Representation). A graph-LSTM is proposed by Song et al. (2018) to encode graph-level semantic information. Beck et al. (2018) use a gated graph sequence neural network (GGNN) (Li et al. 2015) for neural machine translation and graph-to-sequence learning. In knowledge discovery, generating a semantic or knowledge network from a sentence is quite beneficial (Johnson 2016; Chen et al. 2018).

8.2 Computer vision

GNN applications for computer vision include generating scene graphs, point cloud classi-
fication, and action identification. Recognizing semantic connections between objects makes it easier to grasp the meaning of a visual scene. Models used to produce scene graphs, such as Xu et al. (2017), Yang et al. (2018), and Li et al. (2018), aim to parse an image into a graph consisting of objects and their semantic relations. In another application, realistic images are generated from scene graphs (Johnson et al. 2018). Since natural language can be parsed into semantic graphs in which every word is an object, this is a possible way of synthesizing images from text descriptions. Light Detection and Ranging (LIDAR)
sensors can see the surrounding world by classifying and segmenting point clouds. A point
cloud is a collection of three-dimensional points created by LIDAR scans. Works such as
Wang et al. (2019), Landrieu and Simonovsky (2018), Te et al. (2018) employ Convolu-
tional GNNs to investigate the topological structure of point clouds by converting them to
k-nearest neighbor graphs or superpoint graphs. The ability to recognize human activities in videos aids in better comprehending video material from a computer perspective. In video footage, certain methods can determine the positions of human joints, and human joints are naturally connected by skeletons, forming a graph. Works such as Jain et al. (2016) and Yan et al. (2018) use STGNNs to learn human activity patterns from a time sequence of human joint positions. Furthermore, the number of GNN applications in computer vision is continuously rising, covering human-object interaction (Qi et al. 2018), few-shot image classification (Garcia and Bruna 2017; Guo et al. 2018; Liu et al. 2019), semantic segmentation (Qi et al. 2017; Yi et al. 2017), visual reasoning (Chen et al. 2018), and question answering (Narasimhan et al. 2018).

8.3 Image classification

The early document image classification algorithms used optical character recognition (OCR) to extract content information (Dengel 1993; Shin et al. 2001; Diligenti et al. 2003). Many advanced techniques have successfully emerged in recent decades, exploiting image characteristics, text features, and document layout information for document image classification. The Deep Convolutional Neural Network (DCNN) is one of the successful models providing new tools for document image classification (Kang et al. 2014), as it can extract salient and hierarchical visual feature representations and can partially mirror the hierarchical structure of a document layout. Several DCNN training techniques have been suggested and thoroughly examined for document image classification (Kang et al. 2014; Afzal et al. 2017). Variants of VGG-16 (Das et al. 2018) have been improved on the publicly accessible Tobacco datasets (Lewis et al. 2006). Word
embedding techniques (Mikolov et al. 2013) from NLP and large language models (Devlin et al. 2018) have also been used by the document comprehension and analysis community to generate contextualized embeddings for the textual information in an image. BERTGrid (Denk and Reisswig 2019) extracts information by predicting segmentation masks using both contextualized word embeddings and 2D layout coordinates. LayoutLM (Xu et al. 2020) models interactions between text and layout information using text embeddings and 2D layout positional embeddings obtained via OCR and Region of Interest (ROI) regression. Multi-modal techniques (Audebert et al. 2019; Yang et al. 2017) are tailored to document classification problems by taking into account both the image and linguistic modalities of document images.

8.4 Social networks

DeepInf by Qiu et al. (2018) incorporates user-specific features and network structures in graph convolution and attention mechanisms to predict social influence. SEAL by Zhang et al. (2018) is a link prediction framework; it extracts a local enclosing subgraph for each target link and learns generic graph-structured characteristics using a GNN. Multiple Conditional Network Embedding (MCNE) is an embedding method by Wang et al. (2019); the technique introduces a binary mask followed by an attention network, combined with a message-passing-based GNN. Liu et al. (2019) observed that a single vector representation does not suffice to embed a node, for instance, when a customer has purchased products of different kinds on an online shopping website; the single-vector representation of existing embedding methods tends to merge several node characteristics into one form. This research provides a polysemous technique for embedding multiple facets of a node. Ioannidis et al. (2019) proposed a model called Graph Recurrent Neural Network (GRNN), where nodes are involved in multi-relational situations. Yu et al. (2020) developed a CNN-based method named RCNN, built on the idea of GCNs, to rank nodes in complex communication networks and identify the critical nodes (super-spreaders).

8.5 E‑Commerce

Discovering item relationships has recently gotten a lot of attention (Wang et al. 2018;
Zhang et al. 2018; Rakesh et al. 2019; Zhao et al. 2017). The relationships among items
primarily lie in the content information of the items, such as descriptions and reviews of
the items (McAuley et al. 2015). McAuley et al. (2015) proposed Sceptre, which uses Latent Dirichlet Allocation (LDA) to learn the content features of items and fits a logistic function over them. It is further extended by He et al. (2016) using a Variational Auto-encoder (VAE) to generate noisy word clusters while avoiding overfitting. Another area of research is utilizing item images to infer visual-level relationships between items. To reveal linkages at the visual level, McAuley et al. (2015) and He et al. (2016) used item images for style matching. He et al. (2016) investigated item visual information and suggested a mixture framework to handle complicated and diverse item
relationships. Zhang et al. (2018) aggregated both the descriptions and the pictures of the
items and captured distinct characteristics for heterogeneous item relationships, taking into
account the variations across item categories.

8.6 Recommender systems

GNNs are also successfully used in recommender system problems. Graph-based recommender systems treat items and users as nodes and can make high-quality suggestions by exploiting user-user, item-item, and user-item relationships together with the content information. The key to a recommender system is determining whether an item is relevant to a user, so recommendation can be framed as a link (relation) prediction problem. The works Berg et al. (2017) and Ying et al. (2018) employ convolutional GNNs as encoders to predict missing links between users and items. Monti et al. (2017) integrate RNNs with graph convolutions to learn the mechanism by which the known ratings are generated.
Zhang et al. (2022) presented a novel dynamic GNN model called Dynamic Graph Rec-
ommendation Network to extract the preference items of the users from the dynamic
users-items graph. Huang et al. (2021) proposed a knowledge-aware GNN model called
Knowledge-aware Coupled Graph Neural Network. This network integrates interdepend-
ent knowledge from users and items to provide recommendations. The network enables
high-order user- and item-wise relation encoding by utilizing mutual information for
global graph structure awareness. The authors also enhance the proposed network by
giving it the capability to capture dynamic, multi-typed user-item interactive patterns.
Gu et al. introduced a graph collaborative filtering model for the multi-behavior recom-
mendation that considers the discrepancies and commonalities of multiple behaviors.

8.7 Chemistry

Researchers also use GNNs in chemistry to characterize compounds or molecules, where atoms are represented as nodes of a molecular graph and the chemical bonds between them are represented as edges. The primary tasks studied on molecular graphs are node classification, graph generation, and graph classification. Various works address molecular fingerprint learning (Duvenaud et al. 2015; Kearnes et al. 2016), protein interface inference (Fout 2017), molecular property prediction (Gilmer et al. 2017), and chemical compound synthesis (Li et al. 2018; Cao and Kipf 2018; You et al. 2018).
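As an illustration of how such molecular graphs can be constructed in practice, the sketch below converts a SMILES string into an atom/bond graph; RDKit is assumed here as a convenient toolkit and is not necessarily the one used by the cited works.

from rdkit import Chem

def molecule_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # Atoms become nodes, identified here by their element symbol...
    nodes = [atom.GetSymbol() for atom in mol.GetAtoms()]
    # ...and bonds become edges, annotated with the bond order.
    edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondTypeAsDouble())
             for b in mol.GetBonds()]
    return nodes, edges

nodes, edges = molecule_to_graph("CCO")  # ethanol
print(nodes)   # ['C', 'C', 'O']
print(edges)   # [(0, 1, 1.0), (1, 2, 1.0)]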
Recent works suggest that it is challenging to manually identify side effects caused
by drug-drug interactions as the interactions are infrequent (Bansal et al. 2014; Tatonetti
et al. 2012). Moreover, practically testing all the possible drug combinations is impos-
sible, and side effects in small clinical trials are not usually detected. Feng et al. (2020),
for example, utilizes a two-layer GCN to learn node embeddings and then a DNN to
predict drug-drug interaction. Zitnik et al. (2018) present a method for predicting drug-
drug interaction as a multi-relational link prediction problem on a multimodal graph
including drugs, proteins, and side-effect relationships. Rohani et al. (2020) presented a technique called Integrated Similarity-Constrained Matrix Factorization (ISCMF) for drug-drug interaction prediction. The method computes eight similarity measures, such as substructure, target, and side-effect similarities, and after selection integrates them into an integrated similarity matrix.

8.8 Drug discovery

Graph neural networks are becoming more and more useful and powerful in the field of
drug discovery (Li et al. 2020, 2021). One of the popular perspectives is the study of
the graph structure of a molecule using GNN models (Lu et al. 2019; Yang et al. 2019).
Nguyen et al. (2021) presented a GNN model to predict the drug-target affinity, which
reduces the cost and time of discovering and developing drugs. Chu et al. (2021) proposed a method on heterogeneous graphs to learn various similarity features between drugs and predict the relationships of targeted drugs. Similarly, Cheng et al. (2021) also
work on heterogeneous graphs; here, the authors presented a method for obtaining the
node and substructure information of the heterogeneous graphs using graph embedding.
The study of molecular structure in drugs has been made easier because of the develop-
ment of GNN models. Li et al. (2021) proposed a GNN-based model by integrating the
distance and angle information between the nodes to increase drug prediction perfor-
mance. Similarly, Wang et al. (2021) presented a GNN-based multi-view comparative
representation learning method for drug interaction prediction by treating each node in
the graph as a drug molecule. Liu et al. (2020) presented a method that automatically
learns the potential representations between atoms and their bonds for a drug’s chemical
structure and predicts drug response for cancer.

8.9 Physical networks

Neuroscience research has become a major research field involving structural and functional connectivity (Fornito et al. 2013) and centrality measures that characterize the importance of a region in the network (Page et al. 1999). Functional connectivity centrality has been used to show differences related to age and sex, bipolar disorder (Deng et al. 2019; Zhou et al. 2017), retinitis pigmentosa (Lin et al. 2021), diabetic optic neuropathy (Xu et al. 2020), and genotype (Wink et al. 2018). Furthermore, structural connectivity centrality has been utilized to reveal disparities between patient and control groups for prenatal alcohol exposure (Long et al. 2020), traumatic brain injury (Raizman et al. 2020), gut inflammation (Turkiewicz et al. 2021), and brain tumors (Yu et al. 2016). Neudorf et al. (2020) identified a connection between structural centrality and functional complexity, suggesting more complex functional activity in regions that incorporate information from multiple sources. However, one of the important questions still to be answered is: to what extent can structural connectivity account for variance in functional connectivity centrality measures?

8.10 Traffic

In intelligent transport systems, accurately predicting traffic speed and road volume or density in traffic networks is essential. Many works, such as Zhang et al. (2018), Li et al. (2017), and Yu et al. (2017), use spatial-temporal GNNs (STGNNs) to build models addressing various traffic network problems. The authors consider such a traffic network as a spatial-temporal graph: sensors installed on the roads act as nodes, and the distances between pairs of sensors are treated as edges. Each node carries dynamic input features, such as the average traffic speed within a time window. Another important industrial-level application of GNNs is taxi demand prediction. Yao et al. (2018) combine LINE (Tang et al. 2015) network embeddings with historical taxi demand, location, weather, and event information; an LSTM and a CNN form a joint representation for each location to forecast the number of taxi requests for that location in a given period.

8.11 Circuit design

Graph neural networks have also played a significant role in circuit design. Various GNN techniques have been explored and incorporated in circuit design in the last few years. A graph neural network (GNN) is used in Zhang et al. (2019) to create layout templates for passive components in RF circuits. For analog circuit designs, a genera-
tion technique based on a GAN (Generative Adversarial Network) is proposed by Xu
et al. (2019). In Zhu et al. (2019), a variational auto-encoder is used to learn from a
manual configuration and offer routing instructions. GCN (Graph Convolutional Net-
work) is used in the work of Kunal et al. (2020) to identify analog sub-circuits. Liu et al.
(2020) introduce an analog circuit performance model based on CNNs. GCNs and reinforcement learning automate the sizing of analog transistors (Wang et al. 2020). Mirhoseini et al. (2021) investigate the challenges in chip placement and utilize GNNs as the structural foundation for feature embeddings in RL policies and value functions that are trained end-to-end using customized reward functions. Ren et al. (2020) use GNNs for local information and circuit parasitics prediction based on the circuit's graph. GNNs also serve as the foundation of the prediction models used by Zhang et al. (2019) and Liu et al. (2021) to optimize analog circuit metrics, reflecting the recent spike of interest in using GNNs for circuit prediction models (Ren et al. 2020; Ma et al. 2019).
The use of GNNs is not restricted to the domains and tasks described above; a variety of other problems have also been investigated, such as program verification (Li et al. 2015), social influence prediction (Qiu et al. 2018), program reasoning (Allamanis et al. 2017), event detection (Nguyen and Grishman 2018), electronic health records modeling (Choi et al. 2017, 2018), and combinatorial optimization (Li et al. 2018). There are several other important applications of GNNs, such as neural machine translation, a sequence-to-sequence problem. One of the most popular uses of GNNs is the incorporation of semantic information into the neural machine translation problem; on syntax-aware neural machine translation, both the syntactic GCN (Kipf and Welling 2016) and the GGNN (Li et al. 2015) can be used. The latter restructures the syntactic dependency network by transforming the edges into extra nodes, allowing edge labels to be given as embeddings. Another task is relation extraction, the process of extracting semantic relationships from texts, which traditional systems handle as a pipeline of two independent tasks: entity recognition and relation extraction. However, current research indicates that end-to-end modeling of entities and relations is critical for good performance since relations interact directly with entity information.
Molecular fingerprints are the feature vectors that represent molecules. ML models esti-
mate the characteristics of a new molecule by learning from sample molecules with fixed-
length fingerprints as inputs. GNNs can replace previous methods that provide a fixed mol-
ecule encoding, providing differentiable fingerprints adapted to the task at hand. Protein interface prediction is a complex problem with substantial implications for drug
development. The proposed GCN-based technique learns and blends ligand and receptor
protein residue representations for pairwise categorization. Edges can be bindings between
atoms in a molecule or interactions between amino-acid residues in a protein at the molec-
ular level. Graphs can show interactions between more complicated entities like proteins,
mRNA, or metabolites on a vast scale.
Another task, Combinatorial optimization (CO), the selection of the best object from a
limited number of options, is the foundation for many specific financial, logistics, energy,
science, and hardware design applications. Graphs are used to solve the majority of Com-
binatorial optimization problems. DeepMind and Google recently employed graph nets
for two necessary subtasks in the mixed integer linear programming solver: bounding the
objective value and the joint variable. On large datasets, the NN technique outperforms
conventional solutions. Graph generation: Generative models for real-world graphs have
received a lot of attention because of their essential applications, such as modeling social
interactions, identifying novel chemical structures, and building knowledge graphs. The
GNN-based approach learns node embeddings individually for each graph and matches
them using attention methods. When compared to standard relaxation-based approaches,
this strategy performs well.

9 Dataset

Various graph-structured datasets are freely available for research and experimentation with GNNs (Leskovec and Krevl 2014). The datasets come from diverse categories, including citation networks, web graphs, social networks, co-purchase networks, bio-chemical networks, temporal networks, communication networks, autonomous systems graphs, and road networks. Table 7 summarizes the popular datasets for GNNs belonging to these categories, describing each dataset with its number of nodes, edges, and classes.

9.1 Citation network

9.1.1 Cora

The Cora dataset (McCallum et al. 2000) includes 2708 scientific papers classified into seven classes. There are 5429 links in the citation network, and each publication in the dataset is described by a 0/1-valued word vector, which indicates the absence or presence of the corresponding word in the dictionary. There are 1433 distinct words contained in the dictionary. In addition, the Citeseer
(Giles et al. 1998) collection contains 3312 scientific papers that are divided into six cat-
egories. There are 4732 links in the citation network. A 0/1-valued word vector describes
each publication in the dataset, indicating the existence or absence of the associated term
from the dictionary. There are 3703 distinct terms in the dictionary. Moreover, Pubmed
(Sen et al. 2008) includes 19,717 scientific papers about diabetes from the PubMed data-
base, which are divided into 3 categories. There are 44,338 links in the citation network. A
TF/IDF weighted word vector from a dictionary of 500 unique words is used to character-
ize each publication in the dataset. Furthermore, DBLP (Tang et al. 2008) is a network of

Table 7  Summary of available benchmark datasets for GNN experimentation and their characteristics
Category Dataset Edge Type Heterogeneity Nature Nodes Edges Classes

Citation Networks Cora (McCallum et al. 2000) → ◦ □ 2708 5429 7
Citeseer (Giles et al. 1998) → ◦ □ 3327 4732 6
PubMed (Sen et al. 2008) → ◦ ▪ 19,717 44,338 3
DBLP (Tang et al. 2008) → ◦ ▪ 29,199 133,664 4
Patents (Leskovec et al. 2005) → ◦ □ 3,774,768 16,518,948 –
HepPh (Leskovec et al. 2005) → ◦ □ 34,546 421,578 –
HepTh (Leskovec et al. 2005) → ◦ □ 27,770 352,807 –
Web graphs Cornell (Zhang et al. 2020) → ∙ □ 195 286 5
Texas (Zhang et al. 2020) → ∙ □ 187 298 5
Washington (Zhang et al. 2020) → ∙ □ 230 417 5
Wisconsin (Zhang et al. 2020) → ∙ □ 265 479 5
BerkStan (Leskovec et al. 2008) → ∙ □ 685,230 7,600,595 –
Google (Leskovec et al. 2008) → ∙ □ 875,713 5,105,039 –
NotreDame (Albert et al. 1999) → ∙ □ 325729 1497134 –
Stanford (Leskovec et al. 2008) → ∙ □ 281,903 2,312,497 –
Social Networks Karate club (Zachary 1977) ↔ ∙ □ 34 77 2
Reddit (Hamilton et al. 2017) → ∙ ▪ 232965 11606919 41
BlogCatalog (Tang and Liu 2012) → ∙ □ 10312 333983 39
Flickr (Mislove et al. 2008) ↔ ∙ ▪ 1,715,256 22,613,981 5
Facebook (McAuley and Leskovec 2012) ↔ ∙ □ 4039 88,234 –
Youtube (Yang and Leskovec 2015) ↔ ∙ □ 1,138,499 2,990,443 8,385
Co-purchase Networks Amazon Computers (Shchur et al. 2018) → ∙ ▪ 13,752 245,861 10
Amazon Photo (Shchur et al. 2018) → ∙ ▪ 7,650 119,081 8
Amazon0601 (Leskovec et al. 2007) → ∙ □ 403,394 3,387,388 –
Amazon-Meta (Leskovec et al. 2007) → ∙ □ 548,552 1,788,725 –
Coauthor CS (Shchur et al. 2018) → ◦ □ 18,333 81,894 15
Coauthor Physics (Shchur et al. 2018) → ◦ □ 34,493 247,962 5

Bio-chemical MUTAG (Debnath et al. 1991) ↔ ◦ □ 97900 202500 2


PROTEINS (Borgwardt et al. 2005) ↔ ◦ □ 39 72 2
PPI (Li and Zitnik 2021) → ∙ ▪ 56944 818716 121
NCI-1 (Zhang et al. 2020) ↔ ◦ □ 29 32 2
Temporal Networks RedditHyperlinks (Kumar et al. 2018) → ∙ ▪ 55,863 858,490 –
stackoverflow (Paranjape et al. 2017) → ∙ □ 2,601,977 63,497,050 –
mathoverflow (Paranjape et al. 2017) → ∙ □ 24,818 506,550 –
superuser (Paranjape et al. 2017) → ∙ □ 194,085 1,443,339 –
askubuntu (Paranjape et al. 2017) → ∙ □ 159,316 964,437 –
wiki-talk-temporal (Paranjape et al. 2017) → ∙ □ 1,140,149 7,833,140 –
mooc (Kumar et al. 2019) → ∙ ▪ 7,143 411,749 –
Communication Networks email-EuAll (Leskovec et al. 2010) → ∙ □ 265,214 420,045 –
Enron (Leskovec et al. 2009) ↔ ∙ □ 36692 183,831 –
wiki-Talk (Leskovec et al. 2010) → ∙ □ 2394385 5021410 –
f2f-Resistance Bai et al. (2019) → ∙ ▪ 451 3,126,993 –

Autonomous systems graphs as-733 (733 graphs) (Leskovec et al. 2005) ↔ ◦ □ 103-6,474 243-13,233 –
Skitter (Leskovec et al. 2005) ↔ ◦ □ 1,088,092 1,541,898 –
Caida (122 graphs) (Leskovec et al. 2005) → ◦ □ 1,379,917 1,921,660 –
Oregon-1 (9 graphs) (Leskovec et al. 2005) ↔ ◦ □ 1,379,917 1,921,660 –
Oregon-2 (9 graphs) (Leskovec et al. 2005) ↔ ◦ □ 1,379,917 1,921,660 –
Road Networks roadNet-CA (Leskovec et al. 2009) ↔ ◦ □ 1,965,206 2,766,607 –
roadNet-PA (Leskovec et al. 2009) ↔ ◦ □ 1,088,092 1,541,898 –
roadNet-TX (Leskovec et al. 2009) ↔ ◦ □ 1,379,917 1,921,660 –

Directed: →, Undirected: ↔, Homogeneous: ◦, Heterogeneous: ∙ , Weighted: ▪ and Unweighted: □

Table 8  Openly available source code for graph neural networking
Platforms Source Code Library Solution

GraphVite (DeepGraphLearning 2021) – CPU-GPU node embedding, visualization

Euler (Alibaba 2021) Tensorflow Node level, and graph level tasks

Paddle Graph Learning (PaddlePaddle 2021) Tensorflow Graph learning framework


Graph-Learn (Alibaba 2021) Tensorflow and Pytorch Focuses on portability and scalability

Deep Graph Library (Wang et al. 2021) Tensorflow and Pytorch Ease the DL on graph

OpenNE (Thunlp 2021) PyTorch Self-supervised/Unsupervised graph embedding


CogDL (Thudm 2021) Pytorch Node classification, graph classification, and other important tasks in the graph domain
GNN (SeongokRyu 2021) TensorFlow Molecular applications

Spektral (Danielegrattarola 2021) Tensorflow and Keras Social Networking, molecular, GAN etc

QGNN (Daiquocnguyen 2021) TensorFlow and PyTorch Provides quaternion GNNs


GraphGym (Deepmind 2021) PyTorch Parallel GNN Library


GNN-NLP (Svjan5 2021) PyTorch and TensorFlow GNN for NLP


StellarGraph (Data61 2018) TensorFlow and Keras Provides algorithms for graph classification

Autogl (Guan et al. 2021) Pytorch To conduct autoML on graph datasets and tasks easily and quickly

Jraph (Godwin et al. 2020) Jax A lightweight library for working with GNNs in jax

Pytorch geometric (Fey and Lenssen 2019) Pytorch A geometric DL extension library for PyTorch

Ktrain (Maiya 2020) Tensorflow and Keras Making DL and AI more accessible and easier to apply

PyTorch Geometric Temporal (Rozemberczki et al. 2021) Pytorch Making dynamic and temporal GNNs implementing quite easy

Deeprobust (Li et al. 2020) Pytorch An adversarial library for attack and defense methods on images and graphs
Graphein (Jamasb et al. 2020) Pytorch Provides functionality for producing a number of types of graph-based representations of proteins
Graph Nets (Wang et al. 2019) Tensorflow A deepmind’s library for building graph networks


The links are cited in the first column



citations dataset. Citation datasets are collected from the databases such as DBLP, MAG
(Microsoft Academic Graph), and others. There are 29,199 publications and 133,664 cita-
tions in the initial edition. Each article has an abstract, authors, year, location, and title.
The dataset is utilized for clustering using network and side information, analyzing citation network influence, locating the most influential articles, topic modeling, etc. The publications are categorized into four classes. Citation networks are mostly directed, homogeneous, and unweighted, but PubMed and DBLP are exceptions, as shown in Table 7. The Patents (Leskovec et al. 2005) dataset includes U.S. patents from 1/1/1963 to 12/30/1999, spanning 37 years; there are 3,774,768 nodes and 16,518,948 edges in this citation network. The HepPh and HepTh datasets (Leskovec et al. 2005) are both collected from arXiv and cover citations on high-energy physics phenomenology and theory, respectively. Both are moderately large datasets, with 34,546 and 27,770 nodes and 421,578 and 352,807 edges in HepPh and HepTh, respectively.
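For readers who wish to experiment with these citation networks, the following snippet shows one way to load Cora, assuming the PyTorch Geometric library and its Planetoid loader; the root directory is an arbitrary placeholder.

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="/tmp/Planetoid", name="Cora")  # downloads on first use
data = dataset[0]                                        # the single citation graph
print(data.num_nodes, data.num_edges, dataset.num_classes, dataset.num_features)
# Boolean masks for the standard semi-supervised split come with the dataset.
print(int(data.train_mask.sum()), int(data.val_mask.sum()), int(data.test_mask.sum()))

Note that PyTorch Geometric stores each undirected link as two directed edges, so the reported edge count differs from the undirected figures quoted above.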

9.2 Web graphs

9.2.1 Cornell dataset

Zhang et al. (2020) is a citation network of web pages obtained from Cornell Univer-
sity, with nodes representing webpages and edges representing connections or webpage
accesses. The underlying objects are corresponding persons who can visit particular web
pages to get course information or entertainment news. If two people are friends or class-
mates, they can share similar interests and online webpage accesses, explaining the net-
works’ closed triangles. The dataset consists of 195 nodes and 286 edges and is classified
into five different classes. The Texas dataset (Zhang et al. 2020) is also a citation network of webpages obtained from Texas University, with nodes representing webpages and edges representing connections or webpage accesses; it consists of 187 nodes and 298 edges categorized into five classes. Another similar citation network of webpages is the Washington dataset (Zhang et al. 2020), collected from Washington University, with nodes representing web pages and edges representing connections or webpage accesses. It consists of 230 nodes and 417 edges categorized into five classes.
One more similar citation network of webpages is the Wisconsin dataset (Zhang et al. 2020), collected from Wisconsin University, with nodes representing webpages and edges representing connections or webpage accesses. It consists of 265 nodes, 479 edges, and 1703 node features, and is categorized into five classes. Web graph networks are usually directed and heterogeneous, and Table 7 shows that the web graph datasets are unweighted. Several other datasets, such as BerkStan, Google, and Stanford (Leskovec et al. 2008) and NotreDame (Albert et al. 1999), are webpage link datasets collected from various domains; these datasets contain a huge number of nodes and edges.

9.3 Social networks

Karate club (Zachary 1977) dataset is a social network of a karate club that was observed
for three years from 1970 to 1972. The network includes 34 karate club members, with
relationships between pairs of members who interacted outside of the club. The dataset
includes 77 edges (connections between the pairs of members). The data are shortened into
a list of integer pairs. Each number symbolizes a karate club member, and a pair denotes

13
L. Waikhom, R. Patgiri

the interaction between the two. The dataset is classified into two classes. Reddit (Hamilton
et al. 2017) dataset is also one of the social network datasets that include Reddit posts from
September 2014. In this scenario, the node label is the community, or subreddit, to which
a post belongs. A post-to-post graph was created by sampling 50 major communities and
connecting posts where the same person commented on both. There are 232,965 postings
in this dataset, with an average degree of 492. The dataset has 232965 nodes and 11606919
edges and is divided into 41 classes. One important dataset for the social network is Blog-
Catalog (Tang and Liu 2012) dataset, which is a social network of bloggers from the Blog-
catalog website, which manages the bloggers and their blogs. This dataset has 10,312 blog-
gers with unique ids starting from 1 to 10,312 and 333,983 friendship pairs. Each blogger
belongs to multiple groups. There are 39 groups with indices ranging from 1 to 39. Flickr
(Mislove et al. 2008) dataset is a sparse social network of Flickr collected in a specific
time interval, where persons represent the nodes, the friend status represents the edges, and
labels represent user interests. The dataset has 1,715,256 nodes and 22,613,981 edges, and
the dataset is classified into five classes. Facebook (McAuley and Leskovec 2012) dataset
is a undirected graph along with Flickr, it contains friend list of user as nodes and connec-
tion between them as edges. It has 4,039 nodes and 88,234 edges between them. Youtube
(Yang and Leskovec 2015) dataset is also a popular video-based social network dataset
where users can friend each other, and users can also create groups in which other users
can join. Here, the genre interests indicate the labels. The dataset consists of 1,128,499
nodes and 2,990,443 edges.

9.4 Co‑purchase networks

The Amazon Computers (Shchur et al. 2018) dataset is an Amazon-created network of co-purchase connections, where nodes represent items and edges connect goods that are often purchased together. Each node is labeled with its category and includes a sparse bag-of-words fea-
ture encoding product reviews. The dataset includes 13,752 nodes (items), 245,861 edges
(items that are purchased together), and ten classes (product categories). Another Co-pur-
chase network dataset is Amazon Photo (Shchur et al. 2018) dataset, which Amazon also
creates. Similar to the Amazon Computers dataset, The Amazon Photo dataset is also a
network created of co-purchase connections, where nodes represent items. The edge rep-
resents the purchased items together. Each node is labeled with its category and includes
a sparse bag-of-words feature encoding product reviews. The dataset includes 7650 nodes
(items), 119,081 edges (items that are purchased together), and eight classes (product cat-
egories). Amazon0601 (Leskovec et al. 2007) and Amazon-Meta (Leskovec et al. 2007) are
also the popular amazon product co-purchasing network collected from June 1 2003. The
Amazon0601 dataset contains 403,394 nodes and 3,387,388 edges, and the Amazon-Meta
dataset contains 548,552 nodes and 1,788,725 edges. Co-author CS (Shchur et al. 2018)
dataset is an academic network that contains graphs made up of co-authorship based on
the Microsoft Academic Graph, which is collected from the KDD Cup 2016 competition.
The graph nodes represent authors, and the edges represent co-authorship connections.
Two nodes are considered connected if the nodes co-authored a publication. Each node
contains a sparse bag-of-words feature that is based on the authors’ paper keywords. The
authors’ label refers to their most active research area. The dataset contains 18,333 nodes,
81,894 edges, and 15 label/classes. Another important Co-purchase network is Co-author
Physics (Shchur et al. 2018) dataset containing co-authorship graphs based on the Micro-
soft Academic Graph from the KDD Cup 2016 competition. The nodes in these graphs


represent authors, and the edges represent co-authorship connections; two nodes are con-
nected if the nodes co-authored a publication. Each node contains a sparse bag-of-words
feature that is based on the authors’ paper keywords. The authors’ label refers to their most
active research area. The dataset includes 34,493 nodes, 247,962 edges, and 5 label/classes.

9.5 Bio‑chemical

The MUTAG dataset (Debnath et al. 1991) is obtained by collecting nitroaromatic compounds tested for mutagenicity on Salmonella typhimurium. The dataset is created for the prediction
task. Here, input graphs are used to represent chemical compounds, with nodes represent-
ing atoms and edges between nodes representing bonds between the corresponding atoms
(encoded using one-hot encoding). It contains 97900 nodes and 202500 edges and is clas-
sified into two classes. PROTEIN (Borgwardt et al. 2005) dataset is a collection of proteins
categorized as either enzymes or non-enzymes. Nodes indicate amino acids, and if two
nodes are less than 6 Angstroms away, the nodes are connected by an edge. The dataset
consists of 39 nodes and 72 edges and is classified into two classes. Protein–Protein Inter-
actions (PPI) (Li and Zitnik 2021) dataset contains yeast protein interactions which are
collected from the Molecular Signatures Database. In various PPI graphs, each graph cor-
responds to different human tissue. The positional gene sets represent nodes (56944), the
gene ontology sets are given as classes or labels (121 in total), and interactions between the
proteins represent edges (818716). NCI1 (Zhang et al. 2020) is also a dataset that originates
from the chemical field, in which each input graph represents a chemical compound, each
node is a molecular atom, and its edges indicate links between atoms. This data relates to
anti-cancer screenings, in which chemicals are identified as positive or negative for lung
cancer cells. Each node has an input label representing the associated atom type, encoded
into a vector of 0/1 elements using a one-hot encoding method. The dataset contains 29
nodes and 32 edges and is categorized into two classes.

9.6 Temporal networks

RedditHyperlinks (Kumar et al. 2018), stackoverflow (Paranjape et al. 2017), mathoverflow (Paranjape et al. 2017), superuser (Paranjape et al. 2017), askubuntu (Paranjape et al.
2017), wiki-talk-temporal (Paranjape et al. 2017), and mooc (Kumar et al. 2018) are the
popular temporal networks. The RedditHyperlinks is a network of Hyperlinks between
Reddit subreddits, consisting of 55,863 nodes and 858,490 edges. The stackoverflow is a
network of Comments, questions, and answers on Stack Overflow. The mathoverflow is
a network for Comments, questions, and answers on Math Overflow consisting of 24,818
nodes and 506,550 edges. The superuser is a temporal network for Comments, questions,
and answers on Super User containing 194,085 nodes and 1,443,339 edges. The askubuntu
is a network for Comments, questions, and answers on Ask Ubuntu, consisting of 159,316
nodes and 964,437 edges. The wiki-talk-temporal is a temporal network of Users editing
talk pages on Wikipedia, containing 1,140,149 nodes and 7,833,140 edges. The mooc is
a network of student actions on a MOOC platform with student drop-out binary labels; it
consists of 7,143 nodes and 411,749 edges.


9.7 Communication networks

The commonly used communication network datasets are email-EuAll (Leskovec et al. 2010), Enron (Leskovec et al. 2009), wiki-Talk (Leskovec et al. 2010), and f2f-Resistance (Bai et al. 2019). The email-EuAll communication network is the email network of an EU research institution, consisting of 265,214 nodes and 420,045 edges. The Enron dataset is the email communication network from Enron, consisting of 36,692 nodes and 183,831 edges. The wiki-Talk dataset is the Wikipedia talk communication network, with 2,394,385 nodes and 5,021,410 edges. The f2f-Resistance dataset is a dynamic face-to-face interaction network between groups of people, consisting of 451 nodes and 3,126,993 edges.

9.8 Autonomous systems graphs

as-733 (Leskovec et al. 2005), Skitter (Leskovec et al. 2005), Caida (Leskovec et al. 2005),
Oregon-1 (Leskovec et al. 2005), and Oregon-2 (Leskovec et al. 2005) are the commonly
used available datasets for autonomous systems graphs. The as-733 dataset contains 733 daily instances (graphs) of the autonomous systems graph, collected from November 8, 1997 to January 2, 2000, with 103-6,474 nodes and 243-13,233 edges per graph. The Skitter is the autonomous systems
internet topology graph, collected from trace routes run daily in 2005. The dataset con-
sists of 1,088,092 nodes and 1,541,898 edges. The Caida is the CAIDA autonomous rela-
tionships dataset, collected from January 2004 to November 2007. The dataset consists of
1,379,917 nodes and 1,921,660 edges. The Oregon-1 is the autonomous system peering
information inferred from Oregon route views between March 31 and May 26, 2001. The
dataset consists of 1,379,917 nodes and 1,921,660 edges. The Oregon-2 is the autonomous
system peering information inferred from Oregon route views between March 31 and May
26, 2001. The dataset consists of 1,379,917 nodes and 1,921,660 edges.

9.9 Road Networks

roadNet-CA (Leskovec et al. 2009), roadNet-PA (Leskovec et al. 2009), and roadNet-TX (Leskovec et al. 2009) are the three commonly used road network datasets. roadNet-CA is collected from the road network of California and consists of 1,965,206 nodes and 2,766,607 edges. roadNet-PA is the road network of Pennsylvania, consisting of 1,088,092 nodes and 1,541,898 edges. roadNet-TX is the road network of Texas, consisting of 1,379,917 nodes and 1,921,660 edges.

10 Open challenges

Although GNNs have succeeded in several domains, they still cannot provide satisfactory solutions for graph-structured data in several situations. The following open problems need to be investigated further.


10.1 Complex graph structures

In real-world applications, graph topologies are both flexible and complicated. Many
studies are offered to deal with complex graph structures such as heterogeneous graphs
or dynamic graphs. With the fast expansion of social networks on the Internet, new
issues, difficulties, and application scenarios will undoubtedly emerge, demanding
more powerful models. While graphs are a typical abstraction for modeling complicated systems, the abstraction is sometimes too basic for dynamic, time-changing real-world systems, and it is often the temporal behavior of a system that provides critical insights. Despite recent advances, developing GNN models capable of coping with continuous-time graphs represented as a stream of node- or edge-wise events remains an open subject of study.

10.2 Model depth

Deep neural architectures are the key to successful deep learning-based approaches to graphs. However, the performance of a ConvGNN can decline drastically when many graph convolutional layers are used. Graph convolutions drive neighboring nodes' representations closer to each other, so in principle, with an unlimited number of graph convolutional layers, all node embeddings converge to the same value. This raises the question of whether going deep is still a suitable strategy for learning from graph data.

10.3 Scalability

Scalability is a significant limitation for industrial applications that often deal with large
graphs. Twitter has millions of nodes with billions of edges and low latency restric-
tions. Many models presented in the literature are utterly inappropriate for large-scale
contexts, and the academic research community has virtually completely disregarded
this aspect until recently. Furthermore, graphics processing unit (GPU) hardware is not always the best choice for graph-structured data; in the long term, graph-specific hardware may be required.

10.4 Higher‑order structures

Higher-order structures such as motifs, graphlets, and simplicial complexes are essential in complex networks, for example for characterizing protein–protein interactions in biological applications. GNNs, on the other hand, are mostly confined to nodes and edges. Including such higher-order components in the message-passing mechanism could give graph-based models additional expressive capability.

10.5 Robustness and guaranteed performance

Another important and mostly new research topic is the robustness of GNNs when the data are noisy or exposed to adversarial attacks. As a family of neural network models, GNNs are equally subject to adversarial attacks. In contrast to adversarial attacks on images or text, which focus solely on features, graph attacks also consider structural information. Several works have been proposed to attack existing graph models, and more robust models have been proposed in defense.

10.6 Stable and permutation equivariant GNN layers

A GNN layer should be stable and permutation equivariant. If these two conditions are satisfied, the GNN layer is guaranteed to be transferable and to have improved generalization capability. Permutation equivariance encapsulates the essential inductive bias of several graph learning problems and indicates that model predictions should not depend on how one indexes the nodes. If two graphs perfectly match each other, a permutation equivariant layer predicts the same thing over a new testing graph as it does over a training graph. Stability is a more important requirement than permutation equivariance because it describes the expected difference between the predictions on two graphs when they do not match exactly.
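The following small numerical sketch illustrates what permutation equivariance means in practice for a simple normalized mean-aggregation layer; the layer itself is an illustrative construction, not a specific model from the literature.

import torch

def layer(adj, h, w):
    # Normalized mean aggregation followed by a shared linear map and ReLU.
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    return torch.relu(((adj @ h) / deg) @ w)

n, d = 6, 4
adj = (torch.rand(n, n) < 0.4).float()
adj = ((adj + adj.t()) > 0).float()               # symmetrize: undirected graph
h = torch.randn(n, d)
w = torch.randn(d, d)

perm = torch.randperm(n)
p = torch.eye(n)[perm]                             # permutation matrix P
out_then_perm = p @ layer(adj, h, w)               # permute the output of f(A, H)
perm_then_out = layer(p @ adj @ p.t(), p @ h, w)   # f(P A P^T, P H)
print(torch.allclose(out_then_perm, perm_then_out, atol=1e-5))  # True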

11 Conclusion

This article provides a comprehensive literature survey of graph neural network methods,
their applications, and open challenges. Background knowledge to understand the studied
domain is presented at the beginning. Theoretical and empirical aspects of GNNs are pre-
sented to provide an in-depth understanding and evolution of the theory underlying GNNs.
Architectural components present in most practical GNN-based approaches are elaborated
to give an overview of the real-world employability of GNN-based solutions to various
problems. Further, we classify the GNNs-based approaches given by researchers in the
past for various problems based on the learning paradigms the approach has employed,
such as supervised, unsupervised, semi-supervised, self-supervised, and few-shot or meta-
learning. Each learning paradigm is introduced, and methods falling under that category
are surveyed and summarized in tables. The approaches for each learning task are analyzed
from theoretical and empirical standpoints. We also present a general design guideline
for designing GNN architectures for different hierarchical levels of tasks on graph data.
Popular datasets and useful applications of GNNs are also presented to give an overview of the various fields employing GNN-based solutions. Finally, we have presented open challenges and future research avenues in the GNN domain for researchers to work on.

References
Abadal S, Jain A, Guirado R, López-Alonso J, Alarcón E (2021) Computing graph neural networks: a sur-
vey from algorithms to accelerators
Adhikari B, Zhang Y, Ramakrishnan N, Prakash BA (2018) Sub2vec: feature learning for subgraphs. In:
Pacific-Asia conference on knowledge discovery and data mining. Springer, Melbourne, Australia. pp
170–182
Afzal MZ, Kölsch A, Ahmed S, Liwicki M (2017) Cutting the error by half: Investigation of very deep
CNN and advanced training strategies for document image classification. In: 2017 14th IAPR Inter-
national Conference on Document Analysis and Recognition (ICDAR), vol 1, pp 883–888. IEEE,
Kyoto, Japan. IEEE

13
A survey of graph neural networks in various learning paradigms:…

Ahmed A, Shervashidze N, Narayanamurthy S, Josifovski V, Smola AJ (2013) Distributed large-scale natu-


ral graph factorization. In: Proceedings of the 22nd international conference on World Wide Web, pp
37–48. Association for Computing Machinery, New York, NY, USA
Ahmad W, Zhang Z, Ma X, Hovy E, Chang K-W, Peng N (2019) On difficulties of cross-lingual transfer
with order differences: a case study on dependency parsing. In: Proceedings of the 2019 conference of
the north american chapter of the association for computational linguistics: human language technolo-
gies, volume 1 (Long and Short Papers), pp. 2440–2452. Association for Computational Linguistics,
Minneapolis, Minnesota
Albert R, Jeong H, Barabási A-L (1999) Diameter of the world-wide web. Nature 401(6749):130–131
Alibaba: Euler (2021a). https://​github.​com/​aliba​ba/​euler
Alibaba: Graph-learn (2021b) https://​github.​com/​aliba​ba/​graph-​learn
Allamanis M, Brockschmidt M, Khademi M (2017) Learning to represent programs with graphs
Audebert N, Herold C, Slimani K, Vidal C (2019) Multimodal deep networks for text and image-based
document classification
Bai C, Kumar S, Leskovec J, Metzger M, Nunamaker J, Subrahmanian V (2019) Predicting the visual focus
of attention in multi-person discussion videos. In: IJCAI 2019, pp 4504–4510. International Joint
Conferences on Artificial Intelligence
Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R et al (2014)
A community computational challenge to predict the activity of pairs of compounds. Nat Biotechnol
32(12):1213–1222
Barceló P, Kostylev EV, Monet M, Pérez J, Reutter J, Silva JP (2019) The logical expressiveness of graph
neural networks. In: International conference on learning representations, pp 1–21. ICLR, Ethiopia
Bastings J, Titov I, Aziz W, Marcheggiani D, Sima’an K (2017) Graph convolutional encoders for syntax-
aware neural machine translation
Battaglia PW, Pascanu R, Lai M, Rezende D, Kavukcuoglu K (2016) Interaction networks for learning
about objects, relations and physics
Beck D, Haffari G, Cohn T (2018) Graph-to-sequence learning using gated graph neural networks
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In:
Proceedings of the 14th International conference on neural information processing systems: natural
and synthetic. NIPS’01, pp 585–591. MIT Press, Cambridge, MA, USA
Berg Rvd, Kipf TN, Welling M (2017) Graph convolutional matrix completion
Bertinetto L, Henriques JF, Valmadre J, Torr P, Vedaldi A (2016) Learning feed-forward one-shot learners.
Advances in neural information processing systems 29
Berton L, De Andrade Lopes A (2014) Graph construction based on labeled instances for semi-supervised
learning. In: 2014 22nd international conference on pattern recognition, pp 2477–2482. IEEE, Stock-
holm, Sweden. https://​doi.​org/​10.​1109/​ICPR.​2014.​428
Berton L, de Paulo FT, Valejo A, Valverde-Rebaza J, de Andrade LA (2017) Rgcli: Robust graph that con-
siders labeled instances for semi-supervised learning. Neurocomputing 226:238–248
Borgwardt KM, Ong CS, Schönauer S, Vishwanathan S, Smola AJ, Kriegel H-P (2005) Protein function
prediction via graph kernels. Bioinformatics 21:47–56
Bruna J, Zaremba W, Szlam A, LeCun Y (2013) Spectral networks and locally connected networks on
graphs. arXiv:​1312.​6203
Bui ND, Yu Y, Jiang L (2021) Infercode: Self-supervised learning of code representations by predicting
subtrees. In: 2021 IEEE/ACM 43rd international conference on software engineering (ICSE), pp
1186–1197. IEEE, Madrid, ES. IEEE
Cai D, Lam W (2020) Graph transformer for graph-to-sequence learning. Proc AAAI Conf Artif Intell
34:7464–7471
Cai H, Zheng VW, Chang KC-C (2017) Active learning for graph embedding
Cao S, Lu W, Xu Q (2015) Grarep: Learning graph representations with global structural information.
In: Proceedings of the 24th ACM International on Conference on Information and Knowledge
Management. CIKM ’15, pp. 891–900. Association for Computing Machinery, New York, NY,
USA. https://​doi.​org/​10.​1145/​28064​16.​28065​12
Cao S, Lu W, Xu Q (2016) Deep neural networks for learning graph representations. In: Proceedings of
the Thirtieth AAAI Conference on Artificial Intelligence. AAAI’16, pp. 1145–1152. AAAI Press,
Phoenix, Arizona
Cao Z, Li X, Zhao L (2020) Unsupervised Feature Learning by Autoencoder and Prototypical Contras-
tive Learning for Hyperspectral Classification
Cao J, Lin X, Guo S, Liu L, Liu T, Wang B (2021) Bipartite graph embedding via mutual informa-
tion maximization. In: Proceedings of the 14th ACM International Conference on Web Search and

13
L. Waikhom, R. Patgiri

Data Mining. WSDM ’21, pp. 635–643. Association for Computing Machinery, New York, NY,
USA. https://​doi.​org/​10.​1145/​34379​63.​34417​83
Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual
features. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision - ECCV 2018.
Springer, Cham, pp 139–156
Chami I, Ying Z, Ré C, Leskovec J (2019) Hyperbolic graph convolutional neural networks. Advances in
neural information processing systems 32
Che F, Yang G, Zhang D, Tao J, Liu T (2021) Self-supervised graph representation learning via boot-
strapping. Neurocomputing 456:88–96
Chen C, Wang H-L, Wu S-H, Huang H, Zou J-L, Chen J, Jiang T-Z, Zhou Y, Wang G-H (2015) Abnor-
mal degree centrality of bilateral putamen and left superior frontal gyrus in schizophrenia with
auditory hallucinations: a resting-state functional magnetic resonance imaging study. Chin Med J
128(23):3178
Chen X, Sun Y, Athiwaratkun B, Cardie C, Weinberger K (2018a) Adversarial deep averaging networks
for cross-lingual sentiment classification. Trans Assoc Comput Linguist 6:557–570
Chen X, Li L-J, Fei-Fei L, Gupta A (2018b) Iterative visual reasoning beyond convolutions. In: Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7239–7248. IEEE,
Salt Lake City, UT, USA
Chen B, Sun L, Han X (2018c) Sequence-to-action: End-to-end semantic graph generation for semantic
parsing
Chen T, Kornblith S, Norouzi M, Hinton G (2020b) A simple framework for contrastive learning of
visual representations. Proceedings of the 37th International Conference on Machine Learning
119:1597–1607
Chen Z, Villar S, Chen L, Bruna J (2019) On the equivalence between graph isomorphism testing and
function approximation with gnns
Chen D, Lin Y, Li W, Li P, Zhou J, Sun X (2020a) Measuring and relieving the over-smoothing problem
for graph neural networks from the topological view. Proc AAAI Conf Artif Intell 34(04):3438–
3445. https://​doi.​org/​10.​1609/​aaai.​v34i04.​5747
Cheng S, Zhang L, Jin B, Zhang Q, Lu X (2021) Drug target prediction using graph representation learn-
ing via substructures contrast
Choi E, Bahadori MT, Song L, Stewart WF, Sun J (2017) Gram: Graph-based attention model for health-
care representation learning. In: Proceedings of the 23rd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. KDD ’17, pp. 787–795. Association for Computing
Machinery, New York, NY, USA. https://​doi.​org/​10.​1145/​30979​83.​30981​26
Choi E, Xiao C, Stewart WF, Sun J (2018) Mime: Multilevel medical embedding of electronic health
records for predictive healthcare
Choudhary N, Rao N, Katariya S, Subbian K, Reddy CK (2021) Self-supervised hyperboloid representa-
tions from logical queries over knowledge graphs. In: Proceedings of the Web Conference 2021.
WWW ’21, pp. 1373–1384. Association for Computing Machinery, New York, NY, USA. https://​
doi.​org/​10.​1145/​34423​81.​34499​74
Chu Y, Kaushik AC, Wang X, Wang W, Zhang Y, Shan X, Salahub DR, Xiong Y, Wei D-Q (2021) Dti-
cdf: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid
features. Brief Bioinform 22(1):451–462
Cui G, Zhou J, Yang C, Liu Z (2020) Adaptive graph encoder for attributed graph embedding. In: Pro-
ceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data
Mining. KDD ’20, pp. 976–985. Association for Computing Machinery, NY, USA
Daiquocnguyen: QGNN. (2021) https://​github.​com/​daiqu​ocngu​yen/​QGNN
Danielegrattarola: Spektral (2021) https://​github.​com/​danie​legra​ttaro​la/​spekt​ral
Das A, Roy S, Bhattacharya U, Parui SK (2018) Document image classification with intra-domain trans-
fer learning and stacked generalization of deep convolutional neural networks. In: 2018 24th Inter-
national Conference on Pattern Recognition (ICPR), pp. 3180–3185. IEEE, Beijing, China. IEEE
Data61 C (2018) StellarGraph Machine Learning Library. GitHub
De Cao N, Kipf T (2018) MolGAN: An implicit generative model for small molecular graphs
Debnath AK, Lopez de Compadre RL, Debnath G, Shusterman AJ, Hansch C (1991) Structure–activity
relationship of mutagenic aromatic and heteroaromatic nitro compounds correlation with molecu-
lar orbital energies and hydrophobicity. J Med Chem 34(2):786–797
DeepGraphLearning: graphvite (2021). https://​github.​com/​DeepG​raphL​earni​ng/​graph​vite
Deepmind: Graph_nets (2021). https://​github.​com/​deepm​ind/​graph_​nets
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast
localized spectral filtering. Adv Neural Inf Process Syst 29:3844–3852

13
A survey of graph neural networks in various learning paradigms:…

Dehmamy N, Barabási A-L, Yu R (2019) Understanding the representation power of graph neural net-
works in learning graph topology
Deng W, Zhang B, Zou W, Zhang X, Cheng X, Guan L, Lin Y, Lao G, Ye B, Li X et al (2019) Abnor-
mal degree centrality associated with cognitive dysfunctions in early bipolar disorder. Front Psych
10:140
Dengel A (1993) Initial learning of document structure. In: Proceedings of 2nd International Conference
on Document Analysis and Recognition (ICDAR’93), pp. 86–90. IEEE Tsukuba, Japan. IEEE
Denk TI, Reisswig C (2019) Bertgrid: Contextualized embedding for 2d document representation and
understanding
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers
for language understanding
Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transform-
ers for language understanding. In: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol-
ume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Min-
neapolis, Minnesota
Dhillon PS, Talukdar PP, Crammer K (2010) Learning better data representation using inference-driven
metric learning. In: Proceedings of the ACL 2010 Conference Short Papers. ACLShort ’10, pp.
377–381. Association for Computational Linguistics, USA
Diligenti M, Frasconi P, Gori M (2003) Hidden tree Markov models for document image classification.
IEEE Trans Pattern Anal Mach Intell 25(4):519–523
Dong XL, de Melo G (2019) A robust self-learning framework for cross-lingual text classification. In:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and
the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.
6306–6310. Association for Computational Linguistics, Hong Kong, China
Dong Y, Chawla NV, Swami A (2017) Metapath2vec: Scalable representation learning for heterogene-
ous networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 135–144. Association for Computing Machinery, New York, NY,
USA. https://​doi.​org/​10.​1145/​30979​83.​30980​36
Du L, Wang Y, Song G, Lu Z, Wang J (2018) Dynamic network embedding: An extended approach for
skip-gram based network embedding. In: IJCAI, vol. 2018, pp. 2086–2092. AAAI Press Stock-
holm, Sweden
Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J, Gómez-Bombarelli R, Hirzel T, Aspuru-Guzik A,
Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints
Dwivedi VP, Joshi CK, Laurent T, Bengio Y (2020) Benchmarking graph neural networks
Edwards H, Storkey A (2016) Towards a neural statistician. arXiv:​1606.​02185
Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE Trans Pattern Anal
Mach Intell 28(4):594–611
Feng Y, You H, Zhang Z, Ji R, Gao Y (2019) Hypergraph neural networks. Proc AAAI Conf Artif Intell
33:3558–3565
Feng Y-H, Zhang S-W, Shi J-Y (2020) Dpddi: a deep predictor for drug–drug interactions. BMC Bioinform
21(1):1–15
Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch Geometric. In: ICLR Workshop
on Representation Learning on Graphs and Manifolds, pp. 1–9
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In:
International Conference on Machine Learning, pp. 1126–1135. PMLR
Fornito A, Zalesky A, Breakspear M (2013) Graph analysis of the human connectome: promise, progress,
and pitfalls. Neuroimage 80:426–444
Fouss F, Pirotte A, Renders J-M, Saerens M (2007) Random-walk computation of similarities between
nodes of a graph with application to collaborative recommendation. IEEE Trans Knowl Data Eng
19(3):355–369
Fout AM (2017) Protein interface prediction using graph convolutional networks. PhD thesis, Colorado
State University
Gao L, Yang H, Zhou C, Wu J, Pan S, Hu Y (2018) Active discriminative network representation learn-
ing. In: IJCAI International Joint Conference on Artificial Intelligence, pp. 2142–2148. AAAI Press,
Stockholm, Sweden
Garcia V, Bruna J (2017) Few-shot learning with graph neural networks. arXiv:​1711.​04043
Garg V, Jegelka S, Jaakkola T (2020) Generalization and representational limits of graph neural networks.
Proceedings of the 37th International Conference on Machine Learning 119:3419–3430

13
L. Waikhom, R. Patgiri

Giles CL, Bollacker KD, Lawrence S (1998) Citeseer: An automatic citation indexing system. In: Proceed-
ings of the Third ACM Conference on Digital Libraries. DL ’98, pp. 89–98. Association for Comput-
ing Machinery, New York, NY, USA. https://​doi.​org/​10.​1145/​276675.​276685
Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chem-
istry. Proceedings of the 34th International Conference on Machine Learning 70:1263–1272
Godwin* J, Keck* T, Battaglia P, Bapst V, Kipf T, Li Y, Stachenfeld K, Veličković P, Sanchez-Gonzalez A
(2020) Jraph: A library for graph neural networks in jax. http://​github.​com/​deepm​ind/​jraph
Gori M, Monfardini G, Scarselli F (2005) A new model for learning in graph domains. In: Proceedings.
2005 IEEE International Joint Conference on Neural Networks, 2005., vol. 2, pp. 729–734. IEEE,
Montreal, QC, Canada. IEEE
Grill J-B, Strub F, Altché F, Tallec C, Richemond PH, Buchatskaya E, Doersch C, Pires BA, Guo ZD, Azar
MG, et al (2020) Bootstrap your own latent: a new approach to self-supervised learning
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
Association for Computing Machinery, New York, NY, USA
Gu S, Wang X, Shi C, Xiao D Self-supervised graph neural networks for multi-behavior recommendation
Guan C, Zhang Z, Li H, Chang H, Zhang Z, Qin Y, Jiang J, Wang X, Zhu W (2021) AutoGL: A library for
automated graph learning. In: ICLR 2021 Workshop on Geometrical and Topological Representation
Learning, pp. 1–8. https://​openr​eview.​net/​forum?​id=​0yHwp​LeInDn
Guo M, Chou E, Huang D-A, Song S, Yeung S, Fei-Fei L (2018) Neural graph matching networks for fews-
hot 3d action recognition. In: Proceedings of the European Conference on Computer Vision (ECCV),
pp. 653–669. Springer, Cham
Hamilton WL, Ying R, Leskovec J (2017a) Inductive representation learning on large graphs. Proceedings
of the 31st International Conference on Neural Information Processing Systems 30:1025–1035
Hamilton WL, Ying R, Leskovec J (2017b) Representation learning on graphs: Methods and applications
Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl
Comput Harmon Anal 30(2):129–150
Hassani K, Khasahmadi AH (2020) Contrastive multi-view representation learning on graphs. Proceedings
of the 37th International Conference on Machine Learning 119, 4116–4126. PMLR
He R, McAuley J (2016) Ups and downs: Modeling the visual evolution of fashion trends with one-class
collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp.
507–517. International World Wide Web Conferences Steering Committee, Republic and Canton of
Geneva, CHE
He R, Packer C, McAuley J (2016) Learning compatibility across categories for heterogeneous item rec-
ommendation. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 937–942.
IEEE, Barcelona, Spain. IEEE
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation
learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 9729–9738. IEEE Seattle, WA, USA
Henaff M, Bruna J, LeCun Y (2015) Deep convolutional networks on graph-structured data. arXiv:​1506.​
05163
Hjelm RD, Fedorov A, Lavoie-Marchildon S, Grewal K, Bachman P, Trischler A, Bengio Y (2018) Learn-
ing deep representations by mutual information estimation and maximization
Hu Z, Fan C, Chen T, Chang K-W, Sun Y (2019a) Pre-training graph neural networks for generic structural
feature extraction
Hu W, Liu B, Gomes J, Zitnik M, Liang P, Pande V, Leskovec J (2019b) Strategies for pre-training graph
neural networks
Hu Z, Dong Y, Wang K, Chang K-W, Sun Y (2020a) Gpt-gnn: Generative pre-training of graph neural
networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Dis-
covery & Data Mining, pp. 1857–1867. Association for Computing Machinery, New York, NY, USA
Hu W, Fey M, Zitnik M, Dong Y, Ren H, Liu B, Catasta M, Leskovec J (2020b) Open graph benchmark:
Datasets for machine learning on graphs
Hu J, Ruder S, Siddhant A, Neubig G, Firat O, Johnson M (2020c) XTREME: A massively multilingual
multi-task benchmark for evaluating cross-lingual generalisation. Proceedings of the 37th Interna-
tional Conference on Machine Learning 119, 4411–4421
Hu S, Xiong Z, Qu M, Yuan X, Côté M-A, Liu Z, Tang J (2020d) Graph policy network for transferable
active learning on graphs
Hu Y, Li X, Wang Y, Wu Y, Zhao Y, Yan C, Yin J, Gao Y (2021) Adaptive hypergraph auto-encoder for
relational data clustering. IEEE Transactions on Knowledge and Data Engineering

13
A survey of graph neural networks in various learning paradigms:…

Huang C, Xu H, Xu Y, Dai P, Xia L, Lu M, Bo L, Xing H, Lai X, Ye Y (2021a) Knowledge-aware coupled


graph neural network for social recommendation. Proc AAAI Conf Artif Intell 35:4115–4122
Huang H, Shi R, Zhou W, Wang X, Jin H, Fu X (2021b) Temporal heterogeneous information network
embedding. In: IJCAI, pp. 1470–1476
Hwang D, Park J, Kwon S, Kim K-M, Ha J-W, Kim HJ (2020) Self-supervised auxiliary learning with meta-
paths for heterogeneous graphs
Ioannidis VN, Marques AG, Giannakis GB (2019) A recurrent graph neural network for multi-relational
data. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Pro-
cessing (ICASSP), pp. 8157–8161. IEEE, Brighton, UK. IEEE
Jain A, Zamir AR, Savarese S, Saxena A (2016) Structural-rnn: Deep learning on spatio-temporal graphs.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5308–
5317. IEEE, Las Vegas, NV, USA
Jamasb AR, Lio P, Blundell T (2020) Graphein - a python library for geometric deep learning and network
analysis on protein structures. https://​doi.​org/​10.​1101/​2020.​07.​15.​204701
Jiao Y, Xiong Y, Zhang J, Zhang Y, Zhang T, Zhu Y (2020) Sub-graph contrast for scalable self-supervised
graph representation learning. In: 2020 IEEE International Conference on Data Mining (ICDM), pp.
222–231. IEEE, Sorrento, Italy. IEEE
Ji H, Wang X, Shi C, Wang B, Yu P (2021) Heterogeneous graph propagation network. IEEE Transactions
on Knowledge and Data Engineering
Jin W, Derr T, Liu H, Wang Y, Wang S, Liu Z, Tang J (2020) Self-supervised learning on graphs: deep
insights and new direction
Jin W, Liu X, Zhao X, Ma Y, Shah N, Tang J (2021a) Automated self-supervised learning for graphs
Jin M, Zheng Y, Li Y-F, Gong C, Zhou C, Pan S (2021b) Multi-scale contrastive Siamese networks for self-
supervised graph representation learning
Johnson DD (2016) Learning Graphical State Transitions. In: Proceedings of 5th International Conference
on Learning Representations, pp. 1–19. ICLR, Palais des Congrès Neptune, Toulon, France
Johnson J, Gupta A, Fei-Fei L (2018) Image generation from scene graphs. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1219–1228. IEEE, Salt Lake City, UT,
USA
Kang L, Kumar J, Ye P, Li Y, Doermann D (2014) Convolutional neural networks for document image
classification. In: 2014 22nd International Conference on Pattern Recognition, pp. 3168–3172. IEEE
Stockholm, Sweden. IEEE
Kawamoto T, Tsubaki M, Obuchi T (2019) Mean-field theory of graph neural networks in graph partition-
ing. J Stat Mech 2019(12):124007
Kearnes S, McCloskey K, Berndl M, Pande V, Riley P (2016) Molecular graph convolutions: moving
beyond fingerprints. J Comput Aided Mol Des 30(8):595–608
Keriven N, Peyré G (2019) Universal invariant and equivariant graph neural networks. Adv Neural Inf Pro-
cess Syst 32:7092–7101
Kim D, Oh A (2020) How to find your friendly neighborhood: Graph attention design with self-supervision.
In: International Conference on Learning Representations, pp. 1–25. ICLR, Vienna, Austria
Kim J, Kim T, Kim S, Yoo CD (2019) Edge-labeling graph neural network for few-shot learning. In: Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11–20
Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv:​1312.​6114
Kipf TN, Welling M (2016a) Semi-supervised classification with graph convolutional networks
Kipf TN, Welling M (2016b) Variational graph auto-encoders
Klicpera J, Weißenberger S, Günnemann S (2019) Diffusion improves graph learning. Adv Neural Inf Pro-
cess Syst 32:13354–13366
Knyazev B, Taylor GW, Amer MR (2019) Understanding attention and generalization in graph neural
networks
Kumar S, Hamilton WL, Leskovec J, Jurafsky D (2018) Community interaction and conflict on the web. In:
Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 933–943. Interna-
tional World Wide Web Conferences Steering Committee
Kumar S, Zhang X, Leskovec J (2019) Predicting dynamic embedding trajectory in temporal interaction
networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Dis-
covery & Data Mining, pp. 1269–1278. ACM
Kunal K, Dhar T, Madhusudan M, Poojary J, Sharma A, Xu W, Burns SM, Hu J, Harjani R, Sapatnekar SS
(2020) Gana: Graph convolutional network based automated netlist annotation for analog circuits. In:
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 55–60. IEEE
Lample G, Conneau A (2019) Cross-lingual language model pretraining

13
L. Waikhom, R. Patgiri

Landrieu L, Simonovsky M (2018) Large-scale point cloud semantic segmentation with superpoint graphs.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4558–
4567. IEEE Salt Lake City, UT, USA
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://​snap.​stanf​ord.​
edu/​data
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and
possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on
Knowledge Discovery in Data Mining, pp. 177–187
Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web (TWEB)
1(1):5
Leskovec J, Lang K, Dasgupta A, Mahoney M (2008) Community structure in large networks: Natural clus-
ter sizes and the absence of large well-defined clusters. arXiv:​0810.​1355
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural
cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the
SIGCHI conference on human factors in computing systems, pp. 1361–1370
Levie R, Huang W, Bucci L, Bronstein M.M, Kutyniok G (2019) Transferability of spectral graph convolu-
tional neural networks
Lewis D, Agam G, Argamon S, Frieder O, Grossman D, Heard J (2006) Building a test collection for com-
plex document information processing. In: Proceedings of the 29th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, pp. 665–666. Association for
Computing Machinery, New York, NY, USA
Li MM, Zitnik M (2021) Deep contextual learners for protein networks
Li Y, Tarlow D, Brockschmidt M, Zemel R (2015) Gated graph sequence neural networks
Li Y, Yu R, Shahabi C, Liu Y (2017) Diffusion convolutional recurrent neural network: data-driven traffic
forecasting
Li Q, Han Z, Wu X-M (2018a) Deeper insights into graph convolutional networks for semi-supervised
learning. In: Thirty-second AAAI conference on artificial intelligence, pp. 3538–3545. AAAI Press,
New Orleans, USA
Li Y, Vinyals O, Dyer C, Pascanu R, Battaglia P (2018b) Learning deep generative models of graphs
Li Z, Chen Q, Koltun V (2018c) Combinatorial optimization with graph convolutional networks and guided
tree search
Li Y, Ouyang W, Zhou B, Shi J, Zhang C, Wang X (2018d) Factorizable net: an efficient subgraph-based
framework for scene graph generation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Com-
puter vision—ECCV 2018. Springer, Cham, pp 346–363
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S (2020a) Learn molecular representations
from large-scale unlabeled molecules for drug discovery. arXiv:​2012.​11175
Li Z, Kumar M, Headden W, Yin B, Wei Y, Zhang Y, Yang Q (2020c) Learn to cross-lingual transfer with
meta graph learning across heterogeneous languages. In: Proceedings of the 2020 conference on
empirical methods in natural language processing (EMNLP), pp. 2290–2301. Association for com-
putational linguistics
Li S, Xu F, Wang R, Zhong S (2021a) Self-supervised incremental deep graph learning for ethereum phish-
ing scam detection
Li Y, Jin W, Xu H, Tang J (2020b) Deeprobust: a pytorch library for adversarial attacks and defenses. arXiv:​
2005.​06149
Li I, Yan V, Li T, Qu R, Radev D (2021b) Unsupervised cross-domain prerequisite chain learning using
variational graph autoencoders
Li P, Wang J, Qiao Y, Chen H, Yu Y, Yao X, Gao P, Xie G, Song S (2021c) An effective self-supervised
framework for learning expressive molecular global representations to drug discovery. Brief Bioin-
form 22(6):109
Liu Y, Lee J, Park M, Kim S, Yang E, Hwang SJ, Yang Y (2018) Learning to propagate labels: transductive
propagation network for few-shot learning. arXiv:​1805.​10002
Li S, Zhou J, Xu T, Huang L, Wang F, Xiong H, Huang W, Dou D, Xiong H (2021d) Structure-aware inter-
active graph neural networks for the prediction of protein–ligand binding affinity. In: Proceedings of
the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 975–985
Lin J, Cai Q, Lin M (2021a) Multi-label classification of fundus images with graph convolutional network
and self-supervised learning. IEEE Signal Process Lett 28:454–458
Lin Q, Zhu F-Y, Shu Y-Q, Zhu P-W, Ye L, Shi W-Q, Min Y-L, Li B, Yuan Q, Shao Y (2021b) Altered brain
network centrality in middle-aged patients with retinitis pigmentosa: a resting-state functional mag-
netic resonance imaging study. Brain Behav 11(2):01983

13
A survey of graph neural networks in various learning paradigms:…

Linmei H, Yang T, Shi C, Ji H, Li X (2019) Heterogeneous graph attention networks for semi-supervised
short text classification. In: Proceedings of the 2019 conference on empirical methods in natural lan-
guage processing and the 9th international joint conference on natural language processing (EMNLP-
IJCNLP), pp. 4821–4830. Association for Computational Linguistics, Hong Kong, China
Liu Q, Nickel M, Kiela D (2019a) Hyperbolic graph neural networks. Advances in Neural Information Pro-
cessing Systems 32
Liu L, Zhou T, Long G, Jiang J, Yao L, Zhang C (2019b) Prototype propagation networks (PPN) for weakly-
supervised few-shot learning on category graph
Liu N, Tan Q, Li Y, Yang H, Zhou J, Hu X (2019c) Is a single vector enough? exploring node polysemy for
network embedding. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowl-
edge Discovery & Data Mining, pp. 932–940. Association for Computing Machinery, New York, NY,
USA
Liu Z, Huang C, Yu Y, Fan B, Dong J (2020a) Fast attributed multiplex heterogeneous network embedding.
In: Proceedings of the 29th ACM International Conference on Information & Knowledge Manage-
ment, pp. 995–1004
Liu M, Zhu K, Gu J, Shen L, Tang X, Sun N, Pan DZ (2020b) Towards decrypting the art of analog layout:
Placement quality prediction via transfer learning. In: 2020 Design, Automation & Test in Europe
Conference & Exhibition (DATE), pp. 496–501. IEEE
Liu Q, Hu Z, Jiang R, Zhou M (2020c) Deepcdr: a hybrid graph convolutional network for predicting cancer
drug response. Bioinformatics 36:911–918
Liu Z, Li X, You Z, Yang T, Fan W, Yu P (2021a) Medical triage chatbot diagnosis improvement via multi-
relational hyperbolic graph neural network. In: Proceedings of the 44th International ACM SIGIR
Conference on Research and Development in Information Retrieval, pp. 1965–1969
Liu Y, Pan S, Jin M, Zhou C, Xia F, Yu PS (2021b) Graph self-supervised learning: a survey
Liu M, Turner WJ, Kokai GF, Khailany B, Pan DZ, Ren H (2021c) Parasitic-aware analog circuit siz-
ing with graph neural networks and bayesian optimization. In: 2021 Design, Automation & Test in
Europe Conference & Exhibition (DATE), pp. 1372–1377. IEEE
Liu X, Zhang F, Hou Z, Mian L, Wang Z, Zhang J, Tang J (2021d) Self-supervised learning: generative or
contrastive. IEEE Trans Knowl Data Eng Early Access. https://​doi.​org/​10.​1109/​TKDE.​2021.​30908​66
Liu Y, Li M, Li X, Giunchiglia F, Feng X, Guan R (2022) Few-shot node classification on attributed net-
works with graph meta-learning. In: Proceedings of the 45th international ACM SIGIR conference on
research and development in information retrieval, pp. 471–481
Long X, Little G, Treit S, Beaulieu C, Gong G, Lebel C (2020) Altered brain white matter connectome in
children and adolescents with prenatal alcohol exposure. Brain Struct Funct 225(3):1123–1133
Loukas A (2019) What graph neural networks cannot learn: depth vs width
Lu C, Liu Q, Wang C, Huang Z, Lin P, He L (2019) Molecular property prediction: a multilevel quantum
interactions modeling perspective. Proc AAAI Conf Artif Intell 33:1052–1060
Maiya AS (2020) ktrain: a low-code library for augmented machine learning. arXiv:​2004.​10703
Mallat S (1999) A wavelet tour of signal processing. Elsevier, Amsterdam
Manessi F, Rozza A (2021) Graph-based neural network models with multiple self-supervised auxiliary
tasks. Pattern Recogn Lett 148:15–21
Marcheggiani D, Perez-Beltrachini L (2018) Deep graph convolutional encoders for structured data to text
generation. arXiv:​1810.​09995
Marcheggiani D, Bastings J, Titov I (2018) Exploiting semantics in neural machine translation with graph
convolutional networks. arXiv:​1804.​08313
Ma Y, Ren H, Khailany B, Sikka H, Luo L, Natarajan K, Yu B (2019) High performance graph convolu-
tional networks with applications in testability analysis. In: Proceedings of the 56th Annual Design
Automation Conference 2019, pp. 1–6
Maron H, Ben-Hamu H, Shamir N, Lipman Y (2018) Invariant and equivariant graph networks
Maron H, Fetaya E, Segol N, Lipman Y (2019) On the universality of invariant networks. In: International
Conference on Machine Learning, pp. 4363–4371. PMLR
Mavromatis C, Karypis G (2020) Graph InfoClust: Leveraging cluster-level node information for unsuper-
vised graph representation learning
McAuley JJ, Leskovec J (2012) Learning to discover social circles in ego networks. In: NIPS, vol. 2012, pp.
548–56. Citeseer
McAuley J, Pandey R, Leskovec J (2015a) Inferring networks of substitutable and complementary products.
In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, pp. 785–794. Association for Computing Machinery, New York, NY, USA

13
L. Waikhom, R. Patgiri

McAuley J, Targett C, Shi Q, Van Den Hengel A (2015b) Image-based recommendations on styles and sub-
stitutes. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Develop-
ment in Information Retrieval, pp. 43–52
McCallum AK, Nigam K, Rennie J, Seymore K (2000) Automating the construction of internet portals with
machine learning. Inf Retrieval 3(2):127–163
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev
Sociol 27(1):415–444
Merkwirth C, Lengauer T (2005) Automatic generation of complementary descriptors with molecular graph
networks. J Chem Inf Model 45(5):1159–1168
Micheli A, Sona D, Sperduti A (2004) Contextual processing of structured data by recursive cascade cor-
relation. IEEE Trans Neural Netw 15(6):1396–1410
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases
and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119
Mirhoseini A, Goldie A, Yazgan M, Jiang JW, Songhori E, Wang S, Lee Y-J, Johnson E, Pathak O, Nazi A
et al (2021) A graph placement methodology for fast chip design. Nature 594(7862):207–212
Mislove A, Koppula HS, Gummadi KP, Druschel P, Bhattacharjee B (2008) Growth of the flickr social net-
work. In: Proceedings of the First Workshop on Online Social Networks. WOSN ’08, pp. 25–30. Asso-
ciation for Computing Machinery, New York, NY, USA. https://​doi.​org/​10.​1145/​13977​35.​13977​42
Monti F, Bronstein MM, Bresson X (2017) Geometric matrix completion with recurrent multi-graph neural
networks
Morris C, Ritzert M, Fey M, Hamilton WL, Lenssen JE, Rattan G, Grohe M (2019) Weisfeiler and leman go
neural: higher-order graph neural networks. Proc AAAI Conf Artif Intell 33(01):4602–4609
Narasimhan M, Lazebnik S, Schwing AG (2018) Out of the box: Reasoning with graph convolution nets for
factual visual question answering
Neudorf J, Ekstrand C, Kress S, Borowsky R (2020) Brain structural connectivity predicts brain functional
complexity: diffusion tensor imaging derived centrality accounts for variance in fractal properties of
functional magnetic resonance imaging signal. Neuroscience 438:1–8
Newman ME (2005) A measure of betweenness centrality based on random walks. Soc Netw 27(1):39–54
Nguyen TH, Grishman R (2018) Graph convolutional networks with argument-aware pooling for event
detection. In: Thirty-second AAAI Conference on Artificial Intelligence, pp. 5900–5907
Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S (2021) Graphdta: predicting drug–target bind-
ing affinity with graph neural networks. Bioinformatics 37(8):1140–1147
Nt H, Maehara T (2019) Revisiting graph neural networks: All we have is low-pass filters
Okuda M, Satoh S, Sato Y, Kidawara Y (2021) Unsupervised common particular object discovery and
localization by analyzing a match graph. In: ICASSP 2021-2021 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 1540–1544. IEEE
Oono K, Suzuki T (2019) Graph neural networks exponentially lose expressive power for node classification
Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding
Opolka FL, Solomon A, Cangea C, Veličković P, Liò P (2019) Spatio-temporal deep graph infomax
Ouali Y, Hudelot C, Tami M (2020) An overview of deep semi-supervised learning
Ou M, Cui P, Pei J, Zhang Z, Zhu W (2016) Asymmetric transitivity preserving graph embedding. In: Pro-
ceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 1105–1114. Association for Computing Machinery, New York, NY, USA
Ozaki K, Shimbo M, Komachi M, Matsumoto Y (2011) Using the mutual k-nearest neighbor graphs for
semi-supervised classification on natural language data. In: Proceedings of the Fifteenth Conference
on Computational Natural Language Learning, pp. 154–162
PaddlePaddle: PGL (2021) https://​github.​com/​Paddl​ePadd​le/​PGL
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web.
Technical report, Stanford InfoLab
Pan S, Hu R, Fung S-F, Long G, Jiang J, Zhang C (2019) Learning graph embedding with adversarial train-
ing methods. IEEE Trans Cybern 50(6):2475–2487
Pan S, Hu R, Long G, Jiang J, Yao L, Zhang C (2018) Adversarially regularized graph autoencoder for graph
embedding. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelli-
gence, IJCAI-18, pp. 2609–2615. International Joint Conferences on Artificial Intelligence Organization
Paranjape A, Benson AR, Leskovec J (2017) Motifs in temporal networks. In: Proceedings of the Tenth
ACM International Conference on Web Search and Data Mining, pp. 601–610
Park J, Cho J, Chang HJ, Choi JY (2021) Unsupervised hyperbolic representation learning via message
passing auto-encoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 5516–5526

13
A survey of graph neural networks in various learning paradigms:…

Park C, Kim D, Han J, Yu H (2020) Unsupervised attributed multiplex network embedding. In: Proceedings
of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 5371–5378
Park J, Lee M, Chang HJ, Lee K, Choi JY (2019) Symmetric graph convolutional autoencoder for unsuper-
vised graph representation learning. In: Proceedings of the IEEE/CVF International Conference on
Computer Vision, pp. 6519–6528
Peng Z, Dong Y, Luo M, Wu X-M, Zheng Q (2020a) Self-supervised graph representation learning via
global context prediction
Peng Z, Huang W, Luo M, Zheng Q, Rong Y, Xu T, Huang J (2020b) Graph representation learning via
graphical mutual information maximization. In: Proceedings of The Web Conference 2020, pp.
259–270
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings
of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.
701–710. Association for Computing Machinery New York, NY, USA
Pise NN, Kulkarni P (2008) A survey of semi-supervised learning methods. In: 2008 International Confer-
ence on Computational Intelligence and Security, vol. 2, pp. 30–34. IEEE
Prakash VJ, Nithya DL (2014) A survey on semi-supervised learning techniques
Qi X, Liao R, Jia J, Fidler S, Urtasun R (2017) 3d graph neural networks for rgbd semantic segmentation.
In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5199–5208
Qiu J, Chen Q, Dong Y, Zhang J, Yang H, Ding M, Wang K, Tang J (2020) Gcc: Graph contrastive cod-
ing for graph neural network pre-training. In: Proceedings of the 26th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, pp. 1150–1160. Association for Computing
Machinery, New York, NY, USA
Qiu J, Dong Y, Ma H, Li J, Wang K, Tang J (2018a) Network embedding as matrix factorization: Unifying
deepwalk, line, pte, and node2vec. In: Proceedings of the Eleventh ACM International Conference on
Web Search and Data Mining, pp. 459–467
Qiu J, Tang J, Ma H, Dong Y, Wang K, Tang J (2018b) Deepinf: Social influence prediction with deep
learning. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discov-
ery & Data Mining, pp. 2110–2119. Association for Computing Machinery
Qi S, Wang W, Jia B, Shen J, Zhu S-C (2018) Learning human-object interactions by graph parsing neural
networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 401–417.
Springer Cham
Raizman R, Tavor I, Biegon A, Harnof S, Hoffmann C, Tsarfaty G, Fruchter E, Tatsa-Laur L, Weiser M,
Livny A (2020) Traumatic brain injury severity in a network perspective: a diffusion MRI based con-
nectome study. Sci Rep 10(1):1–12
Rakesh V, Wang S, Shu K, Liu H (2019) Linked variational autoencoders for inferring substitutable and
supplementary items. In: Proceedings of the Twelfth ACM International Conference on Web Search
and Data Mining, pp. 438–446
Ravi S, Larochelle H (2016) Optimization as a model for few-shot learning
Ren H, Kokai GF, Turner WJ, Ku T-S (2020a) Paragraph: Layout parasitics and device parameter pre-
diction using graph neural networks. In: 2020 57th ACM/IEEE Design Automation Conference
(DAC), pp. 1–6. IEEE
Ren Y, Liu B, Huang C, Dai P, Bo L, Zhang J (2020b) Hdgi: An unsupervised graph neural network for
representation learning in heterogeneous graph. In: AAAI Workshop, pp. 1638–1645
Ribeiro LF, Saverese PH, Figueiredo DR (2017) struc2vec: Learning node representations from struc-
tural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 385–394. Association for Computing Machinery, New York, NY,
USA
Rohani N, Eslahchi C, Katanforoush A (2020) Iscmf: integrated similarity-constrained matrix factoriza-
tion for drug–drug interaction prediction. Netw Model Anal Health Inform Bioinform 9(1):1–8
Rohban MH, Rabiee HR (2012) Supervised neighborhood graph construction for semi-supervised clas-
sification. Pattern Recogn 45(4):1363–1372
Rong Y, Bian Y, Xu T, Xie W, Wei Y, Huang W, Huang J (2020) Self-supervised graph transformer on
large-scale molecular data
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science
290(5500):2323–2326
Rozemberczki B, Scherer P, He Y, Panagopoulos G, Riedel A, Astefanoaei M, Kiss O, Beres F, Lopez G,
Collignon N, Sarkar R (2021) PyTorch geometric temporal: spatiotemporal signal processing with
neural machine learning models
Ruiz L, Chamon L, Ribeiro A (2020) Graphon neural networks and the transferability of graph neural
networks. Advances in Neural Information Processing Systems 33

13
L. Waikhom, R. Patgiri

Sakhuja A (2021) Unsupervised learning of latent edge types from multi-relational data. PhD thesis,
Applied sciences: school of computing science
Salakhutdinov R, Tenenbaum J, Torralba A (2012) One-shot learning with a hierarchical nonparametric
bayesian model. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp.
195–206. JMLR Workshop and Conference Proceedings
Santoro A, Bartunov S, Botvinick M, Wierstra D, Lillicrap T (2016) Meta-learning with memory-
augmented neural networks. In: International Conference on Machine Learning, pp. 1842–1850.
PMLR
Sato R (2020) A survey on the expressive power of graph neural networks
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model.
IEEE Trans Neural Netw 20(1):61–80
Scarselli F, Tsoi AC, Hagenbuchner M (2018) The Vapnik-Chervonenkis dimension of graph and recur-
sive neural networks. Neural Netw 108:248–259
Schlichtkrull M, Kipf TN, Bloem P, Van Den Berg R, Titov I, Welling M (2018) Modeling relational data
with graph convolutional networks. In: European Semantic Web Conference, pp. 593–607. Springer
Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in net-
work data. AI Magn 29(3):93–93
SeongokRyu: Graph-neural-networks (2021). https://​github.​com/​Seong​okRyu/​Graph-​neural-​netwo​rks
Shchur O, Mumme M, Bojchevski A, Günnemann S (2018) Pitfalls of graph neural network evaluation
Shin C, Doermann D, Rosenfeld A (2001) Classification of document pages using structure-based fea-
tures. Int J Doc Anal Recogn 3(4):232–247
Song Z, Yang X, Xu Z, King I (2021) Graph-based semi-supervised learning: a comprehensive review
Song L, Zhang Y, Wang Z, Gildea D (2018) A graph-to-sequence model for AMR-to-text generation
Sperduti A, Starita A (1997) Supervised neural networks for the classification of structures. IEEE Trans
Neural Netw 8(3):714–735
Spitzer F (2013) Principles of random walk, vol 34. Springer, Cham
Subramonian A (2021) MOTIF-driven contrastive learning of graph representations. AAAI
35(18):15980–15981
Sun F-Y, Hoffmann J, Verma V, Tang J (2019) Infograph: Unsupervised and semi-supervised graph-level
representation learning via mutual information maximization
Sun Q, Li J, Peng H, Wu J, Ning Y, Yu PS, He L (2021b) Sugar: Subgraph neural network with rein-
forcement pooling and self-supervised mutual information mechanism. In: Proceedings of the web
conference 2021, pp. 2081–2091
Sun K, Lin Z, Zhu Z (2020) Multi-stage self-supervised learning for graph convolutional networks on
graphs with few labeled nodes. In: Proceedings of the AAAI conference on artificial intelligence,
vol. 34, pp. 5892–5899
Sun Y, Shan Y, Tang C, Hu Y, Dai Y, Yu J, Sun J, Huang F, Si L (2021c) Unsupervised learning of
deterministic dialogue structure with edge-enhanced graph auto-encoder. In: Proceedings of the
AAAI conference on artificial intelligence, vol. 35, pp. 13869–13877
Sun X, Yin H, Liu B, Chen H, Cao J, Shao Y, Viet Hung NQ (2021d) Heterogeneous hypergraph embed-
ding for graph classification. In: Proceedings of the 14th ACM international conference on web
search and data mining, pp. 725–733
Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: Relation
network for few-shot learning. In: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 1199–1208
Sun L, Zhang Z, Zhang J, Wang F, Peng H, Su S, Philip SY (2021a) Hyperbolic variational graph neural
network for modeling dynamic graphs. Proc AAAI Conf Artif Intell 35:4375–4383
Svjan5: GNNs-for-NLP (2021). https://​github.​com/​svjan5/​GNNs-​for-​NLP
Taheri A, Gimpel K, Berger-Wolf T (2019) Learning to represent the evolution of dynamic graphs with
recurrent models. In: Companion Proceedings of The 2019 World Wide Web Conference. WWW
’19, pp. 301–307. Association for Computing Machinery, New York, NY, USA. https://​doi.​org/​10.​
1145/​33085​60.​33165​81
Taherkhani F, Kazemi H, Nasrabadi NM (2019) Matrix completion for graph-based deep semi-super-
vised learning. Proc AAAI Conf Artif Intell 33:5058–5065
Tang J, Liu H (2012) Unsupervised feature selection for linked social media data. In: Proceedings of
the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.
904–912
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015b) Line: Large-scale information network embed-
ding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.

13
A survey of graph neural networks in various learning paradigms:…

International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva,
CHE
Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: Extraction and mining of academic social
networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining. KDD ’08, pp. 990–998. Association for Computing Machinery, New York,
NY, USA. https://​doi.​org/​10.​1145/​14018​90.​14020​08
Tang J, Qu M, Mei Q (2015a) Pte: Predictive text embedding through large-scale heterogeneous text
networks. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 1165–1174. Association for Computing Machinery, New York,
NY, USA
Tatonetti NP, Patrick PY, Daneshjou R, Altman RB (2012) Data-driven prediction of drug effects and
interactions. Sci Transl Med 4(125):125
Te G, Hu W, Zheng A, Guo Z (2018) Rgcnn: Regularized graph cnn for point cloud segmentation. In:
Proceedings of the 26th ACM International Conference on Multimedia, pp. 746–754. Association
for Computing Machinery, New York, NY, USA
Thudm: Cogdl (2021). https://​github.​com/​THUDM/​cogdl
Thunlp: OpenNE (2021). https://​github.​com/​thunlp/​OpenNE/​tree/​pytor​ch
Tu K, Cui P, Wang X, Yu PS, Zhu W (2018) Deep recursive network embedding with regular equivalence.
In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining, pp. 2357–2366. Association for Computing Machinery, New York, NY, USA
Turkiewicz J, Bhatt RR, Wang H, Vora P, Krause B, Sauk JS, Jacobs JP, Bernstein CN, Kornelsen J, Labus
JS et al (2021) Altered brain structural connectivity in patients with longstanding gut inflammation is
correlated with psychological symptoms and disease duration. NeuroImage 30:102613
Van Engelen JE, Hoos HH (2020) A survey on semi-supervised learning. Mach Learn 109(2):373–440
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks
Velickovic P, Fedus W, Hamilton WL, Liò P, Bengio Y, Hjelm RD (2019) Deep graph infomax. ICLR 2(3):4
Verma S, Zhang Z-L (2019) Stability and generalization of graph convolutional neural networks. In:
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining, pp. 1539–1548. Association for Computing Machinery, New York, NY, USA
Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with
denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learn-
ing, pp. 1096–1103
Vinyals O, Blundell C, Lillicrap T, Wierstra D, et al. (2016) Matching networks for one shot learning.
Advances in neural information processing systems 29
Wan X (2009) Co-training for cross-lingual sentiment classification. In: Proceedings of the Joint Confer-
ence of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natu-
ral Language Processing of the AFNLP, pp. 235–243
Wang M, Zheng D, Ye Z, Gan Q, Li M, Song X, Zhou J, Ma C, Yu L, Gai Y, Xiao T, He T, Karypis G, Li J,
Zhang Z (2021d) DGL. https://​github.​com/​dmlc/​dgl
Wang C, Pan S, Long G, Zhu X, Jiang J (2017) Mgae: Marginalized graph autoencoder for graph clustering.
In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp.
889–898
Wang H, Wang K, Yang J, Shen L, Sun N, Lee H-S, Han S (2020) Gcn-rl circuit designer: Transferable tran-
sistor sizing with graph neural networks and reinforcement learning. In: 2020 57th ACM/IEEE design
automation conference (DAC), pp. 1–6. IEEE
Wang Y, Sun Y, Liu Z, Sarma SE, Bronstein MM, Solomon JM (2019a) Dynamic graph CNN for learning
on point clouds. Acm Trans Graph 38(5):1–12
Wan S, Pan S, Yang J, Gong C (2020) Contrastive and generative graph convolutional networks for graph-
based semi-supervised learning
Wang Z, Jiang Z, Ren Z, Tang J, Yin D (2018) A path-constrained framework for discriminating substitut-
able and complementary products in e-commerce. In: Proceedings of the Eleventh ACM International
Conference on Web Search and Data Mining, pp. 619–627
Wang Z, Liu X, Yang P, Liu S, Wang Z (2021b) Cross-lingual text classification with heterogeneous graph
neural network
Wang P, Agarwal K, Ham C, Choudhury S, Reddy CK (2021a) Self-supervised learning of contextual embed-
dings for link prediction in heterogeneous networks. In: Proceedings of the Web Conference 2021, pp.
2946–2957
Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIG-
KDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234. Associa-
tion for Computing Machinery, New York, NY, USA

13
L. Waikhom, R. Patgiri

Wang Y, Fass J, Stern C, hodera J (2019b) Luolibrary for graph neural networks in jaxK. Graph nets for
partial charge prediction. arXiv:​1909.​07903
Wang H, Xu T, Liu Q, Lian D, Chen E, Du D, Wu H, Su W (2019c) MCNE: an end-to-end framework for
learning multiple conditional network representations of social network. In: Proceedings of the 25th
ACM SIGKDD international conference on knowledge discovery & data mining, pp. 1064–1072.
Association for computing machinery, New York, NY, USA
Wang Y, Min Y, Chen X, Wu J (2021c) Multi-view graph contrastive representation learning for drug-
drug interaction prediction. In: Proceedings of the Web Conference 2021, pp. 2921–2933
Wink AM, Tijms BM, Ten Kate M, Raspor E, de Munck JC, Altena E, Ecay-Torres M, Clerigue M, Estanga
A, Garcia-Sebastian M et al (2018) Functional brain network centrality is related to APOE genotype
in cognitively normal elderly. Brain Behav 8(9):01080
Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K (2019) Simplifying graph convolutional networks.
Proceedings of the 36th international conference on machine learning, vol 97, pp 6861–6871. PMLR
Wu Y, Song Y, Huang H, Ye F, Xie X, Jin H (2021a) Enhancing graph neural networks via auxiliary train-
ing for semi-supervised node classification. Knowl-Based Syst 220:106884
Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2021b) A comprehensive survey on graph neural networks.
IEEE Trans Neural Netw Learn Syst 32(1):4–24. https://​doi.​org/​10.​1109/​TNNLS.​2020.​29783​86
Wu X, Cheng Q (2021) Deepened graph auto-encoders help stabilize and enhance link prediction
Wu S, Dredze M (2019) Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In: Pro-
ceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 833–844.
Association for Computational Linguistics
Wu J, Wang X, Feng F, He X, Chen L, Lian J, Xie X (2021c) Self-supervised graph learning for recommen-
dation. In: Proceedings of the 44th international ACM SIGIR conference on research and develop-
ment in information retrieval, pp. 726–735
Xu X, Pang G, Wu D, Shang M (2022) Joint hyperbolic and Euclidean geometry contrastive graph neural
networks. Inf Sci
Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks?
Xu B, Shen H, Cao Q, Qiu Y, Cheng X (2019b) Graph wavelet neural network
Xu Q-H, Li Q-Y, Yu K, Ge Q-M, Shi W-Q, Li B, Liang R-B, Lin Q, Zhang Y-Q, Shao Y (2020b) Altered
brain network centrality in patients with diabetic optic neuropathy: a resting-state FMRI study.
Endocr Pract 26(12):1399–1405
Xu B, Lin Y, Tang X, Li S, Shen L, Sun N, Pan DZ (2019a) Wellgan: Generative-adversarial-network-guided
well generation for analog/mixed-signal circuit layout. In: 2019 56th ACM/IEEE Design Automation
Conference (DAC), pp. 1–6. IEEE
Xu D, Zhu Y, Choy CB, Fei-Fei L (2017) Scene graph generation by iterative message passing. In: Proceed-
ings of the IEEE conference on computer vision and pattern recognition, pp. 5410–5419
Xu Y, Li M, Cui L, Huang S, Wei F, Zhou M (2020a) Layoutlm: Pre-training of text and layout for docu-
ment image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining, pp. 1192–1200. Association for Computing Machinery, New
York, NY, USA
Yang L, Li L, Zhang Z, Zhou X, Zhou E, Liu Y (2020b) Dpgn: Distribution propagation graph network for few-
shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
pp. 13390–13399
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recog-
nition. In: Thirty-second AAAI conference on artificial intelligence, pp. 744–7452. AAAI Press New
Orleans, USA
Yang Z, Cohen W, Salakhudinov R (2016) Revisiting semi-supervised learning with graph embeddings. In: Pro-
ceedings of the 33rd international conference on machine learning, vol 48, pp 40–48. PMLR
Yang L, Gu J, Wang C, Cao X, Zhai L, Jin D, Guo Y (2020a) Toward unsupervised graph neural network: inter-
active clustering and embedding via optimal transport. In: 2020 IEEE international conference on data
mining (ICDM), pp. 1358–1363. IEEE
Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M
et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model
59(8):3370–3388
Yang J, Leskovec J (2015) Defining and evaluating network communities based on ground-truth. Knowl Inf
Syst 42(1):181–213
Yang X, Yumer E, Asente P, Kraley M, Kifer D, Lee Giles C (2017) Learning to extract semantic structure from
documents using multimodal fully convolutional neural networks. In: Proceedings of the IEEE confer-
ence on computer vision and pattern recognition, pp. 5315–5324

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
