GNN Foundations Frontiers and Applications Chapter 2
Peng Cui, Lingfei Wu, Jian Pei, Liang Zhao and Xiao Wang
Many complex systems take the form of graphs, such as social networks, biological
networks, and information networks. It is well recognized that graph data is often
sophisticated and thus is challenging to deal with. To process graph data effectively,
the first critical challenge is to find effective graph data representation, that is, how
to represent graphs concisely so that advanced analytic tasks, such as pattern discov-
ery, analysis, and prediction, can be conducted efficiently in both time and space.
Liang Zhao
Department of Computer Science, Emory University, e-mail: [email protected]
Lingfei Wu
JD.COM Silicon Valley Research Center, e-mail: [email protected]
Peng Cui
Department of Computer Science, Tsinghua University, e-mail: [email protected]
Jian Pei
Department of Computer Science, Simon Fraser University, e-mail: [email protected]
Xiao Wang
Department of Computer Science, Beijing University of Posts and Telecommunications, e-mail:
[email protected]
Graph representation learning addresses this challenge by learning low-dimensional vector representations of nodes for graph inference. After the representation is obtained, downstream tasks such as
node classification, node clustering, graph visualization, and link prediction can be
dealt with based on these representations. Overall, there are three main categories of
graph representation learning methods: traditional graph embedding, modern graph
embedding, and graph neural networks, which will be introduced separately in the
following three sections.
In real graphs, however, node proximities are usually not directly defined. For example, an edge between two nodes usually just implies there is
a relationship between them, but cannot indicate the specific proximity. Also, even
if there is no edge between two nodes, we cannot say the proximity between these
two nodes is zero. The definition of node proximities depends on specific analytic
tasks and application scenarios. Therefore, modern graph embedding usually incor-
porates rich information, such as network structures, properties, side information
and advanced information, to facilitate different problems and applications. Modern
graph embedding needs to target both of the goals mentioned above. In view of this,
traditional graph embedding can be regarded as a special case of modern graph em-
bedding, and the recent research progress on modern graph embedding pays more
attention to network inference.
To support network inference well, modern graph embedding considers much richer
information in a graph. According to the types of information that are preserved in
graph representation learning, the existing methods fall into three
categories: (1) graph structures and properties preserving graph embedding, (2)
graph representation learning with side information and (3) advanced information
preserving graph representation learning. From a technical perspective, different models are
adopted to incorporate different types of information or address different goals. The
commonly used models include matrix factorization, random walk, deep neural net-
works and their variations.
Among all the information encoded in a graph, graph structures and properties are
two crucial factors that largely affect graph inference. Thus, one basic requirement
of graph representation learning is to appropriately preserve graph structures and
capture properties of graphs. Often, graph structures include first-order structures
and higher-order structures, such as second-order structures and community struc-
tures. Different types of graphs have different properties. For example, directed
graphs have the asymmetric transitivity property. The structural balance theory is
widely applicable to signed graphs.
Graph structures can be categorized into different groups that exist at differ-
ent granularities. The commonly exploited graph structures in graph representation
learning include neighborhood structure, high-order node proximity and graph com-
munities.
How to define the neighborhood structure in a graph is the first challenge. Based
on the discovery that the distribution of nodes appearing in short random walks is
similar to the distribution of words in natural language, DeepWalk (Perozzi et al,
2014) employs random walks to capture the neighborhood structure. Then, for
each walk sequence generated by random walks, following Skip-Gram, DeepWalk
aims to maximize the probability of the neighbors of a node in a walk sequence.
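To make this concrete, here is a minimal DeepWalk-style sketch in Python; the walk count, walk length, window size, and embedding dimension below are illustrative choices, not the settings of the original paper:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(graph, start, walk_length):
    """Generate one truncated random walk starting at `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]  # gensim expects string tokens

graph = nx.karate_club_graph()
walks = [random_walk(graph, node, walk_length=10)
         for _ in range(20)            # 20 walks starting from each node
         for node in graph.nodes()]

# Skip-Gram (sg=1) over the walk sequences, treating walks as sentences
model = Word2Vec(walks, vector_size=32, window=5, min_count=0, sg=1)
vector = model.wv["0"]                 # 32-dimensional embedding of node 0
```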
Node2vec (Grover and Leskovec, 2016) defines a flexible notion of a node’s graph neighborhood and designs
a second-order random walk strategy to sample the neighborhood nodes, which
can smoothly interpolate between breadth-first sampling (BFS) and depth-first sam-
pling (DFS). Besides the neighborhood structure, LINE (Tang et al, 2015b) is pro-
posed for large-scale network embedding, which can preserve the first- and second-
order proximities. The first-order proximity is the observed pairwise proximity be-
tween two nodes. The second-order proximity is determined by the similarity of
the “contexts” (neighbors) of two nodes. Both are important in measuring the re-
lationships between two nodes. Essentially, LINE is a shallow model and,
consequently, its representation ability is limited. SDNE (Wang et al, 2016) pro-
poses a deep model for network embedding, which also aims at capturing the first
and second-order proximities. SDNE uses a deep auto-encoder architecture with
multiple non-linear layers to preserve the second order proximity. To preserve the
first-order proximity, the idea of Laplacian eigenmaps (Belkin and Niyogi, 2002)
is adopted. Wang et al (2017g) propose a modularized nonnegative matrix factor-
ization (M-NMF) model for graph representation learning, which aims to preserve
both the microscopic structure, i.e., the first-order and second-order proximities of
nodes, and the mesoscopic community structure (Girvan and Newman, 2002). They
adopt the NMF model (Févotte and Idier, 2011) to preserve the microscopic struc-
ture. Meanwhile, the community structure is detected by modularity maximization
(Newman, 2006a). Then, they introduce an auxiliary community representation ma-
trix to bridge the representations of nodes with the community structure. In this
way, the learned representations of nodes are constrained by both the microscopic
structure and community structure.
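As an illustration of the deep models above, the SDNE architecture described earlier (auto-encoder reconstruction for the second-order proximity plus a Laplacian-eigenmaps term for the first-order proximity) can be sketched roughly as follows in PyTorch; the layer sizes, loss weights, and toy graph are illustrative assumptions, and the original model includes additional regularization:

```python
import torch
import torch.nn as nn

class SDNE(nn.Module):
    """Simplified SDNE: a deep auto-encoder over adjacency rows."""
    def __init__(self, n_nodes, dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_nodes, 256), nn.ReLU(),
                                     nn.Linear(256, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_nodes))

    def forward(self, adj_rows):
        z = self.encoder(adj_rows)      # node embeddings
        return z, self.decoder(z)       # embeddings and reconstructed rows

def sdne_loss(adj, z, recon, alpha=1.0, beta=5.0):
    # Second-order proximity: reconstruct each adjacency row, weighting
    # observed edges more heavily (factor beta) than zero entries.
    weight = torch.ones_like(adj)
    weight[adj > 0] = beta
    loss_2nd = (weight * (recon - adj) ** 2).sum()
    # First-order proximity (Laplacian eigenmaps idea): linked nodes
    # should lie close together in the embedding space.
    sq_dist = ((z.unsqueeze(0) - z.unsqueeze(1)) ** 2).sum(-1)
    loss_1st = (adj * sq_dist).sum()
    return loss_2nd + alpha * loss_1st

# Toy usage on a random symmetric graph with 34 nodes
adj = (torch.rand(34, 34) < 0.1).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)
model = SDNE(n_nodes=34, dim=16)
z, recon = model(adj)
sdne_loss(adj, z, recon).backward()
```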
In summary, many network embedding methods aim to preserve the local struc-
ture of a node, including neighborhood structure, high-order proximity as well as
community structure, in the latent low-dimensional space. Both linear and non-
linear models have been attempted, demonstrating the large potential of deep models in
network embedding.
Transitivity usually exists in graphs. Preserving this property may seem easy, because in a
metric space the distances between data points naturally satisfy the triangle
inequality. However, transitivity does not always hold in the real world. Ou et al (2015) aim
to preserve the non-transitivity property via latent similarity components. The non-
transitivity property states that, for nodes v1, v2 and v3 in a graph where (v1, v2)
and (v2, v3) are similar pairs, (v1, v3) may be a dissimilar pair. For example, in a
social network, a student may connect with his classmates and his family, while his
classmates and family are probably very different. The main idea is that they learn
multiple node embeddings, and then compare different nodes based on multiple
similarities, rather than one similarity. They observe that if two nodes have a large
semantic similarity, at least one of the structure similarities is large; otherwise, all
of the similarities are small. A directed graph usually has the asymmetric transitivity
property. Asymmetric transitivity indicates that, if there is a directed edge
from node i to node j and a directed edge from j to k, there is likely a directed edge
from i to k, but not from k to i. In order to measure this high-order proximity, HOPE
(Ou et al, 2016) summarizes four measurements in a general formulation, and then
utilizes a generalized SVD problem to factorize the high-order proximity (Paige and
Saunders, 1981), such that the time complexity of HOPE is largely reduced, which
means HOPE is scalable to large-scale networks. A signed graph, with both
positive and negative edges, follows social theories, such as structural balance theory
(Cartwright and Harary, 1956; Cygan et al, 2012), that are very different from those of
unsigned graphs. The structural balance theory demonstrates that users in a signed
social network should be able to have their “friends” closer than their “foes”. To
model the structural balance phenomenon, SiNE (Wang et al, 2017f) utilizes a deep
learning model consisting of two deep networks with non-linear functions.
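To illustrate the idea behind HOPE discussed above, the following sketch factorizes the Katz proximity, one of the four measurements, with a plain truncated SVD; the actual method uses a generalized SVD that avoids materializing the proximity matrix, and the decay factor beta here is an illustrative choice:

```python
import numpy as np

def hope_katz(adj, dim, beta=0.1):
    """Factorize the Katz proximity S = (I - beta*A)^{-1} (beta*A)
    of a directed graph into source and target embeddings."""
    n = adj.shape[0]
    S = np.linalg.inv(np.eye(n) - beta * adj) @ (beta * adj)
    U, sigma, Vt = np.linalg.svd(S)
    scale = np.sqrt(sigma[:dim])
    return U[:, :dim] * scale, Vt[:dim].T * scale  # source, target

# Toy chain i -> j -> k: the recovered proximity is asymmetric,
# so an edge i -> k is deemed more plausible than k -> i.
adj = np.array([[0., 1., 0.],
                [0., 0., 1.],
                [0., 0., 0.]])
src, tgt = hope_katz(adj, dim=2)
print(src[0] @ tgt[2], src[2] @ tgt[0])  # first value exceeds the second
```

Each node thus gets two vectors, one as an edge source and one as an edge target, which is what allows the learned proximity to be asymmetric.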
The importance of maintaining network properties in network embedding space,
especially the properties that largely affect the evolution and formation of networks,
has been well recognized. The key challenge is how to address the disparity and het-
erogeneity of the original network space and the embedding vector space at property
level. Generally, most of the structure and property preserving methods take high
order proximities of nodes into account, which demonstrate the importance of pre-
serving high order structures in network embedding. The difference is the strategy
of obtaining the high order structures. Some methods implicitly preserve highorder
structure by assuming a generative mechanism from a node to its neighbors, while
some other methods realize this by explicitly approximating high-order proximities
in the embedding space. As topology structures are the most notable characteristic
of networks, structure-preserving network methods embody a large part of the lit-
erature. Comparatively, property preserving network embedding is a relatively new
research topic that has received much less study. As network properties usually drive the
formation and evolution of networks, this direction shows great potential for future research and
applications.
Different from side information, the advanced information refers to the supervised
or pseudo-supervised information in a specific task. The advanced information pre-
serving network embedding usually consists of two parts. One is to preserve the
network structure so as to learn the representations of nodes. The other is to estab-
lish the connection between the representations of nodes and the target task. The
combination of advanced information and network embedding techniques enables
representation learning for networks.
Information Diffusion. Information diffusion (Guille et al, 2013) is a ubiquitous
phenomenon on the web, especially in social networks. Bourigault et al (2014) pro-
pose a graph representation learning algorithm for predicting information diffusion
in social networks. The goal of the proposed algorithm is to learn the representations
of nodes in the latent space such that the diffusion kernel can best explain the cas-
cades in the training set. The basic idea is to map the observed information diffusion
process into a heat diffusion process modeled by a diffusion kernel in the continu-
ous space. The kernel encodes the intuition that the closer a node is to the source
node in the latent space, the sooner it is infected by information from the source node. The
cascade prediction problem here is defined as predicting the increment of cascade
size after a given time interval (Li et al, 2017a). Li et al (2017a) argue that the pre-
vious work on cascade prediction all depends on bags of hand-crafted features
to represent the cascade and graph structures. Instead, they present an end-to-end
deep learning model to solve this problem using the idea of graph embedding. The
whole procedure is able to learn the representation of the cascade graph in an end-to-end
manner.
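The heat-diffusion intuition can be illustrated with a toy ranking rule; the latent positions below are random placeholders, whereas Bourigault et al (2014) learn them from observed cascades:

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 2))   # latent node positions (placeholders)

def predicted_infection_order(latent, source):
    """Rank nodes by latent distance to the source: the diffusion
    kernel predicts that closer nodes are infected sooner."""
    distances = np.linalg.norm(latent - latent[source], axis=1)
    return np.argsort(distances)

print(predicted_infection_order(latent, source=0))  # source ranks first
```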
Anomaly Detection. Anomaly detection has been widely investigated in previous
work (Akoglu et al, 2015). Anomaly detection in graphs aims to infer structural
inconsistencies, i.e., anomalous nodes that connect to various diverse
influential communities (Hu et al, 2016; Burt, 2004). Hu et al (2016) propose a
graph embedding based method for anomaly detection. They assume that the com-
munity memberships of two linked nodes should be similar. An anomalous node is
one that connects to a set of different communities. Since the learned embeddings of
nodes capture the correlations between nodes and communities, based on the em-
bedding, they propose a new measure to indicate the anomalousness level of a node.
The larger the value of the measure, the more likely the node is to be anomalous.
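One simple way to instantiate such a measure, as an illustrative stand-in rather than the exact formula of Hu et al (2016), is to score each node by the entropy of its neighbors' community memberships, so that nodes linking into many diverse communities score high:

```python
import numpy as np

def anomaly_scores(adj, communities):
    """Entropy of each node's neighbor-community distribution.
    `communities[i]` is the (precomputed) community label of node i."""
    n_comm = communities.max() + 1
    scores = np.zeros(adj.shape[0])
    for i in range(adj.shape[0]):
        neighbors = np.flatnonzero(adj[i])
        if neighbors.size == 0:
            continue
        counts = np.bincount(communities[neighbors], minlength=n_comm)
        p = counts[counts > 0] / counts.sum()
        scores[i] = -(p * np.log(p)).sum()  # higher = more anomalous
    return scores

# Node 4 links into three different communities and scores highest.
adj = np.array([[0, 1, 0, 0, 1],
                [1, 0, 0, 0, 0],
                [0, 0, 0, 1, 1],
                [0, 0, 1, 0, 1],
                [1, 0, 1, 1, 0]])
communities = np.array([0, 0, 1, 2, 3])
print(anomaly_scores(adj, communities))
```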
Graph Alignment. The goal of graph alignment is to establish the correspon-
dence between the nodes from two graphs, i.e., to predict the anchor links across
two graphs. The same users who are shared by different social networks naturally
form the anchor links, and these links bridge the different graphs. The anchor link
prediction problem is, given a source graph, a target graph, and a set of observed
anchor links, to identify the hidden anchor links across the two graphs. Man et al
(2016) propose a graph representation learning algorithm to solve this problem. The
learned representations can preserve the graph structures and respect the observed
anchor links.
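A common way to realize this supervised component, sketched here under simplifying assumptions rather than as Man et al (2016)'s exact architecture, is to learn a mapping from the source embedding space to the target space using the observed anchor links:

```python
import torch
import torch.nn as nn

# Placeholder embeddings for the two graphs, e.g., produced by any
# structure-preserving embedding method discussed earlier.
z_src = torch.randn(100, 32)
z_tgt = torch.randn(100, 32)
anchors = torch.tensor([[0, 0], [3, 7], [12, 5]])  # observed (src, tgt) pairs

mapping = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
optimizer = torch.optim.Adam(mapping.parameters(), lr=1e-3)

for _ in range(200):  # pull mapped source anchors toward their targets
    loss = ((mapping(z_src[anchors[:, 0]]) - z_tgt[anchors[:, 1]]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Hidden anchor links can then be predicted by nearest-neighbor search:
# for source node i, rank target nodes by distance to mapping(z_src[i]).
```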
Advanced information preserving graph embedding usually consists of two parts.
One is to preserve the graph structures so as to learn the representations of nodes.
The other is to establish the connection between the representations of nodes and the
target task. The first one is similar to structure and property preserving network em-
bedding, while the second one usually needs to consider the domain knowledge of a
specific task. The domain knowledge encoded by the advanced information makes
it possible to develop end-to-end solutions for network applications. Compared with
the hand-crafted network features, such as numerous network centrality measures,
the combination of advanced information and network embedding techniques en-
ables representation learning for networks. Many network applications may benefit
from this new paradigm.
Over the past decade, deep learning has become the “crown jewel” of artificial intel-
ligence and machine learning, showing superior performance in acoustics, images
and natural language processing, etc. Although it is well known that graphs are ubiq-
uitous in the real world, it is very challenging to utilize deep learning methods to
analyze graph data. This problem is non-trivial because of the following challenges:
(1) Irregular structures of graphs. Unlike images, audio, and text, which have a clear
grid structure, graphs have irregular structures, making it hard to generalize some
of the basic mathematical operations to graphs. For example, defining convolution
and pooling operations, which are the fundamental operations in convolutional neu-
ral networks (CNNs), for graph data is not straightforward. (2) Heterogeneity and
diversity of graphs. A graph itself can be complicated, containing diverse types and
properties. These diverse types, properties, and tasks require different model archi-
tectures to tackle specific problems. (3) Large-scale graphs. In the big-data era, real
graphs can easily have millions or billions of nodes and edges. How to design scal-
able models, preferably models that have a linear time complexity with respect to the
graph size, is a key problem. (4) Incorporating interdisciplinary knowledge. Graphs
are often connected to other disciplines, such as biology, chemistry, and social sci-
ences. This interdisciplinary nature provides both opportunities and challenges: do-
main knowledge can be leveraged to solve specific problems but integrating domain
knowledge can complicate model designs.
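Regarding challenge (1), one widely adopted answer is the propagation rule of graph convolutional networks (Kipf and Welling, 2017), which defines convolution as neighborhood aggregation through a symmetrically normalized adjacency matrix; a minimal sketch:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, h):
        a_hat = adj + torch.eye(adj.shape[0])        # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(norm @ self.linear(h))     # aggregate + transform

adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 0.],
                    [1., 0., 0.]])
h = torch.randn(3, 8)                                # 8 input features per node
out = GCNLayer(8, 4)(adj, h)                         # 4 output features per node
```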
Graph neural networks have attracted considerable research attention
over the past several years. The adopted architectures and training strategies vary
greatly, ranging from supervised to unsupervised and from convolutional to re-
cursive, including graph recurrent neural networks (Graph RNNs), graph convo-
lutional networks (GCNs), graph autoencoders (GAEs), graph reinforcement learn-
ing (Graph RL), and graph adversarial methods. Specifically, Graph RNNs
capture recursive and sequential patterns of graphs by modeling states at either the node level or the graph level.
2.5 Summary