Word2vec, Node2vec, Graph2vec, X2vec - Towards A Theory of Vector Embeddings of Structured Data
Vector embeddings can bridge the gap between the "discrete" world of relational data and the "differentiable" world of machine learning and for this reason have a great potential for database research. Yet relatively little work has been done on embeddings of relational data beyond the binary relations of knowledge graphs. Throughout the paper, I will try to point out potential directions for database related research questions on vector embeddings.

A vector embedding for a class X of objects is a mapping f from X into some vector space, called the latent space, which we usually assume to be a real vector space R^d of finite dimension d. The idea is to define a vector embedding in such a way that geometric relationships in the latent space reflect semantic relationships between the objects in X. Most importantly, we want similar objects in X to be mapped to vectors close to one another with respect to some standard metric on the latent space (say, Euclidean). For example, in an embedding of words of a natural language we want words with similar meanings, like "shoe" and "boot", to be mapped to vectors that are close to each other. Sometimes, we want further-reaching correspondences between properties of and relations between objects in X and the geometry of their images in latent space. For example, in an embedding f of the entities of a knowledge base, among them Paris, France, Santiago, Chile, we may want t := f(Paris) − f(France) to be (approximately) equal to f(Santiago) − f(Chile), so that the relation is-capital-of corresponds to the translation by the vector t in latent space.

A difficulty is that the semantic relationships and similarities between the objects in X can rarely be quantified precisely. They usually only have an intuitive meaning that, moreover, may be application dependent. However, this is not necessarily a problem, because we can learn vector representations in such a way that they yield good results when we use them to solve machine learning tasks (so-called downstream tasks). This way, we never have to make the semantic relationships explicit. As a simple example, we may use a nearest-neighbour based classification algorithm on the vectors our embedding gives us; if it performs well then the distance between vectors must be relevant for this classification task. This way, we can even use vector embeddings, trained to perform well on certain machine learning tasks, to define semantically meaningful distance measures on our original objects, that is, to define the distance dist_f(X, Y) between objects X, Y ∈ X to be ∥f(X) − f(Y)∥. We call dist_f the distance measure induced by the embedding f.

In this paper, the objects X ∈ X we want to embed either are graphs, possibly labelled or weighted, or more generally relational structures, or they are nodes of a (presumably large) graph or more generally elements or tuples appearing in a relational structure. When we embed entire graphs or structures, we speak of graph embeddings or relational structure embeddings; when we embed only nodes or elements we speak of node embeddings. These two types of embeddings are related, but there are clear differences. Most importantly, in node embeddings there are explicit relations such as adjacency and derived relations such as distance between the objects of X (the nodes of a graph), whereas in graph embeddings all relations between objects are implicit or "semantic", for example "having the same number of vertices" or "having the same girth" (see Figure 1).

The key theoretical questions we will ask about vector embeddings of objects in X are the following.

Expressivity: Which properties of objects X ∈ X are represented by the embedding? What is the meaning of the induced distance measure? Are there geometric properties of the latent space that represent meaningful relations on X?

Complexity: What is the computational cost of computing the vector embedding? What are efficient embedding algorithms? How can we efficiently retrieve semantic information of the embedded data, for example, answer queries?

A third question that relates to both expressivity and complexity is what dimension to choose for the latent space. In general, we expect a trade-off between (high) expressivity and (low) dimension, but it may well be that there is an inherent dimension of the data set. It is an appealing idea (see, for example, [98]) to think of "natural" data sets appearing in practice as lying on a low dimensional manifold in high dimensional space. Then we can regard the dimension of this manifold as the inherent dimension of the data set.

Reasonably well-understood from a theoretical point of view are node embeddings of graphs that aim to preserve distances between nodes, that is, embeddings f : V(G) → R^d of the vertex set V(G) of some graph G such that dist_G(x, y) ≈ ∥f(x) − f(y)∥, where dist_G is the shortest-path distance in G. There is a substantial theory of such metric embeddings (see [64]). In many applications of node embeddings, metric embeddings are indeed what we need.

However, the metric is only one aspect of the information carried by a graph or relational structure, and arguably not the most important one from a database perspective. Moreover, if we consider graph embeddings rather than node embeddings, there is no metric to start with. In this paper, we are concerned with structural vector embeddings of graphs, relational structures, and their nodes. Two theoretical ideas that have been shown to help in understanding and even designing vector embeddings of structures are the Weisfeiler-Leman algorithm and various concepts in its context, and homomorphism vectors, which can be seen as a general framework for defining "structural" (as opposed to "metric") embeddings. We will see that these theoretical concepts have a rich theory that connects them to the embedding techniques used in practice in various ways.

The rest of the paper is organised as follows. Section 2 is a very brief survey of some of the embedding techniques that can be found in the machine learning and knowledge representation literature. In Section 3, we introduce the Weisfeiler-Leman algorithm. This algorithm, originally a graph isomorphism test, turns out to be an important link between the embedding techniques described in Section 2 and the theory of homomorphism vectors, which will be discussed in detail in Section 4. Finally, Section 5 is devoted to a discussion of similarity measures for graphs and structures.

2 EMBEDDING TECHNIQUES
In this section, we give a brief and selective overview of embedding techniques. More thorough recent surveys are [50] (on node embeddings), [104] (on graph neural networks), [102] (on knowledge graph embeddings), and [61] (on graph kernels).
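Before turning to specific techniques, here is a minimal Python sketch (with made-up toy data) of the induced distance measure dist_f and the nearest-neighbour idea from the introduction; the embedding f is simply assumed to be given as a function returning vectors.

```python
import numpy as np

def induced_distance(f, x, y):
    """dist_f(x, y) := ||f(x) - f(y)||, the distance induced by an embedding f."""
    return float(np.linalg.norm(f(x) - f(y)))

def nearest_neighbour_label(f, labelled_objects, query):
    """Classify `query` by the label of its nearest neighbour under dist_f."""
    return min(labelled_objects, key=lambda ol: induced_distance(f, ol[0], query))[1]

# Toy example: a hypothetical word embedding given as a lookup table.
toy_vectors = {"shoe": np.array([0.9, 0.1]), "boot": np.array([0.85, 0.2]),
               "apple": np.array([0.1, 0.95])}
f = lambda w: toy_vectors[w]
print(induced_distance(f, "shoe", "boot"))   # small: similar meanings
print(nearest_neighbour_label(f, [("shoe", "clothing"), ("apple", "fruit")], "boot"))
```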
value obtained from the neighbours and the current state of the node as inputs and computes the new state of the node. In a simple form, we may take the following functions:

    Aggregate:  a_v^{(t+1)} ← W_agg · Σ_{w ∈ N(v)} x_w^{(t)},          (2.1)

    Update:     x_v^{(t+1)} ← σ( W_up · (x_v^{(t)} ; a_v^{(t+1)}) ),    (2.2)

where W_agg ∈ R^{c×d} and W_up ∈ R^{d×(c+d)} are learned parameter matrices, (x_v^{(t)} ; a_v^{(t+1)}) denotes the concatenation of the two vectors, and σ is a nonlinear "activation" function, for example the ReLU (rectified linear unit) function σ(x) := max{0, x} applied pointwise to a vector. It is important to note that the parameter matrices W_agg and W_up do not depend on the node v; they are shared across all nodes of a graph. This parameter sharing allows us to use the same GNN model for graphs of arbitrary sizes.

Of course, we can also use more complicated aggregation and update functions. We only want these functions to be differentiable to be able to use gradient descent optimisation methods in the training phase, and we want the aggregation function to be symmetric in its arguments x_w^{(t)} for w ∈ N(v) to make sure that the GNN computes a function that is isomorphism invariant. For example, in [100] we use a linear aggregation function and an update function computed by an LSTM (long short-term memory, [51]), a specific recurrent neural network component that allows it to "remember" relevant information from the sequence x_v^{(0)}, x_v^{(1)}, ..., x_v^{(t)}.

The computation of such a GNN model starts from an initial configuration (x_v^{(0)})_{v∈V} and proceeds through a fixed number t of aggregation and update steps, resulting in a final configuration (x_v^{(t)})_{v∈V}. Note that this configuration gives us a node embedding v ↦ x_v^{(t)} of the input graph. We can also stack several such GNN layers, each with its own aggregation and activation function, on top of one another, using the final configuration of each (but the last) layer as the initial configuration of the following layer and the final configuration of the last layer as the node embedding. As initial states, we can take constant vectors like the all-ones vector for each node, or we can assign a random initial state to each node. We can also use the initial state to represent the node labels if the input graph is labelled.

To train a GNN for computing a node embedding, in principle we can use any of the loss functions used by the embedding techniques described in Section 2.1. The reader may wonder what advantage the complicated GNN architecture has over just optimising the embedding matrix X (as the methods described in Section 2.1 do). The main advantage is that the GNN method is inductive, whereas the previously described methods are transductive. This means that a GNN represents a function that we can apply to arbitrary graphs, not just to the graph it was originally trained on. So if the graph changes over time and, for example, nodes are added, we do not have to re-train the embedding, but just embed the new nodes using the GNN model we already have, which is much more efficient. We can even apply the model to an entirely new graph and still hope it gives us a reasonable embedding. The most prominent example of an inductive node-embedding tool based on GNNs is GraphSage [49].

Let me close this section by remarking that GNNs are used for all kinds of machine learning tasks on graphs and not only to compute node embeddings. For example, a GNN based architecture for graph classification would plug the output of the GNN layer(s) into a standard feedforward network (possibly consisting only of a single softmax layer).

2.3 Knowledge Graph and Relational Structure Embeddings
Node embeddings of knowledge graphs have also been studied quite intensely in recent years, remarkably by a community that seems almost disjoint from that involved in the node embedding techniques described in Section 2.1. What makes knowledge graphs somewhat special is that they come with labelled edges (or, equivalently, many different binary relations) as well as labelled nodes. It is not completely straightforward to adapt the methods of Section 2.1 to edge- and vertex-labelled graphs. Another important difference is in the objective function: the methods of Section 2.1 mainly focus on the graph metric (even though approaches based on random walks like node2vec are flexible and also incorporate structural criteria). However, shortest-path distance is less relevant in knowledge graphs.

Rather than focussing on distances, knowledge graph embeddings focus on establishing a correspondence between the relations of the knowledge graph and geometric relationships in the latent space. A very influential algorithm, TransE [18], aims to associate a specific translation of the latent space with each relation. Recall the example of the introduction, where entities Paris, France, Santiago, Chile were supposed to be embedded in such a way that x_Paris − x_France ≈ x_Santiago − x_Chile, so that the relation is-capital-of corresponds to the translation by t := x_Paris − x_France.

Another way of mapping relations to geometric relationships is implemented in Rescal [83]. Here the idea is to associate a bilinear form β_R with each relation R in such a way that for all entities v, w it holds that β_R(x_v, x_w) ≈ 1 if (v, w) ∈ R and β_R(x_v, x_w) ≈ 0 if (v, w) ∉ R. We can represent such a bilinear form β_R by a matrix B_R such that β_R(x, y) = x^⊤ B_R y. Then the objective is to minimise, simultaneously for all R, the term ∥X B_R X^⊤ − A_R∥, where X is the embedding matrix with rows x_v and A_R is the adjacency matrix of the relation R. Note that this is a multi-relational version of the matrix-factorisation approach described in Section 2.1, with the additional twist that we also need to find the matrix B_R for each relation R.

Completing our remarks on knowledge graph embeddings, we mention that it is fairly straightforward to generalise the GNN based node embeddings to (vertex- and edge-)labelled graphs and hence to knowledge graphs [91].
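As an illustration of the two scoring ideas just described, here is a small, hypothetical Python sketch (toy dimension and randomly initialised parameters, not the original systems): a TransE-style translation score and a Rescal-style bilinear score.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # latent dimension (toy choice)
entities = ["Paris", "France", "Santiago", "Chile"]
x = {e: rng.normal(size=d) for e in entities}   # entity embeddings x_v
t = rng.normal(size=d)                   # TransE: translation for is-capital-of
B = rng.normal(size=(d, d))              # Rescal: matrix B_R of the bilinear form

def transe_score(v, w):
    """TransE-style: small if x_v - x_w is close to the relation translation t,
    as in the Paris/France example (t := x_Paris - x_France)."""
    return float(np.linalg.norm(x[v] - x[w] - t))

def rescal_score(v, w):
    """Rescal-style: beta_R(x_v, x_w) = x_v^T B_R x_w, ideally close to 1 for facts."""
    return float(x[v] @ B @ x[w])

# In training, one would adjust x, t and B so that facts such as
# (Paris, is-capital-of, France) score well and non-facts do not.
print(transe_score("Paris", "France"), rescal_score("Paris", "France"))
```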
While there is a large body of work on embedding knowledge graphs, that is, binary relational structures, not much is known about embedding relations of higher arities. Of course one approach to embedding relational structures of higher arities is to transform them into their binary incidence structures (see Section 4.2 for a definition) and then embed these using any of the methods available for binary structures. Currently, I am not aware of any empirical studies on the practical viability of this approach. An alternative approach [16, 17] is based on the idea of treating the rows of a table, that is, tuples in a relation, like sentences in natural language and then using word embeddings to embed the entities.

techniques discussed before, they only play a minor role. For graph embeddings, we are in the opposite situation: kernels are the dominant technique. However, there are a few other approaches.
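Before moving on to the Weisfeiler-Leman algorithm, here is a minimal NumPy sketch of the simple GNN layer given by equations (2.1) and (2.2) in Section 2.2; the weight matrices are random stand-ins for learned parameters, and the graph is a toy example.

```python
import numpy as np

def gnn_layer(adj, X, W_agg, W_up):
    """One round of the simple GNN layer (2.1)/(2.2):
    a_v = W_agg @ (sum of neighbour states), x_v' = relu(W_up @ [x_v ; a_v])."""
    neighbour_sum = adj @ X                          # row v = sum_{w in N(v)} x_w
    Agg = neighbour_sum @ W_agg.T                    # a_v^{(t+1)}, shape (n, c)
    Z = np.concatenate([X, Agg], axis=1) @ W_up.T    # W_up applied to [x_v ; a_v]
    return np.maximum(Z, 0.0)                        # pointwise ReLU

# Toy example: a path on 4 vertices, constant all-ones initial states.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
n, d, c = 4, 3, 2
rng = np.random.default_rng(1)
X = np.ones((n, d))
W_agg, W_up = rng.normal(size=(c, d)), rng.normal(size=(d, c + d))
for _ in range(3):                                   # three aggregation/update rounds
    X = gnn_layer(adj, X, W_agg, W_up)
print(X)                                             # node embedding v -> x_v^{(3)}
```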
1-WL
Input: Graph G
Refinement Round: For all colours c in the current colouring and all nodes v, w of colour c, the nodes v and w get different colours in the new colouring if there is some colour d such that v and w have different numbers of neighbours of colour d.
The refinement is repeated until the colouring is stable, then the stable colouring is returned.
Algorithm 1: The 1-dimensional WL algorithm

Figure 3: A run of 1-WL ((a) initial graph, (b) colouring after round 1, (c) colouring after round 2, (d) stable colouring after round 3)

the two graphs. Unfortunately, 1-WL does not distinguish all non-isomorphic graphs. For example, it does not distinguish a cycle of length 6 from the disjoint union of two triangles. But, remarkably, 1-WL does distinguish almost all graphs, in a precise probabilistic sense [8].

3.2 Variants of 1-WL
The version of 1-WL we have formulated is designed for undirected graphs. For directed graphs it is better to consider in-neighbours and out-neighbours of nodes separately. 1-WL can easily be adapted to labelled graphs. If vertex labels are present, they can be incorporated in the initial colouring: two vertices get the same initial colour if and only if they have the same label(s). We can incorporate edge labels in the refinement rounds: two nodes v and w get different colours in the new colouring if there is some colour d and some edge label λ such that v and w have a different number of λ-neighbours of colour d.

However, if the edge labels are real numbers, which we interpret as edge weights, or more generally elements of an arbitrary commutative monoid, then we can also use the following weighted version of 1-WL due to [44]. Instead of refining by the number of edges into some colour, we refine by the sum of the edge weights into that colour. Thus the refinement round of Algorithm 1 is modified as follows: for all colours c in the current colouring and all nodes v, w of colour c, v and w get different colours in the new colouring if there is some colour d such that Σ_{x of colour d} α(v, x) ≠ Σ_{x of colour d} α(w, x), where α(x, y) denotes the weight of the edge from x to y, and we set α(x, y) = 0 if there is no edge from x to y. This idea also allows us to define 1-WL on matrices: with a matrix A ∈ R^{m×n} we associate a weighted bipartite graph with vertex set {v_1, ..., v_m, w_1, ..., w_n} and edge weights α(v_i, w_j) := A_ij and α(v_i, v_i′) = α(w_j, w_j′) = 0, and we run weighted 1-WL on this weighted graph with an initial colouring that distinguishes the v_i (rows) from the w_j (columns). An example is shown in Figure 4. This matrix version of WL was applied in [44] to design a dimension reduction technique that speeds up the solving of linear programs with many symmetries (or regularities).

Figure 4: Stable colouring of a matrix and the corresponding weighted bipartite graph computed by matrix WL

3.3 Higher-Dimensional WL
For this paper, the 1-dimensional version of the Weisfeiler-Leman algorithm is the most relevant, but let us briefly describe the higher-dimensional versions. In fact, it is the 2-dimensional version, also referred to as classical WL, that was introduced by Weisfeiler and Leman [103] in 1968 and gave the algorithm its name. The k-dimensional Weisfeiler-Leman algorithm (k-WL) is based on the same iterative-refinement idea as 1-WL. However, instead of vertices, k-WL colours k-tuples of vertices of a graph. Initially, each k-tuple is "coloured" by the isomorphism type of the subgraph it induces. Then in the refinement rounds, the colour information is propagated between "adjacent" tuples that only differ in one coordinate (details can be found in [24]). If implemented using similar ideas as for 1-WL, k-WL runs in time O(n^{k+1} log n) [53].

Higher-dimensional WL is much more powerful than 1-WL, but Cai, Fürer, and Immerman [24] proved that for every k there are non-isomorphic graphs G_k, H_k that are not distinguished by k-WL. These graphs, known as the CFI graphs, have size O(k) and are 3-regular.

DeepWL, a WL version of unlimited dimension that can distinguish the CFI graphs in polynomial time, was recently introduced in [47].
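The following is a compact Python sketch of 1-WL colour refinement (Algorithm 1), recording for every round the multiset of colours; these per-round colour counts are exactly the numbers wl(c, G) used by the kernels in Section 3.5 below. It is only illustrative and not tuned for efficiency.

```python
from collections import Counter

def wl_colour_counts(adj, rounds):
    """1-WL colour refinement on a graph given as an adjacency list {v: [neighbours]}.
    Returns, for each round i = 0..rounds, a Counter mapping colour -> number of
    vertices with that colour (colours are represented canonically as nested tuples)."""
    colour = {v: () for v in adj}                      # round 0: constant colouring
    history = [Counter(colour.values())]
    for _ in range(rounds):
        colour = {v: (colour[v], tuple(sorted(colour[w] for w in adj[v])))
                  for v in adj}                        # refine by multiset of neighbour colours
        history.append(Counter(colour.values()))
    return history

# Toy example: 1-WL does not distinguish C6 from two disjoint triangles.
c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_colour_counts(c6, 2) == wl_colour_counts(two_triangles, 2))   # True
```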
3.4 Logical and Algebraic Characterisations
The beauty of the WL algorithm lies in the fact that its expressiveness has several natural and completely unrelated characterisations. Of these, we will see two in this section. Later, we will see two more characterisations in terms of GNNs and homomorphism numbers.

The logic C is the extension of first-order logic by counting quantifiers of the form ∃^{≥p} x ("there exist at least p elements x"). Every C-formula is equivalent to a formula of plain first-order logic. However, here we are interested in fragments of C obtained by restricting the number of variables of formulas, and the translation from C to first-order logic may increase the number of variables. For every k ≥ 1, by C^k we denote the fragment of C consisting of all formulas with at most k (free or bound) variables. The finite variable logics C^k play an important role in finite model theory (see, for example, [40]). Cai, Fürer, and Immerman [24] have related these fragments to the WL algorithm.

Theorem 3.1 ([24]). Two graphs are C^{k+1}-equivalent, that is, they satisfy the same sentences of the logic C^{k+1}, if and only if k-WL does not distinguish the graphs.

Let us now turn to an algebraic characterisation of WL. Our starting point is the observation that two graphs G, H with vertex sets V, W and adjacency matrices A ∈ R^{V×V}, B ∈ R^{W×W} are isomorphic if and only if there is a permutation matrix X ∈ R^{V×W} such that X^⊤AX = B. Recall that a permutation matrix is a {0, 1}-matrix that has exactly one 1-entry in each row and in each column. Since permutation matrices are orthogonal (i.e., they satisfy X^⊤ = X^{−1}), we can rewrite this as AX = XB, which has the advantage of being linear. This corresponds to the following linear equations in the variables X_vw, for v ∈ V and w ∈ W:

    Σ_{v′∈V} A_{vv′} X_{v′w} = Σ_{w′∈W} X_{vw′} B_{w′w}   for all v ∈ V, w ∈ W.   (3.2)

We can add equations expressing that the row and column sums of the matrix X are 1, which implies that X is a permutation matrix if the X_vw are nonnegative integers:

    Σ_{w′∈W} X_{vw′} = Σ_{v′∈V} X_{v′w} = 1   for all v ∈ V, w ∈ W.   (3.3)

Obviously, equations (3.2) and (3.3) have a nonnegative integer solution if and only if the graphs G and H are isomorphic. This does not help much from an algorithmic point of view, because it is NP-hard to decide if a system of linear equations and inequalities has an integer solution. But what about nonnegative rational solutions? We know that we can compute them in polynomial time. A nonnegative rational solution to (3.2) and (3.3), which can also be seen as a doubly stochastic matrix satisfying AX = XB, is called a fractional isomorphism between G and H. If such a fractional isomorphism exists, we say that G and H are fractionally isomorphic. Tinhofer [99] proved the following theorem.

Theorem 3.2 ([99]). Graphs G and H are fractionally isomorphic if and only if 1-WL does not distinguish G and H.

A corresponding theorem also holds for the weighted and the matrix version of 1-WL [44]. Moreover, Atserias and Maneva [5] proved a generalisation that relates k-WL to the level-k Sherali-Adams relaxation of the system of equations and thus yields an algebraic characterisation of k-WL indistinguishability (also see [45, 71] and [6, 13, 39, 84] for related algebraic aspects of WL).

Figure 5: Viewing colours of WL as trees (the colours C0, C1, C2 of successive rounds, drawn as rooted trees, for a graph G)

Note that to decide whether two graphs G, H with adjacency matrices A, B are fractionally isomorphic, we can minimise the convex function ∥AX − XB∥_F, where X ranges over the convex set of doubly stochastic matrices. To minimise this function, we can use standard gradient descent techniques for convex minimisation. It was shown in [57] that, surprisingly, the refinement rounds of 1-WL closely correspond to the iterations of the Frank-Wolfe convex minimisation algorithm.

3.5 Weisfeiler-Leman Graph Kernels
The WL algorithm collects local structure information and propagates it along the edges of a graph. We can define very effective graph kernels based on this local information. For every i ≥ 0, let C_i be the set of colours that 1-WL assigns to the vertices of a graph in the i-th round. Figure 5 illustrates that we can identify the colours in C_i with rooted trees of height i. For every graph G and every colour c ∈ C_i, by wl(c, G) we denote the number of vertices that receive colour c in the i-th round of 1-WL.

Example 3.3. For the graph G shown in Figure 5, we have wl(c, G) = 2 and wl(c′, G) = 0 for two of the rooted-tree colours c, c′ depicted in the original figure.

For every t ∈ N, the t-round WL-kernel is the mapping K_WL^{(t)} defined by

    K_WL^{(t)}(G, H) := Σ_{i=0}^{t} Σ_{c∈C_i} wl(c, G) · wl(c, H)

for all graphs G, H. It is easy to see that this mapping is symmetric and positive-semidefinite and thus indeed a kernel mapping; the corresponding vector embedding maps each graph G to the vector

    ( wl(c, G) : c ∈ ⋃_{i=0}^{t} C_i ).

Note that formally, we are mapping G to an infinite dimensional vector space, because all the sets C_i for i ≥ 1 are infinite. However, for a graph G of order n the vector has at most tn + 1 nonzero entries. We can also define a version K_WL of the WL-kernel that does not depend on a fixed number of rounds by letting

    K_WL(G, H) := Σ_{i≥0} (1/2^i) Σ_{c∈C_i} wl(c, G) · wl(c, H).

The WL-kernel was introduced by Shervashidze et al. [94] under the name Weisfeiler-Leman subtree kernel. They also introduce variants such as a Weisfeiler-Leman shortest path kernel.
A great advantage the WL (subtree) kernel has over most of the graph kernels discussed in Section 2.4 is its efficiency, while performing at least as well as other kernels on downstream tasks. Shervashidze et al. [94] report that in practice, t = 5 is a good number of rounds for the t-round WL-kernel.

There are also graph kernels based on the higher-dimensional WL algorithm [76].

3.6 Weisfeiler-Leman and GNNs
Recall that a GNN computes a sequence (x_v^{(t)})_{v∈V}, for t ≥ 0, of vector embeddings of a graph G = (V, E). In the most general form, it is recursively defined by

    x_v^{(t+1)} = f_UP( x_v^{(t)}, f_AGG( (x_w^{(t)} : w ∈ N(v)) ) ),

where the aggregation function f_AGG is symmetric in its arguments. It has been observed in several places [49, 78, 106] that this is very similar to the update process of 1-WL. Indeed, it is easy to see that if the initial embedding x_v^{(0)} is constant then for any two vertices v, w, if 1-WL assigns the same colour to v and w then x_v^{(t)} = x_w^{(t)}. This implies that two graphs that cannot be distinguished by 1-WL will give the same result for any GNN applied to them; that is, GNNs are at most as expressive as 1-WL. It is shown in [78] that a converse of this holds as well, even if the aggregation and update functions of the GNN are of a very simple form (like (2.1) and (2.2) in Section 2.2). Based on the connection between WL and logic, a more refined analysis of the expressiveness of GNNs was carried out in [10]. However, the limitations of the expressiveness only hold if the initial embedding x_v^{(0)} is constant (or at least constant on all 1-WL colour classes). We can increase the expressiveness of GNNs by assigning random initial vectors x_v^{(0)} to the vertices. The price we pay for this increased expressiveness is that the output of a run of the GNN model is no longer isomorphism invariant. However, the whole randomised process is still isomorphism invariant. More formally, the random variable that associates an output (x_v)_{v∈V} with each graph G is isomorphism invariant.

A fully invariant way to increase the expressiveness of GNNs is to build "higher-dimensional" GNNs, inspired by the higher-dimensional WL algorithm. Instead of nodes of the graphs, they operate on constant sized tuples or sets of vertices. A flexible architecture for such higher-dimensional GNNs is proposed in [78].

4 COUNTING HOMOMORPHISMS
Most of the graph kernels and also some of the node embedding techniques are based on counting occurrences of substructures like walks, cycles, or trees. There are different ways of embedding substructures into a graph. For example, walks and paths are the same structures, but we allow repeated vertices in a walk. Formally, "walks" are homomorphic images of path graphs, whereas "paths" are embedded path graphs. It turns out that homomorphisms and homomorphic images give us a very robust and flexible "basis" for counting all kinds of substructures [30].

A homomorphism from a graph F to a graph G is a mapping h from the nodes of F to the nodes of G such that for all edges uu′ of F the image h(u)h(u′) is an edge of G. On labelled graphs, homomorphisms have to preserve vertex and edge labels, and on directed graphs they have to preserve the edge direction. Of course we can generalise homomorphisms to arbitrary relational structures, and we remind the reader of the close connection between homomorphisms and conjunctive queries. We denote the number of homomorphisms from F to G by hom(F, G).

Example 4.1. For the graph G shown in Figure 5, we have hom(F, G) = 18 and hom(F′, G) = 114, where F, F′ are the two small stars depicted in the original. To calculate these numbers, we observe that for the star S_k (the tree of height 1 with k leaves) we have hom(S_k, G) = Σ_{v∈V(G)} deg_G(v)^k.

For every class F of graphs, the homomorphism counts hom(F, G) give a graph embedding Hom_F defined by

    Hom_F(G) := ( hom(F, G) : F ∈ F )

for all graphs G. If F is infinite, the latent space R^F of the embedding Hom_F is an infinite dimensional vector space. By suitably scaling the infinite series involved, we can define an inner product on a subspace H_F of R^F that includes the range of Hom_F. This also gives us a graph kernel. One way of making this precise is as follows. For every k, we let F_k be the set of all F ∈ F of order |F| := |V(F)| = k. Then we let

    K_F(G, H) := Σ_{k=1}^{∞} (1 / (|F_k| k^k)) Σ_{F∈F_k} hom(F, G) · hom(F, H).   (4.1)

There are various other ways of doing this, for example, rather than looking at the sum over all F ∈ F_k we may look at the maximum. In practice, one will simply cut off the infinite series and only consider a finite subset of F. A problem with using homomorphism vectors as graph embeddings is that the homomorphism numbers quickly get tremendously large. In practice, we take logarithms of these numbers, possibly scaled by the size of the graphs from F. So, a practically reasonable graph embedding based on homomorphism vectors would take a finite class F of graphs and map each G to the vector

    ( (1/|F|) log hom(F, G) : F ∈ F ).

Initial experiments show that this graph embedding performs very well on downstream classification tasks even if we take F to be a small class (of size 20) of graphs consisting of binary trees and cycles. This is a good indication that homomorphism vectors extract relevant features from a graph. Note that the size of the class F is the dimension of the feature space.

Apart from these practical considerations, homomorphism vectors have a beautiful theory that links them to various natural notions of similarity between structures, including indistinguishability by the Weisfeiler-Leman algorithm.

4.1 Homomorphism Indistinguishability
Two graphs G and H are homomorphism-indistinguishable over a class F of graphs if Hom_F(G) = Hom_F(H). Lovász proved that homomorphism indistinguishability over the class G of all graphs corresponds to isomorphism.

Theorem 4.2 ([65]). For all graphs G and H,
    Hom_G(G) = Hom_G(H) ⟺ G and H are isomorphic.
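To make the homomorphism vectors Hom_F concrete, here is a small brute-force Python sketch of hom(F, G) and of the log-scaled finite embedding just described. Brute force is exponential in |V(F)| and only meant to illustrate the definitions; for classes of bounded tree width, hom(F, G) can be computed in polynomial time (cf. Section 4.3).

```python
import math
from itertools import product

def hom(F_vertices, F_edges, G_adj):
    """Number of homomorphisms from F to G (brute force over all vertex maps)."""
    count = 0
    for images in product(list(G_adj), repeat=len(F_vertices)):
        h = dict(zip(F_vertices, images))
        if all(h[v] in G_adj[h[u]] for u, v in F_edges):   # each edge of F lands on an edge of G
            count += 1
    return count

def hom_embedding(patterns, G_adj):
    """Log-scaled homomorphism vector ((1/|F|) log hom(F, G) : F in a finite pattern class).
    The max(.., 1) is only a practical guard against log(0) for patterns with no homomorphisms."""
    return [math.log(max(hom(V, E, G_adj), 1)) / len(V) for (V, E) in patterns]

# Toy example: patterns are a single edge and a triangle; G is a 4-cycle.
edge = ([0, 1], [(0, 1)])
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(hom(*edge, c4), hom(*triangle, c4))       # 8 and 0
print(hom_embedding([edge, triangle], c4))
```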
While the proof of Theorem 4.4 relies on techniques similar to the proof of Theorem 4.2, the proof of Theorem 4.3 is based on spectral techniques.

Example 4.7. For the co-spectral graphs G, H shown in Figure 6, we have hom(F, G) = 20 and hom(F, H) = 16 for the small path F depicted in the original. Thus Hom_P(G) ≠ Hom_P(H).

Example 4.8. Figure 7 shows graphs G, H with Hom_P(G) = Hom_P(H). Obviously, 1-WL distinguishes the two graphs. Thus Hom_T(G) ≠ Hom_T(H). It can also be checked that the graphs are not co-spectral. Hence Hom_C(G) ≠ Hom_C(H).

Combined with Theorem 3.1, Theorem 4.4 implies the following correspondence between homomorphism counts of graphs of bounded tree width and the finite variable fragments of the counting logic C introduced in Section 3.4.

Corollary 4.9. For all graphs G and H and all k ≥ 1,
    Hom_{Tk}(G) = Hom_{Tk}(H) ⟺ G and H are C^{k+1}-equivalent.

This is interesting, because it shows that the homomorphism vector Hom_{Tk}(G) gives us all the information necessary to answer queries expressed in the logic C^{k+1}. Unfortunately, the result does not tell us how to answer C^{k+1}-queries algorithmically if we have access to the vector Hom_{Tk}(G). To make the question precise, suppose we have oracle access that allows us to obtain, for every graph F ∈ Tk, the entry hom(F, G) of the homomorphism vector. Is it possible to answer a C^{k+1}-query in polynomial time (either with respect to data complexity or combined complexity)?

Arguably, from a logical perspective it is even more natural to restrict the quantifier rank (maximum number of nested quantifiers) in a formula rather than the number of variables. Let C_k be the fragment of C consisting of all formulas of quantifier rank at most k. We obtain the following characterisation of C_k-equivalence in terms of homomorphism vectors over the class of graphs of tree depth at most k. Tree depth, introduced by Nešetřil and Ossona de Mendez [81], is another structural graph parameter that has received a lot of attention in recent years (e.g. [9, 23, 28, 34, 35]).

Theorem 4.10 ([42]). For all graphs G and H and all k ≥ 1,
    Hom_{TDk}(G) = Hom_{TDk}(H) ⟺ G and H are C_k-equivalent.

Here TDk denotes the class of all graphs of tree depth at most k.

4.2 Beyond Undirected Graphs
So far, we have only considered homomorphism indistinguishability on undirected graphs. A few results are known for directed graphs. In particular, Theorem 4.2 directly extends to directed graphs. Actually, we have the following stronger result, also due to Lovász [66] (also see [14]).

Theorem 4.11 ([66]). For all directed graphs G and H,
    Hom_DA(G) = Hom_DA(H) ⟺ G and H are isomorphic.

Here DA denotes the class of all directed acyclic graphs.

It is straightforward to extend Theorem 4.2, Theorem 4.4 (for the natural generalisation of the WL algorithms to relational structures), and Theorem 4.10 to arbitrary relational structures. This is very useful for binary relational structures such as knowledge graphs. But for relations of higher arity one may consider another version based on the incidence graph of a structure.

Let σ = {R_1, ..., R_m} be a relational vocabulary, where R_i is a k_i-ary relation symbol. Let k be the maximum of the k_i. We let σ_I := {E_1, ..., E_k, P_1, ..., P_m}, where the E_j are binary and the P_i are unary relation symbols. With every σ-structure A = (V(A), R_1(A), ..., R_m(A)) we associate a σ_I-structure A_I, called the incidence structure of A, as follows:
• the universe of A_I is V(A_I) := V(A) ∪ ⋃_{i=1}^{m} { (R_i, v_1, ..., v_{k_i}) : (v_1, ..., v_{k_i}) ∈ R_i(A) };
• for 1 ≤ j ≤ k, the relation E_j(A_I) consists of all pairs ( v_j, (R_i, v_1, ..., v_{k_i}) ) for (R_i, v_1, ..., v_{k_i}) ∈ V(A_I) with k_i ≥ j;
• for 1 ≤ i ≤ m, the relation P_i(A_I) consists of all (R_i, v_1, ..., v_{k_i}) ∈ V(A_I).

With this encoding of general structures as binary incidence structures we obtain the following corollary.

Corollary 4.12. For all σ-structures A and B, the following are equivalent.
(1) Hom_{T(σI)}(A_I) = Hom_{T(σI)}(B_I), where T(σ_I) denotes the class of all σ_I-structures whose underlying (Gaifman) graph is a tree;
(2) A_I and B_I are not distinguished by 1-WL;
(3) A_I and B_I are C^2-equivalent.

Böker [14] gave a generalisation of Theorem 4.4 to hypergraphs that is also based on incidence graphs.
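The incidence-structure encoding is easy to implement; the following is a small Python sketch under the representation assumptions noted in the comments (relations given as a dictionary mapping relation names to sets of tuples).

```python
def incidence_structure(universe, relations):
    """Build the incidence structure A_I of a relational structure A.
    `relations` maps a relation name R to the set of tuples in R(A).
    Returns (universe of A_I, binary relations E_1..E_k, unary relations P_R)."""
    tuple_elems = {(R, t) for R, tuples in relations.items() for t in tuples}
    universe_I = set(universe) | tuple_elems
    max_arity = max((len(t) for ts in relations.values() for t in ts), default=0)
    E = {j: {(t[j - 1], (R, t)) for (R, t) in tuple_elems if len(t) >= j}
         for j in range(1, max_arity + 1)}             # E_j links the j-th entry to the tuple
    P = {R: {(R, t) for t in tuples} for R, tuples in relations.items()}
    return universe_I, E, P

# Toy ternary structure: one relation R = {(a, b, c), (a, c, d)}.
U, E, P = incidence_structure({"a", "b", "c", "d"},
                              {"R": {("a", "b", "c"), ("a", "c", "d")}})
print(sorted(E[1]))    # "a" is the first entry of both R-tuples
```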
In a different direction, we can generalise the results to weighted graphs. Let us consider undirected graphs with real-valued edge weights. We can also view them as symmetric matrices over the reals. Recall that we denote the weight of an edge uv by α(u, v) and that weighted 1-WL refines by sums of edge weights (instead of numbers of edges). Let F be an unweighted graph and G a weighted graph. For every mapping h : V(F) → V(G) we let

    wt(h) := Π_{uu′∈E(F)} α(h(u), h(u′)).

As α(v, v′) = 0 if and only if vv′ ∉ E(G), we have wt(h) ≠ 0 if and only if h is a homomorphism from F to G; the weight of this homomorphism is the product of the weights of the edges in its image. We let

    hom(F, G) := Σ_{h : V(F)→V(G)} wt(h).

In statistical physics, such sum-product functions are known as partition functions. For a class F of graphs, we let Hom_F(G) := ( hom(F, G) : F ∈ F ).

Theorem 4.13 ([22]). For all weighted graphs G and H, the following are equivalent.
(1) Hom_T(G) = Hom_T(H) (recall that T denotes the class of all trees);
(2) G and H are not distinguished by weighted 1-WL;
(3) equations (3.2) and (3.3) have a nonnegative rational solution.

4.3 Complexity
In general, counting the number of homomorphisms from a graph F to a graph G is a #P-hard problem. Dalmau and Jonsson [31] proved that, under the reasonable complexity theoretic assumption #W[1] ≠ FPT from parameterised complexity theory, for all classes F of graphs, computing hom(F, G), given a graph F ∈ F and an arbitrary graph G, is in polynomial time if and only if F has bounded tree width. This makes Theorem 4.4 even more interesting, because the entries of a homomorphism vector Hom_F(G) are computable in polynomial time precisely for bounded tree width classes.

However, the computational problem we are facing is not to compute individual entries of a homomorphism vector, but to decide if two graphs have the same homomorphism vector, that is, if they are homomorphism indistinguishable. The characterisation theorems of Section 4.1 imply that homomorphism indistinguishability is polynomial-time decidable over the classes P of paths, C of cycles, T of trees, Tk of graphs of tree width at most k, and TDk of tree depth at most k. Moreover, homomorphism indistinguishability over the class G of all graphs is decidable in quasi-polynomial time by Babai's [7] celebrated result that graph isomorphism is decidable in quasi-polynomial time. It was proved in [15] that homomorphism indistinguishability over the class of complete graphs is complete for the complexity class C=P, which implies that it is co-NP hard, and that there is a polynomial time decidable class F of graphs of bounded tree width such that homomorphism indistinguishability over F is undecidable. Quite surprisingly, the fact that quantum isomorphism is undecidable [4] implies that homomorphism indistinguishability over the class of planar graphs is undecidable.

vectors to define node embeddings. A rooted graph is a pair (G, v) where G is a graph and v ∈ V(G). For two rooted graphs (F, u) and (G, v), by hom(F, G; u ↦ v) we denote the number of homomorphisms h from F to G with h(u) = v. For a class F* of rooted graphs and a rooted graph (G, v), we let

    Hom_{F*}(G, v) := ( hom(F, G; u ↦ v) : (F, u) ∈ F* ).

If we keep the graph G fixed, this gives us an embedding of the nodes of G into an infinite dimensional vector space. Note that in the terminology of Section 2.1, this embedding is "inductive" and not "transductive", because it is not tied to a fixed graph. (Nevertheless, the term "inductive" is not fitting very well here, because the embedding is not learned.) In the same way we defined graph kernels based on homomorphism vectors of graphs, we can now define node kernels.

It is straightforward to generalise Theorem 4.2 to rooted graphs, showing that for all rooted graphs (G, v) and (H, w) it holds that

    Hom_{G*}(G, v) = Hom_{G*}(H, w) ⟺ there is an isomorphism f from G to H with f(v) = w.

Here G* denotes the class of all rooted graphs. Maybe the easiest way to prove this is by a reduction to node-labelled graphs.

Another key result of Section 4.1 that can be adapted to the node setting is Theorem 4.4. We only state the version for trees.

Theorem 4.14. For all graphs G, H and all v ∈ V(G), w ∈ V(H), the following are equivalent.
(1) Hom_{T*}(G, v) = Hom_{T*}(H, w) for the class T* of all rooted trees;
(2) 1-WL assigns the same colour to v and w.

This result is implicit in the proof of Theorem 4.4 (see [32, 33]). In fact, it can be viewed as the graph theoretic core of the proof. We sketch the proof here and also show how to derive Theorem 4.4 (for trees) from it.

Proof sketch. Recall from Section 3.5 that we can view the colours assigned by 1-WL as rooted trees (see Figure 5). For the k-th round of WL, this is a tree of height k, and for the stable colouring we can view it as an infinite tree. Suppose now that the colour of a vertex v of G is T. The crucial observation is that for every rooted tree (S, r) the number hom(S, G; r ↦ v) is precisely the number of mappings h : V(S) → V(T) that map the root r of S to the root of T and, for each node s ∈ V(S), map the children of s to the children of h(s) in T. Let us call such mappings rooted tree homomorphisms. Implication (2) ⟹ (1) follows directly from this observation. Implication (1) ⟹ (2) follows as well, because by an argument similar to that used in the proof of Theorem 4.2 it can be shown that for distinct rooted trees T, T′ there is a rooted tree that has distinct numbers of rooted tree homomorphisms to T, T′. □

Corollary 4.15. For all graphs G, H and all v ∈ V(G), w ∈ V(H), the following are equivalent.
(1) Hom_{T*}(G, v) = Hom_{T*}(H, w);
(2) for all formulas φ(x) of the logic C^2, G ⊨ φ(v) if and only if H ⊨ φ(w).
Section 2.1. They are solely based on structural properties and ignore the distance information. Results like Corollary 4.15 show that the structural information captured by the homomorphism-based embeddings in principle enables us to answer queries directly on the embedding, which may be more useful than distance information in database applications.

We close this section by sketching how Theorem 4.4 (for trees) follows from Theorem 4.14.

Proof sketch of Theorem 4.4 (for trees). Let G, H be graphs. We need to prove

    Hom_T(G) = Hom_T(H) ⟺ 1-WL does not distinguish G and H.   (4.4)

Without loss of generality we assume that V(G) ∩ V(H) = ∅. For x, y ∈ V(G) ∪ V(H), we write x ∼ y if 1-WL assigns the same colour to x and y. By Theorem 4.14, for X, Y ∈ {G, H} and x ∈ V(X), y ∈ V(Y) we have x ∼ y if and only if Hom_{T*}(X, x) = Hom_{T*}(Y, y). Let R_1, ..., R_n be the ∼-equivalence classes, and for every j, let P_j := R_j ∩ V(G) and Q_j := R_j ∩ V(H). Furthermore, let p_j := |P_j| and q_j := |Q_j|.

We first prove the backward direction of (4.4). Assume that 1-WL does not distinguish G and H. Then p_j = q_j for all j ∈ [n]. Let T be a tree, and let t ∈ V(T). Let h_j := hom(T, X; t ↦ x) for x ∈ R_j and X ∈ {G, H} with x ∈ V(X). Then

    hom(T, G) = Σ_{v∈V(G)} hom(T, G; t ↦ v) = Σ_{j=1}^{n} p_j h_j = Σ_{j=1}^{n} q_j h_j = Σ_{w∈V(H)} hom(T, H; t ↦ w) = hom(T, H).

Since T was arbitrary, this proves Hom_T(G) = Hom_T(H).

The proof of the forward direction of (4.4) is more complicated. Assume Hom_T(G) = Hom_T(H). There is a finite collection of m ≤ (n choose 2) rooted trees (T_1, r_1), ..., (T_m, r_m) such that for all X, Y ∈ {G, H} and x ∈ V(X), y ∈ V(Y) we have x ∼ y if and only if for all i ∈ [m],

    hom(T_i, X; r_i ↦ x) = hom(T_i, Y; r_i ↦ y).

Let a_ij := hom(T_i, X; r_i ↦ x) for x ∈ R_j and X ∈ {G, H} with x ∈ V(X). Then for all i we have

    Σ_{j=1}^{n} a_ij p_j = hom(T_i, G) = hom(T_i, H) = Σ_{j=1}^{n} a_ij q_j.

Unfortunately, the matrix A = (a_ij)_{i∈[m], j∈[n]} is not necessarily invertible, so we cannot directly conclude that p_j = q_j for all j. All we know is that for any two columns of the matrix there is a row such that the two columns have distinct values in that row. It turns out that this is sufficient. For every vector d = (d_1, ..., d_m) of nonnegative integers, let (T^{(d)}, r^{(d)}) be the rooted tree obtained by taking the disjoint union of d_i copies of T_i for all i and then identifying the roots of all these trees. It is easy to see that

    hom(T^{(d)}, X; r^{(d)} ↦ x) = Π_{i=1}^{m} hom(T_i, X; r_i ↦ x)^{d_i}.

Thus, letting a_j^{(d)} := Π_{i=1}^{m} a_ij^{d_i}, we have

    Σ_{j=1}^{n} a_j^{(d)} p_j = hom(T^{(d)}, G) = hom(T^{(d)}, H) = Σ_{j=1}^{n} a_j^{(d)} q_j.

Using these additional equations, it can be shown that p_j = q_j for all j (see [43, Lemma 4.2]). Thus WL does not distinguish G and H. □

4.5 Homomorphisms and GNNs
We have a correspondence between homomorphism vectors and the Weisfeiler-Leman algorithm (Theorems 4.4 and 4.14) and between the WL algorithm and GNNs (see Section 3.6). This also establishes a correspondence between homomorphism vectors and GNNs. More directly, the correspondence between GNNs and homomorphism counts is also studied in [69].

5 SIMILARITY
The results described in the previous section can be interpreted as results on the expressiveness of homomorphism-based embeddings of structures and their nodes. However, all these results only show what it means that two objects are mapped to the same homomorphism vector. More interesting is the similarity measure the vector embeddings induce via some inner product or norm on the latent space (see (4.1)). We can speculate that, given the nice results regarding equality of vectors, the similarity measure will have similarly nice properties. Let me propose the following, admittedly vague, hypothesis.

    For suitable classes F, the homomorphism embedding Hom_F combined with a suitable inner product on the latent space induces a natural similarity measure on graphs or relational structures.

From a practical perspective, we could support this hypothesis by showing that the vector embeddings give good results when combined with similarity based downstream tasks. As mentioned earlier, initial experiments show that homomorphism vectors in combination with support vector machines perform well on standard graph classification benchmarks. But a more thorough experimental study will be required to have conclusive results.

From a theoretical perspective, we can compare the homomorphism-based similarity measures with other similarity measures for graphs and discrete structures. If we can prove that they coincide or are close to each other, then this would support our hypothesis.

5.1 Similarity from Matrix Norms
A standard way of defining similarity measures on graphs is based on comparing their adjacency matrices. Let us briefly review a few matrix norms. Recall the standard ℓp-vector norm ∥x∥_p := (Σ_i |x_i|^p)^{1/p}; note that ∥x∥_2 is just the Euclidean norm, which we denoted by ∥x∥ earlier in this paper. The two best-known matrix norms are the Frobenius norm ∥M∥_F := sqrt(Σ_{i,j} M_ij^2) and the spectral norm ∥M∥_⟨2⟩ :=
sup_{x∈R^n, ∥x∥_2=1} ∥Mx∥_2. More generally, for every p > 0 we define

    ∥M∥_p := ( Σ_{i,j} |M_ij|^p )^{1/p}

(so ∥M∥_F = ∥M∥_2) and the cut norm

    ∥M∥_□ := max_{S,T} | Σ_{i∈S, j∈T} M_ij |,

where S, T range over all subsets of the index set of the matrix. Observe that for M ∈ R^{n×n} we have

    ∥M∥_□ ≤ ∥M∥_1 ≤ n ∥M∥_F,

where the second inequality follows from the Cauchy-Schwarz inequality. If we compare matrices of different size, it can be reasonable to scale the norms by a factor depending on n.

For technical reasons, we only consider matrix norms ∥·∥ that are invariant under permutations of the rows and columns, that is,

    ∥M∥ = ∥MP∥ = ∥QM∥ for all permutation matrices P, Q.   (5.1)

It is easy to see that the norms discussed above have this property.

Now let G, H be graphs with vertex sets V, W and adjacency matrices A ∈ R^{V×V}, B ∈ R^{W×W}. For convenience, let us assume that |G| = |H| =: n. Then both A, B are n × n-matrices, and we can compare them using a matrix norm. However, it does not make much sense to just consider ∥A − B∥, because graphs do not have a unique adjacency matrix, and even if G and H are isomorphic, ∥A − B∥ may be large. Therefore, we align the two matrices in an optimal way by permuting the rows and columns of A. For a matrix norm ∥·∥, we define a graph distance measure dist_∥·∥ by

    dist_∥·∥(G, H) := min_{P ∈ {0,1}^{V×W} permutation matrix} ∥P^⊤AP − B∥.

It follows from (5.1) that dist_∥·∥ is well-defined, that is, does not depend on the choice of the particular adjacency matrices A, B. It also follows from (5.1) and the fact that P^{−1} = P^⊤ for permutation matrices that

    dist_∥·∥(G, H) = min_{P ∈ {0,1}^{V×W} permutation matrix} ∥AP − PB∥,   (5.2)

which is often easier to work with because the expression AP − PB is linear in the "variables" P_ij. To simplify the notation, we let dist_p := dist_{∥·∥_p} and dist_⟨p⟩ := dist_{∥·∥_⟨p⟩} for all p, and we let dist_□ := dist_{∥·∥_□}.

The distances defined from the ℓ1-norm have natural interpretations as edit distances. dist_1(G, H) is twice the number of edges that need to be flipped to turn G into a graph isomorphic to H, and dist_⟨1⟩(G, H) is the maximum number of edges incident with a single vertex we need to flip to turn G into a graph isomorphic to H. Formally,

    dist_1(G, H) = 2 · min_{f : V→W bijection} | { f(v)f(v′) : vv′ ∈ E(G) △ E(H) } |,   (5.3)

and dist_⟨1⟩(G, H) can be written analogously, taking the maximum over single vertices in place of the total count.

Despite these intuitive interpretations, it is debatable how much "semantic relevance" these distance measures have. How similar are two graphs that can be transformed into each other by flipping, say, 5% of the edges? Again, the answer to this question may depend on the application context.

A big disadvantage the graph distance measures based on matrix norms have is that computationally they are highly intractable (see, for example, [3] and the references therein). It is even NP-hard to compute the distance between two trees (see [46] for Frobenius distance and [38] for the distances based on operator norms), and the distances are hard to approximate. The problem of computing these distances is related to the maximisation version of the quadratic assignment problem (see [70, 79]), a notoriously hard combinatorial optimisation problem. Better behaved is the cut-distance dist_□; at least it can be approximated within a factor of 2 [2].

The main source of hardness is the minimisation over the unwieldy set of all permutations (or permutation matrices). To alleviate this hardness, we can relax the integrality constraints and minimise over the convex set of all doubly stochastic matrices instead. That is, we define a relaxed distance measure

    dist̃_∥·∥(G, H) := min_{X ∈ [0,1]^{V×W} doubly stochastic} ∥AX − XB∥.   (5.5)

Note that dist̃_∥·∥ is only a pseudo-metric: the distance between non-isomorphic graphs may be 0. Indeed, it follows from Theorem 3.2 that dist̃_∥·∥(G, H) = 0 if and only if G and H are fractionally isomorphic. The advantage of these "relaxed" distances is that for many norms ∥·∥, computing dist̃_∥·∥ is a convex minimisation problem that can be solved efficiently.

So far, we have only discussed distance measures for graphs of the same order. To extend these distance measures to arbitrary graphs, we can replace vertices by sets of identical vertices in both graphs to obtain two graphs whose order is the least common multiple of the orders of the two initial graphs (see [67, Section 8.1] for details).

Note that these matrix based similarity measures are only defined for (possibly weighted) graphs. In particular for the operator norms, it is not clear how to generalise them to relational structures, and if such a generalisation would even be meaningful.
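As an illustration of the relaxed distance (5.5) with the Frobenius norm, here is a hedged Python sketch using the Frank-Wolfe method mentioned in Section 3.4; the linear subproblem over doubly stochastic matrices is solved with an assignment solver, and the iteration count and toy graphs are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relaxed_distance(A, B, iterations=100):
    """Approximately minimise ||AX - XB||_F over doubly stochastic X (Frank-Wolfe
    with exact line search). The minimum is 0 iff the graphs are fractionally isomorphic."""
    n = A.shape[0]
    X = np.full((n, n), 1.0 / n)                      # centre of the Birkhoff polytope
    for _ in range(iterations):
        R = A @ X - X @ B
        grad = 2 * (A.T @ R - R @ B.T)                # gradient of ||AX - XB||_F^2
        rows, cols = linear_sum_assignment(grad)      # vertex of the polytope minimising
        S = np.zeros_like(X)                          # the linearised objective
        S[rows, cols] = 1.0
        D = S - X
        RD = A @ D - D @ B
        denom = np.vdot(RD, RD)
        gamma = 0.0 if denom == 0 else np.clip(-np.vdot(R, RD) / denom, 0.0, 1.0)
        X += gamma * D                                # exact line search step
    return float(np.linalg.norm(A @ X - X @ B))

C6 = np.array([[1.0 if (i - j) % 6 in (1, 5) else 0.0 for j in range(6)] for i in range(6)])
two_triangles = np.kron(np.eye(2), np.ones((3, 3)) - np.eye(3))
path6 = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
print(relaxed_distance(C6, two_triangles))   # ~0: fractionally isomorphic (Theorem 3.2)
print(relaxed_distance(C6, path6))           # > 0: 1-WL distinguishes a cycle from a path
```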
5.2 Comparing Homomorphism Distances and [5] A. Atserias and E. Maneva. 2013. Sherali–Adams Relaxations and Indistinguisha-
bility in Counting Logics. SIAM J. Comput. 42, 1 (2013), 112–137.
Matrix Distances [6] A. Atserias and J. Ochremiak. 2018. Definable Ellipsoid Method, Sums-of-
It would be very nice if we could establish a connection between Squares Proofs, and the Isomorphism Problem. In Proceedings of the 33rd Annual
ACM/IEEE Symposium on Logic in Computer Science. 66–75.
graph distance measures based on homomorphism vectors and [7] L. Babai. 2016. Graph Isomorphism in Quasipolynomial Time. In Proceedings of
those based on matrix norms. At least one important result in this the 48th Annual ACM Symposium on Theory of Computing (STOC ’16). 684–697.
[8] L. Babai, P. Erdös, and S. Selkow. 1980. Random graph isomorphism. SIAM J.
direction exists: Lovász [67] proves an equivalence between the cut- Comput. 9 (1980), 628–635.
distance of graphs and a distance measure derived from a suitably [9] M. Bannach and T. Tantau. 2016. Parallel Multivariate Meta-Theorems. In
scaled homomorphism vector HomG . Proceedings of the 11th International Symposium on Parameterized and Exact
Computation (LIPIcs), J. Guo and D. Hermelin (Eds.), Vol. 63. Schloss Dagstuhl -
It is tempting to ask if a similar correspondence can be estab- Leibniz-Zentrum für Informatik, 4:1–4:17.
g ∥ · ∥ and HomF . There are many related question
lished between dist [10] P. Barceló, E.V. Kostylev, M. Monet, J. Pérez, J. Reutter, and J.P. Silva. 2020. The
Logical Expressiveness of Graph Neural Networks. In Proceedings of the 8th
that deserve further attention. International Conference on Learning Representations. https://openreview.net/
forum?id=r1lZ7AEKvB
[11] M. Belkin and P. Niyogi. 2003. Laplacian Eigenmaps for Dimensionality Reduc-
6 CONCLUDING REMARKS tion and Data Representation. Neural Computation 15, 6 (2003), 1373–1396.
In this paper, we gave an overview of embeddings techniques for [12] C. Berkholz, P. Bonsma, and M. Grohe. 2017. Tight Lower and Upper Bounds for
the Complexity of Canonical Colour Refinement. Theory of Computing Systems
graphs and relational structures. Then we discussed two related 60, 4 (2017), 581–614.
theoretical approaches, the Weisfeiler-Leman algorithm with its [13] C. Berkholz and M. Grohe. 2015. Limitations of Algebraic Approaches to Graph
Isomorphism Testing. In Proceedings of the 42nd International Colloquium on Au-
various ramifications and homomorphism vectors. We saw that they tomata, Languages and Programming, Part I (Lecture Notes in Computer Science),
have a rich and beautiful theory that leads to new, generic families M.M. Halldórsson, K. Iwama, N. Kobayashi, and B. Speckmann (Eds.), Vol. 9134.
of vector embeddings and helps us to get a better understanding of Springer Verlag, 155–166.
[14] J. Böker. 2019. Color Refinement, Homomorphisms, and Hypergraphs. In Pro-
some of the techniques used in practice, for example graph neural ceedings of the 45th International Workshop on Graph-Theoretic Concepts in
networks. Computer Science (Lecture Notes in Computer Science), I. Sau and D.M. Thilikos
Yet we have also seen that we are only at the beginning and many (Eds.), Vol. 11789. Springer, 338–350.
[15] J. Böker, Y. Chen, M. Grohe, and G. Rattan. 2019. The Complexity of Homomor-
questions remain open, in particular when it comes to similarity phism Indistinguishability. In Proceedings of the 44th International Symposium on
measures defined on graphs and relational structures. Mathematical Foundations of Computer Science (Leibniz International Proceedings
in Informatics (LIPIcs)), P. Rossmanith, P. Heggernes, and J.-P. Katoen (Eds.),
From a database perspective, it will be important to generalise Vol. 138. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 54:1–54:13.
the embedding techniques to relations of higher arities, which is [16] R. Bordawekar and O. Shmueli. 2017. Using word embedding to enable semantic
not as trivial as it may seem (and where surprisingly little has been queries in relational databases. In Proceedings of the Data Management for End
to End Learning Work Workshop, SIGMOD’17.
done so far). A central question is then how to query the embedded [17] R. Bordawekar and O. Shmueli. 2019. Exploiting Latent Information in Relational
data. Which queries can we answer at all when we only see the Databases via Word Embedding and Application to Degrees of Disclosure. In
vectors in latent space? How do imprecisions and variations due to Proceedings of the 9th Biennial Conference on Innovative Data Systems Research.
[18] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. 2013.
randomness affect the outcome of such query answers? Probably, Translating embeddings for modeling multi-relational data. In Advances in
we can only answer queries approximately, but what exactly is the neural information processing systems. 2787–2795.
[19] C. Borgs, J. Chayes, L. Lovász, V. Sós, B. Szegedy, and K. Vesztergombi. 2006.
semantics of such approximations? These are just a few questions Graph limits and parameter testing. In Proceedings of the 38th Annual ACM
that need to be answered, and I believe they offer very exciting Symposium on Theory of Computing. 261–270.
research opportunities for both theoreticians and practitioners. [20] K.M. Borgwardt and H.-P. Kriegel. 2005. Shortest-path kernels on graphs. In
Proceedings of the 5th IEEE International Conference on Data Mining. 74–81.
[21] J. Bourgain. 1985. On Lipschitz embeddings of finite metric spaces in Hilbert
spaces. Israel Journal of Mathematics 52, 1-2 (1985), 46–52.
Acknowledgements
This paper was written in strange times during the COVID-19 lockdown. I appreciate that some of my colleagues nevertheless took the time to answer various questions I had on the topics covered here and to give valuable feedback on an earlier version of this paper. In particular, I would like to thank Pablo Barceló, Neta Friedman, Benny Kimelfeld, Christopher Morris, Petra Mutzel, Martin Ritzert, and Yufei Tao.
REFERENCES
[1] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A.J. Smola. 2013. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International World Wide Web Conference. 37–48.
[2] N. Alon and A. Naor. 2006. Approximating the Cut-Norm via Grothendieck's Inequality. SIAM J. Comput. 35 (2006), 787–803.
[3] V. Arvind, J. Köbler, S. Kuhnert, and Y. Vasudev. 2012. Approximate Graph Isomorphism. In Proceedings of the 37th International Symposium on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science), B. Rovan, V. Sassone, and P. Widmayer (Eds.), Vol. 7464. Springer Verlag, 100–111.
[4] A. Atserias, L. Mančinska, D.E. Roberson, R. Šámal, S. Severini, and A. Varvitsiotis. 2019. Quantum and non-signalling graph isomorphisms. Journal of Combinatorial Theory, Series B 136 (2019), 289–328.
[15] J. Böker, Y. Chen, M. Grohe, and G. Rattan. 2019. The Complexity of Homomorphism Indistinguishability. In Proceedings of the 44th International Symposium on Mathematical Foundations of Computer Science (Leibniz International Proceedings in Informatics (LIPIcs)), P. Rossmanith, P. Heggernes, and J.-P. Katoen (Eds.), Vol. 138. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 54:1–54:13.
[16] R. Bordawekar and O. Shmueli. 2017. Using word embedding to enable semantic queries in relational databases. In Proceedings of the Data Management for End to End Learning Workshop, SIGMOD'17.
[17] R. Bordawekar and O. Shmueli. 2019. Exploiting Latent Information in Relational Databases via Word Embedding and Application to Degrees of Disclosure. In Proceedings of the 9th Biennial Conference on Innovative Data Systems Research.
[18] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. 2787–2795.
[19] C. Borgs, J. Chayes, L. Lovász, V. Sós, B. Szegedy, and K. Vesztergombi. 2006. Graph limits and parameter testing. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing. 261–270.
[20] K.M. Borgwardt and H.-P. Kriegel. 2005. Shortest-path kernels on graphs. In Proceedings of the 5th IEEE International Conference on Data Mining. 74–81.
[21] J. Bourgain. 1985. On Lipschitz embeddings of finite metric spaces in Hilbert spaces. Israel Journal of Mathematics 52, 1–2 (1985), 46–52.
[22] A. Bulatov, M. Grohe, and G. Rattan. [n.d.]. In preparation.
[23] J. Bulian and A. Dawar. 2014. Graph isomorphism parameterized by elimination distance to bounded degree. In Proceedings of the 9th International Symposium on Parameterized and Exact Computation (Lecture Notes in Computer Science), M. Cygan and P. Heggernes (Eds.), Vol. 8894. Springer Verlag, 135–146.
[24] J. Cai, M. Fürer, and N. Immerman. 1992. An optimal lower bound on the number of variables for graph identification. Combinatorica 12 (1992), 389–410.
[25] S. Cao, W. Lu, and Q. Xu. 2015. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management. 891–900.
[26] S. Cao, W. Lu, and Q. Xu. 2016. Deep neural networks for learning graph representations. In Proceedings of the 30th AAAI Conference on Artificial Intelligence. 1145–1152.
[27] A. Cardon and M. Crochemore. 1982. Partitioning a graph in O(|A| log2 |V|). Theoretical Computer Science 19, 1 (1982), 85–98.
[28] Y. Chen and J. Flum. 2018. Tree-depth, Quantifier Elimination, and Quantifier Rank. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science. 225–234.
[29] C. Cortes and V. Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273–297.
[30] R. Curticapean, H. Dell, and D. Marx. 2017. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC '17). 210–223.
[31] V. Dalmau and P. Jonsson. 2004. The complexity of counting homomorphisms seen from the other side. Theoretical Computer Science 329, 1–3 (2004), 315–323.
[32] H. Dell, M. Grohe, and G. Rattan. 2018. Lovász Meets Weisfeiler and Leman. In Proceedings of the 45th International Colloquium on Automata, Languages and Programming (Track A) (LIPIcs), I. Chatzigiannakis, C. Kaklamanis, D. Marx, and D. Sannella (Eds.), Vol. 107. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 40:1–40:14.
[33] Z. Dvořák. 2010. On recognizing graphs by numbers of homomorphisms. Journal of Graph Theory 64, 4 (2010), 330–342.
[34] M. Elberfeld, M. Grohe, and T. Tantau. 2016. Where First-Order and Monadic Second-Order Logic Coincide. ACM Transactions on Computational Logic 17, 4 (2016). Article No. 25.
[35] M. Elberfeld, A. Jakoby, and T. Tantau. 2012. Algorithmic Meta Theorems for Circuit Classes of Constant and Logarithmic Depth. In Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science (LIPIcs), C. Dürr and T. Wilke (Eds.), Vol. 14. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 66–77.
[36] C. Gallicchio and A. Micheli. 2010. Graph echo state networks. In Proceedings of the IEEE International Joint Conference on Neural Networks.
[37] T. Gärtner, P. Flach, and S. Wrobel. 2003. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines. Springer Verlag, 129–143.
[38] T. Gervens. 2018. Spectral Graph Similarity. Master's thesis, RWTH Aachen University.
[39] E. Grädel, M. Grohe, B. Pago, and W. Pakusa. 2019. A Finite-Model-Theoretic View on Propositional Proof Complexity. Logical Methods in Computer Science 15, 1 (2019), 4:1–4:53.
[40] E. Grädel, P.G. Kolaitis, L. Libkin, M. Marx, J. Spencer, M.Y. Vardi, Y. Venema, and S. Weinstein. 2007. Finite Model Theory and Its Applications. Springer Verlag.
[41] M. Grohe. 2017. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Lecture Notes in Logic, Vol. 47. Cambridge University Press.
[42] M. Grohe. 2020. Counting Bounded Tree Depth Homomorphisms. ArXiv arXiv:2003.08164 [cs.LO] (2020).
[43] M. Grohe. 2020. Counting Bounded Tree Depth Homomorphisms. Submitted.
[44] M. Grohe, K. Kersting, M. Mladenov, and E. Selman. 2014. Dimension Reduction via Colour Refinement. In Proceedings of the 22nd Annual European Symposium on Algorithms (Lecture Notes in Computer Science), A. Schulz and D. Wagner (Eds.), Vol. 8737. Springer Verlag, 505–516.
[45] M. Grohe and M. Otto. 2015. Pebble Games and Linear Equations. Journal of Symbolic Logic 80, 3 (2015), 797–844.
[46] M. Grohe, G. Rattan, and G. Woeginger. 2018. Graph Similarity and Approximate Isomorphism. In Proceedings of the 43rd International Symposium on Mathematical Foundations of Computer Science (LIPIcs), I. Potapov, P.G. Spirakis, and J. Worrell (Eds.), Vol. 117. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 20:1–20:16.
[47] M. Grohe, P. Schweitzer, and D. Wiebking. 2020. Deep Weisfeiler Leman. ArXiv arXiv:2003.10935 [cs.LO] (2020).
[48] A. Grover and J. Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, B. Krishnapuram, M. Shah, A.J. Smola, C.C. Aggarwal, D. Shen, and R. Rastogi (Eds.). 855–864.
[49] W. Hamilton, R. Ying, and J. Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems. 1024–1034.
[50] W.L. Hamilton, R. Ying, and J. Leskovec. 2017. Representation learning on graphs: methods and applications. ArXiv arXiv:1709.05584 [cs.SI] (2017).
[51] S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[52] T. Horváth, T. Gärtner, and S. Wrobel. 2004. Cyclic pattern kernels for predictive graph mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 158–167.
[53] N. Immerman and E. Lander. 1990. Describing graphs: A first-order approach to graph canonization. In Complexity Theory Retrospective, A. Selman (Ed.). Springer Verlag, 59–81.
[54] P. Indyk. 2001. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science. 10–33.
[55] W. Johnson and J. Lindenstrauss. 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26 (1984), 189–206.
[56] H. Kashima, K. Tsuda, and A. Inokuchi. 2003. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning. 321–328.
[57] K. Kersting, M. Mladenov, R. Garnett, and M. Grohe. 2014. Power Iterated Color Refinement. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, C.E. Brodley and P. Stone (Eds.). 1904–1910.
[58] T.N. Kipf and M. Welling. 2016. Variational Graph Auto-Encoders. ArXiv arXiv:1611.07308 [stat.ML] (2016).
[59] T.N. Kipf and M. Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations.
[60] R. Kondor and J.D. Lafferty. 2002. Diffusion Kernels on Graphs and Other Discrete Input Spaces. In Proceedings of the 19th International Conference on Machine Learning. 315–322.
[61] N.M. Kriege, F.D. Johansson, and C. Morris. 2019. A survey on graph kernels. ArXiv arXiv:1903.11835 [cs.LG] (2019).
[62] N. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. 2019. A unifying view of explicit and implicit feature maps of graph kernels. Data Mining and Knowledge Discovery 33, 6 (2019), 1505–1547. https://doi.org/10.1007/s10618-019-00652-0
[63] J.B. Kruskal. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1 (1964), 1–27.
[64] N. Linial, E. London, and Y. Rabinovich. 1995. The geometry of graphs and some of its algorithmic applications. Combinatorica 15 (1995), 212–245.
[65] L. Lovász. 1967. Operations with Structures. Acta Mathematica Hungarica 18 (1967), 321–328.
[66] L. Lovász. 1971. On the cancellation law among finite relational structures. Periodica Mathematica Hungarica 1, 2 (1971), 145–156.
[67] L. Lovász. 2012. Large Networks and Graph Limits. American Mathematical Society.
[68] L. Lovász and B. Szegedy. 2006. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B 96, 6 (2006), 933–957.
[69] T. Maehara and H. NT. 2019. A Simple Proof of the Universality of Invariant/Equivariant Graph Neural Networks. ArXiv arXiv:1910.03802 [cs.LG] (2019).
[70] K. Makarychev, R. Manokaran, and M. Sviridenko. 2014. Maximum quadratic assignment problem: Reduction from maximum label cover and LP-based approximation algorithm. ACM Transactions on Algorithms 10, 4 (2014), 18.
[71] P. Malkin. 2014. Sherali–Adams relaxations of graph isomorphism polytopes. Discrete Optimization 12 (2014), 73–97.
[72] L. Mančinska and D.E. Roberson. 2019. Quantum isomorphism is equivalent to equality of homomorphism counts from planar graphs. ArXiv arXiv:1910.06958v2 [quant-ph] (2019).
[73] B.D. McKay and A. Piperno. 2014. Practical graph isomorphism, II. Journal of Symbolic Computation 60 (2014), 94–112.
[74] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems. 3111–3119.
[75] H.L. Morgan. 1965. The generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. Journal of Chemical Documentation 5, 2 (1965), 107–113.
[76] C. Morris, K. Kersting, and P. Mutzel. 2017. Globalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs. In Proceedings of the 2017 IEEE International Conference on Data Mining. 327–336.
[77] C. Morris, N.M. Kriege, K. Kersting, and P. Mutzel. 2016. Faster kernels for graphs with continuous attributes via hashing. In Proceedings of the 16th IEEE International Conference on Data Mining. 1095–1100.
[78] C. Morris, M. Ritzert, M. Fey, W. Hamilton, J.E. Lenssen, G. Rattan, and M. Grohe. 2019. Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. AAAI Press, 4602–4609.
[79] V. Nagarajan and M. Sviridenko. 2009. On the maximum quadratic assignment problem. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms. 516–524.
[80] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal. 2017. graph2vec: Learning Distributed Representations of Graphs. ArXiv (CoRR) arXiv:1707.05005 [cs.AI] (2017).
[81] J. Nešetřil and P. Ossona de Mendez. 2006. Linear time low tree-width partitions and algorithmic consequences. In Proceedings of the 38th ACM Symposium on Theory of Computing. 391–400.
[82] M. Neumann, R. Garnett, and K. Kersting. 2013. Coinciding walk kernels: Parallel absorbing random walks for learning with graphs and few labels. In Proceedings of the 5th Asian Conference on Machine Learning. 357–372.
[83] M. Nickel, V. Tresp, and H.-P. Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning. 809–816.
[84] R. O'Donnell, J. Wright, C. Wu, and Y. Zhou. 2014. Hardness of Robust Graph Isomorphism, Lasserre Gaps, and Asymmetry of Random Graphs. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms. 1659–1677.
[85] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. 2016. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1105–1114.
[86] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang. 2018. Adversarially Regularized Graph Autoencoder for Graph Embedding. ArXiv (CoRR) arXiv:1802.04407 [cs.LG] (2018). http://arxiv.org/abs/1802.04407
[87] B. Perozzi, R. Al-Rfou, and S. Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 701–710.
[88] T. Pham, T. Tran, D. Phung, and S. Venkatesh. 2017. Column networks for collective classification. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2485–2491.
[89] J. Ramon and T. Gärtner. 2003. Expressivity versus efficiency of graph kernels. In Proceedings of the 1st International Workshop on Mining Graphs, Trees and Sequences. 65–74.
[90] F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, and G. Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80.
[91] M. Schlichtkrull, T.N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. 2018. Modeling relational data with graph convolutional networks. In Proceedings of the European Semantic Web Conference (Lecture Notes in Computer Science), A. Gangemi, R. Navigli, M.-E. Vidal, P. Hitzler, R. Troncy, L. Hollink, A. Tordai, and M. Alam (Eds.), Vol. 10843. Springer Verlag, 593–607.
[92] B. Schölkopf, A. Smola, and K.-R. Müller. 1997. Kernel principal component analysis. In Proceedings of the International Conference on Artificial Neural Networks (Lecture Notes in Computer Science), W. Gerstner, A. Germond, M. Hasler, and J.D. Nicoud (Eds.), Vol. 1327. Springer Verlag, 583–588.
[93] S. Shalev-Shwartz and S. Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
[94] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, and K.M. Borgwardt. 2011. Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12 (2011), 2539–2561.
[95] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics. 488–495.
[96] A.J. Smola and R. Kondor. 2003. Kernels and Regularization on Graphs. In Proceedings of the 16th Annual Conference on Computational Learning Theory (Lecture Notes in Computer Science), B. Schölkopf and M.K. Warmuth (Eds.), Vol. 2777. Springer Verlag, 144–158.
[97] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International World Wide Web Conference. 1067–1077.
[98] J. Tenenbaum, V. De Silva, and J. Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290 (2000), 2319–2323.
[99] G. Tinhofer. 1991. A note on compact graphs. Discrete Applied Mathematics 30 (1991), 253–264.
[100] J. Tönshoff, M. Ritzert, H. Wolf, and M. Grohe. 2019. Graph Neural Networks for Maximum Constraint Satisfaction. ArXiv (CoRR) arXiv:1909.08387 [cs.AI] (2019).
[101] D. Wang, P. Cui, and W. Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1225–1234.
[102] Q. Wang, Z. Mao, B. Wang, and L. Guo. 2017. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724–2743.
[103] B. Weisfeiler and A. Leman. 1968. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Series 2 (1968). English translation by G. Ryabov available at https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
[104] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P.S. Yu. 2019. A comprehensive survey on Graph Neural Networks. ArXiv arXiv:1901.00596 [cs.LG] (2019).
[105] Z. Xinyi and L. Chen. 2019. Capsule Graph Neural Network. In Proceedings of the 7th International Conference on Learning Representations. OpenReview.net. https://openreview.net/forum?id=Byl8BnRcYm
[106] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. 2019. How powerful are graph neural networks? In Proceedings of the 7th International Conference on Learning Representations.