
Graph Homomorphism Convolution

Hoang NT ¹ ²   Takanori Maehara ¹

Abstract

In this paper, we study the graph classification problem from the graph homomorphism perspective. We consider the homomorphisms from F to G, where G is a graph of interest (e.g., molecules or social networks) and F belongs to some family of graphs (e.g., paths or non-isomorphic trees). We show that graph homomorphism numbers provide a natural invariant (isomorphism-invariant and F-invariant) embedding map which can be used for graph classification. Viewing the expressive power of a graph classifier through the concept of F-indistinguishability, we prove the universality property of graph homomorphism vectors in approximating F-invariant functions. In practice, by choosing F whose elements have bounded treewidth, we show that the homomorphism method is efficient compared with other methods.

1. Introduction

1.1. Background

In many fields of science, objects of interest often exhibit irregular structures. For example, in biology or chemistry, molecules and protein interactions are often modeled as graphs (Milo et al., 2002; Benson et al., 2016). In multi-physics numerical analyses, methods such as the finite element method discretize the sample under study by 2D/3D meshes (Mezentsev, 2004; Fey et al., 2018). In social studies, interactions between people are represented as a social network (Barabási et al., 2016). Understanding these irregular non-Euclidean structures has yielded valuable scientific and engineering insights. With the recent successful development of machine learning on regular Euclidean data such as images, a natural extension challenge arises: How do we learn from non-Euclidean data such as networks or meshes modeled as graphs?

Geometric (deep) learning (Bronstein et al., 2017) is an important extension of machine learning, as it generalizes learning methods from Euclidean data to non-Euclidean data. This branch of machine learning not only deals with learning from irregular data but also provides a proper means to combine meta-data with their underlying structure. Therefore, geometric learning methods have enabled the application of machine learning to real-world problems: from categorizing complex social interactions to generating new chemical molecules. Among these methods, graph-learning models for the classification task have been the most important subject of study.

Let X be the space of features (e.g., X = R^d for some positive integer d), Y be the space of outcomes (e.g., Y = {0, 1}), and G = (V(G), E(G)) be a graph with a vertex set V(G) and an edge set E(G) ⊆ V(G) × V(G). The graph classification problem is stated as follows¹.

Problem 1 (Graph Classification Problem). We are given a set of tuples {(G_i, x_i, y_i) : i = 1, ..., N} of graphs G_i = (V(G_i), E(G_i)), vertex features x_i : V(G_i) → X, and outcomes y_i ∈ Y. The task is to learn a hypothesis h such that h((G_i, x_i)) ≈ y_i.²

Problem 1 has been studied both theoretically and empirically. Theoretical graph classification models often discuss the universality properties of some targeted function class. While we can identify the function classes which these theoretical models can approximate, practical implementations pose many challenges. For instance, the tensorized model proposed by (Keriven & Peyré, 2019) is universal in the space of continuous functions on bounded-size graphs, but it is impractical to implement such a model. On the other hand, little is known about the class of functions which can be estimated by some practical state-of-the-art models. To address these disadvantages of both theoretical models and practical models, we need a practical graph classification model whose approximation capability can be parameterized. Such a model is not only effective in practice, as we can introduce inductive bias into the design by the aforementioned parameterization, but also useful in theory as a framework to study the graph classification problem.

¹ RIKEN Center for Advanced Intelligence Project, Tokyo, Japan. ² Tokyo Institute of Technology, Tokyo, Japan. Correspondence to: Hoang NT <[email protected]>.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

¹ This setting also includes the regression problem.
² h can be a machine learning model with a given training set.

In machine learning, a model often introduces a set of assumptions, known as its inductive bias. These assumptions help narrow down the hypothesis space while maintaining the validity of the learning model with respect to the nature of the data. For example, a natural inductive bias for graph classification problems is invariance to permutation (Maron et al., 2018; Sannai et al., 2019). We are often interested in a hypothesis h that is invariant to isomorphism, i.e., for two isomorphic graphs G1 and G2, the hypothesis h should produce the same outcome, h(G1) = h(G2). Therefore, it is reasonable to restrict our attention to invariant hypotheses only. More specifically, we focus on invariant embedding maps, because we can construct an invariant hypothesis by combining these mappings with any machine learning model designed for vector data. Consider the following research question:

Question 2. How can we design an efficient and invariant embedding map for the graph classification problem?

1.2. Homomorphism Numbers as a Classifier

A common approach to Problem 1 is to design an embedding³ ρ : (G, x) ↦ ρ((G, x)) ∈ R^p, which maps graphs to vectors, where p is the dimensionality of the representation. Such an embedding can be used to represent a hypothesis for graphs as h((G, x)) = g(ρ((G, x))) for some hypothesis g : R^p → Y on vectors. Because learning on vectors is a well-studied problem, we can focus on designing and understanding the graph embedding.

We found that using homomorphism numbers as an invariant embedding is not only theoretically valid but also extremely efficient in practice. In a nutshell, the embedding for a graph G is given by selecting k pattern graphs to form a fixed set F, then computing the homomorphism numbers from each F ∈ F to G. The classification capability of the homomorphism embedding is parameterized by F. We develop rigorous analyses of this idea in Section 2 (without vertex features) and Section 3 (with vertex features).

Our contribution is summarized as follows:

• We introduce and analyze the usage of weighted graph homomorphism numbers with a general choice of F. The choice of F is a novel way to parameterize the capability of graph learning models, compared to choosing the tensorization order in other related work.

• We prove the universality of the homomorphism vector in approximating F-invariant functions. Our main proof technique is to check the conditions of the Stone-Weierstrass theorem.

• We empirically demonstrate our theoretical findings with synthetic and benchmark datasets. We show that our methods perform well on graph isomorphism tests.

In this paper, we focus on simple undirected graphs without edge weights for simplicity. The extension of all our results to directed and/or weighted graphs is left as future work.

³ Not to be confused with "vertex embedding".

1.3. Related Works

There are two main approaches to constructing an embedding: graph kernels and graph neural networks. In the following paragraphs, we introduce some of the most popular methods which are directly related to our work. For a more comprehensive view of the literature, we refer to surveys on graph neural networks (Wu et al., 2019) and graph kernels (Gärtner, 2003; Kriege et al., 2019).

1.3.1. Graph Kernels

The kernel method first defines a kernel function on the space, which implicitly defines an embedding ρ such that the inner product of the embedding vectors gives the kernel function. Graph kernels implement ρ by counting methods or graph distances (often exchangeable measures); therefore, they are isomorphism-invariant by definition.

The graph kernel method is the most popular approach for studying graph embedding maps. Since designing a kernel which uniquely represents graphs up to isomorphism is as hard as solving graph isomorphism (Gärtner et al., 2003), many previous studies on graph kernels have focused on proposing solutions to the trade-off between computational efficiency and representability. A natural idea is to compute subgraph frequencies (Gärtner et al., 2003) to use as graph embeddings. However, counting subgraphs is a #W[1]-hard problem (Flum & Grohe, 2006), and even counting induced subgraphs is an NP-hard problem (more precisely, it is a #A[1]-hard problem (Flum & Grohe, 2006)). Therefore, methods like the tree kernel (Collins & Duffy, 2002; Mahé & Vert, 2009) or the random walk kernel (Gärtner et al., 2003; Borgwardt et al., 2005) restrict the subgraph family to some computationally efficient class of graphs. Regarding graph homomorphism, Gärtner et al. and also Mahé & Vert studied a relaxation which is similar to homomorphism counting (walks and trees). Notably, Mahé & Vert showed that the tree kernel is efficient for molecule applications. However, their studies are limited to tree kernels, and it is not known to what extent these kernels can represent graphs.

More recently, the graphlet kernel (Shervashidze et al., 2009; Pržulj et al., 2004) and the Weisfeiler-Lehman kernel (Shervashidze et al., 2011; Kriege et al., 2016) set the state-of-the-art for benchmark datasets (Kersting et al., 2016). Other similar kernels with novel modifications to the distance function, such as the Wasserstein distance, have also been proposed (Togninalli et al., 2019). While these kernels are effective on benchmark datasets, some are known to be not universal (Xu et al., 2019; Keriven & Peyré, 2019), and it is difficult to characterize their expressive power to represent graphs.

[Figure 1. A visualization of Graph Neural Networks' expressive power. An "ideal" GNN, for instance the tensorized GNN by Keriven & Peyré, maps graphs that are isomorphic to G to f(G). In contrast, the WL-kernel (Shervashidze et al., 2011) and (ideal) GIN (Xu et al., 2019) are limited by the WL-indistinguishable set, so they might map graphs which are non-isomorphic to G to f(G).]

1.3.2. Graph Neural Networks

Graph Neural Networks refers to a new class of graph classification models in which the embedding map ρ is implemented by a neural network. In general, the mapping ρ follows an aggregation-readout scheme (Hamilton et al., 2017; Gilmer et al., 2017; Xu et al., 2019; Du et al., 2019), where vertex features are aggregated from their neighbors and then read out to obtain the graph embedding. Empirically, especially on social network datasets, these neural networks have shown better accuracy and inference time than graph kernels. However, there exist some challenging cases where these practical neural networks fail, such as the Circular Skip Links synthetic data (Murphy et al., 2019) or bipartite classification (Section 4).

Theoretical analysis of graph neural networks is an active topic of study. The capability of a graph neural network has recently been linked to the Weisfeiler-Lehman isomorphism test (Morris et al., 2019; Xu et al., 2019). Since Morris et al. and Xu et al. proved that the aggregation-readout scheme is bounded by the one-dimensional Weisfeiler-Lehman test, much work has been done to quantify and improve the capability of graph neural networks via the tensorization order. Another important aspect of graph neural networks is their ability to approximate graph isomorphism equivariant or invariant functions (Maron et al., 2018; 2019; Keriven & Peyré, 2019). Interestingly, Chen et al. showed that isomorphism testing and function approximation are equivalent.

The advantage of tensorized graph neural networks lies in their expressive power. However, the disadvantage is that the tensorization order makes it difficult to have an intuitive view of the functions which need to be approximated. Furthermore, the empirical performance of these models might depend heavily on initialization (Chen et al., 2019).

Figure 1 visualizes the interaction between function approximation and isomorphism testing. An ideal graph neural network f maps only G and graphs isomorphic to it to f(G). On the other hand, an efficient implementation of f can only map some F-indistinguishable graphs of G to f(G). This paper shows that graph homomorphism vectors with some polynomials universally approximate F-invariant functions.

2. Graphs without Features

We first establish our theoretical framework for graphs without vertex features. Social networks are often feature-less graphs, in which only structural information (e.g., hyperlinks, friendships, etc.) is captured. The main result of this section is to show that using homomorphism numbers with some polynomial not only yields a universal invariant approximator, but also lets us select the pattern set F for targeted applications.

2.1. Definition

An (undirected) graph G = (V(G), E(G)) is simple if it has neither self-loops nor parallel edges. We denote by G the set of all simple graphs.

Let G be a graph. For a finite set U and a bijection σ : V(G) → U, we denote by G^σ the graph defined by V(G^σ) = U and E(G^σ) = {(σ(u), σ(v)) : (u, v) ∈ E(G)}. Two graphs G1 and G2 are isomorphic if G1^σ = G2 for some bijection σ : V(G1) → V(G2).

2.2. Homomorphism Numbers

Here, we introduce the homomorphism number. This is a well-studied concept in graph theory (Hell & Nesetril, 2004; Lovász, 2012) and plays a key role in our framework.

Let F and G be undirected graphs. A homomorphism from F to G is a function π : V(F) → V(G) that preserves the existence of edges, i.e., (u, v) ∈ E(F) implies (π(u), π(v)) ∈ E(G). We denote by Hom(F, G) the set of all homomorphisms from F to G. The homomorphism number hom(F, G) is the cardinality of this set, i.e., hom(F, G) = |Hom(F, G)|. We also consider the homomorphism density t(F, G). This is a normalized version of the homomorphism number:

    t(F, G) = hom(F, G) / |V(G)|^|V(F)|                                                         (1)
            = Σ_{π : V(F)→V(G)} Π_{u∈V(F)} (1/|V(G)|) × Π_{(u,v)∈E(F)} 1[(π(u), π(v)) ∈ E(G)],   (2)

where 1[·] is the Iverson bracket. Eq. (2) can be seen as the probability that randomly sampled |V(F)| vertices of V(G) preserve the edges of E(F).

Intuitively, a homomorphism number hom(F, G) aggregates local connectivity information of G using a pattern graph F.

Example 3. Let ◦ be a single vertex. We have hom(◦, G) = |V(G)| and hom(K2, G) = 2|E(G)|, where K2 denotes the single-edge graph.

Example 4. Let Sk be the star graph of size k + 1. Then, hom(Sk, G) ∝ Σ_{u∈V(G)} d(u)^k, where d(u) is the degree of vertex u.

Example 5. We have hom(Ck, G) ∝ tr(A^k), where Ck is a length-k cycle and A is the adjacency matrix of G.

Table 1. Meaning of F-indistinguishability

F                   F-indistinguishable
single vertex       graphs have the same number of vertices (Example 3)
single edge         graphs have the same number of edges (Example 3)
stars               graphs have the same degree sequence (Example 4)
cycles              adjacency matrices have the same eigenvalues (Example 5)
treewidth up to k   graphs cannot be distinguished by the k-dimensional Weisfeiler-Lehman test (Dell et al., 2018)
all simple graphs   isomorphic graphs (Lovász, 1967)

It is trivial to see that the homomorphism number is invariant under isomorphism. Surprisingly, the converse also holds: homomorphism numbers identify the isomorphism class of a graph. Formally, we have the following theorem.

Theorem 6 ((Lovász, 1967)). Two graphs G1 and G2 are isomorphic if and only if hom(F, G1) = hom(F, G2) for all simple graphs F. In addition, if |V(G1)|, |V(G2)| ≤ n, then we only have to examine F with |V(F)| ≤ n.
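The examples above are easy to check numerically. Below is a minimal brute-force sketch (our code, assuming Python with networkx and numpy installed; it is exponential in |V(F)| and only meant for small patterns, unlike the efficient Algorithm 1 given later):

# Brute-force homomorphism counting, for checking Examples 3-5 on small graphs.
from itertools import product
import networkx as nx
import numpy as np

def hom(F, G):
    """Count the maps pi : V(F) -> V(G) that send every edge of F to an edge of G."""
    count = 0
    for pi in product(G.nodes, repeat=F.number_of_nodes()):
        mapping = dict(zip(F.nodes, pi))
        if all(G.has_edge(mapping[u], mapping[v]) for u, v in F.edges):
            count += 1
    return count

G = nx.erdos_renyi_graph(8, 0.4, seed=0)

# Example 3: a single vertex gives |V(G)|; a single edge gives 2|E(G)|.
assert hom(nx.empty_graph(1), G) == G.number_of_nodes()
assert hom(nx.path_graph(2), G) == 2 * G.number_of_edges()

# Example 5: hom(C_k, G) equals tr(A^k), the number of closed k-walks in G.
A = nx.to_numpy_array(G)
assert hom(nx.cycle_graph(4), G) == round(np.trace(np.linalg.matrix_power(A, 4)))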
2.3. Homomorphism Numbers as Embeddings

The isomorphism invariance of the homomorphism numbers motivates us to use them as the embedding vectors for a graph. Since examining all graphs would be impractical (i.e., F = G), we select a subset F ⊆ G as a parameter for the graph embedding. We obtain the embedding vector of a graph G by stacking the homomorphism numbers from each F ∈ F. When F = G, this is known as the Lovász vector:

    hom(F, G) = [ hom(F, G) : F ∈ F ] ∈ R^|F|.

We focus on two criteria: expressive capability and computational efficiency. Similar to the kernel representability-efficiency trade-off, a more expressive homomorphism embedding map is usually less efficient, and vice versa.

Graphs G1 and G2 are defined to be F-indistinguishable if hom(F, G1) = hom(F, G2) for all F ∈ F (Böker et al., 2019). Theorem 6 implies that F-indistinguishability generalizes graph isomorphism. For several classes F, the interpretation of F-indistinguishability has been studied; the results are summarized in Table 1. The most interesting result is the case when every F ∈ F has treewidth⁴ at most k, where F-indistinguishability coincides with the k-dimensional Weisfeiler–Lehman isomorphism test (Dell et al., 2018).

A function f : G → R is F-invariant if f(G1) = f(G2) for all F-indistinguishable G1 and G2; therefore, if we use the F-homomorphism numbers as an embedding, we can only represent F-invariant functions. In practice, F should be chosen as small as possible such that the target hypothesis can be assumed to be F-invariant. In the next section, we show that any continuous F-invariant function is arbitrarily accurately approximated by a function of the F-homomorphism embedding (Theorems 7 and 8).

2.4. Expressive Power: Universality Theorem

By characterizing the class of functions that is represented by hom(F, G), we obtain the following results.

Theorem 7. Let f be an F-invariant function. For any positive integer N, there exists a degree-N polynomial hN of hom(F, G) such that f(G) ≈ hN(G) for all G with |V(G)| ≤ N.

Theorem 8. Let f be a continuous F-invariant function. There exists a degree-N polynomial hN of hom(F, G) (F ∈ F) such that f(G) ≈ hN(G) for all G ∈ G.

Proof of Theorem 7. Since |V(G)| ≤ N, the graph space contains a finite number of points; therefore, under the discrete topology, the space is compact Hausdorff.⁵ Let X be a set of points (e.g., graphs). A set of functions A separates X if for any two different points G1, G2 ∈ X, there exists a function h ∈ A such that h(G1) ≠ h(G2). By this separability and the Stone–Weierstrass theorem (Theorem 9), we conclude the proof.

Theorem 7 is the universal approximation theorem for bounded-size graphs. It holds without any assumption on the target function f. It is worth mentioning that the invariant/equivariant universality results of tensorized graph neural networks in this bounded-size setting were proven by (Keriven & Peyré, 2019); the unbounded case remains an open problem. Theorem 8 is the universal approximation theorem for all graphs (unbounded), which is an improvement over previous works. However, our theorem only holds for continuous functions, where the topology of the space has to satisfy the conditions of the Stone-Weierstrass theorem.

⁴ Smaller treewidth implies the graph is more "tree-like".
⁵ In this topology, any function is continuous.

Theorem 9 (Stone–Weierstrass Theorem (Hart et al., 2003)). Let X be a compact Hausdorff space and C(X) be the set of continuous functions from X to R. If a subset A ⊆ C(X) separates X, then the set of polynomials of A is dense in C(X) w.r.t. the topology of uniform convergence.

In the unbounded graph case (Theorem 8), the input space contains infinitely many graphs; therefore, it is not compact under the discrete topology. Hence, we cannot directly apply the Stone–Weierstrass theorem as in the bounded case. To obtain a stronger result, we have to complete the set of all graphs and prove that the completed space is compact Hausdorff. Since it is non-trivial to work directly with discrete graphs, we find that graphon theory (Lovász, 2012) fits our purpose.

Graphon. A sequence of graphs G1, G2, ... is convergent if the homomorphism density t(F, Gi) converges for every simple graph F. A limit of a convergent sequence is called a graphon, and the space obtained by adding the limits of the convergent sequences is called the graphon space, which is denoted by Ḡ. See (Lovász, 2012) for the details of this construction. The following theorem is one of the most important results in graphon theory.

Theorem 10 (Compactness Theorem (Lovász, 2012; Lovász & Szegedy, 2006)). The graphon space Ḡ with the cut distance δ□ is compact Hausdorff.

Now we can prove the graphon version of Theorem 8.

Theorem 11. Any continuous F-invariant function f : Ḡ → R is arbitrarily accurately approximated by a polynomial of {t(F, ·) : F ∈ F}.

Proof. The F-indistinguishability forms a closed equivalence relation on Ḡ, where the homomorphism density is used instead of the homomorphism number. Let Ḡ/F be the quotient space of this equivalence relation, which is compact Hausdorff in the quotient topology. By the definition of the quotient topology, any continuous F-invariant function is identified as a continuous function on Ḡ/F. Also, by definition, the set of F-homomorphism densities separates the quotient space. Therefore, the conditions of the Stone–Weierstrass theorem (Theorem 9) are fulfilled.

2.5. Computational Complexity: Bounded Treewidth

Computing homomorphism numbers is, in general, a #P-hard problem (Díaz et al., 2002). However, if the pattern graph F has bounded treewidth, homomorphism numbers can be computed in polynomial time.

A tree-decomposition (Robertson & Seymour, 1986) of a graph F is a tree T = (V(T), E(T)) with a mapping B : V(T) → 2^V(F) such that (1) ∪_{t∈V(T)} B(t) = V(F), (2) for any (u, v) ∈ E(F) there exists t ∈ V(T) such that {u, v} ⊆ B(t), and (3) for any u ∈ V(F) the set {t ∈ V(T) : u ∈ B(t)} is connected in T. The treewidth (abbreviated as "tw") of F is the minimum of max_{t∈V(T)} |B(t)| − 1 over all tree-decompositions T of F.

Theorem 12 ((Díaz et al., 2002)). For any graphs F and G, the homomorphism number hom(F, G) is computable in O(|V(G)|^{tw(F)+1}) time.

Algorithm 1. Compute hom(F, (G, x))
  Input: target graph G, pattern graph F, vertex features x
  function recursion(current, visited)
    hom_x ← x
    for y in F.neighbors(current) do
      if y ≠ visited then
        hom_y ← recursion(y, current)
        aux ← [ Σ_{j∈G.neighbors(i)} hom_y[j] for i in V(G) ]
        hom_x ← hom_x ∗ aux   (element-wise mult.)
      end if
    end for
    return hom_x
  end function
  Output: Σ_{i∈V(G)} recursion(0, −1)[i]

The most useful case is when F is the set of trees of size at most k. The number of trees of size k is a known integer sequence⁶. There are 106 non-isomorphic trees of size k = 10, which is computationally tractable in practice. Also, in this case, the algorithm for computing hom(F, G) is easily implemented by dynamic programming with recursion, as in Algorithm 1. This algorithm runs in O(|V(G)| + |E(G)|) time. For the non-featured case, we set x(u) = 1 for all u ∈ V(G). The simplicity of Algorithm 1 comes from the fact that if F is a tree, then we only need to keep track of a vertex's immediate ancestor when we process that vertex, via the visited argument of the function recursion.

⁶ https://oeis.org/A000055
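A direct Python transcription of Algorithm 1 might look as follows (a sketch with our own naming, assuming networkx graphs and numpy arrays; the official implementation is in the Supplementary Materials):

# A sketch of Algorithm 1: hom(F, (G, x)) for a tree pattern F, by dynamic
# programming over rooted subtrees of F.
import networkx as nx
import numpy as np

def tree_hom(F, G, x=None, root=0):
    """recursion(v, parent)[i] holds the weighted count of homomorphisms of
    the subtree of F rooted at v that map v to vertex i of G."""
    nodes = list(G.nodes)
    index = {v: j for j, v in enumerate(nodes)}
    # Non-featured case: x(u) = 1 for all u in V(G).
    x = np.ones(len(nodes)) if x is None else np.asarray(x, dtype=float)

    def recursion(current, visited):
        hom_cur = x.copy()
        for y in F.neighbors(current):
            if y != visited:  # only the immediate ancestor is excluded
                hom_y = recursion(y, current)
                # aux[i] = sum of hom_y over the neighbors of i in G
                aux = np.array([hom_y[[index[w] for w in G.neighbors(v)]].sum()
                                for v in nodes])
                hom_cur *= aux  # element-wise multiplication
        return hom_cur

    return recursion(root, None).sum()

# Sanity check against the closed form of Example 4 (stars):
G = nx.erdos_renyi_graph(10, 0.3, seed=1)
assert tree_hom(nx.star_graph(2), G) == sum(d * d for _, d in G.degree)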
3. Graphs with Features

Biological/chemical datasets are often modeled as graphs with vertex features (attributed graphs). In this section, we develop results for graphs with features.

3.1. Definition

A vertex-featured graph is a pair (G, x) of a graph G and a function x : V(G) → X, where X = [0, 1]^p.

Let (G, x) be a vertex-featured graph. For a finite set U and a bijection σ : V(G) → U, we denote by x^σ the feature vector on G^σ such that x^σ(σ(u)) = x(u). Two vertex-featured graphs (G1, x1) and (G2, x2) are isomorphic if G1^σ = G2 and x1^σ = x2 for some σ : V(G1) → V(G2).

3.2. Weighted Homomorphism Numbers

We first consider the case where the features are non-negative real numbers. Let x(u) denote the feature of vertex u. The weighted homomorphism number is defined as follows:

    hom(F, (G, x)) = Σ_{π∈Hom(F,G)} Π_{u∈V(F)} x(π(u)),   (3)

and the weighted homomorphism density is defined by t(F, (G, x)) = hom(F, (G, x(u)/Σ_{v∈V(G)} x(v))). These definitions coincide with the homomorphism number and density if x(u) = 1 for all u ∈ V(G).

A weighted version of the Lovász theorem holds as follows. We say that two vertices u, v ∈ V(G) are twins if the neighborhoods of u and v are the same. The twin-reduction is a procedure that iteratively selects twins u and v, contracts them to create a new vertex uv, and assigns x(uv) = x(u) + x(v) as the new weight. Note that the result of this process is independent of the order of contractions.

Theorem 13 ((Freedman et al., 2007), (Cai & Govorov, 2019)). Two graphs (G1, x1) and (G2, x2) are isomorphic after the twin-reduction and the removal of vertices of weight zero if and only if hom(F, (G1, x1)) = hom(F, (G2, x2)) for all simple graphs F.

Now we can prove a generalization of the Lovász theorem.

Theorem 14. Two graphs (G1, x1) and (G2, x2) are isomorphic if and only if hom(F, φ, (G1, x1)) = hom(F, φ, (G2, x2)) for all simple graphs F and some continuous function φ.

Proof. It is trivial to see that if (G1, x1) and (G2, x2) are isomorphic, then they produce the same homomorphism numbers. Thus, we only have to prove the only-if part.

Suppose that the graphs are non-isomorphic. By setting φ = 1, we have the same setting as the feature-less case; hence, by Theorem 6, we can detect the isomorphism classes of the underlying graphs.

Assuming G1 and G2 are isomorphic, we arrange the vertices of V(G1) in increasing order of their features (compared in the lexicographical order). Then, we arrange the vertices of V(G2) to be lexicographically smallest while the corresponding subgraphs induced by the first several vertices are isomorphic. Let us choose the first vertex u ∈ V(G1) whose feature differs from the feature of the corresponding vertex in V(G2). Then, we define

    φ(z) = 1 if z ≤lex x1(u), and 0 otherwise,

where ≤lex stands for the lexicographical order. Then, we have hom(F, φ, (G1, x1)) ≠ hom(F, φ, (G2, x2)), as follows. Suppose that the equality holds. Then, by Theorem 13, the subgraphs induced by vertices whose features are lexicographically smaller than or equal to x1(u) are isomorphic. However, this contradicts the minimality of the ordering of V(G2). Finally, by taking a continuous approximation of φ, we obtain the theorem.

3.3. (F, φ)-Homomorphism Numbers

Let (G, x) be a vertex-featured graph. For a simple graph F and a function φ : R^p → R, the (F, φ)-convolution is given by

    hom(F, G, x; φ) = Σ_{π∈Hom(F,G)} Π_{u∈V(F)} φ(x(π(u))).   (4)

The (F, φ)-convolution first transforms the vertex features into real values by the encoding function φ. It then aggregates these values by the pattern graph F. The aggregation part has some similarity with the convolution in CNNs; thus, we call this operation "convolution."

Example 15. Let ◦ be the singleton graph and φ be the i-th component of the argument. Then,

    hom(◦, G, x; φ) = Σ_{u∈V(G)} x_i(u).   (5)

Example 16. Let K2 be the graph of a single edge and φ be the i-th component of the argument. Then,

    hom(K2, G, x; φ) = Σ_{(u,v)∈E(G)} x_i(u) x_i(v).   (6)

Algorithm 1 implements this idea when φ is the identity function. We see that the (F, φ)-convolution is invariant under graph isomorphism in the following result.

Theorem 17. For a simple graph F, a function φ : R^p → R, a vertex-featured graph (G, x), and a permutation σ on V(G), we have

    hom(F, G, x; φ) = hom(F, G^σ, x^σ; φ).   (7)

Proof. Hom(F, G^σ) = {σ ∘ π : π ∈ Hom(F, G)}. Therefore, we have

    hom(F, G^σ, x^σ; φ) = Σ_{π∈Hom(F,G)} Π_{u∈V(F)} φ(x^σ(σ ∘ π(u))).

From the definition, we have x^σ(σ ∘ π(u)) = x(π(u)).

Theorem 17 indicates that for any F and φ, the (F, φ)-convolution can be used as a feature map for graph classification problems. To obtain a more powerful embedding, we can stack multiple (F, φ)-convolutions. Let F be a (possibly infinite) set of finite simple graphs and Φ be a (possibly infinite) set of functions. Then the (F, Φ)-convolution, hom(F, G, x; Φ), is the (possibly infinite) vector

    [ hom(F, G, x; φ) : F ∈ F, φ ∈ Φ ].   (8)
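As a concrete illustration, the following sketch (our code; brute force over all vertex maps, so only for small graphs) stacks a few (F, φ)-convolutions from Eq. (4) into the embedding vector of Eq. (8):

# A sketch of the (F, Phi)-convolution: stack hom(F, G, x; phi) over a small
# pattern set and a small encoding set.
from itertools import product
import networkx as nx
import numpy as np

def hom_conv(F, G, x, phi):
    """Eq. (4): sum over homomorphisms pi of prod_u phi(x[pi(u)])."""
    total = 0.0
    for pi in product(G.nodes, repeat=F.number_of_nodes()):
        mapping = dict(zip(F.nodes, pi))
        if all(G.has_edge(mapping[u], mapping[v]) for u, v in F.edges):
            total += np.prod([phi(x[mapping[u]]) for u in F.nodes])
    return total

patterns = [nx.empty_graph(1), nx.path_graph(2), nx.path_graph(3)]
encodings = [lambda z: 1.0,          # constant: pure topology (Section 3.6)
             lambda z: z[0],         # first feature component (Example 15)
             lambda z: z[0] * z[1]]  # a cross term between components

G = nx.erdos_renyi_graph(6, 0.5, seed=2)
x = {u: np.random.RandomState(u).rand(2) for u in G.nodes}

# Eq. (8): the embedding stacks all (F, phi) pairs.
embedding = np.array([hom_conv(F, G, x, phi)
                      for F in patterns for phi in encodings])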

By Theorem 17, for any F and Φ, the (F, Φ)-convolution is invariant under isomorphism. Hence, we propose to use the (F, Φ)-convolution as an embedding of graphs.

3.4. (F, φ)-Homomorphism Numbers as Embeddings

Let Φ be a set of continuous functions. As in the featureless case, we propose to use the (F, Φ)-homomorphism numbers as an embedding. We say that two featured graphs (G1, x1) and (G2, x2) are (F, Φ)-indistinguishable if hom(F, φ, (G1, x1)) = hom(F, φ, (G2, x2)) for all F ∈ F and φ ∈ Φ. A function f is (F, Φ)-invariant if f(G1, x1) = f(G2, x2) for all (F, Φ)-indistinguishable (G1, x1) and (G2, x2).

3.5. Universality Theorem

The challenge in proving the universality theorem for the featured setting is similar to that in the featureless case, namely the difficulty of the topological space. We consider the quotient space of graphs with respect to (F, Φ)-indistinguishability. Our goal is to prove that this space is completed to a compact Hausdorff space.

With a slight abuse of notation, consider a function ι that maps a vertex-featured graph (G, x) to a |Φ|-dimensional vector [(G, φ(x)) : φ ∈ Φ] ∈ (G/F)^Φ, where each coordinate is an equivalence class of F-indistinguishable graphs. This space has a bijection to the quotient space by (F, Φ)-indistinguishability. Each coordinate of the |Φ|-dimensional space is completed to a compact Hausdorff space (Borgs et al., 2008). Therefore, by the Tychonoff product theorem (Hart et al., 2003), the |Φ|-dimensional space is compact. The bijection with the quotient space shows that the quotient space is completed by a compact Hausdorff space. We denote this space by Ḡ. Under this space, we have the following result.

Theorem 18. Any continuous (F, Φ)-invariant function Ḡ → R is arbitrarily accurately approximated by a polynomial of (G, x) ↦ t(F, (G, φ(x))).

Proof. The space Ḡ is compact by construction. The separability follows from the definition of (F, Φ)-invariance. Therefore, by the Stone–Weierstrass theorem, we complete the proof.

The strength of an embedding is characterized by its separability.

Lemma 19. Let F be the set of all simple graphs and Φ be the set of all continuous functions from [0, 1]^p to [0, 1]. Then, (G, x) ↦ hom(F, G, x; Φ) is injective.

Proof. Let (G, x) and (G′, y) be two non-isomorphic vertex-featured graphs. We distinguish these graphs by the homomorphism convolution.

If G and G′ are non-isomorphic, then by (Lovász, 1967), hom(F, G, x; 1) ≠ hom(F, G′, y; 1), where 1 is the function that takes the value one for any argument.

Now we consider the case where G = G′. Let {1, ..., n} be the set of vertices of G. Without loss of generality, we assume x(1) ≤ x(2) ≤ ..., where ≤ is the lexicographical order. Now we find a permutation π such that G = G^π and y(π(1)), y(π(2)), ... is lexicographically smallest. Let u ∈ {1, ..., n} be the smallest index such that x(u) ≠ y(u). By the definition, x(u) ≤ y(u). We choose ψ by

    ψ(x) = 1 if x ≤ x(u), and 0 otherwise.   (9)

Then, there exists F ∈ F such that hom(F, G, x; ψ) ≠ hom(F, G, y; ψ), because the graphs induced by {1, ..., k} and {π(1), ..., π(k)} are non-isomorphic by the choice of π.

Now we approximate ψ by a continuous function φ. Because the (F, φ)-convolution is continuous in the vertex weights (i.e., in φ(x(u))), by choosing φ sufficiently close to ψ, we get hom(F, G, x; φ) ≠ hom(F, G, y; φ).

We say that a sequence (Gi, xi) (i = 1, 2, ...) of featured graphs is (F, Φ)-convergent if for each F ∈ F and φ ∈ Φ the sequence hom(F, Gi, xi; φ) (i = 1, 2, ...) converges in R. A function f : (G, x) ↦ f(G, x) is (F, Φ)-continuous if for any (F, Φ)-convergent sequence (Gi, xi) (i = 1, 2, ...), the limit lim_{i→∞} f(Gi, xi) exists and depends only on the limits lim_{i→∞} hom(F, Gi, xi; φ) of the homomorphism convolutions for all F ∈ F and φ ∈ Φ.

Now we prove the universality theorem. Let H be a dense subset of the set of continuous functions, e.g., the set of polynomials or the set of functions represented by a deep neural network. We define H(G; F, Φ) by

    H(G; F, Φ) = { Σ_{F∈F, φ∈Φ} h_{F,φ}(hom(F, ·; φ)) : h_{F,φ} ∈ H },   (10)

where the argument of the function is restricted to G. This is the set of functions obtained by combining universal approximators in H with the homomorphism convolutions hom(F, G, x; φ) for F ∈ F and φ ∈ Φ. Let G be a set of graphs, and let C(G; F, Φ) be the set of (F, Φ)-continuous functions defined on G. Then, we obtain the following theorem.

Theorem 20 (Universal Approximation Theorem). Let G be a compact set of graphs whose number of vertices is bounded by a constant. Then, H(G; F, Φ) is dense in C(G; F, Φ).

Proof. Because the number of vertices is bounded, the space of converging sequences is identified with G. Therefore, this space is compact Hausdorff. The separability is proved in Lemma 19. Hence, we can use the Stone–Weierstrass theorem to conclude this result.
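One member of H(G; F, Φ) from Eq. (10) can be sketched as follows (our illustration, reusing the hom_conv helper, patterns, and encodings from the sketch in Section 3.3, and taking H to be degree-3 polynomials):

# A sketch of one element of H(G; F, Phi): a sum over (F, phi) pairs of
# univariate polynomials h_{F,phi} applied to the homomorphism convolutions.
import numpy as np

rng = np.random.default_rng(0)
# One coefficient vector per (F, phi) pair; np.polyval evaluates each h_{F,phi}.
coeffs = rng.normal(size=(len(patterns) * len(encodings), 4))

def model(G, x):
    feats = [hom_conv(F, G, x, phi) for F in patterns for phi in encodings]
    return sum(np.polyval(c, f) for c, f in zip(coeffs, feats))

Theorem 20 states that, with H dense in the continuous functions, such sums can approximate any (F, Φ)-continuous function on a bounded-size graph class.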
3.6. The Choice of F and Φ

In an application, we have to choose F and Φ appropriately. The criteria for choosing them are the same as in the non-featured case: a trade-off between representability and efficiency. Representability requires that the (F, Φ)-convolutions can separate the graphs in which we are interested. Efficiency requires that the (F, Φ)-convolutions be efficiently computable. This trivially limits both F and Φ to finite sets.

The choice of Φ will depend on the properties of the vertex features. We will include the constant function 1 if the topology of the graph is important. We will also include the i-th components of the arguments. If we know that some interaction between the features is important, we can also include the cross terms.

The choice of F relates to the topology of the graphs of interest. If Φ = {1}, where 1 is the constant function, the homomorphism convolution coincides with the homomorphism number (Table 1).

Here, we focus on efficiency. In general, computing hom(F, G, x; φ) is #P-hard. However, it is computable in polynomial time if F has bounded treewidth. The treewidth of a graph F, denoted by tw(F), is a graph parameter that measures the tree-likeness of the graph. The following result holds.

Theorem 21. hom(F, G, x; φ) is computable in O(|V(G)|^{tw(F)+1}) time, where tw(F) is the treewidth of F.
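For instance, a tree pattern family (treewidth 1, so Theorem 21 gives linear-time computation per pattern) can be enumerated directly; a sketch assuming networkx, whose nonisomorphic_trees generator requires order ≥ 2:

# Enumerate F_tree(k): all non-isomorphic trees with 2..k vertices.
import networkx as nx

def tree_patterns(k):
    patterns = []
    for order in range(2, k + 1):
        patterns.extend(nx.nonisomorphic_trees(order))
    return patterns

# Sizes 2..6 give 1 + 1 + 2 + 3 + 6 = 13 trees, matching the 13-dimensional
# GHC-Tree embedding of Section 4.1 (OEIS A000055).
assert len(tree_patterns(6)) == 13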
4. Experimental Results

4.1. Classification Models

The realization of our ideas in Section 2 and Section 3 is called the Graph Homomorphism Convolution (GHC-*) model (due to its resemblance to the R-convolution (Haussler, 1999)). Here, we give specific formulations for two practical embedding maps: GHC-Tree and GHC-Cycle. These embedding maps are then used to train a classifier (Support Vector Machine). We report the 10-fold cross-validation accuracy scores and standard deviations in Table 2.

Table 2. Classification accuracy over 10 experiments

(a) Synthetic datasets

Methods      CSL            BIPARTITE      PAULUS25
Practical models
GIN          10.00 ± 0.00   55.75 ± 7.91   7.14 ± 0.00
GNTK         10.00 ± 0.00   58.03 ± 6.84   7.14 ± 0.00
Theory models
Ring-GNN     10~80 ± 15.7   55.72 ± 6.95   7.15 ± 0.00
GHC-Tree     10.00 ± 0.00   52.68 ± 7.15   7.14 ± 0.00
GHC-Cycle    100.0 ± 0.00   100.0 ± 0.00   7.14 ± 0.00

(b) Benchmark datasets

Methods      MUTAG          IMDB-BIN       IMDB-MUL
Practical models
GNTK         89.46 ± 7.03   75.61 ± 3.98   51.91 ± 3.56
GIN          89.40 ± 5.60   70.70 ± 1.10   43.20 ± 2.00
PATCHY-SAN   89.92 ± 4.50   71.00 ± 2.20   45.20 ± 2.80
WL kernel    90.40 ± 5.70   73.80 ± 3.90   50.90 ± 3.80
Theory models
Ring-GNN     78.07 ± 5.61   73.00 ± 5.40   48.20 ± 2.70
GHC-Tree     89.28 ± 8.26   72.10 ± 2.62   48.60 ± 4.40
GHC-Cycle    87.81 ± 7.46   70.93 ± 4.54   47.41 ± 3.67

GHC-Tree. We let Ftree(6) be the set of all simple trees of size at most 6. Algorithm 1 implements Equation 3 for this case. Given G and vertex features x, the i-th dimension of the embedding vector is

    GHC-Tree(G)_i = hom(Ftree(6)[i], (G, x)).

GHC-Cycle. We let Fcycle(8) be the set of all simple cycles of size at most 8. This variant of GHC cannot distinguish cospectral graphs. The i-th dimension of the embedding vector is

    GHC-Cycle(G)_i = hom(Fcycle(8)[i], G).

With this configuration, GHC-Tree(G) has 13 dimensions and GHC-Cycle(G) has 7 dimensions.

Other methods. To compare our performance with other approaches, we selected some representative methods. GIN (Xu et al., 2019) and PATCHY-SAN (Niepert et al., 2016) are representative neural-based methods. The WL-kernel (Shervashidze et al., 2011) is a widely used efficient method for graph classification. GNTK (Du et al., 2019) is a recent neural tangent approach to graph classification. We also include results for Ring-GNN (Chen et al., 2019), as this recent model used in theoretical studies performed well on the Circular Skip Links synthetic dataset (Murphy et al., 2019). Except for setting the number of epochs for GIN to 50, we use the default hyperparameters provided by the original papers. More details on hyperparameter tuning and source code are available in the Supplementary Materials.
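Putting the pieces together, a minimal version of the GHC-Tree pipeline might look as follows (our sketch: tree_patterns and tree_hom are the helpers sketched earlier, the SVM is scikit-learn's, and the log-scaling is our own practical choice since homomorphism numbers grow quickly with |V(G)|):

# A sketch of the GHC-Tree pipeline: tree homomorphism embedding + SVM.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def ghc_tree_embedding(G, k=6):
    return np.log1p([tree_hom(F, G) for F in tree_patterns(k)])

# graphs, labels = ...  (e.g., loaded from a TU Dortmund dataset)
X = np.stack([ghc_tree_embedding(G) for G in graphs])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, labels, cv=10).mean())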

4.2. Synthetic Experiments

Bipartite classification. We generate a binary classification problem consisting of 200 graphs, half of which are random bipartite graphs with density p = 0.2 and the other half Erdős–Rényi graphs with density p = 0.1. These graphs have from 40 to 100 vertices. According to Table 1, GHC-Cycle should work well in this case, while GHC-Tree cannot learn which graphs are bipartite. More interestingly, as shown in Table 2, other practical models also cannot handle this simple classification problem due to their capability limitation (1-WL).

Circular Skip Links. We adapt the synthetic dataset used by (Murphy et al., 2019) and (Chen et al., 2019) to demonstrate another case where GIN, Relational Pooling (Murphy et al., 2019), and the Order-2 G-invariant network (Maron et al., 2018) do not perform well. Circular Skip Links (CSL) graphs are undirected regular graphs with the same degree sequence (all 4's). Since these graphs are not cospectral, GHC-Cycle can easily learn them with 100% accuracy. Chen et al. mentioned that the performance of GNN models could vary due to randomness (accuracies ranging from 10% to 80%). However, this is not the case for GHC-Cycle. The CSL classification results show another benefit of using F patterns as an inductive bias: one can implement a strong classifier without the need for additional features like Ring-GNN-SVD (Chen et al., 2019).
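The bipartite task above can be reproduced along these lines (a sketch; the paper's exact generator settings are in the Supplementary Materials and may differ):

# A sketch of the bipartite-vs-Erdos-Renyi synthetic task from this section.
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
graphs, labels = [], []
for i in range(100):
    n = int(rng.integers(40, 101))
    # Class 1: random bipartite graph with edge density 0.2.
    graphs.append(nx.bipartite.random_graph(n // 2, n - n // 2, 0.2, seed=i))
    labels.append(1)
    # Class 0: Erdos-Renyi graph with edge density 0.1.
    graphs.append(nx.erdos_renyi_graph(n, 0.1, seed=i))
    labels.append(0)

# Why GHC-Cycle separates the classes: a bipartite graph has no odd closed
# walk, so hom(C_3, G) = hom(C_5, G) = hom(C_7, G) = 0 exactly when G is
# bipartite (cf. Table 1 and Example 5).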
Paulus graphs. We prepare 14 non-isomorphic cospectral strongly regular graphs known as the Paulus graphs⁷ and create a dataset of 210 graphs belonging to 14 isomorphism classes. This is a hard example because these graphs have exactly the same degree sequence and spectrum. In our experiments, no method achieves accuracy higher than random guessing (7.14%). This is a case where exact isomorphism tests clearly outperform learning-based methods. In our experiments, homomorphisms from patterns up to graph index 100 of NetworkX's graph atlas still fail to distinguish these isomorphism classes. We believe further studies of this case could be fruitful for understanding and improving graph learning.

⁷ https://www.distanceregular.org/graphs/paulus25.html

4.3. Benchmark Experiments

We select 3 datasets from the TU Dortmund data collection (Kersting et al., 2016): the MUTAG dataset (Debnath et al., 1991), IMDB-BINARY, and IMDB-MULTI (Yanardag & Vishwanathan, 2015). These datasets cover graph classification settings both with and without vertex features. We run and record the 10-fold cross-validation score for each experiment. We report the average accuracy and standard deviation of 10 experiments in Table 2. More experiments on other datasets in the TU Dortmund data collection, as well as the details of each dataset, are provided in the Appendix.

4.4. Running Time

Although homomorphism counting is #P-complete in general, polynomial and linear time algorithms exist under the bounded-treewidth condition (Díaz et al., 2002). Figure 2 shows that our method runs much faster than other practical models. The results are recorded by averaging the total runtime in seconds over 10 experiments, each computing the 10-fold cross-validation accuracy score. In principle, GHC can be distributed across multiple processes to further reduce the computational time, making it an ideal baseline model for future studies.

[Figure 2. Runtime (sec) in log-scale for one 10-fold run, comparing GHC-Tree, GHC-Cycle, GIN, and GNTK on the MUTAG, BIPARTITE, CSL, and IMDBMULTI datasets.]

5. Conclusion

In this work we contribute an alternative approach to the question of quantifying a graph classification model's capability, beyond the tensorization order and the Weisfeiler-Lehman isomorphism test. In principle, tensorized graph neural networks can implement homomorphism numbers; hence our work is in coherence with prior work. However, we find that the homomorphism from F to G is a more "fine-grained" tool for analyzing graph classification problems, as studying F is more intuitive (and graph-specific) than studying the tensorization order. Since GHC is a more restricted embedding compared to tensorized graph neural networks such as the model proposed by (Keriven & Peyré, 2019), the universality result of GHC can be translated to the universality result of any other model that has the capability to implement homomorphism numbers.

Another note concerns the proof of Theorem 8 (universality on unbounded graphs). In order to prove this result, we made an assumption about the topology of f, and we also assumed that the graphs of interest belong to the graphon space. While the graphon space is natural in our application to prove universality, there are a few concerns. First, we assumed that graphons exist for the graphs of interest, which might not be true in general. Second, graph limit theory is well studied for dense graphs, while sparse graph problems remain largely open.

Acknowledgements

HN is supported by the Japanese Government Scholarship (Monbukagakusho: MEXT SGU 205144). We would like to thank the anonymous reviewers whose comments have helped improve the quality of our manuscript. We thank Jean-Philippe Vert for the helpful discussion and suggestions on our early results. We thank Matthias Springer, Sunil Kumar Maurya, and Jonathon Tanks for proof-reading our manuscripts.

References

Barabási, A.-L. et al. Network Science. Cambridge University Press, 2016.

Benson, A. R., Gleich, D. F., and Leskovec, J. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.

Böker, J., Chen, Y., Grohe, M., and Rattan, G. The complexity of homomorphism indistinguishability. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS'19), 2019.

Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T., and Vesztergombi, K. Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Advances in Mathematics, 219(6):1801–1851, 2008.

Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S., Smola, A. J., and Kriegel, H.-P. Protein function prediction via graph kernels. Bioinformatics, 21(suppl 1):i47–i56, 2005.

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.

Cai, J.-Y. and Govorov, A. On a theorem of Lovász that hom(·, H) determines the isomorphism type of H. arXiv preprint arXiv:1909.03693, 2019.

Chen, Z., Villar, S., Chen, L., and Bruna, J. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, pp. 15868–15876, 2019.

Collins, M. and Duffy, N. Convolution kernels for natural language. In Advances in Neural Information Processing Systems, pp. 625–632, 2002.

Debnath, A. K., Lopez de Compadre, R. L., Debnath, G., Shusterman, A. J., and Hansch, C. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34(2):786–797, 1991.

Dell, H., Grohe, M., and Rattan, G. Lovász meets Weisfeiler and Leman. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP'18), 2018.

Díaz, J., Serna, M., and Thilikos, D. M. Counting H-colorings of partial k-trees. Theoretical Computer Science, 281(1-2):291–309, 2002.

Du, S. S., Hou, K., Póczos, B., Salakhutdinov, R., Wang, R., and Xu, K. Graph neural tangent kernel: Fusing graph neural networks with graph kernels. CoRR, abs/1905.13192, 2019.

Fey, M., Lenssen, J. E., Weichert, F., and Müller, H. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Flum, J. and Grohe, M. Parameterized Complexity Theory. Springer, 2006.

Freedman, M., Lovász, L., and Schrijver, A. Reflection positivity, rank connectivity, and homomorphism of graphs. Journal of the American Mathematical Society, 20(1):37–51, 2007.

Gärtner, T. A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter, 5(1):49–58, 2003.

Gärtner, T., Flach, P., and Wrobel, S. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines, pp. 129–143. Springer, 2003.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pp. 1263–1272. JMLR, 2017.

Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.

Hart, K. P., Nagata, J.-i., and Vaughan, J. E. Encyclopedia of General Topology. Elsevier, 2003.

Haussler, D. Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California, 1999.

Hell, P. and Nesetril, J. Graphs and Homomorphisms. Oxford University Press, 2004.

Keriven, N. and Peyré, G. Universal invariant and equivariant graph neural networks. arXiv preprint arXiv:1905.04943, 2019.

Kersting, K., Kriege, N. M., Morris, C., Mutzel, P., and Neumann, M. Benchmark data sets for graph kernels, 2016. URL http://graphkernels.cs.tu-dortmund.de.

Kriege, N. M., Giscard, P.-L., and Wilson, R. On valid optimal assignment kernels and applications to graph classification. In Advances in Neural Information Processing Systems, pp. 1623–1631, 2016.

Kriege, N. M., Johansson, F. D., and Morris, C. A survey on graph kernels. arXiv preprint arXiv:1903.11835, 2019.

Lovász, L. Operations with structures. Acta Mathematica Hungarica, 18(3-4):321–328, 1967.

Lovász, L. Large Networks and Graph Limits, volume 60. American Mathematical Society, 2012.

Lovász, L. and Szegedy, B. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96(6):933–957, 2006.

Mahé, P. and Vert, J.-P. Graph kernels based on tree patterns for molecules. Machine Learning, 75(1):3–35, 2009.

Maron, H., Ben-Hamu, H., Shamir, N., and Lipman, Y. Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902, 2018.

Maron, H., Fetaya, E., Segol, N., and Lipman, Y. On the universality of invariant networks. arXiv preprint arXiv:1901.09342, 2019.

Mezentsev, A. A. A generalized graph-theoretic mesh optimization model. In IMR, pp. 255–264, 2004.

Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.

Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 4602–4609, 2019.

Murphy, R., Srinivasan, B., Rao, V., and Ribeiro, B. Relational pooling for graph representations. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 4663–4673, Long Beach, California, USA, 2019. PMLR.

Niepert, M., Ahmed, M., and Kutzkov, K. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pp. 2014–2023, 2016.

Pržulj, N., Corneil, D. G., and Jurisica, I. Modeling interactome: scale-free or geometric? Bioinformatics, 20(18):3508–3515, 2004.

Robertson, N. and Seymour, P. D. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7(3):309–322, 1986.

Sannai, A., Takai, Y., and Cordonnier, M. Universal approximations of permutation invariant/equivariant functions by deep neural networks. arXiv preprint arXiv:1903.01939, 2019.

Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., and Borgwardt, K. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics, pp. 488–495, 2009.

Shervashidze, N., Schweitzer, P., van Leeuwen, E. J., Mehlhorn, K., and Borgwardt, K. M. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(Sep):2539–2561, 2011.

Togninalli, M., Ghisu, E., Llinares-López, F., Rieck, B., and Borgwardt, K. Wasserstein Weisfeiler-Lehman graph kernels. arXiv preprint arXiv:1906.01277, 2019.

Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.

Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations, 2019.

Yanardag, P. and Vishwanathan, S. Deep graph kernels. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374, 2015.
